Article history: Received 9 March 2010; Received in revised form 30 November 2010; Accepted 4 December 2010; Available online 16 December 2010

Keywords: Software product lines; Software testing; Mapping study

Abstract

Context: In software development, testing is an important mechanism both to identify defects and to assure that completed products work as specified. This is a common practice in single-system development, and continues to hold in Software Product Lines (SPL). Even though extensive research has been done in the SPL testing field, it is necessary to assess the current state of research and practice, in order to provide practitioners with evidence that enables fostering its further development.

Objective: This paper focuses on testing in SPL and has the following goals: investigate state-of-the-art testing practices, synthesize available evidence, and identify gaps between required techniques and existing approaches available in the literature.

Method: A systematic mapping study was conducted with a set of nine research questions, in which 120 studies, dated from 1993 to 2009, were evaluated.

Results: Although several aspects of testing have been covered by single-system development approaches, many cannot be directly applied in the SPL context due to specific issues. In addition, particular aspects of SPL are not covered by the existing SPL approaches, and when they are covered, the literature gives only brief overviews. This scenario indicates that additional investigation, empirical and practical, should be performed.

Conclusion: The results can help to understand the needs in SPL Testing, by identifying points that still require additional investigation, since important aspects regarding particular points of software product lines have not been addressed yet.

© 2010 Elsevier B.V. All rights reserved.
Contents

1. Introduction ... 408
2. Related work ... 408
3. Literature review method ... 409
4. Research directives ... 411
   4.1. Protocol definition ... 411
   4.2. Question structure ... 411
   4.3. Research questions ... 411
5. Data collection ... 412
   5.1. Search strategy ... 412
   5.2. Data sources ... 412
   5.3. Studies selection ... 412
        5.3.1. Reliability of inclusion decisions ... 413
   5.4. Quality evaluation ... 413
   5.5. Data extraction ... 414
6. Outcomes ... 414
0950-5849/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.infsof.2010.12.003
408 P.A. da Mota Silveira Neto et al. / Information and Software Technology 53 (2011) 407–423
Fig. 1. The systematic mapping process (adapted from Petersen et al. [67]).
These studies can be considered good sources of information on this subject. In order to develop our work, we considered every mentioned study, since they bring relevant information. However, we have noticed that important aspects, such as regression testing, testing of non-functional requirements and the relation between variant binding time and testability, were not covered by them to an extent that would make it possible to map out the current status of research and practice in the area. Thus, we categorized a set of important research areas under SPL testing, focusing on aspects addressed by the studies mentioned before as well as the areas they did not address but that are directly related to SPL practices, in order to perform critical analysis and appraisal. In order to accomplish our goals in this work, we followed the guidelines for mapping study development presented in [12]. We also included threat mitigation strategies in order to obtain the most reliable results.

We believe our study states current and relevant information on research topics that can complement others previously published. By current, we mean that, as the number of studies published has increased rapidly, as shown in Fig. 4, there is a need for more up-to-date empirical research in this area to contribute to the community's investigations.

3. Literature review method

The method used in this research is a Systematic Mapping Study (henceforth abbreviated to 'MS') [12,67]. A MS provides a systematic and objective procedure for identifying the nature and extent of the empirical study data that is available to answer a particular research question [12].

While a Systematic Review is a means of identifying, evaluating and interpreting all available research relevant to a particular question [41], a MS intends to 'map out' the research undertaken rather than to answer a detailed research question [12,67]. A well-organized set of good practices and procedures for undertaking MS in the software engineering context is defined in [12,67], which establishes the base for the study presented in this paper. It is worthwhile to highlight that the importance and use of MS in the software engineering area is increasing [1,5,12,15,33,40,67,71], showing the relevance and potential of the method. Nevertheless, as with systematic reviews [10,13,51,56,78], we need more MS related to software product lines, in order to evolve the field with more evidence [43].

A MS comprises the analysis of primary studies that investigate aspects related to predefined research questions, aiming at integrating and synthesizing evidence to support or refute particular research hypotheses. The main reasons to perform a MS, as defined by Budgen et al. [12], are:

- To make an unbiased assessment of as many studies as possible, identifying existing gaps in current research and contributing to the research community with a reliable synthesis of the data;
- To provide a systematic procedure for identifying the nature and extent of the empirical study data that is available to answer research questions;
- To map out the research that has been undertaken;
- To help to plan new research, avoiding unnecessary duplication of effort and error;
- To identify gaps and clusters in a set of primary studies, in order to identify topics and areas to perform more complete systematic reviews.

The experimental software engineering community is working towards the definition of standard processes for conducting mapping studies. This effort can be seen in Petersen et al. [67], a study describing how to conduct systematic mapping studies in software engineering. The paper provides a well-defined process which serves as a starting point for our work. We merged ideas from Petersen et al. [67] with good practices defined in the guidelines published by Kitchenham and Charters [41]. This way, we could apply a mapping study process that includes good practices from systematic reviews, making better use of both techniques.

This blending process enabled us to include topics not covered by Petersen et al. [67] in their study, such as:
- Protocol. This artifact was adopted from systematic review guidelines. Our initial activity in this study was to develop a protocol, i.e. a plan defining the basic mapping study procedures. Searching the literature, we noticed that some studies created a protocol (e.g. [2]), but others did not (e.g. [15,67]). Even though this is not a mandatory artifact, as mentioned by Petersen et al. [67], authors who created a protocol in their studies encourage its use, considering it important for evaluating and calibrating the mapping study process.
- Collection form. This artifact was also adopted from systematic review guidelines; its main purpose is to help the researchers collect all the information needed to address the review questions, study quality criteria and classification scheme.
- Quality criteria. The purpose of quality criteria is to evaluate the studies, as a means of weighting their relevance against others. Quality criteria are commonly used when performing systematic literature reviews. The quality criteria were evaluated independently by two researchers, reducing the likelihood of erroneous results.

Some elements proposed by Petersen et al. [67] were also changed and/or rearranged in this study, such as:

- Phasing mapping study. As can be seen in Fig. 1, the process was explicitly split into three main phases: 1 – Research directives, 2 – Data collection and 3 – Results. This is in line with systematic review practices [41], which define planning, conducting and reporting phases. The phases are named differently from what is defined for systematic reviews, but the general idea and objective of each phase was followed. In the first phase, the protocol and the research questions are established. This is the most important phase, since the research goal is satisfied by the answers to these questions. The second phase comprises the execution of the MS, in which the search for primary studies is performed. It considers a set of inclusion and exclusion criteria, used to select studies that may contain results relevant to the goals of the research. In the third phase, the classification scheme is developed. It was built considering two facets: one structured the topic in terms of the research questions, and the other considered the different research types defined in [67]. The results of a meticulous analysis of every selected primary study are reported, in the form of a mapping study. All phases are detailed in the next sections.

4. Research directives

This section presents the first phase of the mapping study process, in which the protocol and research questions are defined.

4.1. Protocol definition

The protocol forms the research plan for an empirical study, and is an important resource for anyone who is planning to undertake a study or considering performing any form of replication study.

In this study, the purpose of the protocol is to guide the research objectives and clearly define how the study should be performed, by defining research questions and planning how the sources and selected studies will be used to answer those questions. Moreover, the classification scheme adopted in this study was defined beforehand and documented in the protocol.

Incremental reviews of the protocol were performed in accordance with the MS method. The protocol was revisited in order to update it based on new information collected as the study progressed.

To avoid duplication, we detail the content of the protocol in Section 5, as we describe how the study was conducted.

4.2. Question structure

The research questions were framed by three criteria:

- Population. Published scientific literature reporting software testing and SPL testing.
- Intervention. Empirical studies involving SPL Testing practices, techniques, methods and processes.
- Outcomes. Type and quantity of evidence relating to various SPL testing approaches, in order to identify practices, activities and research issues concerning this area.

4.3. Research questions

As previously stated, the objective of this study is to understand, characterize and summarize evidence, identifying activities and practical and research issues regarding research directions in SPL Testing. We focused on identifying how the existing approaches deal with testing in SPL. In order to define the research questions, our efforts were based on topics addressed by previous research on SPL testing [20,46,79]. In addition, the research question definition task was aided by discussions with expert researchers and practitioners, in order to encompass relevant and still open issues.

Nine research questions were derived from the objective of the study. Answering these questions led to a detailed investigation of practices arising from the identified approaches, which support both industrial and academic activities. The research questions, and the rationale for their inclusion, are detailed below.

Q1. Which testing strategies are adopted by the SPL Testing approaches? This question is intended to identify the testing strategies adopted by a software product line approach [79]. By strategy, we mean understanding when assets are tested, considering the differentiation between the two SPL development processes: core asset and product development.

Q2. What are the existing static and dynamic analysis techniques applied to the SPL context? This question is intended to identify the analysis type (static and dynamic testing [54]) applied along the software development life cycle.

Q3. Which testing levels commonly applicable in single-system development are also used in the SPL approaches? Ammann and Offutt [4] and Jaring et al. [29] advocate different levels of testing (unit, integration, system and acceptance tests) where each level is associated with a development phase, emphasizing development and testing equally.

Q4. How do the product line approaches handle regression testing along the software product line life cycle? Regression testing is done when changes are made to already tested artifacts [36,76]. Regression tests are often automated, since test cases related to the core assets may be repeated every time a new product is derived [63]. Thus, this question investigates the regression techniques applied to SPL.

Q5. How do the SPL approaches deal with tests of non-functional requirements? This question seeks clarification on how tests of non-functional requirements should be handled.

Q6. How do the testing approaches in an SPL organization handle commonality and variability? An undiscovered defect in the common core assets of a SPL will affect all applications and thus will have a severe effect on the overall quality of the SPL [68]. In this sense, answering this question requires an investigation into how the testing approaches handle commonality issues throughout the software life cycle, as well as gathering information on how variability affects testability.
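The rationale behind Q6, that a defect in a common core asset affects every derived product while a defect in a variant affects only the products that bind it, can be illustrated with a small sketch. All product and asset names below are invented for illustration; they do not come from the study:

```python
# Illustrative model: products in a product line are compositions of
# common core assets and product-specific variant assets.
CORE = {"payment", "logging"}          # shared by every product
products = {
    "basic":   CORE | {"sms"},
    "premium": CORE | {"sms", "video"},
    "lite":    CORE,
}

def affected_products(defective_asset):
    """Products impacted if one asset contains an undiscovered defect."""
    return sorted(p for p, assets in products.items() if defective_asset in assets)

print(affected_products("logging"))  # core defect -> ['basic', 'lite', 'premium']
print(affected_products("video"))    # variant defect -> ['premium']
```

The asymmetry shown here is exactly why commonality testing carries more weight than testing any single variant.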
Q7. How do variant binding times affect SPL testability? According to [29], variant binding time determines whether a test can be

Table 1
List of research strings.

1 http://www.cin.ufpe.br/sople/testing/ms/
2 http://www.informatik.uni-trier.de/ley/db/
- SPL approaches which address testing concerns. Approaches that include information on methods and techniques, how they are handled, and how variabilities and commonalities influence software testability.
- SPL testing approaches which address static and dynamic analysis. Approaches that explicitly describe how static and dynamic testing applies to different testing phases.
- SPL testing approaches which address software testing effort concerns. Approaches that describe the existence of automated tools, as well as other strategies used to reduce test effort, and metrics applied in this context.

Studies were excluded if they involved:

- SPL approaches with insufficient information on testing. Studies that do not have detailed information on how they handle SPL testing concepts and activities.
- Duplicated studies. When the same study was published in different papers, the most recent was included.
- Or if the study had already been included from another source.

Fig. 3 depicts a bar chart with the results categorized by source and filter, as described in Section 5.2. Fig. 4 shows the distribution of the primary studies by publication year. This figure gives the impression that interest in the SPL Testing area is growing, as the increasing number of publications indicates that many solutions have recently become available (disregarding 2009, since many studies might not have been made available by search engines by the time the search was performed, and thus were not considered in this study).

An important point to highlight is that, between 2004 and 2008, an international workshop devoted specifically to SPL testing, the SPLiT workshop,3 demonstrated the interest of the research community in expanding this field. Fig. 5 shows the number of publications per source. In fact, it can be seen that the peaks in Fig. 4 match the years in which this workshop occurred. All the studies are listed in Appendix A.

5.3.1. Reliability of inclusion decisions

The reliability of decisions to include a study is ensured by having multiple researchers evaluate each study. The study was conducted by two research assistants (the first two authors), who were responsible for performing the searches and summarizing the results of the mapping study, with other members of the team acting as reviewers. A high level of agreement existed before a study was included. In cases where the researchers did not agree after discussion, an expert in the area was contacted to discuss and give appropriate guidance.

5.4. Quality evaluation

In addition to general inclusion/exclusion criteria, the quality evaluation mechanism usually applied in systematic reviews [18,19,44] was applied in this study in order to assess the trustworthiness of the primary studies. This assessment is necessary to limit bias in conducting this empirical study, to gain insight into potential comparisons, and to guide the interpretation of findings. The quality criteria we used served as a means of weighting the importance of individual studies, enhancing our understanding and developing more confidence in the analysis.

As the mapping study guidelines [67] do not establish a formal evaluation in the sense of quality criteria, we chose to assess each of the primary studies by principles of good practice for conducting empirical research in software engineering [41], tailoring the idea of assessing studies by a set of criteria to our specific context. Thus, the quality criteria for this evaluation are presented in Table 2. The criteria in Group A cover a set of issues pertaining to quality that need to be considered when appraising the studies identified in the review, according to [42]. Groups B and C assess the quality considering SPL Testing concerns. The former is focused on identifying how well the studies address testing issues

Table 2
Quality criteria.

Group  ID  Quality criteria
A      1   Are there any roles described?
       2   Are there any guidelines described?
       3   Are there inputs and outputs described?
       4   Does it detail the test artifacts?
B      5   Does it detail the validation phase?
       6   Does it detail the verification phase?
       7   Does it deal with Testing in the Requirements phase?
       8   Does it deal with Testing in the Architectural phase?
       9   Does it deal with Testing in the Implementation phase?
       10  Does it deal with Testing in the Deployment phase?
C      11  Does it deal with binding time?
       12  Does it deal with variability testing?
       13  Does it deal with commonality testing?
       14  Does it deal with effort reduction?
       15  Does it deal with non-functional tests?
       16  Does it deal with any test measure?

3 c.f. http://www.biglever.com/split2008/
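One plausible way to operationalize the weighting of studies against the Table 2 criteria is a per-group score, the fraction of criteria in each group a study satisfies. The sketch below is our illustration, not the scoring procedure the study actually reports; the criterion IDs follow Table 2, but the yes/no answers are invented:

```python
# Criterion IDs per group, as listed in Table 2.
CRITERIA_GROUPS = {
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8, 9, 10],
    "C": [11, 12, 13, 14, 15, 16],
}

def quality_scores(answers):
    """Per-group score: fraction of the group's criteria answered 'yes' (1)."""
    return {
        group: sum(answers.get(cid, 0) for cid in ids) / len(ids)
        for group, ids in CRITERIA_GROUPS.items()
    }

# A hypothetical study answering 'yes' to criteria 1-4 and 11-12.
study = {cid: 1 for cid in [1, 2, 3, 4, 11, 12]}
print(quality_scores(study))  # {'A': 1.0, 'B': 0.0, 'C': 0.3333333333333333}
```

Scores like these make the relative coverage of general quality (A) versus SPL-specific concerns (B, C) directly comparable across studies.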
6. Outcomes
Table 3
Research type facet.

Validation research: Techniques investigated are novel and have not yet been implemented in practice. Techniques used are, for example, experiments, i.e., work done in the lab.
Evaluation research: Techniques are implemented in practice and an evaluation of the technique is conducted. That means it is shown how the technique is implemented in practice (solution implementation) and what the consequences of the implementation are in terms of benefits and drawbacks (implementation evaluation). This also includes identifying problems in industry.
Solution proposal: A solution for a problem is proposed; the solution can be either novel or a significant extension of an existing technique. The potential benefits and the applicability of the solution are shown by a small example or a good line of argumentation.
Philosophical papers: These papers sketch a new way of looking at existing things by structuring the field in the form of a taxonomy or conceptual framework.
Opinion papers: These papers express the personal opinion of somebody on whether a certain technique is good or bad, or how things should be done. They do not rely on related work and research methodologies.
Experience papers: Experience papers explain what has been done in practice and how. It has to be the personal experience of the author.
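A two-facet classification such as the one used here (a topic facet derived from the research questions crossed with the research type facet of Table 3) boils down to a frequency table over facet pairs, which is what a bubble chart of the results would visualize. A minimal sketch, with invented study classifications:

```python
from collections import Counter

# Each primary study is tagged with (topic facet, research type facet).
# The pairs below are illustrative, not the study's actual data.
studies = [
    ("testing strategies", "Solution proposal"),
    ("regression testing", "Opinion papers"),
    ("testing levels", "Evaluation research"),
    ("testing strategies", "Validation research"),
    ("testing strategies", "Solution proposal"),
]

# The systematic map is the count of studies per facet pair.
bubble = Counter(studies)
for (topic, rtype), n in sorted(bubble.items()):
    print(f"{topic:20s} x {rtype:20s}: {n}")
```

Empty cells in such a table are as informative as populated ones: they point to the gaps the mapping study sets out to find.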
- Incremental testing of product lines: The first product is tested individually and the following products are tested using regression testing techniques [26,76]. Regression testing focuses on ensuring that everything that used to work still works, i.e. the product features previously tested are re-tested through a regression technique.
- Opportunistic reuse of test assets: This strategy is applied to reuse application test assets. Assets for one application are developed; then, applications derived from the product line use the assets developed for the first application. This form of reuse is not performed systematically, which means that there is no method that supports the activity of selecting the test assets [75].
- Design test assets for reuse: Test assets are created as early as possible in domain engineering. Domain testing aims at testing common parts and preparing for testing variable parts [30]. In application engineering, these test assets are reused, extended and refined to test specific applications [30,75]. General approaches to achieve core asset reuse are: repository, core asset certification, and partial integration [84]. Kishi and Noda [39] state that a verification model can be shared among applications that have similarities. The SPL principle of design for reuse is fully addressed by this strategy, which can enable the overall goals of reducing cost, shortening time-to-market, and increasing quality [75].
- Division of responsibilities: This strategy relates to selecting the testing levels to be applied in both domain and application engineering, depending upon the objective of each phase, i.e. whether thinking about developing for or with reuse [79]. This division can be clearly seen when the assets are unit tested in domain engineering and, when instantiated in application engineering, integration, system and acceptance testing are performed.

As SPL Testing should be a reuse-based test derivation for testing products within a product line [84], the Testing product by product and Opportunistic reuse of test assets strategies cannot be considered effective for the SPL context, since the first does not consider the reuse benefits, which results in testing costs resembling single-system development. In the second, no method is applied; hence, the activity may not be repeatable, and may not avoid the redundant re-execution of test cases, which can thus increase costs.

These strategies can be considered a feasible grouping of what studies on SPL testing approaches have been addressing, which can show us a more generic view on the topic.

6.2.2. Static and dynamic analysis

An effective quality strategy for a software product line requires both static and dynamic analysis techniques. Techniques for static analysis are often dismissed as more expensive, but in a software product line, the cost of static analysis can be amortized over multiple products.
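The "design test assets for reuse" strategy discussed above can be sketched as deriving a product test suite from reusable domain tests plus tests selected by variant binding. All test and variation-point identifiers below are hypothetical, invented for this sketch:

```python
# Domain engineering produces tests for common parts, plus prepared tests
# per variation point that are only selected once a variant is bound.
domain_tests = ["login_common", "checkout_common"]
variant_tests = {
    "payment": {"credit_card": "pay_cc_test", "invoice": "pay_invoice_test"},
}

def derive_product_suite(bound_variants, product_specific):
    """Reuse domain tests, add tests for the bound variants, then add
    product-specific tests created in application engineering."""
    suite = list(domain_tests)
    for vpoint, variant in bound_variants.items():
        suite.append(variant_tests[vpoint][variant])
    return suite + list(product_specific)

suite = derive_product_suite({"payment": "credit_card"}, ["promo_banner_test"])
print(suite)  # ['login_common', 'checkout_common', 'pay_cc_test', 'promo_banner_test']
```

The point of the sketch is that most of the suite is assembled from domain-engineered assets; only the last element is written per product, which is where the cost reduction of this strategy comes from.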
A number of studies advocate the use of inspections and walkthroughs [29,54,79] and formal verification techniques as static analysis techniques/methods for SPL, to be conducted prior to dynamic analysis, i.e. in the presence of executable code. [54] presents an approach for Guided Inspection, aimed at applying the discipline of testing to the review of non-software assets. In [39], a model checker is defined that focuses on design verification instead of code verification. This strategy is considered effective because many defects are injected during the design phase [39].

Regarding dynamic analysis, some studies [29,47] recommend the V-model phases, commonly used in single systems, to structure a series of dynamic analyses. The V-model gives equal weight to development and testing rather than treating testing as an afterthought [25]. However, despite the well-defined test process presented by the V-model, its use in the SPL context requires some adaptation, as applied in [29].

The relative amount of dynamic and static analysis depends on both technical and managerial strategies. Technically, factors such as test-first development or model-based development determine the focus: model-based development emphasizes static analysis of models, while test-first development emphasizes dynamic analysis. Managerial strategies such as reduced time to market, lower cost and improved product quality determine the depth to which analysis should be carried.

6.2.3. Testing levels

Some of the analyzed studies (e.g. [29,47]) divide SPL testing according to the two primary software product line activities: core asset and product development.

Core asset development: Some testing activities relate to the development of test assets and the test execution performed to evaluate the quality of the assets, which will be further instantiated in the application engineering phase. The two basic activities include developing test artifacts that can be reused efficiently during application engineering and applying tests to the other assets created during domain engineering [34,70]. Regarding types of testing, the following are performed in domain engineering:

- Unit testing: Testing of the smallest unit of software implementation. This unit can be basically a class, or even a module, a function, or a software component; the granularity level depends on the strategy adopted. The purpose of unit testing is to determine whether this basic element performs as required, through verification of the code produced during the coding phase.
- Integration testing: This testing is applied as the modules are integrated with each other, or within the reference in domain-level V&V when the architecture calls for specific domain components to be integrated in multiple systems. This type of testing is also performed during application engineering [55]. Li et al. [49] present an approach for generating integration tests from unit tests.

Product development: Activities here relate to the selection and instantiation of assets to build specific product test assets, the design of additional product-specific tests, and test execution. The following types of testing can be performed in application engineering:

- System testing: System testing ensures that the final product matches the required features [61]. According to [24], system testing evaluates the features and functions of an entire product and validates that the system works the way the user expects. A form of system testing can be carried out on the software architecture using a static analysis approach.
- Acceptance testing: Acceptance testing is conducted by the customer, but often the developing organization will create and execute a preliminary set of acceptance tests. In a software product line organization, commonality among the tests needed for the various products is leveraged to reduce costs.

A similar division is stated by Wieringa et al. [55], in which the authors define two separate test processes used in product line organizations: Core Asset Testing and Product Testing.

Some authors [64,75,83] also include system testing in core asset development. The rationale for including such a level is to produce abstract test assets to be further reused and adapted when deriving products in the product development phase.

6.2.4. Regression testing

Even though regression testing techniques have been researched for many years, as stated in [21,26,76], no study gives evidence on regression testing practices applied to SPL. Some information is presented by a few studies [46,57], where just a brief overview of the importance of regression testing is given, but they do not take into account the issues specific to SPLs.

McGregor [54] reports that when a core asset is modified due to evolution or correction, it is tested using a blend of regression testing and development testing. According to him, the modified portion of the asset should be exercised using:

- Existing functional tests, if the specification of the asset has not changed;
- New functional tests, created and executed if the specification has changed; and
- Structural tests created to cover the new code created during the modification.

He also highlights the importance of regression test selection techniques and the automation of regression execution.

Kauppinen and Taina [37] advocate that the testing process should be iterative: based on test execution results, new test cases should be generated and test scripts may be updated during a modification. These test cases are repeated during regression testing each time a modification is made.

Kolb [45] highlights that the major problems in a SPL context are the large number of variations and their combinations, redundant work, the interplay between generic components and product-specific components, and regression testing.

Jin-hua et al. [30] emphasize the importance of regression testing when a component or a related component cluster is changed, saying that regression testing is crucial to perform on the application architecture, aiming to evaluate the application architecture against its specification. Some researchers have also developed approaches to evaluate architecture-based software by using regression testing [27,58,59].

6.2.5. Non-functional testing

Non-functional issues have a great impact on the architecture design, where predictability of the non-functional characteristics of any application derived from the SPL is crucial for any resource-constrained product. These characteristics are well-known quality attributes, such as response time, performance, availability, and scalability, that might differ between instances of a product line. According to [23], testing non-functional quality attributes is equally important as functional testing.

By analyzing the studies, it was noticed that some of them propose the creation or execution of non-functional tests. Reis and Metzger [72] present a technique to support the development of reusable performance test scenarios to be further reused in application engineering. Feng et al. [22] highlight the importance of non-functional concerns (performance, reliability, dependability, etc.). Ganesan et al. [23] describe a work intended to develop an environment for testing the response time and load of a product line, however due to the constrained experimental environment

ward. After performing the traditional test phases in application engineering, the approach suggests tests to be performed towards verifying if the application contains the set of functionalities re-
there was no visible performance degradation observed. quired, and nothing else.
In single-system development, different non-functional testing
techniques are applicable for different types of testing, the same 6.2.8. Effort reduction
might hold for SPL, but no experience reports were found to sup- Some authors consider testing the bottleneck in SPL, since the
port this statement. cost of testing product lines is becoming more costly than testing
single systems [45,47]. Although applications in a SPL share com-
6.2.6. Commonality and variability testing mon components, they must be tested individually in system test-
Commonality, as an inherent concept in the SPL theory, is nat- ing level. This high cost makes testing an attractive target for
urally addressed by many studies, such as stated by Pohl et al. improvements [63]. Test effort reduction strategies can have sig-
[70], in which the major task of domain testing is the development nificant impact on productivity and profitability [53]. We found
of common test artifacts to be further reused in application testing. some strategies regarding this issue. They are described as follows:
The increasing size and complexity of applications can result in
a higher number of variation points and variants, which makes Reuse of test assets: Test assets – mainly test cases, test scenarios
testing all combinations of the functionality almost impossible in and test results – [53] are created to be reusable, which conse-
practice. Managing variability and testability is a trade-off. The quently impacts the effort reduction. According to [37,84], an
large amount of variability in a product line increases the number approach to achieve the reuse of core assets comes from the exis-
of possible testing combinations. Thus, testing techniques that con- tence of an asset repository. It usually requires an initial testing
sider variability issues and thus reduce effort are required. effort for its construction, but throughout the process, these assets
Cohen et al. [14] introduce cumulative variability coverage, do not need to be rebuilt, they can be rather used as is. Another
which accumulates coverage information through a series of devel- strategy considers the creation of test assets as extensively as pos-
opment activities, to be further exploited in a target testing activ- sible in domain engineering, anticipating also the variabilities by
ities for product line instances. creating documents templates and abstract test cases. Test cases
Another solution, proposed by Kolb and Muthig [47], is the and other concrete assets are used as is and the abstract ones are
imposition of constraints in the architecture. Instead of having extended or refined to test the product-specific aspects in applica-
components with large amount of variability it is better for test- tion engineering. In [50], a method for monitoring the interfaces of
ability to separate commonalities and variabilities and encapsulate every component during test execution is proposed, observing
variabilities as subcomponents. Aiming to reduce the retest of commonality issues in order to avoid repetitive execution. As
components and products when modifications are performed, mentioned before in Section 6.2.6, the systematic reuse of test
independence of feature and components, as well as the reduction assets, especially test cases, are the focus of many studies, each
of side effects, reduce the effort required for adequate testing. offering novel and/or extended approaches. The reason for dealing
Tevanlinna et al. [79] highlight the importance of asset trace- with assets reuse in a systematic manner is that it can enable effort
ability from requirements to implementation. There are some ways reduction, since redundant work may be avoided when deriving
to achieve this traceability between test assets and implementa- many products from the product line. In this context, the search
tion, as reported by McGregor et al. [52], in which the design of for an effective approach has been noticed throughout the past
each product line test asset matches the variation implementation recent years, as can be seen in [53,55,61,66,75]. Hence, it is feasible
mechanism for a component. to infer that there is not a general solution for dealing with system-
The selected approaches handle variability in a range of different atic reuse in SPL testing yet.
manners, usually expliciting variability as early as possible in UML Test automation tools: Automatic testing tools to support testing
use cases [28,35,77] that will further be used to design test cases, activities [16] is a way to achieve effort reduction. Methods
as described in the requirement-based approaches [8,60]. Moreover, have been proposed to automatically generate test cases from
model-based approaches introduce variability into test models, cre- single system models expecting to reduce testing effort
ated through use cases and their scenarios [74,75], and specifying [28,49,60], such as mapping the models of an SPL to functional
variability into feature models and activity diagrams [64,66]. They test cases in order to automatically generate and select func-
are usually concerned about reusing test case in a systematic man- tional test cases for an application derived [65]. Automatic test
ner through variability handling as [3,83] report. execution is an activity that should be carefully managed to
avoid false failures since unanticipated or unreported changes
6.2.7. Variant binding time can occur in the component under test. These changes should
According to [52], the binding of different variants requires dif- be reflected in the corresponding automated tests [16].
ferent binding time (Compile Time, Link Time, Execution Time and
Post-Execution Time), which requires different mechanisms (e.g. 6.2.9. Test measurement
inheritance, parameterization, overloading and conditional compi- Test measurement is an important activity applied in order to
lation). They are suitable for different variability implementation calibrate and adjust approaches. Adequacy of testing can be mea-
schemes. The different mechanisms result in different types of de- sured based on the concept of a coverage criterion. Metrics related
fects, test strategies, and test processes. to test coverage are applied to extract information, and are useful
This issue is also addressed by Jaring et al. [29], in their Variability for the whole project. We investigated how test coverage has been
and Testability Interaction Model, which is responsible for modeling applied by existing approaches regarding SPL issues.
the interaction between variability binding and testability in the According to [79], there is only one way to completely guarantee
context of the V-model. The decision regarding the best moment that a program is fault-free, to execute it on all possible inputs,
to test a variant is clearly important. The earliest point at which a which is usually impossible or at least impractical. It is even more
decision is bound is the point at which the binding should be tested. difficult if the variations and all their constraints are considered. Test
In our findings, the approach presented in [75] deals with test- coverage criteria are a way to measure how completely a test suite
ing variant binding time as a form of ensuring that the application exercises the capabilities of a piece of software. These measures
comprises the correct set of features, as the customer looks for- can be used to define the space of inputs to a program. It is possible
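The idea of sampling the space of variant combinations instead of enumerating it exhaustively can be illustrated with a small sketch. This is not taken from any of the surveyed approaches: the variation points, their values, and the naive greedy pairwise criterion below are invented for illustration, as a simplified stand-in for the covering-array techniques discussed in the literature.

```python
from itertools import combinations, product

# Hypothetical variation points of a small product line (illustrative only).
variation_points = {
    "os": ["linux", "windows"],
    "db": ["sqlite", "postgres", "none"],
    "ui": ["cli", "gui"],
    "license": ["free", "pro"],
}

def pairwise_sample(vps):
    """Greedily pick product configurations until every pair of values of
    every two variation points is covered by at least one configuration."""
    names = list(vps)
    # All value pairs that must be covered at least once.
    uncovered = {
        (a, x, b, y)
        for a, b in combinations(names, 2)
        for x in vps[a]
        for y in vps[b]
    }
    configs = []
    for candidate in product(*vps.values()):
        cfg = dict(zip(names, candidate))
        gain = {(a, cfg[a], b, cfg[b])
                for a, b in combinations(names, 2)} & uncovered
        if gain:  # keep only configurations that cover a new pair
            configs.append(cfg)
            uncovered -= gain
        if not uncovered:
            break
    return configs

configs = pairwise_sample(variation_points)
total = len(list(product(*variation_points.values())))
print(f"{len(configs)} of {total} configurations cover all pairs")
```

Even this naive greedy scan exercises every pairwise interaction with fewer configurations than the 24 full products of the example; dedicated covering-array generators reduce the sample much further.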
It is possible to systematically sample this space and test only a portion of the feasible system behavior [14]. The use of covering arrays as a test coverage strategy is addressed in [14]. Kauppinen and Tevanlinna [38] define coverage criteria for estimating the adequacy of testing in a SPL context. They propose two coverage criteria for framework-based product lines, hook and template coverage: variation points open for customization in a framework are implemented as hook classes and stable parts as template classes. They are used to measure the coverage of frameworks or other collections of classes in an application by counting the structures or hook method references from them instead of single methods or classes.

Table 4
Research questions (RQ) and primary studies.

RQ  Primary studies
Q1  [3,8,9,20,29,30,35,38,39,45–47,54,55,64,66,72–75,83,84]
Q2  [3,17,20,39,54]
Q3  [3,20,24,29,30,34,36,46,47,49,50,54,55,57,61,64,69,73,75,83,84]
Q4  [27,30,37,46,54,57]
Q5  [22,23,54,55,60,72]
Q6  [3,6,8,9,14,16,20,22,24,29,34,35,39,47,49,50,52,61,66,68,69,72–75,83,84]
Q7  [14,29,30,52,68]
Q8  [3,8,16,20,22,24,28,29,35–39,45–47,49,50,53,54,60–62,65,66,68,73–75,84]
Q9  [3,27,30,36,62,66,75]

6.3. Analysis of the results and mapping of studies

The analysis of the results enables us to present the amount of studies that match each category addressed in this study. This makes it possible to identify what has been emphasized in past research, and thus to identify gaps and possibilities for future research [67].

Initially, let us analyze the distribution of studies from our analysis point of view. Figs. 6 and 7 present, respectively, the frequencies of publications according to the classes of the research facet and according to the research questions addressed (represented by Q1 to Q9). Table 4 details Fig. 7, showing which papers answer each research question. It is worth mentioning that, in both categories, a study could match more than one topic. Hence, the total amount verified in Figs. 6 and 7 exceeds the final set of primary studies selected for detailed analysis.

When merging these two categories, we have a quick overview of the evidence gathered from the analysis of the SPL testing field. We used a bubble plot to represent the interconnected frequencies, as shown in Fig. 8. This is basically an x–y scatterplot with bubbles at category intersections. The size of a bubble is proportional to the number of articles that are in the pair of categories corresponding to the bubble coordinates [67].

The classification scheme applied in this paper enabled us to infer that researchers are mostly proposing new techniques and investigating their properties, rather than evaluating and/or experiencing them in practice, as seen in Fig. 8. Solution Proposal is the topic with the most entries, considering the research facets. Within this facet, most studies address the questions Q1 (testing strategies), Q3 (testing levels), Q6 (commonality and variability analysis) and Q8 (effort reduction). These have really been the overall focus of researchers. On the other hand, we have pointed out topics in which new solutions are required: Q2 (static and dynamic analysis interconnection in SPL Testing), Q4 (regression testing), Q5 (non-functional testing), Q7 (variant binding time) and Q9 (measures).

Although some topics present a relevant amount of entries in this analysis, such as Q1, Q3, Q6 and Q8, as aforementioned, these still lack field research, since the techniques investigated and proposed are mostly novel and have usually not yet been implemented in practice. We realize that, currently, Validation and Evaluation Research are weakly addressed in SPL Testing papers. Regarding the maturity of the field in terms of validation and evaluation research and solution papers, other studies report results in line with our findings, e.g. [80]. Hence, we realize that this is not a problem solely of SPL testing; rather, it involves, in a certain way, other software engineering practices.

We also realize that researchers are not concerned with Experience Reports on their personal experience using particular approaches. Practitioners in the field should report results on the real-world adoption of the techniques proposed and reported in the literature. Moreover, authors should Express Opinions about the desirable direction of SPL Testing research, stating their expert viewpoint.

In fact, the volume of literature devoted to testing software product lines attests to the importance assigned to it by the product line community. In the following subsection we detail what we considered most relevant in our analysis.

6.3.1. Main findings of the study

We identified a number of test strategies that have been applied to software product lines. Many of these strategies address different aspects of the testing process and can be applied simultaneously. However, we have no evidence about the effectiveness of combining strategies, and in which contexts it could be suitable. The analyzed studies do not cover this potential. There is only a brief indication that the decision about which kind of strategy to adopt depends on a set of factors, such as the software development process model, languages used, company and team size, delivery time, and budget. Moreover, it is a decision made in the planning stage of the product line organization, since the strategy affects activities that begin during requirements definition. These remain hypotheses that need to be supported or refuted through formal experiments and/or case studies.

A complete testing process should define both static and dynamic analyses. We found that even though some studies emphasize the importance of static analysis, few detail how this is performed in a SPL context [39,54,79], despite its relevance in single-system development. Static analysis is particularly important in a product line process, since many of the most useful assets are non-code assets, and the quality of the software architecture in particular is critical to success.

Specific testing activities are divided across the two types of activities: domain engineering and application engineering. Alternatively, the testing activities can be grouped into core asset and product development. From the set of studies, four [20,29,30,36] adopt (or advocate the use of) the V-model as an approach to represent testing throughout the software development life cycle. As a widely adopted strategy in single-system development, tailoring the V-model to SPL could result in improved quality. However, there is no consensus on the correct set of testing levels for each SPL phase.

We did not find evidence regarding the impact for the SPL of not performing a specific testing level in domain or application engineering. For example, is there any consequence if unit/integration/system testing is not performed in domain engineering? We need investigations to verify such an aspect. Moreover, what are the needed adaptations for the V-model to be effective in the SPL context? This is a point for which experimentation is welcome, in order to understand the behavior of testing levels in SPL.

A number of the studies addressed, or assumed, that testing activities are automated (e.g. [16,49]). In a software product line, automation is more feasible because the resources required to automate are amortized over the larger number of products. The resources are also more narrowly focused due to the overlap of the products. Some of the studies illustrated that the use of domain-specific languages, and the tooling for those languages, is more feasible in a software product line context.
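As a minimal sketch of the reuse strategy discussed in Section 6.2.8 (abstract test cases created once in domain engineering and bound to concrete variants during application engineering), consider the following fragment. The product line, its discount-rate variation point, and all names are invented for illustration; none come from the surveyed approaches.

```python
# Domain engineering: one abstract test case with an open variation point.
# Application engineering: each derived product binds the variation point
# and reuses the same test logic. All names here are illustrative.

def apply_discount(price, rate):
    # Core-asset code under test, shared across all products.
    return price * (1 - rate)

class AbstractDiscountTest:
    """Common (reusable) test asset: the expected behaviour is shared;
    only the variant-specific discount rate is left open."""
    rate = None  # variation point, bound per product

    def run(self):
        price = 100.0
        expected = price * (1 - self.rate)
        actual = apply_discount(price, self.rate)
        assert abs(actual - expected) < 1e-9, "discount computed wrongly"
        return actual

# Product-specific refinements created during application engineering.
class FreeEditionDiscountTest(AbstractDiscountTest):
    rate = 0.0   # variant: no discount

class ProEditionDiscountTest(AbstractDiscountTest):
    rate = 0.25  # variant: 25% discount

for test in (FreeEditionDiscountTest(), ProEditionDiscountTest()):
    print(type(test).__name__, "->", test.run())
```

The sketch mirrors the reported strategy only at the level of idea: the shared expected behaviour lives in the reusable asset, while each derived product binds its own variant value, so the test logic is written once and exercised per product.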
Nevertheless, we need to understand whether the techniques are indeed effective when applying them in an industrial context. We lack studies reporting results of this nature.

According to [45], one of the major problems in testing product lines is the large number of variations. The study reinforces the importance of handling variability testing during the whole software life cycle.

In particular, the effect of variant binding time concerns was considered in this study. A well-defined approach was found in [29], with information provided by case studies conducted at an important electronics manufacturer. However, there are still many issues to be considered regarding variation and testing, such as: what is the impact of designing variations in test assets regarding effort reduction? What is the most suitable strategy to handle variability within test assets: use cases and test cases, or maybe sequence or class diagrams? How to handle traceability, and what is the impact of not handling such an issue, with respect to test assets? We also did not find information about the impact of different binding times for testing in SPL, e.g. compile-time, scoping-time, etc. We also lack evidence in this direction.

Regression testing does not belong to any one point in the software development life cycle, and as a result there is a lack of clarity in how regression testing should be handled. Despite this, it is clear that regression testing is important in the SPL context. Regression testing techniques include approaches to selecting the smallest test suite that will still find the most likely defects, and techniques that make automation of test execution efficient.

From the amount of studies analyzed, a few addressed testing non-functional requirements [22,54,55,60,72]. They point out that during architecture design static analysis can be used to give an early indication of problems with non-functional requirements. One important point that should be considered when testing quality attributes is the presence of trade-offs among them, for example the trade-off between modularity and testability. This leads to natural pairings of quality attributes and their associated tests. When a variation point represents a variation in a quality attribute, the static analysis should be sufficiently complete to investigate different outcomes. Investigations towards making explicit which techniques currently applied to single-system development can be adopted in SPL are needed, since the studies do not address such an issue.

Our mapping study has illustrated a number of areas in which additional investigation would be useful, especially regarding evaluation and validation research. In general, SPL testing lacks evidence in many aspects. Regression test selection techniques, test automation and architecture-based regression testing are points for future research, as well as techniques that address the relationships between variability and testing, and techniques to handle traceability among test and development artifacts.

7. Threats to validity

There are some threats to the validity of our study. They are described and detailed as follows:

Research questions: The set of questions we defined might not have covered the whole SPL testing area, which implies that one may not find answers to the questions that concern them. As we considered this a feasible threat, we had several discussion meetings with project members and experts in the area in order to calibrate the questions. This way, even if we have not selected the optimal set of questions, we attempted to deeply address the most asked and considered open issues in the field.

Publication bias: We cannot guarantee that all relevant primary studies were selected. It is possible that some relevant studies were not chosen throughout the searching process. We mitigated this threat to the extent possible by following references in the primary studies.

Quality evaluation: The quality attributes, as well as the weight used to quantify each of them, might not properly represent the attributes' importance. In order to mitigate this threat, the quality attributes were grouped in subsets to facilitate their further classification.

Unfamiliarity with other fields: The terms used in the search strings can have many synonyms; it is possible that we overlooked some work.

8. Concluding remarks and future work

The main motivation for this work was to investigate the state-of-the-art in SPL testing, by systematically mapping the literature in order to determine what issues have been studied, as well as by what means, and to provide a guide to aid researchers in planning future research. This research was conducted through a Mapping Study, a useful technique for identifying the areas where there is sufficient information for a SR to be effective, as well as those areas where more research is needed [12].

The amount of approaches that handle different and specific aspects of the SPL testing process (i.e. how to deal with variant binding time, regression testing and effort reduction) makes comparing the studies a hard task, since they do not deal with the same goals or focus. Nevertheless, through this study we were able to identify which activities are handled by the existing approaches, as well as to understand how researchers are developing work in SPL testing. Some research points were identified throughout this research, and these can be considered an important input into planning further research.

Searching the literature, some important aspects are not reported, and when they are found, just a brief overview is given. Regarding industrial experiences, we noticed they are rare in the literature. The existing case studies report small projects, containing results obtained from company-specific applications, which makes their reproduction in other contexts impracticable, due to the lack of details. This scenario depicts the need to experiment with SPL Testing approaches not only in academia but rather in industry.

This study identified the growing interest in a well-defined SPL Testing process, including tool support. Our findings in this sense are in line with a previous study conducted by Lamancha et al. [48], which reports on a systematic review on SPL testing, as mentioned in Section 2.

This mapping study also points out some topics that need additional investigation, such as quality attribute testing considering variations in quality levels among products, how to maintain the traceability between development and test artifacts, and the management of variability through the whole development life cycle. Regarding the research method used, this study also contributed to improving the mapping study process, by defining and proposing new steps such as protocol definition, the collection form and quality criteria.

In our future agenda, we will combine the evidence identified in this work with evidence from controlled experiments and industrial SPL projects to define hypotheses and theories which will be the basis for designing new methods, processes, and tools for SPL testing.

Acknowledgments

This work was partially supported by the National Institute of Science and Technology for Software Engineering (INES, http://www.ines.org.br), funded by CNPq and FACEPE, grants 573964/2008-4 and APQ-1037-1.03/08.
[18] T. Dybå, T. Dingsøyr, Empirical studies of agile software development: a systematic review, Information and Software Technology 50 (9–10) (2008) 833–859.
[19] T. Dybå, T. Dingsøyr, Strength of evidence in systematic reviews in software engineering, in: ESEM '08: Proceedings of the Second ACM–IEEE International Symposium on Empirical Software Engineering and Measurement, ACM, New York, NY, USA, 2008, pp. 178–187.
[20] O.O. Edwin, Testing in Software Product Lines, Master's thesis, Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Sweden, 2007.
[21] E. Engström, M. Skoglund, P. Runeson, Empirical evaluations of regression test selection techniques: a systematic review, in: ESEM '08: Proceedings of the Second ACM–IEEE International Symposium on Empirical Software Engineering and Measurement, New York, NY, USA, 2008, pp. 22–31.
[22] Y. Feng, X. Liu, J. Kerridge, A product line based aspect-oriented generative unit testing approach to building quality components, in: COMPSAC '07: Proceedings of the 31st Annual International Computer Software and Applications Conference, Washington, DC, USA, 2007, pp. 403–408.
[23] D. Ganesan, U. Maurer, M. Ochs, B. Snoek, M. Verlage, Towards testing response time of instances of a web-based product line, in: SPLIT '05: International Workshop on Software Product Line Testing, Rennes, France, 2005.
[24] B. Geppert, J.J. Li, F. Roessler, D.M. Weiss, Towards generating acceptance tests for product lines, in: ICSR '04: Proceedings of the 8th International Conference on Software Reuse, 2004, pp. 35–48.
[25] R.F. Goldsmith, D. Graham, The forgotten phase, in: Software Development Magazine, 2002, pp. 45–47.
[26] T.L. Graves, M.J. Harrold, J.-M. Kim, A. Porter, G. Rothermel, An empirical study of regression test selection techniques, ACM Transactions on Software Engineering and Methodology 10 (2) (2001) 184–208.
[27] M.J. Harrold, Architecture-based regression testing of evolving systems, in: ROSATEA '98: International Workshop on Role of Architecture in Testing and Analysis, Marsala, Sicily, Italy, 1998, pp. 73–77.
[28] J. Hartmann, M. Vieira, A. Ruder, A UML-based approach for validating product lines, in: SPLIT '04: Proceedings of the International Workshop on Software Product Line Testing, Boston, MA, USA, 2004, pp. 58–65.
[29] M. Jaring, R.L. Krikhaar, J. Bosch, Modeling variability and testability interaction in software product line engineering, in: ICCBSS '08: Proceedings of the 7th International Conference on Composition-Based Software Systems, 2008, pp. 120–129.
[30] L. Jin-hua, L. Qiong, L. Jing, The W-model for testing software product lines, in: ISCSCT '08: Proceedings of the International Symposium on Computer Science and Computational Technology, Los Alamitos, CA, USA, 2008, pp. 690–693.
[31] N. Juristo, A.M. Moreno, S. Vegas, Reviewing 25 years of testing technique experiments, Empirical Software Engineering 9 (1–2) (2004) 7–44.
[32] N. Juristo, A.M. Moreno, W. Strigel, Guest editors' introduction: software testing practices in industry, IEEE Software 23 (4) (2006) 19–21.
[33] N. Juristo, A.M. Moreno, S. Vegas, M. Solari, In search of what we experimentally know about unit testing, IEEE Software 23 (6) (2006) 72–80.
[34] E. Kamsties, K. Pohl, S. Reis, A. Reuys, Testing variabilities in use case models, in: PFE '03: Proceedings of the 5th International Workshop on Software Product-Family Engineering, Siena, Italy, 2003, pp. 6–18.
[35] S. Kang, J. Lee, M. Kim, W. Lee, Towards a formal framework for product line test development, in: CIT '07: Proceedings of the 7th IEEE International Conference on Computer and Information Technology, Washington, DC, USA, 2007, pp. 921–926.
[36] R. Kauppinen, Testing framework-based software product lines, Master's thesis, Department of Computer Science, University of Helsinki, 2003.
[37] R. Kauppinen, J. Taina, RITA environment for testing framework-based software product lines, in: SPLST '03: Proceedings of the 8th Symposium on Programming Languages and Software Tools, Kuopio, Finland, 2003, pp. 58–69.
[38] R. Kauppinen, J. Taina, A. Tevanlinna, Hook and template coverage criteria for testing framework-based software product families, in: SPLIT '04: Proceedings of the International Workshop on Software Product Line Testing, Boston, MA, USA, 2004, pp. 7–12.
[39] T. Kishi, N. Noda, Formal verification and software product lines, Communications of the ACM 49 (12) (2006) 73–77.
[40] B. Kitchenham, What's up with software metrics? A preliminary mapping study, Journal of Systems and Software 83 (1) (2010) 37–51.
[47] R. Kolb, D. Muthig, Making testing product lines more efficient by improving the testability of product line architectures, in: ROSATEA '06: Proceedings of the ISSTA Workshop on Role of Software Architecture for Testing and Analysis, New York, NY, USA, 2006, pp. 22–27.
[48] B.P. Lamancha, M.P. Usaola, M.P. Velthius, Software product line testing – a systematic review, in: ICSOFT: International Conference on Software and Data Technologies, INSTICC Press, 2009, pp. 23–30.
[49] J.J. Li, D.M. Weiss, J.H. Slye, Automatic system test generation from unit tests of ExVantage product family, in: SPLIT '07: Proceedings of the International Workshop on Software Product Line Testing, Kyoto, Japan, 2007, pp. 73–80.
[50] J.J. Li, B. Geppert, F. Roessler, D. Weiss, Reuse execution traces to reduce testing of product lines, in: SPLIT '07: Proceedings of the International Workshop on Software Product Line Testing, Kyoto, Japan, 2007.
[51] L.B. Lisboa, V.C. Garcia, D. Lucrédio, E.S. de Almeida, S.R. de Lemos Meira, R.P. de Mattos Fortes, A systematic review of domain analysis tools, Information and Software Technology 52 (1) (2010) 1–13.
[52] J. McGregor, P. Sodhani, S. Madhavapeddi, Testing variability in a software product line, in: SPLIT '04: Proceedings of the International Workshop on Software Product Line Testing, Boston, Massachusetts, USA, 2004, p. 45.
[53] J.D. McGregor, Structuring test assets in a product line effort, in: ICSE '01: Proceedings of the 2nd International Workshop on Software Product Lines: Economics, Architectures, and Implications, Toronto, Ontario, Canada, 2001, pp. 89–92.
[54] J.D. McGregor, Testing a Software Product Line, Technical Report CMU/SEI-2001-TR-022, 2001.
[55] J.D. McGregor, Building reusable test assets for a product line, in: ICSR '02: Proceedings of the 7th International Conference on Software Reuse, Austin, Texas, USA, 2002, pp. 345–346.
[56] M.B.S. Moraes, E.S. Almeida, S.R. de Lemos Meira, A systematic review on software product lines scoping, in: ESELAW '09: Proceedings of the VI Experimental Software Engineering Latin American Workshop, São Carlos-SP, Brazil, 2009.
[57] H. Muccini, A. van der Hoek, Towards testing product line architectures, Electronic Notes in Theoretical Computer Science 82 (6) (2003).
[58] H. Muccini, M.S. Dias, D.J. Richardson, Towards software architecture-based regression testing, in: WADS '05: Proceedings of the Workshop on Architecting Dependable Systems, New York, NY, USA, 2005, pp. 1–7.
[59] H. Muccini, M. Dias, D.J. Richardson, Software architecture-based regression testing, Journal of Systems and Software 79 (10) (2006) 1379–1396.
[60] C. Nebut, F. Fleurey, Y.L. Traon, J.-M. Jézéquel, A requirement-based approach to test product families, in: PFE '03: Proceedings of the 5th International Workshop on Software Product-Family Engineering, Siena, Italy, 2003, pp. 198–210.
[61] C. Nebut, Y.L. Traon, J.-M. Jézéquel, System testing of product lines: from requirements to test cases, in: [34], 2006, pp. 447–477.
[62] D. Needham, S. Jones, A software fault tree metric, in: ICSM '06: Proceedings of the International Conference on Software Maintenance, Philadelphia, Pennsylvania, USA, 2006, pp. 401–410.
[63] L.M. Northrop, P.C. Clements, A Framework for Software Product Line Practice, Version 5.0, Technical Report, Software Engineering Institute, 2007.
[64] E. Olimpiew, H. Gomaa, Reusable system tests for applications derived from software product lines, in: SPLIT '05: Proceedings of the International Workshop on Software Product Line Testing, Rennes, France, 2005.
[65] E.M. Olimpiew, H. Gomaa, Model-based testing for applications derived from software product lines, in: A-MOST '05: Proceedings of the 1st International Workshop on Advances in Model-Based Testing, New York, NY, USA, 2005, pp. 1–7.
[66] E.M. Olimpiew, H. Gomaa, Reusable model-based testing, in: ICSR '09: Proceedings of the 11th International Conference on Software Reuse, Berlin, Heidelberg, 2009, pp. 76–85.
[67] K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson, Systematic mapping studies in software engineering, in: EASE '08: Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering, University of Bari, Italy, 2008.
[68] K. Pohl, A. Metzger, Software product line testing, Communications of the ACM 49 (12) (2006) 78–81.
[69] K. Pohl, E. Sikora, Documenting variability in test artefacts, in: Software Product Lines, Springer, 2005, pp. 149–158.
[41] B. Kitchenham, S. Charters, Guidelines for performing Systematic Literature [70] K. Pohl, G. Böckle, F.J. van der Linden, Software Product Line Engineering:
Reviews in Software Engineering, Technical Report EBSE 2007-001, Keele Foundations, Principles and Techniques, Springer, 2005.
University and Durham University Joint Report, 2007. [71] R. Pretorius, D. Budgen, A mapping study on empirical evidence related to the
[42] B.A. Kitchenham, S.L. Pfleeger, L.M. Pickard, P.W. Jones, D.C. Hoaglin, K.E. Emam, models and forms used in the UML, in: ESEM’08: Proceedings of Empirical
J. Rosenberg, Preliminary guidelines for empirical research in software Software Engineering and Measurement, Kaiserslautern, Germany, 2008, pp.
engineering, IEEE Transactions on Software Engineering 28 (8) (2002) 721–734. 342–344.
[43] B.A. Kitchenham, T. Dyba, M. Jorgensen, Evidence-based software engineering, [72] S. Reis, A.P.K. Metzger, A reuse technique for performance testing of software
in: ICSE’04 : Proceedings of the 26th International Conference on Software product lines, in: SPLIT ’05: Proceedings of the International Workshop on
Engineering, Washington, DC, USA, 2004, pp. 273–281 Software Product Line Testing, Baltimore, Maryland, USA, 2006.
[44] B.A. Kitchenham, E. Mendes, G.H. Travassos, Cross versus within-company cost [73] S. Reis, A. Metzger, K. Pohl, Integration testing in software product line
estimation studies: a systematic review, IEEE Transactions on Software engineering: a model-based technique, in: FASE’07: Proceedings of the
Engineering 33 (5) (2007) 316–329. Fundamental Approaches to Software Engineering, Braga, Portugal, 2007, pp.
[45] R. Kolb, A risk-driven approach for efficiently testing software product lines, 321–335.
in: IESEF’03 – Fraunhofer Institute for Experimental Software Engineering, [74] A. Reuys, E. Kamsties, K. Pohl, S. Reis, Model-based system testing of software
2003. product families, in: CAiSE’05: Proceedings of International Conference on
[46] R. Kolb, D. Muthig, Challenges in testing software product lines, in: Advanced Information Systems Engineering, 2005, pp. 519–534.
CONQUEST’03 – Proceedings of 7th Conference on Quality Engineering in [75] A. Reuys, S. Reis, E. Kamsties, K. Pohl, The scented method for testing software
Software Technology, Nuremberg, Germany, 2003, pp. 81–95 product lines, in: [34], 2006, pp. 479–520.
P.A. da Mota Silveira Neto et al. / Information and Software Technology 53 (2011) 407–423 423
[76] G. Rothermel, M.J. Harrold, Analyzing regression test selection techniques, [81] D.M. Weiss, The product line hall of fame, in: SPLC ’08: Proceedings of the 2008
IEEE Transactions on Software Engineering 22 (8) (1996) 529–551. 12th International Software Product Line Conference, IEEE Computer Society,
[77] J. Rumbaugh, I. Jacobson, G. Booch, Unified Modeling Language Reference Washington, DC, USA, 2008, p. 39.
Manual, second ed., Pearson Higher Education, 2004 [82] R. Wieringa, N.A.M. Maiden, N.R. Mead, C. Rolland, Requirements engineering
[78] E.D. Souza Filho, R. Oliveira Cavalcanti, D.F. Neiva, T.H. Oliveira, L.B. Lisboa, E.S. paper classification and evaluation criteria: a proposal and a discussion,
Almeida, S.R. Lemos Meira, Evaluating domain design approaches using Requirements Engineering 11 (1) (2006) 102–107.
systematic review, in: ECSA ’08: Proceedings of the 2nd European [83] A. Wübbeke, Towards an efficient reuse of test cases for software product lines,
conference on Software Architecture, Berlin, Heidelberg, 2008, pp. 50–65. in: SPLC’08: Proceedings of Software Product Line Conference, 2008, pp. 361–
[79] A. Tevanlinna, J. Taina, R. Kauppinen, Product family testing: a survey, ACM 368.
SIGSOFT Software Engineering Notes 29 (2) (2004) 12. [84] H. Zeng, W. Zhang, D. Rine, Analysis of testing effort by using core assets in
[80] D. Šmite, C. Wohlin, T. Gorschek, R. Feldt, Empirical evidence in global software software product line testing, in: SPLIT ’04: Proceedings of the International
engineering: a systematic review, Empirical Software Engineering 15 (1) Workshop on Software Product Line Testing, Boston, MA. 2004, pp. 1–6.
(2010) 91–118.