Information and Software Technology 53 (2011) 407–423
doi:10.1016/j.infsof.2010.12.003

A systematic mapping study of software product lines testing

Paulo Anselmo da Mota Silveira Neto a,b,*, Ivan do Carmo Machado a,b, John D. McGregor d, Eduardo Santana de Almeida a,c, Silvio Romero de Lemos Meira a,b

a RiSE - Reuse in Software Engineering, Recife, PE, Brazil
b Informatics Center, Federal University of Pernambuco, Recife, PE, Brazil
c Computer Science Department, Federal University of Bahia, Salvador, BA, Brazil
d Computer Science Department, Clemson University, Clemson, SC, USA

* Corresponding author at: RiSE - Reuse in Software Engineering, Recife, PE, Brazil. E-mail address: pauloadmsn@gmail.com (P.A. da Mota Silveira Neto).

ARTICLE INFO

Article history:
Received 9 March 2010
Received in revised form 30 November 2010
Accepted 4 December 2010
Available online 16 December 2010

Keywords:
Software product lines
Software testing
Mapping study

ABSTRACT

Context: In software development, testing is an important mechanism both to identify defects and to assure that completed products work as specified. This is a common practice in single-system development, and continues to hold in Software Product Lines (SPL). Even though extensive research has been done in the SPL testing field, it is necessary to assess the current state of research and practice, in order to provide practitioners with evidence that enables fostering its further development.
Objective: This paper focuses on testing in SPL and has the following goals: investigate state-of-the-art testing practices, synthesize available evidence, and identify gaps between required techniques and existing approaches available in the literature.
Method: A systematic mapping study was conducted with a set of nine research questions, in which 120 studies, dated from 1993 to 2009, were evaluated.
Results: Although several aspects of testing have been covered by single-system development approaches, many cannot be directly applied in the SPL context due to its specific issues. In addition, particular aspects of SPL are not covered by the existing SPL approaches, and when they are covered, the literature gives only brief overviews. This scenario indicates that additional investigation, empirical and practical, should be performed.
Conclusion: The results can help to understand the needs in SPL testing, by identifying points that still require additional investigation, since important aspects regarding particular points of software product lines have not been addressed yet.

© 2010 Elsevier B.V. All rights reserved.

Contents

1. Introduction
2. Related work
3. Literature review method
4. Research directives
   4.1. Protocol definition
   4.2. Question structure
   4.3. Research questions
5. Data collection
   5.1. Search strategy
   5.2. Data sources
   5.3. Studies selection
        5.3.1. Reliability of inclusion decisions
   5.4. Quality evaluation
   5.5. Data extraction
6. Outcomes
   6.1. Classification scheme
   6.2. Results
        6.2.1. Testing strategy
        6.2.2. Static and dynamic analysis
        6.2.3. Testing levels
        6.2.4. Regression testing
        6.2.5. Non-functional testing
        6.2.6. Commonality and variability testing
        6.2.7. Variant binding time
        6.2.8. Effort reduction
        6.2.9. Test measurement
   6.3. Analysis of the results and mapping of studies
        6.3.1. Main findings of the study
7. Threats to validity
8. Concluding remarks and future work
Acknowledgments
Appendix A. Quality studies scores
Appendix B. List of conferences
Appendix C. List of journals
References

1. Introduction

The increasing adoption of Software Product Lines (SPL) practices in industry has yielded decreased implementation costs, reduced time to market and improved quality of derived products [17,63]. In this approach, as in single-system development, testing is essential [36] to uncover defects [68,75]. A systematic testing approach can save significant development effort, increase product quality, improve customer satisfaction and lower maintenance costs [32].

As defined in [54], testing in SPL aims to examine core assets, shared by many products derived from a product line, their individual parts and the interactions among them. Testing in this context thus encompasses activities ranging from the validation of the initial requirements to the activities performed by customers to complete the acceptance of a product, and confirms that testing is still the most effective method of quality assurance, as observed in [46].

However, despite these benefits, the state of software testing practice is in general not as advanced as software development techniques [32], and the same holds true in the SPL context [37,79]. From an industry point of view, with the growing adoption of SPL by companies [81], more efficient and effective testing methods and techniques for SPL are needed, since the currently available techniques, strategies and methods make testing a very challenging process [46]. Moreover, the SPL testing field has attracted the attention of many researchers in recent years, which has resulted in a large number of publications on general and specific issues. The literature offers many approaches, strategies and techniques, but surprisingly little in the way of widely-known empirical assessment of their effectiveness.

This paper presents a systematic mapping study [67], performed in order to map out the SPL Testing field, synthesizing evidence to suggest important implications for practice, as well as identifying research trends, open issues, and areas for improvement. A mapping study [67] is an evidence-based approach, applied in order to provide an overview of a research area and identify the quantity and type of research and results available within it. The results are gained from a defined approach to locate, assess and aggregate the outcomes from relevant studies, thus providing a balanced and objective summary of the relevant evidence.

Hence, the goal of this investigation is to identify, evaluate, and synthesize state-of-the-art testing practices in order to present what has been achieved so far in this discipline. We are also interested in identifying practices adopted in single-system development that may be suitable for SPL. The study also highlights the gaps and identifies trends for research and development. Moreover, it is based on the analysis of interesting issues, guided by a set of research questions. This systematic mapping process was conducted from July to December 2009.

The remainder of this paper is organized as follows: Section 2 presents the related work. In Section 3 the method used in this study is described. Section 4 presents the planning phase and the research questions addressed by this study. Section 5 describes its execution, presenting the search strategy used and the resulting selected studies. Section 6 presents the classification scheme adopted in this study and reports the findings. In Section 7 the threats to validity are described. Section 8 draws some conclusions and provides recommendations for further research on this topic.

2. Related work

As mentioned before, the literature on SPL Testing provides a large number of studies, regarding both general and specific issues, as will be discussed later in this study. Amongst them, we have identified some studies developed in order to gather and evaluate the available evidence in the area. They are thus considered as having similar ideas to our mapping study and are described next.

A survey on SPL Testing was performed by Tevanlinna et al. [79]. They studied approaches to product line testing methodology and processes that have been developed for, or that can be applied to, SPL, laying emphasis on regression testing. The study also evaluates the state-of-the-art in SPL testing up to the date of the paper, 2004, and highlights problems to be addressed.

A thesis on SPL Testing published in 2007 by Edwin [20] investigated testing in SPL and possible improvements in testing steps, tool selection and application in SPL testing. It was conducted using the systematic review approach.

A systematic review was performed by Lamancha et al. [48] and published in 2009. Its main goal was to identify experience reports and initiatives carried out in Software Engineering related to testing in software product lines. In order to accomplish that, the authors classified the primary studies into seven categories: unit testing, integration testing, functional testing, SPL architecture, embedded systems, testing process and testing effort in SPL. After that, a summary of each area was presented.

Fig. 1. The systematic mapping process (adapted from Petersen et al. [67]).

Fig. 2. Stages of the selection process.

These studies can be considered good sources of information on this subject. In order to develop our work, we considered every mentioned study, since they bring relevant information. However, we noticed that important aspects, such as regression testing, testing of non-functional requirements and the relation between variant binding time and testability, were not covered by them to an extent that would make it possible to map out the current status of research and practice in the area. Thus, we categorized a set of important research areas under SPL testing, focusing on aspects addressed by the studies mentioned before as well as on the areas they did not address but that are directly related to SPL practices, in order to perform critical analysis and appraisal. To accomplish our goals in this work, we followed the guidelines for mapping study development presented in [12]. We also included threat mitigation strategies in order to obtain the most reliable results.

We believe our study states current and relevant information on research topics that can complement others previously published. By current, we mean that, as the number of studies published has increased rapidly, as shown in Fig. 4, there is a need for more up-to-date empirical research in this area to contribute to the community's investigations.

3. Literature review method

The method used in this research is a Systematic Mapping Study (henceforth abbreviated as 'MS') [12,67]. A MS provides a systematic and objective procedure for identifying the nature and extent of the empirical study data that is available to answer a particular research question [12].

While a systematic review is a means of identifying, evaluating and interpreting all available research relevant to a particular question [41], a MS intends to 'map out' the research undertaken rather than to answer a detailed research question [12,67]. A well-organized set of good practices and procedures for undertaking MS in the software engineering context is defined in [12,67], which establishes the basis for the study presented in this paper.

Fig. 3. Primary studies filtering categorized by source.

Fig. 4. Distribution of primary studies by their publication years.

It is worthwhile to highlight that the importance and use of MS in the software engineering area is increasing [1,5,12,15,33,40,67,71], showing the relevance and potential of the method. Nevertheless, in the same way as for systematic reviews [10,13,51,56,78], we need more MS related to software product lines, in order to evolve the field with more evidence [43].

A MS comprises the analysis of primary studies that investigate aspects related to predefined research questions, aiming at integrating and synthesizing evidence to support or refute particular research hypotheses. The main reasons to perform a MS can be stated as follows, as defined by Budgen et al. [12]:

- To make an unbiased assessment of as many studies as possible, identifying existing gaps in current research and contributing to the research community with a reliable synthesis of the data;
- To provide a systematic procedure for identifying the nature and extent of the empirical study data that is available to answer research questions;
- To map out the research that has been undertaken;
- To help to plan new research, avoiding unnecessary duplication of effort and error;
- To identify gaps and clusters in a set of primary studies, in order to identify topics and areas in which to perform more complete systematic reviews.

The experimental software engineering community is working towards the definition of standard processes for conducting mapping studies. This effort can be seen in Petersen et al. [67], a study describing how to conduct systematic mapping studies in software engineering. The paper provides a well-defined process which serves as a starting point for our work. We merged ideas from Petersen et al. [67] with good practices defined in the guidelines published by Kitchenham and Charters [41]. This way, we could apply a process for mapping studies that includes good practices for conducting systematic reviews, making better use of both techniques.

This blending process enabled us to include topics not covered by Petersen et al. [67] in their study, such as:

- Protocol. This artifact was adopted from systematic review guidelines. Our initial activity in this study was to develop a protocol, i.e. a plan defining the basic mapping study procedures. Searching the literature, we noticed that some studies created a protocol (e.g. [2]), but others did not (e.g. [15,67]). Even though this is not a mandatory artifact, as mentioned by Petersen et al. [67], authors who created a protocol in their studies encourage the use of this artifact as being important to evaluate and calibrate the mapping study process.
- Collection form. This artifact was also adopted from systematic review guidelines. Its main purpose is to help the researchers collect all the information needed to address the review questions, the study quality criteria and the classification scheme.
- Quality criteria. The purpose of quality criteria is to evaluate the studies, as a means of weighting their relevance against others. Quality criteria are commonly used when performing systematic literature reviews. The quality criteria were evaluated independently by two researchers, hopefully reducing the likelihood of erroneous results.

Some elements proposed by Petersen et al. [67] were also changed and/or rearranged in this study, such as:

- Phasing mapping study. As can be seen in Fig. 1, the process was explicitly split into three main phases: 1 – Research directives, 2 – Data collection and 3 – Results. This is in line with systematic review practices [41], which define planning, conducting and reporting phases. The phases are named differently from what is defined for systematic reviews, but the general idea and objective of each phase were followed. In the first phase, the protocol and the research questions are established. This is the most important phase, since the research goal is satisfied with answers to these questions. The second phase comprises the execution of the MS, in which the search for primary studies is performed. It considers a set of inclusion and exclusion criteria, used in order to select studies that may contain relevant results according to the goals of the research. In the third phase, the classification scheme is developed. It was built considering two facets: one structured the topic in terms of the research questions, and the other considered different research types, as defined in [67]. The results of a meticulous analysis performed on every selected primary study are reported in the form of a mapping study. All phases are detailed in the next sections.

4. Research directives

This section presents the first phase of the mapping study process, in which the protocol and research questions are defined.

4.1. Protocol definition

The protocol forms the research plan for an empirical study, and is an important resource for anyone who is planning to undertake a study or considering performing any form of replication study.

In this study, the purpose of the protocol is to guide the research objectives and clearly define how the study should be performed, by defining research questions and planning how the sources and selected studies will be used to answer those questions. Moreover, the classification scheme adopted in this study was defined beforehand and documented in the protocol.

Incremental reviews of the protocol were performed in accordance with the MS method. The protocol was revisited in order to update it based on new information collected as the study progressed. To avoid duplication, we detail the content of the protocol in Section 5, as we describe how the study was conducted.

4.2. Question structure

The research questions were framed by three criteria:

- Population. Published scientific literature reporting software testing and SPL testing.
- Intervention. Empirical studies involving SPL Testing practices, techniques, methods and processes.
- Outcomes. Type and quantity of evidence relating to various SPL testing approaches, in order to identify practices, activities and research issues concerning this area.

4.3. Research questions

As previously stated, the objective of this study is to understand, characterize and summarize evidence, identifying activities as well as practical and research issues regarding research directions in SPL Testing. We focused on identifying how the existing approaches deal with testing in SPL. In order to define the research questions, our efforts were based on topics addressed by previous research on SPL testing [20,46,79]. In addition, the research question definition task was aided by discussions with expert researchers and practitioners, in order to encompass relevant and still open issues.

Nine research questions were derived from the objective of the study. Answering these questions led to a detailed investigation of practices arising from the identified approaches, which support both industrial and academic activities. The research questions, and the rationale for their inclusion, are detailed below.

- Q1. Which testing strategies are adopted by the SPL Testing approaches? This question is intended to identify the testing strategies adopted by a software product line approach [79]. By strategy, we mean understanding when assets are tested, considering the differentiation between the two SPL development processes: core asset and product development.
- Q2. What are the existing static and dynamic analysis techniques applied to the SPL context? This question is intended to identify the analysis type (static and dynamic testing [54]) applied along the software development life cycle.
- Q3. Which testing levels commonly applicable in single-system development are also used in the SPL approaches? Ammann and Offutt [4] and Jaring et al. [29] advocate different levels of testing (unit, integration, system and acceptance tests) where each level is associated with a development phase, emphasizing development and testing equally.
- Q4. How do the product line approaches handle regression testing along the software product line life cycle? Regression testing is done when changes are made to already tested artifacts [36,76]. Regression tests are often automated, since test cases related to the core assets may be repeated every time a new product is derived [63]. Thus, this question investigates the regression techniques applied to SPL.
- Q5. How do the SPL approaches deal with tests of non-functional requirements? This question seeks clarification on how tests of non-functional requirements should be handled.
- Q6. How do the testing approaches in an SPL organization handle commonality and variability? An undiscovered defect in the common core assets of a SPL will affect all applications and thus will have a severe effect on the overall quality of the SPL [68]. In this sense, answering this question requires an investigation into how the testing approaches handle commonality issues throughout the software life cycle, as well as gathering information on how variability affects testability.

- Q7. How do variant binding times affect SPL testability? According to [29], variant binding time determines whether a test can be performed at a given development or deployment phase. Thus, the identification and analysis of the suitable moment to bind a variant determines the appropriate testing technique to handle the specific variant.
- Q8. How do the SPL approaches deal with test effort reduction? The objective is to analyze within the selected approaches the most suitable ways to achieve effort reduction, as well as to understand how they can be accomplished within the testing levels.
- Q9. Do the approaches define any measures to evaluate the testing activities? This question requires an investigation into the data collected by the various SPL approaches with respect to testing activities.

5. Data collection

In order to answer the research questions, data was collected from the research literature. These activities involved developing a search strategy, identifying data sources, selecting studies to analyze, and data analysis and synthesis.

5.1. Search strategy

The search strategy was developed by reviewing the data needed to answer each of the research questions.
The initial set of keywords was refined after a preliminary search returned too many results of little relevance. We used several combinations of search terms until we achieved a suitable set of keywords. These are: Verification, Validation; Product Line, Product Family; Static Analysis, Dynamic Analysis; Variability, Commonality, Binding; Test Level; Test Effort, Test Measure; Non-functional Testing; Regression Testing, Test Automation, Testing Framework, Performance, Security, Evaluation, Validation, as well as their similar nouns and syntactic variations (e.g. plural forms). All terms were combined with the terms "Product Line" and "Product Family" using the Boolean "AND" operator. They were all joined with each other using the "OR" operator to improve the completeness of the results. The complete list of search strings is shown in Table 1 and is also available on a website developed to present detailed information on this MS (http://www.cin.ufpe.br/sople/testing/ms/).

Table 1
List of research strings.

1. Verification AND validation AND ("product line" OR "product family" OR "SPL")
2. "Static analysis" AND ("product line" OR "product family" OR "SPL")
3. "Dynamic testing" AND ("product line" OR "product family" OR "SPL")
4. "Dynamic analysis" AND ("product line" OR "product family" OR "SPL")
5. Test AND level AND ("product line" OR "product family" OR "SPL")
6. Variability OR commonality AND testing
7. Variability AND commonality AND testing AND ("product line" OR "product family" OR "SPL")
8. Binding AND test AND ("product line" OR "product family" OR "SPL")
9. Test AND "effort reduction" AND ("product line" OR "product family" OR "SPL")
10. "Test effort" AND ("product line" OR "product family" OR "SPL")
11. "Test effort reduction" AND ("product line" OR "product family" OR "SPL")
12. "Test automation" AND ("product line" OR "product family" OR "SPL")
13. "Regression test" AND ("product line" OR "product family" OR "SPL")
14. "Non-functional test" AND ("product line" OR "product family" OR "SPL")
15. Measure AND test AND ("product line" OR "product family" OR "SPL")
16. "Testing framework" AND ("product line" OR "product family" OR "SPL")
17. Performance OR security AND ("product line" OR "product family" OR "SPL")
18. Evaluation OR validation AND ("product line" OR "product family" OR "SPL")
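To make the construction concrete, the sketch below (ours, not part of the original protocol; the term list is abridged) shows how each row of Table 1 can be assembled by AND-combining a topic term with the OR-block of context terms:

```python
# Illustrative sketch (not part of the original protocol): assembling
# search strings in the style of Table 1. Each topic term is
# AND-combined with the OR-block of product-line context terms.

CONTEXT = '("product line" OR "product family" OR "SPL")'

# Abridged topic-term list; the full set appears in Table 1.
TOPIC_TERMS = [
    'Verification AND validation',
    '"Static analysis"',
    '"Dynamic analysis"',
    '"Regression test"',
    '"Test effort"',
]

def build_search_strings(terms, context):
    """Combine each topic term with the context block using AND."""
    return [f'{term} AND {context}' for term in terms]

for query in build_search_strings(TOPIC_TERMS, CONTEXT):
    print(query)
# e.g. "Static analysis" AND ("product line" OR "product family" OR "SPL")
```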
5.2. Data sources

The search included important journals and conferences on the research topic, covering Software Engineering, SPL, Software Verification, Validation and Testing, and Software Quality. The search was also performed using the 'snow-balling' process, following up the references in papers, and it was extended to include grey literature sources, seeking relevant white papers, industrial (and technical) reports, theses, work-in-progress, and books.

We restricted the search to studies published up to December 2009. We did not establish a lower year limit, since our intention was to have a broader coverage of this research field. This was decided because many important issues that emerged ten or more years ago are still considered open issues, as pointed out in [7,31].

The initial step was to perform a search using the terms described in Section 5.1 in the digital libraries' web search engines. We considered publications retrieved with the ScienceDirect, SCOPUS, IEEE Xplore, ACM Digital Library and Springer Link tools. The second step was to search within top international, peer-reviewed journals published by Elsevier, IEEE, ACM and Springer, since they are considered the world's leading publishers of high quality publications [11].

Next, conference proceedings were also searched. When a conference makes its proceedings available on a website, we accessed the website. When proceedings were not available on the conference website, the search was done through the DBLP Computer Science Bibliography (http://www.informatik.uni-trier.de/ley/db/).

When searching conference proceedings and journals, many of the results had already been found in the search through the digital libraries. In these cases, we discarded the later results, keeping only the first occurrence, which had already been included in our results list.

The lists of conferences and journals used in the search for primary studies are available in Appendices B and C.

After performing the search for publications in conferences and journals, using digital libraries and proceedings, we noticed that known publications commonly referenced by other studies in this field, such as important technical reports and theses, had not been included in our results list. We thus decided to include these grey literature entries. Grey literature describes materials not published commercially or indexed by major databases.

5.3. Studies selection

The set of search strings was then applied within the search engines, specifically those mentioned in the previous section. The studies selection involved a screening process composed of three filters, in order to select the most suitable results, since the likelihood of retrieving inadequate studies might be high. Fig. 2 briefly describes what was considered in each filter, and depicts the number of studies remaining after applying each one.

The inclusion criteria were used to select all studies during the search step. After that, the exclusion criteria were applied first to the studies' titles and then to the abstracts and conclusions. Regarding the inclusion criteria, the studies were included if they involved:

Fig. 5. Amount of studies vs. sources.

- SPL approaches which address testing concerns. Approaches that include information on testing methods and techniques and how they are handled, and on how variabilities and commonalities influence software testability.
- SPL testing approaches which address static and dynamic analysis. Approaches that explicitly describe how static and dynamic testing apply to the different testing phases.
- SPL testing approaches which address software testing effort concerns. Approaches that describe the existence of automated tools as well as other strategies used in order to reduce test effort, and the metrics applied in this context.

Studies were excluded if they involved:

- SPL approaches with insufficient information on testing. Studies that do not have detailed information on how they handle SPL testing concepts and activities.
- Duplicated studies. When the same study was published in different papers, the most recent was included.
- Studies that had already been included from another source.
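Conceptually, this screening is a pipeline of successively stricter predicates over the candidate set. The sketch below is only our illustration: the predicate bodies are hypothetical stand-ins for decisions the reviewers made manually on titles, then abstracts and conclusions, then full texts:

```python
# Hypothetical sketch of the three-filter screening (cf. Fig. 2). The
# predicate bodies stand in for manual reviewer decisions; they are
# not the study's actual rules.

from dataclasses import dataclass

@dataclass
class Study:
    title: str
    abstract: str
    full_text: str

def title_filter(s: Study) -> bool:
    # Filter 1: exclusion criteria applied to the title.
    return "product line" in s.title.lower() or "spl" in s.title.lower()

def abstract_filter(s: Study) -> bool:
    # Filter 2: exclusion criteria applied to abstract and conclusions.
    return "test" in s.abstract.lower()

def full_text_filter(s: Study) -> bool:
    # Filter 3: full reading against the inclusion criteria above.
    return "testing" in s.full_text.lower()

def screen(candidates):
    """Apply the three filters in sequence, narrowing the set each time."""
    for keep in (title_filter, abstract_filter, full_text_filter):
        candidates = [s for s in candidates if keep(s)]
    return candidates
```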
Fig. 3 depicts a bar chart with the results categorized by source and filter, as described in Section 5.2. Fig. 4 shows the distribution of the primary studies by publication year. The figure suggests that the SPL Testing area is attracting growing interest, and the growing number of publications indicates that many solutions have recently become available (2009 was disregarded, since many studies might not have been made available by the search engines by the time the search was performed).

An important point to highlight is that between 2004 and 2008 an international workshop devoted specifically to SPL testing, the SPLiT workshop (c.f. http://www.biglever.com/split2008/), demonstrated the interest of the research community in expanding this field. Fig. 5 shows the number of publications per source. In fact, it can be seen that the peaks in Fig. 4 match the years in which this workshop occurred. All the studies are listed in Appendix A.

5.3.1. Reliability of inclusion decisions

The reliability of the decisions to include a study was ensured by having multiple researchers evaluate each study. The study was conducted by two research assistants (the first two authors), who were responsible for performing the searches and summarizing the results of the mapping study, with the other members of the team acting as reviewers. A high level of agreement had to exist before a study was included. When the researchers did not agree after discussion, an expert in the area was contacted to discuss and give appropriate guidance.

5.4. Quality evaluation

In addition to the general inclusion/exclusion criteria, the quality evaluation mechanism usually applied in systematic reviews [18,19,44] was applied in this study in order to assess the trustworthiness of the primary studies. This assessment is necessary to limit bias in conducting this empirical study, to gain insight into potential comparisons, and to guide the interpretation of findings. The quality criteria we used served as a means of weighting the importance of individual studies, enhancing our understanding and developing more confidence in the analysis.

As the mapping study guidelines [67] do not establish a formal evaluation in the sense of quality criteria, we chose to assess each of the primary studies by principles of good practice for conducting empirical research in software engineering [41], tailoring the idea of assessing studies by a set of criteria to our specific context.

Thus, the quality criteria for this evaluation are presented in Table 2. The criteria grouped as A cover a set of issues pertaining to quality that need to be considered when appraising the studies identified in the review, according to [42]. Groups B and C assess the quality considering SPL Testing concerns: the former focuses on identifying how well the studies address testing issues along the SPL development life cycle, which is usually composed of scoping, requirements, design, implementation and testing phases; the latter evaluates how well our research questions were addressed by individual studies. This way, a better quality score matches the studies which covered the largest number of questions.

The main purpose of this grouping is justified by the difficulty faced in establishing a reliable relationship between the final quality score and the real quality of each study. Some primary studies (e.g. one which addresses some issue in a very detailed way) are referenced by several other primary studies, but if we applied the complete set of quality criteria items, their final score would be lower than that of others which do not have the same relevance. In this way, we intended to have a more valid and reliable quality assessment instrument.

Each of the 45 studies was assessed independently by the researchers according to the 16 criteria shown in Table 2. Taken together, these criteria provided a measure of the extent to which we could be confident that a particular study could give a valuable contribution to the mapping study. Each of the studies was graded on a trichotomous (yes, partly or no) scale, tagged 1, 0.5 and 0 respectively. We did not use the grade as a threshold for the inclusion decision, but rather to identify the primary studies that would form a valid foundation for our study. We note that, overall, the quality of the studies was good. Every grade can be checked in Appendix A, where the most relevant studies are highlighted.

Table 2
Quality criteria.

Group A
1. Are there any roles described?
2. Are there any guidelines described?
3. Are there inputs and outputs described?
4. Does it detail the test artifacts?

Group B
5. Does it detail the validation phase?
6. Does it detail the verification phase?
7. Does it deal with testing in the requirements phase?
8. Does it deal with testing in the architectural phase?
9. Does it deal with testing in the implementation phase?
10. Does it deal with testing in the deployment phase?

Group C
11. Does it deal with binding time?
12. Does it deal with variability testing?
13. Does it deal with commonality testing?
14. Does it deal with effort reduction?
15. Does it deal with non-functional tests?
16. Does it deal with any test measure?
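As an illustration of how these grades aggregate into a score (ours, not an artifact of the original protocol; the example answers are hypothetical):

```python
# Illustrative scoring over the 16 criteria of Table 2: each criterion
# is answered yes / partly / no and tagged 1, 0.5 or 0; a study's
# quality score is the sum. The example answers are hypothetical.

GRADE = {"yes": 1.0, "partly": 0.5, "no": 0.0}

def quality_score(answers):
    """answers: one of 'yes'/'partly'/'no' per criterion in Table 2."""
    return sum(GRADE[a] for a in answers)

answers = ["yes", "partly", "no", "yes"] + ["partly"] * 12
print(quality_score(answers))  # 8.5 out of a maximum of 16.0
```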

5.5. Data extraction

The data extraction forms were designed to collect all the information needed to address the research questions and the quality criteria. The following information was extracted from each study: title and authors; source (conference or journal); publication year; the answers to the research questions addressed by the study; a summary, with a brief overview of its strong and weak points; the quality criteria score, according to Table 2; the reviewer's name; and the date of the review.

At the beginning of the study, we decided that if several studies were reported in the same paper, each relevant study would be treated separately. However, this situation did not occur.

6. Outcomes

In this section, we describe the classification scheme and the results of the data extraction. With the classification scheme in place, the relevant studies are sorted into the scheme, which is the actual data extraction process. The result of this process is the mapping of studies, presented at the end of this section together with concluding remarks.

6.1. Classification scheme

We decided to use the idea of categorizing studies into facets, as described by Petersen et al. [67], since we considered this a structured way of doing such a task. Our classification scheme assembled two facets: one facet structured the topic in terms of the research questions we defined; the other considered the type of research.

For the latter, our study used the classification of research approaches described by Wieringa et al. [82]. According to Petersen et al. [67], who also used this approach, the research facet, which reflects the research approach used in the papers, is general and independent of a specific focus area. The classes that form the research facet are described in Table 3.

Table 3
Research type facet.

- Validation research: the techniques investigated are novel and have not yet been implemented in practice. Techniques used are, for example, experiments, i.e., work done in the lab.
- Evaluation research: the techniques are implemented in practice and an evaluation of the technique is conducted. That means it is shown how the technique is implemented in practice (solution implementation) and what the consequences of the implementation are in terms of benefits and drawbacks (implementation evaluation). This also includes identifying problems in industry.
- Solution proposal: a solution to a problem is proposed; the solution can be either novel or a significant extension of an existing technique. The potential benefits and the applicability of the solution are shown by a small example or a good line of argumentation.
- Philosophical papers: these papers sketch a new way of looking at existing things by structuring the field in the form of a taxonomy or conceptual framework.
- Opinion papers: these papers express someone's personal opinion on whether a certain technique is good or bad, or on how things should be done. They do not rely on related work and research methodologies.
- Experience papers: these papers explain what has been done in practice and how. It has to be the personal experience of the author.

The classification was performed after applying the filtering process, i.e. only the final set of studies was classified and considered. The results of the classification are presented at the end of this section (Fig. 8).

Fig. 6. Distribution of papers according to classification scheme.
Fig. 7. Distribution of papers according to intervention.
Fig. 8. Visualization of a systematic map in the form of a bubble plot.
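The counts behind such a bubble plot are simply a contingency table over the two facets. A minimal sketch (ours; the classified tuples are hypothetical, not the study's data):

```python
# Illustrative sketch: the systematic map (Fig. 8) plots, for each
# (topic facet, research type facet) pair, the number of studies
# classified under it. The tuples below are hypothetical examples.

from collections import Counter

classified = [
    ("Q1 testing strategy", "Solution proposal"),
    ("Q3 testing levels", "Evaluation research"),
    ("Q1 testing strategy", "Experience paper"),
    ("Q4 regression testing", "Opinion paper"),
]

bubble_sizes = Counter(classified)
for (topic, rtype), n in sorted(bubble_sizes.items()):
    print(f"{topic} x {rtype}: {n} study(ies)")
```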
6.2. Results

In this sub-section, each topic presents the findings for one research question, highlighting the evidence gathered during the data extraction process. These results populate the classification scheme, which evolved while the data extraction was being done.

6.2.1. Testing strategy

By analyzing the primary studies, we found a wide variety of testing strategies. Tevanlinna [79] and Reuys [75] present a similar set of strategies for SPL testing development, which are applicable to any development effort since the descriptions of the strategies are generic. We herein use the titles of the topics they outlined, after making some adjustments, as a structure for aggregating other studies which use a similar approach, as follows:

- Testing product by product: This approach ignores the possibility of reuse benefits. It offers the best guarantee of product quality but is extremely costly. In [30], a similar approach is presented, named the pure application strategy, in which testing is performed only for a concrete product during product development. No test is performed in core asset development. Moreover, in this strategy, tests for each derived application are developed independently of each other, which results in an extremely high test effort, as pointed out by [75]. This testing strategy is similar to testing in single-product engineering, because without reuse the same test effort is required for each new application.

- Incremental testing of product lines: The first product is tested individually and the following products are tested using regression testing techniques [26,76]. Regression testing focuses on ensuring that everything that used to work still works, i.e. the product features previously tested are re-tested through a regression technique.
- Opportunistic reuse of test assets: This strategy is applied to reuse application test assets. Assets for one application are developed, and the applications subsequently derived from the product line use the assets developed for the first application. This form of reuse is not performed systematically, which means that there is no method that supports the activity of selecting the test assets [75].
- Design test assets for reuse: Test assets are created as early as possible in domain engineering. Domain testing aims at testing common parts and preparing for testing variable parts [30]. In application engineering, these test assets are reused, extended and refined to test specific applications [30,75]. General approaches to achieve core asset reuse are: repository, core asset certification, and partial integration [84]. Kishi and Noda [39] state that a verification model can be shared among applications that have similarities. The SPL principle of design for reuse is fully addressed by this strategy, which can enable the overall goals of reducing cost, shortening time-to-market, and increasing quality [75].
- Division of responsibilities: This strategy relates to selecting the testing levels to be applied in both domain and application engineering, depending upon the objective of each phase, i.e. whether thinking about developing for or with reuse [79]. This division can be clearly seen when the assets are unit tested in domain engineering and, when instantiated in application engineering, integration, system and acceptance testing are performed.

As SPL Testing should be a reuse-based test derivation for testing products within a product line [84], the testing product by product and opportunistic reuse of test assets strategies cannot be considered effective in the SPL context: the first does not consider the reuse benefits, which results in testing costs resembling single-system development; in the second, no method is applied, hence the activity may not be repeatable and may not avoid the redundant re-execution of test cases, which can increase costs.

These strategies can be considered a feasible grouping of what studies on SPL testing approaches have been addressing, giving us a more generic view of the topic.

6.2.2. Static and dynamic analysis

An effective quality strategy for a software product line requires both static and dynamic analysis techniques. Techniques for static analysis are often dismissed as more expensive, but in a software product line the cost of static analysis can be amortized over multiple products.

A number of studies advocate the use of inspections and walkthroughs [29,54,79] and formal verification techniques as static analysis techniques/methods for SPL, to be conducted prior to dynamic analysis, i.e. before executable code is present. [54] presents an approach for Guided Inspection, aimed at applying the discipline of testing to the review of non-software assets. In [39], a model checker is defined that focuses on design verification instead of code verification. This strategy is considered effective because many defects are injected during the design phase [39].

Regarding dynamic analysis, some studies [29,47] recommend the V-model phases, commonly used for single systems, to structure a series of dynamic analyses. The V-model gives equal weight to development and testing rather than treating testing as an afterthought [25]. However, despite the well-defined test process presented by the V-model, its use in the SPL context requires some adaptation, as applied in [29].

The relative amount of dynamic and static analysis depends on both technical and managerial strategies. Technically, factors such as test-first development or model-based development determine the focus: model-based development emphasizes static analysis of models, while test-first development emphasizes dynamic analysis. Managerial strategies such as reduced time to market, lower cost and improved product quality determine the depth to which analysis should be carried.

6.2.3. Testing levels

Some of the analyzed studies (e.g. [29,47]) divide SPL testing according to the two primary software product line activities: core asset and product development.

Core asset development: Some testing activities relate to the development of test assets and to the test execution performed to evaluate the quality of the assets, which will later be instantiated in the application engineering phase. The two basic activities are developing test artifacts that can be reused efficiently during application engineering, and applying tests to the other assets created during domain engineering [34,70]. Regarding types of testing, the following are performed in domain engineering:

- Unit testing: Testing of the smallest unit of software implementation. This unit can be a class, or even a module, a function, or a software component; the granularity level depends on the strategy adopted. The purpose of unit testing is to determine whether this basic element performs as required, through verification of the code produced during the coding phase.
- Integration testing: This testing is applied as the modules are integrated with each other, or within the reference architecture in domain-level V&V when the architecture calls for specific domain components to be integrated in multiple systems. This type of testing is also performed during application engineering [55]. Li et al. [49] present an approach for generating integration tests from unit tests.

Product development: Activities here are related to the selection and instantiation of assets to build specific product test assets, the design of additional product-specific tests, and test execution. The following types of testing can be performed in application engineering:

- System testing: System testing ensures that the final product matches the required features [61]. According to [24], system testing evaluates the features and functions of an entire product and validates that the system works the way the user expects. A form of system testing can be carried out on the software architecture using a static analysis approach.
- Acceptance testing: Acceptance testing is conducted by the customer, but often the developing organization will create and execute a preliminary set of acceptance tests. In a software product line organization, commonality among the tests needed for the various products is leveraged to reduce costs.

A similar division is stated in [55], in which the author defines two separate test processes used in a product line organization: core asset testing and product testing.

Some authors [64,75,83] also include system testing in core asset development. The rationale for including such a level is to produce abstract test assets to be further reused and adapted when deriving products in the product development phase.

6.2.4. Regression testing

Even though regression testing techniques have been researched for many years, as stated in [21,26,76], no study gives evidence on regression testing practices applied to SPL. Some information is presented by a few studies [46,57], where just a brief overview of the importance of regression testing is given, but they do not take into account the issues specific to SPL.

McGregor [54] reports that when a core asset is modified due to evolution or correction, it is tested using a blend of regression testing and development testing. According to him, the modified portion of the asset should be exercised using:

- existing functional tests, if the specification of the asset has not changed;
- new functional tests, created and executed if the specification has changed; and
- structural tests created to cover the new code created during the modification.

He also highlights the importance of regression test selection techniques and the automation of regression execution.
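These rules read naturally as a small decision procedure. The sketch below is our paraphrase of the three cases above, with hypothetical inputs; it is not tooling from [54]:

```python
# Sketch paraphrasing McGregor's selection rules for a modified core
# asset: which tests to (re-)run depends on whether the specification
# changed and whether new code was written.

def select_tests(spec_changed: bool, new_code_added: bool,
                 existing_functional_tests: list) -> list:
    selected = []
    if not spec_changed:
        # Specification unchanged: re-run the existing functional tests.
        selected.extend(existing_functional_tests)
    else:
        # Specification changed: create and execute new functional tests.
        selected.append("new functional tests for the changed spec")
    if new_code_added:
        # Cover code created during the modification with structural tests.
        selected.append("structural tests covering the new code")
    return selected

print(select_tests(spec_changed=False, new_code_added=True,
                   existing_functional_tests=["ft-001", "ft-002"]))
# ['ft-001', 'ft-002', 'structural tests covering the new code']
```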
Kauppinen and Taina [37] advocate that the testing process should be iterative: based on test execution results, new test cases should be generated, and test scripts may be updated during a modification. These test cases are repeated during regression testing each time a modification is made.

Kolb [45] highlights that the major problems in a SPL context are the large number of variations and their combinations, redundant work, the interplay between generic components and product-specific components, and regression testing.

Jin-hua et al. [30] emphasize the importance of regression testing when a component or a related component cluster is changed, saying that regression testing is crucial to perform on the application architecture, aiming to evaluate the application architecture against its specification. Some researchers have also developed approaches to evaluate architecture-based software by using regression testing [27,58,59].

6.2.5. Non-functional testing

Non-functional issues have a great impact on the architecture design, where predictability of the non-functional characteristics of any application derived from the SPL is crucial for any resource-constrained product. These characteristics are well-known quality attributes, such as response time, performance, availability, and scalability, that might differ between instances of a product line. According to [23], testing non-functional quality attributes is equally as important as functional testing.

By analyzing the studies, we noticed that some of them propose the creation or execution of non-functional tests. Reis and Metzger [72] present a technique to support the development of reusable performance test scenarios to be further reused in application engineering. Feng et al. [22] highlight the importance of non-functional concerns (performance, reliability, dependability, etc.).

etc.). Ganesan et al. [23] describe a work intended to develop an ward. After performing the traditional test phases in application
environment for testing the response time and load of a product engineering, the approach suggests tests to be performed towards
line, however due to the constrained experimental environment verifying if the application contains the set of functionalities re-
there was no visible performance degradation observed. quired, and nothing else.
In single-system development, different non-functional testing
techniques are applicable for different types of testing, the same 6.2.8. Effort reduction
might hold for SPL, but no experience reports were found to sup- Some authors consider testing the bottleneck in SPL, since the
port this statement. cost of testing product lines is becoming more costly than testing
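As a concrete illustration of the reusable performance scenarios discussed above, the sketch below shows how a response-time check might be written once against a product-independent interface and instantiated per derived product. The interface, the threshold and all names are assumptions made for the example, not artifacts from [23] or [72].

import static org.junit.Assert.assertTrue;
import org.junit.Test;

public class ResponseTimeTest {

    // Hypothetical product entry point; each derived product binds its
    // own implementation behind this interface.
    interface ProductFacade {
        void handleRequest();
    }

    // Invented example budget; in practice it would be derived from the
    // product line's non-functional requirements.
    private static final long MAX_MILLIS = 200;

    private void assertResponseTime(ProductFacade product) {
        long start = System.nanoTime();
        product.handleRequest();
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        assertTrue("response took " + elapsed + " ms", elapsed <= MAX_MILLIS);
    }

    @Test
    public void respondsWithinThreshold() {
        // Trivial stand-in product; a real test would instantiate a
        // concrete product configuration here.
        assertResponseTime(() -> { });
    }
}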
6.2.6. Commonality and variability testing
Commonality, as an inherent concept in SPL theory, is naturally addressed by many studies, such as that of Pohl et al. [70], in which the major task of domain testing is the development of common test artifacts to be further reused in application testing.
The increasing size and complexity of applications can result in a higher number of variation points and variants, which makes testing all combinations of the functionality almost impossible in practice. Managing variability and testability is a trade-off: the large amount of variability in a product line increases the number of possible testing combinations. Thus, testing techniques that consider variability issues, and thereby reduce effort, are required.
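The combinatorial growth mentioned above is easy to quantify: n independent optional features already yield 2^n distinct configurations, which is why sampling strategies such as the covering arrays discussed next aim at covering feature pairs instead. The back-of-the-envelope sketch below, with invented feature names, contrasts the two counts.

import java.util.List;

public class VariabilityGrowth {

    public static void main(String[] args) {
        // Invented optional features, for illustration only.
        List<String> features =
            List.of("camera", "gps", "bluetooth", "nfc", "radio");

        // Exhaustive testing: every on/off combination of the features.
        long exhaustive = 1L << features.size(); // 2^5 = 32 configurations

        // Pairwise sampling targets each pair of features instead, so the
        // number of targets grows only quadratically with n.
        long pairs = (long) features.size() * (features.size() - 1) / 2;

        System.out.println("exhaustive configurations: " + exhaustive);
        System.out.println("feature pairs to cover:    " + pairs);
    }
}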
Cohen et al. [14] introduce cumulative variability coverage, which accumulates coverage information through a series of development activities, to be further exploited in target testing activities for product line instances.
Another solution, proposed by Kolb and Muthig [47], is the imposition of constraints in the architecture. Instead of having components with a large amount of variability, it is better for testability to separate commonalities from variabilities and to encapsulate the variabilities as subcomponents. Aiming to reduce the retest of components and products when modifications are performed, the independence of features and components, as well as the reduction of side effects, reduces the effort required for adequate testing.
Tevanlinna et al. [79] highlight the importance of asset traceability from requirements to implementation. There are some ways to achieve this traceability between test assets and implementation, as reported by McGregor et al. [52], in which the design of each product line test asset matches the variation implementation mechanism for a component.
The selected approaches handle variability in a range of different manners, usually making variability explicit as early as possible in UML use cases [28,35,77] that will further be used to design test cases, as described in the requirement-based approaches [8,60]. Moreover, model-based approaches introduce variability into test models, created through use cases and their scenarios [74,75], or specify variability in feature models and activity diagrams [64,66]. They are usually concerned with reusing test cases in a systematic manner through variability handling, as [3,83] report.
avoid false failures since unanticipated or unreported changes
6.2.7. Variant binding time can occur in the component under test. These changes should
According to [52], the binding of different variants requires dif- be reflected in the corresponding automated tests [16].
ferent binding time (Compile Time, Link Time, Execution Time and
Post-Execution Time), which requires different mechanisms (e.g. 6.2.9. Test measurement
inheritance, parameterization, overloading and conditional compi- Test measurement is an important activity applied in order to
lation). They are suitable for different variability implementation calibrate and adjust approaches. Adequacy of testing can be mea-
schemes. The different mechanisms result in different types of de- sured based on the concept of a coverage criterion. Metrics related
fects, test strategies, and test processes. to test coverage are applied to extract information, and are useful
This issue is also addressed by Jaring et al. [29], in their Variability for the whole project. We investigated how test coverage has been
and Testability Interaction Model, which is responsible for modeling applied by existing approaches regarding SPL issues.
the interaction between variability binding and testability in the According to [79], there is only one way to completely guarantee
context of the V-model. The decision regarding the best moment that a program is fault-free, to execute it on all possible inputs,
to test a variant is clearly important. The earliest point at which a which is usually impossible or at least impractical. It is even more
decision is bound is the point at which the binding should be tested. difficult if the variations and all their constraints are considered. Test
In our findings, the approach presented in [75] deals with test- coverage criteria are a way to measure how completely a test suite
ing variant binding time as a form of ensuring that the application exercises the capabilities of a piece of software. These measures
comprises the correct set of features, as the customer looks for- can be used to define the space of inputs to a program. It is possible
418 P.A. da Mota Silveira Neto et al. / Information and Software Technology 53 (2011) 407–423
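To illustrate why the binding moment matters for testing, the sketch below loosely contrasts two of the mechanisms named in [52]: a compile-time constant and execution-time binding through polymorphism. The editions, codecs and build flag are invented for the example and are not taken from the surveyed studies.

public class BindingTimeDemo {

    // Compile-time binding: the variant is fixed when the product is
    // built, so it can only be tested in the compiled product.
    static final boolean PREMIUM_EDITION = true; // invented build flag

    // Execution-time binding: the variant is chosen while the product
    // runs, so one binary must be tested with each alternative bound.
    interface Codec { String name(); }
    static class Mp3Codec implements Codec { public String name() { return "mp3"; } }
    static class OggCodec implements Codec { public String name() { return "ogg"; } }

    public static void main(String[] args) {
        Codec codec = PREMIUM_EDITION ? new OggCodec() : new Mp3Codec();
        // The earliest point at which a variant is bound is the point at
        // which it should be tested (cf. [52]).
        System.out.println("bound variant: " + codec.name());
    }
}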
6.2.8. Effort reduction
Some authors consider testing the bottleneck in SPL, since the cost of testing product lines is becoming higher than that of testing single systems [45,47]. Although applications in a SPL share common components, they must be tested individually at the system testing level. This high cost makes testing an attractive target for improvements [63], and test effort reduction strategies can have a significant impact on productivity and profitability [53]. We found some strategies regarding this issue. They are described as follows:

 Reuse of test assets: Test assets – mainly test cases, test scenarios and test results [53] – are created to be reusable, which consequently reduces effort. According to [37,84], one approach to achieve the reuse of core assets comes from the existence of an asset repository. It usually requires an initial testing effort for its construction, but throughout the process these assets do not need to be rebuilt and can rather be used as is. Another strategy considers the creation of test assets as extensively as possible in domain engineering, anticipating the variabilities by creating document templates and abstract test cases; test cases and other concrete assets are used as is, and the abstract ones are extended or refined to test the product-specific aspects in application engineering (a sketch of this idea follows this list). In [50], a method for monitoring the interfaces of every component during test execution is proposed, observing commonality issues in order to avoid repetitive execution. As mentioned before in Section 6.2.6, the systematic reuse of test assets, especially test cases, is the focus of many studies, each offering novel and/or extended approaches. The reason for dealing with asset reuse in a systematic manner is that it can enable effort reduction, since redundant work may be avoided when deriving many products from the product line. In this context, the search for an effective approach has been noticeable throughout recent years, as can be seen in [53,55,61,66,75]. Hence, it is feasible to infer that there is no general solution for dealing with systematic reuse in SPL testing yet.
 Test automation tools: Automatic testing tools to support testing activities [16] are a way to achieve effort reduction. Methods have been proposed to automatically generate test cases from single-system models, expecting to reduce testing effort [28,49,60], such as mapping the models of an SPL to functional test cases in order to automatically generate and select functional test cases for a derived application [65]. Automatic test execution is an activity that should be carefully managed to avoid false failures, since unanticipated or unreported changes can occur in the component under test; these changes should be reflected in the corresponding automated tests [16].
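The abstract test case strategy above can be pictured as an abstract test class produced in domain engineering whose variation point is bound by each product's subclass. All names below are invented for illustration; this is a sketch of the general idea, not an implementation from any particular study.

import static org.junit.Assert.assertTrue;
import org.junit.Test;

// Domain engineering: the abstract test case fixes the common
// expectation once; the variation point is an abstract factory method.
abstract class AbstractCheckoutTest {

    interface PaymentMethod { int fee(int amount); }

    abstract PaymentMethod createPaymentMethod(); // variation point

    @Test
    public void feeIsNeverNegative() {
        assertTrue(createPaymentMethod().fee(100) >= 0);
    }
}

// Application engineering: a derived product reuses the test body and
// only binds its product-specific variant.
public class CreditCardCheckoutTest extends AbstractCheckoutTest {
    @Override
    PaymentMethod createPaymentMethod() {
        return amount -> amount / 50; // invented product-specific fee
    }
}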
their experts viewpoint.
6.2.9. Test measurement
Test measurement is an important activity, applied in order to calibrate and adjust approaches. The adequacy of testing can be measured based on the concept of a coverage criterion, and metrics related to test coverage are applied to extract information that is useful for the whole project. We investigated how test coverage has been applied by existing approaches regarding SPL issues.
According to [79], there is only one way to completely guarantee that a program is fault-free: to execute it on all possible inputs, which is usually impossible or at least impractical. It is even more difficult if the variations and all their constraints are considered. Test coverage criteria are a way to measure how completely a test suite exercises the capabilities of a piece of software. These measures can be used to define the space of inputs to a program, and it is possible to systematically sample this space and test only a portion of the feasible system behavior [14]. The use of covering arrays as a test coverage strategy is addressed in [14]. Kauppinen and Tevanlinna [38] define coverage criteria for estimating the adequacy of testing in a SPL context. They propose two coverage criteria for framework-based product lines, hook and template coverage: variation points open for customization in a framework are implemented as hook classes and stable parts as template classes. These are used to measure the coverage of frameworks or other collections of classes in an application by counting the structures or hook method references from them instead of single methods or classes.
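Read this way, hook coverage reduces to the fraction of a framework's variation points that the suite actually exercises. The toy computation below, with invented hook names, shows one plausible reading of the criterion rather than the authors' exact definition [38].

import java.util.Set;

public class HookCoverage {

    public static void main(String[] args) {
        // Invented example data: hook classes declared by the framework
        // versus hook classes actually exercised by the test suite.
        Set<String> declared  = Set.of("AuthHook", "PricingHook",
                                       "LoggingHook", "UiHook");
        Set<String> exercised = Set.of("AuthHook", "PricingHook",
                                       "LoggingHook");

        // Coverage counts exercised variation points (hooks), not
        // individual methods or classes.
        double coverage = (double) exercised.size() / declared.size();
        System.out.printf("hook coverage: %.0f%%%n", coverage * 100); // 75%
    }
}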
6.3. Analysis of the results and mapping of studies

The analysis of the results enables us to present the amount of studies that match each category addressed in this study. It makes it possible to identify what has been emphasized in past research, and thus to identify gaps and possibilities for future research [67]. Initially, let us analyze the distribution of studies regarding our analysis point of view. Figs. 6 and 7 present, respectively, the frequencies of publications according to the classes of the research facet and according to the research questions addressed by them (represented by Q1 to Q9). Table 4 details Fig. 7, showing which papers answer each research question. It is valid to mention that, in both categories, it was possible to have a study matching more than one topic. Hence, the total amount verified in Figs. 6 and 7 exceeds the final set of primary studies selected for detailed analysis.

Table 4
Research questions (RQ) and primary studies.

RQ  Primary studies
Q1  [3,8,9,20,29,30,35,38,39,45–47,54,55,64,66,72–75,83,84]
Q2  [3,17,20,39,54]
Q3  [3,20,24,29,30,34,36,46,47,49,50,54,55,57,61,64,69,73,75,83,84]
Q4  [27,30,37,46,54,57]
Q5  [22,23,54,55,60,72]
Q6  [3,6,8,9,14,16,20,22,24,29,34,35,39,47,49,50,52,61,66,68,69,72–75,83,84]
Q7  [14,29,30,52,68]
Q8  [3,8,16,20,22,24,28,29,35–39,45–47,49,50,53,54,60–62,65,66,68,73–75,84]
Q9  [3,27,30,36,62,66,75]

When merging these two categories, we have a quick overview of the evidence gathered from the analysis of the SPL testing field. We used a bubble plot to represent the interconnected frequencies, as shown in Fig. 8. This is basically an x–y scatterplot with bubbles in category intersections; the size of a bubble is proportional to the number of articles in the pair of categories corresponding to its coordinates [67].
The classification scheme applied in this paper enabled us to infer that researchers are mostly in the business of proposing new techniques and investigating their properties, rather than evaluating and/or experiencing them in practice, as seen in Fig. 8. Solution Proposal is the topic with the most entries, considering the research facets. Within this facet, most studies address the questions Q1 (testing strategies), Q3 (testing levels), Q6 (commonality and variability analysis) and Q8 (effort reduction); these have really been the overall focus of researchers. On the other hand, we have pointed out topics for which new solutions are required: Q2 (static and dynamic analysis interconnection in SPL testing), Q4 (regression testing), Q5 (non-functional testing), Q7 (variant binding time) and Q9 (measures).
Although some topics present a relevant amount of entries in this analysis, such as Q1, Q3, Q6 and Q8, as aforementioned, they still lack field research, since the techniques investigated and proposed are mostly novel and have usually not yet been implemented in practice. We realize that, currently, validation and evaluation research are weakly addressed in SPL testing papers. Regarding the maturity of the field in terms of validation and evaluation research and solution papers, other studies report results in line with our findings, e.g. [80]. Hence, we realize that this is not a problem solely of SPL testing; rather, it involves, in a certain way, other software engineering practices.
We also realize that researchers are not concerned with Experience Reports on their personal experience using particular approaches. Practitioners in the field should report results on the adoption, in the real world, of the techniques proposed and reported in the literature. Moreover, authors should Express Opinions about the desirable direction of SPL testing research, expressing their expert viewpoints.
In fact, the volume of literature devoted to testing software product lines attests to the importance assigned to it by the product line community. In the following subsection we detail what we considered most relevant in our analysis.

6.3.1. Main findings of the study
We identified a number of test strategies that have been applied to software product lines. Many of these strategies address different aspects of the testing process and can be applied simultaneously. However, we have no evidence about the effectiveness of combining strategies, nor about the contexts in which this could be suitable; the analyzed studies do not cover this potential. There is only a brief indication that the decision about which kind of strategy to adopt depends on a set of factors such as the software development process model, the languages used, company and team size, delivery time, and budget. Moreover, it is a decision made in the planning stage of the product line organization, since the strategy affects activities that begin during requirements definition. But these remain hypotheses that need to be supported or refuted through formal experiments and/or case studies.
A complete testing process should define both static and dynamic analyses. We found that even though some studies emphasize the importance of static analysis, few detail how it is performed in a SPL context [39,54,79], despite its relevance in single-system development. Static analysis is particularly important in a product line process, since many of the most useful assets are non-code assets, and the quality of the software architecture in particular is critical to success.
Specific testing activities are divided across the two types of activities: domain engineering and application engineering. Alternatively, the testing activities can be grouped into core asset and product development. From the set of studies, four [20,29,30,36] adopt (or advocate the use of) the V-model as an approach to represent testing throughout the software development life cycle. As a widely adopted strategy in single-system development, tailoring the V-model to SPL could result in improved quality. However, there is no consensus on the correct set of testing levels for each SPL phase.
We did not find evidence regarding the impact on the SPL of not performing a specific testing level in domain or application engineering. For example, is there any consequence if unit, integration or system testing is not performed in domain engineering? We need investigations to verify such an aspect. Moreover, what are the adaptations needed for the V-model to be effective in the SPL context? This is a point where experimentation is welcome, in order to understand the behavior of testing levels in SPL.
A number of the studies addressed, or assumed, that testing activities are automated (e.g. [16,49]). In a software product line, automation is more feasible because the resources required to automate are amortized over the larger number of products. The resources are also more narrowly focused due to the overlap of the products. Some of the studies illustrated that the use of domain-specific languages, and the tooling for those languages, is more feasible in a software product line context. Nevertheless, we need to understand whether the techniques are indeed effective when applying them in an industrial context. We lack studies reporting results of this nature.
According to [45], one of the major problems in testing product lines is the large number of variations. The study reinforces the importance of handling variability in testing throughout the whole software life cycle.
In particular, the effect of variant binding time was considered in this study. A well-defined approach was found in [29], with information provided by case studies conducted at an important electronics manufacturer. However, there are still many issues to be considered regarding variation and testing, such as: what is the impact of designing variations in test assets with regard to effort reduction? What is the most suitable strategy to handle variability within test assets: use cases and test cases, or perhaps sequence or class diagrams? How should traceability be handled, and what is the impact of not handling it with respect to test assets? We also did not find information about the impact of different binding times on testing in SPL, e.g. compile time, scoping time, etc. We lack evidence in this direction as well.
Regression testing does not belong to any one point in the software development life cycle, and as a result there is a lack of clarity in how regression testing should be handled. Despite this, it is clear that regression testing is important in the SPL context. Regression testing techniques include approaches to selecting the smallest test suite that will still find the most likely defects, and techniques that make the automation of test execution efficient.
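A selection step of the kind just described can be approximated by tracing each test case to the components it exercises and re-running only those affected by a change. The sketch below is a simplified illustration with invented names, not a technique taken from the surveyed studies.

import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class RegressionSelection {

    public static void main(String[] args) {
        // Invented traceability data: the components each test touches.
        Map<String, Set<String>> touches = Map.of(
            "PaymentTest",  Set.of("Billing", "Pricing"),
            "CatalogTest",  Set.of("Catalog"),
            "CheckoutTest", Set.of("Billing", "Catalog", "Ui"));

        Set<String> changed = Set.of("Billing"); // modified components

        // Select only the tests whose covered components intersect the
        // change set; everything else is skipped in this cycle.
        List<String> selected = touches.entrySet().stream()
            .filter(e -> e.getValue().stream().anyMatch(changed::contains))
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());

        System.out.println("re-run: " + selected); // [CheckoutTest, PaymentTest]
    }
}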
From the amount of studies analyzed, a few addressed testing non-functional requirements [22,54,55,60,72]. They point out that, during architecture design, static analysis can be used to give an early indication of problems with non-functional requirements. One important point that should be considered when testing quality attributes is the presence of trade-offs among them, for example the trade-off between modularity and testability. This leads to natural pairings of quality attributes and their associated tests. When a variation point represents a variation in a quality attribute, the static analysis should be sufficiently complete to investigate the different outcomes. Investigations towards making explicit which techniques currently applied in single-system development can be adopted in SPL are needed, since the studies do not address such an issue.
Our mapping study has illustrated a number of areas in which additional investigation would be useful, especially regarding evaluation and validation research. In general, SPL testing lacks evidence in many aspects. Regression test selection techniques, test automation and architecture-based regression testing are points for future research, as are techniques that address the relationships between variability and testing and techniques to handle traceability among test and development artifacts.

7. Threats to validity

There are some threats to the validity of our study. They are described and detailed as follows:

 Research questions: The set of questions we defined might not have covered the whole SPL testing area, which implies that one may not find answers to the questions that concern them. As we considered this a feasible threat, we had several discussion meetings with project members and experts in the area in order to calibrate the questions. This way, even if we had not selected the optimum set of questions, we attempted to deeply address the most asked and considered open issues in the field.
 Publication bias: We cannot guarantee that all relevant primary studies were selected. It is possible that some relevant studies were not chosen throughout the searching process. We mitigated this threat to the extent possible by following references in the primary studies.
 Quality evaluation: The quality attributes, as well as the weights used to quantify each of them, might not properly represent the attributes' importance. In order to mitigate this threat, the quality attributes were grouped in subsets to facilitate their further classification.
 Unfamiliarity with other fields: The terms used in the search strings can have many synonyms, so it is possible that we overlooked some work.

8. Concluding remarks and future work

The main motivation for this work was to investigate the state-of-the-art in SPL testing, by systematically mapping the literature in order to determine what issues have been studied, as well as by what means, and to provide a guide to aid researchers in planning future research. This research was conducted through a mapping study, a useful technique for identifying the areas where there is sufficient information for a systematic review to be effective, as well as those areas where more research is needed [12].
The amount of approaches that handle different and specific aspects of the SPL testing process (i.e. how to deal with variant binding time, regression testing and effort reduction) makes comparing the studies a hard task, since they do not share the same goals or focus. Nevertheless, through this study we were able to identify which activities are handled by the existing approaches, as well as to understand how researchers are developing work in SPL testing. Some research points were identified throughout this research, and these can be considered an important input for planning further research.
Searching the literature, some important aspects are not reported, and when they are found, just a brief overview is given. Regarding industrial experiences, we noticed they are rare in the literature. The existing case studies report small projects, containing results obtained from company-specific applications, which makes their reproduction in other contexts impracticable due to the lack of details. This scenario depicts the need for experimenting with SPL testing approaches not in academia, but rather in industry.
This study identified the growing interest in a well-defined SPL testing process, including tool support. Our findings in this sense are in line with a previous study conducted by Lamancha et al. [48], which reports on a systematic review on SPL testing, as mentioned in Section 2.
This mapping study also points out some topics that need additional investigation, such as quality attribute testing considering variations in quality levels among products, how to maintain the traceability between development and test artifacts, and the management of variability through the whole development life cycle. Regarding the research method used, this study also contributed to improving the mapping study process, by defining and proposing new steps such as the protocol definition, the collection form and the quality criteria.
In our future agenda, we will combine the evidence identified in this work with evidence from controlled experiments and industrial SPL projects to define hypotheses and theories, which will be the basis for designing new methods, processes, and tools for SPL testing.

Acknowledgments

This work was partially supported by the National Institute of Science and Technology for Software Engineering (INES, http://www.ines.org.br), funded by CNPq and FACEPE, grants 573964/2008-4 and APQ-1037-1.03/08.
Appendix A. Quality studies scores

Id REF Study title Year A B C


1 Condron [16] A domain approach to test automation of product lines 2004 2 0 2
2 Feng et al. [22] A product line based aspect-oriented generative unit testing approach to 2007 1.5 0 2.5
building quality components
3 Nebut et al. [60] A requirement-based approach to test product families 2003 2.5 1 1.5
4 Reis and Metzger [72] A reuse technique for performance testing of software product lines 2006 1.5 2 3
5 Kolb [45] A risk-driven approach for efficiently testing software product lines 2003 2 1 2.5
6 Needham and Jones [62] A software fault tree metric 2006 0 0 1
7 Hartmann et al. [28] A UML-based approach for validating product lines 2004 1 2 0.5
8 Zeng et al. [84] Analysis of testing effort by using core assets in software product line 2004 1 1.5 2.5
testing
9 Harrold [27] Architecture-based regression testing of evolving systems 1998 0 0.5 2
10 Li et al. [49] Automatic integration test generation from unit tests of eXVantage product 2007 1 1 2
family
11 McGregor [55] Building reusable test assets for a product line 2002 2 2 0.5
12 Kolb and Muthig [46] Challenges in testing software product lines 2003 0 3 1.5
13 Cohen et al. [14] Coverage and adequacy in software product line testing 2006 1 1.5 2
14 Pohl and Sikora [69] Documenting variability in test artefacts 2005 1 0 1
15 Kishi and Noda [39] Formal verification and software product lines 2006 2 1.5 2
16 Kauppinen et al. [38] Hook and template coverage criteria for testing framework-based software 2004 0.5 0.5 3
product families
17 Reis et al. [73] Integration testing in software product line engineering: a model-based 2007 1 0 3
technique
18 Kolb and Muthig [47] Making testing product lines more efficient by improving the testability of 2006 1 1.5 1.5
product line architectures
19 Reuys et al. [74] Model-based system testing of software product families 2005 2 1 3.5
20 Olimpiew and Gomaa [65] Model-based testing for applications derived from software product lines 2005 0 1 1
21 Jaring et al. [29] Modeling variability and testability interaction in software product line 2008 2.5 6 3.5
engineering
22 Bertolino and Gnesi [8] PLUTO: a test methodology for product families 2003 0.5 1 3
23 Olimpiew and Gomaa [66] Reusable model-based testing 2009 3 0.5 3.5
24 Olimpiew and Gomaa [64] Reusable system tests for applications derived from software product lines 2005 2.5 1 1
25 Li et al. [50] Reuse execution traces to reduce testing of product lines 2007 0 0.5 2
26 Kauppinen and Taina [37] RITA environment for testing framework-based software product lines 2003 0 0 0.5
27 Pohl and Metzger [68] Software product line testing exploring principles and potential solutions 2006 0.5 0 2.5
28 McGregor [53] Structuring test assets in a product line effort 2001 1.5 1 0.5
29 Nebut et al. [61] System testing of product lines from requirements to test cases 2006 0 2 2
30 McGregor [54] Testing a software product line 2001 4 1.5 2
31 Denger and Kolb [17] Testing and inspecting reusable product line components: first empirical 2006 0 1 0.5
results
32 Kauppinen [36] Testing framework-based software product lines 2003 0.5 0.5 2
33 Edwin [20] Testing in software product line 2007 2 2.5 2
34 Al-Dallal and Sorenson [3] Testing software assets of framework-based product families during 2008 3 1 4
application engineering stage
35 Kamsties et al. [34] Testing variabilities in use case models 2003 0.5 1.5 1.5
36 McGregor et al. [52] Testing variability in a software product line 2004 0 1 2.5
37 Reuys et al. [75] The ScenTED method for testing software product lines 2006 3 1 4.5
38 Jin-hua et al. [30] The W-Model for testing software product lines 2008 1 3 1.5
39 Kang et al. [35] Towards a formal framework for product line test development 2007 2 2 1
40 Lamancha and Macario Polo Usaola [6] Towards an automated testing framework to manage variability using the UML testing profile 2009 0 0 1
41 Wübbeke [83] Towards an efficient reuse of test cases for software product lines 2008 0 0 2
42 Geppert et al. [24] Towards generating acceptance tests for product lines 2004 0.5 1.5 2
43 Muccini and van der Hoek [57] Towards testing product line architectures 2003 0 2.5 1
44 Ganesan et al. [23] Towards testing response time of instances of a web-based product line 2005 1 1.5 1
45 Bertolino and Gnesi [9] Use case-based testing of product lines 2003 1 1 2.5

The shaded lines represent the most relevant studies according to the grades.
Appendix B. List of conferences

Acronym  Conference name

AOSD  International conference on aspect-oriented software development
APSEC  Asia Pacific software engineering conference
ASE  International conference on automated software engineering
CAiSE  International conference on advanced information systems engineering
CBSE  International symposium on component-based software engineering
COMPSAC  International computer software and applications conference
CSMR  European conference on software maintenance and reengineering
ECBS  International conference and workshop on the engineering of computer based systems
ECOWS  European conference on web services
ECSA  European conference on software architecture
ESEC  European software engineering conference
ESEM  Empirical software engineering and measurement
WICSA  Working IEEE/IFIP conference on software architecture
FASE  Fundamental approaches to software engineering
GPCE  International conference on generative programming and component engineering
ICCBSS  International conference on composition-based software systems
ICSE  International conference on software engineering
ICSM  International conference on software maintenance
ICSR  International conference on software reuse
ICST  International conference on software testing, verification and validation
ICWS  International conference on web services
IRI  International conference on information reuse and integration
ISSRE  International symposium on software reliability engineering
MODELS  International conference on model driven engineering languages and systems
PROFES  International conference on product focused software development and process improvement
QoSA  International conference on the quality of software architectures
QSIC  International conference on quality software
ROSATEA  International workshop on the role of software architecture in testing and analysis
SAC  Annual ACM symposium on applied computing
SEAA  Euromicro conference on software engineering and advanced applications
SEKE  International conference on software engineering and knowledge engineering
SERVICES  Congress on services
SPLC  Software product line conference
SPLiT  Software product line testing workshop
TAIC PART  Testing – Academic & Industrial conference
TEST  International workshop on testing emerging software technology

Appendix C. List of journals

ACM Transactions on Software Engineering and Methodology (TOSEM)
Communications of the ACM (CACM)
ELSEVIER Information and Software Technology (IST)
ELSEVIER Journal of Systems and Software (JSS)
IEEE Software
IEEE Computer
IEEE Transactions on Software Engineering
Journal of Software Maintenance Research and Practice
Software Practice and Experience Journal
Software Quality Journal
Software Testing, Verification and Reliability

References

[1] W. Afzal, R. Torkar, R. Feldt, A systematic mapping study on non-functional search-based software testing, in: SEKE '08: Proceedings of the 20th International Conference on Software Engineering and Knowledge Engineering, Redwood City, California, USA, 2008, pp. 488–493.
[2] W. Afzal, R. Torkar, R. Feldt, A systematic review of search-based testing for non-functional system properties, Information and Software Technology 51 (6) (2009) 957–976.
[3] J. Al-Dallal, P.G. Sorenson, Testing software assets of framework-based product families during application engineering stage, Journal of Software 3 (5) (2008) 11–25.
[4] P. Ammann, J. Offutt, Introduction to Software Testing, first ed., Cambridge University Press, 2008.
[5] J. Bailey, D. Budgen, M. Turner, B. Kitchenham, P. Brereton, S. Linkman, Evidence relating to object-oriented software design: a survey, in: ESEM '07: Proceedings of the First International Symposium on Empirical Software Engineering and Measurement, Washington, DC, USA, 2007, pp. 482–484.
[6] B. Pérez Lamancha, M. Polo Usaola, Towards an automated testing framework to manage variability using the UML testing profile, in: AST '09: Proceedings of the Workshop on Automation of Software Test (ICSE), Vancouver, Canada, 2009, pp. 10–17.
[7] A. Bertolino, Software testing research: achievements, challenges, dreams, in: FOSE '07: Future of Software Engineering, Washington, DC, USA, 2007, pp. 85–103.
[8] A. Bertolino, S. Gnesi, PLUTO: a test methodology for product families, in: Software Product-Family Engineering, 5th International Workshop, PFE, Siena, Italy, 2003, pp. 181–197.
[9] A. Bertolino, S. Gnesi, Use case-based testing of product lines, ACM SIGSOFT Software Engineering Notes 28 (5) (2003) 355–358.
[10] Y.M. Bezerra, T.A.B. Pereira, G.E. da Silveira, A systematic review of software product lines applied to mobile middleware, in: ITNG '09: Proceedings of the 2009 Sixth International Conference on Information Technology: New Generations, Washington, DC, USA, 2009, pp. 1024–1029.
[11] P. Brereton, B.A. Kitchenham, D. Budgen, M. Turner, M. Khalil, Lessons from applying the systematic literature review process within the software engineering domain, Journal of Systems and Software 80 (4) (2007) 571–583.
[12] D. Budgen, M. Turner, P. Brereton, B. Kitchenham, Using mapping studies in software engineering, in: Proceedings of PPIG Psychology of Programming Interest Group 2008, Lancaster University, UK, 2008, pp. 195–204.
[13] L. Chen, M.A. Babar, N. Ali, Variability management in software product lines: a systematic review, in: SPLC '09: Proceedings of the 13th Software Product Line Conference, San Francisco, CA, USA, 2009.
[14] M.B. Cohen, M.B. Dwyer, J. Shi, Coverage and adequacy in software product line testing, in: ROSATEA '06: Proceedings of the ISSTA 2006 Workshop on Role of Software Architecture for Testing and Analysis, ACM, New York, NY, USA, 2006, pp. 53–63.
[15] N. Condori-Fernandez, M. Daneva, K. Sikkel, R. Wieringa, O. Dieste, O. Pastor, A systematic mapping study on empirical evaluation of software requirements specifications techniques, in: ESEM '09: Proceedings of the 2009 3rd International Symposium on Empirical Software Engineering and Measurement, Washington, DC, USA, 2009, pp. 502–505.
[16] C. Condron, A domain approach to test automation of product lines, in: SPLiT '04: Proceedings of the International Workshop on Software Product Line Testing, Boston, MA, USA, 2004, pp. 27–35.
[17] C. Denger, R. Kolb, Testing and inspecting reusable product line components: first empirical results, in: ISESE '06: Proceedings of the International Symposium on Empirical Software Engineering, New York, NY, USA, 2006, pp. 184–193.
[18] T. Dybå, T. Dingsøyr, Empirical studies of agile software development: a systematic review, Information and Software Technology 50 (9–10) (2008) 833–859.
[19] T. Dybå, T. Dingsøyr, Strength of evidence in systematic reviews in software engineering, in: ESEM '08: Proceedings of the Second ACM–IEEE International Symposium on Empirical Software Engineering and Measurement, ACM, New York, NY, USA, 2008, pp. 178–187.
[20] O.O. Edwin, Testing in Software Product Lines, Master's thesis, Department of Software Engineering and Computer Science, Blekinge Institute of Technology, Sweden, 2007.
[21] E. Engström, M. Skoglund, P. Runeson, Empirical evaluations of regression test selection techniques: a systematic review, in: ESEM '08: Proceedings of the Second ACM–IEEE International Symposium on Empirical Software Engineering and Measurement, New York, NY, USA, 2008, pp. 22–31.
[22] Y. Feng, X. Liu, J. Kerridge, A product line based aspect-oriented generative unit testing approach to building quality components, in: COMPSAC '07: Proceedings of the 31st Annual International Computer Software and Applications Conference, Washington, DC, USA, 2007, pp. 403–408.
[23] D. Ganesan, U. Maurer, M. Ochs, B. Snoek, M. Verlage, Towards testing response time of instances of a web-based product line, in: SPLiT '05: International Workshop on Software Product Line Testing, Rennes, France, 2005.
[24] B. Geppert, J.J. Li, F. Rößler, D.M. Weiss, Towards generating acceptance tests for product lines, in: ICSR '04: Proceedings of the 8th International Conference on Software Reuse, 2004, pp. 35–48.
[25] R.F. Goldsmith, D. Graham, The forgotten phase, in: Software Development Magazine, 2002, pp. 45–47.
[26] T.L. Graves, M.J. Harrold, J.-M. Kim, A. Porter, G. Rothermel, An empirical study of regression test selection techniques, ACM Transactions on Software Engineering and Methodology 10 (2) (2001) 184–208.
[27] M.J. Harrold, Architecture-based regression testing of evolving systems, in: ROSATEA '98: International Workshop on Role of Architecture in Testing and Analysis, Marsala, Sicily, Italy, 1998, pp. 73–77.
[28] J. Hartmann, M. Vieira, A. Ruder, A UML-based approach for validating product lines, in: SPLiT '04: Proceedings of the International Workshop on Software Product Line Testing, Boston, MA, USA, 2004, pp. 58–65.
[29] M. Jaring, R.L. Krikhaar, J. Bosch, Modeling variability and testability interaction in software product line engineering, in: ICCBSS '08: Proceedings of the 7th International Conference on Composition-Based Software Systems, 2008, pp. 120–129.
[30] L. Jin-hua, L. Qiong, L. Jing, The W-model for testing software product lines, in: ISCSCT '08: Proceedings of the International Symposium on Computer Science and Computational Technology, Los Alamitos, CA, USA, 2008, pp. 690–693.
[31] N. Juristo, A.M. Moreno, S. Vegas, Reviewing 25 years of testing technique experiments, Empirical Software Engineering 9 (1–2) (2004) 7–44.
[32] N. Juristo, A.M. Moreno, W. Strigel, Guest editors' introduction: software testing practices in industry, IEEE Software 23 (4) (2006) 19–21.
[33] N. Juristo, A.M. Moreno, S. Vegas, M. Solari, In search of what we experimentally know about unit testing, IEEE Software 23 (6) (2006) 72–80.
[34] E. Kamsties, K. Pohl, S. Reis, A. Reuys, Testing variabilities in use case models, in: PFE '03: Proceedings of the 5th International Workshop on Software Product-Family Engineering, Siena, Italy, 2003, pp. 6–18.
[35] S. Kang, J. Lee, M. Kim, W. Lee, Towards a formal framework for product line test development, in: CIT '07: Proceedings of the 7th IEEE International Conference on Computer and Information Technology, Washington, DC, USA, 2007, pp. 921–926.
[36] R. Kauppinen, Testing Framework-Based Software Product Lines, Master's thesis, University of Helsinki, Department of Computer Science, 2003.
[37] R. Kauppinen, J. Taina, RITA environment for testing framework-based software product lines, in: SPLST '03: Proceedings of the 8th Symposium on Programming Languages and Software Tools, Kuopio, Finland, 2003, pp. 58–69.
[38] R. Kauppinen, J. Taina, A. Tevanlinna, Hook and template coverage criteria for testing framework-based software product families, in: SPLiT '04: Proceedings of the International Workshop on Software Product Line Testing, Boston, MA, USA, 2004, pp. 7–12.
[39] T. Kishi, N. Noda, Formal verification and software product lines, Communications of the ACM 49 (12) (2006) 73–77.
[40] B. Kitchenham, What's up with software metrics? A preliminary mapping study, Journal of Systems and Software 83 (1) (2010) 37–51.
[41] B. Kitchenham, S. Charters, Guidelines for Performing Systematic Literature Reviews in Software Engineering, Technical Report EBSE 2007-001, Keele University and Durham University Joint Report, 2007.
[42] B.A. Kitchenham, S.L. Pfleeger, L.M. Pickard, P.W. Jones, D.C. Hoaglin, K.E. Emam, J. Rosenberg, Preliminary guidelines for empirical research in software engineering, IEEE Transactions on Software Engineering 28 (8) (2002) 721–734.
[43] B.A. Kitchenham, T. Dybå, M. Jørgensen, Evidence-based software engineering, in: ICSE '04: Proceedings of the 26th International Conference on Software Engineering, Washington, DC, USA, 2004, pp. 273–281.
[44] B.A. Kitchenham, E. Mendes, G.H. Travassos, Cross versus within-company cost estimation studies: a systematic review, IEEE Transactions on Software Engineering 33 (5) (2007) 316–329.
[45] R. Kolb, A risk-driven approach for efficiently testing software product lines, in: IESEF '03, Fraunhofer Institute for Experimental Software Engineering, 2003.
[46] R. Kolb, D. Muthig, Challenges in testing software product lines, in: CONQUEST '03: Proceedings of the 7th Conference on Quality Engineering in Software Technology, Nuremberg, Germany, 2003, pp. 81–95.
[47] R. Kolb, D. Muthig, Making testing product lines more efficient by improving the testability of product line architectures, in: ROSATEA '06: Proceedings of the ISSTA Workshop on Role of Software Architecture for Testing and Analysis, New York, NY, USA, 2006, pp. 22–27.
[48] B.P. Lamancha, M.P. Usaola, M.P. Velthius, Software product line testing – a systematic review, in: ICSOFT: International Conference on Software and Data Technologies, INSTICC Press, 2009, pp. 23–30.
[49] J.J. Li, D.M. Weiss, J.H. Slye, Automatic system test generation from unit tests of eXVantage product family, in: SPLiT '07: Proceedings of the International Workshop on Software Product Line Testing, Kyoto, Japan, 2007, pp. 73–80.
[50] J.J. Li, B. Geppert, F. Roessler, D. Weiss, Reuse execution traces to reduce testing of product lines, in: SPLiT '07: Proceedings of the International Workshop on Software Product Line Testing, Kyoto, Japan, 2007.
[51] L.B. Lisboa, V.C. Garcia, D. Lucrédio, E.S. de Almeida, S.R. de Lemos Meira, R.P. de Mattos Fortes, A systematic review of domain analysis tools, Information and Software Technology 52 (1) (2010) 1–13.
[52] J. McGregor, P. Sodhani, S. Madhavapeddi, Testing variability in a software product line, in: SPLiT '04: Proceedings of the International Workshop on Software Product Line Testing, Boston, Massachusetts, USA, 2004, p. 45.
[53] J.D. McGregor, Structuring test assets in a product line effort, in: ICSE '01: Proceedings of the 2nd International Workshop on Software Product Lines: Economics, Architectures, and Implications, Toronto, Ontario, Canada, 2001, pp. 89–92.
[54] J.D. McGregor, Testing a Software Product Line, Technical Report CMU/SEI-2001-TR-022, 2001.
[55] J.D. McGregor, Building reusable test assets for a product line, in: ICSR '02: Proceedings of the 7th International Conference on Software Reuse, Austin, Texas, USA, 2002, pp. 345–346.
[56] M.B.S. Moraes, E.S. Almeida, S.R. de Lemos Meira, A systematic review on software product lines scoping, in: ESELAW '09: Proceedings of the VI Experimental Software Engineering Latin American Workshop, São Carlos, SP, Brazil, 2009.
[57] H. Muccini, A. van der Hoek, Towards testing product line architectures, Electronic Notes in Theoretical Computer Science 82 (6) (2003).
[58] H. Muccini, M.S. Dias, D.J. Richardson, Towards software architecture-based regression testing, in: WADS '05: Proceedings of the Workshop on Architecting Dependable Systems, New York, NY, USA, 2005, pp. 1–7.
[59] H. Muccini, M. Dias, D.J. Richardson, Software architecture-based regression testing, Journal of Systems and Software 79 (10) (2006) 1379–1396.
[60] C. Nebut, F. Fleurey, Y.L. Traon, J.-M. Jézéquel, A requirement-based approach to test product families, in: PFE '03: Proceedings of the 5th International Workshop on Software Product-Family Engineering, Siena, Italy, 2003, pp. 198–210.
[61] C. Nebut, Y.L. Traon, J.-M. Jézéquel, System testing of product lines: from requirements to test cases, in: [34], 2006, pp. 447–477.
[62] D. Needham, S. Jones, A software fault tree metric, in: ICSM '06: Proceedings of the International Conference on Software Maintenance, Philadelphia, Pennsylvania, USA, 2006, pp. 401–410.
[63] L.M. Northrop, P.C. Clements, A Framework for Software Product Line Practice, Version 5.0, Technical Report, Software Engineering Institute, 2007.
[64] E. Olimpiew, H. Gomaa, Reusable system tests for applications derived from software product lines, in: SPLiT '05: Proceedings of the International Workshop on Software Product Line Testing, Rennes, France, 2005.
[65] E.M. Olimpiew, H. Gomaa, Model-based testing for applications derived from software product lines, in: A-MOST '05: Proceedings of the 1st International Workshop on Advances in Model-Based Testing, New York, NY, USA, 2005, pp. 1–7.
[66] E.M. Olimpiew, H. Gomaa, Reusable model-based testing, in: ICSR '09: Proceedings of the 11th International Conference on Software Reuse, Berlin, Heidelberg, 2009, pp. 76–85.
[67] K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson, Systematic mapping studies in software engineering, in: EASE '08: Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering, University of Bari, Italy, 2008.
[68] K. Pohl, A. Metzger, Software product line testing, Communications of the ACM 49 (12) (2006) 78–81.
[69] K. Pohl, E. Sikora, Documenting variability in test artefacts, in: Software Product Lines, Springer, 2005, pp. 149–158.
[70] K. Pohl, G. Böckle, F.J. van der Linden, Software Product Line Engineering: Foundations, Principles and Techniques, Springer, 2005.
[71] R. Pretorius, D. Budgen, A mapping study on empirical evidence related to the models and forms used in the UML, in: ESEM '08: Proceedings of Empirical Software Engineering and Measurement, Kaiserslautern, Germany, 2008, pp. 342–344.
[72] S. Reis, A. Metzger, K. Pohl, A reuse technique for performance testing of software product lines, in: SPLiT '05: Proceedings of the International Workshop on Software Product Line Testing, Baltimore, Maryland, USA, 2006.
[73] S. Reis, A. Metzger, K. Pohl, Integration testing in software product line engineering: a model-based technique, in: FASE '07: Proceedings of Fundamental Approaches to Software Engineering, Braga, Portugal, 2007, pp. 321–335.
[74] A. Reuys, E. Kamsties, K. Pohl, S. Reis, Model-based system testing of software product families, in: CAiSE '05: Proceedings of the International Conference on Advanced Information Systems Engineering, 2005, pp. 519–534.
[75] A. Reuys, S. Reis, E. Kamsties, K. Pohl, The ScenTED method for testing software product lines, in: [34], 2006, pp. 479–520.
[76] G. Rothermel, M.J. Harrold, Analyzing regression test selection techniques, IEEE Transactions on Software Engineering 22 (8) (1996) 529–551.
[77] J. Rumbaugh, I. Jacobson, G. Booch, Unified Modeling Language Reference Manual, second ed., Pearson Higher Education, 2004.
[78] E.D. Souza Filho, R. Oliveira Cavalcanti, D.F. Neiva, T.H. Oliveira, L.B. Lisboa, E.S. Almeida, S.R. Lemos Meira, Evaluating domain design approaches using systematic review, in: ECSA '08: Proceedings of the 2nd European Conference on Software Architecture, Berlin, Heidelberg, 2008, pp. 50–65.
[79] A. Tevanlinna, J. Taina, R. Kauppinen, Product family testing: a survey, ACM SIGSOFT Software Engineering Notes 29 (2) (2004) 12.
[80] D. Šmite, C. Wohlin, T. Gorschek, R. Feldt, Empirical evidence in global software engineering: a systematic review, Empirical Software Engineering 15 (1) (2010) 91–118.
[81] D.M. Weiss, The product line hall of fame, in: SPLC '08: Proceedings of the 2008 12th International Software Product Line Conference, IEEE Computer Society, Washington, DC, USA, 2008, p. 39.
[82] R. Wieringa, N.A.M. Maiden, N.R. Mead, C. Rolland, Requirements engineering paper classification and evaluation criteria: a proposal and a discussion, Requirements Engineering 11 (1) (2006) 102–107.
[83] A. Wübbeke, Towards an efficient reuse of test cases for software product lines, in: SPLC '08: Proceedings of the Software Product Line Conference, 2008, pp. 361–368.
[84] H. Zeng, W. Zhang, D. Rine, Analysis of testing effort by using core assets in software product line testing, in: SPLiT '04: Proceedings of the International Workshop on Software Product Line Testing, Boston, MA, USA, 2004, pp. 1–6.