Table 2: Number of Papers Found in S1 and S2 per Year

Year           1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
No. of Papers
  S1              0    0    0    1    2   10    6   12   14   20   17
  S2              0    1    3    2    7    8    3    7    7   10   13

Table 3: Number of Unique Papers Found in S1 and S2 per Year

Year           1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
No. of Papers
  S1              0    0    0    0    0    4    3    0    9    7    4
  S2              0    1    2    0    2    2    2    1    1    1    3
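As a quick cross-check of the transcribed tables, the yearly counts can be totalled with a few lines of Python (the lists below are copied directly from Tables 2 and 3; the totals are computed here, not quoted from the paper):

```python
# Papers found per year, 1999-2009, transcribed from Tables 2 and 3.
years = list(range(1999, 2010))

all_s1    = [0, 0, 0, 1, 2, 10, 6, 12, 14, 20, 17]  # Table 2, S1 (database search)
all_s2    = [0, 1, 3, 2, 7, 8, 3, 7, 7, 10, 13]     # Table 2, S2 (snowballing)
unique_s1 = [0, 0, 0, 0, 0, 4, 3, 0, 9, 7, 4]       # Table 3, S1 unique papers
unique_s2 = [0, 1, 2, 0, 2, 2, 2, 1, 1, 1, 3]       # Table 3, S2 unique papers

for label, counts in [("all S1", all_s1), ("all S2", all_s2),
                      ("unique S1", unique_s1), ("unique S2", unique_s2)]:
    # Each series must cover exactly the eleven years 1999-2009.
    assert len(counts) == len(years)
    print(f"{label}: total = {sum(counts)}")  # totals: 82, 61, 27, 15
```

The unique-paper totals (27 for S1 and 15 for S2) line up with the per-study counts discussed in the comparison of unique papers.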
Thus, 14 papers (out of 26) should have been possible to find in S2. The reason for missing them in S2 might be either researchers' error or that they were not cited by other relevant papers found in the backward snowballing. In the list of unique papers found through backward snowballing (S2), six papers (out of 15) should have been possible to find, although they were not (see Table 4 in the Appendix). Therefore, we checked the complete list of papers found initially in S1 (534 papers). Two of the six papers were excluded in S1 in the later data analyses because the research was already published in another paper found in S1. One of the original papers is missing in S2 while one is included. This indicates that the researchers have evaluated papers in S1 and S2 in slightly different ways. Six other studies are published in databases other than those searched in S1 and hence cannot be found in S1. Three papers ([B4], [B1] and [B7]) use slightly different terminology and hence are not covered by the set of defined keywords in S1 (see Table 4 in the Appendix); unless the same terms are used in the abstract, they cannot be found in S1. For example, a study such as [B4] uses "extremely" in the title, and we could guess that it refers to "Extreme Programming". Surprisingly, however, it is not found in S1, although by reading the abstract in S2 we judged it as relevant. This might indicate the different ways in which the researchers and the search engines interpret the terms in the text. In the case of [B1], "remotely located" is used in the title, which could not be found in the database searches, since the equivalent term formulated in S1 was "remote team". It was found through backward snowballing because, merely by seeing these words in the title of the article, we could immediately recognize that "remotely located" means global or distributed. These examples illustrate the challenges in formulating search strings, whereas in snowballing it may be evident to the researcher to include a paper when reading its title.

4.2 Distribution of Papers

The next step in the comparison is to compare the distribution of papers across the years.

4.2.1 First Comparison

As mentioned above, the first comparison includes all papers found in both studies, while the second comparison (see below) compares the unique papers in each of the two studies. As shown in Table 2, the number of papers found in S1 and S2 in each year (1999-2009) is not the same. However, the patterns of distribution do not seem to be completely different, and both indicate that the number of papers has grown in the past decade.

4.2.2 Second Comparison

The number of unique papers found by each study in each year is presented in Table 3. We have found no unique papers in S1 before 2004. Considering the number of papers in 2009, it is hard to conclude that the number of published papers is increasing in the past decade. In S2, the number of published papers seems to be constant over the years. However, we should mention that this comparison is not fully fair, because the total numbers of papers should be compared against each other instead of considering only the unique ones. The question is whether the differences are due to the different search strategies.

4.3 Distribution of Research Types

Next, we wanted to compare the research types using the classification from Wieringa et al. [16]. In summary, the types are defined as follows:

Evaluation Research: Techniques, methods, tools or other solutions are implemented and evaluated in practice, and the outcomes are investigated.

Validation Research: A novel solution is developed and evaluated in a laboratory setting.

Solution Proposal: A solution for a research problem is proposed, and the benefits are discussed, but not evaluated.

Conceptual Proposal or Philosophical Paper: It structures an area in the form of a taxonomy or conceptual framework, hence providing a new way of looking at existing things.

Experience Paper: It includes the experience of the author on what and how something happened in practice.

Opinion Paper: The personal opinion on a particular matter is discussed, without relying on related work and research methodologies.

4.3.1 First Comparison

Both studies found the majority of papers to be experience reports, in which practitioners have reported their own experiences on a specific issue and the method applied to alleviate it [16]. It should be noted that the number of papers in S1 and S2 for each research type is different. This difference is, however, expected, since the numbers of papers found in S1 and S2 are different. In addition, the order of research types according to their frequency is different. The order in S1 is: 1) experience report, 2) evaluation, 3) opinion, 4) solution, 5) validation and 6) philosophical, while in S2 it is: 1) experience, 2) validation, 3) evaluation, 4) solution, 5) philosophical and 6) opinion. It is surprising that we found no opinion paper in S2, while it was the third most frequent research type in S1.

4.3.2 Second Comparison

When it comes to a comparison of the unique papers, the majority of the current research was found to be in the form of experience reports in both studies. In addition to the research types identified in S1, a solution paper is found in S2.

4.4 Countries Involved in GSE

It is also possible to compare the most common combinations of collaboration. The combinations include both global collaboration and distributed development.
4.4.1 First Comparison
In both studies, the collaboration between the USA and India is found to be the most popular, followed by distributed development within the USA, although the exact numbers are different.
5. DISCUSSION
5.1 Time and Effort Required
We cannot claim with certainty that SLRs are more time-consuming in formulating the search strings, because snowballing requires some formulation too. However, SLRs require a separate formulation for each database, whilst snowballing does not explicitly require searching in more than one database.
The number of initial papers was 534 in S1 and 109 in S2 (4.9 times more papers in S1 than in S2), which indicates that greater time and effort was spent in the database search approach on refining the searches, as well as on identifying the relevant papers and discarding the irrelevant ones.

Figure 4: Patterns of Agile GSE Combinations
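The effort comparison above is simple arithmetic, and it can be combined with the noise rates reported in Section 5.2 (85% for S1, 32% for S2). A minimal sketch follows; note that the "implied relevant papers" are back-of-the-envelope estimates derived here from the quoted percentages, not counts reported by the studies themselves:

```python
# Figures quoted in Sections 5.1-5.2: initial pool sizes and noise rates.
initial_s1, initial_s2 = 534, 109   # database search (S1) vs. snowballing (S2)
noise_s1, noise_s2 = 0.85, 0.32     # reported share of irrelevant papers

# Ratio of papers that had to be screened in each approach.
ratio = initial_s1 / initial_s2
print(f"S1 screened {ratio:.1f}x as many initial papers as S2")  # 4.9x

# Taking the noise rates at face value, the implied numbers of relevant
# papers surviving the screening are of the same order in both studies.
relevant_s1 = round(initial_s1 * (1 - noise_s1))  # ~80
relevant_s2 = round(initial_s2 * (1 - noise_s2))  # ~74
print(f"implied relevant papers: S1 ~{relevant_s1}, S2 ~{relevant_s2}")
```

The point of the sketch is that while S1 yields a similar order of relevant papers, it does so at the cost of screening roughly five times as many candidates.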
5.2 Noise vs. Included Papers

Considering that the term "Agile" is a very general word used in many papers across many disciplines, we found a lot of noise in S1. Due to this, we had to limit the search to abstract, title, and keywords. Still, the number of irrelevant papers was much higher than the number of included and analyzed papers (85% noise). In the snowballing search, the balance seemed more reasonable (32% noise).

5.3 Judgments of Papers

In snowballing, most of the judgments were made based on the title of the paper when going backward through the reference lists (or forward in the citations, if applied). In some cases, we judged papers once more based on their abstract, i.e. we performed a stepwise judgment. In the SLR, the judgments were made on title and abstract at the same time. It should be noted that papers with no relevant keywords in the title might be missed in snowballing. On the other hand, papers that use different wordings (as in the example with cross-continent) might not be caught in an SLR.

5.4 Prior Experience

Prior experience of the researcher, both in the area of the studies and in performing secondary studies, may affect the results. The differences can be seen, for example, when judging the relevancy of the papers. An experienced person already knows several papers and several active researchers in the area, which may affect the reliability of the secondary study regarding the relevancy of the included papers.

5.5 Ease of Use

We found the snowballing approach to be more understandable and easier to follow; in particular, it is believed to be easier for novice researchers. The SLR provides a lot of guidelines, which is good on the one hand, but on the other hand novice researchers might find them confusing rather than helpful.

5.6 Identical Authors Risk

A potential threat in snowballing is that we might find several papers by the same authors, since their previous research is usually relevant and cited. Thus, the results of the snowballing approach might be biased by over-representing specific authors' research. The database search method, on the other hand, searches all papers in the database, which eliminates this risk.

5.7 General Remark on Literature

It should be noted that a general problem with systematic literature reviews in software engineering is that in many cases existing papers are hard to classify and analyze, since in many of the published studies the contextual information is not well documented or the studies are not conducted in a realistic setting [20]. We have observed that insufficient contextual information hinders synthesizing the evidence from some studies (in particular industrial experience reports). Thus, we recommend practitioners and researchers working with industry to follow the guidelines provided by Petersen and Wohlin [17] for documenting the contextual information.

6. CONCLUSIONS

In this paper, we evaluated two different first steps for conducting systematic literature studies. This was done by comparing two secondary studies on Agile practices in GSE, which were performed by the same researchers but using different search methods. First, we compared the studies against each other, asking whether the same set of papers was found and whether the included papers had resulted in the same conclusions. Secondly, we excluded the common papers from both studies and performed new analyses with the remaining unique papers for each study. Considering that these comparisons did not indicate any remarkable differences between the two studies, we compared the actual results found using the two different search methods applied. A summary of the findings is provided below. After comparing the two secondary studies in two different ways (with the common papers and with only the unique papers in each study), we did not find any major differences between the findings of the analyses. The figures and numbers were not the same, but the general interpretation of them is quite similar. We can summarize our findings as follows for the two research questions.

RQ1. To what extent do we find the same research papers using two different review approaches?

To answer RQ1, we observe that the papers found are different, both in number and in the actual papers. In addition, the final set of papers used in the data analyses was also found to be different, although 27 papers were common. This is not really surprising given that we only used the first step of the two search methods, i.e. according to the different guidelines used. It is highly likely that the overlap would increase if we conducted snowballing from the papers found in the database searches, and also if we did forward snowballing in addition to backward snowballing. However, it should be noted that a majority of the papers are the same, despite only comparing the starting points of the two methods, i.e. database search vs. backward snowballing.

RQ2. To what extent do we come to the same conclusions using two different review approaches?

The answer to RQ2 is more important, since it concerns the actual findings. Regardless of the differences in the actual numbers and figures, similar patterns were identified in both studies, and hence similar conclusions were drawn. However, when excluding the common papers from both studies and analyzing only the remaining unique papers of each study, the identified patterns seem to be slightly different, which may be due to having fewer papers (a smaller sample). Therefore, it is not easy to draw any general conclusions with respect to RQ2.

However, given the overlap, despite only conducting the first part of the guidelines, it indicates that the actual conclusions are at least not highly dependent on whether database searches or snowballing is used. It is also quite obvious that the overlap would become larger if the two search strategies were combined, although the downside is that this generates more work. Systematic literature studies are quite time consuming. Snowballing might be more efficient when the keywords for searching include general terms (e.g. Agile), because it dramatically reduces the amount of noise compared to database searches. Our personal experience confirms this. However, we recommend applying both backward and forward snowballing.

Although these conclusions, recommendations, and findings are based on our experiences with this comparison study as well as previous secondary studies, they seem to be in alignment with some previous studies [8] and [10], but contradictory to some others [9]. In any case, more such comparison studies are required to be able to compare the methods fairly.

7. ACKNOWLEDGMENTS

This research was funded by the Industrial Excellence Center EASE - Embedded Applications Software Engineering (http://ease.cs.lth.se).

8. REFERENCES

[1] B. Kitchenham, T. Dybå, M. Jørgensen (2004): Evidence-based Software Engineering. In the Proceedings of the 26th IEEE International Conference on Software Engineering, pp. 273-281, IEEE Computer Society.

[2] J. Webster, R. T. Watson (2002): Analyzing the Past to Prepare for the Future: Writing a Literature Review. MIS Quarterly 26(2): xiii-xxiii.

[3] M. A. Babar, H. Zhang (2009): Systematic Literature Reviews in Software Engineering: Preliminary Results from Interviews with Researchers. In the Proceedings of the 3rd International Symposium on Empirical Software Engineering and Measurement, pp. 346-355, IEEE Computer Society.

[4] S. Jalali, C. Wohlin (2011): Global Software Engineering and Agile Practices: A Systematic Review. Journal of Software: Evolution and Process, published online, DOI: 10.1002/smr.561.

[5] S. Jalali, C. Wohlin (2011): Global Software Engineering and Agile Practices: A Systematic Review through Snowballing. Technical Report, http://www.wohlin.eu/GS Search Agile and Global.pdf.

[6] T. Dybå, B. Kitchenham, M. Jørgensen (2005): Evidence-based Software Engineering for Practitioners. IEEE Software 22(1): 58-65.

[7] B. Kitchenham, S. Charters (2007): Guidelines for Performing Systematic Literature Reviews in Software Engineering. Version 2.3, Technical Report, Software Engineering Group, Keele University and Department of Computer Science, University of Durham.

[8] T. Greenhalgh, R. Peacock (2005): Effectiveness and Efficiency of Search Methods in Systematic Reviews of Complex Evidence: Audit of Primary Sources. BMJ 331(7524): 1064-1065.

[9] P. Brereton, B. Kitchenham, D. Budgen, M. Turner, M. Khalil (2007): Lessons from Applying the Systematic Literature Review Process within the Software Engineering Domain. Journal of Systems and Software 80(4): 571-583.

[10] M. Skoglund, P. Runeson (2009): Reference-based Search Strategies in Systematic Reviews. In the Proceedings of the 13th International Conference on Evaluation and Assessment in Software Engineering, Durham, England.

[11] S. MacDonell, M. Shepperd, B. Kitchenham, E. Mendes (2010): How Reliable are Systematic Reviews in Empirical Software Engineering? IEEE Transactions on Software Engineering 36(5): 676-687.

[12] H. Zhang, M. A. Babar, P. Tell (2011): Identifying Relevant Studies in Software Engineering. Information and Software Technology 53(6): 625-637.

[13] H. Zhang, M. A. Babar, X. Bai, J. Li, L. Huang (2011): An Empirical Assessment of a Systematic Search Process for Systematic Reviews. In the Proceedings of the 15th International Conference on Evaluation and Assessment in Software Engineering, pp. 56-65.

[14] B. Kitchenham, P. Brereton, Z. Li, D. Budgen, A. Burn (2011): Repeatability of Systematic Literature Reviews. In the Proceedings of the 15th International Conference on Evaluation and Assessment in Software Engineering, pp. 46-55.

[15] K. Petersen, R. Feldt, S. Mujtaba, M. Mattsson (2008): Systematic Mapping Studies in Software Engineering. In the Proceedings of the 12th International Conference on Evaluation and Assessment in Software Engineering.

[16] R. Wieringa, N. A. M. Maiden, N. R. Mead, C. Rolland (2006): Requirements Engineering Paper Classification and Evaluation Criteria: a Proposal and a Discussion. Journal of Requirements Engineering 11(1): 102-107.

[17] K. Petersen, C. Wohlin (2009): Context in Industrial Software Engineering Research. In the Proceedings of the 3rd International Symposium on Empirical Software Engineering and Measurement, pp. 401-404.

[18] S. Jalali, C. Wohlin (2011): Systematic Literature Studies: Database Searches vs. Backward Snowballing. Technical Report, http://www.wohlin.eu/Database Snowballing.pdf.

[19] S. Jalali, C. Wohlin (2010): Agile Practices in Global Software Engineering - a Systematic Map. In the Proceedings of the 5th IEEE International Conference on Global Software Engineering, Princeton, USA, pp. 45-54.

[20] M. Ivarsson, T. Gorschek (2011): A Method for Evaluating Rigor and Industrial Relevance of Technology Evaluations. Empirical Software Engineering 16(3): 365-395.

APPENDIX

Table 4 visualizes the differences between the unique sets of papers found through S1 and S2.
Table 4: Differences of Papers in S1 and S2 (columns: Ref., Year, Study, Database, Expected, Keyword 1, Keyword 2, Comments)