
Search Based Software Engineering: Trends, Techniques and Applications

By
Mark Harman, University College London
S. Afshin Mansouri, Brunel University
Yuanyuan Zhang, University College London
Presented to Dr. Sajjad Ahmed Siddiqui

Presented by Aftab Rafique


OVERVIEW
 In the recent past there has been a dramatic increase in work on Search
Based Software Engineering (SBSE).
 SBSE is an approach to Software Engineering (SE) in which Search Based Optimization
(SBO) algorithms are used to address problems in SE.
 SBSE has been applied to problems throughout the SE life cycle, from
requirements and project planning to maintenance and reengineering.
 The approach is attractive because it offers a suite of adaptive automated and
semi-automated solutions in situations typified by large complex problem
spaces with multiple competing and conflicting objectives.
This paper provides a review and classification of literature on SBSE. The paper
identifies research trends and relationships between the techniques applied
and the applications to which they have been applied and highlights gaps in the
literature and avenues for further research.
SEQUENCE
 Introduction
 Background
 Classification Scheme
 Requirements / Specifications
 Design Tools and Techniques
 Software / Program Verification and Model Checking
 Distribution, Maintenance and Enhancement
 Management
 Analysis of Techniques & Applications
 How SBSE Reunites Previously Divergent Areas of SE
 Overlooked and Emerging Areas / Future Trends
INTRODUCTION
Software Engineering (SE) often considers problems that involve finding a suitable balance
between competing and potentially conflicting goals. There is often a large set of choices and
finding good solutions can be hard. For instance, the following is an illustrative list of SE
questions:
 What is the smallest set of test cases that covers all branches in this program?
 What is the best way to structure the architecture of this system to enhance its
maintainability?
 What is the set of requirements that balances software development cost and customer
satisfaction?
 What is the best allocation of resources to this software development project?
 What is the best sequence of refactoring steps to apply to this system?
 Answers to these questions might be expected from literature on testing,
design, requirements engineering, SE management and refactoring
respectively.
 It might appear that these questions, which involve different aspects of
software engineering, would be covered by different conferences and
specialized journals and would have little in common.
 However, all of these questions are essentially optimization questions.
 They are typical of the kinds of problem for which SBSE is well adapted, and
each has been successfully formulated as a search based optimization
problem.
 As this survey shows, SBSE has been applied to testing, design, requirements, project
management and refactoring.
 This survey will show that work on SBSE applied to each of these five areas
addresses each of the five questions raised above.
 This breadth of applicability is one of the enduring appeals of SBSE.
 It has been argued that the virtual nature of software makes it well suited for
SBO [Harman2010]. This is because fitness is computed directly in terms of
the engineering artifact, without the need for the simulation and modeling
inherent in all other approaches to engineering optimization.
 The field of SE is also imbued with rich metrics that can be useful initial
candidates for fitness functions [Harman and Clark 2004]. This paper aims to
provide a comprehensive survey of SBSE.
 It presents research activity in categories drawn from the ACM subject
categories within SE.
 For each it lists the papers, drawing out common themes, such as the type of
search technique used, the fitness definitions and the nature of evaluation.
 A wide range of different optimization and search techniques can and have
been used.
 The most widely used are Local Search, Simulated Annealing (SA), Genetic
Algorithms (GAs), Genetic Programming (GP) and Hill Climbing (HC).
 As the paper reveals, 54% of the overall SBSE literature is concerned with SE applications
relating to testing.
 There have been several important surveys in this widely studied general area [Afzal et al.
2009; Ali et al. 2010; McMinn 2004]. For this reason, the present survey will report overall
trends in the wider SBSE literature (including Search Based Testing), but it will defer to
these other three surveys for details on the specific sub-field of Search Based Testing.
 The reader is also referred to an earlier (but considerably longer) version of this paper
[Harman et al. 2009] that contains a detailed section on testing.
 There has been a considerable increase in the quantity of SBSE research
over the past few years (see Figure 1(a)).
 Despite the excellent work in the surveys listed above, there remains, to
date, no comprehensive survey of the whole field of study concerning
trends in research.
 It is therefore timely to review the SBSE literature, the relationships
between the applications to which it has been applied, the techniques used,
trends and open problems.
[Figure 1: The trend of publications on SBSE and Software Engineering topic area. (a) Publication Numbers; (b) Topic Area.]
The primary contributions of this survey are as follows:
 Coverage and Completeness.
 The survey gathers publication data and trends, covering SBSE from its early origins to
a publication ‘census date’ of December 31st 2008.
 This census date is chosen for pragmatic reasons. As this survey reveals, there is a
notably increasing trend of publication in SBSE.
 The growth in activity in this area makes a survey useful, but it also means that it may
not be feasible to conduct a detailed survey after this date.
 Classification.
 The classification of SE areas allows us to identify gaps in the literature, indicating
possible areas of SE that could (but have yet to) benefit from the application of SBSE.
 Similarly, the analysis of the search techniques used allows us to identify SBO algorithms
that have yet to receive significant attention.
 Formal Concept Analysis (FCA) is applied [Snelting 1998] in order to explore the
relationships between techniques and the applications to which they have been
applied.
 Trend Analysis
 The survey presents numeric data concerning trends which give a
quantitative assessment of the growth in the area and the distributions of
activity among the SE domains that have received attention.
 We are also able to identify recent growth areas.
BACKGROUND
 Although interest in SBSE has witnessed a recent dramatic rise, its origins can be traced back to early work
on optimization in SE in the 1970s.
 The earliest currently known attempt to apply optimization to an SE problem was reported by Miller and
Spooner [1976] in the area of software testing.
 The term SBSE was first used by Harman and Jones [2001a] in 2001. This paper acted as a ‘manifesto’ for
SBSE, but it should also be noted that much earlier, in 1994, Carl Chang had used his IEEE Software editorial
to promote the more widespread use of evolutionary computation in SE [Chang 1994].
 Figure 1(a) provides a histogram charting SBSE publication growth over time, while Figure 1(b) shows the
proportion of papers that fall into each of the different SE application area subject categories.
 Harman and Jones [Harman 2007b; Harman and Jones 2001a] identified two key ingredients for the
application of SBO to SE problems:
 The choice of the representation of the problem; and
 The definition of the fitness function.
 This simplicity and ready applicability make SBSE a very attractive option.
 Typically, a software engineer will have a suitable representation for the problem, because
one cannot do much engineering without a way to represent the problem in hand.
 Furthermore, many problems in SE have a rich and varied set of software metrics
associated with them that naturally form good initial candidates for fitness
functions [Harman and Clark 2004].
 With these two ingredients it becomes possible to implement SBO algorithms. Naturally,
there is a lot more to the application of these techniques, but these two simple ingredients
are sufficient to get started with experimentation.
 Poulding et al. [2007] presented a framework for experimental investigation of the different
algorithms.
 An overview of search techniques is available in other surveys [Harman 2007b], while a
more detailed treatment of search methodologies can be found in the book edited by Burke
and Kendall [Burke and Kendall 2005].
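The two ingredients named above (a representation and a fitness function) can be illustrated with a minimal sketch. It assumes a toy version of the test-suite question from the introduction: the coverage table, the test names and the use of plain random search are all hypothetical choices made for illustration, not taken from the survey.

```python
import random

# Hypothetical data: which branches each test case covers.
COVERAGE = {
    "t1": {"b1", "b2"},
    "t2": {"b2", "b3"},
    "t3": {"b3", "b4"},
    "t4": {"b1", "b4"},
    "t5": {"b2"},
}
TESTS = sorted(COVERAGE)
ALL_BRANCHES = set().union(*COVERAGE.values())

def covered(candidate):
    """Ingredient 1 (representation): a candidate is a bit vector, one bit per test."""
    selected = [t for t, bit in zip(TESTS, candidate) if bit]
    return set().union(*(COVERAGE[t] for t in selected))

def fitness(candidate):
    """Ingredient 2 (fitness): prefer more covered branches, then fewer tests."""
    return (len(covered(candidate)), -sum(candidate))

def random_search(evaluations=200, seed=0):
    """Simplest possible search: sample candidates and keep the fittest seen."""
    rng = random.Random(seed)
    best = [1] * len(TESTS)  # start from the full suite, which covers everything
    for _ in range(evaluations):
        candidate = [rng.randint(0, 1) for _ in TESTS]
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

best = random_search()
print([t for t, bit in zip(TESTS, best) if bit], fitness(best))
```

Any other search algorithm could be substituted for `random_search` without changing the representation or the fitness function, which is the sense in which these two ingredients are enough to start experimenting.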
CLASSIFICATION SCHEME
 Classification of SE activities is taken from the Association for Computing Machinery
(ACM) Computing Classification System, projected onto those SE areas to which SBSE has
been applied (see Table 1).
 A list of query keywords was constructed for each of the activities and each of the search
techniques (see Table 2).
 For example, the search term used to locate papers on Search Based
Requirements/Specifications (D.2.1) was: ((requirements OR specifications OR next
release OR release planning OR requirements selection OR requirements analysis OR
COTS OR requirements prioritization OR requirements triage) AND
 We searched the following sources: Google Scholar, IEEE Xplore Digital Library, ACM Digital
Library, SpringerLink, ScienceDirect and Wiley InterScience.
 We also asked researchers in the field to check the references and notify us of any
missing references.
REQUIREMENTS/SPECIFICATIONS
 Requirements engineering is a vital part of the SE process [Cheng and Atlee 2007], to
which SBSE has also been applied in order to optimize choices among requirements, the
prioritization of requirements and the relationships between requirements and
implementations.
 Bagnall et al. [Bagnall et al. 2001] suggested the term Next Release Problem (NRP) for
requirements release planning and described various metaheuristic optimization
algorithms, including greedy algorithms, branch and bound, SA and HC. The authors did
not give any value property to each requirement. They only used an associated
DESIGN TOOLS AND TECHNIQUES
 In other engineering disciplines SBO is widely used as a means of developing better designs.
 Where there are widely accepted metrics, such as cohesion and coupling, there has been much
work on optimizing these [Doval et al. 1999; Harman et al. 2002, 2005; Mahdavi et al. 2003b;
Mancoridis et al. 1999, 1998; Mitchell and Mancoridis 2002, 2003, 2008; Mitchell et al. 2002,
2004].
 However, this previous work on cohesion and coupling is not concerned with design per se.
Rather, it is concerned with the problem of re-constructing the module boundaries of a system
after implementation.
 As such, this previous work is categorized in this survey as work on maintenance, rather than
work on design. Räihä [Räihä 2010] provided a recent detailed survey of SBSE
techniques for both design problems and re-design (maintenance) problems in SE.
 It would be natural to suppose that work on design patterns [Gamma et al. 1995] could and
should form a foundation for a strand of work on SBSE for design.
 This possibility has recently been explored in detail by Räihä et al. [Räihä 2008a,b; Räihä
et al. 2008], who proposed a GA-based approach to automatically synthesize software
architectures consisting of several design patterns.
 Other authors have proposed new SBSE approaches, specifically targeted at the design
phase of the software development process.
 Feldt [1999] presented a model that uses GP to explore the difficulty in early software
development phases, and also described a prototype of an interactive software development
workbench called WISE that uses biomimetic algorithms [Feldt 2002].
In the traditional N-version computing approach, different teams
of programmers are deployed to develop the different (and
hopefully, therefore, diverse) solutions to the same problem. Of
course, the development of different versions of a system in this
manner is a highly expensive solution to the problem of
robustness and fault tolerance; it can and has only been used in
highly safety-critical situations, where the expense might be
justified. Though it was not directly the intention of the work,
Feldt’s work also showed that by using GP to evolve the required
diverse solutions to the same problem, there is the potential to use
SBSE techniques to overcome the expense that was previously
inherent in N-version computing.
SOFTWARE/PROGRAM VERIFICATION AND MODEL CHECKING
 Model checking is an area of research that could well benefit from more research on SBSE techniques,
because model checking throws up enormous search spaces and there are candidate metrics to guide a
search.
 A summary of the papers addressing Software/Program Verification (ACM: D.2.4) is given in Table 8.
Godefroid was the first to apply SBO to explore the state space used in model checking [Godefroid 1997].
Where the state space is too large to be fully checked, search based optimization can be used to identify
isomorphic subgraphs and to seek out counterexamples.
 Alba et al. [Alba and Chicano 2007a,b,c; Alba et al. 2008; Chicano and Alba 2008b,b,c] also showed how Ant
Colony Optimization (ACO) can be used to explore the state space used in model checking to seek
counterexamples.
 Mahanti and Banerjee [2006] also proposed an approach for model checking, using ACO and Particle
Swarm Optimization (PSO) techniques. He et al. [2008] present an approach that combines
Hoare-logic-style assertion based specifications and model checking within a GP framework.
DISTRIBUTION, MAINTENANCE AND ENHANCEMENT

Software maintenance is the process of enhancing and optimizing deployed software (software release), as
well as remedying defects. It involves changes to the software in order to correct defects and deficiencies
found during field usage as well as the addition of new functionality to improve the software’s usability and
applicability. Much of the work on the application of SBSE to these topics has tended to focus on two strands
of research, each of which has attracted a great deal of interest and around which a body of work has been
produced. The first topic to be addressed is search based software modularization. More recently, there have
also been several developments in search based approaches to the automation of refactoring. The previous
work on distribution, maintenance and enhancement is discussed in more detail in the following two
subsections, which separately consider work on modularization and refactoring.
 Other work on SBSE application in distribution, maintenance and
enhancement that does not fall into these two categories has considered the
evolution of programming languages [Van Belle and Ackley 2002], real time
task allocation [Bate and Emberson 2006; Emberson and Bate 2007], quality
prediction based on the classification of metrics by a GA [Vivanco and Pizzi
2004] and legacy systems migration [Sahraoui et al. 2002].
 SBSE has also been applied to the concept assignment problem. Gold et al.
[2006] applied GAs and HC to find overlapping concept assignments.
Traditional techniques (which do not use SBSE) cannot handle overlapping
concept boundaries, because the space of possible assignments grows too
rapidly.
 The formulation of this problem as an SBSE problem allows this large space
MANAGEMENT
 SE management is concerned with the management of complex activities being carried out in
different stages of the software life cycle, seeking to optimize both the processes of software
production as well as the products produced by this process.
 Task and resource allocation, scheduling and cost & effort estimation have been among the
most frequently considered problems studied in this category.
 Papers on SBSE for management can be roughly categorized according to whether they
concern project planning activities or whether they create predictive models for cost
estimation to provide decision support to software project managers.
Project Planning
Chang et al. [Chang 1994; Chang et al. 1994, 1998, 2001; Chao et al. 1993] were the first to use SBSE on
software management problems. Their early work on search based software project management [Chang
1994; Chang et al. 1994; Chao et al. 1993] introduced the Software Project Management Net (SPM Net)
approach for project scheduling and resource allocation, evaluating SPM Net on simulated project data.
Though there has been much interest in the difficulty of the problem of software
project management, there remain a number of unresolved challenges, including:
(1) Robustness. It may not be sufficient to find a project plan that leads to early
completion time. It may be more important to find plans that are robust in the
presence of changes. Such a robust plan may be sub-optimal with respect to the
completion time objective. This may be a worthwhile sacrifice for greater certainty
in the worst case completion time, should circumstances change. These forms of
‘robustness trade-off’ have been widely studied in the optimization literature [Beyer
and Sendhoff 2007].
(2) Poor Estimates. All work on software project estimation has had to contend with the
problem of notoriously poor estimates [Shepperd 2007]. Much of the work on SBSE for
project management has implicitly assumed that reliable estimates are available at the start
of the project planning phase. This is an unrealistic assumption. More work is required in
order to develop techniques for software project planning that are able to handle situations
in which estimates are only partly reliable.
(3) Integration. Software project management is a top level activity in the software
development life cycle. It draws in other activities such as design, development, testing, and
maintenance. As such, project management is ideally not an activity that can be optimized in
isolation. In order to achieve wider applicability for the SBSE approach to software project
management, it will be necessary to develop techniques that can integrate management
activities with these other engineering activities.
Cost Estimation
Software project cost estimation is known to be a
very demanding task [Shepperd 2007]. For all forms of project, not
merely those involving software, project estimation activities are hard
problems, because of the inability to ‘predict the unpredictable’ and
the natural tendency to allocate either arbitrary (or zero) cost to
unforeseen (and unforeseeable) necessitated activities. The problem
of estimation is arguably more acute for software projects than it is
for projects in general, because of:
1. The inherent uncertainties involved in software development;
2. The comparative youth of SE as a discipline; and
3. The wide variety of disparate tasks to which SE solutions can be
applied.
ANALYSIS OF TECHNIQUES & APPLICATIONS
Figure 1(a) showed the trend of growth in publications in
SBSE, while Figure 1(b) showed how the application areas
within SE have been covered. In this section a further and
deeper analysis of the overall area is provided using bar
graphs to show the relative frequency of application of
optimization techniques, together with a Formal Concept
Lattice to show the relationships between application areas
and techniques applied.
Hill Climbing (HC), a simple local search technique, is often derided in the optimization
literature, yet it can be effective and has a number of advantages over more
sophisticated algorithms:
(1) It is efficient: both quick to implement and fast in execution.
(2) Though it may become trapped in a local optimum, it can be restarted multiple times. As
such, for problems in which a quick answer is required that is merely ‘good enough’ (a
solution sufficiently better than the current one that the benefit of adopting it
offsets the effort), HC often serves the purpose; the choice of other techniques
may denote something of a ‘sledge hammer to crack a nut’.
(3) It gives a sense of the landscape structure. Because HC performs a local search and
ascends the ‘nearest hill to the start point’, with multiple restarts, it can be a quick and
effective way of obtaining a first approximation to the structure of the landscape.
These properties of HC make it well suited to new application areas of
SBSE (or indeed for any new optimization problem). The technique can
be used to quickly and reliably obtain initial results, test out a putative
fitness function formulation, and to assess the structure of the search
landscape. In SBSE, where many new application areas are still being
discovered, HC denotes a useful tool: providing fast, reliable and
understandable initial results. It should be tried before more
sophisticated algorithms are deployed.
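The points made above about HC (quick to implement, restartable, and informative about the landscape) can be seen in a minimal sketch. The two-peak integer landscape, the function names and the number of restarts are assumptions chosen for illustration; they do not come from the survey.

```python
import random
from collections import Counter

def hill_climb(fitness, neighbours, start):
    """Move to the best neighbour until no neighbour improves on the current point."""
    current = start
    while True:
        best = max(neighbours(current), key=fitness)
        if fitness(best) <= fitness(current):
            return current  # local optimum: no strictly better neighbour
        current = best

def restart_hill_climb(fitness, neighbours, random_start, restarts, seed=0):
    """Run HC from several random start points and record every local optimum found."""
    rng = random.Random(seed)
    optima = [hill_climb(fitness, neighbours, random_start(rng)) for _ in range(restarts)]
    return max(optima, key=fitness), Counter(optima)

# Toy landscape (hypothetical): integers 0..99 with a low peak at 20
# and a higher peak at 70.
def two_peaks(x):
    return max(30 - abs(x - 20), 50 - 2 * abs(x - 70))

def neighbours(x):
    return [n for n in (x - 1, x + 1) if 0 <= n <= 99]

best, optima = restart_hill_climb(
    two_peaks, neighbours, lambda rng: rng.randint(0, 99), restarts=10
)
print("best point:", best, "local optima found:", dict(optima))
```

A single climb started near 0 stops at the lower peak; the restarts are what give a chance of reaching the higher one, and the tally of optima is a rough first picture of how many hills the landscape has.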
OVERLOOKED AND EMERGING AREAS

• Information Theoretic Fitness

• Optimization of Source Code Analysis

• Security and Protection

• Protocols

• Interactive Optimization

• Online Optimization
• Some areas of SBSE activity have been considered briefly in the literature and then
appear to have been overlooked by subsequent research. This section highlights these
areas. That is, topics that have been addressed, shown promising results, but which
have attracted neither follow-on studies nor many citations. Given the initially patchy
nature of work on SBSE and the recent upsurge in interest and activity, these
potentially overlooked areas may be worthy of further study.
• Furthermore, this survey comes at a time when SBSE research is becoming
widespread, but before it has become mainstream. This section considers both emergent
and overlooked areas together; these areas denote either SE subareas or optimization
potentialities that remain to be more fully explored.
Information Theoretic Fitness
• Lutz [Lutz 2001] considered the problem of hierarchical decomposition of software. The fitness function used by Lutz is
based upon an information-theoretic formulation inspired by Shannon [Shannon 1948]. The function awards high fitness
scores to hierarchies that can be expressed most simply (in information theoretic terms), with the aim of rewarding the
more ‘understandable’ designs.
• The paper by Lutz is one of the few to use information theoretic measurement as a fitness mechanism. This novel and
innovative approach to fitness may have wider SBSE applications.
• Recently, Feldt et al. [2008] also used an information theoretic model, drawing on the observation that the information
content of an object can be assessed by the degree to which it can be compressed (this is the so-called Kolmogorov
complexity).
• This recent work may be an indication that information theoretic fitness is not likely to remain an ‘overlooked area’ for
much longer. The authors believe that there is tremendous potential in the use of information theory as a source of
valuable fitness for SE; after all, SE is an information-rich discipline, so an information theoretic fitness function would
seem to be a natural choice.
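The compression idea mentioned above can be sketched in a few lines. This is only an illustration of the general principle: Kolmogorov complexity itself is not computable, so a general-purpose compressor is used here as a rough upper-bound proxy. The function names and sample strings are hypothetical, and this is not the fitness function of Lutz or of Feldt et al.

```python
import random
import string
import zlib

def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed text: a crude proxy for information content."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def simplicity_fitness(text: str) -> int:
    """Higher is better: descriptions that compress to fewer bytes score higher."""
    return -compressed_size(text)

# Two hypothetical 400-character descriptions: one highly regular, one irregular.
regular = "ab" * 200
rng = random.Random(0)
irregular = "".join(rng.choice(string.ascii_lowercase) for _ in range(400))

print(compressed_size(regular), compressed_size(irregular))
print(simplicity_fitness(regular) > simplicity_fitness(irregular))
```

A search guided by such a measure would favour artifacts whose textual description is regular, which is one possible reading of rewarding designs that can be expressed most simply.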
Optimization of Source Code Analysis
• Only a few papers appear to concern source code based SBSE. This is likely to be a growth area, since many source
code analysis and manipulation problems are either inherently undecidable or present scalability issues. The source
code analysis community has long been concerned with a very rigid model of analysis, in which conservative
approximation is the favored approach to coping with the underlying undecidability of the analysis problem.
• However, more recently, Ernst’s seminal work on the detection of likely invariants
[Ernst 2000], which spawned the widely-used and influential Daikon tool [Ernst et al. 2001], demonstrated that
unsound analyses can yield extremely valuable results. The full potential of this observation has yet to be realized.
Through the application of SBSE, it will be possible to search for interesting features and to provide probabilistic
source code analyses that, like the Daikon work, may not be sound, but would nonetheless turn out to be useful.
• A summary of the papers addressing problems related to Coding Tools and Techniques (ACM: D.2.3) is given in Table
7. All of these papers could be regarded as representing an emerging area of optimization for source code analysis
using SBSE. Hart and Shepperd [2002] addressed the automatic evolution of controller programs by applying GAs
to improve the quality of the output vector, while Di Penta et al. [Di Penta et al. 2008; Di Penta and Taneja 2005]
proposed a GA based approach for inferring suitable grammars from program examples. The
grammar captures the subset of the programming language used by the programmer and can be used to
understand and reason about programming language idioms and styles. Jiang et al. [2007, 2008] used search based
algorithms to decompose the program into slices and to search for useful dependence structures. The search
problem involves the space of subsets of program slices, seeking those that denote decomposable but disparate
elements of code using metaheuristic search and also greedy algorithms. The results showed that, as procedures
become larger, there was a statistically significant trend for them to become increasingly splittable.
Security and Protection
• There have been very few papers on the application of SBSE to problems of security. A summary of the papers
addressing Security and Protection areas (ACM: K.6.5) is given in Table 13. This is sure to change, given the
importance of this area of application. The challenge is often to find a way to encode a security problem as a fitness
function. Often security aspects have a decidedly Boolean character to them; either a security problem is present
or it is absent. In order to fully apply SBSE techniques to find security problems, it will be necessary to find a way to
formulate fitness functions that offer a guiding gradient toward an optimum.
• Some authors have managed to do this. Dozier et al. [2004] described how the design of Artificial Immune
System (AIS) based Intrusion Detection Systems (IDSs) can be improved through the use of evolutionary hackers
in the form of GENERTIA red
teams (GRTs) to discover holes found in the immune system. Dozier et al. [2007] compared a hacker with 12
evolutionary hackers based on PSO that have been used as vulnerability analyzers for AIS-based IDSs. Del Grosso et
al. [Del Grosso et al. 2005, 2008] showed how SBSE can be used to detect buffer overflow vulnerabilities, thereby
helping to guard against ‘stack smash’ attacks.
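The difficulty described above, that a Boolean security property offers no guiding gradient, can be illustrated with a small sketch. Everything in it is hypothetical (the buffer size, the toy routine and both fitness functions); it is not the formulation used by any of the cited authors, only one way a graded fitness might be phrased.

```python
BUFFER_CAPACITY = 16  # hypothetical fixed-size buffer, in bytes

def bytes_written(payload: str) -> int:
    """Toy stand-in for a routine under test: it copies the payload plus a
    4-byte header into the buffer without checking the length."""
    return len(payload) + 4

def boolean_fitness(payload: str) -> int:
    """1 if the buffer overflows, else 0. Every safe input scores the same,
    so a search gets no hint about which inputs are 'closer' to an overflow."""
    return 1 if bytes_written(payload) > BUFFER_CAPACITY else 0

def graded_fitness(payload: str) -> float:
    """Fraction of the buffer filled, capped just past the boundary.
    Inputs that write closer to the boundary score higher."""
    return min(bytes_written(payload), BUFFER_CAPACITY + 1) / BUFFER_CAPACITY

for payload in ["", "abcd", "abcdefgh", "a" * 13]:
    print(len(payload), boolean_fitness(payload), graded_fitness(payload))
```

With the Boolean version the first three inputs are indistinguishable, while the graded version ranks them, giving a local search a direction in which to move.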
Protocols
• Protocol correctness, efficiency, security and cost are all aspects of protocol definitions that can and have been explored
using SBSE. Alba and Troya [1996] presented a first attempt at applying a GA to check the correctness of communication
protocols (expressed as a pair of communicating FSMs). Clark and Jacob [2000] used GAs in the design and development of
Burrows, Abadi and Needham (BAN) protocols, optimizing for the trade-off between protocol security, efficiency and cost. This
was subsequently extended by Clark and Jacob [2001], who applied GAs and SA approaches to the problem addressed in
Clark and Jacob [2000]. El-Fakih et al. [1999] used 0-1 Integer Linear Programming (ILP) and GAs to solve the message
exchange optimization problem for distributed applications in order to reduce the communication cost. Ferreira et al. [2008]
proposed PSO to detect network protocol errors in concurrent systems. A summary of the papers addressing problems in the
area of Network Protocols (ACM: C.2.2) using search based approaches is given in Table 4.
Interactive Optimization
• All of the fitness functions so far considered in the literature on SBSE have
been fully automated. This seems to be a pre-requisite; fast fitness
computation is needed for repeated evaluation during the progress of the
search. However, outside the SBSE domain of application, there has been
extensive work on fitness functions that incorporate human judgement [Funes
et al. 2004]. This form of search is known as interactive optimization and it is
clearly relevant in many aspects of SE, such as capturing inherently intuitive
value judgements about design preferences [Simons and Parmee 2008b].
• In SE, interactive optimization could be used in a number of ways. Many
problems may naturally benefit from human evaluation of fitness. For
example, in design problems, the constraints that govern the design process
may be ill-defined or subjective.
Online Optimization
• All applications of SBSE of which the authors are aware concern what might be termed ‘static’ or ‘offline’
optimization problems. That is, problems where the algorithm is executed off line in order to find a solution to
the problem in hand. This is to be contrasted with ‘dynamic’ or ‘on line’ SBSE, in which the solutions are
repeatedly generated in real time and applied during the lifetime of the execution of the system to which the
solution applies.

• The static nature of the search problems studied in the existing literature on SBSE has tended to delimit the
choice of algorithms and the methodology within which the use of search is applied. PSO [Zhang et al. 2005] and
ACO [Dorigo and Blum 2005] techniques have not been widely used in the SBSE literature. These techniques
work well in situations where the problem is rapidly changing and the current best solution must be continually
adapted.

• It seems likely that the ever changing and dynamic nature of many SE problems would suggest possible
application areas for ACO and PSO techniques.
SBSE for Non Functional Properties
• There has been much work on stress testing [Alander et al. 1997; Briand et al. 2005, 2006;
Garousi 2006, 2008; Garousi et al. 2008; Mantere 2003] and temporal testing
[Alander et al. 1997, 1998, 1997, 1996; Dillon 2005; Groß 2000, 2001; Groß et al. 2000;
• The problem of QoS introduced by Canfora et al. [2005a] also denotes an area of non-
functional optimization in SE which has recently witnessed an upsurge in activity and interest
[Jaeger and Mühl 2007; Ma and Zhang 2008; Su et al. 2007; Zhang et al. 2006, 2007].
• It seems likely that the drive to ever smaller devices and to massively networked devices will
make these issues far more pressing in future, thereby engendering more research in this area.
These are important emergent SE paradigms, though perhaps not widely regarded as current
mainstream SE. Afzal et al. [Afzal et al. 2009] provided a detailed in-depth survey of
approaches to testing non-functional requirements, to which the reader is referred for a more
detailed treatment of this area.
Multiobjective Optimization

• SE problems are typically multiobjective problems. The objectives that have to be met are often competing
and contradictory. For example, in project planning, seeking earliest completion time at the cheapest overall
cost will lead to a conflict of objectives. However, there is no necessary simple trade-off between the two,
making it desirable to find ‘sweet spots’ that optimize both.
• Suppose a problem is to be solved that has n fitness functions, f1, ..., fn, that take some vector of
parameters x. One simple-minded way to optimize these multiple objectives is to combine them into a
single aggregated fitness, F, according to a set of coefficients, c1, ..., cn, that determine precisely how much each
element of fitness matters. For example, if two fitness functions, f1 and f2, are combined using F = 2 · f1(x) +
f2(x), then the coefficients c1 = 2, c2 = 1 explicitly capture the belief that the property denoted by fitness
function f1 is twice as important as that denoted by fitness function f2. The consequence is that the search
may be justified in rejecting a solution that produces a marked improvement in f2, if it also produces a smaller
reduction in the value of f1.
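The aggregated fitness described above can be sketched in a few lines of Python. The fitness values for the two solutions are hypothetical, chosen only to show how the coefficients c1 = 2 and c2 = 1 can lead the search to reject a candidate that gains more in f2 than it loses in f1.

```python
def aggregate(fitnesses, coefficients):
    """Weighted-sum aggregation: F = c1*f1 + ... + cn*fn."""
    return sum(c * f for c, f in zip(coefficients, fitnesses))

COEFFS = (2, 1)          # c1 = 2, c2 = 1, as in the example in the text

# Hypothetical (f1, f2) values for the current solution and a candidate.
current = (10.0, 10.0)
candidate = (8.5, 12.5)  # f2 improves by 2.5, f1 drops by only 1.5

F_current = aggregate(current, COEFFS)      # 2*10.0 + 10.0
F_candidate = aggregate(candidate, COEFFS)  # 2*8.5 + 12.5

# The candidate is rejected: the smaller loss in f1 is weighted twice.
candidate_accepted = F_candidate > F_current
```

The outcome depends entirely on the chosen coefficients, which is why the weights have to encode a defensible belief about relative importance.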
• Most work on SBSE uses software metrics in one form or another as fitness functions [Harman and Clark
2004]. However, the metrics used are often those that are measured on an ordinal scale [Shepperd 1995]. As
such, it is not sensible to combine these metrics into an aggregate fitness in the manner described above. The
use of Pareto optimality is an alternative to aggregated fitness. It is superior in many ways. Under Pareto
optimality, one solution is better than (i.e. dominates) another if it is better according to at least one of the
individual fitness functions and no worse according to all of the others.
When searching for solutions to a problem using Pareto optimality, the search yields
a set of solutions that are non-dominated. That is, each member of the non-dominated set is no worse than any of the
others in the set, but also cannot be said to be better. Any set of non-dominated solutions forms a Pareto front. Consider
Figure 6, which depicts the computation of Pareto optimality for two imaginary fitness functions (objective 1 and
objective 2).
In the figure, points S1, S2 and S3 lie on the Pareto front, while S4 and S5 are
dominated. Interested readers may refer to Collette and Siarry [2004] for
further details about multiobjective optimization and Pareto optimality.
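A minimal sketch of the dominance test and the resulting non-dominated set follows. The coordinates given to S1 through S5 are assumptions (they are not the values behind Figure 6), and both objectives are assumed to be maximized.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly
    better on at least one (both objectives are maximized here)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better

# Hypothetical (objective 1, objective 2) values for five solutions.
points = {
    "S1": (1, 9),
    "S2": (5, 6),
    "S3": (8, 2),
    "S4": (4, 4),
    "S5": (2, 1),
}

# A solution belongs to the front if no other solution dominates it.
front = sorted(
    name
    for name, p in points.items()
    if not any(dominates(q, p) for q in points.values())
)
```

With these assumed values, S4 and S5 are each dominated by S2, while S1, S2 and S3 are mutually incomparable and so form the front.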
Recently, research on SBSE has started to move from single objective
formulations to multiobjective formulations, with an increasing focus on Pareto
optimal optimization techniques. Recent work has produced multiobjective
formulations of problems in many application areas within SE including
requirements [Finkelstein et al. 2008; Zhang et al. 2007], testing [Del Grosso et
al. 2005; Everson and Fieldsend 2006; Harman et al. 2007], quality assurance
[Khoshgoftaar et al. 2004b], refactoring [Harman and Tratt 2007] and project
management [Alba and Chicano 2007d].
Co-Evolution
• In co-evolutionary computation, two or more populations of solutions evolve simultaneously with the fitness of each
depending upon the current population of the other. The idea, as so far applied in SBSE work, is to capture a predator-prey
model of evolution, in which both evolving populations are stimulated to evolve to better solutions.
• Mantere [2003] proposed a co-evolutionary approach to automatically generate test images for image processing
software. Adamopoulos et al. [2004] suggested the application of co-evolution in
mutation testing, arguing that this could be used to evolve sets of mutants and sets of test cases, where the test cases act as
predators and the mutants as their prey. Arcuri et al. [Arcuri 2008; Arcuri and Yao 2007] used co-evolution to evolve
programs and their test data from specifications. Arcuri and Yao [Arcuri 2008; Arcuri and Yao 2008] also
developed a co-evolutionary model of bug fixing, in which one population essentially seeks out patches that are able to
pass test cases, while test cases can be produced from an oracle in an attempt to find the shortcomings of a current
population of proposed patches. In this way the patch is the prey, while the test cases, once again, act as predators. The
approach assumes the existence of a specification to act as the oracle.
• Co-evolution can also be conducted in a co-operative manner, though this remains unexplored in SBSE work. It is likely to be
productive in finding ways in which aspects of a system can be co-evolved to work better together and, like the previously
studied competitive co-evolutionary paradigm, offers great potential for further application in SBSE.
• Though not all of these may occur in the same system, they are all
subject to change and, should a suitable fitness function be
found, can therefore be evolved. Where two such populations are
evolved in isolation, but participate in the same overall software
system, it would seem a logical ‘next step’ to seek to evolve these
populations together; the fitness of one is likely to have an impact on
the fitness of another, so evolution in isolation may not be capable of
locating the best solutions. Like the move from single to multiple
objectives, the migration from evolution to co-evolution offers the
chance to bring together theory and real world reality.
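The competitive predator-prey model described in this section can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: test cases and mutants are plain integers, a test "kills" a mutant when the two lie within a fixed distance, and selection simply keeps the fitter half of each population. It is not the method of any of the cited papers.

```python
import random

def kills(test, mutant):
    # Toy oracle (an assumption): a test input detects a mutant whose
    # faulty region lies within distance 3 of it.
    return abs(test - mutant) <= 3

def score_test(test, mutants):
    # Predator fitness: number of mutants this test kills.
    return sum(kills(test, m) for m in mutants)

def score_mutant(mutant, tests):
    # Prey fitness: number of tests this mutant survives.
    return sum(not kills(t, mutant) for t in tests)

def next_generation(population, fitness, rng, low=0, high=99):
    # Keep the fitter half and refill with perturbed copies of it.
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    children = [min(high, max(low, s + rng.randint(-5, 5))) for s in survivors]
    return survivors + children

def coevolve(generations=20, size=10, seed=0):
    rng = random.Random(seed)
    tests = [rng.randint(0, 99) for _ in range(size)]
    mutants = [rng.randint(0, 99) for _ in range(size)]
    for _ in range(generations):
        # Each population is scored against the current state of the other.
        new_tests = next_generation(tests, lambda t: score_test(t, mutants), rng)
        new_mutants = next_generation(mutants, lambda m: score_mutant(m, tests), rng)
        tests, mutants = new_tests, new_mutants
    return tests, mutants
```

The key point is that neither fitness function is fixed: each is recomputed every generation against whatever the opposing population currently contains.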
FUTURE BENEFITS TO BE EXPECTED FROM OPTIMIZATION IN SE
 This section briefly reviews some of the benefits that can be expected to
accrue from further development of the field of search based SE.
 These benefits are pervasive, though often implicit, themes in SBSE
research.
 To borrow the nomenclature of aspect oriented software development,
these are the ‘cross cutting concerns’ of the SBSE world; advantages that
can be derived from almost all applications at various points in their use.
 One of the striking features of the SBSE research program that emerges from this survey is
the wide variety of different SE problems to which SBSE has been applied.
 Clearly, testing remains a predominant application, with 54% of all SBSE papers targeting
various aspects of testing.
 As the survey reveals, there are few areas of SE activity to which SBO remains unapplied.
 This generality and applicability arise from the very nature of SE. The two primary tasks that
have to be undertaken before a search based approach can be applied to an SE problem are
the definition of a representation of the problem and the fitness function that captures the
objective or objectives to be optimized.
 Once these two tasks are accomplished, it is possible to begin to get results from the
application of many SBO techniques.
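As a sketch of these two tasks, consider a small test suite minimization problem. The coverage data, the bit-tuple representation and the weighting inside the fitness function are all hypothetical choices; a simple first-improvement hill climber stands in for "many SBO techniques".

```python
# Hypothetical coverage data: test case -> branches it covers.
COVERAGE = {
    "t1": {"b1", "b2"},
    "t2": {"b2", "b3"},
    "t3": {"b3", "b4"},
    "t4": {"b1", "b4"},
    "t5": {"b2"},
}
TESTS = sorted(COVERAGE)

# Task 1, representation: a tuple of 0/1 flags, one per test case.
def covered(bits):
    chosen = [t for t, bit in zip(TESTS, bits) if bit]
    return set().union(*(COVERAGE[t] for t in chosen))

# Task 2, fitness: reward branch coverage first, then smaller suites.
def fitness(bits):
    return len(covered(bits)) * (len(TESTS) + 1) - sum(bits)

def hill_climb(bits):
    """Flip one flag at a time, accepting any strict improvement."""
    best = fitness(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            neighbour = bits[:i] + (1 - bits[i],) + bits[i + 1:]
            if fitness(neighbour) > best:
                bits, best, improved = neighbour, fitness(neighbour), True
    return bits

solution = hill_climb((0,) * len(TESTS))
```

Only `covered` and `fitness` know anything about testing. The search routine itself is generic and could be swapped for another optimizer without touching the representation.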
 In other engineering disciplines, it may not be easy to represent a problem; the physical properties of the
engineering artifact may mean that simulation is the only economical option.
 This puts the optimization algorithm at one stage removed from the engineering problem at hand.
Furthermore, for other engineering disciplines, it may not be obvious how to measure the properties of the
engineering artifact to be optimized.
 Even where the measurements required may be obvious, it may not be easy to collect the readings; once
again the physical properties of the engineering materials may be a barrier to the application of
optimization techniques.
 However, software has no physical manifestation. Therefore, there are fewer problems with the
representation of a software artifact, since almost all software artifacts are, by their very nature, based on
intangible ‘materials’ such as information, processes and logic.
 This intangibility has created many problems for SE. However, by contrast,
within the realm of SBSE, it is a significant advantage.
 There are few SE problems for which there will be no representation, and
the readily available representations are often ready to use ‘out of the box’
for SBSE.
Scalability
 One of the biggest problems facing software engineers is that of scalability of results.
Many approaches that are elegant in the laboratory turn out to be inapplicable in the
field, because they lack scalability.
 Fortunately, one of the attractions of the search based model of optimization is that it is
naturally parallelizable. HC can be performed in parallel, with each climb starting at a
different point [Mahdavi et al. 2003b]. GAs, being population based, are also naturally
parallel; the fitness of each individual can be computed in parallel, with minimal
overheads [Asadi et al. 2010; Mitchell et al. 2001]. Search algorithms in general and SBSE
in particular, therefore offer a ‘killer application’ for the emergent paradigm of ubiquitous
user-level parallel computing.
 This trend toward greater parallelism, the need for scalable SE and the natural
parallelism of many SBSE techniques all point to a likely significant development of parallel
SBSE to address the issue of SE scalability. Recent work by Yoo et al. [2011] has also
suggested possibilities in the use of General Purpose Graphics Processing Units (GPGPU)
for cheap and effective scalability of SBSE problems.
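The idea of running independent hill climbs in parallel, each from a different starting point, can be sketched as follows. The fitness landscape is a made-up one-dimensional example with one local and one global optimum. A thread pool is used only to keep the sketch self-contained; for CPU-bound fitness functions in CPython, a process pool such as `concurrent.futures.ProcessPoolExecutor` would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(x):
    # Hypothetical landscape over the integers 0..99:
    # a local optimum at x = 10 and the global optimum at x = 70.
    if x >= 40:
        return -(x - 70) ** 2
    return -(x - 10) ** 2 - 500

def climb(start):
    """Move to the better neighbour until no neighbour improves."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n <= 99]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(x):
            return x
        x = best

def parallel_climb(starts, workers=4):
    # Each climb is independent, so the climbs can run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        optima = list(pool.map(climb, starts))
    return max(optima, key=fitness)

best = parallel_climb([5, 30, 55, 90])
```

Climbs started at 5 and 30 stall at the local optimum, while those started at 55 and 90 reach the global one. No communication between climbs is needed until the final comparison.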
Robustness
 In some SE applications, solution robustness may be as important as solution functionality.
 For example, it may be better to locate an area of the search space that is rich in fit solutions, rather than
identifying an even fitter solution that is surrounded by a set of far less fit solutions.
 In this way, the search seeks stable and fruitful areas of the landscape, such that near neighbors of the
proposed solution are also highly fit according to the fitness function.
 This would have advantages where the solution needs to be not merely ‘good enough’ but also ‘strong
enough’ to withstand small changes in problem character [Beyer and Sendhoff 2007].
 Hitherto, research on SBSE has tended to focus on the production of the fittest possible results. However,
many application areas require solutions in a search space that may be subject to change.
 This makes robustness a natural property to which the research community could and should turn its
attention.
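One way to express this preference for stable regions is to score a solution by the average fitness of its neighborhood rather than by its own fitness alone. The sketch below assumes a hypothetical one-dimensional landscape with a narrow peak and a broad plateau. Neighborhood averaging is just one possible robustness measure, not one prescribed by the survey.

```python
# Hypothetical landscape: a narrow peak at index 2, a plateau around index 7.
LANDSCAPE = [1, 2, 9, 2, 1, 6, 7, 7, 7, 6]

def raw_fitness(i):
    return LANDSCAPE[i]

def robust_fitness(i, radius=1):
    """Mean fitness over the solution and its near neighbours."""
    lo = max(0, i - radius)
    hi = min(len(LANDSCAPE) - 1, i + radius)
    window = LANDSCAPE[lo : hi + 1]
    return sum(window) / len(window)

candidates = range(len(LANDSCAPE))
fittest = max(candidates, key=raw_fitness)        # the isolated peak
most_robust = max(candidates, key=robust_fitness) # the centre of the plateau
```

The raw fitness picks the isolated peak, whose neighbors are poor, whereas the neighborhood-averaged fitness prefers the slightly lower but far more stable plateau.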
Feedback and Insight
 False intuition is often the cause of major error in software engineering, leading to
misunderstood specifications, poor communication of requirements and implicit
assumptions in designs. SBSE can address this problem.
 Unlike human-based search, automated search techniques carry with them no bias. They
automatically scour the search space for the solutions that best fit the (stated) human
assumptions in the fitness function.
 This is one of the central strengths of the search based approach. It has been widely
observed that search techniques are good at producing unexpected answers.
 For example, EAs have led to patented designs for digital filters [Schnier et al. 2004] and
the discovery of patented antenna designs [Linden 2002].
 Automated search techniques will effectively work in tandem with the human, in an
iterative process of refinement, leading to better fitness functions and thereby, better
encapsulation of human assumptions and intuition.
Summary
• This paper has provided a detailed survey and review of the area of SE activity that has
come to be known as SBSE. As the survey shows, the past five years have witnessed a
particularly dramatic increase in SBSE activity, with many new applications being
addressed.
• The paper has identified trends in SBSE research, providing data to highlight the growth in
papers and the predominance of software testing research. It also indicates that other
areas of activity are starting to receive significant attention, with requirements, project
management, design, maintenance and reverse engineering predominating. The paper
also provides a detailed categorization of papers, tabulating the techniques used, the
problems studied and the results presented in the literature to date. This detailed analysis
has allowed us to identify some missing areas of activity, some potential techniques that
have yet to be applied and emerging areas.
• The future of SBSE is a bright one. There are many areas to which the techniques
associated with SBSE surely apply, but have yet to be fully considered. In existing areas of
application the results are already very encouraging.
• Developments emanating from the optimization community will present exciting
possibilities, while the application domains will present interesting new challenges.
• If we are to regard software engineering as truly an engineering discipline, then surely
we should accept SBSE as a natural consequence. Is not optimization the cornerstone of
all engineering?
