You are on page 1of 17

The Journal of Systems and Software 82 (2009) 668–684

Contents lists available at ScienceDirect

The Journal of Systems and Software


journal homepage: www.elsevier.com/locate/jss

Using aspect orientation in legacy environments for reverse engineering using


dynamic analysis—An industrial experience report q
Bram Adams a,*, Kris De Schutter b, Andy Zaidman c, Serge Demeyer d, Herman Tromp a,
Wolfgang De Meuter b
a
Ghent University, INTEC, Sint-Pietersnieuwstraat 41, B-9000 Ghent, Belgium
b
Vrije Universiteit Brussel, PROG, Pleinlaan 2, B-1050 Brussels, Belgium
c
Delft University of Technology, SERG, Mekelweg 4, 2628CD Delft, The Netherlands
d
University of Antwerp, LORE, Middelheimlaan 1, B-2020 Antwerp, Belgium

a r t i c l e i n f o a b s t r a c t

Article history: This paper reports on the challenges of using aspect-oriented programming (AOP) to aid in re-engineer-
Received 11 February 2007 ing a legacy C application. More specifically, we describe how AOP helps in the important reverse engi-
Received in revised form 25 September neering step which typically precedes a re-engineering effort. We first present a comparison of the
2008
available AOP tools for legacy C code bases, and then argue on our choice of Aspicere, our own AOP imple-
Accepted 25 September 2008
Available online 8 October 2008
mentation for C. Then, we report on Aspicere’s application in reverse engineering a legacy industrial soft-
ware system and we show how we apply a dynamic analysis to regain insight into the system. AOP is
used for instrumenting the system and for gathering the data. This approach works and is conceptually
Keywords:
Dynamic analysis
very clean, but comes with a major quid pro quo: integration of AOP tools with the build system proves
Aspect-oriented programming an important issue. This leads to the question of how to reconcile the notion of modular reasoning within
Industrial case study traditional build systems with a programming paradigm which breaks this notion.
Program comprehension C Ó 2008 Elsevier Inc. All rights reserved.

1. Introduction To counter this phenomenon, a number of solutions to cope


with evolution have been proposed (Bennett, 1995; Sneed, 1996)
Legacy software is omnipresent: software that is still very use- in the field of re-engineering (Chikofsky and Cross, 1990). When
ful to an organisation – quite often even indispensable – but the applying these countermeasures in a reliable, economically sound
evolution of which becomes too great a burden Bennett (1995). and swift fashion, the software engineer would ideally like to have
This burden can be caused by an increase in the complexity (1) a deep insight into the application in order to start his/her re-
brought on by the normal evolution of the system (Sneed, 2005; engineering operation (Sneed, 2004; Carver and Montes de Oca,
Brodie and Stonebraker, 1995; Moise and Wong, 2003; Carver 1998; Lehman, 1998) and (2) a well-covering (set of) regression
and Montes de Oca, 1998; Demeyer et al., 2003; Lehman and test(s) to check whether the adaptations made are behavior-pre-
Belady, 1985). Classic symptoms include serving (Demeyer et al., 2003; Ducasse et al., 2006). In practice, leg-
acy applications seldom have up to date documentation (Moise
 a lack of experienced developers or maintainers, and Wong, 2003), nor do they have such a set of tests.
 a lack of up-to-date documentation, and For all these reasons we are interested in the re-engineering of
 technology that does not reflect the current (business) legacy E-type systems (software systems that solve a problem or
environment. implement a computer application in the real world (Lehman,
1996)). Recent research (Mens and Tourwé, 2008; Colyer and
q
Clement, 2004; Lämmel and De Schutter, 2005) suggests that as-
This article is an extension to our earlier paper Regaining Lost Knowledge through
pect-oriented programming (AOP) (Kiczales et al., 1997) plays an
Dynamic Analysis and Aspect Orientation, published in the proceedings of the
Conference on Software Maintenance and Re-engineering (CSMR’06) (Zaidman important role in this effort as it provides a modularised way to
et al., 2006). change the existing behaviour of a system without having to
* Corresponding author. Tel.: +32 92643318; fax: +32 92643593. destructively modify that system’s source code in any way. The
E-mail addresses: Bram.Adams@ieee.org (B. Adams), kdeschut@vub.ac.be (K. De
modularity provides us with opportunities for re-engineering,
Schutter), a.e.zaidman@tudelft.nl (A. Zaidman), Serge.Demeyer@ua.ac.be (S.
Demeyer), Herman.Tromp@ugent.be (H. Tromp), wdmeuter@vub.ac.be (W. De
while the non-invasiveness takes care of some of the psychological
Meuter). concerns associated with modifying business-critical source code.

0164-1212/$ - see front matter Ó 2008 Elsevier Inc. All rights reserved.
doi:10.1016/j.jss.2008.09.031
B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684 669

We have been looking at applying AOP in forward engineering aspect we apply, the results we get from the analysis, and the val-
(Lämmel and De Schutter, 2005; Schutter and Adams, 2007; De idation of those results with the system’s developers. Section 5
Schutter, 2006; Adams and Schutter, 2007), as have others (Brunt- then discusses the problems encountered while trying to apply
ink et al., 2007; Bruntink et al., 2005; Mens and Tourwé, 2008), AOP to this system. Section 6 describes the threats to validity. Sec-
with success. Different from these, this paper takes a first look at tion 7 describes the related work, followed by Section 8 which
an opportunity for AOP in a reverse engineering setting. Reverse rounds up the discussion with our conclusions.
engineering is the essential first step in the re-engineering process,
and has been reported to take up to 60% of the required effort (Cor- 2. AOP tools for legacy C applications
bi, 1990).
As part of our research in the ARRIBA1 project, our focus is on As discussed by Mens and Tourwé (2008), aspect extraction and
industrial legacy systems. Considering this, we choose to use dy- evolution are two crucial activities when re-engineering a system
namic analysis for our reverse engineering process. This choice is using aspects. Failure or success of AOP for re-engineering depends
instigated by the fact that dynamic analysis allows us to follow a to a large extent on sufficient aspect language support. Without
goal-oriented strategy, i.e., it lets us analyze only those parts of the this, the re-engineered system risks becoming unmaintainable
system that we are really interested in (Zaidman, 2006). This goal- and even less manageable than the original system.
oriented strategy is certainly warranted considering the scale of typ- This section first provides a brief introduction on AOP, before
ical legacy applications. Furthermore, it puts us in the position to re- narrowing the focus to requirements for aspect languages for leg-
port on the benefits of using dynamic analysis in a large-scale acy systems. We then discuss the aspect languages for C which ex-
industrial legacy setting, of which reports are scarce (e.g., Eisenbarth isted at the time of starting our research. Finally, we compare the
et al., 2003; Callo Arias et al., in press). In order to enable this dy- aspect languages.
namic analysis, we introduce a simple tracing aspect into an indus-
trial system. Given that we only need to collect a representative 2.1. Aspect-oriented programming
trace of the running application in order for the dynamic analysis
to work, we could also have opted for dedicated tools such as Aspect-oriented programming (AOP) modularises so-called
DTRACE (Cantrill et al., 2004) or ATOM (Srivastava and Eustace, ‘‘crosscutting concerns” (CCCs) (Kiczales et al., 1997). When devel-
1994). There are two reasons we do not do this. One is that we are opers implement these concerns using traditional programming
looking at AOP as a tool in the entire re-engineering chain and not language techniques, two undesired phenomena typically crop
limited to a particular reverse engineering technique. As proposed up in the source code: scattering and tangling. The former corre-
by De Roover et al. (2006), aspects can generate reverse-engineering sponds to implementation fragments of a concern (e.g., caching),
results in such a way that re-engineering aspects can exploit these which occur at many places throughout the source code. Hence,
results to steer their re-engineering tasks. In this respect AOP is more a change to the implementation of the concern requires changes
interesting as it is more generally applicable than the aforemen- at many locations in the source code, which is tedious, error-prone
tioned tools. The second reason is that we also need to consider and hampers understandability. The situation is even worse, be-
how to get our aspects applied in real-life systems. As this paper cause at each location where a concern fragment occurs, it may
shows, even for something as simple as a tracing aspect, this is not be tangled (mixed) with fragments of other concerns. This means
trivial. Indeed, as the prototypical example of an extremely scattered that programmers need to understand the interplay between mul-
aspect, a tracing aspect actually provides us with something of a tiple concerns before being able to modify the caching concern.
stress test with respect to the support of aspects in the legacy AOP deals with these undesirable program properties by extracting
system. crosscutting concerns in a new kind of modules: aspects.
The experiment reported on in this paper is therefore on a mid- To date, AspectJ is still the primary aspect language in existence,
size real-life system which has accumulated a mix of Kernighan & both in research and in practice. This is an aspect language for Java
Ritchie (K&R) Kernighan and Ritchie (1978) as well as ANSI-C style which has introduced the concepts of advice, pointcut, join points,
code. This has an impact on our choice of AOP tool, which this pa- etc. An aspect is similar to a class or module, but can contain ‘‘ad-
per will also take into careful consideration. vice”, which consists of a ‘‘pointcut”2 and an ‘‘advice body”. Accord-
In short, the contributions of this paper are ing to the most common school, the implementation of crosscutting
concerns is extracted from the ‘‘base code”. The latter corresponds to
 a comprehensive overview of AOP tools for the C programming the implementation of the main concerns, the so-called ‘‘dominant
language, decomposition” which forms the backbone of the whole system.
 the introduction of a new AOP tool which fits our re-engineering CCC implementation fragments are separated from the base code
goals, and localised into (possibly) multiple advice bodies of an aspect.
 the application of a dynamic analysis on an industrial legacy Code separation is only one part of the effort required to resolve
application, scattering and tangling. One still needs to specify at which mo-
 a discussion of some of the problems found when applying AOP ments during the base program execution an advice body should
in a legacy setting. be invoked. Instead of embedding explicit calls to advice within
the base code, an advice is invoked automatically once a condition
The structure of the remainder of this paper is as follows: Sec- (pointcut) is satisfied. This inversion of dependencies (Martin and
tion 2 explores the possible AOP tools for legacy C systems. As Nordberg, 2001) forms the core idea behind AOP. The moments
we will see there is none that fits the bill and so Section 3 intro- in time when advice can be triggered are called ‘‘join points”, as
duces a tool of our own which has been created according to our this is where the main concern(s) and a CCC join each other. Estab-
re-engineering goals. Next, Section 4 describes an actual applica- lished kinds of join points are method calls and executions, vari-
tion of AOP in an industrial environment by showcasing a dynamic able access and manipulation, etc. A pointcut can make use of
analysis approach; we present the actual experiment, including the program structure, name patterns, dynamic program state, etc. to
describe the intended set of join points. It is, for example, possible

1
Architectural Resources for the Restructuring and Integration of Business Applications.
2
More info on this project at http://arriba.vub.ac.be/. Sometimes abbreviated to ‘‘PCD”, for ‘‘pointcut designator”.
670 B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684

to select all join points which occur in the control flow of another systems (De Schutter, 2006; Nágy, 2007; Bruntink et al., 2007). This
join point. The advice body can be executed before, after or around paper investigates this claim.
a matched join point. In the latter case, the advice can explicitly de-
cide whether to resume (‘‘proceed”) the advised join point. 2.2. Requirements for aspect languages for legacy systems
The process of matching join points with a pointcut and of exe-
cuting advice on a pointcut match is called ‘‘weaving”. Conceptu- Finding the right aspect language for re-engineering legacy sys-
ally, a ‘‘weaver” monitors the program execution and checks each tems is not an easy task, because these environments have other
join point to decide whether there is a match or not. In practice, needs than modern systems. Lämmel and De Schutter (2005) has
the set of interesting join points can be reduced based on analysis made an explicit account of the rationale behind and the design
of the pointcuts, or weaving can be moved completely to the com- of an aspect language support for typical legacy (Cobol) systems.
piler, with only a couple of dynamic checks (‘‘residues”) left at run- Other researchers have discussed specific facets of aspect language
time. Some aspect languages such as AspectJ also provide means design in legacy environments (Coady et al., 2001; Bruntink et al.,
for managing static crosscutting concerns, i.e., inter-type declara- 2005). From this work, we have distilled five requirements for as-
tions3 (ITDs) (Kiczales et al., 2001). Whereas advice alters program pect languages for the re-engineering of legacy systems:
behaviour, ITD alters types or may facilitate program verification Base integration. The aspect language constructs should blend
and error handling. The latter two applications solicit compiler feed- with the base programming language.
back if a user-specified pointcut matches during weaving. Type alter- Expressive pointcuts. The pointcut language has to make up for
ation allows classes and interfaces to be extended with new the weaker support for typing, structuring, etc. in the base
attributes or methods and may even change the inheritance hierar- language.
chy by adding new interfaces to be implemented or by changing the Generic advice. Advice should be robust to small variations in
superclass. The idea is that these structural modifications support types and context across the advised join points.
behavioural CCCs, which are implemented separately as advice, but Join point context. Advice should have access to join point
that they also allow base code developers to explicitly use the intro- context.
duced attributes or methods. Griswold et al. call this ‘‘language-level Available weaver. A solid weaver should be available.
obliviousness” (Griswold et al., 2006), i.e., developers are aware of The first requirement considers the psychological integration of
the woven aspects. If developers do not know anything about the a new technology. As Cobol programmers are fluent in writing Co-
possible aspects, one speaks of ‘‘designer obliviousness”, unless bol code and are mostly weary of new technologies, adoption of as-
developers may prepare the base code to expose better join points pects can be accelerated if the aspect language does not try to copy
(‘‘feature obliviousness”). Commonly, a distinction is made between or re-implement existing features (Durr et al., 2006) and is suited
‘‘homogeneous” and ‘‘heterogeneous” CCCs (Colyer and Clement, to the particular domain programmers are working in. Instead of
2004). Homogeneous concerns are said to look almost identical a separate aspect construct as in AspectJ, it is much more natural
everywhere they occur. On the other hand, heterogeneous concerns to adopt ordinary Cobol files as aspects. New constructs for point-
may vary widely between different occurrences. As a consequence, cuts and advice have to be added, but to lower the learning curve
the implementation of homogeneous concerns may be easily local- they should be as close as possible to existing language constructs
ised into one advice body, whereas heterogeneous concerns are (C preprocessor, etc.) and should be able to interact with them.
harder to implement in a robust way. In the latter case, the advanta- This also makes integration into existing development environ-
ges of AOP may seem to be limited, but this actually depends on the ments easier, because these only need to support the new advice
expressivity of the aspect language, i.e., the advice and pointcut lan- and pointcut concepts.
guage. The better the variability can be expressed in the aspect lan- The second and third requirements can be illustrated best by an
guage, the easier the heterogeneous advice can be extracted into example. Bruntink et al. (2004, 2005) have used AspectC (Coady
advice. AOP has been studied especially in the context of OO sys- et al., 2001), the first aspect language for C (described later), to
tems, as a means to overcome the problems of scattering and tan- implement an aspect which checks whether pointer arguments
gling in even the most advanced OO languages. Nevertheless, CCCs passed to a procedure correspond to a null pointer. As C does
are more fundamental than this. Anytime a problem is tackled by not have a kind of ‘‘super-type” similar to Java’s Object4 and it
making some structural design decisions, the remaining concerns does not support C++-like templates, there is no type-safe way to re-
have to fit into this main decomposition somehow. This problem is fer to a generic type. Because AspectC does not have explicit provi-
named the ‘‘tyranny of the dominant decomposition” (Tarr et al., sions for dealing with this, Bruntink et al. (2004, 2005) were
1999). Hence, CCCs occur not only in OO systems, but also in proce- forced to duplicate their argument checking advice for each occur-
dural or functional programs (Kiczales et al., 1997; Lämmel and De ring argument type and to use plain enumerations of procedure
Schutter, 2005; De Schutter, 2006), as these also start from a main names as pointcut. This situation impeded maintenance, as the long
decomposition of the system. Keeping in mind that OO languages of- enumeration-based pointcuts had to be adapted on every non-trivial
fer more powerful composition constructs than modular or proce- source code change, and changes to the advice logic had to be perco-
dural programming, this means that the latter have even less lated to all duplicates of the advice.
means to manage CCCs. Research has shown (Bruntink et al., 2004; To resolve these problems, Bruntink et al. (2004, 2005) have
Colyer and Clement, 2004; Bruntink et al., 2004; Bruntink et al., developed a domain-specific language (DSL) for parameter check-
2005; Bruntink et al., 2006; Bruntink et al., 2007) that CCCs repre- ing, which is translated by a preprocessor to AspectC advice.
sent an important evolution problem in legacy systems, especially Although they show that their solution greatly improves the source
if one takes the scale of these systems into account (millions of lines code quality, it still remains an ad hoc solution. Aspect languages
of code). Tangling and scattering of CCCs with the main concern for legacy systems should provide support for writing robust point-
heavily impact program understandability, while scattering in- cuts and for specifying generic advice, i.e., advice which is robust to
creases the cost of maintenance and reduces traceability of code small variability in types and context across all join points it ad-
fragments to the modeled concern. Various researchers have consid- vises. Robustness can further be improved by providing more
ered AOP as a viable solution to deal with these problems in legacy

4
C does have void pointers, which can point to anything, but using them
3
The original name for this feature was ‘‘introduction”. precludes compile-time type checking.
B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684 671

Table 1
Overview of the existing aspect languages for C. Each of the five sets of rows corresponds to one of the five requirements for aspect languages for legacy systems. A +/ indicates
good/bad support for a feature, whereas ‘‘N/A” signals when an entry is not applicable to a language. Because every aspect language supports access to function arguments and
global variables, a ‘‘” for ‘‘context” means that there is no additional means for access to join point context.

AspectC AspectC++ AspectX Arachne Aspicere


Domain Kernel General General Systems General
Base integration + .h XML + +
Preprocessor   # include  
PCD robustness  Regexp XPath + LMP
(Function) pointers     
ITD  + +  
Basic join points AspectJ AspectJ + + +
Dynamic join points AspectJ AspectJ  + 
Advanced join points  callsto/reachable comments + 
Variable access   + + 
Generic advice  + +  +
Aspect interaction Build Explicit Build Deployment Build
Advice interaction Lexical Lexical Lexical Lexical Lexical
Context  + + + +
This JoinPoint  +   +
Annotations     
Availability + + + + +
Weaver type Source2source Source2source Source2source Run-time Source
Optimisation  +   
K&R support +  + N/A +
IDE support  +   

advanced join points (variable access, control flow, etc.), whereas To summarise, aspect languages for legacy systems should
the ease of expressing the precedence between aspects or individ- blend naturally with the base programming language, should sup-
ual aspects is also important to keep in mind for genericity. The port generic advice, offer a means for composing expressive point-
fourth requirement, i.e., sophisticated access to join point context, cuts and access to join point context. A sufficiently mature weaver
refers to the base elements in terms of which pointcuts are ex- implementation is also needed.
pressed. To be able to specify robust patterns of join points, Gybels At the start of our research we had a choice of four AOP tools for
and Brichau (2003) and De Schutter (2006) have proposed access C: AspectC, AspectC++, AspectX and Arachne. We have evaluated
to program structure as a prerequisite. De Schutter has elaborated these four tools with respect to the five requirements. The follow-
on this by stressing the importance of weave-time metadata in ing sections discuss in more detail the data given in Table 1 which
pointcuts, i.e., logic facts which represent design information or re- summarises our results.
sults of reverse engineering analysis. They allow to write more ro-
bust pointcuts which are synchronised with design changes, more 2.3. AspectC
precise analysis results or, e.g., developer annotations. In general,
any kind of information could be offered as context to pointcuts AspectC was the first aspect language for C, inspired by As-
(Ostermann et al., 2005), or directly to advice. The latter is typically pectJ’s constructs. In Fig. 1, we are only interested in (lines 1–3)
obtained by means of an explicit join point object (e.g., named the execution of the vm_page_lookup procedure if this join point
‘‘thisJoinPoint”). lies in the control flow of an execution join point of the allocbuf
The fifth requirement seems trivial, but for many aspect lan- procedure. Instead of each of these join points (lines 5–6), the ad-
guages only a proof-of-concept implementation exists, which is vice body in lines 7–12 is executed. This contains normal C logic,
not able to cope with the actual code found in legacy systems. except for the call to proceed which enables to execute the ad-
Robustness to base language dialects and the ability to deal with vised join point from inside the around-advice. There is no explicit
language abuse (e.g., function pointers) are indispensable. The mo- aspect construct, as file boundaries are used for this. Contrary to
ment in time on which the weaver kicks in is important as well. AspectJ, variable accesses cannot be advised and there is no ITD
Many aspect languages for C feature a compile-time weaver, pri- support either.
marily because the typical domains where C shines (system soft- The advice signature does not specify a return type (line 5). In-
ware!) require highly efficient woven code. However, these stead, the aspect developer should determine the right return type
systems have other desirable properties too, such as availability when it is needed in the advice body, e.g., when declaring local
and debuggability. These are the application areas where run-time variables (line 7). Regular expressions cannot be used either in
weavers can be beneficial (Popovici et al., 2002) for, as they theo- pointcuts, which means that for every possible return type a sepa-
retically offer the capability to advise any running system. For this, rate pointcut and advice have to be written. This has caused the
most dynamic weavers are based on instrumentation libraries or problems of Bruntink et al. which are discussed in Section 2.2. Ac-
techniques such as code splicing (Engel and Freisleben, 2005), cess to typed context is possible (line 5). The precedence of multi-
i.e., tweaking the assembler code to jump to aspect code. Conse- ple pieces of advice on a common join point is determined by
quently, most of them do not require to parse or process the actual lexical order of the advices in the aspects and by the order in which
source code. Platform-independence of the instrumentation mech- the aspects are listed when invoking the weaver.
anisms in use is questionable, however. Also, there usually is no Initially (Coady et al., 2001), aspects were hand-compiled, but
opportunity to optimise the advised applications after the dynamic later on (Coady and Kiczales, 2003), a real weaver has been built.
weaving: they tend to become patchworks. This is not the case for AspectC seems unmaintained since 2003 without any official re-
static weavers. leases, but a weaver prototype had been available on request.
672 B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684

Fig. 1. AspectC aspect for page daemon wake-up in the FreeBSD kernel (Coady and Kiczales, 2003).

Fig. 2. AspectC++ aspect which converts return value error codes into C++ exceptions (Spinczyk and Lohmann, 2007).

2.4. AspectC++ vided. The callsto pointcut takes an execution pointcut and
deduces which call join points can call the join points described
AspectC++ (Spinczyk et al., 2002; Spinczyk and Lohmann, by the execution pointcut. The reachable pointcut is analogous,
2007)5 is the most mature and general-purpose aspect language but it calculates (via static analysis) all join points from which its
for C++ to date, but since its inception people have tried to use it argument join points can be reached. There are no set and get
for C too. Official support for this has never been a priority, however. pointcuts, i.e., access to variables is not reified as a join point, be-
Non-ANSI C code (the so-called ‘‘K&R”-code) cannot be parsed by the cause of aliasing problems introduced by pointers and because of
weaver. It is also not clear which constructs and pointcuts can be the unsound semantics of set regarding operator=. Just like As-
used for C and which ones cannot. The AspectC++ weaver is heavily pectJ, there are precedence directives to derive a total order be-
based on template instantiation to reduce memory footprint and tween aspects. Advice ordering within an aspect is ordered via
execution time. The woven code is valid C++ which needs a modern lexical conventions.
C++ compiler. There is a (commercial) IDE plugin. AspectC++ has coined the term ‘‘generic advice” (Lohmann
As Fig. 2 shows, AspectC++ is heavily influenced by AspectJ. et al., 2004) to refer to the powerful capabilities of templates for
There is an explicit aspect construct which is similar to a C++ obtaining highly reusable and robust advice. The idea is that As-
class, hence it needs to be declared inside a special ‘‘aspect header pectJ’s distinction between static and dynamic join point contexts
file”. Join point, advice and pointcut types are comparable to As- is generalised to C++’s strong compile-time template mechanism.
pectJ. Contrary to AspectJ, advice and inter-type declarations Compile-time context can be used to instantiate templated advice
(‘‘slices”) are specified in the same way. Because of this, the join and functions, such that there is no run-time overhead to
point model is said to be ‘‘unified”. Join points are implicitly typed, dynamically allocate or access context. Line 21 of Fig. 2 gives an
such that the weaver may check that they are only advised by cor- example of this. Because JoinPoint is just a class name
rectly typed advice. Regular expressions can be used to specify and JoinPoint::ARGS statically resolves to the correct
pointcuts (lines 3–6 in Fig. 2). Two new pointcut types are pro- number of function arguments of the advised call join point, the
stream_ params<JoinPoint,JoinPoint::ARGS> template is
instantiated at compile-time through template meta-program-
5
http://www.aspectc.org/. ming. This mechanism allows for very reusable and robust advice,
B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684 673

Fig. 3. Accesses to a float member are replaced by the result of a method call with XWeaver (see example from website).

which varies automatically based on the particular join point and namic join points. On the other hand, the focus on program
advice context. As a downside, the templates can easily get very transformation enables syntactic ITD of comments and even of in-
complex to understand, especially for C programmers who are clude/import statements.
used to the simpler semantics of the C preprocessor. XWeaver is implemented in Java. There is an Eclipse plugin
(AXDT) akin to the AspectJ AJDT, but command line or build script
2.5. AspectX/XWeaver access (via Ant) is also possible. XWeaver can generate an Ant file
based on a project file (XML). The latter specifies the important
XWeaver6 (Rohlik et al., 2004) is the name of the aspect weaver directories in the project and also the aspect configuration per sub-
associated with the AspectX language. It is conceived for tailoring set of base code modules and header files. Finally, the precedence
software frameworks to control systems. As quality control is impor- of advice is determined by the lexical order in the aspect files and
tant for this, XWeaver’s task is to generate woven code which syn- by the order in which aspects are read by the weaver (similar to
tactically resembles the base code layout and even updates AspectC).
comments (to document the woven code) such that the woven code
can be manually investigated. XWeaver does not work on the pro- 2.6. Arachne
gram AST, but on srcML. This is an XML representation of a program
in which only high-level program constructs are accessible. Com- Arachne7 is a dynamic aspect language for C which improves on
ments and include/import statements are retained. This format the obsolete lDiner framework (Douence et al., 2005). The Prolog-
makes XWeaver language-independent, in the sense that initial like pointcut language is based on a temporal sequence of procedure
C++ support has been extended to Java once srcML was released call and/or on (in)direct variable access join points. The resulting
for Java. Just as is the case with AspectC++, XWeaver can be used sequence pointcut is a natural means for advising protocol-like
for C systems too. The AspectX language is XML-based, as the advice behaviour, as each element of the sequence can be advised individ-
in Fig. 3 shows. XML Schema type-checks the syntax of the XML as- ually. The advice of Fig. 4 detects when more data are written into a
pects. AspectX allows the usage of XPath and XSLT technologies in heap-allocated buffer (lines 3–4) than initially allocated (lines 1–2).
the pointcut (lines 2–4) and advice (lines 14–15), respectively. XPath In that case, overflow is reported (line 5). The sequence ends when
is able to select the right XML nodes by navigating across the XML the buffer is deallocated (line 6). The latter is required to avoid fur-
tree. In the example, nodes of type float are selected (line 2) which ther run-time checks that are performed for the buffer allocated at
do not occur as the left-hand side of an assignment or ‘‘equals” con- that address.
dition (lines 3–4). The advice of lines 10–17 replaces (line 10) the se- To increase the expressivity of this language, Loriant et al.
lected elements using the XSLT transformation of lines 14–15. This (2006) later have added the possibility to bind specific context to
transformation capitalises the name of the advised join point XML each instance of a sequence. Also, a fakeEvent construct has
node. Hence, the user should have considerable knowledge of the been introduced to simulate calls to an arbitrary procedure when
program XML-model. Special symbols need to be escaped, as the some join point matches. These fake events can then trigger other
&gt; in line 14 shows. Inclusion of XML documents can be exploited advice, the pointcut of which is expressed in terms of that event.
to reuse a library of pointcuts. To summarise, the AspectX language Arachne uses clever assembler manipulation techniques to
is a very low-level aspect language which resides on the border with instrument a running system without having to pause it. Hence,
pure program transformation. the precedence of advice at a common join point is determined
Join point context (argument types/names, return types, etc.) is by the lexical order in the aspect and by the order of deploying
accessible via dollar-variables such as ${className}. Under the the aspects. Despite claims of robustness across computer architec-
hood, XWeaver models aspects as XSLT transformations, which tures, these techniques did not work on the various test machines
means that join point context actually corresponds to XSLT queries. we have tested it on.8 The last public release of Arachne dates back
Hence, users can extend XWeaver with new context queries. New to March 2005.
join point types can be added in a similar manner. More traditional
join points like execution exist, but all of them are purely stati-
cally determined based on the AST. There are no provisions for dy-
7
http://www.emn.fr/x-info/arachne/.
6 8
http://www.xweaver.org. Arachne is distributed as a live Linux distribution.
674 B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684

Fig. 4. Buffer overflow detection aspect (Douence et al., 2005).

2.7. Discussion Finally, Aspicere does not take inline assembler and preproces-
sor constructs such as macros or conditional compilation into
First, we have observed that most aspect languages blend well account.
with the base code. Second, regular expressions are the most wide-
spread mechanism to obtain expressive pointcuts. AspectX and 3.2. Pointcuts
Arachne are more powerful because of their syntactic program
transformation and sequence pointcuts, respectively. Third, ex- As argued in (De Schutter, 2006), legacy languages such as C
cept for AspectC++, the aspect languages provide some form of lack sufficient structure or reflective capabilities to be able to write
generic advice, especially by allowing access to a rich set of join crisp and robust pointcut patterns and advice. To deal with this,
point variables inside the advice body. AspectC++ on the other Aspicere’s pointcut language is heavily influenced by the querying
hand relies on C++ templates. It does not require developers to variant of logic meta-programming (LMP) (De Volder and D’Hondt,
change the weaver implementation to add extra context for 1999; Brichau et al., 2002; Gybels and Brichau, 2003). The basic
obtaining more expressive pointcuts. Fourth, aspect languages idea is that a program is represented as a collection of logic facts
with generic advice all have access to a wealth of join point con- and that a Turing-complete logic language is used to reason about
text. Fifth, AspectC and AspectX have sufficiently robust weavers, these program facts. Pointcuts can be expressed in terms of the
whereas AspectC++ generates more efficient woven code. Arachne program facts to compose powerful patterns based on program
does not need to parse the base code, but yields less optimised wo- structure using conjunction, disjunction and or negation (by
ven programs. Overall, AspectC++, AspectX and Arachne conceptu- unprovability). By carefully designing more advanced predicates
ally are the best aspect languages for C based on our five in terms of more primitive ones, a clean, layered pointcut language
requirements. Because AspectC++ does not support K&R C and gen- can be constructed. Adding new pointcuts comes down to defining
erates C++ code instead of C, AspectX reasons in terms of XML new logic predicates. By raising the level of abstraction, pointcut
transformation instead of in terms of join points, and Arachne does predicates can be brought closer to the actual problem domain.
not provide generic advice and is not platform-independent, none Lines 2–4 of Fig. 5 show an example pointcut in Aspicere. This
of the languages are really suited for the legacy system environ- pointcut matches all calls10 (line 2) to procedures, the name of
ments we are targeting. Instead, we have decided to design and which matches the regular expression ‘‘ato.”, i.e., the name starts
implement a new aspect language for C, i.e., Aspicere. The next sec- with ato and the fourth letter is arbitrary. Because Aspicere does
tion presents the design of Aspicere. not allow advising a call via a function pointer (see Section 3.1), this
pointcut only matches explicit function calls. To connect the invo-
3. Aspicere, an AOP language and tool for legacy C systems cation predicate to the ones in lines 3 and 4, Prolog’s unification al-
lows us to reuse the previously bound join point variable (Jp). This is
This section presents our aspect language for C, Aspicere9 (Zaid- a very natural way to express that two bound variables should be
man et al., 2006). We consider its rationale, the language itself and equal. The args predicate binds the call’s sole argument passed
the weaver we have developed for it. The last column of Table 1 via the argument list to the Src variable and also captures the pro-
summarises how Aspicere compares to the aspect languages for C cedure call’s return type as ReturnType.
discussed in the previous section. The unification has an interesting effect: if a variable can have
multiple values, each one is eventually used to find a complete
3.1. The join point model match of the logic rule. In Fig. 5, each function call which satisfies
the regular expression in line 2 (and the other conditions) leads to
As with the other aspect languages for C, procedures are the an extra match of the pointcut.
prime join point type in Aspicere. Similar to AspectJ, a distinction Apart from program structure, logic facts can also represent
is made between a call and an execution join point, i.e., a join weave-time metadata. This is especially useful for legacy systems,
point at the caller and callee side. This helps to distinguish be- because metadata facts can record information obtained via re-
tween advising all (execution) or just a number (call) of proce- verse engineering and can make it accessible in advice to re-engi-
dure invocations, which is important regarding shipping aspects neer the system. Design information, the actual composition of
with libraries or not. In addition, an execution join point is the source modules (is one executable built or are multiple libraries
easiest way of dealing with function pointers. Second, pointers also built?), information of base code modules selected by the user,
cause the problem of ‘‘aliasing”. A given (global) variable can be ac- etc. can all be passed to pointcuts and be used in the advice body.
cessed directly or via some pointer to it. Arachne (Section 2.6), e.g., Logic facts are useful to store metadata separately in a loosely cou-
tried to use the operating system’s page fault mechanism to detect pled fashion. To summarise, Aspicere has an expressive pointcut
variable access, but this caused extreme performance penalties. language which is able to abstract over implementation details of
Just like AspectC++ (Section 2.4), Aspicere does not support variable the base code (requirement two of Section 2.2). The binding of vari-
access join points such as get or set. ables enables access to a variety of join point contexts (require-
ment four), but we provide more details on this in the next section.

9 10
Aspicere: verb, Latin, ‘‘to look at”. Root of our modern word aspect. Available at Because of name clash issues in our weaver implementations, we use invoca-
http://users.ugent.be/badams/aspicere/. tion instead of call.
B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684 675

Fig. 5. Aspect to make conversion to numbers null pointer-proof.

3.3. Advice construct as for AspectC and AspectX) or to the join point itself. If one wants
to replace the value of the arguments, one should fill in the argu-
Aspicere’s aspects correspond to ordinary C modules with a new ments or assign directly to a bound argument. Another alternative
advice construct. The advice in Fig. 5 secures calls to the standard is to access the thisJoinPoint-like struct, which is accessible via
atoi, atol and atof procedures to prevent a program from crash- the join point variable (Jp). This struct contains the following
ing11 when a null pointer is passed as an argument. These three pro- fields:
cedures should parse a string (char) argument into an int, long or
double. Note that a single advice suffices to advise all three proce- nrArgs—number of arguments
dures because of the use of the so-called ‘‘template parameters” args—array of arguments
(lines 1 and 5), which are similar to C++ templates. returnValue—pointer to return value
An advice structure specifies fileName—name of file in which advised join point resides
functionName—name of advised function
 the advice return type (useless in case of before- or after-
advice); Aspicere provides generic advice with flexible access to join
 the name of the advice; point context (requirement three). It forms a hybrid between C
 a list of bound context variables12 visible to the advice body; and Prolog (requirement one).
 the type of advice (before, around or after (returning));
 (in case of after returning) binding of return value to a 3.4. Aspicere’s weaver
variable;
 the name of the join point variable where advice has to be Fig. 6 shows the architecture of the Aspicere weaver (named
woven; ‘‘Aspicere1”). It is a pure source-to-source weaver (Zaidman
 a pointcut (behind the colon); et al., 2006), which takes as input base code (one module at a time),
 the advice body. aspects (.ac) and Prolog modules (.pl), and generates woven C code
which can be compiled using the normal compiler. This means that
Aspicere’s advice body contains pure C code enhanced with tem- the weaver has to be integrated between the C preprocessor and
plate parameters. As the pointcut (see Section 3.2) consists of Pro- the C compiler. Aspicere’s parser is capable of handling C code made
log predicates with C-like conjunction, disjunction and negation up of a mix of K&R-, ANSI- and GNU-standards.
operators, this makes Aspicere a hybrid of pure C and a Prolog- Inside the weaver, a parser and unparser convert the C code to/
based pointcut language. To make the learning curve lower, Aspi- from an XML representation of the AST (analogous to AspectX in
cere enables a C-like syntax of Prolog’s conjunction, disjunction Section 2.5). The Prolog modules and the pointcuts (transformed
and negation operators (e.g., ‘‘&&” instead of ‘‘,”). into Prolog rules) are used to locate the right join point shadows13
The advice of Fig. 5 corresponds to around-advice on join (Hilsdale and Hugunin, 2004), i.e., the appropriate XML nodes of the
points Jp which satisfy the pointcut in lines 2–4. This pointcut AST. Once these have been found, the XML tree representing the base
has been explained in the previous section. Two typed variables code module can be transformed, as well as the aspects themselves.
are bound (ReturnType and Src) and are available for use as tem- Advice is converted into multiple C functions, one per combina-
plate parameters in the advice body. Src is a simple string, tion of type parameters used in the advice body. These transformed
whereas ReturnType represents an actual C TYPE. TYPE is a cus- advices are collected into the transformed aspect, which has to be
tom (meta)type we have added to Aspicere, because C does not reused as input to the weaving process of other files of the same
have reflective capabilities. One can use such a type parameter fur- application (the big loop in Fig. 6). At the same time, information
ther on in the binding list, as return type of the advice (line 1) and about matched join points is collected into a ‘‘join point match
of course inside the advice body itself (line 5). Type parameters repository” XML file, which is also reused in future weavings. The
help to achieve better static typing than the use of a catch-all void* woven base code module is converted back to C code, whereas
would allow, similar to C++ templates. Advice becomes robust to the transformed aspect possibly needs to be transformed further
small changes in, e.g., types, as Fig. 5 illustrates. when weaving other base code modules. Eventually, the resulting
Apart from template parameters, an advice body may contain a transformed aspect also has to be compiled and linked with the
proceed-call, similar to AspectJ. If no arguments are given, the join base code to form the complete woven system.
point’s original arguments are passed as is to any remaining ad-
vices on the same join point (the precedence rules are the same

13
A join point is a run-time concept. However, for each join point a corresponding
11
Some platforms (e.g., UnixWare) already handle this, others (e.g., GNU) do not. location in the source code can be found which contains the actual code which is
The advice shown here allows to abstract away from this difference in platform. executed by the join point (e.g., an actual procedure call forms the shadow of a call
12
Their names always start with a capital letter. join point). Shadows are used by compile-time weavers to statically weave aspects.
676 B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684

Fig. 6. Architecture of Aspicere1, Aspicere’s source-to-source weaver. The .ac-files represent aspects, while .pl-files contain logic predicates.

The availability of an aspect weaver for Aspicere fulfills the last TDFS application, other than what was described above. That is,
of the five requirements for aspect languages for legacy systems set we have no knowledge of its size or architecture. Discovering this
out in Section 2.2. The next section shows how Aspicere is applied. information is exactly the goal of the experiment.
While the concept of dynamic analysis goes back to at least the
4. Example in an industrial environment early seventies (Biermann and Feldman, 1972), there has recently
been a renewed interest in dynamic analysis techniques that deal
Kava, our industrial partner, is a non-profit organisation that with large-scale program comprehension (Greevy and Ducasse,
groups over a thousand Flemish pharmacists. While originally set 2005; Hamou-Lhadj et al., 2005; Zaidman and Demeyer, 2004;
up as a union for the pharmaceutical profession, they have evolved Zaidman et al., 2005; Zaidman and Demeyer, in press; Richner,
into a full-fledged service-oriented provider for pharmacists. 2002; Systä, 2000). This renewed interest can partly be explained
Some ten years ago they have developed a suite of applications by the increasing need to understand large-scale object-oriented
written in non-ANSI C, which has put them among the first in the software. This object-oriented software makes abundant use of
industry to have an automated tarification service. Due to succes- polymorphism, and the late binding mechanism makes it hard to
sive health care regulation changes they are very much aware of understand the software when only using static analysis
the necessity to adapt and re-engineer this service. Furthermore, techniques.
during their migration from non-ANSI C to ANSI C compliant ver- One such dynamic analysis technique, the key class identification
sions of their applications, they have noted that the documentation technique was developed in-house and was previously extensively
of these applications was outdated, making it difficult for new soft- validated with object-oriented software systems (Zaidman et al.,
ware engineers to get acquainted with the system. To help solve 2005; Zaidman and Demeyer, in press). This previous study has
this problem, Kava was interested in applying reverse engineering shown that the key class identification technique is able to find
techniques to their system. those classes in a system that need to be studied during early pro-
The developers at Kava have pointed us to the so-called TDFS14 gram comprehension phases. We found it particularly worthwhile
batch application. This is a part of the Kava system which is used as a to verify whether this key class identification technique could also
check to see whether adaptations in the system have any unforeseen prove its worth in non-object-oriented legacy systems where it
consequences. As such, it should be considered as a functional appli- would identify key modules rather than key classes.
cation—it outputs a detailed invoice of all prescriptions, ready to be The remainder of this section briefly describes the concept of
sent to the health care insurance institutions—but also as a kind of the key class identification technique. This is then followed by an
regression test. elaboration on its actual application in the Kava environment by
In agreement with Kava we have opted for the use of dynamic means of Aspicere. We summarise the results obtained from the
analysis in investigating this application. It is important to note analysis, and their validation with the original developers.
that this experiment assumes no knowledge of the details of the
4.1. Dynamic coupling-based analysis

14
TDFS: ‘‘TariferingsDienst Factuur en Statistiek”, or ‘‘Tarification Service for Within software systems, the concept of coupling is inevitable
Invoices and Statistics”. as program parts—be they classes or modules—work together to
B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684 677

reach a certain goal. Classes that exhibit a relatively high level of


coupling can be designated ‘‘influential”. Influential, because they
have a certain amount of control over what the application is doing
and how it is doing it. A similar observation of influential classes
was made by Tahvildari in her research about design flaws (Tahvil-
dari, 2003): ‘‘Usually, these most important concepts of a system
are implemented by very few key classes, which can be character-
ized by a number of properties: (1) they manage a large amount of
other classes or use them in order to implement their functionality,
(2) they are tightly coupled with other parts of the system and (3)
they tend to be rather complex, as they implement much of the
legacy system’s functionality”. As such, structural dependencies
between modules of a system can indicate modules that are inter-
esting for initial program comprehension (Robillard, 2005).
To be more specific, the key class identification technique uses
run-time export coupling, which is a measure for the degree to
which a module requests other modules to do work for it (delega-
tion). This gives us all actual dependencies that happen at run-
time, provided we have a well-covering execution scenario. How-
ever, coupling measures are typically between two classes or mod-
ules, whereas we want to take into consideration the complete
Fig. 7. Example graph and the accompanying first iterations of the HITS webmining
structural topology of the application. To overcome this strict bin- algorithm.
ary relation between modules, we add a transitive measurement
for reasoning over the topology. We use webmining techniques
for this (Zaidman et al., 2005; Zaidman and Demeyer, in press).
As an example consider the graph given in Fig. 7. The table in
4.2. Webmining analysis
this figure shows three iteration steps of the hub and authority
scores (represented by tuples (H,A)) for each of the five nodes from
Webmining, a branch of datamining research, analyzes the
the graph in Fig. 7a. From this, we can conclude that 2 and 3 will be
topological structure of the web trying to identify important web
good authorities as can be seen from their high A scores in Table
pages based solely on their hyperlink structure. By replacing the
7b. Looking at the final H values, 4 and 5 will be good hubs, while
hyperlink structure of the web (i.e., a web graph) by a call graph
1 will be a less good one15.
that shows the structural dependencies of a software system, we
The result set obtained from this heuristic is a list of all the
are able to use the same basic technique to retrieve important clas-
modules of which procedures were executed during an execution
ses or modules.
scenario. These modules are ranked from being important to being
The HITS webmining algorithm (Kleinberg, 1999) identifies so-
irrelevant during early program comprehension phases. An earlier
called hubs and authorities in (web) graphs. Conceptually, hubs are
study on Java software showed that the technique is able to re-
nodes that have a high number of outgoing edges, while authorities
trieve the most important classes within a system with a level of
are nodes that have many incoming edges. In terms of the Internet,
recall of 90% and precision of 50%. More information about this
hubs are pages that mainly have a referring function, e.g., web
technique and its validation can be found in (Zaidman and
directories and lists of personal pages. On the other hand, an
Demeyer, in press).
authority contains useful and/or highly detailed information
Now that we have explained how our analysis technique works,
regarding a specific subject. In terms of program comprehension,
we elaborate on the application of it using Aspicere in the Kava
hubs are modules that contain the core high-level logic of the
environment. We start by detailing the instrumentation phase.
application (conceptually, these are the influential classes we dis-
cussed in Section 4.1), while authorities are implementers of more
4.3. Instrumentation of the application
low-level functions e.g., utility classes (Hamou-Lhadj et al., 2005).
The basic mechanism for the HITS algorithm is as follows: every
The pointcut of Fig. 8 (lines 2–4) advises individual procedure
node in the graph i gets assigned two numbers; ai denotes the
calls whose name does not end in ‘‘printf” or ‘‘scanf” (line 2). At
authority of the page, while hi denotes the hubiness. Let i ! j de-
each affected join point, we want to output the relevant associated
note that there is a calling relationship between modules i and j,
context information to a trace file. This context is accessible in the
and let w½i; j be the number of different methods of module j called
advice body through the thisJoinPoint struct, bound to Jp (lines
from within i, i.e., the weight (or importance) of the calling rela-
10 and 16). More specialised context information can also be ob-
tionship. The recursive relation between authority and hubiness
tained and used through bindings such as ReturnType (line 7).
is captured by
Aspicere’s around-advice then enables us to output call- and/or
X
hi ¼ w½i; j  aj ð1Þ return-sequence information to the trace file (lines 9–10 and 15–
i!j 16), which lets us reconstruct the program call tree on which the
X
aj ¼ w½i; j  hi ð2Þ dynamic analysis operates.
i!j Apart from the advice of Fig. 8, we need one almost identical
version of the advice for void-procedures. Robust pointcuts and
The HITS algorithm starts with initializing all h’s and a’s to 1, generic advice allow to instrument the entire Kava system with
and subsequently repeatedly updates the values for all pages using these two concise advices.
formulas (1) and (2). If after each update the values are normalised,
this process is known to converge to stable sets of authorities and 15
Be aware that the example from Fig. 7 uses integer values for calculating the hub
hub values (Kleinberg, 1999). The h and a values are normalised so and authority scores, while the actual implementation uses floats due to the
P P
that i ðhi Þ2 ¼ 1 and i ðai Þ2 ¼ 1. normalisation of the actual scores. Only the final row of values is normalised.
678 B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684

Fig. 8. One of the two applied tracing aspects, i.e., the one aimed at non-void procedures.

Table 2
Results of the webmining technique. We have noted their answers and also have asked if there were
any particular reasons why they believed a certain module to be
# Module Hubiness # Module Hubiness
important, hard to debug or to contain bugs. Subsequently, we
1 e_tdfs_mut1.c 0.814941 9 csroutines.c 0 have presented our results to each of the two developers sepa-
2 tdfs_mut1_form.c 0.45397 10 UW_strncpy.c 0
rately. Afterwards, we have discussed the results with both of them
3 tdfs_bord.c 0.397726 11 td.ec 0
4 tdfs_mut2.c 0.164278 12 cache.c 0 and have highlighted the similarities and differences in their
5 tools.c 0.164278 13 decfties.c 0 comments.
6 io.c 0.12548 14 weglf.c 0 During our discussion with the developers, D1 mentioned #1
7 csrout.c 0.0321257 15 get_request.c 0 (that is, source file number 1 as ranked in Table 2) and #4 as being
8 tarpargeg.c 0
the most essential modules for the TDFS application. #6 and #12
are technically also important, but are not specific to the TDFS
application, as they are used by many other applications of the sys-
4.4. Results tem. D1 was surprised at the fact that #12 was not catalogued as
being more important. #7 and #9 are difficult to debug, but only
Performing the webmining analysis gives us the results as listed minor details changed in these modules in the last 10 years. D2
in Table 2. The results are ranked according to the hubiness values, clearly ranks #1 as being the most important and most compli-
found in the third and sixth column, from high to low. Hubiness cated module: it contains most of the business logic. #4 makes a
values lie in the range ½0; 1. Some important facts that can be de- summary of the operations carried out by #1 and checks the results
rived from Table 2 are generated by it. #2 is mainly responsible for interaction with the
end-user, while #3 is concerned with formatting the output. As
 Module e_tdfs_mut1.c stands out with a high hubiness score. such, the opinions of D1 and D2 are indeed very similar, and sup-
 Only 7 of the 15 modules have a value greater than zero. Mod- port our own results. Furthermore, all modules specific to this
ules with a value of zero do not call other modules. application are ranked at the very top.
 The four modules that are specific to the TDFS application (as A last remark on one of the drawbacks of the webmining tech-
can be seen from their names) show up in the four highest nique: container classes or modules are often ranked very low, be-
ranked places. cause of the fact that their export coupling is low (Zaidman et al.,
2005; Zaidman and Demeyer, in press) (i.e., they do not call many
procedures in other modules). This fact partly explains why #12, a
4.5. Validation caching data-structure which was expected to rank higher accord-
ing to D1 , is placed quite low.
We now cover the results that we have obtained from applying To summarise, we can say that we have a good indication that
the key class identification technique to the subject software sys- the key class identification technique also works in a non-object-
tem. For the specific execution scenario that we have investigated, oriented environment. For our case study, the technique ranked
namely TDFS, two developers, D1 and D2 , were responsible at Kava. the most important modules at the very top and as such identified
Both have thorough and active knowledge of the structure and the these top-ranked modules as being ‘‘important”. Furthermore, the
inner workings of this particular application. We present the feed- original developers who cooperated in our study confirmed that
back that these two developers had when presenting them our we have indeed retrieved the most important modules, making
results. this technique very suitable for helping novice developers find
Through our experiment, we have established that the TDFS their way in large-scale software systems.
application consists of 15 modules. We have presented the devel-
opers with a schema consisting of each of these modules, and have 5. Obstacles when applying AOP in legacy E-type systems
asked them the following three questions:
Having discussed the results of the analysis of the experiment,
(1) Which module is the most essential? we now shift our focus to our observations of obstacles encoun-
(2) Which module tends to contain the most bugs? tered when applying AOP to a legacy system. We consider three
(3) Which module is the hardest to debug? obstacles: physical integration of Aspicere1 into the Kava build
B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684 679

Fig. 9. Original makefile.

Fig. 11. Original esql makefile.

Fig. 10. Adapted makefile.

system, problems with providing developers with the right notion


of modularity and issues caused by K&R C code.

5.1. Integration into the build process

The first step in adding AOP to a legacy software system is to ex-


tend its build system such that aspects get woven into the final
product. As Aspicere1 is a preprocessor-style weaver, this implies Fig. 12. Adapted esql makefile.

changing the compile cycle to


is not feasible either. Real tool support for adapting makefiles is
(1) Preprocess needed, and is something we have been focusing on as a result of
(2) Weave this experiment. MAKAO17 (Adams et al., 2007) is a re(verse)-engi-
(3) Compile neering framework for build systems, which enables visualisation,
(4) Link querying, filtering, verification and re-engineering of the build
dependency graph. The makefile re-engineering support is itself
An actual example of this modification from the experiment is based on AOP and facilitates exploiting context information to devel-
shown in Figs. 9 and 10. op robust makefile re-engineerings. Unfortunately, MAKAO was not
With regard to the experiment, the Kava applications use make around at the time of the case study. A regular expression trans-
(Feldman, 1979) to automate the build process. Historically, all 269 former has been used instead, but the lack of context for the refact-
makefiles have been hand-written by several developers, not al- orings required us to manually verify and fix all changes.
ways using the same coding-conventions. During a recent migra-
tion operation from UnixWare to Linux, a significant number of 5.2. Linking aspects into the system
makefiles have been generated with the help of automake16, but
not all. This means that the structure of the makefiles remains het- Aspicere1 conceptually transforms aspects into C compilation
erogeneous, a common attribute of legacy systems. This heterogene- units. This involves transforming the advice constructs into C pro-
ity makes it hard to fully automate the modification of the build cedures (the so-called ‘‘advice instances”). All advised procedure
system as depicted in Figs. 9 and 10, because the existence of certain calls in the base code are replaced by (indirect) calls to the right ad-
environment variables is not ensured, invocation of compilers can vice instance. After the transformed aspect module is compiled, it
happen in various ways, etc. This becomes apparent when, as in should be linked with every object file in which advised join points
the case of our experiment, embedded sql preprocessing needs to reside. As a bonus, one can share static and global state between all
be done (see Figs. 11 and 12). In line 5 of Fig. 12, e.g., the C_INCLUDE advice instances, e.g., the file pointer to which we send the tracing
environment variable is assumed to point to the right location of data.
header files, whereas the original invocation of esql in line 2 of This linking is, again, problematic due to the complexity and
Fig. 11 has to be transformed into a corresponding invocation of heterogeneity of the build system. As the experiment precludes
the C compiler on the second-before-last line of Fig. 12. Context-spe- any knowledge of the legacy system, and as we had no tool support
cific makefile changes are required. for automating modifications of the build system, the linking as de-
An alternative to these makefile changes would be to re-route scribed above was simply not feasible. More in particular, the map-
$(CC) and $(ESQL) to custom shell scripts (‘‘wrappers”) which exe- ping of source code components on build system units could not be
cute Aspicere in the right way before invocation of the real $(CC) determined. Conceptually, aspect developers reason in terms of
and $(ESQL) compilers. This too is problematic as the heterogeneity one software system, whereas in the build system the software
of the makefiles does not guarantee that in all cases there is a di- has been split across multiple libraries and executables. A static
rect use of these commands, which is a typical problem with wrap- weaver such as Aspicere1 needs to process these build components
pers. Even if the wrappers are applied consistently, some tools, e.g., in such a way that the aspect developer’s notion of modular rea-
the $(ESQL) compiler, internally invoke the original C compiler. As soning is still valid (Kiczales and Mezini, 2005; Griswold et al.,
this one is replaced by a wrapper, Aspicere1 eventually would be 2006). As we did not know which build units were part of the build
executed twice instead of once. In the end, a wrapper approach system, nor how they were mapped on the source code, we could
not optimally exploit Aspicere1’s provisions (transformed aspects)
for safeguarding the source code modularity.
16
Automake is a tool that automatically generates makefiles starting from
configuration files. Each generated makefile complies to the GNU Make standards
17
and coding style. See http://sources.redhat.com/automake/. http://users.ugent.be/badams/makao/.
680 B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684

To circumvent this, we modified the weaver to insert the advice Even the pitfalls in the concrete realisation with Aspicere’s weaver
instances of a particular aspect into each advised base module at a in the Kava case do not endanger the correctness of the program.
time (as needed). This prevents the linking problems, but this also We discussed this in Section 5.
means that one can no longer share aspect state. This is the reason
why each advice instance manages its own file pointer (see line 6 6.2. Internal validity
of Fig. 8), which results in spurious opening, flushing and closing of
the actual trace file. This is a clear example of a situation where While the developer feedback provides a valid explanation for
limitations in the build system have an impact on the source code. the results obtained, we should be aware that the developers are
In the meantime, we have developed a link-time weaver for subjective in the sense that each developer has most knowledge
Aspicere (‘‘Aspicere2”) (Adams and Schutter, 2007), which operates of his or her own set of modules. As a countermeasure we proposed
on the compiled source code during linking. Applications which to have an open and honest discussion with the developers. Fur-
span across multiple libraries and executables still require knowl- thermore, we minimise any bias by interviewing the two develop-
edge of the build architecture, but because libraries and executa- ers separately and by holding an open discussion with both
bles reside at a slightly higher level than object files, integration afterwards.
of Aspicere2 into a build system is easier. This is especially true To avoid that faults in our webmining tool chain might explain
if a tool such as MAKAO (Adams et al., 2007) is used to discover the results of the case study, we thoroughly tested the tools. We
the build system’s architecture. also point to the fact that we obtained similarly good results for
two other case studies, which were also subjected to a thorough
5.3. Coverage of non-ANSI C language features validation phase (Zaidman et al., 2005; Zaidman and Demeyer, in
press).
As Kava’s system is a mix of ANSI and K&R style C code, Aspi-
cere1 has to take care that it covers deviations between these dia- 6.3. External validity
lects. As an example, procedure declarations with an empty
argument list are allowed in K&R style C, whereas they are not in The problems with integrating our AOP tool into the build sys-
ANSI C. Actual declaration of the arguments is then postponed to tem that we describe all stem from a single case study, namely the
the corresponding procedure definitions. The type inferencing re- Kava system. In this context, we acknowledge that additional
quired to handle this in the weaver is rather complex, and was experiments are necessary to see whether the problems that we
therefore not fully integrated into Aspicere1 by the start of the describe can be generalised to other systems. Nevertheless, we
experiment. As a result, some join points were skipped, introduc- do think that the problems that we have encountered are common
ing some errors in our measurements. To be more precise, we have to many (industrial) systems that have undergone years of evolu-
advised 367 files, of which 125 contained skipped join points. Of tion, a claim which is supported by Adams et al. (2007, 2008)
the 57,015 discovered join points, only 2362 were filtered out, or and Kellens et al. (in press).
a minor 4%. Random screenings of the code found that calls to This is the first application of our webmining approach on a
the same small group of procedures were responsible for the procedural system. There are enough similarities between proce-
skipped join points. These procedures turned out to be of very dural programming and object-oriented (OO) programming to be-
low level, not part of the business logic, and therefore ignoring lieve that the approach works on both types of programming
them did not impact the analysis. languages. In particular, the concept of coupling—the basic underly-
ing metric of our approach—is present in both types of systems. We
5.4. Conclusion acknowledge, however, that more case studies are needed in order
to come to a more firm conclusion on the applicability of our ap-
To summarise, the application of aspects in the source code for proach on systems written in procedural programming languages.
reverse engineering and re-engineering of a legacy system entails
more than just adding source code. The new tools, i.e. the aspect
weaver, need to be integrated into the existing development envi- 7. Related work
ronment. Even without the demand for fast incremental weaving
or IDE support, this proves difficult. The major cause of these prob- Large-scale industrial experience reports on the use of dynamic
lems is the gap between the notion of modularity in the source analysis are scarce. Therefore, this related work section not only
code, and the one supported by legacy build systems. Bridging this discusses a number of relevant dynamic analysis techniques, but
gap is hampered by the lack of understanding of the build system. also touches upon a number of industrial reverse engineering
Tool support is needed to deal with this (Adams, 2008; Adams experience reports using static analysis.
et al., 2007).
7.1. Static analysis based industrial experiences

6. Threats to validity Moise and Wong describe their experiences with extracting
knowledge from C/C++ systems in (Moise and Wong, 2003). They
This section discusses threats to the validity of our approach use Rigi as a fact extractor for the C/C++ source code and focus
and results. We consider construct, internal and external validity. on providing different decomposition approaches of the system,
trying to satisfy different understanding needs. Because of the
6.1. Construct validity large-scale industrial case study they are working on, they are very
much concerned with the scalability of their tool chain. The reverse
In order to apply the webmining approach, we need a trace of a engineering efforts that were performed at the industrial partner
concrete scenario of the TDFS application. The aspect that we used made it possible to speed up a number of re-engineering tasks,
for tracing TDFS (Fig. 8) is a simple tracing aspect. We have thor- e.g., the decomposition of a library could be done in a single day,
oughly tested and compared the output of a vanilla version and while previously this task took 3 days.
an instrumented version of TDFS to be sure that the aspect that In (Wong et al., 1995), Wong et al. describe their experiences
we introduce does not change the outcome of the computations. with re-documenting industrial legacy applications with the help
B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684 681

of their Rigi static reverse engineering environment. They have ap- traces, again to reduce the trace size and focus on the high-level
plied Rigi on COBOL, C and PL/AS18 systems. The PL/AS experiment behavior when trying to understand a piece of software (Hamou-
described in (Wong et al., 1995) exhibits a close resemblance with Lhadj and Lethbridge, 2006). Also the work of Zaidman and
our own experiments, as the goals and setting were very similar: a Demeyer proposes to abstract traces using a heuristic based on
large-scale industrial legacy application with 2 MLOC and 1300 com- the relative frequency of execution of events in an execution trace
pilation units (here not in C, but in a proprietary language). Because (Zaidman and Demeyer, 2004). Cornelissen et al. have experi-
of the large scale of the application, they have also focused on deliv- mented with filtering traces in order to make the resulting se-
ering scalable reverse engineering techniques. One of the most sig- quence diagrams more legible (Cornelissen et al., 2007). Walker
nificant lessons they have learned from their experiments is that et al. (1998) present a dynamic analysis based technique that is
in the large design documents describing the architecture of the soft- closely related to Murphy and Notkin’s Software Reflexion Models
ware system’s current state can be very beneficial for building up (Murphy and Notkin, 1997). Walker et al. start from the high-level
understanding of a software system and maintaining it. As such their notions that a software engineer has of the system and add to that
goal is very similar to ours. knowledge by presenting more detailed information of the sys-
Another study by Moise and Wong highlights the fact that re- tem’s behavior through dynamic analysis.
verse engineering Java, C, C++, C#, Cobol systems, etc. is sometimes Industrial experiences. Literature reporting on industrial experi-
not enough (Moise and Wong, 2006). Often, (implicit) knowledge is ences with reverse engineering through dynamic analysis is scarce
present in programs that are written in a variety of ‘‘scripting lan- and most newly proposed techniques are only applied to academic
guages”, such as Perl. Sometimes these Perl programs are stand- examples or medium-scale open source software projects. A nota-
alone, but often, these programs written in scripting languages ble exception is the recent work by Callo Arias et al., who report on
are gluing together large-scale industrial systems, written in a vari- the application of dynamic analysis in a large and complex indus-
ety of programming languages, making it worthwhile to also re- trial environment (Callo Arias et al., in press). They propose to use a
verse engineer the knowledge contained in these scripting top-down approach that concentrates on execution scenarios,
language-systems. This acknowledges our problems with Kava’s components and processes rather than on code artifacts such as
build scripts. modules, classes and objects when decomposing and understand-
Software architecture recovery has been subject to a lot of re- ing the system. They have applied their approach on the analysis
search before, e.g., of the Linux kernel Bowman et al. (1999) and of the software of an MRI scanner, which enabled them to identify
Mozilla Godfrey and Lee (2000). Most of them leverage static anal- the dependencies in the execution of the software subsystems.
ysis to accomplish their task on a repository of facts extracted from For a more detailed overview of work in the field of dynamic
source code, linked object files, etc. Combining various approaches analysis, we refer to Zaidman (2006) and Greevy (2007).
has led, e.g., to the Portable BookShelf (Finnigan et al., 2002). We
believe that the results of our dynamic analyses are complimentary 7.3. Maintaining build systems
to these efforts and enhance them to get a more complete, dynamic
view on a (legacy) software system. There are a limited number of publications on the reverse- and
Other related works are that of Riva (Riva, 2000), who provides re-engineering of build systems. We consider research efforts in
an industrial experience report of reverse architecting software for the areas of software reverse- and re-engineering, and specific
mobile phones, and that of Linos et al. (2003), who concentrate build development tools.
on understanding multi-language program dependencies. In the reverse engineering community, Tu and Godfrey (2001)
add a ‘‘build time architectural view” (BTV) to Krüchten’s ‘‘4 + 1”
7.2. Dynamic analysis View model. A BTV visualises the extracted high-level architecture
of a build system. The BTV Toolkit uses the grok tool (Müller et al.,
There has recently been a renewed research interest in the area 1993) to abstract up from low-level facts generated by an instru-
of reverse engineering with the help of dynamic analysis. We now mented version of ‘‘make”. The current prototype only extracts
provide some highlights of recent relevant work in this area. build-time facts, although conceptually build time views also take
Feature localisation. Feature localisation is concerned with iden- source code into account. De Jonge (2003) has tried to remodula-
tifying those parts of source code that are responsible for executing rise a build system to synchronise its structure with that of the
a feature, whereby a feature is defined as a unit of human-observa- source code. This is needed to obtain effective reuse and recompo-
ble computer-action. In this context, Greevy et al. use two comple- sition of source code, as each software component should have its
mentary perspectives to correlate features to source code and vice own piece of build system. Re-engineering a build system for
versa (Greevy and Ducasse, 2005). In (Kuhn and Greevy, 2006), speeding up has been studied by Fard et al. (2005) and Ammons
Kuhn and Greevy exploit the analogy between traces and signal (2006). Fard et al. speed up a build by restructuring the included
processing to visualise traces and identify (common) features in dependencies of the C/C++ source code according to a ‘‘reflexion
a set of traces. Eisenbarth et al. use formal concept analysis to cor- model”. Ammons tries to ensure that incremental compilation is
relate features with source code (Eisenbarth et al., 2003). In partic- correct by semi-automatically partitioning a build into parallel
ular, they apply their technique on an industrial case study of mini-builds which are constrained to a sandbox. Each mini-build
500 kSLOC (SLOC, non-commented lines of code) written in C can only access its declared build dependencies. Neither the ap-
(Eisenbarth et al., 2002). However, they do not report on the tech- proach of Fard et al. nor the one of Ammons can be generalised
nicalities of the tracing process. to other build maintenance problems. Kellens et al. (in press) have
Abstraction techniques. Hamou-Lhadj et al. have presented a recently encountered build system problems when integrating the
technique that detects (and eliminates) low-level (function) calls AspectJ weaver into an industrial build system based on Ant. Even
from traces (Hamou-Lhadj et al., 2005); their technique allows to though Ant is a much more recent build system technology than
focus on the more high-level behavior of the application under ‘‘make” (Feldman, 1979), it was not able to cope with the whole-
study. In another work, Hamou-Lhadj and Lethbridge propose to program view AOP requires. Finally, there are build tools which
apply automatic text summarisation techniques to execution make it easier to understand and re-engineer a build system. Re-
make is an improved GNU Make with tracing capabilities and a
debugger. One can set breakpoints, step through the build and
18
Programming Language/Advanced Systems (IBM). evaluate expressions. Another debugger called ‘‘gmd” is imple-
682 B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684

mented completely using ‘‘make” macros. Tools such as Antelope, Adams, Bram, De Schutter, Kris, 2007. An aspect for idiom-based exception handling
(using local continuation join points, join point properties, annotations and type
AntExplorer and Openmake Build Monitor allow live visualisation
parameters). In: Proceedings of the Software-Engineering Properties of
of build runs. Makeppgraph creates a build dependency graph in Languages and Aspect Technologies Workshop (SPLAT), co-located with AOSD,
which colors are determined by file extensions. Vizant is a similar 2007.
tool for Ant files. In a legacy context, however, it is very hard to mi- Adams, Bram, De Schutter, Kris, Tromp, Herman, De Meuter, Wolfgang, 2007.
Design recovery and maintenance of build systems. In: Proceedings of the
grate from the existing (legacy) build system technology to a more International Conference on Software Maintenance (ICSM), Washington, DC,
advanced one. USA. IEEE Computer Society, pp. 114–123.
Ammons, Glenn, 2006. Grexmk speeding up scripted builds. In: Proceedings of the
International Workshop on Dynamic Systems Analysis WODA. ACM Press, New
8. Conclusion York, USA, pp. 81–87.
Bennett, Keith H., 1995. Legacy systems: coping with success. IEEE Software 12 (1),
19–23.
This paper reports on the use of aspect-oriented programming
Biermann, Alan W., Feldman, Jerome A., 1972. On the synthesis of finite-state
(AOP) to aid in the reverse engineering effort which precedes re- machines from samples of their behavior. IEEE Transactions on Computers 21
engineering activities. As part of the ARRIBA project, our focus is (6), 592–597.
Bowman, Ivan T., Holt, Richard C., Brewster, Neil V., 1999. Linux as a case study: its
on legacy industrial systems written in C. We have therefore first
extracted software architecture. In: Proceedings of the International Conference
undertaken a survey of the existing AOP languages and tools for on Software Engineering (ICSE), Los Alamitos, CA, USA. IEEE Computer Society,
C. Ultimately none of the surveyed tools fit our re-engineering pp. 555–563.
requirements, leading us to design and implement our own AOP Brichau, Johan, Mens, Kim, Volder, Kris De, 2002. Building composable aspect-
specific languages with logic metaprogramming. In: Proceedings of the ACM
solution for C, named Aspicere. Aspicere targets legacy environ- SIGPLAN/SIGSOFT Conference on Generative Programming and Component
ments, has an expressive pointcut language based on logic meta- Engineering (GPCE), London, UK. Springer-Verlag, pp. 110–127.
programming (LMP) and features generic advice with access to join Brodie, Michael, Stonebraker, Michael, 1995. Migrating Legacy Systems: Gateways,
Interfaces & The Incremental Approach. Morgan Kaufmann, San Francisco, CA,
point metadata. USA.
We reported on the application of dynamic analysis with the Bruntink, Magiel, van Deursen, Arie, Tourwé, Tom, van Engelen, Remco, 2004. An
help of our aspect tool for reverse engineering a legacy E-type sys- evaluation of clone detection techniques for identifying crosscutting concerns.
In: Proceedings of the International Conference on Software Maintenance
tem. Using dynamic analysis allowed us to follow a goal-oriented (ICSM), Washington, DC, USA. IEEE Computer Society, pp. 200–209.
strategy, i.e., it allowed us to analyze only specific parts of the sys- Bruntink, Magiel, van Deursen, Arie, Tourwé, Tom, 2004. An initial experiment in
tem according to a scenario, which is a benefit when working with reverse engineering aspects. In: Proceedings of the Working Conference on
Reverse Engineering (WCRE), Washington, DC, USA. IEEE Computer Society, pp.
large-scale applications. The analysis technique of choice, the key
306–307.
module detection technique, enabled us to identify those modules Bruntink, Magiel, van Deursen, Arie, Tourwé, Tom, 2005. Isolating idiomatic
that should be investigated first when trying to understand the crosscutting concerns. In: Proceedings of the International Conference on
Software Maintenance (ICSM), Washington, DC, USA. IEEE Computer Society,
application. We validated that our solution does indeed identify
pp. 37–46.
those need-to-be-understood modules, with the input of the origi- Bruntink, Magiel, van Deursen, Arie, Tourwé, Tom, 2006. Discovering faults in
nal developers. AOP was used to collect the necessary trace infor- idiom-based exception handling. In: Proceeding of the International Conference
mation in a modular and non-invasive way. on Software engineering (ICSE), New York, NY, USA. ACM Press, pp. 242–251.
Bruntink, Magiel, van Deursen, Arie, D’Hondt, Maja, Tourwé, Tom, 2007. Simple
While the AOP solution proved to work very well and is concep- crosscutting concerns are not so simple: analysing variability in large-scale
tually very clean, it comes with a major quid pro quo, namely that idioms-based implementations. In: Proceedings of the International Conference
the integration of an aspect solution in a legacy build environment on Aspect-Oriented Software Development (AOSD), New York, NY, USA. ACM
Press, pp. 199–211.
can prove troublesome. Firstly, the reason for this is the fact that Callo Arias, Trosky B., Avgeriou, Paris, America, Pierre, in press. Analyzing the actual
the modular reasoning of traditional build systems conflicts with execution of a large software-intensive system for dependencies. In:
the non-hierarchical nature of aspect orientation. Secondly, the Proceedings of the Working Conference on Reverse Engineering (WCRE),
Washington, DC, USA. IEEE Computer Society, pp. 49–58.
build system proved to be a legacy itself, which made it difficult Cantrill, Bryan, Shapiro, Michael W., Leventhal, Adam H., 2004. Dynamic
to automate any adaptation of it. This indicates that the build sys- instrumentation of production systems. In: USENIX Annual Technical
tem itself is also in need of a re-engineering operation. Conference, Berkeley, CA, USA. USENIX Association, pp. 15–28.
Carver, Doris L., Montes de Oca, Carlos, 1998. Identification of data cohesive
Overall, we can say that the dynamic analysis proved useful for
subsystems using data mining techniques. In: Proceedings of the International
detecting the most important modules to be used during early pro- Conference on Software Maintenance (ICSM), Washington, DC, USA. IEEE
gram comprehension. AOP enabled us to obtain the necessary trace Computer Society, pp. 16–23.
Chikofsky, Elliot J., Cross II, James H., 1990. Reverse engineering and design
information in a clean way. Yet, when integrating AOP into the
recovery: a taxonomy. IEEE Software 7 (1), 13–17.
build system of the legacy system, we found that the build system Coady, Yvonne, Kiczales, Gregor, 2003. Back to the future: a retroactive study of
itself could benefit from a re-engineering step. We expect this sit- aspect evolution in operating system code. In: Proceedings of the International
uation to be common in legacy environments. Conference on Aspect-Oriented Software Development Conference (AOSD),
New York, NY, USA. ACM Press, pp. 50–59.
Coady, Yvonne, Kiczales, Gregor, Feeley, Mike, Smolyn, Greg, 2001. Using AspectC to
Acknowledgements improve the modularity of path-specific customization in operating system
code. SIGSOFT Software Engineering Notes 26 (5), 88–98.
Colyer, Adrian, Clement, Andrew, 2004. Large-scale AOSD for middleware. In:
We would like to thank Kava for their cooperation and very Proceedings of the International Conference on Aspect-Oriented Software
generous support. Kris De Schutter and Andy Zaidman received Development (AOSD), New York, NY, USA. ACM Press, pp. 56–65.
Corbi, Thomas A., 1990. Program understanding: challenge for the 90s. IBM Systems
support within the Belgian research project ARRIBA, sponsored Journal 28 (2), 294–306.
by the IWT, Flanders. Kris De Schutter also received support from Cornelissen, Bas, van Deursen, Arie, Moonen, Leon, Zaidman, Andy, 2007.
the Belgian research project AspectLab, sponsored by the IWT, Visualizing testsuites to aid in software understanding. In: Proceedings of the
Conference on Software Maintenance and Reengineering (CSMR), Washington,
Flanders. Bram Adams is supported by a BOF grant from Ghent Uni- DC, USA. IEEE Computer Society, pp. 213–222.
versity. Further support came from the Dutch NWO Jacquard de Jonge Merijn. To reuse or to be reused: techniques for component composition
Reconstructor and the MoVES project (Interuniversity Attraction and construction. PhD Thesis, University of Amsterdam, 2003.
Demeyer, Serge, Ducasse, Stéphane, Nierstrasz, Oscar, 2003. Object-oriented
Poles Programme Belgian State, Belgian Science Policy).
Reengineering Patterns. Morgan Kaufmann, San Francisco, CA, USA.
De Roover, Coen, Michiels, Isabel, Gybels, Kim, Gybels, Kris, D’Hondt, Theo, 2006. An
References approach to high-level behavioral program documentation allowing
lightweight verification. In: Proceedings of the International Conference on
Adams, Bram, 2008. The impact of co-evolution between source code and build Program Comprehension (ICPC), Washington, DC, USA. IEEE Computer Society,
system on crosscutting concerns. Ph.D. Thesis, Ghent University, Belgium. pp. 202–211.
B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684 683

De Schutter, 2006. Aspect oriented Revitalisation of legacy software through logic Kleinberg, Jon M., 1999. Authoritative sources in a hyperlinked environment.
meta-programming. PhD Thesis, Ghent University, Ghent, Belgium, 2006. Journal of the ACM Press 46 (5), 604–632.
De Volder, Kris, D’Hondt, Theo, 1999. Aspect-Orientated logic meta programming. Kuhn, Adrian, Greevy, Orla, 2006. Exploiting the analogy between traces and signal
In: Proceedings of the Second International Conference on Meta-Level processing. In: Proceedings of the International Conference on Software
Architectures and Reflection (Reflection), London, UK. Springer-Verlag, pp. Maintenance (ICSM), Washington, DC, USA. IEEE Computer Society, pp. 320–
250–272. 329.
Douence, Rémi, Fritz, Thomas, Loriant, Nicolas, Menaud, Jean-Marc, Ségura- Lämmel, Ralf, De Schutter, Kris, 2005. What does Aspect-Oriented Programming
Devillechaise, Marc, Südholt, Mario, 2005. An expressive aspect language for mean to Cobol? In: Proceedings of the International Conference on Aspect-
system applications with Arachne. In: Proceedings of the International Oriented Software Development (AOSD), New York, NY, USA. ACM Press, pp.
Conference on Aspect Oriented Software Development(AOSD), New York, NY, 99–100.
USA. ACM Press, pp. 27–38. Lehman, Manny M., 1996. Laws of software evolution revisited. In: European
Ducasse, Stéphane, Girba, Tudor, Wuyts, Roel, 2006. Object-oriented legacy system Workshop on Software Process Technology (EWSPT). Lecture Notes in
trace-based logic testing. In: Proceedings of the Conference on Software Computer Science, vol. 1149. Springer-Verlag, London, UK, pp. 108–124.
Maintenance and Reengineering (CSMR), Washington, DC, USA. IEEE Lehman, Manny M., 1998. Software’s future: managing evolution. IEEE Software 15
Computer Society, pp. 37–46. (1), 40–44.
Durr, Pascal, Gulesir, Gurcan, Bergmans, Lodewijk, Aksit, Mehmet, van Engelen, Lehman, Manny M., Belady, Laszlo A. (Eds.), 1985. Program Evolution: Processes of
Remco, 2006. Applying AOP in an industrial context. In: Workshop on Best Software Change. Academic Press Professional Inc., San Diego, CA, USA.
Practices in Applying Aspect-Oriented Software Development. <http:// Linos, Panagiotis K., hong Chen, Zhi, Berrier, Seth, Linos, Panagiotis K., O’Rourke,
www.aosd.net/workshops/bpaosd/2006/papers/BPAOSD2006Aksit.pdf> (last Brian, 2003. A tool for understanding multi-language program dependencies.
visited: September 18th, 2008). Proceedings of the International Workshop on Program Comprehension (IWPC).
Eisenbarth, Thomas, Koschke, Rainer, Simon, Daniel, 2002. Incremental location of IEEE Computer Society, Washington, DC, USA, pp. 64–72.
combined features for large-scale programs. In: Proceedings of the International Lohmann, Daniel, Blaschke, Georg, Spinczyk, Olaf, 2004. Generic advice: on the
Conference on Software Maintenance (ICSM), Washington, DC, USA. IEEE combination of AOP with generative programming in AspectC++. In:
Computer Society, pp. 273–282. Proceedings of the International Conference on Generative Programming and
Eisenbarth, Thomas, Koschke, Rainer, Simon, Daniel, 2003. Locating features in Component Engineering (GPCE). Lecture Notes in Computer Science, vol. 3286.
source code. IEEE Transactions on Software Engineering 29 (3), 210– Springer-Verlag, pp. 55–74.
224. Loriant, Nicolas, Ségura-Devillechaise, Marc, Fritz, Thomas, Menaud, Jean-Marc,
Engel, Michael, Freisleben, Bernd, 2005. Supporting autonomic computing 2006. A reflexive extension to Arachne’s aspect language. In: Proceedings of
functionality via dynamic operating system kernel aspects. In: Proceedings of 2006 AOSD workshop on Open and Dynamic Aspect Languages (ODAL’O6), Bonn,
the International Conference on Aspect Oriented Software Development Germany. <http://www.emn.fr/x-info/nloriant/papers/loriant.odal06. pdf> (last
Conference (AOSD), New York, NY, USA. ACM Press, pp. 51–62. visited: September 18th, 2008).
Fard, Homayoun D., Yu, Yijun, Mylopoulos, John, Andritsos, Periklis, 2005. Nordberg III, Martin E., 2001. Aspect-oriented dependency inversion. In:
Improving the build architecture of legacy C/C++ software systems. In: Proceedings of the Workshop on Advanced Separation of Concerns in Object-
Fundamental Approaches in Software Engineering (FASE). Lecture Notes in Oriented Systems, co-located with OOPSLA. <http://www.cs.ubc.ca/kdvolder/
Computer Science, 3442. Springer, Berlin/Heidelberg, Germany, pp. 96–110. Workshops/OOPSLA2001/submissions/12-nordberg.pdf> (last visited:
Feldman, Stuart I., 1979. Make – a program for maintaining computer programs. September 18th, 2008).
Software – Practice and Experience 9 (4), 255–265. Mens, Kim, Tourwé, Tom, 2008. Evolution issues in aspect-oriented programming.
Finnigan, Patrick, Holt, Richard C., Kallas, Ivan, Kerr, Scott, Kontogiannis, Kostas, In: Mens, Totm, Demeyer, Serge (Eds.), Software Evolution. Springer-Verlag,
Müller, Hausi A., Mylopoulos, John, Perelgut, Stephen G., Stanley, Martin, Wong, Berlin/Heidelberg, Germany, pp. 203–232 (Chapter 9).
Kenny, 2002. Advances in Software Engineering. The Software Bookshelf. Moise, Daniel L., Wong, Kenny, 2003. An industrial experience in reverse
Springer, Berlin/Heidelberg, Germany. pp. 295–339. engineering. In: Proceedings of the Working Conference on Reverse
Godfrey, Michael W., Lee, Eric H.S., 2000. Secrets from the monster: extracting Engineering (WCRE), Washington, DC, USA. IEEE Computer Society, pp. 275–
mozilla’s software architecture. In: Symposium on Constructing Software 284.
Engineering Tools (CoSET). <http://plg.uwaterloo.ca/migod/papers/2000/ Moise, Daniel L., Wong, Kenny, 2006. Extracting facts from perl code. In:
coset00.pdf> (last visited: September 18th, 2008). Proceedings of the Working Conference on Reverse Engineering (WCRE),
Greevy, Orla, 2007. Enriching reverse engineering with feature analysis. PhD Thesis, Washington, DC, USA. IEEE Computer Society, pp. 243–252.
University of Berne, Switzerland, 2007. Müller, Hausi A., Tilley, Scott R., Wong, Kenny, 1993. Understanding software
Greevy, Orla, Ducasse, Stéphane, 2005. Correlating features and code using a systems using reverse engineering technology perspectives from the rigi
compact two-sided trace analysis approach. In: Proceedings of the Conference project. In: Proceedings of the Conference of the Centre for Advanced Studies
on Software Maintenance and Reengineering (CSMR), Washington, DC, USA. on Collaborative research (CASCON). IBM Press, pp. 217–226.
IEEE Computer Society, pp. 314–323. Murphy, Gail C., Notkin, David, 1997. Reengineering with reflection models: a case
Griswold, William G., Sullivan, Kevin, Song, Yuanyuan, Shonle, Macneil, Tewari, study. IEEE Computer 30 (8), 29–36.
Nishit, Cai, Yuanfang, Rajan, Hridesh, 2006. Modular software design with Nágy, Istvan, van Engelen, Remco, van der Ploeg, Durk, 2007. Ideals: evolvability of
crosscutting interfaces. IEEE Software 23 (1), 51–60. software-intensive high-tech systems. An overview of Mirjam and WeaveC.
Gybels, Kris, Brichau, Johan, 2003. Arranging language features for more robust Embedded Systems Institute, TU/e Campus, Eindhoven, The Netherlands,
pattern-based crosscuts. In: Proceedings of the International Conference on December. <http://www.esi.nl/publications/idealsBook.pdf> (last visited:
Aspect Oriented Software Development (AOSD), New York, NY, USA. ACM Press, September 18th, 2008), pp. 69–86.
pp. 60–69. Ostermann, Klaus, Mezini, Mira, Bockisch, Christophe, 2005. Expressive pointcuts
Hamou-Lhadj, Abdelwahab, Lethbridge, Timothy, 2006. Summarizing the content of for increased modularity. In: European Conference on Object-Oriented
large traces to facilitate the understanding of the behaviour of a software Programming (ECOOP). Lecture Notes in Computer Science, vol. 3586.
system. In: Proceedings of the International Conference on Program Springer-Verlag, Berlin/Heidelberg, Germany, pp. 214–240.
Comprehension (ICPC), Washington, DC, USA. IEEE Computer Society, pp. Popovici, Andrei, Gross, Thomas, Alonso, Gustavo, 2002. Dynamic weaving for
181–190. aspect-oriented programming. In: Proceedings of the International Conference
Hamou-Lhadj, Abdelwahab, Braun, Edna, Amyot, Danien, Lethbridge, Timothy, on Aspect-Oriented Software Development New York, NY, USA. ACM Press, pp.
2005. Recovering behavioral design models from execution traces. In: 141–147.
Proceedings of the Conference on Software Maintenance and Reengineering Richner, Tamar, 2002. Recovering behavioral design views: a query-based approach.
(CSMR), Washington, DC, USA. IEEE Computer Society, pp. 112–121. PhD Thesis, University of Berne, Switzerland, 2002.
Hilsdale, Erik, Hugunin, Jim, 2004. Advice weaving in AspectJ. In: Proceedings of the Riva, Claudio, 2000. Reverse architecting: an industrial experience report. In:
International Conference on Aspect-Oriented Software Development (AOSD), Proceedings of the Working Conference on Reverse Engineering (WCRE),
New York, NY, USA. ACM Press, pp. 26–35. Washington, DC, USA. IEEE Computer Society, pp. 42–50.
Kellens, Andy, De Schutter, Kris, D’Hondt, Theo, Jonckers, Viviane, Doggen, Hans, in Robillard, Martin P., 2005. Automatic generation of suggestions for program
press. Experiences in modularizing business rules into aspects. In: Proceedings investigation. SIGSOFT Software Engineering Notes 30 (5), 11–20.
of the International Conference on Software Maintenance (ICSM). IEEE Rohlik, O., Pasetti, A., Cechticky, V., Birrer, I., 2004. Implementing adaptability in
Computer Society, pp. 448–451. embedded software through aspect oriented programming. In: IEEE
Kernighan, Brian, Ritchie, Dennis, 1978. The C Programming Language. Prentice- Mechatronics & Robotics, Sascha Eysoldt Verla, Aachen, Germany, pp. 85–90.
Hall, Upper Saddle River, NJ, USA. Schutter, Kris De, Adams, Bram, 2007. Aspect-orientation for revitalising legacy
Kiczales, Gregor, Mezini, Mira, 2005. Aspect-Oriented Programming and modular business software. Electronic Notes in Theoretical Computer Science 166 (1),
reasoning. In: International Conference on Software Engineering (ICSE), New 63–80.
York, NY, USA. ACM Press, pp. 49–58. Sneed, Harry, 1996. Encapsulating legacy software for use in client/server systems.
Kiczales, Gregor, Hilsdale, Erik, Hugunin, Jim, Kersten, Mik, Palm, Jeffrey, Griswold, In: Proceedings of the Working Conference on Reverse Engineering (WCRE),
William G., 2001. An overview of AspectJ. Lecture Notes in Computer Science Washington, DC, USA. IEEE Computer Society, pp. 104–119.
2072, 327–355. Sneed, Harry, 2004. Program comprehension for the purpose of testing. In:
Kiczales, Gregor, Lamping, John, Menhdhekar, Anurag, Maeda, Chris, Lopes, Cristina, Proceedings of the International Workshop on Program Comprehension
Loingtier, Jean-Marc, Irwin, John, 1997. Aspect-oriented programming. (IWPC), Washington, DC, USA. IEEE Computer Society, pp. 162–171.
European Conference on Object-Oriented Programming ECOOP, vol. 1241. Sneed, Harry, 2005. An incremental approach to system replacement and
Springer, Berlin/Heidelberg, Germany, pp. 220–242. integration. In: Proceedings of the Conference on Software Maintenance and
684 B. Adams et al. / The Journal of Systems and Software 82 (2009) 668–684

Reengineering (CSMR), Washington, DC, USA. IEEE Computer Society, pp. 196– work, he has developed a logic-based aspect language for C (Aspicere) and a
206. re(verse)-engineering framework for build systems (MAKAO).
Spinczyk, Olaf, Lohmann, Daniel, 2007. The design and implementation of
AspectC++. Knowledge-Based Systems 20 (7), 636–651.
Spinczyk, Olaf, Gal, Andreas, Schröder-Preikschat, Wolfgang, 2002. AspectC++: An Kris De Schutter is an active member of the Programming Technology Lab, Vrije
aspect-oriented extension to the C++ programming language. In: Proceedings of Universiteit Brussel. Previously, he was also part of the ‘‘Ghislain Hoffman Software
the International Conference on Tools Pacific. Australian Computer Society Inc, Engineering Lab” at the University of Ghent, and the ‘‘Lab On REengineering” (LORE)
pp. 53–60. at the University of Antwerp. His Ph.D. work was on ”Aspect oriented revitalisation
Srivastava, Amitabh, Eustace, Alan, 1994. ATOM – a system for building customized of legacy software through logic meta-programming”. Currently, his main research
program analysis tools. In: Proceedings of the ACM SIGPLAN Conference on interests are related to (the evolution of) both legacy and new software systems.
Programming Language Design and Implementation (PLDI), New York, NY, USA.
This includes topics such as Cobol and C-based systems, and possible supporting
ACM Press, pp. 196–205.
technologies such as aspects, services and others.
Systä, Tarja, 2000. Static and dynamic reverse engineering techniques for Java. Ph.D.
Thesis, University of Tampere, Finland, 2000.
Tahvildari, Ladan, 2003. Quality-drive object-oriented re-engineering framework.
Andy Zaidman is an assistant professor at the Delft University of Technology, The
Ph.D. Thesis, University of Waterloo, Ontario, Canada, 2003.
Netherlands. He obtained his M.Sc. (2002) and Ph.D. degree (2006) in Computer
Tarr, Peri, Ossher, Harold, Harrison, William, Sutton Jr., Stanley M., 1999. N degrees
Science from the University of Antwerp, Belgium, after which he became a post-
of separation: multi-dimensional separation of concerns. In: Proceedings of the
International Conference on Software engineering (ICSE), Los Angeles, CA, USA. doctoral researcher at the Delft University of Technology. His main research
IEEE Computer Society, pp. 107–119. interests are reverse engineering with the help of dynamic analysis, program
Tu, Qiang, Godfrey, Michael W., 2001. The build-time software architecture view. comprehension, software repository mining and software testing. He is the founder
In: Proceedings of the International Conference on Software Maintenance and organizer of the International Workshop on Program Comprehension through
(ICSM), Washington, DC, USA. IEEE Computer Society, pp. 398–407. Dynamic Analysis (PCODA) series which started in 2005. He was the general chair of
Walker, Robert J., Murphy, Gail C., Freeman-Benson, Bjorn, Wright, Darin, Swanson, the 15th Working Conference on Reverse Engineering (WCRE 2008) held in
Darin, Isaak, Jeremy, 1998. Visualizing dynamic software system information Antwerp, Belgium, and will be program co-chair for WCRE 2009.
through high-level models. In: Proceedings of the Conference on Object-
oriented Programming Systems Languages and Applications (OOPSLA), New
York, NY, USA. ACM Press, pp. 271–283. Serge Demeyer is a professor at the University of Antwerp (Department of Math-
Wong, Kenny, Tilley, Scott R., Müller, Hausi A., Storey, Margaret-Anne D., 1995. ematics and Computer Science) where he leads a research group investigating the
Structural redocumentation: a case study. IEEE Software 12 (1), 46–54. theme of ‘‘Software Reengineering” (LORE - Lab On REengineering). His main
Zaidman, Andy, 2006. Scalability solutions for program comprehension through research interest concerns software engineering (more precisely, reengineering in
dynamic analysis. PhD Thesis, University of Antwerp, Belgium, 2006. an object-oriented context) but due to historical reasons he maintains a heavy
Zaidman, Andy, Demeyer, Serge, 2004. Managing trace data volume through a interest in hypermedia systems as well. He is an active member of the corre-
heuristical clustering process based on event execution frequency. In:
sponding international research communities, serving in various conference orga-
Proceedings of the Conference on Software Maintenance and Reengineering
nizations and program committees. He has written a book entitled ‘‘Object-
(CSMR), Washington, DC, USA. IEEE Computer Society, pp. 329–338.
Oriented Reengineering” and edited a book on ‘‘Software Evolution”.
Zaidman, Andy, Demeyer, Serge, in press. Automatic identification of key classes in
a software system using webmining techniques. Journal of Software
Maintenance and Evolution: Research and Practice.
Herman Tromp was born in Antwerp, Belgium, in 1949. He received degrees in
Zaidman, Andy, Calders, Toon, Demeyer, Serge, Paredaens, Jan, 2005. Applying
electrical engineering (1972), telecommunications engineering (1973) and a Ph.D.
webmining techniques to execution traces to support the program
comprehension process. In: Proceedings of the Conference on Software in engineering (1978), all from Ghent University, Belgium. Currently, he is a pro-
Maintenance and Reengineering (CSMR), Washington, DC, USA. IEEE fessor and doctoral fellow at the Department of Information Technology, Ghent
Computer Society, pp. 134–142. University, Belgium. His research interests include software technology, revitali-
Zaidman, Andy, Adams, Bram, Schutter, Kris De, Demeyer, Serge, Hoffman, Ghislain, sation of legacy systems, related programming language issues and information
De Ruyck, Bern, 2006. Regaining lost knowledge through dynamic analysis and modeling. Herman Tromp is a member of the IEEE Computer Society and of the
Aspect Orientation – an industrial experience report. In: Proceedings of the ACM.
Conference on Software Maintenance and Reengineering (CSMR), Washington,
DC, USA. IEEE Computer Society, pp. 91–102.
Wolfgang De Meuter is a professor at the Vrije Universiteit Brussel, Belgium. He
has been active in the field of object-orientation since the early nineties. His
Bram Adams obtained a Master in Computer Science Engineering (2004) and a
research interests include programming languages and their evaluators, aspect-
Ph.D. in Computer Science (2008) at the GH-SEL lab, Ghent University, Belgium. He
oriented programming, meta-programming and more recently also language con-
is affiliated with the PROG lab at the Vrije Universiteit Brussel, Belgium, and
structs for ambient-oriented systems. He has organized numerous successful
recently joined the SAIL lab at Queen’s University, Canada. Bram’s research focuses
workshops at previous ECOOP’s and OOPSLA’s. His coordinates can be found at
on the co-evolution of source code and the build system in general and its impli-
http://prog.vub.ac.be/~wdmeuter/WolfHome.
cations on the introduction of AOSD technology in legacy systems. To support this

You might also like