Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Moshe Y. Vardi
Rice University, Houston, TX, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Dave Thomas (Ed.)
ECOOP 2006 –
Object-Oriented
Programming
Volume Editor
Dave Thomas
Bedarra Research Labs
1 Stafford Road, Suite 421, Ottawa, Ontario, Canada K2H 1B9
E-mail: dave@bedarra.com
CR Subject Classification (1998): D.1, D.2, D.3, F.3, C.2, K.4, J.1
ISSN 0302-9743
ISBN-10 3-540-35726-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-35726-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 11785477 06/3142 543210
Preface
The 20th Anniversary of ECOOP was held in Nantes, France, July 3-7, 2006. For
20 years ECOOP has been a leading conference in Europe. Each year ECOOP
brings together researchers, graduate students, and software practitioners to
exchange ideas, progress, and challenges. The conference covered the full spectrum
of the field, with many sessions to accommodate the varied interests of the
participants, including outstanding invited speakers, refereed technical papers,
tutorials, workshops, demonstrations, and a poster session.
This year, the Program Committee received 162 submissions, covering the en-
tire spectrum of object-orientation: aspects, modularity and adaptability; archi-
tecture and patterns; components, frameworks and product lines; concurrency,
real-time and embedded; mobility and distribution; collaboration and workflow;
domain-specific, dynamic, multi-paradigm and constraint languages; HCI and user
interfaces; language innovations; compilation and virtual machines; methodology,
process and practices; model engineering, design languages and transformations;
persistence and transactions; theoretical foundations; and tools. The Program
Committee accepted 20 papers for publication after a careful and thorough
reviewing process. Papers were evaluated based on significance, originality, and
soundness.
Eric Jul, Chair of the Selection Committee, presented the Dahl-Nygaard Prize
to Erich Gamma, Richard Helm, Ralph Johnson, and (posthumously) to John
Vlissides, popularly known as the “Gang of Four.” Their significant practical
work on design patterns changed the vocabulary and best practices of software
development.
Erich Gamma, Serge Abiteboul, and Ralph Johnson presented invited talks
which complemented the technical papers. A highlight of this 20th ECOOP was
the special anniversary invited panel, whose members provided their perspectives
on our field.
The invited talks and the special panel papers are published in the proceedings.
ECOOP 2006’s success was due to the dedication of many people. First I
would like to thank the authors for submitting a large number of quality papers.
Selecting the papers to be published from these took a great deal of careful
reviewing and discussion. Secondly, I would like to thank our invited
speakers and panelists for their contributions. They provided a rich context for
discussions and future directions. I would like to thank the members of the
Program Committee for their careful reviews, and for thorough and balanced
discussions during the selection process, which was held February 2–3 in Paris. I
thank the General Chairs of the conference, Pierre Cointe and Jean Bézivin, for
organizing the conference and Antoine Beugnard and Thomas Ledoux, Tutorial
Co-chairs, Charles Consel and Mario Südholt, Workshop Co-chairs, Julien Cohen
and Hervé Grall, Demonstration and Poster Co-chairs. Special thanks to Jean-
François Perrot for organizing the invited 20th anniversary panel.
ECOOP 2006 was organized by the Computer Science Department of the Ecole
des Mines de Nantes, the University of Nantes, the CNRS LINA laboratory
and by INRIA, under the auspices of AITO (Association Internationale pour les
Technologies Objets), and in cooperation with ACM SIGPLAN and SIGSOFT.
Executive Committee
Program Chair
Dave Thomas (Bedarra Research Labs)
Organizing Chairs
Jean Bézivin (University of Nantes-LINA, INRIA)
Pierre Cointe (Ecole des Mines de Nantes-INRIA, LINA)
Organizing Committee
Workshops
Charles Consel (LABRI, INRIA)
Mario Südholt (INRIA-Ecole des Mines de Nantes, LINA)
Tutorials
Antoine Beugnard (ENST Bretagne)
Thomas Ledoux (Ecole des Mines de Nantes-INRIA, LINA)
Webmaster and Registration Chair
Didier Le Botlan (CNRS-Ecole des Mines de Nantes, LINA)
Local Arrangement Chair
Christian Attiogbé (University of Nantes, LINA)
Sponsors and Industrial Relations
Gilles Muller (Ecole des Mines de Nantes-INRIA, LINA)
Treasurer
Catherine de Charette (Ecole des Mines de Nantes)
Publicity Chairs
Olivier Roux (Ecole Centrale de Nantes, IRCCYN)
Nathalie Le Calvez (Ecole des Mines de Nantes)
VIII Organization
Sponsoring Organizations
Silver Sponsors
Bronze Sponsors
Institution Sponsors
Program Committee
Mehmet Akşit (University of Twente, Netherlands)
Jonathan Aldrich (Carnegie Mellon University, USA)
David F. Bacon (IBM Research, USA)
Don Batory (University of Texas at Austin, USA)
Françoise Baude (University of Nice Sophia-Antipolis, France)
Andrew Black (Portland State University, USA)
Gilad Bracha (SUN Java Software, USA)
Luca Cardelli (Microsoft Research, UK)
Craig Chambers (University of Washington, USA)
Shigeru Chiba (Tokyo Institute of Technology, Japan)
William Cook (University of Texas at Austin, USA)
Wolfgang De Meuter (Vrije Universiteit Brussel, Belgium)
Theo D’Hondt (Vrije Universiteit Brussel, Belgium)
Christophe Dony (University of Montpellier, France)
John Duimovich (IBM, Canada)
Erik Ernst (Aarhus University, Denmark)
Michael D. Ernst (Massachusetts Institute of Technology, USA)
Patrick Eugster (Purdue University, USA)
Bjørn Freeman-Benson (Eclipse Foundation, USA)
Bill Harrison (Trinity College, Dublin, Ireland)
Eric Jul (Microsoft Research/Univ. of Copenhagen, Denmark)
Gerti Kappel (Vienna University of Technology, Austria)
Gregor Kiczales (University of British Columbia, Canada)
Karl Lieberherr (Northeastern University, USA)
Boris Magnusson (University of Lund, Sweden)
Jacques Malenfant (University of Paris VI, France)
Erik Meijer (Microsoft, USA)
Mira Mezini (Darmstadt University, Germany)
Birger Møller-Pedersen (University of Oslo, Norway)
Douglas C. Schmidt (Vanderbilt University, USA)
Clemens Szyperski (Microsoft, USA)
Frank Tip (IBM, USA)
Mads Torgersen (Microsoft, USA)
Vasco T. Vasconcelos (University of Lisbon, Portugal)
Cristina Videira Lopes (University of California, Irvine, USA)
Wolfgang Weck (Independent Software Architect, Switzerland)
Roel Wuyts (Université Libre de Bruxelles, Belgium)
Matthias Zenger (Google Switzerland)
Referees
Marwan Abi-Antoun, João Araújo, Gabriela Arevalo, Sushil Bajracharya,
Stephanie Balzer, Klaas van den Berg
Keynote
Design Patterns – 15 Years Later
Erich Gamma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Peak Objects
William R. Cook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
XII Table of Contents
Keynote
Turning the Network into a Database with Active XML
Serge Abiteboul . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Languages
SuperGlue: Component Programming with Object-Oriented Signals
Sean McDirmid, Wilson C. Hsieh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
Type Theory
Variance and Generalized Constraints for C# Generics
Burak Emir, Andrew Kennedy, Claudio Russo, Dachuan Yu . . . . . . . . . 279
Keynote
The Closing of the Frontier
Ralph E. Johnson . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
Tools
Augmenting Automatically Generated Unit-Test Suites with Regression
Oracle Checking
Tao Xie . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
Modularity
Modular Software Upgrades for Distributed Systems
Sameer Ajmani, Barbara Liskov, Liuba Shrira . . . . . . . . . . . . . . . . . . . . . 452
Erich Gamma
Abstract. Design patterns are now a 15-year-old thought experiment. And,
today, for many, software design patterns have become part of the standard
development lexicon. The reason is simple: rather than constantly rediscovering
solutions to recurring design problems, developers can refer to a body of
literature that captures the best practices of system design. This talk looks
back to the origins of design patterns, shows design patterns in action, and
provides an overview of where patterns are today.
1 Introduction
Understanding source code is vital to many tasks in software engineering. Source
code querying tools are designed to help such understanding, by allowing pro-
grammers to explore relations that exist between different parts of the code
base. Modern development environments therefore provide querying facilities,
but these are usually fixed: one cannot define new relationships that are partic-
ular to the project in hand.
It can be very useful, however, to define such project-specific queries, for
instance to enforce coding style rules (e.g. naming conventions), to check correct
usage of an API (e.g. no call to a GUI method from an enterprise bean), or
to ensure framework-specific rules (e.g. in a compiler, every non-abstract AST
class must override the visitChildren method). Apart from such checking tasks,
we might want new ways of navigating beyond the fixed set of relations provided
in a development environment. When cleaning up a piece of legacy software, it is
for example useful to know what methods are never called (directly or indirectly)
from the main method. A good querying tool allows the programmer to define all
these tasks via simple, concise queries. Note that none of these examples is easily
implemented with today’s dominant code querying tool, namely grep. The built-in
querying and navigation facilities of Eclipse, though widely used, are limited
to a fixed set of predefined queries.
The research community has long recognised the need for flexible code queries,
and many solutions have been proposed. We shall discuss this previous work in
detail in Sect. 6. For now it suffices to say that two crucial ideas have emerged
from that earlier research: a logical query language like Prolog to formulate
queries, and a relational database to store information about the program.
All these earlier attempts, however, fall short on at least one of three counts:
the system is not scalable to industrial-size projects, or the query language is not
sufficiently expressive, or the queries require complex annotations to guarantee
efficiency. Scalability is typically not achieved because no query optimisation is
used, and/or all data is held in main memory. Expressiveness requires recursive
queries, to inspect the graph structures (the type hierarchy and the call graph,
for example) that are typically found in code queries. Yet the use of recursion
in SQL and XQuery is cumbersome, and in Prolog recursion over graphs often
leads to non-termination. In Prolog that problem may be solved via tabling plus
mode annotations, but such annotations require considerable expertise to get
right.
1.1 Contributions
This paper addresses all these deficiencies: it presents a code querying tool
that is scalable, expressive, and purely declarative. We achieve this through a
synthesis of the best ideas of the previous work on code querying. To wit, our
contributions are these:
2 Datalog
Datalog is a query language originally put forward in the theory of databases [20].
Syntactically it is a subset of the logic programming language Prolog, but it
has a different evaluation strategy. It also imposes certain stratification
restrictions on the use of negation and
recursion. As a result, in contrast to Prolog, Datalog requires no extra-logical
annotations in order to guarantee termination of the queries. At the same time it
has the right level of expressiveness for the type of applications discussed above.
Datalog’s basic premise is that data is arranged according to relations. For
example, the relation hasName records names of program elements. Variables
are used to find unknown elements; in our syntax, variable names start with a
capital letter. So one might use the hasName relation as follows:
hasName(L, ‘List’)
is a query to find all program elements with the name List; the variable L will
be instantiated to all program elements that have that name.
Unary relations are used to single out elements of a particular type. So for
example, one might write
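One such query, sketched here with an assumed unary relation interface alongside the hasChild relation described later in the paper, finds an interface named List that declares a method named add:

```prolog
interface(I), hasName(I, ‘List’),
hasChild(I, M), method(M), hasName(M, ‘add’)
```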
Here the comma stands for logical ‘and’. This query checks that the List interface
contains a method named add. It also illustrates an important issue: a method is
a program element, with various attributes, and the name of the method is just
one of those attributes.
CodeQuest : Scalable Source Code Queries with Datalog 5
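A transitive-closure relation of the kind discussed below is defined with the usual recursive pattern; a sketch, mirroring the polyCallPlus definition that appears later in the paper:

```prolog
hasSubtypePlus(T, S) :− hasSubtype(T, S).
hasSubtypePlus(T, S) :− hasSubtype(T, U), hasSubtypePlus(U, S).
```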
Semantics. Datalog relations that are defined with recursive rules have a least-
fixpoint semantics: they denote the smallest relation that satisfies the given
implication. To illustrate, the above clause for hasSubtypePlus defines it to be
the least relation X that satisfies
X ⊇ hasSubtype ∪ (hasSubtype ◦ X )
where (◦) stands for sequential relational composition (i.e. (a, c) ∈ (R ◦ S )
iff ∃b : (a, b) ∈ R ∧ (b, c) ∈ S ). The existence of such a smallest solution X
is guaranteed in our version of Datalog because we do not allow the use of
negation in a recursive cycle. Formally, that class of Datalog programs is said
to be stratified; interested readers may wish to consult [9] for a comprehensive
survey.
It follows that we can reason about relations in Datalog using the relational
calculus and the Knaster-Tarski fixpoint theorem [29, 11, 18]: all our recursions
correspond to monotonic mappings between relations (f is monotonic if X ⊆ Y
implies f (X ) ⊆ f (Y )). For ease of reference, we quote that theorem here:
Theorem 1 (Knaster-Tarski). Let f be a monotonic function on (tuples of)
relations. Then there exists a relation R such that R = f(R) and, for all
relations X,

f(X) ⊆ X implies R ⊆ X.

The relation R is said to be the least fixpoint of f.
In particular, the theorem implies that we can compute least fixpoints by it-
erating from the empty relation: to find the R in the theorem, we compute
∅, f (∅), f (f (∅)), . . . until nothing changes. Because our relations range over a
finite universe (all program elements), and we insist that all variables in the
left-hand side of a clause are used at least once positively (that is, not under a
negation) on the right-hand side, such convergence is guaranteed to occur in a
finite number of steps. Together with the restriction to stratified programs, this
means we handle the so-called safe Datalog programs. CodeQuest does not place
any further restrictions on the use of recursion in Datalog.
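As a tiny worked example, if hasSubtype contains just the pairs (A, B) and (B, C), iterating f(X) = hasSubtype ∪ (hasSubtype ◦ X) from the empty relation gives:

```latex
\emptyset
\;\to\; \{(A,B),\,(B,C)\}
\;\to\; \{(A,B),\,(B,C),\,(A,C)\}
\;\to\; \{(A,B),\,(B,C),\,(A,C)\}
```

after which nothing changes, so hasSubtypePlus = {(A,B), (B,C), (A,C)}.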
In words, this says that instead of first computing R ∗ (via exhaustive iteration)
and then composing with S , we can start the iteration with S . As we shall see,
this saves a lot of work during query evaluation. Due to the strictly declarative
nature of Datalog, we can do the optimisation automatically, while compiling
the use of recursive queries.
To illustrate closure fusion, suppose that we wish to find all types in a project
that are subtypes of the List interface:
A naïve evaluation of this query by fixpoint iteration would compute the full
hasSubtypePlus relation. That is not necessary, however. Applying the second
form of the above theorem with R = hasSubtype ∗ and
Readers who are familiar with the deductive database literature will recognise
this as a special case of the so-called magic sets transformation [12]. In the very
specialised context of CodeQuest, it appears that closure fusion on its own is
sufficient to achieve good performance.
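For the subtypes-of-List example, the fused rules might take the following shape (a sketch; the relation name listSub and the seeding literal are ours, not necessarily those generated by CodeQuest):

```prolog
listSub(S) :− interface(L), hasName(L, ‘List’), hasSubtype(L, S).
listSub(S) :− listSub(T), hasSubtype(T, S).
```

The iteration now starts from the direct subtypes of List, rather than computing the full hasSubtypePlus relation over all types.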
3 CodeQuest Implementation
CodeQuest consists of two parts: an implementation of Datalog on top of a re-
lational database management system (RDBMS), and an Eclipse [1] plugin for
querying Java code via that Datalog implementation. We describe these two
components separately.
It is our aim to compare CodeQuest to JQuery, the leading code querying sys-
tem for Java. For that reason, we have striven to make the CodeQuest frontend
as similar as possible to JQuery, to ensure the experiments yield an accurate
comparison. For the same reason, the information we extract from Java source
and store in the database is the same as the information that JQuery collects.
For elements, it consists exhaustively of packages, compilation units, classes,
interfaces, all class members and method parameters. As for relational facts,
we store hasChild, calls, fields reads/writes, extends, implements and returns
relationships.
None of these facts is computed by CodeQuest: they are simply read off the
relevant data structures in Eclipse, after Eclipse has processed a Java
compilation unit. In what follows, the process of collecting information and
storing it in the database is called parsing. It is not to be confused with the
translation from strings into syntax trees that happens in the Eclipse Java
compiler. Naturally, parsing is expensive (we shall determine exactly how
expensive in Sect. 5), so in the next section we shall consider how CodeQuest
performs its parsing incrementally, making appropriate changes to the database
relations when a compilation unit is modified.
We are currently working on a robust user interface for our plugin, for neat
integration within Eclipse. We also wish to develop a similar add-in for Visual
Studio.
Source code queries are typically performed for software development tasks
within an interactive development environment, where frequent changes of the
source code occur. Hence, the database of code facts needs to be kept up-to-date
with the source code developers are working on. Developers cannot afford, however,
a reparsing of their entire project between successive modifications and
10 E. Hajiyev, M. Verbaere, and O. de Moor
5 Experiments
In order to determine the performance characteristics – usability, efficiency
and scalability – of the CodeQuest system, we have performed a number of
experiments. We compare CodeQuest with two alternative approaches, namely
JQuery (a mature code querying system by Kris de Volder et al. [34, 24]) and
XSB, an optimising implementation of Prolog.
The experiments can be divided into four categories:
– General queries: these are generally useful queries, of the kind one might
wish to run on any project. They include both recursive and non-recursive
queries. We shall use them to compare all three systems.
– Project specific queries: some examples of queries that are more spe-
cific and specialised for a particular project. It is our contention that such
queries, relating to style rules and API conventions, are often desirable and
necessitate a flexible code querying system beyond the capabilities of today’s
IDEs.
– Program understanding: program understanding is the most common use
of a source code querying system. It typically requires a series of queries to be
run; here we take a series inspired by previous work on querying systems.
– Refactoring: this is the process of restructuring software to improve its
design but maintain the same functionality. Typically it involves a series of
queries to be executed and the appropriate changes applied to the source.
This experiment illustrates that our method of keeping the database up-to-
date (described in Sect. 4) is effective.
The above query is non-recursive. A little more interesting is the second exam-
ple. Here, we wish to determine all methods M that write a field of a particular
hasSubtypeStar (T , T ) :− type(T ).
hasSubtypeStar (T , S ) :− hasSubtypePlus(T , S ).
The definition of overrides also makes use of the recursively defined
hasSubtypePlus:
In words, we first check that M 1 has the same signature and visibility as M 2,
since a protected method (say) cannot override a public one. We also check that
M 2 can actually be overridden (so it’s not static, for example). When these two
conditions are satisfied, we find the containing types of M 1 and M 2, and check
that one is a subtype of the other.
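The conditions described in words might be rendered as follows (a sketch; hasVisibility and overridable are assumed helper relations, not necessarily the names used in CodeQuest):

```prolog
overrides(M1, M2) :−
    hasSignature(M1, S), hasSignature(M2, S),
    hasVisibility(M1, V), hasVisibility(M2, V),
    overridable(M2),
    declaresMethod(T1, M1), declaresMethod(T2, M2),
    hasSubtypePlus(T2, T1).
```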
Let us now consider different systems and their properties. Figure 1 presents
the evaluation times of each system for the three queries. For each query, we
show eight different ways of evaluating it [systems are listed in the legend of the
chart in the same top-down order as the corresponding bars appear in left-right
order; in the colour version of this paper, the correspondence is further enhanced
via colours]. On the vertical axis, we show the time taken in seconds – note that
this is log-scale.

Fig. 1. Evaluation times (in seconds, log scale) of the three general queries on
the RegExp, JFreeChart, and Eclipse code bases, comparing XSB, MSSQL CTE,
IBM DB2 CTE, MSSQL Proc1, IBM DB2 Proc1, MSSQL Proc2, IBM DB2 Proc2,
and JQuery (missing for Eclipse)
shall see in a query involving the call graph below). It therefore lacks scalability,
whereas CodeQuest shows much slower growth of evaluation time with project
size, for each of the queries. It is also important to mention that programs and queries
for the XSB system were optimised by hand (distinguishing modes, appropri-
ate use of cut, and tabling), so that their evaluation occurs in the best possible
order and excludes all unnecessary computations. Less carefully optimised pro-
grams for XSB require considerably more time to execute as will be shown in
the following subsection.
CTEs vs. Procs. We now turn to the comparison of the two implementations of
recursion that we described in Sect. 3: via a built-in feature of the DBMS, namely
Common Table Expressions, or via stored procedures. There is a remarkable
difference in evaluation times between these two approaches. CodeQuest Proc1
and Proc2 have slightly worse performance than CodeQuest CTEs for all non-
recursive queries as well as for recursive queries over small code bases. The
situation changes significantly, however, with the recursive queries over large
amounts of source code. It seems that it is the creation of intermediate tables
in the stored procedures approach that causes a certain overhead. But the least
fixpoint computation algorithm, implemented using stored procedures, proves to
be more efficient, as we see in computationally expensive queries.
Proc1 vs. Proc2. Proc2 has an optimised algorithm for computing the fixpoint of
recursively defined relations. It is therefore equivalent to Proc1 for non-recursive
queries, and it should be more efficient for recursive ones. The downside of the
Proc2 algorithm is that it extensively creates and drops temporary tables. Thus,
there is no efficiency gain for recursive queries over small projects. Somewhat
to our surprise, Proc2 also does worse than Proc1 on top of DB2, contrary to the
situation for MSSQL. In more complex queries, for instance those that involve
the call graph (discussed below), Proc2 pays off even on DB2.
MS SQL vs. IBM DB2. It is clear from the graphs that the CodeQuest
implementation on top of IBM DB2 is usually less efficient than the one on top
of MS SQL Server. We have found that this may be somewhat sensitive to the exact form of
the SQL that is produced by our compiler from Datalog. For instance, in DB2 it
is better to avoid generating not exists clauses in the code. Furthermore, we note
that: 1) we did not resort to the help of a professional database administrator
and it is very likely that the database systems we were using could be tuned to
increase performance significantly; 2) creation and deletion operations in IBM
DB2 are generally more expensive than in MS SQL Server and since they are
extensively used in the Proc2 algorithm, the performance gain from fewer joins
was outweighed by the performance loss from a larger number of creations and
deletions of temporary tables. Nevertheless, both implementations prove
that the concept of building a query system with an RDBMS at its back end is
both efficient and scalable.
Project specific queries. While one can spend a lot of time trying to come up
with the best possible optimisations for a general query, this is hardly feasible
when queries are written frequently and are specific to particular projects. In this
subsection we run exactly such experiments.
Most of the coding style constraints in an object-oriented software system are
implicit and cannot be enforced by means of the programming language. There-
fore it is desirable to run queries to ensure that such constraints are satisfied.
abc is an AspectJ compiler based on an extensible compiler framework called
Polyglot [10,35]. One of the coding style constraints in Polyglot is the following:
every concrete AST class (an AST class is one that implements the Node
interface) that has a child (a field whose type is also a subtype of Node) must
implement a visitChildren method. In order to check whether that constraint holds, we write
the following query:
The predicate existsVChMethod(C) looks up all classes that have a method
called visitChildren; nodeInterface(N) finds the interface named Node; and
concreteClass(C) finds all classes that are not abstract. The final
part of the query is read as follows: find all concrete classes that are subtypes of
type Node and have a child (field) of the same type, but there exists no method
called visitChildren in that class.
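Putting the pieces together, the constraint check might read as follows (a sketch; field and hasType are assumed relation names, not necessarily those of CodeQuest):

```prolog
existsVChMethod(C) :− hasChild(C, M), method(M), hasName(M, ‘visitChildren’).

violation(C) :− concreteClass(C), nodeInterface(N), hasSubtypePlus(N, C),
    hasChild(C, F), field(F), hasType(F, FT), hasSubtypeStar(N, FT),
    not(existsVChMethod(C)).
```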
The evaluation times of this query are given in Fig. 2(query1). In contrast to
the general queries, we did not perform any complex hand-tuning of the Prolog
queries. An obvious equivalent of the CodeQuest query has been taken.
The next query also applies to abc and the Polyglot framework. We would
like to find all the methods that are not called (transitively) from abc’s main
method. We expect to receive a list of methods that are meant to be called
externally, or perhaps via reflection. Potentially we may also encounter dead
code here: functions that are reachable neither from main nor from any of the
extending modules.
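Using the polyCallPlus relation defined below, such a query might be sketched as follows (mainMethod is an assumed helper that singles out abc’s main method):

```prolog
uncalled(M) :− method(M), mainMethod(Main), not(polyCallPlus(Main, M)).
```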
Fig. 2. Evaluation times (in seconds, log scale) of the two project-specific
queries on the abc code base, comparing XSB, JQuery, MSSQL CTE, IBM DB2 CTE,
MSSQL Proc1, IBM DB2 Proc1, MSSQL Proc2, IBM DB2 Proc2, and the Proc
variants with closure fusion (CF)
polyCallPlus(X , Y ) :− polyCall (X , Y ).
polyCallPlus(X , Z ) :− polyCallPlus(X , Y ), polyCall (Y , Z ).
Fig. 3. Evaluation times (in seconds, log scale) of the four program-understanding
queries on JFreeChart, comparing MSSQL CTE, IBM DB2 CTE, MSSQL Proc1,
IBM DB2 Proc1, MSSQL Proc2, IBM DB2 Proc2, and JQuery
where PickedType is the type chosen by the programmer. The result of this
query is the abstract Plot class. To list all its methods, the user defines
the following query:
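A query for this step might read as follows (a sketch using the relations introduced earlier):

```prolog
plotMethod(M) :− class(C), hasName(C, ‘Plot’), declaresMethod(C, M).
```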
In the list the user finds an abstract method draw, and can finally define a
query to spot all calls to this method or to any overriding method in an
extending class:
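Such a query might be sketched as follows, reusing the overrides relation discussed earlier (for brevity the sketch matches any method named draw, rather than the specific draw method of Plot):

```prolog
drawCall(Caller) :− calls(Caller, M), hasName(M, ‘draw’).
drawCall(Caller) :− calls(Caller, M), overrides(M, D), hasName(D, ‘draw’).
```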
Both JQuery and RDBMSs support some form of caching. As the cache warms
up, subsequent queries typically take less time to evaluate. This is an especially
crucial factor for JQuery, since it is known to have strong caching strategies and
to run much faster on a warm cache. Figure 3 presents the comparison graph for
the above scenario for JQuery and CodeQuest.
The CodeQuest system again shows better results. In retrospect, this is not
that surprising, since RDBMSs also traditionally possess caching mechanisms to
limit the number of disk I/Os. In addition, as described in Sect. 7, further
optimisations can be included in the CodeQuest system itself.
In words, this query looks for a class whose name contains the substring Com-
bined, which furthermore declares a method named getSubplots. Evaluation of
this query yields four elements: CombinedDomainCategoryPlot, CombinedDo-
mainXYPlot, CombinedRangeCategoryPlot and CombinedRangeXYPlot.
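The query described here might look as follows (a sketch; matches is a hypothetical substring-pattern predicate, standing in for whatever string matching CodeQuest actually provides):

```prolog
combinedPlot(C) :− class(C), hasName(C, N), matches(N, ‘*Combined*’),
    declaresMethod(C, M), hasName(M, ‘getSubplots’).
```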
We perform the first refactoring, by making the four classes implement a new
interface CombinedPlot that declares a single method getSubplots(). This refac-
toring involves a sequence of operations in Eclipse, in particular the application
of built-in refactorings such as ‘Extract Interface’ and ‘Use Supertype Where
Possible’ as well as some minor hand coding.
The next step is to look for other methods than getSubplots(), common to
the four refactored classes, whose declarations could be pulled up in the new
interface. A query for this task reads as follows:
overridingMethod (M ) :− overrides(M , N ).
declares(C , S ) :− class(C ), declaresMethod (C , M ),
hasSignature(M , S ), not(overridingMethod (M )).
declarations(S ) :− class(C 1), hasName(C 1, ‘CombinedDomainCategoryPlot’),
class(C 2), hasName(C 2, ‘CombinedDomainXYPlot’),
class(C 3), hasName(C 3, ‘CombinedRangeCategoryPlot’),
class(C 4), hasName(C 4, ‘CombinedRangeXYPlot’),
declares(C 1, S ), declares(C 2, S ),
declares(C 3, S ), declares(C 4, S ).
In words, we look for signatures of methods that are defined in all four classes
of interest, which furthermore do not override some method in a supertype. Of
course one might wish to write a more generic query, but as this is a one-off exam-
ple, there is no need. The query yields two method signatures, double getGap()
and void setGap(double), which are related to the logic of the new interface.
Hence, we perform a second refactoring to include these declarations in Com-
binedPlot.
This scenario provides a tiny experiment for measuring the efficiency of our
incremental update mechanism and comparing it to the one implemented in JQuery.
An important difference between these two update mechanisms is the following.
In CodeQuest , the update runs as a background task just after any incremental
compilation is performed by Eclipse. In JQuery, the update occurs only when
the user explicitly executes the update action.

Fig. 4. Query evaluation and incremental update times for the Refactoring example

The results are shown in Fig. 4. The
sequence of measurements consists of the initial parsing time (which neither sys-
tem needs to repeat after the first loading of the project), followed by two queries
and updates.
In the given scenario the update times of the two systems are comparable.
However, this refactoring example requires an update of very few facts. JQuery's
performance deteriorates considerably on larger changes, since these involve the
deletion and recreation of many tiny files on the hard drive. For instance, if we
apply the Rename refactoring to the org.jfree.data.general package, an update of
185 files is required. It takes JQuery longer to update its factbase (over 5 minutes)
than to reparse the entire project, whereas CodeQuest completes the update
within 30 seconds.
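The contrast can be pictured as a factbase keyed by compilation unit, where an incremental update replaces only the facts of the recompiled units in one bulk operation, instead of deleting and recreating many per-fact files. This is an illustrative sketch, not CodeQuest's actual database schema.

```java
import java.util.*;

// Sketch: facts grouped per compilation unit, so an update touches
// only the units that were recompiled.
class FactBase {
    private final Map<String, Set<String>> factsByUnit = new HashMap<>();

    // Replace all facts derived from one unit in a single step.
    void update(String unit, Set<String> facts) {
        factsByUnit.put(unit, facts);
    }

    int totalFacts() {
        return factsByUnit.values().stream().mapToInt(Set::size).sum();
    }
}
```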
5.5 Summary
In this section we ran a variety of tests to measure the performance of CodeQuest
and to compare it against similar systems. CodeQuest proved to be at least as
efficient as JQuery in all case studies. Furthermore, simple techniques for storing
intermediate results in temporary tables, instead of recomputing them in every
subsequent query, could be combined with the existing caching mechanisms of
RDBMSs to improve performance further. Of course, that increased efficiency
comes at the price of using a relational database system; there is much
CodeQuest : Scalable Source Code Queries with Datalog 21
merit in JQuery’s lightweight approach, which does not require any additional
software components.
By today's standards, considering both the hardware systems at hand and the
size of software projects that require querying, CodeQuest is clearly competitive
with XSB. The in-memory computations of an optimised Prolog program are fast
but not scalable, and non-optimised Prolog queries are clearly less efficient than
the same queries evaluated with CodeQuest.
Today’s industrial databases are able to evaluate recursive queries as described
in the SQL99 standard. However, it appears that built-in recursion is often less
efficient than custom algorithms using stored procedures. Furthermore, in some
cases the built-in facilities do not work at all, in particular when an infinite num-
ber of duplicate entries might be generated in intermediate results. So, the choice
between different implementations of CodeQuest with the tested RDBMS comes
down to Proc1 and Proc2. Formally, Proc2 is an optimised variant of Proc1 and
should therefore be preferable. In practice, however, it requires creating and drop-
ping temporary tables during each iteration step. If the cost of creating and drop-
ping tables in a database system exceeds a certain threshold, the optimisation
becomes too expensive. In our experiments, Proc2 is more efficient than Proc1 for
most queries on MS SQL Server, and vice versa on IBM DB2. More generally, the
code generation strategy (CTE, Proc1 or Proc2) is tightly coupled to the internal
SQL optimiser of the RDBMS. As a consequence, the choice of the appropriate
CodeQuest implementation depends not only on the exact type of queries that a
user may want to run, but also on the RDBMS and, in particular, on the SQL
optimiser used to run the produced SQL code.
6 Related Work
There is a vast body of work on code queries, and it is not possible to cover
all of it in a conference paper. We therefore concentrate on those systems that
have provided an immediate inspiration for the design and implementation of
CodeQuest . First we focus on work from the program maintenance community,
then we discuss related research in the program analysis community, and we
conclude with some recent developments that are quite different from CodeQuest
and could be seen as alternative approaches to the same problems.
Storing the program in a database. In the software maintenance community,
there is a long tradition of systems that store the program in a database. One
of the earliest proposals of this kind was Linton’s Omega system [32]. He stores
58 relations that represent very detailed information about the program in the
INGRES database system. Queries are formulated in the INGRES query lan-
guage QUEL, which is quite similar to SQL. There is no way to express recursive
queries. Linton reports some performance numbers that indicate a poor response
time for even simple queries. He notes, however, that future query optimisers
ought to do a lot better; our experiments confirm that prediction.
The next milestone in this line of work is the C Information Abstraction sys-
tem, with the catchy acronym CIA [14]. CIA deviates from Omega in at least two
important ways. First, based on the poor performance results of Omega, CIA
only stores certain relations in the database, to reduce its size. Second, it aims
for an incremental construction of the database — although the precise mech-
anism for achieving that is not detailed in [14], and there are no performance
experiments to evaluate such incremental database construction. In CodeQuest
we also store only part of the program, but it is our belief that modern data-
base systems can cope with much larger amounts of data. Like CIA, we provide
incremental updating of the database, and the way to achieve that efficiently
was described in Sect. 4. CIA does not measure the effects of the optimiser pro-
vided by a particular database system, and in fact it is claimed the system is
independent of that choice.
Despite their disadvantages, Omega and CIA have had quite an impact on
industrial practice, as numerous companies now use a database system as a code
repository, e.g. [13, 42].
Logic query languages. Both Omega and CIA inherited their query language from
the underlying database system. As we have argued in Sect. 2, recursive queries,
as provided in a logic programming language, are indispensable. Indeed, the XL
C++ Browser [26] was one of the first to realise that Prolog provides a convenient
notation for expressing typical queries over source code. The implementation is
built directly on top of a Prolog system, implying that all facts are held in main
memory. As our experiments show, even with today's vastly increased memory
sizes, and using a state-of-the-art optimising compiler like XSB, this approach
does not scale.
A particularly interesting attempt at overcoming the problem of expressive-
ness was put forward by Consens et al. [15]. Taking the search of graphs as its
primary focus, GraphLog presents a query language with just enough power to
express properties of paths in graphs, equivalent to a subset of Datalog, with a
graphical syntax. In [15] a convincing case is made that the GraphLog queries
are easier to write than the Prolog equivalent, and the authors state: “One of
our goals is to have such implementations produced automatically by an opti-
mizing GraphLog to Prolog translator.” Our experiments show that to attain
scalability, a better approach is to map a language like GraphLog to a relational
database system.
Also motivated by the apparent inadequacy of relational query languages as
found in the database community, Paul and Prakash revisited the notion of
relational algebra [36]. Their new relational algebra crucially includes a closure
operator, thus allowing one to express the traversals of the type hierarchy, call
graph and so on that code queries require. The implementation of this algebra is
done on top of the Refine system, again with an in-memory representation of the
relevant relations [8]. Paul and Prakash report that hand-written queries in the
Refine language typically evaluate a factor of 2 to 10 faster than their declarative
formulations. CodeQuest takes some of its inspiration from [36], especially in our
use of relational algebra to justify optimisations such as closure fusion. Of course
the connection between Datalog (our concrete syntax) and relational algebra
with a (generalised) closure operator has been very thoroughly explored in the
theoretical database community [7].
Very recently Martin et al. proposed another PQL (not to be confused with
Jarzabek’s language discussed above), to find bugs in compiled programs [33,31].
Interestingly, the underlying machinery is that of Datalog, but with a completely
different implementation, using BDDs to represent solution sets [43]. Based on
their results, we believe that a combination of the implementation technology
of CodeQuest (a relational database system) and that of PQL (BDDs) could
be very powerful: source code queries could be implemented via the database,
while queries that require deep semantic analysis might be mapped to the BDD
implementation.
Other code query languages. Aspect-oriented programming represents a sepa-
rate line of work in code queries: here one writes patterns to find all places in
a program that belong to a cross-cutting concern. The most popular aspect-
oriented programming language, AspectJ, has a sophisticated language of such
patterns [28]. In IBM’s Concern Manipulation Environment, that pattern lan-
guage is deployed for interactive code queries, and augmented with further prim-
itives to express more complex relationships [41]. We subscribe to the view that
these pattern languages are very convenient for simple queries, but they lack the
flexibility needed for sophisticated queries of the kind presented in this paper.
It comes as no surprise that the work on identifying cross-cutting concerns
and code querying is converging. For example, several authors are now proposing
that a new generation of AspectJ might use a full-blown logic query language
instead [22,21]. The results of the present paper seem to suggest Datalog strikes
the right balance between expressiveness and efficiency for this application also.
To conclude, we would like to highlight one effort in the convergence of aspects
and code queries, namely Magellan. Magellan employs an XML representation
of the code, and XML queries based on XQuery [19, 37]. This is natural and
attractive, given the hierarchical nature of code; we believe it is particularly
suitable for queries over the syntax tree. XQuery is however rather hard to
optimise, so it would be difficult to directly employ our strategy of relying on
a query optimiser. As the most recent version of Magellan is not yet publicly
available, we were unable to include it in our experimental setup. An interesting
avenue for further research might be to exploit the fact that semi-structured
queries can be translated into Datalog, as described by Abiteboul et al. [6]
(Chapter 6).
7 Conclusion
In this paper, we have demonstrated that Datalog, implemented on top of a
modern relational database system, provides just the right balance between ex-
pressive power and scalability required for a source code querying system. In
particular, recursion allows an elegant expression of queries that traverse the
type hierarchy or the call graph. The use of a database system as the backend
yields the desired efficiency, even on a very large code base.
Our experiments also indicate that even better performance is within reach.
A fairly simple, direct implementation of recursion via stored procedures often
Acknowledgements
Elnar Hajiyev would like to thank Shell corporation for the generous support that
facilitated his MSc at Oxford during 2004-5, when this research was started. We
would also like to thank Microsoft Research (in particular Dr. Fabien Petitcolas)
for its support, including a PhD studentship for Mathieu Verbaere. Finally, this
research was partially funded through EPSRC grant EP/C546873/1. Members
of the Programming Tools Group at Oxford provided helpful feedback at all
stages of this research. We are grateful to Kris de Volder for many interesting
discussions related to the topic of this paper.
References
1. Eclipse. http://www.eclipse.org.
2. JQuery. http://www.cs.ubc.ca/labs/spl/projects/jquery/.
3. XSB. http://xsb.sourceforge.net/.
4. The TyRuBa metaprogramming system. http://tyruba.sourceforge.net/.
5. CodeQuest . http://progtools.comlab.ox.ac.uk/projects/codequest/.
6. Serge Abiteboul, Peter Buneman, and Dan Suciu. Data on the Web: From Relations
to Semistructured Data and XML. Morgan Kaufmann Publishers, 2000.
7. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases.
Addison-Wesley, 1995.
8. Leonor Abraido-Fandino. An overview of Refine 2.0. In Procs. of the Second Inter-
national Symposium on Knowledge Engineering and Software Engineering, 1987.
9. Krzysztof R. Apt and Roland N. Bol. Logic programming and negation: A survey.
Journal of Logic Programming, 19/20:9–71, 1994.
10. Pavel Avgustinov, Aske Simon Christensen, Laurie Hendren, Sascha Kuzins, Jen-
nifer Lhoták, Ondřej Lhoták, Oege de Moor, Damien Sereni, Ganesh Sittampalam,
and Julian Tibble. abc: An extensible AspectJ compiler. In Aspect-Oriented Soft-
ware Development (AOSD), pages 87–98. ACM Press, 2005.
28. Gregor Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and
William G. Griswold. An overview of AspectJ. In J. Lindskov Knudsen, editor, Eu-
ropean Conference on Object-oriented Programming, volume 2072 of Lecture Notes
in Computer Science, pages 327–353. Springer, 2001.
29. Bronislaw Knaster. Un théorème sur les fonctions d’ensembles. Annales de la
Societé Polonaise de Mathematique, 6:133–134, 1928.
30. Kemal Koymen. A datalog interface for SQL (abstract). In CSC ’90: Proceedings
of the 1990 ACM annual conference on Cooperation, page 422, New York, NY,
USA, 1990. ACM Press.
31. Monica S. Lam, John Whaley, V. Benjamin Livshits, Michael C. Martin, Dzin-
tars Avots, Michael Carbin, and Christopher Unkel. Context-sensitive program
analysis as database queries. In PODS ’05: Proceedings of the twenty-fourth ACM
SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages
1–12, New York, NY, USA, 2005. ACM Press.
32. Mark A. Linton. Implementing relational views of programs. In Peter B. Hender-
son, editor, Software Development Environments (SDE), pages 132–140, 1984.
33. Michael Martin, Benjamin Livshits, and Monica S. Lam. Finding application errors
using PQL: a program query language. In Proceedings of the 20th annual ACM
SIGPLAN OOPSLA Conference, pages 365–383, 2005.
34. Edward McCormick and Kris De Volder. JQuery: finding your way through tangled
code. In OOPSLA ’04: Companion to the 19th annual ACM SIGPLAN OOPSLA
conference, pages 9–10, New York, NY, USA, 2004. ACM Press.
35. Nathaniel Nystrom, Michael R. Clarkson, and Andrew C. Myers. Polyglot: An
extensible compiler framework for Java. In 12th International Conference on Com-
piler Construction, volume 2622 of Lecture Notes in Computer Science, pages 138–
152, 2003.
36. Santanu Paul and Atul Prakash. Querying source code using an algebraic query
language. IEEE Transactions on Software Engineering, 22(3):202–217, 1996.
37. Magellan Project. Web page at: http://www.st.informatik.tu-darmstadt.de/
static/pages/projects/Magellan/XIRC.html. 2005.
38. Thomas W. Reps. Demand interprocedural program analysis using logic databases.
In Workshop on Programming with Logic Databases, ILPS, pages 163–196, 1993.
39. Konstantinos Sagonas, Terrance Swift, and David S. Warren. XSB as an efficient
deductive database engine. In SIGMOD ’94: Proceedings of the 1994 ACM SIG-
MOD international conference on Management of data, pages 442–453, New York,
NY, USA, 1994. ACM Press.
40. Eric Sword. Create a root combinedplot interface. JFreeChart feature request:
http://sourceforge.net/tracker/index.php?func=detail&aid=1234995&
group_id=15494&atid=365494, 2005.
41. Peri Tarr, William Harrison, and Harold Ossher. Pervasive query support in the
concern manipulation environment. Technical Report RC23343, IBM Research
Division, Thomas J. Watson Research Center, 2004.
42. Michael Thompson. Bluephoenix: Application modernization technology audit.
Available at: http://www.bitpipe.com/detail/RES/1080665824_99.html., 2004.
43. John Whaley, Dzintars Avots, Michael Carbin, and Monica S. Lam. Using datalog
and binary decision diagrams for program analysis. In Kwangkeun Yi, editor,
Proceedings of the 3rd Asian Symposium on Programming Languages and Systems,
volume 3780, pages 97–118. Springer-Verlag, November 2005.
Efficient Object Querying for Java
1 Introduction
No object stands alone. The value and meaning of objects arise from their re-
lationships with other objects. These relationships can be made explicit in pro-
grams through the use of pointers, collection objects, algebraic data types or
other relationship constructs (e.g. [5,27,28,29]). This variety suggests that pro-
gramming languages provide good support for relationships. We believe this
is not entirely true — many relationships are implicit and, as such, are not
amenable to standard relationship constructs. This problem arises from the great
variety of ways in which relationships can manifest themselves. Consider a col-
lection of student objects. Students can be related by name, or student ID —
that is, distinct objects in our collection can have the same name or ID; or, they
might be related by age bracket or street address. In short, the abundance of
such implicit relationships is endless.
Most programming languages provide little support for such arbitrary and
often dynamic relationships. Consider the problem of querying our collection to
find all students with a given name. The simplest solution is to traverse the
collection on demand to find matches. If the query is executed frequently, we
can improve upon this by employing a hash map from names to students to
get fast lookup. This is really a view of the original collection optimised for our
query. Now, when students are added or removed from the main collection or
have their names changed, the hash map must be updated accordingly.
The two design choices outlined above (traversal versus hash map) present
a conundrum for the programmer: which should he/she choose? The answer, of
course, depends upon the ratio of queries to updates — something the program-
mer is unlikely to know beforehand (indeed, even if it is known, it is likely to
2 Object Querying
Figure 1 shows the almost-generic diagram of students attending courses. Ver-
sions of this diagram are found in many publications on relationships [5,6,11,31].
Many students attend many courses; courses have a course code, a title string
and a teacher; students have IDs and (reluctantly, at least at our university's
registry) names. A difficulty with this decomposition is representing students
who are also teachers. One solution is to have separate Student and Teacher
objects, which are related by name. The following code can then be used to
identify students who are teachers:
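The loop code referred to here does not survive in this copy. A reconstruction consistent with the surrounding description, which joins allFaculty and allStudents on the name field with a nested loop, would be the following; the Tuple2 record and the constructors are assumptions added to make the sketch self-contained.

```java
import java.util.*;

class Faculty { String name; Faculty(String n) { name = n; } }
class Student { String name; Student(String n) { name = n; } }
record Tuple2<A, B>(A first, B second) {}

class Join {
    // Nested-loop join: compare every faculty member with every student.
    static List<Tuple2<Faculty, Student>> matches(
            List<Faculty> allFaculty, List<Student> allStudents) {
        List<Tuple2<Faculty, Student>> matches = new ArrayList<>();
        for (Faculty f : allFaculty)
            for (Student s : allStudents)
                if (f.name.equals(s.name))
                    matches.add(new Tuple2<>(f, s));
        return matches;
    }
}
```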
30 D. Willis, D.J. Pearce, and J. Noble
In database terms, this code is performing a join on the name field for the
allFaculty and allStudent collections. The code is cumbersome and can be
replaced with the following object query, which is more succinct and, potentially,
more efficient:
List<Tuple2<Faculty,Student>> matches;
matches = selectAll(Faculty f=allFaculty, Student s=allStudents
: f.name.equals(s.name));
This gives exactly the same set of results as the loop code. The selectAll prim-
itive returns a list of tuples containing all possible instantiations of the domain
variables (i.e. those declared before the colon) where the query expression holds
(i.e. that after the colon). The domain variables determine the set of objects
which the query ranges over: they can be initialised from a collection (as above);
or, left uninitialised to range over the entire extent set (i.e. the set of all instan-
tiated objects) of their type. Queries can define as many domain variables as
necessary and can make use of the usual array of expression constructs found
in Java. One difference from normal Java expressions is that boolean operators,
such as && and ||, do not imply an order of execution for their operands. This
allows flexibility in the order they are evaluated, potentially leading to greater
efficiency.
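The freedom to reorder operands can be pictured as evaluating the conjuncts of a query expression cheapest-first, something ordinary Java &&, with its fixed left-to-right order, does not permit. The names below are illustrative, not JQL's API; the reordering is sound only because query conjuncts are side-effect free.

```java
import java.util.*;
import java.util.function.Predicate;

class Conjunction {
    // A conjunct paired with an estimated evaluation cost.
    record Conjunct<T>(Predicate<T> test, double cost) {}

    // Evaluate the cheapest conjuncts first; the result is independent
    // of the order in which side-effect-free conjuncts are checked.
    static <T> boolean holds(T x, List<Conjunct<T>> cs) {
        List<Conjunct<T>> sorted = new ArrayList<>(cs);
        sorted.sort(Comparator.comparingDouble(c -> c.cost()));
        return sorted.stream().allMatch(c -> c.test().test(x));
    }
}
```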
As well as its simplicity, there are other advantages to using this query in
place of the loop code. The query evaluator can apply well-known optimisations
which the programmer might have missed. By leaving the decision of which
optimisation to apply until runtime, it can make a more informed decision based
upon dynamic properties of the data itself (such as the relative size of input
sets) — something that is, at best, difficult for a programmer to do. A good
example, which applies in this case, is the so-called hash-join (see e.g. [26]). The
idea is to avoid enumerating all of allFaculty × allStudents when there are
few matches. A hash map is constructed from the larger of the two collections,
mapping the value being joined upon (in this case name) back to its objects.
This still requires O(sf) time in the worst case, where s = |allStudents| and
f = |allFaculty|, but in practice it is likely to be linear in the number of matches
(in contrast with a nested loop, which always takes O(sf) time).
Figure 2 illustrates a hash-join implementation of the original loop. We believe
it is highly unlikely a programmer would regularly apply this optimisation in
practice, because it is noticeably more complicated than the nested-loop
implementation and requires considerable insight, on the part of the programmer,
to appreciate its benefits. Even if he/she had appreciated the value of using
a hash join, the optimal ordering (i.e. whether to map from names to Faculty
import java.util.HashSet;

class Course {
String code, title;
Faculty teacher;
HashSet<Student> roster;
void enrol(Student s) { roster.add(s); }
void withdraw(Student s) { roster.remove(s); }
}
Fig. 1. A simple UML diagram, describing students that attend courses and teach-
ers that teach them and a typical implementation. Protection specifiers on fields and
accessor methods are omitted for simplicity.
Fig. 2. Illustrating a hash-join implementation of the simple query (shown at the top)
from Section 2. The code first creates a map from Faculty names to Faculty objects.
Then, it iterates over the allStudents collection and looks up those Faculty members
with the same name.
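Since Fig. 2's code survives here only as its caption, the hash-join it describes can be sketched as follows. The field and class names come from the surrounding text; everything else is an assumption added to make the sketch self-contained.

```java
import java.util.*;

class Faculty { String name; Faculty(String n) { name = n; } }
class Student { String name; Student(String n) { name = n; } }

class HashJoin {
    // Build a map from name to Faculty objects, then probe it once per
    // student: roughly O(f + s + matches), versus O(f * s) for a
    // nested loop.
    static List<Object[]> matches(List<Faculty> allFaculty,
                                  List<Student> allStudents) {
        Map<String, List<Faculty>> byName = new HashMap<>();
        for (Faculty f : allFaculty)
            byName.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f);
        List<Object[]> result = new ArrayList<>();
        for (Student s : allStudents)
            for (Faculty f : byName.getOrDefault(s.name, List.of()))
                result.add(new Object[] { f, s });
        return result;
    }
}
```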
32 D. Willis, D.J. Pearce, and J. Noble
or from names to Students) depends upon the relative number of students and
faculty (mapping to the smaller is generally the best choice [26]). Of course, a
clever programmer could still obtain optimal performance by manually enumer-
ating all possible orderings and including a runtime check to select the best one.
But, is this really likely to happen in practice? Certainly, for larger queries, it
becomes ridiculously impractical as the number of valid orderings grows expo-
nentially. Using a query in place of a manual implementation allows the query
evaluator to perform such optimisations. And, as we will see in Section 4, there
is a significant difference between a good hand-coded implementation and a poor
one — even for small queries.
class BinTree {
private BinTree left;
private BinTree right;
private Object value;
Java's assert can only partially express this because it is limited to a particular in-
stance of BinTree; there is no way to quantify over all instances. The program-
mer could rectify this by maintaining a collection of the class’s instances in a
static class variable. This requires a separate implementation for each class and,
once in place, is difficult to disable (e.g. for software release). Another option
is to maintain a back-pointer in each BinTree object, which points to its par-
ent. Then, before assigning a tree to be a subtree of another, we check whether
it already has a parent. Again, this approach suffers from being difficult to dis-
able and requiring non-trivial implementation. Indeed, properly maintaining this
parent-child relationship is a tricky task that could easily be implemented in-
correctly — potentially leading to a false impression that the class invariant
holds.
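The back-pointer approach just described might be implemented along these lines. This is a sketch assuming a parent field; it is not code from the paper, and it shows how easily such bookkeeping can be got wrong and how hard it is to disable for release.

```java
// Each BinTree tracks its parent so that sharing can be detected when
// a subtree is attached.
class BinTree {
    private BinTree left, right, parent;

    void setLeft(BinTree t) {
        if (t != null && t.parent != null)
            throw new IllegalStateException("subtree already has a parent");
        if (left != null) left.parent = null; // detach the old subtree
        left = t;
        if (t != null) t.parent = this;
    }

    BinTree parent() { return parent; }
}
```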
Using an object query offers a cleaner, more succinct and more manageable
solution:
This uses the selectA primitive, which returns a matching tuple (if there is
one) or null (otherwise). Using selectA (when applicable) is more efficient
than selectAll because the query evaluator can stop as soon as the first match
is made. Notice that, in the query, a and b can refer to the same BinTree object,
hence the need to guard against this in the first two cases.
Other examples of interesting class invariants which can be enforced using
object queries include the singleton pattern [13]:
The above ensures that flyweight objects (in this case Value objects) are not
duplicated, which is the essence of this pattern.
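In the same spirit, the flyweight check can be pictured as a query over the extent of Value that looks for two distinct but equal objects. This is a sketch based on the surrounding text; the Value record and the query's exact form in the paper are assumptions.

```java
import java.util.List;

// A value-style flyweight: equals() compares contents, == identity.
record Value(int v) {}

class FlyweightCheck {
    // The invariant fails if two *distinct* objects are equals();
    // a selectA-style query would return such a pair as a witness.
    static boolean duplicated(List<Value> extent) {
        for (Value a : extent)
            for (Value b : extent)
                if (a != b && a.equals(b)) return true;
        return false;
    }
}
```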
So far, we have assumed that queries are statically compiled. This means they can
be checked for well-formedness at compile time. However, our query evaluator
maintains at runtime an Abstract Syntax Tree (AST) representation of the query
for the purposes of query optimisation. An AST can be constructed at runtime
by the program and passed directly to the query evaluator. This form of dynamic
query has the advantage of being more flexible, albeit at the cost of runtime type
checking. The syntax of a dynamic query is:
Since static typing information is not available for dynamic queries, we simply
implement the returned tuples as Object arrays. The Query object encapsulates
the domain variables and query expression (a simple AST) making up the query.
The following illustrates a simple dynamic query:
This returns all instances, x, of class c where x.equals(y) holds. This query
cannot be expressed statically, since the class c is unknown at compile time.
Dynamic queries are more flexible: in particular, we can construct queries in
direct response to user input.
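A dynamic query of this kind can be pictured as a small interpreter over a query tree. The classes below are an illustrative sketch; JQL's actual Query and AST classes are not shown in this excerpt, so all names here are assumptions.

```java
import java.util.*;

// A toy query AST: a predicate tree evaluated over a supplied extent.
interface Expr { boolean eval(Object x); }
record EqualsTo(Object y) implements Expr {
    public boolean eval(Object x) { return x.equals(y); }
}
record And(Expr l, Expr r) implements Expr {
    public boolean eval(Object x) { return l.eval(x) && r.eval(x); }
}

class Evaluator {
    // Returns all x of class c in the extent satisfying the expression;
    // results are untyped Object arrays, as with dynamic queries.
    static List<Object[]> run(Class<?> c, Collection<?> extent, Expr e) {
        List<Object[]> out = new ArrayList<>();
        for (Object x : extent)
            if (c.isInstance(x) && e.eval(x))
                out.add(new Object[] { x });
        return out;
    }
}
```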
3 Implementation
We have prototyped a system, called the Java Query Language (JQL), which
permits queries over object extents and collections in Java. The implementation
consists of three main components: a compiler, a query evaluator and a runtime
system for tracking all active objects in the program. The latter enables the query
evaluator to range over the extent sets of all classes. Our purpose in doing this is
twofold: firstly, to assess the performance impact of such a system; secondly, to
provide a platform for experimenting with the idea of using queries as first-class
language constructs.
The core component of the JQL system is the query evaluator. This is responsible
for applying whatever optimisations it can to evaluate queries efficiently. The
evaluator is called at runtime with a tree representation of the query (called the
query tree). The tree itself is either constructed by the JQL Compiler (for static
queries) or by the user (for dynamic queries).
[Fig. 3: the query pipeline. A hash join of allStudents and allFaculty on
L.name.equals(R.name) feeds a temporary result into a nested-loop join with
allCourses on L.taughtBy(R), which produces the final results.]
the other comes from the previous stage. This is known as a linear processing
tree [19] and it simplifies the query evaluator, although it can lead to inefficiency
in some cases.
The way in which a join combines tuples depends upon its type and the
operation (e.g. ==, <, etc.) it represents. JQL currently supports two join types:
nested-loop join and hash join. A nested-loop join is a two-level nested loop which
iterates over L × R and checks each pair for a match. A hash join builds a
temporary hash table which it uses to check for matches. This provides the best
performance, but can be used only when the join operator is == or equals(). Future
implementations may take advantage of B-Trees, for scanning sorted ranges of
a collection.
Consider the following simple query for finding all students who teach a course
(presumably employed as an RA):
r = selectAll(Student s=allStudents, Faculty f=allFaculty,
Course c=allCourses : s.name.equals(f.name) && c.taughtBy(f));
Figure 3 illustrates the corresponding query pipeline. Since the first join rep-
resents an equals() operation, it is implemented as a hash-join. The second
join, however, represents an arbitrary method call and must be implemented as
a nested-loop. The tuples which are produced from the first join have the form
(Student, Faculty) and are stored in a temporary location before being passed
into the second join.
Join Ordering. The ordering of joins in the pipeline can dramatically affect the
time needed to process the query. The cost of a single join is determined by its
input size (i.e. |L| × |R|), while its selectivity affects the input size (and hence the
cost) of subsequent joins. Selectivity is the ratio of the number of tuples which do
not match to the input size1. Thus, highly selective joins produce relatively few
matches compared with the amount of input. To find a minimal cost ordering, we must search
1 We follow Lencevicius [20] with our meaning of selectivity here. While this contrasts
with the database literature (where it means the opposite), we feel this is more
intuitive.
aspect MyAspect {
pointcut newObject() : call(* *.new(..)) && !within(MyAspect);
after() : newObject() { System.out.println("new called"); }
}
This creates a pointcut, newObject(), which captures the act of calling new on
any class except MyAspect. This is then associated with advice which executes
whenever a join point captured by newObject() is triggered (i.e. whenever new
is called). Here, after() indicates the advice executes immediately after the join
point triggers. Notice that we must use !within(MyAspect) to protect against
an infinite loop which could arise if MyAspect allocates storage inside the advice,
resulting in the advice triggering itself.
ExtentSet getExtentSet(Class<?> C) {
// Find the extent set for C. If there is none, create one.
ExtentSet set;
synchronized(extentSets) {
set = extentSets.get(C);
if(set == null) {
set = new ExtentSet();
extentSets.put(C, set);
Class<?> S = C.getSuperclass();
if(S != null) {
getExtentSet(S).link(set); // Link superclass set
}
for(Class<?> I : C.getInterfaces()) {
getExtentSet(I).link(set); // Link interface set
}
}
}
return set; // hand back the (possibly freshly created) set
}
}
is the use of “returning(...)” which gives access to the object reference re-
turned by the new call. We use this aspect to provide a facility similar to the
‘allInstances’ message in Smalltalk, without having to produce a custom JVM.
The TrackingAspect maintains a map from classes to their ExtentSets. An
ExtentSet (code not shown) holds every object of its class using a weak refer-
ence. Weak references do not prevent the garbage collector from reclaiming the
object they refer to and, hence, an ExtentSet does not prevent its objects from
being reclaimed. In addition, an ExtentSet has a redirect list which holds the
ExtentSets of all its class’s subclasses. In the case of an interface, the redirect
list refers to the ExtentSet of all implementing classes and subinterfaces. The
reason for using redirect lists is to avoid storing multiple references for objects
whose class either implements some interface(s) or extends another class. Note
that only ExtentSets which correspond to concrete classes will actually contain
object references, as interfaces and abstract classes cannot be instantiated.
An important consideration is the effect of synchronisation. We must synchronise on the Hashtable and, hence, we are essentially placing a lock around object creation. In fact, the multi-level nature of the extent set map can help somewhat, because new ExtentSets will only be added to the outer Hashtable infrequently. A more advanced data structure should be able to exploit this and restrict the majority of synchronisation to within individual ExtentSets. This way, synchronisation only occurs between object creations of the same class. We have experimented with using ConcurrentHashMap for this purpose, although we saw no performance improvements. We hope to investigate this further in the future and expect that the two-tier structure of the extent sets will obviate most of the synchronisation penalty.
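The per-class locking idea can be sketched with ConcurrentHashMap.computeIfAbsent for the outer map; note the paper reports no measured improvement from ConcurrentHashMap, so this is our illustration of the intent, not the JQL implementation.

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the outer class-to-set map is lock-free in the common case (set
// already present); only creations of objects of the same class contend on
// that class's per-set lock. Names are ours, not JQL's.
class Tracker {
    static class ClassExtent {
        private final java.util.List<Object> objects = new java.util.ArrayList<>();
        synchronized void add(Object o) { objects.add(o); }
        synchronized int size() { return objects.size(); }
    }

    private final ConcurrentHashMap<Class<?>, ClassExtent> extentSets =
            new ConcurrentHashMap<>();

    void track(Object o) {
        // computeIfAbsent creates the set at most once per class without
        // serialising creations of unrelated classes.
        ClassExtent set = extentSets.computeIfAbsent(o.getClass(), c -> new ClassExtent());
        set.add(o); // synchronisation is now per-class, not global
    }

    int extentSize(Class<?> c) {
        ClassExtent s = extentSets.get(c);
        return s == null ? 0 : s.size();
    }
}
```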
4 Performance
method, which provides millisecond resolution (on NetBSD). The source code
for the JQL system and the three query benchmarks used below can be obtained
from http://www.mcs.vuw.ac.nz/~djp/JQL/.
The purpose of this study was to investigate the query evaluator's performance compared with equivalent hand-coded implementations. We used three queries of different sizes as benchmarks for comparison. Two hand-coded implementations were written for each: HANDOPT and HANDPOOR. The former represents the best implementation we could write; it always used hash-joins when possible and employed the optimal join ordering. The HANDPOOR implementation was the exact opposite, using only nested-loop joins and the worst join ordering possible — a pessimistic but nonetheless possible outcome for novice or distracted programmers. Our purpose with these implementations was to determine the range of performance that could be expected for hand-coded queries. This is interesting for several reasons: firstly, the programmer is unlikely to apply all the optimisations (e.g. hash-joins) that are possible; secondly, the programmer is unlikely to select an optimal join order (indeed, the optimal ordering may vary dynamically as the program executes). The question, then, was how close the JQL evaluator's performance came to that of the HANDOPT implementation.
Name       Details

OneStage   selectAll(Integer a=L1, Integer b=L2 : a == b);
           This benchmark requires two pipeline stages. The best join ordering
           has the joins ordered as above (i.e. == being first). The query can
           be further optimised by using a hash-join rather than a nested-loop
           implementation for the == join.

ThreeStage selectAll(Integer a=L1, Integer b=L2, Integer c=L3,
                     Integer d=L4 : a == b && b != c && c < d);
           This benchmark requires three pipeline stages. The best join ordering
           has the joins ordered as above (i.e. == being first). The query is
           interesting as it makes sense to evaluate b != c before c < d, even
           though the former has lower selectivity. This query can be optimised
           using a hash-join as before.
40 D. Willis, D.J. Pearce, and J. Noble
Experimental setup. The three query benchmarks are detailed in Table 1. The queries range over the lists of Integers L1, . . . , L4 which, for simplicity, were always kept the same size. Let n be the size of each list. Then, each list was generated by initialising it with the integers {1, . . . , n} and randomly shuffling. Note, the worst-case time needed to evaluate the OneStage, TwoStage and ThreeStage queries is O(n²), O(n³) and O(n⁴), respectively.
For each query benchmark, four implementations were tested: the two hand-coded implementations (HANDOPT and HANDPOOR) and the JQL query evaluator using the MAX SELECTIVITY and EXHAUSTIVE join ordering strategies. For all JQL tests, join selectivity was estimated using the fixed heuristic outlined in Section 3.1, not the sampling approach. The reason for this is simply that, for these queries, the two approaches to estimating selectivity produced very similar results.
Each experiment consisted of measuring the average time taken for an implementation to evaluate one of the query benchmarks for a given list size. The average time was measured over 10 runs, with 5 ramp-up runs being performed beforehand. These parameters were sufficient to generate data with a variation coefficient (i.e. standard deviation over mean) of ≤ 0.15, indicating low variance between runs. Experiments were performed for each query and implementation at different list sizes (i.e. n) to gain insight into how performance varied with n.
Discussion. The results of the experiments are shown in Figure 5. The main observation is that there is a significant difference between the HANDOPT and HANDPOOR implementations, even for the small OneStage query. Furthermore, while the performance of JQL is always marginally slower than HANDOPT (regardless of join ordering strategy), it is always much better than HANDPOOR. We argue, then, that the guaranteed performance offered by JQL is very attractive compared with the range of performance offered by hand-coded implementations — especially as it is unlikely a programmer will achieve anything close to HANDOPT in practice.
The ThreeStage benchmark is the most complex of those studied and highlights a difference in performance between the MAX SELECTIVITY and EXHAUSTIVE join ordering strategies used by JQL. This difference arises because the MAX SELECTIVITY heuristic does not obtain an optimal join ordering for this benchmark, while the EXHAUSTIVE strategy does. In general, it seems that the EXHAUSTIVE strategy is the more attractive; indeed, for queries that have relatively few joins, it is. However, it is important to remember that it uses an exponential search algorithm and, hence, for large queries it will certainly require a prohibitive amount of time. In general, we believe that further work investigating other possible join ordering heuristics from the database community would be valuable.
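For concreteness, a MAX-SELECTIVITY-style ordering can be sketched as a simple sort: run the most selective joins first so that later pipeline stages see smaller intermediate results. This is our reading of the strategy, under the convention used above that higher selectivity means a join filters out more tuples; the class and the estimate values are illustrative only, not the JQL source.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a greedy join ordering: sort joins by descending estimated
// selectivity (fraction of tuple pairs filtered out), so the most
// selective join runs in the earliest pipeline stage.
class JoinOrdering {
    static class Join {
        final String op;
        final double selectivity; // estimated fraction of tuples filtered out
        Join(String op, double selectivity) { this.op = op; this.selectivity = selectivity; }
    }

    static List<Join> maxSelectivity(List<Join> joins) {
        List<Join> ordered = new ArrayList<>(joins);
        // Most selective first: shrink intermediate results as early as possible.
        ordered.sort((a, b) -> Double.compare(b.selectivity, a.selectivity));
        return ordered;
    }
}
```

Unlike the EXHAUSTIVE strategy, this sort is O(k log k) in the number of joins, but it cannot account for interactions between joins and so may miss the optimal ordering, as the ThreeStage result shows.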
The purpose of this study was to investigate the performance impact of the JQL
object tracking system. This is necessary to permit querying over the object
Efficient Object Querying for Java 41
[Fig. 5 charts: query evaluation time versus list size (number of objects) for each benchmark and implementation.]
Fig. 5. Experimental results comparing the performance of JQL with different join ordering strategies against the hand-coded implementations. Data for the OneStage, TwoStage and ThreeStage benchmarks are shown at the top, middle and bottom respectively. The charts on the right give close-ups of the three fastest implementations for each benchmark. Different list sizes are plotted to show the general trends.
Table 2. The benchmark suite. Size indicates the amount of bytecode making up the
benchmark, excluding harness code and standard libraries. Time and Heap give the
execution time and maximum heap usage for one run of the benchmark. # Objects
Created gives the total number of objects created by the benchmark during one run.
extent sets. However, each object creation causes, at least, both a hashtable lookup and an insertion. Keeping an extra reference to every live object also causes memory overhead. We have used the SPECjvm98 benchmark suite to test the memory and execution overhead incurred.
Experimental setup. The benchmarks used for these experiments are described in Table 2. To measure execution time, we averaged the time taken by each benchmark with and without the tracker enabled. These timings were taken over 10 runs with a 5-run ramp-up period. This was sufficient to generate data with a variation coefficient of ≤ 0.1, again indicating very low variance between runs.
For memory overhead measurements, we monitored the resource usage of the
JVM process using top while running the benchmark and recorded the peak
measurement. For these experiments the JVM was run with a 512MB heap.
Discussion. The relative slowdowns for each benchmark’s execution time are
shown in Figure 6. The memory overhead incurred for each benchmark is shown
in Figure 7.
Both memory overhead and execution overhead are directly influenced by the
number of objects created. Execution overhead is linked to how much time is
spent creating objects, rather than operating on them. mpegaudio and compress,
for example, create relatively few objects and spend most of their runtime work-
ing with those objects. Hence, they show relatively minor slowdown. jess and
raytrace, on the other hand, create millions of objects in the space of a few
seconds. This leads to a fairly significant slowdown.
Memory overhead is also influenced by the relative size of the objects created. A benchmark like raytrace creates hundreds of thousands of small Point objects, likely smaller than the WeakReference objects necessary to track them. This means the relative overhead of the tracker for each object is very large. db, on the other hand, creates a hundred thousand-odd String instances, which outweigh the WeakReferences. Compared to these bulky Strings, the tracker's overhead is minor.
[Fig. 6 chart: evaluation slowdown per SPEC benchmark; tracked execution times shown below each benchmark: compress (7.15s), jess (16.81s), raytrace (17.50s), db (61.75s), javac (14.79s), mpegaudio (9.72s), mtrt (17.17s), jack (6.02s).]
Fig. 6. Slowdown factors for SPECjvm98 benchmark programs executed with object tracking enabled. The execution time with tracking enabled is shown below each benchmark; the untracked execution times are given in Table 2. Slowdown factors are computed as the tracked time divided by the untracked time.
[Fig. 7 chart: memory overhead factor per SPEC benchmark.]
Fig. 7. Memory overhead factors for SPECjvm98 benchmark programs executed with object tracking enabled. Memory overhead is computed as the tracked memory usage divided by the untracked usage.
We believe that gathering performance data for using the object tracker with various ‘real-world’ Java applications would be valuable, and we plan to investigate this in the future.
5 Discussion
In this section, we discuss various issues with the JQL system and explore in-
teresting future directions.
5.2 Incrementalisation
An interesting observation is that, to improve performance, the results of a query
can be cached and reused later. This approach, often called incrementalisation
[7,24], requires that the cached results are updated in line with changes to the
objects and collections they derive from.
Determining whether some change to an object or collection affects a cached query result is not without difficulty. We could imagine intercepting all field reads/writes (perhaps using AspectJ's field read/write pointcuts) and checking whether each write affects any cached query results. While this might be expensive if there are a large number of cached queries, it could be lucrative if the cache is restricted to a small number of frequently called queries. Alternatively, cached queries could be restricted to those involving only final field values. This way, we are guaranteed that the objects involved in a query cannot change, thereby simplifying the problem.
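The final-fields variant is attractive because the cache needs no fine-grained dependency tracking: a cached result can only be stale if a collection is explicitly changed. A hypothetical sketch (class and method names are ours, not JQL's) of such a result cache with explicit invalidation:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of query incrementalisation as a result cache.
// An interceptor (e.g. an AspectJ set() pointcut, or collection wrapper)
// would call invalidate() on writes; queries over only final fields would
// never need invalidation by field writes.
class QueryCache {
    private final Map<String, List<?>> cache = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> List<T> evaluate(String queryKey, java.util.function.Supplier<List<T>> evaluator) {
        // Evaluate at most once per key until the entry is invalidated.
        return (List<T>) cache.computeIfAbsent(queryKey, k -> evaluator.get());
    }

    // Called when a write might affect this query's cached result.
    void invalidate(String queryKey) {
        cache.remove(queryKey);
    }
}
```

As the text notes, the interception cost only pays off if invalidations are rare relative to cache hits, which is why restricting the cache to a few frequently called queries is appealing.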
6 Related Work
The Program Query Language (PQL) is a similar system which allows the programmer to express queries capturing erroneous behaviour over the program trace [25]. A key difference from other systems is that static analysis is used in an effort to answer some queries without needing to run the program. As a fallback, queries which cannot be resolved statically are compiled into the program's bytecode and checked at runtime.
Hobatr and Malloy [16,17] present a query-based debugger for C++ that uses
the OpenC++ Meta-Object Protocol [8] and the Object Constraint Language
(OCL) [38]. This system consists of a frontend for compiling OCL queries to
C++, and a backend that uses OpenC++ to generate the instrumentation code
necessary for evaluating the queries. In some sense this is similar to our approach,
except that we use JQL to specify queries and AspectJ to add the necessary
instrumentation for resolving them. Like the others, their focus is primarily
on catching violations of class invariants and pre/post conditions, rather than
as a general purpose language extension. Unfortunately, no details are given
regarding what (if any) query optimisations are employed and/or how program
performance is affected by the added instrumentation code.
More recently, the Language INtegrated Query (LINQ) project has added querying capabilities to the C# language. In many ways, this is similar to JQL and, while LINQ does not support querying over object extent sets, its queries can be used to directly access databases. At this stage, little is known about the query evaluator employed in LINQ and the query optimisations it performs. We hope that our work will motivate studies of this, and it will be interesting to see how the LINQ query evaluator performs in practice. The Cω language [4] preceded LINQ and the two have much in common as regards queries.
One feature of LINQ is the introduction of lambda expressions to C#. Lambda expressions can be used in place of iterators for manipulating or filtering collections [2]. In this way, they offer a form of querying where the lambda expression represents the query expression. However, this is more simplistic than the approach we have taken as, by permitting queries over multiple collections, we can exploit a number of important optimisations. Lambda expressions offer no help in this regard as, to apply such optimisations, we must be able to break apart and manipulate the query expression to find operators that support efficient joins and to determine good join orderings.
Another related work is that of Liu et al. [24], who regard all programs as a series of queries and updates. They use static program analysis to determine which queries can be incrementalised to permit efficient evaluation. To do this, they employ a cost model to determine which queries are expensive to compute and, hence, which should be incrementalised. This incrementalisation can be thought of as creating a view which represents the query results and is automatically updated when the underlying data changes. This optimisation could be implemented in JQL (albeit in a dynamic, rather than static, setting) and we wish to explore this in the future.
Cook and Rai [10] describe how building queries out of objects (rather than using e.g. SQL strings) can ensure type safety and prevent spoofing attacks. While these safe query objects have generally been used to generate database queries, they could also act as a front-end to our query system. Similarly, the object extents and query optimisations we describe in this paper could be applied in the context of a safe query object system.
Finally, there are a number of APIs available for Java (e.g. SQLJ, JSQL,
etc.) which provide access to SQL databases. These are quite different from the
approach we have presented in this work, as they do not support querying over
collections and/or object extents. Furthermore, they do not perform any query
optimisations, instead relying on the database back end to do this.
7 Conclusion
In this paper, we have presented a language extension for Java (called JQL) which permits queries over object collections and extent sets. We have motivated this as a way of improving both program readability and flexibility, and also as a way of providing a stronger guarantee of performance. The latter arises from the query evaluator's ability to perform complex optimisations — many of which the programmer is unlikely to apply in practice. Through an experimental study we have demonstrated that there is a large difference in performance between a poor hand-coded query and a good one. Furthermore, our prototype implementation performs very well compared with optimised hand-coded implementations of several queries. We have also demonstrated that the cost of tracking all objects in the program is practical. We would expect that, with direct support for querying from the JVM, the performance overhead of object tracking would improve significantly.
The complete source for our prototype implementation is available for download from http://www.mcs.vuw.ac.nz/~djp/JQL/ and we hope that it will motivate further study of object querying as a first-class language construct.
Acknowledgements
The authors would like to thank Stuart Marshall for some insightful comments on
an earlier draft of this paper. This work is supported by the University Research
Fund of Victoria University of Wellington, and the Royal Society of New Zealand
Marsden Fund.
References
4. G. Bierman, E. Meijer, and W. Schulte. The essence of data access in Cω. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP), volume 3586 of Lecture Notes in Computer Science, pages 287–311. Springer-Verlag, 2005.
5. G. Bierman and A. Wren. First-class relationships in an object-oriented language. In Proceedings of the European Conference on Object-Oriented Programming (ECOOP), volume 3586 of Lecture Notes in Computer Science, pages 262–282. Springer-Verlag, 2005.
6. G. Booch, I. Jacobson, and J. Rumbaugh. The Unified Modeling Language User
Guide. Addison-Wesley, 1998.
7. S. Ceri and J. Widom. Deriving production rules for incremental view maintenance.
In Proceedings of the International Conference on Very Large Data Bases (VLDB),
pages 577–589. Morgan Kaufmann Publishers Inc., 1991.
8. S. Chiba. A metaobject protocol for C++. In Proceedings of the ACM conference on
Object-Oriented Programming, Systems, Languages and Applications (OOPSLA),
pages 285–299. ACM Press, 1995.
9. W. R. Cook and A. H. Ibrahim. Programming languages & databases: What’s the
problem? Technical report, Department of Computer Sciences, The University of
Texas at Austin, 2005.
10. W. R. Cook and S. Rai. Safe query objects: Statically typed objects as remotely
executable queries. In Proceedings of the International Conference on Software
Engineering (ICSE), pages 97–106. IEEE Computer Society Press, 2005.
11. D. F. D’Souza and A. C. Wills. Objects, Components, and Frameworks with UML:
The Catalysis Approach. Addison-Wesley, 1998.
12. M. Eisenstadt. My hairiest bug war stories. Communications of the ACM, 40(4):30–
37, 1997.
13. E. Gamma, R. Helm, R. E. Johnson, and J. Vlissides. Design Patterns: Elements
of Reusable Object-Oriented Software. Addison-Wesley, 1994.
14. S. Goldsmith, R. O’Callahan, and A. Aiken. Relational queries over program traces.
In Proceedings of the ACM Conference on Object-Oriented Programming, Systems,
Languages and Applications (OOPSLA), pages 385–402. ACM Press, 2005.
15. P. J. Haas, J. F. Naughton, and A. N. Swami. On the relative cost of sampling
for join selectivity estimation. In Proceedings of the thirteenth ACM symposium on
Principles of Database Systems (PODS), pages 14–24. ACM Press, 1994.
16. C. Hobatr and B. A. Malloy. The design of an OCL query-based debugger for
C++. In Proceedings of the ACM Symposium on Applied Computing (SAC), pages
658–662. ACM Press, 2001.
17. C. Hobatr and B. A. Malloy. Using OCL-queries for debugging C++. In Proceedings
of the IEEE International Conference on Software Engineering (ICSE), pages 839–
840. IEEE Computer Society Press, 2001.
18. T. Ibaraki and T. Kameda. On the optimal nesting order for computing n-relational
joins. ACM Transactions on Database Systems., 9(3):482–502, 1984.
19. R. Krishnamurthy, H. Boral, and C. Zaniolo. Optimization of nonrecursive queries.
In Proceedings of the ACM Conference on Very Large Data Bases (VLDB), pages
128–137. Morgan Kaufmann Publishers Inc., 1986.
20. R. Lencevicius. Query-Based Debugging. PhD thesis, University of California,
Santa Barbara, 1999. TR-1999-27.
21. R. Lencevicius. On-the-fly query-based debugging with examples. In Proceedings
of the Workshop on Automated and Algorithmic Debugging (AADEBUG), 2000.
1 Introduction
2 Background
The object persistence architectures examined in this paper combine elements of
orthogonal persistence [1] with the pragmatic approach of relational data access
libraries, also known as call level interfaces [30].
Orthogonal persistence states that persistence behavior is independent of (orthogonal to) all other aspects of a system. In particular, any object can be persistent, whether an object is persistent does not affect its other behaviors, and an object is persistent if it is reachable from a persistent root. Orthogonal persistence has been implemented, to a degree, in a variety of programming languages [25, 2, 19, 22].
A key characteristic of orthogonal persistence is that objects are loaded when needed. Using a reference in an object is called traversing the reference, or navigating between objects – such that the target object is loaded if necessary. We use the term navigational query to refer to queries that are generated implicitly as a result of navigation.
Relational data access libraries are a pragmatic approach to persistence: they allow execution of arbitrary SQL queries, and the queries can return any combination of data that can be selected, projected, or joined via SQL. Examples include ODBC [15] and JDBC [12]. The client application determines how the results of a query are used – each row of the query result may be used as is, or it may be mapped to objects. Since data is never loaded automatically, the programmer must specify in a query all data required for an operation – the concept of prefetching data that might be loaded automatically does not apply.
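The row-at-a-time style of such a call level interface can be illustrated with plain JDBC: the programmer names every column the operation needs and maps each row to an object by hand. The Person class, table and column names below are invented for illustration, not taken from the paper.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch of relational data access in the JDBC style: explicit SQL, no
// automatic loading or prefetching, and client-side row-to-object mapping.
class PersonDao {
    static class Person {
        final int id; final String firstName;
        Person(int id, String firstName) { this.id = id; this.firstName = firstName; }
    }

    // Row-to-object mapping is explicit client code, not done by the library.
    static Person mapRow(int id, String firstName) {
        return new Person(id, firstName);
    }

    static List<Person> findByFirstName(Connection conn, String name) throws SQLException {
        List<Person> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(
                "select id, first_name from person where first_name = ?")) {
            ps.setString(1, name);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(mapRow(rs.getInt("id"), rs.getString("first_name")));
                }
            }
        }
        return result;
    }
}
```

Any related data (say, the person's addresses) would require the programmer to either widen this query with a join or issue further queries explicitly; nothing is fetched on navigation.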
The object persistence architectures considered in this paper are hybrids of orthogonal persistence and data access libraries. Examples include EJB [24], JDO [28], Hibernate [6], and Toplink [10]. They support automatic loading of objects as needed, but they also include query languages and the ability to manually prefetch related objects. While query languages can significantly increase performance, they reduce orthogonality because they are special operations that only apply to persistent data.
For example, in EJB 2.1, a query can return objects or a value:
select object (p) from Person p where p.firstName="John"
The objects loaded directly by a query are called root objects. Use of the root objects may result in navigation to related objects, each of which will require an additional query to load.
In client-server architectures, the cost of executing a query, which involves a round-trip to a database, typically dominates other performance measures.
Automatic Prefetching by Traversal Profiling 53
3 Automating Prefetch
Type Graph: Let T be the finite set of type names and F be the finite set of field names. A type graph is a directed graph GT = (T, A).
– T is a set of types.
– A is a partial function T × F ⇀ T × {single, collection} representing a set of associations between types. Given types t and t′ and field f, if A(t, f) = (t′, m) then there is an association from t to t′ with name f and cardinality m, where m indicates whether the association is a single- or multi-valued association.
Inheritance is not modeled in our type graph because it is orthogonal to prefetch. Bi-directional associations are supported through two uni-directional associations. Figure 2 shows a sample type graph. There are three types: Employee, Department, and Company. Each company has a set of departments and a CEO, each department has a set of employees, and each employee may have a supervisor. The formal representation is:
Fig. 2. Simple Type Graph with three types: Employee, Department, and Company. Solid lines represent single associations, while dashed lines represent collection associations.
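The sample type graph can be encoded directly from the definition above as the partial function A. This sketch is ours, and the field names (departments, ceo, employees, supervisor) are assumptions read off the prose rather than taken from the paper's figure.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a type graph as the partial function A mapping (type, field)
// to (target type, cardinality), following the definition in the text.
class TypeGraph {
    enum Cardinality { SINGLE, COLLECTION }

    static class Association {
        final String target; final Cardinality m;
        Association(String target, Cardinality m) { this.target = target; this.m = m; }
    }

    // A : T x F -> T x {single, collection}; absent keys mean "no association".
    private final Map<String, Association> a = new HashMap<>();

    void associate(String type, String field, String target, Cardinality m) {
        a.put(type + "." + field, new Association(target, m));
    }

    Association lookup(String type, String field) {
        return a.get(type + "." + field); // null when A is undefined here
    }

    // The sample graph of Figure 2 (field names are our guesses).
    static TypeGraph sample() {
        TypeGraph g = new TypeGraph();
        g.associate("Company", "departments", "Department", Cardinality.COLLECTION);
        g.associate("Company", "ceo", "Employee", Cardinality.SINGLE);
        g.associate("Department", "employees", "Employee", Cardinality.COLLECTION);
        g.associate("Employee", "supervisor", "Employee", Cardinality.SINGLE);
        return g;
    }
}
```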
Object Graph: Let O be the finite set of object names. An object graph is a directed graph GO = (O, E, GT = (T, A), Type). GT is a type graph and Type is a unary function that maps objects to types. The following constraints must be satisfied in the object graph GO:
– O represents a set of objects.
– Type : O → T. The type of each object in the object graph must exist in the type graph.
– E : O × F ⇀ powerset(O); the edges in the graph are a partial function from an object and field to a set of target objects.
– ∀o, f such that E(o, f) = S:
  • A(Type(o), f) = (T′, m);
  • ∀o′ ∈ S, Type(o′) = T′;
  • if m = single, then |S| = 1.
Each edge in the object graph corresponds to an edge in the type graph, and single associations have exactly one target object.
An example object graph is shown in Figure 3, which is based on the type graph in Figure 2. Edges that contain a dark oval represent collection associations. Null-valued single associations are not represented by edges in the object graph; however, empty collection associations are represented as edges whose target is an empty set. We chose this representation because most object persistence architectures represent associations as a reference to a single target object or collection. A null-valued association is usually represented as a special reference in the source object. This means that the persistence architecture can tell if a single-valued association is null without querying the database. On the
56 A. Ibrahim and W.R. Cook
Fig. 3. An example of an object graph based on the type graph in Figure 2. Collection
associations contain an oval in the middle of the edge.
other hand, the persistence architecture will query the database even if a collection association is empty, because the collection reference does not carry any information on the cardinality of the collection. The ability to represent traversals to empty collections is important when we discuss traversals in Section 3.1, because it allows AutoFetch to represent navigational queries that load empty collections.
Traversals. A traversal captures how the program navigates the object graphs that a query returns. A program may traverse all the objects and associations in the result of the query, or it may traverse more or less. Only program navigations that would result in a database load for the query without prefetch are included in the traversal.
P = T × ℕ × ℕ × (F ⇀ P)
such that for all (t, used, potential, c) ∈ P and every (f, p) ∈ c:
1. A(t, f) is defined;
2. used ≤ potential.
Each node in the tree contains statistics on the traversals to this node: the number of times this node needed to be loaded from the database (used), and the number of opportunities the program had to load this node from the database (potential), i.e. the number of times the program had a direct reference to an object representing this node.
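One node of such a traversal profile might look like the following sketch; the class and method names are our own illustration, not AutoFetch's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of one traversal-profile node: the type it describes, the
// used/potential counters described above, and children keyed by field
// name (the partial map F -> P).
class ProfileNode {
    final String type;
    int used;      // times this node had to be loaded from the database
    int potential; // times the program held a direct reference to it
    final Map<String, ProfileNode> children = new HashMap<>();

    ProfileNode(String type) { this.type = type; }

    // Fetch or lazily create the child profile node for a field.
    ProfileNode child(String field, String childType) {
        return children.computeIfAbsent(field, f -> new ProfileNode(childType));
    }

    void recordOpportunity() { potential++; } // direct reference observed
    void recordLoad() { used++; }             // database load actually needed

    // Statistic used for prefetch decisions: fraction of opportunities
    // on which the node actually had to be loaded.
    double loadProbability() {
        return potential == 0 ? 0.0 : (double) used / potential;
    }
}
```

Incrementing potential on every opportunity and used only on actual loads maintains the invariant used ≤ potential from the definition above.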
Query classification determines a set of queries that share a traversal profile. The
aim of query classification is to group queries which are likely to have similar
Fig. 5. Traversal profile for query class after traversal in Figure 4. Statistics are repre-
sented as (used/potential).
traversals. A simple classification of queries is to group all queries that have the
same query string. There are several reasons why this is not effective.
First, a given query may be used to load data for several different operations.
Since the operations are different, the traversals for these operations may be dif-
ferent as well. This situation typically arises when query execution is centralized
in library functions that are called from many parts of a program. Classifying
based only on the criteria will not distinguish between these different uses of
a query, so that very different traversals may be classified as belonging to the
same class. This may lead to poor prediction of prefetch. The classification in
this case is too coarse.
A second problem is that query criteria are often constructed dynamically. If
each set of criteria is classified as a separate query, then commonality between
operations may not be identified. At the limit, every query may be different,
leading to a failure to gather sufficient data to predict prefetch.
Queries may also be classified by the program state when the query is executed.
This is motivated by the observation that traversals are determined by the control
flow of the program after query execution. Program state includes the current code
line, variable values, library bindings, etc. Classifying queries based on the entire
program state is infeasible as the program state may be very large and will likely
be different for every query. However, a set of salient features of the program state
The base case is that f(root) in the traversal profile is 1. A depth-first traversal can calculate this probability for each node without recomputing any values. This calculation ensures that traversal profile nodes are prefetched only if their parent node is prefetched, because f(n) ≤ f(p(n)).
Second, if there is more than one collection path in the remaining tree, an arbitrary collection path is chosen and the other collection paths are removed. Collection paths are paths from the root node to a leaf node of the tree that contain at least one collection association. This avoids creating a query which joins multiple many-valued associations.
The prefetch specification is a set of prefetch directives. Each prefetch directive corresponds to a unique path in the remaining tree. For example, given the traversal profile in Figure 5 and a prefetch threshold of 0.5, the prefetch specification would be: (employees, employees.supervisor, company). The query is augmented with the calculated prefetch specification. Regardless of the prefetch specification, profiling of the query results remains the same.
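The threshold step can be sketched as a walk over the profile tree: keep a path when its estimated load probability reaches the threshold, and never descend below a node that was dropped (matching f(n) ≤ f(p(n)) above). This is our simplification; it omits the collection-path pruning, and the statistics and field names in the test data are invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of deriving a prefetch specification from a traversal
// profile: emit a dotted path for every node whose used/potential ratio
// meets the threshold, provided all its ancestors were also kept.
class PrefetchSpec {
    static class Node {
        final int used, potential;
        final Map<String, Node> children = new HashMap<>();
        Node(int used, int potential) { this.used = used; this.potential = potential; }
    }

    static List<String> compute(Node root, double threshold) {
        List<String> directives = new ArrayList<>();
        for (Map.Entry<String, Node> e : root.children.entrySet()) {
            walk(e.getKey(), e.getValue(), threshold, directives);
        }
        return directives;
    }

    private static void walk(String path, Node n, double threshold, List<String> out) {
        double p = n.potential == 0 ? 0.0 : (double) n.used / n.potential;
        if (p < threshold) return; // children are never prefetched without their parent
        out.add(path);
        for (Map.Entry<String, Node> e : n.children.entrySet()) {
            walk(path + "." + e.getKey(), e.getValue(), threshold, out);
        }
    }
}
```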
4 Implementation
The implementation of AutoFetch is divided into a traversal profile module
and an extension to Hibernate 3.1, an open source Java ORM tool.
4.2 Hibernate
Hibernate was modified to incorporate prefetch specifications and to profile traversals of its query results. The initial AutoFetch implementation used Hibernate 3.0, which did not support multiple collection prefetches. Fortunately, Hibernate 3.1 contains support for multiple collection prefetches, and AutoFetch was migrated to this version. Support for multiple collection prefetches turns out to be critical for improving performance in some of the evaluation benchmarks.
Original query

  HQL: from Department d where d.name = 'foo'

  SQL: select * from Department as d where d.name = 'foo'

Query with a single prefetch

  HQL: from Department d
       left outer join fetch d.employees where d.name = 'foo'

  SQL: select * from Department as d
       left outer join Employee as e on e.deptId = d.id
       where d.name = 'foo'
Hibernate obtains the prefetch specification for a query from the traversal profile module. The code in Figure 6 illustrates how an HQL query is modified to include prefetches, together with the SQL generated by Hibernate. Queries which already contain a prefetch specification are not modified or profiled, allowing the programmer to manually specify prefetch. The Hibernate extensions profile traversals by instrumenting each persistent object with a dynamically generated proxy. The proxy intercepts all method calls to the object and, if any object state is accessed that will require a database load, the proxy increments the appropriate node in the traversal profile for the query class. Hibernate represents single association references with a key; therefore, accessing the key is not considered an object access, because it never requires a database query. Collections are instrumented by modifying the existing Hibernate collection classes. Although there is a performance penalty for this type of instrumentation, we found that this penalty was not noticeable when executing queries in our benchmarks. The penalty could also be ameliorated through sampling, i.e. only instrumenting a certain percentage of queries. The AutoFetch prototype does not support all of Hibernate's features. For example, AutoFetch does not support prefetching or profiling for data models which contain weak entities or composite identifiers. Support for these features was omitted for simplicity.
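Proxy-based interception of the kind described can be sketched with java.lang.reflect.Proxy for an interface-typed entity. This is a simplified stand-in: Hibernate's real proxies are generated subclasses of the entity class, and in AutoFetch the counter updated here would be a node in the traversal profile for the query class.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Simplified stand-in for instrumentation by dynamic proxy: wrap an entity
// so that every access to its state bumps a counter before delegating.
class Profiling {
    interface Entity { String getName(); }

    static class Counter { int accesses; }

    static Entity profile(Entity target, Counter counter) {
        InvocationHandler h = (proxy, method, args) -> {
            counter.accesses++; // a state access that could force a database load
            return method.invoke(target, args);
        };
        return (Entity) Proxy.newProxyInstance(
                Entity.class.getClassLoader(), new Class<?>[]{Entity.class}, h);
    }
}
```

The interception cost per call is small but nonzero, which is consistent with the text's suggestion of sampling only a fraction of queries if the overhead ever mattered.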
5 Evaluation
We evaluated AutoFetch using the Torpedo and OO7 benchmarks. The Torpedo benchmark measures the number of queries that an ORM tool executes in a simple auction application, while the OO7 benchmark examines the performance
Automatic Prefetching by Traversal Profiling 63
The Torpedo benchmark [23] measures the number of SQL statements executed
by an ORM tool over a set of test cases. The benchmark consists of a Java client
and a J2EE auction server. The client issues requests to the auction server,
such as placing a bid or retrieving information for a particular auction. There
are seven client test cases which were designed to test various aspects of the
mapping tool such as caching or prefetching. The number of SQL statements
executed is used as the measure of the performance of the mapping tool. The
benchmark can be configured to use different object-relational mapping tools
(EJB, JDO, Hibernate) as the persistence backend.
We created two versions of the Hibernate persistence backend: the original
tuned backend included with the benchmark, and that same backend minus the
prefetch directives. The latter backend can be configured to have AutoFetch
enabled or disabled. We ran the Torpedo benchmark for each version and each
possible option three times in succession. The results of the first and third
iterations are shown in Figure 7. The second run is omitted from the graph
because the first and second iterations produce the same results. A single set
of iterations is sufficient because the benchmark is deterministic with respect
to the number of queries.

Fig. 7. Torpedo benchmark results. The y-axis represents the number of queries
executed. Maximum extent level is 12.
As Figure 7 shows, the prefetch directives reduce the number of queries executed.
With neither the prefetch directives nor AutoFetch enabled, the benchmark
executed three times as many queries. Without prefetch directives but with
AutoFetch enabled, the benchmark executes many queries on the first and
second iterations; however, from the third iteration onward it executes as many
queries as the version with programmer-specified prefetches.
A simple query classification method, one that uses the code line where the
query is executed as the query class, would not have been sufficient to match
the performance of manually specified prefetches for this benchmark. For example,
the findAuction method is used to load both detailed and summary information
about an auction. The detailed auction information includes traversing several
associations for an auction such as the auction bids. The summary auction infor-
mation only includes fields of the auction object such as the auction id or date.
These different access patterns require different prefetches even though they use
the same backend function to load the auction.
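Classifying query executions by the client call stack, rather than by a single code line, can be sketched as follows. The names (`queryClassKey`, the two caller methods) are our own illustrative choices, not AutoFetch's API; the point is only that the same query site yields different classes for different callers.

```java
public class QueryClassDemo {
    // Sketch: the query class is the calling context, truncated to maxDepth
    // stack frames above the query call site.
    static String queryClassKey(int maxDepth) {
        StackTraceElement[] stack = Thread.currentThread().getStackTrace();
        StringBuilder key = new StringBuilder();
        // stack[0] is Thread.getStackTrace, stack[1] is this method;
        // start at the caller, stack[2].
        for (int i = 2; i < stack.length && i < 2 + maxDepth; i++) {
            key.append(stack[i].getClassName()).append('.')
               .append(stack[i].getMethodName()).append(';');
        }
        return key.toString();
    }

    // Two hypothetical callers of the same query site, standing in for the
    // detailed-view and summary-view callers of findAuction.
    static String findAuctionFromDetails() { return queryClassKey(4); }
    static String findAuctionFromSummary() { return queryClassKey(4); }

    public static void main(String[] args) {
        // Different calling contexts produce different query classes.
        System.out.println(!findAuctionFromDetails().equals(findAuctionFromSummary()));
    }
}
```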
The OO7 benchmark [5] was designed to measure the performance of OODB
management systems. It consists of a series of traversals, queries, and structural
modifications performed on databases of varying sizes and statistical properties.
We implemented a Java version of the OO7 benchmark based on code pub-
licly available from the benchmark's authors. Following the lead of Han et al. [13], we
omitted all structural modification tests as well as any traversals that included
updates, because updates have no effect on AutoFetch behavior and otherwise
these traversals were not qualitatively different from the included traversals. Q4
was omitted because it requires using the medium or large OO7 databases. Tra-
versal CU was omitted because caching and AutoFetch are orthogonal, and
the traversal’s performance is very sensitive to the exact caching policy.
Only a few of the OO7 operations involve object navigation, which can be
optimized by AutoFetch. Traversal T1 is a complete traversal of the OO7
object graph, both the assembly and part hierarchies. Traversal T6 traverses the
entire assembly hierarchy, but only accesses the composite and root atomic parts
in the part hierarchy. Traversal T1 has a depth of about 29 while Traversal T6
has a depth of about 10. Neither the queries nor traversals T8 or T9 perform
navigation; however, they are included to detect any performance penalties for
traversal profiling.
We added a reverse traversal, RT, which chooses atomic parts and finds their
root assembly, associated module, and associated manual. Such traversals were
omitted from the OO7 benchmark because they were considered not to add any-
thing to the results. They are significant in the context of prefetch, since single-
valued associations can be prefetched more easily than multi-valued associations.
Table 1. Comparison with prefetch disabled and with AutoFetch. Maximum extent
level is 12. Small OO7 benchmark. Metrics for each query/traversal are the average
number of SQL queries and the average time in milliseconds. Percentages give the
percent improvement of AutoFetch over the baseline.
Table 1 summarizes the results of the OO7 benchmark. Neither the queries
nor traversals T8 or T9 show any improvement with prefetch enabled. This is to
be expected since they do not perform any navigational queries. These queries
Table 2. The number of queries executed by AutoFetch with Hibernate 3.0 and
AutoFetch with Hibernate 3.1 for traversals T1, T6, and RT. Only 3rd iteration
shown. Maximum extent level is 12. Small OO7 benchmark.
AutoFetch Version T1 T6 RT
AutoFetch with Hibernate 3.0 2171 415 3
AutoFetch with Hibernate 3.1 38 2 3
are included for completeness, and to show that AutoFetch does not have high
overhead when not needed.
Both traversals T1 and T6 show a large improvement in the number of queries
and time to execute the traversal. T6 shows a larger improvement than T1 even
though T1 is a deeper traversal, because some of the time executing traversal
T1 is spent traversing the object graph in memory, repeatedly traversing the
part hierarchies. The number of queries and the time to execute a traversal
are tightly correlated as expected. Both T1 and T6 are top-down hierarchical
traversals which require multiple collection prefetches to execute few queries.
Table 2 shows a comparison of the number of queries executed by AutoFetch
with Hibernate 3.1 and AutoFetch with Hibernate 3.0 which was unable to
prefetch multiple collection associations. The ability to fetch multiple collection
associations had a greater effect on deep traversals such as T1 and T6 than on
shallow traversals such as RT.
Figure 8 shows that the maximum depth of the traversal profile is important
to the performance of the prefetch system in the presence of deep traversals. The
tradeoff for increasing the maximum depth of the traversal profile is an increase in
the memory requirements to store traversal profiles. It should be noted that deep
Fig. 8. Varying maximum extent level from 5 to 15. Only 3rd iteration shown. Small
OO7 database.
interface ResumeDAO {
Resume getResume(Long resumeId);
...
}
class Resume {
List getEducations() { ... }
List getExperiences () { ... }
List getReferences () { ... }
...
}
6 Related Work
Han et al. [13] classify prefetching algorithms into five categories: page-based
prefetching, object-level/page-level access pattern prefetching, manually speci-
fied prefetches, context-based prefetches, and traversal/path-based prefetches.
Page-based prefetching has been explored in object-oriented databases such
as ObjectStore [17]. Page-based prefetching is effective when the access patterns
7 Future Work
A cost model for database query execution is necessary for accurate optimiza-
tion of prefetching. AutoFetch currently uses the simple heuristic that it is al-
ways better to execute one query rather than two (or more) queries if the data
loaded by the second query is likely to be needed in the future. This heuristic re-
lies on the fact that database round-trips are expensive. However, there are other
factors that determine the cost of prefetching a set of objects: the cost of the modified
query, the expected size of the set of prefetched objects, the connection latency,
etc. A cost model that takes such factors into account will have better performance
and may even outperform manual prefetches since the system would be able to
take into account dynamic information about database and program execution.
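Such a cost model could start from a one-line predicate like the following. This is purely our illustration of the tradeoff described above, not part of AutoFetch; all names and parameters are hypothetical.

```java
public class PrefetchCostModel {
    // Illustrative heuristic: prefetch an association when the expected cost
    // of a later database round trip (probability the data is needed times
    // round-trip latency) outweighs the marginal cost of widening the
    // current query.
    static boolean shouldPrefetch(double pNeeded, double roundTripMs, double extraQueryMs) {
        return pNeeded * roundTripMs > extraQueryMs;
    }

    public static void main(String[] args) {
        // Likely-needed association, expensive round trip: prefetch.
        System.out.println(shouldPrefetch(0.9, 10.0, 2.0)); // true
        // Rarely-needed association: do not prefetch.
        System.out.println(shouldPrefetch(0.1, 10.0, 2.0)); // false
    }
}
```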
8 Conclusion
Object prefetching is an important technique for improving performance of ap-
plications based on object persistence architectures. Current architectures rely
on the programmer to manually specify which objects to prefetch when execut-
ing a query. Correct prefetch specifications are difficult to write and maintain
as a program evolves, especially in modular programs. In this paper we pre-
sented AutoFetch, a novel technique for automatically computing prefetch
specifications. AutoFetch predicts which objects should be prefetched for a
given query based on previous query executions. AutoFetch classifies query
executions based on the client state when the query is executed, and creates a
traversal profile to summarize which associations are traversed on the results of
the query. This information is used to predict prefetches for future queries. Before
a new query is executed, a prefetch specification is generated based on the
classification of the query and its traversal profile. AutoFetch improves on pre-
vious approaches by collecting profile information across multiple queries, and
using client program state to help classify queries. We evaluated AutoFetch
using both sample applications and benchmarks and showed that we were able
to improve performance and/or simplify code.
References
1. M. P. Atkinson and O. P. Buneman. Types and persistence in database program-
ming languages. ACM Comput. Surv., 19(2):105–170, 1987.
2. M. P. Atkinson, L. Daynes, M. J. Jordan, T. Printezis, and S. Spence. An orthog-
onally persistent Java. SIGMOD Record, 25(4):68–75, 1996.
3. T. Ball and J. R. Larus. Efficient path profiling. In International Symposium on
Microarchitecture, pages 46–57, 1996.
4. P. A. Bernstein, S. Pal, and D. Shutt. Context-based prefetch for implementing
objects on relations. In Proceedings of the 25th VLDB Conference, 1999.
5. M. J. Carey, D. J. DeWitt, and J. F. Naughton. The OO7 benchmark. SIGMOD
Rec., 22(2):12–21, 1993.
6. D. Cengija. Hibernate your data. onJava.com, 2004.
7. W. R. Cook and S. Rai. Safe query objects: statically typed objects as remotely
executable queries. In ICSE ’05: Proceedings of the 27th international conference
on Software engineering, pages 97–106. ACM Press, 2005.
26. M. Palmer and S. B. Zdonik. Fido: A cache that learns to fetch. In Proceedings of
the 17th International Conference on Very Large Data Bases, 1991.
27. D. A. Patterson. Latency lags bandwidth. Commun. ACM, 47(10):71–75, 2004.
28. C. Russell. Java Data Objects (JDO) Specification JSR-12. Sun Microsystems,
2003.
29. Raible’s wiki: StrutsResume.
http://raibledesigns.com/wiki/Wiki.jsp?page=StrutsResume, March 2006.
30. M. Venkatrao and M. Pizzo. SQL/CLI – a new binding style for SQL. SIGMOD
Record, 24(4):72–77, 1995.
The Runtime Structure of Object Ownership
Nick Mitchell
1 Introduction
In this paper, we consider the problem of excessive memory footprint in object-
oriented programs: for certain intervals of time, the live objects exceed available
or desired memory bounds. Excessive memory footprint has many root causes.
Some data structures impose a high per-element overhead, such as hash sets with
explicit chaining, or tree maps. Data models often include duplicate or unneces-
sary fields, or extend modeling frameworks with a high base-class memory cost.
There may be objects that, while no longer needed, remain live [34, 39], such as
when the memory for an Eclipse [17] plugin persists beyond its last use. Often,
to mask unresolved performance problems, applications aggressively cache data
(using inefficient data structures and bulky data models).
Isolating the root causes of large object graph size requires understanding
both responsibility and internal content: the program may hold on to objects
longer than expected, or may use data structures built up in inefficient ways. We
analyze this combination of ownership structures by summarizing the state of the
heap — at any moment in time within the interval of excessive footprint. In con-
trast, techniques such as heap [34, 35, 40, 43], space [36], shape [32], lexical [6], or
cost-center [37] profiling collect aggregate summaries of allocation sites. Profiling
dramatically slows down the program, gives no information about responsibility
Table 1. A commonly used, but not especially useful, graph summary: aggregate
objects by data type, and then apply a threshold to show only the top few
tools also provide filters to ignore third-party providers such as J2SE, AWT, and
database code. But, as the table shows, those third parties often form the bulk
of the graph’s size. In addition, they often provide the collections that (perhaps
inefficiently) glue together the graph’s objects.
Filters, aggregations, and thresholds are essential elements of summarization,
but must be applied carefully. Thresholds help to focus on the biggest con-
tributors, but the biggest contributors are not single data types, or even single
locations in the graph. As we will show in Section 4, those 3.6 million primitive
arrays in Table 1 are placed in many distinct locations. Thus, at first sight, they
appear to be scattered throughout the (18-million node) graph. However, we
will show that only two distinct patterns of locations account for 80% of
the largest data structure’s size. This careful combination of aggregation and
thresholding can produce concise summaries of internal content.
The same care is necessary when summarizing responsibility. If the object
graph is tree-like (i.e. a diamond-free flow graph), the problem of summarizing
ownership structure reduces to that of summarizing content; responsibility is
clear when each object has a single owner. As we will quantify later, object
graphs are not at all tree-like. For example, in many cases two unrelated program
mechanisms share responsibility for the objects in a data structure; in turn, those
two mechanisms themselves are shared by higher-level mechanisms. The result is
a vast web of responsibility. We cannot arbitrarily filter out edges that result in
sharing, even if it does have the desirable property of reducing the responsibility
structure to a tree [23].
This paper has four main contributions.
Table 2. The heap snapshots we analyze in this paper. They include both real appli-
cations and benchmarks, divided into the top and bottom half of this table.
This section introduces a way to summarize the responsibility for the memory
footprint of an object reference graph. We first introduce important structural
and semantic graph properties, and quantify the extent to which these properties
occur in both real applications and benchmarks. We then present an algorithm
to compute an ownership graph, a new graph that succinctly summarizes the
ownership structures within a given graph.
2 Sometimes, the objects in a halo are garbage; e.g., HPROF [43] does not collect
garbage prior to writing a heap snapshot to disk. More often, non-Java mechanisms
reference members of a halo, but the snapshot does not report them; e.g., if the
garbage collector is not type accurate, this information may not be available.
Table 3. The structural properties of graphs; the fifth and sixth columns show the
fraction of the dominator trees that are shared and involved in butterflies
application    halos    avg. objects per domtree    shared domtrees    domtrees in a butterfly
A1 152 153 81% 25%
A2 3,496 41 91% 24%
A3 1,064 310 87% 54%
A2’ 1,776 39 76% 9%
A4 3,828 27 78% 42%
A5 3,492 43 67% 7%
A6 826 103 72% 13%
mtrt 25 3 2% <1%
db 7 5 32% <1%
javac 27 8 49% 16%
jess 8 3 6% <1%
jack 27 3 24% <1%
mpegaudio 117 8 40% <1%
compress 26 6 45% <1%
Table 3 summarizes the structural properties for the applications and bench-
marks of Table 2. The real applications have many halos, large dominator trees,
and a great deal of shared and butterfly ownership. Only one benchmark, javac,
exhibits characteristics somewhat like the real applications.
3.2 Three Structures Derived from the Language and Data Types
We supplement structural information with three pieces of semantic information:
contingent ownership, the class loader frontier, and context-identical dominator
trees, as illustrated in Figure 3. The first two draw upon language features, and
the third takes object data types into account.
Fig. 3. (a) Contingent ownership; (b) class loader frontier; (c) context-identical trees
Table 4 shows the fraction of shared dominator trees that have this property. The
real applications all have thousands of dominator trees with contingent owner-
ship, and on average 52% of the contingent ownership is strong. The benchmarks
have a higher proportion of strong contingent ownership: 78%.
Class Loader Frontier. Real applications have a large boundary between dom-
inator trees headed by class loader mechanisms and trees of non-class loading
mechanisms. Figure 3(b) illustrates a case with four dominator trees located on
either side of this boundary. This boundary is large because real applications
make heavy use of class loaders, and they commonly have shared ownership.
Table 5 shows that real applications have as many as 38 thousand dominator
trees headed by class loader data types; on average, 29% of the class loader
dominator trees from these seven snapshots were shared. Further complicating
matters, these shared class loader dominator trees tend to reach nearly all ob-
jects. This is because, in real applications, the class objects very often reach
a substantial portion of the graph. Next, the class loader dominator trees are
usually reachable from a wide assortment of application, framework, and JVM
mechanisms. For example, to isolate plugins, the Eclipse IDE uses a separate
class loader for each plugin; its plugin system reaches the class loader mecha-
nism, which in turn reaches many of the objects. The result is a highly tangled
web of edges that connect the class loader and other trees.
Table 4. The number of dominator trees that are contingently owned, and strongly
so, compared to the total number of shared dominator trees

Dominator trees that are headed by an instance of a class loader data type,
and that lie on either side of the boundary between class loader mechanisms
and all others, are said to be on the class loader frontier. The fourth
column of Table 5 shows the number of dominator trees that lie on this frontier.
All of the benchmarks have a small, and roughly equal number of shared class
loader dominator trees that are on this frontier; this, despite a widely varying
range of shared dominator trees across the benchmarks (as shown in the sec-
ond column of Table 4). The real applications have a varied, and much larger,
class loader frontier. This reflects a richer usage of the class loader mechanism,
compared to the benchmarks.
Table 5. The number of dominator trees headed by class loader mechanisms, the
number of those that have shared ownership, and the number of dominator trees that
are on the class loader frontier
application    class loaders (total)    class loaders (shared)    class loader frontier
A1 8,297 1,032 4,550
A2 26,676 1,008 3,030
A3 38,395 133 3,768
A2’ 19,080 959 2,449
A4 5,475 396 1,127
A5 5,410 363 1,259
A6 1,017 120 522
mtrt 51 8 29
db 46 8 21
javac 48 8 22
jess 135 8 23
jack 48 8 29
mpegaudio 47 8 23
compress 47 8 22
Under this definition of equality, we can group dominator trees into equivalence
classes. Table 6 shows the number and average size of context-identical equiva-
lence classes for our suite of applications and benchmarks. In real applications,
there are typically many thousands of such classes, with a dozen or so dominator
trees per class.
Table 6. The number and average size of the context-identical equivalence classes from
a variety of applications and benchmarks
Applying a graph edit yields a new, reduced, graph that preserves the reacha-
bility of the input graph. When applying a chain of graph edits, each edit takes
as input the reduced graph generated by the previous graph edit.
Using a combination of five kinds of graph edits, some applied multiple times, we
construct concise ownership graphs. We now define those five kinds of edits, and
subsequently describe an ownership graph construction algorithm.
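A graph edit (C, De, Dn) and its application can be sketched on a toy adjacency-map graph. This is our illustration of the definition above, not the paper's implementation; node ids and helper names are invented.

```java
import java.util.*;

public class GraphEditDemo {
    // Apply a graph edit (C, De, Dn): map every surviving node through the
    // collapsing relation C, drop edges in De and nodes in Dn, and omit
    // self-loops created by collapsing. The result is a reduced graph that
    // preserves reachability among the representatives.
    static Map<Integer, Set<Integer>> apply(
            Map<Integer, Set<Integer>> edges,   // adjacency: node -> successors
            Map<Integer, Integer> collapse,     // C: node -> representative
            Set<List<Integer>> deletedEdges,    // De: edges to drop, as [src, dst]
            Set<Integer> deletedNodes) {        // Dn: nodes to drop
        Map<Integer, Set<Integer>> out = new TreeMap<>();
        for (var e : edges.entrySet()) {
            int src = e.getKey();
            if (deletedNodes.contains(src)) continue;
            int cs = collapse.getOrDefault(src, src);
            out.computeIfAbsent(cs, k -> new TreeSet<>());
            for (int dst : e.getValue()) {
                if (deletedNodes.contains(dst)) continue;
                if (deletedEdges.contains(List.of(src, dst))) continue;
                int cd = collapse.getOrDefault(dst, dst);
                if (cs != cd) out.get(cs).add(cd);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Graph 1 -> {2, 3}, 2 -> 3; collapse node 3 into its dominator 2.
        Map<Integer, Set<Integer>> g = new HashMap<>();
        g.put(1, Set.of(2, 3));
        g.put(2, Set.of(3));
        System.out.println(apply(g, Map.of(3, 2), Set.of(), Set.of()));
        // {1=[2], 2=[]}
    }
}
```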
Dominator Edit. Compute a representative for each halo of the input graph;
we find the set of representatives that, on any depth-first traversal of the input,
have only back edges incoming. The union of this set of halo representatives
with those nodes that have no incoming edges form the root set of the graph.
Given this root set, compute the dominator forest of the input graph. (The
dominator algorithm we use [22] assumes that the input is a flow graph; in our
case, we use an implicit start vertex, one that points to the computed root set.) From
this forest, we define a graph edit (C, De , Dn ). The collapsing relation C maps
a node to its dominator forest root; the deleted edge set De consists of edges
that cross the class loader frontier or that have only contingent ownership; the
deleted node set Dn is empty. This edit collapses the dominator trees into single
nodes. It will also remove the shared ownership from dominator trees that are
strongly contingently owned.
Context-identical Edit. For each node n of the input graph, compute a rep-
resentative type Tn . In the case where each node is a dominator tree, we choose
this representative type to be the node type of the head of the dominator tree;
this will not necessarily be the case when this graph edit is applied subsequent
to graph edits other than the dominator edit. In the case where each node is
a collection of dominator trees whose heads are of uniform type, we choose the
representative type to be that type. Otherwise, we say the representative type is
undefined. Let the parent set of a node n, Pn , be the set of predecessor nodes of
n. Group the nodes of the input graph according to equality, for each graph node
n, of the pair (Pn , Tn ). For the remaining equivalence classes, choose an arbitrary
representative node. The context-identical collapsing relation maps each node to
that representative. The deleted edge set and deleted node set are empty.
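The grouping by the pair (Pn, Tn) can be sketched as follows; this is our toy illustration, with invented node ids and names.

```java
import java.util.*;

public class ContextIdenticalDemo {
    // Group nodes into context-identical equivalence classes: two nodes are
    // equivalent when they have the same predecessor set and the same
    // representative type.
    static Collection<List<Integer>> group(
            Map<Integer, Set<Integer>> preds,   // Pn: node -> predecessor set
            Map<Integer, String> type) {        // Tn: node -> representative type
        Map<List<Object>, List<Integer>> classes = new LinkedHashMap<>();
        for (int n : type.keySet()) {
            List<Object> key = List.of(preds.getOrDefault(n, Set.of()), type.get(n));
            classes.computeIfAbsent(key, k -> new ArrayList<>()).add(n);
        }
        return classes.values();
    }

    public static void main(String[] args) {
        // Three dominator trees headed by "HashMap"; 2 and 3 share parent {1},
        // so they collapse into one class, while 4 (parent {9}) stays apart.
        Map<Integer, Set<Integer>> preds = Map.of(2, Set.of(1), 3, Set.of(1), 4, Set.of(9));
        Map<Integer, String> type = new TreeMap<>();
        type.put(2, "HashMap"); type.put(3, "HashMap"); type.put(4, "HashMap");
        System.out.println(group(preds, type)); // [[2, 3], [4]]
    }
}
```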
[Fig. 4(d): the ownership graph for application A3 (root node "everything",
cum 780MB), with nodes including CachedTargets (base 9MB, cum 720MB),
CategoryInfo, Category, ListenerRepository, a stack of 211,644 CategoryImpl
nodes, and a miscellaneous node (base 131MB); the benchmark panels have roots
of cum 9MB and 12MB.]
small amount of space, i.e., M, on top of the miscellaneous size. The collapsing
relation is the identity, and the deleted edge set is empty.
root in the graph. Table 7 shows the size and time to compute ownership graphs.
We do not count the two pseudo-nodes towards an ownership graph’s node count.
The computation time figures include the code to compute the graph halos, a
DFS spanning tree, the dominator tree, all of the graph edits, and the time to
render the graph to SVG (scalable vector graphics). For application A3, the full
graph has nearly 18 million nodes; the ownership graph, computed in 82 seconds,
consists of 11 nodes.
Figure 4 shows the output, generated automatically, from three of the bench-
marks and application A3. Our rendering code draws a stack of nodes whenever
an ownership graph node represents a context-identical aggregate. Each node
shows the uniquely-owned bytes (“base”) represented by that aggregate. Each
non-leaf node also shows shared-owned bytes (“cum”). Finally, we color the nodes
based on the source package that primarily contributes to that aggregate’s base
size: dark gray for customer code, light gray for framework code (such as the
application server, servlet processing, XML parsing code), black for standard
Java library code, and white for everything else.
have instances that point to a number of nodes of the same type; the format of
heap snapshots usually distinguishes array types for us. We currently identify
only one-hop recursion, where nodes of a type point to nodes of the same type.
This simple rule is very effective. Even in the XML document example of the
previous paragraph, where there are recursive sandwich types, the recursive-
typed nodes still point to nodes of the same type. A container type is a non-
backbone type that has node instances that point to backbone types. Given a
subpath in the tree that begins and ends with a node from R, all nodes between
those endpoints are from R′. Given a subpath that begins with A, C, or R and
that ends with C, all nodes between the endpoints are from C′. Finally, there
will be a set of nodes that are pointed to by nodes of backbone type; the union
of the types of nodes dominated by them forms D.
Categorizing objects in this way yields powerful summaries of content, such as
the ones shown in Figure 6. This figure includes five additional snapshots from
real server applications, A7-A11, that we do not otherwise study in this paper.
Each of the six categories in the figure represents the sum of the instance size
of each node. We split the array backbone overhead into two subcategories: the
memory devoted to unused slots in the array, and slots used for actual references.
We assume that a null reference value in an array is not meaningful data; in
our experience, this is nearly always the case. We also include the contribution
of the Java object header, assuming eight bytes per header. We include header
overhead, as its contribution varies for similar reasons as backbone overheads in
general: many small collections lead to a higher C overhead, but also a higher
object header overhead. We deduct this header cost from the costs associated
with the other backbone overheads.
Table 8. The number of backbone nodes and the number of root-type-path and
backbone-folding equivalence classes (summed over all dominator trees)
application    backbone nodes    root-type-path equiv. classes    backbone-folding equiv. classes
A1 10,864,774 21,820 7,689
A2 4,704,630 23,561 10,381
A3 3,690,480 345,863 21,482
A2’ 772,299 13,855 6,550
A4 342,570 9,630 5,046
A5 630,784 14,847 7,793
A6 107,802 3,907 1,840
mtrt 78,153 3,092 448
db 17,173 91 50
javac 116,274 18,818 9,025
jess 15,069 148 101
jack 2,690 289 154
mpegaudio 2,017 117 77
compress 1,985 175 48
Fig. 7. A hash map of inner hash maps. There are two backbone types (Entry[] and
Entry), ten nodes of those two types, nine backbone equivalence classes under root
type path equality, and four backbone-folding equivalence classes.
Root-type-path Equality. Let the root path of a tree node be the list of nodes
from the tree root to that node, inclusive; the root type path is similarly the list
of those node types. We compute the A and R node types, and the instances of
those types in the tree under analysis. We then form equivalence classes of these
instances, using root type path equality.
It is often the case that a large number of backbone nodes in a tree have equal
root type paths. Forming equivalence classes based on this notion of equality can
therefore produce a more succinct summary of a tree’s content than an enumer-
ation of the backbone nodes. The third column in Table 8 shows the number of
root type path equivalence classes in a number of applications and benchmarks.
Consider the common case of root type path equality shown in Figure 7: a hash
map of inner hash maps, where all of the maps use explicit chaining. There are two
backbone types (Entry[] and Entry) and ten backbone nodes. Of those ten, there
are nine distinct classes of backbone nodes under root type path equality. The
only non-singleton class has the two top-left Entry nodes. Every other backbone
node has a unique root type path. For example, the third Entry in the upper hash
map is located under an Entry object, a property that the other two Entry nodes
do not have. This difference skews every other node instance under that chained
Entry, rendering little root type equivalence. We chose this example for illustra-
tive purposes only. In practice, we see from Table 8 that there are quite a
large number of backbone nodes with type-identical root type paths. The figures
in this table represent the sum over all dominator trees in each heap snapshot.
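Computing root type path equivalence classes over a tree can be sketched as follows; this is our illustration, using the hash-map shape discussed above, and the `Node` record is invented.

```java
import java.util.*;

public class RootTypePathDemo {
    record Node(String type, List<Node> kids) {}

    // Count nodes per root type path: nodes whose path of types from the tree
    // root is identical fall into the same equivalence class.
    static Map<List<String>, Integer> classes(Node root) {
        Map<List<String>, Integer> count = new LinkedHashMap<>();
        walk(root, new ArrayList<>(), count);
        return count;
    }

    static void walk(Node n, List<String> path, Map<List<String>, Integer> count) {
        path.add(n.type());
        count.merge(List.copyOf(path), 1, Integer::sum);
        for (Node k : n.kids()) walk(k, path, count);
        path.remove(path.size() - 1);
    }

    public static void main(String[] args) {
        // HashMap -> Entry[] -> {Entry, Entry}: the two Entry nodes share a
        // root type path and so form one equivalence class of size 2.
        Node tree = new Node("HashMap", List.of(
            new Node("Entry[]", List.of(
                new Node("Entry", List.of()),
                new Node("Entry", List.of())))));
        System.out.println(classes(tree).get(List.of("HashMap", "Entry[]", "Entry"))); // 2
    }
}
```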
Backbone-folding Equality. Root type path equality identifies nine backbone
equivalence classes in the tree of Figure 7. We feel there should only be four
distinct classes. The upper Entry[] array is rightly in a singleton class, but
the three upper Entry instances, the two lower Entry[] instances, and the four
lower Entry instances should form a total of three classes, respectively. Imagine
that the lower hash maps contain values of type string: a hash map of hash
maps that map to string values. We feel that all of those strings should be in the
same class, despite being located in potentially thousands of separate (lower) hash
maps, and despite each lower hash map being under a wide variety of depths of
(upper) Entry recursion.
To capture this finer notion of equivalence, we observe that it is recursive
structures, through combinations of R and R′ nodes, that lead to the kind of
skew that foils root type path equality. We compute the set of A, R, and R′
nodes and, to each backbone node, associate a regular expression. The canonical
set of regular expressions forms the equivalence classes (cf. the RDS types and
instances of [32] and the DSGraphs of [20, 21]). The regular expression of a
node is its root type path, except that any R′ node is optional and any R node
can occur one or more times in any position. For example, the two D nodes
from Figure 5 are backbone-folding equivalent, because they share the regular
expression CAR+(R′?)CAR+(R′?), where + and ? have the standard meanings
of “one or more” and “optional”.
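A simplified version of this folding, which collapses only runs of a recursive (R) type and omits the optional R′ handling, might look like the sketch below. The names and the string-based representation are ours, for illustration.

```java
import java.util.List;
import java.util.Set;

public class BackboneFoldDemo {
    // Fold a root type path by collapsing each run of a recursive type to
    // "T+", so trees that differ only in recursion depth fold to the same
    // canonical expression. (Simplification: the paper's scheme also marks
    // R' nodes as optional, which this sketch omits.)
    static String fold(List<String> path, Set<String> recursive) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < path.size(); ) {
            String t = path.get(i);
            int j = i + 1;
            while (j < path.size() && path.get(j).equals(t)) j++; // skip the run
            sb.append(t);
            if (recursive.contains(t)) sb.append('+');
            sb.append(' ');
            i = j;
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Entry chains of depth 1 and depth 2 fold to the same expression.
        String a = fold(List.of("HashMap", "Entry[]", "Entry", "Entry[]"),
                        Set.of("Entry"));
        String b = fold(List.of("HashMap", "Entry[]", "Entry", "Entry", "Entry[]"),
                        Set.of("Entry"));
        System.out.println(a.equals(b)); // true
        System.out.println(a); // HashMap Entry[] Entry+ Entry[]
    }
}
```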
The fourth column in Table 8 shows the number of equivalence classes of
backbone nodes under backbone-folding equality. Even for graphs with tens of
millions of nodes, aggregation alone (i.e. without filters or thresholds) collapses
all dominator trees down to at most 21 thousand backbone equivalence classes.
Applying thresholds after having aggregated backbones can yield succinct sum-
maries of content. As a first threshold, we usually care to study only the largest
trees, or at least to study the largest trees first. Within a large tree, we consider
two useful thresholds of backbone aggregates: a biggest contributing pattern
analysis, and a suspect locator analysis.
A biggest contributing pattern analysis looks for the equivalence classes that
contribute most to a given tree. Table 9 shows the result of a biggest-contributor
analysis to the largest dominator tree in each application and benchmark. There
are often hundreds of equivalence classes within the largest tree. However, only
a few patterns summarize a substantial fraction of the footprint of the tree. The
third column in the table shows how many of those equivalence classes account
for 80% of the size of the tree (tabulating the largest equivalence classes first).
With just two exceptions, a small handful of classes account for most of the
footprint of the largest tree. Even for the two exceptions, A1 and javac, 80% of
the largest tree’s size is accounted for by 35 and 53 patterns, respectively.
Sometimes, it is helpful to know where certain suspicious data types are placed
in an expensive tree. A suspect locator analysis identifies the distinct classes of
locations in which a chosen data type occurs. There may be millions of instances
of this type, but they will not be in a million different equivalence classes.
Furthermore, as Table 10 shows, for all of our applications and benchmarks, only
a handful of equivalence classes account for most of the contribution of that type
in any one tree. This is despite the fact that, in some cases, there are hundreds
of distinct patterns in which the largest data type is located. More generally,
The Runtime Structure of Object Ownership 93
Table 9. A biggest contributing pattern analysis shows that a few hot patterns account
for 80% of the largest dominator tree’s memory footprint
this suspect locator analysis can apply to other notions of suspects, such as the
major contributors to backbone overhead: if my C overhead is so high, then tell
me the patterns that contribute most. We will explore this more general form of
analysis in future work.
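A suspect locator analysis can likewise be sketched as a grouping pass: tag every instance of the suspect type with its backbone equivalence class, then rank classes by their byte contribution (illustrative code; tagging each instance with its class signature is assumed to be done by the snapshot analyzer):

```python
from collections import defaultdict

def suspect_locator(instances, fraction=0.8):
    """instances: iterable of (equivalence_class_signature, nbytes) pairs,
    one per occurrence of the suspect data type in a dominator tree.
    Returns the few hot classes that account for `fraction` of the
    type's bytes, largest first."""
    by_class = defaultdict(int)
    for signature, nbytes in instances:
        by_class[signature] += nbytes
    total = sum(by_class.values())
    hot, covered = [], 0
    for signature, nbytes in sorted(by_class.items(), key=lambda kv: -kv[1]):
        hot.append(signature)
        covered += nbytes
        if covered >= fraction * total:
            break
    return hot
```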
5 Related Work
Techniques that summarize the internal structure of heap snapshots are rel-
atively uncommon. Recent work [27, 29] introduces a system for counting, via
general queries, both aggregate and reachability properties of an object reference
graph. They have also done insightful characterization studies [28, 30]. Another
recent work [25], akin to [12], summarizes reachability properties for each root
in the graph. To our knowledge, these works do not aggregate the internal
structure of the graphs according to context. Other related domains include:
Shape Analysis. Static shape analysis builds conservative models of the heap
at every line of code [12, 9, 19, 20, 21]. They often use abstract interpretation to
form type graphs (such as the RSRSG [9] or the DSGraph [20]); these summaries
capture recursive structures, somewhat analogous to the regular expressions we
form in Section 4.2. The work we discussed above [25] can be thought of as a
kind of dynamic shape analysis.
Heap Profiling. This phrase usually applies to techniques that track the
object allocations of an application for a period of time [6, 36, 34, 35, 37, 40, 32].
Mostly, the allocation site profiles are used to populate aggregate call graphs,
and interpreted as one would a profile of execution time. Sometimes, the data
is used to help design garbage collectors [15]. Some works combine static shape
analysis with dynamic profile collection [32].
94 N. Mitchell
Table 10. A suspect locator analysis shows that a few hot patterns contain 80% of
the bytes due to instances of the dominant data type
6 Future Work
We see three exciting areas of future work. First, Section 4.3 demonstrated how
to locate the patterns that explain the hottest elements of a flat summary by
type. This is a powerful style of analysis, and we can extend it to be driven
by a more general notion of suspects. For example, we can use it to locate the
few patterns that explain most of the Java object header overhead. We can also
introduce new kinds of suspects, such as large base class overhead.
Second, the ownership graph provides a visual representation of responsibility.
We feel that there is a need for schematic visual representations of content. The
backbone equivalence classes provide a good model for this summary. There is
much work to be done in finding the powerful, yet concise, visual metaphors that
will capture these patterns.
Third, we feel that the methodology employed in this paper, and the owner-
ship structures we have identified can be useful in understanding the structure
graphs from other domains. For example, many of the difficult aspects of graph
size (scale, scattering of suspects in disparate locations in a graph, sharing of
responsibility) have analogs in the performance realm. In performance, flat sum-
maries usually only point out leaf methods, and yet the structure of a call graph
is highly complex. We will explore this synergy.
7 Conclusion
It is common these days for large-scale object-oriented applications to be devel-
oped by integrating a number of existing frameworks. As beneficial as this may
be to the development process, it has negative implications on understanding
what happens at run time. These applications have very complicated policies
governing responsibility and object lifetime. From a snapshot of the heap, we
are left to reverse engineer those policies. On top of that, even uniquely-, non-
contingently-owned objects have complex structure. Data structures that are
essentially trees, like XML documents, are large and are represented with a
multitude of non-tree edges. The common data types within them may be scattered
in a million places; e.g. the attributes of an XML document’s elements occur
across the width and throughout the depth of the tree.
We have presented a methodology and algorithms for analyzing this web of
complex ownership structures. In addition to their usefulness for summarizing
memory footprint, we hope they are helpful as an exposition of the kinds of struc-
tures that occur in large-scale applications. Work that tackles object ownership
from viewpoints other than runtime analysis may benefit from this study.
Acknowledgments
The author thanks Glenn Ammons, Herb Derby, Palani Kumanan, Derek Ray-
side, Edith Schonberg, and Gary Sevitsky for their assistance with this work.
References
1. Aldrich, J., Chambers, C.: Ownership domains: Separating aliasing policy from
mechanism. In: The European Conference on Object-Oriented Programming. Vol-
ume 3086 of Lecture Notes in Computer Science., Oslo, Norway, Springer-Verlag
(2004)
2. Aldrich, J., Kostadinov, V., Chambers, C.: Alias annotations for program under-
standing. In: Object-oriented Programming, Systems, Languages, and Applica-
tions. (2002)
3. Borland Software Corporation: OptimizeIt™ Enterprise Suite.
http://www.borland.com/us/products/optimizeit (2005)
4. Boyapati, C., Lee, R., Rinard, M.: Ownership types for safe programming: pre-
venting data races and deadlocks. In: Object-oriented Programming, Systems,
Languages, and Applications. (2002)
5. Boyapati, C., Liskov, B., Shrira, L.: Ownership types for object encapsulation. In:
Symposium on Principles of Programming Languages. (2003)
6. Clack, C., Clayman, S., Parrott, D.: Lexical profiling: Theory and practice. Journal
of Functional Programming 5(2) (1995) 225–277
7. Clarke, D., Wrigstad, T.: External uniqueness is unique enough. In: The European
Conference on Object-Oriented Programming. Volume 2743 of Lecture Notes in
Computer Science., Springer-Verlag (2003) 176–200
8. Clarke, D.G., Noble, J., Potter, J.M.: Simple ownership types for object contain-
ment. In: The European Conference on Object-Oriented Programming. Volume
2072 of Lecture Notes in Computer Science., Budapest, Hungary, Springer-Verlag
(2001) 53–76
9. Corbera, F., Asenjo, R., Zapata, E.L.: A framework to capture dynamic data
structures in pointer-based codes. IEEE Transactions on Parallel and Distributed
Systems 15(2) (2004) 151–166
10. De Pauw, W., Sevitsky, G.: Visualizing reference patterns for solving memory leaks
in Java. Concurrency: Practice and Experience 12 (2000) 1431–1454
11. Dietl, W., Müller, P.: Universes: Lightweight ownership for JML. Special Issue:
ECOOP 2004 Workshop FTfJP, Journal of Object Technology 4(8) (2005) 5–32
12. Ghiya, R., Hendren, L.J.: Is it a tree, a DAG, or a cyclic graph? A shape analysis
for heap-directed pointers in C. In: Symposium on Principles of Programming
Languages. (1996)
13. Hastings, R., Joyce, B.: Purify — fast detection of memory leaks and access
errors. In: USENIX Proceedings. (1992) 125–136
14. Hill, T., Noble, J., Potter, J.: Scalable visualizations of object-oriented systems
with ownership trees. Journal of Visual Languages and Computing 13 (2002)
319–339
15. Hirzel, M., Hinkel, J., Diwan, A., Hind, M.: Understanding the connectivity of
heap objects. In: International Symposium on Memory Management. (2002)
16. Hitchens, R.: Java NIO. First edn. O’Reilly Media, Inc. (2002)
17. Holzner, S.: Eclipse. First edn. O’Reilly Media, Inc. (2004)
18. IBM Corporation: Rational PurifyPlus (2005)
19. Jeannet, B., Loginov, A., Reps, T., Sagiv, M.: A relational approach to interproce-
dural shape analysis. In: International Static Analysis Symposium. Lecture Notes
in Computer Science, New York, NY, Springer-Verlag (2004)
20. Lattner, C., Adve, V.: Data structure analysis: A fast and scalable context-sensitive
heap analysis. Technical Report UIUCDCS-R-2003-2340, Computer Science
Department, University of Illinois (2003)
21. Lattner, C., Adve, V.: Automatic pool allocation: Improving performance by con-
trolling data structure layout in the heap. In: Programming Language Design and
Implementation, Chicago, IL (2005) 129–142
22. Lengauer, T., Tarjan, R.E.: A fast algorithm for finding dominators in a flow graph.
ACM Transactions on Programming Languages and Systems 1(1) (1979) 121–141
23. Mitchell, N., Sevitsky, G.: Leakbot: An automated and lightweight tool for di-
agnosing memory leaks in large Java applications. In: The European Conference
on Object-Oriented Programming. Volume 2743 of Lecture Notes in Computer
Science., Springer-Verlag (2003) 351–377
24. Parkinson, M., Bierman, G.: Separation logic and abstraction. In: Symposium on
Principles of Programming Languages. (2005)
25. Pheng, S., Verbrugge, C.: Dynamic shape and data structure analysis in Java.
Technical Report 2005-3, School of Computer Science, McGill University (2005)
26. Pollet, I., Charlier, B.L., Cortesi, A.: Distinctness and sharing domains for static
analysis of Java programs. In: The European Conference on Object-Oriented Pro-
gramming. Volume 2072 of Lecture Notes in Computer Science., Springer-Verlag
(2001) 77–98
27. Potanin, A.: The Fox — a tool for object graph analysis. Undergraduate Honors
Thesis (2002)
28. Potanin, A., Noble, J., Biddle, R.: Checking ownership and confinement. Concur-
rency and Computation: Practice and Experience 16(7) (2004) 671–687
29. Potanin, A., Noble, J., Biddle, R.: Snapshot query-based debugging. In: Australian
Software Engineering Conference, Melbourne, Australia (2004)
30. Potanin, A., Noble, J., Frean, M., Biddle, R.: Scale-free geometry in object-oriented
programs. Communications of the ACM (2005)
31. Quest Software: JProbe® Memory Debugger. http://www.quest.com/jprobe
(2005)
32. Raman, E., August, D.I.: Recursive data structure profiling. In: ACM SIGPLAN
Workshop on Memory Systems Performance. (2005)
33. Rayside, D., Mendel, L., Jackson, D.: A dynamic analysis for revealing object
ownership and sharing. In: Workshop on Dynamic Analysis. (2006)
34. Rojemo, N., Runciman, C.: Lag, drag, void and use — heap profiling and space-
efficient compilation revisited. In: International Conference on Functional Pro-
gramming. (1996) 34–41
35. Runciman, C., Rojemo, N.: New dimensions in heap profiling. Journal of Functional
Programming 6(4) (1996) 587–620
36. Sansom, P.M., Peyton Jones, S.L.: Time and space profiling for non-strict higher-
order functional languages. In: Symposium on Principles of Programming Lan-
guages, San Francisco, CA (1995) 355–366
37. Sansom, P.M., Peyton Jones, S.L.: Formally based profiling for higher-order func-
tional languages. ACM Transactions on Programming Languages and Systems
19(2) (1997) 334–385
38. Shaham, R., Kolodner, E.K., Sagiv, M.: Automatic removal of array memory leaks
in Java. In: Compiler Construction. (2000) 50–66
39. Shaham, R., Kolodner, E.K., Sagiv, M.: Estimating the impact of heap liveness in-
formation on space consumption in Java. In: International Symposium on Memory
Management. (2002)
40. Shaham, R., Kolodner, E.K., Sagiv, S.: Heap profiling for space-efficient Java. In:
Programming Language Design and Implementation. (2001) 104–113
41. SPEC Corporation: The SPEC JVM Client98 benchmark suite.
http://www.spec.org/osg/jvm98 (1998)
Footnote 5: java.nio is not without its faults; e.g. it currently lacks an explicit unmap facility.
On Ownership and Accessibility
1 Introduction
The object-oriented community is paying increasing attention to techniques for
object level encapsulation and alias protection. Formal techniques for modular
verification of programs at the level of objects are being developed hand in hand
with type systems and static analysis techniques for restricting the structure of
runtime object graphs. Ownership type systems have provided a sound basis for
such structural restrictions by being able to statically represent an extensible
object ownership hierarchy. The trick to ownership systems is to hide knowledge
of the identity of an object outside its owner. This form of information hiding is
useful for modular reasoning, data abstraction and confidentiality.
Ownership types support instance-level information hiding by providing a
statically enforceable object encapsulation model based on an ownership tree.
Traditional class-level private fields are not enough to hide object instances. For
example, an object in a private field can be easily returned through a method
call. However, the encapsulation mechanism used by ownership types is still not
flexible enough to express some common design patterns such as iterators and
callback objects. Moreover, ownership types, to date, lack ownership variance.
This means, for instance, that all elements stored in a list must be owned by the
same owner due to the recursive structure of the list.
This paper proposes a novel type system which generalizes ownership types
with an access control system in which object accessibility and reference capability
are treated orthogonally. The rationale behind this mechanism is that one only
needs to hold access permission for an object in order to use it; the capability
of the object can be adapted to the current access context. This allows more
flexible and expressive programming with ownership types.
Our system allows programmers to trade off flexibility/accessibility with
usability/capability. We allow object accessibility to be variant; intuitively it is
2 Ownership Types
Earlier object encapsulation systems, such as Islands [13] and Balloons [2], use
full encapsulation techniques to forbid both incoming and outgoing references
to an object’s representation. However, full encapsulation techniques are overly
strong, because outgoing references from the representation are harmless and
are often needed to express typical object-oriented idioms. Ownership types
[11, 10, 7] provide a more flexible mechanism than previous systems; they weaken
the restriction of full encapsulation by allowing outgoing references while still
preventing representation exposure from outside of the encapsulation. The work
on ownership types emanated from some general principles for Flexible Alias
Protection [20] and the use of dominator trees in structuring object graphs [22].
Ownership type systems establish a fixed per object ownership tree, and en-
force a reference containment invariant, so that objects cannot be referenced
from outside of their owner — an object owns its representation and any other
external object wanting to access the representation must do so via the owner.
Ownership types are parameterized by names of runtime objects, called contexts
in the type system [10]. Contexts include all objects and a pre-defined context
called world which is used to name the root of the ownership tree. The world
context is in scope throughout the program. All root objects are created in the
world context and all live objects are reachable from the root objects.
In defining ownership the first context parameter of a class is the owner of
the object. An object’s owner is fixed for its lifetime, thus naturally establishing
an ownership tree in the heap. The rest of the context parameters are optional;
they are used to type and reference the objects outside the current encapsulation,
which is how ownership types free objects from being fully encapsulated, as in
early approaches to alias protection.
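The containment invariant can be modeled dynamically with nothing more than an owner map (a sketch; the paper enforces this invariant statically through the type system, and the names below are illustrative):

```python
WORLD = "world"  # the pre-defined root of the ownership tree

def inside(obj, ctx, owner):
    """True if `obj` is (transitively) inside context `ctx`, walking the
    ownership tree given by the owner map; world is the root."""
    while obj != ctx:
        if obj == WORLD:
            return ctx == WORLD
        obj = owner[obj]
    return True

def may_reference(l, l_prime, owner):
    """Containment invariant: l may reference l' only if l is inside owner(l')."""
    return inside(l, owner[l_prime], owner)
```

For example, with a list owning its nodes, the list may reference a node, a client outside the list may reference the list, but the client may not reference a node directly.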
Types are formed by binding the formal context parameters of a class. In
order to declare a type for a reference, one must be able to name the owner that
encapsulates the object. An encapsulation is protected from incoming references
because the owners of objects inside the encapsulation cannot be named from
the outside. The key mechanism is the use of variable this to name the current
On Ownership and Accessibility 101
class Personnel<o> {
  Data<this> privateData;
  Data<this> getData() { return privateData; }
}
are owned by the list, living within the list’s internal representation. The prob-
lem is obvious — iterators cannot be referenced from outside of the list due
to the encapsulation property. In the client code given below, list owns the
iterator object returned by list.getIter(), but iter cannot be declared as
Iterator<list,o>, because list is not a constant.
class Client<o> {
  void m() {
    List<this, o> list = new List<this, o>(); // OK
    Iterator<this, o> iter = list.getIter();  // ERROR, owner mismatch
    iter.add(new Data<o>());     // OK
    iter.add(new Data<world>()); // ERROR, owner of Data mismatch
    iter.add(new Data<this>());  // ERROR, owner of Data mismatch
    iter.element().useMe();      // OK
  }
}
reference object l′ then l must be inside owner(l′). In our type system, we use
a separate context acc(l) to determine the accessibility of an object; if object
l can reference object l′ then l must be inside acc(l′). This is the accessibility
invariant for our system.
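The accessibility invariant differs from the containment invariant only in consulting acc(l′) rather than owner(l′). A dynamic sketch (the type system enforces this statically; the maps and names are illustrative):

```python
WORLD = "world"  # the root context

def inside(obj, ctx, owner):
    """Walk the ownership tree (owner map, rooted at world)."""
    while obj != ctx:
        if obj == WORLD:
            return ctx == WORLD
        obj = owner[obj]
    return True

def may_access(l, l_prime, owner, acc):
    """Accessibility invariant: l may reference l' only if l is inside acc(l')."""
    return inside(l, acc[l_prime], owner)
```

With acc(l′) = world the object is accessible to everyone; with acc(l′) equal to its owner or to the defining object, access narrows accordingly.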
A typical type consists of three parts: [access] Class<capabilities>, comprising
a class name with a list of contexts. In a type, the capability list binds the
formal parameters of the type’s class definition to actual contexts; the access
modifier context (the prefix in square brackets) restricts accessibility to objects
inside the given context.
Class definitions are parameterized by formal contexts. These formal para-
meters, together with this and world, define the contexts that are available
for types within the class. Note that a class definition does not have a formal
parameter for denoting the accessibility of its instances; there is no need.
Access modifiers are independent from the definition of their objects, that is,
the access modifier is an annotation on a type rather than a context parameter
in the class definition. Note that, in our type system, the ownership tree is still
built from the owner context (the first context argument of a type). The modifier
does not affect the ownership tree, instead, it generalizes the strong containment
invariant of ownership, allowing more general reference structures.
We could use access modifiers to provide various levels of protection on ob-
jects. When an object’s access modifier is the world context, it can be considered
as a public object which is accessible by any other object. When an object’s
access modifier is the owner of the defining object (i.e., the owner parameter of
the current class), it is partially protected — it can be only used by those within
the owner context. When an object’s access modifier is the defining object (i.e.,
the this context), it is private to the defining object and cannot be accessed from
outside the defining object.
In the following example, variable a has world accessibility and owner this.
We interpret this dynamically. Variable a is a field of the current B object that
may hold references to A objects; any such A object must be inside (owned by)
the current B object this because the owner is given as this. Having world
accessibility implies that the referenced object can be used by any other object
via B’s a field. The new expression creates a new A object owned by the current
B object with world accessibility. Such an object may safely be assigned to the
a field of the current object, but not to other B objects.
class A<o> { ... }

class B<o> {
  [world] A<this> a;
  m() { a = new [world] A<this>; }
}
When the owner context and access modifier are the same, the variant own-
ership type [o] A<o> is the same as the A<o> in ownership types. Hence our
model with modifiers subsumes the owners-as-dominators model. The following
example illustrates how separating access from the owner allows us to achieve
different kinds of access protection.
class D<o> {
  [o] A<o> a1;       // partly protected
  [this] A<this> a2; // encapsulated
  [this] A<o> a3;    // encapsulated, but field is writable from outside
  [o] A<this> a4;    // not encapsulated, field is read-only from outside
}
class E<o> {
  [o] D<o> d;
  m() {
    [o] A<o> x = d.a1; // OK to read the reference
    d.a1 = x;          // OK to update the field
(Figure: the context hierarchy — world at the top; a context K between its inward variance K+ and outward variance K−, and another context K′; edges denote owns/dominates.)
class G<o> {
  F<this> f1;  // invariant on context argument
  F<*> f2;     // fully abstract on context argument
  F<this+> f3; // inward variance on context argument
  F<this-> f4; // outward variance on context argument
  A<this> a;
  m() {
    [this] A x = f1.a;    // OK to read
    f1.a = x;             // OK to write
    ... = f3.a;           // ERROR, cannot name context inside f3.a's access modifier
    f3.a = new [o]A<o>(); // OK, f3.a's access modifier inside o
The choice of variance is made when types are used. Programmers can use
any combination of invariance, inward/outward variance or full abstraction to
express context arguments in a type. Invariant contexts are most usable but
least flexible because one must be able to name the concrete context. f1 is
most usable, because f1.a is both readable and writable; but f1 is less flexible
because it can only be accessed from within the current context. Fully abstract
contexts are most flexible but least usable because all information about the
context is hidden; f2 is least usable because f2.a is neither readable nor writable
(except for the special case with the world context as shown in the example).
The type of f2.a is [?]A where the ? denotes an unknown context; the only
thing we know about ? is that it is inside world. Unknown contexts are not
for programmer use, but are used in our semantics. They are simply shorthand
for an anonymous context with given variance which is existentially quantified.
However, the combinations of fully abstract and invariant context arguments
are useful as we are about to see in revisiting the list example. Inward/outward
variant contexts give a choice between invariance and full abstraction where
some information of the context is available to give programmers just enough
information they need to use the context, as we see with f3.a and f4.a. Within
class G their types are [this+?]A (respectively [this-?]A) where the unknown
contexts are bounded inside (respectively outside) this.
We extend the above example with some more complicated cases of variance
which involve nested variances and mixed inward/outward variances. The type
system is able to derive the ordering information in the presence of nested vari-
ances. Some of the types involved are:
h1.f1 : F<o+?+> and h1.f1.a : [o+?+?]A<*>
we can derive that o+?+? is inside o. Also we find:
h1.f2 : F<o+?-> and h2.f1 : F<o-?+>
The variance o+?- contains contexts o and world but not this. Similarly o-?+
contains o and this but not world.
class H<o> {
  F<o+> f1;
  F<o-> f2;
}
108 Y. Lu and J. Potter
class I<o> {
  H<o+> h1;
  H<o-> h2;
  [o] A a;
  m() {
    h1.f1.a = a; // OK, h1.f1.a's access modifier inside o
    a = h2.f2.a; // OK, o inside h2.f2.a's access modifier
Now we are in a good position to revisit the list example we discussed in the
previous section.
The implementation of the List and Iterator classes is almost the same as
for ownership types, except that the type of iterators created by the list has
the same access modifier as the owner of the list, which essentially means that
anyone who can name the owner of the list is allowed to access its iterators. By
creating iterators with accessibility o, the list object authorizes the iterators to
act as its interface objects and to be used by the client to manipulate the list.
However, the list’s representation (that is, the Node objects) is always protected
from the client and never exposed to the outside directly. To access the nodes,
the client must use either the list itself or the iterators created by the list.
In the Node class, the type of the data field is [d] Data. As we have mentioned,
this is shorthand for [d] Data<*> where the owner of these Data objects is
abstract. The Node class is a recursive structure so all the node objects must
have the same type. However, with our owner abstraction, each node may contain
data objects owned by different contexts as shown in the client program.
class Client<o> {
  void m() {
    List<this, o> list = new List<this, o>();   // OK
    [this]Iterator<*, o> iter = list.getIter(); // OK
    iter.add(new [o]Data<o>());       // OK, o inside o, o matches *
    iter.add(new Data<world>());      // OK, o inside world, world matches *
    iter.add(new [this]Data<this>()); // ERROR, o outside this!
    iter.add(new [o]Data<this>());    // OK, o is inside o, this matches *
    iter.element().useMe();           // OK
    iter.current = ...                // ERROR, access modifier abstracted
  }
}
The client creates the list object as usual, but in order to obtain a reference
to iterator objects returned by the list, it must declare a type which abstracts
the owner of iterators (which is the list object, see the List class). However,
in the type of iterators, the second context argument remains concrete, which
is necessary in order to reference data objects returned by iterators. Moreover,
with context variance, now the client can add data objects owned by various
contexts into the list. Objects with type [this] Data<this> cannot be added
into the list because the access modifier is variant outwards (from this to o),
which is not sound and hence not permitted by the type system. Note that the type
system guarantees iterators cannot expose the node objects to the client.
4.1 Syntax
The abstract syntax for the source language is given in Table 1. The metavariable
T ranges over types; N ranges over nameable contexts (or concrete contexts);
K ranges over contexts; V ranges over context variances; L ranges over class defi-
nitions; M ranges over method definitions; e ranges over expressions; C, D range
over class names; f and m range over field names and method names respec-
tively; X, Y range over formal context parameters; and x ranges over variable
names with this as a special variable name to reference the target object for
the current call. The overbar is used for a sequence of constructs; for example,
e is used for a possibly empty sequence e1..en, and T x stands for a possibly
empty sequence of pairs T1 x1..Tn xn. In the class production, the inheritance
clause D<Y> is optional because our type system does not need a top type.
The syntax distinguishes between concrete (nameable) contexts N and those
contexts K that also admit abstract contexts. Table 2 shows the extended syntax
used by the type system, which is not accessible by programmers. Abstract
contexts K+? , K−? and ? correspond to context variances K+, K− and ∗. The
difference between K+ and K+? is that K+ means all contexts inside K while
K+? is one context in the set of K+. Actually, K+? is a bounded existential
context whose name is anonymous; but we do know it is inside K. The unbound
existential context ? is an arbitrary context; we know nothing about it (except
it is inside the upper bound context world in the context hierarchy). Figure 2
shows the concept of existential contexts. A program P is a pair consisting of
a fixed sequence of class definitions and an expression e which is the body of
the main method. The environment Γ may contain the types of variables and
domination relations between formal context parameters.
The same syntactic abbreviation for sequences is used in the typing rules.
A sequence of judgements can be simplified with an overbar on the argument,
such as Γ; N ⊢ T. Substitution [V/X]T is used to substitute V for X in T; this
substitution also requires |V| = |X|. Sometimes we use implications, denoted by
=⇒, to avoid repeating rules with similar structure. Other symbols used in the
type system are: • denotes the empty set; ... matches any single or multiple
things; and 1..n means an enumeration from 1 to n.
(Figure 2: the context hierarchy with existential contexts — world at the top; a context K between its inward variance K+ and outward variance K−, with the existential contexts K+? inside K and K−? outside K; edges denote owns/dominates.)
In ownership type systems the contexts used to form types are actual runtime
objects. In order to prove the desired dynamic properties, we need to incorporate
the bindings of context parameters into the type system. Typically, the expression
judgement Γ; N ⊢ e : T holds for the current context N. The context N is
bound to the current object (the target object of current call); in the static se-
mantics N is always bound to the variable this or world for the top-level program
expression, while in the dynamic semantics N is bound to the location of the
actual object in heap or world. Note that the bindings for all context parameters
in the current environment can be determined from the type of N at runtime. To
simplify the dynamic semantics we will annotate locations with their object type.
[VAR-CRT] Γ ⊢ N ⊆ N
[VAR-ANY] Γ ⊢ V ⊆ ∗
[VAR-IN] Γ ⊢ K ≼ K′ =⇒ Γ ⊢ K+ ⊆ K′+
[VAR-OUT] Γ ⊢ K′ ≼ K =⇒ Γ ⊢ K− ⊆ K′−
[VAR-IN′] Γ ⊢ K+? ⊆ K+
[VAR-OUT′] Γ ⊢ K−? ⊆ K−
[VAR-TRA] Γ ⊢ V ⊆ V′, Γ ⊢ V′ ⊆ V″ =⇒ Γ ⊢ V ⊆ V″
The [VAR] rules define the valid context variances. Since context variances
represent sets of contexts, the [VAR] rules really just define the subset relations
between them. Contexts can be considered as singleton sets containing only one
element. The only rule that can be applied to the unbound existential context ?
is the [VAR-ANY] rule. By inspection it is also clear that we cannot have anything
[TYPE] Γ; N ⊢o To, Γ ⊢ To <: T =⇒ Γ; N ⊢ T
[TYP-OBJ] |N1..Nn| = arity(C), Γ; N ⊢ N′, N1..Nn, Γ ⊢ N1 ≼ N′ =⇒ Γ; N ⊢o [N′] C<N1..Nn>
[SUB-VAR] Γ ⊢ N ⊆ N′, Γ ⊢ V ⊆ V′ =⇒ Γ ⊢ [N] C<V> <: [N′] C<V′>
[SUB-EXT] class C<X> D<Y> {...}, T′ = [N] D<[V/X]Y>, Γ ⊢ T′ <: T =⇒ Γ ⊢ [N] C<V> <: T
Our subtyping rules need to handle context abstraction correctly, and avoid
breaking accessibility constraints through assignments to fields, or method para-
meters, with some of their types’ contexts hidden. The main idea of our system
is to substitute existential contexts for any variant contexts in the type of an
object (via an opening process as we see later) when we determine the types of
its fields/methods, and to prohibit binding to fields or method parameters which
include existential contexts in their types. Let us use the phrase existential type
to describe a type containing an existential context. By guaranteeing that exis-
tential types cannot be supertypes, we achieve the desired prohibition (note the
subtyping premise in all [EXP] rules that involve binding). We now explain how the
[SUB] rules achieve this. We cannot use [SUB-VAR] to find an existential super-
type because its premise would require there to be some subset of an existential
context, which the [VAR] rules preclude. It follows that no existential type can
be a subtype of itself, because the alternative [SUB-EXT] is not applicable for the
reflexive case. Finally any type T judged to be a supertype by [SUB-EXT] must be
a supertype of itself according to the last premise of the rule. It follows that no
type judged to be a supertype by these rules can contain an existential context.
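The subset relation on variances defined by the [VAR] rules can be prototyped as a small check. In the sketch below (an illustrative model, not the paper's formalization), a variance is a plain string for a concrete context, "*" for full abstraction, or a tagged pair for K+, K−, K+? and K−?; the context ordering is passed in as a predicate:

```python
def subset(v, w, leq):
    """Does v ⊆ w hold, following the [VAR] rules?  `leq(a, b)` is the
    context ordering (a is inside b), assumed reflexive."""
    if w == "*":                              # [VAR-ANY]: everything is a subset of *
        return True
    if isinstance(v, str) and v == w:         # [VAR-CRT]: contexts as singleton sets
        return True
    if isinstance(v, tuple) and isinstance(w, tuple):
        (tv, kv), (tw, kw) = v, w
        if tv in ("+", "+?") and tw == "+":   # [VAR-IN] / [VAR-IN']: K+ ⊆ K'+ if K ≼ K'
            return leq(kv, kw)
        if tv in ("-", "-?") and tw == "-":   # [VAR-OUT] / [VAR-OUT']: K- ⊆ K'- if K' ≼ K
            return leq(kw, kv)
    return False                              # in particular, nothing is ⊆ an existential
```

Note that no rule concludes a subset of an existential context, matching the observation that only [VAR-ANY] applies to ?.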
Legal concrete contexts include formal context parameters, the current con-
text this and the world context. Recall that the current context N, in the static
semantics, is always bound to this or world. Context ordering rules define the
domination relation between contexts. Domination is the reflexive and transi-
tive closure of ownership. Direct ownership is captured in the [ORD-OWN] rule by
looking up the owner from the type of the context via [LKP-OWN] (appearing at
the end of this subsection). The only direct ownership relation available in the
static semantics is for the this context; it is owned by the first context parameter
of its type (see [LKP-OWN] and [LKP-OWN′]). The this context is the only context
that is given a static type; this is both a context parameter and a variable nam-
ing the current object. At runtime, this is bound to the location of the target
object. The ordering on existential contexts is not surprising: ? ≼ world holds by the [ORD-WLD] rule, but no other ordering is derivable for ?.
[CTX-LCL]   X ∈ Γ  ⟹  Γ; N ⊢ X
[CTX-WLD]   Γ; N ⊢ world
[ORD-RFL]   Γ ⊢ K ≼ K
[ORD-ENV]   Γ; N ⊢ N′ : [N″] C⟨N̄⟩   N‴ ∈ N̄ ∪ {N″}  ⟹  Γ ⊢ N̄₁ ≼ N‴
[ORD-OWN]   Γ ⊢ N ≼ owner_Γ(N)
[ORD-WLD]   Γ ⊢ K ≼ world
[ORD-TRA]   Γ ⊢ K ≼ K′   Γ ⊢ K′ ≼ K″  ⟹  Γ ⊢ K ≼ K″
[ORD-IN]    Γ ⊢ K+? ≼ K
[ORD-OUT]   Γ ⊢ K ≼ K−?
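To make the domination relation concrete: it is simply reachability through the owner chain, with world at the top. The following standalone Java sketch is our own illustration, not part of the paper's formal system; the map representation and all names are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class Domination {
    // Direct ownership: maps each context to its owner; "world" is the root.
    static final Map<String, String> owner = new HashMap<>();

    static {
        // Hypothetical sample ownership tree: node is owned by list,
        // list is owned by world.
        owner.put("node", "list");
        owner.put("list", "world");
    }

    // k is dominated by d if d appears on k's ownership chain (reflexive,
    // transitive closure of direct ownership), or if d is world.
    static boolean dominatedBy(String k, String d) {
        for (String c = k; c != null; c = owner.get(c)) {
            if (c.equals(d)) return true;  // covers reflexivity and transitivity
        }
        return "world".equals(d);          // world dominates every context
    }

    public static void main(String[] args) {
        System.out.println(dominatedBy("node", "list"));  // true
        System.out.println(dominatedBy("list", "node"));  // false
    }
}
```

Here dominatedBy(k, d) holds exactly when d lies on k's owner chain or is world, mirroring the roles of [ORD-RFL], [ORD-OWN], [ORD-TRA] and [ORD-WLD] above.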
The [PROGRAM] rule simply checks the expression in the main method; world is
the only concrete context available at this level.
[PROGRAM]   ⊢ L̄   •; world ⊢ e : T  ⟹  ⊢ L̄ e
Class well-formedness is checked in the [CLASS] rule. Each class defines its own
environment formed from its formal contexts and the type of this object. In the
original ownership type system, the owner parameter X₁ had to be dominated by all other context parameters; we follow the same convention here. Note that the only direct ownership relation known to the class, that is this ≼ X₁, is not
included in the class environment; instead we capture it in the [ORD-OWN] rule to
make it generally derivable, in particular for its use in the dynamic semantics.
Furthermore, field types and methods need to be checked for well-formedness.
If a class extends another then the supertype needs to be valid in the
environment formed from the class, and the owner of the supertype must be the
same as the owner of the current context. Not surprisingly, new field names need
to be distinguished from the field names used in the supertype. Moreover, the
supertype is bound to a super variable in the environment that is used only by
the [METHOD] rule to check the correctness of overridden methods. We implicitly
assume the access modifier for the types of both this and super variables is the
default access modifier world.
Method bodies are checked in an environment extended with the formal parameters and the current context this. Methods can be overridden in the traditional way — covariant on the return type and contravariant on the types of method parameters.
[METHOD]   Γ ⊢ T <: T′   Γ ⊢ T̄′ <: T̄  ⟹  Γ ⊢ T m(T̄ x̄) {e}
(where T′ m(T̄′) is the signature of the overridden method, if any)
In the [EXP-NEW] rule, new objects are created using concrete contexts (accord-
ing to [TYP-OBJ]) in order to establish an ownership tree in the heap. For simplicity,
we force all the fields of the object to be initialized at creation time. The in-
ternal context of the newly created object is an anonymous context inside its
owner. The [EXP-SEL] and [EXP-CAL] rules look up the types of fields or methods
for the target expression e. They need to decide if they are able to name the
internal context of e by using the auxiliary function rep_T(), which simply checks if e is the current context (i.e., this). If e is the current context then it is used as
the internal context of e in the lookup functions fields() and method(); otherwise, an anonymous context is used instead, thus hiding the internal context (see
[LKP-REP]).
All rules for expressions that involve some form of binding, such as [EXP-ASS],
[EXP-CAL] and [EXP-NEW], use a subtype constraint to ensure that the type of the
target of the binding does not involve any existential contexts (recall the [SUB]
rules). A more conventional formulation of these rules would shift the subtype
check onto a subsumption rule, but that cannot be done here — we need to use
distinct types for the source and target of the binding in the rules.
[EXP-VAR]   Γ(x) = T  ⟹  Γ; N ⊢ x : T

[EXP-NEW]   Γ; N ⊢ₒ T   Γ; N ⊢ ē : T̄′   fields(T, owner(T)+?) = f̄ T̄   Γ ⊢ T̄′ <: T̄
            ⟹  Γ; N ⊢ new T(ē) : T

[EXP-SEL]   Γ; N ⊢ e : T′   fields(T′, rep_{T′}(N, e))(f) = T  ⟹  Γ; N ⊢ e.f : T

[EXP-ASS]   Γ; N ⊢ e′ : T′   Γ; N ⊢ e.f : T   Γ ⊢ T′ <: T  ⟹  Γ; N ⊢ e.f = e′ : T′

[EXP-CAL]   Γ; N ⊢ e : T″   Γ; N ⊢ ē′ : T̄′   Γ ⊢ T̄′ <: T̄
            method(T″, rep_{T″}(N, e), m) = T m(T̄) ...
            ⟹  Γ; N ⊢ e.m(ē′) : T
When accessing the fields or methods via an expression e, we determine their
types, given the type of e. These in turn use [LKP-DEF] to find a correct substitu-
tion for parameters of T ’s class. The opening process requires the replacement of
context variances with corresponding existential contexts. This process is similar
to the usual unpack/open for conventional existential types. The major differ-
ence is that our open process does not introduce fresh context variables into
the current environment. Instead, we keep the existential context anonymous by
On Ownership and Accessibility 115
annotating context variances with a special symbol ?. This technique not only
eliminates the need for the pack/close operation (since anonymous contexts do
not have to be bound to an environment, they naturally become global), but
also makes the proofs simpler. Moreover, this technique is capable of handling
complicated variances which would need nested open/close operations.
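As a rough token-level illustration of this opening step (the helper below is our own sketch, not the paper's formalism), each variance-annotated context argument is rewritten to its anonymous existential form, and concrete contexts pass through unchanged:

```java
public class Opening {
    // Hypothetical sketch: a context argument is either concrete (e.g. "K",
    // "this", "world") or variance-annotated ("K+", "K-", "*"). Opening
    // replaces each variance with the corresponding anonymous existential
    // context, without introducing fresh names into the environment.
    static String open(String ctx) {
        if (ctx.equals("*"))   return "?";        // full abstraction  *  ->  ?
        if (ctx.endsWith("+")) return ctx + "?";  // covariant         K+ ->  K+?
        if (ctx.endsWith("-")) return ctx + "?";  // contravariant     K- ->  K-?
        return ctx;                               // concrete contexts unchanged
    }

    public static void main(String[] args) {
        System.out.println(open("K+"));   // K+?
        System.out.println(open("*"));    // ?
        System.out.println(open("this")); // this
    }
}
```

Because the resulting existential contexts stay anonymous, no corresponding pack/close step is needed.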
l, l_T            typed locations
e ::= ... | l     terms
N ::= ... | l     nameable contexts
o ::= f̄ → l̄       objects
H ::= l̄ → ō       heaps
There are also a few auxiliary definitions to help formalize the properties. Lo-
cations are annotated with their type. From this we can look up the accessibility
context for an object stored at that location. The objects in the heap form an
ownership tree just as in other ownership type systems. However, the reference
containment invariant is different. An object needs to be inside another object's access modifier context in order to access it.
[EXP-LOC]   Γ; N ⊢ l_T : T

[LKP-ACC]   •; • ⊢ l : [N] C⟨V̄⟩  ⟹  acc(l) = N

[HEAP]      ∀l ∈ dom(H) ·  •; • ⊢ l : T   H(l) = f̄ → l̄′   fields(T, l) = f̄ T̄
            •; • ⊢ l̄′ : T̄′   ⊢ T̄′ <: T̄   • ⊢ l ≼ acc(l̄′)
            ⟹  ⊢ H
The reduction rules are defined in a big-step fashion. The context N in ⇓N
refers to the target object of the current call, or the world context in case of
the main method. At the time of method invocation in [RED-CAL], the target
object of the body of the invoked method is l. Notice that the variable this is not
substituted in [l/x]e . Instead, this is replaced by l in the substitution provided
by the lookup function method(T , l, m).
[EXECUTION]   •; e ⇓world H; l  ⟹  ⊢ L̄ e ⇓ l

[RED-CAL]     H; e ⇓N H′; l   H′; ē′ ⇓N H″; l̄′   •; N ⊢ l : T
              method(T, l, m) = ...(x̄){e″}   H″; [l̄′/x̄]e″ ⇓l H‴; l″
              ⟹  H; e.m(ē′) ⇓N H‴; l″

[RED-NEW]     H; ē ⇓N H′; l̄   l_T ∉ dom(H′)
              f̄ = dom(fields(T, l_T))   H″ = H′, l_T → {f̄ → l̄}
              ⟹  H; new T(ē) ⇓N H″; l_T

[RED-ASS]     H; e ⇓N H′; l   H′; e′ ⇓N H″; l′
              ⟹  H; e.f = e′ ⇓N H″[l → H″(l)[f → l′]]; l′

[RED-SEL]     H; e ⇓N H′; l  ⟹  H; e.f ⇓N H′; H′(l)(f)
Finally we formalize some of the key properties of the type system. We present a
standard subject reduction result in Theorem 1, together with a statement that
goodness of a heap is invariant through expression reductions. This implies that
the heap invariants are maintained through program execution.
object. This is because the owner’s internal context can only be named from
within the owner. This highlights the role of the creator — only the creator can
authorize the created objects to access its own representation by defining their
accessibility and capability appropriately.
Obviously, our type system subsumes ownership types; ownership types are
special cases of our type system where the access modifier is the same as the
owner context and no context argument is abstracted. Moreover, the techniques
used in our type system may be applicable in other similar type systems for more
flexibility and expressiveness, such as Effective Ownership [17], Acyclic Types
[16] and Ownership Domains [1].
Aldrich and Chambers noted that ownership types cannot express the event
callback design pattern [1]. Typically, a callback object is created by a listener
object to observe some event. In the event, the callback object is invoked and
will notify the listener object. Callback objects share some of the problems of
iterators. The problem occurs when the callback object needs to directly mutate
the listener’s internal representation rather than use the listener’s interface. The
callback problem does not raise as serious a performance issue as iterators do.
The issue here is really about adding some flexibility to the callback classes. For
example, instead of adding more methods (to be called by callback objects in
different events) into the listener class, each callback class may implement its
own code to mutate its listener. In our system, callback objects can be expressed
in exactly the same way as iterators — we may simply promote the access
modifier (permitted by the listener object) of callback objects high enough in
the ownership hierarchy so that they can be named by the user of the callback
objects.
Syntactic overhead for our types is that of ownership types plus an extra
access modifier for each type. As we have seen, with carefully selected defaults, type annotations can be reduced significantly. For instance, access modifiers
can be omitted for globally accessible objects; abstract contexts can be omitted
completely. Moreover, the ideas of Generic Ownership [21] can also be employed
here to reduce the amount of type annotations in the presence of class type
parameters.
As for ownership types, our type system allows separate compilation. It is
statically checkable and does not require any runtime support. Our dynamic
semantics can easily handle typecasting with runtime checks because it incor-
porates full owner information. However, in practice, the overhead for having
owner information available at runtime may be significant for systems with a
large number of small objects because each object will have two extra fields to
identify its owner and access contexts. In security sensitive applications, this
cost may well be worthwhile.
Ownership Domains [1] use a similar method where read-only fields (final
fields in Java) are used to name internal domains (partitions of contexts) instead
of read-only variables. Moving variables to fields allows ownership domain types to have a more flexible reference structure than ownership types
and JOE. To provide some safety with this approach, only domains declared as
public can be named via final fields. Access policy between domains is explicitly
declared and public domains are typically linked to private domains (which are
unnameable from outside). For soundness, object creation is restricted to the
owner domain of the current object or its locally defined subdomains. The fol-
lowing code shows a simple example of ownership domains. Iterator objects are
created in a public domain of the list object and used as interface objects by the
client. Note that a subclass of Iterator is needed to propagate the name of the
private domain owned (as an extra domain parameter) to the iterator objects.
We consider this to be a limited version of our context abstraction: essentially
the Iterator interface hides the Node owner that is a required capability for the
ListIterator object.
class List<o, d> assumes o->d {
  domain owned; link owned->d;
  public domain iters; link iters->owned, iters->d;
  Node<owned, d> head;
  Iterator<iters, d> getIter() {
    return new ListIterator<iters, d, owned>(head);
  }
}

// in client class
final List<some, world> list = ...
Iterator<list.iters, world> iter = list.getIter();
In practice, some problems may arise with read-only variables/fields. For example, in order to access an object in a context/public domain, a client must first obtain a reference to the owner object of the context and then must place the
owner in a read-only variable. Only in this way can the context be named through
the name of the read-only variable and a valid type be constructed for the object
to be accessed. When accessing an object buried deep in the ownership tree, the
programmer may need to declare many read-only variables and obtain references
to each object along an ownership branch.
Moreover, the restriction on where objects can be created may limit some
common programming practices, the factory design pattern for instance, where
objects need to be created in various contexts/domains given by clients. The
explicitly defined domains add finer-grained structures to the system at the cost
of more domain and link annotations. Domains can be used to express some
architectural constraints more precisely than ownership types do, because these
constraints can be expressed directly as links, which define the access policy between each pair of domains.
The inner class solution was suggested for ownership types by Clarke in his
PhD thesis [7] and adopted by Boyapati et al. [4]. The idea of inner classes is very
simple; inner classes can name the outer object directly. The following example shows how the Iterator class can be written as an inner class of the List class, which can name the list object's internal context directly via List.this.
Inner classes are lexically scoped and can only be used in limited places where the usage of the objects is specific and can be foreseen by the programmer. In
general they are not as flexible as our type system.
The closest work to our type system may be the model of Simple Ownership
Types [10]. In this model, there is a separation of owner and rep contexts. The
owner context of the object determines which other objects may access it (like
our accessibility context), while the rep context determines those contexts it
may access (like this). The containment invariant states that if l references l′ then rep(l) ≼ owner(l′). There are a number of major differences between the
two models. The owner context in simple ownership types is a formal parameter
of the class definition; it controls access to the object rather than defining the
containment structure of objects/contexts — we prefer to reserve the notion of
owner for the latter role. To preserve soundness this prevents the owner context
from being variant. Moreover, as for all context parameters, the owner context
must be a dominator of the rep context (which can be thought of as the object
itself). Although simple ownership types use an explicit form of existential types to hide internal contexts, they do not support variance of context arguments as our system does.
Ownership Type Systems With Owners-As-Dominators. The proposals
to retain the owners-as-dominators encapsulation allow some references to cross the encapsulation boundary but ensure these references cannot update the internal state directly (called observational representation exposure in [3]); that is, any update still has to be initiated by the owner object.
The Universes System [18, 19] uses read-only references to cross the boundary
of encapsulation. Its read-only references are restricted and can only return read-
only references. For example, they are able to express iterators by using a read-
only reference to access the internal implementation of the list object. However,
these iterators can only return data elements in read-only mode, that is, the
elements in the list cannot be updated in this way (unless using dynamic casting
with its associated runtime overheads [19]).
Effective Ownership [17] employs an encapsulation-aware effect system which allows arbitrary reference structures but still retains an owners-as-dominators encapsulation on object representation. It guarantees that any update to an ob-
ject’s internal state must occur (directly or indirectly) via a call on a method
of its owner. In contrast to Universes, effective ownership’s cross-encapsulation
references can be used to mutate data elements via references held by a list object, while still protecting the list's representation from being modified. One
limitation of effective ownership is that the iterator objects cannot be used to
update the list’s internal implementation, such as adding or removing elements
from the list, because of the strong owners-as-dominators effect encapsulation.
Other Type Systems for Alias Protection. Many type systems have been
proposed for alias protection. Confined types [26] manage aliases based on a
package level encapsulation which provides a lightweight but weaker constraint
than instance-level object encapsulation. Uniqueness techniques [27, 13] allow
local reasoning and can prevent representation exposure by forbidding sharing;
a reference is unique if it is the only reference to an object. External unique-
ness combines the benefit of ownership types with uniqueness [8]. Boyland et
al. designed a system to unify uniqueness and read-only capability [5] where a
reference is combined with a capability. Adoption [12, 6] can be used to provide
information hiding similar to object encapsulation, but it is not clear how com-
mon object-oriented patterns such as iterators can be expressed in this approach.
Alias types [23, 28] allow fine control on aliases at the cost of more complex an-
notations.
Variance on Parametric Types. The idea of use-site variance on type arguments of parametric classes was first introduced informally by Thorup and Torgersen, but only for covariance [24]. Igarashi and Viroli added contravariance and
bivariance (complete abstraction) in their Variant Parametric Types and formal-
ized type variances as bounded existential types. Our type system uses a similar
technique, where, instead of types with subtyping, we rely on the containment
ordering of ownership. Usually type systems with existential types would need
some form of pack/unpack or close/open pairing to distinguish between typing
contexts where the existential type is visible or not. Igarashi and Viroli used
this idea directly in their type system, without exposing the existential types
in the language syntax. Our use of the *, K+ and K− context variances in the
language syntax and ?, K+? and K−? in the type system is somewhat akin to
the use of pack/unpack mechanisms for existential types, but simpler. In partic-
ular, we avoid introducing new names for contexts into environments by keeping existential contexts anonymous.
6 Conclusion
This paper has presented variant ownership types that generalize ownership
types by separating accessibility of a type from its capability. Combined with
context variance, the resulting type system significantly improves the expres-
siveness and utility of ownership types. The authors wish to acknowledge the
support of the Australian Research Council Grant DP0665581.
References
1. J. Aldrich and C. Chambers. Ownership domains: Separating aliasing policy
from mechanism. In European Conference on Object-Oriented Programming
(ECOOP), July 2004.
2. P. S. Almeida. Balloon types: Controlling sharing of state in data types. Lecture
Notes in Computer Science, 1241:32–59, 1997.
3. A. Birka and M. D. Ernst. A practical type system and language for reference
immutability. In OOPSLA ’04: Proceedings of the 19th annual ACM SIGPLAN
Conference on Object-Oriented Programming, Systems, Languages, and Applica-
tions, pages 35–49. ACM Press, 2004.
4. C. Boyapati, B. Liskov, and L. Shrira. Ownership types for object encapsulation.
In Proceedings of the 30th ACM SIGPLAN-SIGACT Symposium on Principles of
Programming Languages, pages 213–223. ACM Press, 2003.
5. J. Boyland, J. Noble, and W. Retert. Capabilities for sharing: A generalisation
of uniqueness and read-only. In European Conference on Object-Oriented Pro-
gramming (ECOOP), pages 2–27, 2001.
6. J. T. Boyland and W. Retert. Connecting effects and uniqueness with adoption.
In POPL ’05: Proceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on
Principles of Programming Languages, pages 283–295, New York, NY, USA, 2005.
ACM Press.
7. D. Clarke. Object Ownership and Containment. PhD thesis, School of Computer
Science and Engineering, The University of New South Wales, Sydney, Australia,
2001.
8. D. Clarke and T. Wrigstad. External uniqueness is unique enough. In European
Conference on Object-Oriented Programming (ECOOP), July 2003.
9. D. G. Clarke and S. Drossopoulou. Ownership, encapsulation and disjointness
of type and effect. In 17th Annual Conference on Object-Oriented Programming,
Systems, Languages, and Applications (OOPSLA), November 2002.
10. D. G. Clarke, J. Noble, and J. M. Potter. Simple ownership types for object
containment. In European Conference on Object-Oriented Programming (ECOOP),
2001.
11. D. G. Clarke, J. M. Potter, and J. Noble. Ownership types for flexible alias protec-
tion. In Proceedings of the 13th ACM SIGPLAN Conference on Object-Oriented
Programming, Systems, Languages, and Applications, pages 48–64. ACM Press,
1998.
12. M. Fahndrich and R. DeLine. Adoption and focus: practical linear types for im-
perative programming. In PLDI ’02: Proceedings of the ACM SIGPLAN 2002
Conference on Programming Language Design and Implementation, pages 13–24,
New York, NY, USA, 2002. ACM Press.
13. J. Hogg. Islands: aliasing protection in object-oriented languages. In OOPSLA ’91:
Proceedings of Conference on Object-Oriented Programming Systems, Languages,
and Applications, pages 271–285, New York, NY, USA, 1991. ACM Press.
14. A. Igarashi, B. Pierce, and P. Wadler. Featherweight Java: A minimal core calculus
for Java and GJ. In L. Meissner, editor, Proceedings of the 1999 ACM SIGPLAN
Conference on Object-Oriented Programming, Systems, Languages, and Applica-
tions (OOPSLA‘99), volume 34(10), pages 132–146, N. Y., 1999.
15. A. Igarashi and M. Viroli. On variance-based subtyping for parametric types. In
Proceedings of the 16th European Conference on Object-Oriented Programming,
pages 441–469. Springer-Verlag, 2002.
16. Y. Lu and J. Potter. A type system for reachability and acyclicity. In Proceedings
of the 19th European Conference on Object-Oriented Programming, pages 479–503.
Springer-Verlag, 2005.
17. Y. Lu and J. Potter. Protecting representation with effect encapsulation. In Pro-
ceedings of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Pro-
gramming Languages. ACM Press, 2006.
18. P. Müller and A. Poetzsch-Heffter. Universes: A type system for controlling repre-
sentation exposure. Programming Languages and Fundamentals of Programming,
1999.
19. P. Müller and A. Poetzsch-Heffter. Universes: A type system for alias and depen-
dency control. Technical Report 279, Fernuniversität Hagen, 2001.
20. J. Noble, J. Vitek, and J. Potter. Flexible alias protection. In European Conference
on Object-Oriented Programming (ECOOP), 1998.
21. A. Potanin, J. Noble, and R. Biddle. Generic ownership: practical ownership control
in programming languages. In OOPSLA Companion, pages 50–51, 2004.
22. J. Potter, J. Noble, and D. Clarke. The ins and outs of objects. In Australian
Software Engineering Conference. IEEE Press, 1998.
23. F. Smith, D. Walker, and G. Morrisett. Alias types. Lecture Notes in Computer
Science, 1782:366–381, 2000.
24. K. K. Thorup and M. Torgersen. Unifying genericity - combining the benefits of
virtual types and parameterized classes. In ECOOP, pages 186–204, 1999.
25. M. Torgersen, C. P. Hansen, E. Ernst, P. von der Ahé, G. Bracha, and N. M. Gafter.
Adding wildcards to the Java programming language. In SAC, pages 1289–1296,
2004.
26. J. Vitek and B. Bokowski. Confined types. In Proceedings of the 14th Annual Con-
ference on Object-Oriented Programming, Systems, Languages, and Applications,
pages 82–96. ACM Press, 1999.
27. P. Wadler. Linear types can change the world! In M. Broy and C. Jones, editors,
IFIP TC 2 Working Conference on Programming Concepts and Methods, Sea of
Galilee, Israel, pages 347–359. North Holland, 1990.
28. D. Walker and G. Morrisett. Alias types for recursive data structures. Lecture
Notes in Computer Science, 2071:177–206, 2001.
Scoped Types and Aspects for Real-Time Java
1 Introduction
The Real-Time Specification for Java (RTSJ) introduces abstractions for managing re-
sources, such as non-garbage collected regions of memory [4]. For instance, in the
RTSJ, a series of scoped memory classes let programmers manage memory explicitly:
creating nested memory regions, allocating objects into those regions, and destroying
regions when they are no longer needed. In a hard real-time system, programmers must
use these classes, so that their programs can bypass Java’s garbage collector and its
associated predictability and performance penalties. But these abstractions are far from
abstract. The RTSJ forces programmers to face more low-level details about the behav-
iour of their system than ever before — such as how scoped memory objects correspond
to allocated regions, which objects are allocated in those regions, how those regions are ordered — and then rewards any mistakes by throwing dynamic errors at runtime.
The difficulty of managing the inherent complexity associated with real-time concerns
ultimately compromises the development, maintenance and evolution of safety critical
code bases and increases the likelihood of fatal errors at runtime.
This paper introduces Scoped Types and Aspects for Real-Time Systems (STARS),
a novel approach for programming real-time systems that shields developers from many
Fig. 1. Overview of STARS. Application logic is written according to the Scoped Types disci-
pline. The JAVACOP verifier uses scoped types rules (and possibly some user-defined application-
specific constraints) to validate the program. Then, an aspect weaver combines the application
logic with the real-time behaviour. The result is a real-time Java program that can be executed on
any STARS-compliant virtual machine.
5. Empirical evaluation. We conducted a case study to show the impact STARS has
on both code quality and performance in a 20 KLoc hard real-time application.
Refactoring RTSJ code to a STARS program proved easy and the resulting program
enjoyed a 28% performance improvement over the RTSJ equivalent.
Compared with our previous work, STARS presents two major advances. First, Scoped Types enforce a per-owner relation [10, 18] via techniques based on Confined Types
[9, 22]. The type system described here refines the system described in [21], which includes a proof of correctness but no implementation. In fact, the refactoring discussed
in that paper does not type check under the current type system. Secondly, the idea of
using aspects to localize real-time behaviour is also new.
The paper proceeds as follows. After a survey of background and previous work,
Section 2 presents an overview of the STARS programming model while Section 3
overviews the current STARS prototype implementations. Section 4 follows with a case
study using STARS in the implementation of a real-time collision detection system.
Finally we conclude with discussion and future work.
avoiding the need for explicit manipulation of memory areas [15]. Limitations of this
approach include the fact that memory areas cannot be multi-threaded.
In contrast, systems like Islands [13], Ownership Types [10], and their successors
restrict the scope of references to enable modular reasoning. The idea of using own-
ership types for the safety of region-based memory was first proposed by Boyapati et
al. [5], and required changes to the Java syntax and explicit type annotations. Research
in type-safe memory management, message-based communication, process scheduling and file system interface management for Cyclone, a dialect of C, has shown that it is possible to prevent dangling pointers even in low-level code [11]. The
RTSJ is more challenging than Cyclone as scopes can be accessed concurrently and are
first-class values.
Scoped types are one of the latest developments in the general area of type systems
for controlled sharing of references [21]. This paper builds on Scoped Types and pro-
poses a practical programming model targeting the separation of policy and mechanism
within real-time applications. The key insight of Scoped Types is the necessity to make
the nested scope structure of the program explicit: basically, every time the program-
mer writes an allocation expression of the form new Object(), the object’s type
shows where the object fits into the scope structure of the program. It is not essential
to know which particular scope it will be allocated in, but rather the object’s hierarchi-
cal relationship to other objects. This ensures that when an assignment expression, e.g.
obj.f=new F(), is encountered, Scoped Types can statically (albeit conservatively)
ensure that the assignment will not breach the program’s scope structure.
STARS guides the design and implementation of real-time systems with a simple, ex-
plicit programming model. As the STARS name suggests, this is made up of two parts,
Scoped Types, and Aspects. First, Scoped Types ensure that the relative memory loca-
tion of any object is obvious in the program text. We use nested packages to define a
static scope hierarchy in the program’s code; a pluggable type checker ensures programs
respect this hierarchy; at runtime, the dynamic scope structure simply instantiates this
static hierarchy. Second, we use Aspect-Oriented Programming to decouple the real-time
parts of STARS programs from their application logic. Aspects are used as declarative
specifications of the real-time policies of the applications (the size of scoped memory
areas or scheduling parameters of real time threads), but also to link Scoped Types to
their implementations within a real-time VM.
The main points of the STARS programming model are illustrated in Fig. 3. The main
abstraction is the scoped package. A scoped package is the static manifestation of an
RTSJ scoped memory area. Classes defined within a scoped package are either gates or
scoped classes. Every instance of a gate class has its own unique scoped memory area,
and every instance of a scoped class will be allocated in the memory area belonging to a
gate object in the same package. Because gate classes can have multiple instances, each
scoped package can correspond to multiple scoped memory areas at runtime (one for
each gate instance), just as a Java class can correspond to multiple instances. Then, the
dynamic structure of the nested memory areas is modelled by the static structure of the
Scoped Types and Aspects for Real-Time Java 129
Fig. 3. The STARS Programming Model. Each runtime scope has a corresponding Java package.
Objects defined in a package are always allocated in a corresponding scope. A scope’s Gate is
allocated in its parent scope.
nested scoped packages, in just the same way that the dynamic structure of a program’s
objects is modelled by the static structure of the program’s class diagram.
Scoped types are allowed to refer to types defined in an ancestor package, just as
in RTSJ, objects allocated in a scope are allowed to refer to an ancestor scope: the
converse is forbidden. The root of the hierarchy is the package imm, corresponding to
RTSJ’s immortal memory. There will be as many scoped memory areas nested inside
the immortal memory area as there are instances of the gate classes defined in imm’s
immediate subpackages.
STARS does impact the structure of Real-time Java programs. By giving an addi-
tional meaning to the package construct, we de facto extend the language. This form
of overloading of language constructs has the same rationale as the definition of the
RTSJ itself — namely to extend a language without changing its syntax, compiler, or
intermediate format. In practice, STARS changes the way packages are used: rather
than grouping classes on the basis of some logical criteria, we group them by lifetime
and function. In our experience, this decomposition is natural as RTSJ programmers
must think in terms of scopes and locations in their design. Thus it is not surprising
to see that classes that end up allocated in the same scope are closely coupled, and so
grouping them in the same package is not unrealistic. We argue that this package struc-
ture is a small price to pay for STARS’ static guarantees, and for the clarity it brings to
programs’ real-time, memory dependent code.
refer to the fully qualified name of a package. We refer to types not defined in a scoped
package as heap types.
Rule 1 (Scoped Types).
1. The package imm is a scoped package. Any package nested within a scoped package
is scoped.
2. Any type not defined in a scoped package is a heap type.
3. The type of a gate class p.G defined within a scoped package p is a gate type.
4. The type of any non-gate interface or class p.S defined within a scoped package
p is a scoped type. The type of an array with elements of scoped type is a scoped
type.
Rule 2 (Visibility).
1. An expression of scoped type p.S is visible in any type defined in p or any of its
subpackages.
2. An expression of gate type p.G is visible in any type defined in the immediate
super-package of p. An exception to this rule is the local variable this which can
be used within a gate class.
3. The type of the top-level gate imm.G is visible in heap types.
4. An expression of heap type is only visible in other heap types.
The visibility rule encodes the essence of the RTSJ access rules. An object can be ref-
erenced from its defining memory area (denoted statically by a package), or from a
memory area with shorter lifetime (a nested package). Gate classes are treated differ-
ently, as they are handles used from a parent scope to access a memory area. They must
only be accessible to the code defined in the parent scope. The reason other types in
the same scoped package cannot refer to a gate is that we must avoid confusion between
gates of the same type: a parent can instantiate many gates of the same type, and the
contents of these gates must be kept separate. Even though a gate's type is not visible in
its own class, a single exception is made so that a gate object can refer to itself through
the this pointer (because we know which gate “this” is).
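The visibility discipline of Rules 1 and 2 boils down to simple checks on package names. The following sketch (illustrative only, not the JAVACOP rules; the class and method names are ours) expresses the checks as predicates over fully qualified package names:

```java
// Hypothetical sketch of Rule 2 as package-name predicates. A scoped
// type defined in package defPkg is visible from code in defPkg or any
// of its subpackages; a gate type is visible only from defPkg's
// immediate super-package.
public class Visibility {

    /** Rule 2.1: scoped type visible in its package and subpackages. */
    public static boolean scopedVisible(String defPkg, String usePkg) {
        return usePkg.equals(defPkg) || usePkg.startsWith(defPkg + ".");
    }

    /** Rule 2.2: gate type visible only in the immediate super-package. */
    public static boolean gateVisible(String defPkg, String usePkg) {
        int dot = defPkg.lastIndexOf('.');
        // The top-level gate imm.G has no super-package; by Rule 2.3 it
        // is instead visible from heap types (not modelled here).
        return dot >= 0 && usePkg.equals(defPkg.substring(0, dot));
    }
}
```

For instance, a scoped type in imm.runner is visible from imm.runner.detector but not vice versa, and the gate of imm.runner is visible only from imm.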
Rule 3 (Widening). An expression of a scoped type p.S can be widened only to an-
other scoped type in p. An expression of a gate type p.G cannot be widened to any
other types.
Rule 3 is traditional in confined type systems, where types are used to enforce structural
properties on the object graph. Preventing types from being cast to arbitrary supertypes
(in particular Object) makes it possible to verify Rule 2 statically.
Rule 4 (Method Inheritance). An invocation of a method m on an expression of
scoped type p.S, where p is a scoped package, is valid only if m is defined in a class p.S’
in the same package. An invocation of a method m on an expression of gate type p.G is
valid only if m is defined in p.G.
Rule 4 prevents a more subtle form of reference leak: within an inherited method, the
receiver (i.e. this) is implicitly cast to the method’s defining class — this could lead
to a leak if one were to invoke a method inherited from a heap class.
Scoped Types and Aspects for Real-Time Java 131
Rule 5 (Constructor Invocation). The constructor of a scoped class p.S can only be
invoked by methods defined in p.
Rule 5 prevents a subpackage from invoking new on a class that is allocated in a dif-
ferent area than the currently executing object. This rule is not strictly necessary, as
an implementation could potentially reflect upon the static type of the object to dy-
namically obtain the proper scope. In our prototype, we use factory methods to create
objects.
Rule 6 (Static Reference Fields). A type p.S defined in a scoped package p is not
allowed to declare static reference fields.
A static reference field would be shared by instances of the same class allocated in
different scopes, and could thus leak references across scope boundaries.
2.2 Correctness
The fact that a package can only have one parent package trivially ensures that the RTSJ
single parent rule will hold. Moreover, a scope-allocated object o may only reference
objects allocated in the scope of o, or scopes with a longer lifetime, preventing any RTSJ
IllegalAssignmentError. For example, suppose that the assignment o.f=o’
is in the scope s, where o and o’ have types p.C and p’.C’ respectively. If p.C is a
scoped type, then the rules above ensure that o and o’ can only be allocated in s or its
outer scopes. By Rules 2 and 3, the type of the field f is defined in p’, which is visible
to p.C. Thus, the package p’ is the same as or a super-package of p and consequently
o’ must be allocated in the scope of o or its outer scope. The same is true if p.C is
a gate type, in which case o either represents s or a direct descendant of s. A formal
soundness argument can be found in the extended version of this paper.
The STARS prototype has two software components: a checker, which takes plain
Java code that is supposed to conform to the Scoped Types discipline and verifies that
it does in fact follow it, and a series of AspectJ aspects that weave in the
necessary low-level API calls to run on a real-time virtual machine.
We must ensure that only programs that follow the scoped types discipline are accepted
by the system: this is why we begin by passing our programs through a checker that en-
forces the discipline. Rather than implement a checker from scratch, we have employed
the JAVACOP “pluggable types” checker [1]. Pluggable types [6] are a relatively re-
cent idea, developed as extensions of soft type systems [8] or as a generalization of the
ideas behind the Strongtalk type system [7]. The key idea is that pluggable types layer
a new static type system over an existing (statically or dynamically typed) language,
allowing programmers to have greater guarantees about their programs’ behaviour, but
132 C. Andreae et al.
without the expense of implementing entirely new type systems or programming lan-
guages. JAVACOP is a pluggable type checker for Java programs — using JAVACOP,
pluggable type systems are designed by a series of syntax-directed rules that are layered
on top of the standard Java syntax and type system and then checked when the program
is compiled. STARS is a pluggable type system, and so it is relatively straightforward to
check with JAVACOP. The design and implementation of JAVACOP is described in [1].
The JAVACOP specification of the Scoped Type discipline is approximately 300 lines
of code. Essentially, we provide two kinds of facts to JAVACOP to describe Scoped
Types. First, we define which classes must be considered scoped or gate types; then
we restrict the code of those classes according to the Scoped Type rules.
Defining Scoped Types is relatively easy. Any class declared within the imm package
or any subpackage is either a scoped type or a gate. Declaring a scoped type in the
JAVACOP rule language is straightforward: a class or interface is scoped if it is in a
scoped package and is not a gate. A gate is a class declared within a scoped package
whose name matches that of the package, ignoring case. Array types are handled
separately: an array is scoped if its element type is scoped.
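The classification just described can be sketched as a small predicate (our own illustrative code, not the JAVACOP rule text):

```java
// Illustrative classifier for Rule 1. A package is scoped iff it is
// imm or nested inside imm. A class in a scoped package is a gate iff
// its simple name matches the last package segment, ignoring case;
// any other class in a scoped package is a scoped type.
public class Classify {
    public enum Kind { HEAP, SCOPED, GATE }

    public static boolean scopedPackage(String pkg) {
        return pkg.equals("imm") || pkg.startsWith("imm.");
    }

    public static Kind kindOf(String pkg, String simpleName) {
        if (!scopedPackage(pkg)) return Kind.HEAP;
        String last = pkg.substring(pkg.lastIndexOf('.') + 1);
        return simpleName.equalsIgnoreCase(last) ? Kind.GATE : Kind.SCOPED;
    }
}
```

Under this sketch, imm.runner.Runner is a gate, imm.runner.StateTable is a scoped type, and any class outside imm is a heap type.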
The rule that enforces visibility constraints is only slightly more complex. The fol-
lowing rule matches on a class definition (line 1) and ensures that the types of all syntax
tree nodes found within that definition (line 2) meet the constraints of Scoped Types. A
number of types and syntactic contexts, such as Strings and inheritance declarations, are
deemed “safe” (safeNodes on line 3, definition omitted) and can be used in any con-
text. Lines 4-5 ensure that top-level gates are only visible in the heap. Lines 7-8 ensure
that a gate is only visible in its parent package. Lines 10-11 ensure that the visibility of
a scoped type is limited to its defining package and subpackages. Lines 13-16 apply if c
is defined within a scoped package and ensure that types used within a scoped package
are visible.
We restrict widening of scoped types with the following rule. It states that if we are
trying to widen a scoped type, then the target must be declared in the same scoped pack-
age, and that if the type is a gate, widening is disallowed altogether. The
safeWideningLocation predicate is an escape hatch that allows annotations to
override the default rules.
JAVACOP allows users to extend the Scoped Types specification with additional re-
strictions. It is thus possible to use JAVACOP to restrict the set of allowed programs
further. The prototype implementation has one restriction, though: it does not support
AspectJ syntax, so JAVACOP is not able to validate the implementation of aspects.
As long as aspects remain simple and declarative this is not an issue, but in the
longer term we would like to see the integration of a pluggable type checker with an
aspect language.
Though the design of memory management in a real-time system may be clear, its
implementation typically will not be, because it is inherently tangled throughout
the code. For this reason we chose an aspect-oriented approach for modularizing
scope management. This part of STARS is implemented using a (subset of) the
Aspect-Oriented Programming features provided by AspectJ [14]. For performance,
predictability and safety reasons we stay away from dynamic or esoteric features such
as cflow and features that require instance-based aspect instantiation such as perthis and
pertarget.
After the program has been statically verified, aspects are composed with the plain
Java base-level application. The aspects weave necessary elements of the RTSJ API
into the system. This translation (and the aspects) depend critically upon the program
following the Scoped Type discipline: if the rules are broken, the resulting program
will no longer obey the RTSJ scoped memory discipline, and will either fail at runtime
with just the kind of exception we aim to prevent or, worse, if running on a virtual
machine that omits runtime checks, fail in some unchecked manner.
STARS programs are written against a simple API, shown in Fig. 4. The use of the
API is intentionally simple. Gate classes must extend scope.Gate, which gives ac-
cess to only two methods: waitForNextPeriod(), which is used to block a thread
until its next release event, and runInThread(), which is used to start a new real-
time thread. The single argument of runInThread() is an instance of a class that im-
plements the Runnable interface. The semantics of the method is that the argument's
run method will be executed in a new thread. The characteristics of the thread are left
unbound in the Java code.

1 package scope;
2

Fig. 4. STARS Interface. The scope package contains two classes, STARS and Gate, and
an abstract aspect ScopedAspect. Every gate class inherits from Gate and has access to
two methods, waitForNextPeriod() and runInThread(). Every STARS aspect extends
ScopedAspect, must define a pointcut InScope, and has access to a number of predefined
pointcuts.
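The API surface described above can be sketched in plain Java as follows; the names follow Fig. 4, but the method bodies are our own stubs: in STARS the real-time behaviour (release events, thread characteristics, memory areas) is bound later by aspects, not by this code.

```java
// Plain-Java sketch of the scope.Gate API surface from Fig. 4.
public class GateSketch {

    public static abstract class Gate {

        /** Blocks the calling thread until its next release event.
         *  A stub here; the STARS aspect binds this to the real-time VM. */
        public boolean waitForNextPeriod() {
            return true;
        }

        /** Runs the argument's run() method in a new thread. The
         *  thread's real-time characteristics are left unbound here. */
        public Thread runInThread(Runnable logic) {
            Thread t = new Thread(logic);
            t.start();
            return t;
        }
    }
}
```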
STARS aspects must deal with two concerns: the specifics of the memory area asso-
ciated with each gate, and the binding between invocations of runInThread() and
real-time threads. Memory area parameters are specified by declaring a before
advice on the initialization of a newly allocated gate. The privileged nature of the as-
pect allows the assignment to the private Gate.mem field. Since the ScopedMemory
class is abstract, the advice must specify one of its subclasses, LTMemory or
VTMemory, which provide linear-time and variable-time allocation of objects in scoped
memory areas respectively. It must also declare a size for the area.
The above example shows an advice for class MyGate. The associated memory area
has size sz and is of type VTMemory. The code can get more involved when Size-
Estimators are used to determine the proper size of the area.
It is noteworthy that the mem field is not accessible from the application logic as it
is declared private. This means that memory areas are only visible from aspects. (As
an aside, strict application of the scoped type discipline would preclude use of those
classes in any case.)
1 Opaque MemoryArea.STARSenter();
2 void MemoryArea.STARSrethrow(Opaque, Throwable);
3 void MemoryArea.STARSexit(Opaque area);
We show two key advices from the ScopedAspect introduced in Figure 4. The
first advice executes before the instance initializer of any scoped class or array (lines
1-4). This advice obtains the area of o – which is the object performing the allocation
– and sets the allocation context to that area. The reasoning is that if we are executing a
new then the target class must be visible. We thus ensure that it is co-located.
We use the second advice to modify the behaviour of any call to a gate (recall that these
can only originate from the immediate parent package). This around advice uses the
memory region field of the gate to change allocation context. When the method returns
we restore the previous area.
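The effect of this around advice can be illustrated with a plain-Java sketch (ours, not the STARS aspect): a per-thread stack of allocation contexts that is pushed on entry to a gate call and popped on return.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Illustrative model of the around advice on gate calls: push the
// gate's memory area as the current allocation context for the call,
// and restore the previous context when the call returns. Area names
// stand in for the RTSJ MemoryArea objects used by the real aspect.
public class ContextSketch {

    private static final ThreadLocal<Deque<String>> CTX =
        ThreadLocal.withInitial(() -> {
            Deque<String> d = new ArrayDeque<>();
            d.push("imm");               // start in immortal memory
            return d;
        });

    public static String current() {
        return CTX.get().peek();
    }

    /** What the around advice wraps around a call on a gate. */
    public static <T> T inArea(String area, Supplier<T> call) {
        CTX.get().push(area);
        try {
            return call.get();           // proceed() in the real aspect
        } finally {
            CTX.get().pop();             // restore previous context
        }
    }
}
```

For example, calling inArea("imm.runner", ...) makes imm.runner the current context for the duration of the call only.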
Intrinsics. Some important features of the standard Java libraries are presented as sta-
tic methods on JDK classes. Invoking static methods from a scoped package, espe-
cially ones that are not defined in the current package, is illegal. This is too restrictive,
so we relaxed the JAVACOP specification to allow calls to static methods in the fol-
lowing classes: System, Double, Float, Integer, Long, Math, and Number.
Moreover, we have chosen to permit the use of java.lang.String in scoped pack-
ages. Whether this is wise is debatable: for debugging purposes it is certainly useful
to be able to construct messages, but it opens up an opportunity for runtime memory
errors. It is conceivable that the JAVACOP rules will be tightened in the future to better
track the use of scope-allocated strings.
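As an illustration, the relaxation can be read as a simple allow-list over fully qualified class names; this is a sketch of the check, which in STARS lives in the JAVACOP specification rather than in Java code.

```java
import java.util.Set;

// Sketch of the intrinsics relaxation: static calls from scoped
// packages are legal only for this fixed set of JDK classes.
public class Intrinsics {

    private static final Set<String> ALLOWED = Set.of(
        "java.lang.System", "java.lang.Double", "java.lang.Float",
        "java.lang.Integer", "java.lang.Long", "java.lang.Math",
        "java.lang.Number");

    public static boolean staticCallAllowed(String className) {
        return ALLOWED.contains(className);
    }
}
```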
Annotations. We found that in rare cases it may be necessary to let users override
the scoped type system — typically where (library) code is clearly correct, but where
it fails the conservative Scoped Types checker. For this we provide two Java 5 anno-
tations that are recognised by the JAVACOP rules. @WidenScoped declares that
an expression which performs an otherwise illegal widening is deemed safe.
@MakeVisible takes a type and makes it visible within a class or method.
Native methods. Native methods are an issue for safety. This is nothing new: even
normal Java virtual machines depend on the correctness of native method implementa-
tions for type safety. We take the approach that native methods are disallowed unless
explicitly permitted in a JAVACOP specification.
Finalizers. While the STARS prototype allows finalizers, we advocate that they not
be used in scoped packages. This is because there is a well-known pathological case
where a NoHeapRealtimeThread can end up blocking on the garbage collector
due to the interplay of finalization and access to scopes by RealtimeThreads. This
constraint is not part of the basic set of JAVACOP rules; instead we add it as a user-
defined extension to the rule set. This is done by the following rule:
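The rule itself is expressed in the JAVACOP rule language; as a rough plain-Java illustration of the property it enforces, a class destined for a scoped package must not declare a finalize() method, which can be checked reflectively:

```java
import java.lang.reflect.Method;

// Illustration only: the property the user-defined rule enforces is
// that no class in a scoped package declares finalize().
public class FinalizerCheck {

    public static boolean declaresFinalizer(Class<?> c) {
        for (Method m : c.getDeclaredMethods()) {
            if (m.getName().equals("finalize") && m.getParameterCount() == 0) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical examples: one class with a finalizer, one without.
    public static class WithFinalizer {
        @Override protected void finalize() { }
    }

    public static class Plain { }
}
```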
We conducted a case study to demonstrate the relative benefits of STARS. The software
system used in this experiment models a real-time collision detector (CD). The
collision detector algorithm is about 25 KLoc and was originally written against the Real-
Time Specification for Java. As a proof of concept for our proposal, we refactored the
CD to abide by the scoped type discipline and to use aspects.
The architecture of the STARS version of the CD is given in Fig. 5. The application
has three threads: a plain Java thread running in the heap to generate simulated work-
loads, a 5Hz thread whose job is to communicate results of the algorithm to an output
device, and finally a 10Hz NoHeapRealtimeThread which periodically acquires
a data frame with positions of aircraft from simulated sensors.
Fig. 5. Collision Detector. The CD uses two scoped memory areas. Two threads run in the heap:
the first simulates a workload, the second communicates with an output device. The memory
hierarchy consists of imm (immortal memory) for the simulated workload, imm.runner for
persistent data, and imm.runner.detector for frame-specific data.
The system must detect collisions before they happen. The numbers of planes, airports, and nature of flight
restrictions are variables to the system.
The refactoring was done in three stages. First, we designed a scope structure for
the program based on the ScopedMemory areas used in the CD. Second, we moved
classes amongst packages so that the STARS-CD package structure matched the scope
structure. Third, we removed or replaced explicit RTSJ memory management idioms
with equivalent constructs of our model.
Fig. 6 compares the package structure of the two versions. In the original CD, the
packages atc and command were responsible for computing trajectories based on a
user-defined specification. They were not affected by the refactoring. The package
detector contained all of the RTSJ code as well as the program’s main(). Finally,
util contained a number of general-purpose utility classes. We split the code in the
detector package into four groups. The package heap contains code that runs in the
heap: the main() and the data-reporting thread. The package imm contains classes
that will be allocated in immortal memory and thus never reclaimed. Below immor-
tal memory there is one scope that contains the persistent state of the application; we
defined the package imm.runner for this. The main computation is done in the last
package, imm.runner.detector. This is the largest real-time package; it con-
tains classes that are allocated and reclaimed in each period.
The entire code of the real-time aspect for the CD is given in Fig. 7. This aspect
simply declares the memory area types for the imm.runner and
imm.runner.detector gates. It then gives an around advice specifying that the
thread used by the CD algorithm is a NoHeapRealtimeThread, with appropriate
scheduling and priority parameters.
The overall size of the Scoped CD has increased because we had to duplicate some of
the utility collection classes. This duplication is due to our static constraints: a number
of collection classes were used in the imm.runner package to represent persistent
state, and in the imm.runner.detector package to compute collisions. While we
could have avoided the duplication by fairly simple changes to the algorithm and the
use of problem-specific collections, our goal was to look at the ‘worst-case’ scenario,
so we tried to make as few changes to the original CD as possible. The methodology
Fig. 7. Real-time Aspect for the CD. The aspect specifies the characteristics of memory areas as
well as that of the real-time thread used by the application. The CD logic does not refer to any of
the RTSJ APIs.
Scoped Run Loop. At the core of the CD is an instance of the ScopedRunLoop pattern
identified in [19]. The Runner class creates a Detector and periodically executes
the detector’s run() method within a scope. Fig. 8 shows both the RTSJ version and
the STARS version. In the RTSJ version, the runner is a NoHeapRealtimeThread
which has in its run() method code to create a new scoped memory (lines 11-12)
and a run loop which repeatedly enters the scope passing a detector as argument (lines
17-18).
In the STARS version, Runner and Detector are gates to nested packages. Thus
the call to run() on line 16 will enter the memory area associated with the detec-
tor. Objects allocated while executing the method are allocated in this area. When the
method returns these objects will be reclaimed. Fig. 9 illustrates how a Runner is
started. In the RTSJ version a scoped memory area is explicitly created (lines 2-3) and
the real-time arguments are provided (lines 6-11). In the STARS version most of this
is implicit, because a runner is a gate and the runInThread() method is advised
to create a new thread. What should be noted here is that STARS clearly separates
the real-time support from the non-real-time code. In fact, we can define an alternative
aspect which allows the program to run in a standard JVM.
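The shape of the STARS run loop can be sketched in plain Java (the names follow the text, but this is our own illustration; the periods, priorities, and memory areas behind the gates are supplied by aspects, not by this code):

```java
// Plain-Java sketch of the ScopedRunLoop shape in the STARS version.
public class RunLoopSketch {

    /** Stands in for the gate of imm.runner.detector. */
    public interface Detector {
        void run();
    }

    /** Stands in for the gate of imm.runner. */
    public static class Runner {
        private final Detector detector;

        public Runner(Detector d) {
            this.detector = d;
        }

        /** Each iteration enters the detector's scope, runs one frame,
         *  and lets the scope's objects be reclaimed when run() returns.
         *  In STARS, waitForNextPeriod() would block between frames. */
        public void loop(int frames) {
            for (int i = 0; i < frames; i++) {
                detector.run();   // advised call: enters the detector's area
            }
        }
    }
}
```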
4   public Runner(
5     PriorityParameters r,
6     PeriodicParameters p,
7     MemoryArea m) {
8     super(r, p, m);
9   }
19 }
20 }
Fig. 8. Scoped Run Loop Example. The Runner class: RTSJ version (on the left) and Scoped
version (on the right).
7       new RelativeTime(PER, 0),
8       new RelativeTime(5, 0),
9       new RelativeTime(50, 0),
10      null, null),
11    memory);
12  rt.start();
13 }
Fig. 9. Starting up. The imm.Imm.run() method: RTSJ version (left-hand side) and Scoped
version (right-hand side).
1 class StateTable {
2   HashMap prev = new HashMap();
3   Putter putter = new Putter();
4
5   List createMotions(Frame f) {
6     List ret = new LinkedList();
7     for (...) {
8       Vector3d pos = new Vector3d();
9       Aircraft craft = iter.next(pos);
10      ...
11      Vector3d old = (Vector3d) prev.get(craft);
12      if (old == null) {
13        putter.c = craft;
14        putter.v = pos;
15        MemoryArea current =
16          MemoryArea.getMemoryArea(this);
17        current.executeInArea(putter);
18      }
19    }
20    return ret;
21  }
22 }
Fig. 10. RTSJ StateTable. This is an example of an RTSJ multiscoped object: an instance of a
class allocated in one scope but with some of its methods executing in a child scope. Inspection
of the code does not reveal in which scope createMotions() will be run. It is thus incumbent
on the programmer to make sure that the method will behave correctly in any context.
The update to the persistent table is performed in lines 15-17, by first obtaining the area
in which the StateTable was allocated (lines 15-16), and then executing the Putter in
that area (line 17). This code is a good example of the intricacy (insanity?) of RTSJ
programming.
The scoped solution given in Fig. 11 makes things more explicit. The StateTable
class is split in two: one class, imm.runner.StateTable, for the persistent state,
and a second class, imm.runner.detector.StateTable2, that has the update
method. This split makes the allocation context explicit. A StateTable2 has a ref-
erence to the persistent state table. The createMotions() method is split in two
parts: one that runs in the transient area (lines 23-30) and another that performs the
update to the persistent data (lines 8-14).
Since our type system does not permit references to subpackages, the arguments to
StateTable.put() are primitive. The most displeasing aspect of the refactoring is
that we had to duplicate the Vector3d class (there are now two identical versions, one
in each of imm.runner and imm.runner.detector). We are considering exten-
sions to the type system to remedy this situation.
1 package imm.runner;
2 public class Vector3d { ... }
3
16 package imm.runner.detector;
17 class Vector3d { ... }
18
19 class StateTable2 {
20   StateTable table;
21
22   List createMotions(Frame f) {
23     List ret = new LinkedList();
24     for (...) {
25       Vector3d pos = new Vector3d();
26       ...
27       table.put(craft, pos.x, pos.y, pos.z);
28     }
29     return ret;
30   }
31 }
Fig. 11. STARS StateTable. With scoped types the table is split in two. This makes the allocation
context for data and methods explicit.
[Fig. 12 plots per-frame processing times on a 0-40 ms scale: (a) RTSJ with scope checks: median 17, avg 15.4, max 21; (b) STARS, no scope checks: median 12, avg 11.2, max 15; (c) Real-time GC with scope checks: median 16, avg 15.4, max 42.]
Fig. 12. Performance Evaluation. Comparing the performance of the collision detection imple-
mented with (a) RTSJ, (b) STARS and (c) Java with a real-time garbage collector. We measure
the time needed to process one frame of input by a 10Hz high-priority thread. The x-axis shows
input frames and the y-axis processing time in milliseconds. The RTGC spikes at 43ms when
the GC kicks in. No deadlines are missed. The average per frame processing time of STARS is
28% less than that of RTSJ and RTGC. Variations in processing time are due to the nature of the
algorithm.
STARS retains most of the software engineering benefits of Java. We were forced to duplicate the code of
some common libraries in order to abide by the rules of scoped types. While there are
clear software engineering drawbacks to code duplication, the actual refactoring effort
in importing those classes was small. With adequate tool support the entire refactoring
effort took less than a day. The hard part involved discovering and disentangling the
scope structure of the programs that we were trying to refactor.
The benefits in terms of correctness cannot be overemphasized. Every single prac-
titioner we have met has remarked on the difficulty of programming with RTSJ-style
scoped memory. In our own work we have encountered numerous faults due to incor-
rect scope usage. As a reaction against this complexity, many RTSJ users are asking for
real-time garbage collection. But RTGC is not suited for all applications. In the context
of safety critical systems a number of institutions are investigating restricted real-time
’profiles’ in which the flexibility of scoped memory is drastically curtailed [12]. But
even in those proposals, there are no static correctness guarantees. Considering the cost
of failure, the effort of adopting a static discipline such as the one proposed here is well
justified.
We see several areas for future work. One direction is to increase the expressiveness
of the STARS API to support different kinds of real-time systems and experiment with
more applications to further validate the approach. Another issue to be addressed is to
extend JAVACOP to support AspectJ syntax. In the current system, we are not checking
aspects for memory errors. This is acceptable as long as aspects remain simple and
declarative, but real-time aspects may become more complex as we extend STARS,
and their static verification will become a more pressing concern. Finally we want to
investigate extensions to the type system to reduce, or eliminate, the need for code
duplication.
Acknowledgments. This work is supported in part by the National Science Foundation
under Grant No. 0509156, and in part by the Royal Society of New Zealand Marsden
Fund. Filip Pizlo and Jason Fox implemented the Collision Detector application, Ben
Titzer wrote supporting libraries. We are grateful to David Holmes and Filip Pizlo for
their comments, and to the entire Ovm team at Purdue.
References
1. Chris Andreae, James Noble, Shane Markstrum, and Todd Millstein. A framework for im-
plementing pluggable type systems. Submitted, March 2006.
2. Jason Baker, Antonio Cunei, Chapman Flack, Filip Pizlo, Marek Prochazka, Jan Vitek,
Austin Armbuster, Edward Pla, and David Holmes. A real-time Java virtual machine for
avionics. In Proceedings of the 12th IEEE Real-Time and Embedded Technology and Appli-
cations Symposium (RTAS 2006). IEEE Computer Society, 2006.
3. William S. Beebee, Jr. and Martin Rinard. An implementation of scoped memory for real-
time Java. In Proceedings of the First International Workshop on Embedded Software (EM-
SOFT), 2001.
4. Greg Bollella, James Gosling, Benjamin Brosgol, Peter Dibble, Steve Furr, and Mark Turn-
bull. The Real-Time Specification for Java. Addison-Wesley, June 2000.
5. Chandrasekhar Boyapati, Alexandru Salcianu, William Beebee, and Martin Rinard. Owner-
ship types for safe region-based memory management in real-time Java. In ACM Conference
on Programming Language Design and Implementation, June 2003.
Scoped Types and Aspects for Real-Time Java 147
6. Gilad Bracha. Pluggable type systems. In OOPSLA 2004 Workshop on Revival of Dynamic
Languages, 2004.
7. Gilad Bracha and David Griswold. Strongtalk: Typechecking Smalltalk in a production en-
vironment. In Proc. of the ACM Conf. on Object-Oriented Programming, Systems, Lan-
guages and Applications (OOPSLA), September 1993.
8. Robert Cartwright and Mike Fagan. Soft typing. In Proceedings of the ACM SIGPLAN 1991
conference on Programming language design and implementation, pages 278–292. ACM
Press, 1991.
9. Dave Clarke, Michael Richmond, and James Noble. Saving the world from bad Beans:
Deployment-time confinement checking. In Proceedings of the ACM Conference on Object-
Oriented Programming, Systems, Languages, and Appplications (OOPSLA), Anaheim, CA,
November 2003.
10. David G. Clarke, John M. Potter, and James Noble. Ownership types for flexible alias pro-
tection. In OOPSLA ’98 Conference Proceedings, volume 33(10) of ACM SIGPLAN Notices,
pages 48–64. ACM, October 1998.
11. Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Ch-
eney. Region-based memory management in Cyclone. In Proceedings of Conference on
Programming Languages Design and Implementation, pages 282–293, June 2002.
12. HIJA. European High Integrity Java Project. www.hija.info., 2006.
13. John Hogg. Islands: Aliasing Protection in Object-Oriented Languages. In Proceedings
of the OOPSLA ’91 Conference on Object-Oriented Programming Systems, Languages and
Applications, 1991.
14. Gregor Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and William G.
Griswold. An overview of AspectJ. Lecture Notes in Computer Science, 2072:327–355,
2001.
15. Jagun Kwon and Andy Wellings. Memory management based on method invocation in RTSJ.
In OTM Workshops 2004, LNCS 3292, pp. 33–345, 2004.
16. Jagun Kwon, Andy Wellings, and Steve King. Ravenscar-Java: A high integrity profile for
real-time Java. In Joint ACM Java Grande/ISCOPE Conference, November 2002.
17. NASA/JPL and Sun. Golden gate.
research.sun.com/projects/goldengate, 2003.
18. James Noble, John Potter, and Jan Vitek. Flexible alias protection. In Proceedings of the
12th European Conference on Object-Oriented Programming (ECOOP), Brussels, Belgium,
July 1998.
19. Filip Pizlo, Jason Fox, David Holmes, and Jan Vitek. Real-time Java scoped memory: Design
patterns and semantics. In Proceedings of the IEEE International Symposium on Object-
Oriented Real-Time Distributed Computing, May 2004.
20. David Sharp. Real-time distributed object computing: Ready for mission-critical embedded
system applications. In Proceedings of the Third International Symposium on Distributed
Objects and Applications, 2001.
21. Tian Zhao, James Noble, and Jan Vitek. Scoped Types for Realtime Java. In International
Real-Time Systems Symposium (RTSS 2004), Lisbon, Portugal, December 2004. IEEE.
22. Tian Zhao, Jens Palsberg, and Jan Vitek. Type-based confinement. Journal of Functional
Programming, 16(1), January 2006.
Transparently Reconciling Transactions with Locking
for Java Synchronization
Abstract. Concurrent data accesses in high-level languages like Java and C# are
typically mediated using mutual-exclusion locks. Threads use locks to guard the
operations performed while the lock is held, so that the lock’s guarded operations
can never be interleaved with operations of other threads that are guarded by the
same lock. This way both atomicity and isolation properties of a thread’s guarded
operations are enforced. Recent proposals recognize that these properties can also
be enforced by concurrency control protocols that avoid well-known problems
associated with locking, by transplanting notions of transactions found in data-
base systems to a programming language context. While higher-level than locks,
software transactions incur significant implementation overhead. This overhead
cannot be easily masked when there is little contention on the operations being
guarded.
We show how mutual-exclusion locks and transactions can be reconciled trans-
parently within Java’s monitor abstraction. We have implemented monitors for
Java that execute using locks when contention is low and switch over to trans-
actions when concurrent attempts to enter the monitor are detected. We formally
argue the correctness of our solution with respect to Java’s execution semantics
and provide a detailed performance evaluation for different workloads and vary-
ing levels of contention. We demonstrate that our implementation has low over-
heads in the uncontended case (7% on average) and that significant performance
improvements (up to 3×) can be achieved from running contended monitors trans-
actionally.
1 Introduction
There has been much recent interest in new concurrency abstractions for high-level
languages like Java and C#. These efforts are motivated by the fact that concurrent
programming in such languages currently requires programmers to make careful use of
mutual-exclusion locks to mediate access to shared data. Threads use locks to guard
the operations performed while the lock is held, so that the lock’s guarded operations
can never be interleaved with operations of other threads that are guarded by the same
lock. Rather, threads attempting to execute a given guarded sequence of operations will
execute the entire sequence serially, without interruption, one thread at a time. In this
way, locks, when used properly, can enforce both atomicity of their guarded operations
(they execute as a single unit, without interruption by operations of other threads that
are guarded by the same lock), and isolation from the side-effects of all operations by
other threads guarded by the same lock.
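In Java, such guarded operations are expressed with `synchronized` methods or blocks. A minimal illustration (our own example, with hypothetical class names): the monitor on the receiver guarantees that the read-increment-write sequence of one thread is never interleaved with another's.

```java
// Minimal example: the lock on `this` enforces atomicity and isolation
// of the read-increment-write sequence inside increment().
public class GuardedCounter {
    private int count = 0;

    public synchronized void increment() {
        count++;  // guarded: never interleaved with another increment()
    }

    public synchronized int get() {
        return count;
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedCounter c = new GuardedCounter();
        Runnable task = () -> { for (int i = 0; i < 1000; i++) c.increment(); };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(c.get());  // always 2000: no lost updates
    }
}
```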
Unfortunately, synchronizing threads using locks is notoriously difficult and error-
prone. Undersynchronizing leads to safety violations such as race conditions. Even
when there are no race conditions, it is still easy to mistakenly violate atomicity guar-
antees [14]. Oversynchronizing impedes concurrency, which can degrade performance
even to the point of deadlock. To improve concurrency, some languages provide lower-
level synchronization primitives such as shared (i.e., read-only) locks in addition to the
traditional mutual-exclusion (i.e., read-write) locks. Correctly using these lower-level
locking primitives requires even greater care by programmers to understand thread interactions
on shared data.
Recent proposals recognize that properties such as atomicity and isolation can be
enforced by concurrency control protocols that avoid the problems of locking, by trans-
planting notions of transactions found in database systems to the programming lan-
guage context [17, 20, 36]. Concurrency control protocols ensure atomicity and isolation
of operations performed within a transaction while permitting concurrency by allowing
the operations of different transactions to be interleaved only if the resulting schedule
is serializable: the transactions (and their constituent operations) appear to execute in
some serial order. Any transaction that might violate serializability is aborted in mid-
execution, its effects are revoked, and it is retried. Atomicity is a powerful abstraction,
permitting programmers more easily to reason about the effects of concurrent programs
independently of arbitrary interleavings, while avoiding problems such as deadlock and
priority inversion. Moreover, transactions relieve programmers of the need for careful
(and error-prone) placement of locks such that concurrency is not unnecessarily im-
peded while correctness is maintained. Thus, transactions promote programmability by
reducing the burden on programmers to resolve the tension between fine-grained lock-
ing for performance and coarse-grained locking for correctness.
Meanwhile, there is comprehensive empirical evidence that programmers almost al-
ways use mutual-exclusion locks to enforce properties of atomicity and isolation [14].
Thus, making transaction-like concurrency abstractions available to programmers is
generating intense interest. Nevertheless, lock-based programs are unlikely to disap-
pear any time soon. Certainly, there is much legacy code (including widespread use of
standard libraries) that utilizes mutual-exclusion locks. Moreover, locks are extremely
efficient when contention for them is low – in many cases, acquiring/releasing an un-
contended lock is as cheap as setting/clearing a bit using atomic memory operations
such as compare-and-swap. In contrast, transactional concurrency control protocols re-
quire much more complicated tracking of operations performed within the transaction
as well as validation of those operations before the transaction can finish. Given that
transaction-based schemes impose such overheads, many programmers will continue to
program using exclusion locks, especially when the likelihood of contention is low. The
advantages of transactional execution (i.e., improved concurrency, deadlock-freedom)
accrue only when contention would otherwise impede concurrency and serializability
violations are low.
These tradeoffs argue for consideration of a hybrid approach, where existing con-
currency abstractions (such as Java’s monitors) used for atomicity and isolation can be
150 A. Welc, A.L. Hosking, and S. Jagannathan
mediated both by locks and transactions. In fact, whether threads entering a monitor
acquire a lock or execute transactionally, so long as the language-defined properties of
the monitor are enforced, all is well from the programmer’s perspective. Dynamically
choosing which style of execution to use based on observed contention for the monitor
permits the best of both worlds: low-cost locking when contention is low, and improved
concurrency using transactions when multiple threads attempt to execute concurrently
within the monitor.
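The dispatch between the two modes can be sketched as follows. This is a schematic of our own (not the paper's implementation): uncontended entry takes a cheap compare-and-swap lock, and observed contention flips the monitor into transactional mode; a real implementation must also coordinate threads already inside the monitor.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Schematic hybrid monitor entry: CAS fast path when uncontended,
// transactional mode once contention is observed. Hypothetical sketch only.
public class HybridMonitor {
    static final int FREE = 0, LOCKED = 1, TRANSACTIONAL = 2;
    final AtomicInteger state = new AtomicInteger(FREE);

    // Returns the mode the calling thread should execute in.
    int enter() {
        if (state.compareAndSet(FREE, LOCKED)) {
            return LOCKED;            // fast path: cheap exclusive lock
        }
        // Contention detected: request transactional execution.
        // (Simplification: real code must wait for lock-mode threads to exit.)
        state.set(TRANSACTIONAL);
        return TRANSACTIONAL;
    }

    void exitLocked() { state.compareAndSet(LOCKED, FREE); }
}
```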
Complicating this situation is the issue of nesting, which poses both semantic and
implementation difficulties. When a nested transaction completes, isolation semantics
for transactions mandate that its effects are not usually globally visible until the outer-
most transaction in which it runs successfully commits. Such nesting is referred to as
closed, and represents the purest expression of nested transactions as preserving atom-
icity and isolation of their effects. In contrast, Java monitors expressed as synchronized
methods/blocks reveal all prior effects upon exit, even if the synchronized execution
is nested inside another monitor. Obtaining a meaningful reconciliation of locks with
transactions requires addressing this issue.
Our Contribution
In this paper, we describe how locks and transactions can be transparently reconciled
within Java’s monitor abstraction. We have implemented monitors for Java that execute
using locks when contention is low and switch over to transactions when concurrent
attempts to enter the monitor are detected. Our implementation is for the Jikes Research
Virtual Machine (RVM). To our knowledge, ours is the first attempt to consider hybrid
execution of Java monitors using both mutual-exclusion and transactions within the
same program.
Our treatment is transparent to applications: programs continue to use the standard
Java synchronization primitives to express the usual constraints on concurrent execu-
tions. A synchronized method/block may execute transactionally even if it was previ-
ously executed using lock-based mutual exclusion, and vice versa. Transactional execu-
tion dynamically toggles back to mutual-exclusion whenever aborting a given transac-
tion becomes infeasible, such as at native method calls. In both cases, hybrid execution
does not violate Java semantics, and serves only to improve performance.
We make the following contributions:
1. The design and implementation of a Java run-time system that supports imple-
mentation of Java monitors (i.e., synchronized methods/blocks) using both mutual-
exclusion and software transactions based on optimistic concurrency. A given mon-
itor will execute using either concurrency control mechanism depending on its con-
tention profile.
2. An efficient implementation of monitors as closed nested transactions. We intro-
duce a new mechanism called delegation that significantly reduces the overhead
of nested transactions when contention is low. Support for delegation is provided
through extensions to the virtual machine and run-time system.
3. A formal semantics that defines safety criteria under which mutual exclusion and
transactions can co-exist. We show that for programs that conform to prevalent
atomicity idioms, Java monitors can be realized using either transactions or mutual-exclusion
with no change in observable behavior. In this way, we resolve the appar-
ent mismatch in the visibility of the effects of Java monitors versus closed nested
transactions.
4. A detailed implementation study that quantifies the overheads of our approach. We
show that over a range of single-threaded benchmarks the overheads necessary to
support hybrid execution (i.e., read barriers, meta-data information on object head-
ers, etc.) is small, averaging less than 10%. We also present performance results on
a comprehensive synthetic benchmark that show how transactions that co-exist with
mutual-exclusion locks lead to clear run-time improvements over mutual-exclusion
only and transaction-only non-hybrid implementations.
2 A Core Language
To examine notions of safety with respect to transactions and mutual exclusion, we
define a two-tiered semantics for a simple dynamically typed call-by-value language
similar to Classic Java [16] extended with threads and synchronization. The first tier
describes how programs written in this calculus are evaluated to yield a schedule that
defines a sequence of possible thread interleavings, and a memory model that reflects
how and when updates to shared data performed by one thread are reflected in another.
The second tier defines constraints used to determine whether a schedule is safe based
on a specific interpretation of what it means to protect access to shared data; this tier
thus captures the behavior of specific concurrency control mechanisms.
Before describing the semantics, we first introduce the language informally (see Figure 1).
In the following, we take metavariables L to range over class declarations, C to
range over class names, t to denote thread identifiers, M to range over methods, m to
range over method names, f and x to range over fields and parameters, respectively,
ℓ to range over locations, and v to range over values. We use P for process terms, and e
for expressions.
SYNTAX:

P ::= (P | P) | t[e]
L ::= class C {f̄ M̄}
M ::= m(x̄) { e }
e ::= x | ℓ | this | e.f | e.f := e | new C()
    | e.m(ē) | let x = e in e end | guard {e} e
    | spawn (e)
3 Semantics
The semantics of the language are given in Figure 2. A value is either the distinguished
symbol null, a location ℓ, or an object C(ℓ̄) that denotes an instance of class C, in which
field fi has value ℓi.
In the following, we use an over-bar to represent a finite ordered sequence; for instance,
f̄ represents f1 f2 . . . fn. The term ᾱα denotes the extension of the sequence ᾱ with a
single element α, and ᾱᾱ′ denotes sequence concatenation; S.OPt denotes the extension
of schedule S with operation OPt. Given schedules S and S′, we write S ⊑ S′ if
S is a subsequence of S′.
Program evaluation and schedule construction is specified by a reduction relation,
P, Δ, Γ, S =⇒ P′, Δ′, Γ′, S′, that maps program states to new program states. A state
consists of a collection of evaluating processes (P), a thread store (Δ) that maps threads
to a local cache, a global store (Γ) that maps locations to values, and a schedule (S) that
defines a collection of interleaved actions. This relation is defined up to congruence of
processes (P |P′ = P′|P, etc.). An auxiliary relation, →t, is used to describe reduction
steps performed by a specific thread t. The actions recorded by a schedule are
those that read and write locations, and those that acquire and release locks, the latter
generated as part of guard evaluation. Informally, threads evaluate expressions using
their local cache, loading and flushing their cache at synchronization points defined
by guard expressions. These semantics roughly correspond to a release consistency
memory model similar to the Java memory model [22].
The term t[E[e]] evaluated in a reduction rule is identified by the thread t in which
it occurs; thus EPt [e] denotes a collection of processes containing a process with thread
(Fig. 2 gives the typeset reduction rules: field reads and writes update the local store
and extend the schedule with rdt ℓ and wrt ℓi operations; spawn (e) creates a fresh
thread t′ with Δ′ = Δ[t′ → Δ(t)] and P′ = P | t′[e]; and new C() binds fresh locations
ℓ, ℓ1, . . . , ℓn with σ′ = σ[ℓ → C(ℓ̄), ℓ̄ → null].)

Fig. 2. Semantics
identifier t executing expression e with context E. The expression “picked” for evalua-
tion is determined by the structure of evaluation contexts.
Most of the rules are standard: holes in contexts can be replaced by the value of the
expression substituted for the hole, let expressions substitute the value of the bound
variable in their body. Method invocation binds the variable this to the current re-
ceiver object, in addition to binding actuals to parameters, and evaluates the method
body in this augmented environment. Read and write operations augment the schedule
in the obvious way. Constructor application returns a reference to a new object whose
fields are initialized to null.
To evaluate expression e within a separate thread, we first associate the new thread
with a fresh thread identifier, set the thread’s local store to be the current local store of
its parent, and begin evaluation of e using an empty context.
Let ℓ be the monitor for a guard expression. Before evaluating the body, the local
store for the thread evaluating the guard is updated to load the current contents of the
global store at location ℓ. In other words, global memory is indexed by the set of locations
that act as monitors: whenever a thread attempts to synchronize against one of
these monitors (say, ℓ), the thread augments its local cache with the store associated
with ℓ in the global store. The body of the guard expression is then evaluated with respect
to this updated cache. When the expression completes, the converse operation is
performed: the contents of the local cache are flushed to the global store indexed by ℓ.
Thus, threads that synchronize on different references will not have their updates made
visible to one another. Observe that the semantics do not support a single global store; to
propagate effects performed in one thread to all other threads would require encoding
a protocol that uses a global monitor for synchronization. To simplify the presentation,
we prohibit nested guard expressions from synchronizing on the same reference
(ℓ ∉ lockset(S, t)).
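This cache discipline can be modeled directly. The following toy model is our own rendering of the semantics (names hypothetical): entering a guard on monitor ℓ loads the store indexed by ℓ into the thread's local cache, and exiting flushes the cache back, so threads guarding on different monitors do not see each other's updates.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the guard semantics: global memory is indexed by monitor
// location; guards load on entry and flush on exit.
public class GuardModel {
    // Global store: one map of locations to values per monitor location.
    final Map<String, Map<String, Integer>> global = new HashMap<>();

    // Entering a guard augments the thread's local cache with the store
    // associated with the monitor in global memory.
    Map<String, Integer> enterGuard(String monitor, Map<String, Integer> cache) {
        Map<String, Integer> loaded = new HashMap<>(cache);
        loaded.putAll(global.getOrDefault(monitor, Map.of()));
        return loaded;
    }

    // Exiting flushes the local cache to the store indexed by the monitor.
    void exitGuard(String monitor, Map<String, Integer> cache) {
        global.put(monitor, new HashMap<>(cache));
    }
}
```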
3.1 Schedules
When the body of the guard is entered, the schedule is augmented to record the fact that
there was monitored access to e via monitor ℓ by thread t (acqΓt (ℓ)). When evaluation of
the guard completes, the schedule is updated to reflect that reference ℓ is no longer used
as a monitor by t (relΓt (ℓ)). The global store recorded in the schedule at synchronization
acquire and release points will be used to define safety conditions for mutual-exclusion
and transactional execution, as we describe below.
These semantics make no attempt to enforce a particular concurrency model on
thread execution. Instead, we specify safety properties that dictate the legality of an
interleaving by defining predicates on schedules. To do so, it is convenient to reason in
terms of regions, subschedules produced as a result of guard evaluation:
region(S) = {Si ⊑ S | Si = acqΓt (ℓ).S′i.relΓ′t (ℓ)}

For any region R = acqΓt (ℓ).Si.relΓ′t (ℓ), T (R) = t and L(R) = ℓ.
The predicate msafe(S) defines the structure of schedules that correspond to an
interpretation of guards in terms of mutual exclusion:
Definition 1. Msafe
For a schedule to be safe with respect to concurrency control based on mutual exclu-
sion, multiple threads cannot concurrently execute within the body of a guarded region
protected by the same monitor. Thus if thread t is executing within a guard protecting
expression e using monitor ℓ, no other thread can attempt to acquire ℓ until t relinquishes
it.
We can also interpret thread execution within guarded expressions in terms of trans-
actions. Under this interpretation, multiple threads can execute transactionally within
the same guarded expression concurrently. To ensure the legality of such concurrent
executions, we impose constraints that capture notions of transactional isolation and
atomicity on schedules:
Definition 2. Isolated

∀R = acqΓt (ℓ).S′.relΓ′t (ℓ) ∈ region(S),
∀ rdt ℓ′ ∈ S′ :
Γ(ℓ) = σ ∧ Γ′(ℓ) = σ′ → σ(ℓ′) = σ′(ℓ′)
──────────────────────────────
isolated(S)
Isolation ensures that locations read by a guarded region are not changed during the re-
gion’s evaluation. The property is enforced by requiring that the global store associated
with a region’s monitors is not modified during the execution of the region. Note that
the global store Γ′ at the point control exits a guarded expression does reflect global
updates performed by other threads, but does not reflect local updates performed by
the current thread that have yet to be propagated. Thus, the isolation rule captures vis-
ibility constraints on schedules corresponding to execution within a guard expression;
namely, any location read within the schedule cannot be modified by other concurrently
executing threads.
Definition 3. Atomic

∀N, R ∈ region(S), ℓ = L(N), R = Sb.N.Sa :
T (N) = T (R) = t ∧ t′ ≠ t → acqΓt′ (ℓ) ∉ Sa
──────────────────────────────
atomic(S)
Atomicity ensures that the effects of a guarded region are not propagated to other
threads until the region completes. Observe that our semantics propagate updates to the
global store when a guarded region exits; these updates become visible to any thread
that subsequently executes a guard expression using the same monitor. Thus, a nested
region may have its effects made visible to any thread whose execution is mediated by
the same monitor. This would violate our intuitive notion of atomicity for the enclosing
guarded region since its partial effects (i.e., the effects performed by the inner region)
would be visible before it completes. Our atomicity rule thus captures the essence of
a closed nested transaction model: the effects of a nested transaction are visible to the
parent, via the local store, but are propagated to other threads only when the outermost
transaction completes.
Our safety rules are distinguished from other attempts at defining atomicity prop-
erties [14, 15] for concurrent programs because they do not rely on mutual-exclusion
semantics for lock acquisition and release. For example, consider a schedule in which
two threads interleave execution of two guarded regions protected by the same monitor.
Such an execution is meaningless for semantics in which synchronization is defined
in terms of mutual-exclusion, but quite sensible if guarded regions are executed trans-
actionally. Isolation is satisfied if the actions performed by the two threads are non-
overlapping. Atomicity is satisfied even if these guarded regions execute in a nested
context because the actions performed within a region by one thread are not witnessed
by the other due to the language’s release consistency memory model.
3.2 Safety
and suppose tsafe(S) holds but msafe(S) does not. Then, there exists a schedule Sf
such that

t[e], Δ0, Γ0, φ =⇒∗ t[v], Δ′, Γ′, Sf

where tsafe(Sf ) and msafe(Sf ) hold, and which yields the same final global store.
Proof Sketch. Let S be tsafe, R ⊑ S, and suppose msafe(R) does not hold, and
thus msafe(S) does not hold. Suppose R = acqt(ℓ).S′.relt(ℓ). Since msafe(R) does
not hold, there must be some R′ ⊑ S such that R′ = acqt′(ℓ).S′′.relt′(ℓ) where
acqt′(ℓ) ∈ S′. Since isolated(S) holds, isolated(R) must also hold, and thus none of
the actions performed by t within S′ are visible to actions performed by t′ in S′′. Similarly,
since atomicity holds, actions performed by t′ in S′′ are not visible to operations
executed by t in S′. Suppose that relt′(ℓ) follows R in S. Then, effects of S′ may
become visible to operations in S′′ (e.g., through nested synchronization actions). But
then isolated(R′) would not hold. However, because tsafe(S) holds, we can construct a
4 Design Overview
Our design is motivated by three overarching goals:
1. The specific protocol used to implement guarded regions must be completely trans-
parent to the application. Thus, Java synchronized blocks and methods serve
as guarded regions, and may be executed either transactionally or exclusively de-
pending upon heuristics applied at run-time.
2. The modifications necessary to support such transparency should not lead to perfor-
mance degradation in the common case – single-threaded uncontended execution
within a guarded region – and should lead to notable performance gain in the case
when there is contention for entry to the region.
3. Transparency should not come at the expense of correctness. Thus, transactional
execution should not lead to behavior inconsistent with Java concurrency seman-
tics.
Issue 3 is satisfied for any Java program that is transaction-safe as defined in the previ-
ous section. Fortunately, recent studies have shown that the overwhelming majority of
concurrent Java programs exhibit monitor usage that satisfies the correctness goal by using
monitors solely as a mechanism to enforce atomicity and isolation for sequences of
operations manipulating shared data [14]. We thus focus our attention in the remainder
of this section on the first two goals.
Note that lock-based synchronization techniques for languages such as Java are al-
ready heavily optimized for the case where monitors are uncontended [2, 5]. Indeed,
the Jikes RVM platform that serves in our experiments already supports very efficient
lock acquisition and release in this common case: atomically (using “test-and-set” or
equivalent instructions) set a bit in the header of the monitor object on entry and clear
it on exit. Only if another thread tries to acquire the monitor does lock inflation occur to
obtain a “fat” lock and initiate full-blown synchronization with wait queues, etc. Thus,
the second of our goals has already been met by the current state of the art.
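The thin-lock fast path described above can be sketched as follows. This is a schematic only (real Jikes RVM headers pack more state, and inflation to a "fat" lock is omitted): acquisition sets one bit in a header word with an atomic compare-and-set, and release clears it.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Schematic thin lock: a single bit in a header word, set and cleared
// by CAS. On CAS failure a real VM would inflate to a fat lock.
public class ThinLock {
    private static final int LOCK_BIT = 1;
    private final AtomicInteger header = new AtomicInteger(0);

    boolean tryAcquire() {
        int h = header.get();
        return (h & LOCK_BIT) == 0
            && header.compareAndSet(h, h | LOCK_BIT);  // set bit on entry
    }

    void release() {
        int h = header.get();
        header.compareAndSet(h, h & ~LOCK_BIT);        // clear bit on exit
    }
}
```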
Supporting transactional execution of guarded regions in place of such highly-
optimized locking techniques is thus a significant engineering challenge, if they are to
have any advantage at all. As discussed below, our implementation uses optimistic con-
currency control techniques to minimize the overhead of accesses to shared data [21].
We also make the obvious but important assumption that a guarded region cannot
be executed concurrently by different threads using different protocols (i.e., locking or
transactional). Any thread wishing to use a different protocol (e.g., locking) than the
one currently installed (e.g., transactional) for a given monitor must wait until all other
threads have exited the monitor.
with nesting; recall that the definition of atomicity and isolation captures the essence of
the closed nested transaction model [24], and that the prevalent usage of monitors is to
enforce atomicity and isolation [14].
Unfortunately, nesting poses a performance challenge since each monitor defines a
locus of contention, for which we must maintain enough information to validate the se-
rializability invariants that guarantee atomicity and isolation. Nesting exacerbates this
overhead since nested monitors must record separate access sets used to validate serial-
izability.
However, there is no a priori reason why accesses must be mediated by the imme-
diately enclosing monitor that guards them. For example, a single global monitor could
conceivably be used to mediate all accesses within all monitors. Under transactional ex-
ecution, a single global monitor effectively serves to implement the atomic construct
of Harris and Fraser [17]. Under lock-based execution, a single global monitor defines
a global exclusive lock. The primary reason why applications choose not to mediate
access to shared regions using a single monitor is because of increased contention and
corresponding reduced concurrency. In the case of mutual exclusion, a global lock re-
duces opportunities for concurrent execution; in the case of transactional execution, a
global monitor would have to mediate accesses from logically disjoint transactions, and
is likely to be inefficient and non-scalable.
Nonetheless, we can leverage this observation to optimize an important specific case
for transactional execution of monitors. Consider a thread T acquiring monitor outer,
and prior to releasing outer, also acquiring monitor inner. If no other thread at-
tempts concurrent acquisition of inner (i.e., the monitor is uncontended) then the ac-
quisition of inner can be delegated to outer. In other words, instead of synchroniz-
ing on monitor inner we can establish outer as inner’s delegate and synchronize
on outer instead. Since monitor inner is uncontended, there is nothing for inner
to mediate, and no loss of efficiency accrues because of nesting (provided that the act
of setting a delegate is inexpensive). Of course, when monitor inner is contended,
we must ensure that atomicity and isolation are appropriately enforced. Note that if
inner was an exclusive monitor, there would be no benefit in using delegation since
acquisition of an uncontended mutual-exclusion monitor is already expected to have
low overhead.
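The delegate mechanics can be sketched in a few lines. This is our own schematic of the protocol (class and method names hypothetical): an uncontended nested monitor records the enclosing monitor as its delegate instead of starting a nested transaction, and all mediation resolves through the delegate chain.

```java
// Schematic delegation: an uncontended nested monitor delegates
// mediation of its accesses to the enclosing monitor.
public class TxMonitor {
    TxMonitor delegate;  // null when unset

    // A top-level transactional acquire sets the monitor as its own delegate.
    static TxMonitor acquireTop(TxMonitor outer) {
        if (outer.delegate == null) outer.delegate = outer;
        return outer;
    }

    // Acquire `inner` while already mediated by `outer`: if inner is
    // uncontended, set its delegate rather than starting a nested transaction.
    static TxMonitor acquireNested(TxMonitor outer, TxMonitor inner) {
        if (inner.delegate == null) {
            inner.delegate = outer;   // uncontended: outer mediates everything
            return outer;
        }
        return inner.resolve();       // already delegated: follow the chain
    }

    // Resolve the monitor that actually mediates accesses.
    TxMonitor resolve() {
        TxMonitor m = this;
        while (m.delegate != null && m.delegate != m) m = m.delegate;
        return m;
    }
}
```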
Protocol Description. Figure 3 illustrates how the delegation protocol works for a
specific schedule; for simplicity, we show only Java-level monitor acquire/release oper-
ations. The schedule consists of steps 1 through 6 enumerated in the first column of the
schedule table. The right-hand side of Figure 3 describes the state of the transactional
monitors, used throughout the schedule, with respect to delegation. A monitor whose
delegate has been set is shaded grey; an arrow represents the reference to its delegate. To
begin, we assume that the delegates of both monitor outer and monitor inner have
not been set. Thread T starts by (transactionally) “acquiring” monitor outer, creating
a new transaction whose accesses are mediated by outer and setting outer’s del-
egate to itself (step 1 in Figure 3(a)). Then T proceeds to (transactionally) “acquire”
monitor inner. Because there is no delegate for inner, and T is already executing
within a transaction mediated by outer, T sets inner’s delegate to refer to outer
(step 2 in Figure 3(b)).
(c) steps 3–5: delegates remain set despite releases by T; (d) step 6: all delegates are cleared
This protocol implements a closed nested transaction model: the effects of T ’s execu-
tion within monitor inner are not made globally visible until the outer transaction
commits, since only outer is responsible for mediating T ’s accesses and validating
serializability.
The delegates stay set throughout steps 3, 4 and 5 (Figure 3(c)), even after thread T,
the setter of both delegates, commits its top-level transaction and “releases” outer. In
the meantime, thread T′ attempts to “acquire” inner. The delegate of inner is at this
point already set to outer, so thread T′ starts its own transaction whose accesses are
mediated by outer. The delegates are cleared only after T′'s transaction, mediated by
outer, commits or aborts. At this point there is no further use for the delegates.
Note that some precision is lost in this example: the transactional meta-data main-
tained by outer is presumably greater than what would be necessary to simply im-
plement consistency checks for actions guarded by inner. However, despite nesting,
only one monitor (outer) has been used to mediate concurrent data accesses and only
one set of transactional meta-data was created for outer. However, observe that if the
actions of steps 2 and 3 are reversed so that T acquires inner before T then inner’s
delegate would not be set, and T would begin its new transaction mediated in this case
by inner, so transactional meta-data for both outer and inner would be needed.
5 Implementation
Our transactional delegation protocol reduces overheads for uncontended nested mon-
itors executed transactionally by deploying nested transaction support only when ab-
solutely necessary. Thus, transactions are employed only for top-level monitors or for
contended nested monitors, as described earlier.
Transactions are implemented using an optimistic protocol [21], divided into three
phases: read, validate and write-back. Each transaction updates private copies of the
objects it manipulates: a copy is created when the transaction (thread) first writes to
an object. The validation phase verifies transaction-safety, aborting the transaction and
discarding the copies if safety is violated. Otherwise, the write-back phase lazily prop-
agates updated copies to the shared heap, installing each of them atomically.
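The three phases can be illustrated on a toy store of named fields. This sketch is our own (not the paper's data structures): reads record what was observed, writes go to private copies, and commit validates the read set against the shared heap before write-back.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the optimistic protocol: read, validate, write-back.
// Writes create private copies; commit validates reads, then publishes.
public class OptimisticTx {
    final Map<String, Integer> heap;                        // shared heap
    final Map<String, Integer> readSet = new HashMap<>();   // observed values
    final Map<String, Integer> writeSet = new HashMap<>();  // private copies

    OptimisticTx(Map<String, Integer> heap) { this.heap = heap; }

    int read(String f) {
        if (writeSet.containsKey(f)) return writeSet.get(f);
        int v = heap.getOrDefault(f, 0);
        readSet.putIfAbsent(f, v);        // record first observed value
        return v;
    }

    void write(String f, int v) { writeSet.put(f, v); }     // copy-on-write

    boolean commit() {
        // Validate: everything read must be unchanged in the shared heap.
        for (Map.Entry<String, Integer> e : readSet.entrySet())
            if (!e.getValue().equals(heap.getOrDefault(e.getKey(), 0)))
                return false;             // abort: discard private copies
        heap.putAll(writeSet);            // write-back: publish atomically
        return true;
    }
}
```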
In the remainder of this section we discuss our strategy for detecting violation of
serializability via dependency tracking, our solutions for revocation and re-execution
on abort, and details of the implementation platform.
5.1 Platform
Our prototype implementation is based on the Jikes Research Virtual Machine (RVM)
[4]. The Jikes RVM is a research Java virtual machine with performance comparable
to many production virtual machines. Jikes RVM itself is written almost entirely in
Java and is self-hosted (i.e., it does not require another virtual machine to run). Java
bytecodes in the Jikes RVM are compiled directly to machine code. The Jikes RVM’s
distribution includes both a baseline and an optimizing compiler. The baseline compiler
performs a straightforward expansion of each bytecode instruction into its correspond-
ing sequence of assembly instructions. Our prototype targets the Intel x86 architecture.
Each transaction maintains two local maps: a read-map and a write-map, mapping each shared object to a single bit. Once a
transaction commits and propagates its updates into the shared heap it also propagates
information about its own updates to a global write-map associated with the monitor
mediating the transaction. Other transactions whose operations are mediated by the
same monitor will then, during their validation phase, intersect their local read-maps
with the global write-map to determine if the shared data accesses caused a violation
of serializability. When nesting results in an inner monitor running a distinct nested
transaction (as opposed to piggy-backing on its delegate-parent) there will be a sep-
arate global write-map for each transaction level, so validation must check all global
write-maps at all nesting levels. The remaining details of our implementation are the
same as in our earlier work [36].
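The validation step reduces to bitmap intersection. A sketch under assumed parameters (map size and hashing scheme are our own choices, not the paper's): each object hashes to one bit, and a transaction conflicts if its local read-map intersects the global write-map at any nesting level.

```java
import java.util.BitSet;

// Sketch of validation by map intersection: a transaction is unsafe if
// its local read-map intersects any global write-map mediating it
// (one per nesting level when inner monitors run distinct transactions).
public class MapValidation {
    static final int MAP_BITS = 1024;  // assumed map size

    static int slot(Object o) {
        return (System.identityHashCode(o) & 0x7fffffff) % MAP_BITS;
    }

    static boolean conflicts(BitSet localReads, BitSet[] globalWriteMaps) {
        for (BitSet writes : globalWriteMaps) {
            if (localReads.intersects(writes)) return true;  // possible violation
        }
        return false;
    }
}
```

Because object identities are hashed into a fixed-size map, intersection can report false conflicts but never misses a true one.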
Since for most Java programs reads significantly outnumber writes, reducing the
number of read barriers is critical to achieving reasonable performance. Our implementation
therefore trades off accuracy for run-time efficiency in detecting violation of
transaction safety. Instead of placing barriers on all reads to shared heap variables (e.g.,
reading an integer from an object), we assume that the first time a reference is loaded
from the heap, it will eventually be used to read from its target object. Thus, read barri-
ers are placed only on loads of references from the heap. In other words, we “pre-read”
(tagging the local-read map appropriately) all objects whose references are loaded to
the stack of a transactional thread. As a result, even objects that are never read, but
only written, are conservatively pre-read. This greatly simplifies version management
and enables early detection of serializability violations, as described below. This read
barrier optimization is applied only for objects and arrays. All other accesses, including
all reads from static variables and all writes to shared items incur an appropriate barrier.
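The reference-load barrier can be sketched as follows (our own schematic, not the Jikes RVM barrier code): loading a reference conservatively "pre-reads" its target by tagging the local read-map, so ordinary field reads through that reference need no barrier at all.

```java
import java.util.BitSet;

// Sketch of the read barrier on reference loads: tag the local read-map
// for the loaded object's slot, so later field reads are barrier-free.
public class ReadBarrier {
    final BitSet localReadMap = new BitSet(1024);  // assumed map size

    <T> T loadReference(T target) {
        if (target != null) {
            int slot = (System.identityHashCode(target) & 0x7fffffff) % 1024;
            localReadMap.set(slot);   // conservatively pre-read the object
        }
        return target;
    }
}
```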
5.4 Revocation
Our revocation procedure is identical to our prior work [36], allowing for transac-
tion abort at arbitrary points during its execution. The abort is signaled by throwing a
Revoke exception. Undo and re-execution procedures are implemented using a com-
bination of bytecode re-writing and virtual machine modifications. Even though Java
monitors are lexically scoped, it is necessary to support transaction aborts at arbitrary
points to correctly handle native method calls as well as wait and notify operations,
as described in Section 4.2.
In the case of optimistic protocols, the decision about whether a transaction should
be committed or aborted is made during the validation phase. Since transactions are
lexically scoped, it is relatively easy to encapsulate the computation state at the begin-
ning of the transaction so that it can be reinstated if the transaction aborts, by copying
the closure of thread-local state at that point. We use bytecode rewriting in conjunction
with a modified exception handling mechanism to restore this saved state on abort.
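The abort-and-retry control flow can be pictured as the driver loop below. This is a sketch with a hypothetical library-level `Revoke` class and functional-interface transaction body; the actual mechanism works at the level of bytecode rewriting and the virtual machine's exception handling, not a library:

```java
public class Revocation {
    // Hypothetical stand-in for the VM-level Revoke exception.
    static class Revoke extends RuntimeException {}

    // Re-execution driver: run the transaction body; on Revoke, discard
    // speculative state and re-execute from the captured starting state.
    static int runTransactionally(java.util.function.IntSupplier body) {
        while (true) {
            // (capture the closure of thread-local state here)
            try {
                return body.getAsInt();  // normal completion: commit
            } catch (Revoke r) {
                // (restore captured state, discard uncommitted versions,
                //  then fall through and re-execute)
            }
        }
    }
}
```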
5.5 Versioning
We use shared data versioning to prevent the effects of incomplete transactions from
being made visible to other threads until they commit. We maintain versions of both
objects and arrays, as well as static (global) variables. Object and array versioning are
162 A. Welc, A.L. Hosking, and S. Jagannathan
handled exactly the same. Statics use a slightly modified approach, requiring boxing
and unboxing of the static values.
Because our array versioning procedure is identical to that used for versioning ob-
jects, we refer only to objects in the following description. Versions are accessible
through a forwarding pointer from the original object. We use a "copy-on-write" strategy for creating new versions: a transaction creates a new (uncommitted) copy immediately before performing its first update to an object, and redirects all subsequent read and write
operations to access that version. It is important to note that for transaction safety all
programs executed in our system must be race-free (a prerequisite for atomicity): all
accesses by all threads to a given shared data item must be guarded by the same moni-
tor [14]. As a result, writes to the same location performed by different threads will be
detected as unsafe by our validity check described above. This also means that only the
first transaction writing to a given object need create a version for it. Other transactions
accessing that object are aborted when the writing transaction commits and discovers
the overlap.
Upon successful commit of a transaction, the current version becomes the committed
version and remains accessible via a forwarding pointer installed in the original object.
Subsequent accesses are re-directed (in the read and write barriers) via the forwarding
pointer to the committed version. When a transaction aborts all its versions are dis-
carded. Note that at most two versions of an object exist at any given time: a committed
version and an uncommitted version.
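The copy-on-write discipline, with at most one committed and one uncommitted version reachable from the original object, can be sketched as follows (hypothetical field names and an illustrative single field; in the real system versioning is driven by the write barrier and object headers):

```java
public class Versioned {
    int field;               // the object's (illustrative) state
    Versioned committed;     // forwarding pointer to the committed version
    Versioned uncommitted;   // at most one uncommitted version at a time

    Versioned currentCommitted() {
        return committed == null ? this : committed;
    }

    // Write barrier, first update: copy-on-write from the committed state.
    Versioned versionForWrite() {
        if (uncommitted == null) {
            uncommitted = new Versioned();
            uncommitted.field = currentCommitted().field;
        }
        return uncommitted;  // subsequent reads/writes are redirected here
    }

    void commit() {
        if (uncommitted != null) { committed = uncommitted; uncommitted = null; }
    }

    void abort() { uncommitted = null; }  // discard the transaction's versions
}
```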
As noted above, read barriers are executed only on reference loads. In general, multiple on-stack references may end up pointing to different versions of the same object: even though read barriers retrieve the most up-to-date version of an object, writes may occur after the reference has been loaded into multiple locations on the stack. The run-time system must thus ensure that the version
of an object accessible through an on-stack reference is the “correct” one. The visibility
rules for the Java Memory Model [22] mean that at certain synchronization points (e.g.,
monitor entry, access to volatile variables, etc.) threads are obliged to have the same
view of the shared heap. As a result, it is legal to defer fixing on-stack references until
specific synchronization points (e.g., monitor enter/exit, wait/notify). At these points all
on-stack references must be forwarded to the most up-to-date version. Reference for-
warding is implemented using a modified version of thread stack inspection as used by
the garbage collector.
In addition to performing reference forwarding at synchronization points, when a
version is first created by a transaction, the thread creating the version must forward
all references on its stack to point to the new version. This ensures that all subsequent
accesses (by the same thread) observe the results of the update.
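Reference forwarding itself is analogous to pointer fixing during a garbage collector's stack scan. The sketch below models a thread's stack as an array of reference slots and the version relation as a map; both are hypothetical stand-ins for the VM-level stack inspection and object-header forwarding pointers the text describes:

```java
import java.util.Map;

public class StackForwarding {
    // At a synchronization point (or on version creation by this thread),
    // redirect every on-stack reference to the most up-to-date version.
    static void forwardStack(Object[] stackSlots, Map<Object, Object> newestVersion) {
        for (int i = 0; i < stackSlots.length; i++) {
            Object fwd = newestVersion.get(stackSlots[i]);
            if (fwd != null) stackSlots[i] = fwd;
        }
    }
}
```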
5.6 Example
[Fig. 4. Execution of the example: threads T, T′ and T″ under transactional monitors outer and inner. The step trace is (1) acq(outer), (2) wt(o2), (3) acq(inner), (4) wt(o1), (5) acq(outer), (6) acq(inner), (7) rd(o1), (8) rel(inner), (9) rel(outer), (10) rd(o1), (11) rel(inner), (12) rel(outer); panels (a)–(f) show the evolution of the global write maps (GW), the per-thread local write and read maps (LW, LR), and the versions o1v and o2v of objects o1 and o2.]
In Figure 4(a), objects o1 and o2 have not yet been versioned – they are shaded grey because at this point they still contain the most up-to-date values. The larger box (open at the bottom) represents the scope of transactional monitor outer; the smaller box (open at the top) represents the scope of transactional monitor inner. Both the global write map (GW) associated with each monitor and the local maps (write map LW and read map LR) associated with each thread have three slots. Maps that belong to a given thread are located above the wavy line representing that thread. We assume that accesses to object o1 hash to the first slot of every map and accesses to object o2 hash to the second slot of every map.
Execution begins with threads T and T′ starting to run transactions whose operations are mediated by monitors outer and inner (respectively) and performing updates to objects o2 and o1 (respectively), as presented in Figure 4(b). The updates trigger creation of copies o2v and o1v for objects o2 and o1, respectively, and tagging of the local write maps. Thread T tags the second slot of its local write map since it modifies object o2, whereas thread T′ tags the first slot of its local write map since it modifies object o1. In Figure 4(c) thread T″ starts executing, running the outermost transaction mediated by monitor outer and its inner transaction mediated by monitor inner, and then reads object o1, which tags its local read map. In Figure 4(d) T′ attempts to commit its transaction. Since no writes by other transactions mediated by monitor inner have been performed, the commit is successful: o1v becomes the committed version, the contents of T′'s local write map are transferred to inner's global write map, and the local write map is cleared. Similarly, in Figure 4(e), T's transaction commits successfully: o2v becomes the committed version and the local write map is cleared after its contents have been transferred to the global write map associated with monitor outer. In Figure 4(f) thread T″ proceeds to read object o1 again and then to commit its transactions (both inner and outer). However, because a new committed version of object o1 has been created, T″ reads o1v instead of the original object. When attempting to commit both its inner and outer transactions, thread T″ must intersect its local read map with the global write maps associated with both monitor outer and monitor inner. The first intersection is empty (no writes performed in the scope of monitor outer could compromise reads performed by T″), but the second is not – both transactions executed by T″ must be aborted and re-executed.
5.7 Meta-data
For performance we need efficient access to several items of meta-data associated with
each object (e.g., versions and their identities, delegates, identity hash-codes, access
maps, etc.). At the same time, we must keep overheads to a minimum when transactions
are not used. The simplest solution is to extend object headers to associate the necessary
meta-data. Our transactional meta-data requires up to four 32-bit words. Unfortunately,
Jikes RVM does not easily support variable header sizes and extending the header of
each object by four words has serious overheads of space and performance, even in the
case of non-transactional execution. On the other hand keeping meta-data “on the side”
(e.g., in a hash table), also results in a significant performance hit.
We therefore implement a compromise. The header of every object is extended by a
single descriptor word that is lazily populated when meta-data needs to be associated
with the object. If an object is never accessed in a transactional context, its descriptor
word remains empty. Because writes are much less common than reads, we treat the
information needed for reads as the common case. The first transactional read will place
the object’s identity hash-code in the descriptor (we generate hash codes independently
of native Jikes RVM object hashcodes to ensure good data distribution in the access
maps). If additional meta-data needs to be associated with the object (e.g., a new version
on write) then the descriptor word is overwritten with a reference to a new descriptor
Transparently Reconciling Transactions with Locking 165
object containing all the necessary meta-data (including the hash-code originally stored
in the descriptor word). We discriminate these two cases using the low-order bit of the
descriptor word.
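The descriptor-word discrimination can be illustrated as follows. Only the use of the low-order bit as the discriminator is taken from the text; the exact shifting and naming below are a hypothetical encoding:

```java
public class DescriptorWord {
    // word == 0 : the object was never accessed transactionally.
    // low bit 1 : the word holds an identity hash-code (common, read-only case).
    // low bit 0 : the word holds a reference to a full meta-data descriptor.
    static long encodeHash(int hash) { return ((long) hash << 1) | 1L; }

    static boolean holdsHash(long word) { return (word & 1L) != 0L; }

    static int decodeHash(long word) { return (int) (word >>> 1); }
}
```

Overwriting the word with an (aligned, hence even) object reference leaves the tag bit clear, so the two cases remain distinguishable.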
6 Experiments
The performance evaluation of our prototype implementation is divided into two parts.
We use a number of single-threaded benchmarks (from the SPECjvm98 [31] and Java
Grande [30] suites) to measure the overheads of supporting hybrid-mode execution
(e.g., compiler-inserted barriers, code-duplication, object layout modifications, etc.)
when monitors are uncontended. We also use an extended version of the OO7 object
database benchmark [10], to expose the range of performance when executing under dif-
ferent levels of monitor contention. We measure the behavior when all monitors are ex-
ecuted transactionally and when using the hybrid scheme that executes monitors trans-
actionally only when sufficient monitor contention is detected. Our measurements were
taken on an eight-way 700MHz Intel Pentium III symmetric multi-processor (SMP)
with 2GB of RAM running Linux kernel version 2.4.20-31.9smp (RedHat 9.0). Our im-
plementation uses version 2.3.4+CVS (with 2005/12/08 15:01:10 UTC timestamp) of
Jikes RVM for all the configurations used to take the measurements (mutual-exclusion-
only, transactions-only and hybrid). We ran each benchmark configuration in its own
invocation of the virtual machine, repeating the benchmark six times in each invoca-
tion, and discarding the results of the first iteration, in which the benchmark classes are
loaded and compiled, to eliminate the overheads of compilation.
[Fig. 5. Overheads of hybrid-mode execution, relative to the clean build of Jikes RVM, for the uncontended SPECjvm98 and Java Grande benchmarks; each bar is broken down into object-header extension ("ext header"), barrier ("barrier") and remaining ("other") costs, with overheads ranging from roughly -20% to 50%.]
The top bar captures overhead from execution of the barriers themselves (mostly read barriers, but also barriers on static variable accesses).
Observe that changing the object layout can by itself have a significant impact on performance. In most cases, the version of the system with larger object headers indeed induces overheads over the clean build of Jikes RVM, but in some situations (e.g., FFT or Series) its performance actually improves over the clean build by a significant amount; variations in cache footprint are the most likely cause. The performance impact of the compiler-inserted barriers is also clearly noticeable, especially for the benchmarks from the SPECjvm98 suite. When discounting overheads related to the execution of the barriers, the average overhead with respect to the clean build of Jikes RVM drops to a little over 1%. This result is consistent with that reported by Blackburn and Hosking [7], who found that garbage collection read barriers can incur overheads of up to 20%. It would therefore be beneficial for our system to use a garbage collector that could help amortize the cost of the read barrier. Fortunately, there exist modern garbage collectors (e.g., [6]) that fulfill this requirement.
Component                                Number
Modules                                       1
Assembly levels                               7
Subassemblies per complex assembly            3
Composite parts per assembly                  3
Composite parts per module                  500
Atomic parts per composite part              20
Connections per atomic part                   3
Document size (bytes)                      2000
Manual size (bytes)                      100000
[Fig. 6. Elapsed time (normalized), panels (a) and (b), for the OO7 benchmark at contention levels 1, 3 and 6, as the percentage of writes (100% minus the percentage of reads) varies from 10% to 90%.]
When there is a suitable level of monitor contention, and when the number of writes is moderate, transactional execution significantly outperforms mutual exclusion – by up to a factor of three. The performance of the transactions-only scheme degrades as the number of writes increases (and so does the number of generated hash-codes), since the
number of bitmap collisions increases, leading to a large number of aborts even at low
contention (Figure 7(b)). Extending the size of the maps used to detect serializability
violations would certainly remedy the problem, at least in part. However, we cannot use
maps of an arbitrary size. This could unfavorably affect memory overheads (especially
compared to mutual-exclusion locks) but more importantly we have determined that the
time to process potentially multiple maps at the end of the outermost transaction must
be bounded. Otherwise, the time spent to process them becomes a source of significant
delay (currently each map contains over 16,000 slots). The increased number of aborts
certainly has a very significant impact on the difference in performance between the
transactions-only and hybrid schemes. The overheads of the transactions-only scheme
cannot however be attributed only to the increased abort rate – observe that the shapes of
[Fig. 7. Number of aborts (log scale), panels (a) and (b), for the OO7 benchmark at contention levels 1, 3 and 6, as the percentage of writes varies from 10% to 90%.]
the graphs plotting execution times and aborts are different. During hybrid-mode execution, monitors are executed transactionally only when monitor contention is detected; read and write operations executed within uncontended monitors incur little overhead, and revocations are rare. Thus, instead of the performance degradation of over 70% seen in the transactions-only case when writes are dominant, our hybrid scheme incurs overhead on the order of only 10%.
7 Related Work
The design and implementation of our system have been inspired by the optimistic concurrency protocols first introduced in the 1980s [21] to improve database performance.
Given a collection of transactions, the goal in an optimistic concurrency implementation
is to ensure that only a serializable schedule results [1, 19, 32]. Devising fast and efficient
techniques to confirm that a schedule is correct remains an important topic of study.
There have been several attempts to reduce overheads related to mutual-exclusion
locking in Java. Agesen et al. [2] and Bacon et al. [5] describe locking implementations
for Java that attempt to optimize lock acquisition overhead when there is no contention
on a shared object. Other recent efforts explore alternatives to lock-based concurrent
programming [17, 36, 20, 18]. In these systems, threads are allowed to execute within a
guarded region (e.g., protected by monitors) concurrently, but are monitored to ensure
that safety invariants (e.g., serializability) are not violated. If a violation of these invariants by some thread is detected, the computation performed by that thread is revoked: any updates performed so far are discarded and the thread is re-executed. Our approach differs
from these in that it seamlessly integrates different techniques to manage concurrency
within the same system. When using our approach, the most appropriate scheme is dy-
namically chosen to handle concurrency control in different parts of the same application.
There is also a large body of work on removing synchronization primitives when it
can be shown that there is never contention for the region being guarded [3, 28, 33].
The results derived from these efforts would equally benefit applications running in the
system supporting hybrid-mode execution.
There has been much recent interest in devising techniques to detect data races in
concurrent programs. Some of these efforts [13, 8] present new type systems using, for
example, ownership types [12] to verify the absence of data races and deadlocks. Others
such as Eraser [29] employ dynamic techniques to check for races in programs [25, 23,
34]. There have also been attempts to leverage static analyses to reduce overheads and
increase precision of purely dynamic implementations [11, 35].
Recent work on deriving higher-level safety properties of concurrent programs [15,
14] subsumes data-race detection. It is based on the observation that race-free programs
may still exhibit undesirable behavior because they violate intuitive invariants such as
atomicity that are not easily expressed using low-level abstractions such as locks.
8 Conclusions
Existing approaches to providing concurrency abstractions for programming languages
offer disjoint solutions for mediating concurrent accesses to shared data throughout
the lifetime of the entire application. Typically these mechanisms are either based on
mutual exclusion or on some form of transactional support. Unfortunately, none of these
techniques is ideally suited for all possible workloads. Mutual exclusion performs best
when there is no contention on guarded region execution, while transactions have the
potential to extract additional concurrency when contention exists.
We have designed and implemented a system that seamlessly integrates mutual ex-
clusion and optimistic transactions to implement Java monitors. We formally argue cor-
rectness (with respect to language semantics) of such a system and provide a detailed
performance evaluation of our hybrid scheme for different workloads and varying lev-
els of contention. Our implementation and experiments demonstrate that the hybrid
approach has low overheads (on the order of 7%) in the uncontended (base) case and
that significant performance improvements (speedups of up to 3×) can be expected from
running contended monitors transactionally.
Acknowledgements
We thank the anonymous referees for their suggestions and improvements to this paper.
This work is supported by the National Science Foundation under grants Nos. CCR-
0085792, CNS-0509377, CCF-0540866, and CNS-0551658, and by gifts from IBM
and Microsoft. Any opinions, findings, and conclusions expressed herein are the authors' and do not necessarily reflect those of the sponsors.
References
[1] Adya, A., Gruber, R., Liskov, B., and Maheshwari, U. Efficient optimistic concurrency
control using loosely synchronized clocks. ACM SIGMOD Record 24, 2 (June 1995), 23–
34.
[2] Agesen, O., Detlefs, D., Garthwaite, A., Knippel, R., Ramakrishna, Y. S., and White, D.
An efficient meta-lock for implementing ubiquitous synchronization. In OOPSLA’99 [26],
pp. 207–222.
[3] Aldrich, J., Sirer, E. G., Chambers, C., and Eggers, S. J. Comprehensive synchronization
elimination for Java. Science of Computer Programming 47, 2-3 (2003), 91–120.
[4] Alpern, B., Attanasio, C. R., Barton, J. J., Cocchi, A., Hummel, S. F., Lieber, D., Ngo, T.,
Mergen, M., Shepherd, J. C., and Smith, S. Implementing Jalapeño in Java. In OOPSLA’99
[26], pp. 314–324.
[5] Bacon, D., Konuru, R., Murthy, C., and Serrano, M. Thin locks: Featherweight synchro-
nization for Java. In Proceedings of the ACM Conference on Programming Language
Design and Implementation (Montréal, Canada, June). ACM SIGPLAN Notices 33, 5 (May
1998), pp. 258–268.
[6] Bacon, D. F., Cheng, P., and Rajan, V. T. A real-time garbage collector with low overhead
and consistent utilization. In Conference Record of the ACM Symposium on Principles
of Programming Languages (New Orleans, Lousiana, Jan.). ACM SIGPLAN Notices 38, 1
(Jan. 2003), pp. 285–298.
[7] Blackburn, S. M., and Hosking, A. L. Barriers: Friend or foe? In Proceedings of the ACM
International Symposium on Memory Management (Vancouver, Canada, Oct., 2004), D. F.
Bacon and A. Diwan, Eds. ACM, 2004, pp. 143–151.
[8] Boyapati, C., Lee, R., and Rinard, M. C. Ownership types for safe programming: preventing
data races and deadlocks. In Proceedings of the ACM Conference on Object-Oriented
Programming Systems, Languages, and Applications (Seattle, Washington, Nov.). ACM
SIGPLAN Notices 37, 11 (Nov. 2002), pp. 211–230.
[9] Carey, M. J., DeWitt, D. J., Kant, C., and Naughton, J. F. A status report on the OO7
OODBMS benchmarking effort. In Proceedings of the ACM Conference on Object-
Oriented Programming Systems, Languages, and Applications (Portland, Oregon, Oct.).
ACM SIGPLAN Notices 29, 10 (Oct. 1994), pp. 414–426.
[10] Carey, M. J., DeWitt, D. J., and Naughton, J. F. The OO7 benchmark. In Proceedings of
the ACM International Conference on Management of Data (Washington, DC, May). ACM
SIGMOD Record 22, 2 (June 1993), pp. 12–21.
[11] Choi, J.-D., Lee, K., Loginov, A., O’Callahan, R., Sarkar, V., and Sridharan, M. Efficient
and precise datarace detection for multithreaded object-oriented programs. In Proceedings
of the ACM Conference on Programming Language Design and Implementation (Berlin,
Germany, June). ACM SIGPLAN Notices 37, 5 (May 2002), pp. 258–269.
[12] Clarke, D. G., Potter, J. M., and Noble, J. Ownership types for flexible alias protection.
In Proceedings of the ACM Conference on Object-Oriented Programming Systems, Lan-
guages, and Applications (Vancouver, Canada, Oct.). ACM SIGPLAN Notices 33, 10 (Oct.
1998), pp. 48–64.
[13] Flanagan, C., and Freund, S. N. Type-based race detection for Java. In PLDI’00 [27],
pp. 219–232.
[14] Flanagan, C., and Freund, S. N. Atomizer: a dynamic atomicity checker for multithreaded
programs. In Conference Record of the ACM Symposium on Principles of Programming
Languages (Venice, Italy, Jan.). 2004, pp. 256–267.
[15] Flanagan, C., and Qadeer, S. Types for atomicity. In Proceedings of the 2003 ACM SIG-
PLAN International Workshop on Types in Language Design and Implementation (New
Orleans, Louisiana, Jan.). 2003, pp. 1–12.
[16] Flatt, M., Krishnamurthi, S., and Felleisen, M. Classes and mixins. In Conference Record
of the ACM Symposium on Principles of Programming Languages (San Diego, California,
Jan.). 1998, pp. 171–183.
[17] Harris, T., and Fraser, K. Language support for lightweight transactions. In Proceed-
ings of the ACM Conference on Object-Oriented Programming Systems, Languages, and
Applications (Anaheim, California, Nov.). ACM SIGPLAN Notices 38, 11 (Nov. 2003),
pp. 388–402.
[18] Harris, T., Marlow, S., Peyton-Jones, S., and Herlihy, M. Composable memory transactions.
In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming (Chicago, Illinois, June). 2005, pp. 48–60.
[19] Herlihy, M. Apologizing versus asking permission: Optimistic concurrency control for
abstract data types. ACM Trans. Database Syst. 15, 1 (1990), 96–124.
[20] Herlihy, M., Luchangco, V., Moir, M., and Scherer, III, W. N. Software transactional mem-
ory for dynamic-sized data structures. In Proceedings of the Annual ACM Symposium on
Principles of Distributed Computing (Boston, Massachusetts, July). 2003, pp. 92–101.
[21] Kung, H. T., and Robinson, J. T. On optimistic methods for concurrency control. ACM Trans. Database Syst. 6, 2 (June 1981), 213–226.
[22] Manson, J., Pugh, W., and Adve, S. The Java memory model. In Conference Record of
the ACM Symposium on Principles of Programming Languages (Long Beach, California,
Jan.). 2005, pp. 378–391.
[23] Mellor-Crummey, J. On-the-fly detection of data races for programs with nested fork-
join parallelism. In Proceedings of the ACM/IEEE Conference on Supercomputing (Albu-
querque, New Mexico, Nov.). 1991, pp. 24–33.
Object Technology – A Grand Narrative?
Steve Cook
Abstract. This brief article sets out some personal observations about the de-
velopment of object technology from its emergence until today, and suggests
how it will develop in the future.
1 The Beginning
Like many people, my first encounter with object-oriented programming was in 1981
when I read the August 1981 special issue of Byte magazine [1] that was entirely de-
voted to Smalltalk (I still have it). At the time, I was developing graphical user inter-
faces using the programming languages C and Pascal, and I encountered difficulties in
trying to make the procedural structure of my programs correspond cleanly to the prob-
lems I was trying to solve. It seemed that something was missing from these languages;
and when I saw Smalltalk, I realized what that thing was – objects. In fact I discovered
that I was late to the party; objects had been in the Simula language since 1967.
I threw myself for the next ten years into studying and teaching object-oriented
programming. My team implemented Smalltalk and used it to build experimental dis-
tributed systems. I witnessed the development and competition of Objective-C and
C++. Eiffel inspired me by its visionary design. I tried to understand the relationship
between functional and OO programming. I participated in the first several OOPSLA
and ECOOP conferences.
In those days, objects were controversial. Some academic colleagues of mine were deeply offended by polymorphism: they saw it as fundamentally undermining the programmer's ability to reason logically about a program's consequences. Others were unwilling to admit that anything new was going on: objects were variously "just subroutines" or "just data abstractions" or "just entities".
Of course, objects won the day. Although COBOL remains popular for commer-
cial data processing, most widespread programming languages today are object-
oriented. All of the basic mechanisms of object-oriented programming are generally
taken for granted: inheritance, virtual functions, instances and classes, interfaces, real-
time garbage collection. In Microsoft I work daily with the .NET framework, an ex-
tensive object-oriented library for building distributed interactive applications, using
the languages supported by Microsoft’s Common Language Runtime, primarily C#
and Visual Basic.
However we should recognize that the path that we’ve collectively trod over the
past 20 years has involved some significant blind alleys, each of which has had a sub-
stantial effect on the industry. I’ve been personally involved in all of them.
D. Thomas (Ed.): ECOOP 2006, LNCS 4067, pp. 174 – 179, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Object Technology – A Grand Narrative? 175
2 Some Detours
People became very excited about objects. In some quarters, objects became regarded
as a solution for all known problems. Let’s look critically at some of the proposals
that resulted.
We’ve all been enticed by promises of reuse. Objects, we were told, are just like
hardware components. We were led to believe that once an object has been designed
to solve a problem, it can be reused over and over again in different projects, thus re-
ducing the cost and increasing the productivity of software development.
Although there is a grain of truth in this proposition, it has often been treated much
too naively. I have seen several large companies kick off major initiatives based on a
simplistic interpretation of this suggestion, and spend many millions of dollars invest-
ing in structures and organizations for reuse that ultimately did not work. Objects are
typically not reusable because of architectural mismatches. You can almost never
take an object designed for use in a single application and use it successfully in an-
other. Except for a few very small and self-contained items, almost all objects are
constructed with a large number of both functional and non-functional assumptions
about the environment in which they live. Such assumptions are encapsulated by
object-oriented frameworks and class libraries, without which it would be almost im-
possible to build modern distributed interactive applications. The design of these
frameworks and libraries is a very complicated business, and organizing them to be
powerful and easy to use requires enormous skill and investment.
Objects seemed like a great answer to the problem of how to build a distributed com-
puting system. If we could conceal the difference between invoking an operation on
an object locally and remotely, it was said, then distribution became simple. Indeed,
there were early experimental implementations of distributed Smalltalk in which the
menus would pop up on the wrong screens because of programming errors. The
highest-profile manifestation of this principle was CORBA – the Common Object
Request Broker Architecture, from the Object Management Group.
The facts about distribution are these:
• invoking a remote operation is orders of magnitude slower than invoking a local
one;
• the programmer of a remote operation probably did it in a programming lan-
guage different from yours;
• the invoker and the invokee will want to change their implementations at differ-
ent times;
• neither end can trust the other end, or the connection, to be well-behaved in any way.
The consequence of these facts is that using objects and their interfaces as a basis
for implementing distributed systems does not work well. Abstraction is a wonderful
176 S. Cook
thing, but not when the properties that you are abstracting away are in fact the essence
of the problem. In a distributed system, it’s important to deal explicitly with autono-
mous systems, with explicit boundaries, in a language-independent format, over lim-
ited bandwidth connections, with explicit statements of how interoperation will be
managed, and explicit mechanisms to handle failures.
During the mid 1990s, object databases were developed [2]. Many investments were
made on the assumption that object databases would just blow relational databases
away. Object databases were thought to be so conspicuously good that relational da-
tabases just wouldn’t survive, in the same way that relational databases supplanted
their hierarchical predecessors.
It didn’t happen. Relational databases remain the mainstay of commercial data proc-
essing. Relational databases continue to provide the performance, scalability, autonomy
and language independence for which they were designed, while object databases did
not provide these qualities. Object databases exist mainly in niche applications, and ob-
ject-oriented ideas continue to slowly permeate into the relational world. But the “im-
pedance mismatch” between object-oriented programs and the relational containers
where they store their data remains a major unsolved bugbear for programmers.
The early 1990s saw the publication of several books that proposed using object-
oriented concepts to model systems. The general idea was that since the "real world" self-evidently consists of objects, the appropriate approach for developing software is to describe the objects that exist in the real world, decorate these descriptions with suitable properties and methods, and your software system would simply
pop out. Some authors went so far as to claim that this provides a suitable method for
modelling business processes and organizations, and these models could be directly
translated into the software systems that would implement these businesses. Nobody
successfully made this work, though, and such a view is rarely proposed today.
Alongside this philosophy went the development of various diagrammatic ap-
proaches for designing object-oriented systems, under the moniker of “object-oriented
analysis and design”. The Unified Modeling Language (UML) [3] was developed in
order to unify three different diagrammatic conventions for object-oriented analysis
and design. Unfortunately, UML also claimed to unify semantics, even though such
semantic unification is in practice impossible. Different object-oriented programming
technologies have subtly different meanings for terms like class, instance, interface,
inheritance etc., and these cannot be unified. Either UML is another, different
programming language, or it is a set of diagrammatic conventions that can be loosely
coupled to existing programming languages. This confusion remains unsettled to this
day. Furthermore, some proponents of UML propose that it can also be used for
modelling tasks outside its original scope, such as modelling of businesses, work-
flows, databases, and service-oriented systems. Trying to satisfy all of these conflict-
ing needs has led to UML becoming bloated and unwieldy. I doubt that we’ll see any
future revisions of it.
Object Technology – A Grand Narrative? 177
3 The Present
This section sets out some views about the current state of the art, reflecting on some
lessons learnt from the proposals in the previous section.
3.1 Languages
A great deal of research has been done into programming language design since the
emergence of object-oriented programming. Nevertheless, there are still many contro-
versies in object-oriented language design, such as: should multiple inheritance be
allowed? When is it appropriate to use interfaces vs classes? Should classes be “mini-
mal” or “humane”1? Why doesn’t my language allow covariant redefinition of result
types? Should classes themselves be objects? Should types be declared or inferred?
How to deal effectively with concurrency? Why must operations be attached to a
single class? Experience with designing object-oriented programs and class libraries
has led to many insights, and many of today’s object-oriented languages are nicer than
those of 20 years ago; but there are no right answers to any of these questions.
Object-oriented design has always been difficult to learn. The basic aim of object-
orientation has been to create the most efficient representation of your program, the
essence of which is to avoid repetition. If you want to create a piece of program, you
must always go and see whether it already exists in your class library. This means
that the library must be designed to offer such pieces effectively: and here the fun
starts. Designing a good class library is a very complex undertaking. In the work I
am currently doing, we’ve found Steven Clarke’s “Cognitive Dimensions of Usabil-
ity” [4] particularly useful in designing a class library for managing modelling meta-
data. We also refer frequently to Cwalina and Abrams [5] for guidance. The naïve
idea that “objects are just there for the picking” seems truly absurd in 2006.
We’ve realized that the individual components of distributed systems need to be dealt
with explicitly as autonomous entities, with explicit boundaries, interacting through
message exchanges, sharing data schemas and behavioural contracts. This set of ten-
ets has come to be known as “service-orientation”. The roots of service orientation lie
in the internet. Since the internet provides ubiquitous access to almost all of the com-
puters in the world, it brings the requirements for distributed computing to the surface
in a relentless and vivid fashion.
Service-orientation is supported by standards that are subscribed to by all major
software vendors, and is widely agreed to provide the path towards global interopera-
bility of distributed systems. Service orientation does not depend on object-oriented
technology: it separates description of data and behaviour, and is not bound to any
particular programming language or object-oriented semantics. XML is a widespread
1 http://www.martinfowler.com/bliki/HumaneInterface.html
178 S. Cook
An interesting development that attempts to bridge the gap between objects and data-
bases is Microsoft’s LINQ (Language Integrated Query) project [6]. LINQ integrates
queries, sets and transform operations directly into the programming language,
thereby providing a strongly typed means of interacting with data from within an
object-oriented program. LINQ comes in two flavours: XLinq talks to XML data and
DLinq talks to SQL-based databases.
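The flavour of language-integrated query can be suggested with a small Python analogy (an illustrative sketch only; the `Customer` data and names are invented, and LINQ itself is a .NET technology, not Python):

```python
# Rough analogy: the filter and projection below are ordinary expressions
# of the host language, checked by its tooling, rather than an SQL string
# assembled at runtime and handed to an external engine.
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    city: str

customers = [
    Customer("Ada", "London"),
    Customer("Grace", "New York"),
    Customer("Alan", "London"),
]

# roughly: from c in customers where c.City == "London" select c.Name
londoners = [c.name for c in customers if c.city == "London"]
print(londoners)  # ['Ada', 'Alan']
```

The point is not the query itself but where it lives: inside the program, with the program's types, rather than across an impedance-mismatched boundary.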
LINQ is particularly useful, I believe, because it explicitly recognizes the pervasive
existence of distinct “technology spaces”. In [7] a technology space is defined as “a
working context with a set of associated concepts, body of knowledge, tools, required
skills, and possibilities”. Examples of technology spaces include OO programming,
XML, relational databases, and metamodelling. Many of the blind alleys identified in
the previous section were based on a hope that objects provide a unifying technology
that can be successfully applied to any problem. We’ve recognized that this is not so,
and that different spaces will continue to exist and evolve as new technology chal-
lenges are encountered. Therefore we’ll have to find efficient and effective ways of
bridging between these spaces: and LINQ offers a promising approach to bridging be-
tween the object-oriented programming, the XML and the SQL spaces.
3.5 Modelling
We’ve learnt from our mistakes that objects do not provide a unifying technology. So
let’s not make the same mistakes with models: they don’t provide a unifying technol-
ogy either. By models, I mean (influenced by UML) diagrammatic depictions of
structures and behaviours that can easily be consumed by humans and interpreted by
machines. The notion that all of software development will somehow be replaced by
modelling is at least as mistaken as “objects are just there for the picking”. Models
are useful for specific purposes, such as flowcharting, describing entities and relation-
ships, conceptual class diagrams, message sequence charting, state diagramming, and
especially describing the structures and relationships of architectural components.
Specific kinds of models are useful in specific tasks; the modelling language used for
a specific task must be designed to be fit for that task. Today’s increasing interest in
Domain Specific Languages, rather than general-purpose modelling languages,
clearly recognizes this.
4 The Future
I’ve tried to describe some lessons learnt from experience with objects over the last
25 years. Summarizing, objects remain only one of a variety of different ways to
solve problems, whilst our frequent mistake was to hope that objects were the “grand
narrative” that might lead us to understand and unify everything.
Which leaves us with the question of whether there is anything – any language,
representation, set of concepts, ontology, etc - that might solve everything? Could we
References
1. BYTE magazine, Volume 6 No 8, August 1981.
2. Atkinson, M., Bancilhon F., DeWitt D., Dittrich K., Maier D., Zdonik, S: The Object-
Oriented Database System Manifesto. Deductive and Object-Oriented Databases. Elsevier,
Amsterdam (1989).
3. Rumbaugh, J., Jacobson, I., Booch, G: The Unified Modeling Language Reference Manual.
Addison-Wesley (1999).
4. Clarke, S. Cognitive Dimensions of Usability http://blogs.msdn.com/stevencl/linklist.aspx
5. Cwalina, K. and Abrams, B: Framework Design Guidelines : Conventions, Idioms, and Pat-
terns for Reusable .NET Libraries. Addison-Wesley (2006).
6. The LINQ project, at http://msdn.microsoft.com/netframework/future/linq/
7. Kurtev, I., Bézivin, J., Aksit, M. Technical spaces: An initial appraisal. CoopIS, DOA 2002
Federated Conferences, Industrial track, Irvine (2002). Available online at
http://www.sciences.univ-nantes.fr/lina/atl/www/papers/PositionPaperKurtev.pdf
8. Noble, J and Biddle, R: Notes on Postmodern Programming, at
http://www.mcs.vuw.ac.nz/comp/Publications/CS-TR-02-9.abs.html
Peak Objects
William R. Cook
I was aware of a need for object-oriented programming long before I learned that
it existed. I felt the need because I was using C and Lisp to build medium-sized
systems, including a widely-used text editor, CASE and VLSI tools. Stated sim-
ply, I wanted flexible connections between providers and consumers of behavior
in my systems. For example, in the text editor anything could produce text
(files, in-memory buffers, selections, output of formatters, etc) and be connected
to any consumer of text. Object-oriented programming solved this problem, and
many others; it also provided a clearer way to think about the problems. For me,
this thinking was very pragmatic: objects solved practical programming problems
cleanly.
The philosophical viewpoint that “objects model the real world” has never
appealed to me. There are many computational models, including functions,
objects, algebras, processes, constraints, rules, automata – and each has a par-
ticular ability to model interesting phenomena. While some objects model some
aspects of the real world, I do not believe they are inherently better suited to
this task than other approaches. Considered another way, what percentage of
classes in the implementation of a program have any analog in the real world?
In the mid-’80s, when I was learning about objects, it was frequently said that
objects could not be explained, they must be experienced. Sufficient experience
would lead to an “Ah ha!” insight after which you could smile knowingly and
say “it can’t be explained... it must be experienced.” This is, unfortunately, still
true to a degree. Many students (and programmers) do not feel comfortable with
dynamic dispatch, the higher-order nature of objects and factories, or complex
subtype/inheritance hierarchies. Advanced programmers also struggle to design
effective architectures using advanced techniques – where the best approach is
not obvious.
In what follows I describe some long-standing myths about object-oriented
programming and then suggest future directions.
Classes Are Abstract Data Types
The assumption that classes are abstract data types (ADTs) is one of the more
persistent myths. It is not clear exactly how it started, but the early lack of a
solid theoretical foundation of objects may have contributed.
ADTs consist of a type and operations on the type – where the type is abstract,
meaning its name/identity is visible but its concrete representation is hidden.
Hiding the representation type generally requires a static type system.
Objects, on the other hand, are collections of operations. The types of the op-
erations (the object’s interface) are completely public (no hidden types), whereas
the internal representation is completely invisible from the outside. The idea of
type abstraction, or partially hiding a type, is not essential to objects, although
it does show up when classes are used as types (see below). Since they don’t
require type abstraction, objects work equally well in dynamically and statically
typed languages (e.g. Smalltalk and C++).
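The view of objects as collections of operations can be sketched in Python (an illustrative sketch under my own choice of example, not code from any system discussed here): an integer-set "object" is just a record of operations closing over a hidden representation, here a membership predicate.

```python
# Sketch: an "object" is a record of operations; its interface (contains,
# insert) is completely public, while the representation (a closure over
# a membership test) is completely invisible from the outside.

def empty_set():
    self = {
        "contains": lambda n: False,
        "insert": lambda n: insert(self, n),
    }
    return self

def insert(s, m):
    self = {
        "contains": lambda n: n == m or s["contains"](n),
        "insert": lambda n: insert(self, n),
    }
    return self

s = empty_set()["insert"](1)["insert"](3)
print(s["contains"](3), s["contains"](2))  # True False
```

No static type system is needed for this hiding, which is why, as noted above, objects work equally well in dynamically typed languages.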
The relationship between ADTs and objects is well-known [7, 17], yet even
experts often treat “data abstraction” and “abstract data type” as synonyms,
for example in the history of CLU [11] and in many textbooks. I suspect that
the identification of “data abstraction” and “abstract data type” arose because
ADTs seem natural and fundamental: they have the familiar structure of abstract
algebra, support effective reasoning and verification [9], and have an elegant
explanation in type theory [14]. In the late ’70s ADTs appeared in practical
programming languages, including Ada, Modula-2, and ML. However, ADTs in
their pure form have never become as popular as object-oriented programming.
It is interesting to consider what would have happened if Stroustrup had added
ML-style ADTs, modules, and functors to “C” instead of adding objects [12].
Most modern languages combine both ADTs and objects: Smalltalk has built-
in ADTs for integers, which are used to implement the object-oriented numbers.
But Smalltalk does not support user-defined ADTs. OCaml allows user-defined
ADTs and also objects. Java, C# and C++ have built-in ADTs for primitive
types. They also support pure objects via interfaces and a form of user-defined
ADTs: when a class is used as a type it acts as a bounded existential, in that it
specifies a particular implementation/representation, not just public interface.
Integrating the complementary strengths of ADTs and objects is an active
research topic, which now focuses on extensibility, often using syntactic expres-
sions as a canonical example [10, 15, 19, 21]. However, the complete integration
of these approaches has not yet been achieved.
Future
What does the future hold? In the late ’90s I started working on enterprise
software and found that object-oriented programming in its pure form didn’t
provide answers to the kinds of problems I was encountering.
It is still too difficult to build ordinary applications – ones with a user in-
terface, a few algorithms or other kinds of program logic, various kinds of data
(transactional, cached, session state, configuration), some concurrency, workflow,
a security model, running on a desktop, mobile device, and/or server.
I find myself yearning for a new paradigm, just as I yearned for objects in the
’80s. New paradigms do not appear suddenly, but emerge from long threads of
development that often take decades to mature. Both pure functional program-
ming (exemplified by Haskell) and object-oriented programming (Smalltalk &
Java) are examples.
Thus it should be possible to see traces of future paradigms in ideas that
exist today. There are many promising ideas, including generative program-
ming, reflection, partial evaluation, process algebra, constraint/logic program-
ming, model-driven development, query optimization, XML, and web services. It
is unlikely that focused research in any of these areas will lead to a breakthrough
that triggers a paradigm shift. What is needed instead is a holistic approach
to the problem of building better software more easily, while harnessing specific
technologies together to create a coherent paradigm.
I want a more declarative description of systems. I find myself using domain-
specific languages: for semantic data models, security rules, user interfaces, gram-
mars, patterns, queries, consistency constraints, upgrade transformations, work-
flow processes. Little bits of procedural code may be embedded in the declarative
framework, acting as procedural plugins.
Current forms of abstraction were designed to express isolated data abstrac-
tions, rather than families of interrelated abstractions. Today object models,
e.g. a business application or the HTML document object model, have hundreds
or thousands of interrelated abstractions. At the same time, it is very desirable
to place each feature of a program into a separate module, even though the
implementation of the features may be fairly interconnected.
Designs typically include aspects like security or persistence that are concep-
tually global, yet must be configured and specialized to each individual part of
the system. The concept of an aspect is a powerful one – yet current aspect-
oriented programming languages are only an initial step toward fulfilling the
promise of this concept.
The underlying infrastructure for these higher levels will most likely be built
using object-oriented programming. But at the higher levels of application pro-
gramming, the system may not follow any recognizable object-oriented style.
For years there have been suggestions that object-oriented programming has
reached its peak, that nothing new is to be discovered. I believe that objects will
continue to drive innovation and will ultimately play a key role in the future
of software development. However, it is still to be seen whether objects can
maintain their position as a fundamental unifying concept for software designs,
or if a new paradigm will emerge.
References
1 Introduction
This paper is a personal viewpoint to be presented in a panel at ECOOP 2006 on
“Summing up the Past and trying to outline the Future”. It is not a scientific paper and
there has been no attempt to include all relevant references.
ECOOP’87 in Paris was an important milestone in the history of object-oriented
programming. ECOOP’87 and the first OOPSLA’86 in Portland mark the point in
time where object-oriented programming started to become mainstream in research as
well as in industry. For this author the SIMULA languages [4], Smalltalk [9] and C++
[21] were important programming language milestones towards the success of object-
oriented programming. In addition, the development of object-oriented methodologies
[6] was also important for the increasing adoption of object-orientation.
The history of object-orientation started more than 20 years before ECOOP’87.
The SIMULA languages were developed in the early sixties by Ole-Johan Dahl and
Kristen Nygaard. SIMULA 67 contained most of the central concepts now available
in mainstream object-oriented languages. And SIMULA was one of the languages
described at the first History of Programming Languages conference in 1978 [24]. In
the seventies and eighties, Smalltalk was developed by Alan Kay, Adele Goldberg
and co-workers. The highly dynamic Smalltalk system was clearly a major reason for
the huge interest in object-orientation. And finally C++ as developed by Bjarne
Stroustrup in the early eighties demonstrated that object-oriented concepts could be
used for development of efficient software systems.
There is no doubt that ECOOP for more than 20 years has served as the main
scientific conference for object-oriented software systems. In this paper we will
reflect upon what we consider the main contributions of object-oriented programming
since ECOOP’87 and discuss challenges for the future.
D. Thomas (Ed.): ECOOP 2006, LNCS 4067, pp. 186 – 191, 2006.
© Springer-Verlag Berlin Heidelberg 2006
From ECOOP'87 to ECOOP 2006 and Beyond 187
References
1. M. Aksit, L. Bergmans, S. Vural: An Object-Oriented Language-Database Integration
Model: The Composition-Filters Approach, ECOOP'92, LNCS 615, Springer-Verlag,
1992.
2. K. Beck: Extreme Programming Explained: Embrace Change. Addison-Wesley, 2000.
3. Critical Pervasive Computing: Research proposal. Department of Computer Science,
Aarhus University, 2005.
4. G. Birtwistle, O.-J. Dahl, B. Myrhaug, K. Nygaard: SIMULA BEGIN. Studentlitteratur,
Lund, Sweden, 1979.
5. J.-P. Briot, A. Yonezawa: Inheritance and Synchronization in Concurrent OOP.
ECOOP'87, LNCS 276, Springer-Verlag, 1987.
6. P. Coad, E. Yourdon: Object-Oriented Analysis. Yourdon Press Computing Series, 1990.
7. A. Cockburn: Crystal Clear: A Human-Powered Methodology for Small Teams. Addison-
Wesley, 2004.
8. E. Gamma, R. Helm, R.E. Johnson, J. Vlissides: Design Patterns: Elements of Object-
Oriented Software Architecture. Addison-Wesley, 1994.
9. A. Goldberg, D. Robson: Smalltalk-80, the Language and its Implementation, Addison-
Wesley, 1983.
10. B.B. Kristensen, O.L. Madsen, B. Møller-Pedersen, K. Nygaard: Classification of Actions
or Inheritance also for Methods. ECOOP’87, LNCS 276, Springer-Verlag, 1987.
11. B.B. Kristensen, O.L. Madsen, B. Møller-Pedersen, K. Nygaard: Object-Oriented
Programming in the BETA Programming Language, ACM Press/Addison-Wesley, 1993.
12. G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C.V. Lopes, J.-M. Loingtier, J. Irwin:
Aspect-Oriented Programming, ECOOP’97, LNCS 1247, Springer-Verlag, 1997.
13. O.L. Madsen, B. Møller-Pedersen: What Object-Oriented Programming may be and what
it does not have to be. ECOOP’88, LNCS 322, Springer Verlag, 1988.
14. O.L. Madsen: Open Issues in Object-Oriented Programming – A Scandinavian
Perspective. Software Practice and Experience, Vol. 25, No. S4, Dec. 1995.
15. O.L. Madsen: Towards a Unified Programming Language. ECOOP 2000, LNCS 1850,
Springer Verlag, 2000.
16. B. Meyer: Genericity versus Inheritance. OOPSLA’86, SIGPLAN Notices, 21(11), 1986.
17. K. Nygaard: Basic Concepts in Object-Oriented Programming. SIGPLAN Notices, 21(10),
1986.
18. PalCom: Palpable Computing – a new perspective on ambient computing. www.ist-
palcom.org
19. T. Rentsch: Object Oriented Programming. SIGPLAN Notices, 17(9), 1982.
20. C. Szyperski: Component Software – Beyond Object-Oriented Programming. Addison-
Wesley, 1997.
21. B. Stroustrup: The C++ Programming Language. Addison-Wesley, 1986.
22. B. Stroustrup: What is “Object-Oriented Programming”? ECOOP’87, LNCS 276,
Springer-Verlag, 1987.
23. D. Ungar, R.B Smith: SELF: The Power of Simplicity. OOPSLA’87, SIGPLAN Notices,
22(12), 1987.
24. R.L. Wexelblat: History of Programming Languages, ACM/Academic Press, 1981.
The Continuing Quest for Abstraction
Henry Lieberman
D. Thomas (Ed.): ECOOP 2006, LNCS 4067, pp. 192 – 197, 2006.
© Springer-Verlag Berlin Heidelberg 2006
represent a true advance. However, work continues in making them more capable and
precise, and developing tools to help programmers work with them.
Second, there is the Aspects movement. Aspects were motivated, again, by the
limits of functional abstraction. Again, researchers noted that there were significant
ideas about how programs should be organized that crossed functional boundaries.
Again, also, these ideas were easily expressed by programmers in natural language,
but difficult to capture in programming languages. While Patterns try to capture
notions that are bigger, in some sense, than a single object, Aspects try to model
notions that cut across small pieces of multiple objects, and would otherwise
necessitate multiple unsynchronized edits and traces.
Finally, I think a future direction that will be important is to understand the
dynamic nature of abstraction. By and large, current ideas of abstraction in
programming languages are about static abstraction; at the time of programming, the
programmer decides what abstractions need to be made and how they are expressed.
But increasingly, abstraction will have to be done on-the-fly, by applications as they
are running and interacting with users. The reason for this is the increasing prevalence
of context-sensitive applications, sparked by interest in mobile, distributed and
personalized applications.
In some sense, abstraction and context-sensitivity are in opposition. Abstraction
gains its power from ignoring details of a particular situation, which is the last thing
you want to do if you want to make an application context-sensitive! But the key to
resolving this dilemma is to introduce abstraction dynamically. If the program can
decide, on the fly, what aspects to ignore or to take into account according to the
particular situation, it can be both general and adaptable at the same time. This will
bring object-oriented programming much closer to machine learning.
The next section will present some work I have done with my colleague Hugo Liu
in exploring the perhaps crazy idea that people could program computers directly in
natural language. Other interesting approaches to expressing computational ideas in
natural language are also being explored, such as Lotfi Zadeh's Computing with
Words [9].
There are, of course, other ways to express ideas than talking about them with
words. Ideas can also be expressed visually, and the idea of visual programming has
been explored, notably in research reported in IEEE Human-Centric Computing
(formerly Visual Languages). I think the field of Object-Oriented Programming
should take the idea of visual programming more seriously. Some of the objections
often raised against visual programming, such as visual programs "taking up more
screen space" are obsolete and/or easily overcome. Visual programming could
empower a whole new generation of people to do programming, who are "visual
thinkers" by nature, who are currently disenfranchised by the overly verbal and
symbolic nature of today's programming languages. Approaches which use graphical
manipulation to introduce new ways of working with programs, such as the
remarkable SubText of Jonathan Edwards [1], represent innovations that deserve
more attention.
Finally, ideas are expressed not only in words or pictures, but in actions. There is
the possibility that the computer could use a sequence of actions performed by, or
observed from, the user as a representation. I am a big proponent of this approach,
under the name of Programming by Example. I won't go further into it here, but refer
the reader to my 2001 book [2] and 2006 collection on End-User Programming [3].
In this section, I want to present, from my current work, an example of what I think
will be an important, but perhaps controversial, direction for Object-Oriented
Programming. I don't mean to say that all programming should be done this way, but I
think it is an example of new kinds of approaches that ought to be considered.
In the Metafor project [4, 5, 6, 7, 8] we are exploring the idea of using descriptions
in a natural language like English as a representation for programs. While we cannot
yet convert arbitrary English descriptions to fully specified code, we can use a
reasonably expressive subset of English as a conceptualization, visualization, editing
and debugging tool. Simple descriptions of program objects and their behavior are
converted to scaffolding (underspecified) code fragments that can be used as
feedback for the designer, and which can later be elaborated.
Roughly speaking, noun phrases can be interpreted as program objects; verbs can
be functions, adjectives can be properties. It is our contention that today's natural
language technology has improved to the point that reasonable, simple, descriptions
of simple procedural and object oriented programming ideas can be understood
(providing, of course, the user is trying to cooperate with the system, not fool it).
There's no need to impose a rigid, unforgiving syntax on a user. Interactive dialogues
can provide disambiguation when necessary, or if the system is completely stuck, it
falls back on the user.
Fig. 1. The Metafor natural language programming system. At the lower left is the user's
natural language input. At the lower right the automatically generated code. The two top
windows trace the parser's actions and are not intended for the end user.
Metafor has some interesting capabilities for refactoring programs. Different ways
of describing objects in natural language can give rise to different representation and
implementation decisions as embodied in the details of the code. Conventional
programming requires making up-front commitments to overspecified details, and
saddles the user with having to perform distributed, error-prone edits in order to
change design decisions. Metafor uses the inherent "ambiguity" of natural language as
an advantage, automatically performing refactoring as the system learns more about
the user's intent. For example,
a) There is a bar. (Single object. But what kind of "bar"?)
b) The bar contains two customers. (unimorphic list. Now, a place serving alcohol)
c) It also contains a waiter. (unimorphic wrt. persons)
d) It also contains some stools. (polymorphic list)
e) The bar opens and closes. (class / agent)
f) The bar is a kind of store. (inheritance class)
g) Some bars close at 6pm. (subclass or instantiatable)
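The noun-phrase-to-object mapping above can be caricatured with a toy translator (entirely hypothetical and vastly simpler than Metafor; the sentence patterns and function name are invented for illustration):

```python
# Toy sketch (hypothetical; not Metafor's implementation): "There is a X"
# sentences introduce classes, and "The X contains ... Y" sentences add
# attributes to an already-introduced class.
import re

def scaffold(sentences):
    classes = {}
    for s in sentences:
        m = re.match(r"There is an? (\w+)\.", s)
        if m:
            classes.setdefault(m.group(1), [])
            continue
        m = re.match(r"The (\w+) contains (?:a |an |some |two )?(\w+)\.", s)
        if m and m.group(1) in classes:
            classes[m.group(1)].append(m.group(2))
    return classes

print(scaffold(["There is a bar.", "The bar contains two customers."]))
# {'bar': ['customers']}
```

Metafor's real contribution, of course, is everything this sketch omits: disambiguation, dialogue, and refactoring the scaffolding as later sentences revise earlier interpretations.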
More details about Metafor and natural language programming appear in the
references. Many details and problems still need to be worked out. But we present this
as an example that radical new approaches to the programming problem need to be
considered if Object-Oriented Programming is to advance in the future.
I look forward to the next 20 years of ECOOP!
References
1. Jonathan Edwards. Subtext: Uncovering the Simplicity of Programming. 20th annual ACM
SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and
Applications (OOPSLA). October 2005, San Diego, California.
2. Henry Lieberman, ed., Your Wish is My Command: Programming by Example, Morgan
Kaufmann, San Francisco, 2001.
3. Henry Lieberman, Fabio Paterno and Volker Wulf, eds., End-User Development, Springer,
2006.
4. Hugo Liu and Henry Lieberman (2004) Toward a Programmatic Semantics of Natural
Language. Proceedings of VL/HCC'04: the 20th IEEE Symposium on Visual Languages
and Human-Centric Computing. pp. 281-282. September 26-29, 2004, Rome. IEEE
Computer Society Press.
5. Hugo Liu and Henry Lieberman (2005) Programmatic Semantics for Natural Language
Interfaces. Proceedings of the ACM Conference on Human Factors in Computing Systems,
CHI 2005, April 5-7, 2005, Portland, OR, USA. ACM Press.
6. Hugo Liu and Henry Lieberman (2005) Metafor: Visualizing Stories as Code. Proceedings
of the ACM International Conference on Intelligent User Interfaces, IUI 2005, January 9-
12, 2005, San Diego, CA, USA, to appear. ACM 2005.
7. Henry Lieberman and Hugo Liu. Feasibility Studies for Programming in Natural Language.
H. Lieberman, F. Paterno, and V. Wulf (Eds.) Perspectives in End-User Development, to
appear. Springer, 2006.
8. Rada Mihalcea, Henry Lieberman and Hugo Liu. NLP for NLP: Natural Language
Processing for Natural Language Programming, International Conference on Computational
Linguistics and Intelligent Text Processing, Mexico City, Springer Lecture Notes in
Computer Science, February 2006.
9. Lotfi Zadeh, Precisiated natural language (PNL), AI Magazine, Volume 25, Issue 3 Pages:
74 – 91, Fall 2004.
Early Concurrent/Mobile Objects
– Modeling a Simple Post Office –
Akinori Yonezawa
2 Concurrent Objects
After some trials of developing frameworks, I came up with my own notion of ob-
jects which abstract away or model entities that interact with each other in the
domain (or world). To me, it was very suitable for modeling things in the world.
In explaining my notion of concurrent objects, I often used an anthropomor-
phic analogy. Things are modeled as autonomous information processing agents
called concurrent objects, and their mutual message transmissions abstract away
various forms of communications found in human or social organizations in such
a way that they can be realized by the current computer technology without
great difficulties.
In our approach, the domain to be modeled/designed/implemented is repre-
sented as a collection of concurrent objects and the interaction of the domain
components is represented as concurrent message passing among such objects.
Domains where our approach is powerful include distributed problem solving,
modeling of human cognitive process, modeling and implementation of real-time
systems and operating systems, design and implementation of distributed event
simulation etc.
Although it is not so difficult to give a thorough mathematical account of the
notion of concurrent objects, for the sake of convenience, let me give an intuitive
characterization of concurrent objects (COs) below. Each CO
– has a globally unique identity/name,
– may have a protected, yet updatable local memory,
– has a set of procedures that manipulate the memory,
– receives a message that activates one of the procedures,
– has a single FIFO queue for arrived messages, and
– has autonomous thread(s) of control.
In each CO, memory-updating procedures are activated one at a time, in the
corresponding message-arrival order. The contents of the memory of a CO, which
is the local state of the CO, can be defined at the time of message arrival,
owing to its single FIFO message queue. Each CO can send messages to the
set of COs whose ids the CO knows about at the time of sending. This means
that communication is point-to-point and any CO can send messages as long
as it remembers the names of destination COs. As the memory of a CO can be
updatable, a CO can forget names of other COs. So the set of objects to which
a CO can send messages varies from time to time. So communication topology is
dynamic. Also any CO can dynamically create COs. Messages may contain not only
pointers to COs, but also the code of COs. This mechanism corresponds to what
has been called code migration or code mobility.
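The CO characterization above can be caricatured in Python (a minimal sketch of my own, not code from the paper: one protected memory, one FIFO mailbox, one autonomous thread activating procedures in arrival order; the class and selector names are invented):

```python
# Minimal concurrent-object sketch: protected local memory, a single FIFO
# mailbox, and one autonomous thread that activates memory-updating
# procedures one at a time, in message-arrival order.
import queue
import threading

class ConcurrentObject:
    def __init__(self):
        self._memory = 0                 # protected, updatable local memory
        self._mailbox = queue.Queue()    # single FIFO queue for messages
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, selector, *args):     # asynchronous, point-to-point
        self._mailbox.put((selector, args))

    def _run(self):
        while True:
            selector, args = self._mailbox.get()
            if selector == "stop":
                return
            getattr(self, "_" + selector)(*args)  # one procedure at a time

    def _add(self, n):                   # a memory-updating procedure
        self._memory += n

co = ConcurrentObject()
for i in range(1, 5):
    co.send("add", i)
co.send("stop")
co._thread.join()
print(co._memory)  # 10
```

Because all updates funnel through the single FIFO queue, the local state is well defined at each message arrival, exactly as described above.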
Messages are sent by COs, not by other components in our framework. When
a message contains a CO (not its id), the CO is actually (forced to be) moved
by the sending CO. This mechanism is rather too strong for modeling interactions
among domain components. Thus, I restricted message passing in such a way that a
message containing a CO can be transmitted only when the CO sends the very
CO itself, not when other COs do. This restriction allows a CO to move of
its own will, but never to be forced by other COs. It should be noted that while a
CO is moving, it can send and receive messages.
(Figure: a scenario with counter-section, mailbox, and door objects)
Two kinds of objects, customer objects and the door object, are in our domain. The next
step is to model the interactions of the two kinds of objects. In
the message passing paradigm, arrivals or transmissions of messages are the sole
events of interaction among COs. An event of a customer C going through the
door is naturally represented as an arrival of a message M at the door object D
where M contains the customer object C itself (not the id of C). Then, the door
object D sends to the customer object C a message M’ requesting C to move to
the counter-section object CS. As the customer object C does not know where
the counter-section object CS is, the door object D should supply the informa-
tion of the location/name of the counter-section object. This information is, of
course, given in the message M' requesting the customer to change its location.
Turning the Network into a Database
with Active XML
Serge Abiteboul
mean web forms or more generally web services providing access to data sources.
This information is published on the web, and accessed via (subscription) queries,
web sites and reports. For the model, we insist on two aspects:
Our two theses. The first thesis is that "intensional" and "active" data à la
AXML should form the basis for the logical model of the data ring. The in-
tensional component enables the sharing of information and knowledge in a
distributed context. The active component provides support for aspects such
as pub/sub, monitoring, synchronization/reconciliation, network connectivity,
and awareness.
The second thesis is that AXML is a proper basis for the physical model
as well. Consider for instance distributed query optimization, a key issue. We
need to be able to describe and exchange distributed query execution plans.
A recent work [4] shows how this can be achieved using AXML expressions
describing standard algebraic XML operations and send/receive operators over
XML streams.
Related work and conclusion. Many ideas found here are influenced by previous
work on P2P content warehouses, see [2]. Similar ideas have been promoted in
[8]. The author of the present paper has been strongly influenced by ongoing
work with Ioana Manolescu, Tova Milo and Neoklis Polyzotis.
Some of the underlying technology has already been developed in related
software systems, e.g.: structured P2P networks such as Chord or Pastry, XML
repositories such as Xyleme or MonetDB, file-sharing systems such as BitTorrent
or Kazaa, distributed storage systems such as OceanStore or the Google File Sys-
tem, content delivery networks such as Coral or Akamai, multicast systems such
as Bullet or Avalanche, pub/sub systems such as Scribe or Hyper, application
Fig. 1. Some aspects of the comparison of relational DBMS and Data Rings
References
1. Active XML web site, activexml.net
2. S. Abiteboul: Managing an XML Warehouse in a P2P Context. CAiSE, 4-13, 2003
3. S. Abiteboul, R. Hull and V. Vianu: Foundations of Databases, Addison-Wesley,
Reading-Massachusetts, 1995
4. S. Abiteboul, I. Manolescu, E. Taropa: A Framework for Distributed XML Data
Management. EDBT, 1049-1058, 2006
5. S. Abiteboul, T. Milo, O. Benjelloun: The Active XML Project: an overview, INRIA
Internal Report,
ftp://ftp.inria.fr/INRIA/Projects/gemo/gemo/GemoReport-331.pdf
6. R. Cattell (Ed.): The Object Database Standard: ODMG-93, Morgan Kaufmann,
San Mateo, California, 1994
7. The Edos Project, http://www.edos-project.org/
8. A. Halevy, M. Franklin, D. Maier: Dataspaces: A New Abstraction for Information
Management. DASFAA, 1-2, 2006
9. T. Özsu, P. Valduriez: Principles of Distributed Database Systems. 2nd Edition,
Prentice Hall, Englewood Cliffs, New Jersey, 666 pages, 1999
10. N. Paton, O. Diaz, Active database systems, ACM Computing Surveys, Vol. 31,
No. 1, March 1999
11. W3C - The World Wide Web Consortium, http://www.w3.org/
SuperGlue: Component Programming with
Object-Oriented Signals
1 Introduction
Interactive programs are more usable than their batch counterparts.
For example, an interactive compiler like the one in Eclipse [15] can continuously de-
tect syntax and semantic errors while programmers are typing, while a batch compiler
can only detect errors when it is invoked. Interactive programs are often built out of
components that can recompute their output state as their input state changes over time.
For example, a compiler parser component could incrementally recompute its output
parse tree according to changes made in its input token list. Other examples of these
kinds of components include many kinds of user-interface widgets such as sliders and
tables.
The assembly of components together in interactive programs often involves ex-
pressing state-viewing relationships. In object-oriented languages, such relationships
are often expressed according to a model-view controller [13] (MVC) architecture. An
MVC architecture involves model and view components, and glue code that transforms
model state into view state. This glue code is often difficult to develop because most
languages lack good constructs for transforming state. Instead, changes in state are of-
ten communicated as discrete events that must be manually translated by glue code into
discrete changes of the transformed state.
This paper introduces SuperGlue, which simplifies component assembly by hiding
event handling details from glue code. Components in SuperGlue are assembled by con-
necting together their signals [10], which represent state declaratively as time-varying
values. Operations on signals are also signals whose values automatically change when
the values of their operand signals change. For example, if x and y are signals, then x
+ y is a signal whose current value is always the sum of the current values for x and
y. By operating on signals to produce new signals, SuperGlue code can transform state
between components without expressing custom event handlers.
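A minimal push-based version of this idea can be written in plain Java (a hypothetical sketch, not SuperGlue's actual runtime): a signal holds a current value and notifies its dependents, and a derived signal recomputes itself whenever one of its operand signals changes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical push-based signal: holds a current value and notifies
// dependents on change, so derived signals recompute automatically.
class Signal<T> {
    private T value;
    private final List<Runnable> observers = new ArrayList<>();

    Signal(T initial) { value = initial; }

    T get() { return value; }

    void set(T v) {
        value = v;
        for (Runnable r : observers) r.run();   // propagate the change
    }

    void onChange(Runnable r) { observers.add(r); }

    // combine(x, y, op) is itself a signal whose value tracks op(x, y);
    // it plays the role of lifted operators such as "+".
    static <T> Signal<T> combine(Signal<T> x, Signal<T> y, BinaryOperator<T> op) {
        Signal<T> out = new Signal<>(op.apply(x.get(), y.get()));
        Runnable update = () -> out.set(op.apply(x.get(), y.get()));
        x.onChange(update);
        y.onChange(update);
        return out;
    }
}
```

With this sketch, `Signal.combine(x, y, Integer::sum)` stands in for the expression `x + y` in the text.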
Although signals are abstractions in functional-reactive programming languages
[8, 10, 14], SuperGlue is novel in that it is the first language to combine signals with
object-oriented abstractions. A program is expressed in SuperGlue as a set of signal con-
nections between components. Because realistic programs often require an unbounded
number of connections, each connection cannot be expressed individually. Instead, rules
in SuperGlue can express new connections through type-based pattern matching over
existing connections. To organize these rules, the types that are used in connection pat-
tern matching are supported with three object-oriented mechanisms:
These mechanisms form a novel object system that is specifically designed to support
connection-based component programming.
As a concrete example of how SuperGlue can reduce the amount of code needed to
assemble components together, consider the SuperGlue code in Figure 1, which imple-
ments the following master-detail behavior in an email client: “the messages of a folder
that is uniquely selected in a folder view tree are displayed as rows in a message view ta-
ble.” This code glues together the folderView and messageView components, which
are respectively a user-interface tree and table. The second line of Figure 1 is a condition
that detects when only one node is selected in the folder view. The third line of Figure 1
is a connection query that detects if the first (and only) node selected in the folder view
is connected to an email folder. If the connection query is true, the connected-to email
folder is bound to the folder variable that is declared on the first line of Figure 1. If
both of these conditions are true, the fourth line of Figure 1 connects the rows signal
of the message view to the messages signal of the selected email folder.
Because the code in Figure 1 is evaluated continuously, how the rows of the message
view table are connected can change during program execution. The user could select
more than one node in the folder view tree, which causes the first condition to become
208 S. McDirmid and W.C. Hsieh
false, and then deselect nodes until only one node is selected, which causes the first
condition to become true. The user can select a node in a folder view tree that is not
an email folder, which causes the second condition to become false. When a new email
message is added to the folder whose messages are connected to the message view
table’s rows, a new row is added to the message view table. All of this behavior occurs
with only four lines of SuperGlue code. In contrast, implementing this behavior in Java
requires more than thirty lines of code because the code is exposed to event-handling
details.
SuperGlue components are implemented with either SuperGlue code or Java code.
When implemented with Java code, signals are represented with special Java interfaces
that enable the wrapping of existing Java libraries. For example, we have implemented
SuperGlue components around Java’s Swing [24] and JavaMail [23] class libraries. The
advantage of this dual language approach is that an expressive language (Java) can be
used to implement components, while modularity is significantly enhanced by having
components interact through SuperGlue’s declarative signals.
This paper describes SuperGlue and how it reduces the amount of glue code needed
to build interactive programs out of components. Section 2 details why components are
difficult to assemble together in interactive programs. Section 3 introduces SuperGlue
and provides examples of how it is used. Section 4 evaluates SuperGlue through a case
study that compares an email client implementation in both SuperGlue and Java. Sec-
tion 5 describes SuperGlue’s syntax, semantics, and implementation. Section 6 presents
related work and Section 7 summarizes our conclusions.
2 Motivation
Fig. 2. Java glue code that implements and installs the observer objects of a message view component
tree node selection events are transformed into a condition that determines what email
folder’s messages are displayed. Detecting the discrete boundaries where this condition
changes requires substantial logic in the selectionObserver implementation. For
example, when a new node is selected in the tree view, code is needed to check if one
folder was already selected, in which case the condition becomes false, or if one folder
has become selected, in which case the condition becomes true.
Two approaches can currently be used to improve how the glue code of an interactive
program is written. First, programming languages can reduce event handler verbosity
with better syntax; e.g., through closure constructs and dynamic typing. Programming
languages that follow this approach include Python [25], Ruby [19], and Scala [21].
However, although glue code in these languages can be less verbose than in Java, it is
often not less complicated because programmers must still deal with the same amount
of event handling details. In the second approach, standardized interfaces can be used in
component interfaces to hide event handling details from glue code. For example, the
ListModel Swing interface listed in Figure 3 can be used to standardize how element ad-
dition and removal events are transmitted and received by different components. Using
list model interfaces in our email client example, displaying email messages as rows in
a table view can be reduced to the following line of code:
tableView.setRows(folder.getMessages());
interface ListModel {
int getSize();
Object getElementAt(int index);
void addDataListener(ListDataListener listener);
void removeDataListener(ListDataListener listener);
}
Fig. 3. Swing’s ListModel Java interface, which describes a list with observable changes in
membership
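As a sketch of the producer side of this interface, the following class (hypothetical, though built on Swing's real `AbstractListModel` helper) fires the standard list-data events on addition, so any attached view refreshes without custom glue code:

```java
import javax.swing.AbstractListModel;

// Sketch: a message list exposed through Swing's ListModel interface, so a
// list/table-style view learns about additions via standardized events
// rather than hand-written glue code. The class name is invented.
class MessageListModel extends AbstractListModel<String> {
    private final java.util.List<String> messages = new java.util.ArrayList<>();

    @Override public int getSize() { return messages.size(); }
    @Override public String getElementAt(int i) { return messages.get(i); }

    void addMessage(String m) {
        int index = messages.size();
        messages.add(m);
        fireIntervalAdded(this, index, index);  // standardized change event
    }
}
```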
Fig. 4. An illustration of a SuperGlue program’s run-time architecture; “use” means the Java
code is using imported signals through a special Java interface; “implement” means Java code is
providing exported signals by implementing special Java interface
Unfortunately, standardized interfaces cannot easily improve glue code that performs
state transformations. For example, the glue code in Figure 2 selects what folder’s mes-
sages are displayed using a condition. Expressing this condition with standardized in-
terfaces requires redefining how ==, &&, and, most significantly, if operations work.
Although the resulting code could be syntactically expressed in Java, such code would
be very verbose and not behave like normal Java code.
3 SuperGlue
The problems that are described in Section 2 occur when glue code is exposed to events
that communicate state changes between components. When event handling can be
hidden from glue code with standardized interfaces, these problems disappear and glue
code is much easier to write. However, the use of standardized interfaces to express state
transformations in glue code requires layering a non-standard semantics on top of the
original language. Instead, these standardized interfaces should be supported with their
own syntax. In SuperGlue, standardized interfaces are replaced with signal language
abstractions that represent mutable state declaratively as time-varying values.
The architecture of a SuperGlue program is illustrated in Figure 4. A SuperGlue pro-
gram is assembled out of components that interact by viewing each others’ state through
signals. A component views state through its imported signals, and provides state for
viewing through its exported signals. SuperGlue code defines program behavior by oper-
ating on and connecting the signals of the program’s components together. Components
in SuperGlue can be implemented either in SuperGlue or Java code, while components
are always assembled together with SuperGlue code. Components are implemented in
Java code according to special Java interfaces that are described in Section 5. For the rest
of this section, we focus on the SuperGlue code that assembles components together.
atom Thermometer {
export temperature : Int;
}
atom Label {
import text : String ;
import color : Color ;
}
This code connects the text signal that is imported into the view component to an ex-
pression that refers to the temperature signal that is exported from the model com-
ponent. Because of this connection and the Java implementations of the Label and
Thermometer atoms, whenever the temperature measured by the model component
changes, the text displayed in the view component is updated automatically to reflect
this change.
Connections are expressed in rules with conditions that guard when the connections
are able to connect signals. When all the conditions of a rule evaluate to true, the rule is
active, meaning that the source expression of its signal connection can be evaluated and
used as the sink signal’s value. Connection rules in SuperGlue are expressed as C-like
if statements. As an example of a rule, consider the following code:
This code connects the foreground color of the view component to the color red when
the current temperature measured by the model component is greater than 30. When
Fig. 6. Glue code that causes the color of the view label component to change according to the
current temperature measured through the model thermometer component
the current temperature is not greater than 30, the condition in this code prevents red
from being used as the foreground color of the view component.
Although rules in SuperGlue resemble if statements in an imperative language, they
have semantics that are declarative. Conditions are evaluated continuously to determine
if the connections they guard are active. In our example, the current temperature can
dynamically go from below 30 to above 30, which causes the view component’s fore-
ground color to become red. In SuperGlue’s runtime, this continuous evaluation is trans-
parently implemented with event handling that activates the port connection when the
current temperature rises above 30.
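That transparent implementation can be pictured as ordinary observer code; the sketch below is an assumption about what such generated event handling might look like (all names are invented), re-evaluating the guard whenever the temperature changes:

```java
// Assumed sketch of the runtime's strategy: the declarative rule
// "if (temperature > 30) color = red" becomes an event handler that
// re-evaluates the guard on every temperature change. Names are invented.
class ThermometerModel {
    private int temperature;
    private final java.util.List<Runnable> listeners = new java.util.ArrayList<>();
    void setTemperature(int t) { temperature = t; listeners.forEach(Runnable::run); }
    int getTemperature() { return temperature; }
    void onChange(Runnable r) { listeners.add(r); }
}

class LabelView { String color = "black"; }

class RuleRuntime {
    static void connect(ThermometerModel model, LabelView view) {
        Runnable reevaluate = () ->
            view.color = model.getTemperature() > 30 ? "red" : "black";
        model.onChange(reevaluate);   // continuous evaluation via events
        reevaluate.run();             // establish the initial connection
    }
}
```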
Multiple rules that connect the same signal form a circuit that controls how the sig-
nal is connected during program execution. At any given time, any number of rules in
a circuit can be active. If all rules in a signal’s circuit are inactive, then the signal is un-
connected. If exactly one rule in a circuit is active, then the circuit’s signal is connected
according to that rule. It is also possible for multiple rules in a circuit to be active at the
same time, even though only one of them can connect the circuit's imported signal. If
these rules cannot be prioritized, then the signal is connected ambiguously.
To explicitly prioritize rules in a circuit, rules can be expressed in the body of an
else clause so that they are never active when rules that are expressed in the body of the
corresponding if clause are active. As an example of a circuit, consider the glue code in
Figure 6, which continuously connects a different color to the view label’s foreground
color depending on the current temperature. As the current temperature falls below 30,
the foreground color of the view label changes from red to black. Likewise, as the
current temperature falls below 0, the foreground color of the view label changes from
black to blue. The color connection code in Figure 6 forms a circuit that is illustrated in
Figure 7. Conditions, which are ovals, control whether a connection is active based on
their test inputs. Multiple connections are organized into a switch that passes through
the highest priority connection, which is closest to hi, that is currently active.
With the basic connection model that has been described in this section, each signal
connection in a program is encoded separately. This model has two limitations:
– Programs that deal with stateful graph-like structures such as lists and trees cannot
be expressed very effectively. Graph-like structures are unbounded in their sizes
and therefore cannot be expressed as a fixed number of direct connections between
components.
– Many connections conform to patterns that are repeated many times within the
same program or across different programs. If each connection must be encoded
separately, then these patterns cannot be modularized and connection code can be-
come very repetitive.
thermometer label
temperature text
> 30 < 0
hi
red
switch
blue color
black
lo
Fig. 7. An illustration of a circuit that connects to the color signal of the view label component;
rounded rectangles are components; boxes that end with triangles are signals; ovals are conditions
with outgoing results; and small boxes activate connections according to their incoming condition
results; the multiplexor “switch” only passes through the highest (closest to hi) active connection
If unaddressed, these two limitations prevent the effective construction of real programs
in SuperGlue. Consider the email client that is used as an example in Section 2. This
email client consists of a user-interface tree that displays a hierarchy of mailboxes and
email folders. Because the size of this tree is not known until run-time, it cannot be
expressed with a fixed number of connections. Additionally, the connections in this
tree are largely homogeneous; e.g., the connections used to relate one tree node to one
email folder are repeated to relate another tree node to another email folder. As a result,
specifying each connection individually would result in very repetitive code.
atom TreeView {
inner Node {
import text : String ;
import children : List[T : Node];
}
import root : Node;
export selected : List[T : Node];
}
Fig. 8. The TreeView atom and the Node inner object type that is nested in the TreeView
atom
atom Mailbox {
inner Message {...}
inner Folder {
export sub folders : List[T : Folder];
export messages : List[T : Message];
}
export root folder : Folder;
}
Fig. 9. The Mailbox atom and the Message and Folder inner object types that are nested in
the Mailbox atom
size signal and a signal of type Node for each of its elements; e.g., the children sig-
nal contains [0] and [1] signals, which are both of type Node. We describe the List
trait in Section 3.3.
At run-time, an object in SuperGlue is a vertex in the program’s connection graph
that is connected to other objects. The declared type of a signal describes the object that
is attached to the signal. For example, an imported root signal of declared type Node
is attached to a Node object. The type of a signal does not restrict how the signal can
be used in a connection. In fact, the types involved in the same connection do not have
to be related in any way. Instead, types describe how objects are connected as a result
of connections. For example, consider the following SuperGlue code, which uses the
Mailbox atom that is declared in Figure 9:
This rule connects an exported root folder signal of type Folder to an imported
root signal of type Node. The Folder and Node types of the connection’s objects are
entirely unrelated. Despite the unrelated types, this connection is allowed because other
rules can resolve the incompatibilities that occur between these two objects.
Unlike components, inner objects are not instantiated in glue code. Instead, the cre-
ation and identity of an inner object is managed inside its containing component. As a
result, inner object types can describe a component’s interface without revealing details
about how the component is implemented. For example, the folder and message objects
This code declares a node variable, which abstracts over all Node objects in the fol-
derView component. Variables must be bound to values when they are used in rules.
A variable is bound to a value when it is used as a connection target, which means its
signal is being connected in the connection rule. As an example of how a variable is
bound when it is a connection target, consider the following code:
Because the node variable in this code is the target of a connection, it is always bound
to some tree node object when the rule is evaluated. The way that variables are bound
when they are used as connection targets resembles how procedure arguments are bound
in a procedural language: evaluating a connection binds the connection's target
to a value, just as calling a procedure binds the procedure's arguments to
values.
In conjunction with variables, rules can abstract over connections in a program by
identifying how objects are connected with connection queries. A connection query is
a condition of the form sink = source, where sink and source are expressions.
A connection query succeeds only if sink is connected to a value that is equivalent to
source, where unbound variables referred to in source can become bound to facilitate
this equivalence. Whether a variable can be bound to a value depends on the value’s type
being compatible with the variable’s type. As an example, consider the following code:
1. The node variable, which is the target of the rule defined in this code, is bound to
a Node object whose children signal is being accessed.
2. The connection query node = folder checks if the targeted Node object is con-
nected to a Folder object of a Mailbox component. If it is, then the folder
variable is bound to the connecting Folder object.
3. If the connection query is true, the imported children signal of the Node object
is connected to the exported sub folders signal of the Folder object.
Variables and connection queries allow rules in SuperGlue to be reused in multiple
connection contexts. For example, because of the following rule, the root node object
of the folder view tree is connected to the root folder object of a mailbox:
folderView.root = mailbox.root folder;
atom TableView {
inner Row { ... }
import rows : List[T : Row];
}
let messageView = new TableView;
var folder : Mailbox.Folder;
if (folderView.selected.size = 1 &&
folderView.selected[0] = folder)
messageView.rows = folder.messages;
Fig. 10. The TableView atom and code that describes how the rows of a message view table
are connected in an email client
When the folder view tree’s implementation accesses the children signal of its root
node object, the node variable target of the following rule, which was just described, is
bound to the root node object:
Because of the connection query in this rule, the folder variable is bound to the
root folder object of the mailbox object. As a result, the children signal of the root
node object is connected to the root folder object’s sub folders signal. In turn, the
same rule connects the folderView.root.children[0].children signal to the
mailbox.root folder.sub folders[0].sub folders signal, and so on. Despite
the recursion, this rule will terminate because it is only evaluated when a tree node is
expanded by a user.
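The on-demand evaluation that makes this recursion safe can be illustrated with a lazy-children sketch in plain Java (hypothetical; not SuperGlue's implementation): the rule for a node's children is stored as a thunk and only run when the node is expanded, so even a conceptually infinite tree terminates.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of lazy rule evaluation: children are computed by a
// stored rule (a thunk) only when the node is actually expanded.
class LazyNode {
    final String label;
    private final Supplier<List<LazyNode>> childrenRule;  // the "connection rule"
    private List<LazyNode> children;                      // null until first expansion

    LazyNode(String label, Supplier<List<LazyNode>> childrenRule) {
        this.label = label;
        this.childrenRule = childrenRule;
    }

    List<LazyNode> expand() {
        if (children == null) children = childrenRule.get();  // run rule on demand
        return children;
    }

    // A conceptually infinite folder hierarchy; safe because of laziness.
    static LazyNode folderTree(int depth) {
        return new LazyNode("folder-" + depth,
                            () -> List.of(folderTree(depth + 1)));
    }
}
```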
Our use of variables and connection queries to build trees resembles how hierar-
chical structures are often built in user interfaces using conventional object-oriented
languages. For example, in Java’s Swing [24] library, TreeModel objects often exam-
ine the run-time types of objects, which are used as nodes, to compute their child nodes.
Compared to this approach, expressing a tree in SuperGlue requires less code because
of SuperGlue’s direct support for expressing connection graphs.
SuperGlue’s signal and object-oriented abstractions are designed to work seamlessly
together. Circuits that change how signals are connected at run-time consequently
change how objects are connected. Additionally, because signal expressions can be used
as the source of a connection query, if and how a connection query is true can change
at run-time. As an example, consider the SuperGlue code in Figure 10, some of which
was presented in Section 1. The second condition of the rule defined in this code is
a connection query that tests if the first and only selected node of a folder view tree is
connected to a folder object. As node selection in the folder view changes, the condition
can become false if the newly selected node is not a folder object. Alternatively, a
different folder object can become bound to the folder variable if the newly selected
node is another folder. The latter situation would cause the rows of the message view
table to be connected to a different list of email messages, which in turn causes the
message view table to be refreshed.
3.2 Traits
SuperGlue traits are similar to Java interfaces in that they lack implementations. Traits
serve two purposes: they enable the reuse of signal declarations in different component
prototypes; and they enable coercions that adapt incompatible objects by augmenting
them with new behavior.
Unlike the signals declared in atoms and compounds, signals declared in traits
use the port keyword, meaning that whether they are imported or exported is not
fixed. As an example, the following SuperGlue code declares the Labeled trait with
one label signal:
trait Labeled { port label : String; }
An atom or compound can extend a trait by specifying if the trait’s signals are imported
or exported. As an example, the following code declares that the Node inner object type
imports the signals of the Labeled trait:
atom TreeView { inner Node imports Labeled { ... } }
Whenever an object that imports a trait is connected to another object that exports the
same trait, the trait’s declared signals in both objects are connected by default. As an
example, first consider having the Folder inner object type of a Mailbox atom export
the Labeled trait:
atom Mailbox { inner Folder exports Labeled { ... } }
When Node objects are connected to Folder objects, the rule node.label = fol-
der.label is implied and does not need to be specified in glue code.
The example in the previous paragraph is not very modular: the Folder inner object
type, which is an email concern, should not implement the Labeled trait, which is a
user-interface concern. In general, components should only declare signals that are nec-
essary to express their state and capabilities. Traits for other concerns can be externally
implemented as coercions by using variables and connection queries (Section 3.1) to
specify how the components’ signals are translated between the traits’ signals. As an
example of a coercion, consider the SuperGlue code in Figure 11, where the Folder
object does not implement the Labeled trait. Instead, the rule in Figure 11 specifies
how the label signal of an object that imports the Labeled trait is connected when
the object is connected to a Folder object. This rule then applies to each tree Node
object that is connected to an email Folder object.
atom Mailbox {
inner Folder { export name : String; }
...
}
var labeled : Labeled ;
var folder : Mailbox.Folder;
if (labeled = folder) labeled.label = "Folder " + folder.name;
Fig. 11. An example of how a Labeled trait coercion is defined for Folder objects that are
nested in Mailbox components
3.3 Extension
SuperGlue’s support for type extension enables the refinement of inner object types and
traits. Extension in SuperGlue differs from extension in conventional object-oriented
languages in the way that it is used to prioritize rules in circuits. Such prioritization is
the only way in SuperGlue to prioritize rules that are specified independently.
Given two SuperGlue variables, one variable is more specific than the other variable
if the type of the former variable extends the type of the latter variable. Connections
are then prioritized based on the specificity of their involved variables. As an example,
consider the following code:
trait Bird { port canFly : Boolean; }
trait Penguin extends Bird;
var aBird : Bird;
var aPenguin : Penguin;
aBird.canFly = true;
aPenguin.canFly = false;
The Penguin trait extends the Bird trait, so the type of the aPenguin variable is more
specific than the type of the aBird variable. As a result, the connection of a penguin
object’s canFly signal to false has a higher priority than the connection of a bird
object’s canFly signal to true. In this way, penguin behavior overrides more generic
bird behavior.
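The specificity-based prioritization just described can be sketched as follows. This is our own model of the assumed semantics (resolution picks the applicable rule whose declared type is deepest in the extension chain), not SuperGlue's actual rule engine:

```java
import java.util.*;

// Sketch: among applicable rules, the rule whose declared type is most
// specific (deepest in the trait-extension chain) has the highest priority.
public class Specificity {
    // trait name -> extended trait (null at the root)
    static final Map<String, String> EXTENDS = new HashMap<>();

    static int depth(String trait) {
        int d = 0;
        for (String t = trait; t != null; t = EXTENDS.get(t)) d++;
        return d;
    }

    static boolean subtypeOf(String sub, String sup) {
        for (String t = sub; t != null; t = EXTENDS.get(t))
            if (t.equals(sup)) return true;
        return false;
    }

    record Rule(String declaredType, boolean value) {}

    // Resolve canFly for an object whose runtime trait is given.
    static Boolean canFly(String runtimeTrait, List<Rule> rules) {
        Rule best = null;
        for (Rule r : rules) {
            if (!subtypeOf(runtimeTrait, r.declaredType())) continue; // not applicable
            if (best == null || depth(r.declaredType()) > depth(best.declaredType()))
                best = r; // a more specific declared type wins
        }
        return best == null ? null : best.value();
    }

    public static void main(String[] args) {
        EXTENDS.put("Penguin", "Bird"); // trait Penguin extends Bird
        List<Rule> rules = List.of(
            new Rule("Bird", true),      // aBird.canFly = true
            new Rule("Penguin", false)); // aPenguin.canFly = false
        if (canFly("Penguin", rules) != false) throw new AssertionError();
        if (canFly("Bird", rules) != true) throw new AssertionError();
        System.out.println("Penguin.canFly = " + canFly("Penguin", rules));
    }
}
```

Both rules apply to a penguin object, but the Penguin rule's declared type is more specific, so it overrides the generic Bird rule.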
Besides being used to prioritize connections, type extension is also used in Super-
Glue to refine inner object types. Generic typing is achieved in SuperGlue by refining
inner object types when their containers are extended. As an example, the List trait,
which is declared in Figure 12, describes list items through its T inner object type. The
T inner object type can be refined whenever the List trait is extended, imported, or
exported. As an example, consider the following code:
atom TableView {
  inner Row { ... }
  inner Rows imports List {
    refine T extends Row;
  }
  import rows : Rows;
}
In this code, the Rows inner object type is declared to extend the List trait through
the imports keyword. As a result, the Rows inner object type has the List trait’s
T inner object type, which itself can be refined to extend the Row inner object type
declared in the TableView component prototype. As a result, any element of the rows
list will be an object that is, or is connected from, a row object. The refinement of a single
inner object type in a trait is a common operation and is supported by SuperGlue with
syntactic sugar so that the above SuperGlue code can be re-written as follows:
atom TableView {
  inner Row { ... }
  import rows : List[T : Row];
}
In this code, the colon operator is used twice as shorthand for extension: the colon
expresses that the T inner object type from the List trait extends Row and that the type
of the rows signal imports the List trait. Similar syntax was used to declare list signals
in Figure 8, Figure 9, and Figure 10.
Alternatively, SuperGlue provides syntactic sugar to access list elements; e.g.,
table.rows[0] is equivalent to the above SuperGlue code. Array signal queries are also
used to express arithmetic expressions. For example, the expression x + y is syntactic
sugar for x.plus(operand = y).result. We use query syntax rather than argument
binding for an important reason: the coercions that are described in Section 3.2 can be
directly applied to query bindings, which would be problematic with parameter passing
semantics.
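As a rough illustration of this desugaring (our own sketch; the paper does not specify the rewriting at this level of detail), the infix expression is rewritten into a query on the left operand's signal, binding the operand and reading the result:

```java
// Sketch: x + y is treated as a query on x's hypothetical "plus" signal,
// binding "operand" to y and selecting "result".
public class Desugar {
    static String desugar(String left, String op, String right) {
        // hypothetical operator-to-signal mapping
        String name = op.equals("+") ? "plus" : "minus";
        return left + "." + name + "(operand = " + right + ").result";
    }

    public static void main(String[] args) {
        String sugared = desugar("x", "+", "y");
        if (!sugared.equals("x.plus(operand = y).result"))
            throw new AssertionError(sugared);
        System.out.println(sugared);
    }
}
```

Because the binding `operand = y` is an ordinary connection query, the coercions of Section 3.2 apply to it directly, which would not be the case with parameter-passing semantics.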
So that entire interactive programs can be expressed in SuperGlue, SuperGlue sup-
ports streams that can be used to manipulate discrete events and imperative commands.
There are two kinds of streams in SuperGlue: event streams, which intercept discrete
events, and command streams, which perform imperative commands. As an example of
how streams are used, consider the following SuperGlue code that implements delete
email message behavior:
let deleteButton = new Button;
on (deleteButton.pushed)
  if (msg : Mailbox.Message = messageView.selected)
    do msg.delete;
The last three lines of this code use streams. The on statement intercepts events where
the delete button is pushed. When the delete button is pushed, each email message se-
lected in the message view table is deleted by executing its delete command stream
in a do statement. Streams in SuperGlue are convenience abstractions that enable im-
perative programming without sacrificing the benefits of SuperGlue’s signals. Unlike
signals, we do not claim that there is a significant advantage to using SuperGlue to
express component interactions through streams.
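The on/do behavior above can be modeled roughly as follows; the class names are hypothetical and this is a sketch of the assumed stream semantics, not SuperGlue's runtime:

```java
import java.util.*;

// Sketch: an event stream fans out discrete events to handlers installed by
// "on" statements; a "do" statement runs a command stream imperatively.
public class Streams {
    static class EventStream {
        private final List<Runnable> handlers = new ArrayList<>();
        void on(Runnable handler) { handlers.add(handler); } // "on (pushed) ..."
        void fire() { handlers.forEach(Runnable::run); }     // the event occurs
    }

    static class Message {
        boolean deleted = false;
        void delete() { deleted = true; } // the msg.delete command stream
    }

    public static void main(String[] args) {
        EventStream pushed = new EventStream();   // deleteButton.pushed
        Message selected = new Message();         // messageView.selected
        // on (deleteButton.pushed) do selected.delete;
        pushed.on(selected::delete);
        pushed.fire();                            // user pushes the button
        if (!selected.deleted) throw new AssertionError();
        System.out.println("message deleted: " + selected.deleted);
    }
}
```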
4 Evaluation
Our evaluation of SuperGlue focuses on our claim that SuperGlue can substantially
reduce the amount of glue code needed to build an interactive program out of com-
ponents. More importantly, we claim that this code reduction corresponds to a similar
reduction in complexity. This section evaluates our claims using the case study of im-
plementing a complete email client in SuperGlue, which has been used as our primary
example in this paper. We then compare this SuperGlue implementation of an email
client to a feature-equivalent implementation in Java. As part of this comparison, we
have designed a strategy to compare complexity that accounts for differences between
SuperGlue and Java syntax.
Our email client is composed of user-interface and email components. For this case
study, we have implemented the necessary SuperGlue component libraries by wrapping
the Swing [24] and JavaMail [23] class libraries, which are used directly in the Java
email client. How a Java class library is wrapped into a SuperGlue component library is
described in Section 5. By far, the most complicated wrapping involves Swing’s JTree
and JTable classes, which require 444 and 611 lines of Java code, respectively. This
code is devoted to translating the Java-centric abstractions of the original classes into
SuperGlue signal abstractions. Component wrappers involve a lot of code that is not
counted in our case study because the resulting component libraries are reusable in
multiple applications. On the other hand, the need for wrapper code represents a signif-
icant amount of complexity that must be amortized by reusing the components in many
programs.
A screen shot of the SuperGlue email client is shown in Figure 13. Our comparison
case study is organized according to the code that is needed to express the following
email client features:
– Navigation, which allows a user to navigate mailboxes, folders, and messages.
Navigation is divided into three views: a folder view tree, which views the folders
of installed mailboxes, a message view table, which views rows of message headers,
and a body view form, which views the contents of a message. In Figure 13, the
folder view is in the upper left-hand corner, the message view is in the upper right-
hand corner, and the content view is in the bottom portion of the screen shot.
– Deletion, which allows a user to delete email messages. Deleted messages are high-
lighted in the message view table, and the user can expunge deleted messages in a
folder that is selected in the tree view.
– Composition, which allows a user to compose and send a new message, and reply
to an existing message.
SuperGlue: Component Programming with Object-Oriented Signals 221
The methodology used in our comparison case study involves measuring two metrics
in each implementation: lines of code and number of operations. While line counts are
accurate measures of verbosity, they are not necessarily accurate measures of complex-
ity. Verbosity and complexity are only loosely related, and code that is more verbose
can aid in readability and is not necessarily more complicated. For this reason, we also
measure the number of operations needed to implement a feature. We count only op-
erations that are defined by libraries and not built into the programming language. We
do not count type declarations, local variable assignments, control flow constructs, and
so on, which contribute to verbosity but do not make a library more difficult to use. For
example, a method call in Java and a signal connection in SuperGlue are each counted
as one operation, while variable uses in both Java and SuperGlue are not counted
as operations. Because the operations we count are related to using a library, they are a
more accurate measure of complexity than line count.
The results of our comparison are shown in Figure 14. In these results, the line count
and operation metrics are similar for each feature so, at least in this case study, opera-
tion density per line is similar between SuperGlue and Java. By far the largest reduction
in program complexity is obtained in the SuperGlue implementation of the navigation
feature. This reduction is large because the navigation feature uses trees, tables, and
forms, which are components with a large amount of signal-based functionality. Be-
sides the navigation feature, the deletion and composition features involve only a minor
amount of continuous behavior, and so their implementations do not benefit very much
from SuperGlue. Overall, SuperGlue reduces the amount of code needed to implement
an email client by almost half because the navigation feature requires much more
Java code to implement than the other two features combined.
Fig. 14. A comparison of email client features as they are implemented in SuperGlue and Java
According to the results of this initial case study, SuperGlue can provide a code re-
duction benefit when an interactive program contains a significant amount of continuous
behavior that can be expressed as signals. As mentioned in Section 3.4, SuperGlue’s use
is of little benefit in programs that involve a lot of non-continuous behavior; i.e., behav-
ior that is discrete or imperative.
5 Syntax, Semantics, and Implementation
Our prototype of SuperGlue can execute many realistic programs, including the email
client described in Section 4. Our prototype consists of an interpreter that evaluates the
circuits of signals at run-time to drive component communication. This prototype can
deal with components that are implemented either in SuperGlue (compounds) or Java
(atoms). The discussion in this section describes informally the syntax, semantics, and
implementation of SuperGlue. For space reasons, our discussion omits the following
language features: else clauses, array signals, and streams. Our discussion also does
not cover syntactic sugar that was described in Section 3.
The syntax used to declare components and traits is expressed informally in Figure 15.
Notation: a postfix asterisk indicates zero-to-many repetition, while the vertical
bar indicates choice. A program consists of a collection of atoms, compounds, and
traits. Atoms have Java implementations, which are described in Section 5.3. Compounds
are implemented in SuperGlue code (glue-code) to glue other constituent
components together. SuperGlue code is described in Section 5.2. SuperGlue code al-
ways exists in compound components, and a self-contained SuperGlue program is a
compound that lacks imports and exports.
Atoms, compounds, traits, and inner object types all have declarations (decls) of
their extended traits, signals, and inner object types. Sub-typing (<t ) is established by
trait extension for components, traits, and inner object types, and by inner object type
extension for inner object types. Inheritance behaves in the usual way; i.e., a type in-
herits all signals and inner object types of its super-types. An inner object type is a
subtype of any inner object type it mirrors in its container’s super-types: if B <t A, then
B.anInner <t A.anInner if anInner is declared in A. Additionally, inner object types
are extended virtually: if A.anInnerY <t A.anInnerX and B <t A, then B.anInnerY
<t B.anInnerX . Because of this virtual extension behavior, as inner object types are re-
fined via the refine keyword, all extending inner object types with the same container
type automatically inherit the refinements.
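The two subtyping rules for inner object types can be sketched as a small checker. The encoding below (qualified names like "A.anInnerY", maps for extension chains) is our own illustrative model of the stated rules, not the SuperGlue type checker:

```java
import java.util.*;

// Sketch: a qualified inner type "Container.Inner" is a subtype of
// "Super.Inner" when Container extends Super; refinements declared in a
// container are inherited virtually by extending containers.
public class InnerTypes {
    static final Map<String, String> EXTENDS = new HashMap<>();       // container extension
    static final Map<String, String> INNER_EXTENDS = new HashMap<>(); // "A.Y" -> "A.X"

    static boolean containerSub(String b, String a) {
        for (String t = b; t != null; t = EXTENDS.get(t))
            if (t.equals(a)) return true;
        return false;
    }

    // B.y <t A.x iff B <t A and y reaches x through (virtually inherited) refinement.
    static boolean innerSub(String bCont, String y, String aCont, String x) {
        if (!containerSub(bCont, aCont)) return false;
        for (String cur = y; cur != null; ) {
            if (cur.equals(x)) return true;
            // look up cur's refinement in bCont and each of its super-containers
            String next = null;
            for (String c = bCont; c != null && next == null; c = EXTENDS.get(c))
                next = stripContainer(INNER_EXTENDS.get(c + "." + cur));
            cur = next;
        }
        return false;
    }

    static String stripContainer(String q) {
        return q == null ? null : q.substring(q.indexOf('.') + 1);
    }

    public static void main(String[] args) {
        EXTENDS.put("B", "A");                          // B <t A
        INNER_EXTENDS.put("A.anInnerY", "A.anInnerX");  // refinement declared in A
        if (!innerSub("B", "anInner", "A", "anInner"))   // mirrored inner type
            throw new AssertionError();
        if (!innerSub("B", "anInnerY", "B", "anInnerX")) // virtual extension
            throw new AssertionError();
        System.out.println("virtual inner extension holds");
    }
}
```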
When declaring a signal or extending a trait within the declaration of an atom or
compound, the import/export polarity of the declared signal or the extended trait’s sig-
nals must be specified via the import(s) and export(s) keywords. When declaring
a signal or extending a trait within a trait declaration, import/export polarity is not spec-
ified so the port or extends keywords are used instead. When a trait is imported or
exported into a component or inner object type of a component, the trait’s signals are
inherited under the same import/export polarity, which also applies to the signals of the
program ≡ (atom | compound | trait)*
atom ≡ atom AtomName decls
compound ≡ compound CompoundName decls with { glue-code }
trait ≡ trait TraitName decls
decls ≡ implements* { (inner | signal)* }
inner ≡ (inner | refine) InnerName (extends AnInner)* decls
implements ≡ (imports | exports | extends) ATrait
signal ≡ (import | export | port) signalName : (ATrait | AnInner)
Fig. 15. Informal syntax for SuperGlue programs as well as components and traits
trait’s inner object types. Likewise, when a trait is used as a signal’s type, the signal
inherits the trait’s signals with its own import/export polarity.
The connection queries that are a rule’s antecedents perform a uni-directional form of
unification in logic programming languages: the query ev = ew is true if ev is connected
to an expression that is equivalent to ew , where variables referred to by ew can be bound
to expressions in ev to meet the equivalence condition. A connection query successively
evaluates ev until an expression is yielded that is equivalent to ew , or until an end-point
expression is reached, in which case the connection query fails.
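The chain-following part of query evaluation can be sketched as follows. This sketch deliberately omits the binding of variables in ew (the unification aspect) and models only the successive evaluation of ev along its connections; the data model is our own assumption:

```java
import java.util.*;

// Sketch: the query "ev = ew" follows ev's connection chain and succeeds if
// it reaches an expression equal to ew; it fails when an unconnected
// end-point expression is reached first.
public class ConnectionQuery {
    static final Map<String, String> CONNECTED_TO = new HashMap<>();

    static boolean query(String ev, String ew) {
        for (String e = ev; e != null; e = CONNECTED_TO.get(e)) {
            if (e.equals(ew)) return true; // an equivalent expression was found
        }
        return false; // end-point reached: the query fails
    }

    public static void main(String[] args) {
        // "labeled = folder" succeeds when labeled's chain reaches folder.
        CONNECTED_TO.put("labeled", "node");
        CONNECTED_TO.put("node", "folder");
        if (!query("labeled", "folder")) throw new AssertionError();
        if (query("labeled", "mailbox")) throw new AssertionError();
        System.out.println("query labeled = folder holds");
    }
}
```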
glue-code ≡ (let | var | rule)*
rule ≡ if (query*) ex.aSignal = ey;
let ≡ let instanceName = new aComponent;
var ≡ var varName : (aTrait | aComponent | e.anInner)
query ≡ ev = ew
e ≡ anInstance | aVar | e.aSignal | aConstant | anException
Fig. 16. Informal syntax for the SuperGlue code that is inside a compound component
Signals that are imported into atoms or compounds or are exported from compounds
are implemented with SuperGlue code. Evaluating a compound’s exported signals is
similar to evaluating imported signals, although the evaluation environments between
the outside and inside of a compound are different. Signals that are exported from atoms
are implemented with Java code. Atoms implement their exported signals and use their
imported signals according to a Java interface that is described in Figure 17. This inter-
face declares an eval method that allows an atom’s Java code to implement the circuits
of its exported signals directly and also enables access to the SuperGlue-based circuits
of an atom’s imported signals.
The eval method of the Signal interface accepts an observer that is notified when
the result of the eval method has changed. When implementing a signal, Java code can
install this observer on the Java resource being wrapped by the atom. For example, when
implementing the exported selected nodes signal of a user-interface tree, this observer can
be translated into a selection observer that is installed on a wrapped JTree object. When
a signal is being used, Java code can provide an observer to notify the Java resource that
is being wrapped of some change. For example, when using the imported rows signal of
interface Signal {
  Value eval(Value target, Context context, Observer o);
  Signal signal(Value target, Port port);
}
interface Observer {
  void notify(Context context);
}
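Under the assumption that Value, Context, and Port are opaque runtime classes (stubbed below; they are not the paper's actual classes), an atom's exported signal might implement this interface roughly as follows:

```java
import java.util.*;

// Sketch: an exported signal wrapping one mutable field of a Java resource.
// Evaluating the signal installs the caller's observer; changing the wrapped
// state notifies all installed observers so callers can re-evaluate.
public class AtomSketch {
    static class Value { final Object v; Value(Object v) { this.v = v; } }
    static class Context {}
    static class Port {}

    interface Observer { void notify(Context context); }
    interface Signal {
        Value eval(Value target, Context context, Observer o);
        Signal signal(Value target, Port port);
    }

    static class LabelSignal implements Signal {
        private String label = "Inbox";
        private final List<Observer> observers = new ArrayList<>();

        public Value eval(Value target, Context context, Observer o) {
            if (o != null) observers.add(o); // install observer on this evaluation
            return new Value(label);
        }
        public Signal signal(Value target, Port port) { return this; }

        void setLabel(String l, Context c) { // the wrapped resource changed
            label = l;
            for (Observer o : observers) o.notify(c); // trigger re-evaluation
        }
    }

    public static void main(String[] args) {
        LabelSignal sig = new LabelSignal();
        List<String> seen = new ArrayList<>();
        Context ctx = new Context();
        Observer ob = c -> seen.add("changed");
        Value v = sig.eval(null, ctx, ob);        // evaluate, installing observer
        sig.setLabel("Sent", ctx);                // state change notifies observer
        if (!seen.contains("changed")) throw new AssertionError();
        System.out.println("first eval = " + v.v + ", after change = "
            + sig.eval(null, ctx, null).v);
    }
}
```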
The observers that are implemented in and used by Java code form the underlying in-
frastructure for continuous evaluation in SuperGlue, where evaluation results are up-
dated continuously as state changes in the program’s atoms. When an imported signal
is evaluated from within an evaluating atom, the atom can specify an observer that is
installed on dependent evaluations. Eventually, when evaluation reaches the exported
signals of other atoms, this observer is installed in those atoms. When state changes in
any of these other atom implementations, the evaluating atom is notified through the ob-
server it defined. The evaluating atom can then re-evaluate the changed imported signal
to refresh its view of state in other components.
When the state of an atom’s imported signal changes, the atom will uninstall its
observer on the signal’s old evaluation and install its observer on the signal’s new eval-
uation. The atom may also compute changes in the state of its own exports, where ob-
servers that are defined in other atoms and are installed on the atom are notified of these
changes. A naive implementation of continuous evaluation processes all state changes
as soon as they occur. However, this strategy results in glitches that cause atoms to ob-
serve inconsistent old and new signal evaluations. In the case of a glitch, an observer is
not uninstalled and installed in the correct evaluation contexts, and therefore atoms will
begin to miss changes in state.
Avoiding glitches in SuperGlue involves a sophisticated interpreter that adheres to
the following guidelines:
This is an issue when a signal changes rapidly so that the processing of its new
evaluations could overlap.
– A change in the value of an atom’s exported signal is processed in two phases.
The first phase only discovers what exported signals in other atoms have changed
as a result of the first exported signal’s change. As a result of this discovery, the
second phase updates observers for each changing signal together. The separation
of these two phases allows multiple dependent signals to change together and avoids
situations where they exhibit combinations of evaluations that are inconsistent; e.g.,
where both b and !b evaluate to true.
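The two-phase scheme can be sketched as follows. The data structures here are our own (a dependency map and a pending-change set); the point is only that discovery and commitment are separated, so no observer ever sees b and !b both true:

```java
import java.util.*;

// Sketch: phase one only discovers which signals change as a consequence of
// the initial change; phase two commits all changed values together before
// any observer runs, so dependent signals never appear inconsistent.
public class TwoPhase {
    static final Map<String, Boolean> values = new HashMap<>();
    static final Map<String, List<String>> dependents = new HashMap<>();

    static void change(String signal, boolean v) {
        // Phase 1: discover every signal affected by this change.
        Map<String, Boolean> pending = new HashMap<>();
        Deque<String> work = new ArrayDeque<>(List.of(signal));
        pending.put(signal, v);
        while (!work.isEmpty()) {
            String s = work.pop();
            for (String d : dependents.getOrDefault(s, List.of())) {
                pending.put(d, !pending.get(s)); // in this example: d = !s
                work.push(d);
            }
        }
        // Phase 2: commit all changed signals together, then notify observers.
        values.putAll(pending);
    }

    public static void main(String[] args) {
        values.put("b", false);
        values.put("notB", true);
        dependents.put("b", List.of("notB")); // notB = !b
        change("b", true);
        // After the atomic phase-2 commit, b and notB are never both true.
        if (values.get("b") && values.get("notB")) throw new AssertionError();
        System.out.println("b=" + values.get("b") + ", notB=" + values.get("notB"));
    }
}
```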
Cycles in the signal graph occur when the evaluation of a signal expression yields itself.
In SuperGlue, cycles are detected at run-time and rejected with a cyclic connection ex-
ception. Although cycles created solely in SuperGlue code can be statically detected, a
cycle can also occur because of dependencies between an atom’s imported and exported
signals. For example, a table view imports a list of rows and exports a list of these rows
that are selected. Connecting the exported selected rows of a table view to its imported
rows creates a cycle.
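Run-time cycle detection of this kind can be sketched as follows; the mechanism (an on-stack set consulted during evaluation) is our assumption about one reasonable implementation, not a description of the actual interpreter:

```java
import java.util.*;

// Sketch: evaluation keeps the set of signals currently on the evaluation
// stack and raises a cyclic-connection error when a signal's evaluation
// reaches itself.
public class CycleCheck {
    static final Map<String, String> connectedTo = new HashMap<>();

    static String evaluate(String signal, Set<String> onStack) {
        if (!onStack.add(signal))
            throw new IllegalStateException("cyclic connection: " + signal);
        String src = connectedTo.get(signal);
        String result = (src == null) ? signal : evaluate(src, onStack);
        onStack.remove(signal);
        return result;
    }

    public static void main(String[] args) {
        // The table view's exported selection is connected back to its
        // imported rows: a cycle through the atom.
        connectedTo.put("table.rows", "table.selected");
        connectedTo.put("table.selected", "table.rows");
        try {
            evaluate("table.rows", new HashSet<>());
            throw new AssertionError("cycle not detected");
        } catch (IllegalStateException expected) {
            System.out.println(expected.getMessage()); // prints "cyclic connection: table.rows"
        }
    }
}
```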
Even without cycles in the signal graph, non-termination can still occur in a Super-
Glue program. Because SuperGlue lacks recursion that is strong enough to traverse data
structures, SuperGlue code by itself will never be the sole cause of non-termination.
However, atoms can be implemented with arbitrary Java code, and how the atom is
connected by SuperGlue code can influence whether this Java code terminates. For ex-
ample, a tree view atom will not terminate if it is configured to expand all tree nodes
and is connected to a tree model of an unbounded size.
Our prototype implementation of SuperGlue is not tuned in any way for performance.
According to microbenchmarks, SuperGlue code is between two and 144 times slower
than equivalent Java code, depending on the connections and evaluations being com-
pared. As a worst case, the continuous evaluation of a circuit whose highest-priority ac-
tive rule is always changing is 144 times slower than Java code that implements the equiv-
alent behavior. For the interactive programs that we have explored so far, most of the work
is being performed in atoms that are implemented in Java, so SuperGlue’s performance
penalty is not very noticeable. On the other hand, if SuperGlue code is to be used in
more computationally intensive ways, higher performance will be necessary. Given
SuperGlue’s lack of strong recursion, the direct effect of SuperGlue code on performance
is linear in the number of rules in the program. However, as with cycles and
termination, the presence of atoms implemented with arbitrary Java code makes it
difficult to reason about how overall program performance is affected by SuperGlue code.
6 Related Work
The signal abstraction originates from the functional-reactive programming (FRP) lan-
guage Fran [10], which extends Haskell with signals. FRP is itself based on vari-
ous synchronous data-flow languages such as Esterel [4], Lustre [6], and Signal [3].
More recent FRP languages include Fran’s successor, Yampa [14], and FatherTime
(FrTime) [8], which extends Scheme with signals. SuperGlue differs from Fran and Yampa
and resembles FrTime in that it supports an asynchronous rather than synchronous con-
currency model. By supporting an asynchronous concurrency model, SuperGlue code
can easily integrate with imperative Java code, although we must deal with “glitches”
that are described in Section 5. Unlike SuperGlue, both Fran and FrTime support sig-
nals with higher-order and recursively-defined functions. While the use of functions
presents programmers with a well-understood functional programming model, func-
tion calls obscure the state-viewing relationships between components when compared
to SuperGlue’s connection-based programming model. Frappé [9] is an implementation
of FRP for Java that is based on Fran. Because signals in Frappé are manipulated purely
as Java objects, it suffers from the verbosity problems described in Section 2.
SuperGlue connections resemble simple constraints. Kaleidoscope [11, 12] supports
general constraint variables whose values can be updated imperatively. Kaleidoscope
constraints are more powerful than SuperGlue signals: its constraints are multi-way
and it directly supports the expression of imperative code. However, these features also
make Kaleidoscope more complicated than SuperGlue: the mixing of constraint and
imperative constructs complicates the language, and the duration of its constraints must
be explicitly programmed.
SuperGlue supports the expression of graph-like state with object-oriented abstractions.
Inner objects with virtual refinement in SuperGlue resemble pattern nesting in
BETA [17]. As described in Section 3.2, traits can be used to add new behavior to
objects. As a result, SuperGlue’s objects and traits are similar to the mixins [5], open
classes [7], or views [21] that are used to add new methods to existing classes. Super-
Glue’s rules are similar to rules in logic programming languages such as Prolog [22],
which query and prove logical relations. Although connections in SuperGlue are similar
to simple relations, SuperGlue does not use first-order logic to manipulate connections.
While logic programming focuses on expressing computations declaratively, SuperGlue
focuses on the expression of glue code with as little expressiveness as possible. In this
way, SuperGlue is more similar to SQL than Prolog.
Connections have long been used to describe dependencies between components; see
various work on architecture description languages [16, 18, 20]. Visual tools in Jav-
aBeans can be used to wire the bean properties of components together. ArchJava [1]
is a language that uses the port-connection model to specify how components can inter-
act in the implementation of a software architecture. ArchJava has a custom connector
abstraction [2] that can be used to express a wide variety of port types. In contrast,
although SuperGlue only supports signals, it can support them with abstractions that
cannot be easily expressed in ArchJava.
7 Conclusion
SuperGlue combines signal, object, and rule abstractions into a novel language for
building interactive programs out of components. By focusing on glue code instead
of component implementations, SuperGlue’s abstractions sacrifice expressiveness so
that glue code is easy to write. One consequence of this tradeoff is that SuperGlue does
not have the control flow or recursion constructs found in general-purpose program-
ming languages. The use of these constructs in glue code is often neither necessary nor
desirable as they can often be replaced by rules that are more concise.
SuperGlue’s design demonstrates how connections, an object system, and a declara-
tive rule system can be effectively integrated together. The key to this integration is the
use of object-oriented types to abstract over component connection graphs that are po-
tentially unbounded in size. SuperGlue’s support for connection prioritization via type
extension and coercions via traits are all directly related to the use of types to abstract
over connections.
Although the features presented in this paper are complete, SuperGlue is still under de-
velopment. We are currently refining our implementation of array signals, which were
briefly described in Section 3.4. Additionally, we are improving component instantia-
tion to be more dynamic. As presented in this paper, our language only supports a fixed
number of components per program, which is too restrictive for many programs.
We have implemented SuperGlue with an initial prototype that is capable of running
the case study described in Section 4. SuperGlue’s implementation can be improved
in two ways. First, compilation rather than interpretation can improve performance so
SuperGlue can be used in computationally-intensive areas. Second, SuperGlue should
be implemented in a way so that its programs can be developed interactively. This would
allow the editing of a program’s circuits while the program is running.
We plan to explore how SuperGlue can be used to build more kinds of interactive
programs. We are currently investigating how a complete user-interface library in Su-
perGlue can enable user-interface programs that are more interactive. Beyond user in-
terfaces, many programs can benefit from being more interactive than they currently
are. For example, programming tools such as compilers are more useful when they are
interactive. With the appropriate component libraries, SuperGlue would be a very good
platform for developing these new kinds of interactive programs.
Acknowledgements
We thank Craig Chambers, Matthew Flatt, Gary Lindstrom, Gail Murphy, and the
anonymous reviewers for comments on drafts of this paper and the preceding research. Sean
McDirmid and Wilson Hsieh were supported during this research in part by NSF CA-
REER award CCR–9876117. Wilson Hsieh is currently employed at Google.
References
1. J. Aldrich, C. Chambers, and D. Notkin. Architectural reasoning in ArchJava. In Proceedings
of ECOOP, volume 2374 of Lecture Notes in Computer Science, pages 334–367. Springer,
2002.
2. J. Aldrich, V. Sazawal, C. Chambers, and D. Notkin. Language support for connector
abstractions. In Proceedings of ECOOP, Lecture Notes in Computer Science. Springer, 2003.
1 Introduction
Software development for mobile devices is given a new impetus with the advent
of mobile networks. Mobile networks surround a mobile device equipped with
wireless technology and are demarcated dynamically as users move about. Mo-
bile networks turn isolated applications into cooperative ones that interact with
their environment. This vision of ubiquitous computing, originally described by
Weiser [38], has recently been termed Ambient Intelligence (AmI for short) by
the European Council’s IST Advisory Group [12].
Mobile networks that surround a device have several properties that distin-
guish them from other types of networks. The most important ones are that
connections are volatile (because the communication range of wireless technol-
ogy is limited) and that the network is open (because devices can appear and
disappear unheraldedly). This puts an extra burden on software developers. Although
system software and networking libraries providing uniform interfaces to
the wireless technologies (such as JXTA and M2MI [21]) have matured, devel-
oping application software for mobile networks still remains difficult. The main
reason for this is that contemporary programming languages lack abstractions
to deal with the mobile hardware characteristics. For instance, in traditional
programming languages failing remote communication is usually dealt with using
a classic exception handling mechanism. This results in application code polluted
with exception handling code because failures are the rule rather than the
exception in mobile networks. Observations like this justify the need for a new
Ambient-Oriented Programming paradigm (AmOP for short) that consists of
programming languages that explicitly incorporate potential network failures in
the very heart of their basic computational steps.
(Research Assistant of the Fund for Scientific Research Flanders, Belgium (F.W.O.).
Funded by a doctoral scholarship of the Institute for the Promotion of Innovation
through Science and Technology in Flanders (IWT-Vlaanderen), Belgium.)
The goal of our research is threefold:
– First, we want to gain insight into the structure of AmOP applications.
– Second, we want to come up with AmOP language features that give pro-
grammers expressive abstractions that allow them to deal with the charac-
teristics of the mobile networks.
– Third, we want to distill the fundamental semantic building blocks that are at
the scientific heart of AmOP language features in the same way that current
continuations are at the heart of control flow instructions and environments
are the essence of scoping mechanisms.
As very little experience exists in writing AmOP applications, it is hard to
come up with AmOP language features based on software engineering require-
ments. Therefore, our research is grounded in the hardware phenomena that
fundamentally distinguish mobile from stationary networks. These phenomena
are listed in section 2.1 and form the basis from which we distill a number of
fundamental programming language characteristics that define the AmOP par-
adigm. These characteristics are the topic of section 3. A concrete scion of the
AmOP paradigm — called AmbientTalk — is presented starting from section 4.
AmbientTalk’s design is directly based on our analysis of the hardware phenom-
ena and features a number of fundamental semantic building blocks designed
to deal with these hardware phenomena. Since AmbientTalk was conceived as
a reflectively extensible language kernel, the semantic building blocks turn Am-
bientTalk into a language laboratory that allows us to investigate the language
features that populate the AmOP paradigm. Section 5 validates this by realising
three language features that facilitate high-level collaboration of objects running
on devices connected by a mobile network. The language features are used in a
concrete experiment we conducted, namely the implementation of an ambient
peer-to-peer instant messaging application that was deployed on smart phones.
2 Motivation
The hardware properties of the devices constituting a mobile network engender
a number of phenomena that have to be dealt with by distributed programming
languages and/or middleware. We summarize these hardware phenomena below
and describe how existing programming languages and middleware fail to deal
with them. These shortcomings form the main motivation for our work.
limited battery life) than traditional hardware. However, in the last couple of
years, mobile devices and full-fledged computers like laptops are blending more
and more. That is why we do not consider such restrictions as fundamental as
the following phenomena which are inherent to mobile networks:
RPC-based Middleware like Alice [13] and DOLMEN [29] are attempts that
focus mainly on making ORBs suitable for lightweight devices and on im-
proving the resilience of the CORBA IIOP protocol against volatile connec-
tions. Others deal with such connections by supporting temporary queuing
of RPCs [18] or by rebinding [30]. However, these approaches remain vari-
ations of synchronous communication and are thus irreconcilable with the
autonomy and connection volatility phenomena.
Data Sharing-oriented Middleware tries to maximize the autonomy of tem-
porarily disconnected mobile devices using weak replica management (cf.
Bayou [32], Rover [18] and XMiddle [40]). However, since replicas cannot al-
ways be synchronised upon reconnection, potential conflicts must be resolved
at the application level. Although these approaches foster fruitful ideas for
dealing with the autonomy characteristic, to the best of our knowledge they
do not address the ambient resources phenomenon.
Publish-subscribe Middleware adapts the publish-subscribe paradigm [10]
to cope with the characteristics of mobile computing [7, 5]. Such middleware
allows asynchronous communication, but has the disadvantage of requiring
manual callbacks to handle communication results, which severely clutters
object-oriented code.
Tuple Space based Middleware [28, 24] for mobile computing has been pro-
posed more recently. A tuple space [11] is a shared data structure in which
processes can asynchronously publish and query tuples. Most research on
tuple spaces for mobile computing attempts to distribute a tuple space over
a set of devices. Tuple spaces are an interesting communication paradigm for
mobile computing. Unfortunately, they do not integrate well with the object-
oriented paradigm because communication is achieved by placing data in a
tuple space as opposed to sending messages to objects.
3 Ambient-Oriented Programming
In the same way that referential transparency can be regarded as a defining prop-
erty for pure functional programming, this section presents a collection of lan-
guage design characteristics that discriminate the AmOP paradigm from classic
234 J. Dedecker et al.
an ambient resource. For example, tuple space based middleware achieves this
property: a process can publish data in a tuple space, which other processes
can then query by pattern matching.
Another example is M2MI [21], where messages can be broadcast to all objects
implementing a certain interface.
We are not arguing that all AmOP applications must necessarily be based on
distributed naming. It is perfectly possible to set up a server for the purposes
of a particular application. However, an AmOP language should allow an object
to spontaneously get acquainted with a previously unknown object based on an
intensional description of that object rather than via a fixed URL. Incorporating
such an acquaintance discovery mechanism, along with a mechanism to detect
and deal with the loss of acquaintances, should therefore be part of an AmOP
language. We will refer to the combination of these mechanisms as ambient
acquaintance management.
3.5 Discussion
Having analysed the implications of the hardware phenomena on the design of
programming languages, we have distilled the above four characteristics. We will
henceforth refer to programming languages that adhere to them as Ambient-
Oriented Programming Languages. Admittedly, it is impossible to prove that these
characteristics are strictly necessary for writing the applications we target; af-
ter all, AmOP does not transcend Turing equivalence. However, we do claim
that an AmOP language greatly facilitates the construction of such applications
because its distribution characteristics are designed with respect to the
hardware phenomena presented in section 2.1. AmOP languages incorporate
transient disconnections and evolving acquaintance relationships in the heart of
their computational model.
are avoided. The merit of the model is that it unifies imperative object-oriented
programming and concurrent programming without suffering from omnipresent
race conditions. We will henceforth use the term ‘active object’ or ‘actor’ inter-
changeably for ABCL/1-like active objects.
To avoid equipping every single object with heavyweight concurrency machinery
and treating every single message as a concurrent one, AmbientTalk adopts an
object model that distinguishes between active and passive (i.e. ordinary)
objects. This allows programmers to deal with concurrency only when
strictly necessary (i.e. when considering semantically concurrent and/or
distributed tasks). Since passive objects are not equipped with an execution thread, the
“current thread” flows from the sender into the receiver, thereby implementing
synchronous message passing. However, it is important to ensure that a passive
object is never shared by two different active ones because this easily leads to
race conditions. AmbientTalk’s object model avoids this by obeying the following
rules:
– Containment. Every passive object is contained within exactly one active
object. Therefore, a passive object is never shared by two active ones. The
only thread that can enter the passive object is the thread of its active
container.
– Argument Passing Rules. When an asynchronous message is sent to an active
object, objects may be sent along as arguments. In order not to violate
the containment principle, a passive object that is about to leave its active
container this way, is passed by copy. This means that the passive object
is deep-copied up to the level of references to active objects. Active objects
process messages one by one and can therefore be safely shared by two
different active objects. Hence, they are passed by reference.
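The two rules above can be illustrated with a minimal Java sketch. All names here are ours (hypothetical), not AmbientTalk's API: a passive argument is deep-copied when it crosses the actor boundary, while an active argument is passed by reference.

```java
// Hypothetical sketch of AmbientTalk's argument-passing rules:
// passive objects are deep-copied when they leave their container,
// active objects are shared by reference.
class PassingSketch {
    interface Passive { Passive deepCopy(); }
    interface Active { }  // processes messages one by one; safe to share

    static class Point implements Passive {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        public Passive deepCopy() { return new Point(x, y); }
    }

    // Applied to every argument of an asynchronous message send.
    static Object passArgument(Object arg) {
        if (arg instanceof Passive) return ((Passive) arg).deepCopy(); // by copy
        return arg;                                                    // by reference
    }
}
```

A real implementation would copy the whole passive object graph up to active-object references; the sketch copies a single object to keep the contrast visible.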
This pragmatic marriage between the functional actor model, the imperative
thread model and the prototype-based object model was chosen as the basis
for AmbientTalk’s distribution model. Active objects are defined to be Ambi-
entTalk’s unit of distribution and are the only ones allowed to be referred to
across device boundaries. Therefore, AmbientTalk applications are conceived as
suites of active objects deployed on autonomous devices. Several active objects
can run on a device and every active object contains a graph of passive objects.
Objects in this graph can refer to active objects that may reside on any device.
In other words, AmbientTalk’s remote object references are always references to
active objects. The rationale of this design is that synchronous messages (as sent
to passive objects) cannot be reconciled with the non-blocking communication
characteristic presented in section 3.2.
AmbientTalk has no notion of proxies at the programming language level. An
active object a1 can ‘simply’ refer to another active object a2 that
resides on a different machine. If both machines move out of one another’s com-
munication range and the connection is (temporarily) lost, a1 conceptually still
refers to a2 and can keep sending messages as if nothing went wrong.
Such messages are accumulated in a1 and will be transparently delivered after
the connection has been re-established. Hence, AmbientTalk’s default delivery
policy strives for eventual delivery of messages. The mechanism that takes care
of this transparency is explained in section 4.4. First we discuss both layers of
AmbientTalk’s object model in technical detail.
Objects are created using the object(...) form². It creates an object by execut-
ing its argument expression, typically a block of code (delimited by curly braces)
containing a number of slot declarations. Slots can be mutable (declared with
:) or immutable (declared with ::). Mutable slots are always private and im-
mutable slots are always public (for the rationale of this design decision we refer
to [9]). Both method invocation and public slot selection use the classic dot no-
tation. Objects are lexically scoped such that names from the surrounding scope
can be used in the object(...) form. As illustrated by makeStack(size), the
form can be used in the body of a function in order to generate objects. Such a
function will be referred to as a constructor function and is AmbientTalk’s idiom
to replace the object instantiation role of classes. Objects can also be created
by extending existing ones: the extend(p,...) form creates an object whose
parent is p and whose additional slots are listed in a block of code, analogous
to the object(...) form. Messages not understood by the newly created object
are automatically delegated to the parent. Furthermore, a Java-like super key-
word can be used to manually delegate messages to the parent object. Following
the standard delegation semantics proposed by Lieberman [22] and Ungar [34],
references to this in the parent object denote the newly created child object.
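The delegation semantics just described can be sketched in plain Java (a hypothetical model of our own, not AmbientTalk's implementation): method lookup climbs the parent chain, but `self` stays bound to the object that originally received the message, so overrides in the child take effect even when the method is found in the parent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical sketch of Lieberman/Ungar-style delegation:
// lookup starts at the receiver and climbs the parent chain,
// but the found method runs with `self` bound to the receiver.
class Obj {
    final Obj parent;
    final Map<String, BiFunction<Obj, Object[], Object>> slots = new HashMap<>();

    Obj(Obj parent) { this.parent = parent; }

    Object send(String name, Object... args) {
        for (Obj o = this; o != null; o = o.parent)
            if (o.slots.containsKey(name))
                return o.slots.get(name).apply(this, args); // self = receiver
        throw new RuntimeException("message not understood: " + name);
    }
}
```

With this binding of `self`, a parent method that sends a message to `self` invokes the child's override, which is exactly the late-binding behaviour the text attributes to `this`.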
Apart from objects, AmbientTalk features built-in numbers, strings, a null
value void and functions. However, these ‘functions’ are actually nothing but
methods in AmbientTalk. For example, the makeStack constructor function
is actually a method of the root object which is the global environment of
the AmbientTalk interpreter. Methods can be selected from an object (e.g.
myPush: aStack.push). Upon selection, a first-class closure object is created
which encapsulates the receiver (aStack) and which can be called using canoni-
cal syntax, e.g., myPush(element). Closure objects are actually passive objects
² A ‘form’ is a Scheme-like special form, i.e., a built-in ‘function’ whose parameters
are treated in an ad hoc way. if(test,exp1,exp2) is another example of a form.
Ambient-Oriented Programming in AmbientTalk 239
with a single apply method. Finally, a syntactic sugar coating allows anony-
mous closures to be created given a list of formal parameters and a body, e.g.,
lambda(x,y) -> {x+y}. When bound to a name (e.g., as the value of a slot f or
when bound to a formal parameter f), a closure is called using canonical syntax,
e.g., f(1,2).
makeFriendFinder(age, hobbies) :: actor(object({
  init() :: { display("Friend Finder initialized!") };
  beep() :: { display("Match Found - BEEP!") };
  match(info) :: {
    if(and(age.isInRangeOf(info.age),
           hobbies.intersectsWith(info.hobbies)),
       { thisActor#beep() })
  }}))
Actors can be sent asynchronous messages using the # primitive which plays the
same role for actors as the dot notation for passive objects. E.g., if ff is a friend
finder (possibly residing on another cellular phone), then ff#match(myInfo)
asynchronously sends the match message to ff. The return value of an asynchro-
nous message is void and the sender never waits for an answer. Using the # oper-
ator without actual arguments (e.g., ff#match) yields a first-class message object
that encapsulates the sending actor (thisActor), the destination actor (ff) and
the name of the message. First-class messages are further explained in section 4.5
that describes AmbientTalk’s meta-level facilities. Finally, using the dot notation
for actors (resp. # for passive objects) is considered to be an error.
When passing along objects as arguments to message sends, caution is re-
quired in order not to breach the containment principle. In the case of synchro-
nous messages of the form po.m(arg1,...argn) between two objects that are
contained in the same actor, the arguments do not “leave” the actor and can
one or more descriptive tags (e.g. strings) in its providedbox mailbox (using the
add operation described below). Conversely, an actor that needs other actors for
collaboration can listen for actors broadcasting particular descriptive tags by
adding these tags to its requiredbox mailbox. If two or more actors join by enter-
ing one another’s communication range while having an identical descriptive tag
in their mailboxes, the mailbox joinedbox of the actor that required the collabo-
ration is updated with a resolution object containing the corresponding descriptive
tag and a (remote) reference to the actor that provided that tag. Conversely, when
two previously joined actors move out of communication range, the resolution is
moved from the joinedbox mailbox to the disjoinedbox mailbox. This mech-
anism allows an actor not only to detect new acquaintances in its ambient, but
also to detect when they have disappeared from the ambient. It is AmbientTalk’s
technical realisation of the ambient acquaintance management characteristic dis-
cussed in section 3.4.
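The discovery mechanism described above can be modelled with a small Java sketch (class and method names are ours, for illustration only): when two actors come into range, each tag in one actor's requiredbox that matches a tag in the other's providedbox yields a resolution in the requirer's joinedbox.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of tag-based acquaintance discovery.
class DiscoverySketch {
    static class Actor {
        final List<String> providedbox = new ArrayList<>();
        final List<String> requiredbox = new ArrayList<>();
        final List<Resolution> joinedbox = new ArrayList<>();
    }
    // A resolution pairs the matched tag with a reference to the provider.
    static class Resolution {
        final String tag;
        final Actor provider;
        Resolution(String tag, Actor provider) { this.tag = tag; this.provider = provider; }
    }
    // Called (conceptually by the interpreter) when a and b come into range.
    static void join(Actor a, Actor b) {
        for (String tag : a.requiredbox)
            if (b.providedbox.contains(tag)) a.joinedbox.add(new Resolution(tag, b));
        for (String tag : b.requiredbox)
            if (a.providedbox.contains(tag)) b.joinedbox.add(new Resolution(tag, a));
    }
}
```

The inverse step (moving a resolution to the disjoinedbox on disconnection) follows the same pattern and is omitted here.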
Mailboxes are first-class passive objects contained in the actor. Due to the
containment principle for passive objects, mailboxes can never be shared among
multiple actors. However, mailboxes are necessarily shared between the actor
and the AmbientTalk interpreter because this interpreter puts messages into the
mailboxes (e.g. upon reception of a message or upon joining with another actor).
To avoid race conditions on mailboxes, the interpreter is not given access to them
while the actor is processing a message because it may then manipulate its own
mailboxes. Mailboxes provide operators to add and delete elements (such as mes-
sages, descriptive tags and resolutions): if b is a mailbox, then b.add(elt) adds
an element to b. Similarly, b.delete(elt) deletes an element from a mailbox.
b.iterate(f) applies the closure f to all elements that reside in the mailbox b.
Moreover, the changes in a mailbox can be monitored by registering observers
with a mailbox: b.uponAdditionDo(f)(resp. b.uponDeletionDo(f)) installs a
closure f as a listener that will be triggered whenever an element is added to
(resp. deleted from) the mailbox b. The element is passed as an argument to
f. The closure is invoked when the mailbox’s actor is ready to accept the next
incoming message, so as to avoid internal concurrency.
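The mailbox operations listed above can be sketched as a small Java class (our own illustrative model; in AmbientTalk observers run between message turns, whereas here, for brevity, they fire immediately):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a first-class mailbox with observers,
// mirroring add/delete/iterate/uponAdditionDo/uponDeletionDo.
class Mailbox<T> {
    private final List<T> elements = new ArrayList<>();
    private final List<Consumer<T>> additionObservers = new ArrayList<>();
    private final List<Consumer<T>> deletionObservers = new ArrayList<>();

    void add(T elt) {
        elements.add(elt);
        additionObservers.forEach(f -> f.accept(elt));
    }
    void delete(T elt) {
        if (elements.remove(elt))
            deletionObservers.forEach(f -> f.accept(elt));
    }
    // Iterate over a snapshot so f may safely delete elements.
    void iterate(Consumer<T> f) { new ArrayList<>(elements).forEach(f); }
    boolean contains(T elt) { return elements.contains(elt); }
    void uponAdditionDo(Consumer<T> f) { additionObservers.add(f); }
    void uponDeletionDo(Consumer<T> f) { deletionObservers.add(f); }
}
```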
The following code excerpt exemplifies these concepts by extending the friend
finder example of the previous section with ambient acquaintance management in
order for two friend finders to discover each other. The initialisation code shows
that the actor advertises itself as a friend finder and that it requires communi-
cation with another friend finder. When two friend finders meet, a resolution is
added to their joinedbox, which will trigger the method onFriendFinderFound
that was installed as an observer on that mailbox. This resolution contains a
tag slot (in this case "<FriendFinder>") and a provider slot corresponding to
the providing actor. The latter is sent the match message (as described in the
previous section).
makeFriendFinder(age, hobbies) :: actor(object({
  ...as above...
  onFriendFinderFound(aResolution) :: {
    aResolution.provider#match(makeInfo(age, hobbies))
  };
  init() :: {
    providedbox.add("<FriendFinder>");
    requiredbox.add("<FriendFinder>");
    joinedbox.uponAdditionDo(this.onFriendFinderFound)
  }}))
Every actor has a perpetually running thread that receives incoming messages
in the inbox and transfers them to the rcvbox after processing them. Message
reception is reified in the MOP by adding messages to an actor’s inbox which
can be intercepted by adding an observer to that mailbox. Message processing
is reified in the MOP by invoking the parameterless process method on that
message (which will execute the recipient’s method corresponding to the message
name) and by subsequently placing that message in the rcvbox. The latter event
can be trapped by installing an observer on that mailbox.
Futures [14, 23] are a classic technique to reconcile return values of meth-
ods with asynchronous message sends without resorting to manual callback
methods or continuation actors. A future is a placeholder that is immediately
returned from an asynchronous send and that is eventually resolved with
the expected result. Computations that access an unresolved future block
until it is resolved. However, this contradicts the non-blocking communica-
tion characteristic of AmOP. AmbientTalk’s futures avoid this problem by
adopting the technique that was recently proposed in E [27]. It allows for
a transparent forwarding of messages sent to a future to its resolution and
features a when(aFuture, closure) construct to register a closure that is
to be applied upon resolving the future.
Due-blocks are similar to try-catch blocks. They consist of a block of code, a
handler and a deadline that is imposed on the transmission of all asynchro-
nous messages sent during the execution of the block. The handler is invoked
should the deadline expire. This mechanism is used by the messenger to vi-
sualize undelivered messages by greying them out in the GUI.
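Due-blocks can be sketched as follows. This is a deliberately simplified model of our own: instead of AmbientTalk's timer-based `after` form we expose an explicit `checkExpired(now)` step so the behaviour is deterministic, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of due-blocks: messages sent inside the block
// are stamped with a deadline and a handler (the 'complaint address');
// expired messages are retracted and their handler is invoked.
class DueSketch {
    static class Msg {
        final String name;
        long deadline = Long.MAX_VALUE;   // MAX_VALUE plays the role of 'void'
        Runnable handler;
        Msg(String name) { this.name = name; }
    }

    final List<Msg> outbox = new ArrayList<>();
    private long dueDeadline = Long.MAX_VALUE;
    private Runnable dueHandler;

    // Saving/restoring the current deadline and handler allows
    // dynamic nesting of due-blocks, as described in the text.
    void due(long deadline, Runnable body, Runnable handler) {
        long savedDeadline = dueDeadline;
        Runnable savedHandler = dueHandler;
        dueDeadline = deadline;
        dueHandler = handler;
        try { body.run(); }
        finally { dueDeadline = savedDeadline; dueHandler = savedHandler; }
    }

    // The overridden send: stamp the message if inside a due-block.
    void send(Msg m) {
        if (dueDeadline != Long.MAX_VALUE) {
            m.deadline = dueDeadline;
            m.handler = dueHandler;
        }
        outbox.add(m);
    }

    // Retract expired messages and invoke their handlers.
    void checkExpired(long now) {
        for (Msg m : new ArrayList<>(outbox))
            if (m.deadline <= now) {
                outbox.remove(m);
                m.handler.run();
            }
    }
}
```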
These language constructs are implemented by using and overriding the MOP
operations described in the previous section. We use a mixin-based technique to
implement them in a modular way: an AmbientTalk construct and its support-
ing MOP operations are grouped in what we call a language mixin; a function
that returns an extension of its argument with new meta-level behaviour (i.e.
it overrides send, createmessage, process and/or installs observers on mail-
boxes). The idea is to apply such a language mixin to a passive object before
that passive object is used to create an actor. This way, a newly created actor
will exhibit the required behaviour.
Given these three language abstractions, the code for the instant messenger
follows. An instant messenger in AmbientTalk is conceived as an actor created by
the constructor function makeInstantMessenger given a nickname, a guiActor
(representing the application’s graphical user interface) and a maxTimeOut value
that indicates how resilient the messenger will be w.r.t. volatile connections. The
actor’s MOP is extended with the three language constructs by applying the
DueMixin, the FuturesMixin and the AmbientRefMixin to the passive object
representing its behaviour. The usage of the language constructs is indicated in
comments.
makeInstantMessenger(nickname, guiActor, maxTimeOut) ::
  actor(AmbientRefMixin(FuturesMixin(DueMixin(object({
    buddies    : makeHashmap();
    statusLine : "Available";
    getStatusLine() :: { statusLine };
    setStatusLine(newStatus) :: { statusLine := newStatus };
    buddyAppears(buddyNick) :: {
      when(buddies.get(buddyNick)#getStatusLine(),          // FUTURES
           lambda(status) -> { guiActor#onlineColor(buddyNick, status) })
    };
    buddyDisappears(buddyNick) :: {
      guiActor#offlineColor(buddyNick)
    };
    addBuddy(buddyNick) :: {                                // AMBIENT REFERENCES
      bAmsg : thisActor#buddyAppears;
      bDmsg : thisActor#buddyDisappears;
      bAmsg.getArgs().set(1, buddyNick);
      bDmsg.getArgs().set(1, buddyNick);
      buddies.put(buddyNick,
                  makeAmbientRef("<Messenger id=" + buddyNick + ">", bAmsg, bDmsg))
    };
    receiveText(from, text) :: {
      guiActor#showText(from, text)
    };
    failedDelivery(msg) :: {
      text : msg.getArgs().get(2);
      guiActor#unableToSend(text)
    };
    talkTo(buddyNick, text) :: {
      due(maxTimeOut, lambda() -> {                         // DUE BLOCKS
        buddies.get(buddyNick)#receiveText(nickname, text)
      }, thisActor#failedDelivery)
    };
    init() :: {
      guiActor#register(thisActor);
      broadcast("<Messenger id=" + nickname + ">")
    }})))));
An instant messenger actor has a slot statusLine and a slot buddies map-
ping nicknames to ambient references that represent instant messengers on other
mobile devices. Upon creation, its init method registers the messenger with the
GUI and broadcasts its presence in the ambient. The latter is accomplished by
the broadcast function which is merely a thin veneer of abstraction to hide
the fact that a descriptive tag is added to the providedbox of the actor (i.e.
broadcast(tag)::{providedbox.add(tag)}). Instant messenger actors have
three methods (addBuddy, setStatusLine and talkTo) that are called from the
GUI when the user adds a buddy (given a nickname), changes his status line or
sends a text message to one of his buddies. Two other methods (getStatusLine
and receiveText) are invoked by other instant messengers to retrieve a buddy’s
status line and to send a message to a buddy.
Upon adding a buddy, addBuddy creates an ambient reference (which searches
for a messenger) based on the nickname and a couple of first-class callback mes-
sages (bAmsg and bDmsg) which are to be invoked by the ambient reference when-
ever that buddy appears or disappears in the ambient. The first-class callback
message bAmsg (resp. bDmsg) is created by the expression thisActor#buddy-
Appears (resp. thisActor#buddyDisappears) and given the buddy’s nickname
as its first and only argument. Whenever an actor matching the ambient refer-
ence’s tag appears (resp. disappears) buddyAppears (resp. buddyDisappears)
will thus be invoked. buddyAppears subsequently asks for the status line of its
communication partner. This yields a future, that will trigger the when language
construct upon resolution. In the closure that is passed to the when construct,
the resolved future can be accessed as a parameter (e.g. status). Finally, when-
ever the GUI invokes talkTo to communicate with a buddy, receiveText is sent
to the ambient reference representing that buddy. It is the ambient reference’s
task to forward that message to the actual messenger it denotes. The send of
receiveText occurs in a due block which tries to send it within the prescribed
time period. Should the message expire, failedDelivery is invoked which in
turn informs the GUI about this event.
forwardMsg(msg) :: {
  if(not(is_void(principal)), {
    outbox.add(msg.setDestination(principal));
    inbox.delete(msg)
  })
};
handleActorJoined(resolution) :: {
  if(is_void(principal), {
    principal := resolution.provider;
    send(uponJoinMsg);
    inbox.iterate(this.forwardMsg)
  })
};
handleActorDisjoined(resolution) :: {
  if(resolution.provider = principal, {
    principal := void;
    send(uponDisjoinMsg);
    outbox.iterate(lambda(msg) -> {
      outbox.delete(msg);
      inbox.addToFront(msg)
    })
  });
  disjoinedbox.delete(resolution)
};
init() :: {
  requiredbox.add(tag);
  inbox.uponAdditionDo(this.forwardMsg);
  joinedbox.uponAdditionDo(this.handleActorJoined);
  disjoinedbox.uponAdditionDo(this.handleActorDisjoined)
}}))})
The ambient reference is initialised by adding the tag to the requiredbox mak-
ing it listen for actors broadcasting this tag, and by installing three mailbox ob-
servers to be triggered when messages arrive in the inbox and when resolutions ap-
pear in the joinedbox or disjoinedbox. An ambient reference has a private slot
principal, the value of which is toggled between an actor broadcasting the tag,
and void when no such actors are currently available in the ambient. This toggling
is accomplished by the joinedbox observer handleActorJoined (called whenever
an actor was discovered) and the disjoinedbox observer handleActorDisjoined
(that voids the principal when it has moved out of communication range). When a
message is sent to the ambient reference, the inbox observer forwardMsg is called
since it is the ambient reference’s task to forward messages to the actor it repre-
sents. This is implemented by changing the destination actor of the message from
the ambient reference to the principal and by moving it from the inbox of the am-
bient reference to its outbox. The AmbientTalk interpreter will henceforth handle
the delivery of the message as explained in section 4.5. Messages may be accumu-
lated in the inbox of the ambient reference while its principal is void³. There-
fore, handleActorJoined flushes all unsent messages in the inbox by forward-
ing them to the newly discovered actor. Similarly the handleActorDisjoined
method will ensure that messages that were not delivered yet and were accumu-
lated in the outbox are transferred to the inbox of the reference in order to make
sure they will be resent upon rebinding to another principal.
forward(msg) :: {
  if(not(has_slot(this, msg.getName())),
     if(is_actor(value),
        { inbox.delete(msg);
          outbox.add(msg.setDestination(value)) }))
};
register(aWhenObserver) :: {
  if(is_void(value),
     { whenObservers.add(aWhenObserver) },
     { aWhenObserver.notify(value) })
};
resolve(computedValue) :: {
  value := computedValue;
  whenObservers.iterate(lambda(observer) -> { observer.notify(value) });
  inbox.iterate(this.forward)
};
init() :: { inbox.uponAdditionDo(this.forward) }
})); // CONTINUED
future using register. Upon resolution, the future notifies all its registered
when-observers and forwards all previously accumulated messages.
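The registration and resolution behaviour just described can be condensed into a Java sketch (our own simplified model; the transparent forwarding of messages through the future's inbox is elided, with a when-observer standing in for both):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a non-blocking future: observers registered
// before resolution are buffered; observers registered afterwards are
// notified immediately. The caller never blocks.
class SketchFuture<T> {
    private T value;
    private boolean resolved = false;
    private final List<Consumer<T>> whenObservers = new ArrayList<>();

    // register: buffer the observer, or notify at once if already resolved
    void register(Consumer<T> observer) {
        if (resolved) observer.accept(value);
        else whenObservers.add(observer);
    }

    // resolve: store the value and notify all buffered observers
    void resolve(T computedValue) {
        value = computedValue;
        resolved = true;
        whenObservers.forEach(o -> o.accept(value));
        whenObservers.clear();
    }
}
```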
To introduce futures in the MOP of actors, createMessage is overridden
such that asynchronous messages are intercepted in order to be extended with a
future slot. Furthermore, the message’s process method (which will be invoked
by the destination actor) is refined in order to resolve the message’s future with
the computed value. The overridden send first performs a super-send to delegate
message sending to the default implementation. However, instead of returning
void, the new implementation returns the future contained in the extended
message.
// FuturesMixin, CONTINUED
createMessage(sender, dest, name, args) :: {
  extend(super.createMessage(sender, dest, name, args), {
    future :: makeFuture();
    process() :: {
      computedValue : super.process();
      future#resolve(computedValue);
      computedValue
    };
  })
};
send(message) :: {
  super.send(message);
  message.future
};
whenBlocks : makeHashmap();
whenIdCtr  : 1;
invokeWhen(whenBlockID, computedValue) :: {
  whenBlocks.get(whenBlockID)(computedValue); // curried call
  whenBlocks.delete(whenBlockID)
};
makeWhenObserver(callBackActor, whenBlockID) : object({
  notify(computedValue) :: {
    callBackActor#invokeWhen(whenBlockID, computedValue) }
});
when(aFuture, whenBlock) :: {
  whenBlocks.put(whenIdCtr, whenBlock);
  aFuture#register(makeWhenObserver(thisActor, whenIdCtr));
  whenIdCtr := whenIdCtr + 1
}})
The DueMixin installs the due construct in an actor and overrides the way its
outgoing messages are sent in order to stamp those messages by extending them
with a deadline slot and a ‘complaint address’ in the form of a handlerMsg
which will determine how to react when the deadline expires. The overridden
send method extends msg with these slots provided that it was invoked in the
dynamic context of a due-block (i.e. if dueTimeout contains a meaningful value
rather than void). The message is then sent by delegating to the overridden im-
plementation. At any particular time on the execution path of an actor, at most
one active due-block exists (cf. try-catch). Its timeout value and its handler are
stored in the slots dueTimeout and dueHandlerMsg. To allow dynamic nesting of
due-blocks, the current values of dueTimeout and dueHandlerMsg are saved in
temporary variables and are restored upon returning from the due body closure.
The ExpiryCheckMixin is parameterized by an actor behaviour and a mail-
box name. The mixin adds a mailbox observer to the mailbox corresponding to
the mailbox name. Whenever a message stamped with a deadline is added to
that mailbox, a closure is scheduled for execution when the message’s deadline
has passed (using AmbientTalk’s after form). The closure checks whether the
message is still present in either the inbox or the outbox and if so, removes it
from that mailbox and sends its handler message to deal with the expiration.
ExpiryCheckMixin(actorBehaviour, mbxName) :: extend(actorBehaviour, {
  init() :: {
    super.init();
    getMailbox(mbxName).uponAdditionDo(this.checkExpired)
  };
  checkExpired(msg) :: {
    if(has_slot(msg, "deadline"), {
      after(msg.deadline - time(), lambda() -> {
        this.retractMsgInMailbox(msg, outbox);
        this.retractMsgInMailbox(msg, inbox)
      })
    }) };
  retractMsgInMailbox(aMsg, aMailbox) :: {
    if(aMailbox.contains(aMsg), {
      aMailbox.delete(aMsg);
      aMsg.handlerMsg.setArgs(makeVector(1).set(1, aMsg));
      send(aMsg.handlerMsg)
    }) } })
5.4 Evaluation
This section has presented three tentative high-level AmOP language features:
ambient references, due-blocks and non-blocking futures. We have adhered to the
(functional programming) tradition of modular interpreters to formulate these
features as modular semantic building blocks — called language mixins — that
enhance AmbientTalk’s kernel. AmbientTalk’s basic semantic building blocks
(consisting of the eight first-class mailboxes, its mailbox observers and its reflec-
tive facilities) have been shown to be sufficient to implement these abstractions.
The abstractions have been validated in the context of the instant messenger ap-
plication. Indeed, the essence of communication between two messengers consists
of making the corresponding actors get acquainted and in handling the delivery,
References
1. G. Agha. Actors: a Model of Concurrent Computation in Distributed Systems. MIT
Press, 1986.
2. G. Agha and C. Hewitt. Concurrent programming using actors. Object-oriented
concurrent programming, pages 37–53, 1987.
3. J.-P. Briot, R. Guerraoui, and K.-P. Lohr. Concurrency and distribution in object-
oriented programming. ACM Computing Surveys, 30(3):291–329, 1998.
1 Introduction
class GuessingGame {
  static final int GAME_NOT_RUNNING = 0;
  static final int GAME_RUNNING = 1;
  private int currState = GAME_NOT_RUNNING;
  private Random rand = new Random();
  private int correctAnswer;

  GuessResult startGame() {
    switch(currState) {
      case GAME_NOT_RUNNING:
        currState = GAME_RUNNING;
        correctAnswer = rand.nextInt(50);
        return null;
      case GAME_RUNNING:
        return GuessResult.HAVENOTFINISHED;
    }
    return null;
  }

  GuessResult guess(int i) {
    switch(currState) {
      case GAME_NOT_RUNNING:
        return GuessResult.HAVENOTSTARTED;
      case GAME_RUNNING:
        // ... compare i to correctAnswer
    }
    return null;
  }
}
button, the game chooses a random integer in some range and asks the player for a
guess. If the player’s guess is lower than the target number, the game responds with
“too low!” and asks for another guess, and similarly if the player’s guess is too high. If
the player guesses correctly, the game is over and the player may press the start button
to play again. Pressing the start button has no effect and emits a warning message if the
player is in the middle of a game. Similarly, making a guess has no effect and emits a
warning message if the game has not yet started.
class GuessingGame {
  public static interface GameState {
    GuessResult guess(int guess);
    GuessResult startGame();
  }

  private Random rand = new Random();
  private GameState currState = new GameStartState();

  public class GameStartState implements GameState {
    public GuessResult startGame() {
      currState = new GameRunningState();
      return null;
    }
    public GuessResult guess(int i) {
      return GuessResult.HAVENOTSTARTED;
    }
  }

  public class GameRunningState implements GameState {
    private int correctAnswer;
    public GameRunningState() {
      correctAnswer = rand.nextInt(50);
    }
    public GuessResult startGame() {
      return GuessResult.HAVENOTFINISHED;
    }
    public GuessResult guess(int i) {
      // ... compare i to correctAnswer
    }
  }
}
teger, and each event handler switches on the value of currState to determine the
appropriate response. In this way, the implementation of each logical state is frag-
mented across the different handlers, making it difficult to understand the behavior
of each state. Further, state transitions occur only indirectly through modifications to
currState. Finally, the correctAnswer field is only well-defined when the game is
in the GAME_RUNNING state. However, this fact is difficult to ascertain from the code,
and there is nothing preventing an event handler from manipulating that field in the
GAME_NOT_RUNNING state.
The state design pattern [6] is an attempt to avoid several of these problems. The basic
idea is to reify each internal state as its own class, thereby modularizing the applica-
tion logic by state rather than by event. An implementation of our guessing game using
the state pattern is shown in Figure 2. Each state class contains its own methods for
handling events, thereby avoiding the switch statements necessary in the event-driven
implementation. Each state class also has its own local fields, which is an improvement
258 B. Chin and T. Millstein
on the event-driven style. However, state transitions are still expressed indirectly through
updates to currState, and the logical flow of the game is now fragmented across the
various state classes.
2 Responders
2.1 Responding Blocks, Events, and Event Loops
We explain the basic concepts of responders using a ResponderJ implementation of the
guessing game, which is shown in Figure 3. A responder is an ordinary Java class that
Responders: Language Support for Interactive Applications 259
class GuessingGame {
  public revent StartGame();
  public revent Guess(int num);

  responding yields GuessResult {           //A
    Random rand = new Random();
    eventloop {                             //B
      case StartGame() {                    //C
        int correctAnswer = rand.nextInt(50);
        eventloop {                         //D
          case Guess(int guess) {           //E
            if(guess > correctAnswer) {
              emit GuessResult.LOWER;
            } else if(guess < correctAnswer) {
              emit GuessResult.HIGHER;
            } else {
              emit GuessResult.RIGHT;
              break;
            }
          }
          default {                         //F
            emit GuessResult.HAVENOTFINISHED;
          }
        }
      }
      default {
        emit GuessResult.HAVENOTSTARTED;
      }
    }
  }
}
One disadvantage of responding methods is that they are in a different static scope
from the responding block and hence do not have access to the responding block’s local
variables. Therefore, any state needed by a responding method must be explicitly passed
as an argument, as with the initialPoint argument to doDrag in Figure 6. Of course,
the fact that a responding method has its own scope also provides the usual benefits
of procedural abstraction. For example, a responding method that encapsulates some
common control logic can be invoked from multiple places within a responder, with the
method’s arguments serving to customize this logic to the needs of each caller.
Because a responding method can contain eventloops and emits, it only makes
sense to invoke such a method as part of the execution of an object’s responding block.
We statically enforce this condition through three requirements. First, a responding
method must be declared private or protected, to ensure that it is inaccessible out-
side of its associated class and subclasses. Second, a responding method may only be
invoked from a responding block or from another responding method. Finally, we re-
quire that every call to a responding method have either the (possibly implicit) receiver
this or super. This requirement ensures that the responding method is executed on the
same object whose responding block is currently executing.
A responder subclass inherits the superclass's responding block as well as any responding methods. The subclass also
inherits the superclass’s yields type. We disallow narrowing the yields type in the
subclass, as this would only be safe if the subclass overrode the superclass’s responding
block and all responding methods, to ensure that all emits are of the appropriate type.
The ability to override responding methods allows an existing responder’s behavior
to be easily modified or extended by subresponders. For example, the DragHoldPanel
responder in Figure 7 inherits the responding block of DragDropPanel but overrides
the doDrag responding method from Figure 6. The overriding doDrag method uses two
eventloops in sequence to change the behavior of a drag. Under the new semantics, the
user can release the mouse but continue to drag the panel. Drag mode only ends after a
second MouseUp event occurs, which causes the doDrag method to return. It would be
much more tedious and error prone to make this kind of change using an event-driven
or state-based implementation of the drag-and-drop panel.
The example above shows how subresponders can easily add new states and state
transitions to a responder. Subresponders also have the ability to add new respon-
der events. For example, the DragKeyPanel responder in Figure 8 subclasses from
DragDropPanel and adds a new event representing a key press. DragKeyPanel then
overrides the doDrag responding method in order to allow a key press to change the
color of the panel while it is in drag mode. Because of the possibility for subrespon-
ders to add new events, a responding block may be passed events at run time that were
not known when the associated responder was compiled. To ensure that all events can
nonetheless be handled, we require each eventloop to contain a default case.
Overriding in ResponderJ is expressed at the level of entire responding methods.
It would be interesting to consider forms of overriding and extension for individual
eventloops. Without such features, subresponders sometimes have no choice but to
duplicate code. For example, DragKeyPanel’s doDrag method is identical to the origi-
nal doDrag method from Figure 6 but additionally contains a case for the new KeyDown
event.
3 Compilation
ResponderJ is implemented as an extension to the Polyglot extensible Java compiler
framework [13], which translates Java 1.4 extensions to Java source code. Each respon-
der class is augmented with a field base of type ResponderBase, which orchestrates
the control flow between the responding block (and associated responding methods)
and the rest of the program. To faithfully implement the semantics of eventloops,
ResponderBase runs all responding code in its own Java thread, and ResponderBase
includes methods that yield and resume this thread as appropriate. Although our current
implementation uses threads, we use standard synchronization primitives to ensure that
while(true) {
  Event temp = (Event)base.passOutput();
  if(temp instanceof GuessEvent) {
    int guess = ((GuessEvent)temp).num;
    {
      // Implementation
      if(guess > correctAnswer) {
        base.emitOutput(GuessResult.LOWER);
      } else if(guess < correctAnswer) {
        base.emitOutput(GuessResult.HIGHER);
      } else {
        base.emitOutput(GuessResult.RIGHT);
        break;
      }
    }
    continue;
  } else {
    // default handler
    base.emitOutput(GuessResult.HAVENOTFINISHED);
  }
}
only one thread is active at a time, thereby preserving ResponderJ’s purely sequential
semantics and also avoiding concurrency issues like race conditions and deadlocks. As
we discuss in Section 5, in future work we plan to do away with threads entirely in our
compilation strategy.
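To make the yield/resume handoff concrete, here is a minimal plain-Java sketch of the idea (not our actual ResponderBase): two synchronous rendezvous queues pass control back and forth, so the client thread and the responding thread never run at the same time.

```java
import java.util.concurrent.SynchronousQueue;

// Minimal sketch of the thread-based yield/resume idea (not the actual
// ResponderBase): two synchronous rendezvous points hand control back and
// forth, so the client thread and the responding thread never run at once.
class HandoffSketch {
    private final SynchronousQueue<Integer> toResponder = new SynchronousQueue<>();
    private final SynchronousQueue<String> toClient = new SynchronousQueue<>();

    HandoffSketch() {
        Thread responder = new Thread(() -> {
            try {
                while (true) {
                    int event = toResponder.take(); // "yield": block for an event
                    toClient.put("got " + event);   // "emit": wake the client, loop
                }
            } catch (InterruptedException e) {
                // daemon thread shuts down with the JVM
            }
        });
        responder.setDaemon(true);
        responder.start();
    }

    // Called on the client thread; blocks until the responding thread emits.
    String signal(int event) {
        try {
            toResponder.put(event);
            return toClient.take();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because every put blocks until the matching take, exactly one of the two threads is runnable at any moment, mirroring the purely sequential semantics described above.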
First we describe the compilation of responder events. Each revent declaration in
a responder is translated into both a method of the specified visibility and a simple
class. This class contains a field for every formal parameter of the declared responder
event. When the responder event’s method is called, the method body creates an in-
stance of the class, fills its fields with the given parameters, and passes this instance
to the ResponderBase object. For example, Figure 9 shows the translation of the
Guess responder event from the guessing game in Figure 3. The passInput method
in ResponderBase passes our representation of the signaled event to the responding
thread and resumes its execution from where it last yielded.
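The shape of this translation can be sketched in plain Java as follows (a simplified reconstruction, with ResponderBase abstracted behind an assumed interface):

```java
// Simplified reconstruction of the translation of "public revent Guess(int num);"
// (cf. Figure 9). ResponderBase is abstracted behind an assumed interface here.
interface ResponderBaseLike {
    // Passes the event representation to the responding thread and resumes it.
    Object passInput(Object event);
}

// Generated class: one field per formal parameter of the revent.
class GuessEvent {
    final int num;
    GuessEvent(int num) { this.num = num; }
}

class GuessingGameTranslated {
    private final ResponderBaseLike base;
    GuessingGameTranslated(ResponderBaseLike base) { this.base = base; }

    // Generated method: wrap the arguments and hand the instance to base.
    public Object Guess(int num) {
        return base.passInput(new GuessEvent(num));
    }
}
```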
4 Case Studies
In order to demonstrate the practical applicability of ResponderJ, we performed two
case studies. First, we expanded the drag-and-drop GUI example shown in Section 2
into a complete application that interfaces with Java’s Swing library. We implemented
and compared three versions of the application: using responders in ResponderJ, using
the event-driven style in Java, and using the state design pattern in Java. Second, we
rewrote an existing application to use ResponderJ. JDOM [8] is a library that makes it
easy to access and manipulate XML files from Java programs. JDOM parses XML files
via the SAX API [16], which signals events as an XML file is parsed (e.g., when a tag
is read). We rewrote in ResponderJ the portion of JDOM that responds to SAX events
in order to create a tree of Java objects that represents the parsed XML data.
responding {
  outer: eventloop {
    case Paint(Graphics g) {
      this.paintAll(g);
    }
    case MouseDown(Point initial) {
      //We eventually expect a mouseUp, so we do a nested eventloop
      Shape currentShape = null;
      //set currentShape to clicked-on shape
      if(currentShape == null)
        continue;
      this.repaint();
      //While the mouse is down and a shape is selected
      eventloop {
        case MouseUp(Point p2) {
          //Drag is over
          this.repaint();
          continue outer;
        }
        case Paint(Graphics g) {
          this.paintExcept(g, currentShape);
          currentShape.drawSelected(g);
        }
        case MouseMove(Point p2) {
          if(Math.abs(initial.getX() - p2.getX()) > 3 ||
             Math.abs(initial.getY() - p2.getY()) > 3)
            break;
        }
        default {
        }
      }
      this.doDrag(initial, currentShape);
    }
    default {
    }
  }
}
to currState. Another problem is the need to store all data as fields of the class.
DragDropPanel has four fields used for this purpose, including the currShape field
used in Figure 14. The four fields are used for different purposes and in different states
in the control flow, but it is difficult to understand the intuition behind each field and
whether it is being used properly across all handlers. A final problem with the event-
driven approach is that there is no single place to execute code that should be run upon
reaching a particular state. Instead, this code must be duplicated in each event handler
that can cause a transition to that state.
The event-driven approach does have some advantages over the ResponderJ imple-
mentation. First, it is easy to share event-handling code across states. An example is
shown in Figure 14, which handles the paint event identically for the mouse-down and
drag states, without any code duplication. Second, it is straightforward to add a new
event like KeyDown in a subclass — the subclass simply needs to add a KeyDown method
and can inherit all the other event-handling methods. However, adding a new state in
a subclass, for example to change the way a drag works as shown in Figure 7, would
necessitate overriding every event-handling method to include a case for the new state,
as well as modifying existing logic to transition appropriately to the new state.
To address the duplication of the event-handling code for Paint across multi-
ple states, we employed inheritance. We created an abstract state class Selected-
ShapeState that implements the Paint method appropriately. The states that should
employ that behavior for Paint simply subclass from SelectedShapeState, as shown
in Figure 15. However, this technique does not work in general, for example if overlap-
ping sets of states need to share code for different event handlers, because of Java’s
lack of multiple inheritance. Therefore, some code duplication is still required in some
cases.
The most apparent disadvantage of the state pattern is its verbosity. Several classes
must be defined, each with its own constructor and fields. Having multiple state classes
can also cause problems for code evolution. For example, if a state needs to be aug-
mented to use a new field, that field will likely need to be initialized through an extra
argument to the state class’s constructor, thereby requiring changes to all code that con-
structs instances of the class. By using ordinary local variables, ResponderJ avoids this
problem. Further, while the behavior of a single state is easier to understand than in
the event-driven approach, the control flow now jumps among several different state
classes, which causes its own problems for code comprehension.
Finally, augmenting the drag-and-drop panel to support a new event like KeyDown re-
quires all state classes to be overridden to add a new method and to meet an augmented
interface that includes the new method. This approach necessitates type casts when ma-
nipulating the inherited currState field, since it is typed with the old interface. Using
inheritance to add a new state is easier, requiring the addition of a new state class, but it
also requires existing state classes to be overridden to appropriately use the new state.
JDOM 1.0 is a Java class library that uses the SAX API to construct its own implemen-
tation of DOM [5], which is an object model for XML data. At the core of JDOM is
the SAXHandler class, which implements several of the standard SAX interfaces. An
instance of SAXHandler is given to the SAX parser, which in turn passes events to that
object while parsing an XML file. The SAXHandler is supposed to respond to these
events by constructing the corresponding DOM tree.
The original event-driven version of SAXHandler utilized 17 fields to store local
state. Most of these fields were booleans that kept track of whether or not the handler
was currently in a particular mode. Others were data members that stored information
needed to implement the class’s functionality. The remaining few fields were integers
used to keep track of the nesting depth in the structure of the XML document as it is
parsed. Altogether it was difficult to determine the exact purpose of each of the variables
and to make sure each was used properly.
To parse into a DOM document, the JDOM SAXHandler maintains an explicit stack
of nodes in the DOM tree whose subtrees have not yet been fully parsed. When a start
tag is seen in the XML data, a new node is pushed on the stack. When the associated
end tag is seen, the node is popped off the stack and linked as a subtree of the next node
on the stack. Since there are different kinds of nodes (e.g., the root document node,
element nodes), switching logic is used to decide what action to take at a given point,
based on what kind of node is on the top of the stack.
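This explicit-stack strategy can be sketched in plain Java as follows (an illustrative reconstruction, not JDOM's actual SAXHandler code):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the explicit-stack tree building described above (not JDOM's
// actual SAXHandler): on a start tag push a node, on an end tag pop it and
// link it under the new top of the stack.
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();
    Node(String name) { this.name = name; }
    String render() {
        StringBuilder sb = new StringBuilder(name).append('(');
        for (Node c : children) sb.append(c.render());
        return sb.append(')').toString();
    }
}

class StackBuilder {
    private final Deque<Node> stack = new ArrayDeque<>();
    private Node root;

    public void startElement(String name) {
        stack.push(new Node(name));
    }

    public void endElement() {
        Node done = stack.pop();
        if (stack.isEmpty()) root = done;       // finished the root
        else stack.peek().children.add(done);   // link as subtree of the parent
    }

    public Node root() { return root; }
}
```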
Figure 16 shows a representative subset of the original SAXHandler code. The field
atRoot is used to keep track of whether the node on top of the stack is currently
an element node or a document node. This field is then explicitly checked (and
set) throughout the code. In a similar vein, the processingInstruction method starts
with a check of the member variable suppress: nothing is done if we are currently in
suppress mode. This dependency on multiple fields that serve as flags for various con-
ditions pervades the class’s code, making the logical control flow extremely difficult to
follow.
In contrast, the ResponderJ implementation relies on ordinary control flow among
eventloops to implicitly keep track of the various modes of computation, with local
variables storing the data needed in each mode. A representative responding method
from the ResponderJ version of SAXHandler is shown in Figure 17. The buildElement
method handles the logic for creating the DOM representation of an XML element,
which is roughly the data between a given start- and end-tag pair of the same name.
The method first creates the element instance, storing its associated tag name along
with any associated XML attributes, before waiting at an eventloop. The logic of the
eventloop makes use of the fact that SAX’s start-tag and end-tag events are always
properly nested. If a new start-tag is seen, we recursively use buildElement to parse the
nested element. Since the call stack is saved when an eventloop yields to a caller, all of
the pending enclosing elements are still available the next time the responder resumes.
If an end-tag is seen, then we know that construction of the element has completed.
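The effect of this recursion can be illustrated outside ResponderJ with an ordinary recursive descent over a token stream (a simplified stand-in for the SAX events; the token encoding here is invented):

```java
import java.util.Iterator;
import java.util.List;

// Plain-Java illustration of buildElement's recursion (Figure 17 is not
// reproduced): the call stack, rather than an explicit stack, holds the
// pending enclosing elements. A token "+name" opens an element; "-" closes
// the current one.
class RecursiveBuilder {
    static String buildElement(String name, Iterator<String> tokens) {
        StringBuilder sb = new StringBuilder(name).append('(');
        while (tokens.hasNext()) {
            String t = tokens.next();
            if (t.equals("-")) break;                        // our end tag
            sb.append(buildElement(t.substring(1), tokens)); // nested start tag
        }
        return sb.append(')').toString();
    }

    static String demo() {
        Iterator<String> toks =
            List.of("+a", "-", "+b", "+c", "-", "-", "-").iterator();
        return buildElement("doc", toks);
    }
}
```

Because start and end tags are properly nested, each recursive call consumes exactly one element and its subtrees, with no counters or mode flags.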
The responding block handles building the root document in the DOM tree, while each of the other five
methods handles the building of an individual kind of XML construct (e.g., an element).
This refactoring of the code made it much easier to understand its behavior, leading
to further simplifications of the logic. In the original class, the startEntity method
was possibly the most complex, explicitly keeping track of the XML document’s nest-
ing depth by counting the number of onStartEntity and onEndEntity calls. The
boolean logic in the method was rather confusing, reading and setting no fewer than
four boolean fields. The ResponderJ version of this code aided understanding greatly,
allowing us to find a much simpler way to express the logic. We created a method
ignoreEntities that calls itself recursively on every onStartEntity event and re-
turns at every onEndEntity event, similar to the style shown for buildElement’s logic
in Figure 17. This method avoids the need to count explicitly and encapsulates the sim-
pler logic in a separate method. Our refactoring also led us to discover several redun-
dancies in the usage of boolean fields, whereby a field’s value is tested even though its
value is already known from an earlier test. These kinds of redundancies, as well as
similar kinds of errors, are easy to make in the programming style required of the Java
version of the code.
Finally, the need to explicitly forward calls from the SAX-level event-handling meth-
ods to the appropriate responder events, while verbose, provided an unexpected benefit
in the ResponderJ version of SAXHandler. One of the original event-handling methods
executed a particular statement before doing a switch on the current state. While the
logic of the switch was moved to the responding block’s eventloops, that first state-
ment could remain in the event-handling method. In essence, the event-handling method
now serves as a natural repository for any state-independent code to be executed when
an event occurs. Without this method, such code would have to be duplicated in each
eventloop’s case for that event.
5 Discussion
There are several potential directions for future work, many of which are inspired by
issues encountered during the case studies described above. Since the responding block
and associated responding methods store state in ordinary local variables, this state
cannot easily be shared. We could of course use fields to share state, but that approach
leads to the kinds of difficulties described earlier for the event-handling and state-pattern
programming styles. One possibility is to define a notion of a local method, which
could be nested within a responding block or responding method, thereby naturally
obtaining access to the surrounding local variables while still allowing the usual benefits
of procedural abstraction.
If a responder subclass wishes to modify an eventloop from the superclass, say to
include a new revent, the entire eventloop must currently be duplicated in the sub-
class. We are pursuing approaches for allowing subclasses to easily customize super-
class eventloops to their needs. The key question is how to convey information about
the local variables in the superclass eventloop for use in the subclass eventloop,
while maintaining modularity.
Responders encode state transitions implicitly by the control flow from eventloop
to eventloop. This approach naturally represents sequential as well as nested state
logic, but it cannot easily represent arbitrary control flow among states. To increase
expressiveness, we are considering language support for directly jumping among named
eventloops. The challenge is to provide the desired expressiveness while preserving
traditional code structure and scoping.
In addition to events, it may be useful to allow callers to query the responder for
some information without actually changing state. Such queries can currently be en-
coded via events, but specialized language support would make this idiom simpler and
easier to understand. For example, separating queries from more general kinds of events
would allow queries to return a single value in the usual way, instead of returning an
unspecified number of results indirectly through emits.
Finally, as described earlier, our current implementation strategy relies on Java
threads. Although we avoid concurrency concerns by appropriate use of synchronization
primitives, there is still overhead simply by using threads. However, since ResponderJ’s
semantics is purely sequential, it is possible to implement the language without resorting
to threads. Our idea is to instead break the responding block and responding methods
into multiple methods, using the yield points as boundaries. When a responding event
is signaled, the appropriate one of these methods would be dispatched to as directed
by the semantics of the original responding code. Each method would explicitly save
and restore its local state, and a liveness analysis of the original responding code’s local
variables could be used to minimize the amount of state that each method requires.
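As a hand-worked illustration of this strategy (not compiler output), the guessing game's responding logic, which suspends at two yield points, could be split into per-event methods, with an explicit state field marking the suspension point and the single live local saved in a field (the answer is passed in here, rather than chosen randomly, to keep the sketch deterministic):

```java
// Hand-worked illustration of the thread-free strategy (not compiler output):
// the guessing game's responding logic split at its yield points. The 'state'
// field records which yield point we are suspended at, and 'answer' is the
// one live local saved across yields.
class SplitResponder {
    private int state = 0;   // 0 = outer eventloop, 1 = inner eventloop
    private int answer;      // live only while state == 1

    public String startGame(int chosenAnswer) {
        if (state != 0) return "HAVENOTFINISHED";
        answer = chosenAnswer;  // save the live local
        state = 1;              // advance past the first yield point
        return "STARTED";
    }

    public String guess(int i) {
        if (state != 1) return "HAVENOTSTARTED";
        if (i < answer) return "HIGHER";
        if (i > answer) return "LOWER";
        state = 0;              // the inner eventloop's break
        return "RIGHT";
    }
}
```

The point is that this shape, essentially the event-driven style criticized earlier, would be produced mechanically, with the liveness analysis keeping the saved fields to a minimum.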
6 Related Work
The coroutine [9] is a general control structure that allows multiple functions to interact
in order to complete a task. During execution, a function can explicitly yield control
to another function. When control is eventually yielded back to the first function, it
resumes execution from where it left off, with its original call stack and local variables
restored.
The control-flow semantics of ResponderJ’s eventloop construct can be viewed as
a specialization of the coroutine idea to the domain of event-driven programming. This
specialization entails several novel design choices and extensions. First, the eventloop
bundles the coroutine-style control flow with an event dispatch loop in a natural way.
Second, the interaction among coroutines is symmetric, with each explicitly yielding
control to the others. In contrast, our approach is asymmetric: the responder yields to
its caller, but callers are insulated from the coroutine-like control flow by the responder
events, which appear as ordinary methods to clients. Third, we have shown how re-
sponding blocks can make use of procedural abstraction through the notion of respond-
ing methods. Finally, we have integrated responders with object orientation, allowing
responders to be refined through inheritance.
CLU iterators [10] and their variants (e.g., Sather iterators [12] and Python gener-
ators [15]) are functions that are used to produce a sequence of values one-by-one for
manipulation by clients. The body of an iterator function emits a value and yields con-
trol to the client. When the client asks for the next value, the iterator resumes from
where it left off. Interruptible iterators [11] additionally allow clients to interrupt an
iterator through an exception-like mechanism, for example to perform an update during
iteration.
CLU-style iterators and eventloops specialize the coroutine for very different pur-
poses. Iterators specialize the coroutine in order to interleave the generation of values
with their manipulation by clients, while eventloops specialize the coroutine to natu-
rally represent the internal state logic of an interactive application. However, it is possi-
ble to use an eventloop to implement an iterator, with the emit statement generating
consecutive values appropriately. Further, a form of iterator interruptions can be sup-
ported, with responder events playing the role of interrupts and eventloops playing the
role of the interrupt handlers. In fact, interrupt handlers support a very similar style of
control flow as eventloops, employing a form of break and continue [11].
Cooperative multitasking (e.g., [17]) is an alternative to preemptive multitasking
whereby a thread explicitly yields control so that another thread can be run, saving state
like in any other context switch. There is a related body of work on the event-driven
approach to I/O (e.g., [14]), in which fine-grained event handlers run cooperatively in
response to asynchronous I/O events. Further, researchers have explored language and
library support to make this application of event-driven programming easier and more
reliable [2, 4, 1].
ResponderJ is targeted at a significantly different class of event-driven applications
than these systems, namely those that must be deterministic. With both cooperative
multitasking and event-driven I/O, a central scheduler decides which of the pending
threads or event handlers should be executed upon a yield. In contrast, program exe-
cution in ResponderJ is dictated entirely by the order of responder events invoked by
clients. Such determinism is critical for a large class of event-driven programs, for ex-
ample computer games and GUIs, where events must be processed in a specific order.
The deterministic semantics carries the additional benefits of being easier to test and
analyze.
7 Conclusions
We have introduced the responder, a new language construct supporting interactive ap-
plications. Responders allow the control logic of an application to be expressed naturally
and modularly and allow state to be locally managed. In contrast, existing approaches to
interactive programming fragment the control logic across multiple handlers or classes,
making it difficult to understand the overall control flow and to ensure proper state man-
agement. We instantiated the notion of responders in ResponderJ, an extension to Java,
and described its design and implementation. We have employed our ResponderJ com-
piler in two case studies, which illustrate that responders can provide practical benefits
for application domains ranging from GUIs to XML parsers.
Acknowledgments
This research was supported in part by NSF ITR award #0427202 and by a generous
gift from Microsoft Research.
References
1. Ada 95 reference manual. http://www.adahome.com/Resources/refs/rm95.html.
2. A. Adya, J. Howell, M. Theimer, W. Bolosky, and J. Douceur. Cooperative task management
without manual stack management. In Proc. Usenix Tech. Conf., 2002.
3. K. Arnold, J. Gosling, and D. Holmes. The Java Programming Language. Addison-
Wesley, Reading, MA, third edition, 2000.
4. R. Cunningham and E. Kohler. Making events less slippery with EEL. In HotOS X: Hot
Topics in Operating Systems, 2005.
5. DOM home page. http://www.w3.org/DOM.
6. E. Gamma, R. Helm, R. E. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable
Object-Oriented Software. Addison-Wesley, Massachusetts, 1995.
7. J. Gosling, B. Joy, G. Steele, and G. Bracha. The Java Language Specification. The
Java Series. Addison-Wesley, Boston, MA, second edition, 2000.
8. JDOM home page. http://www.jdom.org.
9. D. Knuth. Fundamental Algorithms, third edition. Addison-Wesley, 1997.
10. B. Liskov. A history of CLU. ACM SIGPLAN Notices, 28(3):133–147, 1993.
11. J. Liu, A. Kimball, and A. C. Myers. Interruptible iterators. In POPL ’06: Conference record
of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages,
pages 283–294, New York, NY, USA, 2006. ACM Press.
12. S. Murer, S. Omohundro, D. Stoutamire, and C. Szyperski. Iteration abstraction in Sather.
ACM Transactions on Programming Languages and Systems, 18(1):1–15, January 1996.
13. N. Nystrom, M. R. Clarkson, and A. C. Myers. Polyglot: An extensible compiler frame-
work for Java. In Proceedings of CC 2003: 12’th International Conference on Compiler
Construction. Springer-Verlag, Apr. 2003.
14. J. K. Ousterhout. Why threads are a bad idea (for most purposes). Invited talk at the 1996
USENIX Technical Conference, Jan. 1996.
15. PEP 255: Simple generators. http://www.python.org/peps/pep-0255.html.
16. SAX home page. http://www.saxproject.org.
17. M. Tarpenning. Cooperative multitasking in C++. Dr. Dobb’s Journal, 16(4):54, 56, 58–59,
96, 98–99, Apr. 1991.
Variance and Generalized Constraints
for C# Generics
1 Introduction
The Generics feature of C# 2.0 introduced parametric polymorphism to the lan-
guage, supporting type parameterization for types (classes, interfaces, structs,
and delegates) and methods (static and instance). Being object-oriented, C# al-
ready offers subtype polymorphism, namely the ability for a value of type T to
be used in a context that expects type U, if T is a subtype of U.
As it stands, though, subtype and parametric polymorphism interact only
through subclassing. In particular, there is no subtyping relationship between
distinct instantiations of the same generic type – type parameters are said to be-
have invariantly with respect to subtyping. This leads to a certain inflexibility: a
method whose parameter has type IEnumerable<Control> cannot be passed an
argument of type IEnumerable<Button>, even though this is safe: since Button
is a subclass of Control, something that enumerates Buttons also enumerates
Controls. Dually, a method expecting a parameter of type IComparer<Button>
cannot be passed an argument of type IComparer<Control>, even though this
is safe: something that can compare Controls can also compare Buttons.
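The same two safe directions can be demonstrated in Java, whose wildcards the paper discusses below; here Number and Integer stand in for Control and Button so the sketch is self-contained:

```java
import java.util.Comparator;
import java.util.List;

// The two safe directions rendered in Java; Number and Integer stand in for
// Control and Button.
class VarianceDemo {
    // Covariant position: a producer of any subtype of Number is acceptable.
    static double sum(List<? extends Number> xs) {
        double s = 0;
        for (Number n : xs) s += n.doubleValue();
        return s;
    }

    // Contravariant position: a comparer of any supertype of Integer is
    // acceptable, since in particular it can compare Integers.
    static Integer max(List<Integer> xs, Comparator<? super Integer> cmp) {
        Integer best = xs.get(0);
        for (Integer x : xs) if (cmp.compare(x, best) > 0) best = x;
        return best;
    }

    static double sumDemo() { return sum(List.of(1, 2, 3)); }  // a List<Integer>

    static int maxDemo() {
        Comparator<Number> byValue = Comparator.comparingDouble(Number::doubleValue);
        return max(List.of(3, 1, 2), byValue);                 // a Comparator<Number>
    }
}
```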
1.1 Variance
We can increase the flexibility of generic types by declaring variance properties
on type parameters. For example, IEnumerable is declared covariant (+) in its
argument in the list, as U is not a subtype of T, but it can create a new list cell
of type List<U> with tail of type List<T>, as List<T> is a subtype of List<U>.
It is easy to check that these refined signatures for Append rule out the unsafe
implementations above, while still allowing the intended, benign ones:
class List<+T> { ...
  public List<U> Append<U>(U other) where T : U
    { return new List<U>(head, new List<U>(other, null)); }
  public List<U> Append<U>(List<U> other) where T : U
    { return new List<U>(head, tail == null ? other : tail.Append(other)); }
}
Notice how this is actually less constraining for the client code – it can al-
ways instantiate Append with T, but also with any supertype of T. For example,
given a List<Button> it can append a Control to produce a result of type
List<Control>. The designers of Scala [12] identified this useful pattern.
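Java cannot express this lower bound on an instance method's type parameter, but the same pattern is commonly encoded with a static method and a use-site wildcard; the following is a sketch of that encoding, not code from the paper:

```java
import java.util.ArrayList;
import java.util.List;

// A rough Java encoding of the Append pattern (a sketch, not from the paper):
// Java lacks lower-bounded method type parameters, so the receiver becomes an
// argument and a use-site wildcard plays the role of the constraint T : U.
class AppendDemo {
    static <U> List<U> append(List<? extends U> xs, U last) {
        List<U> out = new ArrayList<>(xs);
        out.add(last);
        return out;
    }

    static List<Number> demo() {
        List<Integer> ints = List.of(1, 2);
        return append(ints, 3.5);   // U is inferred as Number
    }
}
```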
The type constraint above is not expressible in C# 2.0, which supports only
‘upper bounds’ on method type parameters. Here we have a lower bound on U.
Alternatively, it can be seen as an additional upper bound of the type para-
meter T of the enclosing class, a feature that is useful in its own right, as the
following example demonstrates:
interface ICollection<T> { ...
void Sort() where T : IComparable<T>;
bool Contains(T item) where T : IEquatable<T>; ...
Here, constraints on T are localized to the methods that take advantage of them.
We therefore propose a generalization of the existing constraint mechanism to
support arbitrary subtype constraints at both class and method level. This neatly
subsumes both the existing mechanism, which has an unnatural asymmetry, and
the equational constraint mechanism proposed previously [10]: any equation T=U
between types can be expressed as a pair of subtype constraints T:U and U:T.
Adding variance to parametric types has a long history [1], nicely summarized
by [8]. More recently, others have proposed variance for Java. NextGen, one of
the original designs for generics in Java, incorporated definition-site variance
annotations, but details are sketchy [2]. Viroli and Igarashi described a system
of use-site variance for Java [8], which is appealing in its flexibility. It was since
adopted in Java 5.0 through its wildcard mechanism [16]. However, use-site vari-
ance places a great burden on the user of generic types: annotations can become
complex, and the user must maintain them on every use of a generic type. We fol-
low the designers of Scala [11] and place the burden on the library designer, who
must annotate generic definitions, and if necessary, factor them into covariant
and contravariant components.
Our type constraints generalize the ‘F-bounded polymorphism’ of Java [7]
and C#, and the bounded method type parameters of Scala [11], and also sub-
sume previous work on equational constraints [10]. The treatment of constraint
282 B. Emir et al.
2 Design Issues
Use-site variance, as first proposed by Viroli and Igarashi [8] and recast as wild-
cards in Java 5.0 [16], requires no annotations on type parameters at type de-
finitions. Instead, an annotation on the use of a generic type determines (a)
its properties with respect to subtyping, and (b) the members of the type
that are ‘visible’ for that variance annotation. For example, a mutable List<X>
class can be used covariantly, permitting values of type List<+String> to be
passed at type List<+Object> (in Java, List<? extends String> passed at
type List<? extends Object>), and restricting invocation to ‘reader’ meth-
ods such as Get. Conversely, the class can be used contravariantly, permitting
values of type List<-Object> to be passed at type List<-String> (in Java,
List<? super Object> passed at type List<? super String>), and restrict-
ing invocation to ‘writer’ methods such as Add.
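In Java notation, the reader/writer discipline looks as follows (with String and Object standing in for the paper's examples):

```java
import java.util.ArrayList;
import java.util.List;

// The reader/writer discipline in Java notation (String and Object standing
// in for the paper's examples).
class UseSiteDemo {
    // Covariant view: only 'reader' operations are available.
    static String firstOf(List<? extends String> readerView) {
        // readerView.add("x") would be rejected by the compiler here.
        return readerView.get(0);
    }

    // Contravariant view: only 'writer' operations are available.
    static void addTo(List<? super String> writerView) {
        writerView.add("hello");
        // Reading back yields only Object: String s = writerView.get(0) is rejected.
    }

    static String demo() {
        List<String> xs = new ArrayList<>(List.of("a"));
        List<Object> ys = new ArrayList<>();
        addTo(ys);
        return firstOf(xs) + "/" + ys.get(0);
    }
}
```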
With definition-site variance, the library designer must prepare for such uses
ahead of time. One natural way to achieve this is to expose covariant and con-
travariant behaviour through the implementation of covariant and contravariant
interfaces. For example, a non-variant mutable list class could implement two
interfaces, one containing ‘reader’ methods, and the other ‘writer’ methods.
interface IListReader<+X> {
  X Get(int index);
  void Sort(IComparer<X> comparer); ...
}
interface IListWriter<-Y> {
  void Add(Y item); ...
}
Variance and Generalized Constraints for C# Generics 283
This might give the impression that type safety violations necessarily involve reading and writing of object fields. This is not so: the toy subset of C# studied in Section 4 is purely functional, but nevertheless it is worthwhile proving soundness, as the following example illustrates:
interface IComparer<+T> { int Compare(T x, T y); } // this is illegal
class LengthComparer : IComparer<string> {
int Compare(string x, string y)
{ return Int32.Compare(x.Length, y.Length); }
}
... IComparer<object> oc = new LengthComparer();
int n = oc.Compare(3,new Button()); // takes Length of int & button
Concrete variant classes are also useful. For example, here is a covariant class
Set that implements immutable sets:
class Set<+X> : IEnumerable<X> where X : IComparable<X> {
private RedBlackTree<X> items;
public Set() { ... }
public Set(X item) { ... }
public bool All(Predicate<X> p) { ... }
public bool Exists(Predicate<X> p) { ... }
public IEnumerator<X> GetEnumerator() { ... } ...
}
When are covariant and contravariant parameters on classes safe? First, note
that no restrictions need be placed on the signatures of constructors or static
members, as the class type parameters cannot vary. The second constructor
above has X appearing in a contravariant position (the argument), but this is
safe: once an object is created, the constructor cannot be invoked at a supertype.
For the same reason, constraints declared on a class may make unrestricted use
of variant type parameters, as in the example above.
In general, fields behave invariantly and so their types must not contain any
covariant or contravariant parameters. Fields marked readonly, however, can
be treated covariantly – as we do in our formalization in Section 4.
No restrictions need be placed on private members, which is handy in prac-
tice when re-factoring code into private helpers. It is also useful on fields, as
above, where a field can be mutated from within the class – for example, to re-
balance the RedBlackTree representing the Set above. However, we must take
care: if private is interpreted as a simple lexical restriction – “accessible from
code lexically in this class” – then a type hole is the result.
Note that Java cannot express such a signature, because it does not support
lower bounds on method type parameters; though the use of bounded wildcards
can achieve the same effect for type parameters used for a single argument.
What restrictions, if any, should we apply to occurrences of class type pa-
rameters within method level constraints? The answer is that a constraint on
a method behaves covariantly on the left of the constraint, and contravariantly
on the right. To see why this must be the case, consider the following pair of
interfaces, which attempt to avoid occurrences of covariant (contravariant) pa-
rameters in argument (result) positions, by introducing illegal bounds:
interface IReader<+X> {
  public X Get(); // this is legal
  public void BadSet<Z>(Z z) where Z : X; // this is illegal
}
interface IWriter<-Y> {
  public void Set(Y y); // this is legal
  public Z BadGet<Z>() where Y : Z; // this is illegal
}
class Bad<T> : IReader<T>, IWriter<T> {
  private T item;
  public Bad(T item) { this.item = item; }
  public T Get() { return this.item; }
  public void BadSet<Z>(Z z) where Z : T { this.item = z; }
  public void Set(T t) { this.item = t; }
  public Z BadGet<Z>() where T : Z { return this.item; }
}
... IReader<object> ro = new Bad<string>("abc");
ro.BadSet<Button>(new Button()); // we wrote a button as a string
... IWriter<string> ws = new Bad<object>(new Button());
string s = ws.BadGet<string>(); // we read a button as a string
The monotonicity judgment v ⊢X̄ T mono is defined by the following rules, where negation on variance annotations is given by ¬+ = -, ¬◦ = ◦, and ¬- = +:

(v-vvar)  if vi ∈ {◦, +} then v ⊢X̄ Xi mono
(v-var)   if X ∉ X̄ then v ⊢X̄ X mono
(v-con)   given C<w̄ Ȳ> : K̄, if for all i, wi ∈ {◦, +} implies v ⊢X̄ Ti mono and wi ∈ {◦, -} implies ¬v ⊢X̄ Ti mono, then v ⊢X̄ C<T̄> mono
(v-sub)   if ¬v ⊢X̄ T mono and v ⊢X̄ U mono then v ⊢X̄ T <: U mono
Ignoring Δ for the moment, ground subtyping requires just three rules: we
assert that subtyping is transitive (tran), that instantiations of the same class
vary according to the annotations on the type parameters (con), and that sub-
classing induces subtyping (base). Observe that reflexivity is admissible (by re-
peated use of con), and that the induced equivalence relation for ground types
is just syntactic equality.
Now suppose that subtyping judgments are open and we make use of assump-
tions in Δ. We add reflexivity on type variables (var), and hypothesis (hyp).
This lets us deduce, for example, for contravariant I , that X <: C ⊢ I<C> <: I<X>.
These rules alone are insufficient to check code such as in Section 2.5. Sup-
pose our subtype assumptions include C <X ><:C <Y >. Take any ground in-
stantiation of X and Y , say [T /X , U /Y ]. If C is invariant or covariant then
C <T ><:C <U > can hold only if T <:U . Dually, if C is invariant or contra-
variant then C <T ><:C <U > can hold only if U <:T . This justifies inverting
rule con to obtain rules decon+ and decon− that ‘deconstruct’ a type accord-
ing to its variance.
In a similar vein, suppose our subtype assumptions include C <X ><:D <Y >,
for class definitions C <◦Z >:D <Z > and D <-Z >:object. Consider any ground in-
stantiation of X and Y , say [T /X , U /Y ]. Then a derivation of C <T ><:D <U >
exists only if D <T ><:D <U > and thus U <:T . We are justified in ‘inverting’
rule base to obtain debase that uses the class hierarchy to derive a subtype
relationship between two instantiations of the same class.
It is straightforward to prove standard properties of subtype entailment.
Lemma 1 (Substitution). If Δ ⊢ T <: U then SΔ ⊢ ST <: SU for any substitution S = [T̄/X̄].
We will also make use of the following lemma, which states that subtype asser-
tions lift through type formers according to variance.
Lemma 3 (Subtype lifting). Suppose that v ⊢X̄ V mono, and for all i, if vi ∈ {◦, +} then Δ ⊢ Ti <: Ui, and if vi ∈ {◦, -} then Δ ⊢ Ui <: Ti. Then Δ ⊢ [T̄/X̄]V <: [Ū/X̄]V.
subtyping rules, where the structure of the types determines uniquely a rule
(scheme) to apply. These are presented in Figure 3.
We write Ψ ⊢ T <: U to mean that “under context Ψ we can deduce that T is a subtype of U ”. As is usual, we eliminate the transitivity rule tran, rolling it
into rules s-base, s-upper, and s-lower. We also work with a different form of
context: instead of an arbitrary set of subtype assertions, the context Ψ provides
upper or lower bounds for type variables. In place of a hypothesis rule, we have
rules s-upper and s-lower that replace a type variable by one of its bounds.
For transitivity to be admissible, we need to impose some restrictions on
the context Ψ. For example, consider the context Ψ = {C<X> <: Z, Z <: C<Y>} for covariant C. Clearly we have Ψ ⊢ C<X> <: Z and Ψ ⊢ Z <: C<Y>, but not Ψ ⊢ C<X> <: C<Y>. We need to add X <: Y to Ψ to achieve this. We define a
notion of consistency for contexts (see Pottier [14] and Trifonov and Smith [17]
for similar ideas).
Definition 1 (Consistency). A context Ψ is consistent if for any pair of assertions T <: X ∈ Ψ and X <: U ∈ Ψ it is the case that Ψ ⊢ T <: U.
We should now have enough to relate the syntax-directed and declarative rules:
given a consistent context Ψ that is equivalent to a set of constraints Δ (in the sense that Ψ ⊢ Δ and Δ ⊢ Ψ), the relation Ψ ⊢ − <: − should coincide with Δ ⊢ − <: −. The proof of this rests on the admissibility of transitivity: if Ψ ⊢ T <: U and Ψ ⊢ U <: V then Ψ ⊢ T <: V. Attempts at a direct proof of transitivity fail
(for example, by induction on the total height of the derivations). There are
two difficult cases. If the first derivation ends with rule s-con and the second
ends with s-base then we need to ‘push’ the premises of s-con through the
second derivation. We use an auxiliary result (Lemma 6) to achieve this. If the
first derivation ends with rule s-lower (so we have a proof of Ψ ⊢ T <: X) and the second ends with rule s-upper (so we have a proof of Ψ ⊢ X <: V) then we need to make use of the consistency of Ψ in the side-conditions of these rules (T <: X ∈ Ψ and X <: U ∈ Ψ) to obtain a derivation of Ψ ⊢ T <: U. But to
apply the induction hypothesis on this derivation we need to bound its size.
Lemma 4. For any consistent context Ψ there exists a context Ψ′ such that Ψ ⊢ Ψ′ and Ψ′ ⊢ Ψ, satisfying the following property: if T <: X ∈ Ψ′ and X <: U ∈ Ψ′ then there is a derivation Ψ′ ⊢ T <: U in which all uses of rules s-lower and s-upper are trivial, namely, the premise is of the form Ψ′ ⊢ V <: V.
(r-con)  Given C<v̄ X̄> : K̄, if for all i, vi ∈ {◦, +} implies Ψ ⊢b,n Ti <: Ui and vi ∈ {◦, -} implies Ψ ⊢b,n Ui <: Ti, then Ψ ⊢b,n+1 C<T̄> <: C<Ū>.
Sub(Ξ, Ψ, T, U) =
  if (T, U) ∈ Ξ then false
  else let Sub′(T′, U′) = Sub({(T, U)} ∪ Ξ, Ψ, T′, U′) in
  case T, U of
    X, X ⇒ true
    T, X ⇒ ⋁ { Sub′(T, Ti) | Ti <: X ∈ Ψ }
    X, K ⇒ ⋁ { Sub′(Ti, K) | X <: Ti ∈ Ψ }
    C<T̄>, D<Ū> ⇒ ⋁ { Sub′([T̄/X̄]K, D<Ū>) | K ∈ K̄ }, if C ≠ D and C<v̄ X̄> : K̄
    C<T̄>, C<Ū> ⇒ ⋀ { Sub′(Ti, Ui) | vi ∈ {◦, +} } ∧ ⋀ { Sub′(Ui, Ti) | vi ∈ {◦, -} }, if C<v̄ X̄> : K̄
Like Featherweight GJ, this language does not include object identity and en-
capsulated state, which arguably are defining features of the object-oriented
programming paradigm, nor does it model interfaces. It does include dynamic
[Figure content lost to extraction. It presents: the subtyping judgment (rule sub-incl, which extracts the constraints Δ from the environment); well-formed types and constraints (Γ ⊢ T ok and Γ ⊢ Δ ok); the typing rules ty-var and ty-fld; class well-formedness (ok-class); the operational semantics, comprising the reduction rule r-cast, (T)new K(v̄) → new K(v̄) when K <: T, with congruence rules c-new, c-fld, c-cast, c-meth-rcv, and c-meth-arg; field lookup, with fields(object) = {} and fields(C<T̄>) appending the class's own [T̄/X̄]-instantiated fields to those of its base class; and method lookup via mtype.]

Fig. 8. Evaluation rules and helper definitions for C# minor with variance and constraints
dispatch, generic methods and classes, and runtime casts. Despite the lack of
mutation, unrestricted use of variant type parameters leads to unsoundness:
class C<-X> { public readonly X x; public C(X x) { this.x = x; } }
// Interpret a Button as a string!
((C<string>) (new C<object>(new Button()))).x
For readers unfamiliar with Featherweight GJ we summarize the language here.
Type variables X , types T , classes C , constructed types K , constraints T <:U ,
constraint lists Δ and indeed the declarative subtyping relation Δ T <:U are
as in Section 3 and not re-defined here; object abbreviates object<>.
A class definition cd consists of a class name C with formal, variance-annotated type parameters v X, a single base class (superclass) K, constraints Δ, a constructor definition kd, typed instance fields P T f, and methods md. Method names in md must be distinct, i.e., there is no support for overloading.
A field qualifier P is always public readonly, denoting a publicly acces-
sible field that can be read, but not written, outside the constructor. Readonly
fields behave covariantly.
A method qualifier Q is either public virtual, denoting a publicly-
accessible method that can be inherited or overridden in subclasses, or public
override, denoting a method that overrides a method of the same name and
type signature in some superclass.
A method definition md consists of a method qualifier Q , a return type
T , name m, formal type parameters X , formal argument names x and types
T , constraints Δ and a body consisting of a single statement return e;. The
constraint-free sugar Q T m<X >(T x ) {return e;} abbreviates a declaration
with an empty where clause (|Δ| = 0). By design, the typing rules only allow
constraints to be placed on a virtual method definition: constraints are inher-
ited, modulo base-class instantiation, by any overrides of this virtual method.
Implicitly inheriting constraints matches C#’s implicit inheritance of bounds on
type parameters. Note that if Δ contains a bound on a class type parameter,
then it may become a general constraint between types in any overrides of this
method (by virtue of base class specialization). This is why we accommodate
arbitrary constraints, not just bounds, in constraint sets Δ.
A constructor kd initializes the fields of the class and its superclass.
An expression e can be a method parameter x , a field access e.f , the invoca-
tion of a virtual method at some type instantiation e.m<T >(e) or the creation
of an object with initial field values new K (e). A value v is a fully-evaluated
expression, and (always) has the form new K (v ).
A class table D maps class names to class definitions. The distinguished class
object is not listed in the table and is dealt with specially.
A typing environment Γ has the form Γ = X , x :T , Δ where free type variables
in T and Δ are drawn from X . We write · to denote the empty environment.
Judgment forms are as follows. The subtype judgment Γ T <: U extracts Δ
from Γ and defers to subtype judgment of Figure 2. To do this we define the
predicates C <v X >:K of Section 3 to mean |K | = 0 and C <v X > ≡ object<>
or |K | = 1 and D(C ) = class C <v X > : K1 . . .. The formation judgment
constraints. This does not rule out any useful programs: methods with incon-
sistent constraints are effectively dead, since the pre-conditions for calling them
can never be established. However, imposing consistency means that subtype
relations can be decided by appealing to our subtyping algorithm.
Nevertheless, our proof of Type Soundness does not rely on the notion of
consistency. Type soundness holds even if we omit the consistency premises.
We now outline the proof (eliding standard lemmas like Well-formedness,
Weakening and Inversion). The class table implicit in all results is assumed to
be valid.
We prove the usual type and term substitution properties that follow, but a
key lemma for our system is Lemma 11, that lets us discharge proven hypothet-
ical constraints from various judgment forms (a similar lemma appears in [10],
but for equations).
Lemma 9 (Substitution Property for Lookup).
– If fields(K ) = P T f then fields([U /Y ]K ) = P [U /Y ]T f .
– mtype(K .m) = <X where Δ>T → T implies
mtype(([U /Y ]K ).m) = [U /Y ](<X where Δ>T → T ).
– If mtype(K .m) is undefined then mtype(([U /Y ]K ).m) is undefined.
Lemma 10 (Substitution for types). Let J range over the judgment forms
of subtyping (T <:U ), type well-formedness (T ok ) and typing (e : T ):
If X, Y, x:T, Δ ⊢ J and Y ⊢ U ok then Y, x:[U/X]T, [U/X]Δ ⊢ [U/X]J.
Proof. Straightforward induction on the derivation of J , using Lemma 9.
Lemma 11 (Constraint Elimination). Let J range over the judgment forms
of subtyping (T <:U ), type well-formedness (T ok ) and typing (e : T ):
If Γ, Δ ⊢ J and Γ ⊢ Δ then Γ ⊢ J.
Proof. Induction on the derivation of J .
Lemma 12 (Substitution for terms). If Γ, x:T ⊢ e : T′ and Γ ⊢ v : T then Γ ⊢ [v/x]e : T′.
Proof. By induction on the typing derivation.
To prove Preservation we also need the following properties of (ground) subtyp-
ing. The first two lemmas tell us that the types of members are preserved by
subtyping, but only up to subtyping, since fields and method signatures behave
covariantly (subtyping on method signatures may be defined in the usual contra-
co fashion, treating constraints contra-variantly). The proofs of these lemmas
rely on the monotonicity restrictions on base classes, fields and method signa-
tures enforced by rules (ok-class) and (ok-virtual): these, in turn, justify appeals
to Lemma 3 in the proofs.
Lemma 13 (Field Preservation). If · ⊢ T, U ok and · ⊢ T <: U, then fields(U) = P U g and fields(T) = P T f implies · ⊢ Ti <: Ui and fi = gi for all i ≤ |g|.
5 Conclusion
References
Rice University,
Houston, TX 77005, USA
Jeremy.G.Siek@rice.edu, taha@rice.edu
1 Introduction
We start with a review of C++ templates, demonstrating their use with some basic
examples. We then review more advanced uses of templates to perform compile-
time computations and to write program generators. We give an overview of the
technical contributions of this paper at the end of this section.
The following definition is an example of a class template that defines a con-
tainer parameterized on the element type T and length n.
template<class T, int n>
class buffer {
  T data[n];
public:
  void set(int i, T v) { data[i] = v; }
  T get(int i) { return data[i]; }
};
This work was supported by NSF ITR-0113569 Putting Multi-Stage Annotations
to Work, Texas ATP 003604-0032-2003 Advanced Languages Techniques for Device
Drivers, and NSF SOD-0439017 Synthesizing Device Drivers.
When a template is used, C++ performs pattern matching between the tem-
plate arguments and the specialization patterns to determine which specializa-
tion to use, or whether a specialization needs to be generated from a class tem-
plate or a partial specialization. Consider the following program that defines three
objects.
int main() {
buffer<int,3> buf1;
buffer<bool,3> buf2;
buffer<bool,0> buf3;
}
The type buffer<int,3> matches neither of the specializations for buffer, so C++
will generate a specialization from the original buffer template by substituting
int for T and 3 for n. This automatic generation is called implicit instantiation.
The resulting template specialization is shown below.
template<>
class buffer<int,3> {
  int data[3];
public:
  void set(int i, int v);
  int get(int i);
};
Note that definitions (implementations) of the members set and get were not
generated. Only the declarations of the members were generated. The definition
The static keyword means the member is associated with the class and not with
each object. The const keyword means the member is immutable.
There are limitations, however, to what kinds of expressions C++ will evaluate at compile time: only arithmetic expressions on built-in integer-like types.
There are similar restrictions on the kinds of values that may be passed as tem-
plate parameters. For example, a list value can not be passed as a template
parameter. Fortunately, it is possible to encode data structures as types, and
types can be passed as template parameters. The following creates a type en-
coding for cons-lists.
template<class Head, class Tail> struct Cons { };
struct Nil { };
typedef Cons< int, Cons< float, Nil > > list_of_types;
A Semantic Analysis of C++ Templates 307
In general, templates can be used to mimic algebraic datatypes [5], such as those
found in SML [10] and Haskell [1]. Furthermore, templates are “open ended” so
they can mimic the extensible variants of Objective Caml [9].
This paper omits value template parameters and compile-time expressions
because they are technically redundant: we can encode computations on integer-
like values as computations on types. For example, the following types encode
natural numbers.
template<class T> struct succ { };
struct zero { };
typedef succ<zero> one;
typedef succ< succ<zero> > two;
The following is the power template reformulated for types. The definition of mult
is left as an exercise for the reader.1
template<class n> struct power { };
template<class p>
struct power< succ<p> > {
  static int f(int x) { return x * power<p>::f(x); }
};
template<> struct power<zero> {
  static int f(int x) { return 1; }
};
int main(int argc, char* argv[]) {
  return power<two>::f(atoi(argv[1])); // command-line input
}
1 A C++ expert will notice missing typename keywords in our examples. We do this
intentionally to avoid confusing readers unfamiliar with C++ and its syntactic clutter.
The bodies of functions, such as in main and f, contain run-time code. Type
expressions, such as power<two> and power<p> represent escapes from the run-
time code back into the compile-time level. The power metaprogram is recursive
but the generated program is not. The generated program has a static call tree of
height 3. An optimizing C++ compiler is likely to simplify the generated program
to the following:
int main(int argc, char* argv[]) {
  int x = atoi(argv[1]);
  return x * x;
}
Such optimization is not required by the C++ Standard. However, if the com-
piler performs inlining, it must preserve the call-by-value semantics. We might
label function f with the inline keyword, but this is only a suggestion to the
compiler. The performance of the generated programs is therefore brittle and
non-portable. Compilers rarely publicize the details of their inlining algorithm,
and the algorithms are heuristic in nature and hard to predict. Furthermore, the
inlining algorithm can vary dramatically from one compiler to the next. See [7]
for an alternative approach based on macros that guarantees inlining.
The subset of C++ we study in the paper includes just enough features to ex-
hibit both the compile time and run time computations needed to write template
metaprograms.
1.3 Contributions
We present the first formal account of C++ templates. We identify a small subset
of C++ called C++.T and give a semantics to C++.T by defining:
The main technical result is that valid programs are type safe. That is, if
a program successfully instantiates, then the run-time evaluation will result in
a value of the appropriate type, provided that evaluation terminates. The type
safety theorem is proved in Section 4. The lemmas and definitions in the following
sections lead up to this result.
S(α) = S α
S(τ1 → τ2 ) = S(τ1 ) → S(τ2 )
S(tτ1 ..τn ) = tS(τ1 )..S(τn )
S(τ.a) = S(τ ).a
S(int) = int
Proposition 1. FTV(S(τ)) = ⋃ { FTV(S(α)) | α ∈ FTV(τ) }
We define the following lookup function to capture the rule that the use of a
template resolves to the most specific template that matches the given template
arguments.
lookup : Set T × T → T⊥
lookup(T, τ) =
  max{π{m : κ τ′} ∈ T | π ≤ τ}   if the max exists
  ⊥                              otherwise
The inst function maps a set of templates and a type to the template spe-
cialization obtained by instantiating the best matching template from a set of
templates. The condition that S(π) = τ fully determines the action of S on free
variables in π, and because the free variables in τ are required to be a subset of
those in π, the type S(τ ) is unique.
inst : Set T × T → T⊥
inst(T, τ) =
  τ{m : κ S(τ′)}   where lookup(T, τ) = π{m : κ τ′} and S(π) = τ
  ⊥                otherwise

The member lookup function lookupmem(T, τ, m) then selects the binding for member m in the specialization inst(T, τ).
Next we show that member lookup produces closed types. For this lemma we
need to define the free type variables of a set of template definitions.
Definition 6. FTV(T) = {α | π{m : κ τ} ∈ T ∧ α ∈ FTV(τ) − FTV(π)}
The rules for type evaluation are complicated by the need to evaluate types un-
derneath type variable binders. In the following example, the type A<A<int>::u>
is underneath the binder for T but it must be evaluated to A<float>.
The type evaluation judgment T; P ⊢ τ ⇓ τ′ is defined by the following rules:

(C-VarT)   if α ∈ P then T; P ⊢ α ⇓ α
(C-MemT1)  if T; P ⊢ τ ⇓ tτ1..τn with FTV(τi) = ∅ for all i, lookupmem(T, tτ1..τn, a) = (type, τ′), and T; P ⊢ τ′ ⇓ τ″, then T; P ⊢ τ.a ⇓ τ″
(C-MemT2)  if T; P ⊢ τ ⇓ τ′ and FTV(τ′) ≠ ∅ then T; P ⊢ τ.a ⇓ τ′.a
(C-ArrowT) if T; P ⊢ τ1 ⇓ τ1′ and T; P ⊢ τ2 ⇓ τ2′ then T; P ⊢ τ1 → τ2 ⇓ τ1′ → τ2′
(C-IntT)   T; P ⊢ int ⇓ int
1. If T; P ⊢ τ ⇓ τ′ then FTV(τ′) ⊆ P.
2. If T; ∅ ⊢ τ ⇓ τ′ then τ′ ∈ R.
It is worth mentioning that type evaluation may diverge for some types and
therefore it is impossible to build a derivation for those types (the derivation
would need to be infinite). For example, let T = {Aα{x : type AAα.x}}
and P = ∅. Then we cannot build a derivation for Aint.x.
Well-formed types are types that do not contain out-of-scope type parameters
or use undefined template names. The definition of well-formed types is given
in Fig. 3. The well-formed type judgment has the form T; P ⊢ τ wf, where T is
the set of in-scope templates and P is the set of in-scope type parameters. In
the case for member access, we do not check that the member name is indeed a
member of the given type τ , as that would require us to evaluate τ . The purpose
of the well-formed type judgment is not to ensure a safety property for type
evaluation, it is merely to check for uses of undefined type variables or template
names.
For example, T; P ⊢ int wf always holds, and T; P ⊢ τ1 → τ2 wf holds whenever T; P ⊢ τ1 wf and T; P ⊢ τ2 wf.
The following defines when a type is defined and when a type is complete, two
notions that will be used in the evaluation semantics.
Definition 8.
– A type τ is defined in T iff τ{m : κ τ′} ∈ T for some m, κ, τ′.
– A type τ is complete in T iff τ is defined in T and τ ∈ R.
Similarly, in the (R-Obj) rule, the type of the object must be complete so that
we know how to construct the object. The semantics includes error propagation
rules so that we can distinguish between non-termination and errors. The (R-
AppE1) rule states that a function application results in an error if either e1 or
e2 evaluates to an error. Strictly speaking, this would force an implementation of
C++.T to interleave the evaluation of e1 and e2 so that non-termination of either
would not prevent encountering the error. This is not the intended semantics,
but a precise treatment is rather verbose. Our type safety result still holds for
any sequential implementation because the behavior only differs on programs
that are rejected by the type system.
The n-depth evaluation judgment F; T ⊢ e →n ans includes the axioms:

(R-Int)  if 0 < k then F; T ⊢ n →k n
(R-Obj)  if τ is complete in T and 0 < n then F; T ⊢ obj τ →n obj τ
(R-Mem)  if 0 < n then F; T ⊢ τ.f →n τ.f
Proof. By induction on the typing judgment. The cases for (T-Obj1) and
(T-Mem1) use Proposition 4.
Definition 9.
Proof. By induction on the evaluation judgment. The case for application relies
on the fact that the functions used in e1 [x := e2 ] are a subset of the functions
used in e1 and e2 .
Definition 10 (Well-typed function environment). We write T ⊢ F if, for all full member specializations r has f(x : τ1) → τ2 {e} ∈ F, we have
1. r{f : fun τ1 → τ2} ∈ T
2. T; ∅; x : τ1 ⊢ e : τ2
3. τ1 and τ2 are complete in T
The typing judgment T; P; Γ ⊢ e : τ? is defined by the following rules:

(T-Var1)  if x : τ ∈ Γ and FTV(τ) = ∅ then T; P; Γ ⊢ x : τ
(T-Var2)  if x : τ ∈ Γ and FTV(τ) ≠ ∅ then T; P; Γ ⊢ x : ?
(T-Int)   T; P; Γ ⊢ n : int
(T-Obj1)  if tτ1..τn is complete in T then T; P; Γ ⊢ obj tτ1..τn : tτ1..τn
(T-Obj2)  if FTV(τ) ≠ ∅ then T; P; Γ ⊢ obj τ : ?
(T-Mem1)  if T; P ⊢ τ wf, FTV(τ) = ∅, and τ{f : fun τ′} ∈ T then T; P; Γ ⊢ τ.f : τ′
(T-Mem2)  if FTV(τ) ≠ ∅ then T; P; Γ ⊢ τ.f : ?
(T-App1)  if T; P; Γ ⊢ e1 : τ → τ′ and T; P; Γ ⊢ e2 : τ then T; P; Γ ⊢ e1 e2 : τ′
(T-App2)  if T; P; Γ ⊢ e1 : a1, T; P; Γ ⊢ e2 : a2, and a1 = ? ∨ a2 = ? then T; P; Γ ⊢ e1 e2 : ?
The next lemma is type safety for expression evaluation. The appearance of F; T ⊢ e →n ans in the conclusion of this lemma, and not as a premise, would normally be a naive mistake because not all programs terminate. However, by using n-depth evaluation, we can construct a judgment regardless of whether the program is non-terminating. Further, by placing F; T ⊢ e →n ans in the conclusion, this lemma proves that our evaluation rules are complete, analogous
conclusion, this lemma proves that our evaluation rules are complete, analogous
to a progress lemma for small-step semantics. We learned of this technique from
Ernst et al. [6].
Definitions  d ∈ D ::= T | F
Programs     p ∈ P ::= d*

Definition 11.
– We write τ ∈ e iff obj τ is a subexpression of e and τ ∈ R.
– The notation X, z stands for {z} ∪ X where z ∉ X.

The program compilation judgment T; F ⊢ p ⇓ T′; F′ includes the rules:

(C-Nil)  if funused(F) ⊆ F then T; F ⊢ [] ⇓ T; F
(C-InstFun)  if τ.f ∈ funused(F1) − F1, lookup(T1, τ) = π{f : fun τ1 → τ2}, (π has f(x : τ1) → τ2 {e}) ∈ F1, S(π) = τ, T1; F1 ⊢ (τ has f(x : S(τ1)) → S(τ2) {S(e)}) ⇓ T2; F2, and T2; F2 ⊢ [] ⇓ T′; F′, then T1; F1 ⊢ [] ⇓ T′; F′
5 Discussion
The semantics defined in this paper instantiates fewer templates than what is
mandated by the C++ standard. In particular, the C++ standard says that mem-
ber access, such as A<int>::u, causes the instantiation of A<int>. In our se-
mantics, the member access will obtain the definition of member u but it will
not generate a template specialization for A<int>. We only generate template
specializations for types that appear in residual program contexts that require
complete types: object construction and function parameters and return types.
Our type-safety result shows that even though we produce fewer template spe-
cializations, we produce enough to ensure the proper run-time execution of the
program. The benefit of this semantics is that the compiler is allowed to be more
space efficient.
The semantics of member function instantiation is a point of some controversy.
Section 14.6.4.1 [p1] of the Standard says that the point of instantiation for a
member function immediately follows the enclosing declaration that triggered the
instantiation (with a caveat for dependent uses within templates). The problem
with this rule is that uses of a member function may legally precede its definition
and the definition is needed to generate the instantiation. (A use of a member
function must only come after the declaration of the member function, which
is in the template specialization.) In general, the C++ Standard is formulated
to allow for compilation in a single pass, whereas the current rules for member
instantiation would require two passes. Also, there is a disconnect between the
Standard and the current implementations. The Edison Design Group and GNU
compilers both delay the instantiation of member functions to the end of the
program (or translation unit). We discussed this issue with the C++ committee2 and
2
A post to the C++ committee email reflector on September 19, 2005, with a response
from John Spicer.
the opinion was that this is a defect in the C++ Standard and that instantiation
of member functions should be allowed to occur at the end of the program.
Therefore, C++.T places instantiations for member functions at the end of the
program.
6 Related Work
Recently, Stroustrup and Dos Reis proposed a formal account of the type system
for C++ [13, 14]. However, they do not define the semantics of evaluation and
they do not study template specialization. The focus of the work by Stroustrup
and Dos Reis is to enable the type checking of template definitions separately
from their uses. Siek et al. [16] also describe an extension to improve the type
checking of template definitions and uses.
Wallace studied the dynamic evaluation of C++, but not the static aspects
such as template instantiation [20].
C++ templates are widely used for program generation. There has been con-
siderable research on language support for program generation, resulting in lan-
guages such as MetaOCaml [2] and Template Haskell [15]. These languages pro-
vide first-class support for program generation by including a hygienic quasi-
quotation mechanism. The advanced type system used in MetaOCaml guaran-
tees that the generated code is typable. There are no such guarantees in C++.
The formal semantics defined in this paper will facilitate comparing C++ with
languages such as MetaOCaml and will help in finding ways to improve C++.
The C++ Standards Committee has begun to investigate improved support for
metaprogramming [18].
7 Conclusion
This paper presents a formal account of C++ templates. We identify a small
subset, named C++.T, that includes templates, specialization, and member func-
tions. We define the compile-time and run-time semantics of C++.T, including
type evaluation, template instantiation, and a type system. The main technical
result is the proof of type safety, which states that if a program is valid (tem-
plate instantiation succeeds), then run-time execution of the program will not
encounter type errors. In the process of formalizing C++.T, we encountered two
interesting issues: the C++ Standard instantiates member functions too soon and
generates unnecessary template specializations.
From the point of view of language semantics and program generation re-
search, it was interesting to see that C++.T involves a form of evaluation under
variable binders at the level of types, but not at the level of terms: evaluation
of open terms is not allowed. We plan to investigate how this affects the
expressivity of C++ templates as a mechanism for writing program generators.
326 J. Siek and W. Taha
Acknowledgments
We thank the anonymous reviewers for their careful reading of this paper. We
thank Emir Pasalic and Ronald Garcia for reading drafts and suggesting im-
provements and we thank Jaakko Järvi, Doug Gregor, Todd Veldhuizen, and
Jeremiah Willcock for helpful discussions. We especially thank John Spicer and
Daveed Vandevoorde of Edison Design Group for help with interpreting the C++
Standard.
Bibliography
[1] Haskell 98 Language and Libraries: The Revised Report, December 2002.
http://www.haskell.org/onlinereport/index.html.
[2] MetaOCaml: A compiled, type-safe multi-stage programming language. Available
online from http://www.metaocaml.org/, 2004.
[3] David Abrahams and Aleksey Gurtovoy. C++ Template Metaprogramming: Con-
cepts, Tools, and Techniques from Boost and Beyond. Addison-Wesley Longman
Publishing Co., Inc., Boston, MA, USA, 2004.
[4] Andrei Alexandrescu. Modern C++ design: generic programming and design pat-
terns applied. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA,
2001.
[5] Krzysztof Czarnecki and Ulrich W. Eisenecker. Generative programming: methods,
tools, and applications. ACM Press/Addison-Wesley Publishing Co., New York,
NY, USA, 2000.
[6] Erik Ernst, Klaus Ostermann, and William R. Cook. A virtual class calculus. In
POPL’06: Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on
Principles of programming languages, pages 270–282, New York, NY, USA, 2006.
ACM Press.
[7] Steven E. Ganz, Amr Sabry, and Walid Taha. Macros as multi-stage computations:
type-safe, generative, binding macros in MacroML. In ICFP ’01: Proceedings of
the sixth ACM SIGPLAN international conference on Functional programming,
pages 74–85, New York, NY, USA, 2001. ACM Press.
[8] International Organization for Standardization. ISO/IEC 14882:2003: Program-
ming languages — C++. Geneva, Switzerland, October 2003.
[9] Xavier Leroy. The Objective Caml system: Documentation and user’s manual,
2000. With Damien Doligez, Jacques Garrigue, Didier Rémy, and Jérôme Vouillon.
[10] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML.
MIT Press, 1990.
[11] Tobias Nipkow. Structured proofs in Isar/HOL. In Types for Proofs and Programs
(TYPES 2002), number 2646 in LNCS, 2002.
[12] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL — A
Proof Assistant for Higher-Order Logic, volume 2283 of LNCS. Springer, 2002.
[13] Gabriel Dos Reis and Bjarne Stroustrup. A formalism for C++. Technical Report
N1885=05-0145, ISO/IEC JTC1/SC22/WG21, 2005.
[14] Gabriel Dos Reis and Bjarne Stroustrup. Specifying C++ concepts. In POPL ’06:
Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles
of programming languages, pages 295–308, New York, NY, USA, 2006. ACM Press.
[15] Tim Sheard and Simon Peyton Jones. Template meta-programming for Haskell. In
Haskell ’02: Proceedings of the 2002 ACM SIGPLAN workshop on Haskell, pages
1–16, New York, NY, USA, 2002. ACM Press.
[16] Jeremy Siek, Douglas Gregor, Ronald Garcia, Jeremiah Willcock, Jaakko Järvi,
and Andrew Lumsdaine. Concepts for C++0x. Technical Report N1758=05-0018,
ISO/IEC JTC 1, Information Technology, Subcommittee SC 22, Programming
Language C++, January 2005.
[17] Jeremy Siek and Walid Taha. C++.T formalization in Isar. Technical Report
TR05-458, Rice University, Houston, TX, December 2005.
[18] Daveed Vandevoorde. Reflective metaprogramming in C++. Technical Report
N1471/03-0054, ISO/IEC JTC 1, Information Technology, Subcommittee SC 22,
Programming Language C++, April 2003.
[19] Todd Veldhuizen. Using C++ template metaprograms. C++ Report, 7(4):36–43,
May 1995. Reprinted in C++ Gems, ed. Stanley Lippman.
[20] Charles Wallace. Specification and validation methods, chapter The semantics of
the C++ programming language, pages 131–164. Oxford University Press, Inc.,
New York, NY, USA, 1995.
[21] Markus Wenzel. The Isabelle/Isar Reference Manual. TU München, April 2004.
Session Types for Object-Oriented Languages
Abstract. A session takes place between two parties; after establishing a con-
nection, each party interleaves local computations with communications (send-
ing or receiving) with the other. Session types characterise such sessions in terms
of the types of values communicated and the shape of protocols, and have been
developed for the π-calculus, CORBA interfaces, and functional languages. We
study the incorporation of session types into object-oriented languages through
MOOSE, a multi-threaded language with session types, thread spawning, iterative
and higher-order sessions. Our design aims to consistently integrate the object-
oriented programming style and sessions, and to be able to treat various case stud-
ies from the literature. We describe the design of MOOSE, its syntax, operational
semantics and type system, and develop a type inference system.
subject reduction, we establish the progress property: once a communication has
been established, well-typed programs will never starve at communication points.
1 Introduction
A session consists of a sequence of interactions between two parties: after establishing
a connection, each party performs local computation, interleaved with several commu-
nications with the other party. Communications take the form of sending and receiving
values over a channel, and additionally,
throughout interaction between the two parties, there should be a perfect matching of
sending actions in one with receiving actions in the other, and vice versa. This form of
structured interaction is found in many application scenarios.
Session types have been proposed in [18], aiming to characterise such sessions in
terms of the types of messages received or sent by a party. For example, the session
type begin.!int.!int.?bool.end expresses that two int-values will be sent, then a
bool-value will be expected to be received, and then the protocol will be complete.
Thus, session types specify the communication behaviour of a piece of software, and
can be used to verify the safety, in terms of communication, of the composition of
several pieces of software executing in parallel. Session types have been studied for
several different settings, i.e., for π-calculus-based formalisms [6, 7, 13, 18, 20, 26], for
CORBA [27], for functional languages [15, 29], and recently, for CDL, a W3C standard
description language for web services [3, 8, 30].
In this paper we study the incorporation of session types into object-oriented lan-
guages. To our knowledge, except for our earlier work [10], such an integration has
not been attempted so far. We propose the language MOOSE, a multi-threaded object-
oriented core language augmented with session types, which supports thread spawning,
iterative sessions, and higher-order sessions.
The design of MOOSE was guided by the wish for the following properties:
object-oriented style. We wanted MOOSE programming to be as natural as possible for
people used to mainstream object-oriented languages. In order to achieve an object-
oriented style, MOOSE allows sessions to be handled modularly using methods.
expressivity. We wanted to be able to express common case studies from the literature
on session types and concurrent programming idioms [24], as well as those from the
W3C standard documents [8, 30]. In order to achieve expressivity, we introduced
the ability to spawn new threads, to send and receive sessions (i.e., higher-order
sessions), conditionals, and iterative sessions.
type preservation. Establishing that execution preserves types, i.e., the subject re-
duction property, proved to be an intricate task. In fact, several session type systems
in the literature fail to preserve typability after reduction of certain subtle configu-
rations, which we identified through a detailed analysis of how types of communi-
cation channels evolve during reduction. Type preservation requires linear usage of
channels; in order to guarantee this we had to prevent aliasing of channels, mani-
fested by the fact that channel types cannot be assigned to fields.
progress. We wanted to be able to guarantee that once a session has started, i.e., a
connection has been established, threads neither starve nor deadlock at the points of
communication during the session, a highly desirable property in communication-
based programs. Establishing this property was an intricate task as well, and, to the
best of our knowledge, no other session type system in the literature can ensure it.
The combination of higher-order sessions, spawn and the requirement to prevent
deadlock during sessions posed the major challenge for our type system.
The paper is organised as follows: §2 illustrates the basic ideas through an example. §3
defines the syntax of the language. §4 presents the operational semantics. §5 describes
330 M. Dezani-Ciancaglini et al.
design decisions. §6 illustrates the typing system. §7 states basic theorems on type
safety and communication safety. §8 discusses the related work, and §9 concludes.
A full version of this paper, [1], includes complete definitions and proofs. Also,
more detailed explanations and examples can be found in [24].
2 Example

We describe a typical collaboration pattern that appears in many web service busi-
ness protocols [8, 30] using MOOSE. This simple protocol contains essential features
by which we can demonstrate the expressivity of MOOSE: it requires a combination
of session establishment, higher-order session passing, spawn, conditionals, and
deadlock-freedom during the session.
In Fig. 1 we show the sequence diagram for the protocol, which models the purchas-
ing of items as follows: first, the Seller and Buyer participants initiate interaction over
channel c1; then, the Buyer sends a product id to the Seller, and receives a price quote
in return; finally, the Buyer may either accept or reject this price. If the price received is
acceptable then the Seller connects with the Shipper over channel c2. First the Seller
sends to the Shipper the details of the purchased item. Then the Seller delegates its
part of the remaining activity with the Buyer to the Shipper, that is realised by sending
c1 over c2. Now the Shipper will await the Buyer’s address, before responding with the
delivery date. If the price is not acceptable, then the interaction terminates.
In Fig. 2 we declare the necessary session types, and in Fig. 3 we encode the given
scenario in MOOSE, using one class per protocol participant. The session types Buy-
Product and RequestDelivery describe the communication patterns between Buyer and
Seller, and Seller and Shipper, respectively. The session type BuyProduct models the
sending of a String, then the reception of a double, and finally a conditional behaviour,
in which a bool is (implicitly) sent before a branch is followed: the first branch requires
that an Address is sent, then a DeliveryDetails received, and finally that the session is
closed; the second branch models an empty communication sequence and the closing
of the session. We write BuyProduct for the dual type, which is constructed by taking
BuyProduct and changing occurrences of ! to ? and vice versa; hence, these types repre-
sent the two complementary behaviours associated with a session, in which the sending
of a value at one end corresponds to its reception at the other. In other words, BuyProduct
is the same as begin.?String.!double.?<?Address.!DeliveryDetails.end,end>. Note
that in the case of the conditional, the thread with ! in its type decides which branch is
to be followed and communicates the boolean value, while the other thread passively
awaits the former thread’s decision. The session type RequestDelivery describes send-
ing a ProductDetails instance, followed by sending a ‘live’ session channel of type
?Address.!DeliveryDetails.end .
1 session BuyProduct =
2 begin.!String.?double.!<!Address.?DeliveryDetails.end,end>
3 session RequestDelivery =
4 begin.!ProductDetails.!(?Address.!DeliveryDetails.end).end
Sessions can start when two compatible connect ... statements are active. In Fig. 3,
the first component of connect is the shared channel that is used to start communication,
the second is the session type, and the third is the session body, which implements the
session type. The method buy of class Buyer contains a connect statement that imple-
ments the session type BuyProduct, while the method sell of class Seller contains a
connect statement over the same channel and the dual session type. When a Buyer and
a Seller are executing their respective methods concurrently, they can engage in a ses-
sion, which will result in a fresh channel being substituted for occurrences of the shared
channel c1 within both session bodies; freshness guarantees that the new channel only
occurs in these two session bodies, therefore the objects can proceed to perform their
interactions without the possibility of external interference.
Once the session has started in the body of method buy, the product identifier, prodID,
is sent using c1.send(prodID) and the price quote is received using c1.receive. If the
price is acceptable, i.e., c1.receive <= maxPrice , then true is sent and the first branch
of the conditional is taken, starting on line 9. In this case, the customer’s address, addr,
is sent and an instance of DeliveryDetails is received. If the price is not acceptable,
then false is sent and the second branch of the conditional starting on line 11 is taken,
and the connection closes.
The body of method sell implements behaviour dual to the above. Note that in c1.
receiveIf{...}{...} the branch to be selected depends on the boolean value received
from the other end, which will execute the complementary expression c1.sendIf(..)
{...}{...}. The first branch of the former conditional contains a nested connect in
line 25, via which the product details are sent to the Shipper, followed by the actual
runtime channel that was substituted for c1 when the outer connect took place; the
latter is sent through the construct c2.sendS(c1), which realises higher-order session
communication. Notice that the code in lines 25-26 is within a spawn, which reduces to
a new thread with the enclosed expression as its body.
1 class Buyer {
2
3 Address addr;
4
16 class Seller {
17 void sell() {
18 connect c1 BuyProduct {
19 String prodID := c1.receive;
20 double price := getPrice( prodID ); // implem. omitted
21 c1.send( price );
22 c1.receiveIf { // buyer accepts price
23 ProductDetails prodDetails := new ProductDetails();
24 // ... init prodDetails with prodID, size, etc
25 spawn { connect c2 RequestDelivery {
26 c2.send( prodDetails ); c2.sendS( c1 );} }
27 }{ null; /* receiveIf : buyer rejects */ }
28 } /* End connect */
29 } /* End method sell */
30 }
31
32 class Shipper {
33 void delivery() {
34 connect c2 RequestDelivery {
35 ProductDetails prodDetails := c2.receive;
36 c2.receiveS( x ) {
37 Address custAddress := x.receive;
38 DeliveryDetails delivDetails := new DeliveryDetails();
39 //... set state of delivDetails
40 x.send( delivDetails ); }
41 } /* End connect */
42 } /* End method delivery */
43 }
Method delivery of class Shipper should be clear, with the exception of c2.receiveS
(x){..} which is dual to c2.sendS(c1). In the former expression, the received channel
is bound to variable x.
The above example shows how MOOSE achieves deadlock-freedom during a session:
because sessions take place between threads with complementary communication
behaviours, every send is matched by a corresponding receive at the other end.
3 Syntax

In Fig. 4 we describe the syntax of MOOSE. We distinguish user syntax, i.e., source-level
code, and runtime syntax, which includes null pointer exceptions, threads and heaps.
The syntax is based on FJ [21] with the addition of imperative and communication
primitives similar to those from [4, 6, 10, 18, 20, 29]. We designed MOOSE as a multi-
threaded concurrent language for simplicity of presentation, although it can be extended
to model distribution; see § 8.
Channels. We distinguish shared channels and live channels. Shared channels have
not yet been connected; they are used to decide if two threads can communicate, in
which case they are replaced by fresh live channels. After a connection has been created
the channel is live; data may be transmitted through such active channels only. The
types of MOOSE enforce the condition that exactly two threads contain occurrences
of the same live channel: we call this the bilinearity condition.
User syntax. The metavariable t ranges over types for expressions, C ranges over class
names, and s ranges over session types. Each session type s has one corresponding dual,
denoted s̄, which is obtained by replacing each ! (output) by ? (input) and vice versa.
We use η to denote the type of a live channel. We introduce the full syntax of types
in § 6. Class and method declarations are as expected, except for the restriction that at
most one parameter can be a live channel. This condition is explained in Example 5.4.
The syntax of user expressions e, e′ is standard except for the channel constructor
new (s, s̄) and the communication expressions, i.e., connect u s {e} and all the ex-
pressions in the last three lines.
The first line gives parameters, values, the receiver this, sequencing of expressions, as-
signment to fields, field access, method call, object and channel creation, and new (s, s̄),
which builds a fresh shared channel used to establish a private session. The values are
channels, null, and the literals true and false. Thread creation is declared using
spawn { e }, in which the expression e is called the thread body.
The expression connect u s {e} starts a session: the channel u appears within the
term {e} in session communications that agree with session type s. The remaining
eight expressions, which realise the exchanges of data, are called session expressions,
and start with “u.”; we call u the subject of such expressions. In the explanation below,
session expressions are pairwise coupled: we say that expressions in the same pair and
with the same subject are dual to each other.
The first pair is for the exchange of values (which can be shared channels): u.receive
receives a value via u, while u.send (e) evaluates e and sends the result over u. The
second pair expresses live channel exchange: in u.receiveS (x){e} the received channel
will be bound to x within e, in which x is used for communications. The expression
u.sendS (u′) sends the channel u′ over u. The third pair is for conditional communica-
tion: u.receiveIf {e}{e′} receives a boolean value via channel u, and if it is true con-
tinues with e, otherwise with e′; the expression u.sendIf (e){e′}{e″} first evaluates
the boolean expression e, then sends the result via channel u and, if the result was
true, continues with e′, otherwise with e″. The fourth pair is for iterative communication:
the expression u.receiveWhile {e} receives a boolean value via channel u, and if it is
true continues with e and iterates, otherwise ends; the expression u.sendWhile (e){e′}
first evaluates the boolean expression e, then sends its result via channel u and, if the
result was true, continues with e′ and iterates, otherwise ends.
Runtime syntax. The runtime syntax (shown shaded in Fig. 4) extends the user syntax:
it introduces threads running in parallel; adds NullExc to expressions, denoting the null
pointer error; and finally extends values to allow for object identifiers o, which denote
references to instances of classes. Single and multiple threads are ranged over by P, P′.
The expression P | P′ says that P and P′ are running in parallel.
Heaps, ranged over by h, are built inductively using the heap composition operator ‘·’,
and contain mappings of object identifiers to instances of classes, as well as channels. In
particular, a heap will contain the set of objects and fresh channels, both shared and live,
that have been created since the beginning of execution. The heap produced by com-
posing h · [o → (C, f̃ : ṽ)] will map o to the object (C, f̃ : ṽ), where C is the class name
and f̃ : ṽ is a representation for the vector of distinct mappings from field names to their
values for this instance. The heap produced by composing h · c will contain the fresh
channel c. Heap membership for object identifiers and channels is checked using stan-
dard set notation; we therefore write o ∈ h and c ∈ h, respectively. Heap update for
objects is written h[o → (C, f̃ : ṽ)], and field update is written (C, f̃ : ṽ)[f → v].
4 Operational Semantics
This section presents the operational semantics of MOOSE, which is inspired by the
standard small-step call-by-value reduction of [4, 5, 25] and mainly of [10]. We only
discuss the more interesting rules. First we list the evaluation contexts.
E ::= [ ] | E.f | E; e | E.f := e | o.f := E | E.m (ẽ) | o.m (ṽ, E, ẽ)
    | c.send (E) | u.sendIf (E){e}{e′}

Notice that connect u s {E}, u.receiveS (x){E}, u.sendIf (e){E}{e′}, u.sendIf (e){e′}{E},
u.receiveIf {E}{e}, u.receiveIf {e}{E}, u.receiveWhile {E}, and u.sendWhile (e){E}
are not evaluation contexts: the first would allow session bodies to run before the start of
the session; the second would allow execution of an expression waiting for a live chan-
nel before actually receiving it; the remaining would allow parts of a conditional or
iterative session to run before determining which branch should be selected, or whether
the iteration should continue.
NullProp
    E[NullExc], h −→ NullExc, h

Meth
    h(o) = (C, . . . )    mbody(m, C) = (x̃, e)
    o.m (ṽ), h −→ e[o/this][ṽ/x̃], h
Expressions. Fig. 5 shows the rules for execution of expressions which correspond to
the sequential part of the language. These are standard [5, 11, 21], but for the addition
of a fresh shared channel to the heap (rule NewS). In rule NewC the auxiliary function
fields(C) examines the class table and returns the field declarations for C. The method
invocation rule is Meth; the auxiliary function mbody(m ,C) looks up m in the class
C, and returns a pair consisting of the formal parameter names and the method’s code.
The result is the method body where the keyword this is replaced by the receiver’s
object identifier o , and the formal parameters x̃ are replaced by the actual parameters
ṽ . Note that the replacement of this by o cannot lead to unwanted behaviours since
the receiver cannot change during the execution of the method body.
Spawn
    E[spawn { e }], h −→ E[null] | e, h

Par
    P, h −→ P′, h′
    P | P0, h −→ P′ | P0, h′

Str
    P′1, h −→ P′2, h′    Pi ≡ P′i (i ∈ {1, 2})
    P1, h −→ P2, h′
Connect
    c′ ∉ h
    E1[connect c s {e1}] | E2[connect c s̄ {e2}], h −→ E1[e1[c′/c]] | E2[e2[c′/c]], h · c′
ComS
    E1[c.send (v)] | E2[c.receive], h −→ E1[null] | E2[v], h

ComSS
    E1[c.receiveS (x){e}] | E2[c.sendS (c′)], h −→ E1[null] | e[c′/x] | E2[null], h

ComSIf-true
    E1[c.sendIf (true){e1}{e2}] | E2[c.receiveIf {e3}{e4}], h −→ E1[e1] | E2[e3], h

ComSIf-false
    E1[c.sendIf (false){e1}{e2}] | E2[c.receiveIf {e3}{e4}], h −→ E1[e2] | E2[e4], h

ComSWhile
    E1[c.sendWhile (e){e1}] | E2[c.receiveWhile {e2}], h −→
        E1[c.sendIf (e){e1; c.sendWhile (e){e1}}{null}]
      | E2[c.receiveIf {e2; c.receiveWhile {e2}}{null}], h
Threads. The reduction rules for threads, shown in Fig. 6, are given modulo the stan-
dard structural equivalence rules of the π-calculus [23], written ≡. We define multi-step
reduction as →→ = (−→ ∪ ≡)∗.
When spawn{ e } is the active redex within an arbitrary evaluation context, the
thread body e becomes a new thread, and the original spawn expression is replaced by
null in the context.
Rule Connect describes the opening of sessions: if two threads require a session on
the same channel name c with dual session types, then a fresh channel c′ is created
and added to the heap. The freshness of c′ guarantees privacy and bilinearity of the
session communication between the two threads. Finally, the two connect expressions
are replaced by their respective session bodies, in which the shared channel c has been
substituted by the live channel c′.
Rule ComS gives simple session communication: value v is sent by one thread and
received by another. Rule ComSS formalises the act of delegating a session. One thread
waits to receive a live channel, which will be bound to the variable x within the expres-
sion e, while another thread is ready to send such a channel. Notice that when the channel
is exchanged, the receiver spawns a new thread to handle the consumption of the dele-
gated session. This strategy is necessary in order to avoid deadlocks in the presence of
circular paths of session delegation; see Example 4.1.
In rules ComSIf-true and ComSIf-false, depending on the value of the boolean, ex-
ecution proceeds with either the first or the second branch. Rule ComSWhile simply
expresses iteration by means of the conditional. This operation allows a sequence
of actions to be repeated within a single session, which is convenient when describing
practical communication protocols (see [8, 10]).
The following example justifies some aspects of our operational semantics.
Example 4.1 demonstrates the reason for the definition of rule ComSS, which creates a
new thread out of the expression in which the sent channel replaces the channel variable.
A more natural and simpler formulation of this rule would avoid spawning a new thread:

    E1[c.receiveS (x){e}] | E2[c.sendS (c′)], h −→ E1[e[c′/x]] | E2[null], h

However, using the above version of the rule, the threads P1 and P2 in the table below
reduce to

    c′1.send (5); c′1.receive | null, h · c′1

where c′1 is the fresh live channel that replaced c1 when the connection was established.
Notice that both ends of the session are in one thread, so the last configuration is stuck.

It is clear that P3 expects to receive first an integer and then a boolean via channel c;
but P3 could communicate first with P1 and then with P2 (or vice versa), receiving two
values.

P1                               P2
1 connect c1 s1 {                1 connect c1 s̄1 {
2   connect c2 s2 {              2   c1.receive;
3     c2.sendS(c1) };            3   c1.send(3)
4   c1.receive }                 4 }

P3
1 connect c2 s̄2 {
2   c2.receiveS(x){ x.send(4) }
3 }

Then, starting with a heap h, the above three threads in parallel reduce to:

    c′1.receive | c′1.receive; c′1.send(3) | c′1.send(4), h · c′1 · c′2

where c′1 and c′2 are the fresh live channels that replaced c1 and c2, respectively, when
the sessions began. Clearly, this configuration violates the bilinearity condition.
We therefore need a notion of whether a live channel has been consumed, i.e., whether
it can still be used for the communication of values. There is no explicit user syntax for
consuming channels. Instead, channels are implicitly consumed 1) at the end of a con-
nection, 2) when they are sent over a channel, and 3) when they are used within spawn.
However, types distinguish consumed channels using the end suffix. Hence, when a live
channel is passed as a parameter in a method call it can potentially become consumed.
In § 6.1 we show that P1 is type incorrect for any s1 and s2.
5 Design Decisions

Progress in MOOSE means that indefinite waiting may only happen at the point where
a connection is required, and in particular when the dual of a connect is missing. In
other words, there will never be a deadlock at the communication points. This can only
be guaranteed if the communications are always processed in a given order, i.e., if there
is no interleaving of sessions.
Example 5.3 demonstrates how session interleaving may cause deadlocks.
In the above example we have indefinite waiting after establishing the connection, be-
cause P1 cannot progress unless P2 reaches the statement c1.receive, and P2 cannot
progress unless P1 reaches the statement c2.receive, and so we have a deadlock at a
communication point. Note that nesting of sessions does not affect progress. Let us
consider the following processes:
P1 = connect c1 begin.?int.end { c1.receive; connect c2 begin.!int.end { c2.send (5) } }
P2 = connect c1 begin.!int.end { c1.send (3); connect c2 begin.?int.end { c2.receive } }
P3 = connect c1 begin.!int.end { connect c2 begin.?int.end { c2.receive }; c1.send (3) }

Parallel execution of P1 and P2 does not cause deadlock, while parallel execution of P1
with P3 does, but it does so at the connection point for c2. However, such a deadlock is
acceptable, since it would disappear if we placed a suitable connect in parallel.
In order to avoid interleaving at live channels, we require that within each “scope”
no more than one live channel can be used for communication; we call this the “hot
set.” The formal definition can be found in § 6. In § 6.1, we show that P1 and P2 are
type incorrect.
Example 5.4 demonstrates that allowing methods with multiple live channel parame-
ters may cause interleaving. Consider a method m of class C with two parameters x
and y, both of type ?int, and body x.receive; y.receive. In this case the two threads P1
and P2 below produce a deadlock due to the interleaving of sessions.
In order to avoid problems like the above, we restrict the number of live channel
parameters to at most one.
We argue that the above conditions on live channels are not that restrictive. First, we
can represent most of the communication protocols in the session types literature, as
well as traditional synchronisation [24, § 3], while at the same time ensuring progress.
Secondly, since these conditions are only essential for progress, if we remove hot sets
from typing judgements, we will obtain a more relaxed type system which allows dead-
lock on live channels, but still preserves type safety.
6 Type System
Each session type s starts with the keyword begin and has one or more endpoints,
denoted by end. Between the start and each ending point, a sequence of session parts
describes the communication protocol.
Session parts, ranged over by π, represent communications and their sequencing; † is
a convenient abbreviation that ranges over {!, ?}. The types !t and ?t express respec-
tively the sending and reception of a value of type t, while !(ρ) and ?(ρ) represent the
exchange of a live channel, and therefore of an active session, with the remaining com-
munications determined by type ρ.
The conditional session part has the shape †⟨π1, π2⟩: when † is ! this type describes
sessions which send a boolean value and proceed with π1 if the value is true, or with π2
if the value is false; when † is ? the behaviour is the same, except that the boolean
that determines the branch is received instead. The iterative session part †π∗ describes
sessions that respectively send or receive a boolean value and, if that value is true,
continue with π, iterating, while if the value is false, continue with the following session
parts, if any. Session parts can be composed into sequences using ‘.’, hence forming
longer session parts inductively; note that we use ε for the empty sequence. A complete
session part is a session part concatenated either with end or with a conditional whose
branches in turn are both complete session parts. We use ρ to range over complete
session parts and η to range over both complete and incomplete session parts. Each
session type s has a corresponding dual, denoted $\overline{s}$, which is obtained as follows:
– $\overline{!} = {?}$ and $\overline{?} = {!}$
– $\overline{\mathsf{begin}.\rho} = \mathsf{begin}.\overline{\rho}$
– $\overline{\pi.\mathsf{end}} = \overline{\pi}.\mathsf{end}$ and $\overline{\pi.\dagger\rho_1,\rho_2} = \overline{\pi}.\overline{\dagger}\,\overline{\rho_1},\overline{\rho_2}$
– $\overline{\dagger t} = \overline{\dagger}\,t$, $\overline{\dagger(\rho)} = \overline{\dagger}(\rho)$, $\overline{\dagger\pi_1,\pi_2} = \overline{\dagger}\,\overline{\pi_1},\overline{\pi_2}$, $\overline{\dagger\pi^{*}} = \overline{\dagger}\,\overline{\pi}^{*}$, $\overline{\pi_1.\pi_2} = \overline{\pi_1}.\overline{\pi_2}$
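For readers who prefer an executable reading, duality is a straightforward structural recursion. The following Python sketch is ours, not part of the calculus; session parts are encoded as nested tuples whose tags ("send", "recv", and so on) are hypothetical names mirroring the grammar above. Note that for delegation only the direction flips, while the carried type ρ is unchanged.

```python
# Illustrative sketch (not from the paper): session-part duality as a
# structural recursion over a tuple-encoded AST.
def dual(p):
    kind = p[0]
    if kind == "end":                       # end is self-dual
        return p
    if kind in ("send", "recv"):            # !t / ?t: flip direction, keep t
        return ("recv" if kind == "send" else "send", p[1])
    if kind in ("sendS", "recvS"):          # !(rho) / ?(rho): flip direction,
        return ("recvS" if kind == "sendS" else "sendS", p[1])  # keep rho
    if kind in ("if_out", "if_in"):         # conditional: flip, dualise branches
        return ("if_in" if kind == "if_out" else "if_out", dual(p[1]), dual(p[2]))
    if kind in ("while_out", "while_in"):   # iteration: flip, dualise body
        return ("while_in" if kind == "while_out" else "while_out", dual(p[1]))
    if kind == "seq":                       # pi1.pi2: dualise componentwise
        return ("seq", dual(p[1]), dual(p[2]))
    raise ValueError("unknown session part: %r" % (kind,))

# begin.!int.?bool.end and its dual begin.?int.!bool.end
s = ("seq", ("send", "int"), ("seq", ("recv", "bool"), ("end",)))
assert dual(s) == ("seq", ("recv", "int"), ("seq", ("send", "bool"), ("end",)))
assert dual(dual(s)) == s                   # duality is an involution
```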
Type System. We type expressions and threads with respect to a fixed class table, so
only the classes declared in this table are types. We could easily extend the syntax to
allow dynamic class creation, but this is orthogonal to session typing. We use the same
table to judge subtyping <: on class names: we assume the subtyping between classes
causes no cycle, as in [21]. In addition, we have (s, s̄) <: s and (s, s̄) <: s̄, as in standard
π-calculus channel subtyping rules [19]: a channel on which both communication
directions are allowed may also transmit data following only one of the two directions.
The typing judgements for threads have two environments, i.e., they have the shape:
Γ; Σ ⊢ P : thread
where the standard environment Γ associates types with this, parameters, and objects,
while the session environment Σ contains only judgements for live channels. These
environments are defined as follows, under the condition that no subject occurs twice.
When typing expressions we also need to take into account the unique (if any)
channel identifier currently used to communicate data. This is necessary in order
to avoid session interleaving. Therefore we record a third set, the hot set S , which
Session Types for Object-Oriented Languages 341
can be either empty or can contain a single channel identifier belonging to the session
environment. Thus the typing judgements for expressions have the shape:
Γ; Σ; S ⊢ e : t
Meth
    Γ; Σ0; S ⊢ e : C    Γ; Σi; S ⊢ ei : ti  (i ∈ {1 . . . n})    mtype(m, C) = t1 . . . tn → t
    ─────────────────────────────────────────────
    Γ; Σ0.Σ1 . . . Σn; S ⊢ e.m(e1 . . . en) : t

MethLin
    Γ; Σ0; {u} ⊢ e : C    Γ; Σi; {u} ⊢ ei : ti  (i ∈ {1 . . . n})    mtype(m, C) = t1 . . . tn, η → t
    ─────────────────────────────────────────────
    Γ; Σ0.Σ1 . . . Σn.{u : η}; {u} ⊢ e.m(e1 . . . en, u) : t
Expressions. We highlight the interesting typing rules for expressions in Fig. 7 and
Fig. 8. Looking at these rules two observations on hot sets are immediate:
– in all rules except Conn, ReceiveS and Weak the hot sets of all the premises and
of the conclusion coincide;
– in all rules whose conclusion types a session expression, the hot set of the conclusion
consists of the subject of the session expression.
These two conditions ensure that all communications use the same live channel, i.e., that
sessions are not interleaved. In rule Conn the live channel becomes shared, and there-
fore in the conclusion the hot set is empty. Since u.receiveS(x){e} in rule ReceiveS
receives along the live channel u a channel that will be substituted for x, the hot set of the
premise is {x} while that of the conclusion is {u}. Lastly, rule Weak allows us to replace
an empty hot set by a set containing an arbitrary element of the domain of the session
environment.
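The effect of this discipline can be illustrated operationally. The following Python sketch is our own approximation, not the formal system: it walks a flat list of (channel, action) communications and rejects any body that touches a live channel other than the one in the hot set, filling an empty hot set on first use in the spirit of rule Weak.

```python
# Sketch: enforce that a session body communicates on a single live channel.
def check_hot(ops, hot=None):
    # ops: list of (channel, action) pairs; hot: set with at most one channel
    hot = set() if hot is None else set(hot)
    for chan, _action in ops:
        if not hot:
            hot = {chan}        # an empty hot set may be filled (cf. Weak)
        if chan not in hot:
            return False        # a second live channel: interleaved sessions
    return True

assert check_hot([("c", "send"), ("c", "receive")])
assert not check_hot([("c", "send"), ("d", "send")])
```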
The session environments of the conclusions are obtained from those of the premises
and possibly other session environments using the concatenation defined below.
The concatenation of two live channel types η and η′ is the unique live channel type (if
it exists) which prescribes all the communications of η followed by all those of η′.
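As a rough operational intuition (our own simplification, ignoring conditionals and iteration), concatenation of live channel types can be read as list append, undefined when the first type is already complete:

```python
# Sketch: concatenation eta.eta' of live channel types, with types encoded
# as flat lists of session parts such as "!int" or "?bool", possibly ending
# in "end". The result is undefined (None) if eta is already complete.
def concat(eta1, eta2):
    if eta1 and eta1[-1] == "end" and eta2:
        return None             # nothing may follow "end"
    return eta1 + eta2

assert concat(["!int"], ["?bool", "end"]) == ["!int", "?bool", "end"]
assert concat(["!int", "end"], ["?bool"]) is None
```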
342 M. Dezani-Ciancaglini et al.
Conn
    Γ; ∅; ∅ ⊢ u : begin.ρ    Γ \ u; Σ, u : ρ; {u} ⊢ e : t
    ─────────────────────────────────────────────
    Γ; Σ; ∅ ⊢ connect u begin.ρ {e} : t

Send
    Γ; Σ; {u} ⊢ e : t
    ─────────────────────────────────────────────
    Γ; Σ.{u : !t}; {u} ⊢ u.send(e) : Object

Receive
    Γ ⊢ ok    t : tp
    ─────────────────────────────────────────────
    Γ; {u : ?t}; {u} ⊢ u.receive : t

ReceiveS
    Γ \ x; Σ, x : ρ; {x} ⊢ e : t    closed(Σ)
    ─────────────────────────────────────────────
    Γ; {u : ?(ρ)}.Σ; {u} ⊢ u.receiveS(x){e} : Object

SendS
    Γ ⊢ ok    ρ : tp
    ─────────────────────────────────────────────
    Γ; {u′ : ρ, u : !(ρ)}; {u} ⊢ u.sendS(u′) : Object

ReceiveIf
    Γ; Σ, u : ηi; {u} ⊢ ei : t  (i ∈ {1, 2})    η = ?η1, η2
    ─────────────────────────────────────────────
    Γ; Σ, u : η; {u} ⊢ u.receiveIf{e1}{e2} : t

SendIf
    Γ; Σ1; {u} ⊢ e : bool    Γ; Σ2, u : ηi; {u} ⊢ ei : t  (i ∈ {1, 2})    η = !η1, η2
    ─────────────────────────────────────────────
    Γ; Σ1.{Σ2, u : η}; {u} ⊢ u.sendIf(e){e1}{e2} : t

ReceiveWhile
    Γ; {u : π}; {u} ⊢ e : t
    ─────────────────────────────────────────────
    Γ; {u : ?π∗}; {u} ⊢ u.receiveWhile{e} : t

SendWhile
    Γ; {u : π}; {u} ⊢ e : bool    Γ; {u : π′}; {u} ⊢ e′ : t
    ─────────────────────────────────────────────
    Γ; {u : π.!(π′.π)∗}; {u} ⊢ u.sendWhile(e){e′} : t

where closed(Σ) holds iff ∀ u : η ∈ Σ. ∃ρ. η = ρ, i.e., every type in Σ is a complete session part.
Rule MethLin retrieves the type of the method m from the class table using the auxil-
iary function mtype(m,C). This rule expects the last actual parameter u to be a channel
identifier that will be used within the method body directly, as if it were part of an open
session. Therefore the hot sets of all the premises and of the conclusion must be {u }.
The session environments of the premises are also concatenated with {u :η} which rep-
resents the communication protocol of the live channel u during the execution of the
method body.
Rule Conn ensures that a session body properly uses its unique channel according to
the required session type. The first premise says that the channel identifier used for the
session (u ) can be typed with the appropriate shared session type (begin.ρ). The second
premise ensures that the session body can be typed in the restricted environment Γ \ u
with a session environment containing u : ρ and with hot set {u }.
Rule M-ok checks that a method that does not have live channel parameters is well-
formed, by type-checking its body and succeeding with both an empty session envi-
ronment and an empty hot set, i.e., it ensures that no channel can be used outside the
scope of a session within the method body. Rule MLin-ok performs the same check,
but requires that the last parameter is a live channel which is the element of the hot set
in the typing of the method body.
Thread. In the typing rules for threads, we need to take into account that the same
channel can occur with dual types in the session environments of two premises. For this
reason we compose the session environments of premises using the parallel composi-
tion defined below.
– η ∥ η′ = ⊤ if η′ = η̄, and η ∥ η′ = ⊥ otherwise; moreover ⊤ ∥ η = η ∥ ⊤ = ⊤ ∥ ⊤ = ⊥
(here ⊤ marks a channel whose two dual endpoints have already been paired, and ⊥ stands for undefined).
– Σ ∥ Σ′ = Σ \ dom(Σ′) ∪ Σ′ \ dom(Σ) ∪ {u : Σ(u) ∥ Σ′(u) | u ∈ dom(Σ) ∩ dom(Σ′)}
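Operationally, the composition checks duality channel by channel. In this Python sketch (our own simplified encoding, with a live channel type represented as a flat tuple of (direction, payload) pairs, and PAIRED standing in for the marker assigned to a channel whose two endpoints have met), composition returns None where the formal definition is undefined:

```python
PAIRED = "paired"   # marker for a channel whose dual endpoints are both used

def dual_type(eta):
    return tuple(("?" if d == "!" else "!", t) for d, t in eta)

def compose_types(a, b):
    if a == PAIRED or b == PAIRED:
        return None                # a paired channel cannot be used again
    return PAIRED if b == dual_type(a) else None

def compose_envs(s1, s2):
    out = {}
    for u in set(s1) | set(s2):
        if u in s1 and u in s2:
            t = compose_types(s1[u], s2[u])
            if t is None:
                return None        # composition undefined: Par not applicable
            out[u] = t
        else:
            out[u] = s1[u] if u in s1 else s2[u]
    return out

assert compose_envs({"c": (("!", "int"),)}, {"c": (("?", "int"),)}) == {"c": PAIRED}
assert compose_envs({"c": (("!", "int"),)}, {"c": (("?", "bool"),)}) is None
```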
Using the above operator, the typing rules for processes are straightforward. Rule Start
promotes an expression to the thread level; and rule Par types a composition of threads
if the composition of their session environments is defined.
Start
    Γ; Σ; S ⊢ e : t
    ─────────────────────────
    Γ; Σ ⊢ e : thread

Par
    Γ; Σ ⊢ P : thread    Γ; Σ′ ⊢ P′ : thread
    ─────────────────────────
    Γ; Σ ∥ Σ′ ⊢ P | P′ : thread
ConnI
    Γ ⊢ e : t ▷ Σ; S    Σ((u)) = η    s = begin.σ(η↓)    u ∈ dom(Γ′)    Γ′ = Γ if u is a name, else Γ′ = Γ, u : s
    ─────────────────────────────────────────────
    Γ′ ⊢ connect u s {e} : σ(t) ▷ σ(Σ) \ u; ∅

ReceiveI
    Γ ⊢ ok
    ─────────────────────────────────────────────
    Γ ⊢ u.receive : φ ▷ {u : ?φ}; {u}

ReceiveSI
    Γ ⊢ e : t ▷ Σ; S    x ∈ Γ    S ⊆ {x}    Σ((x)) = η
    ─────────────────────────────────────────────
    Γ ⊢ u.receiveS(x){e} : Object ▷ {u : ?(η↓)}.Σ↓; {u}

SendSI
    Γ ⊢ ok
    ─────────────────────────────────────────────
    Γ ⊢ u.sendS(u′) : Object ▷ {u′ : ψ.end, u : !(ψ.end)}; {u}
6.2 Inference of Session Environments, Hot Sets, and Session Types for connect
Although the type system is flexible enough to express interesting protocols, typing
as described so far is somewhat inconvenient, in that it requires (a) the hot sets and the
session environments to be assumed (or “guessed”), and (b) the session types to be stated
explicitly for connect expressions.
To address (a), in this section, we develop inference rules for expressions and threads
which have the shape
Γ ⊢ e : t ▷ Σ; S    and    Γ ⊢ P : thread ▷ Σ
and which express that session environments and hot sets are derived rather than as-
sumed. Based on these rules, at the end of the section, we address (b) and show how
session types can be inferred for connect expressions.
Fig. 9 gives some of the inference rules. The rules are applicable only if all sets in the
conclusion are defined. We extend the syntax of types with type variables, ranged over
by φ, which stand for types, and with session-part variables, ranged over by ψ, which
stand for parts of session types. Rule ReceiveI introduces φ, since we do not know the
type of the data that will be received. Rule SendSI introduces ψ, since we do not know
the type of the channel that will be sent.
As usual, the inference rules are structural, i.e., depend on the structure of the ex-
pression being typed; in particular, the inference system does not have rules like Weak.
Therefore, the inference rules must also play the role of the non-structural typing rules.
Because in rule ConnI we do not know if the session environment inferred for e
contains a premise for u , we define:
The substitution σ is used in order to unify the session type s with begin.(η↓), where η↓,
being inferred, may contain type variables. That is, we require s = begin.σ(η↓).
The following proposition states that inference computes the least session environ-
ments and hot sets.
Note that the inference of Σ does not rely on S, so we can obtain the same result for
the system without S.
∅ ⊢ 5 : int ▷ ∅; ∅
∅ ⊢ x.send(5) : Object ▷ {x : !int}; {x}
∅ ⊢ c2.receiveS(x){x.send(5)} : Object ▷ {c2 : ?(!int.end)}; {c2}
∅ ⊢ e : Object ▷ ∅; ∅        ∅ ⊢ e′ : φ ▷ {c1 : ?φ}; {c1}
∅ ⊢ e; e′ : φ ▷ {c1 : ?φ}; {c1}
∅ ⊢ connect c1 begin.?int.end {e; e′} : int ▷ ∅; ∅
As an example we show the inference for the thread P1 of Example 4.1 in Fig. 10.
We can now address (b), i.e., the inference of session types in connect expressions.
This requires modifying the syntax by dropping the session types in connect expressions.
It is then enough to modify the inference rule for connect so that it no longer applies the
unifying substitution, but instead obtains the required session type from the inferred one.
Thus, the new inference rule is:
ConnI
    Γ ⊢ e : t ▷ Σ; S    Σ((u)) = η    u ∈ dom(Γ′)    Γ′ = Γ if u is a name, else Γ′ = Γ, u : begin.(η↓)
    ─────────────────────────────────────────────
    Γ′ ⊢ connect u {e} : t ▷ Σ \ u; ∅
With this rule, users would not need to declare session types explicitly in connect; for
example, they could write connect c {c .send(true); c .send (false)} instead of writing
connect c begin.!bool.!bool.end {c .send (true); c .send (false)}.
Since explicit declarations are useful for program documentation, the inclusion of
type inference for connect should be up to the individual language designer.
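The flavour of this inference can be conveyed by a toy function (ours, covering only straight-line bodies of sends and receives, with hypothetical operation names):

```python
# Toy sketch: read a session type for connect off a straight-line body.
def infer_session(body):
    # body: list of ("send", t) / ("receive", t) pairs, in program order
    parts = "".join(("!" if op == "send" else "?") + t + "." for op, t in body)
    return "begin." + parts + "end"

# mirrors the example: connect c {c.send(true); c.send(false)}
assert infer_session([("send", "bool"), ("send", "bool")]) == "begin.!bool.!bool.end"
```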
These properties hold for a thread obtained by reducing a well-typed closed thread in
which all expressions are user expressions. We write ∏0≤i <n e i for e 0 | e 1 | ... | e n−1 .
We say a thread P is initial if ∅; ∅ ⊢ P : thread is derivable and P ≡ ∏0≤i<n ei, where
each ei is a user expression. Notice that this implies ∅ ⊢ P : thread ▷ ∅. For stating P1,
we add a new constant CommErr (communication error) to the syntax and the following rule:
E1[e] | E2[e′] −→ CommErr
if e and e′ are session expressions with the same subject and they are not dual of each
other. We can now prove that we never reach a state containing such incompatible ex-
pressions.
The proof of the above theorem is straightforward from the subject reduction theorem.
Next we show that the progress property P2 holds in our typing system.
Theorem 7.5 (Progress). Assume P0 is initial and P0, ∅ ↠ P, h. Then one of the
following holds.
The key in showing progress is the natural correspondence between irreducible session
expressions and parts of session types formalised in the following definition.
Definition 7.6. Define ∝ between irreducible session expressions and parts of session
types as follows:
Notice that the relation e ∝ π reflects the “shape” of the session, rather than the precise
types involved. For example, e ∝ ?t implies e ∝ ?t′ for any type t′.
Using the generation lemmas and Lemma 7.3 we can show the correspondence be-
tween an irreducible session expression inside an evaluation context and the type of the
live channel which is the subject of the expression.
Lemma 7.7. Let e be an irreducible session expression with subject c and Γ; Σ ⊢ E[e] :
thread. Then Σ(c) = π.η with e ∝ π.
The proof of Theorem 7.5 argues that if the configuration does not contain waiting
connects or null pointer errors, but contains an irreducible session expression e 1 , then
by subject reduction and well-formedness of the session environment, the rest of the
thread independently moves or it contains the dual of that irreducible expression, e2.
Then by Lemma 7.7, we get e1 ∝ π and e2 ∝ π̄. Therefore e1 and e2 are dual of each
other and can communicate.
Note that Theorem 7.5 shows that threads can always communicate at live channels.
From the above theorem, immediately we get:
Corollary 7.8 (Completion of Sessions). Assume P0 is initial and P0, ∅ ↠ P, h. Suppose
P ≡ ∏0≤i<n ei and P is irreducible. Then either all ei are values (0 ≤ i < n) or there
is some j (0 ≤ j < n) such that ej ∈ {NullExc, E[connect c s {e}]}.
Finally we state the main property (P3) of our typing system. For this purpose, we define
the partial order ⪯ on live channel types as the smallest partial order such that: η ⪯ π.η;
πi.η ⪯ †π1, π2.η (i ∈ {1, 2}); ρi ⪯ †ρ1, ρ2 (i ∈ {1, 2}); and †π.†π∗, ε.η ⪯ †π∗.η.
This partial order takes into account reduction, as formalised in the following theorem:
for any configuration E[e0] | Q, h reachable from the initial configuration and containing
the irreducible session expression e0, if it proceeds, then either (1) it does so in the
sub-thread Q, or (2) Q contains the dual expression e0′, which (a) interacts with e0, and
(b) has a dual type at c (and therefore, through application of Lemma 7.7, the two
expressions conform to the “shape” of their type, i.e., η = π.η0 with e0 ∝ π and
e0′ ∝ π̄), and (c) then the type of channel c “correctly shrinks” as η′ ⪯ η.
Theorem 7.9 (Communication-Order Preservation). Let P0 be initial. Assume that
P0, ∅ ↠ E[e0] | Q, h −→ P′, h′ where e0 is an irreducible session expression with subject c. Then:
1. P′ ≡ E[e0] | Q′, or
2. Q ≡ E′[e0′] | R with e0′ dual of e0 and
(a) E[e0] | E′[e0′] | R, h −→ e | e′ | R, h′;
(b) Γ; Σ, c : η ⊢ E[e0] : thread and Γ; Σ′, c : η̄ ⊢ E′[e0′] : thread; and
(c) Γ; Σ̂, c : η′ ⊢ e : thread and Γ; Σ̂′, c : η̄′ ⊢ e′ : thread, with η′ ⪯ η.
8 Related Work
Linear typing systems. Session types for the π-calculus originate from linear typing
systems [19, 22], whose main aim is to guarantee that a channel is used exactly once
or at most once within a term.
In the context of programming languages, [12] proposes a type system for checking
protocols and resource usage in order to enforce linearity of variables in the presence of
aliasing. They implemented the typing system in Vault [9], a low level C-like language.
The main issue that they had to address is that a shared component should not refer
to linear components, since aliasing of the shared component can result in non-linear
usage of any linear elements to which it provides access. To relax this condition, they
proposed operations for safe sharing, and for controlled linear usage. In our system non-
interference is ensured by operational semantics in which substitution of shared fresh
Session Types for Object-Oriented Languages 349
channels takes place when reducing connect , and therefore we do not need explicit
constructs for this purpose. Finally, note that the system of [12] is not readily applicable
in a concurrent setting, and hence in channel-based communication.
Programming languages and sessions. In [29] the authors extend previous work [15],
and define a concurrent functional language with session primitives. Their language
supports sending of channels and higher-order values, and incorporates branching and
selection, along with recursive sessions and channel sharing. Moreover, it incorporates
the multi-threading primitive fork, whose operational semantics is similar to that of
spawn. Finally, their system allows live channels as parameters to functions, and tracks
aliasing of channels; as a result, their system is polymorphic.
In [27], the authors formalise an extension to CORBA interfaces based on session
types, which are used to determine the order in which available operations can be in-
voked. The authors define protocols consisting of sessions, and use labelled branches
and selection to model method invocation within each session. Labelled branches are
also used to denote exceptions, and their system incorporates recursive session types.
However, protocol conformance is only checked at run time, and there is no
formalisation in terms of an operational semantics and type system.
We developed our formalism building upon previous experience with Ldoos [10],
a distributed object-oriented language with basic session capabilities. In the present
work we have chosen to simplify the substrate to that of a concurrent calculus, and
focus on the integration of advanced session types. In [10], shared channels could only
be associated with a single session type each, and therefore runtime checks were not
required for connections; however, this assumption is not necessary, and we preferred
to give up such superficial type-checking: the essence of our system lies in typing
a session body against its session type.
In our new formulation we chose not to model RMI, and in fact, an interesting ques-
tion is whether we can encode RMI as a form of degenerate session in the spirit of [27].
Also, we have now introduced more powerful primitives for thread and (shared) chan-
nel creation, along with the ability to delegate live sessions via method invocation and
higher-order sessions. None of these features are considered in [10]. We discovered a
flaw in the progress theorem in Ldoos [10], and developed the new type system with hot
sets in order to guard against the offending configurations.
Subject Reduction and Progress. In all previously mentioned papers on session types,
typability guarantees absence of run-time communication errors. However, not all of
them have the subject reduction property: the problem emerges when sending a live
channel to a thread which already uses this channel to communicate, as in Example 4.1.
This example can be translated into the calculi studied in [6, 14, 20, 29], and this issue
has been discussed with some of their authors [2].
MOOSE has been inspired by the previously mentioned papers; however, we believe
that it is the only calculus which guarantees absence of starvation on live channels. For
example, we can encode the counterpart of Example 5.3 in the calculi of [6, 14, 20, 29].
More details on these two issues can be found in [1].
Note that we can flexibly obtain a version of the typing system which preserves the
type safety and type inference results, but allows deadlock on live channels as in the
above-mentioned literature, by simply dropping the hot set. In this sense, our system is
not only theoretically sound, but also modular.
This paper proposes the language MOOSE, a simple multi-threaded object-oriented language
augmented with session communication primitives and types. MOOSE provides
a clean object-oriented programming style for structured interaction protocols by prescribing
channel usages as session types. We develop a typing system for MOOSE and
prove type safety with respect to the operational semantics. We also show that in a
well-typed MOOSE program, there will never be a communication error, starvation on
live channels, or an incorrect completion of two-party interactions. These results
demonstrate that a consistent integration of object-oriented language features and session
types is possible, where well-typedness can guarantee the consistent composition
of communication protocols. To the best of our knowledge, MOOSE is the first application of
session types to a concurrent object-oriented class-based programming language. Furthermore,
type inference on session environments (Proposition 6.1) and the progress
property on live channels (Theorem 7.5) have never been proved in any work on session
types, including those in the π-calculus.
Advanced session types. An issue that arises with the use of sessions is how to group
and distinguish different behaviours within a program or protocol. In [20] and subse-
quently in [29] the authors utilise labelled branching and selection; the first enables a
process to offer alternative session paths indexed by labels, and the second is used du-
ally to choose a path by selecting one of the available labels. In [13, 17, 20, 28], branching
and selection are considered an effective way to simulate methods of objects.
Several advancements have been made, ranging from simple session subtyping [13] to
more complex bounded session polymorphism [17], which corresponds to parametric
polymorphism within session types. Our conditional constructs are a simplification of
branching and selection, therefore the same behaviour realised by branching types can
also be expressed using our types.
As another study on the enrichment of basic session types, in [6] the authors integrate
the correspondence assertions of [16] with standard session types to reason about
multi-party protocols comprising standard interleaved sessions.
In this work, our purpose was to produce a reliable and extensible object-oriented
core, and not to include everything in the first attempt; however, such richer type struc-
tures are attractive in an object-oriented framework. M OOSE can be used as a core
extensible language incorporating other typing systems.
We plan to study transformations from methods with more than one live channel
parameter to methods with only one live channel parameter, and from interleaved sessions
to non-interleaved ones, in order to investigate the expressiveness of our type system.
Exceptions, timeout and implementation. Another feature not considered in our sys-
tem, although important in practice, is exceptions; in particular, we did not provide any
way for a session type to declare that it may throw a checked exception, so that when this
occurs both communicating processes can execute predefined error-handling code. One
obvious way to encode an exception would be to use a branch as in [27]. In addition,
when a thread becomes blocked waiting for a session to commence, in our operational
semantics, it will never escape the waiting state unless a connection occurs. In prac-
tice, this is unrealistic, but it could have been ameliorated by introducing a ‘timeout’
version of our basic connection primitive such as connect(timeout)u s {e}. However,
controlling exceptions during session communication and realising timeout would be
non-trivial if we wish to preserve the progress property on live channels.
Finally, we are considering a prototype implementation using source-to-source translation
from MOOSE to Java code. Firstly, the current notation for session types is convenient
for our calculus, but sessions can be long and complex in large programs, making
the types difficult to understand. We are developing an equivalent but more scalable
way to describe sessions, using an alternative notation in which sessions are declared as
nominal interface-like types. Other interesting issues are the choice of a suitable run-
time representation for both shared and linear channels, the ability to detect and control
implicit multi-threading, and the efficient implementation of higher-order sessions.
References
1. A full version of this paper. http://www.doc.ic.ac.uk/~dm04.
2. Personal communication by e-mails between the authors of [6, 7, 13, 15, 18, 20, 26, 27, 29].
3. Conversation with Steve Ross-Talbot. ACM Queue, 4(2), 2006.
4. A. Ahern and N. Yoshida. Formalising Java RMI with Explicit Code Mobility. In OOP-
SLA ’05, pages 403–422. ACM Press, 2005.
5. G. Bierman, M. Parkinson, and A. Pitts. MJ: An Imperative Core Calculus for Java and Java
with Effects. Technical Report 563, Univ. of Cambridge Computer Laboratory, 2003.
6. E. Bonelli, A. Compagnoni, and E. Gunter. Correspondence Assertions for Process Synchro-
nization in Concurrent Communications. J. of Funct. Progr., 15(2):219–248, 2005.
7. E. Bonelli, A. Compagnoni, and E. Gunter. Typechecking Safe Process Synchronization. In
FGUC 2004, volume 138 of ENTCS, pages 3–22. Elsevier, 2005.
8. M. Carbone, K. Honda, and N. Yoshida. A Theoretical Basis of Communication-centered
Concurrent Programming. Web Services Choreography Working Group mailing list, to ap-
pear as a WS-CDL working report.
9. R. DeLine and M. Fahndrich. Enforcing High-Level Protocols in Low-Level Software. In
PLDI ’01, volume 36(5) of SIGPLAN Notices, pages 59–69. ACM Press, 2001.
10. M. Dezani-Ciancaglini, N. Yoshida, A. Ahern, and S. Drossopoulou. A Distributed Object
Oriented Language with Session Types. In TGC’05, volume 3705 of LNCS, pages 299–318.
Springer-Verlag, 2005.
11. S. Drossopoulou. Advanced issues in object oriented languages course notes.
http://www.doc.ic.ac.uk/~scd/Teaching/AdvOO.html.
12. M. Fahndrich and R. DeLine. Adoption and focus: practical linear types for imperative
programming. In PLDI ’02, pages 13–24. ACM Press, 2002.
13. S. Gay and M. Hole. Types and Subtypes for Client-Server Interactions. In ESOP’99, volume
1576 of LNCS, pages 74–90. Springer-Verlag, 1999.
14. S. Gay and M. Hole. Subtyping for Session Types in the Pi-Calculus. Acta Informatica,
42(2/3):191–225, 2005.
15. S. Gay, V. T. Vasconcelos, and A. Ravara. Session Types for Inter-Process Communication.
TR 2003–133, Department of Computing, University of Glasgow, 2003.
16. A. D. Gordon and A. Jeffrey. Typing Correspondence Assertions for Communication Proto-
cols. In MFPS’01, volume 45 of ENTCS, pages 379–409. Elsevier, 2001.
17. M. Hole and S. J. Gay. Bounded Polymorphism in Session Types. Technical Report TR-
2003-132, Department of Computing Science, University of Glasgow, 2003.
18. K. Honda. Types for Dyadic Interaction. In CONCUR’93, volume 715 of LNCS, pages
509–523. Springer-Verlag, 1993.
19. K. Honda. Composing Processes. In POPL’96, pages 344–357. ACM Press, 1996.
20. K. Honda, V. T. Vasconcelos, and M. Kubo. Language Primitives and Type Disciplines for
Structured Communication-based Programming. In ESOP’98, volume 1381 of LNCS, pages
122–138. Springer-Verlag, 1998.
21. A. Igarashi, B. C. Pierce, and P. Wadler. Featherweight Java: a Minimal Core Calculus for
Java and GJ. ACM TOPLAS, 23(3):396–450, 2001.
22. N. Kobayashi, B. C. Pierce, and D. N. Turner. Linearity and the Pi-Calculus. ACM TOPLAS,
21(5):914–947, Sept. 1999.
23. R. Milner, J. Parrow, and D. Walker. A Calculus of Mobile Processes, Parts I and II. Infor-
mation and Computation, 100(1), 1992.
24. D. Mostrous. Moose: a Minimal Object Oriented Language with Session Types. Master’s
thesis, Imperial College London, 2005.
25. B. C. Pierce. Types and Programming Languages. MIT Press, 2002.
26. K. Takeuchi, K. Honda, and M. Kubo. An Interaction-based Language and its Typing System.
In PARLE’94, volume 817 of LNCS, pages 398–413. Springer-Verlag, 1994.
27. A. Vallecillo, V. T. Vasconcelos, and A. Ravara. Typing the Behavior of Objects and Com-
ponents using Session Types. In FOCLASA’02, volume 68(3) of ENTCS. Elsevier, 2002.
28. V. Vasconcelos. Typed Concurrent Objects. In ECOOP’94, volume 821 of LNCS, pages
100–117. Springer-Verlag, 1994.
29. V. T. Vasconcelos, A. Ravara, and S. Gay. Session Types for Functional Multithreading. In
CONCUR’04, volume 3170 of LNCS, pages 497–511. Springer-Verlag, 2004.
30. Web Services Choreography Working Group. Web Services Choreography Description Lan-
guage. http://www.w3.org/2002/ws/chor/.
Parameterized Modules for Classes and
Extensible Functions
K. Lee and C. Chambers
University of Washington
Department of Computer Science and Engineering
Box 352350, Seattle WA 98195-2350, USA
{klee, chambers}@cs.washington.edu
1 Introduction
package Lang;
abstract class Expr {
Expr() {}
abstract Int eval(); }
In the classic “expression problem” [35, 41], one wishes to add both new types
of Expr and new functions over Expr types. In object-oriented languages, one
can straightforwardly do the former without changing the original code:
package ConstPackage;
class Const extends Expr {
Int value;
Const(Int v_in) {value=v_in;}
Int eval() {return value;} }
However, to add a new dispatching function for Expr — say, print — we must
invasively alter the original code:
abstract class Expr { ... // as before
abstract String print(); }
class Const extends Expr { ... // as before
String print() { return value.toString(); } }
Traditional functional languages have the converse problem: adding new func-
tions is easy, but adding new cases to data types requires invasive changes, either
to the original source code, or to existing clients.
Previous work on Eml [28] and related languages [27, 10] integrates both
object-oriented and functional extensibility in a single unified framework. These
languages include extensible class hierarchies and method overriding (as in tra-
ditional object-oriented languages), while also allowing functions to be added
externally to classes, and to dynamically dispatch on any subset of their argu-
ments (as in traditional functional languages). In Eml, we would write:
module Lang = { abstract class Expr() of {}
abstract fun eval:Expr -> Int }
It is straightforward to add new data types:
module ConstMod uses Lang = {
class Const(v_in:Int) extends Lang.Expr() of {value:Int = v_in}
extend fun Lang.eval(Const {value=v}) = v }
Note that extends adds a new subclass to an existing class, and extend fun
adds a new (pattern-matching) case to an existing function.
It is also straightforward to add new functions:
module PrintMod uses Lang, ConstMod = {
fun print:Lang.Expr -> String
extend fun print(Lang.Expr) = ""
extend fun print(ConstMod.Const {value=v})= Std.intToString(v) }
Eml therefore supports both data type and function extensibility (with some
restrictions, which is why print has a default case for Expr — see Section 3.2).
Now, we would like it to support code reuse as well. Suppose the interpreter
code base had many features — i.e., expression types, and functions over those
types — and we wished to combine various subsets to produce a product line [24]
of interpreters. In this case, we would like to define a feature once, typecheck it
once, and reuse it to extend several interpreter instances.
Like ML [23, 30, 19, 11], Eml supports functors, or parameterized modules:
signature LangSig = sig { abstract class Expr() of {}
abstract fun eval:Expr -> Int }
MakePlus defines a function over modules; it can be applied to any module that
implements LangSig, to produce a module containing a (freshly minted) class
Plus and its eval implementation.
Note that Plus inherits from L.Expr, a class provided by the module para-
meter. In principle, such parameterization supports and subsumes many useful
idioms, including mixins [5, 17] (Plus is a mixin), mixin layers [37] (which apply
mixins to multiple classes at once), and certain aspect-oriented extensions [21]
that extend members of multiple base modules.
However, limitations in Eml prevent it from realizing this potential:
– Eml functors are sensitive to the names of classes and functions in their
arguments. In our example, MakePlus could only be applied to modules with
a class named Expr. However, a truly reusable functor should be insensitive
to inessential details like class names — other mixin systems, for example,
do not constrain the names of classes with which a mixin may be composed.
– Eml’s signature language could only specify direct subclassing relations in
functor arguments. Therefore, for example, it would be impossible to write
an Eml functor that extended a transitive subclass of Expr.
– Eml included no useful form of signature subsumption. Therefore, for ex-
ample, a module that provided all the features of Lang, plus some extra
declarations, would be incompatible with LangSig.
In combination, these limitations meant that Eml functors were not truly
reusable. The contributions of the present work are as follows:
Signatures
Sd ::= signature S uses M = Se
Se ::= sig { Sb } | (M : Se) -> Se | S
Sb ::= [abstract] class c(τ) [extends C] of {L : τ}
     | fun f : τ# -> τ
     | extend fun F τ
     | val x : τ
Qualified names, identifiers
M̂ ::= M | ThisMod    C ::= M̂.c    F ::= M̂.f    L ::= M̂.l
S, M, f, c, l, x ::= identifier
which construct a fresh value of class C by invoking its constructor with the
argument tuple (e); named function references F; message sends e e′, which
apply e to e′; local pattern-bound variables x; or val-bound variables M̂.x.
At runtime, a message send dispatches to the globally most-specific case among
all method cases that have been defined for the invoked function. The specificity
relation between method cases is defined by the subtyping relation between the
patterns in their arguments (Section 3.1 gives a formal description of the dispatch
semantics). The dynamic semantics of dispatch give no priority to any particular
position in (the abstract syntax tree of) a method’s argument pattern — i.e.,
dispatching is symmetric.
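The globally-most-specific selection described above can be sketched in Python (an illustrative model with invented names, not Eml's formal dispatch semantics): a call is resolved to the unique applicable case that is at least as specific as every other applicable case, judged over all argument positions at once.

```python
# Sketch of symmetric multiple dispatch: a call resolves to the unique
# most-specific method case, judged by subtyping over *all* argument
# positions at once (no receiver bias). Illustrative, not Eml verbatim.

def more_specific(sig_a, sig_b):
    """sig_a is at least as specific as sig_b if each class in sig_a
    is a subclass of the corresponding class in sig_b."""
    return all(issubclass(a, b) for a, b in zip(sig_a, sig_b))

def dispatch(cases, args):
    """cases: list of (signature, function). Runs the unique
    most-specific applicable case."""
    arg_types = tuple(type(a) for a in args)
    applicable = [(sig, f) for sig, f in cases
                  if more_specific(arg_types, sig)]
    for sig, f in applicable:
        if all(more_specific(sig, other) for other, _ in applicable):
            return f(*args)
    raise LookupError("no unique most-specific case")

class A: pass
class B(A): pass

cases = [((A, A), lambda x, y: "A,A"),
         ((B, A), lambda x, y: "B,A")]
print(dispatch(cases, (B(), A())))  # the (B, A) case is strictly more specific
```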
Now, recall that we would like to reuse this extension in many contexts. How-
ever, consider the following reasonable definition of an “algebra”: first, use Lang
and PlusMod as defined in Section 1; then, define a third module:
360 K. Lee and C. Chambers
Fig. 4. Revision of Algebra from Fig. 2, and a module satisfying this signature
If we use this revised Algebra, then both the functor definition MakeDist and
the functor application MakeDist(LangAlgebra) will typecheck. Our fix uses all
τ# ::= (τ, τ#, τ) | #C {}
e ::= (e) | C {L = e} | C(e) | F | e e | x
v ::= (v) | C {L = v} | F
Method names (bare and qualified); qualified names; fresh names
q ::= identifier    Q ::= M̂.q    Y ::= C | F | Q    φ ::= Y
field types, whereas τP (the type of a pattern) may include more precise
information about fields. Restricting the type syntax in this manner simplifies our
proof strategy, while still requiring us to deal with the essence of the ambiguity
and incompleteness problems arising from extensibility and multiple dispatch.
Sixth, lists of fresh names φ are fully qualified, and may include method names.
Lastly, we include instances C {L = e} in the grammar of expressions; these are
not available at source level, but arise when defining small-step reduction.
The challenge in designing a type system that is both useful and sound arises
from the combination of F(Eml)’s uniform, symmetric dispatching model and its
powerful extensibility constructs. In Section 3.1, we elaborate on the dynamic se-
mantics of dispatching, focusing on how evaluation can go wrong. In Section 3.2,
we describe typechecking. Section 3.3 states the soundness theorems. The full
formalization of F(Eml) will appear in our companion report [22].
Module linking judgments: Δ ⊢ Md ⇓* Δ′ (program linking) and Δ ⊢ M = Me ⇓ Mv
(module linking).

(Link-Empty)  Δ ⊢ · ⇓* Δ

(Link-Mod)  if Δ ⊢ Md ⇓* Δ′ and Δ′ ⊢ M = Me ⇓ Mv,
            then Δ ⊢ Md; (module M uses M = Me) ⇓* Δ′, M → Mv

(L-Funct)   Δ ⊢ M = (M′ : Sv) -> Me ⇓ (M′ : Sv) -> Me

(L-App)     if Δ(M) = (M1 : Sv) -> Me1, Me = [M1 → M′]Me1,
            and Δ ⊢ M0 = Me ⇓ Mv, then Δ ⊢ M0 = M(M′) ⇓ Mv
alias, it (transitively) “chases aliases” until it finds a fresh declaration, and re-
places the reference to the alias with a reference to that fresh declaration source.
Third, for structures, linkage rewrites self-references via ThisMod to refer to the
module’s linked name.
Fig. 7 gives the (small-step, operational) semantics of core expression eval-
uation and auxiliary judgments. Execution uses the dynamic context Δ, but
otherwise these rules are exactly analogous to those for Eml [28]. We include
fairly complete rules here for reference, but we will only discuss those parts
absolutely necessary to explain the typechecking problems that follow.
Note that some syntactic sequences with an overbar have a superscripted
range, e.g. v̄^(1..n); this is shorthand for v1, . . . , vn. We use the set membership
operator ∈ on syntactic sequences, e.g. Mb ∈ Mb̄ indicates that Mb is an
element of the sequence Mb̄. We write Mb ∈ Δ(M) as shorthand for (Δ(M) =
{ Mb̄ }) ∧ (Mb ∈ Mb̄). We use a long double arrow =⇒ for logical implication,
to distinguish it from → (for small-step evaluation) and ⇒ (for signature gener-
ation, which we will see in Section 3.2). Superscripted brackets [ ]^k around a
part of a rule indicate that those parts are optional, but either all bracketed
portions superscripted with the same k must be present, or all must be absent.
Evaluation uses the dynamic subpattern and subclass relations, which are
given in Fig. 7. Note that these judgments are entirely distinct from the static
relation deduction that we describe later.
Evaluation can get stuck in two cases. First, the program could attempt to
construct a class marked abstract; call this an abstract instantiation error. Sec-
ond, the program could send a message for which Δ ⊢ lookup(F, v) = q, B, e is
not derivable, which can occur in two ways. Informally, the premises of Lookup
specify that there must exist some (fresh) method in Δ such that (1) its pattern
P matches the argument value, and (2) P is strictly more specific than the pat-
terns of all other matching methods in Δ. Therefore, this rule can fail either if
there are zero applicable methods, or if there are multiple applicable methods,
none of which is strictly more specific than all the others. The former case is a
message not understood error; the latter case is an ambiguous message error.
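These two failure modes can be sketched in Python (an illustrative toy model with invented names, not the formal Lookup judgment): given a set of case signatures, a send either has a unique most-specific applicable case, no applicable case at all, or several applicable cases none of which dominates.

```python
# Sketch of the two ways lookup can fail, mirroring the discussion above:
# zero applicable cases ("message not understood") or several applicable
# cases with no strictly most-specific one ("ambiguous message").
# Illustrative names; not F(Eml)'s formal judgment.

class A: pass
class B(A): pass
class C(A): pass

def applicable(cases, arg_types):
    return [sig for sig in cases
            if all(issubclass(a, s) for a, s in zip(arg_types, sig))]

def classify_send(cases, arg_types):
    app = applicable(cases, arg_types)
    if not app:
        return "message not understood"
    def at_least_as_specific(x, y):
        return all(issubclass(p, q) for p, q in zip(x, y))
    if any(all(at_least_as_specific(s, o) for o in app) for s in app):
        return "ok"
    return "ambiguous message"

cases = [(B, A), (A, B)]             # symmetric, overlapping case patterns
print(classify_send(cases, (B, B)))  # neither case dominates the other
print(classify_send(cases, (C, C)))  # no case matches at all
```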
3.2 Typechecking
[Fig. 7, dynamic semantics (the typeset rules were garbled in extraction; summarized
here). Small-step evaluation Δ ⊢ e → e′: congruence rules E-Tuple, E-App-L, and
E-App-R evaluate subexpressions; E-New reduces a constructor call C(v) to an
instance C {L = e} when Δ ⊢ concrete(C) and Δ ⊢ rep(C(v)) = {L = e}; E-Rep
evaluates instance fields; E-App-Red reduces a send F v to [B]e when
Δ ⊢ lookup(F, v) = q, B, e. The rule Concrete derives Δ ⊢ concrete(M.c) when a
non-abstract class c is declared in Δ(M). The rule Lookup derives
Δ ⊢ lookup(F, v) = q, B, e when some method case (extend fun F with q P = e) in
Δ(M) matches v with bindings B, and the pattern P of that case is strictly more
specific than the pattern P′ of every other matching case (Δ ⊢ P ≤ P′ but not
Δ ⊢ P′ ≤ P). Matching (rules Match-Bind, Match-Wild, Match-Tuple) and the
dynamic subpattern relation Δ ⊢ P ≤ P′ (rules PSub-Bind-L, PSub-Bind-R,
PSub-Wild, PSub-Tuple, PSub-Class) are defined structurally.]
Fig. 8 (reconstructed): summary of the static judgments.

Γ, D ⊢ Md ⇒* Γ′, D′          Program typechecking
Γ, D ⊢ Md ⇒ M : Sv, M̄        Module declaration typechecking
Γ; M̄ ⊢ Me : Sv               Module principal signatures
Γ, M̄ ⊢ Sv OK arg             OK functor argument signature
declRels(Mb̄) = K             Relation context formation
Γ, K, Mb̄ ⊢ Mb : Sb           Signature of a module body decl
Γ, K, Mb̄ ⊢ Y : M̂, Sb         Lookup or compute sig for a name
Γ, M̄ ⊢ Mb OK in Sb̄           Module body decl well-formedness
Γ ⊢ Sv ≤ Sv′                 Signature subsumption
Γ, K, Sb̄ ⊢ Sb ≤ Sb′          Sig body decl subsumption
Γ, K, Sb̄ ⊢ Sb droppable      Sig body width subsumption
K ⊢ C1 RC C2                 Class relation deduction
K ⊢ F1 RF F2                 Function relation deduction
K ⊢ Q1 RQ Q2                 Method relation deduction
K ⊢ τ1 Rτ τ2                 Type relation deduction
Γ, K, β ⊢ e : τ              Expression typing
K, R ⊢ ptype(P, τ) = τP, β   Type and bindings of a pattern
Γ, Mb̄ ⊢ rep(C) = {L : τ}     Class representation lookup
and D with a left-to-right fold on the module declaration list), so we skip directly
to the “meat” of module expression typechecking, shown in Fig. 9. DN (Mb) is
an auxiliary function that extracts the set of class, function, and method names
introduced in Mb. There are three cases for module expression typechecking:
structures, functors, and functor applications.
For structures, informally, the premises of Mod-Struct specify that: (line
1) the module’s declared names must be unique; (line 2) we extract a “relation
context” K = φ, ρ from the members Mb, and a principal signature can be gen-
erated for Mb; (lines 3-4) in the context enriched by the relation and declaration
signatures, each Mb is well-formed.
For functors, we typecheck the body in the context extended with the for-
mal argument’s signature. Informally, the Sv OK arg judgment checks that the
fresh φ clause in Sv is empty, since declarations in functor arguments are
never fresh (declarations in a functor formal argument are always potentially
aliases).
For functor applications, we check that an alias of the actual argument’s
signature would be subsumed by the formal argument signature. (Informally, the
aliasOf function, whose definition we omit, erases freshness information and adds
equality relations between declarations in the actual and formal parameters.) We
then substitute the actual argument name for the formal name in the signature
body. Notice that we do not need to typecheck the functor body again.
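The role of subsumption here can be pictured with a toy structural check in Python (the dict-based signatures and the subsumes helper are invented for illustration, not F(Eml)'s real judgment): the actual argument may provide extra declarations, which width subsumption allows to be dropped.

```python
# Sketch of width subsumption for functor application: the actual
# argument's signature may declare extra members and still be accepted,
# as long as every declaration required by the formal signature is
# present with a matching description. Dict-based toy model.

def subsumes(actual_sig, formal_sig):
    """actual_sig, formal_sig: dict mapping member name -> description."""
    return all(name in actual_sig and actual_sig[name] == formal_sig[name]
               for name in formal_sig)

lang_sig = {"Expr": "abstract class", "eval": "Expr -> Int"}
lang_algebra = {"Expr": "abstract class", "eval": "Expr -> Int",
                "simplify": "Expr -> Expr"}   # an extra declaration

print(subsumes(lang_algebra, lang_sig))  # True: the extra member is droppable
print(subsumes(lang_sig, lang_algebra))  # False: simplify is missing
```

This directly addresses the third Eml limitation listed in the introduction: a module providing a superset of the required declarations is still compatible with the formal signature.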
Recall the major technical innovations that F(Eml) adds relative to Eml:
generalized relations, alias declarations, and a non-trivial definition of signature
subsumption. Before describing the mechanics of these features, we must first
show how signatures are constructed, and summarize certain implementation
restrictions inherited from Eml; we do this in the next two subsections. Then,
Module expression typechecking: Γ; M̄ ⊢ Me : Sv

(Mod-Struct)
    if  ∀i ∈ 1..n. DN(Mbi) ∩ DN(Mb^(1..(i−1)); Mb^((i+1)..n)) = ∅,
        declRels(Mb^(1..n)) = φ, ρ,
        ∀i ∈ 1..n. Γ, (φ, ρ), Mb^(1..n) ⊢ Mbi : Sbi,
        Γ′ = Γ, ThisMod → (sig { Sb^(1..n) fresh φ where ρ }),
        and ∀Mbi ∈ Mb^(1..n). Γ′ ⊢ Mbi OK in Sb^(1..n),
    then Γ; M̄ ⊢ { Mb^(1..n) } : sig { Sb^(1..n) fresh φ where ρ }

(Mod-Funct)
    if  Γ; M̄ ⊢ Sv OK arg
        and (Γ, M → [ThisMod → M]Sv); (M̄, M) ⊢ Me : Sv′,
    then Γ; M̄ ⊢ ((M : Sv) -> Me) : ((M : Sv) -> Sv′)

(Mod-App)
    if  Γ(M1) = (M : Sv1) -> Sv1′, Γ(M2) = Sv2,
        and Γ ⊢ aliasOf(Sv2, M2) ≤ Sv1,
    then Γ; M̄ ⊢ M1(M2) : [M → M2]Sv1′
Building Signatures. Fig. 10 shows selected rules for generating the signatures
of module body declarations, and the extraction of initial relation information:
fresh declarations generate an element of φ; alias declarations generate equality
relations; and a subclass generates a direct subclassing (<1 ) relation.
Function signatures (S-Fun) are trivial; the auxiliary function unmark(τ#),
whose definition we omit, simply erases the hash mark from a marked type.
To generate a method signature (S-Method), we first compute a finite map
R from all visible class names C to representation types {L : τ } (informally, the
reps function iterates over all classes in Γ and Mb, and builds the mapping by
accumulating field lists). Then, we compute the type of the argument pattern.
Lastly, we sanity-check that the function to be extended exists. Note that this
last check uses the judgment for signature lookup or computation from Fig. 8;
this looks either in the global context Γ for the signature, or computes the
signature from Mb if it refers to a locally defined name.
Signatures for fresh class declarations are more involved. The premises of
S-Class and S-Abs-Class compute the class’s representation and abstract
functions. Representation computation involves looking up the superclass rep-
resentation (if a superclass is declared) and “copying it down” into the current
class’s signature. Abstract function computation involves looking up all functions
“owned” by this class and checking whether there is a default implementing case;
if no such default exists, then the function is abstract for this class, and must
appear in the class’s abstract on clause. We revisit owners in Section 3.2.
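These two computations can be sketched in Python (a simplification with invented names, not the rules of Fig. 10 themselves): the representation is accumulated up the superclass chain, and a function owned by a class is abstract for that class unless a default implementing case exists.

```python
# Sketch of the two computations in class-signature generation:
# "copying down" the superclass representation into the subclass, and
# collecting the functions that remain abstract because no default case
# covers the class. Toy model with invented names.

def representation(classes, name):
    """classes: name -> (superclass name or None, own_fields dict).
    Copies the superclass representation down into the class."""
    superclass, own = classes[name]
    rep = dict(own)
    if superclass is not None:
        rep.update(representation(classes, superclass))
    return rep

def abstract_funs(owned_funs, defaults, name):
    """A function owned by the class is abstract for it unless a
    default implementing case (fun, class) exists."""
    return {f for f in owned_funs[name] if (f, name) not in defaults}

classes = {"Expr": (None, {}), "Plus": ("Expr", {"l": "Expr", "r": "Expr"})}
owned = {"Expr": {"eval"}, "Plus": set()}

print(representation(classes, "Plus"))      # inherited rep copied down
print(abstract_funs(owned, set(), "Expr"))  # eval has no default case
```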
We omit the rules that generate signatures for alias declarations, as they are
verbose but straightforward. Informally, these lookup or compute the signature
of their right-hand side, and then substitute the alias declaration’s name for
the referred-to declaration’s name. For example, for alias class C1 = M.C2,
we would look up the signature of M.C2 in the environment, and C1’s signature
Γ, K, Mb̄ ⊢ Mb : Sb

(S-Fun)
    if  unmark(τ#) = τ,
    then Γ, K, Mb̄ ⊢ (fun f : τ# -> τ′) : (fun f : τ# -> τ′ open below τ)

(S-Class)
    if  [Γ, Mb̄ ⊢ rep(C) = {L : τ}^(1..k)]¹
        and Γ, K, Mb̄ ⊢ abstractFuns(c[, C]¹) = ∅,
    then Γ, K, Mb̄ ⊢ (class c (x : τ^(1..m)) [extends C (e)]¹ of {l : τ = e}^(1..n))
         : class c (τ^(1..m)) of {ThisMod.l : τ^(1..n) [, L : τ^(1..k)]¹}

(S-Abs-Class)
    if  [Γ, Mb̄ ⊢ rep(C) = {L : τ}^(1..k)]¹
        and Γ, K, Mb̄ ⊢ abstractFuns(c[, C]¹) = F̄,
    then Γ, K, Mb̄ ⊢ (abstract class c (x : τ^(1..m)) [extends C e]¹ of {l : τ = e}^(1..n))
         : class c (τ^(1..m)) of {ThisMod.l : τ^(1..n) [, L : τ^(1..k)]¹} abstract on F̄

declRels(Mb̄) = K

(Decl-Rels)
    if  ∀i ∈ 1..n. fresh(Mbi) = φi and ∀i ∈ 1..n. rel(Mbi) = ρi,
    then declRels(Mb1, . . . , Mbn) = (∪ᵢ φi, ∪ᵢ ρi)

Mb                                          fresh(Mb)    rel(Mb)
[abstract] class c( ) of { }                ThisMod.c    −
[abstract] class c( ) extends C( ) of { }   ThisMod.c    ThisMod.c <1 C
alias class c = C                           −            ThisMod.c <0 C
fun f :  ->                                 ThisMod.f    −
alias fun f = F                             −            ThisMod.f = F
extend fun F with q P -> e                  ThisMod.q    −
alias extend fun F with q = Q               −            ThisMod.q = Q
would have the same representation, constructor (if present), and abstract on
clause (if present), but with C1 substituted for C2.
(CRel-Trans-Count)
    if K ⊢ C1 <^i C2 and K ⊢ C2 <^j C3, then K ⊢ C1 <^(i+j) C3

(CRel-Lookup)
    if C1 RC C2 ∈ ρ, then (φ, ρ) ⊢ C1 RC C2
We do not show function and method relation deduction rules, but these are
straightforwardly parallel to a subset of the class relation rules. For example,
FRel-Lookup looks up a function relation F1 RF F2 in ρ, and FRel-Neq
deduces that all function names in φ refer to (pairwise) distinct functions.
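The counted transitive deduction (rule CRel-Trans-Count) can be sketched as a closure computation over direct-subclass facts; the following Python toy model uses invented names:

```python
# Sketch of counted transitive deduction over direct-subclass facts:
# from C1 <^i C2 and C2 <^j C3 deduce C1 <^(i+j) C3, in the spirit of
# CRel-Trans-Count. Toy model with invented class names.

def subclass_distances(direct):
    """direct: set of (sub, sup) pairs, each a <1 fact.
    Returns a dict (sub, sup) -> distance, closed under transitivity."""
    dist = {pair: 1 for pair in direct}
    changed = True
    while changed:
        changed = False
        for (a, b), i in list(dist.items()):
            for (c, d), j in list(dist.items()):
                if b == c and (a, d) not in dist:
                    dist[(a, d)] = i + j   # compose the two chains
                    changed = True
    return dist

facts = {("Neg", "Plus"), ("Plus", "Expr")}
print(subclass_distances(facts)[("Neg", "Expr")])  # distance 2 via Plus
```

Deducing such transitive relations is precisely what lets F(Eml) functors constrain a transitive (not just direct) subclass of a parameter's class, lifting the second Eml limitation listed in the introduction.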
3.3 Soundness
Previous work [28] established the soundness of Mini-Eml (the formal core of
Eml, analogous to Mini-F(Eml)) via the following standard theorems:
Theorem 1 (Mini-Eml Subject Reduction). Given: (1) ∀Bn ∈ dom(BT ).
BT (Bn) OK, (2) ⊢ E : T in the context of BT , and (3) E −→ E′ in the context
of BT , then ⊢ E′ : T ′ for some T ′ such that T ′ ≤ T .

Theorem 2 (Mini-Eml Progress). Given: (1) ∀Bn ∈ dom(BT ). BT (Bn) OK,
(2) ⊢ E : T in the context of BT , and (3) E is not a value, then ∃E′. E −→ E′.
Γ, K, Sb̄ ⊢ Sb ≤ Sb′

(SB-Close)
    Γ, K, Sb̄ ⊢ class c (τ̄) of {L : τ} [abstract on F̄]¹
        ≤ closed class c (τ̄) of {L : τ} [abstract on F̄]¹

(SB-Closed-Abs)
    if  [F̄′ ⊆ F̄]¹,
    then Γ, K, Sb̄ ⊢ closed class c (τ̄) of {L : τ} abstract on F̄
        ≤ closed class c [(τ̄)]¹ of {L : τ} [abstract on F̄′]¹

(SB-Seal)
    if  K ⊢ τ ≤ τ′,
    then Γ, K, Sb̄ ⊢ fun f : τ# -> τr open below τ
        ≤ fun f : τ# -> τr open below τ′

(Drop-Class)
    if  ThisMod.c ∉ freeNames(Sb̄),
    then Γ, K, Sb̄ ⊢ ([abstract] class c . . .) droppable

(Drop-Fun)
    if  ThisMod.f ∉ freeNames(Sb̄),
    then Γ, K, Sb̄ ⊢ (fun f :  ->  open below ) droppable
Here, the “block table” BT is a finite map from block names Bn to blocks (module
values), E is a Mini-Eml core expression, and T is a Mini-Eml type. BT (Bn) OK
denotes the Mini-Eml judgment that the block BT (Bn) is well-formed. ⊢ E : T
denotes that E has the Mini-Eml type T . E −→ E′ is the Mini-Eml small-
step evaluation relation. Now, we define a translation function ⟦·⟧ from Mini-
F(Eml) syntax into Mini-Eml: ⟦D; Δ; e⟧ denotes the translation of a compiled
Mini-F(Eml) program into a Mini-Eml program BT ; E, assuming the module
dependency relation D. We then require two extra properties:
Theorem 3 (Well-Formed Translation). If (1) ∅, ∅ ⊢ Md ⇒* Γ, D, (2) ∅ ⊢
Md ⇓* Δ, and (3) ⟦D; Δ⟧ = BT , then (G1) ∀Bn ∈ dom(BT ). BT (Bn) OK.

Theorem 4 (Type Preservation). If (1) ∅, ∅ ⊢ Md ⇒* Γ, D, (2) ∅ ⊢ Md ⇓*
Δ, (3) ⟦D; Δ; e⟧ = BT ; E, and (4) Γ, ∅, ∅ ⊢ e : τ , then (G1) ⊢ E : τ
in BT .
Provided the above properties hold, it follows that if a Mini-F(Eml) program
typechecks, then its Mini-Eml translation typechecks, and the translated pro-
gram does not go wrong. We are working toward completing the proofs, which
will appear in a companion technical report [22].
4 Related Work
As previously mentioned, the direct predecessor to F(Eml) is Eml [28]. A sibling
of Eml is MultiJava [10, 29], which explores many of the same issues and could
be extended with parameterized modules in closely analogous ways. Nice [3]
resembles Eml (though it is built on a different formalism) in providing multiple
dispatch and a form of modular typechecking, without parameterized modules.
A mixin [5, 17] is a class that inherits from a parameter to be provided later.
Bracha and Cook first proposed mixins [5] for a single-dispatch object-oriented
language. Statically typed mixin languages prior to our work generally have
not supported multiple dispatch, or permitted addition of dispatching functions
from outside the receiver class. Traits [36, 38, 32] are a mixin-like multiple in-
heritance mechanism wherein classes can inherit one ordinary superclass and
multiple traits, where traits may not define constructors or state. Traits lan-
guages would still gain additional flexibility if combined with functors: a class
defining constructors and state could (by functorization of the containing mod-
ule) be parameterized by a superclass that also defined constructors and state.
Many languages allow general multiple inheritance, which can support mixin-
like idioms. Multiple inheritance comes with a number of known problems, e.g.,
the “diamond inheritance” problem. Like traditional mixin languages, F(Eml)
sidesteps these problems (with some loss of expressiveness) by offering single
inheritance, plus the alternative composition mechanism of parameterization.
Virtual types (or virtual classes [25]) extend class-level inheritance with over-
ridable type members nested inside classes. Virtual types can statically type-
check many idioms like those supported by parameterized classes and mod-
ules [7, 40, 14]. In languages like gbeta [13], Scala [32], Jx [31], and CaesarJ [2],
virtual types also support family polymorphism [13], an idiom for writing code
that is generic over multiple instantiations of related groups of types. Virtual
and parametric types share deep connections, and we suspect that any given
language feature raises closely analogous issues in either style of system. For
example, if one added multiple dispatch to virtual type systems, then determin-
ing whether a type member could be safely overridden in a subclass might raise
issues like those that F(Eml) encounters in defining subsumption for classes in
functor argument signatures. Conversely, adding family polymorphism support
to F(Eml) might require dependent type mechanisms akin to those in virtual
type systems.
F(Eml)’s functors are inspired by ML’s parameterized module system [19].
Many extensions to ML parameterized modules have been proposed [23, 18, 11],
but none have incorporated extensible data types, extensible functions, and sym-
metric multiple dispatch. OML [34], OCaml [33], and Moby [15] combine ML-
style modules orthogonally with object-oriented classes, but these classes are tra-
ditional receiver-oriented constructs: dispatching methods can only be declared
with their receiver class, and cannot be externally added without modifying the
original declaration. ML≤ [4] generalizes ML datatypes with subtyping and sym-
metric dispatch, but does not support addition of new cases to existing functions
from outside of the extended declaration’s original module. Several proposals
extend ML with mixin modules [12, 20]; these systems do not currently support
subtyping among datatype cases, ruling out object-oriented idioms.
Jiazzi [26] (based on Units [16]) and JavaMod [1] extend Java with parameter-
ized modules that support many idioms, including mixins. These languages only
support single dispatch, so in this sense they are more restrictive than F(Eml);
however, conversely, they support recursive module linkage, which our work does
not (although we believe recursive linkage could be added to F(Eml)). Jiazzi
also supports the addition of dispatching functions externally to a class, through
an open class design pattern, though this requires more advance planning than
in F(Eml), where external functions can be added directly.
Classes in C++ templates [39] can inherit from a template parameter, but
templates do not support separate typechecking of template bodies. Parameter-
ized classes in GJ [6] support separate typechecking, but disallow inheritance
from the type parameter, ruling out idioms like mixins.
References
1. D. Ancona, E. Zucca. True Modules for Java-like Languages. 15th ECOOP, 2001.
2. I. Aracic, V. Gasiunas, M. Mezini, K. Ostermann. An Overview of CaesarJ. Trans.
on Aspect-Oriented Development I, LNCS 3880 pp. 135-173, Feb. 2006.
Ralph E. Johnson
Tao Xie
Abstract. A test case consists of two parts: a test input to exercise the program
under test and a test oracle to check the correctness of the test execution. A test
oracle is often in the form of executable assertions such as in the JUnit test-
ing framework. Manually generated test cases are valuable in exposing program
faults in the current program version or regression faults in future program ver-
sions. However, manually generated test cases are often insufficient for assuring
high software quality. We can then use an existing test-generation tool to generate
new test inputs to augment the existing test suite. However, without specifications
these automatically generated test inputs often do not have test oracles for expos-
ing faults. In this paper, we have developed an automatic approach and its sup-
porting tool, called Orstra, for augmenting an automatically generated unit-test
suite with regression oracle checking. The augmented test suite has an improved
capability of guarding against regression faults. In our new approach, Orstra first
executes the test suite and collects the class under test’s object states exercised
by the test suite. On collected object states, Orstra creates assertions for assert-
ing behavior of the object states. On executed observer methods (public methods
with non-void returns), Orstra also creates assertions for asserting their return
values. Later, when the class is changed, the augmented test suite is executed
to check whether assertion violations are reported. We have evaluated Orstra on
augmenting automatically generated tests for eleven subjects taken from a va-
riety of sources. The experimental results show that an automatically generated
test suite’s fault-detection capability can be effectively improved after being aug-
mented by Orstra.
1 Introduction
To expose faults in a program, developers create a test suite, which includes a set of test
cases to exercise the program. A test case consists of two parts: a test input to exercise
the program under test and a test oracle to check the correctness of the test execution. A
test oracle is often in the form of runtime assertions [2, 36] such as in the JUnit testing
framework [19]. In Extreme Programming [7] practice, writing unit tests has become
an important part of software development. Unit tests help expose not only faults in the
current program version but also regression faults introduced during program changes:
these written unit tests allow developers to change their code in a continuous and con-
trolled way. However, some special test inputs are often overlooked by developers and
typical manually created unit test suites are often insufficient for assuring high software
quality. Then developers can use one of the existing automatic test-generation tools
[31, 8, 42, 11, 12, 43, 44] to generate a large number of test inputs to complement the
manually created tests. However, without specifications, these automatically generated
test inputs do not have test oracles, which can be used to check whether test executions
are correct. In this paper, we have developed a new automatic approach that adds asser-
tions into an automatically generated test suite so that the augmented test suite has an
improved capability of guarding against regression faults.
Our approach focuses on object-oriented unit tests, such as the ones written in the JU-
nit testing framework [19]. An object-oriented unit test consists of sequences of method
invocations. Our approach proposes a framework for asserting the behavior of a method
invocation in an object-oriented unit-test suite. Behavior of an invocation depends on
the state of the receiver object and method arguments at the beginning of the invocation.
Behavior of an invocation can be asserted by checking at the end of the invocation the
return value of the invocation (when the invocation’s return is not void), the state of
the receiver object, and the states of argument objects (when the invocation can modify
the states of the argument objects). Automatic test-generation tools often do not create
assertions but rely on uncaught exceptions or program crashes to detect problems in a
program [11, 12].
To address insufficient test oracles of an automatically generated test suite, we have
developed an automatic tool, called Orstra, to augment the test suite for guarding against
regression faults. Orstra executes tests in the test suite and collects the class under test’s
object states exercised by the test suite; an object’s state is characterized by the values
of the object’s transitively reachable fields [43]. On collected object states, Orstra in-
vokes observers (public methods with non-void returns) of the class under test, collects
their actual return values, and creates assertions for checking the returns of observers
against their actual collected values. In addition, for each collected object state S, Orstra
determines whether there is another collected object state S′ that is equivalent to S
(state equivalence is defined by graph isomorphism [8, 43]); if so, Orstra reconstructs
S′ with method sequences and creates an assertion for checking the state equivalence
of S and S′.
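The collect-and-assert loop can be sketched in Python (a hypothetical stand-in for the JUnit-based workflow; Counter and augment are invented for illustration): run a method sequence, snapshot the receiver after each call, invoke observer methods, and record assertEquals-style oracles for their actual return values.

```python
# Minimal sketch of Orstra-style oracle augmentation: execute a method
# sequence, and after each call invoke the observer methods (public
# methods with non-void returns), recording an assertion line that pins
# down each observed return value. Illustrative stand-in, not Orstra.

class Counter:
    def __init__(self):
        self.n = 0
    def inc(self):          # a mutator
        self.n += 1
    def get(self):          # an observer: non-void return
        return self.n

def augment(obj, sequence, observers):
    """sequence: mutator names to run; observers: observer names.
    Returns generated assertion lines for the current behavior."""
    oracles = []
    for method in sequence:
        getattr(obj, method)()
        for obs in observers:
            value = getattr(obj, obs)()   # collect the actual return
            oracles.append("assertEquals(%r, obj.%s())" % (value, obs))
    return oracles

lines = augment(Counter(), ["inc", "inc"], ["get"])
print(lines)  # one assertion per observer per exercised state
```

Rerunning the augmented suite after a code change then fails exactly where observed behavior diverges from the recorded version, which is the regression-guarding effect the abstract describes.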
This paper makes the following main contributions:
The rest of this paper is organized as follows. Section 2 presents an illustrating exam-
ple. Section 3 presents our framework for asserting behavior of a method invocation in
a test suite. Section 4 presents our Orstra tool for automatically augmenting a test suite.
382 T. Xie
2 Example
We next illustrate how Orstra augments an automatically generated test suite’s regres-
sion oracle checking. As an illustrating example, we use a Java implementation of a
bounded stack that stores unique elements. Stotts et al. [40] used this Java implemen-
tation to experiment with their algebraic-specification-based approach for systemati-
cally creating unit tests. In the abbreviated implementation shown in Figure 1, the class
MyInput is the comparable type of elements stored in the stack. In the class implemen-
tation of the bounded stack, the array elems contains the elements of the stack, and
numberOfElements is the number of the elements and the index of the first free loca-
tion in the stack. The max is the capacity of the stack. The public methods in the class
interface include two standard stack operations: push and pop, as well as five observer
methods, whose returns are not void.
Given a Java class, existing automatic test-generation tools [31, 11, 12, 43, 44] can
generate a test suite automatically for the class. For example, Jtest [31] allows users
to set the length of calling sequences between one and three, and then generates
random calling sequences whose lengths are not greater than the user-specified one.
JCrasher [11] automatically constructs method sequences to generate non-primitive ar-
guments and uses default data values for primitive arguments. JCrasher generates tests
as calling sequences with the length of one.
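Such bounded random sequence generation can be sketched as follows (a hypothetical toy model, not Jtest's or JCrasher's actual algorithm; gen_sequences is an invented name):

```python
# Sketch of bounded-length random call-sequence generation in the style
# the text attributes to Jtest: sequences of length 1..k drawn from the
# class's methods. Hypothetical stand-in for either tool's algorithm.

import random

def gen_sequences(methods, max_len, count, seed=0):
    rng = random.Random(seed)          # seeded for reproducibility
    tests = []
    for _ in range(count):
        length = rng.randint(1, max_len)
        tests.append([rng.choice(methods) for _ in range(length)])
    return tests

tests = gen_sequences(["push", "pop", "top"], max_len=3, count=2)
for t in tests:
    assert 1 <= len(t) <= 3 and all(m in ("push", "pop", "top") for m in t)
print(len(tests))
```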
For example, given the UBStack class, existing automatic test-generation tools [31,
11,12,43,44] can generate test suites such as the example test suite UBStackTest with
two tests (exported in the JUnit testing framework [19]) shown in Figure 2. Each test
has several method sequences on the objects of the class. For example, test1 creates a
stack s1 and invokes push, top, pop, and isMember on it in a row.
Note that there are no assertions generated in the UBStackTest test suite. Therefore,
when the test suite is run, tools such as JCrasher [11] and CnC [12] detect problems by
observing whether uncaught exceptions are thrown; tools such as Korat [8] detect prob-
lems by observing whether the execution of the test suite violates design-by-contract
annotations [28,23,9] (equipped with the program under test), which are translated into
run-time assertions [2, 36].
Given a test suite such as UBStackTest, Orstra systematically augments the test
suite to produce an augmented test suite such as UBStackAugTest shown in Figure 3.
For illustration, we annotate UBStackAugTest with line numbers and mark in bold
font those lines of statements that correspond to the statements in UBStackTest. The
augmented test suite UBStackAugTest is equipped with comprehensive assertions,
which reflect the behavior of the current program version under test. These new asser-
tions can guard against regression faults introduced in future program versions.
We next illustrate how Orstra automatically creates assertions for UBStackTest
to produce UBStackAugTest. By running UBStackTest, Orstra dynamically mon-
itors the method sequences executed by UBStackTest and collects the exer-
cised state of a UBStack-receiver object by collecting the values of the re-
Augmenting Automatically Generated Unit-Test Suites 383
ceiver object’s transitively reachable fields. Based on the collected method in-
vocations, Orstra identifies UBStack’s observer methods that are invoked by
UBStackTest: top(), isMember(new MyInput(3)), isEmpty(), isFull(), and
getNumberOfElements().
Then on each UBStack-receiver-object state exercised by UBStackTest, Orstra in-
vokes the collected observer methods. For example, after the constructor invocation
(shown in Line 2 of Figure 3), Orstra invokes the five observer methods on the UBStack
object s1. After invoking these observer methods, Orstra collects their return values and
then makes an assertion for each observer method by adding a JUnit assertion method
(assertEquals), whose first argument is the observer method’s return and second ar-
gument is the collected return value. The five inserted assertions are shown in Lines
4-9. Similarly, Orstra inserts assertions after the push invocation (shown in Line 12)
for asserting the state of the receiver s1. Because in test1 of UBStackTest, there is
an observer method top invoked immediately after the push invocation, in the inserted
assertions for s1 after the push invocation, Orstra does not include another duplicate
top observer invocation. Then Orstra still adds an assertion for the original top invoca-
tion (shown in Line 19). When Orstra collects the return value of top, it determines that
the value is not of a primitive type but of the MyInput type. It then invokes its own run-
time helper method (Runtime.genStateStr) to collect the state-representation string
of the MyInput-type return value. The string consists of the values of all transitively
reachable fields of the MyInput-type object, represented as “o:3;”, where o is the field
name and 3 is the field value.
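A possible reimplementation of such a state-representation helper, assuming a Python stand-in for Runtime.genStateStr (this sketch ignores cyclic object graphs):

```python
# Sketch of a genStateStr-style helper: linearize an object's
# transitively reachable fields into a "name:value;" string, as in the
# "o:3;" example above. Hypothetical reimplementation, not Orstra's code.

def gen_state_str(obj):
    if not hasattr(obj, "__dict__"):      # primitive-like value
        return str(obj)
    parts = []
    for name in sorted(vars(obj)):        # deterministic field order
        parts.append("%s:%s;" % (name, gen_state_str(vars(obj)[name])))
    return "".join(parts)

class MyInput:
    def __init__(self, o):
        self.o = o

print(gen_state_str(MyInput(3)))  # field name "o", field value 3
```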
After the top invocation (shown in Line 19), Orstra inserts no new assertion for as-
serting the state of s1 immediately after the top invocation, because Orstra dynamically
determines top to be a state-preserving or side-effect-free method: all its invocations in
the test suite do not modify the state of the receiver object.
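This dynamic check can be sketched as follows (illustrative Python; snapshot and is_state_preserving are invented names): snapshot the receiver's state before and after every observed invocation, and judge the method state-preserving only if no invocation changed the snapshot.

```python
# Sketch of dynamically judging a method state-preserving: compare a
# state snapshot of the receiver before and after each observed
# invocation. Illustrative model using a simple fields snapshot.

def snapshot(obj):
    return repr(sorted(vars(obj).items()))

def is_state_preserving(obj, method, trials):
    for args in trials:
        before = snapshot(obj)
        getattr(obj, method)(*args)
        if snapshot(obj) != before:       # this invocation mutated state
            return False
    return True

class Stack:
    def __init__(self):
        self.items = []
    def push(self, x):
        self.items.append(x)
    def top(self):
        return self.items[-1] if self.items else None

s = Stack()
s.push(1)
print(is_state_preserving(s, "top", [()]))     # top never mutates
print(is_state_preserving(s, "push", [(2,)]))  # push mutates
```

Note the judgment is only as good as the invocations observed: a method that mutates state on inputs the suite never exercises would still be classified as state-preserving, which matches the dynamic, suite-relative definition in the text.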
After the pop invocation (shown in Line 21), Orstra detects that s1’s state is equiva-
lent to another collected object state that is produced by a shorter method sequence: an
object state produced after the constructor invocation; Orstra determines state equiva-
lence of two objects by comparing their state-representation strings. Therefore, instead
of invoking observer methods on s1, Orstra constructs an assertion for asserting that the
state of s1 is equivalent to the state of temp s1, which is produced after the construc-
tor is invoked. Orstra creates the assertion by using an equals-assertion-builder method
(EqualsBuilder.reflectionEquals) from the Apache Jakarta Commons subpro-
ject [4]. This method uses Java reflection mechanisms [5] to determine if two objects
are equal based on field-by-field comparison. If an equals method is defined as a public
method of the class under test, Orstra can alternatively use the equals method
for building the assertion.
After the isMember invocation (shown in Line 26), Orstra inserts no new assertion
for asserting the state of s1 immediately after the isMember invocation, because Orstra
dynamically determines isMember to be a state-preserving method.
When augmenting test2, Orstra does not insert assertions for the state of s2 imme-
diately after the constructor invocation, because the object state that is produced by the
same method sequence has been asserted in testAug1. In testAug2, Orstra adds as-
sertions only for those observer-method invocations that are originally in test2 (shown
in Lines 34-36).
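The kinds of assertions this walkthrough describes can be reconstructed in a self-contained sketch. The stub UBStack and MyInput classes, the stateEquals helper (standing in for EqualsBuilder.reflectionEquals), and the exact assertion shape are all assumptions for illustration, not the paper's exported code.

```java
import java.util.ArrayList;
import java.util.List;

// Stub with a single int field; its state string would be "o:<value>;".
class MyInput {
    int o;
    MyInput(int o) { this.o = o; }
}

class UBStack {
    private final List<MyInput> elems = new ArrayList<>();
    void push(MyInput m) { elems.add(m); }
    void pop() { elems.remove(elems.size() - 1); }
    MyInput top() { return elems.get(elems.size() - 1); }
    // Field-by-field state comparison, standing in for
    // EqualsBuilder.reflectionEquals in this self-contained sketch.
    boolean stateEquals(UBStack other) {
        if (elems.size() != other.elems.size()) return false;
        for (int i = 0; i < elems.size(); i++)
            if (elems.get(i).o != other.elems.get(i).o) return false;
        return true;
    }
}

public class TestAugSketch {
    public static void main(String[] args) {
        UBStack s1 = new UBStack();
        s1.push(new MyInput(3));
        // Assertion on the observer's return value: the MyInput-type
        // return's state is "o:3;", i.e., field o holds 3.
        if (s1.top().o != 3) throw new AssertionError("top");
        s1.pop();
        // State-equivalence assertion: after pop, s1's state matches the
        // state produced by the shorter sequence (the constructor alone).
        UBStack temp_s1 = new UBStack();
        if (!s1.stateEquals(temp_s1)) throw new AssertionError("state");
        System.out.println("augmented assertions hold");
    }
}
```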
3 Framework
This section formalizes some notions introduced informally in the previous section. We
first describe approaches for representing states of non-primitive-type objects and then
compare these approaches. We finally describe how these state representations can be
used to build assertions for the receiver object and return value of a method invocation.
Each object or value is represented with an expression. Arguments for a method in-
vocation are represented as sequences of zero or more expressions (separated by com-
mas); the receiver of a non-static, non-constructor method invocation is treated as the
first method argument. A static method invocation or constructor invocation does not
have a receiver. The .state and .retval expressions denote the state of the receiver
after the invocation and the return of the invocation, respectively. For brevity, the gram-
mar shown above does not specify types for the expressions. A method is represented
uniquely by its defining class, name, and the entire signature. (For brevity, we do not
show a method’s defining class or signature in the state-representation examples of this
paper.) For example, in test1, the state of the object s1 after the push invocation is
represented by
push(UBStack<init>().state, MyInput<init>(3).state).state.
Orstra dynamically monitors the invocations of the methods on the actual ob-
jects created at runtime and collects the actual argument values for these invocations.
For example, it represents the states of both s3 and s4 at the end of test3 and
test4 as push(push(UBStack<init>().state, MyInput<init>(0).state).state,
MyInput<init>(1).state).state.
The above-shown grammar does not capture a method execution’s side effect on an
argument: a method can modify the state of a non-primitive-type argument and this ar-
gument can be used for another later method invocation. Following Henkel and Diwan’s
suggested extension [22], we can enhance the first grammar rule to address this issue:
exp ::= prim | invoc “.state” | invoc “.retval” | invoc “.arg_i”
where the added expression (invoc “.arg_i”) denotes the state of the modified i-th
argument after the method invocation.
If test code directly modifies some public fields of an object without invoking any
of its methods, these side effects on the object are not captured
in the method-sequence representation. To address this issue, Orstra can be extended to
create a public field-writing method for each public field of the object, and then monitor
object-field accesses in the test code. If Orstra detects at runtime the execution of the
object’s field-write instruction in the test code, it can insert a corresponding field-writing
method invocation in the method-sequence representation.
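The field-writing-method extension can be sketched as follows; the Counter class and the set_count method name are assumptions for illustration. A direct field write is invisible to the method-sequence representation, whereas the generated stand-in method is recordable as an ordinary invocation.

```java
// Sketch of the proposed extension: for each public field, a synthetic
// field-writing method lets a direct field write appear as an ordinary
// method invocation in the method-sequence representation.
class Counter {
    public int count;                                 // written directly by test code
    public void set_count(int v) { this.count = v; }  // generated stand-in
}

public class FieldWriteSketch {
    public static void main(String[] args) {
        Counter c1 = new Counter();
        c1.count = 5;          // direct write: invisible to method sequences
        Counter c2 = new Counter();
        c2.set_count(5);       // recordable as "set_count(Counter<init>(), 5)"
        System.out.println(c1.count == c2.count);
    }
}
```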
Definition 2. Two heaps ⟨O1, E1⟩ and ⟨O2, E2⟩ are isomorphic iff there is a bijection
ρ : O1 → O2 such that:
The definition allows only object identities to vary: two isomorphic heaps have the same
fields for all objects and the same values for all primitive fields.
The state of an object is represented with a rooted heap, instead of the whole program
heap.
Definition 3. A rooted heap is a pair ⟨r, h⟩ of a root object r and a heap h all of
whose nodes are reachable from r.
Orstra linearizes rooted heaps into strings such that checking heap isomorphism corre-
sponds to checking string equality. Figure 4 shows the pseudo-code of the linearization
algorithm. The linearization algorithm traverses the entire rooted heap in the depth-first
order, starting from the root. When the algorithm visits a node for the first time, it as-
signs a unique identifier to the node, and keeps this mapping in ids so that already
assigned identifiers can be reused by nodes that appear in cycles. We can show that
the linearization normalizes rooted heaps into strings. The states of two objects are
equivalent if the strings resulting from linearization are the same.
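The linearization idea can be sketched as a depth-first traversal over reachable fields. This is a simplified sketch, not the paper's Figure 4 pseudo-code; the class and method names are assumptions. Each newly visited object gets a fresh id so that nodes on cycles reuse their ids, and equal strings then indicate isomorphic rooted heaps.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch: depth-first linearization of a rooted heap into a string,
// assigning visit-order ids so that cycles are handled by id reuse.
public class LinearizeSketch {
    static String linearize(Object root) throws IllegalAccessException {
        StringBuilder sb = new StringBuilder();
        visit(root, new IdentityHashMap<>(), sb);
        return sb.toString();
    }

    static void visit(Object o, Map<Object, Integer> ids, StringBuilder sb)
            throws IllegalAccessException {
        if (o == null) { sb.append("null;"); return; }
        Integer seen = ids.get(o);
        if (seen != null) { sb.append("@").append(seen).append(";"); return; }
        ids.put(o, ids.size());            // first visit: assign a fresh id
        for (Field f : o.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            f.setAccessible(true);
            sb.append(f.getName()).append(":");
            Object v = f.get(o);
            if (f.getType().isPrimitive()) sb.append(v).append(";");
            else visit(v, ids, sb);        // recurse into reference fields
        }
    }

    static class Node { int val; Node next; Node(int v) { val = v; } }

    public static void main(String[] args) throws Exception {
        Node a = new Node(3);  a.next = a;   // cycle back to the root
        Node b = new Node(3);  b.next = b;
        // Isomorphic rooted heaps linearize to the same string:
        System.out.println(linearize(a).equals(linearize(b)));
        System.out.println(linearize(a));
    }
}
```

Field traversal order here follows getDeclaredFields, which is consistent within one run; a production implementation would sort fields by name to be deterministic across JVMs.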
Definition 4. An observer of a class c is a method ob in c’s interface such that the return
type of ob is not void.
Note that when m’s return is void, r is void; when m is a static method, S_entry and
S_exit are empty; when m is a constructor method, S_entry is empty.
When a method execution e is an execution of a public method of the class under test C
and none of e’s direct or indirect callers is a method of C, we say that e is invoked on
the interface of C. For each such method execution e invoked on the interface of C, if
S_exit is not empty, S_exit can be asserted in the following ways:
– If another method sequence can be found to produce an object state S that is
expected to be equivalent to S_exit, an assertion is created to compare the state
representations of S and S_exit.
– If an observer method ob is defined by the class under test, an assertion is created to
compare the return of an ob invocation on S_exit with the expected value (the ways
of comparing return values are described below).
In the state-capturing phase, Orstra runs a given test suite T (in the form of a JUnit test
class [19]) for the class under test C and dynamically rewrites the bytecodes of each
class at class loading time (based on the Byte Code Engineering Library (BCEL) [13]).
Orstra rewrites the T class bytecodes to collect receiver object references, method
names, method signatures, arguments, and returns at call sites of those method se-
quences that lead to C-object states or argument-object states for C’s methods. Then
Orstra can use the collected method call information to reconstruct the method se-
quence that leads to a particular C-object state or argument-object state. The recon-
structed method sequence can be used in constructing assertions for C-object states in
the assertion-building phase.
Orstra also rewrites the C class bytecodes in order to collect a C-object’s concrete-
state representations at the entry and exit of each method call invoked through the C-
object’s interface. Orstra uses Java reflection mechanisms [5] to recursively collect all
the fields that are reachable from a C-object and uses the linearization algorithm (shown
in Figure 4) to produce the object’s state-representation string.
Additionally, Orstra collects the set OM of observer-method invocations exercised
by T. These observer-method invocations are used to inspect and assert behavior of a
C-object state in the assertion-building phase.
In the assertion-building phase, Orstra iterates through each C-object state o exercised
by the initial test suite T. If o is equivalent to a nonempty set O of some other object
states exercised by T, Orstra picks the object state o’ in O that is produced by the shortest
method sequence m’. Then Orstra creates an assertion for asserting state equivalence
by using the techniques described in Section 3.2.
In particular, if an equals method is defined in C’s interface, Orstra creates the
following JUnit assertion (using the assertTrue method [19]) to check state equivalence
after invoking the method sequence m’ to produce o’:
C o’ = m’;
assertTrue(o.equals(o’));
Note that m’ needs to be replaced with the actual method sequence in the exported
assertion code.
If no equals method is defined in C’s interface, Orstra creates an assertion by using
an equals-assertion-builder method (EqualsBuilder.reflectionEquals), which is
from the Apache Jakarta Commons subproject [4]. This method uses Java reflection
mechanisms [5] to determine if two objects are equal by comparing their transitively
reachable fields. We can show that if two objects o and o’ have the same state-representation
strings, the return value of EqualsBuilder.reflectionEquals(o, o’) is
true. Orstra creates the following assertion to check state equivalence after invoking
the method sequence m’ to produce o’:
C o’ = m’;
assertTrue(EqualsBuilder.reflectionEquals(o, o’));
where t_str is the return value of the toString method invocation. If no toString
method is defined in R’s interface, Orstra creates the following assertion:
assertEquals(Runtime.genStateStr(o.om), s_str);
5 Experiment
This section presents our experiment conducted to address the following research ques-
tion:
– RQ: Can our Orstra test-oracle-augmentation tool improve the fault-detection ca-
pability (which approximates the regression-fault-detection capability) of an auto-
matically generated test suite?
Table 1 lists eleven Java classes that we use in the experiment. These classes were
previously used in evaluating our previous work [43] on detecting redundant tests.
UBStack is the illustrative example taken from the experimental subjects used by Stotts
et al. [40]. IntStack was used by Henkel and Diwan [22] in illustrating their approach
of discovering algebraic specifications. ShoppingCart is an example for JUnit [10].
BankAccount is an example distributed with Jtest [31]. The remaining seven classes
are data structures previously used to evaluate Korat [8]. The first four columns show
the class name, the number of methods, the number of public methods, and the number
of non-comment, non-blank lines of code for each subject.
To address the research question, our experiment requires automatically generated
test suites for these subjects so that Orstra can augment these test suites. We use
two third-party test-generation tools, Jtest [31] and JCrasher [11], to automatically
generate test inputs for these eleven Java classes. Jtest allows users to set the length of
calling sequences between one and three; we set it to three, and Jtest first generates all
calling sequences of length one, then those of length two, and finally those of length
three. JCrasher automatically constructs method sequences to generate non-primitive
arguments and uses default data values for primitive arguments. JCrasher generates
tests as calling sequences of length one. The fifth and sixth columns of Table 1
show the number of tests generated by Jtest and JCrasher.
Although our ultimate research question is to investigate how much better an aug-
mented test suite guards against regression faults, we cannot collect sufficient real re-
gression faults for the experimental subjects. Instead, in the experiment, we use general
fault-detection capability of a test suite to approximate regression-fault-detection ca-
pability. In particular, we measure the fault-detection capability of a test suite before
and after Orstra’s augmentation. Then our experiment requires faults for these eleven
Java classes. These Java classes were not equipped with such faults; therefore, we used
Ferastrau [24], a Java mutation testing tool, to seed faults in these classes. Ferastrau
modifies a single line of code in an original version in order to produce a faulty version.
We configured Ferastrau to produce around 300 faulty versions for each class. For three
relatively small classes, Ferastrau generates far fewer than 300 faulty versions.
The last column of Table 1 shows the number of faulty versions generated by
Ferastrau.
5.2 Measures
Table 2 shows the experimental results. The results for JCrasher-generated test suites
are shown in Columns 2-4 and the results for Jtest-generated test suites are shown in
Columns 5-7. Columns 2 and 5 show the fault-exposure ratios of the original test suites
(before test-oracle augmentation). Columns 3 and 6 show the fault-exposure ratios of
the test suites augmented by Orstra. Columns 4 and 7 show the improvement factors
of the augmented test suites over the original test suites. The last two rows show the
average and median data for Columns 2-7.
Because they contain no assertions, JCrasher-generated tests expose a fault only if an
uncaught exception is thrown during test execution. We observed that JCrasher-generated
tests have 0% fault-exposure ratios for two classes (ShoppingCart and
BankAccount), because no seeded faults for these two classes cause uncaught
exceptions. Jtest equips its generated tests with some assertions: these assertions typically
assert those method invocations whose return values are of primitive types. (Section 7
discusses main differences between Orstra and Jtest’s assertion creation.) Generally,
Jtest-generated test suites have higher fault-exposure ratios than JCrasher-generated test
suites. The phenomenon is due to two factors: Jtest generates more test inputs (with
longer method sequences) than JCrasher, and Jtest has stronger oracle checking (with
additional assertions) than JCrasher.
After Orstra augments the JCrasher-generated test suites with additional assertions,
we observed that the augmented test suites achieve substantial improvements of fault-
exposure ratios. After augmenting the JCrasher-generated test suite for TreeMap, Orstra
achieves an improvement factor of more than 50. The augmented Jtest-generated test
suites also gain improvements in fault-exposure ratios (although not as substantial as for
the JCrasher-generated test suites), except for the first four classes. These four classes are
relatively simple and seeded faults for these classes can be exposed with a less com-
prehensive set of assertions; Jtest-generated assertions are already sufficient to expose
those exposable seeded faults.
6 Discussion
In general, the number of assertions generated for an initial test suite can be approximately
characterized as
|assertions| = O(|nonEqvStates| × |observers| + |statesEqvToAnother|)
where |nonEqvStates| × |observers| is the number of nonequivalent object states
exercised by the initial test suite multiplied by the number of observer calls exercised
by the initial test suite; recall that Orstra generates an assertion for the return
of an observer invoked on a nonequivalent object state. |statesEqvToAnother| is the
number of object states (produced by nonequivalent method executions in the initial test
suite) that can be found to be equivalent to another object state produced by a different
method sequence; recall that Orstra generates an assertion for asserting that an object
state produced by a method sequence is equivalent to another object state produced by
a different method sequence, if one exists.
Using Orstra in regression testing activities incurs two types of extra cost. The first
type is the cost of augmenting the initial test suite. In our experiment, the elapsed real
time of running our test augmentation is reasonable, being up to several seconds,
determined primarily by the class complexity, the number of tests in the test suite, and the
number of generated assertions. Note that Orstra needs to be run only once, when the initial
test suite is first augmented, and again only when reported assertion
violations are determined not to be caused by regression faults. In future work, follow-
ing the idea of repairing GUI regression tests [27], we plan to improve Orstra so that it
can fix those violated assertions in the augmented test suite without re-augmenting the
whole initial test suite.
The second type of cost is the cost of running the additional assertion checks in the
augmented test suite, determined primarily by the number of generated assertions.
Although this cost is incurred every time the augmented test suite is run (after the
program is changed), running the initial unit-test suite is often fast, and the additional
assertion checks slow down the execution of the test suite by at most a few
factors. Indeed, if an initial test suite exercises many non-equivalent object states and
the program under test has many observer methods, the cost of both augmenting the
test suite and running the augmented test suite could be high. Under these situations,
developers can configure Orstra to trade weaker oracle checking for efficiency by in-
voking a subset of observer methods during assertion generation. In addition, regres-
sion test prioritization [15] or test selection [20] for Java programs can be used to order
or select tests in the Orstra-augmented test suite for execution when the execution time
is too long.
Orstra observes behavior of the program under test when it is exercised by a test suite
and then automatically adds assertions to the test suite to assert that the program behavior is
preserved after future program changes. Sometimes, however, violations of inserted
assertions do not indicate real regression faults. For example, consider that the
program under test contains a fault that is not exposed by the initial test suite. Orstra
runs the test suite on the current (faulty) version and creates assertions, some of which
assert wrong behavior. Later, developers find the fault and fix the program. When
running the Orstra-augmented test suite on the new program version, assertion violations
are reported but there are no regression faults. In addition, although Orstra has been
carefully designed to assert as few implementation details in object-state representation
as possible, some program changes may violate inserted assertions but still preserve
program behavior that developers care about. To help developers determine whether
an assertion violation in an augmented test suite indicates real regression faults, we
can use change impact analysis tools such as Chianti [33] to identify a set of affecting
changes that were responsible for the assertion violation.
Some types of programs (such as multi-threaded programs or programs whose be-
haviors are related to time) may exhibit nondeterministic or different behaviors across
multiple runs: running the same test suite twice may produce different observer returns
or receiver-object states. For example, a getTime method returns the current time and a
getRandomNumber method returns a random number. After we add assertions for these
types of method returns in a test suite, running the augmented test suite on the current
or new program version can report assertion violations, which do not indicate real faults
or regression faults. To address this issue, we can run a test suite multiple times on the
current program version and remove those assertions that are not consistently satisfied
across multiple runs.
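The filtering step for nondeterministic behavior can be sketched as follows. This is an assumption-laden illustration: the invocation-to-value maps stand in for Orstra's recorded observer returns, and the call labels are hypothetical. An assertion is kept only when both runs observed the same value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: run the suite twice, record each asserted value per invocation,
// and keep an assertion only when the two runs agree on the value.
public class NondeterminismFilterSketch {
    public static void main(String[] args) {
        Map<String, String> run1 = new LinkedHashMap<>();
        run1.put("stack.top()", "3");
        run1.put("clock.getTime()", "10:00");   // time-dependent return
        Map<String, String> run2 = new LinkedHashMap<>();
        run2.put("stack.top()", "3");
        run2.put("clock.getTime()", "10:05");   // differs across runs

        List<String> keptAssertions = new ArrayList<>();
        for (Map.Entry<String, String> e : run1.entrySet())
            if (e.getValue().equals(run2.get(e.getKey())))
                keptAssertions.add(e.getKey()); // consistent: keep assertion

        System.out.println(keptAssertions);
    }
}
```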
7 Related Work
Richardson [34] developed the TAOS (Testing with Analysis and Oracle Support)
toolkit, which provides different levels of test oracle support. For example, in lower
levels, developers can write down expected outputs for a test input, specify ranges for
variable values, or manually inspect actual outputs. The oracle support provided by our
Orstra tool is in TAOS’ lower levels: generating expected outputs for test inputs. In
higher levels, developers can use specification languages (such as Graphical Interval
Logic Language and Real-Time Interval Logic Language) to specify temporal properties.
There exist a number of proposed approaches for providing oracle support based
on different types of specifications [35, 32, 14, 26, 9]. In particular, for testing Java
programs, Cheon and Leavens [9] developed a runtime verification tool for the Java Modeling
Language (JML) [23] and then provided oracle support for automatically generated
tests. This oracle-checking approach was also adopted by automatic specification-based
test-generation tools such as Korat [8]. Unlike these specification-based approaches,
Orstra does not require specifications, but it can enhance oracle checking
only for exposing regression faults.
When specifications do not exist, automatic test-generation tools such as
JCrasher [11] and CnC [12] use program crashes or uncaught exceptions as symptoms
of the current program version’s faulty behavior. Like Orstra, Jtest [31] can also cre-
ate some assertions for its generated tests. Orstra differs from Jtest in several ways.
Jtest creates assertions for its own generated tests only, whereas Orstra can augment
any third-party test suite. Jtest creates assertions for method invocations whose return
values are of primitive types, whereas Orstra creates more types of assertions, such
as asserting returns with non-primitive types and asserting behavior of receiver-object
states. Unlike Orstra, Jtest does not systematically or exhaustively create assertions to
assert exercised program behavior. Our experimental results (shown in Section 5.3) in-
dicate that Orstra can still effectively augment a Jtest-generated test suite, which has
been equipped with Jtest-generated assertions.
Saff and Ernst [38] as well as Orso and Kennedy [29] developed techniques for
capturing and replaying interactions between a selected subsystem (such as a class) and
the rest of the application. Their techniques focus on creating fast, focused unit tests
from slow system-wide tests, whereas our Orstra tool focuses on adding more assertions
to an existing unit-test suite. In addition, Orstra’s techniques go beyond capturing and
replaying, because Orstra creates new helper-method invocations for assertion checking
and these new method invocations might not be exercised in the original test suite.
Memon et al. [25] model a GUI state in terms of the widgets that the GUI contains,
their properties, and the values of the properties. Their experimental results show that
comparing more-detailed GUI states (e.g., GUI states associated with all or visible win-
dows) from two versions can detect faults more effectively than comparing less-detailed
GUI states (e.g., GUI states associated with the active window or widget). Our exper-
iment shows a similar result: checking more-detailed behavior (with augmented test
suites) can more effectively expose regression faults.
Both Harrold et al.’s spectra-comparison approach [21] and our previous value-spectra
comparison approach [47] also focus on exposing regression faults. Program spectra
usually capture internal program execution information and these approaches compare
program spectra from two program versions in order to expose regression faults. Our
new Orstra tool compares interface-visible behavior of two versions without comparing
internal execution information. On one hand, Orstra may not report behavioral differ-
ences that are reported by spectra comparison approaches, if these internal behavioral
differences cannot cause behavioral differences in the interface. On the other hand,
Orstra may report behavioral differences that are not reported by spectra comparison
approaches, if these behavioral differences are exhibited only by new Orstra-invoked
observers (spectra comparison approaches do not create any new method invocation).
When there are no oracles for a large number of automatically generated tests, devel-
opers cannot afford to inspect the results of such a large number of tests. Our previous
operational violation approach [45] selects a small subset of automatically generated
tests for inspection; these selected tests violate the operational abstractions [17] in-
ferred from the existing test suite. Pacheco and Ernst [30] extended the approach by
additionally using heuristics to filter out illegal test inputs. Agitar Agitator [1]
automatically generates initial tests, infers operational-abstraction-like observations, lets
developers confirm these observations as assertions, and generates more tests to violate these
inferred and confirmed observations. The operational violation approach primarily in-
tends to expose faulty behavior exhibited by newly generated tests on the current program
version, whereas Orstra intends to enhance the oracle checking of an existing test suite
so that it has an improved capability of exposing faulty behavior exhibited by the same
test suite on future program versions.
Orstra has been implemented based on our two previous approaches. Our previous
Rostra approach [43] provides state representation and comparison techniques, but
Rostra compares states in order to detect redundant tests among automatically generated tests.
Our previous Obstra approach [46] also invokes observers on object states exercised by
an existing test suite. Obstra uses the return values of observers to abstract concrete states
and constructs abstract-object-state machines for inspection. Obstra allows developers to
inspect the behavior of the current program version, whereas Orstra uses the return val-
ues of observers as well as receiver object states to assert that behavior of future program
versions is the same as behavior of the current program version. In contrast to Rostra
and Obstra, Orstra makes new contributions in developing an approach for enhancing
the regression oracle checking of an automatically generated test suite.
8 Conclusion
An automatic test-generation tool can be used to generate a large number of test inputs for
the class under test, complementing manually generated tests. However, without
specifications, these automatically generated test inputs do not have test oracles to guard against
faults in the current program version or regression faults in future program versions. We
have developed a new automated approach for augmenting an automatically generated
test suite in guarding against regression faults. In particular, we have proposed a frame-
work for asserting behavior of a method invocation in an object-oriented unit-test suite.
Based on the framework, we have developed an automatic test-oracle-augmentation tool,
called Orstra, that systematically adds assertions into an automatically generated test
suite in order to improve its capability of guarding against regression faults. We have
Acknowledgments
We would like to thank Alex Orso and Andreas Zeller for discussions that lead to the
work described in this paper. We thank Darko Marinov for providing the Ferastrau
mutation testing tool and Korat subjects used in the experiment.
References
1. Agitar Agitator 2.0, November 2004. http://www.agitar.com/.
2. D. M. Andrews. Using executable assertions for testing and fault tolerance. In Proc. the 9th
International Symposium on Fault-Tolerant Computing, pages 102–105, 1979.
3. J. H. Andrews, L. C. Briand, and Y. Labiche. Is mutation an appropriate tool for testing
experiments? In Proc. 27th International Conference on Software Engineering, pages 402–
411, 2005.
4. The Jakarta Commons Subproject, 2005.
http://jakarta.apache.org/commons/lang/apidocs/org/apache/
commons/lang/builder/EqualsBuilder.html.
5. K. Arnold, J. Gosling, and D. Holmes. The Java Programming Language. Addison-Wesley
Longman Publishing Co., Inc., 2000.
6. R. S. Arnold. Software Change Impact Analysis. IEEE Computer Society Press, 1996.
7. K. Beck. Extreme programming explained. Addison-Wesley, 2000.
8. C. Boyapati, S. Khurshid, and D. Marinov. Korat: automated testing based on Java predicates.
In Proc. International Symposium on Software Testing and Analysis, pages 123–133, 2002.
9. Y. Cheon and G. T. Leavens. A simple and practical approach to unit testing: The JML
and JUnit way. In Proc. 16th European Conference Object-Oriented Programming, pages
231–255, June 2002.
10. M. Clark. JUnit primer. Draft manuscript, October 2000.
11. C. Csallner and Y. Smaragdakis. JCrasher: an automatic robustness tester for Java. Software:
Practice and Experience, 34:1025–1050, 2004.
12. C. Csallner and Y. Smaragdakis. Check ’n’ Crash: Combining static checking and testing. In
Proc. 27th International Conference on Software Engineering, pages 422–431, May 2005.
13. M. Dahm and J. van Zyl. Byte Code Engineering Library, April 2003.
http://jakarta.apache.org/bcel/.
14. L. K. Dillon and Y. S. Ramakrishna. Generating oracles from your favorite temporal logic
specifications. In Proc. 4th ACM SIGSOFT Symposium on Foundations of Software Engi-
neering, pages 106–117, 1996.
15. H. Do, G. Rothermel, and A. Kinneer. Empirical studies of test case prioritization in a
JUnit testing environment. In Proc. 15th International Symposium on Software Reliability
Engineering, pages 113–124, 2004.
16. R.-K. Doong and P. G. Frankl. The ASTOOT approach to testing object-oriented programs.
ACM Trans. Softw. Eng. Methodol., 3(2):101–130, 1994.
17. M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely
program invariants to support program evolution. IEEE Trans. Softw. Eng., 27(2):99–123,
2001.
18. M. Fowler. Refactoring: Improving the Design of Existing Code. Addison Wesley, 1999.
19. E. Gamma and K. Beck. JUnit, 2003. http://www.junit.org.
20. M. J. Harrold, J. A. Jones, T. Li, D. Liang, and A. Gujarathi. Regression test selection for
Java software. In Proc. 16th ACM SIGPLAN Conference on Object-Oriented Programming,
Systems, Languages, and Applications, pages 312–326, 2001.
21. M. J. Harrold, G. Rothermel, K. Sayre, R. Wu, and L. Yi. An empirical investigation of the
relationship between spectra differences and regression faults. Journal of Software Testing,
Verification and Reliability, 10(3):171–194, 2000.
22. J. Henkel and A. Diwan. Discovering algebraic specifications from Java classes. In Proc.
17th European Conference on Object-Oriented Programming, pages 431–456, 2003.
23. G. T. Leavens, A. L. Baker, and C. Ruby. Preliminary design of JML: A behavioral inter-
face specification language for Java. Technical Report TR 98-06i, Department of Computer
Science, Iowa State University, June 1998.
24. D. Marinov, A. Andoni, D. Daniliuc, S. Khurshid, and M. Rinard. An evaluation of ex-
haustive testing for data structures. Technical Report MIT-LCS-TR-921, MIT CSAIL, Cam-
bridge, MA, September 2003.
25. A. M. Memon, I. Banerjee, and A. Nagarajan. What test oracle should I use for effective GUI
testing? In Proc. 18th IEEE International Conference on Automated Software Engineering,
pages 164–173, 2003.
26. A. M. Memon, M. E. Pollack, and M. L. Soffa. Automated test oracles for GUIs. In Proc.
8th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages
30–39, 2000.
27. A. M. Memon and M. L. Soffa. Regression testing of GUIs. In Proc. 9th European Software
Engineering Conference held jointly with 11th ACM SIGSOFT International Symposium on
Foundations of Software Engineering, pages 118–127, 2003.
28. B. Meyer. Eiffel: The Language. Prentice Hall, 1992.
29. A. Orso and B. Kennedy. Selective capture and replay of program executions. In Proc. 3rd
International ICSE Workshop on Dynamic Analysis, pages 29–35, St. Louis, MO, May 2005.
30. C. Pacheco and M. D. Ernst. Eclat: Automatic generation and classification of test inputs. In
Proc. 19th European Conference on Object-Oriented Programming, pages 504–527, Glas-
gow, Scotland, July 2005.
31. Parasoft Jtest manuals version 4.5. Online manual, April 2003.
http://www.parasoft.com/.
32. D. Peters and D. L. Parnas. Generating a test oracle from program documentation. In Proc.
1994 International Symposium on Software Testing and Analysis, pages 58–65, 1994.
33. X. Ren, F. Shah, F. Tip, B. G. Ryder, and O. Chesley. Chianti: a tool for change impact
analysis of Java programs. In Proc. 19th Annual ACM SIGPLAN Conference on Object-
Oriented Programming, Systems, Languages, and Applications, pages 432–448, 2004.
34. D. J. Richardson. TAOS: Testing with analysis and oracle support. In Proc. 1994 ACM
SIGSOFT International Symposium on Software Testing and Analysis, pages 138–153, 1994.
35. D. J. Richardson, S. L. Aha, and T. O. O’Malley. Specification-based test oracles for reactive
systems. In Proc. 14th International Conference on Software Engineering, pages 105–118,
1992.
36. D. S. Rosenblum. Towards a method of programming with assertions. In Proc. 14th Inter-
national Conference on Software Engineering, pages 92–104, 1992.
37. A. Rountev. Precise identification of side-effect-free methods in Java. In Proc. 20th IEEE
International Conference on Software Maintenance, pages 82–91, Sept. 2004.
38. D. Saff, S. Artzi, J. H. Perkins, and M. D. Ernst. Automatic test factoring for Java. In Proc.
21st IEEE International Conference on Automated Software Engineering, pages 114–123,
Long Beach, CA, November 2005.
Augmenting Automatically Generated Unit-Test Suites 403
39. A. Salcianu and M. Rinard. Purity and side effect analysis for Java programs. In Proc. 6th
International Conference on Verification, Model Checking and Abstract Interpretation, pages
199–215, Paris, France, January 2005.
40. D. Stotts, M. Lindsey, and A. Antley. An informal formal method for systematic JUnit test
case generation. In Proc. 2002 XP/Agile Universe, pages 131–143, 2002.
41. Sun Microsystems. Java 2 Platform, Standard Edition, v 1.4.2, API Specification. Online
documentation, Nov. 2003. http://java.sun.com/j2se/1.4.2/docs/api/.
42. W. Visser, C. S. Pasareanu, and S. Khurshid. Test input generation with Java PathFinder.
In Proc. 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis,
pages 97–107, 2004.
43. T. Xie, D. Marinov, and D. Notkin. Rostra: A framework for detecting redundant object-
oriented unit tests. In Proc. 19th IEEE International Conference on Automated Software
Engineering, pages 196–205, Sept. 2004.
44. T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A framework for generating object-
oriented unit tests using symbolic execution. In Proc. 11th International Conference on Tools
and Algorithms for the Construction and Analysis of Systems, pages 365–381, April 2005.
45. T. Xie and D. Notkin. Tool-assisted unit test selection based on operational violations. In
Proc. 18th IEEE International Conference on Automated Software Engineering, pages 40–
48, 2003.
46. T. Xie and D. Notkin. Automatic extraction of object-oriented observer abstractions from
unit-test executions. In Proc. 6th International Conference on Formal Engineering Methods,
pages 290–305, Nov. 2004.
47. T. Xie and D. Notkin. Checking inside the black box: Regression testing by comparing value
spectra. IEEE Transactions on Software Engineering, 31(10):869–883, October 2005.
Automated Detection of Refactorings in Evolving
Components
1 Introduction
Part of maintaining a software system is updating it to use the latest version of its com-
ponents. Developers like to reuse software components to quickly build a system, but
reuse makes the system dependent on the components. Ideally, the interface of a com-
ponent never changes. In practice, however, new versions of components often change
their interfaces and require the developers to change the system to use the new versions
of the components.
An important kind of change in object-oriented software is a refactoring. Refactor-
ings [FBB+ 99] are program transformations that change the structure of a program
but not its behavior. Example refactorings include changing the names of classes and
methods, moving methods and fields from one class to another, and splitting methods
or classes. An automated tool, called a refactoring engine, can apply refactorings to
change the source code of a component. However, a refactoring engine can change only
the source code that it has access to. Component developers often do not have access
to the source code of all the applications that reuse the components. Therefore, refac-
torings that component developers perform preserve the behavior of the component but
not of the applications that use the component; in other words, although the change is
a refactoring from the component developers’ point of view, it is not a refactoring from
the application developers’ point of view.
One approach to automate the update of applications when their components change
is to extend the refactoring engine to record refactorings on the component and then to
replay them on the applications. Record-and-replay of refactorings was demonstrated
in CatchUp [HD05] and JBuilder2005 [Bor] and recently incorporated in Eclipse 3.2
Milestone 4 [Ecl]. As component developers refactor their code, the refactoring engine
creates a log of refactorings. The developers ship this log along with the new version of
the component. An application developer can then upgrade the application to the new
version by using the refactoring engine to play back the log of refactorings.
While replay of refactorings shows great promise, it relies on the existence of refac-
toring logs. However, logs are not available for the legacy versions of components. Also,
logs will not be available for all future versions; some developers will not use refactor-
ing engines with recording, and some developers will perform refactorings manually. To
exploit the full potential of replay, it is therefore important to be able to automatically
detect the refactorings used to create a new version of a component.
We propose a novel algorithm that detects a likely sequence of refactorings between
two versions of a component. Previous algorithms [APM04, DDN00, GW05, GZ05,
RD03] assumed closed-world development, where codebases are used only in-house
and changes happen abruptly (e.g., one entity dies in a version and a new refactored
entity starts from the next version). In open-world development, however, components
are reused outside the organization, so changes do not happen overnight but follow
a long deprecate-replace-remove lifecycle. Obsolete entities coexist with their newer
counterparts until they are no longer supported. Also, multiple refactorings can happen
to the same entity or to related entities. This lifecycle makes it hard to accurately
detect refactorings. Our algorithm works for both the closed- and open-world paradigms.
We aim for our algorithm to help the developer infer a log of refactorings for replay.
To be practical, the algorithm needs to detect refactorings with a high accuracy. On one
hand, if the algorithm adds to a log a change that is not actually a refactoring (false posi-
tive), the developer needs to remove it from the log or the replay could potentially intro-
duce bugs. On the other hand, if the algorithm does not add to a log an actual refactoring
(false negative), the developer needs to manually find it and add it to the log. Previous
algorithms [APM04, DDN00, GW05, GZ05, RD03] aimed at detection of refactorings
for the purpose of program comprehension. Therefore, they can tolerate lower accuracy
as long as they focus the developer’s attention on the relevant parts of the software.
Our algorithm combines a fast syntactic analysis to detect refactoring candidates and
a more expensive semantic analysis to refine the results. Our syntactic analysis is based
on the Shingles encoding [Bro97], a technique from Information Retrieval for quickly
finding similar fragments in text files; our algorithm applies shingles to
source files. Most refactorings involve repartitioning of the source files, which results in
similar fragments of source text between different versions of a component. Our seman-
tic analysis is based on the reference graphs that represent references among source-level
entities, e.g., calls among methods.¹ This analysis considers the semantic relationship
between candidate entities to determine whether they represent a refactoring.
¹ These references do not refer to pointers between objects but to references among the
source-code entities in each version of the component.
406 D. Dig et al.
Fig. 1. An excerpt from Eclipse versions 2.1 and 3.0 showing two refactorings, rename method
and changed method signature, applied to the same method. The squares represent classes, the
ellipses methods, and the arrows method calls. The method that changes signature also changes
name from performRevertOperation to performRevert.
2 Example
We next illustrate some refactorings that our algorithm detects between two versions of
a component. We use an example from the EclipseUI component of the Eclipse devel-
opment platform. We consider two versions of EclipseUI, from Eclipse versions 2.1.3
and 3.0. Each of these versions of EclipseUI has over 1,000 classes and 10,000 methods
in the public API (of non-internal packages). Our algorithm first uses a fast syntactic
analysis to find similar methods, classes, and packages between the two versions of the
component. (Section 4 presents the details of our syntactic analysis.) For EclipseUI,
our algorithm finds 231,453 pairs of methods with similar bodies, 487 pairs of similar
classes, and 22 pairs of similar packages. (Section 8 presents more details of this case
study.) These similar entities are candidates for refactorings. Our example focuses on
two pairs of similar methods.
Figure 1 shows two pairs of similar methods from the two versions of the
class AbstractTextEditor from Eclipse 2.1 and 3.0. The syntactic analysis
finds that the method doRevertToSaved in version 2.1 is similar to (although
not identical with) the method doRevertToSaved in version 3.0, and the method
performRevertOperation is similar to the method performRevert. Our algorithm
then uses a semantic analysis to detect the refactorings that were performed on these
pairs. As a result, our algorithm detects that the method performRevertOperation
was renamed to performRevert and that its signature changed from two arguments
in version 2.1 to no arguments in version 3.0. Our previous manual
inspection [DJ05] of the Eclipse documentation and code indeed found that these two
refactorings, renamed method and changed method signature, were performed.
Our semantic analysis applies a series of detection strategies that find whether can-
didate pairs of similar entities are indeed results of refactorings. The key informa-
tion that the strategies consider is the references between the entities in each version.
For methods, the references correspond to call edges. For our example methods, both
performRevertOperation and performRevert each have only one call site in the entire
EclipseUI: both are called exactly once from doRevertToSaved. Our analysis
represents this information with an edge, labeled with the number of calls, between
these methods. We present how the two strategies for renamed methods and changed
method signature proceed in our running example.
The strategy that detects renamed methods discards the pair of doRevertToSaved
methods since they have the same name. This strategy, however, investigates further
whether performRevert is a renaming of performRevertOperation. The strategy
(lazily) finds the calls to these two methods and realizes that they are called (the same
number of times) from the corresponding doRevertToSaved methods in both ver-
sions. Therefore, methods performRevertOperation and performRevert (i) are
both in class AbstractTextEditor, (ii) have similar method bodies, (iii) have similar
incoming call edges, but (iv) differ in the name. The strategy thus concludes that
performRevert is a renaming of performRevertOperation.
The strategy that detects changed method signatures also considers all pairs
of similar methods. This strategy discards the pair of doRevertToSaved meth-
ods since they have the same signature. This strategy, however, investigates further
the methods performRevertOperation and performRevert, because they represent
the same method, only renamed. It is important to point out here that strategies share
detected refactorings: although performRevertOperation and performRevert
have different names, the RenameMethod strategy has already found that
these two methods correspond. The ChangedMethodSignature strategy then finds that
performRevertOperation and performRevert (i) have similar method bodies,
(ii) the "same" name (modulo the detected rename), (iii) similar call edges, but
(iv) different signatures. The strategy thus correctly concludes that a changed
method signature refactoring was applied to performRevert.
3 Algorithm Overview
This section presents a high-level overview of our algorithm for detection of refactor-
ings. Figure 2 shows the pseudo-code of the algorithm. The input is two versions of
a component, c1 and c2, and the output is a log of refactorings applied to c1 to produce c2. The
algorithm consists of two analyses: a fast syntactic analysis that finds candidates for
refactorings and a precise semantic analysis that finds the actual refactorings.
Our syntactic analysis starts by parsing the source files of the two versions of the
component into lightweight ASTs, where the parsing stops at the declarations of
methods and fields in classes. For each component, the parsing produces a graph (more
precisely, a tree to which analysis later adds more edges). Each node of the graphs
represents a source-level entity, namely a package, a class, a method, or a field. Each
node stores a fully qualified name for the entity, and each method node also stores the
fully qualified names of method arguments to distinguish overloaded methods. Nodes
are arranged hierarchically in the tree, based on their fully qualified names: the node
p.n is a child of the node p.
The heart of our syntactic analysis is the use of the Shingles encoding to find similar
pairs of entities (methods, classes, and packages) in the two versions of the component.
Shingles are “fingerprints” for strings with the following property: if a string changes
slightly, then its shingles also change slightly. Therefore, shingles enable detection of
strings with similar fragments much more robustly than traditional string-matching
techniques, which are not immune to small perturbations such as renamings or small edits.
Section 4 presents the computation of shingles in detail.
The result of our syntactic analysis is a set of pairs of entities that have similar
shingles encodings in the two versions of the component. Each pair consists of an entity
from the first version and an entity of the same kind from the second version; there are
separate pairs for methods, classes, and packages. These pairs of similar entities are
candidates for refactorings.
Our semantic analysis detects from the candidate pairs those where the second entity
is a likely refactoring of the first entity. The analysis applies seven strategies for detect-
ing specific refactorings, such as RenameMethod or ChangeMethodSignature discussed
in section 2. Section 5 presents the strategies in detail. The analysis applies each strat-
egy until it finds all possible refactorings of its type. Each strategy considers all pairs
of entities (e1, e2) of the appropriate type, e.g., RenameMethod considers only pairs
of methods. For each pair, the strategy computes how likely it is that e1 was refactored
into e2; if the likelihood is above a user-specified threshold, the strategy adds the pair
to the log of refactorings that the subsequent strategies can use during further analy-
sis. Note that each strategy takes into account already detected refactorings; sharing
detected refactorings among strategies is key to accurate detection when multiple
types of refactorings were applied to the same entity (e.g., a method was renamed
and has a different signature) or to related entities (e.g., a method was renamed and
also its class was renamed). Our analysis cannot recover the list of refactorings in the
order they were performed, but it finds one path that leads to the same result.
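The loop structure described above can be sketched as follows (a minimal sketch with our own names, not the paper's Figure 2 pseudo-code; here refactorings and candidate pairs are simplified to strings, and the likelihood threshold is folded into each strategy):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of the two-phase detection loop: candidates come from the
// syntactic analysis; each strategy is rerun to a fixed point and appends
// its findings to the shared log (rlog) that later strategies consult.
class DetectionLoop {
    interface Strategy {
        // Returns refactorings newly detected in this run, given the
        // candidate pairs and the shared log of refactorings found so far.
        List<String> detect(List<String[]> candidatePairs, List<String> rlog);
    }

    static List<String> detectRefactorings(List<String[]> candidatePairs,
                                           List<Strategy> orderedStrategies) {
        List<String> rlog = new ArrayList<>(); // shared among all strategies
        // Fixed order: RP, RC, RM, PUM, PDM, MM, CMS (renames top-down first).
        for (Strategy strategy : orderedStrategies) {
            List<String> found;
            do { // rerun the strategy until it reaches a fixed point
                found = strategy.detect(candidatePairs, rlog);
                rlog.addAll(found);
            } while (!found.isEmpty());
        }
        return rlog;
    }
}
```

The do/while per strategy mirrors the feedback loop: a rename recorded in rlog on one pass can unlock further detections on the next pass of the same strategy.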
4 Syntactic Analysis
To identify possible candidates for refactorings, our algorithm first determines pairs
of similar methods, classes, and packages. Our algorithm uses the Shingles encod-
ing [Bro97] to compute a fingerprint for each method and determines two methods
to be similar if and only if they have similar fingerprints. Unlike the traditional hashing
functions that map even the smallest change in the input to a completely different hash
value, the Shingles algorithm maps small changes in the input to small changes in the
fingerprint encoding.
void doRevertToSaved() {
  IDocumentProvider p = getDocumentProvider();
  if (p == null)
    return;
  performRevertOperation(createRevertOperation(),
                         getProgressMonitor());
}

Shingles: { -1942396283, -1672190785, -12148775115, -56733233372, 208215292,
1307570125, 1431157461, 190471951, 969607679 }
the methods in that class. Analogously, the shingles for a package are the minimum
S_package values of the union of the shingles of the classes in that package. This way,
the algorithm efficiently computes shingles values and avoids recalculation.
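As an illustration of how such an encoding behaves, the following sketch hashes every window of W consecutive tokens and keeps the S smallest hash values as the fingerprint. The parameter values (W = 2, S = 5) and the hash function are our assumptions for illustration, not the paper's actual configuration:

```java
import java.util.*;

// Illustrative shingles sketch: a small edit to the source changes only the
// few windows that overlap the edit, so most fingerprint values survive --
// unlike a whole-body hash, which changes completely.
class ShinglesSketch {
    static final int W = 2; // tokens per shingle window (our choice)
    static final int S = 5; // fingerprint size (our choice)

    static SortedSet<Integer> shingles(String code) {
        String[] tokens = code.trim().split("\\W+");
        SortedSet<Integer> all = new TreeSet<>();
        for (int i = 0; i + W <= tokens.length; i++) {
            all.add(Arrays.hashCode(Arrays.copyOfRange(tokens, i, i + W)));
        }
        // keep only the S numerically smallest hashes, so fingerprints stay small
        SortedSet<Integer> fingerprint = new TreeSet<>();
        for (int h : all) {
            if (fingerprint.size() == S) break;
            fingerprint.add(h);
        }
        return fingerprint;
    }

    // Jaccard-style similarity between two fingerprints
    static double similarity(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new TreeSet<>(a); inter.retainAll(b);
        Set<Integer> union = new TreeSet<>(a); union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }
}
```

For example, the fingerprints of two method bodies that differ only in the method name still share most shingles and therefore score a nonzero similarity.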
5 Semantic Analysis
We present the semantic analysis that our algorithm uses to detect refactorings. Re-
call from Figure 2 that the algorithm applies each detection strategy until it reaches a
fixed point and that all strategies share the same log of detected refactorings, rlog.
This sharing is crucial for successful detection of refactorings when multiple types of
Automated Detection of Refactorings in Evolving Components 411
refactorings happened to the same entity (e.g., a method was renamed and has a dif-
ferent signature) or related entities (e.g., a method was renamed and also its class was
renamed). We first describe how the strategies use the shared log of refactorings. We
then describe references that several strategies use to compute the likelihood of refac-
toring. We also define the multiplicity of references and the similarity that our algorithm
computes between references. Finally, we present the details of each strategy. Due to the
sharing of the log, our algorithm imposes an order on the types of refactorings it detects.
Specifically, the algorithm applies the strategies in the following order:
1. RenamePackage (RP)
2. RenameClass (RC)
3. RenameMethod (RM)
4. PullUpMethod (PUM)
5. PushDownMethod (PDM)
6. MoveMethod (MM)
7. ChangeMethodSignature (CMS)
where suf and pre are functions that take a fully qualified name and return its simple
name (suffix) and the entire name but the simple name (prefix), respectively. The func-
tion ρ recursively checks whether a renaming of some part of the fully qualified name
is already in rlog.
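Based on this description, the renaming-aware name comparison ρ can be sketched as follows (our formulation; here rlog is simplified to a map from old fully qualified names to their detected new names):

```java
import java.util.Map;

// Sketch of the recursive renaming check described above (our formulation).
class RenameCheck {
    static String pre(String fqn) { // everything but the simple name (prefix)
        int i = fqn.lastIndexOf('.');
        return i < 0 ? "" : fqn.substring(0, i);
    }

    static String suf(String fqn) { // the simple name (suffix)
        return fqn.substring(fqn.lastIndexOf('.') + 1);
    }

    // rho: are n1 and n2 the "same" name modulo renamings already in rlog?
    static boolean rho(String n1, String n2, Map<String, String> rlog) {
        if (n1.isEmpty() && n2.isEmpty()) return true;
        if (n1.isEmpty() || n2.isEmpty()) return false;
        if (n2.equals(rlog.get(n1))) return true; // n1 was already renamed to n2
        // otherwise the simple names must match and the prefixes must
        // correspond modulo some earlier detected renaming
        return suf(n1).equals(suf(n2)) && rho(pre(n1), pre(n2), rlog);
    }
}
```

With rlog recording that p.C1 was renamed to p.C1REN, the check accepts p.C1.m2 versus p.C1REN.m2 even though their textual prefixes differ.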
5.2 References
The strategies compute the likelihood of refactoring based on references among the
source-code entities in each of the two versions of the component. In each graph that
represents a version of the component, our algorithm (lazily) adds an edge from a node
n to a node n′ if the source entity represented by n has a reference to the source entity
represented by n′. (The graph also contains the edges from the parse tree.) We define
references for each kind of nodes/entities in the following way:
Fig. 4. Syntactic and semantic checks performed by different detection strategies for refac-
torings: RP=RenamePackage, RC=RenameClass, RM=RenameMethod, PUM=PullUpMethod,
PDM=PushDownMethod, MM=MoveMethod, and CMS=ChangeMethodSignature
finding the overlap corresponds to finding the intersection of one multiset of edges with
the other multiset of edges (for nodes that correspond with respect to the refactorings).
In this view, similarity between edges in the graph is conceptually analogous to the
similarity of multisets of shingles.
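Under this view, the similarity of incoming references can be sketched as a multiset overlap (our formulation; edge multiplicities are kept as a map from caller name to call count):

```java
import java.util.*;

// Sketch of reference (call-edge) similarity: edges form a multiset,
// i.e. caller -> number of calls, and similarity is the size of the
// multiset intersection over the multiset union.
class EdgeSimilarity {
    static double similarity(Map<String, Integer> edges1, Map<String, Integer> edges2) {
        int inter = 0, union = 0;
        Set<String> callers = new TreeSet<>(edges1.keySet());
        callers.addAll(edges2.keySet());
        for (String caller : callers) {
            int m1 = edges1.getOrDefault(caller, 0); // multiplicity in version 1
            int m2 = edges2.getOrDefault(caller, 0); // multiplicity in version 2
            inter += Math.min(m1, m2);
            union += Math.max(m1, m2);
        }
        return union == 0 ? 1.0 : (double) inter / union;
    }
}
```

In the running example, performRevertOperation and performRevert each have the single incoming edge doRevertToSaved with multiplicity 1, so their edge similarity is 1.0.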
Fig. 5. PullUpMethod: method m2 is pulled up from the subclass C2 into the superclass C1
Fig. 6. PushDown: method m2 is pushed down from the superclass C1 into the subclass C2
Figure 5 illustrates a PUM that pulls up the declaration of a method from a subclass
into the superclass such that the method can be reused by other subclasses. Figure 6
illustrates a PDM that pushes down the declaration of a method from a superclass into
a subclass that uses the method because the method is no longer reused by other sub-
classes. In general, the PUM and PDM can be between several classes related by in-
heritance, not just between the immediate subclass and superclass; therefore, PUM and
PDM check that the original class is a descendant and an ancestor, respectively, of the
target class. These inheritance checks are done on the graph g2.
MoveMethod (MM) has a second syntactic check that requires the parent classes
of the two methods to be different. Without this check, MM would incorrectly classify
all methods of a renamed class as moved methods. The second semantic check requires
that the declaration classes of the methods not be related by inheritance; otherwise, the
refactorings would be incorrectly classified as MM as opposed to a PUM/PDM. The
third check requires that all references to the target class be removed in the second
version and that all calls to methods from the initial class be replaced with sending a
message to an instance of the initial class. We illustrate this check on the sample code
in Figure 7. In the first version, method C1.m1 calls a method C1.xyz of the same class
C1 and also calls a method C2.m2. After m1 is moved to the class C2, m1 can call any
method in C2 directly (e.g., m2), but any calls to methods residing in C1 need to be
executed through an instance of C1.
Version 1:

class C1 {
  public void m1(C2 c2) {
    ...
    xyz();
    c2.m2();
    ...
  }
  public void xyz() { ... }
}
class C2 {
  public void m2() { ... }
}

Version 2:

class C1 {
  public void xyz() { ... }
}
class C2 {
  public void m1(C1 c1) {
    ...
    c1.xyz();
    m2();
    ...
  }
  public void m2() { ... }
}

Fig. 7. Method m1 moves from class C1 in one version to class C2 in the next version. The method
body changes to reflect that the local methods (e.g., m2) are called directly, while methods from
the previous class (e.g., xyz) are called indirectly through an instance of C1.
ChangeMethodSignature (CMS) looks for methods that have the same fully qual-
ified name (modulo renamings) but different signatures. The signature of the method
can change by gaining or losing arguments, by changing the type of the arguments, by
changing the order of the arguments, or by changing the return type.
The example from Section 2 illustrates some of the challenges in automatic detection
of refactorings that happened in reusable components. We next explicitly discuss three
main challenges and present how our algorithm addresses them.
The first challenge is the size of the code to be analyzed. An expensive semantic
analysis—for example finding similar subgraphs in call graphs (more generally, in the
entire reference graphs)—might detect refactorings but does not scale up to the size of
real-world components with tens of thousands of entities, including methods, classes,
and packages. A cheap syntactic analysis, in contrast, might find many similar entities
but is easily fooled by renamings. Also, an analysis that did not take into account the
semantics of entity relationships would produce a large number of false positives. Our
algorithm uses a hybrid of syntactic and semantic analyses: a fast syntactic analysis
creates pairs of candidate entities that are suspected of refactoring, and a more precise
semantic analysis on these candidates detects whether they are indeed refactorings.
The second challenge is the noise introduced by preserving backward compatibility
in the components. Consider for example the following change in the Struts framework
from version 1.1 to version 1.2.4: the method perform in the class Controller was
renamed to execute, but perform still exists in the later version. However, perform
is deprecated, all the internal references to it were replaced with references to execute,
and the users are warned to use execute instead of perform. Since it is not feasible
to perform an expensive analysis on all possible pairs of entities across two versions of
Fig. 8. Refactorings affect related entities: class C1 and method m2. The class rename happens
before the method rename in the upper path; the reverse happens in the bottom path. Both paths
end up with the same result.
a component, any detection algorithm has to consider only a subset of pairs. Some pre-
vious algorithms [APM04, DDN00, GZ05] consider only the outdated entities that die
in one version and then search for refactored counterparts that are created in the next
version. The assumption that entities change in this fashion indeed holds in closed-world
development (where the only users of the components are the component developers)
but does not hold in open-world development, where outdated entities coexist
with their refactored counterparts. For example, the previous algorithms cannot detect
that perform was renamed to execute since perform still exists in the subsequent
version. Our algorithm detects that perform in the first version and execute in the
second version have the same shingles and their call sites are the same, and therefore
our algorithm correctly classifies the change as a method rename.
The third challenge is multiple refactorings happening to the same entity or related
entities. The example from Section 2, for instance, shows two refactorings, rename
method and change method signature, applied to the same method. An example of
refactorings happening to related entities is renaming a method along with renaming
the method’s class. Figure 8 illustrates this scenario. Across the two versions of a com-
ponent, class C1 was renamed to C1REN, and one of its methods, m2, was renamed to
m2REN. During component evolution, regardless of whether the class or method rename
was executed first, the end result is the same. In Figure 8, the upper part shows the case
when the class rename was executed first, and the lower part shows the case when the
method rename was executed first.
Our algorithm addresses the third challenge by imposing an order on the detection
strategies and sharing the information about detected refactorings among the detection
strategies. Any algorithm that detects refactorings conceptually reconstructs the log of
refactorings and thus not only the start and the end state of a component but also the
intermediate states. Our algorithm detects the two refactorings in Figure 8 by following
the upper path. When detecting a class rename, the algorithm takes into account only the
shingles for class methods and not the method names. Therefore, our algorithm detects
class C1REN as a rename of class C1 although one of its methods was renamed. This
information is fed back into the loop; it conceptually reconstructs the state 2a, and the
analysis continues. The subsequent analysis for the rename method checks whether the
new-name method belongs to the same class as the old-name method; since the previous
detection discovered that C1 is equivalent modulo rename with C1REN, m2REN can be
detected as a rename of m2.
The order in which an algorithm detects the two refactorings matters. We described
how our algorithm detects a class rename followed by a method rename. Consider, in
contrast, what would happen to an algorithm that attempts to follow the bottom path.
When analyzing what happened between the methods m2 and m2REN, the algorithm
would need the intermediate state 2b (where m2REN belongs to C1) to detect that m2
was renamed to m2REN. However, that state is not given, and in the end state m2REN
belongs to C1REN, so the algorithm would mistakenly conclude that m2REN was moved
to another class (C1REN). The subsequent analysis of what happened between classes
C1 and C1REN would presumably find that they are a rename and would then need to
backtrack to correct the previously misqualified move method as a rename method.
For this reason, our algorithm imposes an order on the detection strategies and runs
detection of renamings top-down, from packages to classes to methods.
To achieve a high level of accuracy, our algorithm uses a fixed-point computation in
addition to the ordering of detection strategies. The algorithm runs each strategy repeat-
edly until it finds no new refactorings. This loop is necessary because entities are inter-
twined with other entities, and a strategy cannot detect a refactoring in one entity until
it detects a refactoring in the dependent entities. For instance, consider this example
change that happened in the Struts framework between the versions 1.1 and 1.2.4: in the
class ActionController, the method perform was renamed to execute.
ActionController is a utility class whose perform merely delegates to different
subclasses of Action by sending them a perform message. For 11 of these
Action classes, the callers consist mostly of ActionController.perform.
Therefore, unless a tool detects first that perform was renamed to execute, it can-
not detect correctly the similarity of the incoming call edges for the other 11 methods.
After the first run of the RenameMethod detection, our RefactoringCrawler tool misses
the 11 other method renames. However, the feedback loop adds the information about
the rename of perform, and the second run of the RenameMethod detection correctly
finds another 11 renamed methods.
Even though we analyze only seven types of refactorings, a conceptually similar com-
bination of syntactic and semantic analyses can detect many other types of refactorings.
Many of the refactorings published by Fowler et al. [FBB+ 99] can be detected in this
way, including extract/inline method, extract/inline package, extract/inline class or in-
terface, move class to different package, collapse class hierarchy into a single class,
replace record with data class, replace anonymous with nested class, replace type con-
ditional code with polymorphism, as well as some higher-level refactorings to design
patterns [GHJV95] including create Factory methods, form Template Method, replace
type code with State/Strategy.
The largest extension to the current algorithm is required by ‘replace type conditional
code with polymorphism’. This refactoring replaces a switch statement whose branches
type-check the exact type of an object (e.g., using instanceof in Java) with a call to
a polymorphic method that is dynamically dispatched at run time to the right class.
All the code in each branch statement is moved to the class whose type was checked
in that branch. To detect this refactoring, the syntactic analysis should not only detect
similar methods, but also similar statements and expressions within method bodies. This
requires that shingles be computed for individual statements and expressions, which adds
overhead to the current implementation but offers a finer level of granularity. Upon
detection of similar statements in a switch branch and in a class method, the semantic
analysis needs to check whether the class has the same type as the one checked in
the branch and whether the switch is replaced in the second version with a call to the
polymorphic method.
7 Implementation
RefactoringCrawler performs the analysis and returns the results inside an
Eclipse view. It presents only the refactorings that happened at the
public API level of the component, since only these can affect the component's users.
RefactoringCrawler groups the results in categories corresponding to each refactoring
strategy. Double-clicking on any leaf Java element opens an editor with the
declaration of that element selected. RefactoringCrawler also allows the user to
export the results into an XML format compatible with the format that CatchUp [HD05]
uses to load a log of refactorings. A similar XML format is used by Eclipse 3.2
Milestone 4. Additionally, the XML format allows the developer to further analyze and
edit the log, removing false positives or adding missed refactorings.
The reader can see screenshots and is encouraged to download the tool from the
website [Ref].
8 Evaluation
We evaluate RefactoringCrawler on three real-world components. To measure the ac-
curacy of RefactoringCrawler, we need to know the refactorings that were applied in
the components. Therefore, we chose the components from our previous study [DJ05]
that analyzed the API changes in software evolution and found refactorings to be re-
sponsible for more than 80% of the changes. The previous study considered components
with good release notes describing the API changes. Starting from the release notes, we
manually discovered the refactorings applied in these components. These manually dis-
covered refactorings helped us to measure the accuracy of the refactoring logs that Refac-
toringCrawler reports. In general, it is easier to detect the false positives (refactorings
that RefactoringCrawler erroneously reports) by comparing the reported refactorings
against the source code than it is to detect the false negatives (refactorings that Refactor-
ingCrawler misses). To determine false negatives, we compare the manually found refac-
torings against the refactorings reported by RefactoringCrawler. Additionally, Refactor-
ingCrawler found a few refactorings that were not documented in the release notes. Our
previous study and the evaluation of RefactoringCrawler allowed us to build a repository
of refactorings that happened between the two versions of the three components. The case
study along with the tool and the detected refactorings can be found online [Ref].
For each component, we need to choose two versions. The previous study [DJ05]
chose two major releases that span large architectural changes because such releases
are likely to have many changes and to have those changes documented. We use the
same versions to evaluate RefactoringCrawler. Note, however, that these versions can
present hard cases for RefactoringCrawler because they are far apart and can have large
changes. RefactoringCrawler still achieves practical accuracy for these versions. We
believe that RefactoringCrawler could achieve even higher accuracy on closer versions
with fewer changes.
Eclipse Platform. [eclipse.org] provides many APIs and many different smaller frame-
works. The key framework in Eclipse is a plug-in based framework that can be used to
develop and integrate software tools. This framework is often used to develop Integrated
Development Environments (IDEs). We focus on the UI subcomponent (Eclipse.UI)
that contains 13 plug-ins.
We chose two major releases of Eclipse, 2.1 (March 2003) and 3.0 (June 2004).
Eclipse 3.0 came with some major themes that affected the APIs. The responsiveness
theme ensured that more operations run in the background without blocking the user.
New APIs allow long-running operations like builds and searches to be performed
in the background while the user continues to work. Another major theme in 3.0 is
rich-client platforms. Eclipse was designed as a universal IDE. However, many components of Eclipse are not particularly specific to IDEs and can be reused in other rich-
client applications (e.g., plug-ins, help system, update manager, window-based GUIs).
This architectural theme involved factoring out IDE-specific elements. APIs heavily af-
fected by this change are those that made use of the filesystem resources. For instance
IWorkbenchPage is an interface used to open an editor for a file input. All methods
that were resource specific (those that dealt with opening editors over files) were re-
moved from the interface. A client who opens an editor for a file should convert it first
to a generic editor input. Now the interface can be used by both non-IDE clients (e.g.,
an electronic mail client that edits the message body) as well as IDE clients.
Struts. [struts.apache.org] is an open source framework for building Java web applica-
tions. The framework is a variation of the Model-View-Controller (MVC) design para-
digm. Struts provides its own Controller component and integrates with other technolo-
gies to provide the Model and the View. For the Model, Struts can interact with standard
data access technologies, like JDBC and EJB, and many third-party packages. For the
View, Struts works with many presentation systems.
We chose two major releases of Struts, 1.1 (June 2003) and 1.2.4 (September 2004).
All the API changes reveal consolidation work that was done in between the two re-
leases. The developers eliminated duplicated code and removed unmaintained or buggy
code.
JHotDraw. [jhotdraw.org] is a Java GUI framework for technical and structured
graphics. It provides support for creating drawing editors with graphical figures,
and support for saving, loading, and printing drawings. The framework has been
used to create many different editors.
We chose two major releases of JHotDraw, 5.2 (February 2001) and 5.3 (January
2002). The purpose of 5.3 release was to clean up the APIs and fix bugs.
To measure the accuracy of RefactoringCrawler, we use precision and recall, two standard
metrics from the Information Retrieval field. Precision is the ratio of the number
of relevant refactorings found by the tool to the total number of irrelevant and relevant
refactorings found by the tool. It is expressed as the percentage:

Precision = RelevantFound / (RelevantFound + IrrelevantFound) x 100%

Recall is the ratio of the number of relevant refactorings found by the tool (good
results) to the total number of actual refactorings in the component. It is expressed as the
percentage:

Recall = RelevantFound / ActualRefactorings x 100%
Ideally, precision and recall should be 100%. If that were the case, the reported refac-
torings could be fed directly into a tool that replays them to automatically upgrade
component-based applications. However, due to the challenges mentioned in Section 6,
it is hard to have 100% precision and recall.
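As a minimal sketch, the two metrics reduce to the following; the counts passed in are hypothetical examples, not the paper's Table 2 numbers:

```java
// Precision and recall as defined above, expressed as percentages.
class Accuracy {
    // good = relevant refactorings found; falsePositives = irrelevant ones found
    static double precision(int good, int falsePositives) {
        return 100.0 * good / (good + falsePositives);
    }
    // falseNegatives = actual refactorings the tool missed
    static double recall(int good, int falseNegatives) {
        return 100.0 * good / (good + falseNegatives);
    }
}
```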
Table 2 shows how many instances of each refactoring were found for the three
components. These results use the default values for the parameters in Refactor-
ingCrawler [Ref]. For each refactoring type, we show in a triple how many good results
RefactoringCrawler found, how many false positives RefactoringCrawler found, and
how many false negatives (according to the release notes [DJ05]) RefactoringCrawler
found. For each component, we compute precision and recall that take into account the
refactorings of all kinds.
We further analyzed why RefactoringCrawler missed a few refactorings. In
Struts, for instance, method RequestUtils.computeParameters is moved to
TagUtils.computeParameters, and method RequestUtils.pageURL is
moved to TagUtils.pageURL. There are numerous calls to these methods from
a test class. However, it appears that the test code was not refactored, and therefore it
still calls the old (deprecated) method, which results in quite different call sites
for the old and the refactored method.
Fig. 9. Running time for JHotDraw decreases exponentially with higher threshold values used in
the syntactic analysis
8.3 Performance
The results in Table 2 were obtained when RefactoringCrawler ran on a Fujitsu lap-
top with a 1.73GHz Pentium 4M CPU and 1.25GB of RAM. It took 16 min 38 sec for
detecting the refactorings in EclipseUI, 4 min and 55 sec for Struts, and 37 sec for JHot-
Draw. Figure 9 shows how the running time for JHotDraw varies with the change of the
method similarity threshold values used in the syntactic analysis. For low threshold val-
ues, the number of candidate pairs passed to the semantic analysis is large, resulting in
longer analysis time. For high threshold values, fewer candidate pairs pass into the
semantic analysis, resulting in lower running times. For JHotDraw, a 0.1 method-similarity
threshold passes 1842 candidate methods to RenameMethod's semantic analysis, a
0.5 threshold passes 88 candidates, while a 0.9 threshold passes only 4 candidates.
The more important question, however, is how precision and recall vary with the
change of the similarity threshold values. Very low threshold values produce a larger
number of candidates to be analyzed, which results in a larger number of false positives,
but increases the chance that all the relevant refactorings are found among the results.
Very high threshold values imply that only those candidates that have almost perfect
body resemblance are taken into account, which reduces the number of false positives
but can miss some refactorings. We have found that threshold values between 0.5 and
0.7 result in practical precision and recall.
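The thresholded syntactic pre-filter can be sketched as follows, assuming tokenized method bodies and substituting a simple rolling hash of token trigrams for the Rabin-fingerprinted shingles the tool actually uses:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the syntactic pre-filter: compute the shingle set of a token
// sequence (hashed token trigrams here) and keep only candidate pairs whose
// set similarity meets the threshold.
class ShingleFilter {
    static Set<Integer> shingles(String[] tokens, int w) {
        Set<Integer> s = new HashSet<>();
        for (int i = 0; i + w <= tokens.length; i++) {
            int h = 17;
            for (int j = i; j < i + w; j++) h = 31 * h + tokens[j].hashCode();
            s.add(h);
        }
        return s;
    }

    // Jaccard-style similarity of two shingle sets.
    static double similarity(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<Integer> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    static boolean isCandidatePair(String[] m1, String[] m2, double threshold) {
        return similarity(shingles(m1, 3), shingles(m2, 3)) >= threshold;
    }
}
```

Raising the threshold shrinks the candidate set passed to the expensive semantic analysis, which is exactly the running-time behavior Figure 9 shows.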
Strengths
– High precision and recall. Our evaluation on the three components shows that both
precision and recall of RefactoringCrawler are over 85%. Since RefactoringCrawler
Automated Detection of Refactorings in Evolving Components 423
combines both syntactic and semantic analysis, it can process software of realistic
size with practical accuracy. Compared to other approaches [APM04, DDN00,
GW05, GZ05, RD03] that use only syntactic analysis and produce a large number
of false positives, our tool requires little human intervention to validate the refac-
torings. RefactoringCrawler can significantly reduce the burden necessary to find
refactoring logs that a replay tool uses to automatically upgrade component-based
applications.
– Robust. Our tool is able to detect refactorings in the presence of noise introduced
because of maintaining backwards compatibility, the noise of multiple refactorings,
and the noise of renamings. Renamings create huge problems for other approaches
but do not impede our tool. Since our tool identifies code entities (methods, classes,
packages) based on their body resemblance and not on their names, our tool can
successfully track the same entity across different versions, even when its name
changes. For previous approaches, a rename is equivalent to an entity disappear-
ing and a brand new entity appearing in the subsequent version. Another problem
for previous approaches is the application of multiple refactorings to the same en-
tity. Our tool takes this into account by sharing the log of refactorings between
the detection strategies and repeating each strategy until it reaches a fixed point.
Lastly, our tool detects refactorings in an open-world development where, due to
backwards compatibility, obsolete entities coexist with their refactored counterparts
until the former are removed. We can detect refactorings in such an environment
because most refactorings involve repartitioning the source code. This results in
parts of the code from a release being spread in different places in the next release.
Our algorithm starts by detecting the similarities between two versions.
– Scalable. Running semantic analyses (like identifying similar subgraphs
in the entire reference graph) on large codebases comprising tens of thousands of
nodes (methods, classes, packages) is very expensive. To avoid this, we first run an
inexpensive syntactic analysis that reduces the whole input domain to a relatively
small number of candidates to be analyzed semantically. It took RefactoringCrawler
16 min 38 sec to analyze the org.eclipse.ui subcomponent (352 KLOC) of the
Eclipse Platform.
Limitations
– Poor support for interfaces and fields. Since our approach tracks the identity
of methods, classes, and packages based on their textual bodies and not on their
names, it is not suited to entities that lack a body. Class fields and interface
methods contain no body other than their declaration name. After
the syntactic analysis, only entities that have a body resemblance are passed to the
semantic analysis. Therefore, refactorings that happened to fields or interface meth-
ods cannot be detected. This was the case in org.eclipse.ui where between versions
2.1.3 and 3.0 many static fields were moved to other classes and many interface
methods were moved to abstract classes. To counteract the lack of textual bodies
for fields or interface methods, we treated their associated javadoc comments as
their text bodies. This seems to work for some cases, but not all.
– Requires experimentation. As with any approach based on heuristics, coming up
with the right values for the detection algorithms might take a few trials. Selecting
threshold values too high reduces the false positives toward zero but can miss some
refactorings as only those candidates that have perfect resemblance are selected.
Selecting too low threshold values produces a large number of false positives but
increases the chances that all relevant refactorings are found among the results.
The default threshold values for RefactoringCrawler are between 0.5 and 0.7 (for
various similarity parameters) [Ref]. When default values do not produce adequate
results, users could start from high threshold values and reduce them until the number of false positives becomes too large.
9 Related Work
We provide an overview of related work on refactoring, automated detection of refac-
torings, and the use of Shingles encoding.
9.1 Refactoring
Programmers have been cleaning up their code for decades, but the term refactoring
was coined much later [OJ90]. Opdyke [Opd92] wrote the first catalog of refactor-
ings, while Roberts and Brant [RBJ97,Rob99] were the first to implement a refactoring
engine. The refactoring field gained much popularity with the catalog of refactorings
written by Fowler et al. [FBB+ 99]. Soon after this, IDEs began to incorporate refac-
toring engines. Tokuda and Batory [TB01] describe how large architectural changes in
two frameworks can be achieved as a sequence of small refactorings. They estimate that
automated refactorings are 10 times quicker to perform than manual ones.
More recent research on refactoring focuses on the analyses for automating powerful
refactorings. Tip et al. [TKB03] use type constraints to support an analysis for refactor-
ings that introduce type generalization. Donovan et al. [DKTE04] use a pointer analysis
and a set-constraint-based analysis to support refactorings that replace the instantiation
of raw classes with generic classes. Dincklage and Diwan [vDD04] use various heuris-
tics to convert from non-generic classes to generic classes. Balaban et al. [BTF05] pro-
pose refactorings that automatically replace obsolete library classes with their newer
counterparts. Component developers have to provide mappings between legacy classes
and their replacements, and an analysis based on type constraints determines where the
replacement can be done. Thomas [Tho05] points out that refactorings in components
result in integration problems and advocates the need for languages that would
allow developers to specify customizable refactorings.
Antoniol et al. [APM04] detect class evolution discontinuities (e.g., a class that splits into several, or several classes that merge into one). Based on Vector Space cosine similarity, they compare the class identi-
fiers found in two subsequent releases. Therefore, if a class, say Resolver, was present
in version n but disappears in version n + 1 and a new class SimpleResolver ap-
pears in version n + 1, they conclude that a class replacement happened. Godfrey and
Zou [GZ05] are the closest to how we envision detecting structural changes.
They implemented a tool that can detect some refactorings like renaming, move method,
split, and merge for procedural code. Whereas we start from shingles analysis, they em-
ploy origin analysis along with a more expensive analysis on call graphs to detect and
classify these changes. Rysselberghe and Demeyer [RD03] use a clone finding tool
(Duploc) to detect methods that were moved across the classes. Gorg and Weisger-
ber [GW05] analyze subsequent versions of a component in configuration management
repositories to detect refactorings.
Existing work on automatic detection of refactorings addresses some of the needs
of reverse engineers who must understand at a high level how and why components
evolved. For this reason, most of the current work focuses on detecting merging and
splitting of classes. However, in order to automatically migrate component-based ap-
plications we need to know the changes to the API. Our work complements existing
work because we must also look for lower-level refactorings that affect the signatures
of methods. We also address the limitations of existing work with respect to renamings
and noise introduced by multiple refactorings on the same entity or the noise introduced
by the deprecate-replace-remove cycle in the open-world components.
10 Conclusions
Syntactic analyses are too unreliable, and semantic analyses are too slow. Combining
syntactic and semantic analyses can give good results. By combining Shingles encoding
with traditional semantic analyses, and by iterating the analyses until a fixed point was
discovered, we could detect over 85% of the refactorings while producing less than 10%
false positives.
The algorithm would work on any two versions of a system. It does not assume that
the later version was produced by any particular tool. If a new version is produced by a
refactoring tool that records the refactorings that are made, then the log of refactorings
will be 100% accurate. Nevertheless, there may not be the discipline or the opportunity
to use a refactoring tool, and it is good to know that refactorings can be detected nearly
as accurately without it.
There are several applications of automated detection of refactorings. First, a log of
refactorings helps in the automated migration of component-based applications. As our
previous study [DJ05] shows, more than 80% of the API changes that break compati-
bility with existing applications are refactorings. A tool like Eclipse can replay the log
of refactorings. The replay is done at the application site where both the component
and the application reside in the same workspace. In this case, the refactoring engine
finds and correctly updates all the references to the refactored entities, thus migrating
the application to the new API of the component.
Second, a log of refactorings can improve how current configuration management
systems deal with renaming. A tool like CVS loses all the change history for a source
file whose main class gets renamed, since this appears as if the old source file was
removed and a source file with a new name was added. A log of refactorings can help
the configuration management system to correlate the old files/folders with the new
files/folders when the main class or package was renamed.
Third, a log of refactorings can help a developer understand how an object-oriented
system has evolved from one version to another. For example, an explicit list of re-
namings tells how the semantics of the refactored entity changed, while a list of moved
methods tells how the class responsibilities shifted.
The tool and the evaluation results are available online [Ref].
Acknowledgments
We would like to thank Zheng Shao and Jiawei Han who suggested the use of shin-
gles for detecting similar methods. Adam Kiezun, Russ Ruffer, Filip Van Rysselberghe,
Danny Soroker, anonymous reviewers, and members of the SAG group at UIUC pro-
vided valuable feedback on the drafts of this paper. This work is partially funded
through an Eclipse Innovation Grant for which we are very grateful to IBM.
References
[APM04] Giuliano Antoniol, Massimiliano Di Penta, and Ettore Merlo. An automatic ap-
proach to identify class evolution discontinuities. In IWPSE’04: Proceedings of
International Workshop on Principles of Software Evolution, pages 31–40, 2004.
[Bor] What's new in Borland JBuilder 2005. http://www.borland.com/resources/en/pdf/
white papers/jb2005 whats new.pdf.
[Bro97] Andrei Broder. On the resemblance and containment of documents. In SE-
QUENCES ’97: Proceedings of Compression and Complexity of Sequences, pages
21–29, 1997.
[BTF05] Ittai Balaban, Frank Tip, and Robert Fuhrer. Refactoring support for class library
migration. In OOPSLA ’05: Proceedings of Object-oriented programming, systems,
languages, and applications, pages 265–279, New York, NY, USA, 2005. ACM
Press.
[DDN00] Serge Demeyer, Stéphane Ducasse, and Oscar Nierstrasz. Finding refactorings via
change metrics. In OOPSLA’00: Proceedings of Object oriented programming, sys-
tems, languages, and applications, pages 166–177, 2000.
[DJ05] Danny Dig and Ralph Johnson. The role of refactorings in API evolution. In
ICSM’05: Proceedings of International Conference on Software Maintenance,
pages 389–398, Washington, DC, USA, 2005. IEEE Computer Society.
[DKTE04] Alan Donovan, Adam Kiezun, Matthew S. Tschantz, and Michael D. Ernst. Con-
verting Java programs to use generic libraries. In OOPSLA ’04: Proceedings of
Object-oriented programming, systems, languages, and applications, volume 39,
pages 15–34, New York, NY, USA, October 2004. ACM Press.
[Ecl] Eclipse Foundation. http://eclipse.org.
[FBB+ 99] Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. Refac-
toring: Improving the Design of Existing Code. Addison-Wesley, 1999.
[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns:
Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[GW05] Carsten Gorg and Peter Weisgerber. Detecting and visualizing refactorings from
software archives. In IWPC’05: Proceedings of the 13th International Workshop
on Program Comprehension, pages 205–214, Washington, DC, USA, 2005. IEEE
Computer Society.
[GZ05] Michael W. Godfrey and Lijie Zou. Using origin analysis to detect merging and
splitting of source code entities. IEEE Transactions on Software Engineering,
31(2):166–181, 2005.
[HD05] Johannes Henkel and Amer Diwan. CatchUp!: Capturing and replaying refactorings
to support API evolution. In ICSE’05: Proceedings of International Conference on
Software Engineering, pages 274–283, 2005.
[KDLT04] Purushottam Kulkarni, Fred Douglis, Jason D. LaVoie, and John M. Tracey. Redun-
dancy elimination within large collections of files. In USENIX Annual Technical
Conference, General Track, pages 59–72, 2004.
[LLMZ04] Zhenmin Li, Shan Lu, Suvda Myagmar, and Yuanyuan Zhou. CP-Miner: A tool for
finding copy-paste and related bugs in operating system code. In OSDI’04: Pro-
ceedings of the Sixth Symposium on Operating System Design and Implementation,
pages 289–302, 2004.
[Man93] Udi Manber. Finding similar files in a large file system. Technical Report 93-33,
University of Arizona, 1993.
[OJ90] Bill Opdyke and Ralph Johnson. Refactoring: An aid in designing application
frameworks and evolving object-oriented systems. In SOOPPA’90: Proceedings of
Symposium on Object-Oriented Programming Emphasizing Practical Applications,
1990.
[Opd92] Bill Opdyke. Refactoring Object-Oriented Frameworks. PhD thesis, University of
Illinois at Urbana-Champaign, 1992.
[Rab81] Michael O. Rabin. Fingerprinting by random polynomials. Technical Report 15-81,
Harvard University, 1981.
[RBJ97] Don Roberts, John Brant, and Ralph E. Johnson. A refactoring tool for Smalltalk.
TAPOS, 3(4):253–263, 1997.
[RD03] Filip Van Rysselberghe and Serge Demeyer. Reconstruction of successful software
evolution using clone detection. In IWPSE’03: Proceedings of 6th International
Workshop on Principles of Software Evolution, pages 126–130, 2003.
[Ref] RefactoringCrawler's web page: https://netfiles.uiuc.edu/dig/RefactoringCrawler.
[RILD04] Lakshmish Ramaswamy, Arun Iyengar, Ling Liu, and Fred Douglis. Automatic
detection of fragments in dynamically generated web pages. In WWW ’04: Pro-
ceedings of the 13th international conference on World Wide Web, pages 443–454,
New York, NY, USA, 2004. ACM Press.
[Rob99] Don Roberts. Practical Analysis for Refactoring. PhD thesis, University of Illinois
at Urbana-Champaign, 1999.
[TB01] Lance Tokuda and Don Batory. Evolving object-oriented designs with refactorings.
Automated Software Engineering, 8(1):89–120, January 2001.
[Tho05] Dave Thomas. Refactoring as meta programming? Journal of Object Technology,
4(1):7–11, January-February 2005.
[TKB03] Frank Tip, Adam Kiezun, and Dirk Bauemer. Refactoring for generalization using
type constraints. In OOPSLA ’03: Proceedings of Object-oriented programing, sys-
tems, languages, and applications, volume 38, pages 13–26, New York, NY, USA,
November 2003. ACM Press.
[vDD04] Daniel von Dincklage and Amer Diwan. Converting Java classes to use generics.
In OOPSLA ’04: Proceedings of Object-oriented programming, systems, languages,
and applications, pages 1–14. ACM Press, 2004.
Modeling Runtime Behavior in Framework-Based Applications
1 Introduction
Large-scale applications are being built from increasingly many reusable frame-
works, such as web application servers (that use SOAP [5], EJB, JSP), portal
servers, client platforms (Eclipse), and industry-specific frameworks. Over the
past several years, our research group has analyzed the performance of dozens of
industrial framework-based applications. In every application we looked at, an
enormous amount of activity was executed to accomplish simple tasks. This was
the case even after some tuning effort had been applied. For example, a stock
brokerage benchmark [13] executes 268 method calls and creates 70 new objects
just to move a single date field from SOAP to Java. Beyond identifying bottle-
necks, this paper presents an approach to understanding the general causes of
runtime complexity and inefficiency in these applications.
In our experience, inefficiencies are not typically manifested in a few hot meth-
ods. They are mostly due to a constellation of transformations. Each transfor-
mation takes data produced in one framework and makes it suitable for another.
Problems are less likely to be caused by poor algorithm choices than by the
combined design and implementation choices made in disparate frameworks. In
a web-based server application, for example, the data arrives in one format, is
transformed into a Java business object, and is sent to a browser or another sys-
tem – e.g. from SOAP, to an EJB, and finally to XML. Surprisingly, inside each
transformation are often many smaller transformations; inside these are often
yet more transformations, each the result of lower-level framework coupling. In
addition, many steps are often required to facilitate these transformations. For
example, a chain of lookups may be needed to find the proper SOAP deserializer.
In our benchmark example, moving that date from SOAP to Java took a total
of 58 transformations.
How do we know if 58 transformations is excessive for this operation? And if
so, what could possibly require so many? Traditional performance tools model
runtime behavior in terms of implementation artifacts, such as methods, pack-
ages, and call paths [1, 2, 3, 8, 10, 20, 23]. Transformations, however, are imple-
mented as sequences of method calls, spanning multiple frameworks. In this
paper, we present an approach for understanding and quantifying behavior in
terms of transformations and what they accomplish. We demonstrate how this
model enables:
Fig. 1. A dataflow diagram of how the Trade benchmark transforms a date, from a
SOAP message to a Java object
processing of each subfield requires five copies and six type changes, indicating
poor framework coupling at the subfield level.
2 Structuring Behavior
We model runtime behavior using data flow. Using the raw data flow information
directly would give too much information, at too low a level, to make sense of. In
this section we present our approach to filtering and grouping activity into a
hierarchy of data flow diagrams.
Figure 1 shows a dataflow diagram from a configuration of the Trade 6 bench-
mark [13] that acts as a SOAP client.1 The figure follows the flow of one small
piece of information, a field representing the purchase date of a stock holding,
from a web service response into a field of the Java object that will later be used
for producing HTML. We follow this field because, of all the fields of a holding,
it is the most expensive to process.
Each edge shows the flow of the physical form of some logical content. In the
figure, the same purchase date is shown on three edges: first as some subset of the
(Footnote 1: We omit the standard data flow notation for sources and sinks, and instead represent them as unterminated edges.)
bytes in a SOAP response, then as a Java Calendar (and its subsidiary objects),
and finally as a Java Date. Each node denotes a transformation of that data,
and it groups together invocations of many methods or method fragments, drawn
from multiple frameworks. In Sections 3.3 and 3.4 we discuss transformations
and logical content in more depth.
Structuring in this way relates the cost of disparate activity to the data it pro-
duced. Figure 1 shows that the cost of the first transformation was 268 method
calls and 70 new objects, mostly temporaries.2 All this, just to produce an in-
termediate (Java object) form of the purchase date.
Section 3 gives more precise definitions of logical and physical change. Note that
some of the inputs to a transformation will be facilitators, such as schemas or
converters. In the diagram for that transformation, we also include the sequence
of transformations needed to produce these facilitators. Section 3.1 discusses
facilitators in more depth.
While one diagram shows data flow at a single level of granularity, it will
also show those transformations that transition between that granularity and
the next lower one. For example, the transformation that extracts a field from a
record will be included in the diagram of the record.
We form additional levels of diagram to distinguish the parties responsible for
a given cost. We define an architectural unit to be a set of classes. Given a set of
architectural units, a hierarchical dataflow diagram splits the behavior so that
the activity at one level of diagram is that caused by at most one architectural
unit. The choice of architectural units allows flexibility in assigning responsibility
for the existence of transformations. In our experience, architectural units do
not necessarily align with package structure. The diagram of Figure 1 shows the
field-level activity that the application initiates. Other field-level activity that
SOAP is responsible for is grouped under the first node. To analyze the behavior
that SOAP causes, we can zoom in, to explore a subdiagram.
Fig. 2. Zooming in on the first step of Figure 1 shows how the SOAP framework
transforms the purchase date field
436 N. Mitchell, G. Sevitsky, and H. Srinivasan
the date String is handed to the SimpleDateFormat library class for parsing. This is an expensive step, creating 39 objects
(38 of them temporaries). Below, we explore that diagram to find out why.3 It
then returns a new Date object, and joins that object with the original time
zone and milliseconds.
The Java library has two date classes. A Date object stores the number of
milliseconds since a fixed point in time. A Calendar stores a date in two different
forms, and can convert between them. One form is the same as in Date; the other
is seventeen integer fields that are useful for operating on dates, such as year,
month, day, hour, or day of the week.
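The two representations can be seen directly in the standard library; in the sketch below, yearOf and fromFields are illustrative helpers, not methods from the benchmark:

```java
import java.util.Calendar;
import java.util.Date;

// Date stores only a millisecond count; Calendar also maintains integer
// fields (year, month, day, ...) and converts between the two forms.
class TwoDateForms {
    static int yearOf(Date d) {
        Calendar c = Calendar.getInstance();
        c.setTime(d);                    // millisecond form -> field form
        return c.get(Calendar.YEAR);     // read one of the integer fields
    }

    static Date fromFields(int year, int month, int day) {
        Calendar c = Calendar.getInstance();
        c.clear();
        c.set(year, month, day);         // field form
        return c.getTime();              // -> millisecond form (a Date)
    }
}
```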
In the top row is an expensive transformation that builds a new default Cal-
endar from the current time. Our Date object is then used to set the value of this
Calendar again. Finally, that Calendar becomes the purchase date field of our
business object, via a reflective call to a setter method. Java’s reflection interface
requires the Calendar to first be packaged into an object array.
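The reflective setter call described above can be sketched as follows; HoldingBean and setPurchaseDate are hypothetical stand-ins for the benchmark's business object and its setter:

```java
import java.lang.reflect.Method;
import java.util.Calendar;

// A stand-in for the business object that receives the purchase date.
class HoldingBean {
    private Calendar purchaseDate;
    public void setPurchaseDate(Calendar c) { this.purchaseDate = c; }
    public Calendar getPurchaseDate() { return purchaseDate; }
}

class ReflectiveSet {
    static void setViaReflection(HoldingBean bean, Calendar date) throws Exception {
        Method setter = HoldingBean.class.getMethod("setPurchaseDate", Calendar.class);
        // Method.invoke takes its arguments as an Object[], so the Calendar
        // must first be packaged into a one-element array.
        Object[] args = new Object[] { date };
        setter.invoke(bean, args);
    }
}
```

The temporary Object[] is part of the per-field cost that the dataflow diagram attributes to this transformation.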
Diagram level 2. Figure 3 zooms in to show the Java library’s responsibility
for the SimpleDateFormat parse transformation. The String containing the date
is input, and each of its six subfields – year, month, day, hour, minute, and
second – is extracted and parsed individually.
The SimpleDateFormat maintains its own Calendar, different from the one
discussed earlier at the SOAP level. Once a subfield of date has been extracted
and parsed into an integer, the corresponding field of the Calendar is set. After
all six subfields are set, the Calendar converts this field representation into a
time representation. This is then used to create a new Date object.
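The sequence above can be sketched against the public java.text API (our own illustration, not the paper's code): the six set() calls mirror the six subfields, and getTime() performs the field-to-time conversion that produces the new Date.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

// Sketch of the subfield-by-subfield sequence described above.
public class SubfieldParse {
    // Set the six subfields one at a time, then convert fields -> time.
    static Date viaCalendar() {
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
        c.clear();
        c.set(Calendar.YEAR, 2006);
        c.set(Calendar.MONTH, Calendar.JULY);
        c.set(Calendar.DAY_OF_MONTH, 3);
        c.set(Calendar.HOUR_OF_DAY, 12);
        c.set(Calendar.MINUTE, 30);
        c.set(Calendar.SECOND, 15);
        return c.getTime(); // field representation -> new Date
    }

    // SimpleDateFormat performs the same steps against its own Calendar.
    static Date viaFormat() throws Exception {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f.parse("2006-07-03 12:30:15");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaCalendar().equals(viaFormat())); // prints true
    }
}
```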
Diagram level 3. Figure 4 shows the detail of extracting and parsing a single
date subfield, in this case, a year. Even at this microscopic level, the standard
Java library requires six transformations to convert a few characters in the String
(in “YYYY” representation) into the integer form of the year.
3 It often seems that things named “Simple” are expensive.
Modeling Runtime Behavior in Framework-Based Applications 437
Fig. 4. Zooming into the first step of Figure 3 shows how the standard Java library’s
number-handling code transforms a subfield of a purchase date (such as a year, month,
or day)
The first five transformations come from the general purpose DecimalFormat
class. It can parse or format any kind of decimal number. SimpleDateFormat,
however, uses it for a special case, to parse integer months, days, and years. The
first, fifth, and sixth transformations are necessary only because of this
overgenerality. The first transformation looks for a decimal point and an E for
scientific notation, and rewraps the characters.4 Furthermore, since DecimalFormat.parse()
returns either a double or long value, the fifth transformation is needed to box
the return value into an Object, and the sixth transformation is only necessary
to unbox it.
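The boxing described above is visible in the public API: DecimalFormat.parse returns a boxed Number, which the caller must unbox. A minimal sketch (our own, not the library-internal code path):

```java
import java.text.DecimalFormat;
import java.text.ParsePosition;

// Sketch showing the boxed return value of DecimalFormat.parse.
public class OvergeneralParse {
    static Number parseBoxed(String s) {
        // parse returns a Number: a Long, or a Double for fractional input
        return new DecimalFormat().parse(s, new ParsePosition(0));
    }

    public static void main(String[] args) {
        Number boxed = parseBoxed("2006");  // boxed into a Long
        int year = boxed.intValue();        // unboxed by the caller
        System.out.println(boxed instanceof Long); // prints true
        System.out.println(year);                  // prints 2006
    }
}
```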
3 Classifying Behavior
Table 1. We classify data flowing along an edge according to this taxonomy of how
the subsequent transformation uses it
Phenomenon   Example
carrier      the Java form of an Employee object
schema       Java class info; record layout
metadata
facilitator
Figure 5 shows the SOAP level of parsing of a Date, with the input of each
transformation classified according to purpose. Note the carriers along the middle
row. Also note facilitators such as converters: the Calendar in the top row, the
SimpleDateFormat in the middle row, and the Deserializer in the bottom row.
All three serve the same broad purpose, though their implementations and the
kinds of conversions they facilitate are different. One input to the “parse using
SimpleDateFormat” transformation is a ParsePosition, a Java library class that
maintains a position in a String; it acts as a cursor.
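The cursor behavior of ParsePosition can be sketched as follows (standard java.text API; the input string is our own example): the format object advances the position past the characters it consumed.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;

// Sketch of ParsePosition acting as a cursor into the input String.
public class CursorDemo {
    // Returns how far the cursor advanced while parsing a date prefix.
    static int consumed(String s) {
        ParsePosition pos = new ParsePosition(0);
        new SimpleDateFormat("yyyy-MM-dd").parse(s, pos);
        return pos.getIndex(); // the format advanced the cursor
    }

    public static void main(String[] args) {
        System.out.println(consumed("2006-07-03 rest")); // prints 10
    }
}
```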
Note that the same data may be used as input to more than one transforma-
tion. In this case, it may serve multiple purposes. The Calendar in the top row
of Figure 5 first serves as a converter when it facilitates the “set time” trans-
formation, and then as a carrier of the purchase date into the “box into array”
transformation.
Classifying by purpose helps to assess the appropriateness of costs. For exam-
ple, one would not expect the initialization of a converter to depend on the data
being converted, but only on the type of data. It would seem strange, then, to
see many converters for the parsing of fields. The scenario of Figure 5 requires
three converters to process a field.
Fig. 5. Showing the same dataflow diagram as Figure 2, with the edges classified by
data purpose
exchange), or alter just the value represented (value add). Note that a given
transformation may map to more than one phenomenon. For example, a trans-
formation that formats stock holdings and quotes into HTML is both a join of
the two sets of records, and is adding value by formatting them.
Figure 4 shows the six transformations to process a subfield of date in our
Trade example. The first of the six transformations extracts the digits of the
subfield (e.g. year, month, day) from a String representing the entire date. The
last five of the six transformations preserve the information content. Looking
at the analysis scenario in this way – as one extraction and five information
preserving transformations – makes it clear what was (not) accomplished.
4 Quantifying Behavior
This section presents two classes of metrics for quantifying complexity and re-
source usage of framework-based applications. Both quantify behavior in terms
independent of any one framework, enabling meaningful evaluation and compar-
ison across applications.
Table 5. Two behavior signatures of Figure 5, with transformations and object cre-
ations broken down by change in logical content
Note that for the latter behavior signature, while we measure objects created
by all sub-transformations, in this case we chose to assign those costs based on
the category label of just the top-level transformations. This allows the developer
of the code at that level, who controls only how the top-level transformations
affect logical content, to consider the cumulative costs incurred by his or her
choices.
Table 6. Two flow-induced behavior signatures for Figure 5 that break down cost by
the purpose of data produced
5 Further Examples
This section presents two examples that demonstrate the power of the metrics
presented in Section 4.
           implementation                                 bit
           (fragments from various classes)              change  copy  exchange  carrier
pre 1.4.2  append(String.valueOf(x))                        1      2      0         1
1.4.2      char[] A = threadLocal.get();
           Integer.getChars(x,A);
           append(A);                                       1      2      1         0
1.5.0      ensureCapacity(stringSizeOfInt(x));
           char[] A = this.value;
           Integer.getChars(x,A);                           1      1      0         0
The Java 1.4.2 implementation avoids the construction of a new carrier object,
but adds a lookup transformation instead (to fetch that array from thread-local
storage). In the most recent implementation, for Java 1.5, StringBuffer simply
asks Integer to fill in its own character array directly.
Each row of Table 7 is a behavior signature that captures the runtime com-
plexity of an implementation. It is natural that appending should, at a minimum,
require a copy. We’d also expect, since integers and characters have different rep-
resentations, to see one bit-changing transformation. The behavior signature of
the third implementation shows these and nothing more.
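The contrast between the first and last rows of the table can be sketched as follows; Integer.getChars is package-private, so the digit-filling loop below is our own approximation of it (positive integers only).

```java
// Sketch contrasting the pre-1.4.2 idiom, which boxes the int into a
// temporary String carrier, with a direct char-array fill in the
// spirit of the 1.5.0 implementation.
public class AppendInt {
    // pre-1.4.2 style: box the int into a temporary String carrier.
    static String viaString(int x) {
        return new StringBuffer().append(String.valueOf(x)).toString();
    }

    // 1.5.0 style (approximated): write digits straight into a char[].
    static String direct(int x) {
        char[] buf = new char[Integer.toString(x).length()];
        for (int i = buf.length - 1, v = x; i >= 0; i--, v /= 10)
            buf[i] = (char) ('0' + v % 10);  // fill least-significant digit first
        return new StringBuffer().append(buf).toString();
    }

    public static void main(String[] args) {
        System.out.println(viaString(1234).equals(direct(1234))); // prints true
    }
}
```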
Note that this analysis also highlights a difference between the two appli-
cations. Upon closer inspection, we found that the two applications used very
different physical models for their business objects. This points out one of the
challenges in designing good benchmarks for framework-based applications: to
capture the great variety of reasons things can go wrong.
6 Related Work
The design patterns work provides a vocabulary of common architectural and im-
plementation idioms [11]. Allowing developers to relate specific implementations
to widely known patterns has been of immense value to how they conceptualize,
communicate, and evaluate designs. While design patterns abstract the struc-
ture of an implementation, our phenomena abstract what a run accomplishes
in the transformation of data. Other work introduces classification in abstract
terms for component interconnections [18] and for characterizing configuration
complexity [6].
Recent work on mining jungloids [15] addresses a similar problem to ours, but
at development time. They observe that, in framework-based applications, the
coding process is difficult, due to the need to navigate long chains of framework
calls.
There are many measures of code complexity and ways to normalize them,
such as function points analysis [16], cyclomatic complexity [17], and the main-
tainability index [24]. Our measures are geared toward evaluating runtime be-
havior, especially as it relates to surfacing obstacles to good performance.
Performance understanding tools assign measurements to the artifacts of a
specific application or framework [1, 2, 3, 8, 10, 14, 20, 23]. Some have identified
that static classes do not capture the dynamic behavior of objects [3, 14]. Char-
acterization approaches [9, 21], on the other hand, allow comparisons across ap-
plications, but usually in terms of low-level, highly aggregated physical measures,
leaving little room for understanding what is occurring and why. By combining
measurement with a framework-independent vocabulary of phenomena, we are
able to provide a descriptive characterization. The work on characterizing con-
figuration complexity [6] has similar benefits in its domain.
There is much work on using data flow diagrams, at design time, to capture the
flow of information through processes at a conceptual level [7, 12]. In contrast,
compilers and some tools analyze the data flow of program variables in source
code [22]. In our work we use the data flow of logical content to structure runtime
artifacts. This also sets us apart from existing performance tools, which typically
organize activity based on control flow.
Finally, there is much work on recovering the design of complex applica-
tions [4, 19].
8 Conclusions
The extensive reuse of frameworks has been a boon for the development of
large-scale applications. The flip side seems to be complex and poorly
performing programs. Developers cannot make informed design decisions because
costs are hidden from them. Moreover, framework designers cannot predict the
usage of their components. They must either design overly general frameworks,
or ones specialized for use cases about which they can only guess. Our intent
in this paper has been to introduce a way to frame discussions and analysis of
this kind of complexity.
Acknowledgments
We wish to thank Tim Klinger, Harold Ossher, Barbara Ryder, Edith Schonberg,
and Kavitha Srinivas for their contributions.
References
1. Alexander, W.P., Berry, R.F., Levine, F.E., Urquhart, R.J.: A unifying approach to
performance analysis in the Java environment. IBM Systems Journal 39(1) (2000)
2. Ammons, G., Choi, J., Gupta, M., Swamy, N.: Finding and removing performance
bottlenecks in large systems. In: The European Conference on Object-Oriented
Programming. (2004)
3. Arisholm, E.: Dynamic coupling measures for object-oriented software. In: Sym-
posium on Software Metrics. (2002)
4. Bellay, B., Gall, H.: An evaluation of reverse engineering tool capabilities. Journal
of Software Maintenance: Research and Practice 10 (1998)
5. Box, D., Ehnebuske, D., Kakivaya, G., Layman, A., Mendelsohn, N., Nielsen, H.F.,
Thatte, S., Winer, D.: Simple object access protocol (SOAP) 1.1. Technical Re-
port 08, W3C World Wide Web Consortium (2000)
6. Brown, A.B., Keller, A., Hellerstein, J.L.: A model of configuration complexity
and its application to a change management system. In: Integrated Management.
(2005)
7. Coad, P., Yourdon, E.: Object-Oriented Analysis. 2 edn. Prentice-Hall, Englewood
Cliffs, NJ (1991)
8. De Pauw, W., Mitchell, N., Robillard, M., Sevitsky, G., Srinivasan, H.: Drive-by
analysis of running programs. In: Workshop on Software Visualization. (2001)
9. Dieckmann, S., Hölzle, U.: A study of the allocation behavior of the SPECjvm98
Java benchmark. In: The European Conference on Object-Oriented Programming.
(1999) 92–115
10. Dufour, B., Driesen, K., Hendren, L., Verbrugge, C.: Dynamic metrics for Java.
In: Object-Oriented Programming, Systems, Languages, and Applications. (2003) 149–168
11. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of
Reusable Object-Oriented Software. Addison-Wesley (1994)
12. Gane, C., Sarson, T.: Structured Systems Analysis. Prentice-Hall, Englewood
Cliffs, NJ (1979)
13. IBM: Trade web application benchmark.
(http://www.ibm.com/software/webservers/appserv/wpbs_download.html)
14. Kuncak, V., Lam, P., Rinard, M.: Role analysis. In: Symposium on Principles of
Programming Languages. (2002)
15. Mandelin, D., Xu, L., Bodik, R., Kimelman, D.: Jungloid mining: Helping to
navigate the API jungle. In: Programming Language Design and Implementation.
(2005)
16. Marciniak, J.J., ed.: Encyclopedia of Software Engineering. John Wiley & Sons
(2004)
17. McCabe, T.J., Watson, A.H.: Software complexity. Crosstalk, Journal of Defense
Software Engineering 7(12) (1994) 5–9
18. Mehta, N.R., Medvidovic, N., Phadke, S.: Towards a taxonomy of software con-
nectors. In: International Conference on Software Engineering. (2000)
19. Richner, T., Ducasse, S.: Using dynamic information for the iterative recovery of
collaborations and roles. In: International Conference on Software Maintenance.
(2002)
20. Sevitsky, G., De Pauw, W., Konuru, R.: An information exploration tool for per-
formance analysis of java programs. In: TOOLS Europe 2001, Zurich, Switzerland
(2001)
21. Sherwood, T., Perelman, E., Hamerly, G., Calder, B.: Automatically character-
izing large scale program behavior. In: Architectural Support for Programming
Languages and Operating Systems. (2002)
22. Tip, F.: A survey of program slicing techniques. Journal of Programming Lan-
guages (1995)
23. Walker, R.J., Murphy, G.C., Steinbok, J., Robillard, M.P.: Efficient mapping of
software system traces to architectural views. In: CASCON. (2000) 31–40
24. Welker, K.D., Oman, P.W.: Software maintainability metrics models in practice.
Crosstalk, Journal of Defense Software Engineering 8(11) (1995) 19–23
Modular Software Upgrades for Distributed Systems
1 Introduction
A third point is that upgrades must be able to retain yet transform persistent state.
Persistent state may need to be transformed in some application-dependent way, e.g., to
move to a new file format; and transformations can be costly, e.g., if the local file state
is large. We do not attempt to preserve volatile state (e.g., open connections) because
upgrades can be scheduled (see below) to minimize the inconvenience to users of
losing volatile state.
A fourth requirement is automatic deployment. The systems of interest are too large
to upgrade manually (e.g., via remote login). Instead, upgrades must be deployed auto-
matically: the upgrader defines an upgrade at a central location, and the upgrade system
propagates and installs it on each node.
A fifth requirement is controlled deployment. The upgrader must be able to control
when nodes upgrade. Reasons for controlled deployment include: allowing a system to
provide service while an upgrade is happening, e.g., by upgrading replicas in a repli-
cated system one-at-a-time (especially when the upgrade involves a time-consuming
persistent state transform); testing an upgrade on a few nodes before installing it every-
where; and scheduling an upgrade to happen at times when the load on nodes being
upgraded is light.
A sixth requirement is continuous service. Controlled deployment implies there can
be long periods of time when the system is running in mixed mode, i.e., when some
nodes have upgraded and others have not. Nonetheless, the system must provide service,
even when the upgrade is incompatible. This implies the upgrade system must provide
a way for nodes running different versions to interoperate, without restricting the kinds
of changes an upgrade can make.
Our system provides an upgrade infrastructure that supports these requirements. We
make two main contributions. Ours is the first approach to provide a complete solution
for automatic and controlled upgrades in distributed systems. It allows upgraders to
define scheduling functions that control upgrade deployment, transform functions that
control transforming persistent state, and simulation objects that enable the system to
run in mixed mode. Our techniques are either entirely new, or are major extensions
of what has been done before. We support all schedules used in real systems, and our
support for mixed mode improves on what is done in practice and is more powerful than
earlier approaches based on wrappers [12, 29, 24], which support only a very restricted
set of upgrades.
Second, our approach provides a way to understand and specify mixed mode. In par-
ticular, we address the question: what should happen when a node runs several versions
at once, and different clients interact with the different versions? We address this ques-
tion by defining requirements for upgrades and providing a way to specify upgrades that
enables reasoning about whether the requirements are satisfied. The specification cap-
tures the meaning of executions in which different clients interact with different versions
of an object and identifies when calls must fail due to irreconcilable incompatibilities.
The upgrade requirements and specification technique are entirely new.
We have implemented a prototype, called Upstart, that automatically deploys up-
grades on distributed systems. We present results of experiments that show that our in-
frastructure introduces only modest overhead, and therefore our approach is practical.
454 S. Ajmani, B. Liskov, and L. Shrira
We also discuss the usability of our approach in the context of several upgrades we have
implemented and run.
The remainder of the paper is organized as follows. Section 2 presents an overview
of our approach. Section 3 describes how to specify upgrades. Sections 4–6 discuss
the three core components of our approach; Section 7 presents an example upgrade.
Section 8 evaluates the overhead of our prototype, Section 9 discusses related work,
and Section 10 concludes. A more detailed discussion of the approach can be found in
a technical report [1].
2 Overview
This section presents an overview of our methodology and infrastructure.
We model a distributed system as a collection of objects. An object has an identity, a
type that defines its behavior, and a state; it is an instance of a class that defines how it
implements its type. Objects communicate by calling one another’s methods (e.g., via
RPC [27]); extending the model to general message-passing is future work. A portion of
an object’s state may be persistent. A node may fail at any point; when the node recovers,
the object reinitializes itself from the persistent portion of its state.
To simplify the presentation, we assume each node runs a single top-level object
that responds to remote calls. Thus, each node runs a top-level class—the class of the
top-level object. Upgrades are limited to replacing top-level classes: we upgrade entire
nodes at once. The top-level object may of course make use of other objects on its node
to respond to requests, and an upgrade will also contain new code for these lower-level
objects. We could extend this model to allow multiple top-level objects per node, in
which case each could be upgraded independently.
An upgrade moves a system from one version to the next by specifying a set of
class upgrades, one for each (top-level) class that is being replaced. The initial version
has version number one (1) and each subsequent version has the succeeding version
number.
A class upgrade has six components: oldClass, newClass, TF, SF, pastSO, futureSO.
OldClass identifies the class that is now obsolete; newClass identifies the class that is to
replace it. TF identifies a transform function that generates an initial persistent state for
the new object from the persistent state of the old one. SF identifies a scheduling func-
tion that tells a node when it should upgrade. PastSO and futureSO identify classes for
simulation objects that enable nodes to interoperate across versions. A futureSO object
allows a node to support the new class’s behavior before it upgrades; a pastSO object
allows a node to support the old class’s behavior after it upgrades. These components
can be omitted when not needed.
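A hedged sketch of these six components as a Java class; the field names follow the text, but the types (plain Class objects, single-method interfaces for the transform and scheduling functions) are our own illustration, not Upstart's actual API.

```java
// Sketch of the six class-upgrade components named above.
public class ClassUpgrade {
    interface TransformFunction {
        Object transform(Object oldPersistentState); // old state -> new state
    }
    interface SchedulingFunction {
        boolean timeToUpgrade(); // tells a node when it should upgrade
    }

    final Class<?> oldClass;     // the class that is now obsolete
    final Class<?> newClass;     // the class that replaces it
    final TransformFunction tf;  // generates the new persistent state
    final SchedulingFunction sf; // controls when the node upgrades
    final Class<?> pastSO;       // simulates the old type after upgrading
    final Class<?> futureSO;     // simulates the new type before upgrading

    ClassUpgrade(Class<?> oldClass, Class<?> newClass,
                 TransformFunction tf, SchedulingFunction sf,
                 Class<?> pastSO, Class<?> futureSO) {
        this.oldClass = oldClass; this.newClass = newClass;
        this.tf = tf; this.sf = sf;
        this.pastSO = pastSO; this.futureSO = futureSO; // may be null if unneeded
    }
}
```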
The effect of an upgrade is (ultimately) to cause every node running an object of an
old class to instead run an object of the new one. We could add filters to the model that
would determine some subset of nodes that need to upgrade. Adding filters is enough
to allow restructuring a system in arbitrary ways. Of course it is also possible (without
using upgrades) to add new nodes to a system and to initialize them to run either existing
classes or entirely new ones.
When the UL learns of a newer version, it communicates with the upgrade server
to download a small upgrade description. Then it checks whether the upgrade affects
it, i.e., whether the upgrade contains an old class that is running at the node. (A node
might be several versions behind, but it can process the upgrades one-by-one.) If the
node is affected, the UL fetches the class upgrade components that concern it; drains
any currently-executing RPCs; then starts a future SO if necessary, e.g., if the new type
is a subtype of the old one, or if the upgrade is incompatible.
Next, the upgrade layer invokes the class upgrade’s scheduling function, which runs
in parallel with the node’s other processing. The scheduling function notifies the UL
when it is time to upgrade.
To upgrade, the UL restarts the node and runs the transform function to convert the
node’s persistent state to the representation required by the new class. After this, the
UL does “normal” node recovery, during which it creates the current object and the
SOs. Because SOs delegate toward the current object, the UL must create them in an
order that allows this. First, it creates the current object, which recovers from the newly-
transformed persistent state. Then it creates any past and future SOs as needed, in order
of their distance from the current object.
Finally, the upgrade layer notifies the upgrade database that its node is running the
new version.
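The recovery order above can be sketched as follows (our own code, not Upstart's); it produces one order consistent with the rule that every SO's delegation target toward the current object already exists when the SO is created.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one valid creation order: the current object first, then
// past and future SOs so that each SO's neighbor toward the current
// object already exists.
public class RecoveryOrder {
    static List<String> creationOrder(int current, int oldest, int newest) {
        List<String> order = new ArrayList<>();
        order.add("current v" + current);       // recovers from transformed state
        for (int v = current - 1; v >= oldest; v--)
            order.add("pastSO v" + v);          // delegates toward current
        for (int v = current + 1; v <= newest; v++)
            order.add("futureSO v" + v);        // delegates toward current
        return order;
    }

    public static void main(String[] args) {
        // Node at version 3, still serving v1 and v2, already simulating v4:
        System.out.println(creationOrder(3, 1, 4));
        // prints [current v3, pastSO v2, pastSO v1, futureSO v4]
    }
}
```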
When all nodes have moved to a new version, the previous version can be retired (or
this could happen on command). Information about retirement arrives in messages from
the upgrade server. In response, a UL discards past SOs for retired versions. This can be
done lazily, since keeping past SOs around does not affect the behavior or performance
of later versions.
3 Specifying Simulation
A key contribution of our approach is that we allow simulation so that nodes running
different versions can nevertheless interact. But for simulation to make sense, we need
to explain what it means.
Simulation enables a node to support multiple types. It implements its current type
using its current object; it simulates old types (of classes that it upgraded from in the
past) using past SOs and new types (of classes that it will upgrade to in the future) using
future SOs. Some clients interact with the node via the current type, while others interact
via an older or newer type. Yet all the objects implementing these types share a single
identity and thus each call needs to affect and be affected by the others. It’s straightfor-
ward to define these interactions when the old and new class implement the same type,
or one is a subtype of the other [19], because in these cases the types already have a
relationship that defines the meaning of the upgrade. Things get interesting, however,
when there is an incompatible upgrade: when the two types are unrelated by subtyping.
This section explains what it means to simulate correctly. We capture the effects of
simulation for a particular class upgrade by defining a specification for the upgrade; the
specification guides the design of the simulation objects and transform function.
Correct simulation must support reasoning about client programs, not only when
they call nodes that are running their own version, but also when they call nodes that are
running newer or older versions, when they interact with other clients that are using the
same node via a different version, and when the client itself upgrades and then continues
using a node it was using before it upgraded. Furthermore upgrades of servers should be
transparent to clients: clients should not notice when a node upgrades and changes its
current type (except that more or fewer calls may fail as discussed below). Essentially,
we want nodes to provide service that makes sense to clients, and we want this service
to make sense across upgrades of nodes and clients.
We begin by defining some requirements that an upgrade must satisfy. Clearly, we
require:
Type Requirement. The class for each version must implement its type.
In particular, the class implementing a future SO must implement the new type, and a
class implementing the past SO must implement the old one. This requirement ensures
that a client’s call behaves as expected by that client.
However, we also need to define the effects of interleaving. Interleaving occurs when
different clients running different versions interact with the same node, e.g.,
O1.m(args); O1.m(args); [version 2 introduced at server];
O1.m(args); O2.p(args); [server upgrades from 1 to 2];
O1.m(args); O2.p(args); [version 1 retired];
O2.p(args); O2.p(args);
where ON is the object with which version N clients interact. Between the introduction
of version 2 and the retirement of version 1, there can be an arbitrary sequence of calls
to O1 and O2 . If the server is supporting more than two types, calls to objects of all
supported types can be interleaved. Although these calls can be running concurrently,
we assume they occur one-at-a-time in some serial order; we discuss concurrency in
Section 4.1.
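The shared-identity point can be sketched with a minimal model (ours, not Upstart's code): version-1 and version-2 clients call different objects, but a mutation through one is observed through the other.

```java
// Minimal model of interleaving: objects O1 and O2 present different
// types, but share one identity (one state), so each call reflects
// the effects of earlier calls made through the other.
public class MixedModeNode {
    static class SharedState { int value; }  // the node's single identity

    static class V1 {                        // object used by version-1 clients
        final SharedState s;
        V1(SharedState s) { this.s = s; }
        void m(int x) { s.value = x; }       // a mutator
    }

    static class V2 {                        // object used by version-2 clients
        final SharedState s;
        V2(SharedState s) { this.s = s; }
        int p() { return s.value; }          // an observer
    }

    public static void main(String[] args) {
        SharedState node = new SharedState();
        V1 o1 = new V1(node);
        V2 o2 = new V2(node);
        o1.m(42);                            // mutation via the old type...
        System.out.println(o2.p());          // ...visible via the new type: prints 42
    }
}
```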
To define what happens with interleaving we require:
Sequence Requirement. Each event in the computation at a node must reflect the ef-
fects of all earlier events in the computation in the order they occurred.
An event is a call, an upgrade, or the introduction of a version.
This requirement means method calls to a current object or SO must reflect the effects
of calls made to the others. If the method is an observer, its return value must reflect all
earlier modifications made via other objects; if it is a mutator, its effects must reflect all
earlier modifications made via other objects, and must be visible to later calls made via
other objects.
When the node upgrades and its current type changes, observations made via any of
the objects after the upgrade must reflect the effects of all modifications made via any
object before the upgrade. For example, if a node is running several versions of a file
system, modifications to a file using one of the versions must be visible to clients using
the others and must continue to be visible after the node upgrades.
Together, the type and sequence requirements can be overconstraining: it may not
be possible to satisfy both of them for all possible computations. When this happens,
we resolve the problem by disallowing calls. The system causes disallowed calls to fail
(i.e., to throw a failure exception). In essence, we meet the requirements above by ruling
out calls that would otherwise cause problems. However, we require:
Disallow Constraint. Calls to the current object must not be disallowed.
In other words, we can only disallow calls to past and future SOs. The rationale is that
the current object provides the “real behavior” of the node, so it should not be affected
by the node’s support for other versions. Another point is that the code that implements
the current object need not be concerned with whether there are simulation objects also
running at its node, and therefore we simplify the implementation that really matters.
Disallowing takes advantage of the fact that any RPC can fail, e.g., because of net-
work problems, so that clients won’t be surprised by such a failure.
void FlavorSet.$insertColor(x, c)
  effects: ¬(∃ f. ⟨x, f⟩ ∈ this_pre) ⇒ this_post = this_pre ∪ {⟨x, grape⟩}
void FlavorSet.$delete(x)
  effects: this_post = this_pre − {⟨x, f⟩}
These definitions satisfy invariant (1). They do not work for invariant (2) since in that
case the shadows must preserve the color-flavor mapping. Our original mapping func-
tion and shadow methods would work for invariant (3), but we could use weaker defin-
itions, e.g., define FlavorSet.$delete to have no effect.
There was no need to disallow any methods in the example above. But sometimes dis-
allowing is needed.
When we specify an upgrade we implicitly define a “compound type,” T_old&new.
This type has the methods of both T_old and T_new. Its objects contain the old
state and the new state, and they satisfy the invariant I.
The specification of a mutator is a combination of its original specification and its
shadow specification provided in the upgrade; the former defines its effect on its own
type, and the latter defines its effect on the other type in the upgrade. E.g., the specifi-
cation of insertFlavor states its effect on the FlavorSet (its original specification) and
on the ColorSet (as defined by the specification of ColorSet.$insertFlavor).
If T_old&new is a subtype of both the old and new types, the simulation is working
properly, since users will always see the behavior they expect. In the case of the upgrade from
FlavorSet to ColorSet, this subtype property holds. But sometimes it doesn’t, and in
this case we solve the problem by disallowing. We might disallow all calls to a method,
or only some calls, based on the parameters of the call or the current state of the object.
For example, consider an upgrade that replaces GrowSet with IntSet; a GrowSet is
like an IntSet except that it never shrinks because it has no delete method. The shadow
of delete on a GrowSet object must remove the deleted object, assuming the invariant
that the two objects have the same elements. Since GrowSet objects never shrink, we
must disallow the delete method in the future SO for IntSet. However, once the node
upgrades, we can no longer disallow this method since the current object is now an
IntSet. Therefore the state of the past SO for GrowSet can shrink. Since this does
not match the specification of GrowSet, we must disallow any GrowSet methods that
would expose the problem. Thus we would need to disallow GrowSet.isIn.
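The GrowSet-to-IntSet example can be sketched as follows; the class layout and the FailureException name are our own, but the behavior follows the text: the future SO delegates insert and isIn, and disallows delete by failing the call.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of a future SO that disallows a method, per the text.
public class FutureSODemo {
    static class FailureException extends RuntimeException {}

    // GrowSet: like an IntSet except it never shrinks (no delete).
    static class GrowSet {
        final Set<Integer> elems = new HashSet<>();
        void insert(int x) { elems.add(x); }
        boolean isIn(int x) { return elems.contains(x); }
    }

    // Future SO presenting the IntSet type on a node whose current
    // object is still a GrowSet.
    static class IntSetFutureSO {
        final GrowSet current;
        IntSetFutureSO(GrowSet g) { current = g; }
        void insert(int x) { current.insert(x); }        // delegates
        boolean isIn(int x) { return current.isIn(x); }  // delegates
        void delete(int x) { throw new FailureException(); } // disallowed
    }

    public static void main(String[] args) {
        IntSetFutureSO so = new IntSetFutureSO(new GrowSet());
        so.insert(7);
        System.out.println(so.isIn(7)); // prints true
        try { so.delete(7); } catch (FailureException e) {
            System.out.println("delete disallowed");
        }
    }
}
```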
Thus disallowing is done differently for the future SO and the past SO: for the future
SO we only disallow methods of the new type, while in the past SO, we only disallow
old type methods. These restrictions on disallowing follow from our disallow constraint:
they ensure that all methods of the current object are allowed.
To disallow for the future SO, we proceed as follows. First we disallow all mutators
of the new type whose shadow definitions for the old type would cause violations of
the specification of the old type; this disallowing will ensure that T_old&new is a subtype
of the old type. In addition, if the new shadows of any old type methods violate the
specification of the new type, we disallow new methods that expose these violations;
this ensures that users of the future SO won’t notice that something strange is going on.
The situation for the old type is similar. We disallow any old methods whose shadows
would cause violations of the specification of the new type; this way we will obtain a
subtype of the new type. Also, if any shadows of the new type methods violate the
specification of the old type, we disallow old methods that expose these violations to
ensure that users won’t see the odd behavior.
This notion of “exposing violations” has a different meaning for past and future SOs,
because a future SO will eventually become the current object and at that point all its
methods will be allowed. These calls represent another way of noticing a violation, and
must be taken into account when disallowing. For example, consider the reverse up-
grade (from IntSet to GrowSet). The future SO in this case must disallow both isIn
and insert. It must disallow insert because once the GrowSet becomes the current ob-
ject, calls of isIn will be allowed, and at that point the absence of an object that had
previously been inserted into the GrowSet object would be noticed!
Weakening the invariant can reduce the need to disallow. For example, if we allowed
the GrowSet object to contain a superset of the elements of the IntSet object, we would
not need to disallow any methods in either the past or future SO.
In general, the upgrader should choose the weakest invariant that makes sense for
the two types in the upgrade, in order to disallow as little as possible. Disallowing is
unlikely to be what users want; therefore the upgrader may choose to avoid it by using
an accelerated schedule for the upgrade (see Section 6).
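As a sketch of how an SO might enforce disallowing at run time (the classes and the `installed` flag here are our own invention, not part of the infrastructure), the future SO for the reverse upgrade could reject both methods until its upgrade is installed:

```python
class DisallowedError(Exception):
    """Raised when a call arrives for a method the SO must disallow."""

class IntSet:
    def __init__(self):
        self.elems = set()
    def insert(self, x):
        self.elems.add(x)
    def delete(self, x):          # IntSet can lose elements; GrowSet cannot
        self.elems.discard(x)
    def isIn(self, x):
        return x in self.elems

class FutureGrowSetSO:
    """Future SO for the reverse (IntSet -> GrowSet) upgrade."""
    DISALLOWED = {"insert", "isIn"}   # computed from the disallow rules

    def __init__(self, delegate):
        self.delegate = delegate
        self.installed = False        # flips when the upgrade is installed

    def insert(self, x):
        self._check("insert")
        self.delegate.insert(x)

    def isIn(self, x):
        self._check("isIn")
        return self.delegate.isIn(x)

    def _check(self, name):
        if not self.installed and name in self.DISALLOWED:
            raise DisallowedError(name)
```

Once the upgrade is installed the SO becomes the current object and all its methods are allowed, which is exactly why insert had to be disallowed beforehand.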
The previous sections have discussed what is needed to specify and upgrade in isolation,
assuming that no other upgrade is “active.” In other words we considered a system that
was everywhere running a particular version, and defined an upgrade to move it to the
next version. Now we consider a more general case, in which more than one upgrade
may be in progress.
If some upgrades are in progress when a new one is defined, and if some of those
earlier upgrades are incompatible, we are in a situation where the previous upgrade
is actually defining not T new but in fact T old&new . Therefore, we need to extend our
specification approach so that we define the intended behavior of these extra methods—
the ones in T old&new of the previous upgrade that aren’t also in T new of the previous
upgrade. The extra methods are precisely the shadows of the mutators of the old type.
(We do not need to consider the shadow definitions for the mutators of the new type
because those details are handled by the previous implementation.)
Thus we need to provide shadows for these shadows. In addition, we need to use
T old&new from the previous upgrade when deciding what methods to disallow for the
past and future SOs of the current upgrade.
As an example, suppose we define a second upgrade to follow the upgrade from Col-
orSet to FlavorSet. This second upgrade defines a CommentSet in which each element
of the set has both a flavor and an associated comment. This upgrade is compatible since
CommentSet is a subtype of FlavorSet.
However to define the upgrade we need to provide an explanation of the effect
of a call on ColorSet.insertColor on the CommentSet. This is done by considering
FlavorSet.$insertColor; the specification of this shadow explains the effect of run-
ning the insertColor on the FlavorSet. We provide a shadow for this method, Com-
mentSet.$$insertColor, which explains the additional effect on the CommentSet. In
this example, it isn’t necessary to disallow any new methods because we have the sub-
type property.
One point about writing these specifications is that a kind of “transitive” disallowing
is possible. Suppose the specification for the old upgrade disallows a method of the new
type. Then when we shadow this method, there are two cases: either it is disallowed
(because its upgrade hasn’t yet been installed) or not. However, when the old method is
disallowed, this necessarily implies that the new one is too. Therefore we require that
462 S. Ajmani, B. Liskov, and L. Shrira
the shadow specification only explain what happens when the method being shadowed
is allowed.
The key question about specifying upgrades when many upgrades are in progress is
modularity: how much does an upgrader need to know to specify an upgrade? Clearly
the upgrader must know the old and new types of the current upgrade plus the specifica-
tion of the earlier upgrade. However, this earlier upgrade has both an old and new type,
and it’s possible that in order to understand its specification it is necessary to understand
both of them. Fortunately, this appears to not be necessary most of the time because the
shadows of T old methods are usually specified in terms of the T new state; in this case
the definer of the next upgrade need not understand T old . The CommentSet example
is like this, and so are all the real examples we looked at; the only ones that aren’t are
pathological examples we invented.1
If a pathological example were to arise, it may be possible to avoid the problem by
changing the invariant. Otherwise it may be necessary to go arbitrarily far back in the
chain of “active” upgrades (ones whose old type has not yet been retired). To avoid this,
the upgrader might decide to use an eager schedule for the upgrade to limit the time
during which defining future upgrades requires understanding of the old type.
4 Implementing Simulation
This section presents ways to use simulation objects to implement multiple types. The
approaches differ in how calls are dispatched to objects (i.e., which objects implement
which types) and how simulation objects can interact with one another. The first “direct”
approach is simple and is similar to what others have proposed [12, 29, 24]. However it
lacks expressive power, and therefore we instead use a much more powerful “intercep-
tor” approach.
1 An example that causes problems is the following. The old upgrade replaces ColorSet with
FlavorSet, but the invariant specifies some function f that maps colors to flavors, where several
colors map to the same flavor. Furthermore the specification of ColorSet.setColor states that
the color of an item in the set can be changed only when its current color is blue. To define the
shadow FlavorSet.$setColor, we need to consult the state of the ColorSet object to determine
the current color of the item, since only then will we know what its flavor will be:
void FlavorSet.$setColor(x, c)
effects: ⟨x, blue⟩ ∈ prev.this_pre ⇒ this_post = this_pre − {⟨x, ∗⟩} ∪ {⟨x, f(c)⟩}
Fig. 2. The direct approach, presented as a sequence of states of a node. Large arrows are state
transitions. In each state, the box is the current object, and the circles are SOs. Objects may
delegate calls as indicated by the small arrows. Each object handles calls only for its own version.
The direct approach is simple but has limited expressive power. The most serious
problem is that there is no way for an SO to be informed about calls that go directly to
its delegate, and as a result it can do the wrong thing. For example, consider an SO that
implements ColorSet by delegating to an object that implements IntSet. The delegate
stores the state of the set (the integers in the set), and the SO stores the associated
colors, which it updates when it runs its own methods. However, consider the following
sequence of calls (here O refers to the SO’s delegate): SO.insertColor(1, red); O.de-
lete(1); O.insert(1); SO.getColor(1). The result of the final call will be “red,” because
the SO cannot know that 1 was ever removed; but because 1 was removed and re-
inserted, its color should be the default color, e.g., “blue”, as specified for the shadow
of IntSet.insert(x).
Since we cannot prevent the SO state from being stale, our only recourse is to dis-
allow SO methods (we cannot disallow O.delete because of the disallow constraint). It
may seem that we must disallow SO.getColor, since it is the method that revealed the
problem in our example, but in fact we must disallow SO.insertColor because otherwise
we’ll be able to observe the problem when the upgrade is installed (since at that point
calls to the getColor will be allowed). And disallowing SO.insertColor is sufficient; we
needn’t disallow SO.getColor in addition (because every integer is blue).
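The staleness problem can be reproduced in a few lines (a minimal sketch; the classes are stand-ins for the objects in the example):

```python
class IntSet:
    def __init__(self):
        self.elems = set()
    def insert(self, x):
        self.elems.add(x)          # shadow spec: x gets the default color
    def delete(self, x):
        self.elems.discard(x)

class ColorSetSO:
    DEFAULT = "blue"

    def __init__(self, delegate):
        self.delegate = delegate
        self.colors = {}           # SO-local state: element -> color

    def insertColor(self, x, c):
        self.delegate.insert(x)
        self.colors[x] = c

    def getColor(self, x):
        # Stale: if x was deleted and re-inserted directly on the delegate,
        # the SO still reports the old color instead of DEFAULT.
        return self.colors.get(x, self.DEFAULT)

o = IntSet()
so = ColorSetSO(o)
so.insertColor(1, "red")
o.delete(1)     # the SO never sees these two calls,
o.insert(1)     # so its color map is now stale
print(so.getColor(1))   # prints "red", but the spec requires "blue"
```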
A second problem is that the direct approach provides no way for the different ver-
sions to synchronize. Since calls go directly to the different versions, SOs have no way
to control how calls are applied to their delegates. For example, suppose the current
object implements a queue with methods enq and deq, and the future SO implements a
queue with an additional method, deq2, that dequeues two consecutive items. With the
direct model, how can the future SO ensure that two adjacent items are dequeued, since
a client could call deq directly on the delegate while the SO is carrying out deq2?
It does not work for the upgrade layer to force methods to execute one-at-a-time,
as this may cause the distributed system to deadlock. Instead, the delegate might pro-
vide some form of application-level concurrency control, such as a lockdeq method
that locks the queue on behalf of the caller for any number of deq calls, but allows
enq calls from other clients to proceed. The delegator can use lockdeq to implement
deq2 correctly. This solution is complex, however. Furthermore, if the delegate does
not provide appropriate concurrency control methods, the upgrader’s only choice is to
disallow deq2.
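A sketch of this solution, under the assumption that the delegate exposes lockdeq/unlockdeq as described (the rest of the code is invented for illustration):

```python
import threading

class Queue:
    def __init__(self):
        self.items = []
        # Reentrant so a caller holding lockdeq can still call deq itself.
        self._deq_lock = threading.RLock()
    def enq(self, x):
        self.items.append(x)       # enq is never blocked by lockdeq
    def deq(self):
        with self._deq_lock:
            return self.items.pop(0)
    def lockdeq(self):
        self._deq_lock.acquire()   # blocks other clients' deq calls only
    def unlockdeq(self):
        self._deq_lock.release()

class FutureQueueSO:
    def __init__(self, delegate):
        self.delegate = delegate
    def deq2(self):
        # Hold the deq lock so no other client can dequeue between our deqs.
        self.delegate.lockdeq()
        try:
            return (self.delegate.deq(), self.delegate.deq())
        finally:
            self.delegate.unlockdeq()
```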
Fig. 3. The interceptor approach, presented as a sequence of states of a node. Large arrows are
state transitions. In each state, the box is the current object, and the circles are SOs. Objects may
delegate calls as indicated by the small arrows. One object in each state intercepts all calls.
The interceptor model works well as long as there is only one active incompatible
upgrade. However, this model has only one past SO object in existence at any time,
and this object must handle all the legacy behavior. It can do this by using the previous
past SO as a subobject, which can delegate to the current object if the upgrade that just
happened is compatible. Otherwise the new past SO will have to do more of the work
of simulating past behavior.
Therefore a good upgrade strategy is to always retire an incompatible upgrade before
introducing the next incompatible upgrade. We believe this is a reasonable approach
since incompatible upgrades are introduced relatively infrequently.
initialized state, and methods that are called after this point complete the initialization
(e.g., by making calls on the delegate). This limitation on how an SO initializes is
intentional so that SO installation can be a lightweight (and fast) operation.
5 Transform Functions
A transform function (TF) reorganizes a node’s persistent state from the representation
required by the old instance and future SO to that required by the new instance and
past SO. It must implement the identity mapping: the post-TF abstract state of the past
SO is the same as the pre-TF state of the old object, and the post-TF abstract state of
the new object is the same as the pre-TF state of the future SO. Thus, clients do not
notice that the node has upgraded, except that clients of the new type may see improved
performance and fewer rejected calls, and clients of the old type may see decreased
performance and more rejected calls.
A TF must be restartable, because the node might fail while the TF is running.
If this happens, the upgrade infrastructure simply re-runs the TF, which must recover
appropriately.
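One conventional way to obtain restartability (a sketch under the assumption that the persistent state lives in a file; the function names are ours) is to build the new representation in a scratch file and commit it with an atomic rename, so a re-run after a crash either redoes the work or finds the commit already done:

```python
import json, os

def run_tf(old_path, new_path, transform):
    """Restartable TF: safe to re-run after a crash at any point."""
    if os.path.exists(new_path):
        return                      # already committed; re-run is a no-op
    tmp = new_path + ".tmp"
    with open(old_path) as f:
        old_state = json.load(f)
    with open(tmp, "w") as f:
        json.dump(transform(old_state), f)
        f.flush()
        os.fsync(f.fileno())        # make the scratch file durable
    os.replace(tmp, new_path)       # atomic commit point
```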
A TF may not call methods on other nodes, because we can make no guarantees
about when one node upgrades relative to another, so other nodes may not be able to
handle the calls a TF might make. This restriction does not limit expressive power;
if a node needs to recover state from another node (e.g., in a replicated system), it
can transfer this state after it has completed the upgrade. This restriction helps avoid
deadlocks that may occur if nodes upgrading simultaneously attempt to obtain state
from each other. It also makes TFs simpler to implement and reason about.
6 Scheduling Functions
Scheduling functions (SFs) allow an upgrader to control upgrade progress. SFs run on
the nodes themselves, so they can consider the node’s state in deciding when to upgrade.
But often what’s more important for SFs is the state of the system; in particular, the
upgrade state of other nodes. Therefore we provide SFs with additional information: a
central upgrade database (UDB) that records the upgrade status of every node and can
contain user-defined tables (e.g., that authorize the upgrades of subsets of nodes), and
per-node local databases (LDBs) that record information about the status of other nodes
with which a node communicates regularly. Each class upgrade has its own scheduling
function, which allows the upgrader to consider additional factors, such as the urgency
of the class upgrade and how well the SOs for that class upgrade work.
When defining an SF, the first priority is to ensure that all nodes eventually upgrade.
We guarantee this trivially by requiring that the upgrader specify a timeout for each SF.
The second priority is to minimize service disruption during the upgrade. How this
is accomplished depends on how the system is designed. For example, Brewer [7] de-
scribes several upgrade schedules used in industry; each of these can be implemented
easily as scheduling functions:
– A rolling upgrade causes a few nodes to upgrade at a time; this makes sense for
replicated systems and can be implemented by an SF that queries its local database
to decide when its node should upgrade, e.g., by waiting its turn in a sequence.
– A big flip causes half the nodes in a system to upgrade at once; this makes sense for
systems that need to upgrade quickly and can be implemented by an SF that flips a
coin to decide whether its node should be in the first or second upgrade group.
– A fast reboot causes all nodes to upgrade at once; this makes sense when cross-version
simulation is poor and can be implemented by an SF that causes its node
to upgrade at a particular wall-clock time. Alternatively, this SF could wait for an
explicit signal written to the UDB or sent via RPC.
The implementations of these SFs are each just a few lines of script.
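For instance, the rolling and big-flip schedules might be sketched as follows (the SF interface, which returns true when the node should upgrade, and the UDB field names are assumptions of ours):

```python
import random, time

def with_timeout(sf, timeout, start=None):
    """Bound any SF: after `timeout` seconds the node upgrades regardless."""
    start = time.time() if start is None else start
    return lambda node: sf(node) or (time.time() - start) >= timeout

def rolling_sf(node):
    # Upgrade a few nodes at a time: wait for this node's turn,
    # as recorded in the central upgrade database (UDB).
    return node["udb"].get("next_to_upgrade") == node["id"]

def big_flip_sf(node):
    # Flip a coin once to pick the node's half; the second half waits
    # until the UDB records that the first half is done.
    if "group" not in node:
        node["group"] = random.randint(0, 1)
    return node["group"] == 0 or node["udb"].get("first_half_done", False)
```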
A variety of other schedules are possible, e.g., “wait until the node’s servers up-
grade,” “wait until all nodes of class C upgrade,” “wait until the node is lightly loaded,”
and “avoid creating blind spots in the sensor network.” Some of these schedules require
centralized knowledge, which is provided via the UDB; others require local knowl-
edge, which is provided via the node’s state and LDB. Our goal is to provide sufficient
flexibility so that upgraders can build a library of SFs according to the needs of their
system; once this is done, an upgrader simply selects an SF for each class upgrade from
the library.
Upgrade schedules can help the upgrader avoid implementing difficult SO features.
For example, it may be impractical to simulate a certain method of a new server type. We
can avoid the need to simulate this method by scheduling the upgrade such that servers
upgrade to the new type before any clients upgrade; thus, the difficult-to-simulate method
will not be called until the servers have upgraded.
An upgrader may want to test an upgrade on a few nodes and, if those upgrades fail,
roll them back and abort the remaining upgrades. This policy is implementable with
SFs (by recording upgrade failure in the UDB), though we do not discuss the details of
how to roll back the failed upgrades here.
7 Example
In developing our methodology we looked at many examples, focusing on incompati-
ble upgrades and real distributed systems including Thor [18], NFS [8], and DHash [9].
Some of the upgrades were ones that had actually happened, while others were invented.
Our goal was to come up with challenging examples so that we could make sure our ap-
proach had sufficient expressive power, and so that we could understand the challenges
in specifying upgrades and implementing SOs.
In this section we present a brief example of an incompatible upgrade to illustrate
our approach. The example is a challenging one because the old and new types are quite
different and there are several ways to resolve the differences. The upgrade replaces a
file system that uses Unix-style permissions with one that uses per-file access control
lists (ACLs) [15]. We assume the file system is distributed: the files are stored at many
servers. The upgrade contains two class upgrades: one for clients (to switch to using
ACLs) and one for servers (to switch to providing ACLs).
We assume there is no particular order in which nodes upgrade; thus clients might be
ahead of servers and vice versa. A possible schedule might have a client SF that waits
until the client is idle, while the server SF upgrades servers round-robin over some
extended time period.
Each file in the old system has read, write, and execute bits for its owner, its group,
and everyone else (the “world”). Thus, the old state (Oold) is a set of tuples:
⟨filename, content, owner, or, ow, ox, group, gr, gw, gx, wr, ww, wx⟩
Only the owner of a file can modify the file’s permissions, group, or owner. The new
state (Onew) is a set of
⟨filename, content, acl⟩
tuples, where acl is a sequence of zero or more ⟨principal, r, w, x, a⟩ tuples. Principals
with the a permission are allowed to modify the ACL.
There are many invariants one could imagine for this example. Our invariant I(Oold,
Onew) is very weak:
⟨filename, content, owner, or, ow, ox, group, gr, gw, gx, wr, ww, wx⟩ ∈ Oold
⇔ (⟨filename, content, acl⟩ ∈ Onew
∧ (⟨owner, or, ow, ox, “true”⟩ ∈ acl ∨ (owner = “nobody” ∧ ¬or ∧ ¬ow ∧ ¬ox))
∧ (⟨group, gr, gw, gx, “false”⟩ ∈ acl ∨ (group = “nobody” ∧ ¬gr ∧ ¬gw ∧ ¬gx))
∧ (⟨“system:world”, wr, ww, wx, “false”⟩ ∈ acl ∨ (¬wr ∧ ¬ww ∧ ¬wx)))
This invariant says that each file in Oold is in Onew with the same contents, and either
the owner of the file in Oold appears in the ACL in Onew with the same permissions plus
the ACL-modify permission, or the owner is the special user “nobody” and the owner
permissions are all false, and similarly for the group and world permissions (except
these have no ACL-modify permission). We need to include the “nobody” case so that
I is total, i.e., so there is a defined state of Oold for each state of Onew , and vice versa
(in particular, consider the case when the ACL is empty). Clearly other invariants are
possible, e.g., to select a particular owner among several in the ACL to be the owner in
the permissions.
The mapping function for this upgrade states that each file in Onew has the same
contents as in Oold and an ACL containing the owner, group, and world permissions
from Oold . The initial ACL grants ACL-modify permissions only to the owner.
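The mapping function might be sketched as follows (the tuple layout follows the text above; the function name and field names are ours):

```python
def to_acl_state(old_state):
    """Map each old-system file tuple to a (filename, content, acl) tuple."""
    new_state = set()
    for (name, content, owner, o_r, o_w, o_x,
         group, g_r, g_w, g_x, w_r, w_w, w_x) in old_state:
        acl = (
            (owner, o_r, o_w, o_x, True),            # only the owner gets "a"
            (group, g_r, g_w, g_x, False),
            ("system:world", w_r, w_w, w_x, False),
        )
        new_state.add((name, content, acl))
    return new_state
```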
The shadow methods must preserve I. When a client modifies a file in Oold , that file
is also modified in Onew , and vice versa. Furthermore, the file system must only allow
file operations that are consistent with the file’s permissions (in the old system) or ACL
(in the new system). But consistency is a problem, since ACLs are more expressive than
permissions.
Let’s consider the case of the future SO first. If the future SO allows modifications
of ACLs, clients of the permissions system may see modifications made by clients of
the new system that do not appear to have the correct permissions. For example, if an
owner in the ACL system adds as a second owner a user of the permissions system, and
later removes that user as an owner, a client using the permissions system and running
as that user might notice odd behavior.
To prevent this, we might disallow such operations in the future SO. However, we
cannot disallow modifications of ACLs once the server has upgraded, which means that
we must figure out what to do for users of the permissions systems when such changes
happen. A possible solution is to make it impossible for users of the permissions system
to notice odd behavior by not allowing them to do anything at all. But this doesn’t seem
like a good idea: clearly we don’t want to prevent users of the permissions system
access to files. A second possibility is to disallow only cases where observation of odd
behavior is possible. For example, we might disallow access only for files where there
is more than one owner. This second solution is less draconian than the first but still
seems undesirable.
In general when defining an upgrade it may not be possible to allow all behavior, and
furthermore, disallowing is almost always undesirable. In this particular example, however,
we have an out because file systems don’t guarantee that owners are in complete
control, since the superuser can change anything: the specification of a file system does
not rule out the kinds of odd behavior discussed above. Therefore we can in fact allow
all methods in both the past and future SO.
Now let’s consider how to implement the past and future SOs. Implementing the past
SO is easy: it just needs to present the permissions corresponding to the ACLs in Onew
and map any permissions modifications to the appropriate ACL modifications.
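The read direction of the past SO can be sketched in the same style (again the names are ours); note the invariant’s “nobody” fallback for an ACL with no ACL-modify holder:

```python
def perms_from_acl(acl):
    # Present the ACL-modify holder as the owner; fall back to the
    # invariant's "nobody" case when no such entry exists (e.g. empty ACL).
    for (principal, r, w, x, a) in acl:
        if a:
            return (principal, r, w, x)
    return ("nobody", False, False, False)
```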
The implementation of the future SO is trickier. If it allows ACL mutations without
restrictions it must keep track of all the entries in each ACL, not just the ones that map
to permissions in Oold (Onew may be more permissive than Oold because of these extra
ACL entries). Furthermore, it would need to run with superuser privileges in order to
support the behavior in the ACL, which may be undesirable. Therefore the upgrader
might choose to disallow the creation of ACLs via the future SO that have entries with
no corresponding permissions in Oold .
The effort to implement the SOs is modest. SOs need to provide the extra behavior
needed at that version, e.g., to store the extra information in the ACL for the future SO;
the rest of the work is delegated. Furthermore, what is happening in the SO is similar to
what will happen in the version it is simulating, once that becomes the current version,
and therefore this code can be used in implementing the SO. For example, all the code
for manipulating ACLs is available when the future SO for this upgrade is implemented.
The TF must produce the state of Onew (files and ACLs) from that of Oold (files
and permissions) and the future SO (if it has state). Therefore, if we decide to allow
unrestricted ACL creation in the future SO, the TF would need access to its state to
create the current object.
The exact choice of what to allow is up to the definer of the upgrade, and as this
example shows, there may be several possible choices. Furthermore, the decision might
take into account implementation difficulties: the upgrader might choose to disallow
some behavior because it would be difficult to implement.
8 Evaluation
This section evaluates Upstart, our prototype upgrade infrastructure. The purpose of
this prototype is to demonstrate that our methodology can be realized efficiently, not to
8.1 Microbenchmarks
The most important performance issue is the overhead imposed by the upgrade layer
when no upgrades are happening, as this is the common case. This section presents
experiments that measure these overheads and show them to be modest.
We ran the experiments with the client and server on the same machine (connected
over the loopback interface) and on separate machines (connected by a crossover cable).
Each machine is a Dell PowerEdge 650 with four 3.06 GHz Intel CPUs, 512 KB cache,
2 GB RAM, and an Intel PRO/1000 gigabit ethernet card. We also ran experiments on
the Internet; we do not report the results here, as the latency and bandwidth constraints
of the network dwarf the overhead of the upgrade infrastructure.
In each experiment we ran a benchmark and compared its baseline performance with
the costs imposed by our system. In the graphs, Baseline measures the performance of
the benchmark alone. TESLA measures the performance of the benchmark running with
the TESLA “dummy” handler on all nodes; it adds the overhead for interposing between
the benchmark and the socket layer, context switching between the benchmark and
the TESLA process, and copying data between the benchmark and the TESLA process.
Upstart measures the performance of the benchmark running with the upgrade layer
on all nodes; it adds the overhead for adding/removing version numbers on messages
and bookkeeping in the proxy object. In our experiments, we disabled upgrade server
polling and periodic header exchanges. In our prototype, prepending a version number
to a message requires copying the message to a new buffer; so each RPC incurs two
extra copies. These copies could be avoided by extending TESLA to support scatter-
gather I/O.
Table 1 summarizes the results.
Table 1. Microbenchmark results (N=100000 for Null RPC, N=100 for TCP). For each experi-
ment, the 5th, 50th (median), and 95th percentile latencies are given.
In the Null RPC benchmark, a client issues empty RPCs to a server one-at-a-time
using UDP. By instrumenting the code with timers, we found that the time spent in
the client and server ULs is approximately equal, which is as expected since each side
sends and receives one message per RPC. Half the time in the UL is spent in the proxy
objects, and the other half is spent adding and removing version numbers.
Over the loopback interface, the latencies are normally distributed; but over the
crossover cable, we see significant variance. This is due to interrupt coalescing done
by the gigabit ethernet card, in which the card and/or driver delay interrupts so that one
interrupt can be used for multiple packets. A cumulative distribution function of this
data (not shown) reveals that the latencies cluster at 125μs intervals; this accounts for
the fact that TESLA’s 5th percentile is close to the median value.
In the TCP benchmark, a client transfers 100 MB of data to a server using TCP
(without RPCs) over a crossover cable. The upgrade layer sees the 100 MB transfer as
12,800 8 KB messages (8 KB is the block size in the benchmark). The UL overhead is
due to copying these messages and adding/removing version numbers.
8.2 Experience
To evaluate Upstart “in the field,” we defined and ran a simple upgrade on PlanetLab, a
large research testbed [21]. Specifically, we deployed DHash [9], a peer-to-peer storage
system, on 205 nodes and installed a null upgrade on it. We chose a null upgrade to
isolate the effect of the upgrade infrastructure on system performance and behavior.
[Figure: block fetch latency (ms) versus time (min) during the null upgrade on PlanetLab, annotated with the upgrade installation and the times at which nodes 2–4 went up and down.]
9 Related Work
Distributed upgrades have been explored in systems with a wide variety of require-
ments, some similar, some different from ours. We compare our approach to the related
work in research systems and to the current practice in real-world Internet systems.
to see new behaviors while others see the old ones, which means its client-facing nodes
may upgrade gradually.
Internally, Internet services are tiered: the topmost tier faces clients; middle tiers
implement application logic; and the bottommost tiers manage persistent state. Internal
upgrades change the code of one or more tiers and may change the protocols between
them. Compatible upgrades are straightforward: the lowermost affected tier is restarted
using a rolling upgrade [7], then the next-lowermost tier is upgraded, and so on up the
stack. Since the upgrade is compatible, calls made by higher tiers can always be handled
by the lower tiers.
Incompatible, internal upgrades are typically executed by upgrading datacenters
round-robin: drain a datacenter of all traffic (and redirect its clients to other datacen-
ters), upgrade all its nodes, warm up the datacenter, restore its traffic, then repeat for the
next datacenter. Thus nodes in the same datacenter never encounter incompatibilities.
Incompatible, externally-visible upgrades are rare for web services that use HTTP
but are more common in non-web services like persistent online games. In such systems,
clients are forced to disconnect while the service upgrades and, when they reconnect,
are forced to upgrade their client software to the latest version. This ensures that the
service never needs to support old behaviors and that all clients see the same version of
the service. Some systems support such upgrades by implementing multiple versions.
For example, NFS servers implement both NFSv2 and NFSv3. The problem with this
approach is that there is no barrier between these implementations, so one can corrupt
the other; simulation objects prevent this by modularizing the implementation, and they
furthermore make it easy to retire the old code.
Our methodology supports all these kinds of upgrades and enables systems to pro-
vide service during incompatible upgrades via simulation. Simulation eliminates the
need to take down whole datacenters for incompatible upgrades and can allow clients
to delay upgrading until convenient. Our methodology is especially important for peer-
to-peer systems, since in those systems there are no tiers or clients; rather every node
must upgrade, the upgrade must happen gradually, and even compatible upgrades re-
quire simulation so that upgraded nodes can call new methods on non-upgraded nodes.
10 Conclusions
We have presented a new automatic upgrade system. Our approach targets upgrades
for large-scale, long-lived distributed systems that manage persistent state and need to
provide continuous service. We support very general upgrades: the new version of the
system may be incompatible with the old. Such incompatible upgrades, while infre-
quent, are important for controlling software complexity and bloat. We allow upgrades
to be deployed automatically, but under control: upgraders can define flexible upgrade
scheduling policies. Furthermore, our system supports mixed mode operation in which
nodes running different versions can nevertheless interoperate.
In addition, we have defined a methodology for upgrades that takes mixed mode op-
eration into account. Our methodology defines requirements for upgrades in systems
running in mixed mode and provides a way to specify upgrades that enables reasoning
about whether the requirements are satisfied. Our specification techniques are modular:
only the old and new types of the upgrade must be considered, and possibly the
specification of the previous upgrade.
We also presented a powerful implementation approach (running SOs as intercep-
tors) that allows all behavior permitted by the upgrade specification to be implemented.
Our approach allows the upgrader to define how long legacy behavior must be sup-
ported, by defining the deployment schedule for the incompatible upgrade.
We have implemented a prototype infrastructure called Upstart and shown that it
imposes modest overhead. We have also evaluated the usability of our system by im-
plementing a number of examples. The most challenging problem is defining SOs, but
they can mostly be implemented by a combination of delegation and use of code that
will be in the new version provided by the upgrade.
References
1. Sameer Ajmani. Automatic Software Upgrades for Distributed Systems. PhD thesis, MIT,
September 2004. Also available as technical report MIT-LCS-TR-1012.
2. Joao Paulo A. Almeida, Maarten Wegdam, Marten van Sinderen, and Lambert Nieuwenhuis.
Transparent dynamic reconfiguration for CORBA, 2001.
3. S. Amer-Yahia, P. Breche, and C. Souza. Object views and updates. In Journées Bases de
Données Avancées, 1996.
4. C. Bidan, V. Issarny, T. Saridakis, and A. Zarras. A dynamic reconfiguration service for
CORBA. In Intl. Conf. on Configurable Dist. Systems, pages 35–42, May 1998.
5. Toby Bloom. Dynamic Module Replacement in a Distributed Programming System. PhD
thesis, MIT, 1983.
6. Chandrasekhar Boyapati, Barbara Liskov, Liuba Shrira, Chuang-Hue Moh, and Steven Rich-
man. Lazy modular upgrades in persistent object stores. In OOPSLA, 2003.
7. Eric A. Brewer. Lessons from giant-scale services. IEEE Internet Computing, July 2001.
8. B. Callaghan, B. Pawlowski, and P. Staubach. NFS version 3 protocol specification. RFC
1813, Network Working Group, June 1995.
9. Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Wide-area
cooperative storage with CFS. In SOSP, October 2001.
10. R. S. Fabry. How to design systems in which modules can be changed on the fly. In Intl.
Conf. on Software Engineering, 1976.
11. Michael J. Freedman, Eric Freudenthal, and David Mazières. Democratizing content publi-
cation with Coral. In NSDI, San Francisco, CA, March 2004.
12. Ophir Frieder and Mark E. Segal. On dynamically updating a computer program: From
concept to prototype. Journal of Systems and Software, pages 111–128, 1991.
13. Michael W. Hicks, Jonathan T. Moore, and Scott Nettles. Dynamic software updating. In
Programming Language Design and Implementation, pages 13–23, 2001.
14. Christine R. Hofmeister and James M. Purtilo. A framework for dynamic reconfiguration of
distributed programs. Technical Report CS-TR-3119, University of Maryland, College Park,
1993.
15. Michael Kaminsky, George Savvides, David Mazières, and M. Frans Kaashoek. Decentral-
ized user authentication in a global file system. In SOSP, pages 60–73, October 2003.
16. Deepak Kapur. Towards a theory for abstract data types. Technical Report MIT-LCS-TR-
237, MIT, June 1980.
17. J. Kramer and J. Magee. The Evolving Philosophers Problem: Dynamic change management.
IEEE Transactions on Software Engineering, 16(11):1293–1306, 1990.
476 S. Ajmani, B. Liskov, and L. Shrira
18. Barbara Liskov, Miguel Castro, Liuba Shrira, and Atul Adya. Providing persistent objects in
distributed systems. In European Conf. on Object-Oriented Programming, June 1999.
19. Barbara Liskov and Jeannette Wing. A behavioral notion of subtyping. ACM Transactions
on Programming Languages and Systems, 16(6):1811–1841, November 1994.
20. Simon Monk and Ian Sommerville. A model for versioning of classes in object-oriented
databases. In British National Conf. on Databases, pages 42–58, Aberdeen, 1992.
21. L. Peterson, D. Culler, T. Anderson, and T. Roscoe. A blueprint for introducing disruptive
technology into the Internet. In HotNets I, October 2002.
22. Tobias Ritzau and Jesper Andersson. Dynamic deployment of Java applications. In Java for
Embedded Systems Workshop, London, May 2000.
23. Jon Salz, Alex C. Snoeren, and Hari Balakrishnan. TESLA: A transparent, extensible
session-layer architecture for end-to-end network services. In USITS, 2003.
24. Twittie Senivongse. Enabling flexible cross-version interoperability for distributed services.
In Distributed Objects and Applications, 1999.
25. Andrea H. Skarra and Stanley B. Zdonik. The management of changing types in an object-oriented database. In OOPSLA, pages 483–495, 1986.
26. Craig A. N. Soules, Jonathan Appavoo, Kevin Hui, Robert W. Wisniewski, Dilma Da Silva,
Gregory R. Ganger, Orran Krieger, Michael Stumm, Marc Auslander, Michal Ostrowski,
Bryan Rosenburg, and Jimi Xenidis. System support for online reconfiguration. In USENIX
Annual Technical Conf., 2003.
27. R. Srinivasan. RPC: Remote procedure call specification version 2. RFC 1831, Network
Working Group, 1995.
28. G. Stoyle, M. Hicks, G. Bierman, P. Sewell, and I. Neamtiu. Mutatis mutandis: Safe and
flexible dynamic software updating. In Principles of Programming Languages, 2005.
29. L. A. Tewksbury, L. E. Moser, and P. M. Melliar-Smith. Live upgrades of CORBA applica-
tions using object replication. In ICSM, pages 488–497, November 2001.
Demeter Interfaces: Adaptive Programming Without
Surprises
1 Introduction
An adaptive program is written in terms of loosely coupled contexts, i.e., data structure
and behavior (computations) with a third definition succinctly binding the two contexts
together. In DAJ [1], the most recent AP tool, a textual representation of the class hierar-
chy, called a class dictionary, defines the program’s data structures. Specialized visitor
classes define computation that takes place during the traversal of the program’s data
structures. The traversal specification, called a strategy, defines paths on the program’s
data structure to which visitor instances can be attached, bridging structure and behavior together. Strategies are defined using a domain-specific language that operates on a graph-based model of the program's data structure. Strategies can abstract over graph node names, edges, and subpaths, thus allowing certain modifications to the underlying data structure that do not alter the program's overall behavior.
This adaptive nature of AP programs lends itself well to iterative software development [2, 3]. Programs are built in small iterations, each adding a small new piece of program behavior. Typically, modifications to the underlying data structure become necessary and often force changes to code written in earlier iterations. The adaptive nature of AP systems helps limit, but does not completely remove, such situations.
Consider a simple example of an application that collects information from a data
structure that represents a bus route. The data structure consists of a list of BusRoute
objects, each with a list of Bus objects as its data member. In turn, each bus maintains a list of Person objects as its member, where each Person object holds the ticket price paid by that passenger. Calculating the total amount of ticket money collected from the passengers currently riding on a bus route requires a traversal to all Person objects, i.e., using the strategy "from BusRoute to Person", collecting the ticket price from each object along the way and adding the values together.
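Written by hand in plain Java, this traversal is the boilerplate that the strategy replaces. The following sketch uses hypothetical class and field names (BusRoute, Bus, and Person come from the text; the field names and the method totalTickets are our illustrative additions):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java rendering of the bus-route structure from the text.
class Person {
    double ticketPrice;                       // price paid by this passenger
    Person(double price) { ticketPrice = price; }
}

class Bus {
    List<Person> passengers = new ArrayList<>();
}

class BusRoute {
    List<Bus> buses = new ArrayList<>();

    // The hand-written equivalent of traversing "from BusRoute to Person"
    // while summing ticket prices along the way.
    double totalTickets() {
        double total = 0;
        for (Bus bus : buses)
            for (Person p : bus.passengers)
                total += p.ticketPrice;
        return total;
    }
}
```

Note that this hand-written version hard-codes the intermediate class Bus; the point of the strategy is that inserting further intermediate classes on the path would not require touching the traversal code.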
Clearly, the strategy depends on the names BusRoute and Person, which limits its reusability and also prevents renaming these classes in the program's data structure. The strategy makes the implicit assumption that all Person objects reached through a BusRoute object are bus passengers. Extending the data structure so that a BusRoute also holds a list of BusStop objects, each containing a list of waiting passengers, does not invalidate the strategy, but it causes the wrong amount of ticket money to be calculated. DAJ (as well as the other AP tools, DemeterJ and DJ) offers no way to define and check such assumptions. Programmers resort to extensive testing as the only mechanism for identifying these kinds of violations.
Modifications that alter the meaning of the program make iterative and parallel development difficult. As dependencies between computations and traversals arise, it becomes harder to properly test and detect bugs in adaptive programs. With larger AP software, program comprehension decreases, since strategies are defined directly on the complete data structure rather than on just the information that is important from the adaptive code's viewpoint.
In this paper we propose Demeter Interfaces (DIs) as a mechanism within DAJ
that allows the definition of an Interface Class Graph (ICG) [4, 5], which provides
an interface for the concrete data structure. A DI further specifies its relevant traver-
sal files which consist of traversal declarations, strategy declarations and constraints.
Constraints define properties that both the ICG and the underlying data structure must
satisfy. Computation is specified either as inter-type declarations (à la AspectJ) that introduce extra methods to classes, or as Visitors that are attached to traversals via adaptive methods. Visitors define methods that execute during the traversal of the data structure, i.e., before and after specific nodes are reached. We further extend the concrete data structure definition with an implements clause used to specify which DI(s) are implemented, along with a mapping between its concrete data members and the DIs' data members. Finally, we extend DAJ to statically verify the mapping provided and validate all constraints from the related Demeter Interfaces.
Demeter Interfaces hit a sweet spot between flexibility and safety. They restrict what AP can do, but without going back to the old way of writing the Structural Recursion template manually [6]. They are safer because the adaptive program's intent is defined and used to check any future data type against it. As a result, adaptive programs become better documented, more understandable, and more reusable.
The remainder of this paper is structured as follows. Section 2 introduces Demeter Interfaces by presenting an example application implemented in plain DAJ and then with Demeter Interfaces. Section 3 discusses some of the implementation details, and Section 4 describes the design benefits enjoyed by adaptive programs that deploy DIs. Section 5 discusses the connection between DIs and XML technologies, Section 6 presents related work, Section 7 presents future work, and Section 8 concludes.
2 Demeter Interfaces
In this section we illustrate the usage of DIs and their advantages through an example
of an equation system and the implementation of a semantic checker. We first provide
a solution in DAJ [1, 7] which we also use to describe the DAJ system itself. We then
iteratively extend the equation system, exposing the issues with the current DAJ imple-
mentation. We then show a solution for the same example using DIs and analyze the
advantages over our initial implementation.
Our example is about systems of equations in which we want to check that all used
variables are defined (we call this a semantic checker). We define a simple equation
system where each equation introduces a new variable binding and bindings have global
scope, e.g., x = 5; y = 9; z = x + y;.
Adaptive programs in DAJ are defined through a Class Dictionary (cd), a set of
Traversal Files (trv) and a mixture of Java and AspectJ code. Listing 1.1 shows the class
dictionary for the simple equation system and Listing 1.3 shows the traversal, visitor and
main class that implements the semantic checker. A cd file is a textual representation of the object-oriented structure of the program, specifying classes and their members.
Figure 1 provides the UML representation of the class dictionary in Listing 1.1. Each
line of the class dictionary defines a class and its direct members. An equal sign (“=”)
defines a concrete class, with the class name on the left-hand side of the equals sign and the members of the class on the right-hand side. Replacing the equal sign with a colon (":") defines an abstract class with its subclasses on the right of the colon. Names enclosed in "< >" define class member variable names; classes with no members are specified using an equal sign followed by a dot (e.g., A = .). The class dictionary further
defines a graph-based model of the program’s structure, referred to as a class graph
with each class (concrete or abstract) represented as a node and each member variable
represented as an edge (Figure 1). Inheritance is also represented as an edge, as in UML
class diagrams, but with the direction of the arrow reversed to point to subclasses instead
of the super class.
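Listing 1.1 itself does not appear in this excerpt. From the syntax rules above, Figure 1, and the prefix variant of Compound shown later in this section, a plausible reconstruction of the infix class dictionary is the following sketch (the concrete token strings and the Ident/int leaf types are our guesses):

```
EqSystem = List(Equation).
Equation = <lhs> Variable "=" <rhs> Expr.
Expr : Compound | Simple.
Compound = "(" <lrand> Expr <op> Op <rrand> Expr ")".
Simple : Numerical | Variable.
Op : Add | Sub | Mul | Div.
Numerical = <val> int.
Variable = <ident> Ident.
List(S) ~ "(" S ")".
```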
480 T. Skotiniotis, J. Palm, and K. Lieberherr
Fig. 1. UML representation of the class graph for the simple equation system (nodes EqSystem, Equation, Expr, Compound, Simple, Op, Numerical, Variable, and operator subclasses such as Sub and Div; edges equations, lhs, rhs, op, lrand, rrand, val)
The system uses the class dictionary as a grammar definition, providing a language whose sentences can be parsed to create the appropriate object instances. Tokens in the class dictionary surrounded by quotes define the generated language's syntax tokens (Listing 1.2). Parameterized classes are defined using a tilde ("~") operator, e.g., List(S) defines a list, enclosed in parentheses, of one or more elements of type S, each element separated by a semicolon.
Listing 1.2. An instance of a simple equations system given as input to DAJ
(x = 5;
y = (x - 2);
z = ((y - x) + (y + 9)))
Traversal files are an extension to AspectJ’s aspect definition which allow inter-type
declarations based on AspectJ’s built-in extension capability: the declare statement.
DAJ extends AspectJ's declare statement to define strategies and traversals. Strategy declarations provide a name for the strategy (e.g., defined in Listing 1.3) followed by the strategy expression as a string. Strategies can refer to class nodes by name and to class edges using the syntax -> Source,Label,Target (e.g., class A with member c of type B can be expressed as ->A,c,B). In place of a class name or edge name, the * pattern is used to match any name. In our simple equation system, the strategy defined visits all Variable objects starting from an EqSystem object, bypassing any edge with the name rhs along the way.
Listing 1.3. Additional traversal file (SemanticChecker), visitor class (CollectDef) and
main driver class (Main) for the system of simple equations
// SemanticChecker.trv
aspect SemanticChecker {
declare strategy: defined: "from EqSystem bypassing -> *,rhs,* to Variable";
declare traversal: void printDefined(): defined(CollectDef);
// CollectDef.java Visitor
class CollectDef{
void before (Variable v) { System.out.println("Found Ident :" + v.ident.toString()); }
}
// Main Class
import java.io.*;
class Main{
public static void main(String args[]) {
try {
EqSystem eqSystem = EqSystem.parse(new File(args[0]));
System.out.println("Defined are : ");
eqSystem.printDefined();
System.out.println("Used are : ");
eqSystem.printUsed();
}catch (Exception e){ e.printStackTrace(); }
}
}
Traversal declarations require a method signature and a strategy with a visitor name
as an argument. The method signature provided to a traversal declaration gets intro-
duced as a new public method to the source class of the traversal’s strategy. We call
these methods adaptive methods. DAJ automatically generates the method body that
performs the necessary calls for traversing the object’s structure according to the given
strategy. At each such call the attached visitor implementation is consulted and any
applicable method is executed. Visitors in DAJ are Java classes in which one-argument methods named before, after, and return hold a special meaning. During
a traversal, if an object’s type matches the argument type of a before method then
that method is called before traversing the object. An after method behaves in a similar way, being called after the object has been traversed. A return method provides the final value of a traversal and is executed upon traversal termination. The return type of the return method must match the return type of the adaptive method that the visitor is
attached to. In the simple equation system, before a Variable object is traversed the
CollectDef visitor prints out the variable’s name.
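The dispatch rule just described (run the visitor's one-argument before method whose parameter type matches the object about to be traversed) can be modelled outside DAJ with a few lines of reflective Java. This is a simplified illustration of the mechanism, not DAJ's actual implementation:

```java
import java.lang.reflect.Method;

// Simplified model of visitor dispatch: before an object is traversed,
// invoke the visitor's public one-argument "before" method whose
// parameter type matches the object, if such a method exists.
class Dispatcher {
    static void callBefore(Object visitor, Object node) {
        for (Method m : visitor.getClass().getMethods()) {
            if (m.getName().equals("before")
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0].isInstance(node)) {
                try {
                    m.invoke(visitor, node);
                } catch (ReflectiveOperationException e) {
                    throw new RuntimeException(e);
                }
                return;
            }
        }
        // No matching before method: the node is traversed silently.
    }
}

// A node class and a visitor in the style of CollectDef, for illustration.
class Variable {
    String ident;
    Variable(String i) { ident = i; }
}

class LoggingVisitor {
    StringBuilder log = new StringBuilder();
    public void before(Variable v) { log.append(v.ident); }
}
```

Calling Dispatcher.callBefore with an object of a type for which the visitor declares no before method simply does nothing, matching the behavior described above.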
With the completed AP implementation of the semantic checker in place, we can now evaluate our solution and verify the claims made both in favor of and against AP. The principle behind AP [8] states:
“A program should be designed so that the interface of objects can be changed
within certain constraints without affecting the program at all.”
For the simple equation system example, modifying the system so that equations are
now in prefix notation does not affect the program’s behavior. Doing so requires a single
modification to the class dictionary,
Compound = "(" <op> Op <lrand> Expr <rrand> Expr ")".
No other changes are needed to the traversal file or the visitor. The modification simply
changed the order between the Op data member and the first Expr data member of
the Compound class. This is not surprising since even in plain Java, switching the
order of member definitions does not change a program's behavior. Let us consider a more drastic extension: adding exponent operations to our system while also imposing precedence between operators. Listing 1.4 shows the complete class dictionary file; the definitions have been factored to accommodate operator precedence. Again, no other changes are needed to either the traversal file or the visitor. The semantic checker still functions correctly.
Why is the semantic checker unaffected by these changes? In both cases the modi-
fications to the cd file did not falsify the strategy (i.e., there is still a path from source
to the target) and it did not affect the way by which variables are defined and used in
the equation system (i.e., there is no other way of binding a variable other than "=", and variables still have global scope). Any modification to the class dictionary that does not falsify the strategy and does not alter the assumptions about variable
definition and usage within the equation system will not affect the semantic checker’s
code.
However, any alteration that either
– modifies class and/or class member variable names that are explicitly referenced by traversals and/or visitors,
– or breaks an assumption about the system on which adaptive code depends (e.g., adding a new variable binding construct to the equation system, like let for local bindings, or functions with arguments)
will alter the program's behavior.
For example, altering the equation system to allow for function definitions with ar-
guments causes no compile time error, but results in erroneous program behavior. This
modification breaks two assumptions:
Adaptive methods, as well as the visitor, depend on these assumptions. However these
assumptions are not explicitly captured in AP programs. There is no tool support to
stop such modifications. In fact, naively extending the equation system to accommodate function parameters, as in Listing 1.5, will generate a valid AP program that will
provide the wrong results for the semantic checker.
With larger AP programs, it becomes nearly impossible to find all these implicit as-
sumptions and even harder to predict which modifications will cause erroneous behav-
ior. Programmers have to rely on exhaustive testing in order to increase their confidence
that the program still behaves according to its specification. This in turn limits the ef-
fectiveness of AP and its application in iterative development since modifications to the
data structure due to an iteration can introduce bugs in parts of the code developed in
previous iterations.
These dependencies impede parallel development and decrease productivity. Addressing these issues requires:
– the ability to define the assumptions made by adaptive code about the underlying data structure,
– tool support for the verification of these assumptions,
– a decreased dependency on class and class member variable names,
– the modularization of only the relevant data structure information for each adaptive behavior, instead of the whole class dictionary.
Listing 1.5. Class Graph for equation systems with functions of one argument
// ExprICG.di
di ExprICG with SemanticChecker{
ESystem = List(Definition).
Definition = <def> DThing "=" <body> Body.
Body = List(UThing).
UThing = Thing.
DThing = Thing.
Thing = .
List(S) ~ "(" S ")".
}

// SemanticChecker.trv File
aspect SemanticChecker with DVisitor{
declare strategy: gdefinedIdents: "from ESystem via DThing to Thing".
declare strategy: gusedIdents: "from ESystem via UThing to Thing".
declare strategy: definedIdent: "from Definition via DThing to Thing".
declare constraints:
unique(definedIdent), nonempty(gusedIdents), nonempty(gdefinedIdents).
}

// DVisitor.java File
class DVisitor {
public void before(Thing t){ System.out.println(t.toString()); }
}
Fig. 2. The Demeter Interface for the simple equations system defines an interface class graph; the SemanticChecker traversal file defines strategies, traversals, and constraints. The UML diagram is equivalent to the interface class graph defined in ExprICG.
method traverses the data structure according to the strategy provided with the adaptive
method. Before traversing over an object, the visitor attached to this adaptive method
is advised. If the type of the object to be traversed matches the argument type of a
method, then the method is executed. As the method names imply, before methods
are executed before traversing the object, after methods are executed after the ob-
ject is traversed. Figure 3 gives an example implementation of the ExprICG Demeter
Interface (on the right) along with a visitor implementation and a driver class (on the
left). DAJ's class dictionaries are extended in two ways: a header is introduced allowing for an implements statement that specifies which DIs are being implemented, and a mapping from the concrete class dictionary's classes to the ICG's classes. The
concrete class dictionary InfixEQSystem (Figure 3) provides a definition of its
equation system and a mapping M between the classes in its class graph and all the
classes in ExprICG’s interface class graph. The mapping definition can map class(es)
to class(es), class member variable name(s) to class member variable name(s), a class
to the target(s) of a strategy and a class member variable name to a strategy.
Fig. 3. InfixEQSystem defines a class graph and a mapping of the entities in the class graph
to the interface class graph of ExprICG. The driver class Main uses the adaptive methods in-
troduced by ExprICG.
With the simple equation system implemented using Demeter Interfaces, we now extend the system and verify that DIs help prevent modifications that alter the program's behavior. We perform the same extensions as in Section 2.1 and show that the situations in which modifications did not affect the system's behavior remain unaffected by the incorporation of DIs. Modifications that previously resulted in erroneous program behavior now result in compile-time errors in the presence of DIs.
As a first evolution step, we want to change from infix notation to prefix notation. This is a modification that does not alter the program's behavior even in the original DAJ solution. Moving to a prefix notation requires changing the definition of Compound in InfixEQSystem to
This change does not affect the Demeter Interface at all. We update the equation system
class graph while keeping the original mapping M. All constraints of the DI are still satisfied after they are mapped into the actual interface class graph, and the adaptive methods function correctly.
It is important to note that during this evolution step, only the DI and the concrete implementation of the interface class graph were needed. Under the assumption that the DI's constraints capture the semantic checker's intent, the static assurances provided by the tool because of the DI suffice to show that the strategies pick the correct paths and that the semantic checker still operates as expected. The Demeter Interface thus allows for separate development and ease of evolution. The concrete class graph and its mapping can be maintained separately, while adaptive code can be developed based on the publicly available DI. Alterations made to the concrete class graph do not need to be visible to adaptive code maintainers unless they affect the mapping to an implemented DI. This form of data hiding through the Demeter Interface also provides for easier maintainability and higher system modularity.
As our next evolution step we extend the set of operators to include exponents and
add operator precedence. Keeping the headers and mapping definition the same as in
InfixEQSystem and replacing the data structure definition by that of Listing 1.4
gives us a working AP system. The modifications made to the data structure to accommodate exponents and operator precedence do not invalidate any of the DI's constraints, and the resulting AP program behaves as expected.
In the next evolution step we want to add functions with one argument to the equa-
tion system. This evolution step affects information that is relevant to the semantic
checker. The semantic checker has to deal with parameter names on each function de-
finition but also usages of function definitions that may appear on the right-hand side
of equations. Unlike the definitions seen so far, function parameters do not have global scope; their scope is local to the function definition. A naive approach would be to alter the class dictionary as in Listing 1.6.¹ Altering the data structure and only the mapping to DThing results in a compile-time error. The reason for this error is that the predicate unique(definedIdent) from ExprICG no longer holds. The modification to allow functions with one parameter breaks one of the assumptions of the interface, in
¹ To keep the example simple, we do not allow the usage of function calls as arguments to other functions, i.e., f(f(3)).
// ParamExprICG.di File
di ParamExprICG with SemanticChecker{
ESystem = List(Definition).
Definition = <def> DThing <fnc> DFThing <body> Body.
DThing = Thing.
DFThing = <fname> DThing <fparam> DThing.
Body = <fc> List(UFThing) List(UThing).
UFThing = <name> UThing <aparam> UThing.
List(S) ~ "(" S ")".
}

// SemanticChecker.trv File
import java.util.*;
aspect SemanticChecker with PVisitor{
declare strategy: definedIdents : "from ESystem to DThing".
declare strategy: usedIdents : "from ESystem to UThing".
declare strategy: dName : "from DFThing via -> *,fparam,* to Thing".
declare strategy: uName : "from UFThing via -> *,aparam,* to Thing".
declare strategy: dFName : "from DFThing via -> *,fname,* to Thing".
declare strategy: uFName : "from UFThing via -> *,name,* to Thing".
declare constraints:
nonempty(definedIdents), nonempty(usedIdents), unique(dName),
unique(uName), unique(dFName), unique(uFName).
//Inter-type definition
public boolean ESystem.checkBindings(LinkedList l1, LinkedList l2){
// checks appropriate variable usage
}}
Fig. 4. The evolved Demeter Interface and the UML representation of the extended Demeter
Interface class graph
particular the fact that we can now reach more than one variable through the left-hand side of the equal sign. With one-argument functions, the meaning of what is defined and what its scope is has changed, and these changes have to be reflected in the Demeter Interface.
Listing 1.6. Extending the class dictionary to accommodate function definitions using the
ExprICG DI
It is important to note that for this evolution step the interface has to change (Figure 4). With a new interface class graph, ParamExprICG, we can abstractly reason about semantically checking systems with one-argument functions. The two strategies definedIdents and usedIdents are used to navigate to definitions and references of variable names, both function names as well as simple variables. The strategies dName and uName are then used to collect formal parameters (at function definition) and actual arguments (at function invocation), respectively. Similarly, dFName and uFName collect function names at function definitions and function usages, respectively. The traversal declarations use the strategies to collect Thing objects. The implementation of
the method checkBindings is introduced into ESystem, and it is used to check the correct usage of variable and function definitions. The inputs to this method are two lists, where the first represents variable and function definition names at different scopes and the second represents the names of variable and function references at their corresponding scopes.
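The body of checkBindings is not shown in the paper. A plausible sketch, under the assumption that each input is a list of per-scope name lists with the global scope at index 0, is:

```java
import java.util.List;

// Hypothetical sketch of checkBindings: defs.get(i) holds the names
// defined in scope i, uses.get(i) the names referenced in scope i.
// Scope 0 is the global scope; a reference is legal if the name is
// defined globally or in its own scope.
class BindingChecker {
    static boolean checkBindings(List<List<String>> defs,
                                 List<List<String>> uses) {
        for (int i = 0; i < uses.size(); i++)
            for (String name : uses.get(i))
                if (!defs.get(0).contains(name)
                        && !defs.get(i).contains(name))
                    return false;   // used without a visible definition
        return true;
    }
}
```

The scope representation (parallel lists of ribs) is our assumption; the paper only specifies that definitions and uses arrive grouped by their corresponding scopes.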
Listing 1.7. Modifications to the concrete class dictionary to accommodate single-argument functions
Listing 1.7 shows the class graph that implements ParamExprICG. The class dic-
tionary maps the edge args to the edge fparam and its source and target nodes ac-
cordingly. Also the fargs edge is mapped to the edge aparam and fname is mapped
to name with their source and target nodes mapped accordingly. Function is mapped
to DFThing and FunCall to UFThing. All reachable Variable objects via the
lhs edge of Equation that bypass Function are mapped to DThing. In a similar
way all Variable objects that can be reached from the rhs edge of Equation by
bypassing FunCall are mapped to UThing. Figure 5 shows the visitor implemen-
tation and the driver class. The visitor interface defines the method return which is
called by DAJ at the end of a traversal. The return value of the return method is also
the return value of the traversal.
In this evolution step, the Demeter Interface helped by disallowing a naive extension
that would violate the intended behavior of the original Demeter Interface. The nature
Demeter Interfaces: Adaptive Programming Without Surprises 491
// PVisitor.java
import java.util.LinkedList;

class PVisitor {
LinkedList env;
PVisitor(){ this.env = new LinkedList(); }
public void before(Thing t) { env.add(t); }
public void before(UFThing ud) {
LinkedList rib = getUsedName();
rib.addAll(getUsedArg());
env.add(rib); }
public void before(DFThing ud) {
LinkedList rib = getDefName();
rib.addAll(getDefArg());
env.add(rib); }
public LinkedList return(){ return env; }
}

// Main.java
import java.io.*;

class Main {
public static void main(String[] args){
boolean codeOk;
PVisitor defV = new PVisitor();
PVisitor useV = new PVisitor();
ParamEquations pe = ParamEquations.parse(new File(args[0]));
codeOk = pe.checkBindings(pe.getDefined(defV), pe.getUsed(useV));
if (!codeOk)
System.out.println("Variables used before they were defined");
}
}
Fig. 5. Changes to the interface affect Main. The definition of PVisitor is used to check for
the local parameter names in parametric equations.
of the evolution required an extension of the interface, which resulted in changes to the driver class and a new concrete class dictionary. It is important to note how the Demeter Interface exposed the erroneous usage of the ExprICG interface for this evolution step and assisted in updating all the dependent components through the definition of ParamExprICG.
Although Demeter Interfaces are a big improvement over traditional AP, their usage
does not completely remove the need for testing adaptive code after modifications are
made to the class graph. The mechanisms behind Demeter Interfaces rely on the appro-
priate constraints and mapping between the ICG and the class dictionary. Constraints in
the interface could be too permissive, allowing modifications to a class dictionary that lead to unintended behavior. Through testing we can verify the program's behavior and strengthen the program's constraints accordingly. The abstraction provided by the ICG assists programmers in this task, allowing them to focus on the relevant subset of their application.
A Demeter Interface (traversal files and visitor implementations) is compiled in a separate phase. This phase typechecks traversal files and visitor implementations, i.e., it checks that names in strategy definitions and in visitor methods are defined in the DI's ICG, and that the return type of each traversal declaration matches the return type of the visitor's special return method. Finally, for this phase, the constraints defined inside traversal files are verified. Successful completion of the first phase produces an archive of the necessary files.
The second compilation phase takes as input a concrete class dictionary and an
archive generated as output from a compilation of all required DIs for this concrete class
dictionary. Using the mapping in the concrete class dictionary, DI code is expanded for
the specific concrete class graph. After expansion the DI constraints are verified and
finally the complete AP application is generated as compiled Java class files.
A mapping can be thought of as a relation between all classes and edges in an ICG and
classes and edges in the concrete class dictionary. Informally, mapping directives allow
mappings between:
To make the mapping semantics in DAJ clear we use the definitions of class graphs
and the notion of paths and path sets from previous work [9]. Formally a class graph
is a labelled graph where nodes are class names and edges are field names or reverse
inheritance edges, i.e., the inheritance arrow points to the subclass instead of the super
class. Fix a finite set C of class names where each class name is either abstract or
concrete. We also fix a finite set L of field names, sometimes referred to as labels. We
assume the existence of two distinguished symbols: this ∈ L and the reverse-inheritance
edge symbol ∉ L. A class graph model is a graph G = (V, E, L) such that
We use the (reflexive) notion of a superclass: given a class graph G = (V, E, L), we
say that v ∈ V is a superclass of u ∈ V if there is a (possibly empty) path of reverse
inheritance edges from v to u. The collection of all super-classes of a class v is called
the ancestry of v. We further define a path as a sequence v0 l1 v1 l2 . . . ln vn where, for
all 0 < i ≤ n, E contains an edge from vi−1 to vi labelled li. For any path p we define Source(p) = v0 and
Target(p) = vn . Any given strategy s with source node vs and target node vt can
be defined as a set of paths from vs to vt and a set of predicates over each edge in
a path. The set of predicates denote the strategy directives in s, i.e., a directive such
as bypassing A generates a predicate that fails on edges with A as their targets. Our
Demeter Interfaces: Adaptive Programming Without Surprises 493
3.2 Constraints
Constraints may be placed on the ICG of a DI that further restrict the implementing
class graphs. These are specified declaratively in the DI in a separate section as shown
in Figure 2. Constraints take strategies as arguments and are evaluated on traversal
graphs, not the whole class graph. A traversal graph is a specific view of a strategy
under a specific class graph: irrelevant nodes and edges that cannot help in
satisfying the strategy s under a class graph G are removed. The AP Library represents
traversal graphs as objects parameterized by a strategy and a class graph.
Currently the following primitive constraints for strategies s and t over a traversal
graph G are provided in DAJ:
– unique(s): The size of PathSetG (s) must be equal to one and the path itself must
have no loops.
– nonempty(s): The size of PathSetG (s) is greater than zero.
494 T. Skotiniotis, J. Palm, and K. Lieberherr
– subset(s,t): The path set of s is a subset of the path set of t, PathSetG (s) ⊆
PathSetG (t).
– multiple(s): The size of PathSetG (s) is greater than one.
All primitives can be combined using the common logical operators and (&&),
or (||), and negation (!). With this ability one could, for example, define equivalence of
two strategies s and t as:
equiv(s,t) = subset(s,t) && subset(t,s).
The equivalence predicate can be used to define the same set of paths using different
strategies. In this way we can express variants of a strategy where each variant imposes
different restrictions on the interface class graph and its mapped class dictionary.
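As an illustration of how these primitives behave, the following sketch models a strategy's path set as an explicit set of node-name lists; the class StrategyConstraints and this representation are our own simplification, not the AP Library implementation:

```java
import java.util.List;
import java.util.Set;

public class StrategyConstraints {
    // Toy model: PathSet_G(s) is an explicit set of paths, each path a
    // list of class names. These mirror the DAJ primitives from the text.
    static boolean nonempty(Set<List<String>> s) { return !s.isEmpty(); }
    static boolean multiple(Set<List<String>> s) { return s.size() > 1; }
    static boolean subset(Set<List<String>> s, Set<List<String>> t) {
        return t.containsAll(s); // PathSet(s) is contained in PathSet(t)
    }
    static boolean unique(Set<List<String>> s) {
        if (s.size() != 1) return false;
        List<String> p = s.iterator().next();
        return p.stream().distinct().count() == p.size(); // no repeated node, i.e. no loop
    }
    // equiv(s,t) = subset(s,t) && subset(t,s), as defined in the text.
    static boolean equiv(Set<List<String>> s, Set<List<String>> t) {
        return subset(s, t) && subset(t, s);
    }

    public static void main(String[] args) {
        Set<List<String>> s = Set.of(List.of("A", "B", "C"));
        Set<List<String>> t = Set.of(List.of("A", "B", "C"), List.of("A", "D", "C"));
        System.out.println(unique(s));    // true: one loop-free path
        System.out.println(subset(s, t)); // true
        System.out.println(equiv(s, t));  // false: t has a path s lacks
    }
}
```

Because equiv is defined purely in terms of subset, any refinement of subset checking (for example over traversal graphs rather than enumerated path sets) carries over to equivalence unchanged.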
The tool implements these primitives through functionality of the current AP Library
[11]. nonempty can be implemented in the current release, while subset and unique
are implemented through calls to the new interface AlgorithmsI²:
interface AlgorithmsI {
Descriptive.Boolean isSubset(TraversalGraph t1,
TraversalGraph t2);
Descriptive.Boolean isUnique(TraversalGraph t);
}
² Descriptive.Boolean is a utility class that contains a boolean value and a descriptive
reason why that value was returned.
allow, the problems of surprise behavior are present in these systems as well. XML
and XPath queries are widely used technologies that share similar issues with
AP. Specifically, one can think of a DTD as a class dictionary and of XPath expressions as
strategies [15]. The problems of surprise behavior are prominent in these technologies
as well since modifications to the XML document might break assumptions that the
XPath query depends upon. Consider the following DTD
<?xml version="1.0"?>
<!ELEMENT busRoute (bus*)>
<!ELEMENT bus (person*)>
<!ELEMENT person EMPTY>
<!ATTLIST bus number CDATA #REQUIRED>
<!ATTLIST person
name CDATA #REQUIRED
ticketprice CDATA #IMPLIED>
that defines a busRoute containing a list of buses. Each bus contains a list of person
elements, each carrying a name attribute and a ticketprice attribute for the bus fare.
Consider the following XPath query
var nodes=xmlDoc.selectNodes(".//person")
that collects all person elements from a busRoute. We can think of a simple
JavaScript program that iterates through nodes and calculates the total amount of money received
from the current passengers riding the bus.
Given the correspondence between a DTD and a class dictionary and between an XPath
query and a strategy, it is straightforward to create a corresponding DAJ program. In fact, the
two systems are so alike in this respect that they also share the same problems when it
comes to modifications of their underlying data structure.
Leaving the JavaScript code and the XPath query unchanged, we can extend the DTD
(Listing 1.8) to accommodate villages with bus stops along the bus route. The re-
sulting amount this time is not the total for the passengers riding the buses, but instead the
total for all passengers, both riding and waiting at bus stops. Similar problems
are found in other XML technologies that use XPath-like mechanisms, such as XLinq
and XQuery, to select elements from a graph-like structure.
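The surprise can be reproduced with a few lines of Java. The sketch below is our own illustration (the class name and sample data are hypothetical); it uses the JDK's javax.xml.xpath API in place of the JavaScript of the text to evaluate .//person against an instance of the original DTD and against an instance of the extended one:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BusRouteTotal {
    // Sums the ticketprice attribute of every person element selected
    // by the query ".//person", mirroring the XPath query in the text.
    static double totalFares(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(".//person", doc, XPathConstants.NODESET);
        double total = 0.0;
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            if (p.hasAttribute("ticketprice"))
                total += Double.parseDouble(p.getAttribute("ticketprice"));
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Instance of the original DTD: riding passengers only.
        String riding = "<busRoute><bus number=\"1\">"
                + "<person name=\"ann\" ticketprice=\"2.5\"/>"
                + "<person name=\"bob\" ticketprice=\"1.5\"/>"
                + "</bus></busRoute>";
        // Instance of the extended DTD: a waiting passenger at a bus stop.
        String extended = "<busRoute><bus number=\"1\">"
                + "<person name=\"ann\" ticketprice=\"2.5\"/></bus>"
                + "<village><busStop>"
                + "<person name=\"eve\" ticketprice=\"3.0\"/>"
                + "</busStop></village></busRoute>";
        System.out.println(totalFares(riding));   // 4.0: riding passengers only
        System.out.println(totalFares(extended)); // 5.5: eve is counted too
    }
}
```

The unchanged query silently includes the waiting passenger, which is exactly the surprise behavior that the constraints of a Demeter-Interface-style contract are meant to rule out.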
Ideas from Demeter Interfaces can help prevent this kind of situation. Just as an
XML document can declare the DTD to which it conforms, a DTD can declare the
XPath interfaces that it supports and a mapping between the DTD’s elements and the
elements of the XPath interfaces that it implements. For instance, if the current total
of all passengers riding the bus is to be supported by the DTD representing bus routes,
then it should make available an interface with XPath queries and constraints on these
queries. The constraints are the guarantees provided to programmers by the DTD. At the
same time, the mapping between the interface and the DTD itself allows the DTD to
change, in both the names of entities and its structure, within the bounds of the
constraints and without imposing modifications on client code.
<?xml version="1.0"?>
<!ELEMENT busRoute (bus*,village*)>
<!ELEMENT bus (person*)>
<!ELEMENT village (busStop*)>
<!ELEMENT busStop (person*)>
<!ELEMENT person EMPTY>
<!ATTLIST bus number CDATA #REQUIRED>
<!ATTLIST person
name CDATA #REQUIRED
ticketprice CDATA #IMPLIED>
The usage of Demeter Interfaces also provides a clear separation of respon-
sibilities. If a modification to the DTD breaks one of the XPath con-
straints, the blame lies with the DTD maintainer for breaking an interface that the
DTD claims to implement.
6 Related Work
The idea of abstracting over a class graph in adaptive programs using an interface class
graph was first introduced with adaptive plug-and-play-components (APPC) [4]. The
mapping of interface class graphs allows for the mapping of a class name to a class
name and for an edge to an edge or strategy. APPCs have no provision for further
constraints on interface class graphs or on concrete class dictionaries. A further
development of APPC [5] allows for the mapping of methods, resulting in a more general
Aspect-Oriented system.
Ovlinger and Wand [6] propose a domain specific language as a means to specify
recursive traversals of object structures used with the visitor pattern [16]. The domain
specific language further allows for intermediate results from subtraversals support-
ing functional style visitor definitions. The explicit full definition of the recursive data
structure provides an interface between visitors and the underlying data structure. This
approach enforces that each object in a traversal is explicitly defined, leaving no room
for adaptiveness.
Modularity issues in AOSD [14, 17, 18] have received great attention recently. Kicza-
les and Mezini [14] advocate that in the presence of aspects, a module’s interface has to
further include pointcuts from aspects that apply to the module in question. These aug-
mented interface definitions, named aspect-aware interfaces, can only be determined
after the complete configuration of the system’s components is known. Aspect-aware
interfaces do not provide any extra information hiding capabilities to the base program’s
modules.
Open Modules [17] extend the traditional notion of a software module to include
pointcut specifications in its interface. In this way a module can export, and as such
make publicly available, pointcuts within its implementation. This approach gives a
balanced control between module and aspect developers in terms of information hiding
thus allowing for separate (parallel) evolution of aspects and modules on the agreed
upon interface. The interface of a crosscutting concern can affect multiple modules, at
different join points in each one. Thus an aspect’s interface is scattered across module
interfaces rather than localized, making it harder (if not impossible at times) for aspect
developers to develop their aspects.
Sullivan et al. [18] advocate XPIs (crosscutting program interfaces) as a means to
achieve separate development and define explicit dependencies between implementa-
tions of crosscutting concerns and base code. DIs can be viewed as a specialization of
XPIs for AP. A more recent paper [19] by the same authors demonstrates (partially)
mechanized checking of XPIs through the use of complementary aspects that
check the appropriate interface constraints.
Kiczales and Mezini in [20] discuss the benefits of using different programming
language mechanisms (procedures, annotations, advice and pointcuts) to provide sepa-
ration of concerns at the code level. The resulting guidelines from their analysis sketch
the situations where each mechanism will be most effective. The inherent modularity
issues associated with each technology are not addressed.
In parallel to our work, Kellens et al. [21] address the issue of the fragile pointcut
problem [22, 23] in a general purpose AOP language. Pointcut definitions in AOP lan-
guages allow for the identification of points in the program’s structure and/or execution
where new functionality can be added. These pointcut definitions directly depend on
the underlying program’s structure causing aspects and the base system to be tightly
coupled. As a result of this high coupling local modifications in the base program break
pointcut semantics and hamper software evolution. This issue has been dubbed as the
fragile pointcut problem. As a solution to this problem, Kellens et al. define a con-
ceptual model of the base program against which pointcut definitions are declared and
evaluated. The conceptual model further provides the means to define and verify struc-
tural and semantic constraints through their IntensiVE [24] tool. Demeter Interfaces
take a similar approach to the manifestation of the fragile pointcut problem in Adaptive
Programming.
7 Future Work
Work is currently under way to allow visitor methods to advise edges, removing the
limitation that classes mapped to edges cannot be advised by visitors.³ The mapping
definition can become long and difficult at times; a GUI tool that visualizes
ICGs and class dictionaries as graphs will assist developers. A naive inference
engine that matches mappings by structure, i.e., mapping a class automatically
maps its edges by position (as they appear in the class dictionary and the ICG), is already
under development.
Another extension would be the provision of contracts on adaptive methods. Collab-
orations of adaptive methods exchange data and certain assumptions are being made
that are not explicitly captured by the current system. The addition of pre- and post-
conditions would assist in both defining and validating these assumptions. The compo-
sition of Demeter Interfaces to provide new Demeter Interfaces is also an interesting
³ For the latest release of DAJ with support for Demeter Interfaces visit DAJ’s Beta Release
web page [25].
future research direction. Also, the interactions between DIs in an AP program still re-
main an open issue.
8 Conclusions
We introduce Demeter Interfaces as an extension to Adaptive Programming. Demeter
Interfaces encapsulate the information and dependencies between adaptive code
and the underlying data structure. Through the definition of an interface class graph
and a set of graph constraints, Demeter Interfaces impose restrictions on any concrete
data structure to which adaptive code will be attached. These restrictions are enforced
at compile time, disallowing modifications to the underlying data structure that would
otherwise produce incorrect program results. With Demeter Interfaces in place, we have
shown how both the modularity and the understandability of adaptive programs increase
dramatically, leading to better program design and promoting parallel development.
Additional material related to this paper is available from http://www.ccs.
neu.edu/research/demeter/biblio/demeter-interf.html.
References
1. The Demeter Group: The DAJ website. http://www.ccs.neu.edu/research/demeter/DAJ
(2005)
2. Lieberherr, K.J., Orleans, D.: Preventive program maintenance in Demeter/Java (research
demonstration). In: International Conference on Software Engineering, Boston, MA, ACM
Press (1997) 604–605
3. Lieberherr, K.J., Riel, A.J.: Demeter: A CASE study of software growth through parameter-
ized classes. Journal of Object-Oriented Programming 1(3) (1988) 8–22 A shorter version
of this paper was presented at the 10th International Conference on Software Engineering,
Singapore, April 1988, IEEE Press, pages 254-264.
4. Mezini, M., Lieberherr, K.J.: Adaptive plug-and-play components for evolutionary software
development. In Chambers, C., ed.: Object-Oriented Programming Systems, Languages and
Applications Conference, in Special Issue of SIGPLAN Notices, Vancouver, ACM (1998)
97–116
5. Lieberherr, K.J., Lorenz, D., Mezini, M.: Programming with Aspectual Components. Techni-
cal Report NU-CCS-99-01, College of Computer Science, Northeastern University, Boston,
MA (1999)
6. Ovlinger, J., Wand, M.: A language for specifying recursive traversals of object structures. In:
OOPSLA ’99: Proceedings of the 14th ACM SIGPLAN conference on Object-oriented pro-
gramming, systems, languages, and applications, New York, NY, USA, ACM Press (1999)
70–81
7. Sung, J.: Aspectual Concepts. Technical Report NU-CCS-02-06, Northeastern University
(2002) Master’s Thesis, http://www.ccs.neu.edu/home/lieber/theses-index.html.
8. Lieberherr, K.J.: Adaptive Object-Oriented Software: The Demeter Method with Propagation
Patterns. PWS Publishing Company, Boston (1996) 616 pages, ISBN 0-534-94602-X.
9. Lieberherr, K., Patt-Shamir, B., Orleans, D.: Traversals of object structures: Specification
and efficient implementation. ACM Trans. Program. Lang. Syst. 26(2) (2004) 370–412
10. Lieberherr, K.J., Patt-Shamir, B.: The refinement relation of graph-based generic programs.
In Jazayeri, M., Loos, R., Musser, D., eds.: 1998 Schloss Dagstuhl Workshop on Generic
Programming, Springer (2000) 40–52 LNCS 1766.
11. Doug Orleans and Karl J. Lieberherr: AP Library: The Core Algorithms of AP: Home page.
http://www.ccs.neu.edu/research/demeter/AP-Library/ (1999)
12. Skotiniotis, T., Lorenz, D.: Conaj: Generating contracts as aspects. Technical Report NU-
CCIS-04-03, College of Computer and Information Science, Northeastern University (2004)
13. Skotiniotis, T., Lorenz, D.H.: Cona: aspects for contracts and contracts for aspects. In: OOP-
SLA ’04: Companion to the 19th annual ACM SIGPLAN conference on Object-oriented
programming systems, languages, and applications, New York, NY, USA, ACM Press (2004)
196–197
14. Kiczales, G., Mezini, M.: Aspect-oriented programming and modular reasoning. In: ICSE
’05: Proceedings of the 27th International Conference on Software Engineering, New York,
NY, USA, ACM Press (2005) 49–58
15. Lieberherr, K.J., Palm, J., Sundaram, R.: Expressiveness and complexity of crosscut lan-
guages. In: Proceedings of the 4th workshop on Foundations of Aspect-Oriented Languages
(FOAL 2005). (2005)
16. Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design Patterns: Elements of Reusable
Object-Oriented Software. Addison-Wesley (1995)
17. Aldrich, J.: Open Modules: modular reasoning about advice. In: European Conference on
Object-Oriented Programming. (2005)
18. Sullivan, K., Griswold, W.G., Song, Y., Cai, Y., Shonle, M., Tewari, N., Rajan, H.: On the
criteria to be used in decomposing systems into aspects. In: European Software Engineer-
ing Conference and International Symposium on the Foundations of Software Engineering.
(2005)
19. Sullivan, K., Griswold, W.G., Song, Y., Shonle, M., Tewari, N., Cai, Y., Rajan, H.: Modular
software design and crosscutting interfaces. In: IEEE Software, Special Issue on Aspect
Oriented Programming. (2006)
20. Kiczales, G., Mezini, M.: Separation of concerns with procedures, annotations, advice and
pointcuts. In: European Conference on Object-Oriented Programming. (2005)
21. Kellens, A., Mens, K., Brichau, J., Gybels, K.: Managing the evolutions of aspect-oriented
software with model-based pointcuts. In: European Conference on Object Oriented Program-
ming. (2006)
22. Koppen, C., Störzer, M.: PCDiff: Attacking the fragile pointcut problem. In: European
Interactive Workshop on Aspects in Software (EIWAS). (2004)
23. Störzer, M., Graf, J.: Using pointcut delta analysis to support evolution of aspect-oriented
software. In: 21st IEEE International Conference on Software Maintenance. (2005)
24. Mens, K., Kellens, A., Pluquet, F., Wuyts, R.: Co-evolving code and design with intensional
views - a case study. Computer Languages, Systems and Structures (2006)
25. The Demeter Group: The DAJ beta website. http://www.ccs.neu.edu/home/skotthe/daj
(2005)
Managing the Evolution of Aspect-Oriented
Software with Model-Based Pointcuts
1 Introduction
Ever since its inception almost ten years ago, aspect-oriented software devel-
opment (AOSD) has been promoted as a powerful development technique that
extends the modularisation capabilities of existing programming paradigms such
as object orientation [1]. To this extent, aspect-oriented programming languages
provide a new kind of modules, called aspects, that allow one to modularise the
implementation of crosscutting concerns which would otherwise be spread across
various modules. The resulting improved modularity and separation of concerns
intends not only to aid initial development, but also to allow developers to better
manage software complexity, evolution and reuse [2]. Given the fact that maintenance
and evolution of software applications account for the largest part of the
software development process [3], the introduction and use of AOSD techniques
looks promising.
(Ph.D. scholarship funded by the “Institute for the Promotion of Innovation through
Science and Technology in Flanders” (IWT Vlaanderen).)
Paradoxically, the essential techniques that AOSD proposes to improve soft-
ware modularity seem to restrict the evolvability of that software. AOSD puts
forward that aspects are not explicitly invoked but, instead, are implicitly in-
voked [4]. This has also been referred to as the ‘obliviousness’ property of aspect
orientation [5]. It means that the developer of the base program (i.e., the program
without the aspects) does not need to explicitly invoke the aspects because the
aspects themselves specify when and where they need to be invoked, by means
of a pointcut definition. As a consequence, these pointcut definitions typically
rely heavily on the structure of the base program.
This tight coupling of the pointcut definitions to the base program’s structure
and behaviour can hamper the evolvability of the software [6]: it implies that all
pointcuts of each aspect need to be checked and possibly revised whenever the
base program evolves. Indeed, due to changes to the base program, the pointcuts
may unanticipatedly capture join points that were not supposed to be captured,
or may no longer capture join points that should have been affected by the
aspect. This problem has been coined the fragile pointcut problem [7,8].
We address the fragile pointcut problem by replacing the intimate dependency
of pointcut definitions on the base program by a more stable dependency on a
conceptual model of the program. These model-based pointcut definitions are less
likely to break upon evolution, because they are no longer defined in terms of
how the program happens to be structured at a certain point in time.
Because model-based pointcut definitions are decoupled from the actual struc-
ture of the base program, the fragile pointcut problem is thus transferred to a
more conceptual level. Whereas traditional pointcut definitions may cause unan-
ticipated captures and accidental misses of program entities upon evolution of
the base program, model-based pointcuts may lead to mismatches be-
tween the conceptual model of the program and the program entities to which
the model is mapped. Hence, the fragile pointcut problem is transformed into
the problem of keeping a conceptual model of the program synchronised with
that program, when the program evolves.
To solve this derived problem, we rely on previous research that enables docu-
menting the program structure and behaviour at a more conceptual level, where
appropriate support is provided for keeping the ‘conceptual model documenta-
tion’ consistent with the source code when the program evolves. More specifically,
we implement our particular solution to the fragile pointcut problem through
an extension of the CARMA aspect language [9] combined with the formalism
of intensional views [10]. The resulting approach tightly integrates the inten-
sional views development tool with an aspect-oriented language. In essence, the
pointcuts defined in the aspect language rely on the model that is built using
the development tool. We validate our solution on SmallWiki, a medium-sized
2.1 Definitions
According to Stoerzer et al. [7,8], pointcuts are fragile because their semantics
may change ‘silently’ when changes are made to the base program, even though
the pointcut definition itself remains unaltered. The semantics of a pointcut
change if the set of join points that is captured by that pointcut changes. Several
other authors have observed symptoms of the fragile pointcut problem [6,11].
Before elaborating on these observations, we define the fragile pointcut problem:
Sullivan et al. [6] observed that the criterion of obliviousness in AOSD comes
at a considerable cost to aspect designers. They describe how aspect designers
are confronted with complex pointcut definitions and extreme sensitivity of the
validity of pointcuts to changes in the base program.
The fragile pointcut problem can be considered as the aspect-oriented equiv-
alent of the fragile base class problem [12] found in object-oriented software
development. In the fragile base class problem, one cannot tell whether a change
to a base class is safe simply by examining the base class’ methods in isolation;
instead, one should examine all subclasses of that base class as well [13]. Analo-
gously, in the fragile pointcut problem, one cannot tell whether a change to any
part of the base program is safe without examining all pointcut definitions and
determining the impact of that local change on each pointcut definition.
2.2 Examples
1. The technique used to define the pointcut (e.g., enumeration of join points,
pattern-based matching, . . . );
2. The expressiveness of the pointcut language (i.e., the structural and behav-
ioural properties available to capture join points);
3. The join point model, more particularly, the kinds of join points that can be
captured by a pointcut (method executions, method calls, variable assign-
ments, . . . )
class Buffer {
private Object content[];
private int index = 0;
...
public Object get() {
... return content[index] ... };
public void set(Object el) {
... content[index] = el ... };
...
}
pointcut accessors()
call(void Buffer.set(Object)) || call(Object Buffer.get());
pointcut accessors()
call(* set*(..) ) || call(* get*(..) );
This pointcut is also fragile w.r.t. evolution of the base program. New methods
can be added and existing ones can be removed such that they are captured by
the pointcut definition, as long as they follow the naming convention encoded in
the pattern. In addition, consider an evolution of the base code where a method
named setting is added. A call to this method is unintendedly captured by the
pointcut because its name happens to start with set.
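The unintended capture is easy to reproduce with a toy model of name-pattern matching; the class PatternPointcut and its prefix rule are our own simplification of the hypothetical AspectJ-like pattern semantics:

```java
import java.util.List;

public class PatternPointcut {
    // Toy semantics for a name pattern such as "set*": a trailing '*'
    // matches any method name with the given prefix.
    static boolean matches(String pattern, String methodName) {
        if (pattern.endsWith("*"))
            return methodName.startsWith(pattern.substring(0, pattern.length() - 1));
        return methodName.equals(pattern);
    }

    // The pattern-based accessors() pointcut from the text.
    static boolean accessors(String methodName) {
        return matches("set*", methodName) || matches("get*", methodName);
    }

    public static void main(String[] args) {
        for (String m : List.of("set", "get", "setting"))
            System.out.println(m + " captured: " + accessors(m));
        // "setting" is captured although it is not an accessor.
    }
}
```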
¹ We use this hypothetical AspectJ-like syntax to avoid having to explain the details
of the CARMA syntax here.
² The predicates also consider indexing in arrays for variable accesses and assignments.
506 A. Kellens et al.
pointcut setters()
call(?class.?method(..) ) &&
assigns(?class.?method,?iv) &&
instanceVariable(?iv,?class);
pointcut getters()
call(?class.?method(..) ) &&
returnsVariable(?class.?method,?iv) &&
instanceVariable(?iv,?class);
Although these pointcuts are no longer fragile w.r.t. changes in the names of the
methods, they are still fragile because they capture only methods that respect
the structural convention codified by the pointcut. Consider, for example, the
following ‘getter’ method that does not directly return the instance variable in
a return statement but returns another (temporary) variable:
Object get() {
Object temp = content[index];
...
return temp;
}
Although the variable temp contains the actual value of the instance variable, a
call join point to this method would be missed by our previous pointcut defini-
tion. Hence, once again, the pointcut is fragile to changes in the base program’s
source code.
pointcut optimisedGetter() :
getters() &&
!cflow(getters());
Uncapturable join points. While the previous examples illustrated the fra-
gility of the pointcut due to the definition technique or the provided expressive-
ness of the pointcut language, another major reason for fragility lies in the fact
that some intended join points simply cannot be captured because:
– The join point model is too restrictive and the code to be advised by the
aspect is not confined to a join point. For example, most aspect languages
today do not allow advising pieces of method bodies. In our buffer example,
this would mean that we must structure the possible critical sections in
the buffer implementation as complete methods. Otherwise, they cannot be
advised by the synchronisation aspect.
– The pointcut cannot be described because the join points do not share suf-
ficient structural or behavioural properties to allow them to be qualified in
a pointcut definition. As a consequence, developers are forced to use fragile
enumeration-based pointcuts.
3 Model-Based Pointcuts
We tackle the fragile pointcut problem with model-based pointcuts. This new
pointcut definition mechanism achieves a low coupling of the pointcut definition
with the source code, while at the same time providing a means of documenting
and verifying the design rules on which the pointcut definitions rely.
Model-based pointcut definitions are defined in terms of a conceptual model of
the base program, rather than referring directly to the implementation structure
of that base program. Figure 1 illustrates this difference between model-based
and traditional source-code based pointcuts. On the left-hand side, a traditional
source-code based pointcut is defined directly in terms of the source code struc-
ture. On the right-hand side, a model-based pointcut is defined in terms of a
conceptual model of the base program. This conceptual model provides an ab-
straction over the structure of the source code and classifies base program entities
according to the concepts that they implement. As a result, model-based point-
cuts capture join points based on conceptual properties instead of structural
properties of the base program entities. In addition to decoupling the pointcut
definitions from the base program’s implementation structure, the classifications
in the conceptual model are specifically conceived to provide support for detect-
ing evolution conflicts between the conceptual model and the base program.
For example, assuming that the conceptual model contains a classification
of all accessor methods in the buffer implementation, the model-based pointcut
that captures all call join points to these accessor methods could be defined as:
pointcut accessors():
classifiedAs(?methSignature,AccessorMethods) &&
call(?methSignature);
[Figure 1: An aspect using traditional pointcuts is specified directly in terms of the
source code, whereas an aspect using model-based pointcuts is specified in terms of a
conceptual model that abstracts over the source code.]
This pointcut definition is more robust with respect to evolution of
the base program: if the conceptual model correctly classifies all accessor meth-
ods, this pointcut remains correct. In a certain sense, model-based pointcuts are
similar to Kiczales and Mezini’s annotation-call and annotation-property point-
cuts [17]. Indeed, the classifications of source-code entities in the conceptual
model could be constructed using annotations in the source code.
By defining pointcuts in terms of a conceptual model, the fragile pointcut
problem has now been translated into the problem of keeping the classifications
of the conceptual model consistent with the base program. To detect incorrectly
classified source entities, the conceptual model goes beyond mere classification or
annotation and defines extra constraints over the classifications that need to be
respected by the source-code entities, for the model to be consistent. Formally,
we distinguish two cases, defined below and illustrated by figure 2:
1. We define the set of possible unintended captures for a concept A as those
entities that are classified as belonging to A but that do not satisfy some of
the constraints defined on A:
UnintendedCaptures_A = ⋃_{C ∈ C_A} (A − ext(C))
where C_A is defined as the set of all constraints on A and ext(C) denotes the
set of all source-code entities satisfying constraint C. The intuition behind
this definition is that if an entity belongs to A but does not satisfy the
constraints defined on A then maybe the entity is misclassified.
2. We define the set of possible accidental misses as those entities that do not
belong to A, but do satisfy at least one of the constraints C defined on A:
AccidentalMisses_A = ⋃_{C ∈ C_A} (ext(C) − A)
The intuition behind this definition is that if an entity does not belong to A
but does satisfy some of the constraints defined on A, then maybe the entity
should have been classified as belonging to A. To avoid having an overly
restrictive definition (yet at the risk of having a too liberal one), we do not
require the missed entity to satisfy all constraints defined on A. As soon as
it satisfies one constraint, we flag it as a potential accidental miss.
[Figure 2: Venn diagram of a concept A and a constraint extension ext(C), showing
(1) accidental misses in ext(C) − A and (2) unintended captures in A − ext(C).]
The set of all potential unintended captures and accidental misses that can be
detected is then defined as
Mismatches_A = ⋃_{C ∈ C_A} (A Δ ext(C))
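These set definitions translate directly into code. The sketch below is our own illustration (entity names are hypothetical strings): given a classified set A and the extension ext(C) of each constraint, it computes the possible unintended captures, accidental misses, and their union Mismatches_A:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ViewConsistency {
    // UnintendedCaptures_A = union over C of (A - ext(C))
    static Set<String> unintendedCaptures(Set<String> a, List<Set<String>> exts) {
        Set<String> result = new HashSet<>();
        for (Set<String> ext : exts) {
            Set<String> diff = new HashSet<>(a);
            diff.removeAll(ext);
            result.addAll(diff);
        }
        return result;
    }

    // AccidentalMisses_A = union over C of (ext(C) - A)
    static Set<String> accidentalMisses(Set<String> a, List<Set<String>> exts) {
        Set<String> result = new HashSet<>();
        for (Set<String> ext : exts) {
            Set<String> diff = new HashSet<>(ext);
            diff.removeAll(a);
            result.addAll(diff);
        }
        return result;
    }

    // Mismatches_A = union over C of (A symmetric-difference ext(C))
    static Set<String> mismatches(Set<String> a, List<Set<String>> exts) {
        Set<String> result = new HashSet<>(unintendedCaptures(a, exts));
        result.addAll(accidentalMisses(a, exts));
        return result;
    }

    public static void main(String[] args) {
        Set<String> accessors = Set.of("get", "set", "reset");             // A
        List<Set<String>> exts = List.of(Set.of("get", "set", "getSize")); // ext(C)
        System.out.println(unintendedCaptures(accessors, exts)); // [reset]
        System.out.println(accidentalMisses(accessors, exts));   // [getSize]
        System.out.println(mismatches(accessors, exts).size());  // 2
    }
}
```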
4 View-Based Pointcuts
As a particular instantiation of model-based pointcuts, we introduce view-based
pointcuts, which:
1. use the formalism of intensional views [10], and its associated tool suite
IntensiVE³, to express a conceptual model of a program and to keep it
synchronised with the source code of that program;
2. specify pointcuts in terms of this conceptual model, using an extension to
the aspect-oriented language CARMA.
We briefly present the formalism of intensional views and how it can be used to
define a conceptual model of a program. Next, we introduce the CARMA aspect
language and its extension to define aspects over intensional views. Throughout
this section we use examples taken from the SmallWiki system. Section 5 explains
this case in more detail.
Without going into all the details of the Soul syntax and semantics: upon
evaluation, the above query accumulates all solutions for the logic variable ?entity, such
that ?entity is a method, implemented by a class in the SmallWiki namespace,
whose name starts with execute. This query is the intension of the view.
Since the declared intension is sometimes too broad or too restrictive with
respect to the actual program code, intensional views provide the means to deal
with deviations from a view, allowing developers to explicitly ‘include’ or ‘exclude’ specific
program entities from a view. For example, if the SmallWiki implementation
³ Available for download at http://www.intensional.be
contained a method whose name starts with execute but that does not perform an
action, we would put that method in the excludes set of the Wiki Actions view.
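The interplay between an intension and its declared deviations can be sketched as follows (a hypothetical Python illustration, with entities represented by their selector names; this is not the IntensiVE implementation):

```python
class IntensionalView:
    """A view: a logic intension plus explicitly declared deviations."""

    def __init__(self, name, intension, includes=(), excludes=()):
        self.name = name
        self.intension = intension      # predicate over program entities
        self.includes = set(includes)   # deviating entities added by hand
        self.excludes = set(excludes)   # deviating entities removed by hand

    def extension(self, entities):
        # Entities matched by the intension, corrected for the deviations.
        matched = {e for e in entities if self.intension(e)}
        return (matched | self.includes) - self.excludes

# Hypothetical selectors in the SmallWiki image.
selectors = {"executeSearch", "executeLogin", "executeNothing", "render"}

wiki_actions = IntensionalView(
    "Wiki Actions",
    intension=lambda s: s.startswith("execute"),
    excludes={"executeNothing"})   # named 'execute...' but performs no action
```

The excludes set overrides the intension, so the deviating method no longer pollutes the view's extension.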
As the program evolves, a simple view such as the one defined above can
accidentally capture or miss particular program entities, much like a fragile
pointcut. Therefore, a set of constraints on and between
views (as defined in Section 3) is at the heart of the intensional views formalism.
This set of constraints can be validated with respect to the program code and
allows keeping an intensional view model synchronised with the program. We
highlight two different types of constraints that can be defined on intensional
views: alternative intensions and intensional relations.
Alternative Intensions. Often, the same set of program entities can be specified
in different ways, e.g. when they share multiple naming or coding conventions.
A first kind of constraint that can be declared on an intensional view is
the definition of multiple alternative intensions for that view. Each of
these alternatives is required to be extensionally consistent, meaning that they
need to describe exactly the same set of program entities. For example, all meth-
ods performing actions on Wiki pages do not only have a name that starts with
execute, but are also implemented in a method protocol4 called action. We can
therefore define the Wiki Actions view in an alternative way, using a logic query
that accumulates all SmallWiki methods implemented in that protocol. Since
both alternatives are supposed to define the same concept (i.e. Wiki actions),
we require both alternatives to capture the same set of methods.
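Checking extensional consistency amounts to comparing the sets each alternative intension describes. A minimal sketch (hypothetical method records of the form (selector, protocol); not the Soul machinery itself):

```python
def extensionally_consistent(entities, intensions):
    # All alternative intensions must describe exactly the same entities.
    extensions = [{e for e in entities if p(e)} for p in intensions]
    return all(ext == extensions[0] for ext in extensions[1:])

# Hypothetical method records: (selector, protocol).
methods = {("executeSearch", "action"),
           ("executeLogin", "action"),
           ("render", "rendering")}

by_name     = lambda m: m[0].startswith("execute")   # naming convention
by_protocol = lambda m: m[1] == "action"             # protocol convention
```

Here the two conventions agree on the same two methods, so the alternatives are consistent; any divergence would be flagged as a potential mismatch.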
Intensional Relations. A second kind of constraint consists of intensional
relations between views, of the form

Q₁ x ∈ View₁ : Q₂ y ∈ View₂ : x R y

where Qᵢ are logic quantifiers (∀, ∃, ∃!, …), Viewᵢ are intensional views, and
R is a verifiable binary relation over the source-code entities (denoted by x and
y) contained in those views. An example of such a constraint in SmallWiki is
that all Wiki Actions should be implemented by Action Handlers. Assuming we
have defined an Action Handlers view (a set of classes implementing actions),
we express this dependency as:
Having chosen the formalism in which to express the conceptual model of the
base program, we still need a pointcut language that permits us to define point-
cuts in terms of that model. Given that both the formalism of intensional
views and the aspect-oriented programming language CARMA rely on the logic
metaprogramming language Soul, to specify intensions and pointcuts respec-
tively, we extended CARMA with the ability to define view-based pointcuts.
Fig. 3. Some basic predicates provided by CARMA for capturing join points
CARMA is very similar to the AspectJ language but features a logic point-
cut language, and is an aspect-oriented extension to Smalltalk instead of Java.
Pointcuts in CARMA are logic queries that can express structural as well as
dynamic conditions over the join points that need to be captured by the point-
cut. To this end, a query can make use of a number of predefined predicates,
stating conditions over join points, which form the heart of the CARMA lan-
guage. Some basic CARMA predicates that are used in this paper are shown in
Figure 3.
The expressive power of CARMA is a direct consequence of the logic language
features of unification and recursive logic rules, together with a complete and
open reification of the entire base program. CARMA has already proven useful
to write more robust property-based and pattern-based pointcut definitions [9].
For this paper, we further enhanced CARMA with view-based pointcuts using an
additional predicate classifiedAs(?entity,?view) that allows join points to be
defined in terms of the intensional views defined over a program. For example, we
can define a view-based pointcut that captures all calls to methods contained in
the Wiki Actions view as:
pointcut wikiActionCalls():
classifiedAs(?method,WikiActions),
methodInClass(?method,?selector,?class),
send(?joinpoint,?selector,?arguments)
The above pointcut definition is tightly coupled to the intensional view model of
SmallWiki but it is decoupled from the actual program structure. In combination
with the support for verifying consistency of the intensional views model with
respect to the source code, we can thus alleviate part of the fragile pointcut
problem, as is illustrated by the experiment in the following section.
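Operationally, such a pointcut resolves its join points through the view's classification rather than through names or program structure. A rough Python analogue (the join-point records and names are hypothetical, not CARMA's reification):

```python
def wiki_action_calls(join_points, wiki_actions):
    # Capture every 'send' join point whose selector names a method
    # classified in the Wiki Actions view.
    return [jp for jp in join_points
            if jp["kind"] == "send" and jp["selector"] in wiki_actions]

# Extension of the Wiki Actions view, however its intension is phrased.
view_extension = {"executeSearch", "executeLogin"}

jps = [{"kind": "send", "selector": "executeSearch"},
       {"kind": "send", "selector": "render"},
       {"kind": "reception", "selector": "executeLogin"}]
```

Only the view's extension appears in the match; renaming conventions in the base program affect the view definition, not the pointcut.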
Case selection. The case study we selected is SmallWiki [19], a fully object-
oriented and extensible Wiki framework written by Lukas Renggli and devel-
oped entirely in VisualWorks Smalltalk. A Wiki is a collaborative web ap-
plication that allows users to add content, but also allows anyone to edit the
content. The original version of SmallWiki we studied was version 1.54, the first
internal release of SmallWiki (14-12-2002), offering an operational Wiki server
with rather limited functionality: only the rendering and editing of fairly simple
Wiki pages was supported. This version contained 63 classes and 424 methods.
and the program. We compared the conflicting program entities with those
that caused the fragile pointcut problem before. By using the feedback of the
IntensiVE tool suite, we brought the program in sync with the conceptual
model, and analysed the implication of these changes in the light of the
fragile pointcut problem.
In the next subsections we elaborate on each of the steps of our experiment.
Line 1 of this pointcut selects all classes in the SmallWiki namespace; lines 2
and 3 select all methods within those classes whose names start with the string
execute; finally, line 4 selects all message reception join points of those methods.
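For contrast, the fragility of such a purely structural pointcut can be sketched as follows: an action method that deviates from the naming convention simply falls outside the match (all class and method names are hypothetical):

```python
def fragile_action_matches(methods_by_class):
    # Structural pointcut: match purely on the 'execute' naming convention.
    return {(cls, sel)
            for cls, sels in methods_by_class.items()
            for sel in sels
            if sel.startswith("execute")}

smallwiki = {"Page":    ["executeSearch", "save"],  # 'save' is an action too,
             "Visitor": ["visitText"]}              # but its name misses the match
```

The save method is silently missed, which is precisely the accidental-miss scenario that the conceptual model is meant to detect.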
Lines 1–3 collect all page element classes (i.e., all classes in SmallWiki which
are either located in the hierarchy of PageComponent or in the hierarchy of
Structure); lines 4–5 select all methods that render output (i.e., methods im-
plemented on classes in the OutputVisitor hierarchy); finally, lines 6 and 7
select all join points from within Wiki page elements which perform a message
send to a method that renders output.
show all views here⁵, but limit ourselves to those interesting for the remainder
of this paper, as summarized by Figure 4, and explained below.
– The Wiki Actions view groups all methods implementing an action on a Wiki
page. Two alternative intensions for this view were explained in subsection
4.1.
– The Page Elements view groups all classes representing components out of
which Wiki pages can be constructed (e.g., text, links, tables, lists). This
view is defined by the following alternative intensions:
1. All subclasses of either the PageComponent or Structure class;
2. All classes in the Wiki system implementing a method named accept;
3. All classes in the Wiki system containing a protocol named visiting.
Alternatives 2 and 3 codify the knowledge that, in order to implement op-
erations on Page Elements, SmallWiki relies heavily on the Visitor pattern.
– The Output Generation view groups all classes that generate output (e.g.,
HTML, Latex) for (the components of) a Wiki Page, and has as intension
all subclasses of OutputVisitor.
– The Visitor Classes view groups all subclasses of Visitor.
– The Classes Visited view describes another aspect of the Visitor pattern and
groups all classes that are visited by some Visitor in the SmallWiki system.
– The relation that the Page Elements view is a subset of the Classes Visited
view codifies the knowledge that all page elements can be visited by a visitor.
– The are Accepted By relation captures the important fact that all page
elements can be rendered by an Output Generation visitor.
– The third intensional relation states that the classes that render output are
a particular kind of Visitor Classes.
⁵ For a more exhaustive list of views that were defined on SmallWiki, and how they
were defined, see [10].
In this subsection we show how we defined the pointcuts of the two aspects
introduced earlier as view-based pointcuts in terms of the views discussed in
subsection 5.4.
To implement the logging of actions aspect using a view-based pointcut, we
use the following pointcut definition:
1 classifiedAs(?method,WikiActions),
2 methodWithName(?method,?message),
3 reception(?joinpoint,?message,?args),
4 withinClass(?joinpoint,?class)
This pointcut selects all reception join points of a message which is implemented
by a method in the Wiki Actions view. Line 4 is added to the pointcut in order
to propagate context information about the class in which the join point
occurs to the advice.
Analogously, we declare a view-based pointcut for the italic output aspect.
We define this pointcut in terms of the Page Elements and Output Generation
views discussed in subsection 5.4:
1 classifiedAs(?class,PageElements),
2 send(?joinpoint,?message,?args),
3 withinClass(?joinpoint,?class),
4 classifiedAs(?outputclass,OutputGeneration),
5 methodWithNameInClass(?method,?message,?outputclass)
Lines 1 and 2 select all message sends that occur in Wiki page elements. Lines
4 and 5 further restrict these sends to those invoking a method that generates
output for the Wiki elements.
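A Python analogue of this two-view pointcut (the join-point records and names are hypothetical): capture sends that occur within a Page Element class and invoke a method implemented by an Output Generation class.

```python
def italic_output_sends(join_points, page_elements, output_selectors):
    # Sends occurring within a Page Element class that invoke a
    # method implemented by an Output Generation class.
    return [jp for jp in join_points
            if jp["within"] in page_elements
            and jp["selector"] in output_selectors]

page_elements    = {"Text", "Link"}
output_selectors = {"visitText", "visitLink"}

jps = [{"within": "Text",   "selector": "visitText"},  # element -> output: match
       {"within": "Text",   "selector": "size"},       # not an output method
       {"within": "Server", "selector": "visitText"}]  # not a page element
```

Both filters are view extensions, so the pointcut survives renamings and restructurings as long as the views stay consistent.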
Note that both our traditional source-code based pointcut definitions and
our view-based pointcut definitions provide a fine-grained description of the
actual join points in the program execution that we wish to capture. Our model-
based pointcut definitions, however, do not refer to the syntactical or structural
organisation of the program on which they act.
first alternative, namely their name did not start with execute. Also, the tool
reported that the executeSearch and executePermission methods were not
captured by the second alternative, though they did satisfy the first alternative. Note
that these inconsistencies match exactly the cases that caused the join point
mismatches on our traditional implementation of the logging pointcut.
We resolved these mismatches between the model and the source code by
performing the following two actions. First, we explicitly declared the save and
authenticate methods as deviating cases to be included in the first alternative
of the view. Second, we moved the executeSearch and executePermission
methods to the action protocol so that they were correctly classified.
It is important to realise that, to detect and resolve these inconsistencies, we
did not have to reason about the view-based pointcut itself, but only about the
view(s) in terms of which it was defined. Furthermore, after having synchronised
the conceptual model with the program again, the pointcut correctly captured
the intended join points and we could safely apply the aspect to the code.
The italic output aspect depends on two intensional views: Wiki Page Ele-
ments and Output Generation. As for the logging aspect, before applying the
aspect to the evolved code, we first verified the validity of these views with respect
to the source code. While the views themselves remained consistent during the
considered evolution, our tool suite warned us that the relation Wiki Page El-
ements are Accepted By Output Generation was invalidated. More specifically,
it informed us that the relation failed because of the classes LinkInternal and
LinkMailTo. These are exactly the same classes that caused join point mis-
matches on our traditional implementation of the italic output pointcut.
Before applying the italic output aspect to the evolved code, we first resolved
the conflict. The problem was that the conflicting classes did not have a cor-
responding visit method in the OutputVisitor hierarchy and thus were not
directly visited by an output visitor. By adapting the Output Generation view
so that these classes are explicitly defined as deviating cases to the view, we
reconciled the intensional relation with the program. When applying the italic
output aspect to the code now, it worked as desired.
6 Discussion
Our experiment showed that, when view-based pointcuts are used to implement
the logging of actions and italic output aspects on SmallWiki, the formalism and
tool suite of intensional views allowed us to discover exactly those evolution con-
flicts that caused join point mismatches in a more traditional implementation
of the aspects. To do so, our tool did not reason about the pointcut definitions
themselves, but only about the conceptual model in terms of which they were
defined. Indeed, because the evolution we applied concerned changes to the
structure of the base program, only the structural dependencies contained in the
conceptual model were affected. After resolving all detected inconsistencies, by
synchronising the conceptual model with the evolved program, we could straight-
forwardly apply the aspects to that program, without having to alter the original
view-based pointcut definition.
The core of our contribution lies in the observation that the fragile pointcut
problem can be transferred to a problem space where the fundamental cause of
the problem (i.e., the structural dependency of pointcuts on code) is isolated and
easier to resolve. Rather than addressing the problem at the level of program
code, we transfer it to the level of a conceptual model, where extra conceptual
information is available that allows us to detect certain join point mismatches.
The only requirement is the existence of a conceptual model that can express
design-level constraints verifiable against the code, and an aspect language
that features model-based pointcuts which can refer to concepts
in the conceptual model. Although view-based pointcuts provide a powerful in-
stantiation of model-based pointcuts, one can easily imagine using other models
and aspect languages, as we will describe in section 7.
We do not claim that our technique detects and resolves all occurrences of
the fragile pointcut problem. Everything depends on the constraints imposed
by the conceptual model. Since, in our particular example, we started from
a case study which had already been well-documented with intensional views
and relations before [10], we were able to detect and avoid all occurrences of
the fragile pointcut problem as compared to a traditional AOP approach. In
general, the more constraints the conceptual model defines, the smaller the
chance that certain inconsistencies go unnoticed. Further research is required on
methodological guidelines to design the conceptual model such that it provides
sufficient coverage to detect violations of the design rules.
Adoption of our model-based pointcut approach requires developers to de-
scribe a conceptual model of their program and its mapping to the program
code. This should not be seen as a burden, because it provides an explicit and
verifiable design documentation of the implementation. Such documentation is
not only valuable for evolution of aspect-oriented programs but for the evolution
of software in general. Providing a means of explicitly codifying and verifying
the coding conventions and design rules employed by developers allows them
to better respect these conventions and rules. The short-term cost of having
to design the conceptual model thus pays off in the longer term because it allows
keeping the design consistent with the implementation and, consequently, allows
detecting potential conflicts when the program evolves.
under investigation. The CARMA language, for example, offers a complete logic
programming language for the definition of pointcuts. The language features of
unification and recursion offer expressiveness to render pointcut definitions more
robust [9]. The Alpha aspect language also uses a logic programming language
for the specification of pointcuts and enhances the expressiveness by providing
diverse automatically-derived models of the program and associated predicates
that can, for example, reason over the entire state and execution history [15].
EAOP [14] and JAsCo [20] offer event-based or stateful pointcuts that express
the activation of an aspect based on a sequence of events during the
program’s execution.
Although such expressive pointcut languages make pointcut definitions
much less brittle, they do not make the problem disappear altogether.
A pointcut definition still needs to refer to specific base program structure or
behaviour to specify its join points. This dependency on the base program re-
mains an important source of fragility. To deal with fragility caused by
structural dependencies, the user-defined conceptual model presented in this paper
would even complement the behavioural models used in Alpha and provide ad-
ditional expressiveness for pointcuts that need to refer to structural properties.
Furthermore, one could even argue that, in the presence of complex behav-
ioural property-based pointcuts, the rules that the base program needs to respect
become very difficult to understand. Hence, although more expressive pointcut
languages reduce the fragility of pointcut definitions, they may render the actual
detection of broken pointcuts more difficult.
Annotations. An alternative solution that has been proposed is to define point-
cuts in terms of explicit annotations in the code [16,17]. Similar to intensional
views, annotations classify source-code entities and thereby make explicit addi-
tional semantics that would otherwise be expressed through implicit program-
ming conventions. This solution, however, addresses the fundamental cause of
the problem only partially. While the pointcut definitions are now defined in
terms of semantic properties that would otherwise have remained implicit, the
problem is displaced to the annotations themselves. Instead of requiring base
developers to adhere to implicit programming rules, we now require them to an-
notate the base program explicitly. As a consequence, pointcuts are as brittle as
the annotations to which they refer. When the base code has not been correctly
annotated, or when annotations are not correctly updated when the base code
evolves, the ‘fragile pointcut problem’ resurfaces. Havinga et al. [16] try to solve
this problem by inserting the annotations in the code automatically by means
of a pointcut that introduces them. However, this again translates the problem
to the correctness of that pointcut expression, how well it captures the intention
of the aspect developer, and how robust it is towards future changes. Neverthe-
less, we can easily imagine implementing model-based pointcuts using AspectJ’s
annotation-property pointcuts, extended with a conceptual model that imposes
additional relations and constraints on the annotations that are used.
Pointcut-delta Analysis. Pointcut delta analysis [7] tackles the fragile point-
cut problem by analysing the difference in captured join points, for each pointcut
definition, before and after an evolution. The analysis considers statically deter-
minable pointcut deltas and provides a static approximation of the join points
which are newly captured or which are no longer captured. A developer can
inspect these deltas and verify potential join point mismatches, and is aided in
this process because the analysis also states which changes led to the delta.
While this approach can help to assess a number of interesting join point
mismatches, accidental misses which result from the addition or modification of
source-code entities that should be captured by a pointcut, but which are not,
are impossible to detect using pointcut-delta analysis. For instance, if we look
back at the logging of actions pointcut from Section 5, we see that the addition
of the save method, which is accidentally missed by the pointcut, would not be
detected by analyzing the sets of captured join points.
Nevertheless, the ideas proposed in pointcut-delta analysis can be used to
create an interesting extension to the model of intensional views. Instead of
comparing the sets of the different join points which are captured by the aspects
before and after an evolution, we could do a delta analysis on the sets of source-
code entities which belong to an intensional view. This way, a developer would
be informed of elements which, for instance, change classification or which no
longer belong to any classification. Using this feedback, the developer can then
(re)classify the source-code entities if needed.
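Such a view-level delta analysis is straightforward to sketch (hypothetical data): compare a view's extension before and after an evolution and report what entered or left the classification.

```python
def extension_delta(before, after):
    # Entities that left or entered the view's extension after an evolution.
    return {"no_longer": before - after, "newly": after - before}

# Hypothetical extensions of the Wiki Actions view across an evolution.
wiki_actions_before = {"executeSearch", "executeLogin"}
wiki_actions_after  = {"executeSearch", "executeLogin", "save"}
```

Unlike a delta over captured join points, this flags the newly classified save method directly, prompting the developer to confirm or correct its classification.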
Open Modules, Design Rules and XPI. Yet another alternative approach
is to explicitly include the pointcut descriptions in the design and implementa-
tion of the software and to require developers to adhere to this design. Sullivan
et al. [6] propose such an approach by interfacing base code and aspect code
through design rules. These rules are documented in interface specifications that
base code designers are constrained to ‘implement’, and that aspect designers are
licensed to depend upon. Once the interfaces are defined (and respected), aspect
and base code become symmetrically oblivious to each other's design decisions.
The bare design rules approach, as opposed to the intensional views model
presented in this paper, does not provide an explicit means to verify whether
developers adhere to these rules. More recently, the interfaces defined by the
design rules can be implemented as Explicit Pointcut Interfaces (XPIs) using
AspectJ [21]. Using XPIs, pointcuts are declared globally and some constraints
can be verified on these pointcuts using other pointcuts. Our approach differs
in that we keep the pointcut description in the aspect, leaving
more flexibility to the aspect developer. While XPIs fix all pointcut interfaces
beforehand, our conceptual model only fixes a classification of the structural
source code entities. Another approach is presented by Aldrich in his work on
Open Modules [22]. In this approach, modules must advertise which join points
can be captured by the aspects that are external to the module. The difference
in applicability and expressiveness of these and our approaches remains to be
investigated.
Demeter Interfaces. Independently of our work, Skotiniotis et al. [23] ad-
dress a variant of the fragile pointcut problem, but in the context of adaptive
programming, in a way that is very similar to our solution. Adaptive program-
Managing the Evolution of Aspect-Oriented Software 523
8 Conclusion
The fragile pointcut problem is a serious inhibitor to the evolution of aspect-oriented
programs. At the core of this problem is the overly tight coupling of pointcut
definitions with the base program's structure. To solve the problem we propose
the novel technique of model-based pointcuts, which translates the problem to
a more conceptual level where it is easier to solve. This is done, on the one
hand, by decoupling the pointcut definitions from the actual structure of the
base program, and defining them in terms of a conceptual model of the software
instead. On the other hand, the conceptual model classifies program entities and
imposes high-level conceptual constraints over those entities, which renders the
conceptual model more robust towards evolutions of the base program. Potential
evolution conflicts can now be detected at that level, and solving these conflicts
requires changing either the conceptual model or its mapping to the program
code, but leaves the model-based pointcut definitions themselves intact.
As a particularly powerful instantiation of model-based pointcuts, we implemented
a formalism of view-based pointcuts, which extends the CARMA aspect language.
Acknowledgements
The authors appreciate the feedback received on previous drafts of this paper. In
particular, they wish to thank Yann-Gaël Guéhéneuc, Pascal Costanza, Kevin
Sullivan, Sebastián González, Tom Tourwé, Mira Mezini and all anonymous
reviewers.
References
1. Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C., Loingtier, J.M.,
Irwin, J.: Aspect-oriented programming. In: European Conference on Object-
Oriented Programming (ECOOP). LNCS, Springer Verlag (1997) 220–242
2. Parnas, D.L.: On the criteria to be used in decomposing systems into modules.
Communications of the ACM 15(12) (1972) 1053–1058
3. Sommerville, I.: Software Engineering, 6th edition. Pearson Education Ltd (2001)
4. Xu, J., Rajan, H., Sullivan, K.: Understanding aspects via implicit invocation. In:
Automated Software Engineering (ASE), IEEE Computer Society Press (2004)
5. Filman, R., Friedman, D.: Aspect-oriented programming is quantification and
obliviousness (2000) Workshop on Advanced Separation of Concerns (OOPSLA).
6. Sullivan, K., Griswold, W., Song, Y., Chai, Y., Shonle, M., Tewari, N., Rajan, H.:
On the criteria to be used in decomposing systems into aspects. In: Symposium
on the Foundations of Software Engineering joint with the European Software
Engineering Conference (ESEC/FSE 2005), ACM Press (2005)
7. Stoerzer, M., Graf, J.: Using pointcut delta analysis to support evolution of aspect-
oriented software. In: International Conference on Software Maintenance (ICSM),
IEEE Computer Society Press (2005) 653–656
8. Koppen, C., Stoerzer, M.: PCDiff: Attacking the fragile pointcut problem. In: First
European Interactive Workshop on Aspects in Software (EIWAS). (2004)
9. Gybels, K., Brichau, J.: Arranging language features for more robust pattern-based
crosscuts. In: Aspect-Oriented Software Development (AOSD). (2003)
10. Mens, K., Kellens, A., Pluquet, F., Wuyts, R.: Co-evolving code and design with
intensional views - a case study. Computer Languages, Systems and Structures
32(2-3) (2006) 140–156
11. Kiczales, G., Mezini, M.: Aspect-oriented programming and modular reasoning.
In: International Conference on Software Engineering (ICSE), ACM Press (2005)
12. Mikhajlov, L., Sekerinski, E.: A study of the fragile base class problem. In: Euro-
pean Conference on Object-Oriented Programming (ECOOP). LNCS (1998)
13. Steyaert, P., Lucas, C., Mens, K., D’Hondt, T.: Reuse contracts: Managing the evo-
lution of reusable assets. In: Object-Oriented Programming, Systems, Languages
and Applications (OOPSLA’96), ACM Press (1996) 268–285
14. Douence, R., Fritz, T., Loriant, N., Menaud, J.M., Ségura, M., Südholt, M.: An ex-
pressive aspect language for system applications with arachne. In: Aspect-Oriented
Software Development (AOSD). (2005)
15. Ostermann, K., Mezini, M., Bockisch, C.: Expressive pointcuts for increased mod-
ularity. In: European Conference on Object-Oriented Programming (ECOOP).
(2005)
16. Havinga, W., Nagy, I., Bergmans, L.: Introduction and derivation of annotations in
AOP: Applying expressive pointcut languages to introductions. In: First European
Interactive Workshop on Aspects in Software. (2005)
17. Kiczales, G., Mezini, M.: Separation of concerns with procedures, annotations,
advice and pointcuts. In: European Conference on Object-Oriented Programming
(ECOOP). LNCS, Springer Verlag (2005)
18. Mens, K., Michiels, I., Wuyts, R.: Supporting software development through declar-
atively codified programming patterns. Special issue of Elsevier Journal on Expert
Systems with Applications (2001)
19. Renggli, L.: Collaborative web: Under the cover. Master's thesis, University of
Berne (2005)
20. Vanderperren, W., Suvee, D., Cibran, M.A., De Fraine, B.: Stateful aspects in
JAsCo. In: Software Composition (SC). LNCS (2005)
21. Griswold, W., Sullivan, K., Song, Y., Shonle, M., Tewari, N., Cai, Y., Rajan, H.:
Modular software design with crosscutting interfaces. IEEE Software, Special Issue
on Aspect-Oriented Programming (2006)
22. Aldrich, J.: Open modules: Modular reasoning about advice. In: Proceedings of the
European Conference on Object-Oriented Programming. Volume 3586 of LNCS.,
Springer (2005) 144–168
23. Skotiniotis, T., Palm, J., Lieberherr, K.: Demeter interfaces: Adaptive program-
ming without surprises. In: European Conference on Object-Oriented Program-
ming (ECOOP). LNCS (2006)
24. Harrison, W., Ossher, H., Sutton, S.M., Jr., Tarr, P.: Concern modeling in the con-
cern manipulation environment. IBM Research Report RC23344, IBM Thomas J.
Watson Research Center, Yorktown Heights, NY (2004)
25. Sutton, S., Rouvellou, I.: Modeling of software concerns in cosmos. In: Aspect-
Oriented Software Development (AOSD), ACM (2002) 127–133
26. Murphy, G., Notkin, D., Sullivan, K.: Software reflexion models: Bridging the
gap between source and high-level models. In: Symposium on the Foundations of
Software Engineering (SIGSOFT), ACM Press (1995) 18–28
27. Baniassad, A.L.A., Murphy, G.C.: Conceptual module querying for software reengi-
neering. In: International Conference on Software Engineering (ICSE), IEEE Com-
puter Society (1998) 64–73