You are on page 1of 11

Sirius: Static Program Repair with Dependence

Graph-Based Systematic Edit Patterns


Kunihiro Noda, Haruki Yokoyama, and Shinji Kikuchi
Fujitsu Research, Japan
{noda.kunihiro, yokoyama.haruki, skikuchi}@fujitsu.com

Abstract—Software development often involves systematic edits, deltas in development histories [4]–[8]; learning systematic
similar but nonidentical changes to many code locations, that are program transformations from change examples and applying
error-prone and laborious for developers. Mining and learning them elsewhere [9]–[15]; building repair pipelines that auto-
such systematic edit patterns (SEPs) from past code changes
enable us to detect and repair overlooked buggy code that mate the processes of mining and applying SEPs [16].
arXiv:2107.04234v1 [cs.SE] 9 Jul 2021

requires systematic edits. To mine SEPs, most techniques take syntax-based (to-
A recent study presented a promising SEP mining technique ken/AST diff-based) approaches that capture similar struc-
that is based on program dependence graphs (PDGs), while tural changes and generalize them to patterns [5], [6],
traditional approaches leverage syntax-based representations. [17]. Recently, a dependence graph-based SEP miner, called
PDG-based SEPs are highly expressive and can capture more
meaningful changes than syntax-based ones. The next challenge CPatMiner, has been presented [4]. It represents code edits as
to tackle is to apply the same code changes as in PDG-based changes of program dependence graphs (PDGs) and detects
SEPs to other code locations; detection and repair of overlooked frequent and common graph patterns, resulting in PDG-based
locations that require systematic edits. Existing program trans- SEPs. Semantic change patterns (PDG-based SEPs) obtained
formation techniques cannot well address this challenge because from CPatMiner are more complex and meaningful than SEPs
(1) they expect many structural code similarities that are not
guaranteed in PDG-based SEPs or (2) they work on the basis of mined using syntax-based techniques.
PDGs but are limited to specific domains (e.g., API migrations). The next important challenge to tackle is to apply the
We present in this paper a general-purpose program transfor- same changes as in PDG-based SEPs to other code locations,
mation algorithm for applying PDG-based SEPs. Our algorithm meaning detection and repair of overlooked locations that
identifies a small transplantable structural subtree for each PDG require systematic edits. Most existing program transformation
node, thereby adapting code changes from PDG-based SEPs to
other locations. We construct a program repair pipeline Sirius techniques cannot effectively address this challenge because
that incorporates the algorithm and automates the processes of they expect many structural similarities among pattern in-
mining SEPs, detecting overlooked code locations (bugs) that stances and locations where SEPs are applied. Such structural
require systematic edits, and repairing them by applying SEPs. similarities are not guaranteed when using PDG-based SEPs.
We evaluated the repair performance of Sirius with a corpus
There are a few transformation techniques based on semantic
of open source software consisting of over 80 repositories. The
results indicate that Sirius greatly outperformed the state-of-the- edit patterns [18]–[21], but they are all limited to specific
art technique for syntax-based SEPs. Sirius achieved a precision domains (e.g., API migrations). There are no general-purpose
of 0.710, recall of 0.565, and F1-score of 0.630, while those transformation techniques for PDG-based SEPs presented yet.
of the state-of-the-art technique were 0.470, 0.141, and 0.216, In this paper, we propose a general-purpose program trans-
respectively.
formation algorithm for applying PDG-based SEPs to other
Index Terms—automated program repair, program depen-
dence graph, systematic edit patterns, program transformation code locations. We build a program repair pipeline Sirius that
incorporates our algorithm. It automates the processes of min-
ing PDG-based SEPs from repositories and detecting/repairing
I. I NTRODUCTION overlooked code locations that require systematic edits. In
The repetitiveness of code changes has been empirically Sirius, we leverage PDG-based representation of SEPs used
observed [1]–[3]. Similar but nonidentical changes to many in CPatMiner [4] with slight modifications, in which an SEP
code locations, called systematic edits, are often required is represented as a change from old to new PDGs.
in several development activities such as continual feature Sirius consists of three components: a miner, detector, and
additions, bug fixes, and API migrations. Manually searching transformer. The miner extracts past code changes as PDG
and applying nonidentical similar edits to many locations are changes, then obtains PDG-based SEPs by frequent subgraph
quite error-prone, and developers sometimes overlook some mining. The detector component detects locations where SEPs
code locations that require systematic edits. can be applied in the given client code (repair targets). It
By capturing such repetitive edits in reusable forms as represents client code as PDGs and finds matches against
systematic edit patterns (SEPs), we can automate the error- SEP graphs; matched locations (subgraphs) can be transformed
prone tasks and detect/repair the overlooked code locations according to SEPs. The transformer applies the same changes
to edit. Various studies on SEPs have been conducted for as in SEPs to client locations identified by the detector.
assisting development activities: mining SEPs from change One straightforward idea for applying a PDG-based SEP is
1 public void ex1(...) { 1 public void ex2(...) {
2 ... 2 ...
3 - License license = app.getLicense(); 3 - License li = app.getLicense();
4 + License license = app.readLicense(Env.getDefault()); 4 + License li = app.readLicense(env);
5 logger.info(license.getName()); 5 try {
6 + if (license.getType() != LicenseType.TRIAL) { 6 ...
7 context.add(license); 7 + if (LicenseType.TRIAL != li.getType()) {
8 + } 8 + logger.info(li.getType());
9 + app.refreshViews(); 9 context.add(li);
10 ... 10 + }
11 ...
Fig. 1. Code-change example 1 (Wex1 ).
Fig. 2. Code-change example 2 (Wex2 ).

𝐨𝐥𝐝 𝐧𝐞𝐰
to transform the client PDG such that it contains the same 𝐆(𝐖𝐞𝐱𝟏 ) 𝐆(𝐖𝐞𝐱𝟏 )
recv
structure as the new PDG of the SEP. However, this approach info() recv logger info() recv logger app refreshViews()
is infeasible because it is impossible to reconstruct program para para Env getDefault()
getName() getName() recv
para
code (esp., control flows) from the partially transformed PDG
recv recv
(as shown in Section II). To overcome this difficulty, instead of ref def
= ref =
license license license license
directly transforming the client PDG, our algorithm identifies recv def
para para para
para para
a transplantable induced subtree of the AST for each node in add() getLicense() getType() != readLicense()
recv recv cond para recv
a PDG-based SEP, then adapts it to the client AST. context add() if TRIAL app
recv ctrl
We evaluated the effectiveness of Sirius with a corpus of app recv
𝐨𝐥𝐝 context 𝐧𝐞𝐰
𝐆(𝐒𝐞𝐱𝟏 ) LicenseType 𝐆(𝐒𝐞𝐱𝟏 )
open source software (OSS). By mining the corpus consisting
of 80+ real-world projects, we obtained 6,060 PDG-based Data Operation Control Data or Control Map
SEPs. With the mined patterns, we conducted cross-validation Legend ctrl: control, def: definition, ref: reference, recv: receiver,
para: parameter, cond: condition, qual: qualified name
to test whether Sirius can correctly simulate (i.e., learn and
apply) a systematic edit for each pattern instance, which Fig. 3. Change graph CG1 that represents code-change example 1 in Fig. 1.
involved 31,273 repair trials. As a result, Sirius achieved a
precision of 0.710, recall of 0.565, and F1-score of 0.630 while
those of the state-of-the-art technique were 0.470, 0.141, and as a fine-grained PDG (fgPDG), and a code change is rep-
0.216, respectively. The contributions of this paper are: resented as a change graph. An fgPDG captures data/control
• a novel general-purpose program transformation algo- dependencies at an expression level and expresses API and
rithm for PDG-based SEPs; object usages on the basis of GROUM [22]. A change graph
• a program repair pipeline Sirius that fully automates the contrasts the old and new fgPDGs (i.e., fgPDGs before and
processes of mining SEPs, detecting overlooked code lo- after code changes) by placing the two graphs side by side.
cations (bugs) that require systematic edits, and repairing The code change in Fig. 1 is represented as change graph
them by applying SEPs; CG1 in Fig. 3. Map edges linking old and new nodes are set on
• a quantitative evaluation with a corpus of 80+ real-world the basis of AST node mappings calculated using a traditional
projects, which shows the effectiveness of Sirius. AST differencing algorithm. Note that by definition transitive
closure is computed, resulting in a graph with transitive edges,
II. M OTIVATING E XAMPLE but we omit those edges in Fig. 3 for simplicity.
We take the two past code changes shown in Figs. 1 and 2 After representing the code changes in Fig. 2 as change
as our motivating example. The changes have a common edit graph CG2 in the same manner, we find SEPs in CG1 and
pattern: a deprecated method getLicense is replaced with a CG2 by frequent subgraph mining. The blue rectangle part in
newer one readLicense, and context.add is surrounded with an Fig. 3 is common between CG1 and CG2; thus, it becomes
if statement regarding license types. There are also irrelevant an SEP (whose support in frequent subgraph mining is 2).
edits (colored in yellow); they will not be included in a mined We use the following notations: G(X) denotes an fgPDG of
SEP because they are not common between the two changes. a code fragment X, W stands for the whole code of a method,
We assume that a client method (Fig. 4) that still uses and S stands for the code snippet corresponding to an SEP.
the deprecated getLicense exists at the latest revision of the The superscripts ‘old’ and ‘new’ represent before and after a
client project. What we would like to do with Sirius are to code change, and the subscript ‘ex1’ shows the method name.
extract an SEP (how to change code) from the past changes
(Figs. 1 and 2), detect the overlooked location that requires B. Difficulties of Program Repair with fgPDG-Based SEPs
the systematic edit (l.3 and l.10 in Fig. 4), and repair it based To repair the overlooked (deprecated) code in the client, we
on the SEP. need to apply the same code changes as in the SEP to the dep-
recated location, i.e., replacing getLicense with readLicense
A. Change Pattern Representation with Fine-Grained PDGs and surrounding ctx.add with an if guard regarding license
We exploit the PDG-based representation of SEPs used in types. However, straightforward ideas for achieving this code
CPatMiner [4]. The program code of a method is represented transformation result in several difficulties, as follows.
getName()
1 public void c() { Miner Detector Transformer
def 𝐆(𝐒𝐜 )
2 ... recv ref
= (A), (B), (C), (D)
3 License license = app.getLicense(); license license Code Repositories Client code
4 ... para para ref.
recv
5 for (...) { add() getLicense()
6 ... recv recv code deltas transplant subtrees
7 if (license.isExpired()) { ctrl Client AST
ctx app
8 ...
9 } else { ctrl cond old/new ASTs
... if isExpired() (C) Client fgPDG Transformed
10 ctx.add(license); // ctx = context
client AST
11 ... ctrl ctrl
... for 𝐆(𝐖𝐜 ) (A) change graphs detect graph matching
Fig. 4. Client method (Wc ). mine freq. subgraphs
Fig. 5. fgPDG of Wc . locations Transformed
(B) (D) where SEPs client code
SEPs can be applied

1) Use of Existing Transformation Techniques: One pos- Fig. 6. Overview of our repair pipeline Sirius.
sible idea for applying the SEP to the client is to leverage
existing transformation techniques for syntax-based SEPs;
arguments of readLicense.
however, they do not work well for the motivating example.
They first compute generalized AST edit scripts between old Sirius that incorporates our transformation algorithm for
and new code in an SEP (i.e., a set of Wold and Wnew ). The PDG-based SEPs avoids the difficulties mentioned above. We
edit script is then applied to the client code. The problem is elaborate on the details of Sirius in the next section.
that program structures of the pattern instances (Wex1 , Wex2 ) III. S IRIUS : R EPAIR P IPELINE WITH PDG-BASED SEP S
and client (Wc ) mutually diverge; thus, it is quite difficult (or
We present an automated program repair pipeline Sirius.
impossible) to compute such generalized AST edit scripts and
Fig. 6 shows an overview of Sirius. There are three main
apply them to the client code.
components. (1) Miner mines fgPDG-based SEPs from devel-
For example, a state-of-the-art transformation technique
opment histories. (2) Detector detects locations where SEPs
ARES [11] fails to infer an edit pattern from the change
can be applied in the given client code. (3) Transformer applies
examples in Figs. 1 and 2, because there are many structural
the same code changes as in SEPs to the detected locations.
inconsistencies between Wex1 and Wex2 . Another state-of-the-
In the following, we denote the old version of a pattern
art technique GenPat [9] fails to apply inferred transformation
instance code where an SEP is extracted as Wold p and Sold
p .
to Wc . It succeeds in replacing getLicense with readLicense old
In our motivating example, Wp and Sp correspond to Wold
old
ex1
but inserts an if guard regarding license types to a wrong
position. It also inserts app.refreshViews() that is not a part and Sold
ex1 , respectively. The same notations are used for the new
of the systematic edit to the client. The existing techniques version of a pattern instance code (i.e., Wnewp and Snew
p ).
expect many structural similarities among pattern instances A. Miner
and client code, but such similarities are not guaranteed in
The miner extracts change graphs from development histo-
PDG-based SEPs; there are many structural inconsistencies
ries, then obtains PDG-based SEPs by frequent subgraph min-
among the code structures of the motivating examples.
ing, which is the same manner as CPatMiner [4]. We slightly
2) PDG Transformation and Code Reconstruction: Another modify the representation (abstraction level) of fgPDGs used
idea is to first transform the client PDG such that it contains in CPatMiner to reduce false positives.
the same structure as the new one of the SEP (G(Snew ex1 )), then To obtain SEPs via frequent subgraph mining, we have to
reconstruct program code from the transformed PDG. abstract (anonymize) some identifiers of program elements
Fig. 5 shows the fgPDG representation of the client Wc . because local identifier names are generally not consistent
The PDG transformation approach first replaces the subgraph among change graphs. In our motivating example (Figs. 1
G(Sc ), which is matched against G(Sold new
ex1 ), with G(Sex1 ). The and 2), variable names of License instances differ from one
approach then reconstructs client code from the transformed another: license and li. In Sirius (also in CPatMiner), the miner
PDG; however, this is infeasible due to the following reasons. abstracts some types of node labels in fgPDGs.
1) An fgPDG does not contain all types of program ele- Table I shows how to abstract node labels in fgPDGs.
ments (e.g., there are no PDG nodes corresponding to CPatMiner regards all variable names as ‘*’ (wildcard), literals
blocks and expression-statements); thus, it is impossible as their data type names (number, null, etc.), and method invo-
to recover program code from a transformed PDG. cations as their method names. This higher level of abstraction
2) It is undecidable how to connect PDG nodes inside G(S) leads to false positives in the mining results. Fig. 7 shows
to ones outside G(S). For example, the replacement of an example of a false SEP reported by CPatMiner. In Fig. 7,
G(Sc ) with G(Snew ex1 ) results in two control edges to the invoked methods have a commonly used name stop: therefore,
add node from the two if nodes (originating from G(Wc ) the change deltas using methods from different classes are
and G(Snew
ex1 )). It is impossible to determine which of the incorrectly regarded as the same systematic edit.
if statements is inside the other in the source code. It To avoid this issue, Sirius leverages fgPDGs with a lower
is also a problem that there are no PDG nodes for the level of abstraction than CPatMiner. Sirius abstracts variable
TABLE I surrounding AST nodes of recipient locations; this is the same
C OMPARISON OF HOW TO ABSTRACT NODE LABELS IN FG PDG S . type of issue described in Section II-B2-2. To address this
Tool Variable Literal Method invocation issue, we introduce Minimum Transplantable induced Subtree
Sirius type name type name receiver type + method name (MTS) and transplant AST nodes in units of MTSs.
CPatMiner * (wildcard) type name method name 2) Minimum Transplantable induced Subtree (MTS): In
transplanting AST nodes from A(Wnew p ) to A(Wc ), an MTS
1 protected void disconnectSensors() {
2 disconnectServiceUpIsRunning(); represents the smallest induced subtree such that we can
3 jmxFeed.stop(); determine the connections from the boundary (root and leaf)
4 - httpFeed.stop();
5 + if (httpFeed != null) httpFeed.stop(); nodes of the MTS to AST nodes in A(Wc ). We call an AST
6 ... Mined from apache/incubator-brooklyn node in A(Wnew p ) nc-mapped if there exists a corresponding
1 public void setDiagnosticsHandler(DiagnosticsHandler handler) { AST node in A(Wc ). MTS is defined as follows.
2
3 +
if(handler != null) {
if(diag_handler != null)
Definition 1: Given an fgPDG node g in G(Snew p ), the MTS
4 diag_handler.stop(); Mined from belaban/JGroups of g is denoted as MTS(g). MTS(g) is the smallest induced
Bold denotes Spold and Spnew.
5 ...
subtree in A(Wnew p ) that satisfies all the following conditions.
Fig. 7. Example of a false positive reported by CPatMiner. 1) MTS(g) contains an AST node in A(Wnew p ) that corre-
sponds to g;
2) The root node of MTS(g) is nc-mapped or a statement
and literal identifiers to their type names, and method invoca- node (e.g., expression-statement, if-statement, etc.);
tions to their receiver types followed by the method names. In 3) Each leaf of MTS(g) is nc-mapped or has no child.
our motivating example, the variables license and li are labeled In the following, we first elaborate on the mapping algo-
License, and the method invocations context.add and ctx.add rithm between A(Wnew p ) and A(Wc ) that are used for checking
are labeled Context#add in fgPDGs. whether each AST node is nc-mapped. We then describe how
B. Detector to apply an SEP to the client via MTS-based transplantation.
3) Mappings between A(Wnew p ) and A(Wc ): To compute
Given a client code (a repair target), the detector finds MTSs, we need to know which AST node is nc-mapped, i.e.,
locations where SEPs can be applied. To this end, it first the AST node mappings between A(Wnew p ) and A(Wc ).
converts the client code to an fgPDG (G(Wc )), then detects
Since A(Wold p ) and A(W new
p ) are the same method before
subgraphs in G(Wc ) that are isomorphic to old graphs of
and after a code change, most parts of the method are
SEPs (G(Sold p )). Code elements corresponding to the detected structurally consistent except for modified locations. Thus,
subgraphs will be transformed on the basis of the SEPs by
traditional AST differencing algorithms (e.g., GumTree [23])
the transformer. In our motivating example, G(Sc ) in Fig. 5
work well for computing their AST node mappings.
is detected as the location to transform by applying the SEP
On the other hand, A(Wnew p ) and A(Wc ) generally originate
because it is isomorphic to the old graph of the SEP (G(Sold ex1 )). from different methods. There can be many structural incon-
Some SEPs can be applied repeatedly to the same location.
sistencies in many parts of the methods. Therefore, even if we
For example, if we have an SEP p = “add(v) → if (v != null)
apply traditional AST differencing algorithms, we may not be
add(v)” and a client code c = “add(v),” we can apply p twice
able to correctly obtain their AST node mappings.
to c, resulting in “if (v != null) if (v != null) add(v).” To avoid
The detector component guarantees that G(Sc ) (i.e., client
such repetitive SEP application, the detector matches client
locations where SEPs can be applied) is isomorphic to G(Sold p ).
graphs against not only old graphs of SEPs (G(Sold p )) but also This means that AST substructures related to fgPDG nodes
new ones (G(Snew )). If a client graph matches against new
p in G(Sc ) and G(Sold p ) should be locally consistent, although,
graphs of SEPs, Sirius does not apply the SEPs to the client.
globally, Wc and Wold p have many structural inconsistencies.
In the above example, once p is applied to c, the transformed
Leveraging this property, we compute AST mappings from
c matches the new graph of p; we do not further apply p to c.
A(Wnew old
p ) to A(Wc ) through G(Wp ) and G(Wc ).
C. Transformer Algorithm 1 shows how to map AST nodes from A(Wnew p )
1) Overview: The transformer applies the same change as to A(Wc ). We assume that mappings between A(Wnew p ) and
in SEPs to code snippets corresponding to client subgraphs A(Wold A
p ) (denoted as Mnew-old ) have been computed in ad-
detected by the detector component. In the following, A(X) vance with traditional AST differencing algorithms such as
denotes an AST corresponding to a code snippet X. GumTree. Besides, mappings of fgPDG nodes inside an SEP
Since we cannot recover code or ASTs from fgPDGs (i.e., inside G(Sold p ) and G(Sc )) are given from the detector
(Section II-B2-1), instead of directly transforming fgPDGs, component and denoted as MG(S) old-cli .
our transformer identifies a set of transplantable AST nodes The algorithm transitively traverses mappings from the new
for each fgPDG node and transplants them from A(Wnew p ) to
version of a pattern instance (Wnew p ), through its old version
A(Wc ). If we simply transplant each AST node corresponding (Wold
p ), to the client (Wc ). During the traversal, the mappings
old
to each fgPDG node individually to the client graph, we between Wp and Wc that are originally given by the detector
cannot determine how to connect transplanted nodes to the are augmented on the basis of path similarities in fgPDGs.
Algorithm 1 Map nodes in A(Wnew
p ) to nodes in A(Wc ). Algorithm 2 Apply an SEP to A(Wc ).
Input: MAnew-old : AST node mappings b/w A(Wnew p ) and A(Wp );
old Input: MAnew-cli : AST node mappings b/w A(Wnew p ) and A(Wc ).
G(S) old
Mold-cli : fgPDG node mappings b/w G(Sp ) and G(Sc ). Output: A(Wnew c ): the client AST transformed by applying an SEP.
Output: MAnew-cli : AST node mappings b/w A(Wnew 1: Delete every AST node in A(Wc ) that corresponds to g ∈ G(Sc ).
p ) and A(Wc ).
2: mtsSet ← { MTS(gi , MA new
new-cli ) | gi ∈ G(Sp )
1: allOldG ← { g | n ∈ A(Wnew A
p ) s.t. getM(n, Mnew-old ) 6= null,
A
g ∈ getGNodes(getM(n, Mnew-old ))} ∧ ∀j(6= i), MTS(gi , Mnew-cli ) 6⊆ MTS(gj , MAnew-cli ) }
A

G(S) 3: for each mts ∈ mtsSet do


2: MG old-cli ← detectOldToClientGMap(allOldG, Mold-cli )
4: clientR ← getM(mts.rootNode, MAnew-cli )
3: Mnew-cli ← detectNewToClientMap(A(Wp ), MA
A new G
new-old , Mold-cli )
A 5: if clientR 6= null then
4: return Mnew-cli
6: connectASTNodes(clientR.parent, {mts.rootNode})
G(S) 7: else
5: function detectOldToClientGMap(allOldG, Mold-cli )
8: Let c ∈ mts be a nc-mapped node; m ← getM(c, MAnew-cli );
. In: allOldG: fgPDG nodes in G(Wold
p ); 9: connectASTNodeToAncestor(m, mts.rootNode)
MG(S) old
old-cli : fgPDG node mappings b/w G(Sp ) and G(Sc ). 10: for each l ∈ mts.leafNodes do
. Out: Mold-cli : fgPDG node mappings b/w G(Wold
G
p ) and G(Wc ). 11: clientL ← getM(l, MAnew-cli )
6: oldGW\S ← {g | g ∈ allOldG ∧ g 6∈ G(Sold p ) } 12: if clientL 6= null then
7: clientGW\S ← {g | g ∈ G(Wc ) ∧ g 6∈ G(Sc ) } 13: connectASTNodes(l, clientL.children)
G(W\S)
8: CMold-cli ← calcPathBasedGMap(oldGW\S , clientGW\S )
G(W\S) G(W\S)
9: Mold-cli ← solveMaxCardinalityBipartiteMatching(CMold-cli ) 14: function MTS(g, MA new-cli )
10:
G(S) G(W\S)
return Mold-cli ∪ Mold-cli 15: a ← the AST node corresponding to g; SMTS ← {a};
16: while (a is not nc-mapped ∧ a is not a statement) do
17: a ← a.parent; SMTS ← SMTS ∪ {a};
18: bfsQueue ← {c | c ∈ a.children}
As the result of the algorithm, all transitive mappings from 19: while (bfsQueue 6= ∅) do
A(Wnew A
p ) to A(Wc ) are collected in its output (Mnew-cli ). 20: c ← bfsQueue.poll(); SMTS ← SMTS ∪ {c};
In Algorithm 1, we first identify fgPDG nodes in G(Wold p )
21: if c is not nc-mapped then
new A 22: bfsQueue ← bfsQueue ∪ {c0 | c0 ∈ c.children}
that correspond to each AST node in A(Wp ) via Mnew-old
23: return SMTS
(l.1). getM(n, M ) returns the (AST or PDG) node mapped
from the given node n via the mapping M . If n has no mapped
node, it returns null. getGNodes(a) takes an AST node a in
A(X) and returns the corresponding fgPDG nodes in G(X). constructed from the corresponding AST. For an AST node
We then obtain mappings between G(Wold in A(Wnew p ), if the traversal reaches multiple AST nodes in
p ) and G(Wc ) by
detectOldToClientGMap (l.2). Since the mappings between A(Wc ), we select the mode value (appearing most often).
fgPDG nodes inside an SEP have already been identified by 4) MTS-Based SEP Application: Algorithm 2 shows how to
the detector, it needs to compute only the mappings of fgPDG apply an SEP to the client AST on the basis of MTSs. It deletes
nodes outside the SEP (i.e., outside G(Sold p ) and G(Sc )). To
the code element (AST node) corresponding to each old node
obtain the mappings, we compute path-based similarities for in the pattern graph (G(Sold p )) from the client. Then, it inserts
every pair of fgPDG nodes outside the SEP (ll.6–9). code elements corresponding to each new node in the pattern
At line 8, we encode every fgPDG node g outside the SEP graph (Snewp ) to the client in units of MTSs. Consequently, the
to a path set P(g) = {p | p ← hL(g), L(e), L(dest)i}. P(g) algorithm ensures that no code elements in Sold p and all the
enumerates every path p from the node g outside the SEP to ones in Snewp are present in the resulting client code.
a dest node inside the SEP via the edge e. L returns the label First, we delete AST nodes in A(Wc ) that correspond to
of the given node or edge. We calculate set similarities (e.g., fgPDG nodes in G(Sc ) (l.1). After that, we compute an MTS
Dice’s coefficient) for every pair of P(g1 ∈ oldGW\S ) and P(g2 for each node in G(Snew p ), resulting in a set of MTSs to trans-
∈ clientGW\S ), thereby detecting candidate node mappings be- plant (l.2). The MTS function returns the minimum induced
tween oldGW\S and clientGW\S . Each node g1 is mapped to the subtree that satisfies the conditions described in Section III-C2
node(s) g2 having the highest similarity score. Since g1 can be (ll.14–23). An MTS contained in another MTS is ignored (l.2).
mapped to more than one fgPDG node, we determine the final We then connect the boundary nodes of each MTS to the
one-to-one mappings by maximum-cardinality matching in a parent or leaves of the corresponding AST nodes in A(Wc )
bipartite graph of the two node-sets, oldGW\S and clientGW\S (ll.3–13). connectASTNodes(p, N ) connects every AST node
(l.9). In our motivating example, for example, getName in n ∈ N as the children of p. The index of n in the child list
G(Sold
ex1 ) is encoded as {hgetName, recv, licensei}. The client of p is the same as that in the AST tree having n. Note that
node getName in G(Sc ) has the same encoded value. Thus, n can be an expression; in that case, we wrap n in a newly
the two nodes are mutually mapped. created expression-statement node to connect as a child of p.
Finally, we identify the AST node in A(Wc ) that corre- If the root node of an MTS (mts.rootNode) is not nc-mapped
sponds to each AST node in A(Wnew p ) by transitively traversing (i.e., a statement node), we first select a node m in A(Wc )
the two mappings, Mnew-old and MG
A
old-cli (l.3). Note that the mapped from an nc-mapped node in the MTS (l.8). We then
correspondence between fgPDG nodes in G(X) and AST find the m’s nearest ancestor that can have statements as its
nodes in A(X) are easily identified because each fgPDG is children, and connect mts.rootNode to that node (l.9). Note
𝐀(𝐖𝐜 ) MD: client MD: ex1 𝐀(𝐖𝐩𝐧𝐞𝐰 ) names are inconsistent between the two methods. In MTS-
M: public SN: client BLK: M: public SN: ex1 BLK: based transplantation, we need to determine which names are
(A) VDS: FS: VDS: ES: IFS: appropriate to use. We also have to sometimes avoid conflicts
ST: VDF: ST: VDF: 𝐌𝐓𝐒(𝐫𝐞𝐚𝐝𝐋𝐢𝐜𝐞𝐧𝐬𝐞) of names between the two methods.
SN: License SN: license MI: getLicense SN: License SN: license MI: readLicense As a general policy, we try to use (preserve) names used in
SN: app SN: getLicense SN: app SN: readLicense MI: getDefault
Wc . Specifically, we determine the name of each variable or
literal X in MTSs as follows in the following order of priority.
SN: Env SN: getDefault
1) If there are fgPDG nodes in G(Wold p ) that have the same
MD: MethodDeclaration, M: Modifier, SN: SimpleName, BLK: Block,
VDS: VariableDeclarationStatement, IFS: IfStatement, ST: SimpleType, name as X, we try to find the corresponding node CG
VDF: VariableDeclarationFragment, MI: MethodInvocation, FS: ForStatement in G(Wc ) via MG old-cli . If found, we use the name of CG
for the name of X.
Fig. 8. Excerpt of AST related to ll.3–4 in Fig. 1.
2) If variable declaration fragments for X are transplanted
from Wnewp to Wc , we use (i.e., does not change) the
𝐀(𝐖𝐜 ) MD: client MD: ex1 𝐀(𝐖𝐩𝐧𝐞𝐰 )
name of X after transplantation.
M: public SN: client BLK: M: public SN: ex1 BLK: 3) If X corresponds to an AST node that is nc-mapped to
VDS: FS: VDS: ES: IFS: an AST node CA in A(Wc ), we use the name of CA for
MI: isValid BLK: IE: BLK:
𝐌𝐓𝐒(𝐢𝐟)
the name of X.
ES: IFS: MI: getType QN: ES: 4) Otherwise, we use (i.e., does not change) the name of
BLK: BLK: (B) MI: add X after transplantation.
ES: SN: context SN: add SN: license In 2 and 4, if the name of X has already been used for
MI: add a variable in Wc before transplantation, we rename X with a
QN: QualifiedName, IE: InfixExpression
SN: ctx SN: add SN: license fresh name to avoid a variable-name conflict.

Fig. 9. Excerpt of AST related to ll.6–8 in Fig. 1. IV. E VALUATION D ESIGN


A. Research Questions
that if a statement that is not nc-mapped cannot become the We evaluate Sirius in terms of effectiveness and efficiency.
root of an MTS, the size of an MTS enlarges, increasing RQ1: How effective is Sirius in repairing code that requires
the risk of excessive transplantation from A(Wnew p ) to A(Wc ). systematic edits represented by fgPDG-based SEPs?
Thus, we allow unmapped statements to be MTS root nodes. Our primary goal is to repair (overlooked) code locations
We show some concrete examples of MTS transplantation. that require systematic edits by leveraging fgPDG-based SEPs.
As an example, we apply the change of ll.3–4 in Fig. 1 to For RQ1, we evaluate the performance of Sirius in automated
the client: changing getLicense to readLicense. Fig. 8 shows program repair for such code locations.
the relevant parts of A(Wc ) and A(Wnew p ). First, we delete We conduct cross-validation to test whether Sirius can
AST nodes in A(Wc ) that correspond to the fgPDG nodes in correctly repair code by applying real-world systematic edits.
G(Sc ). For example, ‘VDF:’ and ‘MI: getLicense’ nodes are We measure the overall repair performance as precision, recall,
deleted. Then, computing the MTS of readLicense in G(Snew p ), and F1-score through this cross-validation. The details of the
we connect the root of the MTS to the parent of its mapped cross-validation setting are described in later sections.
node (i.e., (A) in Fig. 8). For the leaves of the MTS, since the Besides, we also investigate the performance against the
‘SN: app’ node has no children, there is no need to set up any graph sizes of patterns we exploit. We consider that the more
connections. As a result, the ‘VDS’ node ((A) in Fig. 8) has complex the change represented by a pattern is, the more
the MTS as its child, resulting in the transformed client code difficult it is to apply the pattern. The complexity of changes
to which the change of ll.3–4 in Fig. 1 has been applied. can be (indirectly) captured by the graph size (#nodes) of each
As another example, we apply the change of ll.6–8 in Fig. 1 SEP; we need to edit many AST substructures when there are
to the client: adding an if statement regarding license types. many nodes in a change graph. Thus, there is a possibility
Fig. 9 shows the relevant parts of A(Wc ) and A(Wnew p ). Since that Sirius can achieve higher performance (esp., precision)
the root r of MTS(if) is not mapped, we select an nc-mapped by exploiting only small patterns with sizes less than or equal
node ‘MI: add’ in the MTS, get its mapped node, find the to a threshold t, which is practically valuable. We measure the
nearest ancestor na that can have statements as its child (i.e., change of the performance when the size threshold t varies.
na is (B) in Fig. 9), and replace the na’s child with r. We then We also sample failure cases from the cross-validation
set up connections for the mapped leaf nodes in the MTS; we results to examine the causes of the failures.
replace the children of ‘MI: add’ in the MTS(if) with those of
‘MI: add’ in A(Wc ). As a result, add in Wc is replaced with RQ2: How efficiently (fast) can Sirius produce program
add surrounded by the if guard regarding license type. transformation results?
5) Transplantation of Variables and Literals: Since Wnew p In terms of practicality, time efficiency is also important
and Wc are generally different methods, variable and literal as well as repair performance. For this research question, we
evaluate the time efficiency of Sirius by measuring the time pi has a set of pattern instances {ep1i , ep2i , ...}. In our motivat-
spent in each phase: mining, detection, and transformation. ing example, the mined SEP has two pattern instances: one
originates from Wex1 and the other originates from Wex2 .
B. Dataset
Let C(ei ) be a function that takes a pattern instance ei and
We use the same dataset of OSS as for evaluating returns its change-id, a pair of hm, ti. m is the method where
CPatMiner [4]. There are two corpora provided by the authors the change graph of ei is extracted. t is the commit time of
of CPatMiner: survey and depth. The survey corpus was the commit where m was modified and ei is extracted. With
built for surveying developer responses, and consists of only C(ei ), for each pattern pi , we can enumerate the change-ids
fresh (the latest 50) commits of many projects to obtain of all pattern instances: CS = {C(ep1i ), C(ep2i ), ...}.
good responses from developers about their recent changes. We conduct leave-one-change-id-out cross-validation that
The depth corpus was built for in-depth analysis of pattern takes into account the chronological order of commits. Given
characteristics, and has longer histories (the latest 1K commits test data (i.e., a change-id) hm, ti, the training data are all
with Java file changes) of fewer but high-quality projects. change-ids {hmj1 , tj1 i, hmj2 , tj2 i, ...} ⊂ CS where ∀k, tjk ≤ t
We select the depth corpus as our dataset because the (i.e., future commits are not used as training data).
purpose of our experiment is not to obtain developer responses One iteration in our cross-validation is processed as follows.
but to measure the repair performance and investigate the Let C(eptest ) = hm, ti be test data and CStrain be training data
details of the results. The depth corpus contains 88 real-world where p ∈ PC is an fgPDG-based SEP. We denote the code of
projects in various domains. Every project has at least 10 m before (resp. after) the commit at the time t as mold (resp.
committers and 100 GitHub stars. The corpus contains a total mnew ); mold is the repair target and mnew contains the ground
of 1M Java files and 164M LOC. truth code of the repair. Sirius attempts to mine an fgPDG-
C. Baseline based SEP p0 from CStrain , detect code locations in mold where
We select ARES [11], a state-of-the-art program trans- p0 can be applied, and repair the detected location by applying
formation technique for syntax-based SEPs, as our baseline. p0 . Similarly, ARES attempts to infer a transformation pattern
ARES takes change examples, infers transformation patterns, from CStrain and apply it to mold for repair. We denote the
and applies the transformation to other code locations. ARES repair result of Sirius or ARES as mrepaired .
We judge that the repair result mrepaired is correct if all the
achieved higher accuracy in recommending code changes and
following conditions are satisfied: (1) mrepaired contains all the
outperformed the previous state-of-the-art technique.
ARES was originally evaluated in two different set- code elements CEnew in mnew such that CEnew corresponds
tings [11]. The first one uses two change examples for inferring to the pattern’s new graph (G(Snew p )); (2) mrepaired does not

more precise transformation patterns. The other one takes all contain any code elements CEold in mold such that CEold
change examples to increase recall, which might result in corresponds to the pattern’s old graph (G(Sold p )); and (3)

overgeneralization (less accurate results). Following that study, mrepaired contains all the code elements that are unchanged
we also use ARES in the same two settings, which we call between mold and mnew . Conditions (1) and (2) check success-
ARES-2 and ARES-all, respectively. ful insertion and deletion of pattern-related code, respectively.
There is another recent publicly-available transformation Condition (3) ensures that there are no excessive modifications.
technique, GenPat [9], which learns program transformation We check conditions (1) and (2) by testing whether the
from one change example while Sirius and ARES require fgPDG of mrepaired contains G(Snew p ) and does not con-

several (many) change examples to learn. The advantage of tain G(Soldp ), respectively. Also, we check condition (3) by
GenPat is that it can learn from rarely-conducted systematic AST/token-based comparison. Note that if mrepaired is syn-
edits. However, learning from just one example tends to tactically invalid, we cannot build its AST/fgPDG, regarding
result in wrong and excessive transformation as described in mrepaired as an incorrect result (repair failure).
Section II-B1. Direct comparison between GenPat and Sirius The above evaluation logic relies on fgPDG-based com-
is not fair. Thus, we do not select GenPat as our baseline. parison. Although fgPDGs constitute the basis for Sirius’s
transformation algorithm, the fgPDG-based evaluation logic
D. Procedure does not unfairly advantage Sirius. First, p0 used for Sirius’s
We conduct cross-validation to measure the repair perfor- transformation is different from p used for evaluation. Sirius
mance. First, we mine fgPDG-based SEPs from our dataset, (and ARES) does not know anything about p during repair.
resulting in a pattern collection PC. To obtain the PC, we run Besides, the transformation algorithm of Sirius transplants not
the pattern miner with the minimum support of 3 because, as fgPDG (sub)structures but partial ASTs (MTSs) to the client
described below, each pattern must have at least three pattern (e.g., Sirius does not transplant dependence edges). Thus, our
instances: more than two change examples as training data evaluation logic (fgPDG-based comparison) is independent of
and one example as test data. With PC, we conduct leave- the transformation logic of Sirius, meaning that it does not give
one-out cross-validation that simulates a systematic edit for any advantages to Sirius. Note that we cannot check conditions
each pattern instance, which reports the repair performance. (1) and (2) by AST-based comparison that compares the code
Let PC be a set of mined fgPDG-based SEPs: PC = difference between mold and mnew with that between mold
{p1 , p2 , ...} where pi is an fgPDG-based SEP. Each pattern and mrepaired , because the difference between mold and mnew
Sirius-P Sirius-R Sirius-F1
TABLE II #patterns #instances
100,000 ARES-2-P ARES-2-R ARES-2-F1
OVERALL REPAIR PERFORMANCE . 1
10,000
0.8
Tool Precision Recall F1-score 1,000
0.6
Sirius 0.710 (17,683/24,889) 0.565 (17,683/31,273) 0.630 100
0.4
ARES-2 0.470 (4,600/9,778) 0.141 (4,395/31,273) 0.216 10
0.2
ARES-all 0.166 (4,335/26,168) 0.123 (3,832/31,273) 0.141 1
0 10 20 30 40 50 60 70 80 0
0 10 20 30 40 50 60 70 80
Graph size Graph size threshold t
contains pattern-unrelated changes (e.g., the yellow-colored Fig. 10. Pattern distribution and repair performance against graph sizes of
lines in Figs. 1 and 2). Checking (1) and (2) intrinsically patterns. The left figure shows the numbers of patterns and their instances for
each graph size. The right figure shows the repair performance for patterns
requires fgPDG-based comparison. with sizes of t or less. P and R denote precision and recall, respectively.
V. E VALUATION R ESULTS
A. RQ1: Effectiveness
Fig. 11 shows the relation of success and failure cases
First, we ran the pattern miner on the entire set of projects between Sirius and ARES-2. Importantly, the number of cases
in the dataset. The miner analyzed 288,497 change graphs, for which Sirius failed but ARES-2 succeeded accounts for
then produced 6,060 fgPDG-based SEPs. We then conducted only 4.53% of the total while that for which Sirius succeeded
cross-validation using the mined patterns to measure the repair but ARES failed accounts for 47.02%. This means that Sirius
performance of Sirius and the baselines. This involved a total has an advantage over ARES-2 for almost all SEPs.
of 31,273 repair trials for each technique.
We sampled 100 Sirius’s failure cases to manually examine
1) Overall Performance: Table II shows the overall results
their causes. We partitioned the failure cases into four cate-
of the cross-validation. Sirius greatly outperforms the base-
gories on the basis of the size s of each pattern graph: small
lines for precision, recall, and F1-score. The total numbers
(s ≤ 4); medium (5 ≤ s ≤ 10); large (11 ≤ s ≤ 20); and very
of generated transformed code are shown in parentheses in
large (21 ≤ s). We used stratified sampling with proportional
Table II. Note that, for each repair trial in the cross-validation,
allocation over the four categories. As a result, we identified
the number of transformed code generated by ARES can be
the following major causes of repair failures.
more than one while that of Sirius is at most one.
ARES cannot well infer transformation patterns for many • (1) Incorrect mined SEPs: This type accounts for 40%
fgPDG-based SEPs because there are many structural inconsis- of the total failure cases. All the failure cases of this
tencies among pattern instances and client code, as described type are due to the lack of examples to learn. We
in Section II. As the number of change examples for inferring excluded future commits from training data as described
transformation increases, such inconsistencies are more likely in Section IV-D, which can lead to insufficiency of train-
to occur, which is one of the main reasons that the performance ing data. Patterns mined from fewer training examples
of ARES-all is lower than that of ARES-2. can contain changes specific only to the training data,
On the other hand, Sirius can handle such inconsistent meaning insufficient generalization of the patterns. Such
cases thanks to MTS-based transplantation that requires less patterns can be inapplicable to other code locations, or
structural similarity. can introduce unnecessary modifications (specific only to
Since ARES-2 outperforms ARES-all, we compare Sirius the training data) into test data.
with ARES-2 in the following. • (2) Excessive transplantation of AST nodes: This
2) Performance Against Graph-Size Threshold: The left type accounts for 24% of the total. Sirius transplants
figure in Fig. 10 shows the distribution of the number of mined AST nodes in units of MTSs. MTSs by definition can
patterns and their instances. The right figure in Fig. 10 shows include AST nodes outside SEPs. It means that MTS-
the repair performance of each technique for patterns with based transplantation intrinsically has a risk of excessive
sizes of t or less. transplantation.
Sirius achieves higher performance compared with ARES For example, we observed that arguments of method
at any point of the threshold t. Especially, Sirius has good invocations sometimes were unnecessarily modified via
precision (≈ 0.8) and recall (≈ 0.6) for small sizes of patterns. MTS-based transplantation. Suppose that getLicense in
Since patterns with a size of 10 or less account for around 90% the motivating examples requires an argument and the
of the total, this is valuable for practical use. same expression is specified as its argument before and
The performance of Sirius decreases as the size threshold t after the change in each example (i.e., l.3 in Fig. 1 is “... =
increases. This is because the complexity of transplantation app.getLicense(Env.getDefault());” and l.3 in Fig. 2 is “...
increases for larger pattern graphs where we need to edit = app.getLicense(env);”). Also, suppose that variable e is
much more AST nodes. On the other hand, as for ARES, we specified as the argument of getLicense at l.3 in the client
cannot observe any trends in the performance change against (Fig. 4). In this situation, the arguments of getLicense are
the graph size threshold. Since ARES does not use the pattern uncommon between the examples and thus they are not
graphs, the complexity of transformations in ARES does not included in the mined SEP. The ground truth of the client
necessarily depend on graph size. code after repair should contain readLicense(e); argument
e must not be changed by repair. However, when Sirius
transplants AST nodes from A(Wnew ex1 ) to the client, argu-
ment Env.getDefault() is included in MTS(readLicense)
(see Fig. 8); thus, readLicense(Env.getDefault()) is in-
serted into the client (i.e., argument e is changed), re-
sulting in a repair failure.
• (3) Insufficient deletion of AST nodes: This type
accounts for 17% of the total. Currently, our algorithm Fig. 12. Time spent in repair. L, D,
Fig. 11. Numbers of success and
T, and DT denote learning (mining),
simply deletes AST nodes corresponding to fgPDG nodes failure cases between Sirius and
detection, transformation, and the to-
ARES-2.
in pattern’s old graphs G(Soldp ). This deletion strategy is tal of D and T, respectively.
sometimes insufficient. For example, although a method
invocation node MI is in G(Sold p ), the receiver object of Sirius needs to compute subgraph isomorphism between
MI might not be in G(Sold p ). In that case, Sirius deletes pattern and client graphs in the detection phase, which takes
only the AST node of MI, leaving the receiver object no small amount of time. Transformation templates (patterns)
in the client code. This can lead to unparsable (invalid) inferred by ARES contain the information on how to edit
code; thus, the result is judged as failures. AST nodes, whereas change graphs obtained by Sirius do not
• (4) Incorrect detection results: This type accounts for
indicate AST edit operations. In our current implementation,
8% of the total. Sometimes, the detector reported more Sirius needs to compute AST mappings via dependence graphs
locations than necessary. Suppose that there is an SEP and MTSs in the transformation phase, which results in
that replaces m1() with m2(), and the client has three in- lower efficiency. Note that because some of the data used in
vocations of m1(). Even if the correct change is to replace the transformation phase can be computed in advance (e.g.,
only two of the three m1() with m2(), the detector will MTSs), there is room for improvement of the time efficiency
report all the three occurrences of m1() as the locations of the transformer in Sirius.
to transform, resulting in excessive transformation. The median time for applying a systematic edit with Sirius is
• (5) Imprecision in AST differencing: This type accounts
≈12 seconds. It would be a more suitable use-case to integrate
for 8% of the total. We compute the AST node mappings Sirius into continuous integration environments as a repair bot,
between A(Wnew old
p ) and A(Wp ) via a traditional AST dif- where a short response time is not necessarily required, rather
ferencing algorithm. Sometimes, the computed mappings than using it in real-time within IDEs during coding tasks.
are imprecise, leading to repair failures.
It is worth noting that, although it is not a major cause, C. Threats to Validity
we observed two cases where variable names were incorrectly Although the dataset we used in our evaluation contains
determined. As described in Section III-C5, we heuristically various OSS projects, we cannot further generalize the exper-
determine variable names via mappings between pattern and imental results to other projects. However, our transformation
client graphs. Some of the mappings are computed on the basis algorithm is general-purpose and is not limited to specific
of path similarities in fgPDGs (Algorithm 1), which can lead domains. The technique should work well for projects outside
to some imprecision. In our experiment, such imprecision led the dataset.
to incorrect variable names. We evaluated ARES in two different settings: ARES-2 and
Causes (1), (4), and (5) are common problems with Sirius ARES-all. We have not searched for the optimal number of
and existing techniques for syntax-based SEPs, while (2) and change examples to provide to ARES. Changing the num-
(3) are specific to our algorithm. These results point out ber of input examples might improve the performance of
possible directions of future research: developing techniques ARES. However, it is intrinsically difficult for ARES to infer
that can learn transformations from much fewer examples (for transformation patterns from a set of code examples with
(1)); improving the AST transformation algorithm (for (2) and many structural inconsistencies. Sirius should work better than
(3)); and making the detector more intelligent so that we can ARES for fgPDG-based SEPs regardless of the number of
control locations to transform appropriately (for (4)). input examples.
B. RQ2: Efficiency VI. R ELATED W ORK
Fig. 12 shows the distribution of the time spent on one A. Systematic Edit Patterns
repair trial (i.e., one iteration in the cross-validation). We do 1) Change Pattern Mining: Most existing techniques for
not plot the individual time of detection and transformation mining change patterns are based on syntax-based analyses.
for ARES-2 because they cannot be invoked individually. The simplest way for mining SEPs is line/token diff-based
Sirius is more time-efficient in learning (mining) while analysis [7], [16]. After anonymizing identifier names that can
ARES is more time-efficient in detection and transformation. be uncommon among different methods (e.g., variable names),
In terms of practicality, the time efficiency of detection and they find common changes of tokens, resulting in line/token-
transformation should be more important because learning can based SEPs. To identify the type of each token (e.g., method
be carried out in advance. name or variable), they also leverage AST-based analysis.
SysEdMiner [5] and C3 [6] use AST diff-based approaches representations of change examples. They require additional
to obtain SEPs. This type of approach computes the AST mappings between old and new API methods as their inputs,
edit script for each change example with AST-differencing whereas Sirius does not require them. Also, their transforma-
algorithms [23], [24]. It then finds generalized AST edit tion algorithms do not address the difficulties described in
operations that are common among examples via frequent Section II-B2; thus, we cannot leverage their algorithms to
itemset mining or clustering (e.g., DBSCAN [25]). apply fgPDG-based SEPs to other code locations.
CPatMiner [4] is a recently-proposed SEP mining technique. B. Test-Driven Program Repair
It detects frequent fgPDG changes as fgPDG-based SEPs.
There is a large body of work in the research field of
Compared with traditional syntax-based techniques, it can
automated program repair (APR) [33]–[36]. Most of them
capture more meaningful changes and is more robust against
leverage test cases in their repair flows, while repair techniques
structural inconsistencies among examples.
based on SEPs (including Sirius) do not require any test code.
While all the above techniques can be used for general pur-
Test-driven APR techniques first localize buggy locations,
poses, several techniques are specialized for specific domains.
then generate repair patches. Typically, fault localization (FL)
For example, A3 [18] mines API migration patterns from API
is carried out by analyzing test outputs [37], [38]. After
documents and change examples, and Phoenix [21] mines fix
FL, they generate candidate patches by, for example, us-
patterns for static checker warnings for automated repair.
ing predefined templates [39] or machine translation tech-
2) Program Transformation: Program transformation tech-
niques [40], [41]. They validate generated patches against
niques detect common change elements or edit operations
the existing test cases to obtain correct (plausible) ones. To
among change examples, then generalize them as executable
better explore the search space of candidate patches, some
transformations that apply change patterns to other locations.
techniques leverage machine-learned models [42], [43] or past
Most transformation techniques analyze syntax-based
change patterns [44]–[46]. Generated patches that pass existing
edit representations to obtain executable transformations.
test cases are often afflicted by overfitting problems. Many
Precfix [16] involves token-level generalization to change
techniques have tried to address the issue via, for example,
examples and infers generic patch templates. Lase [12] find
test case generation [47].
common AST edit operations among examples and relate them
One of the major practical issues of test-driven repair is the
with dependent code elements to represent edit contexts. It
lack of test cases. Test-driven repair requires test cases for FL
applies the generalized AST edit operations to the locations
and patch validation; however, such test cases are not available
having the same contexts. Rase [26] (built on Lase) automates
before bug fixing in realistic situations [48], [49]. On the other
a series of systematic editing for refactoring. VuRLE [27]
hand, SEP-based repair does not require any test cases and its
generates multiple repair templates for each type of vulnera-
feasibility is not impaired by this practical issue.
bility fixes, which increases the precision and recall of repair.
ARES [11], [28] leverages the specialized transformation tem- VII. C ONCLUSION
plates. It effectively handles variations among input examples Developers continually conduct systematic editing, similar
and move-operations, resulting in better accuracy than Lase. but nonidentical changes to many code locations, along with
A few techniques [9], [13], [29] can learn a transformation software evolution. Such repetitive code editing is often error-
from just one example while most existing tools learn from prone, leading to overlooked (buggy) code locations that
multiple examples. Because some types of systematic edits require systematic edits.
(e.g., bug fixes) do not frequently occur, learning from just In this paper, we proposed Sirius, a repair pipeline that
one example is an important feature in terms of practicality, automates all processes of mining fgPDG-based SEPs, de-
although they can be imprecise due to fewer examples (e.g., tecting overlooked locations that require systematic edits, and
excessive transformation described in Section II-B1). repairing the detected code locations. Sirius leverages our
The above techniques that find common syntactical edit op- general-purpose program transformation algorithm for apply-
erations do not work well for fgPDG-based SEPs because there ing the same code changes as in fgPDG-based SEPs to other
are many structural inconsistencies among input examples and code locations. There can be many structural inconsistencies
client code. among pattern instances and client code; therefore, tradi-
Refazer [10] leverages a domain-specific language (DSL) tional program transformation techniques cannot handle code
representation to express transformations. Direct comparison changes captured via fgPDG-based SEPs well. Our algorithm
between Refazer and Sirius is difficult because the target overcomes this difficulty by computing local and partial AST
programming languages are different. However, Refazer’s DSL node mappings via dependence graphs, and transplanting AST
does not take into account program dependencies; thus, it substructures in units of MTSs.
would not work well for fgPDG-based SEPs. We evaluated Sirius with a corpus of OSS consisting of over
There are some techniques specialized for specific domains 80 real-world projects. The results indicate that Sirius outper-
and use-cases: API migrations [18]–[20], [30], fixes of static formed the state-of-the-art program transformation technique
checker warnings [21], [31], and real-time recommendations regarding precision, recall, and F1-score. Sirius achieved good
in IDEs [32]. Of these, A3 [18] and LibSync [20], which precision and recall for most of the mined patterns, which
automate API migrations, leverage dependence graph-based would be valuable for practical use.
R EFERENCES [25] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm
for discovering clusters in large spatial databases with noise,” in Proc.
[1] M. Kim and D. Notkin, “Discovering and representing systematic code KDD, 1996, pp. 226–231.
changes,” in Proc. ICSE, 2009, pp. 309–319. [26] N. Meng, L. Hua, M. Kim, and K. S. McKinley, “Does automated
[2] H. A. Nguyen, A. T. Nguyen, T. T. Nguyen, T. N. Nguyen, and H. Rajan, refactoring obviate systematic editing?” in Proc. ICSE, 2015, pp. 392–
“A study of repetitiveness of code changes in software evolution,” in 402.
Proc. ASE, 2013, pp. 180–190. [27] S. Ma, F. Thung, D. Lo, C. Sun, and R. H. Deng, “VuRLE: Automatic
[3] E. T. Barr, Y. Brun, P. Devanbu, M. Harman, and F. Sarro, “The plastic vulnerability detection and repair by learning from examples,” in Proc.
surgery hypothesis,” in Proc. FSE, 2014, pp. 306–317. ESORICS, S. N. Foley, D. Gollmann, and E. Snekkenes, Eds., 2017, pp.
[4] H. A. Nguyen, T. N. Nguyen, D. Dig, S. Nguyen, H. Tran, and M. Hilton, 229–246.
“Graph-based mining of in-the-wild, fine-grained, semantic code change [28] G. Dotzler, “Learning code transformations from repositories,” Doctoral
patterns,” in Proc. ICSE, 2019, pp. 819–830. Thesis, FAU University Press, 2018.
[5] T. Molderez, R. Stevens, and C. De Roover, “Mining change histories [29] N. Meng, M. Kim, and K. S. McKinley, “Sydit: Creating and applying
for unknown systematic edits,” in Proc. MSR, 2017, pp. 248–256. a program transformation from an example,” in Proc. FSE, 2011, pp.
[6] P. Kreutzer, G. Dotzler, M. Ring, B. M. Eskofier, and M. Philippsen, 440–443.
“Automatic clustering of code changes,” in Proc. MSR, 2016, pp. 61–72. [30] M. Fazzini, Q. Xin, and A. Orso, “Automated api-usage update for
[7] Y. Higo, S. Hayashi, H. Hata, and M. Nagappan, “Ammonia: an android apps,” in Proc. ISSTA, 2019, pp. 204–215.
approach for deriving project-specific bug patterns,” Empirical Software [31] J. Bader, A. Scott, M. Pradel, and S. Chandra, “Getafix: Learning to fix
Engineering, vol. 25, no. 3, pp. 1951–1979, 2020. bugs automatically,” in Proc. OOPSLA, Oct. 2019, pp. 159:1–159:27.
[8] G. Liang, Q. Wang, T. Xie, and H. Mei, “Inferring project-specific bug [32] A. Miltner, S. Gulwani, V. Le, A. Leung, A. Radhakrishna, G. Soares,
patterns for detecting sibling bugs,” in Proc. FSE, 2013, pp. 565–575. A. Tiwari, and A. Udupa, “On the fly synthesis of edit suggestions,” in
[9] J. Jiang, L. Ren, Y. Xiong, and L. Zhang, “Inferring program transfor- Proc. OOPSLA, Oct. 2019.
mations from singular examples via big code,” in Proc. ASE, 2019, pp. [33] L. Gazzola, D. Micucci, and L. Mariani, “Automatic software repair: A
255–266. survey,” IEEE Transactions on Software Engineering, vol. 45, no. 1, pp.
[10] R. Rolim, G. Soares, L. D’Antoni, O. Polozov, S. Gulwani, R. Gheyi, 34–67, 2019.
R. Suzuki, and B. Hartmann, “Learning syntactic program transforma- [34] M. Monperrus, “The living review on automated program repair,”
tions from examples,” in Proc. ICSE, 2017, pp. 404–415. HAL/archives-ouvertes.fr, Tech. Rep. hal-01956501, 2018.
[11] G. Dotzler, M. Kamp, P. Kreutzer, and M. Philippsen, “More accurate [35] T. Durieux, F. Madeiral, M. Martinez, and R. Abreu, “Empirical review
recommendations for method-level changes,” in Proc. FSE, 2017, pp. of java program repair tools: A large-scale experiment on 2,141 bugs
798–808. and 23,551 repair attempts,” in Proc. FSE, 2019, pp. 302–313.
[12] N. Meng, M. Kim, and K. S. McKinley, “Lase: Locating and applying [36] C. L. Goues, M. Pradel, and A. Roychoudhury, “Automated program
systematic edits by learning from examples,” in Proc. ICSE, 2013, pp. repair,” Commun. ACM, vol. 62, no. 12, pp. 56–65, Nov. 2019.
502–511. [37] W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa, “A survey on
[13] N. Meng, M. Kim, and K. S. McKinley, “Systematic editing: Generating software fault localization,” IEEE Transactions on Software Engineering,
program transformations from an example,” in Proc. PLDI, 2011, pp. vol. 42, no. 8, pp. 707–740, Aug 2016.
329–342. [38] A. Marginean, J. Bader, S. Chandra, M. Harman, Y. Jia, K. Mao,
[14] G. Santos, K. V. R. Paixão, N. Anquetil, A. Etien, M. de Almeida Maia, A. Mols, and A. Scott, “Sapfix: Automated end-to-end repair at scale,”
and S. Ducasse, “Recommending source code locations for system in Proc. ICSE-SEIP, 2019, pp. 269–278.
specific transformations,” in Proc. SANER, 2017, pp. 160–170. [39] K. Liu, A. Koyuncu, D. Kim, and T. F. Bissyandé, “TBar: Revisiting
[15] S. Sakala, V. K. Epuri, S. S. Cho, and M. Song, “LAC: Locating and template-based automated program repair,” in Proc. ISSTA, 2019, pp.
applying consistent and repetitive changes,” in Proc. COMPSAC, 2019, 31–42.
pp. 179–184. [40] T. Lutellier, H. V. Pham, L. Pang, Y. Li, M. Wei, and L. Tan, “CoCoNuT:
[16] X. Zhang, C. Zhu, Y. Li, J. Guo, L. Liu, and H. Gu, “Precfix: Large-scale Combining context-aware neural translation models using ensemble for
patch recommendation by mining defect-patch pairs,” in Proc. ICSE, program repair,” in Proc. ISSTA, 2020, pp. 101–114.
2020, pp. 41–50. [41] N. Jiang, T. Lutellier, and L. Tan, “CURE: Code-aware neural machine
[17] S. Negara, M. Codoban, D. Dig, and R. E. Johnson, “Mining fine-grained translation for automatic program repair,” in Proc. ICSE, 2021.
code changes to detect unknown change patterns,” in Proc. ICSE, 2014, [42] R. K. Saha, H. Yoshida, M. R. Prasad, S. Tokumoto, K. Takayama, and
pp. 803–813. I. Nanba, “Elixir: An automated repair tool for Java programs,” in Proc.
[18] M. Lamothe, W. Shang, and T. P. Chen, “A3: Assisting android api ICSE, 2018, pp. 77–80.
migrations using code examples,” IEEE Transactions on Software En- [43] F. Long and M. Rinard, “Automatic patch generation by learning correct
gineering, 2020. code,” in Proc. POPL, 2016, pp. 298–312.
[19] S. A. Haryono, F. Thung, H. J. Kang, L. Serrano, G. Muller, J. Lawall, [44] A. Koyuncu, K. Liu, T. F. Bissyandé, D. Kim, J. Klein, M. Monperrus,
D. Lo, and L. Jiang, “Automatic android deprecated-api usage update and Y. Le Traon, “FixMiner: Mining relevant fix patterns for automated
by learning from single updated example,” in Proc. ICPC, 2020, pp. program repair,” Empirical Software Engineering, vol. 25, no. 3, pp.
401–405. 1980–2024, May 2020.
[20] H. A. Nguyen, T. T. Nguyen, G. Wilson, A. T. Nguyen, M. Kim, and [45] S. Saha, R. K. Saha, and M. R. Prasad, “Harnessing evolution for multi-
T. N. Nguyen, “A graph-based approach to api usage adaptation,” in hunk program repair,” in Proc. ICSE, 2019, pp. 13–24.
Proc. OOPSLA, 2010, pp. 302–321. [46] X. B. D. Le, D. Lo, and C. Le Goues, “History driven program repair,”
[21] R. Bavishi, H. Yoshida, and M. R. Prasad, “Phoenix: Automated data- in Proc. SANER, vol. 1, 2016, pp. 213–224.
driven synthesis of repairs for static analysis violations,” in Proc. FSE, [47] Z. Yu, M. Martinez, B. Danglot, T. Durieux, and M. Monperrus,
2019, pp. 613–624. “Alleviating patch overfitting with automatic test generation: a study
[22] T. T. Nguyen, H. A. Nguyen, N. H. Pham, J. M. Al-Kofahi, and T. N. of feasibility and effectiveness for the Nopol repair system,” Empirical
Nguyen, “Graph-based mining of multiple object usage patterns,” in Software Engineering, vol. 24, no. 1, pp. 33–67, Feb 2019.
Proc. FSE, 2009, pp. 383–392. [48] K. Noda, Y. Nemoto, K. Hotta, H. Tanida, and S. Kikuchi, “Experi-
[23] J.-R. Falleri, F. Morandat, X. Blanc, M. Martinez, and M. Monperrus, ence report: How effective is automated program repair for industrial
“Fine-grained and accurate source code differencing,” in Proc. ASE, software?” in Proc. SANER, 2020, pp. 612–616.
2014, pp. 313–324. [49] A. Koyuncu, K. Liu, T. F. Bissyandé, D. Kim, M. Monperrus, J. Klein,
[24] B. Fluri, M. Würsch, M. Pinzger, and H. C. Gall, “Change distilling: and Y. Le Traon, “iFixR: Bug report driven program repair,” in Proc.
Tree differencing for fine-grained source code change extraction,” IEEE FSE, 2019, pp. 314–325.
Transactions on Software Engineering, vol. 33, no. 11, pp. 725–743,
Nov 2007.

You might also like