
Comparing State- and Operation-based Change Tracking

Maximilian Koegel, Markus Herrmannsdoerfer, Yang Li, Jonas Helming, Joern David

Institut für Informatik, Technische Universität München
Boltzmannstrasse 3, 85748 Garching, Germany

koegel@in.tum.de, herrmama@in.tum.de, liya@in.tum.de, helming@in.tum.de, david@in.tum.de

ABSTRACT

In recent years, models are increasingly used throughout the entire lifecycle in software engineering projects. As a result, the need for managing these models in terms of change tracking and versioning emerged. However, many researchers have shown that existing approaches for Version Control (VC) do not work well on graph-like models, and have therefore proposed alternative techniques and methods. These can be categorized into two different classes: state-based and operation-based approaches. Existing research shows advantages of operation-based over state-based approaches in selected use cases. However, there are no results available on the advantages of operation-based approaches in the most common use case of a VC system: review and understand change. In this paper, we present and discuss both approaches and their use cases. Moreover, we present the results of an empirical study to compare a state-based with an operation-based approach in the use case of reviewing and understanding change.

Categories and Subject Descriptors

D.2.7 [Software Engineering]: Distribution, Maintenance, and Enhancement—Version control; D.2.2 [Software Engineering]: Design Tools and Techniques—Computer-aided software engineering (CASE); D.2.9 [Software Engineering]: Management—Software configuration management

General Terms

Management, Documentation, Design, Experimentation

Keywords

Change Tracking, State-based, Operation-based, Change-Based, Version Control System

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
International Conference on Software Engineering 2010, Cape Town, South Africa.
Copyright 200X ACM X-XXXXX-XX-X/XX/XX ...$10.00.

1. INTRODUCTION

Today, models are an essential artifact throughout the entire lifecycle in software engineering projects. Model-driven development is putting even more emphasis on models, since they are not only an abstraction of the system under development, but the system is (partly) generated from its models. Consequently, models are about to cover the whole development process from requirements over design to deployment, including management of the process itself. With the adoption of model-driven development in industry, the need for managing these models in terms of change tracking and versioning emerged. Version Control (VC), also commonly known as Software Configuration Management (SCM), is already in wide-spread use for textual artifacts such as source code.

However, many publications, e.g. [1, 6, 11, 12, 18, 19, 21], recognized that existing VC approaches do not work well on models, which essentially are attributed graphs. Traditional VC systems are geared towards supporting textual artifacts such as source code, managing them on a line-oriented level. In contrast, many software engineering artifacts including models are not managed on a line-oriented level, and thus a line-oriented change management is not adequate. For example, adding an association between two classes in a UML class diagram is a structural change, which is neither line-oriented, nor should it be managed in a line-oriented way. However, a single structural change in the diagram is managed as multiple line changes by traditional VC systems. Nguyen et al. describe this problem as the impedance mismatch between the flat textual data models of traditional VC systems and graph-based software models
[18]. Different approaches have been proposed to cope with
the shortcomings of existing methods and techniques to better support change tracking and versioning of graph-based models. They can be categorized into two different classes: state-based and change-based approaches [4].

State-based approaches only store states of a model, and thus need to derive differences by comparing two states, e.g. a version and its successor, after the changes occurred [4]. This activity is often referred to as diffing. The diffing process can be viewed as a calculation to derive the change post-mortem, and is generally expensive in computation time.

Change-based approaches record the changes while they occur, and store them in a repository. There is no need for diffing, since the changes are recorded and stored, and thus do not need to be derived later on. Operation-based approaches are a special class of change-based approaches which represent the changes as transformation operations on a state [4]. The recorded operations can be applied to a state to transform it into the successor state.

Figure 1: Use cases of a VC system (UML use case diagram)

Several publications exist that show advantages of change-based and in particular operation-based approaches over state-based approaches in use cases such as conflict detection and merging [6, 16, 17], repository mining [23], inconsistency detection [2], and coupled evolution [9, 28]. However, there are no results available on the advantages of operation-based approaches in the most common use case of a VC system:
reviewing and understanding change. We claim that understanding change is the most important use case of a VC system from a user's point of view, as it is required for almost
any other use case, e.g. commit, update, merge, etc. Therefore, we believe that it is essential to conduct experiments on how well this use case is supported by the state-based and operation-based approaches.

In this paper, we discuss representatives of the different approaches as well as the advantages and disadvantages of each type of approach in general. To qualitatively compare them, we present frequent use cases of a VC system, and show how they are supported by the respective approach. Finally, we present the results of an empirical study we conducted to quantitatively compare a state-based with an operation-based approach for the purpose of reviewing and understanding change.

Outline. Section 2 introduces common use cases of a VC system. Section 3 compares the state-based approach and the operation-based approach to change tracking in a VC system. Related work is mentioned in the form of inline citations within these two sections. Section 4 presents the design of the empirical study to compare both approaches, and Section 5 presents its results. Section 6 concludes the paper with a short summary.

2. USE CASES OF A VERSION CONTROL SYSTEM

A VC system has to fulfill a lot of use cases, many of which do not differ for state-based or change-based systems. Consider the use case of baselining, in which the user marks a certain approved version (e.g. a release). Since changes are not at all considered in this use case, both approaches appear identical from a user's point of view. Consequently, we only focus on use cases where a difference between state-based and change-based systems arises. We derived these use cases from well-known tools, such as the Revision Control System (RCS) [25], the Concurrent Versioning System (CVS) [20], and Subversion (SVN) [26], from research tools such as SiDiff [24] and UNICASE [10], and from the survey publications by Conradi and Westfechtel [4] and Dart [5]. Figure 1 illustrates the use cases that we consider important for discussing the differences between state-based and change-based change tracking. For every use case, we provide a short name and a description:

Update. The users retrieve changes between their local version and a target version (mostly the current head version) from the repository. These incoming changes can be reviewed by the user before they are incorporated into the local working copy of the model. If the user accepts the changes, and they do not conflict with local changes, the version of the local working copy is set to the target version, and the changes are incorporated into the local copy.

Commit. The user decides to share the changes from their local working copy with the repository. The user reviews the changes before the commit to ensure that only intentional changes are sent to the repository. If the user proceeds, the changes are sent to the repository to create a new version. In case they conflict with other commits that occurred since the last update, the commit is canceled by the VC system, and an update must occur first.

Merge. When incoming changes conflict with existing local changes in the update use case, the merge use case is initiated. The goal of a merge operation is to filter or transform the incoming and/or local changes, so
that they do not conflict anymore. The result will be incorporated into the local workspace. The merging process involves manual work in most cases, requiring a review of the changes. Merging may also occur if two branches in the repository are synchronized or rejoined, which essentially requires the same steps.

Blame. To find out how and by whom a problem or an inconsistency was created, the user is interested in finding recent changes on a certain part of the model. Typically, the last n changes on a selected set of model elements need to be retrieved. The user reviews these changes to find the change that causes the problem.

Show History. The user (often a project manager) reviews the history to get an impression of the current activities in a project. Mostly, the user is not interested in individual changes, but in an overview of how many and which type of changes on how many artifacts occurred.

Show Differences. The user is interested in reviewing the differences between two versions of a model, for example two releases. The two versions are typically not very close in terms of the number of changes between them.

Revert. The user wants to undo some changes in the local working copy. To ensure that the right changes are undone, they need to be reviewed beforehand.

Review and Understand Changes. The user reviews changes to understand what was changed, and most importantly, how it was changed.

Interestingly, the first seven use cases are placed most prominently in many VC systems and their clients [10, 20, 26]. We claim this is due to the fact that these are the most frequently executed use cases for the majority of users. In all of these seven use cases, the user reviews changes in one way or another. As shown in Figure 1, all use cases thus include the use case "Review and Understand Changes".

3. COMPARISON OF CHANGE TRACKING

In this section, we compare different approaches to change tracking in a VC system. Sections 3.1 and 3.2 introduce the state-based and operation-based approach, respectively. Section 3.3 contrasts the advantages and disadvantages of both approaches.

3.1 State-based Change Tracking

State-based approaches derive differences by comparing two states, e.g. a version and its successor, after the changes occurred. This activity is often referred to as diffing, and is performed in two phases: matching and comparison. In the matching phase, for each node in a certain state, the corresponding node in the other state is found. The matching can be based on the similarity of the node's content or on the graph structure it is connected to [13]. If the model supports unique identifiers, the matching can be found in O(1); otherwise, O(n²) is required for n nodes in a model [15, 27]. Chawathe et al. even claim that the matching problem for two states is NP-hard in its full generality [3]. In the comparison phase, each node is compared with its matching partner from the other state to derive potential changes. The comparison calculation requires O(n). The space complexity for the whole diffing process is 2n, since both states need to be present. In a state-based system, changes are not persisted in a way that reflects how they were actually performed. The diffing process can be viewed as a calculation to derive an approximation of the changes post-mortem.

Since the VC system is not required to be able to observe the changes while they occur, a total separation of the modeling tools and the VC system is possible. This is a clear advantage over change-based systems. It is even possible to use line-oriented VC systems, and to perform diffing on the client side. However, the diffing tool must at least know the models' meta model, based on which the changes are calculated and represented. For example, EMF Compare [8] is a diffing tool for meta models defined with the meta modeling language Ecore of the Eclipse Modeling Framework (EMF) [7].

There are three main disadvantages of the diffing concept: (1) The temporal order of changes is lost, and it cannot be perfectly derived. For understanding changes, the temporal order might be important. Moreover, the temporal order is useful for conflict detection and merging [16]. (2) Groupings of changes into composite changes are lost. Refactoring operations, for example, cause many changes that can be grouped. Grouping reduces the number of changes, and represents the change at a higher level of abstraction. Deriving composite changes, e.g. to detect refactorings, is difficult and in some cases even impossible due to masking problems [29]. (3) The computational complexity of diffing is high, especially if changes between many states need to be retrieved, or the model is of a large size [15, 27]. By means of the empirical study, the design of which is presented in Section 4, we want to find out whether and how the disadvantages (1) and (2) affect the ability of users to understand change.

Considering the use cases presented in Section 2, we made the following observations for the state-based approach:

Merge. The merge result cannot be as accurate, since composite changes are not available [11, 16, 17]. Refactoring operations, for example, might only be partly reflected, if not all of their caused changes are accepted.

Show History. The computational complexity of diffing could result in a severe performance problem, especially when looking at many versions and the changes that occurred in between them. Diffing is required for every such version.

Review and Understand Changes. We believe that the disadvantages (1) and (2) impair the ability of humans to understand change. The temporal order of the changes, which is lost using state-based approaches, could help to understand the context in which the changes were performed. Composite changes could group many changes that look unrelated, and thus have to be grouped in the user's mind.

3.2 Operation-based Change Tracking

In contrast to state-based approaches, change-based approaches record the changes while they occur, and store them in a repository. This implies that change-based systems persist changes in a way that reflects how they were actually performed. There is no need for diffing, since the changes are already available by design. Operation-based
approaches are a special class of change-based approaches which represent the changes as transformation operations on a state [16]. An operation can be applied to a state to transform it into the successor state [4]. Figure 2 shows the simplified taxonomy of operations from the UNICASE system [10, 11]. All operations refer to one ModelElement that is being changed by the operation, and that is unambiguously identified by a unique identifier. A ModelElement is a node of the graph that can be linked to other nodes, and that has values for a number of attributes. An AttributeOperation changes the value of an attribute of a model element. A ReferenceOperation creates or removes one or several links between model elements. A CreateDeleteOperation creates or deletes a model element. The created or deleted model element is contained in the operation in the case of operation-based change tracking. A CompositeOperation allows grouping several related operations, to represent a refactoring, for example.

Figure 2: Taxonomy of operations (UML class diagram)

The change-based approaches have one disadvantage in common: they require the VC system to be present when the changes occur, i.e. when the modeling tool is manipulating the model. This requires an integration of the VC system into the modeling tool. However, this does not imply that the system must instrument the modeling tool, but may only use the infrastructure on which the tool is built. For change recording, observer mechanisms can be used; for composite detection, the command pattern can be used. In the case of EMF models, one can rely on the EMF notifications and command stack [7]. This effectively decouples the VC system from the modeling tool.

In general, change-based approaches can preserve the exact temporal order in which the changes occurred. This is important information for understanding changes, but is also useful to improve applications such as conflict detection and merging [16, 17]. Moreover, the exact times at which the changes occurred can be recorded. Operation-based systems can record composite operations which express the fact that the contained operations occurred in a common context. For example, a refactoring can be captured in a composite operation. This can help to understand changes, but is also helpful for conflict detection and merging [14, 17]. Since operations are essentially a command pattern with persistent commands, the operations can also be used to implement undo and redo functionality [17, 22]. Operation-based systems can provide a filter method to canonize a sequence of operations, which hides operations that are fully masked by later operations. For example, a Pull up to Superclass refactoring is fully masked by a later deletion of all the involved classes.

Robbes et al. even claim that only an operation-based VC system allows for effective research on evolution, since it provides all the required information [23]. Moreover, many researchers rely on information from a VC system to derive quantitative data to evaluate their approaches. This technique is in wide-spread use and is commonly known as repository mining. For some approaches, for example recommendation of traceability links, it is not only necessary to be able to retrieve every version, but also to recover intermediate states between two versions. For example, for a realistic simulation of a recommendation use case in a post-mortem analysis, it is necessary to restore the state just before the recommendation. This scenario can only be supported by operation-based VC systems, since the temporal order of changes is available.

Considering the use cases presented in Section 2, we made the following observations for the operation-based approach:

Update. The operations incoming from the repository are presented to the user. If the difference between the local and target version is large, the system can canonize the operations to get a more compact representation. Conflict detection can fully rely on the operations, their temporal order and composites to supply a more accurate result and avoid unnecessary conflicts [17]. In general, conflict detectors apply a conservative estimation: if they are unsure about a potential conflict, they raise a conflict to avoid later data corruption.

Commit. The recorded operations can be presented to the user, possibly after canonization. No diffing is required. Conflict detection may again profit from the additional information.

Merge. The merge can operate on the top-level operations. Therefore, fewer decisions are needed, since many operations are contained in a composite operation. Moreover, the decisions cannot partially mask a refactoring, as opposed to the state-based case [29].

Show Differences. This use case is best served by a state-based representation. It can be implemented directly by relying on an existing state-based approach or by deriving it from the recorded operations.

Review and Understand Changes. The changes can be presented as operations in the correct temporal order and can be grouped as composite operations.

3.3 Summary

State-based approaches exhibit the advantage that they are independent of the tool used for changing the models. State-based approaches derive an approximation of the exact change, which is sufficient for certain kinds of changes and for certain use cases. However, the need to derive the changes from the states is a disadvantage of state-based approaches: due to the graph isomorphism problem, calculating the difference is a computationally complex endeavor. Moreover, state-based approaches can neither completely and correctly derive the exact temporal order of the changes nor are they able to derive composite changes.

Operation-based approaches have the advantage that the changes are explicitly recorded. Therefore, no computation effort is necessary to derive the changes when they are required in the different use cases. Moreover, operation-based approaches retain the exact temporal order of the
changes as well as composite changes. Operation-based approaches exhibit the disadvantage that they need to be integrated into the tool used for changing the models. As a consequence, they cannot be used for existing tools which do not provide such functionality.
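To illustrate the integration argument, the following is a minimal sketch (in Python, with hypothetical names; it is not the EMF or UNICASE API) of observer-based operation recording: the modeling tool is not instrumented, the recorder merely subscribes to the tool's change notifications, and the recorded operations can later be replayed to reproduce the successor state.

```python
# Sketch of observer-based operation recording (hypothetical names).
# The model element notifies subscribed observers on every attribute
# change; a recorder turns each notification into an operation,
# preserving the exact temporal order of the changes.

class AttributeOperation:
    def __init__(self, element_id, attribute, old_value, new_value):
        self.element_id = element_id
        self.attribute = attribute
        self.old_value = old_value
        self.new_value = new_value

    def apply(self, state):
        """Transform a state (dict of element dicts) into its successor."""
        state[self.element_id][self.attribute] = self.new_value


class ModelElement:
    def __init__(self, element_id):
        self.element_id = element_id
        self.attributes = {}
        self.observers = []  # integration point: the VC system subscribes here

    def set(self, attribute, value):
        old_value = self.attributes.get(attribute)
        self.attributes[attribute] = value
        for notify in self.observers:
            notify(self.element_id, attribute, old_value, value)


class OperationRecorder:
    def __init__(self):
        self.operations = []

    def __call__(self, element_id, attribute, old_value, new_value):
        self.operations.append(
            AttributeOperation(element_id, attribute, old_value, new_value))


recorder = OperationRecorder()
element = ModelElement("e1")
element.observers.append(recorder)  # subscribe only, no instrumentation
element.set("name", "Requirement A")
element.set("name", "Requirement B")

# Replaying the recorded operations on the source state yields the
# successor state, without any diffing.
state = {"e1": {}}
for operation in recorder.operations:
    operation.apply(state)
```

The modeling tool itself stays untouched; only its notification infrastructure is reused, which is the decoupling described in Section 3.2.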
The operation-based approach might seem very different from a state-based approach, but actually, it is only an enhancement: it records additional information that is lost in the state-based approach. In an operation-based approach, we can perform everything that can be done in a state-based approach by just ignoring the additional information. This boils down to the question of whether the additional effort for recording the changes is justified by its advantages. Therefore, we have conducted an empirical study to compare state-based with operation-based change tracking for the use case of reviewing and understanding change.
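This enhancement relationship can be made concrete with a small sketch (in Python, with hypothetical names; it is not EMF Compare's API): recorded operations can always be replayed into states and then diffed with a two-phase, identifier-based comparison, whereas the temporal order and the grouping of the changes cannot be recovered from the resulting diff.

```python
# Sketch (hypothetical names): states map unique element identifiers to
# attribute dicts, so the matching phase is a constant-time identifier
# lookup and the comparison phase derives the changes post-mortem.

def replay(source_state, operations):
    """Apply recorded (element_id, attribute, new_value) operations in order."""
    state = {eid: dict(attrs) for eid, attrs in source_state.items()}
    for element_id, attribute, new_value in operations:
        state.setdefault(element_id, {})[attribute] = new_value
    return state


def diff(source, target):
    """Two-phase state-based diff: match by identifier, then compare."""
    changes = []
    for element_id, attributes in source.items():
        if element_id not in target:                      # matching phase
            changes.append(("delete", element_id))
            continue
        for attribute, old_value in attributes.items():   # comparison phase
            new_value = target[element_id].get(attribute)
            if new_value != old_value:
                changes.append(("change", element_id, attribute,
                                old_value, new_value))
    for element_id in target:
        if element_id not in source:
            changes.append(("create", element_id))
    return changes


source = {"e1": {"name": "A"}}
# Recorded in temporal order: two renames of the same attribute, then a
# creation of a second element.
operations = [("e1", "name", "A2"), ("e1", "name", "A3"), ("e2", "name", "B")]
target = replay(source, operations)

# The derived diff collapses the two renames into a single change and
# carries no temporal order and no composite grouping: exactly the
# information that the operation-based approach additionally records.
derived = diff(source, target)
```

The forward direction (operations to states to diff) always works; the reverse direction does not, which is why recording is a strict enhancement.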

4. DESIGN OF THE EMPIRICAL STUDY


In this section, we present the design of the empirical study to compare state-based and operation-based change tracking. Section 4.1 lists the research questions that underlie the empirical study. Section 4.2 presents the tools we chose as representatives for state-based and operation-based change tracking. Section 4.3 describes the data we used as input for the evaluation. Section 4.4 enumerates the different steps we carried out to conduct the empirical study.

Figure 3: State-based representation (EMF Compare)
4.1 Research Questions

We conduct the empirical study to answer the following research questions:

1. Do users better understand the changes in a state-based or an operation-based representation?

2. Which factors influence the user in understanding a state-based or an operation-based representation?

4.2 Setup

We chose representatives for the two different approaches to change tracking.

State-based change tracking is represented by the open-source tool EMF Compare [8], which is the state-of-the-art diffing and merging implementation for EMF (Eclipse Modeling Framework) [7]. It is regularly delivered with the Eclipse Modeling Tools, which is one of several official Eclipse products. As we do not want to disadvantage EMF Compare by design, the matching strategy used is based on unique identifiers to ensure correct matchings.

Figure 3 depicts an example of a change as represented by EMF Compare. The upper part shows the changes between two versions structured according to the target version. The lower part shows the target and the source version, respectively, and highlights affected elements. When a change is selected in the upper part, the affected elements are automatically selected in the lower part.

Operation-based change tracking is represented by the open-source CASE tool UNICASE [10], which is based on a unified model. It consists of a set of editors to manipulate instances of a unified model, and a repository to persist and version the model as well as to collaborate on the model. The unified model covers the whole development process from requirements over design to deployment, including project management artifacts. System model elements, such as requirements or UML elements, are part of the same unified model and stored in the same repository as project model elements such as tasks or users. UNICASE is implemented based on EMF, and realizes operation-based change tracking, conflict detection and merging [11].

Figure 4 depicts the example change as represented by UNICASE. The representation shows the sequence of operations which have been executed on the source version to get to the target version. Note that the temporal order is from bottom to top. For each operation, its affected elements are shown below the operation. When an affected element is selected in the operation-based representation, it is automatically selected in a view of the target version (which is not shown in the figure).

Figure 4: Operation-based representation (UNICASE)

4.3 Input

UNICASE was used to record operation histories, which we use as input to the empirical study. UNICASE also provides an Empirical Project Analysis Framework (EPAF). Iterators can be reused to run through all revisions of a model in a predefined way. Analyzers are used to analyze and extract data per revision, and exporters write the data to files. We used this framework to retrieve and analyze the data from the UNICASE VC system.

The version model (see Figure 5) of UNICASE is a tree of versions with revision links [12]. Every version contains
revises
(3) The commit category determines what kind of op-
1
erations a commit contains: Category 1 contains only
createdBy
1
Version
1 1
ChangePackage AttributeOperations and ReferenceOperations, category
1
2 also contains CreateDeleteOperations, and category 3
{ordered}
also contains CompositeOperations (for the taxonomy,
0..1
* see Figure 2).
ModelState Operation
2. Choose Users. We choose a number of users which
are familiar with the input, as well as a number of users
which are not familiar with the input. We record for
Figure 5: Version Model (UML class diagram) all users whether they are familiar with the input using
the attribute internal for every data set.
3. Extract data for users For each user, we randomly
a change package and may contain a full version state rep- select 18 commits from the operation-based repository.
resentation. A change package contains all operations that We select only commits with a size greater than 5 and
transformed the previous version into this version along with smaller than 30. We exclude the shorter commits, as
administrative information such as the modifying user, a we do not expect any difference between both repre-
time stamp and a log message. sentations. We exclude the longer commits, as un-
UNICASE was employed in a project named DOLLI2 derstanding them would take too much time in the
(Distributed Online Logistics and Location Infrastructure 2) interview.
at a major European airport. The objective of DOLLI2 was For each commit, we randomly decide whether the
integrating facility management and telemetry data into the user is shown either the operation-based or state-based
tracking and locating infrastructure developed in the previ- representation. Also we randomly determine the or-
ous project, together with expanding the 3D visualization der in which the commits are presented to the user.
on desktop computers as well as porting it to mobile de- We only take one representation for the same commit,
vices. More than 20 developers worked on the project for a as the first representation might ease the understand-
period of five months. All modeling was performed in the ing of the second representation. Moreover, we ensure
UNICASE tool. This resulted in a comprehensive model that each user is presented roughly the same number
consisting of about 1000 model elements and a history of of commits in operation-based and state-based repre-
over 600 versions. sentation. To be able to correlate the answers with
the complexity of the contained operations, we ran-
4.4 Conducting the Empirical Study domly sample 6 commits for each of the three above-
We apply the following process to conduct the empiri- mentioned categories. For each user, there is a so-
cal study, which consists of two phases. In the preparation called shadow user which is presented the same com-
phase, we randomly select a number of commits from the mits in the opposite representation. We generate the
repository. In the interview phase, we present these com- state-based representation for all the sample commits
mits in different representations to a number of users. using EMF Compare. For the operation-based rep-
The preparation phase consists of the following steps:

1. Extract commits from repository. We query the UNICASE VC system for all commits of the DOLLI2 project and extract the project state after the commit as well as the change package of the commit. We also preserve for every version the state of its predecessor version, to be able to create a state-based diff in the following data extraction. For each commit, we additionally record the following data: (1) commit size, (2) commit complexity and (3) commit category.

(1) The commit size is measured in the number of primitive changes.

(2) The commit complexity is measured by a dependency depth value. This value is supposed to measure how much the changes of one commit depend on each other. Operation a requires operation b means that "a is not applicable on a project without b". The requires relation is transitive [11]. We calculate the transitive closure of the binary relation requires on an operation sequence, and compute the 1-norm of the corresponding adjacency matrix of this transitive closure. We call this 1-norm the dependency depth of an operation sequence. In other words, the dependency depth is the length of the longest path of requires relations in an operation sequence.

For the operation-based representation, we just store a change package from the UNICASE VC system.
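The dependency depth computation can be sketched as follows. This is a minimal illustration rather than the UNICASE implementation, and it rests on two assumptions not stated in the text: operations are indexed 0..n−1, and the requires relation is given explicitly as a set of index pairs (a, b) meaning "a requires b".

```python
from itertools import product

def transitive_closure(n, requires):
    """Boolean adjacency matrix of the transitive closure of `requires`."""
    m = [[(a, b) in requires for b in range(n)] for a in range(n)]
    for k, i, j in product(range(n), repeat=3):  # Warshall's algorithm, k outermost
        if m[i][k] and m[k][j]:
            m[i][j] = True
    return m

def dependency_depth(n, requires):
    """1-norm (maximum column sum) of the closure's adjacency matrix."""
    if n == 0:
        return 0
    m = transitive_closure(n, requires)
    return max(sum(m[i][j] for i in range(n)) for j in range(n))

# Chain of three operations: 2 requires 1, 1 requires 0. The closure adds
# the pair (2, 0), so column 0 sums to 2 -- a dependency depth of 2.
print(dependency_depth(3, {(2, 1), (1, 0)}))  # prints 2
```

For chains like the one above, the 1-norm coincides with the length of the longest path of requires relations, so the two readings given in the text agree in that case.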
The interview phase is limited to a total of one hour per user, including a short training in the different representations. Since the order of the shown commits is random, the interview can be ended at any time. If the interviewee completed the interview on all 18 data sets in less than one hour, the interview was also stopped. In the interview phase, we performed the following steps for each user and each commit:

1. Present the commit to the user. Based on the data extracted before, we present the commit to the user in a state-based or an operation-based representation. The user is shown the representation in the respective tool, which can be used to navigate through the changes and the project (see Figures 3 and 4). The user should try to understand and memorize the changes as best they can within a given time limit of 2 minutes. We determined the time limit by experimenting with several test subjects. The time limit is supposed to prevent the user from merely memorizing the changes and to encourage building a mental abstraction of the changes instead.

2. Question the user about the commit. We assess the understanding of the user by means of exam questions. Once the user decides to take the exam, or once the time for understanding the changes is up, the user can no longer look at the change representations. In other words, the exam is closed book. The exam consists of the following question types:

(a) Understanding the impact of the changes: The user is confronted with a randomly selected element that was changed by the commit, in two versions: the version before and the version after the commit. The user should try to answer the following question: "Which is the version of the element after the commit?" Since elements that were deleted or created by the commit are not present in both versions, we show a different question for deleted or created elements: "Was this element created, deleted, or neither deleted nor created?" We generate 5 instances of the first question type and 2 of the latter, if the commit has enough elements to generate as many questions. The questions are supposed to determine whether the user has truly understood the impact of the presented changes.

(b) Understanding the overall intention of the changes: The user is presented with 10 commit messages and is asked to select the message that is assigned to the commit. The other messages are randomly selected from all commits of the respective project. Duplicates are removed. Out of these 10 messages, the user can select and prioritize a maximum of three candidates for the message of the presented commit.

During the interview, we recorded the following metrics that can be used for statistical evaluation:

• the time taken for understanding the changes, which is between 0 and 2 minutes.

• the compare score based on the exam questions of type (a). The compare score is the sum of the evaluation of all questions divided by the number of questions. A question evaluates to 1 in case it was answered correctly, and to -1 otherwise. As a consequence, the aggregated result is also in the range of -1 to 1.

• the log message score based on the exam questions of type (b). The user obtains 4 points if her first candidate is the correct commit message, 2 points for the second, 1 point for the third, and 0 points if none of her candidates is correct.

• the self-assessment of the user reflecting the difficulty she felt in understanding the changes. For this measure, we use a scale with the following five values: very difficult (=1), difficult (=2), OK (=3), easy (=4), very easy (=5).

5. STUDY RESULTS

In this section, we present the results of the empirical study. Section 5.1 evaluates the results by means of statistical tests. Section 5.2 interprets the results in terms of the research questions. Section 5.3 lists threats to the study's validity along with their mitigation.

5.1 Evaluation

We performed a number of statistical tests to evaluate the measurements. 162 change assessments from 14 different users were recorded and made available for statistical tests. To get a first glance of the understanding process of the user when confronted with the two representations, we divided the commits into two groups: those which result from the state-based approach (n1 = 75 items) and those which were operation-based (n2 = 87 items). We chose the variables compare score and log message score as the key variables, because they describe to which extent the user has correctly understood the performed change. Our hypothesis was that the means of the compare score and/or the log message score differ significantly between the two groups, since the operation-based representation should be more understandable. Assuming a normal distribution of both values, we applied the T-test for independent samples to check the null hypothesis of equal mean values. However, the test results showed no significant difference between the two groups for either variable. We then applied the T-test to taken time and self-assessment, also without any significant result.

Based on the advantages of the operation-based representation, we expected that it should be easier to understand more complicated commits in the operation-based representation. As described earlier, we recorded different measures of complexity: commit size, commit complexity and commit category. Thus, we decided to narrow down the commit samples to the commits with higher complexity according to the provided measures. We tried all measures as decision criteria, individually and in combination with others. The statistical results showed that the most significant differences between the state- and operation-based representation are in the group of changes from categories (2) and (3) (see 4.4), given a commit size greater than or equal to 10.

In the following, we detail the statistical results of this group. We separated the filtered commits into two subgroups: those which result from the state-based approach (n1 = 34 items) and those which were operation-based (n2 = 37 items). Our hypothesis was that the means of the compare score differ significantly between the two groups. Assuming a normal distribution of both values, we applied the T-test for independent samples to check the null hypothesis of equal mean values. The 95%-confidence interval for the difference of the mean values mstate − mop = −0.195 is [−0.389, −0.002]. The T-test returned a T-value of −2.019, which means that the critical value −c(df = 60, α = 5%) = −1.67 for 60 degrees of freedom is exceeded in absolute value, and thus the null hypothesis has to be rejected at the 5% level of significance. The variances of both groups cannot be assumed to be equal, since the F-test returned an F-value of 4.0, which is greater than the critical value c(df1 = 34, df2 = 37, α = 5%) = 1.74 for that level of significance and the given degrees of freedom. Thus we used the T-test for different variances in both groups, implying a lower number of degrees of freedom (only about 60, in contrast to n1 + n2 − 2 = 69¹). The variance of the user assessment regarding the operation-based representation is significantly lower than for the state-based representation. This might argue for a higher robustness of the operation-based representation, since the users are more consistent in their assessment of the changes, as illustrated by Figure 6.

¹ Two degrees are subtracted, since the expected value and the variance have to be estimated from the data sample.
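The procedure described above — an F-test on the group variances, then Welch's T-test when they cannot be assumed equal — can be sketched with SciPy. The scores below are synthetic placeholders, since the study's raw measurements are not available; only the group sizes (34 and 37) mirror the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder compare scores in [-1, 1] for the two groups (not the study's data).
state = np.clip(rng.normal(0.2, 0.5, 34), -1.0, 1.0)
operation = np.clip(rng.normal(0.4, 0.25, 37), -1.0, 1.0)

# F statistic: ratio of the sample variances (state over operation here),
# compared against the F distribution with (n1-1, n2-1) degrees of freedom.
f = np.var(state, ddof=1) / np.var(operation, ddof=1)
p_f = stats.f.sf(f, len(state) - 1, len(operation) - 1)

# Welch's T-test (equal_var=False) drops the equal-variance assumption and
# lowers the degrees of freedom below n1 + n2 - 2.
t, p_t = stats.ttest_ind(state, operation, equal_var=False)
print(f"F = {f:.2f} (p = {p_f:.3f}), T = {t:.2f} (p = {p_t:.3f})")
```

With `equal_var=False`, SciPy uses the Welch–Satterthwaite approximation for the degrees of freedom, which is the correction the footnoted remark about "only about 60" degrees of freedom refers to.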
Besides the discriminating variable compare score, we also performed tests on the other variables, i.e. taken time, log message score and self-assessment, but no substantial difference was found. In particular, the distribution of the binary variable internal, which indicates whether the respective change was assessed by a member of the development team or by an external person (see 4.4), did not significantly differ between the two groups of state-based and operation-based changes.

Figure 6: Boxplot of the compare score distribution of state-based and operation-based representation

Even if one argues that the number of changes in each group is not normally distributed, the non-parametric Mann-Whitney U test can be used to recheck the result. Due to its non-parametric character, the Mann-Whitney U test is weaker than the T-test, but it still succeeds in rejecting the null hypothesis of equal means at the 10% level of significance. The test returns a Z-value of Z = −1.739 (standard-normal approximation), which makes the probability of erroneously rejecting the null hypothesis relatively small: Φ(Z) + (1 − Φ(−Z)) = 2 · Φ(Z) ≈ 0.082 (asymptotic significance), where the equality follows from the symmetry of Φ.
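The asymptotic significance above follows directly from the standard-normal CDF; a short check with SciPy (the Z-value is taken from the text, while the sample arrays are placeholders, not the study's measurements):

```python
from scipy.stats import mannwhitneyu, norm

# Two-sided asymptotic significance from the reported Z-value, using the
# symmetry of the standard-normal CDF: Phi(z) + (1 - Phi(-z)) = 2 * Phi(z).
z = -1.739
p_asymptotic = 2 * norm.cdf(z)
print(round(p_asymptotic, 3))  # prints 0.082

# On raw measurements, SciPy computes the U statistic and p-value directly;
# the lists here are placeholders, not the study's compare scores.
u_stat, p_value = mannwhitneyu([0.1, 0.5, 0.9, 0.3],
                               [0.2, 0.4, 0.8, 0.6],
                               alternative="two-sided")
```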
5.2 Interpretation

In this section, we interpret the results in terms of the research questions posed in section 4.1:

Do users better understand the changes in a state-based or an operation-based representation? The statistical result shows no significant overall difference between the respective representations in the variables compare score, log message score, taken time and self-assessment. In general, we observed that the log message score, taken time and self-assessment mostly showed a similar distribution in both representations, while the compare score often deviated, but below a reasonable significance level. We already supposed that the operation-based approach might provide better results with increasing complexity of the changes. Therefore, we also recorded different measures of complexity for each commit. The results showed that by partitioning the data sets according to category and size, we obtain a set of more complex commits, for which the operation-based representation is significantly better than the state-based representation. By design, the compare score required a more detailed understanding of the changes, while the log message score required a bird's-eye-view understanding. We believe this is why only the compare score shows significant differences between the two representations. The log message can often be guessed, for example by memorizing frequent words, which is equally well supported in both representations. With respect to the research question, we conclude that users better understand changes in an operation-based representation if the changes are sufficiently complex and in-depth understanding is required. We also conclude that in the case of less complex changes, neither representation shows big advantages, since no significant difference in the variables' distributions could be determined.

Which factors influence the user in understanding change in a state-based or an operation-based representation? Based on the results from grouping the commits by the measures introduced above, we made the following observations on factors relevant for understanding the change. The dependency depth does not seem to be relevant, since its individual and combined use for grouping did not improve significance for any of the variables for either representation. The category, however, seems to be a relevant factor. Both categories 2 and 3 only contain changes that are composed of potentially many atomic changes. A delete operation, for example, removes an element and its children, along with all cross references targeting them, from the model. Using commits from these categories only already showed promising results, however below a reasonable significance level. Only when combined with a restriction of the commit size (≥ 10) did significant differences show up. In our opinion, the reason for this observation is that a commit with complex operations, but with a size of less than 10 primitive changes, is not very difficult to understand in either representation. We conclude that commit size and category are factors influencing the understanding of the change in the two given representations.

5.3 Threats to Validity

We are aware of the following internal and external threats to validity, which might have affected our results without being detected.

Internal Validity. The results might be influenced by the design we chose for the empirical study. In particular, the results might be affected by the way we prepared the data and performed the interviews.

Commit selection. When preparing the data for the evaluation, we might have chosen commits which favor one representation over the other. Consequently, one approach might perform better than the other, which poses a threat to the result's validity. To mitigate this threat, we randomly chose the commits and sampled them according to different types of changes.

Understanding vs. memorization. One could argue that the users only need to fully memorize the commit to answer the questions. As a consequence, we would be examining the users' ability to memorize a number of changes instead of their ability to understand the changes. To force the users to really understand the commit, we limited the time for looking at each commit.

External Validity. The results might be influenced by the fact that we questioned only a restricted number of users about a restricted number of commits. Hence, the results might not be representative for all developers or for all possible model changes on average.

Prior knowledge. The users might have prior knowledge about a certain representation, which favors the respective representation. This poses a threat to the transferability of the results to all users in general. To mitigate this threat, we trained all participants of the empirical study on a number of samples for both representations in advance.
Specific modeling language. The results might be affected by the modeling language of UNICASE in which the models are developed. As a consequence, the results might not be transferable to the evolution of models created with other modeling languages. However, the UNICASE modeling language is similar to UML, whose different dialects are widely used in software engineering practice.

6. CONCLUSION AND FUTURE WORK

We reviewed both state-based and operation-based approaches for change tracking. We compared both approaches by means of typical use cases of VC systems in a qualitative way. We also conducted a case study to compare the two approaches in the use case of reviewing and understanding changes in a quantitative way. The results from our case study show that operation-based change tracking exhibits advantages in understanding more complex changes. In addition, we found the size and type of changes of a commit to be relevant factors for understanding changes. Based on these results, we believe that the additional information recorded by the operation-based approach can be valuable in certain use cases. Further research is required to find further determining factors and to decide in which use cases and under which conditions which representation should be used.

7. ACKNOWLEDGMENTS

We would like to thank all the users who agreed to participate in the empirical study. This work is partially supported by grants from the BMBF (Federal Ministry of Education and Research, Innovationsallianz SPES 2020).

8. REFERENCES

[1] C. Bartelt. Consistence preserving model merge in collaborative development processes. In CVSM '08: Proceedings of the 2008 international workshop on Comparison and versioning of software models, pages 13–18, New York, NY, USA, 2008. ACM.
[2] X. Blanc, I. Mounier, A. Mougenot, and T. Mens. Detecting model inconsistency through operation-based model construction. In ICSE '08: Proceedings of the 30th international conference on Software engineering, pages 511–520, New York, NY, USA, 2008. ACM.
[3] S. S. Chawathe and H. Garcia-Molina. Meaningful change detection in structured data. In SIGMOD '97: Proceedings of the 1997 ACM SIGMOD international conference on Management of data, pages 26–37, New York, NY, USA, 1997. ACM.
[4] R. Conradi and B. Westfechtel. Version models for software configuration management. ACM Comput. Surv., 30(2):232–282, 1998.
[5] S. Dart. Spectrum of functionality in configuration management systems. Technical report, CMU/SEI, 1990.
[6] D. Dig, K. Manzoor, R. Johnson, and T. N. Nguyen. Refactoring-aware configuration management for object-oriented programs. In ICSE '07: Proceedings of the 29th international conference on Software Engineering, pages 427–436, Washington, DC, USA, 2007. IEEE Computer Society.
[7] Eclipse. Eclipse Modeling Framework. http://www.eclipse.org/emf.
[8] Eclipse. EMF Compare. http://wiki.eclipse.org/EMF_Compare.
[9] M. Herrmannsdoerfer, S. Benz, and E. Juergens. COPE - automating coupled evolution of metamodels and models. In ECOOP 2009 - Object-Oriented Programming, volume 5653 of Lecture Notes in Computer Science, pages 52–76. Springer Berlin / Heidelberg, 2009.
[10] J. Helming and M. Koegel. UNICASE. http://unicase.org.
[11] M. Koegel, J. Helming, and S. Seyboth. Operation-based conflict detection and resolution. In CVSM '09: Proceedings of the 2009 ICSE Workshop on Comparison and Versioning of Software Models, pages 43–48, Washington, DC, USA, 2009. IEEE Computer Society.
[12] M. Kögel. Towards software configuration management for unified models. In CVSM '08: Proceedings of the 2008 international workshop on Comparison and versioning of software models, pages 19–24, New York, NY, USA, 2008. ACM.
[13] D. S. Kolovos, D. Di Ruscio, A. Pierantonio, and R. F. Paige. Different models for model matching: An analysis of approaches to support model differencing. In CVSM '09: Proceedings of the 2009 ICSE Workshop on Comparison and Versioning of Software Models, pages 1–6, Washington, DC, USA, 2009. IEEE Computer Society.
[14] A. Lie, R. Conradi, T. M. Didriksen, and E.-A. Karlsson. Change oriented versioning in a software engineering database. In Proceedings of the 2nd International Workshop on Software configuration management, pages 56–65, New York, NY, USA, 1989. ACM.
[15] T. Lindholm, J. Kangasharju, and S. Tarkoma. Fast and simple XML tree differencing by sequence alignment. In DocEng '06: Proceedings of the 2006 ACM symposium on Document engineering, pages 75–84, New York, NY, USA, 2006. ACM.
[16] E. Lippe and N. van Oosterom. Operation-based merging. In SDE 5: Proceedings of the fifth ACM SIGSOFT symposium on Software development environments, pages 78–87, New York, NY, USA, 1992. ACM.
[17] T. Mens. A state-of-the-art survey on software merging. IEEE Trans. Softw. Eng., 28(5):449–462, 2002.
[18] T. N. Nguyen, E. V. Munson, J. T. Boyland, and C. Thao. An infrastructure for development of object-oriented, multi-level configuration management services. In ICSE '05: Proceedings of the 27th international conference on Software engineering, pages 215–224, New York, NY, USA, 2005. ACM.
[19] D. Ohst. A fine-grained version and configuration model in analysis and design. In ICSM '02: Proceedings of the International Conference on Software Maintenance (ICSM'02), page 521, Washington, DC, USA, 2002. IEEE Computer Society.
[20] D. Prince. Concurrent Versioning System. http://www.nongnu.org/cvs.
[21] J. Rho and C. Wu. An efficient version model of software diagrams. In APSEC '98: Proceedings of the Fifth Asia Pacific Software Engineering Conference, page 236, Washington, DC, USA, 1998. IEEE Computer Society.
[22] R. Robbes and M. Lanza. A change-based approach to software evolution. Electron. Notes Theor. Comput. Sci., 166:93–109, 2007.
[23] R. Robbes, M. Lanza, and M. Lungu. An approach to software evolution based on semantic change. In Fundamental Approaches to Software Engineering, volume 4422 of Lecture Notes in Computer Science, pages 27–41. Springer Berlin / Heidelberg, 2007.
[24] U. Siegen. SiDiff. http://sidiff.org.
[25] W. F. Tichy. Design, implementation, and evaluation of a revision control system. In ICSE '82: Proceedings of the 6th international conference on Software engineering, pages 58–67, Los Alamitos, CA, USA, 1982. IEEE Computer Society Press.
[26] Tigris. Subversion VC System. http://subversion.tigris.org.
[27] C. Treude, S. Berlik, S. Wenzel, and U. Kelter. Difference computation of large models. In ESEC-FSE '07: Proceedings of the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering, pages 295–304, New York, NY, USA, 2007. ACM.
[28] G. Wachsmuth. Metamodel adaptation and model co-adaptation. In ECOOP 2007 - Object-Oriented Programming, volume 4609 of Lecture Notes in Computer Science, pages 600–624. Springer Berlin / Heidelberg, 2007.
[29] Z. Xing and E. Stroulia. Refactoring detection based on UMLDiff change-facts queries. In WCRE '06: Proceedings of the 13th Working Conference on Reverse Engineering, pages 263–274, Washington, DC, USA, 2006. IEEE Computer Society.