
Operation-based revision control systems

Haifeng Shen and Chengzheng Sun


School of Computing and Information Technology
Griffith University
Brisbane, QLD 4111, Australia
{Hf.Shen, C.Sun}@cit.gu.edu.au

ABSTRACT

In this paper, we present some major drawbacks of state-based revision control systems and explain why operation-based revision control systems can overcome these drawbacks. We further point out that operation-based revision control systems can ease a smooth switch between asynchronous and synchronous collaboration modes. Finally, we raise issues in operation-based revision control systems that we are working on to make such systems feasible and usable.

KEYWORDS

Revision control systems, state-based merging, and operation-based merging

INTRODUCTION

It is very common for more than one software developer to be involved in the development of a large software project. These developers may simultaneously modify the same or different modules of the code. To support this kind of software development, some sort of configuration management tool or revision control system is needed to support collaboration among developers. Revision control systems usually work in the manner of Copy-Modify-Merge (CMM) [2]. Concretely speaking, each collaborator checks out a separate working copy of the code from a central repository, then modifies the copy, and finally checks the copy into the repository to form a new version. Changes made in the working copy will be merged into another working copy when another collaborator updates his working copy with the new version in the repository. There are many revision control systems available, such as RCS [15], CVS [3], SourceForge [11], and Visual SourceSafe [16]. Common features of these systems are: 1) all of them are based on the CMM paradigm; 2) all of them support only asynchronous collaboration; 3) all of them employ state-based merging [6].

Let’s use a concrete example to explain what state-based merging is. Look at the example in Fig 1. A document in the repository contains two lines: 111 and 222. A user checks out a working copy to his working directory and starts modifying it. Firstly, operation O1 = Ins[0, ‘A’] is performed to insert the character ‘A’ at the beginning of the first line. After the execution of O1, the first line becomes A111 and the second line remains 222. Then operation O2 = Ins[5, ‘B’] is performed to insert the character ‘B’ at the beginning of the second line. After the execution of O2, the first line remains A111 and the second line becomes B222. At this time, the user decides to check his working copy into the repository. That is to say, changes made in his working copy will be merged into the copy in the repository.

Fig 1 State-based merging [figure: the repository copy (111, 222) is checked out to a working copy; O1 = Ins[0, ‘A’] and O2 = Ins[5, ‘B’] turn the working copy into (A111, B222); at check-in, O'1 = Del[0, 9] and O'2 = Ins[0, “A111\nB222”] are applied to the repository copy]

The state-based merging process does not use the information that the evolution from the initial document state (111 and 222) to the final document state (A111 and B222) is caused by the execution of operations O1 and O2. Instead, the merging process compares the initial and the final document states to compute the differences between them and to generate an edit script, by the execution of which a new version will be generated in the repository to accommodate all changes made in the working copy. The edit script includes O'1 = Del[0, 9] to
delete both lines and O'2 = Ins[0, “A111\nB222”] to insert two new lines. We can see that the edit script, which can be regarded as artificial operations that produce the final state from the initial state, is totally different from the actually performed operations. Furthermore, the edit script is far more coarse-grained than the actually performed operations. In other words, the edit script is line based while actually performed operations are position based (i.e., actually performed operations can target any position within a line).

Operation-based merging [6], however, directly applies the actually performed operations (i.e., O1 and O2 in this example) to the copy in the repository to generate a new version that accommodates all changes made in the working copy.

According to the merging approach they use, we classify revision control systems into state-based and operation-based revision control systems. The following two sections present drawbacks of state-based revision control systems from the perspectives of external functionality and internal implementation respectively. Then we further argue that operation-based revision control systems can ease a smooth switch between asynchronous and synchronous collaboration modes. Before giving concrete future work and summarizing the paper, we raise some major issues in operation-based revision control systems.

EXTERNAL FUNCTIONALITY

1. Extensibility

A state-based revision control system relies on some sort of text differentiation tool, such as diff [8], to generate an edit script between two document states. This reliance restricts a state-based revision control system to text documents. In contrast, an operation-based revision control system only requires the actually performed operations, and it doesn’t matter whether these operations are generated from text objects. Therefore, operation-based revision control systems can be extended to manage any type of document.

2. Merging limitation

An edit script is line-based. Therefore changes made within the same line by different users cannot be merged. In practice, in CVS [3], even changes made in adjacent lines cannot be merged. It has been argued that concurrent changes within the same line are very rare in real life and that it would be easy to manually merge those changes if concurrent changes have been made within the same or adjacent lines. It may be rare that concurrent changes are made within the same line. However, concurrent changes within the same block are not rare at all. For example, two authors are jointly writing a paper. When the deadline approaches, they decide to concurrently finalize the paper by taking different roles. Author 1 is responsible for correcting spelling and grammar errors while Author 2 is responsible for adding references in the paper. Suppose Author 1 has performed at least one operation in each of 20 consecutive lines and Author 2 has added several references in some 10 of these 20 lines. When they come to reconcile their work, changes made by Author 1 and Author 2 in these 20 lines cannot be merged although we know their changes do not conflict at all. Manual merging would be very painful because changes made by one of them have to be redone. The root of the problem is that coarse-grained edit scripts dramatically increase the chance of overlapping conflicts although the actually performed operations do not overlap at all.

In contrast, an actually performed operation is fine-grained in the sense that it can target any position in the document. Therefore an overlapping conflict between a pair of actually performed operations is very rare. Moreover, theoretically, any overlapping conflict can be syntactically resolved by some sort of mechanism like operational transformation [13] or multiple versioning [14]. Therefore operation-based revision control systems can syntactically merge any changes made at any position in the document. On the other hand, some sort of conflict resolution mechanism is certainly indispensable to solve potential semantic conflicts. In the previous example, changes made by Author 1 and Author 2 should be able to be automatically merged in operation-based revision control systems because their operations do not overlap at all and there is no potential semantic conflict either.

3. Conflict resolution

Human intervention for conflict resolution is inevitable in both kinds of revision control systems. State-based revision control systems can resolve neither syntactic nor semantic conflicts. Although syntactic conflicts can be automatically resolved in operation-based revision control systems as mentioned before, potential semantic conflicts still need human intervention. State-based merging can provide no help for human intervention because it doesn’t know where and how a conflict occurs. Conversely, during the process of operation-based merging, when two conflicting operations meet, the merging process is able to know that a conflict occurs and how it occurs from the information in both operations. As a result, the merging process can provide plenty of help for human intervention.

For example, in Fig 2, User 1 and User 2 are trying to list all conferences they are interested in. User 1 thinks ACM CSCW and IEEE ICDCS are a bit too tough for them at the current stage, so he deletes both items in his working copy. But User 2 adds a new item ACM GROUP between ACM CSCW and IEEE ICDCS in his working copy. When User 1’s change is merged into User 2’s working copy at User 2’s site, as shown in Fig 3a, if using state-based merging, the system can provide no help for User 2 about the conflict but simply puts the two conflicting versions together. It’s up to User 2 to decide how to manually merge these two versions (he may consult with User 1).
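
The delete-insert conflict of Fig 2 can be sketched in a few lines of Python. The sketch is only illustrative: the list-of-items document model, the Delete and Insert classes, and the functions find_conflict and resolve are assumptions made here, not part of any existing system. It shows that, given both operations, a merging process can tell that the insertion falls inside the deleted range and can ask the user whether to keep it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delete:
    start: int  # index of the first deleted item in the base document
    count: int  # number of consecutive items deleted


@dataclass(frozen=True)
class Insert:
    index: int  # index in the base document before which the item is inserted
    item: str


def find_conflict(base, delete, insert):
    """Return a message if the insertion lies strictly inside the deleted range, else None."""
    if delete.start < insert.index < delete.start + delete.count:
        removed = base[delete.start:delete.start + delete.count]
        return (f'You have inserted "{insert.item}" into a deletion area '
                f'"{removed[0]} ... {removed[-1]}"')
    return None


def resolve(base, delete, insert, keep_insertion):
    """Apply the deletion, then keep or drop the conflicting insertion as the user decides."""
    merged = base[:delete.start] + base[delete.start + delete.count:]
    if keep_insertion:
        merged.insert(delete.start, insert.item)
    return merged


# The scenario of Fig 2 (hypothetical encoding of the two users' operations).
base = ["ACM CSCW", "IEEE ICDCS"]
user1_op = Delete(start=0, count=2)             # User 1 deletes both items
user2_op = Insert(index=1, item="ACM GROUP")    # User 2 inserts between them

message = find_conflict(base, user1_op, user2_op)
kept = resolve(base, user1_op, user2_op, keep_insertion=True)
dropped = resolve(base, user1_op, user2_op, keep_insertion=False)
```

If the user keeps the insertion, only the new item remains; otherwise nothing remains. A state-based merge sees only the two final lists and has no comparable information to offer.
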
If operation-based merging is used instead, as shown in Fig 3b, the system knows exactly where and how the conflict occurs, so it can provide adequate information for User 2 to resolve the conflict.

INTERNAL IMPLEMENTATION

1. Communication between the repository and a working site

In a state-based revision control system, the copy in the repository has to be transferred to a working site in order to compute the difference between the repository copy and the working copy and generate an edit script. However, transferring a large document through a slow network, such as the Internet, or through a network whose bandwidth is very scarce, such as a mobile network, is really not desirable. Moreover, transferring a sensitive document over the network is not secure either. In contrast, an operation-based revision control system does not require transferring the repository copy to a working site, because the merging process of an operation-based revision control system requires the actually performed operations instead of artificial operations (i.e., an edit script), and those actually performed operations are already at that working site.
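
As a minimal sketch of such a check-in, the working site could send only its logged operations, and the repository could replay them on its own copy. The Ins class and the apply_ops function are hypothetical names introduced here; positions are 0-based character offsets with the line break counted as one character, as in the Ins[0, ‘A’] and Ins[5, ‘B’] example of the Introduction. The sketch assumes the repository copy has not changed since check-out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ins:
    pos: int   # 0-based character position; "\n" counts as one character
    text: str  # text inserted at that position


def apply_ops(document: str, ops) -> str:
    """Replay logged insert operations on a document, in the order they were performed."""
    for op in ops:
        document = document[:op.pos] + op.text + document[op.pos:]
    return document


# Only the operation log travels from the working site to the repository.
op_log = [Ins(0, "A"), Ins(5, "B")]

# The repository replays the log on its own copy (assumed unchanged since check-out).
repository_copy = "111\n222"
new_version = apply_ops(repository_copy, op_log)
```

Here new_version is the two lines A111 and B222, obtained without the repository copy ever leaving the repository.
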
Fig 2 A delete-insert conflict [figure: User 1 deletes both lines “• ACM CSCW” and “• IEEE ICDCS”; User 2 inserts a new line “• ACM GROUP” between these two lines]

2. Differentiation mechanism

A state-based revision control system requires some sort of text differentiation tool, such as diff [8], to compute the difference between two files in order to generate an edit script. If the two files are very large, or very different (i.e., the edit script is very large), it could take a long time to generate the edit script [10]. An operation-based revision control system, however, does not have this burden because it does not require the edit script at all.
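
To illustrate the difference, the following sketch uses Python's standard difflib module as a stand-in for a line-based differentiation tool (it is not the diff program of [8], and its algorithm differs). It compares the initial state (111, 222) with the final state (A111, B222) from the Introduction. The variable names are assumptions made here.

```python
import difflib

# Initial and final document states, one list entry per line.
initial = ["111", "222"]
final = ["A111", "B222"]

# A line-based comparison: every non-"equal" opcode is one step of the edit script.
matcher = difflib.SequenceMatcher(a=initial, b=final, autojunk=False)
edit_script = [op for op in matcher.get_opcodes() if op[0] != "equal"]

# What an operation-logged editor would have recorded instead (hypothetical encoding):
# two single-character insertions, available without any comparison of states.
logged_ops = [("Ins", 0, "A"), ("Ins", 5, "B")]
```

The computed edit script consists of a single step that replaces both lines, whereas the logged operations are two single-character insertions that need no computation at check-in time.
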
Fig 3a Conflict resolution in state-based merging [figure: the merged document shows User 2’s version, with the lines “• ACM CSCW”, “• ACM GROUP” and “• IEEE ICDCS”, between the markers “>>>>>>> User 2's version” and “<<<<<<< User 1's version”, followed by User 1’s version, in which these lines are absent]

Fig 3b Conflict resolution in operation-based merging [figure: the system asks “You have inserted "• ACM GROUP" into a deletion area "• ACM ... ICDCS", keep your insertion or not?”; answering Yes leaves the line “• ACM GROUP”, answering No leaves none of the three lines]

SMOOTH SWITCH BETWEEN ASYNCHRONOUS AND SYNCHRONOUS COLLABORATION MODES

A purely asynchronous or purely synchronous collaboration mode is not flexible enough for the real world. During the development of a software project, both asynchronous and synchronous collaboration modes could be required to meet real needs.

Asynchronous collaboration mode is particularly suitable for the following situations:
• Developers are distributed geographically over the world and the network connection is slow and unreliable;
• Developers are working on different files or different modules in a file.

Synchronous collaboration mode is particularly suitable for the following situations:
• A convergent version of the project is urgently needed. Synchronous collaboration mode can speed up the converging process;
• Developers are working on the same module in the same file where conflicts could frequently occur;
• Real-time communication is required among developers. For instance, developers are collaboratively revising some parts of the same
document by means of discussion, or developers are collaboratively debugging a program.

The above situations could happen at any stage of the development of a software project. Therefore it would be very helpful if collaboration could be smoothly switched between asynchronous and synchronous modes at any time. A state-based revision control system can support any kind of single-user text editor, such as vi and emacs, because it is only interested in the final state of a document rather than in how that document is developed from its initial state to the final state. This is a major common merit of all state-based revision control systems. However, single-user editors make synchronous collaborative editing impossible. Therefore a state-based revision control system cannot be switched to synchronous collaboration mode. Life would be much easier if we could extend the techniques obtained in the domain of synchronous collaboration to the domain of asynchronous collaboration to achieve an operation-based revision control system that supports a smooth switch between asynchronous and synchronous collaboration modes.

An operation-based revision control system would work as follows. In asynchronous collaboration mode, operations are accumulated at working sites during editing. These operations will be propagated to the repository when the working copy needs to be merged into the repository copy, and propagated to another working site (via the repository) when the working copy needs to be merged into another working copy. If switched to synchronous collaboration mode, instead of being accumulated, operations will be propagated to all remote sites right after their execution at the local site.

ISSUES IN OPERATION-BASED REVISION CONTROL SYSTEMS

Operation-based revision control systems are not perfect. There are some issues that do not exist in state-based revision control systems but must be addressed and solved in operation-based revision control systems.

1. Operation-logged editors

Unlike state-based revision control systems, in which any kind of editor can be used, operation-based revision control systems require specially designed editors that can catch and save edit operations. We refer to them as operation-logged editors. To support a smooth switch to synchronous collaboration mode, these operation-logged editors should further be collaborative. There is no free lunch at all. It would be worthwhile if what has been gained is far beyond what has been lost. For example, if you want to use the JFC package, you have to use Java 2 instead of Java 1. Similarly, if you want to use operation-based revision control systems, you have to use operation-logged editors. On the other hand, to eliminate the burden that users have to switch from their preferred editors to newly designed operation-logged editors in order to use operation-based revision control systems, it is better not to invent new editors but to make existing editors operation-logged or even collaborative, just as DistEdit [5] did to make vi and emacs collaborative.

2. Number of operations

Unlike state-based revision control systems, which generate an edit script from the initial and final document states, operation-based revision control systems need to keep all actually performed edit operations. With operations continuously accumulating at a working site, the number of operations could increase dramatically. A large number of operations requires more disk space for storage at the working site and the repository, more time to propagate between working sites and the repository, and more time for these operations to be redone at the repository and other working sites. When the number of operations becomes too large to be manageable, operation-based revision control systems could become unacceptable. Therefore the number of operations and even the size of each operation should be reduced to a reasonable extent.

3. Merging process

As shown in Fig 4, suppose there is a document doc in the repository. Site 1 and Site 2 both check out their working copies from the repository and start modifying their own copies independently. Site 1 finally ends up with doc1 after the execution of all operations in OB1 while Site 2 finally ends up with doc2 after the execution of all operations in OB2.

Fig 4 Merging in a revision control system [figure: doc in the repository is checked out to the working sites Site 1 and Site 2; OB1 turns doc into doc1 at Site 1 and OB2 turns doc into doc2 at Site 2]

To merge the changes from doc to doc1 into doc2 at site 2, a state-based revision control system normally compares the differences among doc, doc1, and doc2 to generate an edit script, by the execution of which the changes from doc to doc1 will be merged into doc2. This kind of merging is supported by diff3 [4] in CVS. However, in an operation-based revision control system, simply executing OB1 at site 2 would not bring the changes from doc to doc1 into doc2. In [1], merging is performed by using selective redo
to repeat all operations in OB1 at site 2. However, redo could be conflict-prone because creating operations equivalent to those in OB1 in the current state cannot always be successful. Therefore, the merging process should be well studied to maintain correctness and effectiveness in operation-based revision control systems.

FUTURE WORK AND SUMMARY

Major issues in operation-based revision control systems must be solved to make these systems feasible and usable. We are trying to work out strategies to reduce the number of operations and the size of each operation and to manage the operation-based merging process correctly and effectively. The representation of deltas and the management of multiple versions in the repository should also be well studied. We are currently using the REDUCE (REal-time Distributed Unconstrained Collaborative Editing) system as the operation-logged editor for testing our results. In the future, we plan to make some popular editors, such as vi and emacs, operation-logged and even collaborative.

In this paper, we have presented some major drawbacks of state-based revision control systems and explained why operation-based revision control systems can overcome these drawbacks. We have further argued that operation-based revision control systems can ease a smooth switch between asynchronous and synchronous collaboration modes. Finally, issues in operation-based revision control systems have been raised, and we are working on these issues to make operation-based revision control systems feasible and usable.

REFERENCES

1. Thomas Berlage and Andreas Genau: “A Framework for shared applications with a replicated architecture,” In Proceedings of ACM Symposium on User Interface Software and Technology, 1993.
2. Abdelmajid Bouazza and Pascal Molli: “Unifying coupled and uncoupled collaborative work in virtual teams,” Workshop on Collaborative Editing, ACM Conference on Computer Supported Cooperative Work, 2000.
3. Per Cederqvist et al: “Version Management with CVS,” Manual for CVS.
4. Diff3 source code for CVS project, http://oasis2.openave.net/pub/88/6/diff.diff3.c.html
5. Michael J. Knister and Atul Prakash: “DistEdit: a distributed toolkit for supporting multiple group editors,” In Proceedings of the conference on Computer-supported cooperative work, 1990.
6. Ernst Lippe and Norbert van Oosterom: “Operation-based merging,” In Proceedings of the Fifth ACM SIGSOFT Symposium on Software development environments, 1992.
7. Boris Magnusson et al: “Fine-grained revision control for collaborative software development,” In Proceedings of the first ACM symposium on Foundations of software engineering, 1993.
8. W. Miller and E. W. Myers: “A file comparison program,” Software - Practice and Experience, 15(11): 1025-1040, 1985.
9. J. P. Munson and P. Dewan: “A Flexible Object Merging Framework,” In Proceedings of the conference on Computer-supported cooperative work, 1994.
10. E. Myers: “An O(ND) difference algorithm and its variations,” Algorithmica, 1(2): 251-266, 1986.
11. SourceForge - Breaking Down the Barriers to Open Source Development, http://sourceforge.net.
12. C. Sun et al: “Achieving convergence, causality-preservation, and intention-preservation in real-time cooperative editing systems,” ACM Transactions on Computer-Human Interaction, 5(1): 63-108, 1998.
13. C. Sun and D. Chen: “A Multi-version Approach to Conflict Resolution in Distributed Groupware Systems,” In Proceedings of the 20th IEEE International Conference on Distributed Computing Systems, 2000.
14. C. Sun and C. A. Ellis: “Operational transformation in real-time group editors: issues, algorithms, and achievements,” In Proceedings of ACM Conference on Computer Supported Cooperative Work, pp. 59-68, 1998.
15. Walter F. Tichy: “RCS – a system for version control,” Software - Practice and Experience, 15(7): 637-654, July 1985.
16. Visual SourceSafe, http://msdn.microsoft.com/ssafe/.
