Subversion
Building a better CVS

Subversion [1] is a free source code manager and version control system intended to replace CVS. We explain why you should consider changing to this new system and the pitfalls you may find.

BY DAVID NEARY

www.linux-magazine.com, May 2003, SYSADMIN

Most open source developers have, at some stage, come across CVS. It is the de facto standard SCM (Source Code Manager) on free software projects. As such, it has a huge user base, and has earned a reputation as a good piece of software.

The primary goal of the Subversion project is "to build a compelling replacement for CVS in the open source community". In other words, it is designed to implement all of the functionality of CVS, with a familiar interface, while fixing its design flaws and offering much improved functionality.

Unusually for an open source project, Subversion has had a number of full-time developers employed to work on it since the project's inception. CollabNet is paying the salaries of several developers, and holds the copyright on code in the project. The code is released under a BSD/Apache style licence.

Subversion is alpha software. This means that it is considered acceptable for public release, but is still in active development: features are still being finished or refined, and bugs are found and fixed all the time. For those who are worried about using alpha grade software to house their projects, however, it's worth noting that Subversion's developers have confidence in it. The project has been self-hosting for a year and a half, with no data loss.
A better CVS
If you have used CVS for years with no problems, you might be asking yourself what I meant by design flaws in the last section. CVS has a number of problems, primarily caused by its dependency on the RCS file format for versioning files. The issues addressed by Subversion include the following.
Atomic commits
If you are making a change to a source code repository and you commit that change, one of the fundamental principles of both database and version control systems is that either your entire change is accepted or your entire change is rejected. This behaviour is called atomicity (the A in ACID). CVS does not guarantee it: atomicity is only guaranteed on a file-by-file basis. This means that if you are committing changes to 10 files, and someone else starts a commit at roughly the same time which changes the 8th file in your list, the changes to the first 7 files get accepted and the rest get rejected. After this happens, the repository will very likely be in an inconsistent state until you resolve the conflict you've just found and commit the rest of your change.

In addition, because CVS has no way of grouping changes to a number of files together, it isn't even possible to revert your partial commit while ensuring that the successful commit which happened at the same time doesn't get reverted as well.

An illustration might help explain the problem. Tom and Dick are working on the same source code tree, which has 3 files in it: a.c, b.c and c.c. By coincidence, Tom and Dick try to commit changes at the same time. They both check that they are up to date with the repository, and then at the same time they try to commit their changes.

While Tom is writing his changes to a.c, Dick starts writing to b.c. Before Dick has finished, Tom starts writing to c.c. Dick's commit finds that Tom has locked c.c, and informs him that his c.c is no longer up to date. Dick does an update to get Tom's changes, and finds that there's a conflict in c.c which he has to resolve (perhaps in conference with Tom).

Meanwhile, Harry checks out the sources and gets Tom's changes to a.c and c.c and Dick's change to b.c, but not Dick's change to c.c. He tries to build the project, and finds that he can't. In brief, until Tom and Dick resolve Dick's conflict, none of Tom, Dick or Harry has a copy of the source code that builds.

Subversion implements atomic commits. When you commit to a Subversion repository, you start a transaction with the repository, and if any part of the commit fails, the transaction is rolled back and the entire commit is rejected.

THE AUTHOR
Dave Neary discovered Linux in 1997, and, apart from a flirtation with FreeBSD, has never looked back. He is an occasional contributor to the GIMP, and is listed as co-author of Gnect, one of the GNOME games. He lives and works in Lyon.
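Returning to atomic commits: the all-or-nothing behaviour, and the Tom and Dick scenario, can be modelled in a few lines. This is a minimal sketch, not Subversion's code: the `Repository` class, `OutOfDateError` and the method names are hypothetical, and each revision is stored as a full snapshot of the tree purely for simplicity (Subversion itself stores deltas). Every successful commit produces one new repository-wide revision number, and a failed commit leaves the repository exactly as it was.

```python
from dataclasses import dataclass, field


class OutOfDateError(Exception):
    """Hypothetical error: a file changed after the committer's base revision."""


@dataclass
class Repository:
    # Simplification: revisions[n] is a full snapshot {path: content} of
    # revision n. Revision 0 is the empty tree.
    revisions: list = field(default_factory=lambda: [{}])

    @property
    def head(self) -> int:
        return len(self.revisions) - 1

    def commit(self, base_rev: int, changes: dict) -> int:
        """Apply every change in one transaction, or none of them."""
        base = self.revisions[base_rev]
        latest = self.revisions[self.head]
        txn = dict(latest)  # scratch tree: invisible to anyone else
        for path, content in changes.items():
            if latest.get(path) != base.get(path):
                # Someone committed to this file after base_rev. Abandon the
                # whole transaction; self.revisions has not been touched.
                raise OutOfDateError(path)
            txn[path] = content
        self.revisions.append(txn)  # the new revision appears in one step
        return self.head

    def changeset(self, rev: int) -> dict:
        """Files added or modified by rev: the difference of two successive revisions."""
        before, after = self.revisions[rev - 1], self.revisions[rev]
        return {p: after[p] for p in after if before.get(p) != after[p]}
```

Replaying the scenario: Tom and Dick both start from revision 1, Tom's commit to a.c and c.c succeeds, and Dick's commit to b.c and c.c is rejected as a whole, so Harry always sees a consistent tree.

```python
repo = Repository()
r1 = repo.commit(0, {"a.c": "a1", "b.c": "b1", "c.c": "c1"})
repo.commit(r1, {"a.c": "a2-tom", "c.c": "c2-tom"})   # Tom
try:
    repo.commit(r1, {"b.c": "b2-dick", "c.c": "c2-dick"})  # Dick
except OutOfDateError as err:
    print("commit rejected, out of date:", err)
# Dick's change to b.c was NOT half-applied: b.c is still "b1" at HEAD.
```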
Files & versioning history
CVS has no way to rename files and keep versioning history. Renaming "file1" to "file2" in CVS means doing the following:

$ mv file1 file2
$ cvs remove file1
$ cvs add file2
$ cvs commit

This creates a new "file2" with no record of a common history with the old "file1" (which is now stored in the Attic). In Subversion, the above operation is performed by

$ svn move file1 file2
$ svn commit
and the common history of "file1" and "file2" is preserved.

In addition, Subversion has dramatically increased the range of things that can be versioned. Directories and file metadata, as well as renamed or copied files, all have their own versioning. This means that not only can you move and copy files, you can move and copy directories too.

These copies are very cheap, because they're lazy copies: the first copy is similar to a hard link to a particular version of the directory. As you change files on the branch, only those files you change get copied onto the branch. This means that making and maintaining branches is a cheap operation, both in terms of space in the repository and in terms of time.
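The lazy-copy idea can be sketched with immutable tree nodes that share structure. This is an illustration under stated assumptions, not Subversion's filesystem code: `Node`, `lookup`, `rebuild`, `copy_path` and `edit_file` are hypothetical names. A copy creates one new directory entry pointing at the existing subtree, plus a `copied_from` link that keeps the history; editing a file on the copy rebuilds only the directories on the path to that file, so untouched files stay shared between the original and the copy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable node: a file (text) or a directory (children)."""
    text: Optional[str] = None
    children: Tuple[Tuple[str, "Node"], ...] = ()
    copied_from: Optional[Tuple[str, int]] = None  # (source path, revision)

    def child(self, name: str) -> "Node":
        return dict(self.children)[name]

    def with_child(self, name: str, node: "Node") -> "Node":
        # Returns a NEW directory node; the old one is never modified.
        kids = dict(self.children)
        kids[name] = node
        return Node(children=tuple(sorted(kids.items())),
                    copied_from=self.copied_from)


def lookup(root: Node, path: str) -> Node:
    node = root
    for part in path.split("/"):
        node = node.child(part)
    return node


def rebuild(root: Node, path: str, new_node: Node) -> Node:
    """Path copying: only directories between root and `path` are recreated."""
    head, _, rest = path.partition("/")
    if not rest:
        return root.with_child(head, new_node)
    return root.with_child(head, rebuild(root.child(head), rest, new_node))


def copy_path(revisions: list, src: str, dst: str) -> None:
    """Lazy copy: cost does not depend on how many files live under src."""
    root = revisions[-1]
    src_node = lookup(root, src)
    link = Node(text=src_node.text, children=src_node.children,
                copied_from=(src, len(revisions) - 1))
    revisions.append(rebuild(root, dst, link))


def edit_file(revisions: list, path: str, text: str) -> None:
    revisions.append(rebuild(revisions[-1], path, Node(text=text)))
```

For example, branching a two-file trunk and then editing one file on the branch:

```python
trunk = Node(children=(("a.c", Node(text="A")), ("b.c", Node(text="B"))))
revs = [Node(children=(("trunk", trunk),))]
copy_path(revs, "trunk", "branch")     # revision 1: the copy
edit_file(revs, "branch/a.c", "A2")    # revision 2: only a.c diverges
```

In this model a tag is simply a copy that nobody edits afterwards, which matches the point made below that Subversion does not distinguish tags from branches.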
Branching and tagging
When we tag a repository in CVS, every file in the directory tree we are tagging is "stamped" with the tag. Likewise, when we are branching, the branch tag is created in each file affected. This means that branching and tagging are expensive operations for big repositories and directory trees, with a cost proportional to the number of files being branched or tagged.

Subversion has made both branching and tagging constant time operations. In fact, Subversion makes no distinction between a tag and a branch. Both are implemented simply by copying the directory you are tagging, and cost the same as any other copy.

Logically there is no difference between branching and tagging: a tag is a copy of a group of files at a certain point in time, and a branch is a copy of a group of files at a certain point in time which can be changed independently of the rest of the tree. In brief, a tag is a branch that we don't change. In Subversion, this is literally the case: if you change a file in a tagged tree and commit, the tag becomes a branch.

In addition, requesting a file off a branch in CVS requires time proportional to the number of revisions made on the branch since the branch point, plus the number of revisions made in HEAD since the branch point. RCS files store the latest revision in HEAD as full text, and any time you request the text of another version, it has to be reconstructed. For a file on a branch, this means reverting all changes from HEAD back to the branch point, and then applying all the changes made on the branch to arrive at the complete file as we see it.

Because of this, in CVS the cost of diffs, branch switches and check-outs on a branch is roughly proportional to the number of revisions of each file involved. All branch and tag operations in Subversion are constant time.

Figure 1: Screenshot of Mozilla pointing at the Subversion repository

Client-server communication
When we change a file locally using CVS and want to know the difference between our changed copy and the repository version, the entire file is sent to the server, the diff is done there, and the result is sent back to the client. The same happens for all operations involving locally modified files, such as updates, commits and merges. This means that the cost of these operations is proportional to the size of the locally modified files, rather than the size of the change.

The philosophy of Subversion is different. The reasoning is that disk space has become a more plentiful resource than bandwidth in recent years, and therefore we should minimise use of the latter, even at some cost in the former. When we update from a Subversion repository, a pristine copy of the latest repository revision is stored locally, as well as being patched into our working copy.

Because of this, Subversion sends diffs in both directions. That is, if we modify a local file and commit, only the differences between our local file and the most recent revision we have locally are sent to the server, which uses far less bandwidth.

In addition, because we have a pristine copy of the repository locally, there is a clear distinction between server operations and local operations. For example, finding out which local files have been modified, and what changes we have made to those files, are operations which can be performed without any access to the server. In fact, the cost of Subversion operations in general is proportional to the size of the change, rather than the size of the repository or the size of the files being changed.

Subversion's design: Repository versions
Subversion does not version files like CVS. Instead, it versions the repository as a whole. When you commit to the repository, a transaction is started, the changes you make are added to the repository, and if no problem occurs, the new repository with your changes is committed with its own version number.

This scheme has a number of advantages over the CVS scheme. The most important is that it is the mechanism which gives us atomic commits. It also makes it very easy to get a group of changes which were made at the same time (a changeset): you just take the difference between two successive repository versions.

Apache as a server
Subversion uses the WebDAV and DeltaV extensions to the HTTP protocol for client/server communications. In practice, this means that it uses Apache 2 with a specialised module to do server operations, and the client talks standard HTTP/WebDAV. On the server side, Subversion therefore profits from a stable and well-tested network server.

Using Apache also gives several other useful features for free: client/server authentication is done using Apache's htpasswd mechanism, secure client/server communications are provided by mod_ssl, and wire compression is supported with mod_deflate. In addition, Subversion repositories get a web interface for free: just point your browser at the root directory of the repository.

For those of you who are worried about having to install and administer
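To make the earlier point under Client-server communication concrete (that status and diff need no server because a pristine copy is kept locally), here is a minimal sketch. It is an assumption-laden model, not Subversion's client library: the hypothetical `WorkingCopy` class uses in-memory dicts to stand in for the files on disk and for the pristine copies kept in the working copy's `.svn` administrative area, and Python's `difflib` text diff stands in for Subversion's own delta format.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class WorkingCopy:
    # pristine: text of each file as of the last update (kept locally).
    # working:  what the user currently has on disk.
    pristine: dict = field(default_factory=dict)
    working: dict = field(default_factory=dict)

    def status(self) -> list:
        """Modified files, found without contacting any server."""
        return sorted(p for p, text in self.working.items()
                      if self.pristine.get(p) != text)

    def diff(self, path: str) -> str:
        """Local diff against the pristine copy: this, not the whole file,
        is what needs to travel to the server on commit."""
        old = self.pristine.get(path, "").splitlines(keepends=True)
        new = self.working[path].splitlines(keepends=True)
        return "".join(difflib.unified_diff(old, new, "a/" + path, "b/" + path))
```

With a 1000-line file in which one line has changed, `status()` reports just that file, and the diff is a few lines long rather than the size of the whole file:

```python
big = "".join(f"line {i}\n" for i in range(1000))
wc = WorkingCopy(pristine={"big.c": big},
                 working={"big.c": big.replace("line 500\n", "line 500 changed\n")})
print(wc.status())
print(wc.diff("big.c"))
```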
Binary diffs
Storing binary files in CVS is something of a nightmare. Because the RCS file format is essentially text based, any change to a binary file results in a complete new copy of the file being stored. If the file is a 100K image that gets changed for every release, like the GIMP splash screen, then that one file ends up taking up many megabytes of space in the repository.

Subversion uses a diffing algorithm called Vdelta to provide efficient binary diffing, meaning that storing PostScript or PDF documents which change frequently doesn't pose the same problems as it does for CVS. The diffing algorithm is also extremely efficient on text-only files.
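To show why a binary delta avoids storing a whole new copy, here is a simplified copy/insert delta. It is explicitly not Vdelta, only a sketch in the same spirit: `make_delta`, `apply_delta` and the 4-byte `BLOCK` size are hypothetical choices. The new version is encoded as instructions to copy byte ranges out of the old version, plus literal bytes for anything new, so a small edit to a large image costs roughly the size of the edit.

```python
BLOCK = 4  # hypothetical minimum match length, in bytes


def make_delta(source: bytes, target: bytes, block: int = BLOCK) -> list:
    """Encode target as ("copy", start, length) / ("insert", bytes) ops."""
    index = {}
    for i in range(len(source) - block + 1):
        index.setdefault(source[i:i + block], i)  # first occurrence wins
    ops, literal, t = [], bytearray(), 0
    while t < len(target):
        start = index.get(target[t:t + block])
        if start is None:
            literal.append(target[t])  # no match: emit this byte literally
            t += 1
            continue
        # Extend the match as far as source and target keep agreeing.
        length = block
        while (start + length < len(source) and t + length < len(target)
               and source[start + length] == target[t + length]):
            length += 1
        if literal:
            ops.append(("insert", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", start, length))
        t += length
    if literal:
        ops.append(("insert", bytes(literal)))
    return ops


def apply_delta(source: bytes, ops: list) -> bytes:
    """Rebuild the target from the source and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += source[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

For instance, inserting nine bytes into the middle of a 4KB binary file produces a delta of a couple of copy instructions and one short insert:

```python
old = bytes((i * 7) % 256 for i in range(4096))
new = old[:1000] + b"SPLASH v2" + old[1000:]
ops = make_delta(old, new)
assert apply_delta(old, ops) == new
```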
Figure 2: Subversion design, copyright Brian Fitzpatrick, published under the Apache licence
[Diagram labels: command line client app, GUI client app, Client Library, Working Copy Management Library, Repository Access (DAV, Local), Ye Olde Internet, Apache, mod_DAV, mod_DAV_SVN, Subversion Filesystem, Client Interface, Filesystem Interface]