New ideas in version control: Distributed SCM
The next two version control systems we'll look at are very different from the first two in their design and philosophy. Whereas both CVS and Subversion are based on the idea of a central repository (though Subversion supports read-only mirrors), Bazaar and Mercurial are based on a distributed philosophy, with no central server. The lack of a central server is both the main selling point and the primary weakness of distributed version control systems.
Instead of having a central repository, each developer using Mercurial or Bazaar has a complete copy of the repository on his local machine. If a developer is working on several branches, he will have several complete copies of the repository. This removes the reliance on a central server, and potentially makes the system more robust. However, developers need to have a good idea of where to get (and/or send) code changes. For larger projects, and over time, keeping the full history on every machine (and once more for every branch) can take a noticeable toll on disk space and performance.
Meet the new kid, Mercurial
Mercurial is a newer open source version control system based on the distributed model. In Mercurial, as in Subversion or CVS, developers work on a local working directory. However, unlike centralized solutions, Mercurial also stores a copy of the entire project history on each developer's machine. In this way, developers can work in parallel, even without a network connection.
Like Subversion, and unlike CVS, Mercurial uses the notion of change sets. Each revision is a snapshot of the project at a given point in time. When you commit your changes, you create a new revision. Like Subversion, Mercurial naturally benefits from fast tags, good support for binary files, and the other advantages related to the use of change sets.
Unlike Subversion, however, when you commit changes in Mercurial, you only create a new revision in your local repository (which, given that Mercurial is based on a distributed model, is considered to be just as good a repository as anyone else's). Let's take a minute to see how this works.
Setting up your own Mercurial repository
The first thing you do when starting a project in Mercurial is to "clone" your own local copy of the project. Not surprisingly for a distributed version control system, you can access a Mercurial repository via HTTP. In Listing 13, I have created a copy of the Mercurial project itself.
Listing 13. Cloning a local copy of the Mercurial repository
$ hg clone http://www.selenic.com/repo/hg
destination directory: hg
requesting all changes
adding changesets
adding manifests
adding file changes
added 5027 changesets with 9501 changes to 665 files
583 files updated, 0 files merged, 0 files removed, 0 files unresolved
Once you've obtained a local copy, modifying existing files and adding new ones is intuitive, using commands such as hg add, hg remove, and hg rename. The hg status command, like the equivalent Subversion and CVS commands, lets you see at a glance what has been modified in your project files, as compared to your local repository copy.
Listing 14. Familiar operations done in Mercurial
$ hg add LocallyAddedFile.java
$ hg status
M LocallyModifiedFile.java
A LocallyAddedFile.java
Likewise, you submit changes to your local copy of the repository using the hg commit command, as shown here:
$ hg commit -m "Some minor changes"
No username found, using 'wakaleo@taronga' instead
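The warning in the output above appears because Mercurial has no user name configured, so it falls back to the local login and host name. The following is a minimal sketch of one way to set a name, assuming a Unix-like system where per-user settings live in ~/.hgrc. The helper name set_hg_username is hypothetical, and the name and address are placeholders.

```shell
# Hypothetical helper: append a [ui] username entry to a Mercurial config file.
# $1 is the name and e-mail to record; $2 is the config file (default: ~/.hgrc).
# Assumes the file does not already define a username.
set_hg_username() {
    printf '[ui]\nusername = %s\n' "$1" >> "${2:-$HOME/.hgrc}"
}
```

Calling set_hg_username "John <john@mycompany.com>" once should be enough for later commits to be recorded under that name.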
Push, pull, propagate
Note that the hg commit command updates your local repository copy. Because there is no central server to update, you alone will now have an up-to-date repository. This is the major difference between a distributed version control system and a centralized system like CVS or Subversion.
To update another repository, you need to propagate your changes onto this repository using the hg push command. Alternatively, another developer could fetch your changes into his or her own local repository copy using the hg pull command. For example, suppose Jill has made some changes that you need to integrate into your source code. To do this, you would "pull" her changes from her machine, like so:
$ hg pull http://jill@jillsmachine
Merging and branching
Once you have fetched the changes from another repository, you can merge them into your own repository using the hg merge command. After the merge, you need to commit the merged code to your local repository, as follows:
$ hg merge
$ hg commit
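The pull, merge, and commit steps can be chained in a small shell function. This is only a sketch: the function name integrate_from and the commit message are hypothetical, and it assumes the pull actually brings in changes that need merging. If any step fails (for example, because the merge leaves unresolved files), the chain stops so that you can sort things out by hand.

```shell
# Hypothetical helper: pull from another repository, merge, and commit locally.
# $1 is the repository to pull from. Each step runs only if the previous one succeeded.
integrate_from() {
    hg pull "$1" &&
    hg merge &&
    hg commit -m "Merged changes from $1"
}
```

For the example above, you would run integrate_from http://jill@jillsmachine from inside your local repository.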
As with any merging, conflicts can arise. When conflicts do happen, Mercurial makes no attempt to merge the two files (unlike Subversion or CVS). Instead, it indicates the conflicting files and leaves it up to you to choose your favorite graphical merging tool to do the job.
If you don't specify where you are pulling your changes from, Mercurial will assume you want to get them from the original repository that you used to clone your local copy. In this case, the original repository acts a bit like a central server. The same applies to the hg push command, as shown here:
Listing 15. Pushing changes into the original repository
$ hg push
pushing to http://buildserver.mycompany.com/mercurial/repo/myproject
searching for changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 3 changes to 2 files
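Before pushing or pulling without an explicit URL, it can help to check which repository Mercurial will use and what would be transferred. The sketch below relies on the hg paths, hg incoming, and hg outgoing commands; the function name preview_sync is hypothetical. The steps are deliberately not chained, on the assumption that hg incoming and hg outgoing report a non-zero status when there is simply nothing to transfer.

```shell
# Hypothetical helper: show the default remote and the pending changes in each direction.
preview_sync() {
    hg paths default   # the repository used when no URL is given
    hg incoming        # changesets a pull would fetch
    hg outgoing        # changesets a push would send
}
```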
Branching is handled simply by cloning a new copy of your local repository. Tags are well-implemented: in fact, they are simply references to a particular change set, which you can create using the hg tag command.
Listing 16. Tagging and branching
$ hg tag "release candidate 1"
$ hg tags
tip 2:87726d51f171
release candidate 1 1:1d05b948ba76
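Tagging and branch-by-clone can be combined when you want a separate line of work starting from a release. The following is a sketch under stated assumptions: the function name tag_and_branch and the directory names are hypothetical, and hg tag -r is used to tag a specific revision rather than the current one.

```shell
# Hypothetical helper: tag a revision, then branch by cloning the repository.
# $1: repository directory, $2: revision to tag, $3: tag name, $4: new clone directory
tag_and_branch() {
    (cd "$1" && hg tag -r "$2" "$3") &&
    hg clone "$1" "$4"
}
```

A call such as tag_and_branch myproject 1 "release candidate 1" myproject-rc1 would leave you with a second, fully independent repository in myproject-rc1.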
Mercurial is a young tool with some refreshing new features, such as simple change-set tagging. Like other distributed version control tools, its user base is smaller than that of a conventional tool like CVS or Subversion. IDE support for Mercurial is also more limited than for either CVS or Subversion, though the Mercurial Eclipse plugin provides some basic IDE integration.
Who needs the cathedral when you've got Bazaar?
Bazaar has recently gained some fame as the version control system used by the Ubuntu Linux distribution. Like Mercurial, Bazaar uses a distributed model based on change sets, placing a local copy of the repository on each development machine. Now let's consider some of the ways Bazaar differs from Mercurial.
The first thing you would typically do is identify yourself, so that Bazaar records your name correctly in the log files. I've identified myself below.
$ bzr whoami "John <john@mycompany.com>"
$ bzr whoami
John <john@mycompany.com>
If you want to start working on a new project, the next step is to create a new branch of the project on your local machine, using the bzr branch command. This is similar to Mercurial's "clone" operation. Like Mercurial and SVN, Bazaar is accessible via HTTP.
$ bzr branch http://buildserver.mycompany.com/bazaar/myproject--head/
Branched 24 revision(s).
This will create a new copy of a particular branch (in this case, the head, or main development branch). You can then work on this copy to your heart's content.
There's nothing 'bzr' about these commands
When it comes time to commit your changes, the commands are relatively simple. You can add new files to the local repository, or rename existing ones, using commands like bzr add and bzr mv. The bzr add command deserves an extra mention, because of the extreme simplicity of its use. If you run bzr add with no parameters, it will automatically add all files and directories that are not already in the repository, with the exception of any file patterns you have told Bazaar to ignore (using the bzr ignore command). If you've ever struggled with adding files in CVS and Subversion, this command is a breath of fresh air:
Listing 17. Adding multiple files with Bazaar
$ bzr add
added LICENCE.txt
added src/NewClass.java
added README.txt
added test
added test/TestClass.java
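Because bzr add picks up everything that is not ignored, it is worth registering ignore patterns first. The following is a minimal sketch of that order of operations; the function name add_all_except and the example patterns are hypothetical. Patterns should be quoted so that the shell passes them to Bazaar unexpanded.

```shell
# Hypothetical helper: register ignore patterns, then add everything else.
# Each argument is a pattern passed to "bzr ignore".
add_all_except() {
    for pattern in "$@"; do
        bzr ignore "$pattern" || return 1
    done
    bzr add
}
```

For a Java project, a call might look like add_all_except "*.class" "target".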
Commits are done using the bzr commit command.
Listing 18. Commits are also pretty easy
$ bzr commit -m "Setup new project"
added LICENCE.txt
added README.txt
added src
added src/NewClass.java
added test
added test/TestClass.java
Committed revision 1.
Another nice feature is the way Bazaar handles file deletion. Bazaar automatically detects deleted files, so you have no need of an explicit "remove" command.
Listing 19. Bazaar records a deleted file
$ rm RedundantClass.java
$ bzr commit -m "Removed unnecessary file"
missing RedundantClass.java
deleted RedundantClass.java
Committed revision 2.
What's more, if you make a mistake, Bazaar lets you easily roll back your changes using the very convenient bzr uncommit command.
Listing 20. Sometimes you just need to uncommit
$ bzr uncommit
3 John 2007-08-01
Updated LICENCE text.
The above revision(s) will be removed.
Are you sure [y/N]? y
Pushing and pulling in a distributed network
So far, your commits have only gone to your local copy of the repository. At some point, you may want to share all your changes with other members of your team. To do this, you use the bzr push command. The hitch is, you need to know with whom you want to share your changes, and where to put them to ensure that everyone who needs to can obtain your latest and greatest code. In theory, you could update each developer's machine individually. In practice, however, you often do have a central server where developers go to fetch the latest updates.
Here's a sample command to update the code on a central server:
$ bzr push http://buildserver.mycompany.com/bazaar/myproject--head
All changes applied successfully.
Pushed up to revision 2.
Of course, when you update the repository, you may run into conflicts with other files. Your first indication of a potential conflict should occur when you push your files to a remote repository: Bazaar will immediately indicate any potential conflicts.
$ bzr push http://buildserver.mycompany.com/bazaar/myproject--head
bzr: ERROR: These branches have diverged. Try using "merge" and then "push".
Automatic merging
Like CVS and Subversion, and unlike Mercurial, Bazaar supports automatic file merging. The merge algorithm used by Bazaar is fairly robust, though conflicts are still possible. When conflicts do occur, Bazaar uses a similar approach to Subversion to resolve them. Here's how Bazaar indicates a conflict when I attempt to merge a local repository with a remote one:
Listing 21. Bazaar reports a conflict
$ bzr merge http://buildserver.mycompany.com/bazaar/myproject--head
M* LICENCE.txt
* README.txt
* src/NewClass.java
* test/TestClass.java
Text conflict in LICENCE.txt
1 conflicts encountered.
Conflicts are indicated within the file, too, in a format similar to the one used by Subversion:
Listing 22. Merge conflicts in Bazaar
$ more LICENCE.txt
<<<<<<< TREE
This is a propriatary license. My lawyer says it's best.
=======
Open source rules! This license is GPL.
>>>>>>> MERGE-SOURCE
Once I've fixed the conflict, I inform Bazaar of the fix using the bzr resolve command:
$ bzr resolve LICENCE.txt
I then need to commit the changes locally, and push them to the remote server.
Listing 23. Commit and push
$ bzr commit -m "Merged updates"
modified LICENCE.txt
modified README.txt
modified src/NewClass.java
modified test/TestClass.java
Committed revision 4.
$ bzr push http://buildserver.mycompany.com/bazaar/myproject--head
Pushed up to revision 4
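When the merge goes through cleanly, the merge, commit, and push steps can be run as one unit. The sketch below assumes that bzr merge reports a non-zero status when it encounters conflicts, so the chain stops and leaves the conflict for you to fix and resolve by hand. The function name merge_and_push is hypothetical.

```shell
# Hypothetical helper: merge a diverged remote branch, commit, and push back.
# $1 is the URL of the remote branch. Stops at the first failing step.
merge_and_push() {
    bzr merge "$1" &&
    bzr commit -m "Merged updates" &&
    bzr push "$1"
}
```

With the server used in this article, that would be merge_and_push http://buildserver.mycompany.com/bazaar/myproject--head.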
Tags and branches
Bazaar handles branches and tags much the same way Mercurial does, although tags are a relatively recent feature in Bazaar. You create a new branch simply by using the bzr branch command shown above. Tags are managed using the bzr tag and bzr tags commands, as shown here:
Listing 24. Managing tags in Bazaar
$ bzr tag release-1.0
Created tag release-1.0.
$ bzr tags
release-1.0
john@mycompany.com-20070801104450-h5xcg35tyy3xxo19
Bazaar is a rich, intuitive version control tool. It is easy to learn and well documented, and has been adopted for some high-profile open source projects such as the Ubuntu Linux distribution. On the downside, as with Mercurial, IDE support for Bazaar is limited: to date, only an alpha-quality plugin exists for Eclipse.
In conclusion
In this article I've presented a picture of the feature sets of four prominent open source version control systems. The first system I discussed was CVS. While a good tool in its time, and still suitable for many projects, CVS lacks support for binary file formats and atomic commits, and its slow tagging and branching functions are the bane of developers on larger projects. Subversion, the popular heir to CVS, is better adapted for the needs of most modern enterprise Java development projects. Both CVS and SVN feature excellent IDE support.
Bazaar and Mercurial are newer systems that are representative of the distributed approach to version control. Distributed SCM is interesting and offers some practical advantages over centralized management. It could also be argued that the distributed tools offer a more advanced command set. On the downside, the distributed tools lack the user base of CVS and Subversion, and both also lack the quality IDE support that is commonplace for CVS and Subversion.
Both Bazaar and Mercurial are excellent, innovative tools, with flexible, well-thought-out features. Of the two, Bazaar is arguably easier to use. Bazaar also can be used with a central repository, which allows users to combine the best of both worlds.
See the Resources section to learn more about version control systems and methods.
This story, "Subversion or CVS, Bazaar or Mercurial?" was originally published by JavaWorld.