
November 30, 2006

bzr/hg/git performance

This posting is related to Paul Reed's recent posting about our investigation into switching away from CVS for our Mozilla 2.0 work. As you can read in Paul's post, the two alternatives on the table at the moment are Mercurial and Bazaar, both of which are primarily written in Python (Bazaar is written entirely in Python; Mercurial has some of its bottlenecks written in C, from what I understand).

When we started talking about this I decided to do some performance tests to see how well these systems keep up with GIT, which I've been using for quite some time now and really like. Unfortunately GIT doesn't work anywhere near well enough on Windows, so using GIT is out of the question.

As Brett found out, Bazaar seems to be on the order of 2 to 3 times slower than Mercurial, which sounds bad, but depending on the actual performance might not really matter. So I decided to do some more unscientific tests to see how the performance of these two systems would compare to GIT, which seems really snappy with a repository containing all of Firefox and Thunderbird's source, including tests etc, and CVS/ directories.

Brett already tested commit speed and compared it. My first test (and the only one so far) was diff performance, as I tend to look at intermediate changes fairly frequently as I work. Here's what I did with Mercurial (hg) and Bazaar (bzr). I already had a GIT (git) repository set up, so I used that.

  1. Initialize a fresh repository
  2. Add the whole Mozilla tree
  3. Commit
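The three steps above can be sketched as a small Python helper. This is only an illustration, not the script used for these tests: the commit message, the `SETUP_COMMANDS` table and the `setup_repository` name are made up for the example, and it assumes `hg` and `bzr` are on the PATH.

```python
import subprocess

# Hypothetical command table: the init / add / commit steps for each tool.
# The commit message is an arbitrary placeholder.
SETUP_COMMANDS = {
    "hg": [
        ["hg", "init"],
        ["hg", "add"],
        ["hg", "commit", "-m", "Initial import"],
    ],
    "bzr": [
        ["bzr", "init"],
        ["bzr", "add"],
        ["bzr", "commit", "-m", "Initial import"],
    ],
}


def setup_repository(tool, tree_dir, runner=subprocess.check_call):
    """Run init, add and commit for `tool` inside `tree_dir`.

    `runner` is injectable so the command sequence can be inspected
    without actually invoking the version control tool.
    """
    for cmd in SETUP_COMMANDS[tool]:
        runner(cmd, cwd=tree_dir)
```

Calling `setup_repository("hg", "/path/to/mozilla")` would run the three commands in order and stop at the first one that fails.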

Once that was done, I made a one-line change to the file dom/src/base/nsDOMClassInfo.cpp, and did a set of diff tests and got the following results (all numbers are best of 3 runs, back to back on the same mostly idle computer):
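For anyone who wants to repeat this, a "best of 3" wall-clock measurement could look roughly like the sketch below. It is an assumption about how one might automate the timing, not the exact procedure used here; the function name `best_of` is invented for the example.

```python
import subprocess
import time


def best_of(cmd, cwd=None, runs=3):
    """Return the fastest wall-clock time (in seconds) over `runs` runs of `cmd`.

    Output is discarded so terminal rendering does not skew the timing.
    The exit status is deliberately not checked, since some tools return
    a non-zero status from `diff` when differences are found.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return min(timings)
```

For example, `best_of(["hg", "diff", "dom/"], cwd="/path/to/mozilla")` would time a diff restricted to dom/, while passing a subdirectory as `cwd` with a bare `["hg", "diff"]` corresponds to the "diff in" cases.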

Operation                              bzr (0.12.0c1)   hg (0.9)   git (1.4.2.4)
diff (top level)                       16.957           5.600      1.572
diff dom/                              10.596           2.240      0.140
diff dom/src/                          10.504           2.212      0.124
diff dom/src/base/                     10.468           2.212      0.124
diff dom/src/base/nsDOMClassInfo.cpp   10.472           2.084      0.116
diff dom/src/base/nsGlobalWindow.cpp   10.012           2.024      0.088
diff in dom/                           16.833           5.548      0.136
diff in dom/src/base/                  16.881           5.504      0.112

What's interesting in this data is that bzr takes a huge amount of time to do a diff even when you explicitly tell it to check only part of the source tree. Giving it a directory or file name on the command line still takes about 2/3 of the time of a top-level diff, and changing into a subdirectory takes just as long as a top-level diff. hg appears to have partially solved this, but not in all cases: naming a path cuts the time to roughly 40% of the top-level figure, while changing into a subdirectory saves nothing. Another interesting note is that explicitly diffing a file that has no changes takes essentially as long as diffing a file with changes.
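To make the comparison concrete, the snippet below divides each measurement by the same tool's top-level diff time. The numbers are copied from the results above; the `TIMINGS` and `relative_to_top_level` names are just for this illustration.

```python
# Timings from the results above: (bzr, hg, git) per operation.
TOOLS = ("bzr", "hg", "git")
TIMINGS = {
    "diff (top level)": (16.957, 5.600, 1.572),
    "diff dom/": (10.596, 2.240, 0.140),
    "diff dom/src/": (10.504, 2.212, 0.124),
    "diff dom/src/base/": (10.468, 2.212, 0.124),
    "diff dom/src/base/nsDOMClassInfo.cpp": (10.472, 2.084, 0.116),
    "diff dom/src/base/nsGlobalWindow.cpp": (10.012, 2.024, 0.088),
    "diff in dom/": (16.833, 5.548, 0.136),
    "diff in dom/src/base/": (16.881, 5.504, 0.112),
}


def relative_to_top_level(timings):
    """Express every timing as a fraction of that tool's top-level diff time."""
    top = timings["diff (top level)"]
    return {
        op: {tool: value / top[i] for i, (tool, value) in enumerate(zip(TOOLS, row))}
        for op, row in timings.items()
    }


ratios = relative_to_top_level(TIMINGS)
for op, row in ratios.items():
    print(op.ljust(40), "  ".join(f"{tool}: {row[tool]:.2f}" for tool in TOOLS))
```

For "diff dom/" this gives about 0.62 for bzr, 0.40 for hg and 0.09 for git, which is where the "2/3 of the time" figure for bzr comes from.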

So what does this all mean? Well, to me personally it means Bazaar is not yet ready for a repository the size of Mozilla. Mercurial I can live with, even if it's not snappy. Git is fast (and yeah, I kind of wish we could use it).

Posted by jst at November 30, 2006 1:57 PM

Comments

Is it an option to help GIT people to make GIT better for Windows?

Posted by: Asko at December 1, 2006 2:17 AM

No, not really. There is a win32 port for cygwin, but from what I hear and have seen it's not anywhere near good enough for Mozilla. It's been around for some time, but there doesn't appear to be strong enough support for it in the community to keep improving and maintaining it enough for us to want to bet our VCS story on it. Plus, the last time I tried to use it, it got confused by the whole NL/LFNL issue and I ended up with an unusable repository. Mozilla is not in the business of writing version control systems. They're non-trivial, and we've got plenty of other items on our list. Plus GIT doesn't appear to offer much on top of Mercurial, beyond speed, and it seems Mercurial performs well enough already...

Posted by: jst at December 1, 2006 1:09 PM

Wait, wait, I think I came to this discussion late... remind me again why Subversion isn't a suitable replacement for CVS? Inability to deal with a codebase the size of Mozilla?

Posted by: Ben Karel at December 11, 2006 6:31 AM

Bazaar diff speed is improving substantially - 0.14 (in early January) will be about 33% faster on this type of operation. That still has a way to go but there are other changes coming too.

Posted by: Martin at December 22, 2006 2:21 PM

What about monotone?

Posted by: Blharg at December 24, 2006 6:10 AM

Your "diff in [subdir]" tests with hg are doing tree-wide diffs. status and commit operate similarly.

Posted by: Matt Mackall at January 15, 2007 2:42 PM

What are the _units_ of those results? Seconds? Is it wall time, user+sys time, or what?


I have added this benchmark (with link to this article) at http://git.or.cz/gitwiki/GitBenchmarks

Posted by: Jakub Narebski at January 19, 2007 2:07 PM

The units for those tests are seconds, and they're all wall clock times.

Posted by: jst at January 19, 2007 6:12 PM

Hrmm, too bad that git is out of the question for us, I also would have liked it, and projects like linux-kernel and wine being users of git means it works well with big repositories - unfortunately those are not cross-platform projects that need to work well on win32...

Posted by: Robert Kaiser at January 26, 2007 11:22 AM

Not sure if the MinGW git port in progress changes the picture any: http://www.gelato.unsw.edu.au/archives/git/0701/37483.html.

Posted by: tor at January 26, 2007 3:26 PM

If you were wondering, hg time is almost constant because, I think, the dirstate parsing dominates the directory walking. (The dirstate is the index of the files in the repository.)

Posted by: tonfa at January 31, 2007 1:49 AM

By the way, you might be interested in the mercurial inotify extension for large trees. Its effect on status, diff, and most other operations that look at the working directory is pretty dramatic. I've gotten very attached to it for my linux and xen repositories.

Posted by: brendan at July 12, 2007 8:30 PM

From Matt Mackall: Your "diff in [subdir]" tests with hg are doing tree-wide diffs. status and commit operate similarly.


That's the same situation with Bzr. Just because you're in a subdirectory doesn't mean that you are diffing just that subdir. It defaults to the whole repository.

Posted by: William Lynch at July 25, 2007 11:00 AM