Google Mondrian: web-based code review and storage

Guido van Rossum unveiled his first Google project, Mondrian, tonight during a Python tech talk at the Google campus in Mountain View. Mondrian is a web-based code review system built on top of a Perforce and BigTable backend with a Python-powered front-end. Mondrian is a pretty impressive system and is currently in use across Google.

Shared Development Environment

Google uses a company-wide Perforce depot with almost no developer branches. Each developer has their own NFS workspace readable by anyone in the company, including automated processes. An administrative process takes snapshots of each developer workspace including local development environments accessed over SSH. Files within these snapshots can be compared to checked-in data, encrypted, and archived.
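
The comparison step can be sketched with Python's standard difflib module. This is only an illustration, not Google's tooling: the function name and the depot/ and workspace/ path prefixes are made up for the example, and file contents are passed in as plain strings.

```python
import difflib


def diff_against_checked_in(checked_in, snapshot, path):
    """Return unified-diff lines between checked-in text and a workspace snapshot.

    Both arguments are plain strings; the depot/ and workspace/ labels are
    hypothetical and only make the diff header readable.
    """
    return list(
        difflib.unified_diff(
            checked_in.splitlines(),
            snapshot.splitlines(),
            fromfile=f"depot/{path}",
            tofile=f"workspace/{path}",
            lineterm="",
        )
    )


if __name__ == "__main__":
    old = "def greet():\n    print('hello')\n"
    new = "def greet():\n    print('hello, world')\n"
    for line in diff_against_checked_in(old, new, "greet.py"):
        print(line)
```

An empty result means the snapshot matches what is checked in.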

Previous methods of review

Before Mondrian, code review was conducted largely over e-mail using Google command-line wrappers built on top of Perforce. A developer could initiate a code review from within the g4 mail tool, which would fire off an e-mail and begin a review thread. Once the developer received a response of "looks good to me," or lgtm for short, they could proceed to check in. Changes could be compared using tkdiff.

Design-level reviews are often conducted by e-mailing around Word documents or editing a team wiki. Recently some design reviews have moved onto an internal version of Google Docs.

Web-based collaboration meets code review

The Mondrian tool improves on this workflow with task-specific dashboards, in-line commenting, well-tracked statistics, and more. The application is built on top of open source Python libraries such as the Django framework, the smtpd.py mail service, and the wsgiref web server software.

Code reviews can be initiated and completed from within the Mondrian interface. A developer requests a review from another user or a group of users to kick off the process. Each invited reviewer can add comments directly underneath a line of code or reference the entire file. You can request and diff the file against previous versions as well. It's a pretty slick interface, lightly highlighting each line of code as you hover, and popping open a comment box in response to a double-click. Comments can be saved as a draft and shared at a later time.
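
One way to picture the commenting model described above is a small data structure where a comment is attached either to a line or to the whole file, and stays a private draft until its author publishes it. This is a guess at the shape of the idea, not Mondrian's actual schema; every class and method name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Comment:
    author: str
    text: str
    line: Optional[int] = None  # None means the comment refers to the whole file
    draft: bool = True          # assumption: new comments start as private drafts


@dataclass
class FileReview:
    path: str
    comments: List[Comment] = field(default_factory=list)

    def add_comment(self, author, text, line=None):
        """Attach a draft comment to a line, or to the whole file if line is None."""
        comment = Comment(author, text, line)
        self.comments.append(comment)
        return comment

    def publish_drafts(self, author):
        """Share all of one author's drafts; return how many were published."""
        published = 0
        for comment in self.comments:
            if comment.author == author and comment.draft:
                comment.draft = False
                published += 1
        return published

    def visible_to(self, viewer):
        """Published comments, plus the viewer's own unpublished drafts."""
        return [c for c in self.comments if not c.draft or c.author == viewer]


if __name__ == "__main__":
    review = FileReview("search/index.py")
    review.add_comment("alice", "Possible off-by-one here?", line=42)
    review.add_comment("alice", "This file needs a module docstring.")
    print(len(review.visible_to("bob")))   # drafts are hidden from others
    review.publish_drafts("alice")
    print(len(review.visible_to("bob")))   # now both comments are shared
```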

Putting the entire code review process online means you never have to worry about referencing the most recent version of a file or losing e-mails. Mondrian captures every outgoing e-mail related to the workflow, looks for key data such as revision numbers, and updates a to-do list accordingly.
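
To make the e-mail tracking idea concrete, here is a rough sketch using only the Python standard library. It is not how Mondrian does it: the "change 12345" / "CL 12345" wording, the status strings, and the function name are all assumptions made for the example.

```python
import re
from email import message_from_string

# Hypothetical convention: review mail refers to a change as "change 12345" or "CL 12345".
CHANGE_RE = re.compile(r"\b(?:change|CL)\s+#?(\d+)\b", re.IGNORECASE)
# Assumption: an approval is signalled by "lgtm" or "looks good to me" in the body.
LGTM_RE = re.compile(r"\blgtm\b|looks good to me", re.IGNORECASE)


def update_todo(todo, raw_message):
    """Scan one plain-text e-mail and record the status of each change it mentions.

    `todo` maps change number -> status string and is updated in place.
    """
    msg = message_from_string(raw_message)
    body = msg.get_payload()  # a str for simple, non-multipart messages
    text = f"{msg.get('Subject', '')}\n{body}"
    status = "approved" if LGTM_RE.search(body) else "needs attention"
    for number in CHANGE_RE.findall(text):
        todo[int(number)] = status
    return todo


if __name__ == "__main__":
    todo = {}
    update_todo(todo, "Subject: Review request for CL 4300\n\nPlease take a look.\n")
    update_todo(todo, "Subject: Re: change 4211\n\nLGTM, nice cleanup.\n")
    print(todo)
```

A real system would also have to handle multipart messages and quoted replies, which this sketch ignores.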

Summary

The Mondrian web code review system is pretty impressive. Guido estimates he has spent about 25% of his work time on the project since joining Google in December 2005. Mondrian served as Guido's introduction to Google technologies and processes, with the help of a few other Googlers treating it as a side project. The application is so deeply intertwined with Google technologies that it's not likely to be available as open source until Subversion and a backend such as SQLite can be supported.

Guido's full talk, including a demo of Mondrian, is available on Google Video.

Posted at 11:42 PM on November 30, 2006

Comments

scruzia on December 1, 2006 at 11:56 AM wrote:

Excellent summary of Guido's talk.

As was mentioned in the meeting, there are/were other Mondrian projects out there (an OLAP server, a Haskell-like programming language, and a Ruby IDE -- see http://en.wikipedia.org/wiki/Mondrian for links). Since this one is an internal Google project, there wasn't a big need to come up with a world-unique new name for it.

With respect to availability: Because the application is so deeply intertwined with Google technologies, my first thought was not about open-sourcing it, but rather about Google offering it as a "Software as a service". However, the dependence on a globally readable NFS workspace makes that route also somewhat problematic.

Niall Kennedy on December 1, 2006 at 12:39 PM wrote:

The name "Mondrian" was chosen by Guido out of Dutch pride. He even has a Mondrian cheese label on his PowerBook.

Joshua Allen on December 1, 2006 at 7:00 PM wrote:

I always wondered why Dutch are so proud of Mondrian, though? Dutch have (and have had) some of the greatest thinkers in the world, and Mondrian was just a painter who removed an 'a' from his name to avoid being stigmatized as Dutch. Not a bad artist, but still...

Tom Harris on December 2, 2006 at 9:36 AM wrote:

Interesting, though not surprising, that Google should have an in-house entry into the collaborative code review tool market.

What is more interesting is the different fashions of various (giant) companies. Google -- code review. Microsoft's in-house entry to code quality is PreFix/Prefast, some static analysis tools that they decided to start bundling with their IDE. I wonder what Apple does to support code quality amongst their developers.

Coming back to Google, I'm unimpressed. Why is it that every time I see a system like this, the example of a code review is, "developer requests code review, another developer looks at the code, makes a few comments, and after a few fixes, says, 'looks good to me.'"

I know why. Many work environments are not particularly supportive of making time to review someone else's work. The proof is that people think of review as something you have to pass with no errors, rather than as a process for continuous learning. For that, review has to be a profession (as it is in regular writing -- it's called "editor"), rather than just something I do when I get an alert from a collaborative code review tool.

Eric on December 2, 2006 at 10:58 AM wrote:

PPL, Mondrian is already the name of a project, an IDE for the Ruby programming language, currently in development at Beta 8. It is quite possible that Google failed to notice this!!

Lach on December 3, 2006 at 2:00 PM wrote:

Unless the Ruby IDE is called 'Google Mondrian' I don't see a huge problem.

beza1e1 on December 3, 2006 at 2:36 PM wrote:

Why don't the Google Code guys integrate this?

schluehk on December 4, 2006 at 4:53 AM wrote:

"Mondrian" is also the name of an experimental Haskell-like programming language invented by Erik Meijer at Microsoft.

http://citeseer.ist.psu.edu/meijer01scripting.html

There are definitely more Mondrians in the programming world than Kandinskys. Programmers seem to have a faible for squares and New York Boogie Woogie.

gleb on December 8, 2006 at 5:59 PM wrote:

It will be interesting to watch the video once it's available. I do wonder about the apparent emphasis on doing review before code check-in. I think the code should be good enough to check in before another developer spends his time looking over it. Also, as Tom Harris points out above, the process of "developer requesting a review" may not be right for other companies -- in a small team you could expect each developer to review all the code going in.

Anyway, these are minor nits, as any way of doing code reviews is better than not doing them at all. Mike Fagan at IBM and others did a lot of research on code inspections a while back, and they are a great and relatively cheap way to maintain a quality codebase. About 80% of the benefit comes from authors knowing that their code will be reviewed, so whatever code review process you choose, it makes sense to optimize for coverage. Reviewing 5% of code with 100% benefit is worse than reviewing 90% at 80% benefit, and you can't do worse than 80% benefit.

At Pluron, we started out with email-based reviews in much the same way as Google. Every Subversion change results in an email with a diff, other developers review these at their convenience, and if they have any comments they send them by email or enter them into a wiki. The intention was good, but the workflow was just too clunky. The reviewer had to write an email while looking at the diff and enter enough context for the author to locate the issue. There was nothing tracking these things, so more often than not, the comments would go unresolved. In part, that was because the author's job was even more involved -- get the email, open the right files and find the right context, etc.

Looks like we automated the process similarly to how Guido did it. In the timeline you are able to view a change set, annotate it with in-line comments, and assign it to the right developer, normally the author. The UI is meant to look like a debugger, with each "breakpoint" representing a comment. Now the trick is that each of these issues is tracked together with all the other development tasks, so they do not "get lost." For the author the job is also much easier. He can see new tasks show up, click on them and get to the same view the reviewer uses. He can now fix the issues and close the tasks, all from the same screen. Since we found it so useful ourselves, we exposed it as an end-user feature. We put together a demo if you are interested.

Still, as good and as efficient as a per-changeset review process is, it's no substitute for an occasional enscript -2Grh of a major chunk of your codebase and doing a full-blown review of it. Even if each individual changeset was perfect, the resulting whole is a breeding ground for entropy.

psluaces on December 10, 2006 at 7:21 AM wrote:

One question: why do you use NFS shares instead of storing everything on the SCM?? I mean, you could have a branch per developer or even per task to achieve the same results, couldn't you?

Matthias Urlichs on December 20, 2006 at 12:15 PM wrote:

I can understand why they don't put it into the SCM. If you want to do automated analysis (like answering the "which change introduced this damn bug" question) you don't want inferior-quality code in there.

... or perhaps the deciding factor was Perforce's performance; a tool like that would increase accesses to it by an order of magnitude. In contrast, they already have their shared NFS space, and the additional accesses by Mondrian are probably not that noticeable.

Too bad it's not open source. I'd definitely build something like that on top of a distributed SCM, though -- Subversion isn't one -- and I have no off-the-cuff idea what to do about the database.

Wilmer on March 24, 2007 at 6:35 PM wrote:

It uses BigTable, that probably makes releasing it a bit hard. :-)

psluaces (if you ever read this, about three months later): The Perforce server is busy enough already, so as Matthias says, it's better to put that load on the NFS servers.

Also, after someone reviews your code and says some things are wrong, this easily allows you to fix things and ask the reviewer to check again. Of course this could also be done by resubmitting the code to the SCM but ... well, as said, that server has enough work to do. :-)

Also, Perforce is (AFAIK) a closed-source product, so there's not much one can hack into it.