News Graphic Art

It's pictures like this that sometimes make me wish I could be a graphic artist for a news organization.

[image: penguin drinking wine]

(The article this image attempts to graphically describe is here, for the curious.)

[09:13] | [random] | # | G
Martin and Distributed Version Control

I'm very pleased to see that Martin Pool has been hired by Canonical to work on distributed version control. He's quite a talented hacker, and while I don't think I entirely agree with some of his conclusions on version control (specifically that the Arch model is not a good underlying foundation), I'm sure that the product of his work will be impressive.

Clearly, his weblog will be a space to watch.

[21:56] | [freesoftware] | # | G
Worth reading: Introduction to Computer Security

I got Introduction to Computer Security as a gift. As the LWN review says, the book covers a lot of stuff, and is quite intense. I read all of it over the holiday break while at home. I started out trying to do all the exercises, but I got stuck on Chapter 3 question 3 where you have to prove:

Theorem 3-3: The set of unsafe systems is recursively enumerable.

I couldn't do that without a refresher look at my grad school book on computational theory, but now that I'm at home I'm going to give it a try. That reminds me, if anyone else is reading this book, it'd be cool to compare answers to the exercises.

The book covers everything from theory like the above to the details of how IPSec works, to what the Common Criteria are, to the waterfall life cycle of secure software design. Suffice it to say I learned a lot from it, and if you have more than a passing interest in computer security it's well worth reading.

[00:27] | [technology] | # | G
Graydon on divisions (or the lack thereof) between C# and Java

A very, very good article by Graydon. I found it pretty convincing.

[22:21] | [freesoftware] | # | G
Responses

I wish I could allow comments on my blog, but I don't particularly feel like dealing with the spam problem, and I'd probably have to slightly hack the Pyblosxom comment plugin to make it play nicely with the SELinux setup I have on verbum.org.

Anyways, there was a lot of response to the previous post. It created an interesting thread on the Darcs mailing list (a bit here too). Also, Havoc wrote a very good blog post.

I have learned a lot about Darcs, and I can appreciate the elegance of its merge operator. But fundamentally I just don't see how it can scale.

I think it would be possible to add Darcs' merge to Arch's merge operators; conceptually, anything you can do with Darcs patches you can do with Arch changesets as well (although Arch would have to be extended to support things like Darcs' token-replace patches). However, I don't really see how you'd do it the other way around; Darcs is entirely predicated on its one merge operator and the presence of all history.

I also got an email from a Monotone hacker. He said that Monotone changesets are not what I had hoped: they are just a better internal representation for Monotone's current architecture. So indeed, Monotone still does not support cherrypicking. The claim is that a 3-way merge using complete history is better. I agree that, when possible, a 3-way merge is usually better; the flaw, though, is that this assumes the complete history is available everywhere, and that one always wants to do a complete merge.

So I find this argument very unsatisfying; after all, right now we essentially do cherrypicking all of the time, whenever an outside contributor sends a diff -u to a mailing list. You can of course tell contributors what to fix, have them create a new branch, and then merge that whole thing, but simply cherrypicking some of their changes and fixing up the rest yourself (while still preserving merge history) seems a lot more efficient and elegant.

This is mostly a fixable technical detail, but I'm also a bit unhappy with Monotone's use of RSA certificates rather than OpenPGP keys. Some of us have spent quite a bit of time building up links in the web of trust, and don't particularly feel like generating trust around a new key, protecting it, and so on.

So despite my reservations about some of the ancillary issues around the GNU Arch project, at this point I will continue to use it (or maybe Bazaar) for my personal projects. When a future winner emerges, it should be relatively straightforward to convert my existing changesets into the changeset format of the winner.

The common changeset mailing list archives kind of summarize how I feel about the current state of free software revision control.

[15:52] | [freesoftware] | # | G
Orthogonal Changesets

In my previous blog entry about revision control, I mentioned that I think the Arch model of changesets that are independent of project history is crucial. But why is that?

Regular GNU patches have this property, and we rely on it all the time. For example, you can download the latest .tar.gz of, say, Conglomerate, notice a bug in it, and use diff -u to create a patch. Then you can email this to the maintainer, and he can apply it. You don't have to check Conglomerate out from CVS or Arch or Darcs or whatever it's maintained with.

And as I mentioned before, an Arch changeset is basically just a super-patch that handles binary files and renames. If projects include just a bit of constant-sized metadata in their tarballs, the logical file identity, you can run tla mkpatch old-tree new-tree to generate a changeset between those two trees. You do not need access to the Arch repository. Really, Arch is just a layer on top of this for publishing changesets on the web, digitally signing them, and so on. The functionality of tla mkpatch and tla dopatch could be fairly easily separated from the rest of tla, just like diff and patch.
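To make the analogy concrete, here's a minimal sketch using plain diff and patch (all tree and file names are hypothetical); the comments note what I understand to be the rough tla equivalents:

```shell
# Two trees: a pristine release and a locally modified copy.
# (With Arch's metadata present, `tla mkpatch` over the same two trees
# would produce a changeset rather than a plain diff.)
mkdir -p old-tree new-tree
printf 'hello\n' > old-tree/file.txt
printf 'hello world\n' > new-tree/file.txt

# Generate a patch with no repository access at all.
# (diff exits 1 when the trees differ, hence the || true.)
diff -ru old-tree new-tree > changes.patch || true

# Anyone with a copy of the old tree can apply it
# (the analogue of applying a changeset with `tla dopatch`):
cp -r old-tree work-tree
patch -d work-tree -p1 < changes.patch

cat work-tree/file.txt
```

The point is that both halves of the exchange work on bare trees; no repository, history, or network access is involved.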

In contrast, Darcs has no concept of logical file identity, and thus no concept of an independent changeset. Darcs can handle renames, but only in the presence of project history. This means that if you want to create a Darcs patch, you must turn your downloaded .tar.gz into a Darcs repository. I'm not sure even that would be sufficient; you might have to actually start by downloading the upstream source from their Darcs repository.

So, you might say, what's the big deal? Darcs makes it pretty damn easy to create a Darcs repository and start hacking.

Consider, for example, the Emacs/XEmacs fork. According to the XEmacs history, they forked from an early version of Emacs 19; a bit of research on the web places that around 1992. Let's say that RMS and the XEmacs leaders get together, decide that forking sucks, and agree to merge. Suppose further that Emacs had been using Arch at the time of the fork, and that the XEmacs people created their fork by simply creating a tag from the Emacs archive into their own archive (as would be the sensible way to do it).

Obviously, no matter what RCS you're using, this is an enormous task. One reasonable way they might start to go about it is to merge changes to all of the miscellaneous Lisp files each project ships. So for example, the XEmacs people might say, "Hey, Emacs has a lot of fixes for ibuffer.el, let's merge those in!" (shameless plug ;)). So the XEmacs people look over the history in the Emacs archive, and find that the following changesets apply to ibuffer.el:

emacs@savannah.gnu.org--2000/emacs--main--21--patch-347
emacs@savannah.gnu.org--2000/emacs--main--21--patch-2049
emacs@savannah.gnu.org--2000/emacs--main--21--patch-1027
emacs@savannah.gnu.org--2001/emacs--main--21--patch-22
emacs@savannah.gnu.org--2003/emacs--main--21--patch-782

For the XEmacs people, merging these changesets is as simple as running tla replay $changeset for each one. That goes out to the Emacs Arch archive, retrieves a single changeset tarball, and applies it to the XEmacs branch. This merging style is history-sensitive, because the XEmacs branch now records that e.g. emacs@savannah.gnu.org--2000/emacs--main--21--patch-347 has been applied. Crucially, one did not need to download all of the thousands of changesets gathered in just the 4 years of history since ibuffer.el was created, not to mention the 14 years of Emacs history since the fork.
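Since Arch changesets are self-contained super-patches, this cherrypicking workflow can be mimicked in miniature with plain patch files. A sketch (file names and patch numbers are entirely made up) that replays only the changesets touching the file of interest, without fetching any other history:

```shell
# A toy tree plus a pile of "changesets", only one of which
# touches lisp/ibuffer.el.
mkdir -p tree/lisp patches
printf 'alpha\n' > tree/lisp/ibuffer.el
printf 'one\n' > tree/other.el

cat > patches/patch-347.diff <<'EOF'
--- a/lisp/ibuffer.el
+++ b/lisp/ibuffer.el
@@ -1 +1,2 @@
 alpha
+beta
EOF

cat > patches/patch-99.diff <<'EOF'
--- a/other.el
+++ b/other.el
@@ -1 +1,2 @@
 one
+two
EOF

# Apply only the changesets that mention the file we care about;
# this is roughly what `tla replay $changeset` does, one changeset
# at a time, except that tla also records the merge in history.
for p in patches/*.diff; do
    if grep -q 'lisp/ibuffer\.el' "$p"; then
        patch -d tree -p1 < "$p"
    fi
done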

Even if, say, the XEmacs people have moved ibuffer.el to contrib/lisp, the changesets will still apply, because they are against the logical file identity.

In Darcs, this would, as far as I can tell, not work. The reason is that in order to correctly merge these individual changesets, you would need access to the entire history at once (in memory, no less!). That's because Darcs needs that history in order to correctly reorder patches and infer renames.

I was about to write that the same was true of Monotone, but it looks like Monotone may have fixed this. I'm going to have a closer look at Monotone. If they've got this right, it could be very promising. I'm a bit wary of stuffing history in SQLite, but at least the project doesn't also carry the baggage of foolish goals such as replacing libc.

[21:48] | [freesoftware] | # | G
On Thread Safety

Dom: To be more precise, the librsvg2 code itself may be thread-safe, but the library as a whole is thread-unsafe because it calls into Pango, which has not yet had any work done on thread safety. Have you actually tried loading multiple SVG images at once in multiple threads? The imsep-torture program brought down io-svg.c pretty quickly.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1094711648 (LWP 30154)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000035b3e15935 in pango_glyph_string_x_to_index () from /usr/lib64/libpango-1.0.so.0
#2  0x00000035b3e16118 in pango_map_get_engine () from /usr/lib64/libpango-1.0.so.0
#3  0x00000035b3e161e8 in pango_map_get_engines () from /usr/lib64/libpango-1.0.so.0
#4  0x00000035b3e18f4c in pango_context_get_base_dir () from /usr/lib64/libpango-1.0.so.0
#5  0x00000035b3e19399 in pango_context_get_base_dir () from /usr/lib64/libpango-1.0.so.0
#6  0x00000035b3e196d8 in pango_itemize_with_base_dir () from /usr/lib64/libpango-1.0.so.0
#7  0x00000035b3e1fe86 in pango_layout_line_get_pixel_extents () from /usr/lib64/libpango-1.0.so.0
#8  0x00000035b3e20a93 in pango_layout_line_get_pixel_extents () from /usr/lib64/libpango-1.0.so.0
#9  0x00000035b3e21299 in pango_layout_get_pixel_extents () from /usr/lib64/libpango-1.0.so.0
#10 0x0000003667423be8 in rsvg_text_render_text () from /usr/lib64/librsvg-2.so.2
#11 0x0000003667424138 in rsvg_text_render_text () from /usr/lib64/librsvg-2.so.2
#12 0x00000036663b7099 in xmlParseCharData () from /usr/lib64/libxml2.so.2
#13 0x00000036663bb777 in xmlParseChunk () from /usr/lib64/libxml2.so.2
#14 0x0000003667427375 in rsvg_handle_write_impl () from /usr/lib64/librsvg-2.so.2
#15 0x0000002a958ddf3b in ?? () from /usr/lib64/gtk-2.0/2.4.0/loaders/svg_loader.so
#16 0x000000371c4098a9 in gdk_pixbuf_loader_write (loader=0x513fd0,
    buf=0x52cc20 "or=\"#20305a\"/>\n    <pos y=\"0\" x=\"0\" width=\"100%\" height=\"100%\"/>\n  </item>\n\n  \n  <item type=\"rect\">\n    <pos y=\"-50\" x=\"0\" width=\"100%\" height=\"50\"/>\n    <box xpadding=\"10\" spacing=\"10\" orientation=\"h"..., count=16493, error=0x413ff140) at gdk-pixbuf-loader.c:494
#17 0x00000000004027c5 in request_thread_main (request=0x52c3c0, unused=0xe0) at imsep-loader.c:214
#18 0x00000035b284218c in g_thread_pool_thread_proxy (data=0x588b00) at gthreadpool.c:113
#19 0x00000035b283fece in g_thread_create_proxy (data=0x588b00) at gthread.c:556
#20 0x00000032b0605f81 in start_thread () from /lib64/tls/libpthread.so.0
#21 0x00000035b1bc3aa3 in thread_start () from /lib64/tls/libc.so.6
#22 0x0000000000000000 in ?? ()
[11:02] | [freesoftware] | # | G
On Dogfooding

Seth: I think that rawhide should basically always be dogfoodable. If we ever get into your "Red" state, that slows development. Anyone running any kind of system should be backing up their data anyways. Besides bad package postinsts that have rm -rf /, there are plain old hard drive failures, accidental typos with rm, giant marauding robots, etc., that happen whether you're running rawhide or not.

[02:40] | [freesoftware] | # | G