Older blog entries for DV (starting at number 215)

16 Sep 2005 (updated 16 Sep 2005 at 13:14 UTC) »

Time for a change

I realize that I take less and less satisfaction from my involvement with GNOME. I'm rather sad to see people actively switching from libxml2 to GMarkup just for the sake of it, and that all I have to offer this community at this point are boring minutes from a Board where I nearly feel like a foreigner. Gamin is a workaround for legacy APIs, at both the user and kernel level, that people dislike anyway, so even there it is hard to get anything but negative feedback. Even in the libxml2 work I get annoyed answering the same dumb questions over and over, and I react negatively; this is sad. I really take less and less joy in being around. Maybe I'm just somewhat depressed, but I think it is time I refocus on something I will have more fun with. Sure, I'm going to continue libxml2 and libxslt maintenance, but I need to work on code where I have more fun, where people are actually happy to get and use the code. It is time for a change. So I don't expect to be very involved with GNOME in the future, except for the local GNOME-FR group being formed. I think I am done with work at the Board level, and I won't run again this year. I came back because I was really annoyed by the state of the Board the year before; my candidacy statement was mostly about getting back to regular meetings with public minutes. I think that is back on track, but honestly it's not fun work. If nobody wants to do it in the future, then make it a task for the Director; that would force some public communication every two weeks on the state of affairs at the Foundation level.

Update: note that I don't mean the GNOME project is uninteresting, only that what I can contribute to it is not. libxml2, gamin and taking minutes are in no way representative of where the fun is to be had in the project, and that is what burns me out. For anybody really interested in UI, this is certainly a nice project to work in!

So what am I going to do? The Xen virtualization work looks very promising, and that's what I expect to focus on. It allows me to get back to my roots in operating systems; it is a growing community and technology, and it has the potential to change a lot of the way we do computing and interact with computers in general. And there is an awful lot of work to do there: we are only at the first steps of integration in Fedora and Linux, and, even more important, it is a completely new concept from a user standpoint. Finding the right concepts for users to grasp and use the technology, and building code implementing them, sounds exciting. I need excitement!

Never wake up Murphy ...

... with statements like "the last release is looking good". In the meantime two serious bugs were found, plus a complete blocker: the expat headers export XML_FEATURE_UNICODE, which was also added to the libxml2-2.6.21 API :-( . As a result mod_php fails to build, so I had to scratch that addition to the libxml2-2.6.21 API and just released libxml2-2.6.22, sigh ...

However, Gnomers may appreciate the fact that libxml2-2.6.22 adds the libxml2 API to devhelp. I will add libxslt too before the next release; it is based on the existing libxml2-api.xml, 729 lines of XSLT and a bit of Makefile and RPM spec glue.

Standards and politics

Christian, freedesktop.org doesn't have much in common with standards bodies like ISO, W3C or IETF: in fd.o you may find a person annoying at times, but you never get the kind of political games I have seen in the others. It is really nice to stay innocent, argue purely on technical matters and just get the Right Thing done. Sometimes that's sufficient, but it's not very common: there are always interests, prior deployments, keeping the edge in a competition-oriented world. Standardization can help bring awareness and interoperability and drive prices down insanely, or it can fail for a variety of reasons: because people don't want to negotiate, because the trade-off between every existing player ends up defining something hideous and unstable, or simply because it arrives too late and the market has already standardized de facto on something else. Always keep your goals in mind: going through a standards process can be very long and tedious work, though sometimes it is worth it anyway :-)

GNOME CVS is accessible only with SSH

If you hit this then go read Owen's mail on the change, the reasons and how to best handle it.

Gnome Live CDs

Following Luis' announcement, fr.rpmfind.net now hosts copies of the GNOME 2.12 live CDs; if for some reason you can't use BitTorrent, get them there.

GNOME Summit registration

There are only 35 people listed at the moment, which looks like a low number to me. If you think you are coming, please take the time to register by just adding your name to the list (and creating an account on live.gnome.org if you haven't yet); this will help the logistics a lot, thanks!

xml:id is a W3C Rec

Whoohoo! This is a relatively short and simple W3C specification. For people not familiar with it, IDs in XML are a special type of attribute whose value is used to point inside a document. So if you use a URI "http://example.com/bar.xml#foo" and there is an element in bar.xml with an attribute of type ID and value "foo", then you can reference that element in a completely standard way (via the usual MIME type fragment identifier and XPointer). The problem is that to get ID attributes, the XML document had to reference the DTD where the attribute types are defined, and that DTD had to be loaded at parse time, which a lot of XML processing avoids anyway. But with an xml:id compliant XML toolchain, any attribute named xml:id is of type ID and can be used to point inside the document, even in the absence of a DTD. This can be really useful if you design a new XML-based data type and need a way to key your records and point to them. It works with XPath too: "id('foo')" will directly return the element if it is found in the document (and since the table is built at parse time, it's usually an instant lookup).
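A toy sketch of why that lookup is cheap: record every xml:id value in a hash table while going through the document, then answer an id('foo')-style query with a single hash probe, no DTD involved. The "parser" here is a crude string scan (it ignores comments, CDATA and single-quoted values), and IdTable, build_ids and id_table_lookup are hypothetical names, not libxml2's API; in libxml2 itself the per-document ID table is queried with xmlGetID() or the XPath id() function. Duplicates are rejected because ID values are meant to be unique within a document.

```c
#include <assert.h>
#include <string.h>

/* Toy ID table (hypothetical, NOT libxml2): maps an xml:id value to the
   byte offset of the start tag that carries it. Open addressing. */
#define ID_SLOTS 64          /* power of two */
#define ID_MAX_LEN 32

typedef struct {
    char value[ID_MAX_LEN];
    long offset;             /* -1 marks an empty slot */
} IdSlot;

typedef struct { IdSlot slots[ID_SLOTS]; } IdTable;

static unsigned hash_str(const char *s) {
    unsigned h = 5381;       /* djb2 string hash */
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h;
}

/* Returns 0 on success, -1 on a duplicate value or a full table. */
static int id_table_add(IdTable *t, const char *value, long offset) {
    unsigned i = hash_str(value) & (ID_SLOTS - 1);
    for (int probes = 0; probes < ID_SLOTS; probes++, i = (i + 1) & (ID_SLOTS - 1)) {
        IdSlot *s = &t->slots[i];
        if (s->offset < 0) {
            strcpy(s->value, value);   /* caller guarantees length < ID_MAX_LEN */
            s->offset = offset;
            return 0;
        }
        if (strcmp(s->value, value) == 0) return -1;
    }
    return -1;
}

/* The id('foo') analogue: expected constant time, -1 if not found. */
long id_table_lookup(const IdTable *t, const char *value) {
    unsigned i = hash_str(value) & (ID_SLOTS - 1);
    for (int probes = 0; probes < ID_SLOTS; probes++, i = (i + 1) & (ID_SLOTS - 1)) {
        const IdSlot *s = &t->slots[i];
        if (s->offset < 0) return -1;
        if (strcmp(s->value, value) == 0) return s->offset;
    }
    return -1;
}

/* Toy "parse": any attribute literally named xml:id becomes an ID.
   Returns 0 on success, -1 on a malformed or duplicate ID. */
int build_ids(const char *xml, IdTable *t) {
    static const char key[] = " xml:id=\"";
    for (int i = 0; i < ID_SLOTS; i++) t->slots[i].offset = -1;
    for (const char *p = strstr(xml, key); p != NULL; p = strstr(p + 1, key)) {
        const char *start = p + sizeof key - 1;
        const char *end = strchr(start, '"');
        if (end == NULL || end - start >= ID_MAX_LEN) return -1;
        char value[ID_MAX_LEN];
        memcpy(value, start, (size_t)(end - start));
        value[end - start] = '\0';
        const char *tag = p;
        while (tag > xml && *tag != '<') tag--;   /* back up to the start tag */
        if (id_table_add(t, value, (long)(tag - xml)) != 0) return -1;
    }
    return 0;
}
```

The table is filled once, during the scan, so every later lookup costs one hash computation plus a short probe sequence regardless of document size.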


After some trouble with rawhide I finally managed to run Fedora with an inotify kernel, then quickly spotted the memory leaks that plagued some of the users on inotify-enabled distros, and I released gamin 0.1.6 with the patch and the latest updates to the inotify back-end. Note that the patch also works for 0.1.5; you may have to ask Fred Crozat if you need to backport it, he did :-). He also pointed out that valgrind 3.0.1 added support for dnotify, so now gamin can also be valgrinded on older kernels, which is excellent news (though it didn't spot other leaks). Valgrind rules!

libxml2 and libxslt

Apparently the latest releases were good ones: only one corner-case parser bug was found (but it was also in 2.6.20), plus a special compilation problem. That is fewer than the average number of reports a week after a new release, especially considering the amount of changes that went in.

Barbecue ?

Isn't Miguel inviting people to a barbecue where primates are roasted? Someone call the Bonobo protection league, quick!

They killed the bonobo ... you bastards !

5 Sep 2005 (updated 5 Sep 2005 at 22:45 UTC) »

Tintin en Irak

Damn, someone should translate this satire, Tintin en Irak, into English (okay, there are some French political bits too); it's hilarious and really well done. Thanks to Uche Ogbuji for finding this excellent piece! People familiar with TomBoy adventure will appreciate the delicate choice of mapping between the comic characters and the real ones :-)

Update: Wouter Bolsterlee provided me with a ready-to-print PDF :-), enjoy!

War on Weather

Following the recent events, and in the satirical tone of Tintin en Irak, here are some of the steps I expect the Bush administration to take following the disaster in the US South:

  • raise a big War on Weather media drama with looping images of Katrina on all TV channels
  • fund a multi-billion-dollar research program with friends from the military lobby to find out whether clouds may turn into potentially dangerous weather, and to fingerprint them
  • also fund the deployment of large weapon-like lasers along the US coast to vaporize potentially harmful clouds
  • assign the Department of Homeland Security the extra task of fingerprinting clouds entering US territory
  • try to invade Cuba based on the obvious fact that they have developed Clouds of Massive Destruction targeting the US
  • provide the oil industry pumping in the Caribbean with army support and cloud destruction weapons so it can operate quietly in the future

The good old techniques always work ...

5 Sep 2005 (updated 5 Sep 2005 at 12:17 UTC) »

libxml2-2.6.21 and libxslt-1.1.15

Finally made a new set of releases. I have been chasing bugs for the last few weeks, so people should really update. As a result I ended up closing 182 bugs in GNOME Bugzilla, manually, because the "handle multiple bugs at once" form does not allow changing to the CLOSED state, so I lost an hour clicking on bug forms :-( . Anyway, it's good to have the libxml2 and libxslt bug lists trimmed down to something reasonable, and good to have that done in time for the new GNOME release too.

Upcoming events

Don't forget to register for the GNOME Summit if you are coming, and also tell us what you want to work on. This clearly won't be a talk-driven conference, but setting topics of work and discussion in advance will make us more productive.

My talk on Xen at FUDCon is scheduled for Thursday 6th October; I will also try to spend some time at the GNOME booth before and after it that day.

Disaster recovery

Without getting into the political side, I'm still surprised it takes such a long time for countries affected by a disaster to request or even accept international aid. Seeing the US take a full week before accepting the various kinds of help offered worldwide is a bit shocking; it's not like they didn't know they needed it. Is that a logistics problem? But when the Red Cross, an international non-governmental agency, appears to be the most efficient workforce on the scene for 5 full days, and even state government officials officially recognize it, this really means the money going to governmental disaster planning and handling should be donated directly to those who know and care about handling disasters. It also means the government is incompetent at handling them. The point is that this would not surprise us coming from a less developed country (the Thai post-tsunami recovery effort looks far more organized in retrospect than the US response to Katrina), but it is shocking from a nation very eager to tell others how they think a country should be run...

26 Aug 2005 (updated 26 Aug 2005 at 12:56 UTC) »

the future of gamin

John has submitted a patch to have gnome-vfs bypass FAM/gamin and access inotify directly, and people have been asking for my opinion. Basically it's just fine; there are only two concerns. First, OS portability: the gamin/FAM back-end for notification should be maintained as a way to keep working on older systems, BSD, Mac OS X, Solaris, etc. Second, the switch between inotify and gamin should be done at run time, because gamin/FAM is legacy and will continue to be around, and by doing so we don't introduce one more binary incompatibility in the platform; it's hard enough already for ISVs shipping on Linux.
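A minimal Linux sketch of what "switch at run time" can mean: probe for inotify when the program starts and fall back to the FAM/gamin path only if the kernel lacks it, so a single binary serves both kinds of systems. The inotify calls are the real kernel interface (inotify_init1, inotify_add_watch, read); Backend, pick_backend and next_event_name are hypothetical names, and the gamin fallback is only a placeholder. inotify_init1() is newer than this entry; plain inotify_init() would serve the same probing purpose.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Which notification back-end the (hypothetical) monitor ended up using. */
typedef enum { BACKEND_INOTIFY, BACKEND_GAMIN } Backend;

/* Decide at run time, not at build time: the same binary uses inotify
   where the kernel has it and the FAM/gamin path elsewhere. */
Backend pick_backend(int *fd_out) {
    int fd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (fd >= 0) {
        *fd_out = fd;
        return BACKEND_INOTIFY;
    }
    *fd_out = -1;            /* e.g. ENOSYS on a kernel without inotify */
    return BACKEND_GAMIN;    /* placeholder: fallback not implemented here */
}

/* Wait up to timeout_ms, then copy the name of the first event whose mask
   includes `want`. Returns 0 on success, -1 on timeout or no match. */
int next_event_name(int fd, uint32_t want, int timeout_ms, char *name, size_t n) {
    struct pollfd p = { .fd = fd, .events = POLLIN };
    if (poll(&p, 1, timeout_ms) <= 0) return -1;
    /* Buffer aligned for struct inotify_event, as inotify(7) recommends. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(fd, buf, sizeof buf);
    for (char *ptr = buf; len > 0 && ptr < buf + len; ) {
        const struct inotify_event *ev = (const struct inotify_event *)ptr;
        if ((ev->mask & want) && ev->len > 0) {
            strncpy(name, ev->name, n - 1);
            name[n - 1] = '\0';
            return 0;
        }
        ptr += sizeof(struct inotify_event) + ev->len;   /* next record */
    }
    return -1;
}
```

The important part is pick_backend: nothing about the choice is baked in at compile time, so distributors and ISVs ship one build.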

But it should be clear too that I find FAM a very ugly, limited and completely underspecified (if specified at all!) API, that it should die as promptly as possible, and that its only virtue was to be slightly less broken and specific than the various default kernel APIs found on various OSes, especially dnotify! Die FAM, die! The ideal situation would be a sane, POSIX-standardized notification API relying on a syscall-based kernel implementation, but I would not hold my breath! The best future would be for gamin to become a legacy and useless piece of code.


I hacked furiously on libxml2 again this week, first trying to address as many bug reports as possible before the next release (probably at the end of next week), and I also found a way to reduce the library's use of the memory allocator, which can lead to very significant speedups in some cases: the parsing speed for the database file I use for profiling rose from 21 MB/s to 25 MB/s on 32 bits, and I would expect greater improvements on 64-bit systems. Kasimier seems to have a lot of work, so progress on XML Schemas is slowing down; I should use that time to finish the Schematron implementation and to add an interface to validate DTDs at the SAX level, like the SAX and XSD interface added in July. This should be close to trivial with the existing code.
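The entry doesn't say how the allocator usage was reduced, so the following is only one generic technique, not libxml2's actual change: a bump ("arena") allocator that calls malloc() once per large block and hands out small chunks by moving a pointer, releasing everything in one go. For scale, going from 21 to 25 MB/s is about a 19% throughput gain. Arena, arena_alloc and friends are hypothetical names, and the 8-byte alignment is an assumption that suits pointers and doubles on common ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Assumed alignment of returned chunks (pointers/doubles on common ABIs). */
#define ARENA_ALIGN 8

typedef struct Block {
    struct Block *next;     /* previously filled blocks */
    size_t used, size;      /* bytes handed out / capacity of this block */
} Block;

/* Header size rounded up so the payload after it stays aligned. */
#define BLOCK_HDR ((sizeof(Block) + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1))

typedef struct {
    Block *head;
    size_t block_size;
    size_t mallocs;         /* how many times malloc() was actually called */
} Arena;

void arena_init(Arena *a, size_t block_size) {
    a->head = NULL;
    a->block_size = block_size;
    a->mallocs = 0;
}

/* Bump-allocate n bytes; only call malloc() when the current block is full
   (its unused tail is simply abandoned). Chunks are never freed one by one. */
void *arena_alloc(Arena *a, size_t n) {
    n = (n + ARENA_ALIGN - 1) & ~(size_t)(ARENA_ALIGN - 1);
    if (a->head == NULL || a->head->size - a->head->used < n) {
        size_t size = n > a->block_size ? n : a->block_size;
        Block *b = malloc(BLOCK_HDR + size);
        if (b == NULL) return NULL;
        b->next = a->head;
        b->used = 0;
        b->size = size;
        a->head = b;
        a->mallocs++;
    }
    char *p = (char *)a->head + BLOCK_HDR + a->head->used;
    a->head->used += n;
    return p;
}

/* Typical parser use: copy element and attribute names into the arena. */
char *arena_strdup(Arena *a, const char *s) {
    size_t n = strlen(s) + 1;
    char *p = arena_alloc(a, n);
    if (p != NULL) memcpy(p, s, n);
    return p;
}

/* Release everything at once, e.g. when the document is freed. */
void arena_free_all(Arena *a) {
    while (a->head != NULL) {
        Block *next = a->head->next;
        free(a->head);
        a->head = next;
    }
}
```

A thousand small name copies cost only a handful of malloc() calls here, which is the kind of saving that shows up as parsing throughput.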


I will be out of reach until Tuesday evening, as I was called to clean up my mother's land: there is an alarming number of wildfires in the South of France and preventive action is urgently needed. I also discovered that there are frequent Ryanair flights from there to London, which I will use when going to FUDCon3, where I will be speaking about Xen again. I will also go to the GNOME Summit at MIT in Cambridge, MA the following weekend, 8-11 October.

libxml2 breakage

Sorry I broke CVS HEAD for a day or so; the error didn't show up in my checkout because I compile statically :-\ . Currently going through the libxml2 Bugzilla list trying to kill as many bug reports as possible; one of them was a real parser bug!


While I sit on IRC all day long, I haven't used IM until now. I now have Gaim; I did a review and a small implementation of Jabber a few years ago, and I'm very happy about the boost it will receive.

Journey in regexp land

I have been rather quiet for the last 10 days, first due to an extended weekend and then because I hit a relatively hard problem in the regexp code used in libxml2. Basically it's all XML Schemas' fault: Kasimier has nearly completed the support, except for the redefine feature, which allows a schema to subset the content model of a type exported by an imported schema. Kasimier will as usual handle the nasty part of making sense of the spec, and I will give him the basic tools to make this work. That means I need to provide ways to check that a content model is a valid derivation of another, which in regexp terms can be summarized as: does regexp R accept all strings generated by regexp r?

And that is a rock, a hard one.
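To make the question concrete, here is a deliberately naive sketch that is not how libxml2 approaches it: enumerate every string up to a length bound over a small alphabet and check that whatever r accepts, R accepts too, using POSIX <regex.h> in extended syntax (so testRegexp's comma for sequence is simply dropped). It can only find counterexamples, never prove containment, and the k^n strings it has to try for k symbols and length n are exactly why a symbolic approach is needed. bounded_subset is a hypothetical helper name.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Compile an extended POSIX regex anchored so it must match the whole string. */
static int compile_whole(regex_t *re, const char *pattern) {
    char buf[256];
    snprintf(buf, sizeof buf, "^(%s)$", pattern);
    return regcomp(re, buf, REG_EXTENDED | REG_NOSUB);
}

/* Bounded check of "does sup accept every string sub accepts?": tries all
   strings over `alphabet` up to maxlen characters. Returns 1 if no
   counterexample was found, 0 if one was (copied into witness), -1 if a
   pattern fails to compile. Cost grows as k^maxlen for k symbols. */
int bounded_subset(const char *sub, const char *sup, const char *alphabet,
                   int maxlen, char *witness) {
    regex_t rsub, rsup;
    char s[64];
    int idx[64];
    int k = (int)strlen(alphabet), result = 1;

    assert(maxlen >= 0 && maxlen < 64 && k > 0);
    if (compile_whole(&rsub, sub) != 0) return -1;
    if (compile_whole(&rsup, sup) != 0) { regfree(&rsub); return -1; }
    for (int len = 0; len <= maxlen && result == 1; len++) {
        memset(idx, 0, sizeof idx);
        for (;;) {
            for (int i = 0; i < len; i++) s[i] = alphabet[idx[i]];
            s[len] = '\0';
            if (regexec(&rsub, s, 0, NULL, 0) == 0 &&
                regexec(&rsup, s, 0, NULL, 0) != 0) {
                strcpy(witness, s);   /* accepted by sub, rejected by sup */
                result = 0;
                break;
            }
            /* Advance idx like a base-k odometer; stop after the last string. */
            int pos = len - 1;
            while (pos >= 0 && ++idx[pos] == k) {
                idx[pos] = 0;         /* carry into the previous position */
                pos--;
            }
            if (pos < 0) break;
        }
    }
    regfree(&rsub);
    regfree(&rsup);
    return result;
}
```

For example, checking (ab)+ against (a|b){3,} immediately yields the witness "ab", while (ab)+ against (a|b)+ finds nothing up to the bound, which is evidence but not proof.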

After reading quite a bit, the first conclusion is that my existing automata + counters modelling of regexps, which we use for content model validation, is really not a good model for solving this (though it's good for validating instances). So back to the literature: various papers, most of them relatively recent, especially the paper from Michael Sperberg-McQueen at Extreme Markup earlier this month on using Brzozowski derivatives for the task. It looks fine, except that it will explode with large counter ranges. My selected approach is to do the derivation at the algebraic level instead of stepping through all possible input strings, and to fall back to injecting token by token only when no progress can be made purely on tree constructs. After a short week of frenetic testing and refinement I now have something that seems to work relatively nicely. I just pushed it into libxml2, adding less than 8kB to the xmlregexp compiled size, along with a first set of regression tests and support in the testRegexp command line tool:

paphio:~/XML -> ./testRegexp --expr '(a*, ((b, c, d){0,5}, e{0,1}){0,4}, f)' '(a{1,100}, b, (c, d, b){2,3}, c, d, e)'
Testing expr (a*, ((b, c, d){0,5}, e{0,1}){0,4}, f):
Subset parsed as: ((((a , b) , ((c , d) , b){2,3}) , c) , d) , e
Resulting derivation: (((b , c) , d){0,5} , e?){0,3} , f
Ops: 0 nodes, 55 cons
paphio:~/XML -> ./testRegexp --expr '(a|b),(a|c){0,100}' 'a{0,100},(a|c)'
Testing expr (a|b),(a|c){0,100}:
Subset parsed as: a{0,100} , (a | c)
Resulting nillable derivation: empty
Ops: 0 nodes, 11 cons
paphio:~/XML -> ./testRegexp --expr '(a|b){3,*}' '(a,b)+'
Testing expr (a|b){3,*}:
Subset parsed as: (a , b)+
Resulting derivation: (a | b)+
Ops: 0 nodes, 8 cons
paphio:~/XML -> ./testRegexp --expr '(a|b),(a|c){0,99}' 'a{0,100},(a|c)'
Testing expr (a|b),(a|c){0,99}:
Subset parsed as: a{0,100} , (a | c)
Resulting derivation: forbidden
Ops: 0 nodes, 9 cons

The key is to try to keep sub-linear performance. I really expect redefines to be used to restrict content models from unbounded sets to bounded and reordered ones, for example (a|b)* into (a,b){1,100000}, to prevent consumers of services from being DoS'ed; if you explode when validating, this is just worse, and it is a big problem, as pointed out recently in a long thread on xml-dev. Hence testRegexp logs the number of cons, i.e. how many times an intermediate expression node was generated (one of Brzozowski's results is that this set is finite, but the goal is to keep it small :-).
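The derivative idea can be sketched in a few dozen lines of C. This is the textbook Brzozowski derivative used to match one string, not the xmlregexp containment code (which derives expressions against other expressions), and every name below is hypothetical. What it illustrates is the counter rule: the derivative of e{n,m} is d(e) followed by e{n-1,m-1}, so a range like {0,100000} is decremented symbolically instead of being unrolled. For a simple counted expression like a{0,100000}, each input token builds exactly one new node, counted in cons_count (loosely analogous to the "cons" figure testRegexp prints). Extending this to containment means deriving r and R in lockstep and checking that whenever r's derivative is nullable, R's is too; that needs expression normalization to terminate, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Node kinds: empty language, empty string, one token, sequence,
   alternation, and a counted repetition e{min,max}. */
typedef enum { R_EMPTY, R_EPS, R_SYM, R_SEQ, R_ALT, R_REP } Kind;

typedef struct Re {
    Kind kind;
    char sym;                 /* R_SYM: the token */
    const struct Re *a, *b;   /* R_SEQ/R_ALT operands; R_REP uses only a */
    int min, max;             /* R_REP bounds; max < 0 means unbounded */
} Re;

static const Re EMPTY_NODE = { R_EMPTY, 0, NULL, NULL, 0, 0 };
static const Re EPS_NODE = { R_EPS, 0, NULL, NULL, 0, 0 };

/* Fixed node pool; cons_count counts every node ever built. */
#define POOL_SIZE 8192
static Re pool[POOL_SIZE];
static int cons_count = 0;

static const Re *cons(Kind k, char sym, const Re *a, const Re *b, int min, int max) {
    assert(cons_count < POOL_SIZE);
    Re *n = &pool[cons_count++];
    n->kind = k; n->sym = sym; n->a = a; n->b = b; n->min = min; n->max = max;
    return n;
}

const Re *re_sym(char c) { return cons(R_SYM, c, NULL, NULL, 0, 0); }

/* Smart constructors simplify on the fly to keep derivatives small. */
const Re *re_seq(const Re *a, const Re *b) {
    if (a->kind == R_EMPTY || b->kind == R_EMPTY) return &EMPTY_NODE;
    if (a->kind == R_EPS) return b;
    if (b->kind == R_EPS) return a;
    return cons(R_SEQ, 0, a, b, 0, 0);
}

const Re *re_alt(const Re *a, const Re *b) {
    if (a->kind == R_EMPTY || a == b) return b;
    if (b->kind == R_EMPTY) return a;
    return cons(R_ALT, 0, a, b, 0, 0);
}

const Re *re_rep(const Re *e, int min, int max) {
    if (max == 0 || e->kind == R_EPS) return &EPS_NODE;
    if (e->kind == R_EMPTY) return min == 0 ? &EPS_NODE : &EMPTY_NODE;
    return cons(R_REP, 0, e, NULL, min, max);
}

/* Does r accept the empty string? */
int re_nullable(const Re *r) {
    switch (r->kind) {
    case R_EPS: return 1;
    case R_SEQ: return re_nullable(r->a) && re_nullable(r->b);
    case R_ALT: return re_nullable(r->a) || re_nullable(r->b);
    case R_REP: return r->min == 0 || re_nullable(r->a);
    default:    return 0;   /* R_EMPTY, R_SYM */
    }
}

/* Brzozowski derivative: the strings s such that c followed by s is in r. */
const Re *re_deriv(const Re *r, char c) {
    switch (r->kind) {
    case R_SYM: return r->sym == c ? &EPS_NODE : &EMPTY_NODE;
    case R_ALT: return re_alt(re_deriv(r->a, c), re_deriv(r->b, c));
    case R_SEQ: {
        const Re *left = re_seq(re_deriv(r->a, c), r->b);
        return re_nullable(r->a) ? re_alt(left, re_deriv(r->b, c)) : left;
    }
    case R_REP: {
        /* d(e{n,m}) = d(e) . e{max(n-1,0), m-1}: decrement the counter
           symbolically instead of unrolling e up to m times. */
        const Re *d = re_deriv(r->a, c);
        if (d->kind == R_EMPTY) return &EMPTY_NODE;
        int min = r->min > 0 ? r->min - 1 : 0;
        int max = r->max < 0 ? -1 : r->max - 1;
        return re_seq(d, re_rep(r->a, min, max));
    }
    default: return &EMPTY_NODE;   /* R_EMPTY, R_EPS */
    }
}

/* Match by deriving token by token, then testing nullability. */
int re_match(const Re *r, const char *s) {
    for (; *s != '\0'; s++) {
        r = re_deriv(r, *s);
        if (r->kind == R_EMPTY) return 0;   /* dead: nothing can follow */
    }
    return re_nullable(r);
}
```

Building (a|b){3,} as re_rep(re_alt(re_sym('a'), re_sym('b')), 3, -1) accepts "abab" and rejects "ab", and matching 1000 a's against a{0,100000} stays linear instead of materializing 100000 copies of the counted term.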

Future work on this is to fix one potential problem left, apply it to Kasimier's code when it's there, extend it to allow the full set of operators needed by RELAX NG, and maybe rewrite the RNG validator on top of it. I'm not sure I will use it for validation in Schemas itself (apart from the Schemas compilation of course), as I prefer good old automata to mutating trees during the validation phase.

Test suite

Very impressed by yesterday's SVG test suite results from Uraeus. Looks excellent, congrats! Now can you automate the process of finding defects in the output? I started getting a headache about two thirds of the way through the scanning process :-)
