OpenXML vs ODF in SC34: The Phoney War 

One of the more tedious aspects of this multi-day SC34 meeting has been the politics surrounding the supposed office format wars, arising from the increasing likelihood that SC34 will participate in the creation of an ISO Standard for Microsoft Office documents. Signs that something was up came in the form of several conspiratorial phone calls and email exchanges beforehand, and the presence of unfamiliar OASIS, IBM, Microsoft, and Ecma people at the opening plenary.

ODF has already passed most of the ISO/IEC standardisation process, though OpenXML (or “Office Open XML” – OOXML – as Ecma are now calling it) has not yet begun its trek. I guess Microsoft/Ecma were checking out SC34 to determine what sort of reception their document would get, since clearly it would be stupid of them to put OOXML through a standards process that was predisposed to reject it.

So what will they have found?

If Microsoft HQ staff had asked their national standards body (ANSI) about SC34, they might have gathered we are not the sort of people one should associate with. As an illustration: while our Korean hosts have been generous and friendly in their hospitality and support during these meetings, at last year's meeting in Atlanta the only message from ANSI was a snub: they wished to emphasise they were not hosting the meeting and did not want to be associated with it. (Hence our many American members work for SC34 in a personal capacity rather than with national backing.)

And from reading some of the web press, one might have thought SC34/ISO was a forum whose rules could be exploited by one corporation or another to serve its own ends.

The reality is of course different.

The members of SC34 can be generally characterised as level-headed, thoughtful technical experts, specialising in the technology of “document description and processing languages”. Okay, there are arguments – but they generally concern the intellectual and technical merit of ideas and positions. The mundane politics of commerce, and the sloppy thinking inherent in polarised pro- or anti-Microsoft debate, are not the stuff of SC34 meetings. Indeed, I have heard the adjective “slashdottish” being used in SC34 as a rebuke.

No, SC34 doesn't do hype or crave the spotlight. Which is probably why nobody has heard of us — even though (to risk being described as slashdottish) many current and upcoming Web technologies are rooted in standards SC34 originated (SGML and HyTime in particular).

SC34 members are also volunteers, drawn mostly from small and medium-sized companies. Many of us know that, for our customers, Microsoft tools are often the tools of choice; and many of us judge that having a locked-down standard version of the Microsoft Office formats could bring huge benefits when it comes to working with Office documents.

However, in the grand scheme of the proposed standardisation of OOXML, SC34, and its members' judgements, are only part of the picture – there are other considerations in this complex matter ...

First, SC34 members themselves will have no vote if and when OOXML is assigned to us for standardisation. It is standards bureaucrats in our national bodies (ISO member countries) who will vote. Sure, these people will often take advice from their country's SC34 members, but ultimately it is their votes, not ours, which count.

Secondly, the criteria for judging a standard are not whether it is good for competition or for the software industry at large; so long as the technology is of use to the world, the criteria are more concerned with whether the standard is technically correct. This is often as much a judgement on the quality of the standards document as on the attributes of the technology it describes.

Thirdly, there are ultimately inviolable rules, laid down in the ISO/IEC procedures, which govern the justification for creating an ISO document.

It is this last point which has provoked most discussion in the bars and along the committee corridor this week. Crucially, the rules forbid the creation of “contradictory” standards – and the word “contradictory” is currently so ill-defined that some feel OOXML “contradicts” ODF, in spite of the evidence that ISO itself is full of Standards which are ... well, overlapping and complementary at least.

There is also the question of time and resources. The kind of fast-tracking procedures used for ODF, and mooted for OOXML, give very little time for a small part-time volunteer group to scrutinise such bulky documents adequately. There is no doubt ODF is, right now, less good than it would have been if subjected to the full rigours of ISO standardisation. With OOXML predicted to weigh in as a behemoth 7,000-page standard, the danger that it will be inadequately scrutinised is greater still.

It is this unseemly haste that most concerned me and my fellow UK delegates during this meeting. Although arguably the trend has already been set by some other standards groups, it is ultimately in nobody's interest if the standardisation process becomes devalued to the point where it delivers technology standards which are done quickly, but which don't provide a solid and useful basis for implementers and users.

But, despite all the above, I expect to keep hearing Microsoft's standardisation attempts reduced to a dumbed-down narrative of “ODF vs OpenXML”.
SC34 meetings in Seoul: DTLL 

I am in Seoul for SC34 meetings, and in particular to work on DTLL, the Datatype Library Language — or ISO/IEC 19757 Part 5, as it hopes to be known.

DTLL is the brainchild of Jeni Tennison and was first described by her at the XMLOpen Conference in 2004, to general applause. As editor of Part 5 my job is to take her vision and turn it into an International Standard – this requires a particular kind of preparation of the text and committee work, alongside the various rounds of international voting which constitute the ISO JTC1 process.

I am also, in parallel, preparing an implementation of DTLL in Java, not least because I believe that there's nothing like having to code a spec to crystallise any gotchas. This will be released on SourceForge under an open source licence when I can be confident that the language is close to its final form, and that my code works :-)

The core idea of DTLL is the use of an expression language (regular expressions, say) to split text content up in such a way that it is represented as a tiny XML document. Various tests can then be performed on this XML to determine whether a value conforms to the rules for a particular datatype.

So to take a timely example, today's date in ISO 8601 format is

2006-05-28

and this might be decomposed into year, month and day parts with the following regex:


Note the use of named sub-expressions here (i.e. the '?[YYYY]', '?[MM]' and '?[DD]' at the beginning of the group-matching patterns) – these name the matched groups so that, if this expression were applied to our example date, we'd get:

YYYY - 2006
MM - 05
DD - 28
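
Python's `re` module supports the same idea under a different spelling, `(?P<name>...)`, so the decomposition can be approximated as below. This is a sketch under stated assumptions: the pattern (four-digit year, two-digit month and day, each '-' separator independently optional) is a hypothetical stand-in for the DTLL regex, and `decompose` is an illustrative name, not part of DTLL.

```python
import re

# Hypothetical Python analogue of the DTLL regex: (?P<name>...) plays the
# role of DTLL's '?[name]'. Assumed: 4-digit year, 2-digit month and day,
# with each '-' separator optional.
DATE_PATTERN = re.compile(r"(?P<YYYY>[0-9]{4})-?(?P<MM>[0-9]{2})-?(?P<DD>[0-9]{2})")

def decompose(text):
    """Return the named groups as a dict, or None if the whole text does not match."""
    match = DATE_PATTERN.fullmatch(text)
    return match.groupdict() if match else None

print(decompose("2006-05-28"))  # {'YYYY': '2006', 'MM': '05', 'DD': '28'}
```

`fullmatch` is used rather than `search` so that the whole value must fit the pattern, which is what a datatype check needs.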

A DTLL processor will represent these matched groups as an XML document whose document element is named after our datatype. So, if we'd called this datatype 'date', we'd get (with namespaces omitted for brevity):

<date><YYYY>2006</YYYY><MM>05</MM><DD>28</DD></date>
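
A rough Python sketch of that step, using the standard `xml.etree.ElementTree` module, might look like this. The helper name `to_tiny_document` and the Python-syntax pattern (with `(?P<name>...)` standing in for DTLL's `?[name]`) are assumptions for illustration, not DTLL's own API.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern in Python syntax; '-' separators assumed optional.
DATE_PATTERN = r"(?P<YYYY>[0-9]{4})-?(?P<MM>[0-9]{2})-?(?P<DD>[0-9]{2})"

def to_tiny_document(datatype_name, pattern, text):
    """Build <datatype_name> with one child element per named group (hypothetical helper)."""
    match = re.fullmatch(pattern, text)
    if match is None:
        return None  # no match: the value cannot belong to this datatype
    root = ET.Element(datatype_name)  # document element named after the datatype
    for name, value in match.groupdict().items():
        child = ET.SubElement(root, name)  # e.g. <YYYY>, <MM>, <DD>
        child.text = value
    return root

doc = to_tiny_document("date", DATE_PATTERN, "2006-05-28")
print(ET.tostring(doc, encoding="unicode"))
# <date><YYYY>2006</YYYY><MM>05</MM><DD>28</DD></date>
```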

If the content we're targeting doesn't match the regex, then we know it's not valid according to our rule. But if it does, we can get to work with XPath to validate the data further. So, to check that our day value is between 1 and 31, we can say

<condition test='/date/DD >= 1 and /date/DD &lt;= 31'/>
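
Python's standard library only supports a limited subset of XPath in ElementTree (path selection, not comparisons such as `>=`), so a rough equivalent of this condition fetches the text of the DD child and compares it numerically, much as XPath 1.0 converts a node to a number before comparing. The function name `day_in_range` is hypothetical.

```python
import xml.etree.ElementTree as ET

def day_in_range(doc):
    """Rough Python equivalent of the XPath test /date/DD >= 1 and /date/DD <= 31."""
    if doc.tag != "date":
        return False  # /date/... selects nothing if the root is not <date>
    day_text = doc.findtext("DD")  # ElementTree path relative to the root element
    if day_text is None:
        return False  # empty node-set: the XPath comparison would be false
    try:
        day = int(day_text)
    except ValueError:
        return False  # XPath would get NaN here, and NaN comparisons are false
    return 1 <= day <= 31

doc = ET.fromstring("<date><YYYY>2006</YYYY><MM>05</MM><DD>28</DD></date>")
print(day_in_range(doc))  # True
```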

And to check that the day is valid for the given month, taking leap years into account, we can say (here's a complete datatype definition):

<datatype name='date'>
  <condition test='/date/DD >= 1 and /date/DD &lt;= 31'/>
  <condition test='/date/MM >= 1 and /date/MM &lt;= 12'/>
  <condition test='(/date/MM = 1 or /date/MM = 3 or /date/MM = 5 or
    /date/MM = 7 or /date/MM = 8 or /date/MM = 10 or
    /date/MM = 12) or /date/DD &lt;= 30'/>
  <condition test="/date/MM != 2 or
    /date/DD &lt;= 28 or
    (/date/DD = 29 and
    (/date/YYYY mod 400 = 0 or
    (/date/YYYY mod 4 = 0 and
    not(/date/YYYY mod 100 = 0))))" />
</datatype>
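
For readers who want to experiment without a DTLL processor, here is a minimal Python transcription of the same four conditions. It is a sketch under assumptions: the pattern and the name `is_valid_date` are hypothetical, and the parse step is done with Python's `re` rather than DTLL's regex syntax.

```python
import re

# Hypothetical pattern: the backreference (?P=sep) means either both '-'
# separators are present or neither is.
DATE_PATTERN = re.compile(r"(?P<YYYY>[0-9]{4})(?P<sep>-?)(?P<MM>[0-9]{2})(?P=sep)(?P<DD>[0-9]{2})")

def is_valid_date(text):
    """Python transcription of the four DTLL conditions for the 'date' datatype."""
    match = DATE_PATTERN.fullmatch(text)
    if match is None:
        return False  # fails the parse step, so it is not a 'date' at all
    year, month, day = (int(match[name]) for name in ("YYYY", "MM", "DD"))
    leap = year % 400 == 0 or (year % 4 == 0 and year % 100 != 0)
    return (
        1 <= day <= 31                                          # condition 1
        and 1 <= month <= 12                                    # condition 2
        and (month in (1, 3, 5, 7, 8, 10, 12) or day <= 30)     # condition 3
        and (month != 2 or day <= 28 or (day == 29 and leap))   # condition 4
    )

for value in ("2006-05-28", "20060528", "2004-02-29", "1900-02-29", "2006-04-31"):
    print(value, is_valid_date(value))
```

The backreference is a design choice: it accepts "2006-05-28" and "20060528" but rejects a half-separated "2006-0528", which an independently optional '-' would let through.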

Et voilà: a test for ISO 8601 dates (which is, incidentally, more conformant than the test specified by W3C XML Schema since, unlike in XML Schema, the '-' separator between the parts of the date is optional here).

The language also has features for typed variables, which make definitions more concise and modular in practice, but I hope this gives a flavour.

Jeni is speaking on DTLL at the Extreme Markup 2006 Conference, by which time I'm hoping the language itself will have stabilised.

The latest official version of the DTLL document is always available from the DSDL homepage, but if you're interested in a more up-to-date status report, please feel free to contact me.
Debating Gender Difference 

Long discussion this morning with Sarah about gender difference, following the appointment by Cambridge University's Faculty of English of four young men to four vacant lecturing positions.

Of course these four young men might well have been the four best candidates for the posts, but with a field evenly split between the sexes, it's tempting to think that some kind of unconscious bias played a part.

Relatedly, a very interesting debate between Marc D. Hauser and Elizabeth Spelke contains the startling claim that academics, when presented with a vita, will rate it more highly if they are told that the candidate is male.

The wider question of the debate, on the relative abilities of men and women, was however left in the balance ...
XML:UK Are Calling for Participation 

XML:UK are running a “Member Presentation Day” on 27 June in Reading.

They're looking for short (15-20 minute) presentations on any XML-related topic and give as examples:

- a survey of the uses of XML within your organisation
- a current or recent project
- a burning issue that you feel needs more attention from the XML community.

This is a great chance for members to learn about what's going on. If it's anything like the last event, we can expect tales of triumph and woe in equal measure ...