2004-12-25

Attention, Attention.xml

Filed under:
@ 22:33

The delights of corporate culture are on show in a post from Scoble:

Steve Gillmor’s report on Attention.xml is included in Esther Dyson’s Release 1.0. Thanks to Mike Manuel for letting us know the report is now available for $80. I’ll have to check our corporate library and see if it’s available there (I believe it is).

Then of course there was an (invitation-only) FOO Camp presentation…
Sod all that. Here’s my €0.02 report.

Right, hitting the spec: I think the problem statement is a bit light, but there certainly is an interesting problem in this area around “with so many unread items, how do you know which to read first”. This is one aspect of the challenge facing anyone working with syndicated material - how to filter and focus the info overload down to what is really of interest to the individual end user. The notion of attention has a lot of potential here.

The sample application profiles provided help suggest where the information might come from (it would be helpful to push some of this info up into the spec intro, in an overview fashion). But Attention.xml basically provides a way of describing aspects of a user’s visits to a blog/feed/page/post/item/entry in a machine-readable fashion. This is information that could be extremely useful if captured, to both clients and servers of feeds. Doesn’t really matter where it comes from.

Next, I’d suggest that they have done a pretty good job of identifying the values that would be of use in the context of “attention”. However, I do believe there are systematic problems that leak between format and model. Basically the model is the format, and both are tied to a hierarchy. While trees may be useful for many things, they are a handicap on the Web, where the default structure is a directed graph.

Rather amusingly the spec’s FAQ answers a question about the MeNow work with “premature, and hugely overdesigned (like most RDF efforts)”. It’s hard to judge the “premature” part of this - it’s relatively new, who knows whether the world is ready for it… I believe the folks behind it have been simultaneously playing with implementation code, but I think it’s still pretty much untested. Overdesigned? Well, both MeNow and Attention.xml appear to have come from a brainstorming session where, had I been present, they would both now have an element “catPhoto”.

[PS. “Like most RDF efforts” - rather an unfair generalisation, although it certainly can go this way. Because you can add as little or as much information to a schema/vocabulary/ontology as you like, it can be difficult to know when it’s time to stop designing and get on and do some application implementation stuff with your creation. I’ve got the balance wrong more than once myself.]

The use of the XOXO format to me seems bizarre. Now I have great admiration for Tantek and co’s approach in trying to remove the ambiguity of OPML, and tighten up the semantics in the process. But just stand back for a moment. Assuming you’ve decided you’re going to use a hierarchical model, and design your own XML format for it, why not build one that fits the task more closely?

Look what happened elsewhere in the context of “Simple” RSS. The use of OPML for things like blogrolls appears to have been in part due to Dave Winer’s almost religious obsession with Outliner apps, part due to what was easy for Userland to code in their products, part due to Dave’s promotion and the lack of any obvious alternative. As a channel list format I reckon OPML leaves a lot to be desired, but it mostly works. It works not because it’s a good format for the job, but because the environment in which it is deployed is extremely flexible. On a technical level, protocols and formats like HTTP and HTML/XML allow a huge berth within which functional components can operate, and to an often surprising extent interoperate, despite loose languages.

No doubt, XOXO is a nice way of expressing hierarchical documents. Not long before XOXO appeared, I’d personally reached a very similar conclusion on how best to do outliner markup. XHTML brings all the benefits of XML, convenient renderers (browsers), the marvels of CSS. The thing is though, what’s being expressed by Attention.xml isn’t a document aimed at direct human consumption, but a bunch of structured data destined for machine processing. Why complicate matters by wrapping it in a document format? I’d say it falls between two stools - it won’t be all that interesting/useful as a document in a browser no matter how prettily styled; it won’t be all that good as input for program control or reasoning as the language semantics aren’t well-defined. But like many other formats, it almost certainly is good enough on both counts.

The blogroll/channel list format isn’t the only place OPML has been applied, it’s had a lot of things thrown at it. The spec is a very thin layer on top of XML, and because of this can support a very wide range of applications. That doesn’t mean it’s necessary, or even a good idea for these purposes. But I’m afraid XOXO may be suffering from OPML envy.

It wouldn’t really be fair to criticise Attention.xml without offering a suggestion of how it might be better.

Ok, from the Attention.xml spec, here’s an example (tweaked a little to make the URIs clearer) of the per-feed part, first in OPML:

<outline text="atitle"
         description="descstr"
         url="http://aurl"
         atomurl="http://anatomurl"
         xmlurl="http://anrssurl"
         type="typestr" ...>

and in Attention.XML XOXO:
<li><a href="http://url" type="typestr">atitle</a>
    <dl><dt>description</dt><dd>descstr</dd>
        <dt>alturls</dt>
        <dd><ul><li><a href="http://anatomurl">atomurl</a></li>
                <li><a href="http://anrssurl">xmlurl</a></li></ul></dd>
    </dl>
</li>

Now looking at the XOXO version, what are all those HTML elements for? The same data is expressed more compactly and more clearly in the OPML version. The XOXO version does have the advantage of relatively clear semantics from HTML - <a href="…"> is pretty well known, but there’s an awful lot of baggage. OPML is designed for outline documents, and that brings with it all kinds of bizarreness from the OPML spec - a required <head> and <body>, optional stuff like <windowBottom>, <vertScrollState>. Is this going to be useful for an Attention data format? I think not. Also, why is the element called <outline>?
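To make the comparison a bit more concrete, here is a minimal Python sketch (standard library only) that pulls the same fields out of each sample. It is an illustration rather than anything from the spec: the function names and the dictionary layout are made up, the OPML sample has been closed off and its "..." dropped so that it parses, and the XOXO reader assumes <dt>/<dd> strictly alternate as they do in the sample.

```python
import xml.etree.ElementTree as ET

# The per-feed samples from above. The OPML element is self-closed here and
# the trailing "..." is dropped (an assumption, purely so it parses).
OPML = '''<outline text="atitle" description="descstr" url="http://aurl"
         atomurl="http://anatomurl" xmlurl="http://anrssurl" type="typestr"/>'''

XOXO = '''<li><a href="http://url" type="typestr">atitle</a>
    <dl><dt>description</dt><dd>descstr</dd>
        <dt>alturls</dt>
        <dd><ul><li><a href="http://anatomurl">atomurl</a></li>
                <li><a href="http://anrssurl">xmlurl</a></li></ul></dd>
    </dl>
</li>'''


def feed_from_opml(xml_text):
    """Every field is an attribute on the single <outline> element."""
    o = ET.fromstring(xml_text)
    return {
        "title": o.get("text"),
        "description": o.get("description"),
        "url": o.get("url"),
        "type": o.get("type"),
        "alturls": {k: o.get(k) for k in ("atomurl", "xmlurl") if o.get(k)},
    }


def feed_from_xoxo(xml_text):
    """The same fields, dug out of the nested <a>/<dl>/<ul> structure."""
    li = ET.fromstring(xml_text)
    a = li.find("a")
    feed = {"title": a.text, "description": None, "url": a.get("href"),
            "type": a.get("type"), "alturls": {}}
    children = list(li.find("dl"))
    # Pair each <dt> with the <dd> that follows it (assumes strict alternation).
    for dt, dd in zip(children[0::2], children[1::2]):
        if dt.text == "description":
            feed["description"] = dd.text
        elif dt.text == "alturls":
            for link in dd.iter("a"):
                feed["alturls"][link.text] = link.get("href")
    return feed


print(feed_from_opml(OPML))
print(feed_from_xoxo(XOXO))
```

On the samples as given, the two results agree on everything except the main URL (http://aurl in the OPML sample, http://url in the XOXO one). The XOXO reader needs a tree walk and a naming convention for the <dt> text to get there, which is the baggage in question.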

So first pass, keeping the OPML structure but renaming the element to what it represents:

<feed text="atitle"
      description="descstr"
      url="http://aurl"
      atomurl="http://anatomurl"
      xmlurl="http://anrssurl" ...>

Now one thing the XOXO version does bring with it is a kind of separation of concerns, it restructures the syntax to reflect the intended domain semantics. But why all the HTML? Why not something like:
<feed url="http://aurl">
     <title>atitle</title>
     <alturls>
         <atomurl>http://anatomurl</atomurl>
         <xmlurl>http://anrssurl</xmlurl>
     </alturls>
</feed>

Now that’s the same data, with the structure reflecting semantics but without the cruft. The use of <atomurl> is a bit iffy (though the original OPML did use an attribute name); loosening the coupling there, as in the XOXO version, is probably a good idea. So how about:
<feed url="http://aurl">
     <title>atitle</title>
     <alturls>
         <url href="http://anatomurl">atomurl</url>
         <url href="http://anrssurl">xmlurl</url>
     </alturls>
</feed>

Ok, that looks pretty clear. A nice simple format. But this data is designed to describe resources on the Web so it might make sense to use the existing resource description framework (you really weren’t expecting that, were you?). A little tidying first: the grouping of the alternate formats is also redundant once you introduce meaningful element names. It’s always worth considering reusing existing vocabularies, and RSS is nearby, so here you go:
<channel rdf:about="http://aurl"
      xmlns="http://purl.org/rss/1.0/"
      xmlns:atn="http://purl.org/stuff/attentionRDF"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
     <title>atitle</title>
     <link rdf:resource="http://anatomurl" atn:format="atomurl" />
     <link rdf:resource="http://anrssurl" atn:format="xmlurl" />
</channel>

Now if you compare that with the OPML version above, there is some extra complexity here - to the eye at least, comparable to the XOXO version. XOXO exploits XHTML to highlight the document structure and hence, indirectly, the semantics. The RDF/XML version expresses the information in an existing well-defined data model, reusing existing vocabularies with global disambiguation. Here’s a visual representation direct from the W3C’s RDF Validator (you need to check “RDF is NOT enclosed in <RDF>…</RDF> tags”). This is still a hierarchical structure, but now if you need to drop out of the tree, put in a bidirectional link or whatever, you can without any redefinition. Should you wish to add material from other vocabularies you can do this in a semantically richer fashion than the HTML rel attribute allows. Why settle for lower-case semantics when for no extra cost you can put your data on the (upper-case) Semantic Web?
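For anyone who would rather see the statements than a picture, here is a rough Python sketch that lists the triples carried by the RDF/XML above. It is not a real RDF parser: it only understands the handful of constructs this one example uses (a typed node with rdf:about, literal-valued property elements, and property elements with rdf:resource plus extra attributes), and the helper names are invented for illustration. Anything serious should go through a proper RDF toolkit.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = '''<channel rdf:about="http://aurl"
      xmlns="http://purl.org/rss/1.0/"
      xmlns:atn="http://purl.org/stuff/attentionRDF"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
     <title>atitle</title>
     <link rdf:resource="http://anatomurl" atn:format="atomurl" />
     <link rdf:resource="http://anrssurl" atn:format="xmlurl" />
</channel>'''


def uri(qname):
    """ElementTree writes names as '{namespace}local'; RDF just joins the two."""
    ns, local = qname[1:].split("}", 1)
    return ns + local


def triples(xml_text):
    """Return N-Triples-style lines for the narrow RDF/XML subset used here."""
    node = ET.fromstring(xml_text)
    subject = node.get("{" + RDF + "}about")
    # The element name of the node gives its rdf:type.
    out = [f"<{subject}> <{RDF}type> <{uri(node.tag)}> ."]
    for prop in node:
        predicate = uri(prop.tag)
        target = prop.get("{" + RDF + "}resource")
        if target is None:
            # Plain text content is a literal value (no escaping attempted).
            out.append(f'<{subject}> <{predicate}> "{prop.text}" .')
            continue
        out.append(f"<{subject}> <{predicate}> <{target}> .")
        # Other attributes on the property element describe the target resource.
        for name, value in prop.attrib.items():
            if not name.startswith("{" + RDF + "}"):
                out.append(f'<{target}> <{uri(name)}> "{value}" .')
    return out


for line in triples(DOC):
    print(line)
```

That yields six statements: one type, one title, two links, and one format for each linked URL. Note that the formats hang off the linked resources rather than the channel, and that because the atn namespace as written has no trailing "#" or "/", the property URI comes out as http://purl.org/stuff/attentionRDFformat.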

In conclusion then I’d say that the Technoratists are on to something with Attention.xml (and it’s really nice to see the bits they’ve implemented/made public so far). I think they’ve probably identified most if not all of the pieces of interest. I believe they’ve been wise avoiding the anti-interop that is OPML, and their use of XHTML outlines in XOXO is ingenious. But in the process of putting the data in XOXO they’ve lost the benefits of a custom XML language without gaining many of the advantages of a general-purpose data language (this deficit can be overcome, either at producer or consumer simply by using a mapping as in GRDDL). I doubt whether this will be an issue in practice, as the format is capable of expressing what is required. The adoption of other formats in the syndication space has had a lot more to do with marketing, blog-hype and perceived lack of alternatives than any technical merits.

Having said all that, it would be nice to have a general mapping from XOXO to RDF. So if anyone now feels that they no longer need Steve Gillmor’s $80 report, sling the dollars my way and I’ll sort out some XSLT ;-)
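In the meantime, here is a very rough idea of what such a mapping might look like, in Python rather than XSLT and covering only the per-feed fragment shown earlier. Everything about it is an assumption made for illustration: the function name, the XPath used to find the alternate URLs, and the choice to emit the RSS 1.0 flavoured RDF/XML from above. Like that RDF/XML, it drops the description and type.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

XOXO = '''<li><a href="http://url" type="typestr">atitle</a>
    <dl><dt>description</dt><dd>descstr</dd>
        <dt>alturls</dt>
        <dd><ul><li><a href="http://anatomurl">atomurl</a></li>
                <li><a href="http://anrssurl">xmlurl</a></li></ul></dd>
    </dl>
</li>'''


def xoxo_to_rdfxml(xoxo_text):
    """Map one XOXO per-feed <li> onto an RSS 1.0 style <channel> (a sketch)."""
    li = ET.fromstring(xoxo_text)
    head = li.find("a")  # the feed's own link and title
    lines = [
        f'<channel rdf:about={quoteattr(head.get("href"))}',
        '      xmlns="http://purl.org/rss/1.0/"',
        '      xmlns:atn="http://purl.org/stuff/attentionRDF"',
        '      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">',
        f'     <title>{escape(head.text)}</title>',
    ]
    # Assumes the only nested list under <dl> is the one holding the alturls.
    for link in li.findall("dl/dd/ul/li/a"):
        lines.append(
            f'     <link rdf:resource={quoteattr(link.get("href"))}'
            f' atn:format={quoteattr(link.text)} />'
        )
    lines.append('</channel>')
    return "\n".join(lines)


print(xoxo_to_rdfxml(XOXO))
```

A general mapping would need to cope with arbitrary nesting and with whatever <dt> names turn up, which is exactly where the loosely defined semantics start to bite.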

PS. Dare Obasanjo’s report highlights the question of the problem statement, and how much Attention.xml can help (along with some other good points).

10 Comments

  1. Great to get your perspective on this.

    In general, I think Attention.xml reflects a side effect of XML hype, which is the assumption that this application should be developed out of XML (or even: it makes sense because it’s developed out of XML), which leads to, as you say, a “leak between format and model”. The development effort gets redirected into making the thing interesting XML, rather than exploring the thing itself.

    To me, the antidote to this is to simultaneously develop/test the model in multiple formats. “Attention” would benefit from being conceived of as RDBMS, RDF, OPML, and XOXO. (But, this approach makes development go a lot slower, for me at least.)

    Comment by Jay Fienberg — 2004-12-25 @ 23:15

  2. Yep, good point. I’m a bit conflicted because I really like XML ;-)

    I think it was when RSS 2.0 was being rushed out against a lot of resistance from folks wanting to unite 0.9x and 1.0, Aaron Swartz came up with the plain text RSS 3.0 - his intro really cut through a lot of hype & doubletalk.

    Comment by Danny — 2004-12-26 @ 00:13

  3. The XOXO version does have the advantage of relatively clear semantics from HTML…

    …but abuse of <dt>/<dd> for things which are not definitions of terms breaks all but the most superficial of HTML semantics.

    E.g. somebody does a view source on one of these files, sees “alturls”, doesn’t have the faintest clue what that’s about and does a Google “define: alturls” search getting back reams of references to Attention.xml documents containing text which is not a definition of alturls.

    Comment by Ed Davies — 2004-12-26 @ 11:17

  4. I like Aaron’s intro to RSS 3.0, but my issues with XML are really different than his. I’m a long-time fan of SGML, and in the ways that XML is a simpler/cleaner SGML, of XML.

    For mixed-markup, I love SGML/XML. And, otherwise, for “databases” (i.e., data storage / field delineation formats), I think there are a few things one can do with SGML/XML that are important and missing from RDBMS.

    But, I think taking a concept of information structure and translating it directly into an XML data schema tends to result in a design more expressive of XML’s bells and whistles than of the actual concept of information structure. And, there is also the “thinking in XML” syndrome, where the concept is only imagined in terms of XML.

    (All this, of course, also reflects my bias towards relational data structures, rather than hierarchical ones.)

    In cases like Attention.xml or the RSS Media namespace, I think folks are first imagining very specific applications where a tree structure seems to make a lot of sense (in terms of application processing, especially using OO techniques), and then, as a second thought, trying to make that application match a broader conceptual model (e.g., that people can buy into even outside of the specific application). And, that’s certainly one way to get things done!

    Comment by Jay Fienberg — 2004-12-26 @ 20:47

  5. […] 004
    Some Opinions on the Attention.xml Specification
    Thanks to Danny Ayers post entitled Attention, Attention.xml I finally found a link to the

    Pingback by Dare Obasanjo aka Carnage4Life - Some Opinions on the Attention.xml Specification — 2004-12-27 @ 18:38

  6. Wow, thanks for a pointer to the attention.xml spec. This looks like SIAM++ (http://www.25hoursaday.com/draft-obasanjo-siam-01.html). I fail to see how an XML serialization of the internal state of an aggregator solves the issues raised in the problem statement.

    You do a decent job of criticizing the syntax, I actually would go one further and ask exactly how the information in an attention.xml document is supposed to solve the problems in the problem statement.

    Comment by Dare Obasanjo — 2004-12-27 @ 18:39

  7. Dare - yep, it’s very like your SIAM spec, in fact I think they cite it somewhere. I agree entirely about the problem statement & solution description leaving a lot to be desired, too much is left to the imagination. I guess if they got that sorted out, it would be easier to see how well the format fulfils what’s required of it.

    If you get to see Gillmor’s report in the corporation library, please blog it ;-)

    Comment by Danny — 2004-12-27 @ 19:44

  8. Danny, good comments on the syntax. I also think the idea behind the spec is very good. But syntax aside, I think they provide too few means of rating and classifying content. For example, at channel level I just see ‘xfn’ and ‘votelink’ attributes. And there is no way at all to classify/rate groups and feeds locally. For instance, I would like to see items from feed A before items from feed B (when using a view mixing both).

    Comment by Vadim Zaliva — 2004-12-27 @ 19:48

  9. PS. Dare - I just read your post trackbacked above, yep, spot on.

    Comment by Danny — 2004-12-27 @ 19:49

  10. Vadim - right. Not an easy problem, trying to come up with a good way of expressing such information in a generally-usable fashion. As you suggest, I don’t think ‘xfn’ and ‘votelink’ do much to help ;-)

    Comment by Danny — 2004-12-27 @ 20:07
