Content Syndication with XML and RSS


Yahoo Financial News feeds

Jeremy Zawodny announces:

"Got a stock ticker for which you'd like to have an RSS news feed? Help test the beta RSS feeds we've put up on Yahoo Finance. Take your favorite ticker, say YHOO, and put this URL in your news aggregator:

http://rss.finance.yahoo.com/rss/get?ticker=YHOO

The pattern is pretty simple.

This hasn't been tested much yet, so I'd appreciate some feedback. There will likely be some outages as I tweak and adjust things. Also, if it works (or not) for you, it'd be good to know which aggregator you use. So far I know that NetNewsWire and AmphetaDesk like it."
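A minimal sketch of that URL pattern in Python. The base URL is copied from the announcement above; the helper name is hypothetical, and upper-casing the ticker is my own assumption rather than anything Jeremy says the service requires.

```python
from urllib.parse import urlencode

# Base URL taken from the quoted announcement; it was a beta endpoint and may be gone.
YAHOO_FEED_BASE = "http://rss.finance.yahoo.com/rss/get"


def ticker_feed_url(ticker: str) -> str:
    """Build the news-feed URL for one stock ticker (hypothetical helper)."""
    # Upper-casing is an assumption, made only so "yhoo" and "YHOO" give one URL.
    query = urlencode({"ticker": ticker.strip().upper()})
    return YAHOO_FEED_BASE + "?" + query


if __name__ == "__main__":
    print(ticker_feed_url("yhoo"))
```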


FOAF to vCARD returns!

The FOAF to vCard widget is vaguely working, but with a few problems. Firstly, I made the mistake of using XML::Simple - which means, afaik, you have to define each field by a definite path from the root node. In other words, you have lines like:
if ($parsed_foaf->{'foaf:Person'}->{'foaf:mbox'}) {print VCARD "EMAIL;INTERNET:$parsed_foaf->{'foaf:Person'}->{'foaf:mbox'}\n";};

Which will not work if you have any other attributes in there (say <foaf:Person id="">), or if you have more than one foaf:mbox entry. So, today, I'm working on using a pure RDF parser to get the good stuff out. Weapon of choice: RDFStore.
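For comparison, here is a rough Python sketch of the same extraction that copes with attributes on foaf:Person, repeated foaf:mbox entries, and any prefix. It is not the widget's code and it is not a real RDF parser either: it just walks the XML tree by namespace URI, assuming the mailbox is given as an rdf:resource attribute (falling back to element text). The sample document and function name are made up for illustration.

```python
import xml.etree.ElementTree as ET

FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Made-up sample: note the "f:" prefix, the attribute on Person, and two mboxes.
SAMPLE_FOAF = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:f="http://xmlns.com/foaf/0.1/">
  <f:Person rdf:ID="me">
    <f:name>Example Person</f:name>
    <f:mbox rdf:resource="mailto:one@example.org"/>
    <f:mbox rdf:resource="mailto:two@example.org"/>
  </f:Person>
</rdf:RDF>"""


def vcard_email_lines(foaf_xml: str) -> list[str]:
    """Return one vCard EMAIL line per foaf:mbox found anywhere in the file."""
    root = ET.fromstring(foaf_xml)
    lines = []
    # ElementTree names elements as "{namespace-uri}localname", so the
    # prefix chosen by the author of the file never matters here.
    for mbox in root.iter(f"{{{FOAF}}}mbox"):
        address = (mbox.get(f"{{{RDF}}}resource") or mbox.text or "").strip()
        if address.startswith("mailto:"):
            address = address[len("mailto:"):]
        if address:
            lines.append(f"EMAIL;INTERNET:{address}")
    return lines


if __name__ == "__main__":
    print("\n".join(vcard_email_lines(SAMPLE_FOAF)))
```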

L.M.Orchard's Aggregator Wishlist

Main: AmphetaOutlinesWishList is interesting reading, and being a Wiki, good to add to too.

Bring on the Porn

As I mention over on the other blog, I'm interviewing some pornographers tomorrow. It occurs to me at this late hour that if ever there were an industry with the need for RSS and RDF, it's the online-porn one. Does anyone know if this is already going on?

FOAF to vCard

It's so alpha as to be almost pitiful, but this widget may well give you a vCard file from a FOAF file. You can then import the vCard into Outlook, OSX AddressBook and any other vCard compatible app. I'm putting this up now so I can go and get my tea. Tell me what breaks, ok?






Don't have a FOAF file? Leigh Dodds' FOAF-a-Matic is your baby.

UPDATE: Well, first feedback is that it's not namespace aware, so if you're using anything but foaf: as your prefix, I'm buggered. Working on it.

More Site Upgrades

I'm adding a lot of new metadata to this site's RSS1.0 feed. There's <content:encoded>, as per Ziv Caspi's request, and a load of new FOAF and DC stuff. I'm adding as I go this afternoon, so bear with me.

Semantic Linking

In the comments section of this posting yesterday, there is much talk of linking to other RDF documents.

"I would love to know..

b) how to link to my FOAF file from my RSS feed. mod_link seems like the right direction, but I can't make heads or tails of it."

(says Mark Pilgrim)

For (b), the thing that occurs to me immediately is (using my feed as an example):


...

(says DJ Adams)

This strikes me as fundamentally very important to RSS. If we can start talking about linking to, and using, other RDF vocabularies, we suddenly start to be able to make all sorts of interesting applications. For example, if my RSS 1.0 feed contains <dc:creator rdf:resource="http://www.benhammersley.com/benfoaf.rdf" />, a semantically aware application could follow that link to my FOAF file, and find this section:

<foaf:knows>
<foaf:Person>
<foaf:name>Mark Pilgrim</foaf:name>
<foaf:title>Mr</foaf:title>
<foaf:firstName>Mark</foaf:firstName> <foaf:surname>Pilgrim</foaf:surname>
<rdfs:seeAlso rdf:resource="http://diveintomark.org/public/foaf.rdf"/>
</foaf:Person>
</foaf:knows>

and then poddle off to read Mark's foaf file. Within which there is a line that says:

<foaf:homepage rdf:resource="http://diveintomark.org/"/>

Whereupon, retrieving that resource, we find:

<link rel="alternate" type="application/rss+xml" title="RSS" href="http://diveintomark.org/xml/rss.xml" />

Which tells us where the RSS representation of Mark's homepage is. So, just by following one strand of links we can start to make all sorts of interesting statements. We know all about the content on my site, we know all about me, we know I know Mark, we know all about Mark, and we know what Mark is talking about. There are many interesting applications to be made from this, not least from just tracking digital social networks and the topics they are thinking about.
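The walk above can be sketched in a few lines of Python. To keep it self-contained nothing is fetched over the network: the four documents are canned strings built from the snippets in this post, wrapped in an rdf:RDF root that I have added so they parse. The function and class names are hypothetical, and a real crawler would retrieve each URL before parsing it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RESOURCE = "{%s}resource" % NS["rdf"]


def wrap(fragment):
    """Wrap a snippet in an rdf:RDF root that declares every prefix used."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in NS.items())
    return f"<rdf:RDF {decls}>{fragment}</rdf:RDF>"


# Canned stand-ins for the documents a crawler would actually download.
RSS_ITEM = wrap('<dc:creator rdf:resource="http://www.benhammersley.com/benfoaf.rdf" />')
BEN_FOAF = wrap(
    "<foaf:Person><foaf:knows><foaf:Person>"
    "<foaf:name>Mark Pilgrim</foaf:name>"
    '<rdfs:seeAlso rdf:resource="http://diveintomark.org/public/foaf.rdf"/>'
    "</foaf:Person></foaf:knows></foaf:Person>"
)
MARK_FOAF = wrap('<foaf:Person><foaf:homepage rdf:resource="http://diveintomark.org/"/></foaf:Person>')
MARK_HOME = (
    '<html><head><link rel="alternate" type="application/rss+xml" '
    'title="RSS" href="http://diveintomark.org/xml/rss.xml" /></head></html>'
)


def resources(xml_text, path):
    """Return the rdf:resource values of every element matching path."""
    root = ET.fromstring(xml_text)
    return [el.get(RESOURCE) for el in root.findall(path, NS) if el.get(RESOURCE)]


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="application/rss+xml">."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if (tag == "link" and attributes.get("rel") == "alternate"
                and attributes.get("type") == "application/rss+xml"):
            self.feeds.append(attributes.get("href"))


def follow_chain():
    """Feed -> my FOAF -> Mark's FOAF -> Mark's homepage -> Mark's feed."""
    trail = []
    trail += resources(RSS_ITEM, ".//dc:creator")
    trail += resources(BEN_FOAF, ".//foaf:knows/foaf:Person/rdfs:seeAlso")
    trail += resources(MARK_FOAF, ".//foaf:homepage")
    finder = FeedLinkFinder()
    finder.feed(MARK_HOME)
    trail += finder.feeds
    return trail


if __name__ == "__main__":
    for url in follow_chain():
        print(url)
```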

There's more thinking to be done here, but it does all start to make some sense, no?

The Vision Thing

Michael Sippey is asking an open question:

"While Dave Winer and his merry band of monsters battle over the future of "Really Simple Syndication," I'm left wondering what the hubbub is all about. So, RSS can be used to syndicate and aggregate news items. Great. It's done that fairly reliably since version 0.91. But what's next? If developers are expected to care enough about RSS to build compelling applications that either consume or produce it, then I would argue there need to be at least a few use cases that go beyond "read news story....In short, what's the vision?"

It's a good conversation already. Take a look.

DJ's thoughts

DJ Adams is up late too...he's quoting a very good point from Ken Macleod about working with namespaces, which bears repeating here:

"A very key point (I think) drawn out in this article is that namespaces are used only to derive a (URI+localname) pair -- namespaces should never be considered separate from the element name they specify. ... A namespace and localname make a single item of data, distinct from any other combination of namespace and localname.

Libraries and applications (tools) should not try to store a namespace as one "object" and try to link all of the names as "children" of those objects. So, if you're working in a language that's string-happy, like Tcl or Perl, the first thing you should do is take the namespace and element name and put them together and use them like that from then on, "{URI}LocalName" works well in Perl, for example."

This approach, of course, would prevent all the problems other methods run into if someone (entirely legally) chooses a different namespace prefix for an RSS 1.0 module. If your parser goes by dc: and dc: only for Dublin Core, it will break if I use xmlns:dublin="URI OF DUBLIN CORE" in my feed. It shouldn't.
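As it happens, Python's ElementTree already names elements in exactly that "{URI}LocalName" form, which makes for a quick demonstration. The two one-line feeds below are made up; the only thing that differs between them is the prefix.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Same data, two different (and equally legal) prefixes for Dublin Core.
FEED_A = f'<item xmlns:dc="{DC}"><dc:creator>Ben</dc:creator></item>'
FEED_B = f'<item xmlns:dublin="{DC}"><dublin:creator>Ben</dublin:creator></item>'


def expanded_names(xml_text):
    """Return each child element's name as a single "{URI}local" string."""
    return [child.tag for child in ET.fromstring(xml_text)]


if __name__ == "__main__":
    print(expanded_names(FEED_A))
    print(expanded_names(FEED_B))
```

Code that matches on the expanded name never sees the prefix at all, so the xmlns:dublin feed is handled the same way as the xmlns:dc one.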

Friend of a Friend

Mark Pilgrim is discovering FOAF - the "Friend of a Friend" RDF vocabulary, and with it pictures of me at my wedding.

For those of you who have never seen it, FOAF is especially good at describing digital social networks. How people know each other can be very interesting data. Also interesting is FOAF's use of the <rdfs:seeAlso> element, which allows FOAF files to lead onto others. Here's my FOAF file for an example.

Rhetorical question: I wonder if this element could be used within RSS?

Trillian Pro gets RSS

Being a cool kid with an OS X fetish, I'd lost track of the marvellous Windows-only IM client Trillian. Well, it turns out they've launched a Pro version, with a plug-in RSS reader. Super.

Mark Pilgrim surveys aggregators

Numbers just in! According to Mark Pilgrim's site statistics, NetNewsWire is impressively in front in the aggregator popularity stakes:

"NetNewsWire is the most popular aggregator among my readers, with 45%. Radio UserLand is a close second, with 38%. Amphetadesk, Aggie, and Straw are a distant third, fourth, and fifth. These results have been consistent over the past week, when I first noticed that NetNewsWire had taken over the #1 spot. This is particularly amazing, given that NetNewsWire only runs on Mac OS X. Way to go, Brent."

Excuse our dust

Excuse me today, people. I'm fiddling with things, and it all might break...
Update: All done. Much prettier. More accessible. More compliant. Mmm

NetNewsWire Lite 1.0

My weapon of choice this week, NetNewsWire Lite has gone version 1.0. Brent Simmons, the author, is committed to keeping the Lite version free, and has badass plans for a pay Pro version.

RSS 2.0, and a possible namechange

Well, this has been an exciting week.

Dave Winer has declared his RSS 2.0 to be complete. It makes various changes to the RSS 0.92 specification, and I'm still going through them for inclusion in my book. One interesting thing to note is the copyright message at the bottom of the spec:

Copyright 1997-2002 UserLand Software. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and these paragraphs are included on all such copies and derivative works.

This document may not be modified in any way, such as by removing the copyright notice or references to UserLand or other organizations. Further, while these copyright restrictions apply to the written RSS specification, no claim of ownership is made by UserLand to the format it describes. Any party may, for commercial or non-commercial purposes, implement this format without royalty or license fee to UserLand. The limited permissions granted herein are perpetual and will not be revoked by UserLand or its successors or assigns.

Which means, as far as I can read it, that tutorial documents based on the specification must credit Userland with the copyright ownership of the original RSS 2.0 specification document, but Userland do not claim ownership of the format itself. This strikes me as a bit odd: I cannot write about the RSS 2.0 specification without deriving something from the specification document (I fail to see how one could), and hence having to credit Userland, yet Userland do not claim ownership of the format. Curious.
UPDATE: This might not be an issue at all. Read the comments for more.

Meanwhile, the RSS-Dev group is debating changing the name of RSS 1.0, and building it into a robust and completely open specification. As Joseph Reagle said:

"I *think* I would like to see a W3C[*] Recommendation (and interop service, guidelines/tools, etc.), of unspecified name, that satisfies the requirement of a very tightly specified XML language and RDF schema for syndication that
(1) has a RDF data model/schema,
(2) easily comprehended by those not familiar with RDF,
(3) and has some bridge between its syntax and the data model (e.g.,
a RDF parsable but simpler syntax as proposed on this list (a really short bridge!), XML syntax with RDF schema annotations, or some translation...)

In my own mind, I call this "SYN" to represent the state from which this version would arise, but then prosper. That's just my current thought though, as I continue to follow the discussions with some interest and some chagrin."

May we all live in interesting times.

blink blink

Mark Pilgrim to make us all <blink>?
nice idea, actually.

Commenting Issues *FIXED*

I'm having issues with the commenting feature, and without my coffee this morning it is frankly not going to get done. Bear with me.

In the mean time, I've had various comments emailed to me. Click on the MORE link below to read the wisdom...

UPDATE: I've hacked a fix to the commenting issues. It's only a duct-tape-fix, as I'm going to be doing something a bit nicer with this blog's layout as soon as the spare time presents itself.

CLICK HERE TO READ MORE...

Precedence Order

Another question for everyone, I guess. In both RSS 1.0 and the Userland proposal for RSS 2.0, you can have duplicate semantics. For example:


<title>TITLE</title>
<dc:title>ANOTHER TITLE</dc:title>

Which of these takes precedence? This is mostly an issue with RSS 2.0, as there is a lot of overlap between the core elements and available modules. (and RSS 1.0 has only one possible overlap, the title/dc:title one.) This is one reason why I'd like to see the core elements become fewer in number.
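Until someone answers that, a consumer has to pick a policy and stick to it. Here is a sketch in Python that makes the order explicit and configurable. The default (core element first, Dublin Core as fallback) is purely my assumption, not something either spec says, and the sample item uses un-namespaced core elements in the 0.9x/2.0 style; an RSS 1.0 item would have its title in the RSS 1.0 namespace instead.

```python
import xml.etree.ElementTree as ET

DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"

ITEM = """<item xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>TITLE</title>
  <dc:title>ANOTHER TITLE</dc:title>
</item>"""


def pick_title(item_xml, order=("title", DC_TITLE)):
    """Return the first non-empty title, trying element names in the given order."""
    item = ET.fromstring(item_xml)
    for tag in order:
        text = item.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None


if __name__ == "__main__":
    print(pick_title(ITEM))
    print(pick_title(ITEM, order=(DC_TITLE, "title")))
```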

Shelley's 2.0 proposal, redux

In an utterly blatant attempt at editorialising, I'd like to point out Shelley Powers's idea once again. It's valid RDF, built around an RSS 0.9x core, with namespaces. It's simple, elegant, and seems to have been passed over.


<?xml version="1.0"?>

<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns="http://purl.org/rss/1.0/">

<channel rdf:about="">

<title></title>
<link></link>
<description></description>

<item>
<rdf:Description rdf:about="">
<link></link>
<title></title>
<description></description>
</rdf:Description>
</item>

</channel>

</rdf:RDF>

One extra line of simple markup per item, and we have complete RDF. Not much of a tax on the user, and a great potential gain over 0.9x.

I'd love to see it combined with Mark Pilgrim's ideas regarding simplifying the core spec, and the use of pre-existing namespaces. viz:

"Full HTML posts are included with mod_content. Dates (including item-level dates) are included with Dublin Core. Contact information is included with mod_admin. Information about how often aggregators should poll is included with mod_syndication."

That way we really can freeze the core spec and get on with things. What say you all?

Offnews

Offnews - Kevin Burton's new creation.

Bloglet RSS

Bloglet now supports RSS. The original Bloglet service allowed site owners to offer a service to their readers to give postings-by-email. The new RSS-powered system makes it automatic, with no site-owner intervention needed.

Note, the site says: "You must support item level publication dates. This is the pubDate tag for v0.92 and dc:date for v1.0. (Someday I hope to implement something that doesn't require this, because like 90% of RSS feeds don't have an item-level date. Doh!)"
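Reading an item-level date from either flavour might look something like the Python sketch below. This is a guess at the general shape, not Bloglet's code: it assumes pubDate holds an RFC 822 style date and dc:date holds an ISO 8601 (W3CDTF) timestamp with a numeric offset, and both sample items are made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime

DC_DATE = "{http://purl.org/dc/elements/1.1/}date"

RSS_092_ITEM = "<item><title>Post</title><pubDate>Sat, 07 Sep 2002 09:42:31 GMT</pubDate></item>"
RSS_10_ITEM = (
    '<item xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<title>Post</title><dc:date>2002-09-07T09:42:31+00:00</dc:date></item>"
)


def item_date(item):
    """Return the item's publication date as a datetime, or None if it has none."""
    pub_date = item.findtext("pubDate")
    if pub_date:
        return parsedate_to_datetime(pub_date.strip())
    dc_date = item.findtext(DC_DATE)
    if dc_date:
        # fromisoformat handles the full "+00:00" form; other W3CDTF
        # variants (date only, trailing "Z") would need extra handling.
        return datetime.fromisoformat(dc_date.strip())
    return None


if __name__ == "__main__":
    print(item_date(ET.fromstring(RSS_092_ITEM)))
    print(item_date(ET.fromstring(RSS_10_ITEM)))
```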

XUL bindings tutorial

It's been a busy weekend, and I've not really got time to explain it further, but if you're thinking about RDF, or thinking about developing newsreaders for RSS, then have a look at this XUL bindings tutorial. (ta to DanBri for the link)

Building an RSS newsreader in Python

A new tutorial: The EffNews Project: Building an RSS Newsreader, in Python

RDFTP

Just posted to the RDF-Interest Group:

"Rdftp is an RDF server that supports Query and Update operations on RDF content. The purpose of rdftp is to be an experimental implementation of an RDF Transmission Protocol. The rdftp server stores RDF content both in a relational database as RDF triples and in a hierarchical repository of RDF graphs/models, residing in the server’s filesystem as RDF files.

The rdftp server is implemented as a PHP script and runs on top of any HTTP server that supports PHP. Both Query and Update operations can be called using
either HTTP GET or HTTP POST. rdftp also provides a simple HTML interface for “Navigating” through the Query results that are presented as Triples in a tabular form.

rdftp stores RDF data in a relational database as triples using a single-table approach. Apart from the triple itself, the table also contains information about the source model /graph of the triple.

This is the initial version (v. 0.5) of rdftp RDF Server. rdftp is available
as Free Software. Documentation and user guide is included in the
Download (http://www.semanticweb.gr/downloads/rdftp.zip)."
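To get a feel for the single-table approach described there, here is a toy version in Python with SQLite. It is not rdftp (which is a PHP script) and has nothing to do with its actual schema: the table name, column names and helper functions are all my own assumptions, chosen only to show triples stored alongside their source model.

```python
import sqlite3


def open_store():
    """Create an in-memory store with one table holding every triple."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE triples ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL,"
        " model TEXT NOT NULL)"  # which source graph/file the triple came from
    )
    return db


def add_triple(db, subject, predicate, obj, model):
    """Update operation: record one triple and the model it belongs to."""
    db.execute("INSERT INTO triples VALUES (?, ?, ?, ?)", (subject, predicate, obj, model))


def query(db, subject=None, predicate=None, obj=None):
    """Query operation: any argument left as None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = "SELECT subject, predicate, object, model FROM triples" + where
    return db.execute(sql, params).fetchall()


if __name__ == "__main__":
    store = open_store()
    add_triple(store, "http://example.org/ben", "http://xmlns.com/foaf/0.1/knows",
               "http://example.org/mark", "benfoaf.rdf")
    for row in query(store, predicate="http://xmlns.com/foaf/0.1/knows"):
        print(row)
```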


Gosh. That does look sweet.

Blognet

Kevin Burton's Blog.net - " Blognet is a P2P network based on the existing weblog infrastructure, RSS 1.0, mod_subscription, mod_link, Web Services Description Language (WSDL) and Simple Object Access Protocol (SOAP)."

Straw

Straw - The GPL'd desktop news aggregator for Gnome 2. (tipster: Aaron)

GUIDs

Kevin Burton on GUIDs - required reading for RSS 0.9x, 1.0 and 2.0 developers. Very good work.

Repeating Elements

Over at the RSS-Dev mailing list, the RSS-Dev Working Group (of which I am a member) is voting on whether to allow repeating elements within RSS 1.0 modules. This means, in line with the Dublin Core recommendation, we can say:

<dc:subject>foo</dc:subject>
<dc:subject>bar</dc:subject>

Instead of:

<dc:subject>foo, bar</dc:subject>


UPDATE: ooh no. The vote has been postponed for further discussion. Cool.
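Whichever way the vote goes, a consumer will meet both forms in the wild. A small Python sketch that reads either into the same list is below; the function name and sample items are made up. Note that splitting on commas is a guess that would mangle any subject which itself contains a comma, which is rather the argument for repeating elements.

```python
import xml.etree.ElementTree as ET

DC_SUBJECT = "{http://purl.org/dc/elements/1.1/}subject"
DC_DECL = 'xmlns:dc="http://purl.org/dc/elements/1.1/"'

REPEATED = f"<item {DC_DECL}><dc:subject>foo</dc:subject><dc:subject>bar</dc:subject></item>"
COMBINED = f"<item {DC_DECL}><dc:subject>foo, bar</dc:subject></item>"


def subjects(item_xml):
    """Return the item's subjects as a list, whichever form the feed used."""
    item = ET.fromstring(item_xml)
    found = []
    for element in item.iter(DC_SUBJECT):
        # Heuristic: treat commas inside one element as separators.
        for part in (element.text or "").split(","):
            part = part.strip()
            if part and part not in found:
                found.append(part)
    return found


if __name__ == "__main__":
    print(subjects(REPEATED))
    print(subjects(COMBINED))
```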

More RSS 2.0 questions

Sam Ruby is making some good points about the Userland proposal:

"We still have an opportunity to heal some of the rift between the two branches - simply by embracing instead of displacing one or more of the existing modules. Or, at a minimum, placing the new items in a namespace separate from the core so that they start out life with an equal status. Beyond that wishes include modularizing the spec. Have a core that is Really Simple. Have a module for ScriptingNews value add. Have a legacy module for things that nobody is quite sure why they are there any more, but we can't bear to part with. And perhaps the confusing docs element and the version attribute (that as near as I can tell nobody checks anyway) can be put into that legacy module, having been replaced by xmlns."

It's a good point. With namespaces now introduced, why not be really radical and strip out all the unused elements into namespaces? skipHours? GUID? (which is non-standard anyway) Version? docs? Why are these in the core spec, if namespaces are allowed, and simplicity is the thing we're going for?

Questions for Dave

In the comments to Entry below, Dave Winer says:

"To everyone, I believe the new spec is leaps and bounds better than any RSS spec we've had to date. It's clear, easy to understand, prioritized. It's certainly not the final word in syndication specs. I really hope people leave RSS alone, let it be. It's been the most screwed-with spec ever. Go invent new stuff, if you break through, we'll all applaud you. But RSS is what it is. Imperfect. Popular. Ill-conceived. "

He also asks me to specifically ask the questions I have about his proposed 2.0 specification. Here's mine, and you should all feel free to add your own to the comments section below...

1. Namespaces: RSS 1.0 utilises namespaces specifically for the importation of other RDF vocabularies and ontologies. Being RDF, these already have a consistent syntax and structure. What syntax and structure is envisaged for Userland's 2.0 spec modules?

2. There are other suggestions for the data model of RSS 2.0, which would allow for a simplified RDF syntax to be included. There has been no mention of these on Scripting News, yet they appear to be gaining a great deal of support within the RSS community. Why are you declaring 2.0 golden, seemingly without taking a major proposal into consideration, publicly and with reasoning?

3. The Userland specification document is in a non-standard form. If it were to be rewritten in another style - RFC style, for example - would Userland point to it? Support it? Acknowledge its existence?

I have more, but let's start with those...

Winer to declare 2.0 spec complete tomorrow

According to this Scripting News entry, Dave Winer will tomorrow remove the caveat from his proposal for RSS 2.0 thereby declaring it the normative spec. He goes on:

"Before then, if you see any problems (not mega-problems, please), let me know. But it's not "speak now or forever hold your peace." In the Roadmap section I call for a 2.0.1 and 2.0.2 and even a 2.0.3 would be reasonable, to clarify and correct."

Problems, yes. I see some, and thus am speaking before the deadline.

We'll start with the first. We're not done yet. The Userland spec is nowhere near anything being mooted as 2.0 in the RSS-Dev or Syndication groups. Unilaterally declaring it to be the 'official' spec, without even attempting to gain any form of approval from the community at large - which is what removing the caveat effectively does - is both exclusionary and confusing for the end-user.

Secondly, there are many technical questions yet to be answered by the Userland proposal. Questions on namespace management. On description of extensions to newsreaders. On the data model itself. On its very name. All these questions need to be answered in a considered way, and blasting a spec through is not the way to do it.

DanBri

Dan Brickley has written, imho, an excellent posting on the value of RDF with RSS. All those bothered by this should read it.

Sean Palmer's Approach

Following on from Shelley's ideas of yesterday, Sean Palmer has two examples of potential RSS 2.0 code. They are both valid RDF, note, but show a distinct 0.9x-ish simplicity.

Kevin Burton on keeping RDF

Kevin Burton is also very much in favour of RDF: The End of RSS Innocence

Shelley Rocks Tough

Shelley Powers is on fire today with two very good blog postings on the current new RSS: "Myths about RDF/RSS" and "RSS - Proof is in the implementation".

AaronSW stakes out RSS 3.0

We're moving fast now, people. AaronSW's The Road to RSS 3.0 is a marvellous document.

RDF Core model

The W3C today released a new draft of the Core RDF concepts document. Essential reading for anyone who wants to claim to talk about RSS 1.0 with any authority or relevance.

My Shopping List

Ok, if we're going to go full steam ahead in RSS 2.0 (and I'm not sure that's been decided in the right way yet) - I have a personal shopping list of features I'd like to see in there.

1. An RFC-style specification document. Prefaced by a Userland-style foreword, and complete with code examples.

2. A continual thought given to those who wish to transform an RSS 2.0 document into an RDF syntax. This means the RDF people have to work together with the RSS 2.0 people to help the RSS 2.0 spec be easily converted.

3. The RSS 2.0 people have to meet the RDF people half way - and especially the RSS 1.0 module writers. This means that while the RSS 1.0 modules may need a little retooling, they are the official starter modules for RSS 2.0. If I see even a smidgen of an inkling of an argument that starts "but you can already do that in mod_xxx," I may lose my temper. (CLARIFICATION: In that RSS 2.0 people need to look at all the 1.0 modules before suggesting additional elements. We've spent two years making modules, and this work should NOT be lost. There is already MUCH good in a great deal of the RDF stuff. Respect that.)

4. The steering of RSS 2.0 is done by committee. Being entirely biased, I'd say the RSS-DEV Working Group is a good start, if it was to have a few other people elected to it - Dave Winer, for example.

5. As a gesture of good will, and an olive branch to move on, the canonical specification, and the RSS namespace URI are to be PURLs and not directly hosted at any commercial operation. The current RSS 1.0 CVS is also v.good. Why change it?

6. A DTD, an XML-Schema, and RDF-Schemas (or suchlike) describing the core spec, and all the modules.

7. World peace, or close to it.

A thousand flowers

Rael Dornfest has posted an essay in support of an RSS 2.0 - with namespaces, but dropping RDF. (or, at least, relegating it to a module).

Dave Winer goes even further, and posts a roadmap to 2.0.

And funnily enough, I have mocked up an RSS 2.0 feed for this site. Not spec-compliant, as there ain't no spec yet.

As a footnote, and mostly to show off my knowledge of post-revolutionary Chinese internal politics, it must be noted that the phrase "Let a thousand flowers bloom" is a misquote of Chairman Mao's "Let a hundred flowers bloom, let one hundred schools of thought contend." It's perhaps not a good idea to keep using it...

Mao first used the phrase in 1956, asking the intelligentsia of China to suggest improvements and constructively criticise the regime.

Whether he always planned to do so, or if it just broke in his mind, we may never know - but after a few of the flowers started to bloom, Mao really didn't like being told he was wrong, and soon it became apparent that the Hundred Flowers movement was really just an exercise to see who didn't support him. Many people died, or were exiled, and the resulting cracks in society led to the Cultural Revolution, with its destruction of many of China's ancient cultural artefacts and the deaths of millions.

And so, using a phrase previously employed by a totalitarian dictator to weed out and destroy all opposition is perhaps a little dodgy in these circumstances.

Chat in RSS 0.94

Dave Winer is talking about adding an Instant Messaging contact element for the webmaster of an RSS 0.94 feed. He's thinking about leaving it for this version, due to it looking too involved.

I think it's still a goer though - and a good idea at that - if you use URIs as the contact value. Say the element is called <authorContact>; one could have:

<authorContact>mailto:ben@benhammersley.com</authorContact>

but you can also use the aim: URI scheme or the Jabber URI scheme to produce:

<authorContact>jabber:bhammersley@jabber.org?message</authorContact>

or

<authorContact>aim:goim?screenname=benxhammersley</authorContact>

or to point to a web-based form or whatever. That's my suggestion for the day...
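The nice thing about URIs here is that a reader only has to look at the scheme to know what to do. A Python sketch of that dispatch follows; <authorContact> is only my suggested element name, and the mapping from scheme to action is an illustrative assumption, not part of any spec.

```python
from urllib.parse import urlsplit

# Illustrative mapping only: which schemes a reader might choose to support.
CONTACT_KINDS = {
    "mailto": "email",
    "jabber": "Jabber message",
    "aim": "AIM message",
    "http": "web form",
    "https": "web form",
}


def contact_kind(uri):
    """Classify an authorContact value by its URI scheme."""
    scheme = urlsplit(uri.strip()).scheme
    return CONTACT_KINDS.get(scheme, "unknown")


if __name__ == "__main__":
    for value in (
        "mailto:ben@benhammersley.com",
        "jabber:bhammersley@jabber.org?message",
        "aim:goim?screenname=benxhammersley",
    ):
        print(value, "->", contact_kind(value))
```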

Tokenized Content

Over at RSS-DEV, David Galbraith - who probably handles more RSS in a day than most will ever see in a lifetime - comes up with this suggestion:

"From my perspective the main reasons for having content in the RSS is to allow aggregated RSS items by topic, based upon a search of the full text of articles. RSS is about having metadata which gets you to original content
and sometimes this includes the content itself. E.G. you want all news items
about Iraq even if the headline does not mention it: 'Bush ready for war'.
This was one of the main reasons for me pushing for RSS 1.0.

You are right that publishers are reluctant to syndicate full content for
the obvious reason that it invites problems with readership and/or
copyright. There is, however, a possible solution to this using a tokenized
version of the content. In other words a version of the content with stop
words such as 'the' and 'and' removed, is fine for machines to read and
index, but not for humans to read and understand. In addition the process of
tokenization is largely a one-way function in that it is tricky to write
software to accurately replace the stop words once they have been removed,
so the original content is not at risk.

To summarize: a mod_tokenized_content module would be great, particularly if
some of the auto RSS generators tokenized on the fly. This would be very
publisher friendly in addition to being useful for RSS users."
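A rough idea of what such tokenizing could look like, in Python. There is no mod_tokenized_content spec to follow, so the stop-word list, the lower-casing and the function name are all assumptions made for illustration.

```python
import re

# A tiny, made-up stop-word list; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "is", "of", "the", "to"})


def tokenize(text):
    """Lower-case the text, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


if __name__ == "__main__":
    print(" ".join(tokenize("Bush is ready for the war in Iraq")))
```

The output is still perfectly searchable for "iraq", but it no longer reads as the publisher's article.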

RSS 1.0 on Radio Userland

Rejoice, you Radio Userland users, for you too can join in the Wonderful World of RSS 1.0. Bill Kearney has this to say:


All this talk about new formats for Radio made me wonder what it would take for
Radio8 to properly support RSS-1.0. So I wrote a tool for Radio that does just that.
http://www.ideaspace.net/users/wkearney/misc/radio/radio8/rdf_0.1.zip

Unzip it and put it in the Tools folder of your Radio install.

What it will do is tag along behind the existing RSS file writing routines and
create the same output in RSS-1.0 format. It simply takes the existing RSS-0.92
format filename and tacks an ".rdf" onto the end of it. All of Radio's normal
upstreaming methods can take over from there.

It makes no modifications to any of Radio's existing file writing or other
features. It's got no real UI. It can be enabled/disabled using the Radio8
tools menu. Uninstalling it is as simple as removing the rdf.root file from
your Tools directory.

So for anyone using Radio that would like to use a standards-support format,
this might be worth giving a try.

This is free of charge; I retain no copyright to its code. I wouldn't mind a
mention if you decide to build on it.

Bill is a very nice guy, incidentally. You can even hear his voice.

RSS trademarked?

Morbus has found that Userland filed for a trademark on the word "RSS", with unfortunate timing, it would seem. The mark is currently abandoned, however.

RSS 0.94 on a mailing list

More support for a proper mailing list discussion of RSS 0.94 comes from Eric Vitiello:

"The conversation method for the development of the specification is inappropriate. Analogy: Try to create a document that requires the input of 10 or more people. Do it via telephone, without conference calling. You'll become quite frustrated. This is exactly what is happening. Too many two-person conversations, and no good way to keep track of them all. If I don't read all of the websites of the people interested in this, I will miss their comments. A mailing list would solve this problem."

Even More RSS 0.94

Erik Thauvin talks about preventing Spammers harvesting email addresses from RSS feeds:

Most of us purposely employ various methods to cloak email addresses on web pages, while they are left wide open on all syndicated feeds. How long will it be before the email extractors figure out a way to collect addresses from our RSS-based feeds? They might even do so already. The one thing we surely don't need is more spam. Moreover the current email address format [email@host.com (Full Name)] is non-standard. RSS 0.9.4 needs to utilize common Internet standards, not create unsupported ones.

More RSS 0.94 suggestions

More RSS 0.94 suggestions are coming in:

Mike Kruz says:

I'd suggest a few things:

  • add "id" element to channel, with an optional "domain" attribute like categories (so a feed can say "my id here is X, my id on Syndic8 is Y"...).
    This would help track duplicates since feeds http://foo.com/index.xml and http://www.foo.com/index.xml would have the same id (since they are indeed the same file)
  • add an optional "category" element to the channel, like the one for items. This element may appear several times, if the channel is in several categories within the same domain or within several domains.

  • add an optional "generator" element to the channel, identifying the toolkit that was used to produce the feed



Meanwhile, Ziv Caspi says:

My main problem is with the <guid> element, but there are a few others as well.



  1. The name is somewhat misleading. To many developers, GUID is more than "globally-unique ID". It is a fixed-width number that has certain rules when formatted as text. The <guid> element in RSS 0.94 should really be <uri>, or something similar.



  2. The <guid> may have an optional attribute, isPermaLink. This smells like a hack to me. Why not just add an optional <permalink> element? Is there some hidden cost with it? As currently specified, weblogs that want to provide globally-unique IDs in non-URL format (perhaps GUIDs...) cannot at the same time provide a permanent link. This breaks orthogonality, without any apparent advantage that I can see.


  3. I think the <skipDays> and <skipHours> are useless. A RSS feed publisher has an interest in providing rough update frequency for aggregators. However, controlling when aggregators retrieve data seems pointless -- what if an aggregator goes online only at specific times which fall in this "window of inaccessibility"? In order not to break compatibility, I suggest both would be declared deprecated. Most aggregators I've come across don't honor these elements anyway.



  4. The default and maximum height and width of an <image> should really not exist. The default means absolutely nothing. The maximum is even more damaging (for the same reason that limiting the <description> size is damaging).


  5. If there's some logic behind what goes into an element and what gets an attribute, I failed to see it. Why not have everything an element and be done with it?
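For reference, here is roughly what honouring the elements from point 3 would involve for an aggregator, sketched in Python. It assumes hours are numbered 0 to 23 in GMT and days are given as English names; check the spec version you are targeting before relying on either. The sample channel and the function name are made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Fixed English names, so the check does not depend on the machine's locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

CHANNEL = """<channel>
  <skipHours><hour>0</hour><hour>1</hour></skipHours>
  <skipDays><day>Sunday</day></skipDays>
</channel>"""


def may_poll(channel_xml, when):
    """Return False if the channel asks not to be fetched at this moment."""
    channel = ET.fromstring(channel_xml)
    when = when.astimezone(timezone.utc)
    skipped_hours = {
        int(hour.text.strip())
        for hour in channel.findall("skipHours/hour")
        if hour.text and hour.text.strip().isdigit()
    }
    skipped_days = {
        day.text.strip() for day in channel.findall("skipDays/day") if day.text
    }
    return when.hour not in skipped_hours and DAY_NAMES[when.weekday()] not in skipped_days


if __name__ == "__main__":
    print(may_poll(CHANNEL, datetime.now(timezone.utc)))
```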

How to discuss a specification

Julian Bond's comments below have started a nice thread in the Syndication Mail List, concentrating on the way Dave Winer is orchestrating the discussions over his proposal for RSS 0.94.

Dave himself points to his essay on 'stop energy', and Morbus Iff replies here with:

"...there is no single place for locating the effort, since the discussion is spread out against multiple sites, historical researching and central repository goes out the window. Instead, we have an omnipotent librarian who controls the information archived by linking to only that which he deems important. Censorship, Revisionism, etc, what have you."

OCS 0.5

Open Content Syndication Directory Format (Version 0.5) has just been released by Ian Davis.

"The Open Content Directory Format is intended to provide a concise, machine-readable listing of a set of syndicated services. The directory format is capable of supporting multiple sites, each with multiple services. Each service can have multiple formats such as RSS (RDF Site Summary), XHTML, Plain Text, Avantgo or WML format as well as separate publishing schedules or languages."

Bond on RSS 0.94

Julian Bond posts to the Syndication mailing list about RSS 0.94


I find it extraordinarily sad that the RSS 0.94 development effort is
happening across multiple weblogs and multiple comment threads and not
here. So having got that off my chest, here's my two pennyworth.

I'd like to see:-

- The 0.94 spec re-written in the style of the early RFCs, like rfc822
for instance. These are models of clear, concise, accurate text that
would be worth emulating.

- A deliberate effort to remove all ambiguity. There are a number of
places in the proposed spec where language is used that is very open to
interpretation. For instance, my favourite; "<link> is the URL of the
story." in the <item> description.

- Notes on elements that are deprecated even though they are left in for
compatibility.

- A more complete sample and sample snippets that illustrate the use of
every element.

- Some commentary on common usage.

- The removal of the restriction on using link types other than http and
ftp. In particular, I can't see any good reason for banning https and I
can think of (fairly bizarre!) applications where news and mailto would
be handy.

- A <category> tag on channel that is similar to <item><category>

- A DTD and a set of recommendations of how to reference the DTD that
can handle foreign character sets without breaking validating parsers.

- A note deprecating the use of entity-encoded html in any element
except <item><description>

And finally, and somewhat contentiously,

- The removal of the copyright notice. I cannot think of any good reason
for a corporate copyright disclaimer on a standards specification.

UPDATE: Dave answers this last point here

BBC RSS feeds

Matt Jones brings us news that the BBC is now running a public beta of some RSS feeds. His site has the URLs...

Mozilla Breakage

It seems this site now breaks Mozilla. Which is really annoying, as I've not changed anything significant. Look at the source, and tell me if you can see why...
UPDATE: Fixed! DJ Adams is my new hero.

Architectural Principles of the World Wide Web

Also newly released on Friday, the W3C's Architectural Principles of the World Wide Web. Good Reading.

New mod_link

Meanwhile, RSS 1.0 development is, if anything, getting faster. Kevin Burton's nifty new version of mod_link has just been released.

RSS 0.94

I go away for a few days, and look what happens: releases galore.

So, to catch up:

Dave Winer has, as promised, floated his proposal for RSS Version 0.94. The ChangeLog details the differences between this and 0.92 (there is no 0.93, as that development stalled somewhat last year). In summary, there are three new elements and a couple of usage changes, plus a proposal to change the name from a non-acronym to an acronym standing for "Really Simple Syndication".

One thing missed from the ChangeLog is that the <cloud> element can now be utilised over HTTP-POST. I think this is new, feel free to tell me I'm wrong.
UPDATE: Yep, I was wrong. It's always been there.

Dave's roadmap also says:

"We also anticipate the adoption of namespace support at some point not too far down the road, so that developers can add modules and avoid possibly interfering with each others' work. At that time it would be a good idea to contemplate freezing this spec, allowing subsequent work to happen in modules, and in completely new syndication formats, with new names."

Talking of which, RSS 1.0 users can feel free to use the new RSS 0.94 elements - with a suggested namespace prefix of rss094 and an identifying URI of http://backend.userland.com/rss.
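
As a rough illustration of what that looks like in a feed, here is a small Python sketch that builds one RSS 1.0 item carrying a 0.94 <guid> under the rss094 prefix. The rss094 namespace URI is the one given above. The build_item helper, the explicit "rss" prefix and the example.org values are assumptions made purely for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"
RSS094_NS = "http://backend.userland.com/rss"  # URI quoted in the post above

# Ask ElementTree to serialise with readable prefixes instead of ns0, ns1...
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("rss", RSS10_NS)
ET.register_namespace("rss094", RSS094_NS)


def build_item(about, title, link, guid):
    """Build an RSS 1.0 <item> with an extra namespaced rss094:guid child."""
    item = ET.Element(f"{{{RSS10_NS}}}item", {f"{{{RDF_NS}}}about": about})
    ET.SubElement(item, f"{{{RSS10_NS}}}title").text = title
    ET.SubElement(item, f"{{{RSS10_NS}}}link").text = link
    ET.SubElement(item, f"{{{RSS094_NS}}}guid").text = guid
    return item


item = build_item(
    "http://example.org/entry1",
    "Example entry",
    "http://example.org/entry1",
    "http://example.org/entry1",
)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

A parser that doesn't know the rss094 namespace can skip the extra element and still read the title and link, which is the whole point of doing this through a module namespace.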

Rich Equivalents

Jon Hanna has released a new RDF Site Summary 1.0 Module: Rich Equivalents

Some good thinking in there, and the spec document has embedded RDFS. Hurrah!

RSS 0.94 and roadmap

Dave Winer just posted a heads up:

"Heads-up. I'm doing my first post-surgery technical project, a merging of the RSS 0.91 spec with the 0.92 addenda, and documenting the new features in 0.94 and including a roadmap for evolution, all in one Web doc with a liberal copyright. Should have a draft ready later today."

The roadmap will be especially interesting, I know.

Sam Ruby on RSS naming

We've been talking a lot about why RSS 1.0 is called RSS. Sam Ruby has a question too.

Rock DJ

DJ Adams. The Man. The Machine. The Guy who just wrote:

"What I've ended up with is a Mozilla toolbar button that you can click while viewing a weblog that points to its own RSS feed. The button's link is to Javascript, adapted from Mark's auto-subscribe bookmarklet. On discovering an RSS feed (and the title of the blog page), it then constructs an XSLT pipeline URL that Jon demonstrated last month."

There's More Here. DJ Rocks.
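
For anyone curious about the discovery step, here is a hedged sketch of the idea in Python rather than DJ's Javascript. It assumes the weblog advertises its feed with a <link rel="alternate"> tag in the page head, which the quote above doesn't actually spell out. The FeedLinkFinder class, the accepted MIME types and the sample page are all invented for illustration.

```python
from html.parser import HTMLParser

# Assumed MIME types a weblog might use to advertise its feed.
FEED_TYPES = {"application/rss+xml", "application/rdf+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect the page title and any advertised feed URLs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.feeds = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            mime = (attrs.get("type") or "").lower()
            if "alternate" in rel and mime in FEED_TYPES and attrs.get("href"):
                self.feeds.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


PAGE = """<html><head>
<title>Example Weblog</title>
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://example.org/index.rdf">
</head><body><p>Hello</p></body></html>"""

finder = FeedLinkFinder()
finder.feed(PAGE)
print(finder.title, finder.feeds)
```

Once the feed URL and title are in hand, building the pipeline URL is just string assembly, so that part is left out here.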

What's RSS got to do with it?

Dave Winer is asking the question, Why is RSS 1.0 named RSS?

XML in silicon

XML processing in silicon?
Does anyone have any first hand knowledge of this?

"Processing XML is radically different from switching and network protocol routing. While ordinary network infrastructures simply scan packet headers, XML-aware networks are capable of understanding, parsing, filtering and processing the XML content itself," said Ron Schmelzer, senior analyst at ZapThink, LLC, a firm focused on XML and Web services research, analysis and insight. "DataPower helps businesses of all sizes realize this value by balancing the needs of speed, security and reliability without sacrificing the flexibility and benefits of XML."

RDFS for RSS 1.0


Alright - this is better: RDFS for RSS 1.0
Take a look and tell me what you think, would you?

RMDSR 0.1 UPDATED

UPDATE: I've deprecated this proposal in favour of RDFS. I'll have some documentation tomorrow...

Evening all,

Can I ask you all to have a look at:

http://hacks.benhammersley.com/rss/rmdsr/

and tell me what you think. It's an idea of mine to solve one of the big problems within RSS 1.0:

"Currently RSS 1.0 has 21 modules available for inclusion, with more being invented both publicly and for private internal use. Meanwhile, aggregators and RSS readers are being developed for many platforms. Herein lies a problem: how do Reader authors keep up with the meanings of the different elements given with all the new modules? How does a desktop reader application display an element that didn't exist when the author wrote the program?"

Authors of reader programs are especially invited to join the discussion.
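
To make the question concrete, here is one way a reader could cope, sketched in Python. It is not the RMDSR proposal itself, just an illustration of the general idea: the reader fetches a module's RDF Schema and uses rdfs:label as the display name for elements it has never seen. The mod_example namespace, the "mood" property and both helper functions are hypothetical, and ElementTree stands in for a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical schema published alongside an imaginary module.
SCHEMA = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdf:Property rdf:about="http://example.org/mod_example/mood">
    <rdfs:label>Mood</rdfs:label>
    <rdfs:comment>The author's mood when writing the item.</rdfs:comment>
  </rdf:Property>
</rdf:RDF>"""


def load_labels(schema_xml):
    """Map each property URI in the schema to its human-readable label."""
    labels = {}
    root = ET.fromstring(schema_xml)
    for prop in root.findall(f"{{{RDF}}}Property"):
        uri = prop.get(f"{{{RDF}}}about")
        label = prop.findtext(f"{{{RDFS}}}label")
        if uri and label:
            labels[uri] = label.strip()
    return labels


def display_name(tag, labels):
    """Turn an ElementTree '{namespace}local' tag into something displayable."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
    else:
        namespace, local = "", tag
    # Fall back to the bare element name when no schema label is known.
    return labels.get(namespace + local, local)


labels = load_labels(SCHEMA)
print(display_name("{http://example.org/mod_example/}mood", labels))
```

The fallback matters as much as the lookup: a reader that can't find a schema still has the element's local name to show.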

Bill's Big Plans

I just had a fantastically enjoyable conversation with Bill Kearney. He has plans, and I only wish I was at liberty to share them. But *oh*so*good*

Tipping Point Ahoy, my children, Tipping Point Ahoy. Give it, ooh, a couple of weeks.

SOAP Web Service RSS1.0 parser

A web-service for parsing RSS 1.0 - it returns an array, if you fancy getting all SOAPy.

NetNewsWire Lite

Brent Simmons's NetNewsWire Lite has hit a new version: 1.0b15. Many improvements for OSX-heads.

mod_image proposals

There are also now two proposals for a mod_image module for RSS 1.0. One from Kevin Burton, and the other from Jon Hanna. Discussion is ongoing.

mod_email

In a concerted effort to make my life harder, Chris Croome has released mod_email - designed to be used to represent email headers. Interesting.

Morbus on Semantic Web

My erstwhile co-blogger, Morbus Iff, is just too damn polite to link to this, but his Semantic Web 123 is rather good and linksome.

mod_creativecommons is now mod_cc

mod_creativecommons is now called mod_cc. Yummy.

mod_creativecommons

Just posted to the necessary lists:

I've just uploaded mod_creativecommons to the RSS-DEV list's file repository:

http://groups.yahoo.com/group/rss-dev/files/Modules/Proposed/mod_CreativeCommons.html

It's dangerously first draft, and contains at least three mistakes. Help spotting them is remarkably welcome...

Note also that I've not yet placed the references and acknowledgements section into the document. It's all their fault, of course, and they will be named in due course. All mistakes are my fault, however. One should never listen to the cricket when one is doing RSS stuff. Most distracting.

XML::RSS::Tools

Which reminds me. Adam Trickett, author of the XML::RSS::Tools module, emailed yesterday:

"Uploaded my latest ultra-high level RSS/RDF manipulation tool upto CPAN for
anyone with Perl to use. To do anything exciting you need your own XSL
sheets, but it seems to work okay with the example ones provided. This
version fixes some bugs of the previous versions, and some problems with the
underlying XML::RSS parser.

I've expanded the documentation, it includes some extra links including your
site, and I'm in the process of expanding things further. The next version
will have the examples documented in HTML, and other niceties."

Excellent! I'm playing with it now, and lo it is good.

Mod_CC? Burton says Aye!

The RSS 1.0 feeds on this site have had Creative Commons metadata for a few days now. Happily it doesn't seem to break anything, and now Kevin Burton has implemented it into one of his systems:

"I took a few minutes and included Creative Commons [2] RDF support within Bonita (an Alpha quality RSS reader/producer/aggregator for Mozilla I am working on).

In the long term both Reptile and Bonita (when they are working together) will
allow the exclusion of RSS/RDF content that is not licensed under acceptable
terms.

I think we will probably need to go ahead and make this an RSS module with
mod_cc documentation at least so that we all know the best way to implement
this."

I'm already on it...
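
In the meantime, here is a hedged Python sketch of the kind of filtering Kevin describes: keep only the items whose licence is on an acceptable list. The cc namespace URI, the cc:license element with an rdf:resource attribute, the licence URIs and the acceptable_items helper are all assumptions for illustration. The real syntax is whatever the module document ends up saying.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "cc": "http://web.resource.org/cc/",  # assumed Creative Commons namespace
}

# Hypothetical feed: one licensed item, one with no licence information.
FEED = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/"
    xmlns:cc="http://web.resource.org/cc/">
  <item rdf:about="http://example.org/a">
    <title>Attribution entry</title>
    <link>http://example.org/a</link>
    <cc:license rdf:resource="http://example.org/licenses/attribution"/>
  </item>
  <item rdf:about="http://example.org/b">
    <title>Unlicensed entry</title>
    <link>http://example.org/b</link>
  </item>
</rdf:RDF>"""

# Licence URIs this (imaginary) aggregator is willing to republish.
ACCEPTABLE = {"http://example.org/licenses/attribution"}


def acceptable_items(feed_xml, acceptable):
    """Return the titles of items whose cc:license is in the acceptable set."""
    root = ET.fromstring(feed_xml)
    kept = []
    for item in root.findall("rss:item", NS):
        licence = item.find("cc:license", NS)
        uri = None
        if licence is not None:
            uri = licence.get(f"{{{NS['rdf']}}}resource")
        # Items with no licence at all are excluded, not assumed to be free.
        if uri in acceptable:
            kept.append(item.findtext("rss:title", default="", namespaces=NS))
    return kept


titles = acceptable_items(FEED, ACCEPTABLE)
print(titles)
```

Dropping unlicensed items is the cautious choice made here. An aggregator could just as reasonably treat "no licence stated" as its own category.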

PHP parsing of RSS

Jason of The Trommetter Times has some nice PHP scripts for parsing RSS.

RSS for Pocket PC

PocketFeed is an RSS/RDF news aggregator that runs on the Pocket PC 2002 PDA. Mmmm wireless syncing, mmmm.

Syndic8 is One

Syndic8.com is one year old today! As Jeff Barr, its developer, says in an email to the Syndic8 community:
I would like to thank everyone for their participation -- feed submitters,
reviewers, evangelists, those who syndicate or scrape, and anyone else
that I missed. Thanks!

I certainly never envisioned having 929 users or 14,854 feeds after just
a year. Dealing with this growth has been a challenge and I have learned
a whole lot. I would not be surprised to see the database at 20,000 feeds
by the end of the year.

14,854 feeds! Well done everyone.

RDF mod_events

Libby Miller, w3c-rdf-calendar lead, and Semantic Web Doyenne, posted this to various development groups earlier today:

I've been working on a little app which runs as a little (Java) server
on the desktop and which can download and display RSS+events files,
including encrypted ones if you care to give it your passphrase (but use
a test one - this is alpha software).
There are a bunch of UI issues - I was trying to see whether it would
work or not really.

http://swordfish.rdfweb.org/rdfquery/downloads.html#rdfcal
http://swordfish.rdfweb.org/rdfquery/rdfcal.tar.gz

it looks like (and uses the same servlet code as) the RSS+events demo I
did:

http://www.w3.org/2001/sw/Europe/events/view/

but making it a server means we can do secure stuff with gpg keys...

RSS class for cocoa

From the author of the very-nice-indeed NetNewsWire comes an Open Source RSS Class for Cocoa. OS X people, go craaazy.

Creative Commons RDF

The Creative Commons lot have publicly released their RDF specification. I'll be incorporating some of it into the RSS 1.0 feeds on this site today - and then we'll get onto arguing about it. :-)

UPDATE: I've included the cc rdf into the RSS 1.0 feeds for each individual entry. Have a look at the syntax. It's correct RDF, but I'm not sure it would degrade gracefully. Thoughts?

RDF events in Outlook.

I missed this when it was launched last October, but it's pretty cool and very relevant to RSS 1.0 - The Retsina Semantic Web Calendar Agent. It can take RDF event data and add it to MS Outlook calendars. Very useful.

Also found, mainly because I just got my new TiBook, is the OS X RDF Author - very nice even if all you want to do is look at pretty pictures.

Reading Online News

Reading Online News: A Comparison of Three Presentation Formats: Users were asked to locate specific information within news articles on three different layouts: full text (Full), link titles plus abstracts (Summary), or link titles only (Links).

Overall, there were no statistical differences in search time across the three presentation types. However, the Summary condition was perceived most positively in terms of ease of finding information, being visually pleasing, promoting comprehension, participants' satisfaction with the site, and looking professional. The Summary condition was also the most preferred. The Full condition was the least preferred, and had the most negative perceptions associated with it. The Full condition was perceived as being most difficult to find information, not promoting comprehension, not being visually pleasing, and not being satisfying.