Recently, there has been some discourse regarding the possibility of a feed:// URI scheme, which would be used for ...identifying data feeds used for syndicating news or other content from an information source such as a weblog or news website. In practice, such data feeds will most likely be XML documents containing a series of news items representing updated information from a particular news source.
To be perfectly frank, I think this is a terrible idea. And yes, I do have reasons. And yes, you are about to see them. I thought about titling this post 'feed:// URI scheme considered harmful', but those posts are always so damned self-righteous. I hope my tone is more approachable, because I would really like to hear from any of the feed:// supporters for rebuttal. Of course, nobody reads my blog, so I guess I can title it whatever I bloody well please.
Re-creates the problem it attempts to solve
The reason given for creating this new standard is that modern applications (e.g., browsers) lack sufficient support for feed MIME types. This is entirely too true, but adding a feed scheme will only further confuse the issue. Arguing that we need a feed scheme because browsers don't support the MIME types is like arguing that we need a new way to mark up documents on the web because XHTML 2.0 support is virtually non-existent. I know it sucks (I can't wait for XHTML 2.0 and CSS3), but you're just going to have to wait. Creating a new markup language would only split resources and extend the time it will take to adopt both.
Creates a protocol based on doctypes
The purpose of the scheme is not to signal that a different transport protocol should be used; in fact, feed:// really means "use HTTP". Instead, feed:// signals that a certain class of documents (feeds) is being requested. This is what MIME types are for. This is not what the URL scheme is for. This alone is enough (IMHO) to kill the idea.
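As a minimal sketch of the point above: a client can decide whether it received a feed by looking at the Content-Type header, leaving the URL scheme free to name the transport. The helper name `is_feed_content_type` and the set of feed media types are illustrative assumptions, not part of the proposal.

```python
# Illustrative (assumed) set of media types treated as feeds.
FEED_MEDIA_TYPES = {"application/rss+xml", "application/atom+xml"}


def is_feed_content_type(content_type: str) -> bool:
    """Return True if a Content-Type header value names a feed media type."""
    # Drop parameters such as "; charset=utf-8" and lower-case the rest,
    # because media type names are case-insensitive.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in FEED_MEDIA_TYPES


print(is_feed_content_type("application/atom+xml; charset=utf-8"))  # True
print(is_feed_content_type("text/html"))  # False
```

The same check works whether the document arrived over HTTP, HTTPS, FTP or anything else, since it never looks at the URL.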
Limits transport layer
HTTP isn't the only transport protocol that could be used to carry a feed. The most notable alternative is HTTPS, but FTP, SFTP, Gopher, etc. could all be used. Standards should serve to extend capabilities, not limit them. However, the proposed feed scheme can only handle HTTP and HTTPS. Additionally, the mechanism for HTTPS is flawed... keep reading.
In order to allow for both HTTP and HTTPS, the proposal really creates three new schemes. Here is Section 3 of the proposal:
The following are examples of the "feed" URI scheme
feed:http://example.com/rss.xml - Identifies the RSS feed at 'http://example.com/rss.xml'
feed:https://example.com/rss.xml - Identifies the RSS feed at 'https://example.com/rss.xml'
feed://example.com/rss.xml - Identifies the RSS feed at 'http://example.com/rss.xml'
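To make the three forms concrete, here is a hypothetical Python helper (`resolve_feed_uri` is not from the proposal) that maps each example to the transport URL it identifies, following the proposal's examples literally. Anything other than HTTP or HTTPS has nowhere to go, which is the transport limitation described above.

```python
def resolve_feed_uri(uri: str) -> str:
    """Map a feed URI to the transport URL it identifies (hypothetical helper)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() != "feed":
        raise ValueError(f"not a feed URI: {uri!r}")
    if rest.startswith("//"):
        # feed://host/path is shorthand for http://host/path
        return "http:" + rest
    inner_scheme = rest.partition(":")[0].lower()
    if inner_scheme in ("http", "https"):
        # feed:http://... and feed:https://... wrap a complete URL
        return rest
    # No other transport is covered by the proposal's examples.
    raise ValueError(f"unsupported transport in {uri!r}")


for uri in ("feed:http://example.com/rss.xml",
            "feed:https://example.com/rss.xml",
            "feed://example.com/rss.xml"):
    print(uri, "->", resolve_feed_uri(uri))
```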
Notice that a colon is used to separate 'feed' from the transport protocol. Now let's have a look at RFC 2396:
2.2. Reserved Characters
Many URI include components consisting of or delimited by, certain special characters. These characters are called "reserved", since their usage within the URI component is limited to their reserved purpose. If the data for a URI component would conflict with the reserved purpose, then the conflicting data must be escaped before forming the URI.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
The "reserved" syntax class above refers to those characters that are allowed within a URI, but which may not be allowed within a particular component of the generic URI syntax; they are used as delimiters of the components described in Section 3.
3. URI Syntactic Components
The URI syntax is dependent upon the scheme. In general, absolute URI are written as follows:
<scheme>:<scheme-specific-part>
An absolute URI contains the name of the scheme being used (<scheme>) followed by a colon (":") and then a string (the <scheme-specific-part>) whose interpretation depends on the scheme.
3.1. Scheme Component
Just as there are many different methods of access to resources, there are a variety of schemes for identifying such resources. The URI syntax consists of a sequence of components separated by reserved characters, with the first component defining the semantics for the remainder of the URI string.
Scheme names consist of a sequence of characters beginning with a lower case letter and followed by any combination of lower case letters, digits, plus ("+"), period ("."), or hyphen ("-"). For resiliency, programs interpreting URI should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http").
Relative URI references are distinguished from absolute URI in that they do not begin with a scheme name. Instead, the scheme is inherited from the base URI, as described in Section 5.2.
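A minimal Python sketch of the scheme-name rule quoted above, written as a regular expression. It shows that a colon can never be part of a scheme name, so in any absolute URI the scheme ends at the first colon. The helper name `is_valid_scheme` is illustrative.

```python
import re

# RFC 2396, section 3.1: scheme = alpha *( alpha | digit | "+" | "-" | "." )
# Upper-case letters are accepted too, since programs should treat scheme
# names case-insensitively.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def is_valid_scheme(name: str) -> bool:
    """Return True if name matches the RFC 2396 scheme grammar."""
    return SCHEME_RE.fullmatch(name) is not None


for name in ("feed", "HTTP", "svn+ssh", "feed:http", "2feed"):
    print(name, is_valid_scheme(name))
# feed True
# HTTP True
# svn+ssh True
# feed:http False
# 2feed False
```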
I think it's fairly clear that using the colon as a separator here is completely illegal, so really we're just left with the feed:// (aka HTTP) URLs, which are not at all sufficient. The proposal is a pre-draft, so this problem can be easily corrected by using a hyphen, plus, or period instead. However, some people have already implemented the idea...
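The fix suggested above would fold the transport into a single scheme name, for example a hypothetical `feed+https`, which the scheme grammar allows. The sketch below assumes such `feed+<transport>` names (they are not part of the proposal) and recovers the transport URL with Python's standard `urllib.parse`. It would accept FTP or any other transport just as easily.

```python
from urllib.parse import urlsplit, urlunsplit


def to_transport_url(uri: str) -> str:
    """Map a hypothetical 'feed+<transport>' URI to the plain transport URL."""
    parts = urlsplit(uri)
    # The whole "feed+https" is one legal scheme name; split off the transport.
    prefix, plus, transport = parts.scheme.partition("+")
    if prefix != "feed" or not plus or not transport:
        raise ValueError(f"not a feed+<transport> URI: {uri!r}")
    return urlunsplit(parts._replace(scheme=transport))


print(to_transport_url("feed+https://example.com/rss.xml"))  # https://example.com/rss.xml
print(to_transport_url("feed+ftp://example.com/rss.xml"))    # ftp://example.com/rss.xml
```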
"I think it's fairly clear that using the colon as a separator here is completely illegal"
"The URI syntax is dependent upon the scheme. In general, absolute URI are written as follows: <scheme>:<scheme-specific-part>"
So, feed:http://example.com/rss.xml has a scheme of "feed", with a "scheme-specific-part" of http://example.com/rss.xml, which is perfectly legal. It's also only one new scheme.
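Python's standard `urllib.parse.urlsplit` illustrates the reading above: it takes everything before the first colon as the scheme and leaves the rest untouched. This only shows how one common parser behaves; it does not by itself settle whether RFC 2396 permits the embedded colon.

```python
from urllib.parse import urlsplit

# The three example forms from Section 3 of the proposal.
examples = [
    "feed:http://example.com/rss.xml",
    "feed:https://example.com/rss.xml",
    "feed://example.com/rss.xml",
]

for uri in examples:
    parts = urlsplit(uri)
    # For the first two forms the whole embedded URL lands in the path,
    # with no authority (netloc) component.
    print(parts.scheme, repr(parts.netloc), repr(parts.path))

# Output:
# feed '' 'http://example.com/rss.xml'
# feed '' 'https://example.com/rss.xml'
# feed 'example.com' '/rss.xml'
```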
I don't like the idea personally, however.
Ehhh, okay, I'm an idiot. I mean, I even quoted that in my entry... wow.
Yeah, so that argument is gone, though I do feel it was the weakest of all, as it wasn't a practical reason. Man, I can't believe how stupid that was. So embarrassing.
Wait, I was right! The scheme-specific part is the rest of the URI, which /cannot/ include a colon because that is a reserved character. There is only one colon allowed, and that is to separate the scheme (e.g., http) from the scheme-specific part (e.g., /en/archives/187-On-the-feed-URI-scheme.html).
Phew, I'm not a complete moron!
A bit of section 2.2: "Characters in the 'reserved' set are not reserved in all contexts. The set of characters actually reserved within any given URI component is defined by that component. In general, a character is reserved if the semantics of the URI changes if the character is replaced with its escaped US-ASCII encoding."
And from section 3: "URI that are hierarchical in nature use the slash "/" character for separating hierarchical components. For some file systems, a "/" character (used to denote the hierarchical structure of a URI) is the delimiter used to construct a file name hierarchy, and thus the URI path will look similar to a file pathname. This does NOT imply that the resource is a file or that the URI maps to an actual filesystem pathname."
Note that the scheme is still defined as
scheme = alpha *( alpha | digit | "+" | "-" | "." )
The important part is that the colon (":") remains reserved in this context, unlike the slash, which is specifically allowed for a certain purpose in that context.
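If the embedded colon is taken to conflict with its reserved purpose, Section 2.2 says the conflicting data must be escaped before forming the URI. Below is a minimal Python sketch of that reading, using only `urllib.parse.quote` and `unquote`. Whether the escaped form is actually required is exactly what is being debated here, so treat it as one possible option, not the proposal's syntax.

```python
from urllib.parse import quote, unquote

inner = "http://example.com/rss.xml"

# Percent-encode every reserved character (safe="" exempts nothing),
# so ":" becomes %3A and "/" becomes %2F.
escaped = "feed:" + quote(inner, safe="")
print(escaped)  # feed:http%3A%2F%2Fexample.com%2Frss.xml

# The embedded URL comes back by decoding the scheme-specific part.
scheme, _, rest = escaped.partition(":")
print(scheme, unquote(rest))  # feed http://example.com/rss.xml
```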