Every Atom entry must have a globally unique ID, in the <id> element. This helps aggregators and directories keep track of an entry, even if it gets updated. Some aggregators redisplay changed entries; some don’t; some track changes over time. But before you can do any of these things, you need to uniquely identify the entry, and that’s what <id> is for.
There are three requirements for an Atom ID: it must be a valid URI, it must be globally unique, and it must never change.
There are several ways to construct an unchanging, globally unique URI, but some are better than others.
It’s valid to use your permalink URL as your <id>, but I discourage it because it can create confusion about which element should be treated as the permalink. Developers who don’t read specs will look at your Atom feed, see two identical pieces of information, and pick one to use as the permalink, and some of them will pick incorrectly. Then they go to another feed where the two elements are not identical, and they get confused.
In Atom, <link rel="alternate"> is always the permalink of the entry. <id> is always a unique identifier for the entry. Both are required, but they serve different purposes. An entry ID should never change, even if the permalink changes.
“Permalink changes”? Yes, permalinks are not as permanent as you might think. Here’s an example that happened to me. My permalink URLs were automatically generated from the title of my entry, but then I updated an entry and changed the title. Guess what, the “permanent” link just changed! If you’re clever, you can use an HTTP redirect to redirect visitors from the old permalink to the new one (and I did). But you can’t redirect an ID.
The ID of an Atom entry must never change! Ideally, you should generate the ID of an entry once, and store it somewhere. If you’re auto-generating it time after time from data that changes over time, then the entry’s ID will change, which defeats the purpose.
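One way to follow that advice, sketched in Python under loose assumptions: the entry record keeps its ID in a field that is filled in once, the first time it is needed, and never recomputed afterwards. The `Entry` class and the `assign_id_once` helper are hypothetical names for illustration, not part of any publishing tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Entry:
    # Hypothetical entry record; a real tool would keep this in its database.
    title: str
    permalink: str
    atom_id: Optional[str] = None


def assign_id_once(entry: Entry, make_id: Callable[[Entry], str]) -> str:
    """Generate the Atom ID the first time only; afterwards return the stored one."""
    if entry.atom_id is None:
        entry.atom_id = make_id(entry)
    return entry.atom_id
```

Because the stored value always wins, editing the title (and with it the permalink) later leaves the ID alone, even if `make_id` would now produce something different.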
RFC 2141 defines a syntax for URNs. URNs are specifically designed to be used as globally unique identifiers. They are valid URIs. They look sort of like URLs you might type in a browser, but URNs are not designed to be clickable. They’re just structured identifiers.
So why not use them? Well, the main reason is that they require registration (described in RFC 3406). You can’t just use the domain name you’ve already registered; URN namespace registration is a separate process.
If you have a registered URN namespace, you can use it to generate Atom IDs. But if you haven’t registered one, you can’t just make up a URN and publish it. URNs don’t work that way.
tag: URIs
However, there is an emerging standard that allows anyone to construct globally unique identifiers without additional registration: tag URIs. To construct a tag URI, you only need a domain name or an email address. (A subdomain works too.) For the purposes of this tutorial, I’m going to assume that you have your own domain name or subdomain, and that you don’t wish to publish your email address for spammers to scrape.
Start with your permalink URL. I’ll use http://diveintomark.org/archives/2004/05/27/howto-atom-linkblog, a real example of a recent post. Your permalink may look different; it may not contain a date; it may just use a numeric ID; it may contain a fragment identifier (with a # mark). That’s OK, you can make a tag: URI out of any URL.
Discard everything before the domain name.
Progress so far: diveintomark.org/archives/2004/05/27/howto-atom-linkblog
Change all # characters to /
Progress so far: unchanged
Immediately after the domain name, insert a comma, then the year-month-day that the article was published, then a colon. Be sure to use a four-digit year, two-digit month, and two-digit day. Don’t forget the colon.
Progress so far: diveintomark.org,2004-05-27:/archives/2004/05/27/howto-atom-linkblog
Add tag: at the beginning. (Don’t add slashes; it’s just “tag:”. That’s a common mistake.)
Progress so far: tag:diveintomark.org,2004-05-27:/archives/2004/05/27/howto-atom-linkblog
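The four steps translate almost line for line into code. This is a minimal Python sketch that assumes a plain http or https permalink with no port number or user name in it; the function name `permalink_to_tag_uri` is made up for this example.

```python
from datetime import date


def permalink_to_tag_uri(permalink: str, published: date) -> str:
    """Turn a permalink into a tag: URI using the four steps above."""
    # Step 1: discard everything before the domain name ("http://", "https://").
    rest = permalink.split("://", 1)[-1]
    # Step 2: change all # characters to /.
    rest = rest.replace("#", "/")
    # Step 3: right after the domain name, insert ",YYYY-MM-DD:".
    # date.isoformat() yields a four-digit year and two-digit month and day.
    domain, slash, path = rest.partition("/")
    rest = f"{domain},{published.isoformat()}:{slash}{path}"
    # Step 4: add "tag:" at the beginning (no slashes).
    return "tag:" + rest
```

Given the permalink above and a publication date of 2004-05-27, it returns exactly the tag:diveintomark.org,2004-05-27:/archives/2004/05/27/howto-atom-linkblog string from the last step.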
That’s it! There are other ways to create valid tag: URIs, but this procedure works for any URL.
The only potential problem here is that if your permalinks might change over time (for example, if they are based on the title and you modify titles after posting, or if you change your permalink URL scheme entirely), you must not recompute the tag: URI when the permalink changes. Ideally, you should build the Atom ID once and then store it with the rest of the entry data. If this is not feasible, and you cannot guarantee that your permalinks will never change, there are some other ways to build valid tag: URIs that you might want to consider instead.
By some internal key, such as the primary key on your database’s entry table. For example, in Movable Type, the <$MTEntryID$> template tag will give you the primary key from the mt_entry table. This key will never change for a given entry. It’s not a valid tag: URI all by itself (it’s just a number), but it can be used to construct a valid tag: URI.
Example: tag:diveintomark.org,2004-05-27:1192 (the <$MTEntryID$> was 1192)
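In code, the database key simply becomes the specific part of the tag. A minimal sketch using Python’s built-in sqlite3 module; the `entry` table layout, the `create_entry` helper, and the example.org domain are assumptions for illustration, not Movable Type’s actual schema.

```python
import sqlite3
from datetime import date

DOMAIN = "example.org"  # assumption: replace with a domain you control


def create_entry(conn: sqlite3.Connection, title: str, published: date) -> str:
    """Insert an entry, then build its tag: URI from the new primary key."""
    cur = conn.execute(
        "INSERT INTO entry (title, published) VALUES (?, ?)",
        (title, published.isoformat()),
    )
    key = cur.lastrowid  # primary key assigned by the database
    atom_id = f"tag:{DOMAIN},{published.isoformat()}:{key}"
    # Store the finished ID alongside the entry so it is never recomputed.
    conn.execute("UPDATE entry SET atom_id = ? WHERE id = ?", (atom_id, key))
    return atom_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, "
    "published TEXT, atom_id TEXT)"
)
```

Storing the finished ID in its own column, rather than rebuilding it from the key on every publish, means an export that includes that column carries the IDs along with it.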
The only potential problem here is a lack of portability. I recently switched publishing tools, and the import/export routine did not capture these database keys. All of my entry IDs changed. I am a bad person. This experience was partly what prompted me to write this tutorial.
And obviously, if your publishing system does not use a database, then this technique is not for you. Sorry, Rael.
By entry creation date. If your publishing tool tracks an unchangeable entry date (that is, a date that doesn’t change when the entry is updated, and is not modifiable by the end user), you can use it to create a tag: URI.
Example: tag:diveintomark.org,2004-05-27:/archives/20040527110347
In this case, the entry creation date was 2004-05-27 at 11:03:47 AM. Spaces are not allowed in a tag: URI, so I mushed together the year, month, day, hour, minute, and second.
The only potential problem here is that you must guarantee in advance that no two entries will be published at exactly the same time, since then they would share the same entry ID, which defeats the purpose. For single-author feeds this is generally not a problem, and this is how I generate my Atom IDs now.
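A sketch of the creation-date variant in Python, assuming your tool exposes the unchangeable creation time as a `datetime`. The `/archives/` prefix only mirrors the example above, and `creation_date_tag_uri` and `find_collisions` are hypothetical helpers; the second one checks for the failure mode just mentioned, two entries created in the same second.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List


def creation_date_tag_uri(domain: str, created: datetime) -> str:
    """Build a tag: URI from an entry's unchangeable creation time."""
    # Spaces are not allowed, so squash the timestamp into YYYYMMDDhhmmss.
    stamp = created.strftime("%Y%m%d%H%M%S")
    return f"tag:{domain},{created:%Y-%m-%d}:/archives/{stamp}"


def find_collisions(created_times: Iterable[datetime]) -> List[datetime]:
    """Return creation times (to the second) shared by more than one entry."""
    # The ID only keeps whole seconds, so compare at that resolution.
    counts = Counter(t.replace(microsecond=0) for t in created_times)
    return [t for t, n in counts.items() if n > 1]
```

With the 2004-05-27 11:03:47 creation time from the example, `creation_date_tag_uri("diveintomark.org", ...)` returns the same tag:diveintomark.org,2004-05-27:/archives/20040527110347 shown above.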
If the same entry appears in two different feeds, it must have the same ID in both places. This is not an exception to the “globally unique” rule; it’s an integral part of it. An entry’s ID is the key for that entry across all time and space. If the same entry appears in two places, it must have the same ID in both places — otherwise it’s not really the same entry.
How could this happen?
In the case of multiple feeds produced by the same site, you just need to make sure that the way you are constructing your Atom IDs will generate the same ID in both places. Make sure the ID is not based on the URL of the feed in which it appears, or the category name, or some other data that is different between the feeds in which the entry appears.
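A sketch of what that means for a feed generator, using Python’s xml.etree.ElementTree and assuming the Atom 1.0 namespace (adjust for the Atom version you publish). The point is that `entry_element` receives only the entry’s own stored data, never the feed it is being written into, so the main feed and a category feed cannot disagree about the ID. The dictionary fields and helper names are hypothetical, and feed-level elements are left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "http://www.w3.org/2005/Atom"


def entry_element(entry: Dict[str, str]) -> ET.Element:
    """Serialize one entry; its <id> comes from stored entry data only."""
    el = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(el, f"{{{ATOM}}}id").text = entry["atom_id"]
    ET.SubElement(el, f"{{{ATOM}}}title").text = entry["title"]
    ET.SubElement(el, f"{{{ATOM}}}link", rel="alternate", href=entry["permalink"])
    return el


def build_feed(entries: List[Dict[str, str]]) -> ET.Element:
    """Every feed (main, per-category, ...) reuses the same entry serializer."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    for entry in entries:
        feed.append(entry_element(entry))
    return feed
```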
In the case of sites that aggregate content from multiple sites, the aggregator script should preserve the original <id> element from the entries of each feed.
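For an aggregating site, preserving the ID can be as simple as copying each entry element unchanged. A minimal sketch with xml.etree.ElementTree, again assuming the Atom 1.0 namespace and leaving out the feed-level metadata a real aggregated feed needs; `merge_feeds` is a hypothetical name.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Iterable

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize Atom elements without an ns0: prefix


def merge_feeds(feed_documents: Iterable[str]) -> ET.Element:
    """Combine entries from several Atom feeds, keeping each original <id>."""
    merged = ET.Element(f"{{{ATOM}}}feed")
    for doc in feed_documents:
        source = ET.fromstring(doc)
        for entry in source.findall(f"{{{ATOM}}}entry"):
            # Copy the entry verbatim; its <id> is never regenerated here.
            merged.append(copy.deepcopy(entry))
    return merged
```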
How aggregators deal with duplicate entries is entirely up to them. If an entry appears in two feeds, and you’re subscribed to both feeds, some aggregators may display it in both places but mark it as “read” in both feeds as soon as you read it in either one. The behavior of client-side software is entirely up to the client-side developer. The only thing <id> does is try to give developers the ability to make those decisions without complicated and error-prone heuristics.
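As an illustration of the kind of decision <id> makes easy, here is a hypothetical client-side sketch in Python (not how any particular aggregator works) that keys read state on the entry ID rather than on the feed, so reading an entry once marks it read wherever it appears.

```python
from typing import Set


class ReadTracker:
    """Hypothetical read-state store keyed on Atom entry IDs."""

    def __init__(self) -> None:
        self._read: Set[str] = set()

    def mark_read(self, entry_id: str) -> None:
        self._read.add(entry_id)

    def is_read(self, feed_url: str, entry_id: str) -> bool:
        # feed_url is deliberately ignored: the same ID means the same entry,
        # no matter which subscribed feed it showed up in.
        return entry_id in self._read
```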
An Atom ID is an unchanging, globally unique URI. All parts of that are important. If an entry’s ID changes over time, that defeats the purpose. If you’re reusing IDs for different entries, that really defeats the purpose. There are several techniques for constructing unchanging, globally unique URIs, and you should use whichever one is easiest for you.
© 2001–present Mark Pilgrim