You can make your web pages robust now!
Download our free, Open Source software
(on Netscape, click on the link
with the alternative mouse button and select Save Link As).
It's implemented in Java 2 v1.3, so it runs most everywhere,
even on Linux now.
A more up-to-date version,
but without the initial word cache, has been bundled with the
Multivalent Browser.
To use this version, replace "java -jar Robust.jar ..."
with "java -classpath .../Multivalent.jar util.RobustHyperlink ...".
Use it to automatically make all your web pages robust. It can rewrite all the HTML pages on your web site, including your bookmarks, to make HTML A HREF URLs robust without touching the formatting of the rest of the text.
Until web browsers fully support Robust Hyperlinks, you can recover from 404 "page not found" errors by manually feeding the signature into a search engine of your choice. For instance, consider the following Robust Hyperlink:
http://http.cs.berkeley.edu/~phelps/Robust/?lexical-signature=transclusions+html+http+chaotically+reattachment
Let's assume that the page moves to another site on the World Wide Web, so that the address-based portion of the URL fails. Soon, web browsers will automatically take you to the new location, but for the near term, take the signature words "transclusions html http chaotically reattachment" and give them to a web search engine. Assuming the page has been indexed at its new location, the search engine will report that new location.
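The manual lookup can also be scripted. The following is a minimal sketch, not part of Robust.jar: it pulls the lexical-signature parameter out of a Robust Hyperlink and appends the words to a search engine's query URL. The class name SignatureQuery, its method names, and the search engine address https://search.example/?q= are assumptions made for illustration, and the code relies on library calls from Java 10 or later rather than the Java 2 v1.3 our software targets.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Robust Hyperlinks distribution.
class SignatureQuery {
    static final String PARAM = "lexical-signature=";

    /** Returns the signature words separated by spaces, or "" if the URL has none. */
    static String signatureWords(String robustUrl) {
        String query = URI.create(robustUrl).getRawQuery();
        if (query == null) return "";
        for (String pair : query.split("&")) {
            if (pair.startsWith(PARAM)) {
                // In a query string, '+' stands for a space.
                return URLDecoder.decode(pair.substring(PARAM.length()), StandardCharsets.UTF_8);
            }
        }
        return "";
    }

    /** Appends the re-encoded signature to a search engine's query prefix. */
    static String searchUrl(String engineBase, String robustUrl) {
        return engineBase + URLEncoder.encode(signatureWords(robustUrl), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String link = "http://http.cs.berkeley.edu/~phelps/Robust/"
                + "?lexical-signature=transclusions+html+http+chaotically+reattachment";
        // "https://search.example/?q=" is a placeholder, not a real search engine.
        System.out.println(searchUrl("https://search.example/?q=", link));
        // prints https://search.example/?q=transclusions+html+http+chaotically+reattachment
    }
}
```

Substitute the query prefix of whichever search engine you prefer. Whether the moved page is found still depends on that engine having indexed it at its new location.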
Add a bookmark to Netscape, Microsoft Internet Explorer, or another JavaScript-compatible browser so that when a Robust Hyperlink breaks, you can click on the bookmark to perform robust fixing. (Presumably, future browsers will build this in. You need to have JavaScript active. Due to technical details of the browsers, links broken by a disappearing host, which is rare, cannot be handled this way. Only pages that disappear from a still-valid host can.)
One-click installation: To install in your browser, simply right-click on the following link: Fix Robust 404, and select "Add Bookmark" in Netscape, "Add to Favorites..." in MSIE, or "Add Link Document to Hot List" in Opera. (On Netscape, we recommend filing the link in the "Personal Toolbar Folder", as Netscape apparently has a bug following this link from the pulldown Bookmarks menu.)
One-click invocation: Now try it out. After adding the bookmark, try clicking on the following broken robust hyperlink to the UCB CS Home Page. When you get the Location Not Found page, click on your new bookmark, then choose a search engine with which to perform the signature resolution.
Download the Java source code. I'm eager to hear about bugs and feature suggestions (write to phelps (at) cs.berkeley.edu), and I hope it provides a useful base for implementations in other languages and integration into other software.
Project ideas: GUI to our software, integration into Mozilla.
Usage:
java -jar Robust.jar [<options>] [<URL>] [<filename>]
with <options> described below.
Given a lone URL (no <filename>), report signature of <URL> and, optionally (with the -check option), check efficacy by looking up signature in various web search engines.
Otherwise, rewrite the single file <filename> or all files ending in ".html" or ".htm" in the directory <filename> and its subdirectories. <URL> gives the HTTP URL corresponding to <filename>, and is used to resolve links relative to the local site root (as in "http:/images/img.png"). If the HTML contains no such links, as in a bookmarks file, <URL> may be omitted, but do so at your own risk. If you'd like, you can make a copy or revision control checkpoint of the file/directory tree first. Rewriting is done to a temporary file, which is renamed to the original filename when the process completes, so you can interrupt the process safely.
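The write-to-temporary-file-then-rename approach can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the code in Robust.jar: the class SafeRewrite, its method names, and the transform argument (standing in for the step that makes links robust) are hypothetical, and the sketch uses java.nio.file from Java 7 or later. The file is read and written as ISO-8859-1 here only because that encoding round-trips every byte unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the Robust Hyperlinks distribution.
class SafeRewrite {
    /** Lists the regular files under root (or root itself) ending in ".html" or ".htm". */
    static List<Path> htmlFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.endsWith(".html") || name.endsWith(".htm");
                    })
                    .collect(Collectors.toList());
        }
    }

    /** Writes the transformed text to a temporary file, then renames it over the original. */
    static void rewriteInPlace(Path file, UnaryOperator<String> transform) throws IOException {
        String original = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        // Create the temporary file next to the original so the rename stays on one file system.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "robust", ".tmp");
        try {
            Files.write(tmp, transform.apply(original).getBytes(StandardCharsets.ISO_8859_1));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // Removes the temporary file if the rewrite failed before the rename.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the original is replaced only by the final rename, an interruption before that point leaves the original file as it was. At worst a stray temporary file remains in the directory.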
Command-line options