Software

You can make your web pages robust now! Download our free, Open Source software (on Netscape, click on the link with the alternative mouse button and select Save Link As). It's implemented in Java 2 v1.3, so it runs almost everywhere, now even on Linux. A more up-to-date version, but without the initial word cache, has been bundled with the Multivalent Browser. To use that version, replace "java -jar Robust.jar ..." with "java -classpath .../Multivalent.jar util.RobustHyperlink ...".

Use it to automatically make all your web pages robust. It can rewrite all the HTML pages on your web site, including your bookmarks, to make HTML A HREF URLs robust without touching the formatting of the rest of the text.

Until web browsers fully support Robust Hyperlinks, you can recover from 404 page not found errors by manually feeding the signature into a search engine of your choice. For instance, consider the following Robust Hyperlink:

http://http.cs.berkeley.edu/~phelps/Robust/?lexical-signature=transclusions+html+http+chaotically+reattachment
Let's assume that the page moves to another site on the World Wide Web and so the address-based portion of the URL fails. Soon, web browsers will automatically take you to the new location, but for the near term, take the signature words "transclusions html http chaotically reattachment" and give them to a web search engine. Assuming the page has been indexed at its new location, the search engine will report that new location.
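The manual recovery step above can be sketched in Java: pull the signature words out of the "lexical-signature" query parameter, then build a query URL for a search engine. This is only an illustrative sketch, not code from the Robust Hyperlinks distribution. The class name SignatureQuery, its method names, and the search engine address https://search.example/?q= are assumptions made up for the example, and it uses Java 10 or later for the Charset overloads of URLDecoder and URLEncoder.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Robust.jar.
class SignatureQuery {
    static final String PARAM = "lexical-signature=";

    /** Returns the signature words separated by spaces, or null if the URL has none. */
    static String extractSignature(String robustUrl) {
        String query = URI.create(robustUrl).getRawQuery();
        if (query == null) return null;
        for (String part : query.split("&")) {
            if (part.startsWith(PARAM)) {
                // '+' in a query string decodes to a space
                return URLDecoder.decode(part.substring(PARAM.length()), StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    /** Appends the encoded signature to a search engine's query prefix (an assumed URL shape). */
    static String searchUrl(String engineQueryPrefix, String signature) {
        return engineQueryPrefix + URLEncoder.encode(signature, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String link = "http://http.cs.berkeley.edu/~phelps/Robust/"
                + "?lexical-signature=transclusions+html+http+chaotically+reattachment";
        String signature = extractSignature(link);
        System.out.println(signature);
        System.out.println(searchUrl("https://search.example/?q=", signature));
    }
}
```

Any real search engine has its own query syntax, so the prefix passed to searchUrl would need to be adapted to it.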

One-Click Browser Installation

Add a bookmark to Netscape, Microsoft Internet Explorer, or another JavaScript-compatible browser so that when a Robust Hyperlink breaks, you can click on the bookmark to perform robust fixing. (Presumably, future browsers will build this in. You need to have JavaScript active. Due to technical details of the browsers, links broken by a disappearing host, which is rare, cannot be handled this way; links broken by a disappearing page on a still-valid host can.)

One-click installation: To install in your browser, simply right-click on the following link: Fix Robust 404, and select "Add Bookmark" in Netscape, "Add to Favorites..." in MSIE, or "Add Link Document to Hot List" in Opera. (On Netscape, we recommend filing the link in the "Personal Toolbar Folder", as Netscape apparently has a bug following this link from the pulldown Bookmarks menu.)

One-click invocation: Now try it out. After adding the bookmark, try clicking on the following broken robust hyperlink to the UCB CS Home Page. When you get the Location Not Found page, click on your new bookmark, then choose a search engine with which to perform the signature resolution.

Source Code

Download the Java source code. I'm eager to hear about bugs and feature suggestions (write to phelps (at) cs.berkeley.edu), and I hope it provides a useful base for implementations in other languages and integration into other software.

Wanted!

We're looking for a web search engine that better supports Robust Hyperlinks, with easy-to-parse word frequency information, easy-to-parse search results, search results that report pre-computed signatures as well, and queries that do not require all words in the signature to be found in returned pages but that strongly bias ranking toward such pages.
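To make the last wish concrete, here is a minimal Java sketch of a score that does not require every signature word to appear in a page but rises with each word that does, so that pages containing the full signature sort first. It is purely illustrative and does not describe how any existing search engine ranks results; the class name SignatureRanking, the method coverage, and the simple word-splitting rule are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical scoring helper, not part of Robust.jar or of any search engine.
class SignatureRanking {
    /** Fraction of the signature's words that occur in the page text, from 0.0 to 1.0. */
    static double coverage(String signature, String pageText) {
        if (signature.trim().isEmpty()) return 0.0;
        // Crude tokenizer (an assumption): lowercase, split on anything but a-z and 0-9.
        Set<String> pageWords = new HashSet<>(
                Arrays.asList(pageText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")));
        String[] signatureWords = signature.toLowerCase(Locale.ROOT).trim().split("\\s+");
        int hits = 0;
        for (String word : signatureWords) {
            if (pageWords.contains(word)) hits++;
        }
        return (double) hits / signatureWords.length;
    }

    public static void main(String[] args) {
        String signature = "transclusions html http chaotically reattachment";
        System.out.println(coverage(signature, "All about HTML and HTTP transclusions"));
    }
}
```

Sorting candidate pages by this score in descending order keeps partial matches in the results while placing pages that contain the whole signature at the top.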

Project ideas: GUI to our software, integration into Mozilla.

Documentation

Make your web pages robust automatically! Invoke this class as a Java application (see below) under Java 2 v1.3 to rewrite your web pages, making all HREF URLs robust and leaving inaccessible URLs and all other text untouched. Or compute the signature of a non-local web page. Make your bookmarks robust, and you will be able to find those interesting pages after they move to a different site. If you're a webmaster, make your site robust now, then add this to your production process so that new HTML pages are made robust before being put on the web server.

Usage:

java -jar Robust.jar [<options>] [<URL>] [<filename>]
with <options> described below.

Given a lone URL (no <filename>), report the signature of <URL> and, optionally (with the -check option), check its efficacy by looking up the signature in various web search engines.

Otherwise, rewrite the single file <filename> or all files ending in ".html" or ".htm" in the directory <filename> and its subdirectories. <URL> gives the HTTP URL corresponding to <filename>, and is used to resolve links relative to the local site root (as in "http:/images/img.png"). If the HTML contains no such links, as in a bookmarks file, <URL> may be omitted, but do so at your own risk. If you'd like, you can make a copy or revision control checkpoint of the file/directory tree first. Each file is rewritten to a temporary file, which is renamed to the original filename only when rewriting completes, so you can interrupt the process safely.
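The file selection and the write-to-temporary-then-rename pattern described above can be sketched as follows. This is not the code in Robust.jar, which targets Java 2 v1.3; it is a minimal illustration using the newer java.nio.file API. The class name SafeRewrite, its method names, the "robust" temporary-file prefix, and the assumption that files are UTF-8 text are all made up for the example, and the transform argument merely stands in for the real link rewriting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Robust.jar.
class SafeRewrite {
    /** Collects files ending in ".html" or ".htm" under root, including subdirectories. */
    static List<Path> htmlFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.endsWith(".html") || name.endsWith(".htm");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Writes the transformed text to a temporary file, then renames it over the original. */
    static void rewrite(Path file, UnaryOperator<String> transform) throws IOException {
        // Assumes UTF-8 content; real pages may use other encodings.
        String original = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        // The temporary file lives in the same directory so the final step is a rename.
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "robust", ".tmp");
        try {
            Files.write(tmp, transform.apply(original).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up if the transform or write failed.
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the original file is replaced only in the last step, an interruption before that point leaves it unchanged, which is the property the paragraph above relies on.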

Command-line options