BackRub Frequently Asked Questions

If your question is not answered here, please email backrub@pcd.stanford.edu or, if you prefer, call (415) 723-3154 and ask for Larry.

1) Why is BackRub asking for a file called robots.txt which isn't on my server?

robots.txt is a file that can tell BackRub not to download some or all of the information on your web server. BackRub asks for it to find out whether you have set any such restrictions. For information on how to create a robots.txt file, see http://info.webcrawler.com/mak/projects/robots/norobots.html.

2) I don't want BackRub visiting my site or part of my site.

There is a standard for robot exclusion at http://info.webcrawler.com/mak/projects/robots/norobots.html. You can put a file on your server called robots.txt which can exclude BackRub or other "web crawlers". BackRub has a user-agent of "BackRub".
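
As a rough illustration of what such a file might look like, the Python sketch below embeds an example robots.txt and checks it with the standard library's urllib.robotparser. It assumes you want to keep BackRub out of a hypothetical /private/ directory while leaving other robots unrestricted. The path, the host www.example.com, and the helper name is_allowed are made up for the example, and this parser is not necessarily how BackRub itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt contents. The /private/ path is hypothetical;
# "BackRub" is the user-agent name given in the answer above.
ROBOTS_TXT = """\
User-agent: BackRub
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # BackRub is excluded from /private/ but may fetch everything else.
    print(is_allowed("BackRub", "http://www.example.com/private/notes.html"))
    print(is_allowed("BackRub", "http://www.example.com/index.html"))
    # Other robots fall under the "*" record, which disallows nothing.
    print(is_allowed("SomeOtherRobot", "http://www.example.com/private/notes.html"))
```

To exclude BackRub from the whole server instead, the first record's rule would be "Disallow: /".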

3) Why is BackRub trying to download incorrect links from my server, or from a server that doesn't exist?

It is a property of the web that many links will be broken or outdated at any given time. Whenever anyone in the world types a link incorrectly that points to your site, or fails to update their pages to reflect changes in your server, BackRub will try to download an incorrect link from your site. Also, this is why you may get hits on a machine that is not even a web server.

4) Why is BackRub downloading information from our "Secret" web server?

It is almost impossible to keep a web server secret by not publishing any links to it. As soon as someone follows a link from your "secret" server to another web server, it is likely that your "secret" URL is sent in the Referer header of the request, and it can get stored and possibly published by the other web server in its referer log. So, if there is a link to your "secret" web server or page anywhere on the web, it is likely that BackRub and other "web crawlers" will find it.

5) I have a robots.txt file. Why isn't BackRub obeying it?

In order to save bandwidth, BackRub only downloads the robots.txt file every week or so, so it may take a while for BackRub to learn of any changes you have made to your robots.txt file. BackRub is also distributed on several machines, each of which keeps its own record of your robots.txt file, so different machines may pick up a change at different times. Finally, check that your syntax is correct against the standard at: http://info.webcrawler.com/mak/projects/robots/norobots.html.
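
To make the delay concrete, here is a minimal Python sketch of a per-machine cache that re-downloads robots.txt only once its copy is older than about a week. It is not BackRub's actual code: the class name RobotsCache, the exact seven-day interval, and the injected fetch and clock functions are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple

# "Every week or so"; the exact interval is an assumption.
WEEK_SECONDS = 7 * 24 * 60 * 60


class RobotsCache:
    """Cache of robots.txt contents held by one crawler machine."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        max_age: float = WEEK_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch      # hypothetical function: host -> robots.txt text
        self._max_age = max_age  # seconds before a cached copy is considered stale
        self._clock = clock      # injectable so the behaviour can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, host: str) -> str:
        """Return the cached robots.txt for host, re-fetching it if stale."""
        now = self._clock()
        cached = self._entries.get(host)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]  # still fresh: changes on the server are not seen yet
        text = self._fetch(host)
        self._entries[host] = (now, text)
        return text
```

Under these assumptions, a change to robots.txt goes unnoticed by a given machine until that machine's cached copy expires, and two machines that fetched the file on different days will notice the change on different days.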

6) How do I register my site with BackRub so it will be indexed?

There is no way to register; BackRub will find your site eventually.

7) Why are there hits from multiple machines at .stanford.edu all with user-agent BackRub?

BackRub was designed to be distributed on several machines, in order to improve performance and to scale as the web grows. Also, in order to cut down on bandwidth usage, we would like to run many crawlers on machines that are close, in network terms, to the sites they are indexing.

8) Your logo is upside down: Why is the light source obviously below the image? It looks quite unnatural...

The logo is simply a scan of my hand from a flatbed scanner, converted to black and white. The "back" in the picture is the scanner cover, and the shadows are from the scanner light.

For more answers, see the Robots FAQ.