Media Coverage
Library of Congress Acquires Prelinger Collection
PC World: Best of Today's Web: Greatest Hits and Hidden Gems
Village Voice: "Other People's Property – Academics Square Off Against Hollywood on Internet Content"
Gannett News Service: "Archive site preserves earliest Web pages"
CNET: "Web know-it-all goes where you won't"
International Herald Tribune: "Go Wayback"
Online: "The Wayback Machine: The Web's Archive"
Spiegel Online: "Website of the Year: Wayback Machine"
Yahoo Internet Life: "Top of the Net 2001"
Technology Review: "Things That Matter: Living Memories"

Removing Documents From the Wayback Machine

The Internet Archive is not interested in offering access to Web sites or other Internet documents whose authors do not want their materials in the collection. To remove your site from the Wayback Machine, place a robots.txt file at the top level of your site (e.g. www.yourdomain.com/robots.txt) and then submit your site below.

The robots.txt file will do two things:

  1. It will remove all documents from your domain from the Wayback Machine.
  2. It will tell us not to crawl your site in the future.

To exclude the Internet Archive's crawler (and remove documents from the Wayback Machine) while allowing all other robots to crawl your site, your robots.txt file should say:

User-agent: ia_archiver
Disallow: /
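
Before submitting your site, it can help to confirm that the rules say what you intend. The following is a minimal sketch, assuming Python's standard `urllib.robotparser` module as a stand-in for a crawler's rule matching; it is not the Archive's own software, and the helper name `is_allowed` and the robot name `SomeOtherBot` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line exclusion shown above, held as a string so no network access is needed.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.yourdomain.com/index.html"
    # ia_archiver is named in the file; every other robot falls through to "allowed".
    print("ia_archiver allowed: ", is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

With these rules the check should report that `ia_archiver` is blocked everywhere on the site while a robot the file does not name is still allowed.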

Robots.txt is the most widely used method for controlling the behavior of automated robots on your site; all major robots, including those of Google and Alta Vista, respect these exclusions. It can be used to block access to the whole domain, or to any file or directory within it. Many resources for webmasters and site owners describe this method and how to use it.
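
To illustrate blocking a single directory or file instead of the whole domain, here is a rough sketch that again uses Python's standard `urllib.robotparser`. The paths `/private/` and `/drafts/notes.html` and the helper `blocked_paths` are hypothetical, and the matching shown is the parser's prefix matching, which is only an approximation of how any particular crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one directory and one file for ia_archiver only.
PARTIAL_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /private/
Disallow: /drafts/notes.html
"""


def blocked_paths(robots_txt: str, user_agent: str, paths: list[str]) -> list[str]:
    """Return the subset of paths that user_agent may not fetch (illustrative helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    base = "http://www.yourdomain.com"
    return [p for p in paths if not parser.can_fetch(user_agent, base + p)]


if __name__ == "__main__":
    candidates = ["/index.html", "/private/report.html", "/drafts/notes.html"]
    print(blocked_paths(PARTIAL_ROBOTS_TXT, "ia_archiver", candidates))
```

A `Disallow` line matches by path prefix, so `/private/` covers everything beneath that directory while `/index.html` stays reachable.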

Once your robots.txt file is in place, submit your site (www.yourdomain.com) using the form at http://pages.alexa.com/help/webmasters/index.html#crawl_site.

The robots.txt file must be placed at the root of your domain (www.yourdomain.com/robots.txt). If you cannot put a robots.txt file up, submit a request to wayback2@archive.org.
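
As a small illustration of what "the root of your domain" means, the sketch below derives the expected robots.txt address from any page address on a site. The function name `robots_url` is hypothetical, and defaulting to `http://` when no scheme is given is an assumption made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site: str) -> str:
    """Return the root-level robots.txt URL for a site or page address."""
    # Assumption: addresses given without a scheme are treated as http.
    if "://" not in site:
        site = "http://" + site
    parts = urlsplit(site)
    # Keep only scheme and host; the path is always /robots.txt at the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("www.yourdomain.com"))
    print(robots_url("http://www.yourdomain.com/some/dir/page.html"))
```

Both calls should yield `http://www.yourdomain.com/robots.txt`. A file placed in a subdirectory, such as `/some/dir/robots.txt`, is not at the root and does not meet the requirement above.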