Recommendations for Managing Removal Requests And Preserving Archival Integrity
School of Information Management and Systems, U.C. Berkeley
December 13 - 14, 2002
Online archives and digital libraries collect
and preserve publicly available Internet documents for the future use of
historians, researchers, scholars, and the general public. These
archives and digital libraries strive to operate as trusted
repositories for these materials, and work to make their collections as
comprehensive as possible.
At times, however, authors and publishers may
request that their documents not be included in publicly available
archives or web collections. To comply with
such requests, archivists may restrict access to or remove that portion
of their collections with or without notice as outlined below.
Because issues of integrity and removal are
complex, and archivists generally wish to respond in a transparent
manner, these policy recommendations have been developed with the help
and advice of representatives of the Electronic Frontier Foundation,
Chilling Effects, the Council on Library and Information Resources, the
Berkeley Boalt School of Law, and various other commercial and
non-commercial organizations, through a meeting held by the Archive
Policy Special Interest Group (SIG), an ad hoc, informal group of
persons interested in the practice of digital archiving.
In addition, these guidelines have been
informed by the American Library Association's Library Bill of Rights
(http://www.ala.org/work/freedom/lbr.html), the Society of American
Archivists' Code of Ethics
(http://www.archivists.org/governance/handbook/app_ethics.asp), and the
International Federation of Library Associations' Internet Manifesto
(http://www.unesco.org/webworld/news/2002/ifla_manifesto.rtf), as well
as applicable law.
Historically, removal requests
fall into one of the following categories. Archivists
who wish to adopt this policy will respond according to the following
guidelines:
Type of removal request: Request by a webmaster of a private
(non-governmental) web site, typically for reasons of privacy,
defamation, or embarrassment.
Response:
1. Archivists should provide a 'self-service' approach that site owners
can use to remove their materials, based on the use of the robots.txt
standard.
2. Requesters may be asked to substantiate their claim of ownership by
changing or adding a robots.txt file on their site.
3. This allows archivists to ensure that material will no longer be
gathered or made available.
4. These requests will not be made public; however, archivists should
retain copies of all removal requests.

Type of removal request: Third-party removal requests based on the
Digital Millennium Copyright Act of 1998 (DMCA).
Response:
1. Archivists should attempt to verify the validity of the claim by
checking whether the original pages have been taken down and, if
appropriate, requesting the ruling(s) regarding the original site.
2. If the claim appears valid, archivists should comply.
3. Archivists will strive to make DMCA requests public via Chilling
Effects, and notify searchers when requested pages have been removed.
4. Archivists will notify the webmaster of the affected site, generally
via email.

Type of removal request: Third-party removal requests based on non-DMCA
intellectual property claims (including trademark and trade secret).
Response:
1. Archivists will attempt to verify the validity of the claim by
checking whether the original pages have been taken down and, if
appropriate, requesting the ruling(s) regarding the original site.
2. If the original pages have been removed and the archivist has
determined that removal from public servers is appropriate, then the
archivists will remove the pages from their public servers.
3. Archivists will strive to make these requests public via Chilling
Effects, and notify searchers when requested pages have been removed.
4. Archivists will notify the webmaster of the affected site, generally
via email.

Type of removal request: Third-party removal requests based on
objection to controversial content (e.g. political, religious, or other
beliefs).
Response: As noted in the Library Bill of Rights, 'Libraries should
provide materials and information presenting all points of view on
current and historical issues. Materials should not be proscribed or
removed because of partisan or doctrinal disapproval.' Therefore,
archivists should not generally act on these requests.

Type of removal request: Third-party removal requests based on
objection to disclosure of personal data provided in confidence.
Response: Occasionally, data disclosed in confidence by one party to
another may eventually be made public by a third party. For example,
medical information provided in confidence is occasionally made public
when insurance companies or medical practices shut down. These requests
are generally treated as requests by authors or publishers of original
data.

Type of removal request: Requests by governments.
Response: Archivists will exercise best-efforts compliance with
applicable court orders. Beyond that, as noted in the Library Bill of
Rights, 'Libraries should challenge censorship in the fulfillment of
their responsibility to provide information and enlightenment.'

Type of removal request: Other requests and grievances, including
underlying rights issues, error correction and version control, and
re-insertion of web sites based on change of ownership.
Response: These are handled on a case-by-case basis by the archive and
its advisors.
To remove a site from the
Wayback Machine, place a robots.txt file at the top level of your site
(e.g. www.yourdomain.com/robots.txt) and then submit your site as
described below.
The robots.txt file will do two
things:
1. It will remove all documents from your
domain from the Wayback Machine.
2. It will tell the Internet Archive's
crawler not to crawl your site in the future.
To exclude the Internet
Archive's crawler (and remove documents from the Wayback Machine) while
allowing all other robots to crawl your site, your robots.txt file
should say:
User-agent: ia_archiver
Disallow: /
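The exclusion need not cover the whole domain. Since robots.txt rules
can be scoped to a path, a file like the following sketch (the
/private/ directory is a hypothetical example) would block and remove
only that portion of the site for the Internet Archive's crawler, while
leaving the rest of the site, and all other robots, unaffected:

```
User-agent: ia_archiver
Disallow: /private/
```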
Robots.txt is the most widely
used method for controlling the behavior of automated robots on your
site (all major robots, including those of Google, AltaVista, etc.,
respect these exclusions). It can be used to block access to the whole
domain, or to any file or directory within it. There are many
resources for webmasters and site owners describing this method and how
to use it. Here are a few:
o http://www.global-positioning.com/robots_text_file/index.html
o http://www.webtoolcentral.com/webmaster/tools/robots_txt_file_generator
o http://pageresource.com/zine/robotstxt.htm
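The exclusion rules described above can also be checked
programmatically. Below is a minimal sketch using Python's
standard-library robots.txt parser; the rules string mirrors the
ia_archiver example above, and the example.com URLs are placeholders
(in practice you would point the parser at your own site's
/robots.txt).

```python
# Sketch: verify that a robots.txt exclusion blocks the Internet
# Archive's crawler ("ia_archiver") while leaving other robots alone.
from urllib import robotparser

# The same rules shown above; a real check would fetch
# http://www.yourdomain.com/robots.txt instead of using a literal.
RULES = """\
User-agent: ia_archiver
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# ia_archiver is disallowed everywhere; other user-agents fall through
# to the default (allowed) because no other record matches them.
print(parser.can_fetch("ia_archiver", "http://www.example.com/"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.com/"))    # True
```

A check like this is a quick way to confirm that an exclusion was
written correctly before submitting the site for removal.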
Once you have put a robots.txt
file up, submit your site (www.yourdomain.com) on the form on http://pages.alexa.com/help/webmasters/index.html#crawl_site.
The robots.txt file must be
placed at the root of your domain (www.yourdomain.com/robots.txt). If
you cannot put a robots.txt file up, submit a request to wayback2@archive.org.
For further information, please
contact jeff - at - archive - dot - org.