Recommendations for Managing Removal Requests And Preserving Archival Integrity
School of Information Management and Systems, U.C. Berkeley
December 13 - 14, 2002
Online archives and digital libraries collect
and preserve publicly available Internet documents for the future use of
historians, researchers, scholars, and the general public. These
archives and digital libraries strive to operate as trusted
repositories for these materials, and work to make their collections as
comprehensive as possible.
At times, however, authors and publishers may
request that their documents not be included in publicly available
archives or web collections. To comply with
such requests, archivists may restrict access to or remove that portion
of their collections with or without notice as outlined below.
Because issues of integrity and removal are
complex, and archivists generally wish to respond in a transparent
manner, these policy recommendations have been developed with the help
and advice of representatives of the Electronic Frontier Foundation,
Chilling Effects, the Council on Library and Information Resources, the
Berkeley Boalt School of Law, and various other commercial and
non-commercial organizations, through a meeting held by the Archive
Policy Special Interest Group (SIG), an ad hoc, informal group of
persons interested in the practice of digital archiving.
In addition, these guidelines have been
informed by the American Library Association's Library Bill of Rights
(http://www.ala.org/work/freedom/lbr.html), the Society of American
Archivists' Code of Ethics
(http://www.archivists.org/governance/handbook/app_ethics.asp), and the
International Federation of Library Associations' Internet Manifesto
(http://www.unesco.org/webworld/news/2002/ifla_manifesto.rtf), as well
as applicable law.
Historically, removal requests
fall into one of the following categories. Archivists
who wish to adopt this policy will respond according to the following
guidelines:
Type of removal request: Request by a webmaster of a private
(non-governmental) web site, typically for reasons of privacy,
defamation, or embarrassment.
Response:
1. Archivists should provide a 'self-service' approach that site owners
can use to remove their materials, based on the use of the robots.txt
standard.
2. Requesters may be asked to substantiate their claim of ownership by
changing or adding a robots.txt file on their site.
3. This allows archivists to ensure that material will no longer be
gathered or made available.
4. These requests will not be made public; however, archivists should
retain copies of all removal requests.

Type of removal request: Third-party removal requests based on the
Digital Millennium Copyright Act of 1998 (DMCA).
Response:
1. Archivists should attempt to verify the validity of the claim by
checking whether the original pages have been taken down and, if
appropriate, requesting the ruling(s) regarding the original site.
2. If the claim appears valid, archivists should comply.
3. Archivists will strive to make DMCA requests public via Chilling
Effects, and notify searchers when requested pages have been removed.
4. Archivists will notify the webmaster of the affected site, generally
via email.

Type of removal request: Third-party removal requests based on non-DMCA
intellectual property claims (including trademark and trade secret).
Response:
1. Archivists will attempt to verify the validity of the claim by
checking whether the original pages have been taken down and, if
appropriate, requesting the ruling(s) regarding the original site.
2. If the original pages have been removed and the archivist has
determined that removal from public servers is appropriate, then the
archivists will remove the pages from their public servers.
3. Archivists will strive to make these requests public via Chilling
Effects, and notify searchers when requested pages have been removed.
4. Archivists will notify the webmaster of the affected site, generally
via email.

Type of removal request: Third-party removal requests based on
objection to controversial content (e.g. political, religious, or other
beliefs).
Response: As noted in the Library Bill of Rights, 'Libraries should
provide materials and information presenting all points of view on
current and historical issues. Materials should not be proscribed or
removed because of partisan or doctrinal disapproval.' Therefore,
archivists should not generally act on these requests.

Type of removal request: Third-party removal requests based on
objection to disclosure of personal data provided in confidence.
Response: Occasionally, data disclosed in confidence by one party to
another may eventually be made public by a third party. For example,
medical information provided in confidence is occasionally made public
when insurance companies or medical practices shut down. These requests
are generally treated as requests by authors or publishers of original
data.

Type of removal request: Requests by governments.
Response: Archivists will exercise best-efforts compliance with
applicable court orders. Beyond that, as noted in the Library Bill of
Rights, 'Libraries should challenge censorship in the fulfillment of
their responsibility to provide information and enlightenment.'

Type of removal request: Other requests and grievances, including
underlying rights issues, error correction and version control, and
re-insertion of web sites based on change of ownership.
Response: These are handled on a case-by-case basis by the archive and
its advisors.
To remove a site from the
Wayback Machine, place a robots.txt file at the top level of your site
(e.g. www.yourdomain.com/robots.txt) and then submit your site as
described below.
The robots.txt file will do two
things:
1. It will remove all documents from your
domain from the Wayback Machine.
2. It will tell the Internet Archive's
crawler not to crawl your site in the future.
To exclude the Internet
Archive's crawler (and remove documents from the Wayback Machine) while
allowing all other robots to crawl your site, your robots.txt file
should say:
User-agent: ia_archiver
Disallow: /
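The exclusion need not cover the whole domain. Since robots.txt rules
can be scoped to a path, a file like the following sketch (the
/private/ directory is a hypothetical example) would block and remove
only that portion of the site for the Internet Archive's crawler, while
leaving the rest of the site, and all other robots, unaffected:

```
User-agent: ia_archiver
Disallow: /private/
```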
Robots.txt is the most widely
used method for controlling the behavior of automated robots on your
site (all major robots, including those of Google, AltaVista, etc.,
respect these exclusions). It can be used to block access to the whole
domain, or to any file or directory within it. There are many
resources for webmasters and site owners describing this method and how
to use it. Here are a few:
o http://www.global-positioning.com/robots_text_file/index.html
o http://www.webtoolcentral.com/webmaster/tools/robots_txt_file_generator
o http://pageresource.com/zine/robotstxt.htm
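The exclusion rules described above can also be checked
programmatically. Below is a minimal sketch using Python's
standard-library robots.txt parser; the rules string mirrors the
ia_archiver example above, and the example.com URLs are placeholders
(in practice you would point the parser at your own site's
/robots.txt).

```python
# Sketch: verify that a robots.txt exclusion blocks the Internet
# Archive's crawler ("ia_archiver") while leaving other robots alone.
from urllib import robotparser

# The same rules shown above; a real check would fetch
# http://www.yourdomain.com/robots.txt instead of using a literal.
RULES = """\
User-agent: ia_archiver
Disallow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# ia_archiver is disallowed everywhere; other user-agents fall through
# to the default (allowed) because no other record matches them.
print(parser.can_fetch("ia_archiver", "http://www.example.com/"))  # False
print(parser.can_fetch("Googlebot", "http://www.example.com/"))    # True
```

A check like this is a quick way to confirm that an exclusion was
written correctly before submitting the site for removal.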
Once you have put a robots.txt
file up, submit your site (www.yourdomain.com) on the form on http://pages.alexa.com/help/webmasters/index.html#crawl_site.
The robots.txt file must be
placed at the root of your domain (www.yourdomain.com/robots.txt). If
you cannot put a robots.txt file up, submit a request to wayback2@archive.org.
For further information, please
contact jeff - at - archive - dot - org.