The world's largest crawl and a massive
archive are available
Imagine the entire contents of the World
Wide Web... on disk.
The Alexa crawl gives you the ability to
tap the world's largest crawl index.
Spanning seven years, filling over 500 Terabytes of online
storage and expanding at a rate of 30 Terabytes per month,
the Alexa archive represents the largest collection of Web
information in the world today.
Largest Crawl, Refreshed Every Two Months
Compare Alexa with the largest search indexes
and you'll see that Alexa is the largest: over 3.5 billion unique
URLs and 3 billion unique pages, all updated every 60 days.
All this can be yours.
To explore information that is ten times the size of the Library
of Congress, Alexa has developed a proprietary operating system
and a powerful set of data mining tools that leverage excess
processing capacity on hundreds of parallel computers.
Specialized collections of web data may be developed on request
and, on a subscription basis, updated up to several times
per day. A collection can serve
as a one-off research snapshot or as a continuously
updated resource for archivists and search engines.
Access Alexa's massive crawl of the web
in one of the following ways:
Alexa, in partnership with the Internet Archive,
offers free access to an archive of Alexa's crawl, going back
to 1996, via the Internet
Archive Wayback Machine. This unique service, the first
of its kind, provides public access to over 10 billion archived
web pages.
Special Collection - Hosted
Specialized archive collections can be made for a reasonable
cost. Working with you, Alexa would generate and maintain
a custom index of web content, accessible through a web interface.
This service is perfect for archivists or historians who would
like to create a special collection of web documents available
via the web. Example: September 11th Archive, commissioned
by the Library of Congress.
Special Collection - Portable
When having a copy of the crawl at your
location is the only option, Portable is for you. Alexa generates
a special collection of archived documents, places it on disk
and ships it to your location. Collections may be as small
as a few hundred web pages or as large as several billion,
depending on your needs.
Web - Hosted
Alexa's entire crawl of the web can be made available
to you on a subscription basis with access to Alexa's specialized
set of data mining tools. This product provides the maximum
performance, access and update frequency.
Web - Portable
For organizations capable of hosting or
mining an entire crawl index that exceeds 60 Terabytes in
size, Alexa can ship the contents of the crawl to your location.
Current customers include the Internet
Archive and the Library
of Alexandria in Egypt.
Frequently Asked Questions
Q: How large is the crawl?
A: Very, very large. The crawl
is over 60 Terabytes in size, spanning over 3.5 billion
unique URLs. That is larger than Google's index and approximately
4 times larger than AltaVista's published size.
Q: How often is the crawl updated?
A: The web-wide crawl takes
approximately 2 months to complete. Special collections
may be created on request and updated as often as needed.
Alexa Business Development
fax: (415) 561-6795
Presidio of San Francisco
PO Box 29141
San Francisco, CA