Once TWA took possession of the DC-1, it did not take long for them to realize they had a unique airplane. TWA received the first of their DC-2s on May 14, 1934, with the delivery of ship #301. American Airlines and other airlines all wanted the new DC-2.
The DC-3 overwhelmed the industry. It was the first plane that could fly from New York to Chicago non-stop. American Airlines, United Airlines and TWA all used the Douglas Sleeper Transport (DST), the 14-passenger version of the DC-3.
In July 1939, the C-47 was on the drawing board. With war about to break out in Europe, Douglas was swamped with orders for the C-47. As a stopgap measure, until the C-47 was flying, Douglas engineers modified the DC-2. They joined a DC-2 fuselage to a DC-3 tail, added more powerful engines and called it the C-39.
The Navy had 100 of their R4Ds converted to R4D-8 (later the C-117D) at $300,000 each.
By the late 1940s, the airlines were losing money on the DC-3. The question was how long the airlines could wait before replacing it. Many reasoned the DC-3 had to wear out soon; after all, it was more than 20 years old. In addition, another pressing problem forced Douglas to look for a DC-3 replacement.
Donald Douglas reformed his Davis-Douglas Company in 1921, calling it the Douglas Company. In 1928 the Douglas Company became the Douglas Aircraft Company, and Donald W. Douglas served as president until 1957, when he became Chairman and Chief Executive Officer.
United Airlines had approached Douglas in 1935 to start development of the DC-4. In 1936, it became a cooperative project among five airlines (UAL, EAL, TWA, PAA, and AA).
From the beginning, the intention was to make the DC-4, “Skymaster,” a different plane. It took advantage of the requirements generated by the success of the DC-3. The public wanted larger and faster equipment, so Douglas invested three million dollars in the DC-4, their first four-engine, 42-passenger (30-berth) commercial airliner.
Douglas realized that for airlines to be profitable they would need a variety of aircraft sizes and capabilities to service routes of various lengths and passenger densities. The DC-3 would serve the medium-range routes and the DC-4, under development, would relieve the DC-3 on the transcontinental routes. To fill the gap in the short-haul routes serving the small, out-of-the-way communities, Douglas developed the DC-5.
Established 1994. The Oldest Douglas DC-3 web site in the world. License Agreement No. BMC 01-TM-047 with The Boeing Company. The DC-3/Dakota Historical Society is a Service Disabled Veteran Owned Organization. ©1996-2010 DC-3/Dakota Historical Society, or respective copyright holders. All Rights Reserved. No part of this website may be reproduced without permission.