A census of the world's go-playing population claims that 1 in every 222 people on the planet can play the game. In what appears to be a careful collation of published sources over the past few years, the site http://www.fin.ne.jp/~igo/census.htm (in Japanese only) says there are 26,902,220 players out of a world population of 6,000 million.
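The headline ratio can be checked from the two figures quoted above. The following is only a rough sketch: the function name `one_in_n` is made up for illustration, and the population is the rounded "6,000 million" given in the text, not a figure taken from the census site itself.

```python
def one_in_n(players: int, population: int) -> float:
    """Return N such that roughly 1 person in N is a player."""
    if players <= 0:
        raise ValueError("players must be positive")
    return population / players


PLAYERS = 26_902_220        # world total quoted from the census
POPULATION = 6_000_000_000  # "6,000 million" as quoted (a rounded figure)

ratio = one_in_n(PLAYERS, POPULATION)
print(f"about 1 in {ratio:.1f}")

# Population that would make the ratio exactly 1 in 222 (an inference, not a census figure).
implied_population = 222 * PLAYERS
print(f"{implied_population:,}")
```

With the rounded population the ratio comes out at about 1 in 223, so the quoted 1 in 222 presumably rests on a world population figure slightly below 6,000 million.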
The site does not say so, but I am certain that the work done, and the interest in this topic in Japan, are related to Japan's efforts to win Olympic status for go. For that reason it is important that they do not overstate their case. Indeed, the figures seem credible partly because they are so modest about Japan itself. A total of 12 million players has often been bandied about in the past; wiser heads have trimmed this to 6 million. Yet the current census puts the Japanese total at just 3 million.
China leads the way with 10 million players, but tiny South Korea has far and away the densest go population with 9 million. Since many blocks of flats have their own go teachers for children after school, and there is massive pride in the achievements of world number one Yi Ch'ang-ho, this does not seem too surprising. Despite the recent financial turbulence in Korea, it is from the ranks of Korean companies that the newest go sponsors are emerging.
In contrast, North Korea is said to have just 2,000 players. Taiwan, another home of professional go, has 600,000.
The total go population in Asia is put at 22,062,000.
Western Europe has the next highest total - 150,000. Germany leads with 46,000, Britain has 35,000, the Netherlands 30,000 (the Korea of Europe!) and France 20,000. Since the membership of the British Go Association is well below the 1,000 mark, and regular tournament players are only a fraction of the membership, the figure of 35,000 needs to be justified.
However, if it is recalled that the best-selling go book, Penguin's "Go for Beginners", sold 30,000 copies, and there have been many other beginners' books, some at least selling over 10,000, the estimate of 35,000 can hold water if it refers to people who can play go.
North America has 127,000 players. Eastern Europe has 119,000 (Russia 80,000, Ukraine 20,000), and South America 30,080 (almost all - 30,000 - in Brazil, which has a large expatriate Japanese community).
Australia accounts for three-quarters of Oceania's go players. The Middle East has 100 players, while Africa is said to have just 40 players, which sounds too low even for the Republic of South Africa alone - not to mention a small colony of internet players discovered in Kenya recently. But it is certainly of the right order of magnitude.
Another figure important to the Japanese is the number of countries participating in the World Amateur Championships. The figure has been growing rapidly in recent years and is now over 50. Several new countries will be appearing in the next event in Sendai in summer 2000.
There is no breakdown of go population by the sexes, but I would expect the highest density of women by far to be in Japan. As to age groups, Japan is almost certainly ageing, whereas Korea is almost all teeny-boppers or under - ensuring they will remain a force for years to come.