Measuring the Web's diameter
Researchers say Internet follows a type of ecology:
Average distance between two points is 19 clicks
By Alan Boyle
Sept. 8, 1999 - Just how wide is the World Wide Web? A statistical survey has measured the Web's diameter, finding that there's an average of 19 clicks separating random Internet sites. The findings have implications for the future of Web searching as the global network grows.
ALTHOUGH THE INTERNET is 30 years old, the protocol that made the World Wide Web was created only in 1990. But even in that short time, the Web has taken on an organic life of its own, as evidenced by two studies appearing in the Sept. 9 issue of the journal Nature. Both studies found that the Web's growth dynamics and its topology (that is, the way it's put together) follow what's known in physics as a power law.
"The Web doesn't look anything like we expected it to be," said Notre Dame physicist Albert-Laszlo Barabasi, who along with two colleagues studied the Web's topology. A power-law distribution means that the Web doesn't follow the usual mathematical models of random networks, but instead exhibits the type of physical order found in, say, magnetic fields, galaxies and plant growth.
Barabasi said that although the average Web page has seven links to other pages, there is a very, very high number of Web pages that have a huge number of connections, far higher than the researchers anticipated based on traditional mathematical models.
SHAPE OF THE WEB
The power-law connection means it's possible to figure out the shape of the World Wide Web, even if you can't precisely map out every site and page on the network. Barabasi and his colleagues studied the distribution of links on a variety of sites (at Notre Dame, at South Korea's Seoul National University, at the White House, at Yahoo) and found that there was a consistent relationship between size and connectedness.
That relationship can be used to determine the average shortest path between two points in a network; that is, the diameter. Thus, if you accept the estimate that there are 800 million documents on the Web, you come up with an average distance of 19 links between two randomly selected points.
It's something like the movie "Six Degrees of Separation" (or the Hollywood name game known as "Six Degrees of Kevin Bacon"): the idea that everybody on Earth is connected to each other through six intermediate steps.
If you picked out any two random Web pages, they might be linked directly to each other, or it might take hundreds of intermediate clicks to get from one page to the other. But if you went through that exercise thousands of times and tallied all those clicks, the findings indicate that the average would be roughly 19. With a chuckle, Barabasi says it would be just fine to think of it as "19 Clicks of Web Separation."
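The sampling exercise described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' method: the five-page `toy_web` and the helper names `bfs_distance` and `average_clicks` are hypothetical, and a breadth-first search stands in for "the fewest clicks" between two pages.

```python
import random
from collections import deque


def bfs_distance(links, start, goal):
    """Return the fewest clicks from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def average_clicks(links, samples, rng):
    """Average click distance over random ordered pairs that are connected."""
    pages = list(links)
    total = 0
    found = 0
    for _ in range(samples):
        a, b = rng.sample(pages, 2)  # two distinct pages, order matters
        d = bfs_distance(links, a, b)
        if d is not None:  # skip pairs with no path at all
            total += d
            found += 1
    return total / found if found else None


# Hypothetical toy web: each page maps to the pages it links to.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["D"],
    "D": ["A"],
    "E": ["A"],  # nothing links back to E
}

avg = average_clicks(toy_web, samples=200, rng=random.Random(0))
print(f"average clicks between connected pairs: {avg:.2f}")
```

Links on the Web point one way, so the distance from one page to another can differ from the distance back, and some pairs have no path at all; the sketch simply leaves those pairs out of the average.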
MORE THAN A GAME
But the power-law findings aren't just a game: The researchers say that studying online topology is crucial in developing search algorithms or designing strategies for making information widely accessible on the Web. Even if the size of the Web mushrooms to 10 times its current size, the degrees of separation would only rise slightly, from 19 to 21. That is likely to increase the reliance on intelligent search techniques that can adroitly skip from site to site, seeking out the most relevant or most popular sites within the Web behemoth.
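The slow rise from 19 to 21 is what a logarithmic relationship between size and distance would produce. As a rough check, the sketch below uses the fit d = 0.35 + 2.06 log10(N) reported in the Nature paper by Albert, Jeong and Barabasi. That formula is not quoted in this article, so the coefficients should be read as an assumption brought in from outside the text, and the function name `estimated_diameter` is hypothetical.

```python
import math


def estimated_diameter(num_documents):
    """Average clicks between two pages, using an assumed logarithmic fit."""
    # Assumption: d = 0.35 + 2.06 * log10(N), with N the number of documents.
    return 0.35 + 2.06 * math.log10(num_documents)


today = estimated_diameter(800_000_000)      # the article's 800 million documents
tenfold = estimated_diameter(8_000_000_000)  # a Web 10 times that size

print(f"800 million documents: about {today:.1f} clicks")
print(f"10 times larger:       about {tenfold:.1f} clicks")
```

Under this fit, every tenfold increase in the number of documents adds the same fixed amount, about two clicks, no matter how large the Web already is.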
Search-engine companies already are relying increasingly on such techniques, said Danny Sullivan, editor of Search Engine Watch. He said that's the rationale behind search sites such as Google, which ranks its results by link importance ... DirectHit, which bases its analysis on what people are clicking on ... and Inktomi, which is looking at what people are actually viewing.
"Yahoo's success with a hand-picked link database is actually another example of why we can't use (just) brute force," Sullivan said. "One of the reasons why Yahoo is so popular is because human beings actually do a pretty good job of picking out the best sites."
"If I give you a whole bunch of needles (in the proverbial haystack) and you want to pick out the best one, you need to go on something more than the fact that they're all needles," he said.
NEC researcher Steve Lawrence, who has done his own statistical surveys of the Web, said the new findings could help the designers of future search tools.
"There's an opportunity for intelligent agents that take starting points from search engines and follow the links to go find what the user is after," he said. The 19-click finding could also provide a ballpark estimate for how deeply a Web crawler needs to dig, he said.
Barabasi's colleagues in the topology study are Reka Albert and Hawoong Jeong of Notre Dame.
THE WEB'S GROWTH DYNAMICS
Another study in Nature, looking at the growth dynamics of the global network, confirms the idea that the World Wide Web follows natural laws and can be studied as an ecology of knowledge.
"There is order hidden in the Web," said Bernardo Huberman and Lada Adamic of the Xerox Palo Alto Research Center.
They found that a power law was at work in the distribution of Web pages: a diminishingly small proportion of sites had an increasingly large page count. The proportions appeared to hold steady over various samples of the Web. For example, if data were collected from 250,000 Web sites, the probability of finding a site with a million pages would be 1 in 10,000, the researchers said.
The latest study by Huberman and Adamic is part of a series showing that site size, like site traffic, is distributed unequally: A small number of sites are responsible for a disproportionately large part of the Web's volume and activity.
Huberman said the growth of the Web was subject to two dynamics: the fact that the total number of sites is growing exponentially, and the fact that the fluctuations in the size of a particular site are proportional to the size of the site.
"The more pages a site has, the more likely it is that more pages will be added to it," he said. "It's just like the growth of a tree."
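The two dynamics Huberman describes can be tried out in a small simulation. The sketch below is a toy model, not the one in the Nature study: the function name `simulate_site_sizes`, the 8 percent growth in new sites per step and the fluctuation range of -20 to +30 percent are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_site_sizes(steps=60, site_growth=1.08, seed=0):
    """Toy model: exponentially more sites, each fluctuating in proportion to its size."""
    rng = random.Random(seed)
    sizes = []
    for t in range(steps):
        # Dynamic 1: the number of new sites grows exponentially over time.
        births = int(site_growth ** t)
        sizes.extend([1.0] * births)
        # Dynamic 2: each site changes by a random fraction of its current size.
        sizes = [s * (1.0 + rng.uniform(-0.2, 0.3)) for s in sizes]
    return sizes


sizes = simulate_site_sizes()
print(f"sites simulated: {len(sizes)}")
print(f"median size:     {statistics.median(sizes):.2f}")
print(f"mean size:       {statistics.mean(sizes):.2f}")
print(f"largest site:    {max(sizes):.2f}")
```

In a run like this, most sites are young and stay small, while a handful of old sites compound their growth, which pulls the mean above the median. That lopsided shape is the kind of unequal distribution the study reports, although a toy run of this size cannot confirm a power law.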
Like a tree, the total size of the Web will eventually become subject to resource limitations, Huberman acknowledged. But in his view, the current Web is still just a sapling, with plenty of potential for continued exponential growth.
"I think that we might end up in an era where, just as people today have their own e-mail addresses, people will have their own Web sites," he said. "But eventually it will taper off. Eventually it has to be self-limiting."