How to count URLs

Wait a minute, how can we say we've got the biggest database? Don't all those other services say theirs are bigger?

Sometimes it can be tricky to sort out the conflicting claims. We think that the best way to figure out which service has the biggest database is to actually measure each index. But we'd also like to shed some light on the different ways that the various navigation services measure the size of their index.

Method 1: Count only URLs where the page's actual content has been indexed. (Used by Excite.)
The sites that use this method report only the web pages that have actually been retrieved and indexed. This means that every URL has been visited, and more importantly, it means that users can search on each and every word on every page in the index. (Excite has retrieved the full text of 50 million URLs -- that's more pages than any other site.)

Method 2: Count URL tags listed in HTML documents, even if the page itself is never retrieved or indexed. (Used by Lycos.)
Almost every document on the web has a few hyperlinks in it -- after all, that's what the web is all about. Services such as Lycos base their index size not on the number of documents retrieved, but rather on the total number of links that they've seen, even if they never get around to actually retrieving the page that the link points to. It's a little like claiming you've read War and Peace when you've really only looked at the cover. The title certainly doesn't tell you about any of the characters, the plot, the setting, or anything else. To get that information, you have to actually read the book. It's the same way on the web; to get information about a page, you have to actually download the full text -- but that isn't the number that Lycos reports. If they did report that number, then their database size would probably be more like 6 million, not the 50 million that they claim. Here's another way of looking at this: we're both saying that our size is 50 million, but we're measuring in gallons and they're measuring in pints. Clearly 50 million gallons is bigger than 50 million pints...

Method 3: Count any URLs retrieved and indexed, even if you've seen them more than once. (Used by OpenText.)
Some services count the number of URLs that they actually retrieve, but they keep counting the same page again and again each time they see a link to it. That can be misleading -- thousands of sites point to http://www.excite.com/, but it's still just a single page, and it should be indexed as such.
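
To make the difference between the three methods concrete, here is a minimal sketch in Python that applies each one to the same tiny, made-up crawl. Everything in it is an assumption for illustration: the function names, the example.com URLs, and the reading of Method 2 as "every distinct URL seen in a link or retrieved". It is not the code that any of the services named above actually runs.

```python
def count_indexed_pages(fetched):
    """Method 1: distinct URLs whose content was actually retrieved."""
    return len(set(fetched))


def count_known_urls(fetched, linked):
    """Method 2: every distinct URL seen, whether retrieved or merely linked to."""
    return len(set(fetched) | set(linked))


def count_retrievals(fetched):
    """Method 3: every retrieval, counting a page again each time it is fetched."""
    return len(fetched)


# Hypothetical crawl log: one entry per retrieval, in the order it happened.
fetched = [
    "http://www.excite.com/",
    "http://a.example.com/",
    "http://www.excite.com/",   # reached again through a second link
    "http://b.example.com/",
    "http://www.excite.com/",   # and again through a third
]

# Hypothetical link targets found in the retrieved pages.
linked = [
    "http://www.excite.com/",
    "http://a.example.com/",
    "http://b.example.com/",
    "http://c.example.com/",    # never retrieved
    "http://d.example.com/",    # never retrieved
    "http://e.example.com/",    # never retrieved
]

print("Method 1:", count_indexed_pages(fetched))        # Method 1: 3
print("Method 2:", count_known_urls(fetched, linked))   # Method 2: 6
print("Method 3:", count_retrievals(fetched))           # Method 3: 5
```

The same crawl yields three different "sizes", yet only three pages in it can be searched word for word.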

So why does Excite use the first method? We believe it gives the most accurate picture of how much useful information our index contains. The other two methods are misleading, and only confuse users who are trying to figure out which sites give them the best coverage.

But don't take our word for it. To really see which index has the most documents, try measuring each one. We've described the method we used to compare index sizes, and of course we've also given a detailed report of our findings.

If you have questions about this or need clarification, send us some email.



Copyright (c) 1996 Excite Inc.