Robots, Spiders and Wanderers:
    Finding Information on the Web

Early spiders

One of the earliest progenitors of today’s spiders was first made available in March 1994. Created by Oliver McBryan, the World Wide Web Worm (WWWW), as it was called, was designed to index only HTML document titles and headers. At the height of its popularity, the WWWW held over 100,000 Web documents in its database. Another spider from these early stages of Web robot evolution, created by Jonathan Fletcher, is known as Jumpstation.

The design of these primordial crawlers advanced the indexing process through automation, but it was not very effective because it limited retrieval to titles alone. In practice, such retrieval programs were found unable to index up to 20% of Web sites, because these sites did not contain the optional HTML title element. In addition, titles do not always accurately reflect the content of a document. To address this inadequacy, the Repository Based Software Engineering (RBSE) Spider was designed to index Web documents by content as well as by title. Its arrival on the Web scene in February 1994 was followed closely by other content/title indexers, such as Brian Pinkerton’s WebCrawler and Michael Mauldin’s Lycos. Integrating a content search with a title search made the Web robot a more effective tool, one better able to respond to the needs of users. This malleability would be tested again as additional design flaws left users wondering whether robots were more of a hindrance than an aid to retrieval.

Created by E. Hernandez for LIBR 500: Foundations of Information Technology.  Last Updated April 10, 2001.