  Search Engine Spiders

Before a search engine can tell you where a file or document is, it must first find it. To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders, to build lists of the words found on Web sites. A spider is an automated program that locates and collects data from web pages for inclusion in the search engine's database, and that follows links to find new pages on the World Wide Web.
When a spider is building its lists, the process is called Web crawling. In order to build and maintain a useful list of words, a search engine's spiders have to look at a lot of pages.
Crawler-based search engines have three major elements. First is the spider, also called the crawler. The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled." The spider returns to the site on a regular basis, such as every month or two, to look for changes.
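
The visit, read, and follow-links cycle described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine's spider is built: the names `LinkExtractor`, `crawl`, and `FAKE_SITE` are made up for the example, and the pages are held in memory rather than fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns {url: html}."""
    site = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page missing or unreachable
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            # Stay within the site, and never queue a page twice.
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page site used in place of real HTTP requests.
FAKE_SITE = {
    "http://example.com/": (
        '<a href="/about.htm">About</a> '
        '<a href="http://other.example/">Elsewhere</a>'
    ),
    "http://example.com/about.htm": (
        '<a href="/">Home</a> <a href="faq.htm">FAQ</a>'
    ),
    "http://example.com/faq.htm": "<p>No links here.</p>",
}

crawled = crawl("http://example.com/", FAKE_SITE.get)
print(sorted(crawled))
```

The `fetch` argument is any function that returns a page's HTML or `None`, so the same loop could be pointed at a real downloader. The link to `other.example` is skipped because this sketch follows only links within the starting site.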

Everything the spider finds goes into the second part of the search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with new information.

Sometimes it can take a while for new pages or changes that the spider finds to be added to the index. Thus, a web page may have been "spidered" but not yet "indexed." Until it is indexed -- added to the index -- it is not available to those searching with the search engine.
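
A toy version of the index makes the difference between "spidered" and "indexed" concrete. The sketch below is an assumption-laden illustration, not a description of a real engine: the `Index` class, its `add` and `lookup` methods, and the example URLs are all invented here, and tags are stripped with a crude regular expression.

```python
import re
from collections import defaultdict

TAG = re.compile(r"<[^>]+>")
WORD = re.compile(r"[a-z0-9]+")


def words_in(html):
    """Strip tags crudely and return the lowercase words on the page."""
    return WORD.findall(TAG.sub(" ", html).lower())


class Index:
    def __init__(self):
        self.copies = {}                  # url -> stored copy of the page
        self.postings = defaultdict(set)  # word -> urls containing it

    def add(self, url, html):
        """Add a page, or replace the stored copy if the page changed."""
        if url in self.copies:
            for word in set(words_in(self.copies[url])):
                self.postings[word].discard(url)
        self.copies[url] = html
        for word in words_in(html):
            self.postings[word].add(url)

    def lookup(self, word):
        """Return the urls of indexed pages that contain the word."""
        return set(self.postings.get(word.lower(), set()))


index = Index()
index.add("http://example.com/", "<h1>Spider basics</h1>")

# This page has been spidered, but its copy has not reached the index yet.
pending = {"http://example.com/new.htm": "<p>Fresh crawler news</p>"}
before = index.lookup("crawler")

for url, html in pending.items():
    index.add(url, html)
after = index.lookup("crawler")

# The home page changes, so its entry is replaced with the new copy.
index.add("http://example.com/", "<h1>Robot basics</h1>")
print(before, after, index.lookup("spider"), index.lookup("robot"))
```

Until the pending page is added, a lookup for "crawler" finds nothing, even though the spider has already collected the page.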

Search engine software is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and ranks them in the order it believes is most relevant.
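
As a rough illustration of matching and ranking, the sketch below returns the pages that contain every search word and orders them by how often those words appear. Counting occurrences is an assumption chosen for simplicity; it is not the ranking method of any actual search engine, and the `search` function and `DOCS` sample data are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, documents):
    """Return urls containing every query word, most occurrences first."""
    terms = tokenize(query)
    scored = []
    for url, text in documents.items():
        counts = Counter(tokenize(text))
        if terms and all(counts[term] > 0 for term in terms):
            score = sum(counts[term] for term in terms)
            scored.append((score, url))
    # Higher score first; ties are broken alphabetically by url.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


# Hypothetical indexed pages, already reduced to plain text.
DOCS = {
    "http://example.com/a.htm": "spider spider crawler",
    "http://example.com/b.htm": "a spider follows links",
    "http://example.com/c.htm": "nothing relevant here",
}

results = search("spider", DOCS)
print(results)
```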
 

Search engine spider identification

The following is a basic listing of search engine spider names and their "owners". This is by no means complete, as there are many thousands of search engines on the Internet, but it covers the more common beneficial spiders.

Spider name              Spider owner
Googlebot                Google.com
TeomaAgent               Teoma.com
Zyborg                   Wisenut.com
Gulliver                 NorthernLight.com
Architext spider         Excite.com
FAST-WebCrawler          FAST (AllTheWeb.com)
Slurp                    Inktomi.com
Yahoo Slurp              Yahoo Web Search
Ask Jeeves               AskJeeves.com
ia_archiver              Alexa.com
Scooter                  AltaVista.com
Mercator                 AltaVista.com
crawler@fast             FAST (AllTheWeb.com)
Crawler                  Crawler.de
InfoSeek sidewinder      InfoSeek.com
Lycos_Spider_(T-Rex)     Lycos.com
Fluffy the Spider        SearchHippo.com
Ultraseek                InfoSeek.com
MantraAgent              LookSmart.com
Moget                    Goo.jp
T-H-U-N-D-E-R-S-T-O-N-E  Thunderstone.com
MuscatFerret             Euroferret.com
VoilaBot                 Voila.fr
Sleek Spider             Search-info.com
KIT_Fireball             FireBall.de
WebCrawler               Webcrawler.com
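
One way to put a listing like this to use is to look for the spider names in the User-Agent header recorded in a web server log. The sketch below is a minimal illustration using a handful of names from the list. The `identify_spider` function and the sample header strings are assumptions made for the example, real headers vary in format, and a header can be set to any value by the visitor, so a match is a hint rather than proof.

```python
# A few spider names from the listing above, mapped to their owners.
SPIDERS = {
    "Googlebot": "Google.com",
    "Yahoo Slurp": "Yahoo Web Search",
    "Slurp": "Inktomi.com",
    "ia_archiver": "Alexa.com",
    "Scooter": "AltaVista.com",
    "FAST-WebCrawler": "FAST (AllTheWeb.com)",
    "WebCrawler": "Webcrawler.com",
}

# Longest names first, so "FAST-WebCrawler" wins over "WebCrawler".
_BY_LENGTH = sorted(SPIDERS, key=len, reverse=True)


def identify_spider(user_agent):
    """Return (name, owner) for a listed spider named in the header, else None."""
    lowered = user_agent.lower()
    for name in _BY_LENGTH:
        if name.lower() in lowered:
            return name, SPIDERS[name]
    return None


# Illustrative header values, not copied from real logs.
samples = [
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "FAST-WebCrawler/3.8",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]
for header in samples:
    print(header, "->", identify_spider(header))
```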

 

 




copyright © 2005 myseoadvisor.com. All rights reserved.