About the Author:

Kevin Broccoli is a professional indexer who has created indexes for several Web sites.

His company, Broccoli Information Management, offers indexing and related services for the Web and other media.


Guest Article:

Indexes: An Old Tool for a New Medium

By Kevin Broccoli
November 17, 1998


Search Frustration

"I know it's here but I just can't find it!"

You've probably heard that exclamation in a variety of situations.

Today, however, it seems that people often experience this kind of frustration when trying to locate specific information within HTML documents. This is especially true of "content-rich" Web sites.

Search-Engine Shortcomings

Perhaps you've had the following experience: You visit a Web site hoping to find information about a particular topic. You type a keyword or two into the site's search engine. What do you find? Nothing! The search engine says, "0 results have been found for your search."

So you try once more, this time using a different search term than before. Now you do get some results – but too many. The search engine now says, "47 documents have been retrieved." That’s more than you wanted or expected.

Still, you start looking through those documents one at a time. After several hours spent scanning many pages of text, you discover that only 4 of those 47 documents contain the information you sought.

Exasperated, you wonder why so much information was presented to you, when so very little of it met your needs. You also wonder what might have happened had you not submitted that precise search term into the search engine.

The root of the problem lies in how search engines perform searches. Put simply, they scan text looking for occurrences of whatever word you typed into the search box. Then, they list every single document that contains even the merest mention of the word.
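The behavior described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular search engine is built; the function name naive_search and the sample pages are invented for the example.

```python
def naive_search(documents, term):
    """Return the title of every document that mentions `term` at all.

    This is plain substring matching: no judgment about whether the
    mention is substantial or merely in passing.
    """
    term = term.lower()
    return [title for title, text in documents.items() if term in text.lower()]


# Hypothetical sample pages, invented for illustration.
documents = {
    "Feeding your dog": "Good nutrition keeps a dog healthy. Choose food with care.",
    "Grooming basics": "Brush the coat weekly. Do not groom right after food.",
    "Training tips": "Reward good behavior with praise.",
}

print(naive_search(documents, "food"))  # ['Feeding your dog', 'Grooming basics']
print(naive_search(documents, "diet"))  # []
```

The grooming page is returned for "food" even though it mentions the word only in passing, while "diet" returns nothing because no page happens to use that exact word.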

What Makes a Good Index

The Internet is a relatively new medium, but you can learn a lot about how to make online content work well from the "parent" of online media: print media.

Most printed reference or nonfiction books offer an index of some kind. An index is not a blind, mechanical catalog of words. Rather, it is created by an indexer.

Indexers are trained to analyze concepts. An indexer will physically read every page of a book and develop a list of page references that lead to information on various topics, individuals, or places covered in that book.

The goal of an index is to direct readers to pertinent information on each topic listed, rather than passing mentions. This requires the indexer to make many judgement calls – that is, to consider context as well as content.

Indexers also categorize concepts – they break down main subject headings into subtopics, in a hierarchical format. This structure helps readers "narrow" their search.

A well-written index assumes that the reader may not know specific terms used in the text. Therefore, an indexer will use a thesaurus to create index entries that are synonyms of the terms used within the text. This ensures that even if readers don’t know the exact words used in a text, they still will be directed to pages that discuss the topic sought.

A well-written index also lists topics that are implied, rather than stated directly in the text. Consider the example of a book about dogs that does not include a section devoted to canine food or nutrition – but that does discuss (in various places) the importance of feeding a dog properly, and also what vitamins and minerals are essential to canine health.

It is likely that readers would turn to this book seeking information on dog food or nutrition, so the book's index should include the terms "nutrition" and "food," with references to relevant pages.
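One way to picture the dog-book example is as a small lookup table in which several reader terms lead to the same pages. The Python sketch below is an assumption made for illustration: the page numbers, the synonym list, and the names index, see, and look_up are all hypothetical, and a real index is the product of an indexer's judgment rather than a script.

```python
# Hand-built index: each heading points to the pages an indexer judged
# relevant. The page numbers are invented for this example.
index = {
    "nutrition": [12, 47, 103],
}

# "See" references: words a reader might try, mapped to the heading used.
see = {
    "food": "nutrition",
    "diet": "nutrition",
    "feeding": "nutrition",
}


def look_up(term):
    """Return page references for a term, following a "see" reference if any."""
    term = term.lower()
    heading = see.get(term, term)
    return index.get(heading, [])


print(look_up("food"))       # [12, 47, 103]
print(look_up("Nutrition"))  # [12, 47, 103]
print(look_up("leashes"))    # []
```

A reader who looks up "food" reaches the same pages as one who looks up "nutrition", even though the book has no section with either title.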

Web Indexes vs. Book Indexes

Indexes obviously are useful and appropriate for books. However, they also can work well for Web sites. A Web site index offers the same benefits over a search engine that a book index offers over a concordance.

What’s a concordance? A concordance lists every single occurrence of each individual word of significance contained in a specific text. This is similar to the results produced by a search engine. If you look at a large concordance (such as Strong's Exhaustive Concordance of the Bible), you’ll see how many listings are possible for a single word. (For instance, in Strong’s Exhaustive, try looking up the word "king.") Therefore, for most purposes a concordance generally isn’t as useful as an index.
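To see why a concordance grows so quickly, consider this minimal Python sketch. It assumes a deliberately crude definition of a "word" (runs of the letters a to z) and a tiny, made-up stop-word list standing in for "words of no significance"; the sample text and the name build_concordance are likewise invented.

```python
import re
from collections import defaultdict

# Assumed stop-word list, far smaller than a real one would be.
STOP_WORDS = {"the", "and", "no"}


def build_concordance(pages):
    """Map each significant word to every (page, word position) where it occurs."""
    concordance = defaultdict(list)
    for page_number, text in pages.items():
        words = re.findall(r"[a-z]+", text.lower())
        for position, word in enumerate(words):
            if word not in STOP_WORDS:
                concordance[word].append((page_number, position))
    return concordance


# Hypothetical two-page text.
pages = {
    1: "The king spoke and the king listened.",
    2: "No king rules forever.",
}

concordance = build_concordance(pages)
print(concordance["king"])  # [(1, 1), (1, 5), (2, 1)]
```

Every occurrence is recorded, with no indication of which passage actually discusses the king at any length. That is the judgment an index adds.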

In some respects, the process of creating an index for a Web site is similar to creating an index for a book. For instance, a Web indexer will read through every page in the site, analyze the concepts discussed, and develop an index that lists the topics covered in the text.

One key difference between a book index and a Web index is hypertext.

In a Web index, the references listed can (and should) be live links that take the user directly to the relevant text in the site. Live links make a Web index not merely informative, but functional. Some examples of Web site indexes that utilize live links are:

Ideally, a Web site indexer should know how to modify the HTML code of Web pages, in order to create hyperlinks. Specifically, indexers should know how to create an "anchor" in the Web page where the text referenced in a particular index entry begins (if no anchor already exists at that location), and then make the index entry a live link to that anchor.
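As a rough illustration of the two pieces of markup involved, the Python sketch below generates an anchor and a matching index link. The function names, the file path, and the anchor name are hypothetical. It uses the name attribute on the a element, which is one common way to mark a link target; an id attribute on an existing element is another.

```python
from html import escape


def anchor_tag(name):
    """HTML to place in the page at the point where the referenced text begins."""
    return f'<a name="{escape(name, quote=True)}"></a>'


def index_link(label, page_url, name):
    """HTML for an index entry that links directly to that anchor."""
    href = escape(f"{page_url}#{name}", quote=True)
    return f'<a href="{href}">{escape(label)}</a>'


# Hypothetical page path and anchor name.
print(anchor_tag("nutrition"))
# <a name="nutrition"></a>
print(index_link("Nutrition", "dogs/care.html", "nutrition"))
# <a href="dogs/care.html#nutrition">Nutrition</a>
```

The first string goes into the content page; the second goes into the index page.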

Updating is an important issue for both print and online indexes. However, updating a Web index typically involves incremental maintenance. (Index updates for books are infrequent, major projects.)

Most Web sites evolve constantly – from minor modifications to small sections of text, to the addition or deletion of entire content sections. Also, existing content can be moved to a different page or directory within the site.

In order for a Web index to remain useful, it must keep pace with the site’s evolution. Few things are more frustrating to a user than broken or outdated links in a site’s own index.

Consequently, there should be regular, frequent communication between the site’s developers and the indexer. Whenever significant content is modified, moved, added, or deleted, the indexer should be informed. Then, the indexer should immediately update the index to reflect the current state of content on the site.
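Part of this upkeep can be supported by a simple check that every index entry still points at something. The Python sketch below is one possible approach under stated assumptions: index entries are (label, target) pairs, the site is available as a dictionary of page paths to HTML, and anchors are found with a crude pattern match rather than a real HTML parser. All names and sample data are hypothetical.

```python
import re


def anchors_in(html):
    """Collect values of name= and id= attributes found in a page's HTML.

    Crude by design: it also picks up name attributes that are not anchors.
    """
    return set(re.findall(r'\b(?:name|id)="([^"]+)"', html))


def broken_entries(index_entries, site_pages):
    """Return labels of index entries whose page or anchor no longer exists."""
    broken = []
    for label, target in index_entries:
        page, _, anchor = target.partition("#")
        html = site_pages.get(page)
        if html is None or (anchor and anchor not in anchors_in(html)):
            broken.append(label)
    return broken


# Hypothetical site contents and index entries.
site_pages = {
    "care.html": '<h2><a name="nutrition"></a>Feeding</h2>',
}
index_entries = [
    ("Nutrition", "care.html#nutrition"),
    ("Grooming", "care.html#grooming"),
    ("Training", "training.html"),
]

print(broken_entries(index_entries, site_pages))  # ['Grooming', 'Training']
```

A check like this only finds links that no longer resolve. It cannot tell whether the text at a surviving anchor still covers the topic, so it supplements the indexer's review rather than replacing it.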

Is It an Index – or Not?

A quick look around the Web reveals that the term "index" is much misunderstood by Web developers and publishers. In fact, most Web reference tools labeled "site index" are not indexes at all!

Most people know what an index is, from having used one in printed books. Therefore, when a visitor sees a link on your site that says "site index," he or she may click on that link expecting to encounter a real index. However, if that link leads to a different type of guide, it might cause confusion, frustration, or disappointment.

If the guide or reference tool you’ve created for your Web site is not a true index, it’s helpful to your visitors if you call it by its correct name.

The site guides and tools described below are not indexes, but they commonly are mislabeled as such. Examples of sites that have made this mistake also are listed:

  • A table of contents, even a very detailed one, is not an index. It is very common for a site’s table of contents to be mislabeled as a site index – in fact, it’s more common to see this mistake than to see true Web site indexes that are labeled correctly! A similar misunderstanding could lead to a site map being mislabeled as an index.
    See: Sears, Chase Manhattan Bank, and The Beer Info Source
  • A collection of links to related Web sites or other resources is not an index.
    See: Family Tree Maker

Sometimes it can be hard to tell whether a particular site guide is an index or some other kind of tool. For instance, at first glance the "index" of the Association for Health Services Research Web site appears to be a true index. It is ordered alphabetically, and some entries (such as "About AHSR") include subtopics.

However, this page is a sophisticated table of contents, not a true index. All of its entries directly reflect the site’s structure (how information is divided into sections and pages). The list is not really broken down by subject. For instance, while this list includes entries for "Job and Resume Binder Order Form" and "Career Center," there is no subject-based entry for "Jobs."

Not Every Site Needs an Index

Some types of Web sites would not benefit significantly from an index. For instance:

  • Online stores: These sites may be large, but since they usually have very little content (in the conventional sense) the only essential information retrieval tool is a search engine. is good example of this. There, users simply type in the title of the book or CD sought, or perhaps the author's or musician’s name, and they are led to a page featuring information about the book or CD.
  • Smaller sites: When a visitor can click through a site’s complete contents in a matter of minutes, an index would not add much value.

In contrast, many types of sites would serve their visitors better by offering an index. This is especially true of online magazines or other content-rich sites.

For example, 21st Century Online publishes articles by professionals in various disciplines. Although a reader can simply "drill down" through the current selection of articles on the site, this becomes increasingly difficult as more and more articles are published.

Even Hotwired (the online counterpart of Wired magazine) does not yet have a site index. However, an index would be especially helpful for finding specific information in this venue’s four years’ worth of archives.

Working with (or as) an Indexer

If you decide that your Web site needs an index, you then must decide whether to hire someone to create it, or whether to do it yourself.

If your site is very content-rich, you’re probably better off investing in hiring a professional indexer. This also could be a good decision for sites that are smaller or less complex, as long as the budget is available.

Remember: the goal of an index is to improve the usability of a Web site. Therefore, considering an indexer as a usability professional could help justify this investment.

However, if your site is not especially large, or if there is no budget to hire an indexer, or if you simply wish to learn a new skill, it is possible to teach yourself enough about the basics of indexing to attempt this project. A few resources that can help you learn how to create an index are:

  • "Organizing Your Site from A to Z: Creating an index for users who know what they’re doing"
    This article by Lou Rosenfeld, published in Web Review in Oct. 1997, covers the basics of what makes a good index. It also outlines a four-step process for creating a Web site index.
  • The Chicago Chapter of the American Society of Indexers has published an Index Evaluation Checklist, which can help you determine whether an index is appropriate and complete. While this document does not specifically address Web indexing, many of its points apply to Web indexes.

Indexing also can be a lucrative line of work. Although most available indexing work is for print media (books, etc.), indexes are becoming increasingly common in online and digital media (Web sites, Intranets, CD-ROMs, etc.). For writers, editors, producers, or Web developers, indexing can be one more valuable service to market to your clients.

The ASI is a good resource for people who seek to become professional indexers. This group’s indexing FAQ covers several key points about the "business side" of this field.


Whether your site has an index or not, or whether you learn to create indexes or not, learning about indexing can prove valuable to anyone who develops or uses Web sites.

Understanding indexes makes Web developers and publishers consider what their users would want to find, and how those searches could be simplified or aided. Similarly, Web users who understand the value of a good index can encourage Web publishers to add this key usability tool to their sites.

It’s even possible that, one day, indexes might be considered as indispensable to informational or content-rich Web sites as they are to printed reference books today.



(c)1998 by Kevin Broccoli. All rights reserved. Published in CONTENTIOUS with permission. Do not reproduce or redistribute any material from this document, in whole or in part, without written permission from the author.