An Old Tool for a New Medium
"I know it's here but I just can't find it!"
You've probably heard that exclamation in a variety of situations.
Today, however, it seems that people often experience this kind of frustration when
trying to locate specific information within HTML documents. This is especially true
concerning "content-rich" Web sites.
Perhaps you've had the following experience: You visit a Web site hoping to find
information about a particular topic. You type a keyword or two into the site's search
engine. What do you find? Nothing! The search engine says, "0 results have been found
for your search."
So you try once more, this time using a different search term than before. Now you do
get some results, but too many. The search engine now says, "47 documents have
been retrieved." That's more than you wanted or expected.
Still, you start looking through those documents one at a time. After several hours
spent scanning many pages of text, you discover that only 4 of those 47 documents
contain the information you sought.
Exasperated, you wonder why so much information was presented to you, when so very
little of it met your needs. You also wonder what might have happened had you not
submitted that precise search term into the search engine.
The root of the problem lies in how search engines perform searches. Put simply, they
scan text looking for occurrences of whatever word you typed into the search box. Then,
they list every single document that contains even the merest mention of the word.
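The behavior described above can be sketched in a few lines of Python. This is only an
illustration of plain keyword matching, not how any particular search engine is built;
the page names and text below are invented for the example.

```python
# Hypothetical three-page site; names and text are invented for illustration.
pages = {
    "care.html": "Feeding a dog properly matters. A dog needs vitamins and minerals.",
    "history.html": "The author once owned a dog.",
    "cats.html": "This page is about cats only.",
}

def naive_search(pages, term):
    """Return every page whose text contains the term, however briefly."""
    term = term.lower()
    return sorted(name for name, text in pages.items() if term in text.lower())

# A passing mention counts just as much as a full discussion.
print(naive_search(pages, "dog"))
# A page about feeding is missed if the exact word never appears.
print(naive_search(pages, "nutrition"))
```

The first search returns both pages that mention "dog," including the one with only a
passing mention; the second returns nothing, even though one page discusses feeding.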
What Makes a Good Index
The Internet is a relatively new medium, but you can learn a lot about how to make
online content work well from the "parent" of online media: print media.
Most printed reference or nonfiction books offer an index of some kind. An index is not
a blind, mechanical catalog of words. Rather, it is created by an indexer.
Indexers are trained to analyze concepts. An indexer will physically read every
page of a book and develop a list of page references that lead to information on various
topics, individuals, or places covered in that book.
The goal of an index is to direct readers to pertinent information on each topic
listed, rather than passing mentions. This requires the indexer to make many judgment
calls; that is, to consider context as well as content.
Indexers also categorize concepts: they break down main subject headings
into subtopics, in a hierarchical format. This structure helps readers "narrow"
their search.
A well-written index assumes that the reader may not know specific terms used in the
text. Therefore, an indexer will use a thesaurus to create index entries that are synonyms
of the terms used within the text. This ensures that even if readers don't know the
exact words used in a text, they still will be directed to pages that discuss the topic.
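One way to picture synonym entries is as a small lookup table that points alternative
terms at the heading actually used in the index. The sketch below is a hypothetical
illustration; the headings, synonyms, and page numbers are all invented.

```python
# Hypothetical index: headings mapped to page references (invented numbers).
index = {
    "automobiles": [14, 52, 103],
    "engines": [52, 60],
}

# Synonyms a reader might try, each pointing at a heading in the index above.
see_references = {
    "cars": "automobiles",
    "motorcars": "automobiles",
    "motors": "engines",
}

def look_up(term):
    """Resolve a reader's term to an index heading and its page references."""
    heading = see_references.get(term.lower(), term.lower())
    return heading, index.get(heading, [])

print(look_up("cars"))
```

A reader who looks up "cars" is sent to the pages listed under "automobiles," even
though the text itself never uses the reader's word.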
A well-written index also lists topics that are implied, rather than stated
directly in the text. Consider the example of a book about dogs that does not include a
section devoted to canine food or nutrition but that does discuss (in various
places) the importance of feeding a dog properly, and also what vitamins and minerals are
essential to canine health.
It is likely that readers would turn to this book seeking information on dog food or
nutrition, so the book's index should include the terms "nutrition" and
"food," with references to relevant pages.
Web Indexes vs. Book Indexes
Indexes obviously are useful and appropriate for books. However, they also can work
well for Web sites. A Web site index offers the same benefits over a search engine that a
book index offers over a concordance.
What is a concordance? A concordance lists every single occurrence of each individual
word of significance contained in a specific text. This is similar to the results produced
by a search engine. If you look at a large concordance (such as Strong's
Exhaustive Concordance of the Bible), you'll see how many listings are
possible for a single word. (For instance, in Strong's Exhaustive, try looking up the
word "king.") Therefore, for most purposes a concordance generally isn't as
useful as an index.
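To make the contrast concrete, here is a minimal sketch of how a concordance could be
generated mechanically. It is an assumption-laden simplification: the sample text is
invented, and unlike a real concordance it makes no attempt to skip insignificant words.

```python
import re
from collections import defaultdict

def build_concordance(pages):
    """Map each word to the sorted list of pages on which it occurs."""
    concordance = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            concordance[word].add(page_number)
    return {word: sorted(numbers) for word, numbers in concordance.items()}

# Invented sample text, keyed by page number.
sample_pages = {
    1: "The king spoke to the people.",
    2: "A passing mention of the king.",
    3: "The harvest was good that year.",
}

concordance = build_concordance(sample_pages)
print(concordance["king"])
```

Every page containing the word is listed, whether the word is the subject of the page or
only mentioned in passing. No judgment about relevance is involved.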
In some respects, the process of creating an index for a Web site is similar to
creating an index for a book. For instance, a Web indexer will read through every
page in the site, analyze the concepts discussed, and develop an index that lists the
topics covered in the text.
One key difference between a book index and a Web index is hypertext.
In a Web index, the references listed can (and should) be live links that take
the user directly to the relevant text in the site. Live links make a Web index not merely
informative, but functional. Some examples of Web site indexes that utilize live
Ideally, a Web site indexer should know how to modify the HTML code of Web pages,
in order to create hyperlinks. Specifically, indexers should know how to create an
"anchor" in the Web page where the text referenced in a particular index entry
begins (if no anchor already exists at that location), and then make the index entry a
live link to that anchor.
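As a rough sketch of the markup involved, the Python helpers below generate an anchor
and a matching index link. The function names, the page name "care.html," and the
anchor name "nutrition" are hypothetical. The sketch uses the id attribute as the link
target; older pages often use an a element with a name attribute for the same purpose.

```python
from html import escape

def anchor_tag(anchor_id):
    """Markup to place in the page where the referenced text begins."""
    return f'<a id="{escape(anchor_id, quote=True)}"></a>'

def index_entry(label, page_url, anchor_id):
    """Markup for an index entry that links straight to that anchor."""
    href = escape(f"{page_url}#{anchor_id}", quote=True)
    return f'<a href="{href}">{escape(label)}</a>'

print(anchor_tag("nutrition"))
print(index_entry("nutrition", "care.html", "nutrition"))
```

The part of the link after the "#" must match the anchor exactly, or the link will
open the page without jumping to the referenced text.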
Updating is an important issue for both print and online indexes. However,
updating a Web index typically involves incremental maintenance. (Index updates for books
are infrequent, major projects.)
Most Web sites evolve constantly, from minor modifications to small sections of
text to the addition or deletion of entire content sections. Also, existing content can
be moved to a different page or directory within the site.
In order for a Web index to remain useful, it must keep pace with the site's
evolution. Few things are more frustrating to a user than broken or outdated links in a
site's own index.
Consequently, there should be regular, frequent communication between the site's
developers and the indexer. Whenever significant content is modified, moved, added, or
deleted, the indexer should be informed. Then, the indexer should immediately update the
index to reflect the current state of content on the site.
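Part of this maintenance can be checked mechanically. The sketch below, which uses
Python's standard html.parser module, compares index entries against the anchors that
currently exist in each page and reports the entries that no longer resolve. The page
contents, entry labels, and function names are invented, and treating every id or name
attribute as an anchor is a simplifying assumption.

```python
from html.parser import HTMLParser

class AnchorCollector(HTMLParser):
    """Collect the id and name attribute values found in a page."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if key in ("id", "name") and value:
                self.anchors.add(value)

def find_broken_entries(index_entries, site_pages):
    """Return index entries whose page or anchor no longer exists."""
    broken = []
    for label, target in index_entries:
        page, _, anchor = target.partition("#")
        html = site_pages.get(page)
        if html is None:
            broken.append((label, target))  # the page itself is gone
            continue
        if anchor:
            collector = AnchorCollector()
            collector.feed(html)
            if anchor not in collector.anchors:
                broken.append((label, target))  # page exists, anchor does not
    return broken

# Invented example: one page, three index entries.
site_pages = {"care.html": '<h2><a name="nutrition"></a>Feeding your dog</h2>'}
index_entries = [
    ("nutrition", "care.html#nutrition"),
    ("grooming", "care.html#grooming"),
    ("breeds", "breeds.html"),
]
print(find_broken_entries(index_entries, site_pages))
```

A check like this only finds links that no longer resolve. It cannot tell whether the
text at a surviving anchor still discusses the indexed topic, so it supplements the
indexer's review rather than replacing it.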
Is It an Index or Not?
A quick look around the Web reveals that the term "index" is much
misunderstood by Web developers and publishers. In fact, most Web reference tools labeled
"site index" are not indexes at all!
Most people know what an index is, from having used them in printed books. Therefore,
when a visitor sees a link on your site that says "site index," he or she may
click on that link expecting to encounter a real index. However, if that link leads to a
different type of guide, it might cause confusion, frustration, or disappointment.
If the guide or reference tool you've created for your Web site is not a true
index, it's helpful to your visitors if you call it by its correct name.
The site guides and tools described below are not indexes, but they
commonly are mislabeled as such. Examples of sites that have made this mistake also are
listed.
- A table of contents, even a very detailed one, is not an index. It is very common
for a site's table of contents to be mislabeled as a site index; in fact,
it's more common to see this mistake than to see true Web site indexes that are
labeled correctly! A similar misunderstanding could lead to a site map being
mislabeled as an index.
See: Sears, Chase Manhattan Bank, WebReference.com, and The Beer Info Source
- A collection of links to related Web sites or other resources is not an index.
See: Family Tree Maker
Sometimes it can be hard to tell whether a particular site guide is an index or some
other kind of tool. For instance, at first glance the "index" of the Association for Health Services Research
Web site appears to be a true index. It is ordered alphabetically, and some entries (such
as "About AHSR") include subtopics.
However, this page is a sophisticated table of contents, not a true index. All of
its entries directly reflect the site's structure (how information is divided
into sections and pages). The list is not really broken down by subject. For
instance, while this list includes entries for "Job and Resume Binder Order
Form" and "Career Center," there is no subject-based entry for
Not Every Site Needs an Index
Some types of Web sites would not benefit significantly from an index:
- Online stores: These sites may be large, but since they usually have very little
content (in the conventional sense) the only essential information retrieval tool is a
search engine. Amazon.com is a good example of this.
There, users simply type in the title of the book or CD sought, or perhaps the author's or
musician's name, and they are led to a page featuring information about the book or
CD.
- Smaller sites: When a visitor can click through a site's complete contents in a
matter of minutes, an index would not add much value.
In contrast, many types of sites would serve their visitors better by offering an
index. This is especially true of online magazines or other content-rich sites.
For example, 21st Century Online publishes
articles by professionals in various disciplines. Although a reader can simply "drill
down" through the current selection of articles on the site, this becomes
increasingly difficult as more and more articles are published.
Even Hotwired (the online counterpart of Wired
magazine) does not yet have a site index. However, an index would be especially helpful
for finding specific information in this venue's four years' worth of archives.
Working with (or as) an Indexer
If you decide that your Web site needs an index, you then must decide whether to hire
someone to create it, or whether to do it yourself.
If your site is very content-rich, you're probably better off hiring
a professional indexer. This also could be a good decision for sites that are smaller or
less complex, as long as the budget is available.
Remember: the goal of an index is to improve the usability of a Web site.
Therefore, considering an indexer as a usability professional could help justify this
expense.
However, if your site is not especially large, or if there is no budget to hire an
indexer, or if you simply wish to learn a new skill, it is possible to teach yourself
enough about the basics of indexing to attempt this project. A few resources that can help
you learn how to create an index are:
- The American Society of Indexers (ASI) offers a bibliography of resources for and about
indexing, as well as a list of frequently asked questions
on this topic. They also have published a document specifically about Indexing the Web.
- Site from A to Z: Creating an index for users who know what they're doing"
This article by Lou Rosenfeld, published in Web Review in Oct. 1997, covers the
basics of what makes a good index. It also outlines a four-step process for creating a Web
index.
- The Chicago Chapter of the American Society of Indexers has published an Index Evaluation Checklist,
which can help you determine whether an index is appropriate and complete. While this
document does not specifically address Web indexing, many of its points apply to Web
indexes.
Indexing also can be a lucrative line of work. Although most available indexing
work is for print media (books, etc.), indexes are becoming increasingly common in online
and digital media (Web sites, Intranets, CD-ROMs, etc.). For writers, editors, producers,
or Web developers, indexing can be one more valuable service to market to your clients.
The ASI is a good resource for people who seek
to become professional indexers. This group's indexing FAQ covers several key
points about the "business side" of this field.
Whether your site has an index or not, or whether you learn to create indexes or not,
learning about indexing can prove valuable to anyone who develops or uses Web sites.
Understanding indexes makes Web developers and publishers consider what their users
would want to find, and how those searches could be simplified or aided. Similarly, Web
users who understand the value of a good index can encourage Web publishers to add this
key usability tool to their sites.
It's even possible that, one day, indexes might be considered as indispensable to
informational or content-rich Web sites as they are to printed reference books today.
(c)1998 by Kevin Broccoli. All rights reserved.
Published in CONTENTIOUS with permission. Do not reproduce or redistribute any material
from this document, in whole or in part, without written permission from the author.