Vocabulary Links:// Thesaurus Design for Information Systems — seminar by Dr. Bella Hass Weinberg

Synthesis by Fred Brown
Allegro Technical Indexing


Published in KeyWords, November/December 1998, American Society of Indexers.

Dr. Bella Hass Weinberg presented a seminar entitled "Vocabulary Links:// Thesaurus Design for Information Systems" on April 3, 1998, at the Association of the Bar of the City of New York.

In a humorous introduction, Dr. Sherry Vellucci, who teaches cataloging at St. John's University, suggested that Bella Hass Weinberg became involved in thesaurus construction through her love of hats. Bella wished to answer such questions as: 
When did women start wearing hats — and why? 
What were the first hats made of? 
Did berets really come from France?
She took her request to the library of a well-known fashion institute. When she entered the word "hat" into the institute's new full-text database, she got back 32,345 documents, including such hits as: 
"Versace's friends passed the hat to help him ...", 
"the fashion world took off their hats to Vera Wang's ...",
and "a new Israeli designer has just tossed his hat into the ring!".
According to Dr. Vellucci, Bella took up the challenge and five years later produced the first draft of the NISO National Standard on Thesaurus Construction [published in 1994]. 

The thesaurus provides a powerful tool for organizing and searching large bodies of online information such as databases, help systems, and knowledge bases. You can use a thesaurus to generate a standard set of headings and cross-references for a back-of-the-book style index. A thesaurus may be available in printed form, online, or both. 

In her presentation, Bella demonstrated the diversity and complexity of current paper and online thesauri. Current displays of thesauri, print and online, have been developed by specialists in the field of information science. User-centered design techniques have yet to be employed in developing thesaurus interfaces, and little or no usability testing has been done. Bella spent much time clarifying thesaurus terminology that varied among different thesauri. Despite her honorable efforts, this inconsistency in thesaurus terminology made it difficult for seminar participants who were new to thesaurus design to grasp the essential concepts. End-users, in addition to being confronted by different nomenclatures, make little use of thesauri when searching. 

Bella discussed thesauri in relation to Natural Language Searching. Natural Language Searching has serious problems that originate in the nature of language itself. Different words can mean the same thing (synonyms). A single word can have different meanings (homographs), which results in ambiguity. Dr. Vellucci's hat query demonstrates that words with a single meaning can also be used in quite different contexts. Thus it becomes difficult to automatically map a given concept in the searcher's mind to a given string of text. [In a private communication, Dr. Weinberg explained that a searching thesaurus cannot overcome the problem of idiomatic uses of words, which result in false drops.] 

Controlling or managing the vocabulary used for the purposes of indexing and/or searching provides a solution - known as "vocabulary control." The controlled vocabulary process allows both the indexer and the searcher to access the same concepts through authorized terms known as descriptors. 

The thesaurus enables you to organize and manage vocabulary. A thesaurus requires 

  • a well designed structure suitable to a particular body of terminology (domain)
  • a usable interface for both indexers and searchers
  • good management

Thesaurus structure should be kept simple. Simpler structure assures greater consistency in indexing, which is important to the quality of search results. 

Bella argued eloquently that the thesaurus must be brought to the user instead of the user having to request it. A thesaurus should be designed for usability - otherwise users will not consult it. When the user inputs a term, the system should suggest other related search terms. Users need to see how their search terms are expanded and select the appropriate terms from a suggested list. The expansion should not be done automatically; otherwise, entering a term such as "counselors" could lead to obtaining information about "lawyers" by mistake, when information on camp counselors is sought. 
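
The suggest-then-select behavior described above can be sketched in Python. This is only an illustration under stated assumptions: the `SUGGESTIONS` table, its terms, and both function names are hypothetical and do not come from the seminar or from any real system.

```python
# Hypothetical miniature thesaurus: each entry term maps to candidate
# search terms the searcher might have meant. Terms are illustrative only.
SUGGESTIONS = {
    "counselors": ["camp counselors", "lawyers", "school counselors"],
    "hats": ["headgear", "millinery"],
}


def suggest_terms(query):
    """Return candidate terms for the searcher to review.

    Nothing is searched automatically; the list is only shown to the user.
    """
    return SUGGESTIONS.get(query.strip().lower(), [])


def build_search(query, chosen):
    """Combine the original term with only the suggestions the user accepted."""
    offered = set(suggest_terms(query))
    accepted = [term for term in chosen if term in offered]
    return [query.strip().lower()] + accepted
```

A searcher who enters "counselors" and ticks only "camp counselors" gets a search on those two terms; "lawyers" is displayed as a suggestion but never added without the searcher's consent.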

Good thesaurus management ensures that the thesaurus remains relevant and usable over time. 

Thesaurus structure

Thesaurus structure embodies rigorous semantic relationships and reflects the principle of post-coordination of terms. 

Post-coordination means that thesaurus terms are combined at the time of searching rather than at the time of indexing, as in subject heading lists. For example, 'mammal' and 'habitat' [examples mine] would be separate terms in a thesaurus. The user would combine the terms using a Boolean operator, e.g., 'mammal AND habitat'. Post-coordination leads to greater flexibility in constructing search strategies but often reduces the precision of indexing because the nature of the relationship between terms remains unexpressed. 
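
A minimal Python sketch of post-coordination follows, assuming a toy index in which each descriptor points to the set of documents indexed under it. The document numbers and the `search_and` helper are invented for illustration.

```python
# Hypothetical postings: descriptor -> IDs of the documents indexed with it.
POSTINGS = {
    "mammal": {1, 2, 5, 8},
    "habitat": {2, 3, 8, 9},
}


def search_and(*terms):
    """Combine descriptors at search time with a Boolean AND."""
    sets = [POSTINGS.get(term, set()) for term in terms]
    if not sets:
        return set()
    # Only documents posted under every requested descriptor survive.
    return set.intersection(*sets)
```

Here `search_and("mammal", "habitat")` retrieves documents 2 and 8. The result says only that both descriptors were assigned to those documents; whether a document is about the habitat of mammals or about mammals altering a habitat is not recorded, which is the loss of precision noted above.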

Rigorous semantic relationships allow a user to enter the thesaurus and to identify the appropriate search term(s). Thesauri contain three types of semantic relationship: 

  • equivalence
  • hierarchy
  • association

The equivalence relationship covers synonyms. Synonyms may be words representing technical and common usage, e.g., lateral oscillation, snaking [examples mine]. Sometimes thesaurus designers pool words that really have a hierarchical relationship under one broad term for simplicity, e.g., chairs and tables under furniture. 
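
One way to picture the equivalence relationship is as a lookup from entry terms to descriptors. The Python sketch below assumes, purely for illustration, that "lateral oscillation" is the descriptor and "snaking" the entry term; the `USE` table and both function names are hypothetical.

```python
# Hypothetical equivalence table: entry term -> authorized descriptor.
# Which synonym is the descriptor is an assumption made for this sketch.
USE = {
    "snaking": "lateral oscillation",
    "chairs": "furniture",  # narrower concept pooled under a broad term
    "tables": "furniture",
}


def to_descriptor(term):
    """Map whatever the user typed to the authorized descriptor."""
    key = term.strip().lower()
    return USE.get(key, key)


def used_for(descriptor):
    """List the entry terms that lead to a given descriptor."""
    return sorted(entry for entry, target in USE.items() if target == descriptor)
```

Both the indexer and the searcher pass their terms through `to_descriptor`, so a document indexed by someone thinking "snaking" and a query from someone thinking "lateral oscillation" meet at the same descriptor.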

The hierarchical relationship links broader and narrower terms. Three types of hierarchical relationships can be coded. You can identify these three hierarchical relationships using the "is a" test: 

  • is a type of, e.g. a cow is a type of mammal
  • is a part of, e.g. a finger is a part of a hand
  • is an instance of, e.g. Halley's Comet is an instance of a comet

Refining the coding of hierarchical relationships allows for more accurate inferences. 

Some thesaurus terms belong to only one hierarchy or tree structure. However, in some cases a thesaurus term may have two broader terms or two different hierarchical relationships. For example, a string bass is a type of string instrument and is a part of a jazz ensemble [example mine]. Thesaurus design should accommodate such "polyhierarchies." 
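
The three coded hierarchical relationships and a polyhierarchy can be sketched as typed links between terms. The Python below reuses the examples from this section; the `Link` record and the two lookup functions are hypothetical names chosen for the sketch.

```python
from collections import namedtuple

# One record per hierarchical link: narrower term, kind of link, broader term.
Link = namedtuple("Link", "narrower relation broader")

LINKS = [
    Link("cow", "type of", "mammal"),
    Link("finger", "part of", "hand"),
    Link("Halley's Comet", "instance of", "comet"),
    # Polyhierarchy: one term with two broader terms of different kinds.
    Link("string bass", "type of", "string instrument"),
    Link("string bass", "part of", "jazz ensemble"),
]


def broader_terms(term):
    """Return every (relation, broader term) pair recorded for a term."""
    return [(link.relation, link.broader) for link in LINKS if link.narrower == term]


def narrower_terms(term, relation=None):
    """Return narrower terms, optionally restricted to one kind of link."""
    return [
        link.narrower
        for link in LINKS
        if link.broader == term and (relation is None or link.relation == relation)
    ]
```

Because each link carries its type, a system can avoid faulty inferences, for example concluding that a string bass is a kind of jazz ensemble.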

You employ the associative relationship when two terms overlap in meaning. The associative relationship may be 

  • symmetrical, e.g., gold is related to money and money is related to gold
  • asymmetrical, e.g., population control is related to family planning, but there is no related-term reference in the opposite direction. (Someone searching for family planning is unlikely to be interested in population control.)
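
A short Python sketch of the two cases follows, storing related-term references as directed links so that a one-way reference is representable. The `RELATED` table and function names are hypothetical.

```python
# Hypothetical related-term references, stored per direction.
RELATED = {
    "gold": {"money"},
    "money": {"gold"},
    "population control": {"family planning"},
    "family planning": set(),  # deliberately no reference back
}


def is_symmetric(a, b):
    """True when each term carries a related-term reference to the other."""
    return b in RELATED.get(a, set()) and a in RELATED.get(b, set())


def one_way_links():
    """List (from, to) references that have no reference in return."""
    return sorted(
        (a, b)
        for a, targets in RELATED.items()
        for b in targets
        if a not in RELATED.get(b, set())
    )
```

A design that stored each associative pair only once, without direction, could not express the population control example.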

Thesaurus presentation

There are currently several ways of presenting a thesaurus in print. The flat view shows the thesaurus terms in alphabetical order with accompanying detail and only one level of broader and narrower terms. The example below, from the Thesaurus of ERIC Descriptors, shows the record for a single thesaurus term: 

While the flat presentation gives all the relationships for each thesaurus term, a user who begins with a term that is not one of the formal thesaurus terms or a cross reference to it may experience difficulty. This problem can happen when the search term is a word within a formal thesaurus term that is not an entry term (cross reference). 

A "rotated" or "permuted" list displays all the words used in the thesaurus, thus allowing the searcher to enter the thesaurus using any word from either a formal thesaurus term or a synonym. The example below, from the Thesaurus of Engineering and Scientific Terms, lists each word in the thesaurus in Keyword-out-of-context format; the formal thesaurus terms and synonyms in which the word appears are indented below the keyword. 
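
Independently of that published example, the rotation itself can be sketched in a few lines of Python. The terms and stopword list below are invented for illustration and are not taken from the Thesaurus of Engineering and Scientific Terms; the function names are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical thesaurus terms (descriptors and entry terms alike).
TERMS = ["adult education", "education", "education finance", "higher education"]

# Assumed list of words too common to serve as access points.
STOPWORDS = {"and", "of", "the", "for", "in"}


def rotated_index(terms):
    """Map every significant word to the full terms in which it appears."""
    index = defaultdict(list)
    for term in terms:
        # dict.fromkeys drops repeated words while keeping their order.
        for word in dict.fromkeys(term.lower().split()):
            if word not in STOPWORDS:
                index[word].append(term)
    return {word: sorted(index[word]) for word in sorted(index)}


def format_kwoc(index):
    """Lay out each keyword with its terms indented beneath it."""
    lines = []
    for word, terms in index.items():
        lines.append(word)
        lines.extend("    " + term for term in terms)
    return "\n".join(lines)
```

A searcher who thinks of "finance" finds "education finance" under that keyword, even though the term files under "e" in the alphabetical display.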

The multilevel thesaurus presentation and the tree structure show all levels of hierarchical relationship. The example below, from the Art and Architecture Thesaurus, shows a hierarchical (faceted) structure subdivided initially by node labels (in angle brackets) indicating the type of hierarchical relationship: 

Thesaurus term relationships can also be shown graphically. However, the resulting diagram can be so complex as to be nearly incomprehensible to the untrained eye. 

Web or online thesauri tend to mimic the layout of their paper counterparts. Because of the limited resolution of the computer screen, as compared to print on paper, online thesauri currently available present less information and are less readable. Apart from making hypertext links to other terms, online thesauri take little advantage of the online electronic medium. 

Thesaurus management

Good thesaurus management ensures that the thesaurus remains relevant and usable over time. A thesaurus grows and evolves over time. Additions to the body of knowledge in a domain may require new terms. Concepts and vocabulary may evolve as well. In some cases, organizations may choose to merge separate thesauri for related bodies of information. 

The thesaurus standard recommends that a printed thesaurus have a title page indicating the date of release. Although this sounds obvious, Bella demonstrated thesauri that lacked these elements. To be complete, record the dates of all changes to individual thesaurus terms as well. 

Be aware that as a thesaurus becomes large and complex, indexing consistency decreases. Thesaurus managers always face a tradeoff between refining terminology and the ease of locating terms in the body of the thesaurus. 

Look at the frequency of thesaurus terms in the database. A term with a large number of items may be a candidate for subdivision into narrower terms. 
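
As a rough sketch of such a frequency check in Python: the records, the threshold, and the function name are all assumptions, since the seminar gave no specific cutoff.

```python
from collections import Counter


def subdivision_candidates(records, threshold):
    """Return (term, count) pairs for terms posted to more than `threshold` records.

    `records` is a list of term lists, one list per indexed item.
    Results are ordered by descending count, then alphabetically.
    """
    # set() ensures a term repeated within one record is counted once.
    counts = Counter(term for terms in records for term in set(terms))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, n) for term, n in ranked if n > threshold]
```

With the hypothetical records `[["hats", "fashion"], ["hats"], ["hats", "berets"], ["fashion"]]` and a threshold of 2, only "hats" is flagged. The report is a prompt for human review; whether to subdivide the term remains the thesaurus manager's judgment.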

A variety of software tools are available for creating thesauri. A good thesaurus program can automatically post the reciprocals of relationships and check for consistency. There is no need to write your own software; good tools are available with a variety of capabilities and prices. [The packet distributed at the seminar included brochures for thesaurus software and references to literature about such software. Copies may be ordered from St. John's University (718-990-6200).] 
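
Purely to illustrate what automatic posting of reciprocals and a consistency check involve, and not as a substitute for such tools, here is a small Python sketch. It uses the conventional abbreviations BT, NT, RT, USE, and UF (broader term, narrower term, related term, use, used for), which are not spelled out in this report. The function names are hypothetical, and the sketch treats every RT link as symmetrical, so a deliberately one-way related-term reference would need to be exempted.

```python
# Each relationship code paired with the code of its reciprocal.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def add_relationship(thesaurus, term, code, target):
    """Record a relationship and post its reciprocal on the target term."""
    thesaurus.setdefault(term, set()).add((code, target))
    thesaurus.setdefault(target, set()).add((RECIPROCAL[code], term))


def missing_reciprocals(thesaurus):
    """Return (term, code, target) entries that should exist but do not."""
    problems = []
    for term, links in thesaurus.items():
        for code, target in links:
            expected = (RECIPROCAL[code], term)
            if expected not in thesaurus.get(target, set()):
                problems.append((target, expected[0], term))
    return sorted(problems)
```

Entering "cow BT mammal" through `add_relationship` also posts "mammal NT cow". Running `missing_reciprocals` over a thesaurus that was edited by hand lists the entries a maintainer would need to add.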

Final word

The thesaurus can play a key role in enhancing searches — especially in online environments. To work effectively, the thesaurus of the future needs to be designed in accordance with good information science principles and with the user in mind.
