Synthesis by Fred Brown
Published in KeyWords, November/December 1998, American Society of Indexers.
Dr. Bella Hass Weinberg presented a seminar entitled "Vocabulary Links:// Thesaurus Design for Information Systems" on April 3, 1998, at the Association of the Bar of the City of New York.
In a humorous introduction, Dr. Sherry
Vellucci, who teaches cataloging at St. John's University, suggested that
Bella Hass Weinberg became involved in thesaurus construction through her
love of hats. Bella wished to answer such questions as:
When did women start wearing hats, and why?
She took her request to the library of a well-known fashion institute. When she entered the word "hat" into the institute's new full-text database, she got back 32,345 documents, including such hits as:
"Versace's friends passed the hat to help him ..."
According to Dr. Vellucci, Bella took up the challenge and five years later produced the first draft of the NISO National Standard on Thesaurus Construction [published in 1994].
The thesaurus provides a powerful tool for organizing and searching large bodies of online information such as databases, help systems, and knowledge bases. You can use a thesaurus to generate a standard set of headings and cross-references for a back-of-the-book style index. A thesaurus may be available in printed form, online, or both.
In her presentation, Bella demonstrated the diversity and complexity of current paper and online thesauri. Current displays of thesauri, print and online, have been developed by specialists in the field of information science. User-centered design techniques have yet to be employed in developing thesaurus interfaces, and little or no usability testing has been done. Bella spent much time clarifying thesaurus terminology that varied among different thesauri. Despite her honorable efforts, this inconsistency in thesaurus terminology made it difficult for seminar participants who were new to thesaurus design to grasp the essential concepts. End-users, in addition to being confronted by different nomenclatures, make little use of thesauri when searching.
Bella discussed thesauri in relation to Natural Language Searching. Natural Language Searching has serious problems originating from the nature of language itself. Different words can mean the same thing (synonyms). A single word can have different meanings (homographs), resulting in ambiguity. Dr. Vellucci's hat query demonstrates that words with a single meaning can also be used in quite different contexts. Thus it becomes difficult to automatically map a given concept in the searcher's mind to a given string of text. [In a private communication, Dr. Weinberg explained that a searching thesaurus cannot overcome the problem of idiomatic uses of words which result in false drops.]
Controlling or managing the vocabulary used for the purposes of indexing and/or searching provides a solution - known as "vocabulary control." The controlled vocabulary process allows both the indexer and the searcher to access the same concepts through authorized terms known as descriptors.
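As a minimal sketch of vocabulary control, the snippet below maps several entry terms to one authorized descriptor, so that indexer and searcher arrive at the same concept. The terms and the names USE and to_descriptor are invented for illustration and do not come from the seminar.

```python
# Hypothetical entry vocabulary: each entry term (synonym or variant)
# points to the single authorized descriptor used for indexing and searching.
USE = {
    "hat": "Hats",
    "hats": "Hats",
    "headwear": "Hats",
    "millinery": "Hats",
}

def to_descriptor(term):
    """Return the authorized descriptor for a term, or None if the term is unknown."""
    return USE.get(term.strip().lower())

print(to_descriptor("Millinery"))
print(to_descriptor("shoes"))
```

An indexer assigning "Hats" and a searcher typing "millinery" both end up at the same descriptor; an unknown term returns None rather than a guess.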
The thesaurus enables you to organize and manage vocabulary. A thesaurus requires
Bella argued eloquently that the thesaurus must be brought to the user instead of the user having to request it. A thesaurus should be designed for usability - otherwise users will not consult it. When the user inputs a term, the system should suggest other related search terms. Users need to see how their search terms are expanded and select the appropriate terms from a suggested list. The expansion should not be done automatically; otherwise, entering a term such as "counselors" could lead to obtaining information about "lawyers" by mistake, when information on camp counselors is sought.
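The idea of suggesting terms rather than expanding silently can be sketched roughly as follows. This is an illustration only: the CANDIDATES table and the function names are hypothetical, and a real system would draw its suggestions from the thesaurus itself.

```python
# Hypothetical thesaurus fragment: the ambiguous entry term "counselors"
# leads to two unrelated descriptors, so the searcher has to choose.
CANDIDATES = {
    "counselors": ["Camp counselors", "Lawyers"],
}

def suggest(term):
    """List candidate descriptors for the searcher to pick from."""
    return CANDIDATES.get(term.strip().lower(), [])

def build_query(term, chosen):
    """Expand the query only with the suggestions the searcher accepted."""
    accepted = [d for d in suggest(term) if d in chosen]
    return [term] + accepted

print(suggest("counselors"))
print(build_query("counselors", {"Camp counselors"}))
```

Because the expansion passes through the searcher's selection, "Lawyers" never enters the query unless it is explicitly chosen.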
Thesaurus structure
Thesaurus structure embodies rigorous semantic relationships and reflects the principle of post-coordination of terms.
Post-coordination means that thesaurus terms are combined at the time of searching rather than at the time of indexing, as in subject heading lists. For example, 'mammal' and 'habitat' [examples mine] would be separate terms in a thesaurus. The user would combine the terms using a Boolean operator, e.g., 'mammal AND habitat'. Post-coordination leads to greater flexibility in constructing search strategies but often reduces the precision of indexing because the nature of the relationship between terms remains unexpressed.
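Post-coordination can be illustrated with a small sketch in which each descriptor is indexed separately and the terms are combined only at search time. The POSTINGS table and the document numbers are invented for the example.

```python
# Hypothetical index: each descriptor maps to the set of document IDs
# to which an indexer assigned it.
POSTINGS = {
    "mammal": {1, 2, 5, 8},
    "habitat": {2, 3, 8, 9},
}

def search_and(*descriptors):
    """Combine descriptors with Boolean AND at the time of searching."""
    sets = [POSTINGS.get(d, set()) for d in descriptors]
    return set.intersection(*sets) if sets else set()

print(sorted(search_and("mammal", "habitat")))
```

The result contains every document indexed under both terms, but nothing in it records how the two concepts relate within a document, which is the loss of precision described above.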
Rigorous semantic relationships allow
a user to enter the thesaurus and to identify the appropriate search term(s).
Thesauri contain three types of semantic relationship:
The hierarchical relationship links broader and narrower terms. Three types of hierarchical relationships can be coded. You can identify these three hierarchical relationships using the "is a" test:
Some thesaurus terms belong to only one hierarchy or tree structure. However, in some cases a thesaurus term may have two broader terms or two different hierarchical relationships. For example, a string bass is a type of string instrument and is a part of a jazz ensemble [example mine]. Thesaurus design should accommodate such "polyhierarchies."
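One simple way to accommodate a polyhierarchy, sketched here under the assumption that broader-term links are stored as lists, is to let a term carry more than one broader term. The BROADER table extends the string bass example with invented entries.

```python
# Hypothetical broader-term links; a term may list more than one.
BROADER = {
    "string bass": ["string instruments", "jazz ensembles"],
    "string instruments": ["musical instruments"],
}

def all_broader(term):
    """Collect every term reachable by following broader-term links upward."""
    found = []
    queue = list(BROADER.get(term, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(BROADER.get(current, []))
    return found

print(all_broader("string bass"))
```

A strict tree would force a choice between the two parents; storing a list keeps both paths available to the searcher.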
You employ the associative relationship when two terms overlap in meaning. The associative may be
Thesaurus presentation
There are currently several ways of presenting a thesaurus in print. The flat view shows the thesaurus terms in alphabetical order with accompanying detail and only one level of broader and narrower terms. The example below, from the Thesaurus of ERIC Descriptors, shows the record for a single thesaurus term:
A "rotated" or "permuted" list displays all the words used in the thesaurus, thus allowing the searcher to enter the thesaurus using any word from either a formal thesaurus term or a synonym. The example below, from the Thesaurus of Engineering and Scientific Terms, lists each word in the thesaurus in Keyword-out-of-context format; the formal thesaurus terms and synonyms in which the word appears are indented below the keyword.
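As a rough illustration of how a rotated list can be generated (this is not the listing from the Thesaurus of Engineering and Scientific Terms; the terms are invented), the sketch below files every term under each word it contains and indents the terms beneath the keyword.

```python
from collections import defaultdict

# Invented terms for illustration only.
TERMS = ["Information retrieval", "Retrieval systems", "Information science"]

def rotated_index(terms):
    """Map each word to the sorted list of terms that contain it."""
    index = defaultdict(list)
    for term in terms:
        for word in set(term.lower().split()):
            index[word].append(term)
    return {word: sorted(ts) for word, ts in sorted(index.items())}

def render(index):
    """Lay out each keyword flush left with its terms indented beneath it."""
    lines = []
    for word, ts in index.items():
        lines.append(word)
        lines.extend("    " + t for t in ts)
    return "\n".join(lines)

print(render(rotated_index(TERMS)))
```

A production list would normally also skip stop words such as "of" and "and"; that step is omitted here for brevity.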
Web or online thesauri tend to mimic the layout of their paper counterparts. Because of the limited resolution of the computer screen, as compared to print on paper, online thesauri currently available present less information and are less readable. Apart from making hypertext links to other terms, online thesauri take little advantage of the online electronic medium.
Thesaurus management
Good thesaurus management ensures that the thesaurus remains relevant and usable over time. A thesaurus grows and evolves over time. Additions to the body of knowledge in a domain may require new terms. Concepts and vocabulary may evolve as well. In some cases, organizations may choose to merge separate thesauri for related bodies of information.
The thesaurus standard recommends that a printed thesaurus have a title page indicating the date of release. Although this sounds obvious, Bella demonstrated thesauri that lacked these elements. To be complete, record the dates of all changes to individual thesaurus terms as well.
Be aware that as a thesaurus becomes large and complex, indexing consistency decreases. Thesaurus managers always face a tradeoff between refining terminology and the ease of locating terms in the body of the thesaurus.
Look at the frequency of thesaurus terms in the database. A term with a large number of items may be a candidate for subdivision into narrower terms.
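A frequency check of this kind might be sketched as follows. The sample records and the threshold are arbitrary assumptions; a suitable cutoff depends on the size and nature of the database.

```python
from collections import Counter

def subdivision_candidates(records, threshold):
    """Return descriptors assigned to more than `threshold` records, most used first."""
    counts = Counter(d for descriptors in records for d in set(descriptors))
    return [(d, n) for d, n in counts.most_common() if n > threshold]

# Invented sample: each record is the list of descriptors assigned to one document.
records = [
    ["Hats", "Fashion"],
    ["Hats"],
    ["Hats", "History"],
    ["Fashion"],
]

print(subdivision_candidates(records, 2))
```

Descriptors that come back from such a report are only candidates; whether to split one into narrower terms remains an editorial decision.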
A variety of software tools are available for creating thesauri. A good thesaurus program can automatically post the reciprocals of relationships and check for consistency. There is no need to write your own software — good tools are available with a variety of capabilities and prices. [The packet distributed at the seminar included brochures for thesaurus software and references to literature about such software. Copies may be ordered from St. John's University (718-990-6200).]
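To make "posting the reciprocals" concrete, here is a small sketch using the conventional codes BT (broader term), NT (narrower term), and RT (related term). It is an illustration of the idea, not a description of how any particular thesaurus package works; the function names and the sample link are hypothetical.

```python
# Reciprocal of each relationship code: BT pairs with NT, and RT is symmetric.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT"}

def post_reciprocals(links):
    """Return the links plus any missing reciprocal entries."""
    complete = set(links)
    for source, code, target in links:
        complete.add((target, RECIPROCAL[code], source))
    return complete

def missing_reciprocals(links):
    """List the reciprocal entries that a consistency check would flag."""
    links = set(links)
    return sorted(post_reciprocals(links) - links)

links = {("string bass", "BT", "string instruments")}
print(missing_reciprocals(links))
```

Entering the broader term for "string bass" is enough: the matching narrower-term entry under "string instruments" is generated rather than typed by hand.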
Final word
The thesaurus can play a key role in enhancing searches, especially in online environments. To work effectively, the thesaurus of the future needs to be designed in accordance with good information science principles and with the user in mind.