DBMS
 

 

Building a Logical Data Model

By George Tillmann
DBMS, June 1995

This second part in a series of articles outlines the basic building blocks of a data model.


From A Practical Guide to Logical Data Modeling by George Tillmann. Copyright © 1993 by George Tillmann. Reprinted with permission of McGraw-Hill Inc. To order, please call 800-262-4729.

Though this be madness, yet there is method in it. --Shakespeare

If people knew how hard I had to work to gain my mastery, it wouldn't seem wonderful at all. --Michelangelo

Logical data modeling is a graphic-intensive technique that results in a data model representing the definition, characteristics, and relationships of data in a business, technical, or conceptual environment. Its purpose is to describe end-user data to systems and end-user staff.

Various methods of data modeling exist, each using a host of conventions and tools. The most popular approach is called the entity-relationship (ER) approach, developed by Peter Chen in the late 1970s. Although a number of authors and tool designers have modified and expanded ER concepts, most still have a strong Chen flavor. Also, with the introduction of CASE tools, the number of diagrammatic conventions that the data modeler must master has increased sharply.

Data Modeling Objects

The three types of data objects--entities, attributes, and relationships--are the basic building blocks of modeling: While a number of graphical conventions represent data objects, one convention on which all the different approaches seem to agree is the use of a rectangular box to represent an entity. Relationships are represented by a line. However, for various reasons, including time, space, and laziness, modelers often do not label relationships in both directions. One reason is that it is often unnecessary to label the relationships if the meaning of the relationship is easily understood.

Some modelers use a diamond to represent a relationship, with lines connecting the diamond and entity boxes. (See Figure 1.) Attributes are diagrammed in several different ways or not diagrammed at all. Some modelers place attributes in the entity box while others use ovals to hold attribute names. (See Figure 2.)

However, most modelers do not place attributes on the diagram at all. They believe that diagramming attributes makes sense only for those simple models, presented in textbooks, that have perhaps a dozen attributes in total. In the real world, a diagram could quickly start to look like a Tokyo subway map as attributes are piled onto the page. A more practical approach is to keep attributes out of the data model and in the data dictionary.

Type/Occurrence Distinction

Before pressing on, you should understand the distinction between an entity type and an entity occurrence or instance. (I use "occurrence" and "instance" interchangeably.) An entity type represents the class of objects that share a distinguishing factor. An occurrence is a single case or instance of a type. For example, "Detective" is an entity type, while "Sherlock Holmes," "Hercule Poirot," and "Ellery Queen" are entity instances. The distinction between entity type and entity occurrence is the same as the distinction between record type and record occurrence.

To say that entity type 'A' relates to entity type 'B' means that one or more occurrences of 'A' are (or can be) related to one or more occurrences of 'B.' If this argument sounds familiar to you, it might be because you studied a similar issue in your college philosophy class. Philosophers have what they call the "type token distinction," where 'Man' can represent the set or 'type' of all men (or women), and 'Socrates' represents a single 'token' or occurrence of that set. Attributes and relationships, like entities, have types and occurrences. The distinction between type and occurrence is important for understanding the data modeling concepts of cardinality and modality, which are the two characteristics of relationships.

Membership Class (Connectivity Characteristics)

If entity 'A' relates to entity 'B,' knowing more about how the occurrences of 'A' relate to the occurrences of 'B' is important. One concern is the cardinality of the relationship.

Cardinality is the specification of the number of occurrences of one entity type that can be related to the number of occurrences of another entity type. Cardinality is usually expressed as simply "one" or "many." For example, a husband can have only one wife (in most cultures), while a parent can have many children. Thus, two entities can be related as:

The most popular way to represent cardinality is to use a bar to express one and a trident (also called a "crow's foot" or "chicken foot") to express many. (See Figure 3.) However, several other approaches and diagramming conventions exist. (See Figure 4.)

Note that Chen and Reiner use a diamond to represent a relationship, while the Trident approach uses a line. (For a discussion of the Reiner technique, see Reiner et al., "The Data Base Design and Evaluation Workbench [DDEW] Project at CCA." Database Engineering 7(4):10-15, 1985.) The diamond, as I explain later on, does a better job of representing certain types of relationships, but it is not as good at showing the bidirectionality of relationships. Chen represents cardinality by using a 1, N, or M on the relationship line, while Reiner fills in the diamond. Also, most tools now use the trident to show a cardinality of many, while some give the user a choice of symbols.

In contrast to cardinality, the modality of a relationship indicates whether an entity occurrence must participate in a relationship. Cardinality tells you the maximum number of entity occurrences that can participate in a relationship, while modality (also called optionality) tells you the minimum number of occurrences. The modality values are zero if an occurrence is not needed or optional, and one if an entity occurrence is required or mandatory.

Take the example of Invoice and Line Item entities: An Invoice occurrence can relate to many Line Items, but a Line Item can relate to only one Invoice. This tells you the cardinality. But is it possible to have a Line Item occurrence not related to an Invoice occurrence? The answer, of course, is no. For a Line Item to exist, it must be linked to an Invoice. Therefore, the relationship is mandatory. The same is true in the other direction. It makes no sense to have an Invoice without a Line Item. The relationship is mandatory in both directions.

A bar represents a modality of one, while a circle represents a modality of zero. Let's look at the relationship entity pair, 'Artists Paint Pictures.' (See Figure 5.) Because it is impossible to have a picture without an artist, the relationship 'Pictures Are Painted By Artists' is mandatory. However, it is possible to have Artists who are not related to any Pictures (just go into Greenwich Village some Saturday night); therefore, in the other direction, the relationship is optional. When dealing with the modality of a relationship, modelers usually refer to the "one" end of a one-to-many relationship first. This relationship, then, is mandatory-optional.

There may also be optional-optional relationships. For example, the relationship 'Banks Finance Cars' is optional-optional, because you can have a bank that doesn't finance cars, and there are cars that are not financed.

Most data modelers use the term "optionality" instead of modality. This is an awkward and unfortunate use of the term because the optionality of a relationship could be either optional or mandatory. While an optional optionality appears redundant, it is not as bad as the seeming contradiction of a mandatory optionality.

Modality is a term from modal logic. It is used to distinguish necessary statements (in which truth is necessary or mandatory) from contingent statements (in which truth is conditional or dependent on external conditions). Modality is, in fact, a more accurate, meaningful, and less-confusing term than optionality, and is the one that I use in this series of articles.

By now you have probably noticed that the bar specifying a cardinality of one can usually be inferred, because a cardinality of zero is impossible. However, the cardinality bar does serve a purpose. Because modelers do not always know the cardinality of a relationship, they must have a way to distinguish not knowing from one. Note that many tool vendors do not require, nor do some allow, a bar for a cardinality of one. In that case, the best practice is to specify the nature of the relationship in the relationship description.

Degree

Relationships can have any number of entity types associated with them. When an entity type is related to itself, the relationship degree is called unary or recursive. For example, a child can be related to another child through the relationship 'Sibling.' (See Figure 6.) The most common relationship is the binary relationship that links two entity types.

N-ary relationships are those involving more than two entity types, such as 'Customer Buys a Car from a Dealer.' (See Figure 7.)

All data modeling tools support binary relationships, and most support unary relationships, but only a few support n-ary relationships. The reason is that most DBMSs support binary relationships only.

Attribute Values

As with entities and relationships, there are attribute types and attribute occurrences. An attribute value is an instance or occurrence of an attribute type. An attribute value is a characteristic of or fact about an entity occurrence. The fact might be that the entity's Color is "blue" (the convention is to place attribute values in double quotes) or that the AUTHOR NAME is "Thomas Rowley."

Attribute values are what data processing is all about. They form the core of information management and represent the most tangible and least abstract aspects of all data processing.

Many data modelers divide the world into data and metadata. Data consists of tangible data values, such as "blue," "french fries," and "mustang." Metadata is data about data. For example, the attribute type Menu Item tells us something about the attribute instance "curried pancakes," while the entity type Employee tells us something about the instances of "employee." These objects are called metadata because they are one step removed from the data.

Domains

A domain is the set of possible values an attribute type can have. Examples of domains include: dates, text, integers between 200 and 399, real numbers with two decimal places, state abbreviations, and so on. However, while "July 11, 1983" is an acceptable value for Employment Date, "Curried Pancakes" is not.

Domains are important because they tell you not only what the acceptable values of an attribute are, but also how to use the attribute. For example, the statement: "Medical Coverage = 'Yes' if Claim Date is greater than or equal to Employment Date and less than or equal to Termination Date" makes sense only if the values for Claim Date, Employment Date, and Termination Date share the same domain.

If Claim Date = "May 5, 1987," Employment Date = "July 11, 1983," and Termination Date = "123 South Main Street," the results will be quite unpredictable.

Domains can be specific or generic. Generic domains, such as 'integer,' 'text,' or the ever-popular 'alphanumeric,' are the easiest to work with, but they're also the least meaningful. Domains such as 'Dates between 1/1/50 and 12/31/94' or 'acceptable Zip codes' are more useful.

Domains can also be nested; that is, the scope of one domain can incorporate another. The domain 'Dates between 1/1/50 and 12/31/94' is incorporated in the domain 'Dates,' which, in turn, is incorporated in the domain 'Integers,' and so on.

There are three main types of domains:

Just the Beginning

The ER approach to logical data modeling does not end here. What I have set forth in this article is just a cursory view of some rather complex notions involved in data modeling. Figure 8 shows how the topics I discuss in this article relate to each other.


George Tillmann is a principal with New York-based Booz-Allen & Hamilton's Information Technology Group, the commercial systems division of this international management consulting firm. You can reach George via the Internet at tillmann_george@bah.com.


FIGURE 1.


--The conventional way to show relationships. Proper procedure is to label all relationships in both directions, as shown in the top figure. However, many modelers place the relationship in a diamond, as shown in the bottom figure.


FIGURE 2.


--Diagramming attributes. Some modelers place attributes in the entity box, as shown in the top figure, while others place them in ovals, as shown in the bottom figure.


FIGURE 3.


--This figure shows cardinality. The bar represents one; the trident (or "crow's foot") represents many.


FIGURE 4.


--Different approaches to representing cardinality. Note that Chen and Reiner use a diamond to represent a relationship, while the Trident approach uses a line.


FIGURE 5.


--A mandatory-optional relationship. The bar represents mandatory, while the oval represents optional.


FIGURE 6


--A unary, or recursive relationship. An entity type is related to itself.


FIGURE 7.


--N-ary relationships involve more than two entity types.


FIGURE 8.


--This figure shows how the topics discussed in this article relate to each other.



Subscribe to DBMS and Internet Systems -- It's free for qualified readers in the United States
June 1995 Table of Contents | Other Contents | Article Index | Search | Site Index | Home

DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1995 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to dbms@mfi.com
Updated Saturday, January 25, 1997