This second part in a series of articles outlines the basic building blocks of a data model.
Though this be madness, yet there is method in it. --Shakespeare
If people knew how hard I had to work to gain my mastery, it wouldn't seem wonderful at all. --Michelangelo
Logical data modeling is a graphic-intensive technique that results in a data model representing the definition, characteristics, and relationships of data in a business, technical, or conceptual environment. Its purpose is to describe end-user data to systems and end-user staff.
Various methods of data modeling exist, each using a host of conventions and tools. The most popular approach is called the entity-relationship (ER) approach, developed by Peter Chen in the late 1970s. Although a number of authors and tool designers have modified and expanded ER concepts, most still have a strong Chen flavor. Also, with the introduction of CASE tools, the number of diagrammatic conventions that the data modeler must master has increased sharply.
Some modelers use a diamond to represent a relationship, with lines connecting the diamond and entity boxes. (See Figure 1.) Attributes are diagrammed in several different ways or not diagrammed at all. Some modelers place attributes in the entity box while others use ovals to hold attribute names. (See Figure 2.)
However, most modelers do not place attributes on the diagram at all. They believe that diagramming attributes makes sense only for those simple models, presented in textbooks, that have perhaps a dozen attributes in total. In the real world, a diagram could quickly start to look like a Tokyo subway map as attributes are piled onto the page. A more practical approach is to keep attributes out of the data model and in the data dictionary.
To say that entity type 'A' relates to entity type 'B' means that one or more occurrences of 'A' are (or can be) related to one or more occurrences of 'B.' If this argument sounds familiar to you, it might be because you studied a similar issue in your college philosophy class. Philosophers have what they call the "type token distinction," where 'Man' can represent the set or 'type' of all men (or women), and 'Socrates' represents a single 'token' or occurrence of that set. Attributes and relationships, like entities, have types and occurrences. The distinction between type and occurrence is important for understanding the data modeling concepts of cardinality and modality, which are the two characteristics of relationships.
Cardinality is the specification of the number of occurrences of one entity type that can be related to the number of occurrences of another entity type. Cardinality is usually expressed as simply "one" or "many." For example, a husband can have only one wife (in most cultures), while a parent can have many children. Thus, two entities can be related as:
Note that Chen and Reiner use a diamond to represent a relationship, while the Trident approach uses a line. (For a discussion of the Reiner technique, see Reiner et al., "The Data Base Design and Evaluation Workbench [DDEW] Project at CCA." Database Engineering 7(4):10-15, 1985.) The diamond, as I explain later on, does a better job of representing certain types of relationships, but it is not as good at showing the bidirectionality of relationships. Chen represents cardinality by using a 1, N, or M on the relationship line, while Reiner fills in the diamond. Also, most tools now use the trident to show a cardinality of many, while some give the user a choice of symbols.
In contrast to cardinality, the modality of a relationship indicates whether an entity occurrence must participate in a relationship. Cardinality tells you the maximum number of entity occurrences that can participate in a relationship, while modality (also called optionality) tells you the minimum number of occurrences. The modality values are zero if an occurrence is not needed or optional, and one if an entity occurrence is required or mandatory.
Take the example of Invoice and Line Item entities: An Invoice occurrence can relate to many Line Items, but a Line Item can relate to only one Invoice. This tells you the cardinality. But is it possible to have a Line Item occurrence not related to an Invoice occurrence? The answer, of course, is no. For a Line Item to exist, it must be linked to an Invoice. Therefore, the relationship is mandatory. The same is true in the other direction. It makes no sense to have an Invoice without a Line Item. The relationship is mandatory in both directions.
A bar represents a modality of one, while a circle represents a modality of zero. Let's look at the relationship entity pair, 'Artists Paint Pictures.' (See Figure 5.) Because it is impossible to have a picture without an artist, the relationship 'Pictures Are Painted By Artists' is mandatory. However, it is possible to have Artists who are not related to any Pictures (just go into Greenwich Village some Saturday night); therefore, in the other direction, the relationship is optional. When dealing with the modality of a relationship, modelers usually refer to the "one" end of a one-to-many relationship first. This relationship, then, is mandatory-optional.
There may also be optional-optional relationships. For example, the relationship 'Banks Finance Cars' is optional-optional, because you can have a bank that doesn't finance cars, and there are cars that are not financed.
Most data modelers use the term "optionality" instead of modality. This is an awkward and unfortunate use of the term because the optionality of a relationship could be either optional or mandatory. While an optional optionality appears redundant, it is not as bad as the seeming contradiction of a mandatory optionality.
Modality is a term from modal logic. It is used to distinguish necessary statements (in which truth is necessary or mandatory) from contingent statements (in which truth is conditional or dependent on external conditions). Modality is, in fact, a more accurate, meaningful, and less-confusing term than optionality, and is the one that I use in this series of articles.
By now you have probably noticed that the bar specifying a cardinality of one can usually be inferred, because a cardinality of zero is impossible. However, the cardinality bar does serve a purpose. Because modelers do not always know the cardinality of a relationship, they must have a way to distinguish not knowing from one. Note that many tool vendors do not require, nor do some allow, a bar for a cardinality of one. In that case, the best practice is to specify the nature of the relationship in the relationship description.
N-ary relationships are those involving more than two entity types, such as 'Customer Buys a Car from a Dealer.' (See Figure 7.)
All data modeling tools support binary relationships, and most support unary relationships, but only a few support n-ary relationships. The reason is that most DBMSs support binary relationships only.
Attribute values are what data processing is all about. They form the core of information management and represent the most tangible and least abstract aspects of all data processing.
Many data modelers divide the world into data and metadata. Data consists of tangible data values, such as "blue," "french fries," and "mustang." Metadata is data about data. For example, the attribute type Menu Item tells us something about the attribute instance "curried pancakes," while the entity type Employee tells us something about the instances of "employee." These objects are called metadata because they are one step removed from the data.
Domains are important because they tell you not only what the acceptable values of an attribute are, but also how to use the attribute. For example, the statement: "Medical Coverage = 'Yes' if Claim Date is greater than or equal to Employment Date and less than or equal to Termination Date" makes sense only if the values for Claim Date, Employment Date, and Termination Date share the same domain.
If Claim Date = "May 5, 1987," Employment Date = "July 11, 1983," and Termination Date = "123 South Main Street," the results will be quite unpredictable.
Domains can be specific or generic. Generic domains, such as 'integer,' 'text,' or the ever-popular 'alphanumeric,' are the easiest to work with, but they're also the least meaningful. Domains such as 'Dates between 1/1/50 and 12/31/94' or 'acceptable Zip codes' are more useful.
Domains can also be nested; that is, the scope of one domain can incorporate another. The domain 'Dates between 1/1/50 and 12/31/94' is incorporated in the domain 'Dates,' which, in turn, is incorporated in the domain 'Integers,' and so on.
There are three main types of domains:
--The conventional way to show relationships. Proper procedure is to label all relationships in both directions, as shown in the top figure. However, many modelers place the relationship in a diamond, as shown in the bottom figure.
--Diagramming attributes. Some modelers place attributes in the entity box, as shown in the top figure, while others place them in ovals, as shown in the bottom figure.
--This figure shows cardinality. The bar represents one; the trident (or "crow's foot") represents many.
--Different approaches to representing cardinality. Note that Chen and Reiner use a diamond to represent a relationship, while the Trident approach uses a line.
--A mandatory-optional relationship. The bar represents mandatory, while the oval represents optional.
--A unary, or recursive relationship. An entity type is related to itself.
--N-ary relationships involve more than two entity types.
--This figure shows how the topics discussed in this article relate to each other.
June 1995 Table of Contents | Other Contents | Article Index | Search | Site Index | Home
DBMS and Internet Systems (http://www.dbmsmag.com)
Copyright © 1995 Miller Freeman, Inc. ALL RIGHTS RESERVED
Redistribution without permission is prohibited.
Please send questions or comments to firstname.lastname@example.org
Updated Saturday, January 25, 1997