Back
Home
Up
Next

Declarative referential integrity v/s triggers.

    Relational theory says you must normalize data to make it fit inside a relational model. Of course, entities must be connected in some way. The "glue" or link between two normalized entities is a common field for both entities. The common field acts as a bridge so you can pass from one entity to another using SQL and it's the physical representation of a logical "relationship" between two entities. Indeed, you can have one entity linked with itself or three entities linked at the same time, too, depending on the semantics of your model.

When a model is normalized, this means usually you get one real-world concept or thing decomposed in several normalized entities, usually referred as tables. But these tables must be kept in sync to get a consistent data repository. For example, a bill and its details are two separated tables, but no detail should be allowed if it refers to a bill that doesn't exist. Contrarily, a bill cannot be deleted unless there aren't details or these details are erased first. Validating these rules in client applications is not the bulletproof solution because people might be able to use a third party tool to insert, delete or update data. To the rescue comes referential integrity. This is a metadata declaration that enforces a dependency relationship between two tables through one or more fields. So, the engine itself is in charge of verifying consistency across tables; you can think that one table "refers" to other for validation and hence the name "referential integrity".

In IB, a referential integrity constraint can be set at a field level or at a table level. When more than one field is involved, a table-level constraint is required. In SQL, this constraint is called a foreign key (FK) declaration. The lookup table (the one where  values are verified against) must exist. Three important restrictions are:
Currently, a FK declaration can be done only when one connection is made to the database. If more connections are in effect, an error message like "object is in use" will appear.
If the lookup table has been used in the current session even if only there's one connection to the database, it might be necessary to disconnect and reconnect.
The lookup table must have declared already the referenced field(s) with an unique index (UK) or the command won't be accepted. Only the child (referrer) table will have a non-unique index automatically generated to handle the relationship.

Now that FK declarations have established the database schema, you will find that's not very easy to modify metadata of tables with referential integrity constraints. Specifically, you cannot drop or alter fields that are involved in a FK declaration. Because of these limitations, you may be tempted to define your referential integrity not with DDL standard commands but handle the validations for yourself by means of triggers. However, some drawbacks exist:
The main reason why a referred (master, lookup) table must be altered, too is that a master record cannot be deleted if child records pointing to it exist, so you'll have to define a trigger not only on the child (to avoid insertions without parent) but also on the master. This may become a burden as the number of tables increase.
FK declarations are self-evident by reading them. Having to read the source of a trigger to discover what it does is not as attractive. Also, because a table can have many triggers, to user-defined ones for referential integrity purposes are better given meaningful names. Triggers are most used to make entries in audit tables, to modify or fill some fields before inserting or changing data, to make possible auto-increment fields and to stop operations in case a condition fails and this condition cannot be handled by declarative table-level or column-level constraints so it has to be done with procedural instructions. Also, you can't drop fields that are used by triggers or stored procedures.
Usually, a FK declaration is optimized so it makes a fast check of the conditions. If you intend to replace it with a trigger, make sure you make a good lookup. For example, don't use count(*) to know if you find a match; you only are interested if a match happens, not how many matches exist, so an EXISTS test with a plain select_* will be faster.
From IB5 onwards, declarative FK declarations can include options that specify what should happen when a record in a parent table is deleted or modified and there are dependant records on the child table. The typical options are CASCADE, NULL and DEFAULT. Having to reproduce these features in a trigger becomes unattractive.
Perhaps the most important point was left at the end: IB goes to great pains to ensure that referential integrity constraints defined through PK declarations are ran against committed data from other transactions in a way that's independent on the state and isolation level of your transaction. Of course, inside your own transaction there's no visibility problems. Remember that IB always uses a transaction, either implicit or explicit. Because the default isolation is snapshot (see the explanation on transaction options) you risk missing some entries that were posted and committed after your transaction started, so a trigger making a select will not see these new entries. Here, two things may occur: either you cannot insert in the child table even though another transaction has committed the needed record in the master table or worse, you delete a record from the master table and your trigger doing a delete on the child table doesn't erase some records that meet the condition to be erased because they were committed after your transaction started, so you are leaving some "dangling records" (orphan records) in a problem that resembles the dangling pointers in programming languages, the only difference being that an SQL sentence with orphan records won't crash the engine as an invalid pointer may stop a program. These nightmares are avoided with declarative referential integrity.

In IB, stored procedures and triggers cannot start transactions by themselves and cannot change the current transaction isolation. They run inside the transaction started by the client. You can say that procedural referential integrity (namely, using triggers and not declarations) still can be used provided that the read committed level of isolation is used by all the client. After all, other relational engines used read committed as the default level and the BDE uses it, too. However, nothing prevents your client from using snapshot explicitly (even on other engines) and you have no way to detect that in a trigger. Also, one aim of relational engines is passing the enforcement of all basic rules to the engine, so there's no need to reinvent the wheel unless you really need a very complex lookup in another table. In this case, it can be a sign of a normalization problem.

 

This page was last updated on 2000-05-26 04:28:43 -0400 .