|Home Forum RSS PGP Alerts Links (B)|
Shadows and trails (Time Domain Addressing)Comment on this article
This is a database pattern. I recently discovered a mention of something similar in Gray's Transaction Processing bible where it's named "Time Domain Addressing". Seems it was described in depth in 1978 by Dale Reed (or something similar) but not implemented in many databases as yet. Lots of problems to do it right, it seems. Since there was no in depth description, I can only assume my implementation is a small part of the whole. But a part that's good enough for me.
The problem (well, one of the problems) with a "straight" relational database is that it's always a timedependent snapshot. Example: you've got an invoices database with a customer table. Yesterday you change the address of one of the customers. Today you re-print an invoice from last month, and at the top of the invoice you see the new version of the address. In other words, there's no way you can re-print the invoice exactly as it looked when it was originally entered and printed a month ago.
Another example, more distressing this time. This morning, I get a lab result on patient X showing a hematocrit of 30% on a sample taken at 7 am (for you non-MDs out there, this is lowish), so I have the nursing staff give him a unit of blood. An hour later, the lab corrects that same result for the 7 am sample to 36% and even though I'm damned sure I actually saw "30%" on that screen earlier, now it says "36%" and I'm having a hard time convincing a collegue I actually did the right thing under the circumstances as they were then.
Of course, these kinds of problems can be solved piecemeal by having provisions in the respective tables for keeping a record of changes. But we're bound to provide such backup information only in the obvious places, not everywhere, unless we can set up a generalized mechanism for the entire database that takes care of this issue for all database entities. Can we? Yes we can, else this piece would have been pointless, right?
First, each table (and this means practically every table we've got in the database, except for tables that can never have updates or deletes) gets a second shadow-table with exactly the same structure as the "primary" table. Into this shadow table goes old versions of updated records and the most recent version of deleted records. The "primary" table, and the shadow table, both have a sequence number column added to be used for the trail system.
An extra table, the "trail" table, is added to the database. There's only one for the whole database. This table gains one row for each insert, update or delete on any other table in the system. A row here looks like:
The trail list table is maintained by insert, update and delete triggers on the individual tables, making for performant implementation and the impossibility of applications to get around the system. Also, the applications need not be aware that the trail system even exists.
To illustrate how it works, let's go through an example. We'll add in "Albert", change his name to "Bertrand" and finally delete him. The trail table lacks the Session column since it's irrelevant to the example. Also, we assume "patients" to be table number 10. TSeq and PSeq mean Trail sequence nb resp. Primary sequence nb. DT means DateTime.
Step 1 - at 09:00 we add "Albert" to the system.
Step 2 - at 09:30 we rename "Albert" to "Bertrand".
Step 3 - at 10:00 we delete "Bertrand" from the system.
Note that any application that's oblivious to the system simply reads the primary table and chugs along happily. If you want to use the history data to find the name of a particular patient at any particular time, you can do that through a procedure working as follows (pseudo code):
sp_getPatOnDate(patKey, dtTime) for each DT in Trail (PSeq = patKey) if (dtTime < DT) select from shadow where Key = TSeq return endif endfor select from Primary where Key = patKey return
If you try this algorithm at noon (12:00) with different time values, you'll find:
You can easily check that if the record had never been deleted, the last "select from Primary" would have picked it up correctly. The same goes for the situations where records have never been deleted or updated.
It's not hard to deduce the needed triggers and procedures from the above discussion. It's also easy to see that even though these triggers and procedures aren't trivial, they're very stereotypical and can easily be automatically generated for entire databases.
It's also blindingly obvious (to me, at least) that these operations belong in the database as triggers and stored procedures and not in a separate data access layer on the same or another machine. The reasons for this are several, including watertight control and performance.
There are several uses for this system: