DBMS - March 1998
DBMS Online: Data Warehouse Architect By Ralph Kimball

Meta Meta Data Data

Making a List of Data About Metadata and Exploring Information Cataloging Tools.


Metadata is an amazing topic in the data warehouse world. Considering that we don’t know exactly what it is, or where it is, we spend more time talking about it, worrying about it, and feeling guilty we aren’t doing anything about it than any other topic. Several years ago we decided that metadata is any data about data. This wasn’t very helpful because it didn’t paint a clear picture in our minds. This fuzzy view gradually cleared up, and recently we have been talking more confidently about the "back-room metadata" that guides the extraction, cleaning, and loading processes, as well as the "front-room metadata" that makes our query tools and report writers function smoothly.

The back-room metadata presumably helps the DBA bring the data into the warehouse and is probably also of interest to business users when they ask where the data came from. The front-room metadata is mostly for the benefit of the end user, and its definition has been expanded to include not only the oil that makes our tools function smoothly, but also a kind of dictionary of business content represented by all the data elements.

Even these definitions, as helpful as they are, fail to give the data warehouse manager much of a feeling for what it is he or she is supposed to do. It sounds like whatever this metadata stuff is, it’s important, and we had better:

Now there is a good, solid IT set of responsibilities. So far, so good. The only trouble is, we haven’t really said what it is yet. We do notice that the last item in the above list really isn’t metadata, but rather, data about metadata. With a sinking feeling, we realize we probably need meta meta data data.

To get this under control, let’s try to make a complete list of all possible types of metadata. We surely won’t succeed in this first try, but we will learn a lot. First, let’s go to the source systems, which could be mainframes, separate nonmainframe servers, users’ desktops, third-party data providers, or even online sources. We will assume that all we do here is read the source data and extract it to a data staging area that could be on the mainframe or could be on a downstream machine. Taking a big swig of coffee, we start the list:

Now let’s list all the metadata needed to get the data into a data staging area and prepare it for loading into one or more data marts. We may do this on the mainframe with hand-coded Cobol, or by using an automated extract tool. Or we may bring the flat file extracts more or less untouched into a separate data staging area on a different machine. In any case, we have to be concerned about metadata describing:

Once we have finally transferred the data to the data mart DBMS, then we must have metadata, including:

  • DBMS backup status, procedures, and security.

In the front room, we have metadata extending to the horizon, including:

    Now we can see why we didn’t know what this metadata was all about. It is everything! Except for the data itself. Suddenly, the data seems like the simplest part.

    With this perspective, do we really need to keep track of all this? We do, in my opinion. This list of metadata is the essential framework of your data warehouse. Just listing it as we have done seems quite helpful. It’s a long list, but we can go down through it, find each kind of metadata, and identify what it is used for and where it is stored.
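The exercise of going down the list and recording, for each kind of metadata, what it is used for and where it is stored can be sketched as a small inventory. This is only an illustration under stated assumptions: the `MetadataEntry` record, the area labels, and every storage location below are hypothetical, and only the backup entry is taken from the list in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEntry:
    name: str        # the kind of metadata being tracked
    area: str        # where it sits in the warehouse flow (labels are assumptions)
    used_for: str    # what the metadata is used for
    stored_at: str   # where it is stored; an empty string means "not located yet"


# Illustrative entries only; the storage locations are made up for this sketch.
CATALOG = [
    MetadataEntry("Cobol extract program", "staging",
                  "pulling source data into the staging area",
                  "mainframe:PROD.EXTRACT.SOURCE"),
    MetadataEntry("DBMS backup status, procedures, and security", "dbms",
                  "recovering and protecting the data mart", ""),
    MetadataEntry("Business definitions of data elements", "front room",
                  "helping end users interpret reports",
                  "shared drive: glossary.doc"),
]


def group_by_area(entries):
    """Return a dict mapping each area label to the entries filed under it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.area].append(entry)
    return dict(grouped)


def not_yet_located(entries):
    """Return the names of entries whose storage location is still unknown."""
    return [entry.name for entry in entries if not entry.stored_at.strip()]


if __name__ == "__main__":
    for area, entries in group_by_area(CATALOG).items():
        print(area, "->", [entry.name for entry in entries])
    print("still to locate:", not_yet_located(CATALOG))
```

Even a flat list like this makes the gaps visible: any entry with an empty location is metadata we know exists but have not yet found.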

    There are some sobering realizations, however. Much of this metadata needs to reside on the machines close to where the work occurs. Programs, settings, and specifications that drive processes have to be in certain destination locations and in very specific formats. That isn’t likely to change soon.

    At the very least, we are going to need a tool for cataloging metadata and keeping track of it. The tool probably can’t read and write all the metadata directly, but it should help us manage the metadata that is stored in so many locations. Fortunately, there is a category of tools dedicated to this very purpose. Go and look at the Information Catalog Tools section of Larry Greenfield’s Web site at pwp.starnetinc.com/larryg/catalog.html. As of this writing, Larry lists no fewer than 14 tools that aim squarely at the metadata problem. Of these, six in particular caught my eye as claiming to take a perspective very close to the one I have described in this article. These six are deliveryManager from Virtual Integration Technology (www.vit.com), InfoCat from Enterprise Solutions Inc. (www.infocat.com), Logic Works Universal Directory from Logic Works (www.logicworks.com), Marlow from One Meaning (www.onemeaning.com), Metadata Control Center from Intellidex Systems (www.intellidex.com), and Prism Warehouse Directory from Prism Solutions (www.prismsolutions.com). I recommend that you take a look at these products to see if they meet your metadata needs.

    Once we have taken the first step of getting our metadata corralled and under control, can we hope for tools that will pull all the metadata together in one place and be able to read and write it as well? With such a tool, not only would we have a uniform user interface for all this disparate metadata, but on a consistent basis we would be able to snapshot all the metadata at once, back it up, secure it, and restore it if we ever lost it.
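A very modest version of the snapshot-and-restore idea can be sketched for the catalog records alone, not for the underlying metadata, which still lives on the machines where the work occurs. The function names, the JSON file format, and the sample record are assumptions made for illustration; no existing product is implied.

```python
import json
import tempfile
from pathlib import Path


def snapshot(records, path):
    """Write all catalog records to one JSON file so they can be backed up as a unit."""
    text = json.dumps(records, indent=2, sort_keys=True)
    Path(path).write_text(text, encoding="utf-8")


def restore(path):
    """Read catalog records back from a snapshot file written by snapshot()."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Hypothetical record; the location is invented for this sketch.
    records = [
        {"name": "DBMS backup status, procedures, and security",
         "stored_at": "dba-server:/backup/status.log"},
    ]
    with tempfile.TemporaryDirectory() as folder:
        target = Path(folder) / "metadata_snapshot.json"
        snapshot(records, target)
        print("round trip intact:", restore(target) == records)
```

Writing everything to a single file is what makes the snapshot consistent: the records are captured together, so they can be secured, copied, and restored together.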

    Don’t hold your breath. As you can appreciate, this is a very hard problem, and encompassing all forms of metadata will require a kind of systems integration that we don’t have today. I believe the Metadata Coalition (a group of vendors trying seriously to solve the metadata problem) will make some reasonable progress in defining common syntax and semantics for metadata, but it has been two years and counting since they started this effort. Unfortunately, Oracle, the biggest DBMS player, has chosen to sit out this effort and has promised to release its own proprietary metadata standard. Other vendors are making serious efforts to extend their product suites to encompass many of the activities listed in this article and simultaneously to publish their own framework for metadata. These vendors include Microsoft, which is working with the Metadata Coalition to extend the Microsoft Repository, as well as a pack of aggressive, smaller players proposing comprehensive metadata frameworks, including Sagent, Informatica, VMark, and D2K. In any case, these vendors will have to offer significant business advantages in order to compel other vendors to write to their specifications. You can read the Metadata Coalition’s position papers and progress reports at www.he.net/~metadata. Meanwhile, take a look at the information catalog tools I mentioned, and get started entering your meta meta data data.


    Ralph Kimball was coinventor of the Xerox Star workstation, the first commercial product to use mice, icons, and windows. He was vice president of applications at Metaphor Computer Systems and is the founder and former CEO of Red Brick Systems. He now works as an independent consultant designing large data warehouses. His book The Data Warehouse Toolkit: How to Design Dimensional Data Warehouses (Wiley, 1996) is now available. You can reach Ralph through his Web page at www.rkimball.com.

    This is a copy of an article published @ http://www.dbmsmag.com/