Keywords: Multimedia information retrieval, content independence, digital libraries, human-computer interaction.
The main problem of multimedia data management is providing access to the stored objects. Three views can be distinguished in a data model for multimedia objects: the logical structure, the content structure, and the layout structure. For administrative data, the content structure is easily represented by alphanumeric values, so database technology has primarily focused on handling the logical structure of objects. For multimedia data, however, representing content is far from trivial, and current database management systems do not support it.
The information retrieval (IR) community has long studied the retrieval of text documents by their content. Research in fields such as computer vision and image analysis has likewise led to content-based retrieval techniques for querying image and audio collections. Retrieval systems based on these ideas are typically standalone systems developed for very specific applications, and there is little consensus on how to integrate these techniques into general-purpose DBMSs. State-of-the-art solutions simply make new functions available in the query language; these functions interface to software systems that otherwise remain standalone. This leaves the user with two burdens: formulating the query, and combining the results obtained for each individual representation into a final judgement. It also tends to make query processing inefficient for queries that involve several content representations.
This thesis proposes the Mirror architecture for multimedia database management systems. Like any DBMS, an MMDBMS is a general-purpose software system that supports various applications; here, however, the support targets applications in the specific domain of digital libraries. Three new requirements are identified for this domain: 1) multimedia objects are active objects, 2) querying is an interaction process, and 3) query processing uses multiple representations. Mirror's design therefore provides basic functionality for managing both the content structure and the logical structure of multimedia objects. The inference network retrieval model, the basis of a well-known IR system, is adapted for multimedia retrieval. Further characteristics of the Mirror architecture are support for the distribution of both data and operations, and extensibility of data types and operations.
This work focuses on design for scalability. To preserve the essential ideas underlying relational database technology, the implementation of the Mirror prototype system is based on the principle of structural object-orientation. A database that supports structural object-orientation distinguishes between atomic data types and structures over these types. Because the DBMS also manages the structure of the objects, this enables a strong notion of data independence between the logical and physical levels. In Mirror, the Moa object algebra is used at the logical level and the Monet database system at the physical level. The separation of concerns between these levels allows algebraic query optimization techniques to be applied, a property hardly ever found in content management systems.
The prototype system is evaluated in three ways. First, several example queries capturing different information needs illustrate the advantages of integrating content management into the Mirror DBMS. Second, the execution performance of IR query processing is evaluated using a standard text retrieval benchmark. Finally, the multimedia IR model developed in this thesis is tested in small-scale experiments in the domains of music and image retrieval.
BAG< TUPLE< time: Atomic<Time>,
            date: Atomic<Date>,
            keyframes: LIST< Atomic<Image> >,
            audiotrack: Atomic<Audio>,
            transcript: Atomic<Text> > >;
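To make structural object-orientation concrete, the sketch below models Moa-style structure constructors in Python: atomic types are the leaves, and BAG, LIST and TUPLE are structures over them. The class names (Atomic, MoaBag, MoaList, MoaTuple) and the leaves helper are hypothetical; they only mirror the notation of the schema above, not Moa's implementation. Because the structure is explicit, the system can enumerate every atomic attribute, which is what makes a mapping onto a separate physical layout possible.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical Python mirror of Moa-style structure constructors.
# Atomic types are the leaves; BAG, LIST and TUPLE are structures over them.


@dataclass(frozen=True)
class Atomic:
    name: str  # e.g. "Time", "Image", "Audio"

    def moa(self) -> str:
        return f"Atomic<{self.name}>"


@dataclass(frozen=True)
class MoaBag:
    elem: "MoaType"  # unordered collection that allows duplicates

    def moa(self) -> str:
        return f"BAG< {self.elem.moa()} >"


@dataclass(frozen=True)
class MoaList:
    elem: "MoaType"  # ordered collection

    def moa(self) -> str:
        return f"LIST< {self.elem.moa()} >"


@dataclass(frozen=True)
class MoaTuple:
    fields: Tuple[Tuple[str, "MoaType"], ...]  # named attributes, in order

    def moa(self) -> str:
        inner = ", ".join(f"{name}: {t.moa()}" for name, t in self.fields)
        return f"TUPLE< {inner} >"


MoaType = Union[Atomic, MoaBag, MoaList, MoaTuple]


def leaves(t: MoaType, path: str = "") -> List[Tuple[str, str]]:
    """Return (attribute path, atomic type name) for every atomic leaf."""
    if isinstance(t, Atomic):
        return [(path, t.name)]
    if isinstance(t, (MoaBag, MoaList)):
        return leaves(t.elem, path)  # collections do not add a name
    out: List[Tuple[str, str]] = []
    for name, ft in t.fields:
        out.extend(leaves(ft, f"{path}.{name}" if path else name))
    return out


# The schema from the text, built from the constructors.
schema = MoaBag(MoaTuple((
    ("time", Atomic("Time")),
    ("date", Atomic("Date")),
    ("keyframes", MoaList(Atomic("Image"))),
    ("audiotrack", Atomic("Audio")),
    ("transcript", Atomic("Text")),
)))

print(schema.moa())    # the schema in one-line Moa notation
print(leaves(schema))  # every atomic attribute the system must store
```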
The retrieval model of the MMDBMS is based on the inference network retrieval model (see also the TWLT paper). It consists of an object network (top of the figure) and a query network (bottom). The dark gray boxes represent different media objects; the light gray boxes represent different content representations.
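The query network combines the beliefs that the object network assigns to each content representation. The text does not say which combination operators Mirror uses; as an assumption, the sketch below gives the closed forms of the standard operators of the inference network model as used in the InQuery system: #sum averages beliefs, #wsum takes a weighted average, #and multiplies, #or computes a noisy-or, #not complements, and #max takes the maximum. The function names and the belief values are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical helpers: closed forms of common inference-network operators.
# Each takes beliefs in [0, 1] from parent nodes and returns one belief.


def op_sum(beliefs: Sequence[float]) -> float:
    return sum(beliefs) / len(beliefs)  # average


def op_wsum(beliefs: Sequence[float], weights: Sequence[float]) -> float:
    return sum(w * b for w, b in zip(weights, beliefs)) / sum(weights)


def op_and(beliefs: Sequence[float]) -> float:
    return math.prod(beliefs)  # high only if all parents are likely


def op_or(beliefs: Sequence[float]) -> float:
    return 1.0 - math.prod(1.0 - b for b in beliefs)  # noisy-or


def op_not(belief: float) -> float:
    return 1.0 - belief


def op_max(beliefs: Sequence[float]) -> float:
    return max(beliefs)


# Made-up beliefs from two content representations of one keyframe
# (for instance color and texture).
color, texture = 0.6, 0.8
for name, value in [
    ("sum", op_sum([color, texture])),
    ("and", op_and([color, texture])),
    ("or", op_or([color, texture])),
    ("max", op_max([color, texture])),
]:
    print(f"{name}: {value:.2f}")
```

The loop prints 0.70, 0.48, 0.92 and 0.80: #and is stricter than either input, while #or rewards evidence from either representation.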
Reasoning in the network model is performed through queries in the database system. The component networks for each feature space are represented in Moa using the CONTREP structure. The object network is connected to the query network by the getBL operator, which is defined on the CONTREP structure (see also the DS-8 paper).
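The belief function behind getBL is not spelled out here. As an assumption, the sketch below uses the InQuery-style tf.idf belief bel = 0.4 + 0.6 * T * I, with T = tf / (tf + 0.5 + 1.5 * dl / avgdl) and I = log((N + 0.5) / df) / log(N + 1). A CONTREP is modeled as a Counter of term frequencies and stats as collection statistics; get_bl, Stats and make_stats are hypothetical names. rank_flat mirrors map[sum(THIS)]( map[getBL( THIS, query, stats ) ]( docs )) over a BAG<CONTREP>, and rank_sections mirrors the compound-document query over BAG<BAG<CONTREP>> given further on, leaving out its INFNET<...> wrapper.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

DEFAULT_BELIEF = 0.4  # belief for a term absent from a representation (assumed)


@dataclass
class Stats:
    n: int              # number of representations in the collection
    df: Dict[str, int]  # number of representations containing each term
    avg_len: float      # average representation length


def make_stats(reps: List[Counter]) -> Stats:
    df: Counter = Counter()
    for rep in reps:
        df.update(set(rep))  # count each term once per representation
    avg_len = sum(sum(r.values()) for r in reps) / len(reps)
    return Stats(len(reps), dict(df), avg_len)


def get_bl(rep: Counter, query: List[str], stats: Stats) -> List[float]:
    """Hypothetical getBL: one belief per query term for one CONTREP."""
    dl = sum(rep.values())
    beliefs = []
    for term in query:
        tf, df = rep.get(term, 0), stats.df.get(term, 0)
        if tf == 0 or df == 0:
            beliefs.append(DEFAULT_BELIEF)
            continue
        t = tf / (tf + 0.5 + 1.5 * dl / stats.avg_len)
        i = math.log((stats.n + 0.5) / df) / math.log(stats.n + 1.0)
        beliefs.append(DEFAULT_BELIEF + (1.0 - DEFAULT_BELIEF) * t * i)
    return beliefs


def rank_flat(docs: List[Counter], query: List[str]) -> List[float]:
    # map[sum(THIS)]( map[getBL(THIS, query, stats)](docs) )
    stats = make_stats(docs)
    return [sum(get_bl(d, query, stats)) for d in docs]


def rank_sections(docs: List[List[Counter]], query: List[str]) -> List[float]:
    # map[max(...)]( map[ map[sum(getBL(THIS, query, stats))](THIS) ](docs) )
    # Assumption: statistics are computed over sections, not whole documents.
    stats = make_stats([s for doc in docs for s in doc])
    return [max(sum(get_bl(s, query, stats)) for s in doc) for doc in docs]


docs = [Counter(text.split()) for text in
        ["music retrieval with networks", "image retrieval", "music music score"]]
print(rank_flat(docs, ["music"]))
```

Because getBL yields one belief per query term for every document, ranking by the sum gives the same order as InQuery's averaging #sum operator.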
BAG< TUPLE< time: Atomic<Time>,
            date: Atomic<Date>,
            keyframes: LIST< TUPLE< keyframe: Atomic<Image>,
                                    color: CONTREP,
                                    texture: CONTREP > >,
            audiotrack: Atomic<Audio>,
            transcript: TUPLE< transcript: Atomic<Text>,
                               content: CONTREP > > >;
For a collection docs of type BAG<CONTREP>, a belief for each document is computed by the following query:
map[sum(THIS)]( map[getBL( THIS, query, stats ) ]( docs ));
It is also possible to model documents as a collection of sections (BAG<BAG<CONTREP>>), which suits a collection with documents of varying lengths. A possible instance of the inference network retrieval model that handles such compound documents is specified in the following query:
map[max( INFNET<THIS> ) ]( map[ map[ sum(getBL( THIS, query, stats ))]( THIS ) ]( docs ));
The major advantage of the integrated data model is the possibility of combining constraints on the content of documents with constraints on their logical structure. Assume the following data model:
BAG< TUPLE< Category : str, Content : CONTREP > >;
The next query retrieves only the "News" documents that match the query (and not the matching documents of other categories):
map[sum(getBL( THIS.Content, query, stats )) ]( select[ =( THIS.Category, "News" ) ]( docs ) );
As a final example of the usefulness of integrating databases and IR, consider the following equivalent query. It ranks the same documents, but the selection of the "News" documents is performed after the IR query. An algebraic query optimizer may rewrite this query into the previous, probably more efficient, one before the requested ranking is computed.
map[THIS.DocBeliefs]( select[=(THIS.Category,"News")]( map[TUPLE<Category: THIS.Category, DocBeliefs: sum(getBL(THIS.Content, query, stats)) > ]( docs ) ));
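The rewrite is valid because the selection predicate reads only Category, which the scoring map carries through unchanged; pushing the selection below the map yields the same beliefs while computing them only for "News" documents. The sketch below checks this on a toy collection. Doc, CountingScorer and the term-overlap score are hypothetical stand-ins for the Moa structures and for sum(getBL(...)); they are not part of Mirror.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Doc:
    category: str
    content: FrozenSet[str]  # stand-in for a CONTREP (assumed)


class CountingScorer:
    """Hypothetical stand-in for sum(getBL(...)) that counts its calls."""

    def __init__(self, query: List[str]) -> None:
        self.query = query
        self.calls = 0

    def __call__(self, content: FrozenSet[str]) -> float:
        self.calls += 1
        return float(sum(term in content for term in self.query))


Scorer = Callable[[FrozenSet[str]], float]


def select_then_score(docs: List[Doc], score: Scorer) -> List[float]:
    # map[sum(getBL(THIS.Content, ...))]( select[=(THIS.Category, "News")](docs) )
    return [score(d.content) for d in docs if d.category == "News"]


def score_then_select(docs: List[Doc], score: Scorer) -> List[float]:
    # map[THIS.DocBeliefs]( select[...]( map[TUPLE<Category, DocBeliefs>](docs) ) )
    scored = [(d.category, score(d.content)) for d in docs]
    return [beliefs for category, beliefs in scored if category == "News"]


docs = [
    Doc("News", frozenset({"election", "vote"})),
    Doc("Sports", frozenset({"match", "vote"})),
    Doc("News", frozenset({"weather", "vote"})),
    Doc("Arts", frozenset({"concert"})),
]
query = ["vote", "weather"]

early, late = CountingScorer(query), CountingScorer(query)
assert select_then_score(docs, early) == score_then_select(docs, late)
print(early.calls, late.calls)  # scorer invocations for each plan
```

Both plans return the same beliefs, but the last line prints 2 4: scoring first evaluates the expensive content function for every document, including those the selection later discards.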
The figure above illustrates the overall architecture of the Mirror DBMS. It consists of three layers: the conceptual design, the logical design, and the physical design. Between adjacent levels, queries in a higher-level query language are translated into a lower-level query language.
The Monet extensible database system implements the physical design of the MMDBMS. Monet only handles a vertically decomposed data model, and its query language is MIL. A special infnet module extends MIL with operators for probabilistic reasoning.
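Vertical decomposition stores every attribute of a structured object in its own binary table; Monet's BATs pair a head object identifier with a tail value. The sketch below decomposes a BAG< TUPLE< Category : str, Content : CONTREP > > collection into such tables and reassembles a CONTREP from them. Plain Python lists stand in for BATs, and the table names and the choice to store a CONTREP as aligned (entry, document), (entry, term) and (entry, tf) tables are assumptions for illustration, not Mirror's actual physical schema.

```python
from collections import Counter
from typing import Dict, List, Tuple

BAT = List[Tuple[int, object]]  # binary table of (head oid, tail value) pairs


def decompose(docs: List[Tuple[str, Counter]]) -> Dict[str, BAT]:
    """Split BAG<TUPLE<Category, Content>> into one binary table per attribute."""
    bats: Dict[str, BAT] = {"category": [], "entry_doc": [], "term": [], "tf": []}
    entry = 0  # oid of one (term, tf) entry inside some CONTREP
    for oid, (category, content) in enumerate(docs):
        bats["category"].append((oid, category))
        for term, tf in sorted(content.items()):
            bats["entry_doc"].append((entry, oid))
            bats["term"].append((entry, term))
            bats["tf"].append((entry, tf))
            entry += 1
    return bats


def select_eq(bat: BAT, value: object) -> List[int]:
    """Heads whose tail equals value; reads a single table only."""
    return [head for head, tail in bat if tail == value]


def contrep_of(bats: Dict[str, BAT], oid: int) -> Counter:
    """Reassemble one document's CONTREP by joining on entry oids."""
    entries = set(select_eq(bats["entry_doc"], oid))
    terms = {e: t for e, t in bats["term"] if e in entries}
    return Counter({terms[e]: tf for e, tf in bats["tf"] if e in entries})


docs = [
    ("News", Counter({"election": 2})),
    ("Sports", Counter({"match": 1, "goal": 3})),
    ("News", Counter({"match": 1})),
]
bats = decompose(docs)
print(select_eq(bats["category"], "News"))  # oids of the "News" documents
print(contrep_of(bats, 1))                  # CONTREP of document 1
```

Selecting on Category reads only the category table; the content tables are touched only when a CONTREP is actually needed.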
The Moa tool implements the logical design. It can be regarded as an object-oriented view of the data stored in Monet. Moa extensions for content management (e.g., CONTREP) generate MIL queries that implement the probabilistic reasoning process.
The conceptual level is rather empty at the moment. Although we are working on a translation of generic OQL into Moa, this is far from complete. For the multimedia retrieval projects, we program directly in the Moa language at the logical level. Eventually, the DBMS will provide an easier interface that is better suited for use by end users.
However, much work remains to be done. The current system is implemented only at the logical and physical levels. Further development is required for a user interface at the conceptual level, and for a smooth integration of the CORBA architecture (described in the SPIE paper) with the current implementation of the meta-data database. At the moment, Moa queries are specified by hand, and we have not yet developed a tool that translates end-user schemas into internal database schemas.
The experimental evaluation of this work has been rather limited so far. We performed some initial experiments with music retrieval, reported among others in the TWLT paper. Although the results are promising, they say little about the performance of the retrieval model on large data collections with real information needs. High on my agenda is the evaluation of the system on the TREC data set, to demonstrate that the queries given in the DS-8 paper can be evaluated in reasonable time on real-life collections. In addition, experiments with image retrieval (possibly in cooperation with AMIS) should confirm that reasoning with multiple representations is both possible and useful.