Thinking XML: Basic XML and RDF techniques for knowledge management, Part 5


developerWorks > XML >




Defining RDF and DAML+OIL

Uche Ogbuji (uche.ogbuji@fourthought.com)
Principal consultant, Fourthought, Inc.
01 Mar 2002

Uche Ogbuji moves on to define RDF and DAML+OIL schemata for the issue tracker application, continuing the discussion of modeling as he goes along.

In my last installment of this column, I discussed how XML knowledge management technologies such as RDF shed a different light on age-old problems of data design and modeling. The goal was to nail down a schema for the issue tracker package that I have been using to illustrate the use of RDF in association with XML applications. Now I'll complete the definition of the issue tracker schema, in RDFS and DAML+OIL form.

Again, familiarity with RDF, RDFS, and DAML+OIL is required. Since the last installment, my colleague Roxane Ouellet and I have published an introduction to DAML+OIL (see Resources), so you no longer have to slog through the dense specifications to get a handle on it.

Just getting on with it
Without further ado, here is Listing 1, the complete RDFS schema for the issue tracker.

Listing 1. RDFS schema for the issue tracker




<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
<!ENTITY dc "http://purl.org/dc/elements/1.1/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;"
xmlns:it="&it;"
>

<rdfs:Class rdf:about="&it;Catalog">
<rdfs:label>Issue catalog</rdfs:label>
<rdfs:comment>
An optional collection of resources for which issues have or can
be defined.  Use dc:relation to associate the catalog with its
resources.
</rdfs:comment>
</rdfs:Class>

<rdfs:Class rdf:about="&it;Issue">
<rdfs:label>Issue</rdfs:label>
<rdfs:comment>
A problem, suggestion or other matter for action or discussion
relevant to a resource.  Use Dublin Core properties for base
description.
</rdfs:comment>
</rdfs:Class>

<rdf:Property rdf:about="&it;issue">
<rdfs:label>issue</rdfs:label>
<rdfs:comment>Associate an issue to its resources</rdfs:comment>
<rdfs:range rdf:resource="&it;Issue"/>
</rdf:Property>

<rdf:Property rdf:about="&it;action">
<rdfs:label>action</rdfs:label>
<rdfs:comment>Associate an action with an issue</rdfs:comment>
<rdfs:domain rdf:resource="&it;Issue"/>
<rdfs:range rdf:resource="&it;Action"/>
</rdf:Property>

<rdfs:Class rdf:about="&it;Action">
<rdfs:label>Action</rdfs:label>
<rdfs:comment>
An action to be taken with regard to an issue
</rdfs:comment>
</rdfs:Class>

<!-- A property (it has a domain), so declared with rdf:Property -->
<rdf:Property rdf:about="&it;assignee">
<rdfs:label>Assign to</rdfs:label>
<rdfs:comment>
Specify the party to whom the action is assigned
</rdfs:comment>
<rdfs:domain rdf:resource="&it;Action"/>
</rdf:Property>

<rdf:Property rdf:about="&it;status">
<rdfs:label>status</rdfs:label>
<rdfs:comment>For instance, "not done" or "done"</rdfs:comment>
<rdfs:domain rdf:resource="&it;Action"/>
</rdf:Property>

<rdf:Property rdf:about="&it;comment">
<rdfs:label>comment</rdfs:label>
<rdfs:comment>Associate a comment with an issue</rdfs:comment>
<rdfs:domain rdf:resource="&it;Issue"/>
<rdfs:range rdf:resource="&it;Comment"/>
</rdf:Property>

<rdfs:Class rdf:about="&it;Comment">
<rdfs:label>Comment</rdfs:label>
<rdfs:comment>A comment made with regard to an issue</rdfs:comment>
</rdfs:Class>

</rdf:RDF>

You will notice some changes, including changes to the namespaces used. Unfortunately, these are not as easily explained as the fact that our earlier RDF examples did not use any defined classes. This schema represents what is currently used for the issue tracker at RDFInference.org, including changes made for various reasons. I'll present corresponding updates to the instance RDF below.

I also adopt some lexical conventions. First, I define all the namespace URIs as entities in the DTD internal subset (a convention I learned from Ms. Ouellet), which reduces errors and improves readability. Second, I use only rdf:about, never rdf:ID, a convention I recently adopted after hard experience with the pitfalls of resolving IDs against the supposed URI of the containing document. I would use rdf:ID only when I could ensure that there is an explicit xml:base declaration, and that every RDF processor with which I need to interoperate supports XML Base.
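To make the pitfall concrete, here is a minimal sketch. It assumes the resolution rule later standardized in the revised RDF/XML syntax specification, under which rdf:ID="x" abbreviates rdf:about="#x" resolved against the in-scope base URI. Even with an explicit xml:base, rdf:ID always introduces a fragment identifier, so it cannot produce the slash-terminated names used in the issue tracker namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;"
xml:base="http://rdfinference.org/schemata/issue-tracker/"
>

<!-- Names http://rdfinference.org/schemata/issue-tracker/Issue -->
<rdfs:Class rdf:about="&it;Issue"/>

<!-- Resolves to http://rdfinference.org/schemata/issue-tracker/#Issue,
     a different resource, despite the explicit xml:base -->
<rdfs:Class rdf:ID="Issue"/>

</rdf:RDF>
```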

The Catalog class provides a way to aggregate all resources that have issues, or for which users are allowed to create issues. This is mostly an application convenience. Imagine a Web-based form for the tracker: it would probably have a drop-down selection box listing the resources of interest. One way to populate that list is to collect the objects of all dc:relation statements whose subject is a given catalog. The DAML+OIL schema I'm about to present illustrates another approach.
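A catalog instance might look like the following sketch. The catalog URI and title are hypothetical; the related resource is the one that Listing 3 raises issues against. An application populating the drop-down list would collect the objects of the dc:relation statements whose subject is this catalog.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY dc "http://purl.org/dc/elements/1.1/">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:dc="&dc;"
xmlns:it="&it;"
>

<!-- Hypothetical catalog URI, for illustration only -->
<it:Catalog rdf:about="http://example.org/catalogs/ril">
<dc:title>RIL issue catalog</dc:title>
<!-- Each dc:relation object is a resource that can carry issues -->
<dc:relation rdf:resource="http://rdfinference.org/ril/ril-20010502"/>
</it:Catalog>

</rdf:RDF>
```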

There are a few other small changes, such as renaming "assigned-to" to "assignee" so that property names consistently use nouns. Otherwise, there are no surprises in this schema, so let's move on to the DAML+OIL version.

DAML Yankees
DAML+OIL is a schema system that provides key improvements over RDFS, including a data typing system based on XML Schema datatypes, support for enumerations, richer ways to characterize properties (restrictions, as well as unique, transitive, and inverse properties), and classification and typing by inference. It also goes beyond mere schemata to allow us to define ontologies, which are meant to approximate how we hold concepts, but for now we'll mostly use its basic schema features. Listing 2 is a DAML+OIL schema for the issue tracker, similar to Listing 1.
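As an illustration of the enumeration support, the following sketch shows how the two values suggested by the comment on the status property ("not done" and "done") could be pinned down as a closed set of individuals with daml:oneOf. The class it:ActionStatus and the individuals it:notDone and it:done are hypothetical; neither schema in this article defines them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY daml "http://www.daml.org/2001/03/daml+oil#">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;"
xmlns:daml="&daml;"
>

<!-- Hypothetical class whose only members are the two listed individuals -->
<daml:Class rdf:about="&it;ActionStatus">
<rdfs:label>Action status</rdfs:label>
<daml:oneOf rdf:parseType="daml:collection">
  <daml:Thing rdf:about="&it;notDone"/>
  <daml:Thing rdf:about="&it;done"/>
</daml:oneOf>
</daml:Class>

</rdf:RDF>
```

With such a class in place, the status property could be given an rdfs:range of it:ActionStatus, so that any other value would be inconsistent with the schema.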

Listing 2. DAML+OIL schema for the issue tracker



<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY xsd "http://www.w3.org/2000/10/XMLSchema#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY daml "http://www.daml.org/2001/03/daml+oil#">
<!ENTITY dc "http://purl.org/dc/elements/1.1/">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;"
xmlns:daml="&daml;"
xmlns:xsd="&xsd;"
xmlns:it="&it;"
>

<daml:Ontology rdf:about="">
<daml:versionInfo>
<!-- Note: requires expansion by RCS, CVS or the like -->
$Revision$
</daml:versionInfo>
<rdfs:comment>
Ontology for an issue tracking system for arbitrary resources
</rdfs:comment>
<daml:imports rdf:resource="http://www.w3.org/2001/10/daml+oil"/>
</daml:Ontology>

<daml:Class rdf:about="&it;RelevantResource">
<rdfs:label>Relevant resource</rdfs:label>
<rdfs:comment>
An implied classification of resources that have related issues
</rdfs:comment>
<rdfs:subClassOf>
<daml:Restriction>
  <daml:onProperty rdf:resource="&it;issue"/>
  <daml:toClass rdf:resource="&it;Issue"/>
</daml:Restriction>
</rdfs:subClassOf>
</daml:Class>

<daml:Class rdf:about="&it;Issue">
<rdfs:label>Issue</rdfs:label>
<rdfs:comment>
A problem, suggestion or other matter for action or discussion
relevant to a resource.  Use Dublin Core properties for base
description.
</rdfs:comment>
</daml:Class>

<daml:ObjectProperty rdf:about="&it;issue">
<rdfs:label>issue</rdfs:label>
<rdfs:comment>Associate an issue to its resources</rdfs:comment>
<rdfs:range rdf:resource="&it;Issue"/>
</daml:ObjectProperty>

<daml:ObjectProperty rdf:about="&it;action">
<rdfs:label>action</rdfs:label>
<rdfs:comment>Associate an action with an issue</rdfs:comment>
<rdfs:domain rdf:resource="&it;Issue"/>
<rdfs:range rdf:resource="&it;Action"/>
</daml:ObjectProperty>

<daml:Class rdf:about="&it;Action">
<rdfs:label>Action</rdfs:label>
<rdfs:comment>An action to be taken with regard to an
issue</rdfs:comment>
</daml:Class>

<daml:ObjectProperty rdf:about="&it;assignee">
<rdfs:label>Assign to</rdfs:label>
<rdfs:comment>
Specify the party to whom the action is assigned
</rdfs:comment>
<rdfs:domain rdf:resource="&it;Action"/>
</daml:ObjectProperty>

<daml:ObjectProperty rdf:about="&it;status">
<rdfs:label>status</rdfs:label>
<rdfs:comment>For instance, "not done" or "done"</rdfs:comment>
<rdfs:domain rdf:resource="&it;Action"/>
</daml:ObjectProperty>

<daml:ObjectProperty rdf:about="&it;comment">
<rdfs:label>comment</rdfs:label>
<rdfs:comment>Associate a comment with an issue</rdfs:comment>
<rdfs:domain rdf:resource="&it;Issue"/>
<rdfs:range rdf:resource="&it;Comment"/>
</daml:ObjectProperty>

<daml:Class rdf:about="&it;Comment">
<rdfs:label>Comment</rdfs:label>
<rdfs:comment>A comment made with regard to an issue</rdfs:comment>
</daml:Class>

</rdf:RDF>

Before any definitions comes the ontology header. This is a DAML convention that describes the document and specifies the schema (hence the empty rdf:about, which makes the document itself the subject). It features a revision statement, which I define using a keyword to be expanded by the revision-control system. It also features an import, an explicit mechanism added by DAML+OIL for incorporating definitions from other files into the current one (before DAML, you either had to merge multiple sources into a model or use a lower-level mechanism such as XInclude). As standard practice, I import the core DAML+OIL schema, which brings in the definitions of the DAML+OIL-specific resources.
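The same daml:imports mechanism could also let an instance document such as Listing 3 declare its dependence on the issue tracker ontology. The following header is a sketch only; the comment text is illustrative, and it assumes the ontology is retrievable at the namespace URI http://rdfinference.org/schemata/issue-tracker/, which this article does not state.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY daml "http://www.daml.org/2001/03/daml+oil#">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;"
xmlns:daml="&daml;"
>

<!-- Empty rdf:about: this header describes the containing document -->
<daml:Ontology rdf:about="">
<rdfs:comment>Sample issues raised against the RIL draft</rdfs:comment>
<!-- Assumed location of the issue tracker ontology -->
<daml:imports rdf:resource="http://rdfinference.org/schemata/issue-tracker/"/>
</daml:Ontology>

</rdf:RDF>
```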

Next comes a special class, RelevantResource, whose instances are not stated explicitly but are meant to be determined by inference from their properties. A closer look at the class definition should make this clear. RelevantResource is defined as a subclass of an anonymous in-line resource of type daml:Restriction. This is a special DAML+OIL mechanism that lets you define classes according to the properties their instances have, and the values of those properties. In this case, the restriction is on the issue property, and uses daml:toClass to require that its values be of class Issue. By subclassing this restriction, RelevantResource is meant to act as a sort of virtual class made up of the resources that meet it: if at any time a resource acquires an issue property with a value of the right class, it becomes a member of this virtual class without needing to be explicitly stated as such.

Note that part of this restriction is strictly unnecessary. The range of the issue property has already been constrained to class Issue by an rdfs:range statement. I left the daml:toClass in the restriction purely for illustration.
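A caveat on the formal semantics: in DAML+OIL, making a class an rdfs:subClassOf a restriction states only a necessary condition (every RelevantResource satisfies the restriction), and daml:toClass is a universal constraint that is also satisfied, vacuously, by resources with no issue property at all. If you want membership to follow from the presence of an issue, a sketch along the following lines states the restriction as a sufficient condition too, using daml:sameClassAs with the existential daml:hasClass. This is an alternative formulation, not the definition in Listing 2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY daml "http://www.daml.org/2001/03/daml+oil#">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;"
xmlns:daml="&daml;"
>

<daml:Class rdf:about="&it;RelevantResource">
<rdfs:label>Relevant resource</rdfs:label>
<!-- Equivalence rather than subclassing: anything satisfying the
     restriction can be inferred to be a RelevantResource -->
<daml:sameClassAs>
<daml:Restriction>
  <!-- hasClass: at least one value of issue is of class Issue -->
  <daml:onProperty rdf:resource="&it;issue"/>
  <daml:hasClass rdf:resource="&it;Issue"/>
</daml:Restriction>
</daml:sameClassAs>
</daml:Class>

</rdf:RDF>
```

Because the range of the issue property is already Issue, the hasClass part is just as redundant here as toClass is in Listing 2: any resource with at least one issue property qualifies.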

This is a very important facility when you may not control all of the information space in which you are operating, and it is why DAML+OIL is put forth as a big step toward the sort of technology needed to underpin the Semantic Web. In our more modest case, it means we don't have to register every resource explicitly for issue tracking, as we do in the RDFS form of the schema (using the Catalog class).

I define all classes using daml:Class, a subclass of rdfs:Class that provides all the additional facilities introduced by DAML. Similarly, I use daml:ObjectProperty, whose values are resources, to define properties. The issue tracker schema does not use particular data types (string, integer, and so on) for the value of any property, but note that such properties are defined in DAML+OIL as instances of daml:DatatypeProperty.
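For comparison, a property with a typed value would be declared as an instance of daml:DatatypeProperty, with an XML Schema datatype as its range. The it:priority property below is hypothetical; the xsd namespace URI matches the one declared in Listing 2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY xsd "http://www.w3.org/2000/10/XMLSchema#">
<!ENTITY daml "http://www.daml.org/2001/03/daml+oil#">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;"
xmlns:daml="&daml;"
>

<!-- Hypothetical typed property: its values are literals, not resources -->
<daml:DatatypeProperty rdf:about="&it;priority">
<rdfs:label>priority</rdfs:label>
<rdfs:comment>Relative urgency of an issue (illustrative only)</rdfs:comment>
<rdfs:domain rdf:resource="&it;Issue"/>
<rdfs:range rdf:resource="&xsd;nonNegativeInteger"/>
</daml:DatatypeProperty>

</rdf:RDF>
```

An issue would then carry a plain literal value such as `<it:priority>2</it:priority>`, which a datatype-aware processor could check against the declared range.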

The DAML+OIL schema is actually what is being used in RDFInference.org applications, and is what we'll use as the basis of continuing work in this column.

Updating the instances
Because of the changes I've noted, I have revisited and updated the sample instances of issues that we've been looking at so far in this column -- see Listing 3.

Listing 3. Updated instance data




<?xml version='1.0'?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#">
<!ENTITY daml "http://www.daml.org/2001/03/daml+oil#">
<!ENTITY dc "http://purl.org/dc/elements/1.1/">
<!ENTITY foaf "http://xmlns.com/foaf/0.1/">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
<!ENTITY rit "http://rdfinference.org/ril/issue-tracker/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:rdfs="&rdfs;"
xmlns:daml="&daml;"
xmlns:rit="&rit;"
xmlns:it="&it;"
xmlns:dc="&dc;"
xmlns:foaf="&foaf;"
xmlns="&it;"
>

<rdf:Description rdf:about='http://rdfinference.org/ril/ril-20010502'>
<issue rdf:resource='&rit;i2001030423'/>
<issue rdf:resource='&rit;i2001042003'/>
</rdf:Description>

<Issue rdf:about='&rit;i2001030423'>
<dc:title>Unnecessary abbreviation</dc:title>
<dc:creator rdf:resource='mailto:Alexandre.Fayolle@logilab.fr'/>
<dc:description>
Is the abbreviation of rdf:type predicates in queries necessary?
</dc:description>
<dc:date>2001-03-04</dc:date>
<comment rdf:parseType="Resource">
<dc:creator rdf:resource='mailto:Alexandre.Fayolle@logilab.fr'/>
<dc:description>
The abbreviation in listing 8 doesn't seem necessary to Nico
Chauvat or me.
</dc:description>
</comment>
<action rdf:parseType="Resource">
<dc:description>Organize a vote on this topic</dc:description>
<it:assignee rdf:resource='mailto:uche.ogbuji@fourthought.com'/>
</action>
</Issue>

<Issue rdf:about='&rit;i2001042003'>
<dc:title>Inconsistent versioning</dc:title>
<dc:creator rdf:resource='mailto:Nicolas.Chauvat@logilab.fr'/>
<dc:description>
The RIL versioning is not clear (there's a mix of 0.1, 0/1, 0.2
and 0/2)
</dc:description>
<dc:date>2001-04-20</dc:date>
<action rdf:parseType="Resource">
<dc:description>
Correct all to use the "0/1" form in the next draft.
</dc:description>
<it:assignee rdf:resource='mailto:uche.ogbuji@fourthought.com'/>
</action>
</Issue>

<rdf:Description rdf:about='mailto:Alexandre.Fayolle@logilab.fr'>
<foaf:name>Alexandre Fayolle</foaf:name>
</rdf:Description>

<rdf:Description rdf:about='mailto:uche.ogbuji@fourthought.com'>
<foaf:name>Uche Ogbuji</foaf:name>
</rdf:Description>

<rdf:Description rdf:about='mailto:Nicolas.Chauvat@logilab.fr'>
<foaf:name>Nicolas Chauvat</foaf:name>
</rdf:Description>

</rdf:RDF>

We define a resource against which the sample issues are raised. According to the DAML+OIL schema, http://rdfinference.org/ril/ril-20010502 is automatically a member of the RelevantResource class. The other significant change is that we refer to people through mailto URLs, which are then linked to their regular names using "friend of a friend" (FOAF), a well-known DAML+OIL schema for specifying information about individual contacts, suitable for describing who might be attached to an electronic mailbox. Note that there is another well-known choice for modeling contact information in RDF, based on the common vCard format for embedding contact information as e-mail attachments. The vCard RDF schema is more general in coverage than the FOAF schema, but we don't need its additional properties. And if we did, there is also a FOAF-based option: FOAFCorp, which adds elements related to corporate structure to the core personal profile information in FOAF.
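Spelled out in RDF/XML, the membership the DAML+OIL schema is meant to give that resource amounts to the following statement, which the instance data never has to assert. This sketch assumes an inference engine that treats the RelevantResource definition as a sufficient condition for membership.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY it "http://rdfinference.org/schemata/issue-tracker/">
]>
<rdf:RDF
xmlns:rdf="&rdf;"
xmlns:it="&it;"
>

<!-- Inferred, not asserted: the resource has it:issue values of class it:Issue -->
<rdf:Description rdf:about="http://rdfinference.org/ril/ril-20010502">
<rdf:type rdf:resource="&it;RelevantResource"/>
</rdf:Description>

</rdf:RDF>
```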

The changes needed in the XSLT to generate this form rather than the original are minor: mostly changes to literal result element names and namespace URIs.
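As a sketch of the kind of change involved, the template below emits the renamed it:assignee property with a mailto: resource. The source element name assigned-to and its email attribute are hypothetical stand-ins, since the earlier XSLT is not reproduced here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:it="http://rdfinference.org/schemata/issue-tracker/">

  <!-- Hypothetical source: <assigned-to email="uche.ogbuji@fourthought.com"/> -->
  <xsl:template match="assigned-to">
    <!-- Renamed literal result element in the it namespace; the attribute
         value template builds the mailto: URI from the source attribute -->
    <it:assignee rdf:resource="mailto:{@email}"/>
  </xsl:template>

</xsl:stylesheet>
```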

Conclusion
Generally, even if you wish to apply constraints in the loose way discussed in the last installment of this column, you should have a schema of some sort, for documentation if nothing else. RDFS is still the simplest and most pervasive choice, but DAML+OIL has many things to recommend it: not just the additional features, but the cleaner core semantics as well. Now that we have a schema for the issue tracker, we'll move on to improving the way we construct our queries: We'll look at Versa, an open query language for RDF that will make all the query code we've presented simpler and faster.

Resources

In addition to the introductory resources I listed in the last installment, there is now Introduction to DAML: Part I, by Roxane Ouellet and Uche Ogbuji.
Here's the "friend of a friend" (FOAF) schema for managing personal profile information. There is also FOAFCorp, which adds more detail for expressing the structure of corporate entities.
Representing vCard Objects in RDF/XML is a W3C note by Renato Iannella.
Check out Thinking XML's previous columns.

About the author

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can contact him at uche.ogbuji@fourthought.com.
