In this article, I consider the problems of semantics-free identifiers in OWL and suggest another (possible) solution to the problem.

The problems of identifiers and their semantics are not new. I have written about these problems previously in the context of blog permalinks (http://www.russet.org.uk/blog/2011/05/permalink-semantics/) and of conversion between OBO format and Manchester syntax (http://www.russet.org.uk/blog/2009/09/obo-format-and-manchester-syntax/). The basic issue is one of choosing your compromise. Identifiers with semantics in them (which this blog uses, although I wish it did not) are considerably more human readable, but are not resilient to change, as the semantics in the identifiers can become out of date with respect to the content they describe. Semantics-free identifiers are resilient to change, but are much harder for humans to read. Neither compromise is entirely satisfactory; we need a more pragmatic approach (http://robertdavidstevens.wordpress.com/2011/05/26/unicorns-in-my-ontology).

Recently, I was looking at the move of the OBI ontology (10.1186/2041-1480-1-S1-S7) from BFO 1.0 to BFO 2.0. I have commented extensively on BFO before (10.1371/journal.pone.0012258), (http://www.russet.org.uk/blog/2010/07/realism-and-science/) (http://www.russet.org.uk/blog/2010/09/the-status-quo-farewell-tour-on-realism/), and I was interested in what changes have been made for BFO 2.0.

Unfortunately, it is not that easy to work out. While diffs have never been the most human-readable of outputs, the OBI diffs raise this to a new level. Consider this change:

svn diff -r 3424:3425 https://obi.svn.sourceforge.net/svnroot/obi/trunk/src/ontology/branches/obi.owl

@@ -204,7 +197,7 @@
     <owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000107">
         <rdfs:label>provides_service_consumer_with</rdfs:label>
         <rdfs:domain rdf:resource="http://purl.obolibrary.org/obo/OBI_0001173"/>
-        <rdfs:subPropertyOf rdf:resource="http://www.obofoundry.org/ro/ro.owl#has_part"/>
+        <rdfs:subPropertyOf rdf:resource="http://purl.obolibrary.org/obo/BFO_0000051"/>
     </owl:ObjectProperty>

Also available here for those without access to a local subversion. The resource previously known as has_part has become the rather more obscure BFO_0000051. In short, BFO has become semantics-free.

In general, I think that this is a good thing. The use of semantics in the identifiers for this blog is generally not helpful, although I have never carried through my year-old threat (http://www.russet.org.uk/blog/2011/05/permalink-semantics/) to change the identifier scheme, as I am not sure older links would be maintained. But the total unreadability of the OBI diff demonstrates a problem. One answer is that we should not be reading OWL source in the first place, but using tools. These tools do exist (http://www.ebi.ac.uk/efo/bubastis/), but they are a supplement to a diff, not a replacement for it. Source code must be in a readable syntax because line-orientated text is the lowest common denominator. Semantic diffs are nice, but next we would need an OWL-aware versioning tool, as versioning depends on diffing, and then OWL-aware regexp search and replace tools for when syntactic alterations were needed. Eventually, we would end up replacing an entire software stack and, no doubt, doing it badly, since tools such as versioning software have a long heritage and are now very functional (and incredibly complex!).

My previous, minimal suggestion was to use a denormalisation, by adding a new comment character. So

ObjectProperty http://purl.obolibrary.org/obo/BFO_0000051

would become

ObjectProperty http://purl.obolibrary.org/obo/BFO_0000051[has_part]

The denormalisation here, presenting the same information as both an opaque string and a text string, fulfils both requirements. However, it would require significant effort to keep the two in sync.
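As a rough illustration of that denormalisation, here is a minimal Python sketch that appends a bracketed label to each opaque identifier in a piece of text, such as a line from a diff. The LABELS table, the identifier pattern and the annotate function are all hypothetical names invented for this example; in practice the table would need regenerating from the ontology whenever a label changed, which is exactly the synchronisation cost described above.

```python
import re

# Hypothetical lookup table from opaque identifiers to labels.
LABELS = {
    "BFO_0000051": "has_part",
    "OBI_0000107": "provides_service_consumer_with",
}

# Assumed identifier shape: upper-case prefix, underscore, seven digits.
# The lookahead skips identifiers that already carry a [label].
ID_PATTERN = re.compile(r"\b[A-Z]+_\d{7}\b(?!\[)")


def annotate(text, labels=LABELS):
    """Append [label] after every identifier that has a known label."""
    def add_label(match):
        identifier = match.group(0)
        label = labels.get(identifier)
        return f"{identifier}[{label}]" if label else identifier
    return ID_PATTERN.sub(add_label, text)


line = "ObjectProperty http://purl.obolibrary.org/obo/BFO_0000051"
print(annotate(line))
```

Identifiers with no entry in the table are left untouched, so the annotation can be applied to a whole file without knowing every term in advance.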

My new idea is to use something similar to a Colour Lookup Table (http://en.wikipedia.org/wiki/Colour_look-up_table). These are used to define a palette of colours selected from a much larger colour space. We could use a similar approach here. Essentially, the idea is to declare the semantics-free IDs once at the top of the file, and then to use meaningful names in the body. The idea is also similar to the use of abbreviations for namespaces in XML; for instance,

<owl:ObjectProperty rdf:about="http://purl.obolibrary.org/obo/OBI_0000107">

Here, the rdf: prefix actually refers to “http://www.w3.org/1999/02/22-rdf-syntax-ns#”. The letters rdf could be replaced by anything at all without changing the semantics, so long as we update the namespace declaration to match.
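This is easy to check with a namespace-aware parser. The following sketch uses Python's standard xml.etree.ElementTree to parse the same element twice, once with the usual rdf prefix and once with an arbitrary one (banana is my own choice for the example), and compares the expanded names.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
IRI = "http://purl.obolibrary.org/obo/OBI_0000107"

# The same element written with two different prefixes for the RDF namespace.
with_rdf = (
    f'<owl:ObjectProperty xmlns:owl="{OWL_NS}" xmlns:rdf="{RDF_NS}" '
    f'rdf:about="{IRI}"/>'
)
with_banana = (
    f'<owl:ObjectProperty xmlns:owl="{OWL_NS}" xmlns:banana="{RDF_NS}" '
    f'banana:about="{IRI}"/>'
)


def expanded_names(document):
    """Return the tag and attributes with prefixes expanded to full namespaces."""
    element = ET.fromstring(document)
    return element.tag, dict(element.attrib)


# Both spellings expand to the same names, so the prefix carries no meaning.
print(expanded_names(with_rdf) == expanded_names(with_banana))  # prints True
```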

In Manchester syntax, we could address this with an addition of an alias keyword. So:

ObjectProperty http://purl.obolibrary.org/obo/OBI_0000107
   Annotations: rdfs:label="provides_service_consumer_with"
   Domain: http://purl.obolibrary.org/obo/OBI_0001173
   SubPropertyOf: http://purl.obolibrary.org/obo/BFO_0000051

would become

Prefix: obo: http://purl.obolibrary.org/obo/
Alias: obo:OBI_0000107 "provides_service_consumer_with"
Alias: obo:OBI_0001173 "service"
Alias: obo:BFO_0000051 "has_part"


ObjectProperty provides_service_consumer_with
   Annotations: rdfs:label="provides_service_consumer_with"
   Domain: service
   SubPropertyOf: has_part

In this case, because we are defining a term and attaching a label, we get the same string twice, but there is no formal link between the two. With this system in place, moving the identifiers for BFO would have required an update to only the Alias table at the top. Now an obvious place for the strings to come from would be the source ontology (so “has_part” would come from RO (10.1186/gb-2005-6-5-r46), or now BFO); this would, in fact, serve as a useful check. If I reference an external ontology and its labels do not match my Alias definitions, I may wish to check whether the concepts I have imported still have the semantics that I intended.
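To make the proposal concrete, here is a minimal sketch of a preprocessor for it, written in Python. The Alias keyword is not part of Manchester syntax; it is the extension proposed here, and every function name below (read_header, expand, stale_aliases) is hypothetical. The sketch reads the Prefix and Alias lines, rewrites aliases in the body back to full identifiers (written bare, as in the examples above), and leaves quoted strings such as labels alone. The last function performs the check against the source ontology, given a table of its labels.

```python
import re

PREFIX_RE = re.compile(r'^Prefix:\s+(\w*):\s+(\S+)\s*$')
ALIAS_RE = re.compile(r'^Alias:\s+(\w*):(\S+)\s+"([^"]+)"\s*$')


def read_header(lines):
    """Split lines into an alias -> identifier table and the remaining body."""
    prefixes, aliases, body = {}, {}, []
    for line in lines:
        if m := PREFIX_RE.match(line):
            prefixes[m.group(1)] = m.group(2)
        elif m := ALIAS_RE.match(line):
            prefix, local, alias = m.groups()
            aliases[alias] = prefixes[prefix] + local
        else:
            body.append(line)
    return aliases, body


def expand(lines):
    """Replace each alias in the body with its full identifier."""
    aliases, body = read_header(lines)
    if not aliases:
        return body
    names = sorted(aliases, key=len, reverse=True)
    alias_re = re.compile(
        r'(?<![\w:])(?:' + '|'.join(map(re.escape, names)) + r')(?![\w:])')
    expanded = []
    for line in body:
        # Odd-numbered parts are quoted strings and are left untouched.
        parts = re.split(r'("[^"]*")', line)
        for i in range(0, len(parts), 2):
            parts[i] = alias_re.sub(lambda m: aliases[m.group(0)], parts[i])
        expanded.append(''.join(parts))
    return expanded


def stale_aliases(aliases, source_labels):
    """Return identifiers whose source label is missing or differs from the alias."""
    return sorted(iri for alias, iri in aliases.items()
                  if source_labels.get(iri) != alias)


example = [
    'Prefix: obo: http://purl.obolibrary.org/obo/',
    'Alias: obo:OBI_0000107 "provides_service_consumer_with"',
    'Alias: obo:OBI_0001173 "service"',
    'Alias: obo:BFO_0000051 "has_part"',
    'ObjectProperty provides_service_consumer_with',
    '   Annotations: rdfs:label="provides_service_consumer_with"',
    '   Domain: service',
    '   SubPropertyOf: has_part',
]
print('\n'.join(expand(example)))
```

Passing stale_aliases the alias table together with a mapping from identifiers to the labels found in the imported ontologies would list any term whose label has drifted since the Alias line was written.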

The same approach could be directly translated into the XML representation without change, I believe, with the use of XML entities which are defined at the start of an XML document. Of course, this is entirely horrible, and changing the OWL schema would make more sense. Extending Manchester syntax is straightforward, as I think I have shown here, and the same should be true for OBO format. The practical upshot would be a significant increase in the readability of many ontologies without eschewing the good practice of semantics-free identifiers.
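A small sketch of what the XML entity version might look like, parsed here with Python's standard xml.etree.ElementTree to confirm that the entities expand back to the opaque identifiers. The entity names are my own choice, mirroring the aliases used earlier; nothing about this layout is prescribed by OWL or by any existing tool.

```python
import xml.etree.ElementTree as ET

# Entities declared in the internal DTD subset play the role of the Alias table.
DOCUMENT = """<!DOCTYPE rdf:RDF [
  <!ENTITY provides_service_consumer_with "http://purl.obolibrary.org/obo/OBI_0000107">
  <!ENTITY service "http://purl.obolibrary.org/obo/OBI_0001173">
  <!ENTITY has_part "http://purl.obolibrary.org/obo/BFO_0000051">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:about="&provides_service_consumer_with;">
    <rdfs:label>provides_service_consumer_with</rdfs:label>
    <rdfs:domain rdf:resource="&service;"/>
    <rdfs:subPropertyOf rdf:resource="&has_part;"/>
  </owl:ObjectProperty>
</rdf:RDF>
"""

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

root = ET.fromstring(DOCUMENT)
prop = root[0]

# The parser expands the entities, so downstream tools see only the full IRIs.
about = prop.get(RDF + "about")
targets = [child.get(RDF + "resource") for child in prop
           if child.get(RDF + "resource")]

print(about)
print(targets)
```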

8 Comments

  1. Chris Mungall says:

    The problem with the XML entities approach is the current limit to 64k entity expansions in the OWL API. It’s possible but fiddly to extend this. Seemingly trivial in future versions of the OWL API, but we have to be careful to ensure we don’t break existing software with a solution.

    Anyway, I would rather have manchester or functional as the source anyway – in practice I am using functional for new ontologies, as manchester doesn’t support GCIs.

    Although it’s not hard to imagine extending either of these, would this not formally have to wait until OWL3 if the specs are stable? Compatibility problems might be eased if we think of these as different formats, like Manchester+ or Functional+, with a simple procedure for interconverting between these. This could presumably be added to the OWLAPI fairly easily.

    There are a few messy syntactic details to work out. What about unicode in labels? What about non alphanumeric characters? You could strip these but then you have problems with perverse ontologies that have both “foo” and “foo'” as class labels for two distinct classes.

    I think most of these issues are easily worked out; it will just take a small working group to see them through to implementation. How do we start?

    This is really important for migrating from obo to owl. obo doesn’t need any extension here, it was designed for VCSs from the ground up – there is a recommended tag ordering (irrelevant for semantics, but necessary for sensible diffs) and recommended printing of labels in parser-invisible comments.

  2. Alan Ruttenberg says:

    There is a script that writes out the labels of each term in XML comments that is usually used in OBI releases. For example in http://obi.svn.sourceforge.net/svnroot/obi/releases/2009-11-06/merged/merged-obi.owl you will see:

    I’m not sure why this isn’t being run before release currently, but you might request it from the obi-devel group.

    Regarding BFO2, note that the version of BFO2 currently being used is not an official release, but that the identifiers used will be the same as those used by the official release. The current release is a snapshot of work done some months ago and is due for an update.

  3. Alan Ruttenberg says:

    Hey! The comment form stole my example! Let’s see what happens if I escape with html entities.

    <owl:disjointWith>
    <owl:Class rdf:about="http://purl.obolibrary.org/obo/OBI_0200025"/><!-- loess scale group transformation -->

  4. Alan Ruttenberg says:

    Incidentally, I’m not sure I agree about having to fall back to text diff. Diff is hookable in subversion, for example, and the hookable diff is *not* the same as the diff used to compact versions for storage. The details are at http://svnbook.red-bean.com/en/1.2/svn.advanced.externaldifftools.html

    It would be perfectly reasonable to have the external diff program be something like bubastis or any of a number of other such tools.

    There’s also pre and post commit hooks that could be put to good use for OWL projects, for example by precomputing and caching a diff.

    Not that I disagree with the proposals for extra syntax. The entity limit bug in OWLAPI should be fixed and that’s a reasonable approach to use. During the working group I argued for the ability to comment in all the syntaxes but those proposals were rejected, as was one which would have allowed for the (controlled) use of labels in Manchester syntax, as is allowed in the Protege syntax (which technically is *not* Manchester syntax).

    I have something similar in LSW. If you want to refer to a term by label you write e.g.

    !'assay'@obi

    There’s an alternate, more robust version where you (or a renderer) writes:

    !'assay'@obi(obo:OBI_0000070)

    The parenthesized id is so that if a lookup fails to find the label (perhaps because it has changed since that was written) it can still look it up by ID.

    The “obi” in @obi is purely symbolic. The LSW reader, on seeing that symbol, assumes that there is a constructor for a ‘label-source’ with the name ‘obi’ that can translate labels into URIs. The one I set up for OBI (bottom of http://obi.svn.sourceforge.net/svnroot/obi/trunk/src/tools/build/util.lisp) reads the labels used for OBI and warns about, and subsequently ignores ambiguous labels.

  5. An Exercise in Irrelevance » Blog Archive » Ontology Building with Emacs says:

    […] it would work. I’ve have been engaged in discussions recently about syntactic aspects of OWL (http://www.russet.org.uk/blog/2040); the main reason for this is my long-held believe of the need for editing tools that work at the […]

  6. An Exercise in Irrelevance » Blog Archive » OWL Concepts as Lisp Atoms says:

    […] this will allow me to address a second problem, that of semantics vs semantics free identifiers (http://www.russet.org.uk/blog/2040). I can call a class, ontology or object property anything at all, and refer to it with a easy to […]

  7. An Exercise in Irrelevance » Blog Archive » Clojure OWL 0.2 says:

    […] next problem is OBIs use of semantic-free identifiers (http://www.russet.org.uk/blog/2040). Even if the reasons behind this decision are good, the resulting numeric atoms (OBI_0000107) are […]

  8. An Exercise in Irrelevance » Blog Archive » Remembering the World as it used to be says:

    […] The difficulty here is that OBI uses semantics-free identifiers (http://www.russet.org.uk/blog/2040). While there are some good reasons for this, would result in Clojure of the […]
