Toying around with Riak for Linked Data

So I stumbled upon Rob Vesse’s tweet the other day, where he said he was about to use MongoDB for storing RDF. A week earlier I watched a nice video about links and link walking in Riak, “a Dynamo-inspired key/value store that scales predictably and easily” (see also the Wiki doc).

Now, I was wondering what it takes to store an RDF graph in Riak using Link headers. Let me say that it was very easy to install Riak and to get started with the HTTP interface.

The main issue then was how to map the RDF graph onto Riak buckets, objects and keys. Here is what I came up with so far: I use an RDF resource-level approach with a special object key that I called :id, whose value is the RDF resource URI or the bNode. Further, in order to maintain the graph provenance, I store the original RDF document URI in the metadata of the Riak bucket. Each RDF resource is mapped into a Riak object; for each literal RDF object value, the literal is stored directly under a Riak object key, and for each resource object (URI ref or bNode), I use a Link header.

Enough words. Action.

Take the following RDF graph (in Turtle):


@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix : <http://sw-app.org/mic.xhtml#>.

:i foaf:name "Michael Hausenblas" ;
   foaf:knows <http://richard.cyganiak.de/foaf.rdf#cygri> .

To store the above RDF graph in Riak I would then use the following curl commands:

curl -X PUT -d 'Michael Hausenblas' http://127.0.0.1:8098/riak/res0/foaf:name


curl -X PUT -d 'http://sw-app.org/mic.xhtml#i' http://127.0.0.1:8098/riak/res0/:id


curl -X PUT -d 'http://richard.cyganiak.de/foaf.rdf#cygri' http://127.0.0.1:8098/riak/res1/:id


curl -X PUT -d 'http://sw-app.org/mic.xhtml#i' -H "Link: </riak/res1/:id>; riaktag=\"foaf:knows\"" http://127.0.0.1:8098/riak/res0/:id
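For more than a handful of triples, the mapping can be generated rather than typed. Here is a minimal Python sketch under a few assumptions: the function name triples_to_requests, the (subject, predicate, object, is_literal) tuple layout and the res0, res1, ... bucket numbering are all made up for illustration, and nothing is actually sent to Riak. It only builds descriptions of the PUT requests. Unlike the curl commands above, it writes each :id object once, with all of its Link headers attached, instead of overwriting it.

```python
BASE = "http://127.0.0.1:8098/riak"  # same local endpoint as the curl commands


def triples_to_requests(triples):
    """Turn (subject, predicate, object, is_literal) tuples into PUT descriptions.

    Hypothetical helper: each request is a plain dict, nothing goes over the wire.
    """
    buckets = {}  # resource URI -> bucket name (res0, res1, ...)
    links = {}    # resource URI -> list of Link header values

    def bucket_for(uri):
        # Assign bucket names in order of first appearance.
        if uri not in buckets:
            buckets[uri] = "res%d" % len(buckets)
            links[uri] = []
        return buckets[uri]

    requests = []
    for s, p, o, is_literal in triples:
        sb = bucket_for(s)
        if is_literal:
            # Literal values are stored under a key named after the predicate.
            requests.append({"url": f"{BASE}/{sb}/{p}", "body": o, "headers": {}})
        else:
            # Resource objects become Link headers tagged with the predicate.
            ob = bucket_for(o)
            links[s].append(f'</riak/{ob}/:id>; riaktag="{p}"')

    # One :id object per resource, carrying all of its outgoing links.
    for uri, b in buckets.items():
        headers = {"Link": ", ".join(links[uri])} if links[uri] else {}
        requests.append({"url": f"{BASE}/{b}/:id", "body": uri, "headers": headers})
    return requests
```

Feeding it the two triples from the Turtle example yields three requests: one for foaf:name, one for res0/:id with the foaf:knows link, and one for res1/:id.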

Then, querying the store is straightforward (here: list all people I know):

curl http://127.0.0.1:8098/riak/res0/:id/_,foaf:knows,_
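Each step in a link-walking URL has the form bucket,tag,keep, with _ as the wildcard. A small Python sketch that assembles such URLs may make this easier to play with. The function name link_walk_url and its parameters are assumptions for illustration, not part of Riak, and like the curl call above it does no URL escaping.

```python
def link_walk_url(bucket, key, steps, base="http://127.0.0.1:8098/riak"):
    """Build a link-walking URL starting at bucket/key.

    steps is a list of (bucket, tag, keep) tuples; None stands for the
    wildcard "_", and keep may also be True ("1") or False ("0").
    Hypothetical helper; it only builds the string.
    """
    def part(value):
        if value is None:
            return "_"
        if value is True:
            return "1"
        if value is False:
            return "0"
        return str(value)

    url = f"{base}/{bucket}/{key}"
    for step in steps:
        url += "/" + ",".join(part(v) for v in step)
    return url


# The query above: everyone linked from res0/:id via foaf:knows.
print(link_walk_url("res0", ":id", [(None, "foaf:knows", None)]))
```

Adding a second (None, "foaf:knows", None) step to the list would walk one hop further, i.e. the people known by the people I know.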

Yes, I know, the prefixes like foaf: etc. need to be taken care of (but that’s rather easy: they can be put in the bucket’s metadata as well, with the help of the prefix.cc service). Further, the bNodes might cause trouble. And there is no smushing via owl:sameAs or IFPs (yet). But the most challenging area is probably how to map a SPARQL query onto Riak’s link-walking syntax.
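To illustrate the prefix part: assuming the prefix map has somehow been read back from the bucket's metadata into a plain dict (how exactly is left open here), expanding a prefixed key such as foaf:knows into a full URI takes only a few lines of Python. The names PREFIXES and expand are hypothetical.

```python
# Hypothetical prefix map, standing in for whatever is kept in the bucket's metadata.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name like 'foaf:knows' into a full URI.

    Names without a known prefix (full URIs, the special ':id' key)
    are returned unchanged.
    """
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        return name
    return prefixes[prefix] + local


print(expand("foaf:knows"))
```

Note that the empty prefix is deliberately left out of the map, since otherwise the special :id key would be expanded as well.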

Thoughts, anyone?


6 Responses to Toying around with Riak for Linked Data

  1. Nathan says:

    Nice one Michael :) FWIW I’ve been doing pretty much the same with REDIS for fast memory storage where you map a triple on to a hash – key is subject, value is hashmap of properties and objects. Further, REDIS supports pub sub and message queues so you can essentially make a fast in memory stream of triples (or changes).

    Thinking that all of these web techs are converging into something nice!

    Again, great post and hacking!

    Best,

    Nathan

    • woddiscovery says:

      Nathan,

      Thanks! Do you have anything online available? Observations, a benchmark, whatever? Would be great to learn more about your experience as well.

      Cheers,
      Michael

  2. I can see the appeal of storing RDF in MongoDB and the appeal of using HTTP to talk to MongoDB.

    Not sure I see the appeal of treating each triple as an HTTP “resource” w/ its own URI. I suppose it will work for small collections of data, but will not scale well.

    HTTP is a good fit for large, coarse-grained messages (whole docs|graphs), but a poor fit for small, chatty convos (single element in a doc|single triple).

    • woddiscovery says:

      Mike,

      Thanks for sharing your thoughts! This is exactly what I mean … just because something is technically possible doesn’t mean it makes sense. However, what I don’t know (yet) is, what is in fact the typical size of the objects in Riak? Any idea?

      Cheers,
      Michael

  3. [...] Toying around with Riak for Linked Data « Web of Data [...]

  4. Luca Garulli says:

    Hi Michael,
    interesting post. What about using a GraphDB for RDF? RDF data could be seen as a graph. Where GraphDB helps, IMHO, is in performance on traversing relationships and Query languages.

    Although I don’t know of any native SPARQL support for GraphDB, I’d like to plug this gap using OrientDB or even the Tinkerpop Blueprints API. WDYT?

    bye,
    Luca Garulli
