A guide to the Talis Platform

For application developers

The Talis Platform offers a cloud-based data repository that provides:

  • A simple, consistent web API for storing, managing and retrieving both structured and unstructured data
  • Flexible, schema-free metadata that allows applications to be easily evolved
  • A range of data access and query options enabling easy integration into both new and existing applications
  • Access control options to support hosting of both public and private data
  • A data hosting solution that is founded on open internet standards and web architectural best practices
  • Software as a Service, enabling rapid development with zero deployment costs
  • Low, even free, utility-based pricing for services and hosting, allowing costs to grow with usage
  • A highly available and scalable infrastructure to ensure that the repository grows in line with your application's needs

The following sections provide additional background information on each of these feature sets. For detailed developer documentation consult our online API docs.

Data Storage

The Platform provides support for storing two kinds of data: structured metadata and unstructured chunks of binary content.

The unstructured data storage works much like services such as Amazon S3: binary data can be streamed into the Platform for storage and later retrieval. These binary streams can be associated with a MIME type to ensure that they are later re-delivered correctly, and the Platform handles HTTP caching operations so that delivery is as efficient as possible. As an application developer you can simply store content in the Platform, letting the system assign it a unique URL, or you can stream data to specific URLs, letting you organise and publish data as you see fit.
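
The two storage styles described above can be sketched as plain HTTP requests. This is a minimal illustration, not the Platform's documented API: the base URL, the function name `build_upload_request`, and the choice of POST (server assigns the URL) versus PUT (client picks the URL) are all assumptions based on common REST practice. The requests are only constructed, never sent.

```python
import mimetypes
import urllib.request
from typing import Optional

# Hypothetical collection URL; the real endpoint is defined in the API docs.
STORE_ITEMS_URL = "https://platform.example.org/stores/demo/items"


def build_upload_request(
    data: bytes, filename: str, target_path: Optional[str] = None
) -> urllib.request.Request:
    """Prepare (but do not send) a request that stores binary content.

    Without target_path the request is a POST to the collection, leaving the
    server to assign a URL. With target_path it is a PUT to that exact URL.
    Both verb choices are assumptions, not confirmed Platform behaviour.
    """
    # Attach a MIME type so the content can be re-delivered correctly later.
    mime_type, _ = mimetypes.guess_type(filename)
    headers = {"Content-Type": mime_type or "application/octet-stream"}

    if target_path is None:
        return urllib.request.Request(
            STORE_ITEMS_URL, data=data, headers=headers, method="POST"
        )

    url = STORE_ITEMS_URL + "/" + target_path.lstrip("/")
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Passing the returned object to `urllib.request.urlopen` would perform the upload; keeping construction separate makes the request easy to inspect before anything is sent.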

This unstructured data storage provides a simple and easy way to publish documents, data (e.g. as spreadsheets or XML), or even web site assets, allowing the Platform to be used as a one-stop data and content hosting solution.

The real power of the Platform is in the structured data storage. Based on Semantic Web technologies like RDF, the Platform provides a flexible, schema-free data repository that can be used to store data and metadata for any number of different uses. The Platform places no constraints on what data can be stored, or how it is structured. By using RDF, which encourages all resources to have a unique global identifier, the Platform provides a way to publish data that can be easily meshed into the growing web of data.

A single Platform store can contain facts and data from a range of different domains, and may be sub-divided into smaller data collections ("graphs") that can be separately queried and access controlled. This provides a rich way to store and publish both public and private data. Access to the RDF data store ("triple store") is achieved through a simple XML and HTTP based protocol that provides a full set of capabilities for reading, writing and updating data.
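
To make the idea of a store sub-divided into graphs concrete, here is a toy in-memory model. It is only a sketch of the data structure: the class name `ToyStore`, its methods, and the graph names are invented for illustration and say nothing about how the Platform is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Set, Tuple

# An RDF statement: (subject, predicate, object), all written as strings here.
Triple = Tuple[str, str, str]


class ToyStore:
    """In-memory stand-in for a store partitioned into named graphs."""

    def __init__(self) -> None:
        self.graphs: Dict[str, Set[Triple]] = defaultdict(set)

    def add(self, graph: str, s: str, p: str, o: str) -> None:
        """Record one triple in the named graph."""
        self.graphs[graph].add((s, p, o))

    def match(
        self,
        graph: str,
        s: Optional[str] = None,
        p: Optional[str] = None,
        o: Optional[str] = None,
    ) -> Iterator[Triple]:
        """Yield triples in one graph; None acts as a wildcard."""
        for triple in self.graphs.get(graph, ()):
            if all(
                want is None or want == got
                for want, got in zip((s, p, o), triple)
            ):
                yield triple


store = ToyStore()
book = "http://example.org/book/1"
store.add("public", book, "http://purl.org/dc/elements/1.1/title", "A Book")
store.add("private", book, "http://example.org/terms/costPrice", "4.20")

# The same resource appears in both graphs, but each graph is queried alone.
public_facts = list(store.match("public", s=book))
```

Because every query names a graph, facts about one resource can be split between a shareable graph and a restricted one.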

Data Access, Query, and Augmentation

A data store wouldn't be much use if there weren't a way to get the data back out again. The Talis Platform provides a rich set of data access and query options for retrieving structured data.

All structured data in the Platform is stored as RDF, and the Platform supports access to that data as Linked Data and through SPARQL, the structured query language for the semantic web. This means that:

  • Every resource in your Store has a unique URL from which its metadata can be retrieved with a single web request
  • SPARQL queries can be used to perform more complex queries, retrieving results as a tabular result set or as RDF
  • Content negotiation can be used to retrieve data as RDF, XML, or JSON, allowing you to choose the right format for your application
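
As a rough sketch of how a SPARQL query and content negotiation fit together in one request: the query travels in the `query` parameter defined by the SPARQL Protocol, and the `Accept` header names the desired format. The endpoint URL and the helper name `build_sparql_request` are hypothetical, and which media types a given store honours is an assumption. The request is built but not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; consult the API docs for the real service URL.
SPARQL_ENDPOINT = "https://platform.example.org/stores/demo/services/sparql"

# Standard media types for SPARQL results and RDF/XML.
ACCEPT = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "rdf": "application/rdf+xml",
}


def build_sparql_request(query: str, fmt: str = "json") -> urllib.request.Request:
    """Prepare a GET request carrying a SPARQL query and an Accept header."""
    url = SPARQL_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": ACCEPT[fmt]})


QUERY = "SELECT ?s ?title WHERE { ?s <http://purl.org/dc/elements/1.1/title> ?title } LIMIT 10"
request = build_sparql_request(QUERY, fmt="json")
```

Switching `fmt` changes only the header, so the same query can feed a JSON-based web client or an XML pipeline.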

SPARQL is a standard query language that has been designed to support a range of ways of interacting with RDF data sets. This allows you not only to query for data, but also to probe for and find data of interest, and to transform that data into alternate vocabularies and structures. SPARQL is a very powerful language, and is well designed for manipulating semi-structured data. However, for many use cases this functionality is overkill.

Alongside the SPARQL query engine, the Platform provides a more traditional search engine that can be used to query metadata held in the Platform. This search engine provides all of the features that you would expect, including keyword analysis and stemming, relevance ranking, boolean search operators, etc. Through the API, the search engine is fully configurable, allowing a custom search index to be created over your stored metadata. Search results are delivered as RSS 1.0 feeds, using the OpenSearch extensions for paging and relevance ranking, making them easy to process or transform within an application. The feeds also embed the key metadata about each result, ensuring that the useful metadata is delivered alongside the result and avoiding the need for extra API calls.
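
Because results arrive as RSS 1.0 with OpenSearch elements, a standard XML parser is enough to consume them. The sketch below parses an invented, trimmed sample feed; the exact elements the Platform emits are an assumption, although the namespaces used are the published ones for RSS 1.0, OpenSearch 1.1 and the OpenSearch relevance extension. The function name `parse_search_feed` is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "os": "http://a9.com/-/spec/opensearch/1.1/",
    "relevance": "http://a9.com/-/opensearch/extensions/relevance/1.0/",
}

# Invented sample, shortened for illustration; not captured from a real store.
SAMPLE_FEED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:os="http://a9.com/-/spec/opensearch/1.1/"
         xmlns:relevance="http://a9.com/-/opensearch/extensions/relevance/1.0/">
  <channel rdf:about="http://example.org/search?query=rdf">
    <title>Search results for rdf</title>
    <link>http://example.org/search?query=rdf</link>
    <os:totalResults>2</os:totalResults>
    <os:startIndex>0</os:startIndex>
    <os:itemsPerPage>10</os:itemsPerPage>
  </channel>
  <item rdf:about="http://example.org/things/1">
    <title>RDF Primer</title>
    <link>http://example.org/things/1</link>
    <relevance:score>1.0</relevance:score>
  </item>
  <item rdf:about="http://example.org/things/2">
    <title>Linked Data Notes</title>
    <link>http://example.org/things/2</link>
    <relevance:score>0.5</relevance:score>
  </item>
</rdf:RDF>
"""


def parse_search_feed(xml_text: str) -> dict:
    """Extract paging information and per-item metadata from a result feed."""
    root = ET.fromstring(xml_text.strip())
    channel = root.find("rss:channel", NS)

    def channel_int(tag: str) -> int:
        # Missing paging elements are treated as zero.
        return int(channel.findtext(tag, default="0", namespaces=NS))

    results = []
    # In RSS 1.0 the items are siblings of the channel, not its children.
    for item in root.findall("rss:item", NS):
        results.append(
            {
                "uri": item.get("{%s}about" % NS["rdf"]),
                "title": item.findtext("rss:title", default="", namespaces=NS),
                "score": float(
                    item.findtext("relevance:score", default="0", namespaces=NS)
                ),
            }
        )

    return {
        "total": channel_int("os:totalResults"),
        "start": channel_int("os:startIndex"),
        "per_page": channel_int("os:itemsPerPage"),
        "results": results,
    }


page = parse_search_feed(SAMPLE_FEED)
```

Any further metadata embedded in each item could be read the same way by adding its namespace to `NS`.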

An increasing number of custom search engines allow users to refine a search by drilling down into specific categories or "facets" of a data set. The Platform provides access to this functionality through its faceted search function. Just like the core search engine, this feature is fully configurable, allowing developers to quickly create a rich search experience over their data.

The final data access pattern in the Platform is known as augmentation. This feature further reduces the complexity of data retrieval, to the point that it avoids the need for any form of explicit query. It allows an RSS 1.0 feed to be proxied through a Platform Store and, based on its contents, the feed will be automatically enriched with the relevant metadata in the store. Each item in the RSS feed is examined to see whether it matches or refers to any resource(s) present in the Store. If so, the relevant entry is updated to include the additional metadata. This form of data retrieval is extremely powerful, coupling simplicity with the ability to create data pipelines that can be used to combine data from multiple Platform stores.
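
The augmentation idea can be illustrated with ordinary dictionaries. This is a conceptual sketch only: the matching rule (an item's `link` equals a resource URI in the store), the rule that existing item fields are never overwritten, and the names `augment` and `pipeline` are all assumptions made for the example, not a description of the Platform's actual algorithm.

```python
from typing import Dict, List

Item = Dict[str, str]
Store = Dict[str, Dict[str, str]]  # resource URI -> metadata properties


def augment(items: List[Item], store: Store) -> List[Item]:
    """Return copies of the feed items, enriched with any matching metadata."""
    enriched = []
    for item in items:
        merged = dict(item)
        # Assumed matching rule: the item's link is the resource identifier.
        for key, value in store.get(item["link"], {}).items():
            merged.setdefault(key, value)  # keep fields the feed already had
        enriched.append(merged)
    return enriched


def pipeline(items: List[Item], stores: List[Store]) -> List[Item]:
    """Pass a feed through several stores in turn, combining their metadata."""
    for store in stores:
        items = augment(items, store)
    return items


feed = [
    {"title": "A Book", "link": "http://example.org/book/1"},
    {"title": "Unknown", "link": "http://example.org/book/99"},
]
catalogue = {"http://example.org/book/1": {"creator": "J. Smith"}}
reviews = {"http://example.org/book/1": {"rating": "5", "title": "Ignored"}}

combined = pipeline(feed, [catalogue, reviews])
```

Items with no match pass through unchanged, which is what lets a feed be chained through several stores without any explicit query.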

Access Control

A key goal of the Talis Platform is to support the spread of public, open data on the Web. This means that by default every Platform Store is world-readable: all of the core data access, search and augmentation features are available for public use, but only the owner of that Platform Store has the right to update the data or configuration. So if you want to use the Platform as a means of disseminating open data, the default configuration will be all that you need.

However, Talis recognises that some data must stay private, whether for commercial or security reasons. Therefore each Platform Store can be configured with its own set of access control rules that limit who can access the data it contains, and what features they can use. These access control rules can also be applied to specific graphs within a Platform Store, allowing part of a hosted data set to be publicly accessible and part to be private. This allows, for example, personal user data or commercially sensitive information to be kept private, whilst still allowing some data to be freely shared with others.

In short the Platform allows:

  • By default, freely available data access and search for anonymous users
  • Restricted access to data and configuration updates to authorised users
  • A role based access control mechanism for limiting use of specific features to only authorised users
  • Private, secure data hosting with all access restricted to one or more authorised users
  • Partitioning a data set into sub-graphs that have their own access management rules

All authentication in the Platform is carried out using HTTP Digest Authentication.
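
For readers unfamiliar with HTTP Digest Authentication, the following sketch shows how a client computes the `response` value for the common MD5, `qop=auth` case described in RFC 2617. It illustrates the general scheme only; the function name `digest_response` is hypothetical, and which Digest options the Platform actually requires is not stated here.

```python
import hashlib


def _md5_hex(text: str) -> str:
    """Hex-encoded MD5 digest of a string, as used by Digest authentication."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(
    username: str,
    realm: str,
    password: str,
    method: str,
    uri: str,
    nonce: str,
    nc: str,
    cnonce: str,
    qop: str = "auth",
) -> str:
    """Compute the Digest 'response' value (RFC 2617, MD5 with qop=auth).

    The password itself never travels over the wire; only this hash does.
    """
    ha1 = _md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = _md5_hex(f"{method}:{uri}")                 # identifies the request
    return _md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
```

In practice there is rarely a need to compute this by hand: Python's standard library offers `urllib.request.HTTPDigestAuthHandler`, which answers the server's challenge automatically once it is given a username and password.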

Technology

Talis is an advocate not only of open data but also open source and open standards. Wherever possible the Talis Platform is based on standard web technologies. Where we have had to plug the gaps between current technologies, we have published open documentation, registered new media types, and, where necessary, created open source implementations.

The Platform follows web architecture very carefully, with each service in the API designed to conform closely to HTTP, including support for content negotiation and caching. The services are designed according to the principles of the REST architectural style as well as our own Service Checklist, which aims to encourage consistency across the Platform as a whole.

The following technologies are used within the Platform and its range of services:

  • XML
  • XSLT
  • JSON
  • RDF
  • SPARQL
  • RSS 1.0
  • OpenSearch
  • OAI-PMH

The Platform API itself is also published under a Creative Commons license that supports re-implementation of the API by other services and projects.

Licensing the Platform

The Talis Platform is intended to support the spread and reuse of open linked data. This is only achievable if access to that data is as easy and ubiquitous as possible. With this in mind, certain aspects of the Platform are being made freely available.

Firstly, all basic data retrieval operations, including searches and augmentation requests, are available at no charge to either the Platform owner or the user. This means that, provided the data owner has not restricted access to their data, anyone can retrieve data, perform simple keyword or faceted searches, or perform RSS feed augmentation over any data stored in the Platform.

Secondly, to support the creation and dissemination of public domain data, the Talis Connected Commons programme allows data to be published onto the Platform at no cost, assuming that data is licensed under one of several public domain licenses. These Platform Stores will also have freely available SPARQL endpoints, allowing for richer forms of data access.

Finally, Talis provides free developer access to the Platform for the purposes of experimentation and prototyping. These developer accounts provide free access to all Platform features for at least 3 months, and longer at Talis's discretion.

All other uses of the Platform are covered by a commercial agreement. As a SaaS based solution, use of the Platform is charged under a simple utility model that includes:

  • A fee for data hosting based on a fixed price per million triples hosted
  • A fixed fee and ongoing per-request usage fees for additional value-added services such as access control (for private data) and SPARQL endpoints

For further information see our Licensing page.

Further Reading

For further information on the Talis Platform you can: