Semantic Web

In addition to the classic “Web of documents” W3C is helping to build a technology stack to support a “Web of data,” the sort of data you find in databases. The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.

Linked Data

The Semantic Web is a Web of data — of dates and titles and part numbers and chemical properties and any other data one might conceive of. RDF provides the foundation for publishing and linking your data. Various technologies allow you to embed data in documents (RDFa, GRDDL), expose what you have in SQL databases, or make it available as RDF files.
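
For illustration, here is a minimal sketch of publishing a few linked statements as RDF, using the Python rdflib library; the example.org URIs and the use of FOAF terms are illustrative assumptions, not part of any specification:

    # A minimal linked-data sketch with rdflib (illustrative URIs and terms).
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import FOAF, RDF

    EX = Namespace("http://example.org/")  # hypothetical namespace

    g = Graph()
    g.bind("foaf", FOAF)
    g.add((EX.alice, RDF.type, FOAF.Person))        # a typed resource
    g.add((EX.alice, FOAF.name, Literal("Alice")))  # a literal property
    g.add((EX.alice, FOAF.knows, EX.bob))           # a link to another resource

    print(g.serialize(format="turtle"))  # publishable as an RDF file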

Vocabularies

At times it may be important or valuable to organize data. Using OWL (to build vocabularies, or “ontologies”) and SKOS (for designing knowledge organization systems), it is possible to enrich data with additional meaning, which allows more people (and more machines) to do more with the data.
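
As a small illustration, a SKOS concept scheme can be sketched with the Python rdflib library as follows; the concepts and URIs are invented for the example:

    # A tiny SKOS knowledge-organization sketch (invented concepts and URIs).
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, SKOS

    EX = Namespace("http://example.org/scheme/")

    g = Graph()
    g.bind("skos", SKOS)
    g.add((EX.animals, RDF.type, SKOS.ConceptScheme))
    g.add((EX.cat, RDF.type, SKOS.Concept))
    g.add((EX.cat, SKOS.prefLabel, Literal("cat", lang="en")))
    g.add((EX.cat, SKOS.broader, EX.mammal))     # enriches data with meaning
    g.add((EX.cat, SKOS.inScheme, EX.animals))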

Query

Query languages go hand-in-hand with databases. If the Semantic Web is viewed as a global database, then it is easy to understand why one would need a query language for that data. SPARQL is the query language for the Semantic Web.
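
As a small illustration, the sketch below loads a few triples and runs a SPARQL SELECT query over them with the Python rdflib library; the data is invented:

    # Querying invented RDF data with SPARQL via rdflib.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix foaf: <http://xmlns.com/foaf/0.1/> .
        <http://example.org/alice> foaf:name "Alice" ;
                                   foaf:knows <http://example.org/bob> .
    """, format="turtle")

    results = g.query("""
        PREFIX foaf: <http://xmlns.com/foaf/0.1/>
        SELECT ?name WHERE { ?person foaf:name ?name . }
    """)
    for row in results:
        print(row.name)  # prints: Alice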

Inference

Near the top of the Semantic Web stack one finds inference — reasoning over data through rules. W3C work on rules, primarily through RIF and OWL, is focused on translating between rule languages and exchanging rules among different systems.
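
As a small illustration of reasoning over data, the sketch below uses the third-party Python owlrl library (which implements RDFS and OWL 2 RL entailment on top of rdflib) to materialize an inferred statement; the class hierarchy is invented:

    # Inference sketch: an RDFS subclass rule materializes a new type.
    import owlrl
    from rdflib import Graph, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.Cat, RDFS.subClassOf, EX.Mammal))  # class hierarchy
    g.add((EX.felix, RDF.type, EX.Cat))          # instance assertion

    # Expand the graph with RDFS entailments.
    owlrl.DeductiveClosure(owlrl.RDFS_Semantics).expand(g)
    print((EX.felix, RDF.type, EX.Mammal) in g)  # True after inference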

Vertical Applications

W3C is working with different industries — for example in Health Care and Life Sciences, eGovernment, and Energy — to improve collaboration, research and development, and innovation adoption through Semantic Web technology. For instance, by aiding decision-making in clinical research, Semantic Web technologies will bridge many forms of biological and medical information across institutions.

News

W3C today published the final report of the Linked Enterprise Data Workshop, hosted by W3C on 6-7 December in Cambridge, MA, USA. The workshop provided a way for the community to meet and discuss some of the challenges of deploying applications that rely on the principles of Linked Data. The presentations covered many different topics, ranging from the benefits a set of additional conventions would bring, to specific technical issues such as the challenge of dealing with the reality that URLs do sometimes change, the need for a more robust security model, and specific gaps in the current set of standards.

Participants in the Workshop agreed that W3C should create a Working Group to define a “Linked Data Platform”. This is expected to be an enumeration of the specifications that constitute Linked Data, together with some small additional specifications to cover specific functionality such as pagination. We anticipate that a draft charter will be available in the coming weeks.

The HTML Data Task Force of the W3C Semantic Web Interest Group has published two documents today:

  • The HTML Data Guide aims to help publishers and consumers of HTML data. With several syntaxes (microformats, microdata, RDFa) and vocabularies (schema.org, Dublin Core, microformat vocabularies, etc.) to choose from, it provides guidance on making a choice that meets the publisher’s or consumer’s needs.
  • Microdata to RDF describes processing rules that may be used to extract RDF from an HTML document containing microdata (a small extraction sketch follows below).

Both documents are Working Drafts, with the goal of publishing final versions as Interest Group Notes. Comments and feedback are welcome; please send them to the public-html-data-tf@w3.org mailing list.
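
As an illustration of the kind of processing the Microdata to RDF draft describes, the sketch below extracts structured data from a microdata-annotated snippet using the third-party Python extruct library; the markup, and the exact shape of the output, are illustrative assumptions rather than anything mandated by the draft:

    # Microdata extraction sketch with extruct (illustrative markup).
    import extruct

    html = """
    <div itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">Alice</span>
    </div>
    """

    data = extruct.extract(html, syntaxes=["microdata"])
    # Roughly: [{'type': 'http://schema.org/Person',
    #            'properties': {'name': 'Alice'}}]
    print(data["microdata"])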

Knowing how, where, when, and why content was produced is an important part of making a trustworthy Web. However, it is often difficult to interchange this provenance information between systems. For example, it is often difficult to locate provenance information for a web page. Even when provenance information can be located, it is often only available as text, or, if it is available in a structured form, it does not use a common terminology — making it difficult to create software that can leverage this information.

The Provenance Working Group was chartered to help address these limitations. The group has been working diligently to create a family of specifications (called PROV) that allow for the interchange of provenance. The group is looking for your feedback. This post provides an overview of the various working drafts that have been published and should help you find your way around.

At this point, the set of specifications addresses the two aspects of provenance interoperability introduced above:

  • provenance access
  • provenance representation

PROV-AQ: Provenance Access and Query addresses how to both make available and retrieve provenance information for Web resources. The document specifies how to use existing Web technologies, such as HTTP, link headers, and SPARQL, to accomplish this. Where possible, the specification attempts to be agnostic to the format of the provenance being accessed.
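
As a sketch of the access side, the snippet below follows a provenance link advertised in an HTTP Link header, in the spirit of PROV-AQ; the target URL is hypothetical, and the exact link relation name is an assumption that should be checked against the draft:

    # Provenance discovery sketch via an HTTP Link header (hypothetical URL;
    # the prov#has_provenance relation name is an assumption, not confirmed).
    import requests

    response = requests.get("http://example.org/some-document")

    # requests parses Link headers into response.links, keyed by relation.
    link = response.links.get("http://www.w3.org/ns/prov#has_provenance")
    if link:
        provenance = requests.get(link["url"])
        print(provenance.text)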

Once some provenance is obtained, it is important for the information to be understandable in a machine-interpretable fashion. The Working Group has defined a data model (PROV-DM) that provides facilities for representing the entities, people, and activities involved in producing a piece of data or thing in the world. The data model is domain-agnostic and has well-defined extensibility points. Importantly, the data model has a corresponding OWL ontology (PROV-O) that encodes PROV-DM. PROV-O is envisioned to specify the serialization for exchanging provenance information.
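
As an illustration, a few PROV-O style statements can be written with the Python rdflib library as follows; the chart, analysis, and agent URIs are invented, while the class and property names are taken from the PROV-O draft:

    # PROV-O sketch: an entity, the activity that generated it, and an agent.
    from rdflib import Graph, Namespace, RDF

    PROV = Namespace("http://www.w3.org/ns/prov#")
    EX = Namespace("http://example.org/")

    g = Graph()
    g.bind("prov", PROV)
    g.add((EX.chart, RDF.type, PROV.Entity))
    g.add((EX.chart, PROV.wasGeneratedBy, EX.analysis))
    g.add((EX.analysis, RDF.type, PROV.Activity))
    g.add((EX.chart, PROV.wasAttributedTo, EX.alice))
    g.add((EX.alice, RDF.type, PROV.Agent))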

To help orient users of PROV-O and PROV-DM, the working group has developed a primer (PROV-Primer) that introduces the core constructs of the data model and provides examples using PROV-O. It is recommended that users and reviewers of the specifications begin with the primer before moving to the ontology or data model.

The group is looking for feedback of all types: Would you expose provenance using PROV-AQ? Can you represent your provenance information using the PROV-DM data model and the PROV-O ontology? Does PROV-O integrate well with your Linked Data or other Semantic Web infrastructure?

Let us know what you think.

The PROV family of specifications currently comprises PROV-DM, PROV-O, PROV-AQ, and the PROV-Primer, all introduced above.

Paul Groth and Luc Moreau on behalf of the PROV-WG

The W3C Provenance Working Group has published two new documents:

Both documents are First Public Working Drafts; feedback and comments are welcome! Please use the public-prov-comments@w3.org mailing list to provide your comments.

The W3C SPARQL Working Group has published the (second) Last Call Working Drafts of the following SPARQL 1.1 documents:

  • SPARQL 1.1 Update defines an update language for RDF graphs.
  • SPARQL 1.1 Service Description defines a vocabulary and discovery mechanism for describing the capabilities of a SPARQL endpoint.
  • SPARQL 1.1 Query Language adds support for aggregates, subqueries, projected expressions, and negation to the SPARQL query language (see the sketch after this list, which exercises both Update and aggregates).
  • SPARQL 1.1 Protocol describes a means for conveying SPARQL queries and updates to a SPARQL processing service and returning the results via HTTP to the entity that requested them.
  • SPARQL 1.1 Entailment Regimes defines conditions under which SPARQL queries can be used with entailment regimes such as RDF, RDF Schema, OWL, or RIF.
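
As a small illustration of two of these features, the sketch below uses the Python rdflib library (which implements SPARQL 1.1 query and update) to insert data with SPARQL 1.1 Update and then run an aggregate query; the data is invented:

    # SPARQL 1.1 Update plus an aggregate query, via rdflib.
    from rdflib import Graph

    g = Graph()
    g.update("""
        PREFIX ex: <http://example.org/>
        INSERT DATA {
            ex:alice ex:knows ex:bob, ex:carol .
            ex:bob   ex:knows ex:carol .
        }
    """)

    results = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?person (COUNT(?friend) AS ?friends)
        WHERE { ?person ex:knows ?friend . }
        GROUP BY ?person
    """)
    for row in results:
        print(row.person, row.friends)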

Review comments are welcome through 6 February; please use the dedicated mailing list: public-sparql-dev@w3.org.


There has been a flurry of activity around RDFa 1.1 in the past few months. Although a number of blogs and news items have been published on the changes, all those became “officialized” only in the past few days with the publication of the latest drafts, as well as with the publication of RDFa 1.1 Lite. It may be worth looking back at the past few months to get a clearer idea of what happened. I make references to a number of other blogs published in the past few months; interested readers should consult those for details.

The latest official drafts for RDFa 1.1 were published in Spring 2011. However, a lot has happened since. First of all, the RDFWA Working Group, working on this specification, has received a significant number of comments. Some of those were rooted in implementations and the difficulties encountered therein; some came from potential authors who asked for further simplifications. Also, the announcement of schema.org had an important effect: this initiative drew attention to the importance of structured data in Web pages, which also raised further questions on the usability of RDFa for that usage pattern. This came to the fore even more forcefully at the workshop organized by the stakeholders of schema.org in Mountain View. A new task force on the relationship of RDFa and microdata has been set up at W3C; beyond looking at the relationship of these two syntaxes, that task force also raised a number of issues on RDFa 1.1. These issues have been, by and large, accepted and handled by the Working Group (and reflected in the new drafts).

What does this mean for the new drafts? The bottom line: there have been some fundamental changes in RDFa 1.1. For example, profiles, introduced in earlier releases of RDFa 1.1, have been removed due to implementation challenges; however, vocabulary management has acquired an optional feature that helps vocabulary authors “bind” their vocabularies to other vocabularies, without introducing an extra burden on authors (see another blog for more details). Another long-standing issue was whether RDFa should include a syntax for ordered lists; this has now been done (see the same blog for further details).

A more recent important change concerns the usage of @property and @rel. Although the usage of these attributes was never a real problem for RDF-savvy authors (the former is for the creation of literal objects, whereas the latter is for URI references), they have proven to be a major obstacle for ordinary HTML authors. This issue came up quite forcefully at the schema.org workshop in Mountain View, too. After a long technical discussion in the group, the new version reduces the usage difference between the two significantly. Essentially, if, on the same element, @property is present together with, say, @href or @resource, and @rel or @rev is not present, a URI reference is generated as the object of the triple. That is, when used on, say, a <link> or <a> element, @property behaves exactly like @rel. It turns out that this usage pattern is so widespread that it covers most of the important use cases for authors. The new version of the RDFa 1.1 Primer (as well as RDFa 1.1 Core, actually) has a number of examples that show this. There are also some other changes related to the behaviour of @typeof in relation to @property; please consult the specification for these.
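
As an illustration of this behaviour, the sketch below runs a small RDFa 1.1 snippet through the third-party Python extruct library; the markup is invented, and the point is that @property next to @href (with no @rel) yields a URI reference as the object rather than a literal:

    # RDFa 1.1 sketch: @property on an <a> element behaves like @rel.
    import extruct

    html = """
    <div vocab="http://schema.org/" typeof="Person">
      <span property="name">Alice</span>
      <a property="url" href="http://example.org/alice">home page</a>
    </div>
    """

    data = extruct.extract(html, base_url="http://example.org/",
                           syntaxes=["rdfa"])
    print(data["rdfa"])  # "url" should come out as a resource, not a string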

The publication of RDFa 1.1 Lite was also a very important step. This defines a subset of the RDFa attributes that can serve as a guideline for HTML authors to express simple structured data in HTML without bothering with more complex features. This is the subset of RDFa that schema.org will “accept”, as an alternative to microdata, as a possible syntax for schema.org vocabularies. (There are some examples of what schema.org markup looks like in RDFa 1.1 Lite on a different blog.) In some sense, RDFa 1.1 Lite can be considered the equivalent of microdata, except that it leaves the door open for more complex vocabulary usage, mixing different vocabularies, etc. (The HTML Task Force will soon publish a more detailed comparison of the different syntaxes.)

So here is, roughly, where we are today. The recent publications by the W3C RDFWA Working Group have, as I said, “officialized” all the changes that have been discussed since spring. The group decided not to publish a Last Call Working Draft, because the last few weeks of work in the HTML Task Force may reveal some new requirements; if not, the last round of publications will follow soon.

And what about implementations? Well, my “shadow” implementation of the RDFa distiller (which also includes a separate “validator” service) incorporates all the latest changes. I also added a new feature a few weeks ago, namely the possibility of serializing the output in JSON-LD (although this became outdated a few days ago, due to some changes in JSON-LD…). I am not sure of the exact status of Gregg Kellogg’s RDF Distiller, but, knowing him, it is either already in line with the latest drafts or will be within a matter of days. And there are surely more around that I do not know about.

This last series of publications has provided a nice closure for a busy RDFa year. I guess the only thing left is to wish everyone a Merry Christmas, a peaceful and happy Hanukkah, or whatever other festivities you honor at this time of the year. In any case, a very happy New Year!



The RDF Web Applications Working Group has published a Working Draft of RDFa Core 1.1, a specification for attributes to express structured data in any markup language. The group also published an update to XHTML+RDFa 1.1, a Host Language for RDFa Core 1.1. The latter document is intended for authors who want to create XHTML Family documents that embed rich semantic markup.

The Provenance Working Group has published a new Working Draft of The PROV Data Model and Abstract Syntax Notation. Provenance of information is crucial in deciding whether information is to be trusted, how it should be integrated with other diverse information sources, and how to give credit to its originators when reusing it. In an open and inclusive environment such as the Web, users find information that is often contradictory or questionable: provenance can help those users make trust judgments. PROV-DM is a data model for provenance, for building representations of the entities, people, and activities involved in producing a piece of data or thing in the world.

The W3C Provenance Working Group has published the First Public Working Draft of The PROV Ontology: Model and Formal Semantics. The PROV Ontology (also known as PROV-O) encodes the PROV Data Model in the OWL 2 Web Ontology Language. The ontology consists of a set of classes, properties, and restrictions that can be used to represent provenance information, and it can be specialized to create new classes and properties for modeling provenance information specific to different domain applications. It supports a set of entailments based on OWL 2 formal semantics and provenance-specific inference rules, and is available for download as a separate OWL 2 document.

15 – 16 March 2012, Luxembourg. Co-located with the European Commission’s Language Technology Showcase Days, and hosted by the Directorate-General for Translation (DGT) of the European Commission.

The MultilingualWeb project is looking at best practices and standards related to all aspects of creating, localizing, and deploying the Web multilingually. The project aims to raise the visibility of existing best practices and standards and to identify gaps. The core vehicle for this is a series of four events planned over two years.

After three highly successful workshops in Madrid, Pisa, and Limerick, this final workshop in the series will continue to investigate currently available best practices and standards aimed at helping content creators, localizers, tools developers, and others meet the challenges of the multilingual Web.

Participation is free. We welcome both speakers and non-speaking attendees. For more information, see the Call for Participation.

Talks and Appearances

See also the full list of W3C Talks and Appearances.