Archive for the 'Web' Category

Content negotiation: bad use cases I recently observed

Given the current projects I am working on, I see misuse of content negotiation daily, particularly misuse of the Accept and Content-Type HTTP headers.

As you will see below, I have come across many misuses of these HTTP headers: some caused by misunderstanding them, others simply by forgetting to set them properly when content is negotiated between web servers and the applications requesting pages.

In any case, people should take greater care to set up content negotiation properly between their servers and other applications. I have seen such mistakes on many web servers, from semantic web research groups to hobbyists.

The principle

The principle is simple: if a requester sends an HTTP request with the Accept header:

Accept: text/html, application/rdf+xml

The web server should check the priority of the MIME types the requester asks for and send back the document type with the highest priority, along with the Content-Type of that document in the HTTP response headers.

The Content-Type header is quite important: if a user agent requests a list of ten MIME types that all have the same priority, it needs to know which one the web server actually sent.
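For illustration (this sketch is mine, not something from the original post), here is how a client negotiates content using Python's requests library; the URL is a placeholder:

import requests

# Ask for HTML or RDF/XML; the server should pick one and tell us which.
response = requests.get(
    "http://example.org/resource",  # placeholder URL
    headers={"Accept": "text/html, application/rdf+xml"},
)

# The Content-Type response header says which representation was sent.
print(response.headers.get("Content-Type"))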

Trusting the Content-Type header

It is hard.

In fact, Ping the Semantic Web does not trust the Content-Type returned by any web server. This header is so widely misused that it is effectively useless, so I had to develop procedures to detect the type and the encoding of the files it crawls.

For example, people will sometimes return the MIME type text/html when the file is in fact RDF/XML or RDF/N3; this is just one example among many.
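Ping the Semantic Web's actual detection procedures are not published, so the following is only a minimal sketch of the idea: peek at the payload itself instead of trusting the Content-Type header.

# Minimal content-sniffing sketch; not PTSW's actual procedure.
def guess_mime_type(body: bytes) -> str:
    text = body.lstrip()[:2048].decode("utf-8", errors="replace")
    lowered = text.lower()
    if "<rdf:rdf" in lowered or "22-rdf-syntax-ns" in lowered:
        return "application/rdf+xml"
    if text.startswith("@prefix"):
        return "text/rdf+n3"
    if lowered.startswith("<!doctype html") or "<html" in lowered:
        return "text/html"
    return "application/octet-stream"  # give up; let the caller decide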

The Q parameter

Another situation I came across recently involved the “priority” of each MIME type in an Accept header.

Ping the Semantic Web sends this Accept header to any web server from which it receives a ping:

Accept: text/html, html/xml, application/rdf+xml;q=0.9, text/rdf+n3;q=0.9, application/turtle;q=0.9, application/rdf+n3;q=0.9, */*;q=0.8

The issue I came across is that one of the web servers was sending me an RDF/XML document for that Accept string, even though it was able to send a text/html document. If the server saw “application/rdf+xml” anywhere in the Accept header, it automatically sent an RDF document, even though that type has a lower priority than text/html.

In fact, this Accept header means:

Send me text/html or html/xml if possible.

If not, then send me application/rdf+xml, text/rdf+n3, application/turtle or application/rdf+n3.

If not, then send me anything; I will try to do something with it.

It is really important to consider the q parameter, and its absence: a type listed without a q value defaults to the highest priority, q=1.0, so its presence or absence means a lot.
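As an illustration (my sketch, not PTSW's or any particular server's code), here is how a server could honour q values when choosing a representation:

# Pick the best available MIME type for an Accept header, honouring q values.
def best_match(accept_header, available):
    prefs = []
    for part in accept_header.split(","):
        bits = part.strip().split(";")
        mime, q = bits[0].strip(), 1.0  # no q parameter means q=1.0
        for param in bits[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                q = float(value)
        prefs.append((mime, q))
    prefs.sort(key=lambda p: p[1], reverse=True)  # highest priority first
    for mime, _ in prefs:
        if mime in available:
            return mime
        if mime == "*/*" and available:
            return available[0]
    return None

accept = ("text/html, html/xml, application/rdf+xml;q=0.9, "
          "text/rdf+n3;q=0.9, application/turtle;q=0.9, "
          "application/rdf+n3;q=0.9, */*;q=0.8")
# A server able to produce both types should choose text/html here.
print(best_match(accept, ["text/html", "application/rdf+xml"]))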

Discrimination against software User-agents

Recently I faced a new kind of cyber-discrimination: discrimination based on the User-Agent string of an HTTP request. Even though I was sending “Accept: application/rdf+xml”, I was receiving an HTML document. So I contacted the administrator of the web server, and he pointed me to an example available on the W3C web site, called Best Practice Recipes for Publishing RDF Vocabularies, which explains why he did it that way:

# Rewrite rule to serve HTML content from the namespace URI if requested
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^example5/$ example5-content/2005-10-31-docs/index.html [R=303]

# Rewrite rule to serve HTML content from class or prop URIs if requested
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^example5/(.+) example5-content/2005-10-31-docs/$1.html [R=303]

So, in the .htaccess file published in the article, we can see that if the User-Agent string starts with “Mozilla”, the server will send an HTML document, regardless of the Accept header.

However, the same document notes:

Note that, however, with RDF as the default response, a ‘hack’ has to be included in the rewrite directives to ensure the URIs remain ‘clickable’ in Internet Explorer 6, due to the peculiar ‘Accept:’ header field values sent by IE6. This ‘hack’ consists of a rewrite condition based on the value of the ‘User-agent:’ header field. Performing content negotiation based on the value of the ‘User-agent:’ header field is not generally considered good practice.

So no, it is not good practice, and people should really be careful about this.

Conclusion

People should really pay attention to the Accept header when their server receives a request, and send back the correct Content-Type for the document they return to the requester. Content negotiation is becoming the main way to find and access RDF data on the Web, and such behaviors should be fixed by web server administrators and developers.

Has Robert Scoble got some incentives to ‘finally’ get what the semantic web is?

Everybody makes mistakes, and I have just made one.

Thanks for proving me wrong.

I probably should have written a blog post about it instead of a comment; that way I would have been sure you saw it (like this blog post, which created an instant reaction).

Robert, it seems you didn’t receive my email four days ago, so I am sorry about that.

Anyway, it doesn’t change the essence of this blog post, or of my comment. This is not a good start, but it is a good opportunity to try to build a link between the “Web 2.0” community (sorry, but I don’t like that term 😉 ) and the [academic] Semantic Web community. There are many things going on that could benefit everyone.

The one thing I would like people to remember is that the Semantic Web is not the result of one or a couple of companies, but the result of a whole; the result of the interaction between all of us.

Robert, I hope you will continue to dig deeper to find all the things people are working on related to the Semantic Web. Sorry about all this, and I wish you a beautiful day!

I am asking the question and I hope I am wrong.

A few days ago Robert Scoble wrote an enthusiastic post about what Radar Networks is currently developing. This “thing” (I call it a “thing” because nobody knows what it really is; some type of semantic web system) finally helped Robert understand what the semantic web is.

At that moment I was happy to see that a “Web 2.0” guru understood how Semantic Web technologies could help him; how they could be used to make the world a better place to live in.

Then I told myself: “Fred, help him to see what other people are doing in that direction too. Show him what you are working on, what other people are developing, what they are writing on the subject, etc.”

Then I wrote that comment on his blog post:

Hi Robert,

Could I suggest a couple of readings in that direction that could potentially interest you?

  1. Zitgist Search Query Interface: A new search engine paradigm
  2. The Linked-Open-Data mailing list
  3. Planet RDF


From there, you will be able to dig deeper into the semantic web community, the ideas it plays with, what the Web is becoming, etc.

Hope it helps some people to eventually understand what is going on with the semweb.

Take care,

Fred

This comment never appeared on his blog post. It seems he rejected it in moderation. I sent him an email three days ago and he never replied.

Why did Robert reject this innocent comment? I have my own idea, which led to the title of this blog post: “Has Robert Scoble got some incentives to ‘finally’ get what the semantic web is?”

Did Robert reject it because I was referring to Zitgist, a possible competitor to what Radar Networks is working on right now?

I have no idea, but I am always frustrated when bloggers don’t tell their readers that they have incentives to write articles about particular things.

Otherwise, why did my comment get rejected? I have no idea, but I would like to know.

In the end, these people will probably have to learn that the Semantic Web is more about cooperation and honesty between people, enterprises and other entities than about the traditional way of doing things and business.

I think that the Semantic Web will change things in a major way: people, societies and the way we live.

Making the bridge between the Web and the Semantic Web

Many people think that the semantic web will never happen, at least not in the next few years, because there is not enough useful data published in RDF. Fortunately, this is a misconception. In fact, many things are already accessible in RDF, even if it doesn’t appear so at first sight.

Triplr

Danny Ayers recently pointed out a new web service created by Dave Beckett called Triplr: “Stuff in, triples out”.

Triplr is a bridge between well-formed XHTML web pages containing GRDDL or RSS and their RDF/XML or Turtle serializations.

Here is an example of how it might be invoked.
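The sketch below is hypothetical: it assumes Triplr's path-style interface, where the desired output format and the source URL are appended to the service root, and it uses a placeholder source page.

import urllib.request

# Hypothetical Triplr call: format ("turtle") plus source URL in the path.
source = "example.org/page.html"  # placeholder page carrying GRDDL or RSS
with urllib.request.urlopen("http://triplr.org/turtle/" + source) as resp:
    print(resp.read().decode("utf-8"))  # the extracted triples, as Turtle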

 

Virtuoso’s Sponger

Another bridging service, called the Sponger, also exists. Its goal is the same as Triplr’s: taking different sources of data as input and creating RDF as output.

The Virtuoso Sponger will do everything possible to find RDF triples at a given URL (via content negotiation and by checking for “link” elements in HTML files). If no RDF document is available from the URL, it will try to convert the data source available at that URL into RDF triples. Converted data sources include microformats, RDFa, eRDF, HTML metadata tags and HTTP headers, as well as APIs like Google Base, Flickr, Del.icio.us, etc.

How does it work?

  1. The first thing the Sponger does is try to dereference the given URL to get RDF data from it. If it finds some, it returns it; otherwise, it continues.
  2. If the URL refers to an HTML file, the Sponger will try to find “link” elements referring to RDF documents. If it finds one or more of them, it will add their triples to a temporary RDF graph and continue.
  3. If the Sponger finds microformat data in the HTML file, it will map it to related ontologies (depending on the microformat) and create RDF triples from that mapping. It will add these triples to the temporary RDF graph and continue.
  4. If the Sponger finds eRDF or RDFa data in the HTML file, it will extract it, add it to the RDF graph and continue.
  5. If the Sponger finds that it is talking to a web service such as Google Base, it will map the web service’s API to an ontology, create triples from that mapping, include them in the temporary RDF graph and continue.
  6. If nothing has been found so far but there is some HTML metadata, it will map it to appropriate ontologies, create triples and add them to the temporary RDF graph.
  7. Finally, if nothing is found, it will return an empty graph.
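To make the cascade concrete, here is a rough, runnable Python sketch. It is not Virtuoso's code, and every helper below is a hypothetical stub standing in for a real conversion cartridge.

# Sketch of the Sponger cascade; all helpers are hypothetical stubs.
def dereference_as_rdf(url):
    return None  # step 1: return triples if the URL serves RDF directly

def extract_linked_rdf(html):
    return []    # step 2: follow "link" elements to RDF documents

def extract_microformats(html):
    return []    # step 3: map microformats to ontologies

def extract_erdf_rdfa(html):
    return []    # step 4: extract eRDF and RDFa

def extract_api_triples(url):
    return []    # step 5: map known APIs (Google Base, Flickr, ...)

def extract_html_meta(html):
    return []    # step 6: map HTML metadata tags

def sponge(url, html=""):
    rdf = dereference_as_rdf(url)
    if rdf:
        return rdf              # RDF found directly: return it as-is
    graph = []                  # temporary RDF graph (a list of triples)
    graph += extract_linked_rdf(html)
    graph += extract_microformats(html)
    graph += extract_erdf_rdfa(html)
    graph += extract_api_triples(url)
    if not graph:
        graph += extract_html_meta(html)
    return graph                # step 7: possibly an empty graph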

The result is simple: from almost any URL, you are more than likely to get some RDF data related to that URL. The bridge between the Web and the Semantic Web is made.

Some examples

Here are some examples of data sources converted by the Sponger:

Conclusion

What is fantastic for developers is that they only have to build their systems around RDF to make their applications communicate with any of these data sources. The Virtuoso Sponger does all the work of interpreting the information for them.
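For example, with the rdflib Python library (my choice for this sketch; it is not mentioned in the original post), consuming a Sponger-converted source looks the same as consuming native RDF; the URL is a placeholder:

from rdflib import Graph

# Whatever the original source was (microformats, an API, plain RDF),
# the application only ever sees triples.
g = Graph()
g.parse("http://example.org/some-resource")  # placeholder URL serving RDF

for subject, predicate, obj in g:
    print(subject, predicate, obj)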

This is where we really meet the Semantic Web.

With such tools, it is like looking at the semantic web through a lens.



