FYI | Web of Data

Cloud Cipher Capabilities

Posted by mhausenblas ⋅ 2013-03-24

Wherein I review encryption support in IaaS, PaaS and SaaS cloud service offerings, as well as in Hadoop. While the motivation for encryption may differ, the primary question is whether systems support it (transparently) or whether developers are forced to implement it in their application logic.

Elephant filet

Posted by mhausenblas ⋅ 2013-03-10

When Hadoop is used in a shared setup, we witness two competing forces: users expect performance, while the cluster owner aims to optimise throughput and maximise utilisation. In this post, Michael elaborates on the resulting challenges and possible solutions.

MapR, Europe and me

Posted by mhausenblas ⋅ 2013-01-01

You may have already heard that MapR, the leading provider of enterprise-grade Hadoop and friends, is launching its European operations. Guess what? I’m joining MapR Europe in January 2013 as Chief Data Engineer EMEA and will support our technical and sales teams throughout Europe. Pretty exciting times ahead! As an aside: …

Hosted MapReduce and Hadoop offerings

Posted by mhausenblas ⋅ 2012-11-08

Today’s question is: where do we stand with MapReduce/Hadoop in the cloud? That is, which Hadoop-as-a-Service offerings or other hosted MapReduce implementations are currently available? A year ago, InfoQ ran a story, Hadoop-as-a-Service from Amazon, Cloudera, Microsoft and IBM, which will serve as our baseline here. That article contains the following statement: According to …

Interactive analysis of large-scale datasets

Posted by mhausenblas ⋅ 2012-09-02

The value of large-scale datasets (stemming from IoT sensors, end-user and business transactions, social networks, search engine logs, etc.) apparently lies in the patterns buried deep inside them. Being able to identify and analyze these patterns is vital, be it for detecting fraud, determining a new customer segment or predicting a trend. As …

Turning tabular data into entities

Posted by mhausenblas ⋅ 2012-05-10

Two widely used data formats on the Web are CSV and JSON. To enable fine-grained access in a hypermedia-oriented fashion, I’ve started working on Tride, a mapping language that takes one or more CSV files as input and produces a set of (connected) JSON documents. In the 2-minute demo video I …
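Tride’s actual mapping syntax isn’t shown in this teaser, so the following is only a minimal Python sketch of the general idea, not Tride itself: each CSV row becomes a JSON entity with its own identifier, and a foreign-key column is turned into a link to another entity, which is what makes the resulting documents connected. The CSV contents, column names, the `to_entities` helper and the `/people` and `/cities` paths are all hypothetical.

```python
import csv
import io
import json

# Hypothetical input: two small CSV "tables", inlined so the sketch runs standalone.
CITIES_CSV = """city_id,name,country
dub,Dublin,IE
gal,Galway,IE
"""

PEOPLE_CSV = """person_id,name,city_id
p1,Alice,dub
p2,Bob,gal
"""


def rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def to_entities(table, key, base, links=None):
    """Turn each row into a JSON-ready entity addressed by base/<key value>.

    links maps a column name to (new_field_name, target_base), so a
    foreign-key value becomes a link to another entity document.
    """
    links = links or {}
    entities = {}
    for row in table:
        doc = {"@id": f"{base}/{row[key]}"}
        for column, value in row.items():
            if column == key:
                continue  # the key is already encoded in "@id"
            if column in links:
                field, target = links[column]
                doc[field] = f"{target}/{value}"  # link to the related entity
            else:
                doc[column] = value
        entities[doc["@id"]] = doc
    return entities


cities = to_entities(rows(CITIES_CSV), "city_id", "/cities")
people = to_entities(rows(PEOPLE_CSV), "person_id", "/people",
                     links={"city_id": ("city", "/cities")})

for doc in people.values():
    print(json.dumps(doc))
```

Running it prints one JSON document per person, e.g. `{"@id": "/people/p1", "name": "Alice", "city": "/cities/dub"}`; following the `city` value leads to the corresponding document in `cities`.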

Linked Data – the best of two worlds

Posted by mhausenblas ⋅ 2012-04-02

On the one hand, you have structured data sources such as relational databases, NoSQL datastores, OODBs and the like, which allow you to query and manipulate data in a structured way. This typically involves a schema (either defined upfront, as with an RDB, or more or less dynamically, as with NoSQL) that defines the data layout and the types of …

Why I luv JSON …

Posted by mhausenblas ⋅ 2012-03-24

… because it’s simple, agnostic and an end-to-end solution. Wat? OK, let’s slow down a bit and go through these keywords step by step. Simple: over 150 frameworks, libraries and tools directly support JSON in over 30 (!) languages. This might well be because the entire specification (incl. ToC, all the legal stuff and …

Large-Scale Linked Data Processing: Cloud Computing to the Rescue?

Posted by mhausenblas ⋅ 2012-03-01

At the upcoming 2nd International Conference on Cloud Computing and Services Science (CLOSER 2012), we (Robert Grossman, Andreas Harth, Philippe Cudré-Mauroux and myself) will present a paper titled “Large-Scale Linked Data Processing: Cloud Computing to the Rescue?”, with the following abstract: Processing large volumes of Linked Data requires sophisticated methods and …

Synchronising dataspaces at scale

Posted by mhausenblas ⋅ 2012-02-13

So, I have a question for you: how would you approach the following (engineering) problem? Imagine you have two dataspaces: a source dataspace, such as Eurostat with some 5000+ datasets, each of which can take up to several GB in the worst case, and a target dataspace (for example, something like what we’re deploying in the …
