Cloud Is Us

Cloud Is Us is a Web-based, massively distributed graph data processing framework and reference implementation. It distributes the effort necessary to process large graph datasets across a number of so-called contributors, each running in a Web browser. Each contributor processes a tiny fraction of the graph data; the partial results are then combined and delivered to the client. Both the allocation of a part of the graph and the combination of the results are performed by the allociner (= allocate + combine).

Architecture

The following steps are performed in a typical Cloud Is Us processing phase:

  1. The client initiates the processing by ingesting a graph dataset into the allociner, providing an HTTP URI that points to the location of a dataset - called the source - in N-Triples format.
  2. The allociner stream-reads the data from the client's source and allocates data chunks round-robin, on a per-subject basis, to the contributors (see the sketch after this list).
  3. Once all contributors have loaded their data locally, the client can issue a query, which is distributed to all contributors.
  4. Each contributor executes the query locally and sends its result back to the allociner, where the results are combined and made available to the client.
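To make step 2 more concrete, here is a minimal sketch of how the allociner might allocate N-Triples lines round-robin on a per-subject basis. For simplicity it operates on an in-memory array of lines rather than a stream, and the contributor interface (a load method) is an assumption for illustration, not the project's actual API:

```js
// Sketch: allocate N-Triples lines round-robin, per subject, to contributors.
// `contributors` is assumed to be an array of objects exposing a load(lines) method.
function allocate(ntriplesLines, contributors) {
  const chunks = contributors.map(() => []);  // one chunk of lines per contributor
  const subjectToChunk = {};                  // subject -> chunk index
  let next = 0;                               // round-robin counter

  for (const line of ntriplesLines) {
    if (!line.trim() || line.startsWith('#')) continue; // skip blanks and comments
    const subject = line.split(' ')[0];       // N-Triples: the subject is the first token
    if (!(subject in subjectToChunk)) {
      subjectToChunk[subject] = next % contributors.length;
      next += 1;
    }
    chunks[subjectToChunk[subject]].push(line);
  }

  // Hand each chunk to its contributor (step 3 then happens in the browsers).
  contributors.forEach((c, i) => c.load(chunks[i]));
}
```

Assigning whole subjects to one contributor keeps all triples about a resource in one place, so a contributor can answer subject-centric query patterns without talking to its peers.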

(Figure: cloudisus architecture)

Performance and Scalability Considerations

The more contributors are available to Cloud Is Us, the faster a query can be executed. The bottleneck is likely to be the allociner, which is responsible both for initially distributing the data to the contributors and for combining the results it eventually receives back from them.

Let's now look at how, for a dataset of 1 billion (= 1,000,000,000 = 1B) triples, the processing capacity grows with an increasing number of contributors. One easily runs into the dimension of 1B triples these days - take, for example, an application that uses statistical data from Eurostat together with data from DBpedia, LinkedGeoData and data.gov.uk.

  #contributors    #triples per contributor
             10    100M
            100    10M
          1,000    1M
         10,000    100k
        100,000    10k
      1,000,000    1k

Essentially, the table above tells us that with some 10k contributors - that is, people having a contributor instance running in their Web browser - we are able to process a 1B-triple dataset fairly easily, as it would mean a load of only some 100k triples per contributor.
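The per-contributor load in the table is simply the dataset size divided by the number of contributors; a quick way to reproduce the numbers:

```js
// Per-contributor load for a 1B-triple dataset at various swarm sizes.
const TOTAL_TRIPLES = 1e9;
for (const n of [10, 100, 1000, 10000, 100000, 1000000]) {
  console.log(`${n} contributors -> ${TOTAL_TRIPLES / n} triples per contributor`);
}
```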

Components

Todo

  • implement round-robin stream load in allociner
  • implement local SPARQL query in contributor
  • implement combine in allociner (a rough sketch of this step follows this list)
  • implement client
  • implement dashboard
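As a rough idea of what the "combine in allociner" item could look like, here is a sketch that merges and de-duplicates SPARQL SELECT results returned by the contributors. The SPARQL JSON results format ({ head: { vars: [...] }, results: { bindings: [...] } }) is an assumption about the eventual contributor API, not something the current code base defines:

```js
// Sketch: combine SPARQL SELECT results coming back from the contributors.
function combine(contributorResults) {
  const seen = new Set();
  const combined = { head: { vars: [] }, results: { bindings: [] } };

  for (const result of contributorResults) {
    combined.head.vars = result.head.vars;   // same query everywhere, same variables
    for (const binding of result.results.bindings) {
      const key = JSON.stringify(binding);   // de-duplicate identical rows
      if (!seen.has(key)) {
        seen.add(key);
        combined.results.bindings.push(binding);
      }
    }
  }
  return combined;
}
```

Aggregates, ORDER BY and LIMIT would need extra post-processing in the allociner; the sketch only covers the plain union of bindings.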

License

The software provided here is in the Public Domain.