I have to confess: I really don’t like relational databases. I can’t wait for the day we can ditch them.
Think about it for a second: Databases store data to disk. That's all 90% of us use them for. They are essentially elaborate hash tables backed by a disk drive. Why are they more lines of code than some operating systems?
Despite that, unless you have a really well thought out setup, a disk failure is still a major disaster. Even if you have backups, even if you have replication, there will be downtime and manual labor while a new master server is established.
Databases never put your 10-20 commodity server boxes with all their spare disk space to use. They always sit on these really expensive ivory tower IBM boxes outside of your cheap cluster.
Despite millions of man-years of research, databases are actually pretty dumb. You have to tell them about every nuance of your schema, you have to tell them about indexes and so on. If you forget an index, they are perfectly happy to run sequentially through all the data you ever inserted into them, many times a second.
Replication is generally a nightmare and every machine involved in the replication needs to have enough disk space to store the entire content of the database.
There are several interesting projects which try to re-invent the database as we know it. Yesterday I found out about a particularly interesting one: CouchDB, a contender for “The next generation web storage” as their website proclaims. The project started out using C++ but eventually changed to Erlang, which is a perfect choice for highly parallel server software.
CouchDB has no tables; it just has a flat global namespace for documents. A document is a simple JSON record.
POST /shopify/
{
  "value":
  {
    "title": "Arbor Draft",
    "type": "Product",
    "price": 299.00,
    "tags": ["snowboarding", "freestyle", "wintersport"],
    "description": "...."
  }
}
Instead of defining the schema we simply add arbitrary records. There are no tables.
So how do we retrieve all the records again? CouchDB uses the concept of views, which are essentially JavaScript functions. It uses map/reduce to find matching records in its global namespace ahead of time, so that at query time the results are available instantaneously. This is a huge performance boost for web applications, which generally have many more queries than updates or inserts.
Let's install some useful views under /shopify/all:
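To make the idea concrete, here is a rough in-memory sketch of a flat document namespace whose views are refreshed on every write, so that a query only reads rows that already exist. Everything in it is an assumption for illustration: the TinyDocStore class, its method names, and the second sample document are made up, and this is not how CouchDB is actually implemented.

```javascript
// Hypothetical sketch only: not CouchDB's real implementation or API.
class TinyDocStore {
  constructor() {
    this.docs = new Map();   // flat namespace: id -> document
    this.views = new Map();  // view name -> { fn, rows: Map(id -> row) }
    this.nextId = 1;         // simple counter, stands in for real document ids
  }

  // Register a view function and index the documents that already exist.
  addView(name, fn) {
    const view = { fn, rows: new Map() };
    this.views.set(name, view);
    for (const [id, doc] of this.docs) this.indexDoc(view, id, doc);
  }

  // Run the view function on one document; keep the row only if it returns one.
  indexDoc(view, id, doc) {
    const row = view.fn(doc);
    if (row === undefined) view.rows.delete(id);
    else view.rows.set(id, row);
  }

  // Store a document and update every view right away (the work happens on write).
  insert(doc) {
    const id = String(this.nextId++);
    const stored = { _id: id, ...doc };
    this.docs.set(id, stored);
    for (const view of this.views.values()) this.indexDoc(view, id, stored);
    return id;
  }

  // Querying a view just hands back the rows computed earlier.
  query(name) {
    return [...this.views.get(name).rows.values()];
  }
}

const store = new TinyDocStore();
store.addView("documents", function (doc) { return doc; });
store.addView("products", function (doc) {
  if (doc.type == "Product") { return doc; }
});

store.insert({ title: "Arbor Draft", type: "Product", price: 299.0 });
store.insert({ title: "About us", type: "Page" }); // made-up second record

const products = store.query("products");
console.log(products.map((doc) => doc.title));
```

The trade-off in this sketch is that every insert pays for running all view functions, while every query is a plain lookup, which suits a workload with far more reads than writes.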
PUT /shopify/all
{
  "_view_documents": "function(doc) { return doc; }",
  "_view_products": "function(doc) { if (doc.type == 'Product') { return doc; } }"
}
GET http://couchserver/shopify/all:products
returns:
{
  "_id": "all:products",
  "rows":
  [
    {
      "_id": "64ACF01B05F53ACFEC48C062A5D01D89",
      "_rev": "62D22746",
      "title": "Arbor Draft",
      "type": "Product",
      "price": 299.00,
      "tags": ["snowboarding", "freestyle", "wintersport"],
      "description": "...."
    }
  ]
}
There are a lot more cool things in CouchDB. Notice that the returned document has a _rev? Older revisions of documents are only deleted if you say so. If you are working on a wiki, you just got your historical data for free. Unfortunately CouchDB is still in alpha, but I think the fundamentals are sound. It's a lot more aligned with the way a modern web application works and needs its data represented. Its replication system is already much more powerful than that of other database systems and in fact is very similar to the way Google works with its Bigtable and map/reduce infrastructure.
For further information head to the project's wiki.
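The "revisions are kept until you say otherwise" idea can be sketched in a few lines. This is only an illustration under my own assumptions: the RevisionedDoc class, its methods, and the integer revision counter are hypothetical and do not reflect CouchDB's real _rev format or storage layout.

```javascript
// Hypothetical sketch: every update appends a revision; nothing is dropped
// until compact() is called explicitly.
class RevisionedDoc {
  constructor(body) {
    this.revisions = [];  // oldest first
    this.nextRev = 1;     // plain counter, not CouchDB's actual _rev scheme
    this.update(body);
  }

  // Save a new revision and return its revision number.
  update(body) {
    const rev = this.nextRev++;
    this.revisions.push({ _rev: rev, ...body });
    return rev;
  }

  // The latest revision is the "current" document.
  current() {
    return this.revisions[this.revisions.length - 1];
  }

  // A copy of the full history, e.g. for a wiki's page history.
  history() {
    return this.revisions.slice();
  }

  // Old revisions are only thrown away on explicit request.
  compact() {
    this.revisions = [this.current()];
  }
}

const page = new RevisionedDoc({ title: "Home", text: "first draft" });
page.update({ title: "Home", text: "second draft" });
console.log(page.history().map((rev) => rev.text));
```

Calling page.compact() afterwards would leave only the latest revision, which is the "only deleted if you say so" part.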