HREF Considered Harmful: Ruby and other gems

March 08, 2008

Ruby and other gems

I've had a number of conversations recently about Gemstone Smalltalk, largely in the wake of their announcement of support for my web framework, Seaside. It's complicated to explain Gemstone to people. It's not just an object database (though it is that), and it's not just a Smalltalk implementation (though it's that, too). The best thing I can compare it to is a Ruby on Rails deployment: not the framework, but the entire cluster of servers and software that goes into a large scale Rails app. Which is to say, perhaps, that Gemstone is best understood not as a piece of software but as an architecture.

At a high level, a typical Rails deployment looks like this: a cluster of servers supports one storage engine, several memory caches, and many worker processes. In Rails, the storage engine is always a relational database (usually MySQL), and sits on an especially hefty server by itself. Any number of other smaller, identical servers are each configured to run one memory cache (memcached) and 8-12 or so worker processes (Ruby interpreters running Rails and the Mongrel web server, generally just referred to as "mongrels").

The mongrels accept the web requests and run the actual application code. The objects inside these worker processes are live objects: they're sending and receiving messages, executing methods, changing state, and so on. They exist only inside the memory of a particular mongrel, for the duration of a single request that the mongrel is processing.

Many objects need to be persisted for longer than that, and these get written to and read from the storage engine - in Rails, using ActiveRecord. The storage engine is centralized (though it may be replicated to protect against failure), so that all of the worker processes see a consistent view of the data: if one of the mongrels modifies an object and commits that change to MySQL, the others will see that change the next time they need to load that object. The objects inside the storage engine are dead - they don't do anything until they're loaded into a worker process - but they're well preserved: they're kept on disk, not memory, so they'll survive a server reboot or other catastrophe.
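A minimal sketch of that load-modify-commit cycle, in Python purely for illustration. Nothing here is ActiveRecord or Rails API: the `Account` class, the `accounts` table, and the `save`/`load` helpers are all made up, and an in-memory SQLite database stands in for the central MySQL server.

```python
import sqlite3


class Account:
    """A live object: it exists only in one worker's memory."""

    def __init__(self, account_id, owner, balance):
        self.account_id = account_id
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount


def create_store():
    # In-memory SQLite stands in for the central storage engine (an assumption).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
    )
    return db


def save(db, account):
    # Map the live object onto a row; in storage it is just "dead" data.
    db.execute(
        "INSERT OR REPLACE INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
        (account.account_id, account.owner, account.balance),
    )
    db.commit()


def load(db, account_id):
    # Map the row back into a fresh live object for whichever worker asked.
    row = db.execute(
        "SELECT id, owner, balance FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return Account(*row) if row else None


db = create_store()
save(db, Account(1, "alice", 100))

# "Worker A" loads the object, changes it, and commits the change.
a = load(db, 1)
a.deposit(50)
save(db, a)

# "Worker B" sees the change the next time it loads the same object.
b = load(db, 1)
print(b.balance)
```

Both "workers" share one connection here to keep the sketch short; in a real deployment each mongrel would hold its own connection to the same database.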

Loading from and saving to the storage engine is relatively slow, and keeping objects there eats disk space, so the memory cache is an important third player in this game. A mongrel that's gone to the work of retrieving an object from MySQL might stash a copy in memcached for the other mongrels to retrieve, more quickly, if and when they need the same one. An object that's expensive to build - like a piece of complex HTML - but not important enough to save to disk might also be placed there for the convenience of the other workers on the same server. In Rails, the cache has to be managed carefully, so that you don't get out of sync with the consistent view of data maintained by the storage engine, but the work pays off with lower loads and faster response times. Objects in the cache are dead - usually marshalled into a meaningless string - and also transient, since the cache is purely in memory.
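Here is a rough sketch of that cache-aside pattern, again in Python for illustration only. `FakeMemcached` and `SlowStore` are hypothetical stand-ins (a dict of byte strings and a dict with a load counter), not the memcached or MySQL client APIs, and `pickle` plays the role of Ruby's marshalling.

```python
import pickle


class FakeMemcached:
    """Stand-in for memcached: holds opaque byte strings, in memory only."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("the cache only stores marshalled bytes")
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class SlowStore:
    """Stand-in for the storage engine; counts how often it is hit."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.loads = 0

    def load(self, key):
        self.loads += 1
        return self.rows[key]

    def save(self, key, value):
        self.rows[key] = value


def fetch(key, cache, store):
    raw = cache.get(key)
    if raw is not None:
        return pickle.loads(raw)          # unmarshal the dead bytes into a live object
    value = store.load(key)               # slow path: go to the storage engine
    cache.set(key, pickle.dumps(value))   # stash a marshalled copy for other workers
    return value


def update(key, value, cache, store):
    store.save(key, value)
    cache.delete(key)                     # manual invalidation: forget this and readers go stale


cache = FakeMemcached()
store = SlowStore({"user:1": {"name": "Ann", "visits": 3}})

first = fetch("user:1", cache, store)     # miss: loads from the store
second = fetch("user:1", cache, store)    # hit: served from the cache
update("user:1", {"name": "Ann", "visits": 4}, cache, store)
third = fetch("user:1", cache, store)     # miss again after invalidation
print(store.loads, third["visits"])
```

The `cache.delete` call in `update` is the part that has to be managed carefully: nothing in the cache itself knows that the stored copy changed.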

What about Gemstone? As it happens, the architecture is exactly the same: there's a single storage engine (called a "stone"), a memory cache on each server (the "shared page cache"), and any number of Smalltalk VM worker processes ("gems"). The gems handle the requests and run the code, and they stash objects in the page cache for speed and in the stone for persistence. The difference is, in Gemstone, these have all been designed from the ground up to work together as quickly and seamlessly as possible. In particular, this means two things:

1. Each part of the architecture uses exactly the same format to store the objects: whether it's a live object running in a gem, a cached object in the page cache, or a stored object on disk, the sequence of bytes is exactly the same. Unlike in Rails, where you have to be mapping and marshalling at every step, in Gemstone copying objects from storage to cache to worker process is pretty much just that - a simple byte copy. This makes it fast.

2. Objects are automatically kept in sync between each part of the system. The worker processes always load objects from the memory cache, because they can trust it to grab a recent copy from storage if needed. They also always save to the cache, because it will write the same change through to the storage without being asked. The gems also keep track of which objects have changed so that you don't have to, and will update the cache - and get updates from other gems back - automatically and transparently. The effect is as if all of your worker processes were running their objects inside a single, consistent and impossibly large chunk of persistent memory. This makes it easy.
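To make the second point concrete, here is a toy model of the idea, in Python. It is emphatically not Gemstone's API or implementation: `Stone`, `SharedCache`, `GemSession`, and `Persistent` are hypothetical names, Python dicts stand in for pages of bytes, and a commit or abort simply drops the worker's local copies so the next access picks up a fresh view. What it tries to show is the shape of the contract: workers only ever talk to the cache, the cache reads and writes through to storage, and changes are tracked without an explicit save call.

```python
class Stone:
    """Stand-in for the persistent store."""

    def __init__(self):
        self.disk = {}


class SharedCache:
    """Read-through / write-through cache in front of the stone."""

    def __init__(self, stone):
        self.stone = stone
        self.pages = {}

    def read(self, oid):
        if oid not in self.pages:
            self.pages[oid] = dict(self.stone.disk[oid])   # read-through on a miss
        return dict(self.pages[oid])

    def write(self, oid, state):
        self.pages[oid] = dict(state)
        self.stone.disk[oid] = dict(state)                 # write-through, unasked


class Persistent:
    """Proxy that records its own modifications in the owning session."""

    def __init__(self, session, oid, state):
        object.__setattr__(self, "_session", session)
        object.__setattr__(self, "_oid", oid)
        object.__setattr__(self, "_state", state)

    def __getattr__(self, name):
        try:
            return self._state[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._state[name] = value
        self._session.dirty.add(self._oid)                 # tracked automatically


class GemSession:
    """Stand-in for one worker process."""

    def __init__(self, cache):
        self.cache = cache
        self.loaded = {}
        self.dirty = set()

    def get(self, oid):
        if oid not in self.loaded:
            self.loaded[oid] = Persistent(self, oid, self.cache.read(oid))
        return self.loaded[oid]

    def commit(self):
        for oid in self.dirty:
            self.cache.write(oid, self.loaded[oid]._state)
        self.abort()                                       # then start a fresh view

    def abort(self):
        self.dirty.clear()
        self.loaded.clear()


stone = Stone()
stone.disk["cart:7"] = {"items": 1}
cache = SharedCache(stone)
gem_a, gem_b = GemSession(cache), GemSession(cache)

stale = gem_b.get("cart:7").items      # gem B sees 1 item
cart = gem_a.get("cart:7")
cart.items = cart.items + 1            # no explicit save; the change is tracked
gem_a.commit()

gem_b.abort()                          # gem B picks up a fresh view
print(stale, gem_b.get("cart:7").items)
```

The application code in the last few lines never mentions the cache or the store directly, which is the "single, consistent chunk of persistent memory" effect in miniature.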

To be extra clear, here's the mapping I'm trying to describe:

                  Rails: provided by   Rails: stores                         Gemstone: provided by   Gemstone: stores
Storage engine    MySQL                objects mapped to relational tables   "Stone" object store    Smalltalk objects
Memory cache      memcached            objects marshalled to strings         Shared page cache       Smalltalk objects
Worker process    MRI/Mongrel          Ruby objects                          "Gem" Smalltalk VM      Smalltalk objects

So there you have it: Gemstone, it's like Rails, but faster and easier. If only it ran Ruby...

Posted at 01:02 AM | Permalink

Comments

The Gemstone architecture is really nice.

Imagine having to power your website with steam, but to get the steam, you had to keep melting ice (your database) into a liquid, and then heating that liquid into steam. And when you were done, you had to cool the rearranged steam back to liquid and then back to ice for the next hit. Ouch.

Gemstone lets you "store the steam". Yes, make your website steam powered with Gemstone!

Posted by: Randal L. Schwartz | March 08, 2008 at 07:17 AM

Gemstone is interested in supporting the OODB in Ruby: in JRuby with their Java-based product, but more interestingly in Rubinius, where they can do some of their pointer magic (Rubinius uses tagged pointers and has some tag space left, so Gemstone can use it for their purposes).
Not sure what the current status of this is, but at least this was the plan last August:
http://www.infoq.com/news/2007/08/gemstone-ruby

Posted by: murphee | March 08, 2008 at 05:50 PM

I'm curious how you deal with cache invalidation ("the other hard problem in computer science") - do the worker processes communicate amongst each other, do cached items take time to propagate from worker to worker, does the storage engine have a hand in keeping them synchronized, and does the overhead grow as the system expands to many workers? One of the best innovations introduced by memcache and other early 00's web application technologies was a basic comfort level with slightly-stale information.

Posted by: Michal Migurski | March 08, 2008 at 10:32 PM

@Michal
As an OODB, GemStone is transactional and cache invalidation occurs at transaction boundaries.

Without going into great detail (GemStone's technology has been evolving over the last 20 years, so there is quite a bit of detail), an object table is used to track the objects associated with a transactional view. When a gem aborts, it picks up a reference to the object table for that view and it gets a list of objects that have been changed since the gem last aborted. The gem invalidates the copies of the changed objects in its head and uses the object table to find the new version of the object in the shared page cache (the object is loaded from disk if it isn't in the shared page cache) if and when that object is referenced again.

We have optimized, and continue to optimize, these interactions. There are production systems that have thousands of gems connected to the stone and there are production systems that routinely perform thousands of transactions per second.

Posted by: Dale Henrichs | March 09, 2008 at 09:56 AM

"If only it ran Ruby..."

And if so, when?

Posted by: Mitch Thomas | March 09, 2008 at 09:30 PM

Interesting comparison. As I'm trying to learn Gemstone (GLASS by now), I want to ask:

Can the "Stone" object store be on more than one server? I mean, it isn't tied to only one hefty machine?

At the end you write: "If only it ran Ruby...". Does that mean that you currently prefer Ruby over Smalltalk, or does it mean something else?

Cheers.

Posted by: German Arduino | March 19, 2008 at 04:13 AM

@German

The "stone" is a single process that runs on a single machine and it manages the repository. For the higher transaction rates, the stone's host will be a hefty machine with a fancy disk subsystem, but you can get pretty good performance from commodity hardware.

The "gems" can be run on multiple hosts and they all connect to the "stone."

Posted by: Dale Henrichs | March 26, 2008 at 11:27 AM

I'll take a half dozen, please.

Posted by: Dr Nic | March 27, 2008 at 04:43 AM

Everything I learned about databases and large-scale domain modeling, I learned (initially) from GemStone/J and Smalltalk, around 10 years ago. I am still friends with a few GemStoners from FPL, OOCL, and other spots, and I'm glad to see it's still around outside those circles.

Frankly, I do think Rails is one of the closest things to that tradition, and in some ways it's getting even better. Moreover, I think the Semantic Web may also be a source of great work that harkens back to the days of the OODB....

Posted by: stu | March 29, 2008 at 08:29 AM

@Dale:

Thanks for your explanation.

Posted by: German Arduino | April 08, 2008 at 11:11 AM

Excellent article, thanks!

One thing occurred to me though: You make it sound like MySQL or another relational DB is bad because it's different from the OO implementation. That is only partially correct. It's bad because the storage model (tables) doesn't fit the application model (objects). However, it's good because it provides an abstraction from the app model.

This is the major problem with writing out binaries - the db is inherently dependent on the language and perhaps even version of the language it was implemented in.

Ideally, I would want an OO DB that's in a general, and language-unrelated format. Object-oriented, but generalized and mapped to the implementation. I would want the DB to be language-agnostic. It should not matter whether I am using Smalltalk, Ruby, Java, or any other OO language on the app layer.

Posted by: nik heger | June 01, 2008 at 05:21 AM

@nik - if you want a language agnostic OODB, check out GOODS. It has clients for various languages, and those for Java, Python, and Smalltalk are interoperable (I wrote the latter two).

Posted by: Avi Bryant | June 01, 2008 at 10:07 AM

HREF Considered Harmful

Avi Bryant
