Wednesday, October 10, 2007

Clustering XWiki

I spent some time yesterday with Ludovic Dubost and Vincent Massol, CEO and CTO respectively of XWiki, working on getting XWiki clustered with Terracotta. They were in town for the Google Summer of Code wrap-up. (They mentored some very cool projects, and they have been adding some very exciting capabilities to XWiki over the last year, which I'll attempt to describe in my next blog entry.)

There are basically two aspects to clustering XWiki:
  • clustering their page cache; and
  • clustering their Lucene indexes.
The first thing we tackled was getting their page cache clustered. They currently use OSCache, but they are looking at moving away from it to something else (they have a prototype implementation of the JBoss Cache that someone submitted).

Getting Started


Neither Ludovic nor Vincent was very familiar with Terracotta, so I went through my usual "Introduction to Terracotta" spiel (which, incidentally, I'm trying to capture as a screencast). After that, we took a stab at getting Vincent set up with the Eclipse plugin while Ludovic helped Dave upgrade our installation of XWiki from v0.9 to v1.1.

Vincent uses IDEA rather than Eclipse (although his facility with Eclipse suggests some familiarity with it), so we had to set up his Eclipse environment, which took a little while. It also turns out that the latest version of our Eclipse plugin (2.4.4) has some kind of freak-out on the Mac that makes it take forever to do anything. It sounds like Gary K., the author of our Eclipse plugin, is on it, though.

Getting Things Running

After abandoning the Eclipse plugin for a while, we focussed on getting all the plumbing set up. This basically involved:
  • Building an XWiki sandbox installation that uses Jetty 6. Terracotta's built-in support for Jetty doesn't support Jetty 5.
  • Modifying the XWiki start script to pass the necessary Terracotta JVM parameters (i.e., path to the boot jar, path to the configuration, path to the Terracotta installation)
  • Writing a starter tc-config.xml
  • Generating the Terracotta boot jar. There were some things that were being clustered that don't go into the default boot jar, e.g., java.net.URL and a few other things.
  • Starting the server
  • Starting XWiki

Refining the Configuration

After we got XWiki to come up with Terracotta, we ran into a non-portable object exception, which means that we were trying to share an object that either wasn't instrumented, wasn't in the boot jar but needed to be, or was a priori not shareable (e.g., an object representing a system resource).

We looked in the Terracotta client log to see what the problem was. When Terracotta encounters a problem like this, it dumps a textual representation of the offending object graph to the client log. The non-portable object graph turned out to be a vestigial reference in XWikiDocument to their object store (which contains references to a bunch of Hibernate infrastructure). Vincent said that he's been on a yearlong campaign to refactor the XWiki codebase (he has converted them over to using Maven and has done a heroic job ironing the wrinkles out of the code), so we spent some time refactoring the vestigial reference to the object store out of XWikiDocument. We could have made it transient, but that would have been a hack. The XWiki code is now slightly cleaner for his efforts.
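
To make the difference between the two options concrete, here is a minimal sketch. The names (Store, DocumentWithTransientStore, SharedDocument) are hypothetical stand-ins, not the actual XWiki classes, and it assumes the clustering layer is configured to skip transient fields.

```java
// Hypothetical stand-in for the object store, which drags in
// infrastructure that cannot be shared across JVMs.
interface Store {
    String load(String name);
}

// The hack: keep the reference but mark it transient.
// Assumption: the clustering layer is configured to skip transient fields,
// so the field would be null on other nodes until something restores it.
class DocumentWithTransientStore {
    private final String name;
    private transient Store store;

    DocumentWithTransientStore(String name, Store store) {
        this.name = name;
        this.store = store;
    }

    String getContent() {
        // Fails with a NullPointerException wherever the store was not restored.
        return store.load(name);
    }
}

// The refactoring: the document holds only its own data,
// and callers hand the store in when it is needed.
class SharedDocument {
    private final String name;
    private String content;

    SharedDocument(String name) {
        this.name = name;
    }

    void loadFrom(Store store) {
        content = store.load(name);
    }

    String getContent() {
        return content;
    }
}
```

With the second shape, the object graph reachable from the document no longer includes the store at all, so there is nothing non-portable left to trip over.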

Clustering OSCache

Now that XWikiDocument objects were shareable, we worked on adding locking to com.opensymphony.oscache.general.GeneralCacheAdministrator. I added a named lock to all of the methods, which got rid of the lock exceptions, but we then ran into a deadlock. I didn't really look at the OSCache code very much, so it's entirely possible that we could tweak the OSCache config a little to get it to work. Considering that OSCache uses its own hash-based structures internally, though, I kind of doubt we'll be able to get it to work right without a bit more real effort building a formal Terracotta integration module.

Considering that they are planning to replace OSCache with something else, we decided to abandon this avenue. I suggested that we replace the OSCache implementation of the XWiki cache with a quick-and-dirty, simple Map-based implementation, but, as Vincent pointed out, that would only show Terracotta working. It wouldn't be something they'd want to integrate into their codebase, since it wouldn't have eviction policies or any of the other nice stuff you expect from a cache implementation.
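
For the curious, the kind of quick-and-dirty cache I had in mind would look roughly like the sketch below. SimpleMapCache is a hypothetical name, it does not implement XWiki's actual cache interface, and it assumes the map field would be declared as a shared root, with locking on the synchronized blocks, in a Terracotta configuration that isn't shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical quick-and-dirty cache; not XWiki's actual cache interface.
class SimpleMapCache<K, V> {
    // Assumption: this field would be declared as a shared root in the
    // Terracotta configuration, so every node sees the same map.
    private final Map<K, V> entries = new HashMap<K, V>();

    public V get(K key) {
        // Assumption: the clustering layer turns this synchronized block
        // into a cluster-wide lock via configuration.
        synchronized (entries) {
            return entries.get(key);
        }
    }

    public void put(K key, V value) {
        synchronized (entries) {
            entries.put(key, value);
        }
    }

    public V remove(K key) {
        synchronized (entries) {
            return entries.remove(key);
        }
    }

    public int size() {
        synchronized (entries) {
            return entries.size();
        }
    }
}
```

Vincent's objection is visible right in the code: nothing ever evicts an entry, so the map grows without bound unless callers remove entries themselves.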

We decided, instead, to pause and revisit the clustered cache component when they get to reimplementing their cache in about a month.

Next Steps

The next steps as I see them are:
  • Get clustered caching configured when XWiki replaces OSCache with their next implementation.
  • Get Lucene clustering working. This will probably require making the Directory instance pluggable, so we can use a RAMDirectory when we want it clustered.
  • Refine the Terracotta configuration to include only what is needed. We used a *..* include; for startup performance, we might want to identify the set of classes that will actually need to be shared and switch to including only those.
  • Tuning
  • Automated tests. We discussed the possibility of using the test infrastructure that we're building for the Terracotta forge projects. They also have some XWiki clusters that we can test on. Continued success, of course, will require building automated tests and running them in a continuous integration process.

1 Comment:

vmassol said...

Hi Orion,

Thanks for this complete article! I'm impressed by the precision of your blog post. It's exactly as it happened :)

We've linked your blog post on xwiki.org at http://www.xwiki.org/xwiki/bin/view/Main/ExternalLinks

Thanks again for offering space and food for our XWiki meetup. That was very kind of Terracotta :)

Once we're back we'll try the JBossCache contribution we've received and when it's in place we'll try clustering with Terracotta.

Thanks again
-Vincent

3:56 PM  
