January 07, 2004

Xgrid

Today, Apple introduced Xgrid. What is Xgrid and why should one care? This article describes my findings on Xgrid. Everything is available in the documentation or somewhere on the web, but this article presents a quick overview.

The second part of this article, entitled "Getting acquainted with Xgrid", is available here. The third part "Xgrid: Povray example" is available here.

The announcement and files


First, to get Xgrid, go to The Advanced Computation Group and download the disk image. Mount it so you can access the documentation files. I will refer to the mounted Xgrid 1.0 disk image as "the Xgrid disk". The press release is also available.

What is Xgrid?


Xgrid allows you to take a program and run it on several machines in parallel in order to get the result faster. There are three players in an Xgrid calculation:

- The client is the computer that wants to start a calculation (i.e. the one running /Applications/Xgrid.app).
- The controller is the computer that actually launches the calculation on the agents.
- The agents are the computers performing the calculation.

A given computer can act as any of the three, even at the same time (you enable both the controller and the agent in System Preferences).
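To picture how the three roles fit together, here is a toy Python sketch. It is not Xgrid code and involves no networking. A "client" submits a job to a "controller", which hands the individual tasks to "agents" in round-robin order and collects the results. The class names, the agent names and the round-robin policy are all my own illustrative assumptions. I don't know how Apple's controller actually schedules work.

```python
from itertools import cycle

class Agent:
    """Toy agent: runs whatever task function it is handed, on one argument."""
    def __init__(self, name):
        self.name = name

    def run(self, task, arg):
        return task(arg)

class Controller:
    """Toy controller: receives a job and deals its tasks out to agents round-robin."""
    def __init__(self, agents):
        self.agents = agents

    def submit(self, task, args):
        results = []
        # Pair each task argument with the next agent in the (repeating) list.
        for agent, arg in zip(cycle(self.agents), args):
            results.append((agent.name, agent.run(task, arg)))
        return results

# The "client" is simply whoever calls submit(): here, a square-each-number job.
controller = Controller([Agent("g5-a"), Agent("g5-b"), Agent("ibook")])
for name, value in controller.submit(lambda n: n * n, range(5)):
    print(name, value)
```

In real Xgrid, each `run` call would be a message sent over the network, and the agents would execute a program rather than a Python function.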

Xgrid is targeted at computations that take a very long time (several hours). Typical applications that gain from this are Monte Carlo calculations, 3D rendering, and other calculations that can be broken into several sub-tasks that don't affect each other. Apple provides a few examples; the most obvious one is Mandelbrot. The calculation of the Mandelbrot map at a given point does not depend on the result at any other point. Hence, one can split the whole map into sub-maps and ask the agent computers to each perform their part of the calculation.
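To make the splitting concrete, here is a minimal Python sketch. It is not Apple's Mandelbrot plug-in. It cuts a Mandelbrot map into horizontal strips, computes each strip independently as an agent would, and stitches the strips back together. The function names, the region of the complex plane, the grid size and the strip count are arbitrary choices for illustration.

```python
def escape_count(c, max_iter=50):
    """Iterations of z -> z*z + c before |z| exceeds 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

def compute_rows(rows, width, height, max_iter=50):
    """One independent sub-task: needs only its row indices, not other strips' results."""
    result = []
    for y in rows:
        im = -1.5 + 3.0 * y / (height - 1)
        row = []
        for x in range(width):
            re = -2.0 + 3.0 * x / (width - 1)
            row.append(escape_count(complex(re, im), max_iter))
        result.append(row)
    return result

def split_rows(height, n_tasks):
    """Split row indices 0..height-1 into at most n_tasks contiguous strips."""
    step = -(-height // n_tasks)  # ceiling division
    return [range(start, min(start + step, height)) for start in range(0, height, step)]

def mandelbrot(width, height, n_tasks):
    strips = split_rows(height, n_tasks)
    # Each strip could be sent to a different agent; here we simply loop.
    partial = [compute_rows(strip, width, height) for strip in strips]
    # Reassemble in order: concatenating the strips gives the full map.
    return [row for strip in partial for row in strip]

if __name__ == "__main__":
    split_map = mandelbrot(60, 20, 4)
    whole_map = compute_rows(range(20), 60, 20)
    print(split_map == whole_map)  # splitting does not change the result
```

The only coordination needed is the order in which the strips are glued back together. This is exactly the kind of job that needs nothing more than "send a command, get the result back".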

Xgrid does not perform the calculation. Actually, Xgrid does not know squat about math or science. Even worse (or better?), Xgrid does not even know you are trying to "compute" something. Xgrid provides the basic infrastructure so that one computer can talk to several others, run a command and get the result back. That's it. It is based on BEEP (Blocks Extensible Exchange Protocol, RFC 3080), a relatively new, HTTP-like application protocol. You can get very good information on it here and there, but I will come back to it later. BEEP is the plumbing that does the talking.
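For a feel of what "the plumbing" looks like on the wire, here is a Python sketch that builds and parses a single BEEP MSG frame. It follows the framing described in RFC 3080:

- a header line `MSG channel msgno more seqno size`,
- then the payload,
- then an `END` trailer.

The payload used below is a placeholder. I am not claiming this is what Xgrid actually sends. Channel profiles, and the MIME headers that normally open a BEEP payload, are left to the caller.

```python
CRLF = b"\r\n"

def build_msg_frame(channel, msgno, seqno, payload, more=False):
    """Build one BEEP MSG frame: header line, payload bytes, END trailer."""
    flag = "*" if more else "."  # '*' = more frames follow, '.' = final frame
    header = f"MSG {channel} {msgno} {flag} {seqno} {len(payload)}".encode("ascii")
    return header + CRLF + payload + b"END" + CRLF

def parse_msg_frame(frame):
    """Parse a single MSG frame back into (channel, msgno, more, seqno, payload)."""
    header, rest = frame.split(CRLF, 1)
    kind, channel, msgno, more, seqno, size = header.decode("ascii").split()
    size = int(size)
    # The size field, not a delimiter, tells us where the payload ends.
    payload, trailer = rest[:size], rest[size:]
    if kind != "MSG" or trailer != b"END" + CRLF:
        raise ValueError("not a well-formed MSG frame")
    return int(channel), int(msgno), more == "*", int(seqno), payload

frame = build_msg_frame(channel=1, msgno=0, seqno=0,
                        payload=b"Content-Type: text/plain\r\n\r\nhello agent")
print(frame)
print(parse_msg_frame(frame))
```

Because the header carries an explicit byte count, the payload can contain anything, including CRLF sequences or a whole XML document.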

The examples


When you follow the instructions in the Read Me First.pdf file (on the Xgrid disk), you can quickly run the Mandelbrot program or a program called Shell. There is also a program that allows you to run any arbitrary Unix command on all the agents.

The Shell program deserves particular attention because its source code is also provided. It runs any command that is available on the agent's machine.

The real question I have is this: for the Shell or the Mandelbrot program, does the agent run its local copy (which it finds in /Library/Xgrid/Plug-ins/Mandelbrot.xgplug/, for instance), or does it receive a copy over the connection from the controller? I suspect it is the former, which would make everything less useful than it appears. You would need a local copy of your program installed on every machine you want to run it on. Hence, if you have written some scientific program, you would need to find the agents, copy the files onto them and always make sure they have the proper version. That in itself would defeat the purpose of Rendezvous: you might not even know where the agents are, and you very likely don't have access to them anyway, let alone administrator access.

[Note, Jan 10th: However, the custom plug-in allows one to set an arbitrary program name and a working directory, which may even contain files. Upon completion, the directory is copied back to the "Destination directory". More on that in another article.]
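The Jan 10th note describes this flow: ship a program together with a working directory, run it on the agent, copy the directory back to a destination. Here is a rough, purely local Python imitation of it. It is my own hypothetical reconstruction of the idea and not Xgrid's mechanism. The function `run_on_agent` is invented, and "shipping" is simulated by copying into a temporary scratch directory.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

def run_on_agent(program_args, working_dir, destination_dir):
    """Hypothetical agent-side flow: copy the job's working directory to a
    scratch area, run the program inside it, then copy the directory back."""
    with tempfile.TemporaryDirectory() as scratch:
        job_dir = Path(scratch) / "job"
        shutil.copytree(working_dir, job_dir)  # "ship" the input files
        completed = subprocess.run(program_args, cwd=job_dir,
                                   capture_output=True, text=True, check=True)
        # Copy everything (inputs and outputs) to the destination directory.
        shutil.copytree(job_dir, destination_dir, dirs_exist_ok=True)
    return completed.stdout

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        Path(src, "input.txt").write_text("3 4")
        # A tiny stand-in "scientific program": read two numbers, write their product.
        script = ("a, b = map(int, open('input.txt').read().split()); "
                  "open('output.txt', 'w').write(str(a * b)); print('done')")
        print(run_on_agent([sys.executable, "-c", script], src, dst))
        print(Path(dst, "output.txt").read_text())
```

The point of this design is that the program only needs to exist on the agent for the duration of the job. That would sidestep the "install it everywhere" problem raised above.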

The source code provided by Apple (the Shell program) does not give enough information to get to the guts of Xgrid. One must derive a class from XgJobViewController and override a few methods, and we don't have the source code for that class. Hence, the details of the Xgrid protocol are somewhat hidden, which makes me scratch my head more than I should. And this brings me to the last section.

What Xgrid is not


If you read the FAQ on the Xgrid 1.0 disk image, you will find questions 14 and 10:

14 What about other software clustering technologies (MPI)?

• Xgrid is not a replacement for MPI. MPI is an API that enables programmers to write portable parallel applications, whereas Xgrid is a suite of applications and daemons which enables scientists to run distributed computations using a simple Mac OS X application.
• An Xgrid plug-in could be written and used as a replacement for programs such as mpirun, which coordinate the start and stop of MPI applications on a cluster of computers. However, no such plug-in is included with this release of Xgrid.

10 Can I use Xgrid with other UNIX-based computers?

• The short answer is no.
• The long answer is that Xgrid uses an XML property list protocol built on top of BEEP for all of its inter-computer communication and coordination, and because these protocols are open, it is possible a client, agent, or controller could be written to run on other UNIX-based computers and interoperate with Xgrid. However, no such programs have been written.

(Bold passages by me.)

MPI (Message Passing Interface) is the standard for parallel computation, at least in academia. It allows you to easily split a computation into sub-tasks and execute them on other computers, which you specify manually in a configuration file or on the command line. How MPI talks to the other nodes is irrelevant: it just does, and one should not care. However, MPI provides facilities to collect all the results of a calculation and "sum" them (e.g. MPI_Reduce), which is something Xgrid does not provide. Xgrid provides the plumbing and finds the agents to perform a task, but that's it.

What I don't understand is how one can take current MPI programs (with all their convenient functions for "summing" results) and use them with Xgrid. Apple alludes to the fact that they at least thought about it (I suspect they even have some kind of solution), but I just don't see how, since MPI has its own communication scheme. What do we need here? Some kind of xGridMPI? I am not sure.
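To make the "summing" point concrete, here is a Python sketch of a Monte Carlo estimate of π. It is plain Python with no MPI or Xgrid calls. The work is split into two kinds of step:

- independent sub-tasks, which is the part Xgrid can distribute;
- a final combination of the partial counts, which is the reduction MPI offers as MPI_Reduce and which Xgrid leaves to you.

The task count, sample size and per-task seeding scheme are arbitrary choices.

```python
import random

def count_hits(n_samples, seed):
    """One independent sub-task: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # a distinct seed per task so tasks don't repeat each other
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi(n_tasks, samples_per_task):
    # Scatter: each (samples, seed) pair could run on a different agent.
    partial_hits = [count_hits(samples_per_task, seed) for seed in range(n_tasks)]
    # Reduce: the "sum" step that MPI provides and Xgrid does not.
    total_hits = sum(partial_hits)
    return 4.0 * total_hits / (n_tasks * samples_per_task)

if __name__ == "__main__":
    print(estimate_pi(8, 20000))
```

With Xgrid alone, you would have to write the equivalent of that `sum` line yourself. You would collect each agent's output file and combine them after the job finishes.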

But really, what I do know for sure is this: although some of us are lucky enough to have an OS X machine on our desk at work, most people around us don't. Moreover, the really powerful machines for calculations in universities are Unix-based and aren't running OS X. Hence, it is critical that the protocol Xgrid implements (what the controller asks the agents to do, and how) be made public. Then Xgrid agents could be written for Linux, SunOS, IRIX, etc. BEEP has been implemented on tons of architectures (see http://www.beepcore.org/), so the basic plumbing is there for a brave soul to implement the Xgrid client, agent and controller (and Rendezvous) on their machine of choice. Mac OS X will be the best machine from which to initiate the calculation. But as long as Xgrid does not interact with other architectures, its adoption in academia will be quite limited. We don't all have 1,100 G5s in our labs.
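The FAQ says Xgrid's messages are XML property lists carried over BEEP. Nothing about that format is Mac-specific. Python's standard `plistlib`, for example, reads and writes it on any Unix. The dictionary below is a hypothetical job request: the keys `name`, `command` and `arguments` are invented for illustration and are not Apple's actual Xgrid schema.

```python
import plistlib

# Hypothetical job request; these key names are made up, not Xgrid's real schema.
request = {
    "name": "mandelbrot-strip-3",
    "command": "/usr/bin/cal",
    "arguments": ["1", "2004"],
}

# Serialize to the XML property list format described in the FAQ.
xml_bytes = plistlib.dumps(request, fmt=plistlib.FMT_XML)
print(xml_bytes.decode("utf-8"))

# Any platform with a plist parser (Linux, SunOS, IRIX...) can read it back.
print(plistlib.loads(xml_bytes) == request)
```

The hard part of a non-Mac agent is therefore not the encoding. It is knowing which keys and message sequences the controller expects, and that is exactly what Apple has not documented.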

Wrap up

Xgrid looks good and removes a lot of the complexity of managing parallel computations, but how one tailors it to one's needs is not clear to me. If one has to recreate the functionality of MPI, then I don't see the gain in using Xgrid (so far), considering the time investment. Moreover, how Xgrid differs from Pooch is also unclear. [Added Jan 8th: Actually, Dauger Research has a FAQ about the difference between Xgrid and Pooch. In short: Pooch does MPI, Xgrid does not. The discussion above is correct.]

The second part of this article is available here.


Posted by dccote at January 7, 2004 12:47 AM
Comments

www.beep-core.org links are broken - use www.beepcore.org instead.

Posted by: Jonathan Wight at January 7, 2004 02:49 PM

Just one simple piece of information: Apple will never develop anything for PC hardware or any other OS. They are a HARDWARE company, so... they sell hardware.
If they developed Xgrid or anything else for a platform other than the Macintosh, they would lose money. They did it in the past, and today they know that developing for others is not a good idea.

So... buy a Mac or use other cluster solutions. Buy a Mac or do without this great Apple software. I loved Xgrid and I used Pooch from UCLA, so... I have used Macs for clusters for a long time, and I am not surprised by Xgrid; it is similar to Pooch (used for a long time on Macs before Mac OS X).

Posted by: Alberto V. M. at January 14, 2004 09:11 PM

Have you ever heard of the iTunes Music Store for Windows? Apple will develop software for other platforms if it makes sense. Also, they release things like Rendezvous and QuickTime Streaming Server for other platforms via Open Source if they don't want to do it themselves.

Posted by: Anonymous Coward at January 15, 2004 03:23 PM