Grid Computing

Xgrid

Grid computing can be a powerful tool in any researcher’s computational arsenal. Grids built from existing systems provide additional computational resources at little or no additional cost to the researcher. For IT professionals, they provide a way to divert some computational demand to lower-cost systems, freeing up valuable time on high-end computational resources.

While the term “grid” means many things to many people, there are practical applications of grid technologies today that can have a positive impact on research computing. Many are open source technologies, some are commercial solutions, and every technology tends to have both benefits and drawbacks for given applications. Whether a given grid technology is useful for your work often comes down to whether you and the developer define “grid” in the same way.

What is a grid?

There are many approaches to grid computing, each of which serves a unique need in the world of information technology. The following table shows some of the more common grid types:

Type: Desktop recovery or cycle harvesting
Description: Idle systems on a network are put to work on grid tasks when they’re not otherwise in use.
Benefit: Uses existing hardware to accomplish extra work at no additional cost.

Type: Resource aggregation and abstraction
Description: Compute resources, such as computational clusters, are connected to one another from anywhere in the world. Schedulers assign tasks to these resources as they are free.
Benefit: Grid software automatically chooses the best available resource for a given job without requiring the end user to be an expert.

Type: Load balancing
Description: Multiple systems capable of performing the same tasks dynamically share the load of tasks, such as database transactions.
Benefit: Very scalable: the more systems you add to the grid, the more load they’ll handle.

Type: Data grid
Description: Copies of data exist in one or more locations on the grid with universally addressable locations.
Benefit: Anyone can access data from anywhere on the grid without knowing or caring where it exists.

Generally speaking, a given grid technology takes on one of these grid types. Condor, from the University of Wisconsin-Madison, is a cycle harvesting application. Products from Platform Computing and components of the Globus Toolkit enable worldwide grids with resource abstraction. Systems like Oracle Database 10g enable straightforward load balancing across a grid, as well as some data grid capabilities. Xgrid, built into Mac OS X Server and Mac OS X, is well suited to desktop harvesting applications on local and wide area networks.

What’s a grid good for?

Building a computational grid out of systems you already own lets you provide additional resources to researchers without buying new hardware and without placing undue burden on administrators. Once you install some additional software, your systems take on tasks whenever they’re idle, increasing the utility of systems that would otherwise do nothing at night and on weekends.
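The idea of taking on tasks only when a system is idle can be sketched as a simple policy check. This is a minimal illustration, not how Xgrid or any other agent actually decides: the `HostState` fields, the 15-minute quiet period, and the 25% CPU ceiling are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Snapshot of a desktop system (hypothetical fields for illustration)."""
    seconds_since_user_input: float
    cpu_load: float  # fraction of CPU used by the owner's own work, 0.0 to 1.0


def is_idle(state, min_quiet_seconds=900.0, max_cpu_load=0.25):
    """Treat the host as idle when nobody has touched it and the CPU is mostly free."""
    return (state.seconds_since_user_input >= min_quiet_seconds
            and state.cpu_load <= max_cpu_load)


def accept_tasks(state, pending):
    """Return the tasks this agent would take right now: one if idle, none otherwise."""
    if is_idle(state) and pending:
        return [pending[0]]
    return []


if __name__ == "__main__":
    overnight = HostState(seconds_since_user_input=3600, cpu_load=0.02)
    in_use = HostState(seconds_since_user_input=5, cpu_load=0.60)
    print(accept_tasks(overnight, ["task-1", "task-2"]))
    print(accept_tasks(in_use, ["task-1", "task-2"]))
```

A real agent would also need to stop or suspend work when the owner returns; that part is left out here.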

Grids that abstract resources from around the world are more complicated, but they provide a powerful tool to researchers who need access to high-end systems without directly managing their own. Take CERN, for example, where everyone needs computational power, but few who collaborate with CERN actually work there. How do you provide resources to different researchers from various countries? A grid that makes everyone’s system available to any authenticated user on the grid allows access to large pools of computational resources without requiring any one user to have ownership of the entire pool.

How do I build a grid?

Whether you’re a researcher or IT professional, building a cycle harvesting grid can be a fairly straightforward endeavor, especially given the technologies that have matured over the last few years. Xgrid makes it easy to aggregate idle desktop systems into a powerful computational grid.

Installing agent software onto the systems in the grid is usually easy. For systems running Mac OS X Leopard or Tiger, the software is already installed. For systems running Mac OS X Panther, you can download a free package installer from Apple and install it. Third-party grid software requires a separate installation.

Grids require a controller or scheduler. With Xgrid, any system running Mac OS X Leopard or Tiger Server already has the Xgrid controller built in. All you have to do is turn it on by specifying some basic parameters. Your agents can then connect to the controller and make themselves available to the grid.

Finally, clients that submit jobs to the grid must be configured with the appropriate software. As you might expect, any system running Mac OS X Leopard or Tiger already has it. Just type “xgrid” at the command line, or use any application that takes advantage of the Xgrid APIs in Mac OS X.

Clients and agents automatically discover the controller using Bonjour, Mac OS X’s built-in service discovery protocol. System administrators can also specify controllers manually and limit use of the grid through LDAP- and Kerberos-based authentication and credentials.

What can I run on the grid?

In theory, anything that can be run on a cluster can be run on a grid. In practice, however, the nature of a grid limits its use to certain kinds of problems. The most amenable problems are those that are “embarrassingly parallel,” meaning they have data sets that can be broken into independent pieces, each of which can be processed without reference to the others. Good examples are BLAST searching or batch image processing.
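The shape of an embarrassingly parallel job can be sketched in a few lines. In this minimal example, local threads stand in for grid agents and squaring numbers stands in for real work such as a search or an image transform; the function names are hypothetical. The point is only that each piece is processed independently and the results are combined at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, pieces):
    """Split items into `pieces` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), pieces)
    chunks, start = [], 0
    for i in range(pieces):
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def process_piece(piece):
    """Stand-in for real work: uses only its own piece, never the others."""
    return [x * x for x in piece]


def run_parallel(items, workers=4):
    """Process every chunk independently, then stitch the results back together."""
    chunks = split(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_piece, chunks))  # map preserves chunk order
    return [y for part in results for y in part]


if __name__ == "__main__":
    print(run_parallel(list(range(10)), workers=3))
```

Because no chunk depends on another, the chunks can run in any order and on any machine, which is what makes this kind of problem a good fit for loosely connected desktop systems.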

Generally, any application that can be executed from the command line can be run on a grid. If its options and input can be specified as command-line arguments, you can easily execute multiple copies of the application at once, each with different arguments.
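As a rough local illustration of running several copies of a command-line program with different arguments, the sketch below launches the same command several times and collects each copy's output. The "application" is a stand-in Python one-liner that sums its arguments, and `build_commands` and `run_all` are hypothetical helper names. On a real grid, the controller would hand each command line to a different agent rather than starting them all on one machine.

```python
import subprocess
import sys


def build_commands(program, parameter_sets):
    """Return one argument list per task: the same program with different arguments."""
    return [list(program) + [str(a) for a in params] for params in parameter_sets]


def run_all(commands):
    """Start every command at once, then collect each one's standard output."""
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
             for cmd in commands]
    return [p.communicate()[0].strip() for p in procs]


# Stand-in "application": sums the integers given on its command line.
APP = [sys.executable, "-c",
       "import sys; print(sum(int(a) for a in sys.argv[1:]))"]

if __name__ == "__main__":
    commands = build_commands(APP, [(1, 2), (3, 4), (5, 6)])
    print(run_all(commands))
```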

Let’s say you have an image processing script that takes images from a digital microscope, normalizes the luminosity of the images with a set of known parameters, converts the images from RGB to CMYK and tags the images with Spotlight data based on the device used to capture them. All of this could be done in a simple shell script with facilities available from the command line in Mac OS X. If your digital microscope places 2,000 images on a file server, each Xgrid agent can grab a different subset of images, process them and return them to a known location. Such processes are ideal for grid deployments using Xgrid, Condor or other desktop recovery mechanisms.
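Dividing the 2,000 images among agents can be sketched as follows. The file names and the count of 16 agents are assumptions made for the example, and `assign_images` is a hypothetical helper; how a particular grid product actually divides a job depends on how the job is submitted and is not shown here.

```python
def assign_images(image_names, agent_count):
    """Deal image names out round-robin so each agent gets a disjoint subset."""
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")
    return [image_names[i::agent_count] for i in range(agent_count)]


# Hypothetical file names for the 2,000 microscope images.
images = [f"scope_{n:04d}.tif" for n in range(2000)]
subsets = assign_images(images, 16)

if __name__ == "__main__":
    for agent, subset in enumerate(subsets[:3]):
        print(agent, len(subset), subset[:2])
```

Each subset can then be passed to one copy of the processing script, with every copy writing its results to the same known location.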