MPICH-G2
What is MPICH-G2?
MPICH-G2 is a grid-enabled implementation of the
MPI v1.1 standard.
That is, using services from the Globus Toolkit®
(e.g., job startup, security), MPICH-G2 allows you to couple multiple
machines, potentially of different architectures, to run MPI applications.
MPICH-G2 automatically converts data in messages sent between machines of
different architectures and supports multiprotocol communication by
automatically selecting TCP for intermachine messaging and (where available)
vendor-supplied MPI for intramachine messaging.
MPICH-G2 is a complete redesign and reimplementation of our earlier
implementation, MPICH-G
(see How does MPICH-G differ from MPICH-G2?).
It is implemented as one of the devices (called the globus2
device) of the popular
MPICH library,
which in turn was developed and is distributed by the MPICH group
led by Bill Gropp
and Ewing (Rusty) Lusk
of the Mathematics and
Computer Science Division at
Argonne National Laboratory.
Should I use MPICH-G2?
One important class of problems is those that are
distributed by nature, that is, problems whose solutions
are inherently distributed, for example, remote visualization
applications in which computationally intensive work producing
visualization output is performed at one location, perhaps as an MPI
application running on some massively parallel processor (MPP), and the
images are displayed on a remote high-end (e.g., IDesk, CAVE) device. For
such problems, MPICH-G2 allows you to use MPI as your programming model.
A second class of problems is those that are distributed by design,
in which you have access to multiple computers, perhaps at multiple sites
connected across a WAN, and you wish to couple
these computers into a computational grid, or simply grid.
Here MPICH-G2 can be used to run your application
using (where available) vendor-supplied implementations of MPI for
intramachine communication and TCP for intermachine communication.
In one scenario illustrating this second class of problems, you have a
cluster of workstations.
Here Globus services made available through MPICH-G2 provide
an environment in which you can conveniently launch your MPI application.
An example of this scenario is the
Grid
Application Development Software (GrADS) Project.
In another scenario you have an MPI application that runs on
a single MPP but problem sizes that are too large for any
single machine you have access to. In this situation a wide-area
implementation of MPI like MPICH-G2
may help by enabling you to couple multiple MPPs in a single execution.
Making efficient use of the additional CPUs
that are distributed across a LAN and/or WAN typically requires modifying
the application to adjust to the relatively poor latency and bandwidth
introduced by the intermachine communication. Two example applications
are Cactus (winner
of the Gordon Bell Award at SuperComputing 2001,
see MPI-Related Papers) and
Overflow(D2) from Information
Power Grid (IPG).
MPICH-G2 feature/release history
MPICH-G2 in MPICH v1.2.6.
This is primarily a bug fix release (i.e., no new features). The
bugs fixed in this release are as follows:
- Fixed bugs in collective operations.
There were a couple of bugs in the implementation of the collective
operations that, under certain conditions, would result in
segmentation faults. We think we have found and fixed all
known problems in this area.
- Fixed MPI 2.0 functions.
On certain systems where a vendor MPI was available there
was a bug in our implementation of the client/server
functions from the MPI 2.0 standard. This bug has been fixed.
MPICH-G2 in MPICH v1.2.5.3.
This is primarily a bug fix release (i.e., no new features). The
bugs fixed in this release are as follows:
- MPICH-G2 now builds with Globus Toolkit 2.x libraries distributed in Globus 3.0 or later.
See How
do I acquire/install MPICH-G2 using Globus v3.x? for details.
NOTE: MPICH-G2 does not work with GT 3.2 or
3.2.1 although there are Globus patches that can be applied
to GT 3.2.1 so that it can work with MPICH-G2
(see Things that don't work or are missing
in MPICH-G2).
- Fixed MPICH-G2 collective operations.
There was a memory leak that caused applications to fail with a
segmentation fault if they called collective operations (e.g., MPI_Bcast)
many times.
- Fixed bug that limited the number of processes that could be launched.
Prior to this release certain conditions would limit the
number of processes that could be launched with mpirun.
New MPICH-G2 features in MPICH v1.2.5.1.
Due to an unfortunate mishap in the MPICH release process MPICH
v1.2.5 was released at a time when MPICH-G2 was in a non-working
state, and therefore, MPICH-G2 is not supported for
MPICH v1.2.5. You will need to bypass v1.2.5 and acquire v1.2.5.1
or later.
We have made significant changes and additions to MPICH-G2
in v1.2.5.1 which are all described in detail later in this section.
The additions utilize new features in Globus (e.g., GridFTP) which
require that MPICH-G2 be configured with Globus v2.2 or later.
Also, some of the changes required us to change MPICH-G2's TCP
wire protocol thus rendering MPICH-G2 v1.2.5.1 incompatible
with previous versions of MPICH. Therefore,
when running an MPICH-G2 application across multiple sites,
where each site may conceivably have installed a different version
of MPICH-G2, if even one site is using MPICH v1.2.5.1 or later
then all sites must upgrade their MPICH-G2
installations to MPICH v1.2.5.1 or later.
Here is a list of the additions and changes present in MPICH v1.2.5.1,
each of which is described in detail below.
-
MPICH-G2 over MPICH-based vendor-MPI
Earlier releases of MPICH-G2 could not use other MPICH-based
implementations of MPI as their underlying vendor-supplied
MPI. In other words, MPICH-G2 could use SGI's MPI on an
Origin2000 or IBM's MPI on an IBM-SP, but it could not
use MPICH-GM on a Linux cluster equipped with a Myrinet
switch.
As of MPICH v1.2.5.1 that restriction no longer exists.
MPICH-G2 may now use MPICH-based implementations
of MPI (other than MPICH-G2 itself). This requires
that the underlying MPICH-based implementation be
an MPICH version v1.2.5 or later and that mpirun
of that underlying MPICH exports environment variables
to the application. Note also that you will need the
Grid Packaging Toolkit (GPT) v2.2.8 or later to build
an MPI flavor of Globus which uses an MPICH-based MPI
implementation.
Both Globus and MPICH-G2 make extensive use of environment
variables. When running your MPICH-G2 application on
a machine that has been equipped with a vendor-supplied MPI
you must specify (jobtype=mpi) in your RSL
(see Once installed, how do I use MPICH-G2?
for details). Doing so triggers the use of the vendor-supplied
MPI's mpirun to launch the application on that system.
It is imperative that the environment be exported to the
application. See Testing vendor-supplied MPI
mpirun's ability to export environment in the
Troubleshooting
section for a program to test whether your vendor-supplied MPI's
mpirun exports environment variables to the application.
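As a quick sanity check, here is a minimal sketch of such a test (this is
not the program from the Troubleshooting section, and the variable name
MYTEST_VAR is purely hypothetical). Compile it with your vendor-supplied
MPI's mpicc, set MYTEST_VAR in your shell or in the RSL environment clause,
launch it with the vendor-supplied MPI's mpirun, and check that every rank
reports the value.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char **argv)
{
    int my_id;
    char *val;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    /* MYTEST_VAR is a made-up name; use any variable you set before launching */
    val = getenv("MYTEST_VAR");
    if (val)
        printf("rank %d: MYTEST_VAR=%s (environment was exported)\n", my_id, val);
    else
        printf("rank %d: MYTEST_VAR not set (environment was NOT exported)\n", my_id);
    MPI_Finalize();
    return 0;
}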
Return to New MPICH-G2 features in MPICH
v1.2.5.1.
-
Parallel TCP streams over point-to-point TCP channels
In some situations there can be significant performance
gains when sending large messages over TCP by opening
multiple sockets between the two endpoints,
partitioning the large message into packets, sending
those packets in parallel using those multiple sockets,
and re-assembling the large message on the receiving side
(see "High-Resolution Remote Rendering of Large
Datasets in a Collaborative Environment" in
MPI-Related Papers). Using the
Globus GridFTP
client library we now make that easy to do in an MPI
application by simply assigning three values and using
MPI_Attr_put.
The use of parallel TCP streams is not supported
in installations of MPICH-G2 that were configured using
threaded flavors of Globus.
To illustrate how to use parallel streams between two
processes that communicate over TCP we provide here an
example MPI application mstream.c and its associated Makefile.
We provide links to the files here; however, if you want
to download these files, do not cut/paste them from
these links or the text below. This will not work because cut/paste
changes tab characters into multiple spaces, which will break
make. To download these files, right click on each
file link below and 'Save Link As ...'.
If you download these files, edit the Makefile by changing
MPICH_INSTALL_PATH to your MPICH-G2 installation
directory.
The files are shown here for your review.
mstream.c
#include <mpi.h>
int main(int argc, char **argv)
{
int numprocs, my_id;
struct gridftp_params gfp; /* MPICH-G2 structure in mpi.h */
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
if (my_id == 0 || my_id == 1) {
/* must set these three fields */
gfp.partner_rank = (my_id ? 0 : 1);
gfp.nsocket_pairs = 64;
gfp.tcp_buffsize = 256*1024;
MPI_Attr_put(MPI_COMM_WORLD,
MPICHX_PARALLELSOCKETS_PARAMETERS,
&gfp);
} /* endif */
/*
* from this point all messages exchanged between
* MPI_COMM_WORLD ranks 0 and 1 will be automatically
* partitioned and transported over parallel sockets
*/
MPI_Finalize();
return 0;
} /* end main() */
Makefile
#
# assumes MPICH-G2 was installed in /usr/local/mpich
#
MPICH_INSTALL_PATH = /usr/local/mpich
mstream: force
$(MPICH_INSTALL_PATH)/bin/mpicc -o mstream mstream.c
force:
clean:
/bin/rm -rf *.o mstream
Return to New MPICH-G2 features in MPICH
v1.2.5.1.
-
MPICH-G2 now uses Globus Callback Spaces,
can now be configured with threaded flavors of Globus
MPICH-G2, through its use of
Globus-I/O
for all its TCP messaging, makes extensive use of the Globus
event handling system which, prior to Globus v2.2, maintained a
single "event space" for the entire application. This had
the undesired (and unforeseen) side effect that if an
MPICH-G2 application or library that it linked also
used the Globus event handling system then MPICH-G2 polling
for events would inadvertently trigger events in the
third party library, or vice versa, which proved problematic.
Such a scenario existed with our partners in the
GrADS Project. This inspired the creation of discrete
callback spaces in the Globus event system which
solved this problem. As a result, MPICH-G2 has been
modified to now use the new callback spaces which enables
MPICH-G2 applications, or other libraries that they may link,
to use the Globus event handling system without risk of
interfering with MPICH-G2's Globus events.
The
GrADS project also required threaded flavors of Globus
so now MPICH-G2 may be configured with threaded flavors
of Globus. Note that this does not mean that
MPICH-G2 is thread-safe. The MPICH library itself is
not thread-safe, and therefore, neither is MPICH-G2.
This means that although MPICH-G2 may now be configured with
threaded flavors of Globus and MPICH-G2 applications may
even create multiple threads we still have the restriction
that at most one thread per process may call MPI
functions.
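To make the restriction concrete, here is a minimal sketch (not part of the
MPICH-G2 distribution) of a program that creates a second thread for local
computation while keeping every MPI call on the main thread; it assumes a
threaded flavor of Globus and a compiler/linker that provides POSIX threads.
#include <stdio.h>
#include <pthread.h>
#include <mpi.h>
/* worker thread: local computation only -- it must not call MPI */
static void *worker(void *arg)
{
    double *sum = (double *) arg;
    int i;
    for (i = 0; i < 1000000; i++)
        *sum += (double) i;
    return NULL;
}
int main(int argc, char **argv)
{
    int my_id, numprocs;
    double partial = 0.0, total = 0.0;
    pthread_t tid;
    MPI_Init(&argc, &argv);                 /* all MPI calls stay on this thread */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    pthread_create(&tid, NULL, worker, &partial);  /* compute in a second thread */
    pthread_join(&tid, NULL);                      /* join before communicating  */
    /* only the main thread performs MPI communication */
    MPI_Reduce(&partial, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (my_id == 0)
        printf("total across %d processes = %f\n", numprocs, total);
    MPI_Finalize();
    return 0;
}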
Return to New MPICH-G2 features in MPICH
v1.2.5.1.
-
More collective operations now topology-aware
Six new collective operations are now also
topology-aware (see "Exploiting Hierarchy
in Parallel Computer Networks to Optimize Collective Operation
Performance" in
MPI-Related Papers for a full
discussion of topology-aware collective operations).
- MPI_Allgather
- MPI_Allgatherv
- MPI_Alltoall
- MPI_Allreduce
- MPI_Reduce_scatter
- MPI_Scan
This is in addition to the five collective operations
that were made topology-aware and released in
MPICH v1.2.2.3
(see Topology-aware collective
operations in that section). The remaining three
collective operations (MPI_Gatherv, MPI_Scatterv, and
MPI_Alltoallv) are not topology-aware. These
functions still use the MPICH default binomial tree-based
algorithm.
Return to New MPICH-G2 features in MPICH
v1.2.5.1.
-
Improved previous topology-aware collective operations
The topology-aware collective operations MPI_Gather and
MPI_Scatter have been improved by removing memory copies.
Also, a bug that was found in MPI_Barrier has been corrected.
Return to New MPICH-G2 features in MPICH
v1.2.5.1.
-
Improved topology discovery mechanism
The implementation of the topology discovery mechanism
(see Topology discovery mechanism
in section MPICH v1.2.2.3 for
details on topology discovery) has been improved to
use attribute caching, which is both correct and safer.
Return to New MPICH-G2 features in MPICH
v1.2.5.1.
New MPICH-G2 features in MPICH v1.2.3.
New MPICH-G2 features in MPICH v1.2.2.3.
Here is a list of the additions and changes present in MPICH v1.2.2.3,
each of which is described in detail below.
- Added some functions from the MPI Standard 2.0
We have implemented the following functions from section 5.4 of
the MPI Standard 2.0:
- Server routines (MPI Standard 2.0, Section 5.4.2):
MPI_Open_port, MPI_Close_port, and MPI_Comm_accept.
- Client routines (MPI Standard 2.0, Section 5.4.3):
MPI_Comm_connect.
Here is an example MPI client/server application, mserver.c and
mclient.c, and its associated Makefile. The files are shown
here for your review; however, if you want to download these
files, do not cut/paste them from the text below.
This will not work because cut/paste changes tab characters into
multiple spaces, which will break make. To download
these files, right click on each file link below and
'Save Link As ...'.
If you download these files, edit the Makefile by changing
MPICH_INSTALL_PATH to your MPICH-G2 installation directory.
After editing the Makefile and following the steps in the section
Once it's installed, how do I use MPICH-G2?,
make both the client and server by typing the following:
% make all
<MPICH_INSTALL_PATH>/bin/mpicc -o mserver mserver.c
<MPICH_INSTALL_PATH>/bin/mpicc -o mclient mclient.c
%
Then launch the server being careful to note its output
which should look something like this:
% <MPICH_INSTALL_PATH>/bin/mpirun -np 1 mserver
m1.utech.edu 36256
Then in a separate shell start the client by typing the following:
% <MPICH_INSTALL_PATH>/bin/mpirun -np 1 \
mclient "m1.utech.edu 36256"
being careful to cut/paste the output from mserver and
pass it (as a single command-line argument surrounded
by double quotation marks) to mclient. After starting the
client, the total output from the server should look
like this:
% <MPICH_INSTALL_PATH>/bin/mpirun -np 1 mserver
m1.utech.edu 36256
after sending passed_num 111
%
and the output from the client should look like this:
% <MPICH_INSTALL_PATH>/bin/mpirun -np 1 \
mclient "m1.utech.edu 36256"
after receiving passed_num 111
%
Here are the contents of mserver.c, mclient.c, and Makefile for your
review, but remember, please do not cut/paste this text.
Download by using the links above.
mserver.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char **argv)
{
int my_id;
char port_name[MPI_MAX_PORT_NAME];
MPI_Comm newcomm;
int passed_num;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
passed_num = 111;
if (my_id == 0)
{
MPI_Open_port(MPI_INFO_NULL, port_name);
printf("%s\n\n", port_name); fflush(stdout);
} /* endif */
MPI_Comm_accept(port_name,
MPI_INFO_NULL,
0,
MPI_COMM_WORLD,
&newcomm);
if (my_id == 0)
{
MPI_Send(&passed_num, 1, MPI_INT, 0, 0, newcomm);
printf("after sending passed_num %d\n", passed_num);
fflush(stdout);
MPI_Close_port(port_name);
} /* endif */
MPI_Finalize();
exit(0);
} /* end main() */
mclient.c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char **argv)
{
int passed_num;
int my_id;
MPI_Comm newcomm;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
MPI_Comm_connect(argv[1],
MPI_INFO_NULL,
0,
MPI_COMM_WORLD,
&newcomm);
if (my_id == 0)
{
MPI_Status status;
MPI_Recv(&passed_num,
1,
MPI_INT,
0,
0,
newcomm,
&status);
printf("after receiving passed_num %d\n",
passed_num);
fflush(stdout);
} /* endif */
MPI_Finalize();
exit(0);
} /* end main() */
Makefile
#
# assumes MPICH-G2 was installed in /usr/local/mpich
#
MPICH_INSTALL_PATH = /usr/local/mpich
all: mserver mclient
mserver: force
$(MPICH_INSTALL_PATH)/bin/mpicc -o mserver mserver.c
mclient: force
$(MPICH_INSTALL_PATH)/bin/mpicc -o mclient mclient.c
force:
clean:
/bin/rm -rf *.o mclient mserver
Return to New MPICH-G2 features in MPICH v1.2.2.3.
- Topology-aware collective operations
The following collective operations are now
topology-aware (see "Exploiting Hierarchy
in Parallel Computer Networks to Optimize Collective Operation
Performance" in
MPI-Related Papers for a full
discussion of topology-aware collective operations).
- MPI_Barrier
- MPI_Bcast
- MPI_Gather
- MPI_Scatter
- MPI_Reduce
As a grid-aware MPI, MPICH-G2 is often used to run
applications on machines that are separated by short (LAN)
and long (WAN) distances which, in turn, results in
different point-to-point communication performance. Topology-aware
collective operations use point-to-point communication
patterns that attempt to minimize communication across
the slowest links (i.e., WAN) and maximize communication across
the fastest links (i.e., intra-machine messaging). Under this
strategy MPICH-G2 orders (from slowest to fastest)
the various communication methods based on performance asserting
WAN-TCP < LAN-TCP < intra-machine TCP < vendor-supplied MPI.
To illustrate, consider an MPICH-G2 application running on three
machines as depicted in the following diagram:
m1.utech.edu m2.utech.edu c1.nlab.gov
p0-p9 p10-p19 p20-p29
which could have been started using the following RSL in which
m1.utech.edu and c1.nlab.gov are equipped with vendor-supplied
MPI and m2.utech.edu is not:
+
( &(resourceManagerContact="m1.utech.edu")
(count=10)
(jobtype=mpi)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
(directory=/homes/users/smith)
(executable=/homes/users/smith/myapp)
)
( &(resourceManagerContact="m2.utech.edu")
(count=10)
(label="subjob 1")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 1))
(directory=/homes/users/smith)
(executable=/homes/users/smith/myapp)
)
( &(resourceManagerContact="c1.nlab.gov")
(count=10)
(jobtype=mpi)
(label="subjob 2")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 2))
(directory=/users/smith)
(executable=/users/smith/myapp)
)
Now consider an MPI_Bcast over MPI_COMM_WORLD rooted at
P0 which is running on m1.utech.edu. The
broadcast is implemented using a sequence of broadcasts,
involving selected processes at each stage,
over each of the communication methods (i.e., first
broadcast over WAN-TCP, then LAN-TCP, then intra-machine-TCP,
and then finally vendor-supplied MPI). The broadcast above
would be implemented in the following sequence of broadcasts.
At each level there are one or more sets of processes. Each
set represents a single broadcast in which the first process
is always the root.
- WAN-TCP level:
- {P0, P10, P20}
- LAN-TCP level (in parallel):
- {P0} *
- {P10} *
- {P20} *
- intra-machine-TCP level (in parallel):
- {P0} *
- {P10,...,P19}
- {P20} *
- vendor-supplied MPI level (in parallel):
- {P0,...,P9}
- {P20,...,P29}
* Broadcasts over sets of single processes are essentially
no-ops. They are shown in the sequences above for completeness
and clarity.
Let us now assume further that m1.utech.edu and m2.utech.edu
are on the same LAN. We can "tell" MPICH-G2 that fact
using the environment variable GLOBUS_LAN_ID in the RSL.
+
( &(resourceManagerContact="m1.utech.edu")
(count=10)
(jobtype=mpi)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
(GLOBUS_LAN_ID foo))
(directory=/homes/users/smith)
(executable=/homes/users/smith/myapp)
)
( &(resourceManagerContact="m2.utech.edu")
(count=10)
(label="subjob 1")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
(GLOBUS_LAN_ID foo))
(directory=/homes/users/smith)
(executable=/homes/users/smith/myapp)
)
( &(resourceManagerContact="c1.nlab.gov")
(count=10)
(jobtype=mpi)
(label="subjob 2")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 2))
(directory=/users/smith)
(executable=/users/smith/myapp)
)
which now coalesces m1.utech.edu and m2.utech.edu into a
single local area cluster and improves the broadcast
as follows. Again,
at each level there are one or more sets of processes. Each
set represents a single broadcast in which the first process
is always the root.
- WAN-TCP level:
- {P0, P20}
- LAN-TCP level (in parallel):
- {P0, P10}
- {P20} *
- intra-machine-TCP level (in parallel):
- {P0} *
- {P10,...,P19}
- {P20} *
- vendor-supplied MPI level (in parallel):
- {P0,...,P9}
- {P20,...,P29}
* Broadcasts over sets of single processes are essentially
no-ops. They are shown in the sequences above for completeness
and clarity.
Return to New MPICH-G2 features in MPICH v1.2.2.3.
- Topology discovery mechanism
Note: The topology discovery mechanism described in this
section works directly only from C/C++. It does not
work directly from F77/F90. Fortran applications that
want to make use of MPICH-G2's topology discovery mechanism
should write a C function that calls the MPICH-G2 routines
described in this section, call that C function from their
Fortran application, and have their C function return the
topology information to the Fortran application.
MPICH-G2 communicates over TCP or a vendor-supplied MPI (vMPI).
However, some of MPI's collective operations are
implemented in MPICH-G2 by making a distinction between WAN-TCP,
LAN-TCP, and intra-machine TCP (see
Topology aware collective operations).
Some MPI applications could make use of this TCP stratification
by creating communicators (e.g., MPI_Comm_split)
that cluster processes based on this topology information.
To that end, MPICH-G2 has added two attributes associated with
every communicator, MPICHX_TOPOLOGY_DEPTHS and
MPICHX_TOPOLOGY_COLORS.
Described briefly, MPICH-G2 processes communicate using either
TCP or, where available, the preferred vMPI.
Associated with this multi-protocol design is something we
call topology depth. Processes that communicate
using TCP only have a topology depth=3 (lvl0=WAN-TCP,
lvl1=LAN-TCP, and lvl2=intra-machine TCP)
while processes that can communicate using vMPI have a topology
depth=4 (lvl3=vMPI).
MPICHX_TOPOLOGY_DEPTHS is a vector (length = communicator size)
of integers in which the ith element is
the topology depth of the ith-ranked process
in the communicator.
Using these ordered lists of multi-method communication we
can ask the question "Can process A communicate with process
B at multi-protocol level i?" For example, any two processes
can communicate with each other at lvl0=WAN-TCP.
Processes can communicate at lvl1=LAN-TCP if and
only if they are in the same LAN cluster (see
Topology aware collective operations),
they can communicate at lvl2=intra-machine TCP if and
only if they are in the same RSL subjob (see
Once it's installed, how do I use MPICH-G2?),
and they can communicate at lvl3=vMPI if and only if
they are in the same RSL subjob and that subjob specifies
(jobtype=mpi). We capture this notion of "being able
to communicate at a particular level" by defining something that
we call color. At any given level two processes
have the same color (an integer value always >=0) if and
only if they can communicate at that level.
MPICHX_TOPOLOGY_COLORS is a vector (again, length = communicator
size) of integer pointers in which the ith
element is, in turn, a pointer to a vector of integers
(length = MPICHX_TOPOLOGY_DEPTHS[i]) and MPICHX_TOPOLOGY_COLORS[i][j]
is the color of the ith-ranked process at level
j (note that those processes that cannot communicate
over vMPI have a topology-depth=3, and therefore, do not have
a color defined at MPICHX_TOPOLOGY_COLORS[i][3]).
To illustrate the use of MPICHX_TOPOLOGY_DEPTHS and
MPICHX_TOPOLOGY_COLORS we provide an
example MPI application, report_colors.c,
and its associated Makefile. The files are shown here
for your review; however, if you want to download these files,
do not cut/paste them from the text below. This will
not work because cut/paste changes tab characters into multiple spaces,
which will break make. To download these files, right click
on each file link below and 'Save Link As ...'.
If you download these files, edit the Makefile by changing
MPICH_INSTALL_PATH to your MPICH-G2 installation
directory. After editing the Makefile and following the steps
in the section Once it's installed, how do
I use MPICH-G2?, make report_colors by typing the following:
% make report_colors
<MPICH_INSTALL_PATH>/bin/mpicc -o report_colors \
report_colors.c
%
Here are the contents of report_colors.c and Makefile for your
review, but remember, please do not cut/paste this
text. Download by using the links above.
report_colors.c
#include <mpi.h>
#include <stdio.h>
void print_topology(int me,
int size,
int *depths,
int **colors)
{
int i, j, max = 0;
FILE *fp;
char fname[100];
sprintf(fname, "colors.%d", me);
if (!(fp = fopen(fname, "w")))
{
fprintf(stderr,
"ERROR: could not open fname %s\n",
fname);
MPI_Abort(MPI_COMM_WORLD, 1);
} /* endif */
fprintf(fp, "proc\t");
for (i = 0; i < size; i++)
fprintf(fp, "% 3d", i);
fprintf(fp, "\nDepths\t");
for (i = 0; i < size; i++)
{
fprintf(fp, "% 3d", depths[i]);
if ( max < depths[i] )
max = depths[i];
} /* endfor */
for (j = 0; j < max; j++)
{
fprintf(fp, "\nlvl %d\t", j);
for (i = 0; i < size; i++)
if ( j < depths[i] )
fprintf(fp, "% 3d", colors[i][j]);
else
fprintf(fp, " ");
} /* endfor */
fprintf(fp, "\n");
fclose(fp);
return;
} /* end print_topology() */
int main (int argc, char *argv[])
{
int me, nprocs, flag, rv;
int *depths;
int **colors;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &me);
MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
rv = MPI_Attr_get(MPI_COMM_WORLD,
MPICHX_TOPOLOGY_DEPTHS,
&depths,
&flag);
if ( rv != MPI_SUCCESS )
{
printf("MPI_Attr_get(depths) failed, aborting\n");
MPI_Abort(MPI_COMM_WORLD, 1);
} /* endif */
if ( flag == 0 )
{
printf("MPI_Attr_get(depths): depths not available...\n");
MPI_Abort(MPI_COMM_WORLD, 1);
} /* endif */
rv = MPI_Attr_get(MPI_COMM_WORLD,
MPICHX_TOPOLOGY_COLORS,
&colors,
&flag);
if ( rv != MPI_SUCCESS )
{
printf("MPI_Attr_get(colors) failed, aborting\n");
MPI_Abort(MPI_COMM_WORLD, 1);
} /* endif */
if ( flag == 0 )
{
printf("MPI_Attr_get(colors): depths not available...\n");
MPI_Abort(MPI_COMM_WORLD, 1);
} /* endif */
print_topology(me, nprocs, depths, colors);
MPI_Finalize();
return 0;
} /* end main() */
Makefile
#
# assumes MPICH-G2 was installed in /usr/local/mpich
#
MPICH_INSTALL_PATH = /usr/local/mpich
report_colors: force
$(MPICH_INSTALL_PATH)/bin/mpicc -o report_colors report_colors.c
force:
clean:
/bin/rm -rf *.o report_colors
The first thing to note is that the vectors depths and colors
returned by MPI_Attr_get should not be freed by
the MPI application (i.e., it is erroneous to do so).
The second thing to note is that, for a given communicator,
every process in that communicator gets identical
values for both MPICHX_TOPOLOGY_DEPTHS and
MPICHX_TOPOLOGY_COLORS. This will always be the case
and provides a convenient mechanism for MPI applications to
create new grid-aware communicators that cluster
processes based on process topology. For example, one could
partition MPI_COMM_WORLD into LAN-TCP clusters adding the
following to report_colors.c:
MPI_Comm LANcomm;
MPI_Comm_split(MPI_COMM_WORLD,
colors[me][1],
0,
&LANcomm);
or perhaps partitioning MPI_COMM_WORLD into vMPI clusters by
adding:
MPI_Comm Vcomm;
MPI_Comm_split(MPI_COMM_WORLD,
(depths[me] == 4 ? colors[me][3] : -1),
0,
&Vcomm);
which has the added benefit(?) of placing all processes that
cannot communicate over vMPI into a single group (recall that
colors are defined to be >= 0). Alternatively, one could
partition MPI_COMM_WORLD into vMPI clusters by adding:
MPI_Comm Vcomm;
MPI_Comm_split(MPI_COMM_WORLD,
(depths[me] == 4 ? colors[me][3] : MPI_UNDEFINED),
0,
&Vcomm);
in which case Vcomm is set to MPI_COMM_NULL for those processes
that cannot communicate over vMPI.
Return to New MPICH-G2 features in MPICH
v1.2.2.3.
- Setting IP address
range
It is now possible to specify a network interface
using the environment variable
MPICH_GLOBUS2_USE_NETWORK_INTERFACE in your RSL (see
Once it's installed, how do I use
MPICH-G2?). You may specify your network interface in
any of the following ways:
- specific address:
(MPICH_GLOBUS2_USE_NETWORK_INTERFACE
140.221.8.120)
- network address:
(MPICH_GLOBUS2_USE_NETWORK_INTERFACE
140.221.8.0/255.255.255.0)
- alt. network address:
(MPICH_GLOBUS2_USE_NETWORK_INTERFACE
140.221.8.0/24)
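As with the other MPICH-G2 environment variables, the setting goes in the
environment clause of the relevant RSL subjob, alongside
GLOBUS_DUROC_SUBJOB_INDEX. A hypothetical fragment (the address range is
only illustrative):
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
(MPICH_GLOBUS2_USE_NETWORK_INTERFACE 140.221.8.0/24))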
Return to New MPICH-G2 features in MPICH
v1.2.2.3.
- Setting port
range
It is now possible to specify TCP/IP port ranges
using the environment variable
GLOBUS_TCP_PORT_RANGE in your RSL (see
Once it's installed, how do I use
MPICH-G2?). For example,
(GLOBUS_TCP_PORT_RANGE "min max") where min and
max are the minimum and maximum TCP port numbers to be
used by the job.
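For example, a hypothetical subjob environment clause (the port numbers are
only illustrative) might look like:
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
(GLOBUS_TCP_PORT_RANGE "40000 40100"))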
Return to New MPICH-G2 features in MPICH
v1.2.2.3.
- Tuning TCP buffer
size
You may tune the TCP buffer size by using the environment
variable MPICH_GLOBUS2_TCP_BUFFER_SIZE in your RSL (see
Once it's installed, how do I use
MPICH-G2?). For example,
(MPICH_GLOBUS2_TCP_BUFFER_SIZE nbytes) where
nbytes is the size (in bytes) of the TCP buffer.
Note that the value for MPICH_GLOBUS2_TCP_BUFFER_SIZE
must match between subjobs that communicate with each
other. Failure to match such values will result in
processes from one subjob "driving" TCP communication with
processes from the other subjob inefficiently.
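For example, two subjobs that communicate with each other would both carry
the same value (the value shown here is only illustrative) in their
environment clauses:
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
(MPICH_GLOBUS2_TCP_BUFFER_SIZE 262144))
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 1)
(MPICH_GLOBUS2_TCP_BUFFER_SIZE 262144))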
Return to New MPICH-G2 features in MPICH
v1.2.2.3.
- Improved (scalable)
startup
Based on our experience reported in "Supporting
Efficient Execution in Heterogeneous Distributed Computing
Environments with Cactus and Globus" (see
MPI-Related Papers) we
reimplemented the bootstrapping code in MPI_Init, making
better use of the Globus inter- and intra-subjob communication
libraries, resulting in faster and scalable
(to many thousands of processes) startup.
Return to New MPICH-G2 features in MPICH
v1.2.2.3.
- Benchmarking collective
operations
Based on the timing methodology presented in "Accurately
Measuring MPI Broadcasts in a Computational Grid"
(see MPI-Related Papers) we have
made additions to the MPICH performance suite (distributed with
MPICH) that benchmark MPI_Bcast.
To trigger this new timing methodology to benchmark MPI_Bcast,
use -bcastalt instead of -bcast to start the test from
mpich/examples/perftest/{mpptest or goptest}.
Return to New MPICH-G2 features in MPICH
v1.2.2.3.
MPICH-G2 born! MPICH v1.2.1.
- MPICH-G2 enters the world with the release
of MPICH v1.2.1, released September 2000.
How do I acquire and install MPICH-G2?
In this section we discuss issues that pertain directly to the configuration
and installation of MPICH with the globus2 device. This section
is not intended to replace the MPICH Installation Manual distributed
with MPICH. You should read that manual before installing and configuring
MPICH and should use the information in this section to augment the
instructions found in the manual.
Before installing MPICH-G2 you must have already installed Globus.
The MPICH-G2 installation steps are slightly different for machines
equipped with Globus v1.1.4 and those equipped with Globus v2.0 or later.
- How do I acquire and install MPICH-G2 using Globus v2.x?
Follow the instructions in this section for machines equipped with
Globus v2.0 or later (you need Globus v2.2 or later for MPICH v1.2.5.1
or later). The
Globus website
provides a full set of
instructions on how to acquire and download Globus, and therefore,
we do not offer such instructions here. Described briefly, you will
need the Grid Packaging Toolkit (GPT) (you will need GPT v2.2.8 or
later to create MPI flavors of Globus that use MPICH-based MPI
implementations), the SDK and client bundles from
the Resource Management pillar, and possibly the server bundle
from the Resource Management pillar.
We provide here a detailed list of requirements of any Globus v2.0
or later installation that is intended to be used to configure MPICH-G2.
- You must always source download any Globus code
that you acquire. Configuring MPICH-G2 using
binary downloads of Globus is not supported.
- You must install all Globus components under
the same GLOBUS_LOCATION.
- You must source download and install the Grid
Packaging Toolkit. The Grid Packaging Toolkit must be
installed before installing any other Globus
component.
Do not build an mpi flavor of the Grid
Packaging Toolkit.
- You must source download and install the SDK bundle
(for Globus libraries) from the Resource Management pillar.
If you are installing Globus on a system that offers
a vendor-supplied MPI (e.g., SGI) and that vendor-supplied
MPI is not an MPICH-based implementation
(see In
choosing a flavor below) then you will need to build
an mpi flavor of the SDK bundle to
enable MPICH-G2 to utilize the vendor-supplied MPI for
intra-machine messages.
- You must source download and install the client bundle
(for Globus tools like globusrun and
grid-proxy-init) from the Resource Management pillar.
Do not build an mpi flavor of the client
bundle.
- Source downloading and installing the server bundle
from the Resource Management pillar is not required
and is necessary only if you wish to deploy your
own Globus gatekeeper. Note that Globus v2.0 or
later is compatible with Globus v1.1.3 or later
gatekeepers.
If you decide to source download and install the
server bundle, do not build an mpi flavor of
the server bundle.
- This item is a hint more than a requirement. Early
releases of Globus v2.0 had a minor bug in the tool
gpt-postinstall. The simple workaround is
to execute gpt-postinstall three times in a row.
- Before attempting to configure MPICH-G2 with your
installation of Globus v2.0 or later you should
test your installation using the
Globus-based
"hello, world" program.
You must pass the "hello, world" test before
you can continue with MPICH-G2 installation.
Any problems that you experience with the "hello, world"
test are not MPICH-G2 related problems, so
when contacting the Globus developers please take care
when describing your symptoms to not associate
your bug report with MPICH-G2 in any way. This will only
serve to slow down our response time by having us route
the problem away from MPICH-G2.
You will need a version of MPICH v1.2.3 or
later if you want to install MPICH-G2 under Globus v2.0 or later.
Do the following on each machine on which you
intend to compile and run your MPICH-G2 application.
- Before configuring MPICH-G2, you will need an installation of
Globus v2.0 or later.
Set the environment variable
GLOBUS_LOCATION to the directory in which Globus
is installed (e.g., /usr/local/globus). This is
a requirement as of Globus v2.0. You will need this
environment variable set throughout the MPICH-G2
configuration/make/install process and whenever
running MPICH-G2 applications.
- Acquire MPICH v1.2.3 or later.
- Uncompress/untar MPICH.
% gunzip -c mpich.tar.gz | tar xvf -
- Configure MPICH specifying the globus2 device. The
MPICH configuration script is capable of accepting many
command-line configuration options. When configuring with
the globus2 device, you must use the -device=
and optionally -arch= and -prefix= options (both
described later in this section). You should avoid using any
other configuration options.
When configuring with the globus2 device you must
specify one of the Globus flavors (e.g., mpi, debug
or nodebug, threads, 32- or 64-bit). To see the complete list
of Globus flavors installed on your machine use
% ls $GLOBUS_LOCATION/etc/globus_core
The flavors that are available to you (i.e., installed on your
machine) are enumerated as files in that directory.
Globus flavors in that directory are typically named
flavor_<flavor_name>.gpt. For example,
a Globus installation on a Solaris workstation might have the
following flavors:
flavor_vendorcc32dbg.gpt/
flavor_gcc32dbg.gpt/
Configure MPICH by specifying the globus2 device
and a Globus flavor, for example,
% cd mpich
% ./configure -device=globus2:-flavor=gcc32dbg
On architectures that allow you to distinguish between 32-
and 64-bit builds, you should explicitly use the -arch=
option during configuration. Here is an example of
configuring a 32-bit build on an SGI machine.
% ./configure -arch=IRIXN32 -device=globus2:-flavor=gcc32dbg
In this next example we configure a 64-bit build on an SGI
machine:
% ./configure -arch=IRIX64 -device=globus2:-flavor=gcc64dbg
In choosing
a flavor,
- If a flavor with mpi in its name is available,
then you should choose one of those
mpi flavors. This triggers the use of
vendor-supplied MPI for intramachine communication
(as opposed to TCP, which is the default) which
typically delivers better latency and bandwidth.
Note that prior to MPICH v1.2.5.1 MPICH-G2
cannot be configured using an mpi
flavor of Globus if the MPI that Globus found/uses
is an MPICH-based implementation. As of v1.2.5.1
MPICH-G2 can be configured atop Globus mpi flavors
that use MPICH-based implementations.
- Unless debugging, avoid dbg flavors where
possible. They are less efficient.
- For versions of MPICH prior to v1.2.5.1, do not use
"threaded" flavors of Globus (e.g., flavors
with pthr, solthr, or sproc in
their name).
Note that threaded flavors of Globus
may be used for versions of MPICH v1.2.5.1 or later
but that does not mean that MPICH-G2 is
thread-safe. MPICH is not yet thread-safe
(see MPICH-G2 now
uses Globus Callback Spaces, can now be configured with
threaded flavors of Globus in
New MPICH-G2 features in MPICH v1.2.5.1 for
details).
- Build MPICH-G2 by typing make.
- Optional: Install MPICH-G2. If you specified
a --prefix directory during configuration, you can
install MPICH-G2 into that directory by typing
make install.
Return to How do I acquire and install MPICH-G2?
- How do I acquire/install
MPICH-G2 using Globus v3.x?
The Globus Toolkit libraries v2.x are distributed with Globus 3.x.
If you acquire Globus 3.x you will need
MPICH v1.2.5.3 or later
and then simply follow the directions found in
How do I
acquire/install MPICH-G2 using Globus v2.x?
Please note that MPICH v1.2.5.3 uses only the GT 2.x libraries
distributed in GT 3.x and therefore is not integrated
with any of the web services found in GT 3.x.
Please also note that MPICH-G2 will not work with
GT 3.2 or GT 3.2.1 although there are Globus patches that can
be applied to GT 3.2.1 so that it can work with MPICH-G2
(see Things that don't work
or are missing in MPICH-G2).
Return to How do I acquire and install MPICH-G2?
- How do I acquire/install
MPICH-G2 using Globus v4.x?
The Globus Toolkit libraries v2.x are distributed with Globus 4.x.
If you acquire Globus 4.x you will need
MPICH v1.2.5.3 or later
and then simply follow the directions found in
How do I
acquire/install MPICH-G2 using Globus v2.x?
Please note that MPICH v1.2.5.3 uses only the GT 2.x libraries
distributed in GT 4.x and therefore is not integrated
with any of the web services found in GT 4.x.
COMING SOON: We have a version of MPICH-G2 that is
fully integrated with the web services distributed
in version v4.x of the Globus Toolkit. We are currently working
on that version to complete its integration with the new MPICH-2
library and expect to have our first release in the first quarter
of 2006.
Return to How do I acquire and install MPICH-G2?
Once it's installed, how do I use MPICH-G2?
Before using MPICH-G2 you must have already
acquired your
Globus security credentials. Then, on each machine on which you
intend to run your MPI application,
- you must have an account;
- Globus v1.1.4 or later and MPICH-G2 v1.2.1 or later must be installed;
- on those machines on which you intend to type MPICH-G2's mpirun
and that are running Globus v2.0 or later, you must do one
of the following at least once before running your application:
- source $GLOBUS_LOCATION/etc/globus-user-env.csh, or
- source $GLOBUS_LOCATION/etc/globus-user-env.sh;
- a Globus gatekeeper (a daemon), configured with at least one
jobmanager service, must be running; and
- you must be a registered Globus user by having your Globus
ID (part of your Globus security credentials) placed into the
Globus gridmap file by the local Globus administrator.
Once these are done, you are ready to compile and execute your MPI application
using MPICH-G2 by following these steps:
- Compile your application on each machine on which you intend to run
it, using one of the MPICH-G2 compilers:
C Compiler: <MPICH_INSTALL_PATH>/bin/mpicc
C++ Compiler: <MPICH_INSTALL_PATH>/bin/mpiCC
Fortran77 Compiler: <MPICH_INSTALL_PATH>/bin/mpif77
Fortran90 Compiler: <MPICH_INSTALL_PATH>/bin/mpif90
Of course, if you are planning to run only on a cluster of
binary-compatible workstations that share a filesystem, it suffices
to compile your program only once.
- Launch your application using MPICH-G2 mpirun. Every
mpirun command under the globus2 device submits
a Globus Resource Specification Language Script, or
simply RSL script, to a Globus-enabled grid of computers.
Each RSL script is composed of one or more RSL subjobs,
typically one subjob for each machine in the computation. You may
supply your own RSL script to
mpirun,
or you may have mpirun construct an RSL script
for you based on the arguments you pass to mpirun and the
contents of your machines file (discussed below). In either
case, it is important to remember that communication between nodes
in different subjobs is always done over TCP/IP, while the more
efficient vendor-supplied MPI is used only among nodes within the
same subjob.
You may terminate the entire job by hitting cntrl-c
in your mpirun window. Be careful to hit cntrl-c
only once, as hitting it multiple times will prevent clean
termination. Be patient; terminating all the processes on all
the machines cleanly can sometimes take a few minutes.
- Using mpirun to construct an RSL script for
you
You would use mpirun if you wanted to launch a
single executable file, which implies a set of
one or more binary-compatible machines that all share the
same filesystem (i.e., they can all access the executable
file).
Using mpirun to construct an RSL script for you
requires a machines file. The mpirun command
determines which machines file to use as follows:
- If a -machinefile argument
is specified to mpirun, it uses that; otherwise,
- It looks for a file named "machines" in the
directory in which you typed mpirun; and finally,
- it looks for
<MPICH_INSTALL_PATH>/bin/machines.
If it cannot find a machines file from any of those places,
then mpirun fails.
The machines file is used to list the computers upon
which you wish to run your application. Computers are listed
by naming the Globus jobmanager service on that machine.
For most Globus installations, the default
jobmanager service can be used, which requires
specifying only the fully qualified domain name. Consult your
local Globus administrator for the name of your Globus
jobmanager service.
Consider the following example in which we present a pair of
fictitious binary-compatible machines,
{m1,m2}.utech.edu, that have access to the same
filesystem. Here is what a machines file that uses the
default Globus jobmanager service on each machine might
look like.
"m1.utech.edu" 10
"m2.utech.edu" 5
The number appearing at the end of each line is optional
(default=1). It specifies the maximum number of nodes that
can be created in a single RSL subjob on each machine.
mpirun uses the -np specification by "wrapping
around" the machines file. For example, using the
machines file above mpirun -np 8 creates an
RSL consisting of a single subjob with 8 nodes on
m1.utech.edu; while mpirun -np 12 creates an RSL with
two subjobs where the first subjob has 10 nodes on
m1.utech.edu and the second has 2 nodes on m2.utech.edu;
and finally mpirun -np 17 creates an RSL with three
subjobs with 10 nodes on m1.utech.edu, followed by 5 nodes on
m2.utech.edu, and ending with 2 nodes on m1.utech.edu again.
Note that intersubjob messaging
is always carried over TCP, even if the two
separate subjobs are on the same machine.
- Using mpirun by supplying your own RSL
script
You would use mpirun supplying your own RSL script
if you were submitting to a set of machines that could not
run or access the same executable file (e.g., machines that
are not binary compatible and do not share a file system).
In this situation, you must use the Globus
Resource Specification Language (RSL) to write an
RSL script specifying the executable filename for each
machine. The RSL scripting language is very flexible but
can be rather complex. Here are some rules that you must
follow when writing your own RSL for MPICH-G2 applications.
Note that these rules are not required for all RSL
scripts, only for MPICH-G2 applications.
- Your RSL script must be a multirequest,
which requires that the first nonwhitespace
character must be "+".
- Each subjob must name a Globus jobmanager service using
(resourceManagerContact="<globus_jm_service>").
For most Globus installations <globus_jm_service>
will simply be the fully qualified domain name
of the machine the subjob will be executed on.
- Each subjob requires a unique index, starting with 0
and counting up consecutively from there. The
unique index must appear in two places in each subjob;
(label="subjob 0") and
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)).
- For those subjobs running on machines that are equipped with
a vendor-supplied implementation of MPI, and for which MPICH-G2
was configured by specifying an 'mpi' flavor of
Globus, the line (jobtype=mpi) must appear.
- Some sites require you to specify a 'project' to their
scheduler for accounting purposes. For each machine
where such a requirement exists, add
(project=xxx) to the subjob.
The easiest way to write your own RSL request
is to modify one generated for you by mpirun.
Specifying -dumprsl on the mpirun command prints
the generated RSL and does not launch the program.
Consider our previous example in which we wanted to run
an application on a cluster of workstations. Recall that
our machines file looked like this:
"m1.utech.edu" 10
"m2.utech.edu" 5
Using mpirun with -dumprsl
% mpirun -dumprsl -np 12 myapp 123 456
produces the following output (but does not launch the
application):
+
( &(resourceManagerContact="m1.utech.edu")
(count=10)
(jobtype=mpi)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
(arguments=" 123 456")
(directory=/homes/users/smith)
(executable=/homes/users/smith/myapp)
)
( &(resourceManagerContact="m2.utech.edu")
(count=2)
(jobtype=mpi)
(label="subjob 1")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 1))
(arguments=" 123 456")
(directory=/homes/users/smith)
(executable=/homes/users/smith/myapp)
)
Additional environment variables may be added as in
the example below:
+
( &(resourceManagerContact="m1.utech.edu")
(count=10)
(jobtype=mpi)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
(MY_ENV 246))
(arguments=" 123 456")
(directory=/homes/users/smith)
(executable=/homes/users/smith/myapp)
)
( &(resourceManagerContact="m2.utech.edu")
(count=2)
(jobtype=mpi)
(label="subjob 1")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 1))
(arguments=" 123 456")
(directory=/homes/users/smith)
(executable=/homes/users/smith/myapp)
)
After creating your RSL file you may submit it directly
to mpirun as follows:
% mpirun -globusrsl my.rsl
Note that when supplying your own RSL, it should be the
only argument you specify to mpirun.
By default all stdout and stderr will appear on the screen
from which you typed the mpirun command. This can be changed
by specifying specific filenames with (stdout=myapp.out)
and/or (stderr=myapp.err) in your RSL script.
An example: Your first MPICH-G2 application
Here is an example MPI application, ring.c, and its associated Makefile.
The files are shown here for your review; however, if you want to download
these files, do not cut/paste them from the text below. This
will not work because cut/paste changes tab characters into multiple spaces,
which will break make. To download these files, right click on
each file link below and 'Save Link As ...'.
If you download these files, edit the Makefile by changing
MPICH_INSTALL_PATH to your MPICH-G2 installation directory.
After editing the Makefile and following the steps in the preceding section
Once it's installed, how do I use MPICH-G2?, type the
following:
% make ring
% <MPICH_INSTALL_PATH>/bin/mpirun -np 4 ring
You should see the following output:
Master: end of trip 1 of 1: after receiving passed_num=4 (should be =trip*numprocs=4) from source=3
Here are the contents of ring.c and Makefile for your review, but remember,
please do not cut/paste this text. Download by using the links above.
ring.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>
/* command line configurables */
int Ntrips; /* -t <ntrips> */
int Verbose; /* -v */
int parse_command_line_args(int argc, char **argv, int my_id)
{
int i;
int error;
/* default values */
Ntrips = 1;
Verbose = 0;
for (i = 1, error = 0; !error && i < argc; i ++)
{
if (!strcmp(argv[i], "-t"))
{
if (i + 1 < argc && (Ntrips = atoi(argv[i+1])) > 0)
i ++;
else
error = 1;
}
else if (!strcmp(argv[i], "-v"))
Verbose = 1;
else
error = 1;
} /* endfor */
if (error && !my_id)
{
/* only Master prints usage message */
fprintf(stderr, "\n\tusage: %s {-t <ntrips>} {-v}\n\n", argv[0]);
fprintf(stderr, "where\n\n");
fprintf(stderr,
"\t-t <ntrips>\t- Number of trips around the ring. "
"Default value 1.\n");
fprintf(stderr,
"\t-v\t\t- Verbose. Master and all slaves log each step. \n");
fprintf(stderr, "\t\t\t Default value is FALSE.\n\n");
} /* endif */
return error;
} /* end parse_command_line_args() */
int main(int argc, char **argv)
{
int numprocs, my_id, passed_num;
int trip;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &my_id);
if (parse_command_line_args(argc, argv, my_id))
{
MPI_Finalize();
exit(1);
} /* endif */
if (Verbose)
printf("my_id %d numprocs %d\n", my_id, numprocs);
if (numprocs > 1)
{
if (my_id == 0)
{
/* I am the Master */
passed_num = 0;
for (trip = 1; trip <= Ntrips; trip ++)
{
passed_num ++;
if (Verbose)
printf("Master: starting trip %d of %d: "
"before sending num=%d to dest=%d\n",
trip, Ntrips, passed_num, 1);
MPI_Send(&passed_num, /* buff */
1, /* count */
MPI_INT, /* type */
1, /* dest */
0, /* tag */
MPI_COMM_WORLD); /* comm */
if (Verbose)
printf("Master: inside trip %d of %d: "
"before receiving from source=%d\n",
trip, Ntrips, numprocs-1);
MPI_Recv(&passed_num, /* buff */
1, /* count */
MPI_INT, /* type */
numprocs-1, /* source */
0, /* tag */
MPI_COMM_WORLD, /* comm */
&status); /* status */
printf("Master: end of trip %d of %d: "
"after receiving passed_num=%d "
"(should be =trip*numprocs=%d) from source=%d\n",
trip, Ntrips, passed_num, trip*numprocs, numprocs-1);
} /* endfor */
}
else
{
/* I am a Slave */
for (trip = 1; trip <= Ntrips; trip ++)
{
if (Verbose)
printf("Slave %d: top of trip %d of %d: "
"before receiving from source=%d\n",
my_id, trip, Ntrips, my_id-1);
MPI_Recv(&passed_num, /* buff */
1, /* count */
MPI_INT, /* type */
my_id-1, /* source */
0, /* tag */
MPI_COMM_WORLD, /* comm */
&status); /* status */
if (Verbose)
printf("Slave %d: inside trip %d of %d: "
"after receiving passed_num=%d from source=%d\n",
my_id, trip, Ntrips, passed_num, my_id-1);
passed_num ++;
if (Verbose)
printf("Slave %d: inside trip %d of %d: "
"before sending passed_num=%d to dest=%d\n",
my_id, trip, Ntrips, passed_num, (my_id+1)%numprocs);
MPI_Send(&passed_num, /* buff */
1, /* count */
MPI_INT, /* type */
(my_id+1)%numprocs, /* dest */
0, /* tag */
MPI_COMM_WORLD); /* comm */
if (Verbose)
printf("Slave %d: bottom of trip %d of %d: "
"after send to dest=%d\n",
my_id, trip, Ntrips, (my_id+1)%numprocs);
} /* endfor */
} /* endif */
}
else
printf("numprocs = %d, should be run with numprocs > 1\n", numprocs);
MPI_Finalize();
exit(0);
} /* end main() */
Makefile
#
# assumes MPICH-G2 was installed in /usr/local/mpich
#
MPICH_INSTALL_PATH = /usr/local/mpich
ring: force
$(MPICH_INSTALL_PATH)/bin/mpicc -o ring ring.c
force:
clean:
/bin/rm -rf *.o ring
Firewalls
You can use MPICH-G2 to run applications in which all the processes are on
the same side of a firewall. However, if you want to run your
MPICH-G2 application where processes are on opposite sides of
a firewall, then you will need to make some special accommodations.
The two issues that arise in the presence of firewalls are job control
(e.g., start-up, monitoring, and termination) and TCP messaging during
execution. MPICH-G2 uses Globus for both of these, and therefore,
using MPICH-G2 through firewalls is really an issue of using Globus
through firewalls. Therefore, we refer our MPICH-G2 users that need
to run their applications through firewalls to the
Globus
web page on firewalls which provides an excellent description of the
problem and offers a number of solutions.
Described briefly here, the basic strategy behind the solution is to
have your system administrators open a small range of port numbers
in the firewall (what are called controllable ephemeral ports
on the
Globus
web page on firewalls) and to specify that port range with the environment
variable GLOBUS_TCP_PORT_RANGE in your RSL.
Setting environment variables is described in Using mpirun by supplying
your own RSL script
in Once it's installed, how do I use MPICH-G2? and
setting the GLOBUS_TCP_PORT_RANGE is described in
Setting port range in
new MPICH-G2 features in MPICH v1.2.2.3.
Finally, there is a small
Perl-based connection test found in the
Troubleshooting section to help you quickly determine
if processes of your MPICH-G2 application are sitting on opposite sides of
a firewall.
Troubleshooting
If you did not encounter any problems in running the ring program from
An example: Your first MPICH-G2 application, you
may skip this section and proceed directly to the next section
How does MPICH-G2 work?. On the other hand,
if you did have some trouble we provide some small test programs
here that strip away all of MPICH-G2 and focus on the various steps in
using MPICH-G2.
These small test programs are intended to be run in sequence in the
order specified immediately below. If a particular test
fails, you should stop the testing sequence and contact the group
(e.g., Globus developers or MPICH developers) identified in each test's section.
- A Globus-based "hello, world"
- A Perl-based connection test
- Testing vendor-supplied MPI mpirun's ability to export environment
- What to try if you get a failed globusrun pr_tcp assertion when trying mpirun
- A Globus-based "hello, world"
This test is limited only to Globus-related issues of launching a job.
Below is our Globus version of Kernighan and Ritchie's "hello, world" program,
accompanied by instructions to make and run it. In the same spirit as K&R
presented their program, we offer ours as a very small (minimal?) program
designed to flush out all the details of installing and deploying Globus,
acquiring Globus security credentials, registering yourself as a Globus
user on each machine, etc.
The instructions below are intended to test one machine at a time.
If you are planning to run your MPICH-G2 application on many different
machines, you should start by following the instructions below on
one machine at a time.
Here is a link to hello.c. The contents of hello.c are shown below for
your review; however, if you want to download this file, do not
cut/paste it from the text below. This will not work because cut/paste
changes tab characters into multiple spaces, which will break make.
To download the file, right click on the file link below and
'Save Link As ...'.
hello.c
#include <stdio.h>
#include <globus_duroc_runtime.h>
int main(int argc, char **argv)
{
#if defined(GLOBUS_CALLBACK_GLOBAL_SPACE)
globus_module_set_args(&argc, &argv);
#endif
globus_module_activate(GLOBUS_DUROC_RUNTIME_MODULE);
globus_duroc_runtime_barrier();
globus_module_deactivate(GLOBUS_DUROC_RUNTIME_MODULE);
printf("hello, world\n");
}
The instructions for making and running "hello, world" depend on the
version of Globus that you are testing. Select a link from the list
below based on your version of Globus.
- Making and running "hello, world" under Globus v1.1.4.
Here is a link to Makefile. The contents of Makefile are shown below for
your review; however, if you want to download
this file, do not cut/paste it from the text below. This
will not work because cut/paste changes tab characters into multiple spaces,
which will break make.
To download this file, right click on
the file link below and 'Save Link As ...'.
Here are the contents of Makefile for your review.
Download by using the link above.
Makefile
#
# Modify this file by:
# 1. set GLOBUSDIR to your Globus installation
# 2. set FLAVOR to one of the Globus flavor directories
#
GLOBUSDIR = /soft/pub/packages/globus/globus-release
FLAVOR = sparc-sun-solaris2.7_nothreads_standard_debug
###################################################
###################################################
#
# The rest of the file should _not_ change.
#
###################################################
###################################################
include $(GLOBUSDIR)/development/$(FLAVOR)/etc/makefile_header
hello:
        $(CC) $(CFLAGS) $(GLOBUS_DUROC_RUNTIME_CFLAGS) -c hello.c
        $(LD) -o hello hello.o \
                $(LDFLAGS) \
                $(GLOBUS_DUROC_RUNTIME_LDFLAGS) \
                $(GLOBUS_DUROC_RUNTIME_LIBS) \
                $(LIBS)
clean:
        $(RM) -rf *.o hello
Edit the Makefile by changing
GLOBUSDIR and FLAVOR to the Globus installation directory
and Globus flavor you specified when configuring MPICH-G2 (see preceding
section How do I acquire and install MPICH-G2?).
For most installations this will require you to set GLOBUSDIR
to your environment variable $GLOBUS_INSTALL_PATH and,
assuming you used the default Globus flavor when configuring MPICH-G2,
set FLAVOR to the last directory in the path returned by
$GLOBUS_INSTALL_PATH/bin/globus-development-path. For example, if
% $GLOBUS_INSTALL_PATH/bin/globus-development-path
/globus/development/sparc-sun-solaris2.7_nothreads_standard_debug
%
then set FLAVOR to sparc-sun-solaris2.7_nothreads_standard_debug.
On the other hand, if you named a particular flavor during configuration,
either explicitly by using the -dir= option or implicitly by
using the -flavor= option, then you must change FLAVOR
to that directory (e.g., mips-sgi-irix6.5-n32_nothreads_mpi_debug).
- Download hello.c and Makefile using the links above.
- Edit the Makefile changing GLOBUSDIR and FLAVOR
as described above.
- Compile hello.c. NOTE: You are not using MPICH-G2's
mpicc.
% make hello
- Write your own RSL file called hello.rsl.
- If you are not using an MPI flavor of Globus
then your RSL file should look like this:
hello.rsl
+
( &(resourceManagerContact="m1.utech.edu")
(count=2)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
(directory=/homes/users/smith)
(executable=/homes/users/smith/hello)
)
- If you are using an MPI flavor of Globus
then you must add (jobtype=mpi) to your RSL file
so that it looks like this:
hello.rsl
+
( &(resourceManagerContact="m1.utech.edu")
(count=2)
(jobtype=mpi)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0))
(directory=/homes/users/smith)
(executable=/homes/users/smith/hello)
)
In either case, change resourceManagerContact to name
your machine and change the paths in directory and
executable to point to your directory.
- Run the program using your hello.rsl and globusrun.
NOTE: you are not using MPICH-G2's mpirun. You
should see the following output.
% $GLOBUS_INSTALL_PATH/tools/*/bin/globusrun \
-w -f hello.rsl
hello, world
hello, world
%
If hello.c compiled without any errors but did not run
correctly, then the problem is
not with MPICH-G2 or its installation. It is most likely a
Globus-related problem. Start by contacting your local Globus administrator
and, if necessary, continue by checking the
Globus Toolkit Error FAQ.
If you still don't know what the problem is, contact the
Globus
Developers by submitting a Globus problem report form found there,
specifying Installation as the "product" that you
are having trouble with. Do not specify MPICH-G2;
that will only slow down our response time by forcing us to route
the problem away from MPICH-G2.
Back to Troubleshooting
- Making and running "hello, world" under Globus v2.0 or later.
Here is a link to Makefile. The contents of Makefile are shown below for
your review; however, if you want to download the file, do not cut/paste
it from the text below. Cut/paste changes tab characters into multiple
spaces, which will not work for make. To download the file, right click
on the file link below and 'Save Link As ...'.
Makefile
#
# It is assumed that you have created a file called "makefile_header"
# using the following command, substituting "<flavor>" for a particular
# flavor of your Globus v2.0 or later installation:
#
# $GLOBUS_LOCATION/sbin/globus-makefile-header -flavor=<flavor> \
# globus_common globus_gram_client globus_io globus_data_conversion \
# globus_duroc_runtime globus_duroc_bootstrap > makefile_header
#
#
RM = /bin/rm
###################################################
###################################################
#
# The rest of the file should _not_ change.
#
###################################################
###################################################
include makefile_header
hello:
        $(GLOBUS_CC) $(GLOBUS_CFLAGS) $(GLOBUS_INCLUDES) -c hello.c
        $(GLOBUS_LD) -o hello hello.o \
                $(GLOBUS_LDFLAGS) \
                $(GLOBUS_PKG_LIBS) \
                $(GLOBUS_LIBS)
clean:
        $(RM) -rf *.o hello
Before using the Makefile you must create a file called makefile_header
using the Globus tool globus-makefile-header specifying one
of the Globus flavors at your installation. You should select the
same Globus flavor you intend to use when configuring MPICH-G2.
Here is an example of how to use globus-makefile-header
to create makefile_header specifying gcc32dbg as the flavor:
% $GLOBUS_LOCATION/sbin/globus-makefile-header -flavor=gcc32dbg \
globus_common globus_gram_client globus_io globus_data_conversion \
globus_duroc_runtime globus_duroc_bootstrap > makefile_header
- Download hello.c and Makefile using the links above.
- Use globus-makefile-header to create the file
makefile_header as described above.
- Compile hello.c. NOTE: You are not using MPICH-G2's
mpicc.
% make hello
- Write your own RSL file called hello.rsl.
- If you are not using an MPI flavor of Globus
then your RSL file should look like this:
hello.rsl
+
( &(resourceManagerContact="m1.utech.edu")
(count=2)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
(LD_LIBRARY_PATH /usr/local/globus/lib/))
(directory=/homes/users/smith)
(executable=/homes/users/smith/hello)
)
- If you are using an MPI flavor of Globus
then you must add (jobtype=mpi) to your RSL file
so that it looks like this:
hello.rsl
+
( &(resourceManagerContact="m1.utech.edu")
(count=2)
(jobtype=mpi)
(label="subjob 0")
(environment=(GLOBUS_DUROC_SUBJOB_INDEX 0)
(LD_LIBRARY_PATH /usr/local/globus/lib/))
(directory=/homes/users/smith)
(executable=/homes/users/smith/hello)
)
In either case, change resourceManagerContact to name
your machine, change /usr/local/globus in
the LD_LIBRARY_PATH environment variable to the
GLOBUS_LOCATION on that machine (note that the value
for LD_LIBRARY_PATH still ends with /lib/),
and change the paths in directory and
executable to point to your directory.
- Setup your Globus environment using
% source $GLOBUS_LOCATION/etc/globus-user-env.csh
- Run the program using your hello.rsl and globusrun.
NOTE: you are not using MPICH-G2's mpirun. You
should see the following output.
% $GLOBUS_LOCATION/bin/globusrun -w -f hello.rsl
hello, world
hello, world
%
If hello.c compiled without any errors but did not run
correctly, then the problem is
not with MPICH-G2 or its installation. It is most likely a
Globus-related problem. Start by contacting your local Globus administrator
and, if necessary, continue by checking the
Globus Toolkit Error FAQ.
If you still don't know what the problem is, contact the
Globus
Developers by submitting a Globus problem report form found there,
specifying Installation as the "product" that you
are having trouble with. Do not specify MPICH-G2;
that will only slow down our response time by forcing us to route
the problem away from MPICH-G2.
Back to Troubleshooting
- A Perl-based connection test
This test is limited to the ability of one machine to establish a socket
connection to another. It is good for detecting problems often introduced
by firewalls.
It is a perl program (requiring perl5) donated to this page by
Brian Toonen
of the Mathematics and
Computer Science Division (MCS) at
Argonne National Laboratory.
Here is a link to a small perl program perl_connect, which is
shown here for your review; however, if you want to download the file,
please do not cut/paste it from the text below. Instead, download it by
right clicking on the file link below and selecting 'Save Link As ...'.
perl_connect
#!/usr/bin/perl -w
# Perl script to test TCP connection establishment and communication.
# This code is based on the examples in 'man perlipc' with vastly
# improved error checking and a few bug fixes.
use strict;
use Getopt::Long;
use IO::Socket;
use Sys::Hostname;
my $N_MSGS = 1024;
my $rc = 0;
sub usage
{
print "usage $0 <-server | -client host:port>\n";
exit 1;
}
my $server=0;
my $client=0;
GetOptions('s|server' => \$server,
'c|client' => \$client);
&usage if ($client && $server || !$client && !$server);
&usage if ($server && $#ARGV > -1);
&usage if ($client && $#ARGV != 0);
my $EOL = "\015\012";
sub logmsg
{
print "$0 $$: @_ at ", scalar localtime, "\n";
}
sub s_catch_int
{
close Server;
logmsg "caught Ctrl-C...terminating server";
exit 0;
}
sub errnoprn
{
printf "errno=%d, %s\n", $!, $! if ($! != 0);
}
sub dieprn
{
print "@_\n";
&errnoprn;
exit 1;
}
if ($server)
{
my ($tcp_proto, $s_sockaddr, $s_addr, $s_host, $s_port,
$c_sockaddr, $c_addr, $c_host, $c_port);
print "$0: establishing server...";
($tcp_proto = getprotobyname "tcp")
|| &dieprn("failed protocol name lookup");
(socket Server, PF_INET, SOCK_STREAM, $tcp_proto)
|| &dieprn("failed to obtain a socket");
$SIG{INT} = \&s_catch_int;
(bind Server, (sockaddr_in 0, INADDR_ANY))
|| &dieprn("failed to bind socket to port");
($s_sockaddr = getsockname Server)
|| &dieprn("unable to obtain socket address");
(($s_port, $s_addr) = sockaddr_in $s_sockaddr)
|| &dieprn("unable to obtain port number");
($s_host = gethostbyaddr $s_addr, AF_INET)
|| ($s_host = hostname)
|| &dieprn("unable to get hostname");
(listen Server, SOMAXCONN)
|| &dieprn("error establishing listener on socket");
print "established on $s_host:$s_port\n";
logmsg "server started on port $s_port";
while (1)
{
if ($c_sockaddr = accept Client, Server)
{
($c_port,$c_addr) = sockaddr_in $c_sockaddr;
($c_host = gethostbyaddr $c_addr, AF_INET)
|| ($c_host = inet_ntoa $c_addr) ;
logmsg "connection established from $c_host:$c_port";
for (my $i = 0; $i < $N_MSGS; $i++)
{
if (!(print Client "Hello there, $c_host, it's now ",
scalar localtime, $EOL))
{
my $msg;
if ($! != 0)
{
$msg = sprintf "ERROR sending message to " .
"$c_host:$c_port (errno=%d, %s)", $!, $!;
}
else
{
$msg = "ERROR sending message to $c_host:$c_port";
}
logmsg $msg;
last;
}
}
logmsg "messages successfully sent to $c_host:$c_port";
if (close Client)
{
logmsg "connection to $c_host:$c_port successfully closed";
}
else
{
my $msg;
if ($! != 0)
{
$msg = sprintf "ERROR closing connection to " .
"$c_host:$c_port (errno=%d, %s)",
$!, $!;
}
else
{
$msg = "ERROR closing connection to $c_host:$c_port";
}
logmsg $msg;
}
}
}
}
else
{
my ($tcp_proto, $sockaddr, $addr, $host, $port);
&usage if (!($ARGV[0] =~ /^([^:]+):(\d+)$/));
$host = $1; $port = $2;
print "$0: attempting to connect to $host:$port...";
($tcp_proto = getprotobyname "tcp")
|| &dieprn("failed protocol name lookup");
($addr = inet_aton($host))
|| &dieprn("name lookup failed");
($sockaddr = sockaddr_in($port, $addr))
|| &dieprn("sockaddr failed");
(socket Sock, PF_INET, SOCK_STREAM, $tcp_proto)
|| &dieprn("failed to obtain a socket");
(connect Sock, $sockaddr)
|| &dieprn("connection failure");
print "connection established\n";
my $n = 0;
$! = 0;
while(<Sock>)
{
$n++;
if ($! != 0)
{
print "Error reading messages from the connection\n";
&errnoprn;
exit 1;
}
}
if ($n < $N_MSGS)
{
print "ERROR: fewer messages received ($n) than expected ($N_MSGS)\n";
$rc = 1;
}
else
{
print "All messages received.\n";
}
if (close Sock)
{
print "Connection with $host:$port successfully closed.\n";
}
else
{
print "ERROR closing the connection.\n";
&errnoprn;
exit 1;
}
}
exit $rc;
Here is how to run this test.
The instructions below are intended to test two machines at a time.
- Launch the server on the first machine,
% perl perl_connect -server
perl_connect: establishing server...established on pitcairn.mcs.anl.gov:62574
perl_connect 27936: server started on port 62574 at Fri Jan 25 11:24:28 2002
You will see something other than
pitcairn.mcs.anl.gov:62574 at the
end of the first output line. You should see the host:port
of your first machine.
- While the server is still running on the first machine, launch
the client on the second machine,
% perl perl_connect -client pitcairn.mcs.anl.gov:62574
perl_connect: attempting to connect to pitcairn.mcs.anl.gov:62574...connection established
All messages received.
Connection with pitcairn.mcs.anl.gov:62574 successfully closed.
%
Of course, you should replace pitcairn.mcs.anl.gov:62574
with the host:port that appeared at the end of the first
line of output when you started the server in step 1 above.
If the client prints both the "All messages received."
and the "Connection with ... successfully closed." messages,
then the test ran successfully.
- Kill the server by typing control-c.
- Repeat steps 1-3 this time running the server on the machine
you had just run the client on and vice versa.
If the perl_connect test above did not run correctly
then the problem is not rooted in Globus, MPICH, or MPICH-G2.
It may be a problem with firewall(s). If you know that one or both
of the machines are sitting behind a firewall, or you suspect that
there may be a problem with firewalls, then try reading
Notes on getting MPICH running on RH 7.2
written by
Rob Ross
of the Mathematics and
Computer Science Division (MCS) at
Argonne National Laboratory.
If after reading that documentation you still believe that you are
having firewall problems, you should pursue the problem further.
Start by contacting your local Globus administrator
and, if necessary, continue by checking the
Globus Toolkit Error FAQ.
If you still don't know what the problem is, contact the
Globus
Developers by submitting a Globus problem report form found there,
specifying GlobusIO as the "product" that you
are having trouble with. Do not specify MPICH-G2;
that will only slow down our response time by forcing us to route
the problem away from MPICH-G2.
Back to Troubleshooting
-
Testing vendor-supplied MPI mpirun's ability to export environment
This is a test to determine if environment variables are passed
to an application when it is launched using mpirun.
It should be used to test the vendor-supplied MPI
that was used to build the MPI flavor of Globus (i.e., not to
test MPICH-G2). For MPICH-G2 to successfully use a
vendor-supplied MPI, that MPI's mpirun
must pass environment variables to the MPI application.
Here is a link to a small program tenv.c which is shown here
for your review. You may cut/paste this program or
right click on the file link below and 'Save Link As ...'.
tenv.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
    char *value;
    MPI_Init(&argc, &argv);
    if ((value = getenv("FOO")) != NULL)
    {
        printf("the value for env var FOO=%s\n", value);
    }
    else
    {
        printf("env var FOO is not defined\n");
    }
    MPI_Finalize();
    return 0;
} /* end main() */
Here is how to run this test.
- Compile and link tenv.c using the vendor-supplied MPI
C compiler (i.e., not MPICH-G2's mpicc). This must
be the same MPI that was used to create the MPI flavor of Globus
that was, in turn, used to configure MPICH-G2.
% mpicc -o tenv tenv.c
- Unset the environment variable FOO.
% unsetenv FOO
- Using the vendor-supplied mpirun launch the
program.
% mpirun tenv
env var FOO is not defined
%
- Set the environment variable FOO.
% setenv FOO bar
- Run the program again using the vendor-supplied
mpirun.
% mpirun tenv
the value for env var FOO=bar
%
If the tenv test did not run as shown above then MPICH-G2 cannot be
configured using that MPI flavor of Globus. You will need to contact
the authors of the underlying vendor-MPI or possibly have your Globus
system administrator modify the Globus Job Manager at your site to
"push" the environment variables into your application.
Back to Troubleshooting
- What to try if you get a failed globusrun pr_tcp assertion when trying mpirun
If after you type your mpirun command you see an error message
that is similar to this:
globusrun: pr_tcp.c:1548: outgoing_open: Assertion `rc == 0' failed.
It is most likely caused by the fact that some or all of your
compute nodes do not return fully qualified domain names (FQDN) in
response to a call to gethostname(). This can
easily be tested with the following program phost.c and its associated
Makefile.
The files are shown here for your review; however, if you want to
download them, do not cut/paste them from the text below, as cut/paste
changes tab characters into multiple spaces, which will not work for
make. To download these files, right click on each file link below
and 'Save Link As ...'.
phost.c
#include <globus_common.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
    char hostname[1024];
    if (globus_libc_gethostname(hostname, 1024))
    {
        globus_libc_fprintf(stderr,
                            "ERROR: failed globus_libc_gethostname()\n");
        exit(1);
    } /* endif */
    globus_libc_fprintf(stdout, "hostname >%s<\n", hostname);
    return 0;
} /* end main() */
Makefile
include makefile_header
phost:
        $(GLOBUS_CC) $(GLOBUS_CFLAGS) $(GLOBUS_INCLUDES) -c phost.c
        $(GLOBUS_LD) -o phost phost.o \
                $(GLOBUS_LDFLAGS) \
                $(GLOBUS_PKG_LIBS) \
                $(GLOBUS_LIBS)
Before using the Makefile you must create a file called
makefile_header using the Globus tool
globus-makefile-header specifying one of the Globus flavors
at your installation. You should select the same Globus flavor
you intend to use when configuring MPICH-G2. Here is an example
of how to use globus-makefile-header to create
makefile_header specifying gcc32dbg as the flavor:
% $GLOBUS_LOCATION/sbin/globus-makefile-header -flavor=gcc32dbg \
globus_common globus_gram_client globus_io globus_data_conversion \
globus_duroc_runtime globus_duroc_bootstrap > makefile_header
- Download phost.c and Makefile using the links above.
- Use globus-makefile-header to create the file
makefile_header as described above.
- Compile phost.c. NOTE: You are not using MPICH-G2's
mpicc.
% make phost
- Run your phost.
% grid-proxy-init
% globusrun -o -r "m1.utech.edu" \
'&(count=1)(executable=/home/smith/phost)'
If the hostname that gets printed is not a fully qualified
domain name then that is the problem (globusrun requires
the FQDNs of all compute nodes). There are two possible solutions.
You can either re-configure your compute nodes so that they
do return FQDNs in response to gethostname() or you can
specify the domain name in the environment variable
GLOBUS_DOMAIN_NAME in your RSL like this:
% globusrun -o -r "m1.utech.edu" \
'&(count=1)(environment=(GLOBUS_DOMAIN_NAME "utech.edu"))(executable=/home/smith/phost)'
This should return an FQDN, so if you do not re-configure your
compute nodes, you need only specify a value for the environment
variable GLOBUS_DOMAIN_NAME in all your RSLs that run on
that machine.
Back to Troubleshooting
How does MPICH-G2 work?
Here we provide an overview of how MPICH-G2 works: how it interfaces with
the vendor's MPI, how it uses Globus services, and so on. We included this
section for the curious; it may be skipped by the casual reader. Our
intention in providing this overview is to enhance the reader's understanding
of MPICH-G2 and, hence, its strengths and weaknesses.
- Compiling MPICH-G2 and your application, and linking
with the vendor's MPI
Configuring MPICH-G2 with an "mpi" flavor of Globus
(see How do I acquire and install MPICH-G2?)
implicitly declares that all programs linked
with MPICH-G2 will include the vendor's implementation of MPI
(i.e., the vendor's MPI library). This, of course, presents
an immediate linking problem in that MPICH-G2 is itself an
MPI library.
Our solution preprocesses MPICH-G2
source code (with the exception of one file) and all C/C++
application source code, renaming all C-binding MPI symbols
from {P}MPI_xxx to {P}MPQ_xxx.
The one MPICH-G2 file spared preprocessing is the "wrapper
file," which houses all of MPICH-G2's calls to the vendor's MPI.
The C-binding MPI symbols are renamed in C/C++ source files
using the CPP with a sequence of #define statements.
Fortran{77,90} source files are not preprocessed, and
C++ references to MPI symbols are also left untouched.
Fortran{77,90} and C++ symbols are resolved by making sure that
the MPICH libraries appear before vendors' MPI libraries.
This preprocessing is presumably a "safe" practice in that
according to Sections 2.5 "Language Binding" and
2.5.2 "C Binding Issues" of the
MPI v1.1
standard:
"Programs must not declare variables or functions
with name beginning with the prefix, MPI_."
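For illustration, the renaming amounts to a header of #define statements
of roughly the following form (a minimal sketch; the particular lines below
are hypothetical examples, and the actual list is generated during the
MPICH-G2 build and covers every C-binding MPI symbol):
/* Hypothetical sketch of the renaming described above: each C-binding
 * MPI symbol is mapped to an MPQ_-prefixed name so that it no longer
 * collides with the vendor's MPI library. */
#define MPI_Init    MPQ_Init
#define MPI_Send    MPQ_Send
#define MPI_Recv    MPQ_Recv
#define PMPI_Send   PMPQ_Send
#define PMPI_Recv   PMPQ_Recv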
- Launching your application with MPICH-G2's mpirun
You can use MPICH-G2's mpirun in two ways to launch your
application (see Once it's installed, how do I use
MPICH-G2?): you may write your own Globus RSL script and submit
that directly to mpirun, which in turn passes your RSL script
directly to
globusrun (a Globus utility), or you may use
mpirun with its arguments as they are explained in the MPICH User's
Manual, in which case mpirun writes its own Globus RSL script and
submits it to globusrun. Either way, MPICH-G2 jobs are launched
by passing a Globus RSL script to globusrun.
- During execution
MPI_Init has a Globus-enforced (DUROC) barrier that waits for
all processes, across all machines, to be loaded
and start execution before proceeding. Thereafter MPICH's design
distills all communication (including collective operations) into its
constituent point-to-point components before passing them on to the
lower-level device.
The globus2 device is therefore
presented with only point-to-point communication requests.
The choice of protocol (TCP or vendor-supplied MPI) is based on
the source/destination, that is, vendor-supplied MPI for intramachine
messaging (assuming MPICH-G2 was configured with an "mpi" flavor
of Globus) and TCP for all other messaging. In situations where
MPI_ANY_SOURCE is specified on a receive, both TCP and the vendor's
MPI are polled for incoming messages until the receive is satisfied.
The following additional points about MPICH-G2's implementation of
point-to-point communication are noteworthy:
- MPICH-G2 does not use hidden "forwarder" nodes for
TCP communication. All TCP communication is implemented as
point-to-point communication.
- Data is automatically converted between incompatible
architectures using a "reader-makes-right" model; that is,
any necessary data conversion is done by the receiver.
MPICH-G2 does not convert TCP messages to a "network standard"
before sending the message.
- When an MPI_Recv names a specific source (i.e., not
MPI_ANY_SOURCE) that dictates intramachine messaging
over vendor-supplied MPI and if there are no outstanding
communication requests (e.g., an MPI_I{send,recv} that
has not completed), then the MPI_Recv translates
directly to the vendor's MPI_Recv; otherwise it translates
essentially to a (less efficient) test loop in which
we alternately poll TCP and vendor-supplied MPI (MPI_Iprobe)
as necessary until the message arrives. Once the message
is known to have arrived, we complete the MPI_Recv by ultimately
calling the vendor-supplied MPI_Recv (see the sketch at the
end of this section).
- We addressed two additional items in
interfacing with the vendor's MPI: maintaining
application-level communicators and derived data types in
tandem with the vendor's MPI. Each time an MPI application
creates/destroys a communicator or derived datatype, the
globus2 device performs the same operation in the
vendor's MPI.
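To make the receive behavior described above concrete, here is a minimal
sketch using only standard MPI calls (the ranks, tags, and buffer sizes
are arbitrary examples, and it assumes at least two processes):
#include <mpi.h>
int main(int argc, char **argv)
{
    int rank, buf[4] = {0, 0, 0, 0};
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
    {
        MPI_Send(buf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Send(buf, 4, MPI_INT, 1, 1, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
        /* Specified source: if rank 0 is on the same machine and no other
         * requests are outstanding, this can map directly onto the
         * vendor's blocking MPI_Recv. */
        MPI_Recv(buf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        /* MPI_ANY_SOURCE: the sender could be on any machine, so both TCP
         * and the vendor's MPI must be polled until a match arrives. */
        MPI_Recv(buf, 4, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &status);
    }
    MPI_Finalize();
    return 0;
}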
Things that don't work or are missing in MPICH-G2
Problem: |
MPICH-G2 does not work with GT 3.2 or GT 3.2.1
GT 3.2 moved its implementation of GlobusIO
(something MPICH-G2 relies on very heavily) atop the new Globus XIO.
This introduced a bug that causes MPICH-G2 to "hang" when
configured with either GT 3.2 or GT 3.2.1.
|
Solution: |
The problem has been fixed and the revised code will be
distributed with GT 3.2.2 and later. In the meantime, if you
have GT 3.2 you need to upgrade to GT 3.2.1. With a GT 3.2.1
install you can then apply two update packages found on
www-unix.globus.org/toolkit/advisories.html.
Follow the instructions on that page to acquire and apply the two
packages identified as "globus_io-5.5" and "globus_nexus-6.5"
(both dated 2004-08-12 and both noting "for mpich-g2" in
their descriptions). |
|
Problem: |
MPI_PACKED
According to Sections 3.3.1 "Type Matching Rules" and 3.13 "Pack and
Unpack" of the MPI
v1.1 standard, type MPI_PACKED matches any other type. That is, a
message sent with any type (including MPI_PACKED) can be received using
the type MPI_PACKED, and a message sent as MPI_PACKED can be received
as the message's constituent type.
We assume that a vendor's implementation of MPI_Pack
essentially performs a memcpy. Under that assumption, we can
meet the standard as it is stated above for both inter- and
intramachine messaging.
Under the standard it is possible to send data as MPI_PACKED,
receive it as MPI_PACKED, and then forward the packed data
to a third process by sending it as MPI_PACKED. This will
not always work in MPICH-G2. Forwarding MPI_PACKED
data in this manner will work as long as the protocol
is homogeneous throughout the forwarding chain (e.g., all
TCP or all vendor-MPI). Forwarding MPI_PACKED data will
definitely fail in a heterogeneous protocol forwarding chain;
for example, it will fail if process 0 sends MPI_PACKED data
to process 1 over vendor-MPI and then process 1 sends
the same buffer also as MPI_PACKED data to process
2 over TCP.
|
Solution: | None. |
|
Problem: |
MPI_{Cancel,Wait}
MPICH-G2, like many other MPI libraries, uses an "eager" protocol
(data is transferred to the receiver before a matching receive is
posted) for TCP messaging. Under an eager protocol, cancelling a
send (MPI_Cancel) requires communication with the intended receiver
in order to free allocated buffers. The following is a quote from
MPI v1.1
standard, section "3.8. Probe and Cancel", about MPI_{Cancel,Wait}:
"If a communication is marked for cancellation, then a MPI_WAIT
call for that communication is guaranteed to return, irrespective
of the activities of other processes (i.e., MPI_WAIT behaves as a
local function). ..."
Under an eager protocol, satisfying the statement above, in particular
MPI_Wait returning "irrespective of the activities of other
processes", on most systems requires interrupting the intended
receiver (i.e., an asynchronous listener on all machines).
MPICH-G2's implementation of MPI_{Cancel,Wait} (like those of most other
MPICH devices and many other MPI implementations) is not
compliant with this when waiting for the cancellation of TCP-sent
messages. In MPICH-G2 cancelling a send (MPI_Cancel) marks
the request for cancellation and returns immediately, but MPI_Wait
on a cancelled TCP send might wait for the intended receiver,
who may be in a deep computational loop, to make its next MPI call.
|
Solution: |
None (or maybe relax the standard?). In the future
(see Future Work) we plan to make
the globus2 device thread safe, which will allow users
to configure MPICH-G2 with a "threaded" flavor of Globus.
MPICH-G2 will then comply with the standard
in that the MPI_Wait that follows an MPI_Cancel will return
immediately, "irrespective of the activities of other
processes." |
|
Problem: |
MPI_LONG_DOUBLE
Section 3.2.2 "Message Data" of the
MPI v1.1
standard lists
C type MPI_LONG_DOUBLE as a required datatype.
MPICH-G2 does not support MPI_LONG_DOUBLE for
TCP messages. Although "long double" is part of the ANSI C
standard it has not been added to the
Globus
data conversion library. Intramachine messages over
vendor-supplied MPI are passed directly to the vendor's MPI,
so if it supports MPI_LONG_DOUBLE, then so too does MPICH-G2.
|
Solution: |
When "long double" support is added to the
Globus
data conversion library, then MPICH-G2 will support MPI_LONG_DOUBLE
for TCP messages.
|
|
Problem: |
stdout/stderr on MPI_Abort
When calling MPI_Abort, stdout/stderr are not always flushed unless
the user explicitly flushes (fflush) both prior to calling MPI_Abort,
and even then, data written to stdout/stderr by the other processes
may still be lost.
|
Solution: |
This is a bug in one of the Globus services
(GASS)
used by MPICH-G2. We are aware of the problem, and a future Globus
patch should fix it. In the meantime, writing your own RSL (see
Once it's installed, how do I use MPICH-G2?)
and specifying "(stdout=..)" and "(stderr=...)" in each subjob
tends to alleviate (not eliminate) the problem by getting more
of the data out.
|
|
Problem: |
exit code on MPI_Abort
The exit code passed to MPI_Abort does not get propagated back
to mpirun.
|
Solution: |
This is a limitation of the Globus job startup mechanisms.
We are working on those portions of Globus that would enable the
exit code to be propagated back to mpirun.
|
|
Problem: |
MPICH test suite fails
Some of the tests in the test suite distributed with MPICH fail.
|
Solution: |
If you have configured MPICH-G2 with an "mpi" flavor of Globus
and have written a <MPICH_INSTALL_PATH>/bin/machines file that
induces vendor-supplied MPI communication for intramachine
messaging (see How do I acquire and install
MPICH-G2), then there is a good chance that the test is failing
because the underlying vendor-supplied implementation of MPI is
incorrect. To test this, make and run the failing test using
the vendor's implementation of MPI (i.e., not MPICH-G2).
If the test fails using the vendor's MPI, then that is likely the
reason why it is failing under MPICH-G2.
|
|
Problem: |
silent loss of information
MPICH-G2 automatically converts data in messages passed between
machines with different data representations (i.e., big- vs.
little-endian). This data conversion can sometimes results in
a loss of information. For example, an
"unsigned long" may be 64 bits on one machine and only 32 bits
on another. A message sent from the 64-bit to the 32-bit machine
containing an "unsigned long" whose value is >=
232 will lose information as a result of
data conversion. Further, this loss of information will
occur silently (i.e., no error/warning messages).
|
Solution: |
Of course, there is nothing that MPICH-G2 can do about this
loss of information. However, in the future (see
Future Work)
we plan to provide optional mechanisms by which users will be
notified (e.g., warning messages to stderr) when
information is lost.
|
|
Problem: |
mpirun under Linux
When using MPICH-G2's mpirun on a Linux platform such that
mpirun is constructing an RSL script for you (see
Once it's installed, how do I use MPICH-G2?)
you may find that MPICH-G2's mpirun does not work and reports
many syntax error messages of the form "integer expression expected
before -eq". This is due to a bug in the the Linux shell.
As described above, each line of the machines file must name
a Globus jobmanager service and then optionally end with an
integer value (default value of 1). When you omit the optional
integer in your machines file, Linux shells cannot correctly
parse the machines file and you get the errors above.
|
Solution: |
Locate the machines file your mpirun command is using. In most
cases this will be <MPICH_INSTALL_PATH>/bin/machines,
but it could be a file named "machines" in the directory in which
you typed mpirun or a file that you explicitly named with
-machinefile on your mpirun command line. In any case,
locate that file and edit it by explicitly ending
each line of the file with an integer. If there was no integer
at the end of the line, then placing a 1 there is semantically
equivalent. Always be sure to surround each Globus job manager
service (typically a machine name) in "double quotes", like this:
"m1.utech.edu" 10
"m2.utech.edu" 5
|
|
How does MPICH-G2 differ from MPICH-G?
MPICH-G2, like MPICH-G, still uses many Globus services (e.g., job
startup, security, data conversion, etc.). The major difference between
MPICH-G and MPICH-G2 is that we have removed Nexus (a Globus component
which was used for all communication in MPICH-G) from MPICH-G2.
While Nexus provided the communication infrastructure for MPICH-G
for many years and had many attractive features (e.g., multiprotocol support
with highly tuned TCP support and automatic data conversion), there were
other attributes of Nexus that could be improved.
MPICH-G2 now handles all communication directly by reimplementing the
good things about Nexus and improving the others. For a quantitative
comparison of MPICH-G2 and MPICH-G see
MPICH-G2 Performance Evaluation.
Here is a summary of what the changes bring.
- Increased bandwidth
In MPICH-G all message passing was done through Nexus, which required
the data to be copied from the application's source buffer into a
Nexus buffer before sending and, on the receiving side, copied from
the received Nexus buffer into the application's
destination buffer.
implementation of MPI exists, these two extra copies have been
eliminated in MPICH-G2. In this situation, sends/receives now
flow directly from/to application buffers.
Also, for TCP (intermachine) messaging involving basic MPI datatypes
(e.g., MPI_INT, MPI_FLOAT as defined in Section 3.2.2 "Message
Data" of the
MPI v1.1
standard), the extra copy has been eliminated on the sending side.
- Reduced latency for intramachine (vendor-supplied MPI)
messaging
Nexus was capable of multiprotocol support. That is, for example,
MPICH-G translated MPI_Sends to Nexus sends
which, in turn, picked the fastest protocol available based on
the message's destination, using vendor-supplied MPI for intramachine
messages and TCP for intermachine messages. On the receiving side,
Nexus continuously (and unconditionally) polled all its supported
protocols for incoming messages in a round-robin fashion. In a typical
MPICH-G run, two protocols were present: TCP and
vendor-supplied MPI. Accordingly, Nexus polled TCP, then MPI,
then TCP, then MPI, and so on, independent of the activities of the MPI
application (i.e., polling occurred even if there
were no outstanding MPI_Recvs). Polling TCP requires a (relatively)
long timeout. In many cases, 'internal' nodes of a computation
(i.e., nodes that communicated only with other nodes on the same
machine and thus used only MPI) paid an unnecessary
latency cost as they occasionally had to wait for the TCP poll to
time out before they serviced the incoming MPI message.
This unnecessary TCP polling has been eliminated in MPICH-G2.
In MPICH-G2 TCP is polled only when the MPI application is expecting
data from a source that dictates (or might dictate
as is the case when an MPI_Recv specifies source=MPI_ANY_SOURCE)
TCP messaging. In other words, TCP polling now occurs only when
absolutely necessary rather than all the time.
- More efficient use of sockets
Nexus accommodated point-to-point TCP messaging by opening
two pairs of sockets between the two ends and
utilizing each pair of sockets as a simplex channel (i.e., data
flowed in only one direction over each socket pair).
MPICH-G2 now does its own socket management, opening only a single
pair of sockets between two points and using the sockets in a
bi-directional manner. This not only reduces the amount of requested
system resources (fewer socket connections), but by using sockets in the
bidirectional manner in which they were intended, it also improves TCP
efficiency in that many systems typically piggy-back TCP ACKs on
messages flowing in the opposite direction.
- Increased latency for intermachine (TCP)
messaging
Nexus has an optimized implementation of TCP message passing.
Its implementation of TCP messaging is based on an elaborate state
machine that introduces minimal data overhead. By abandoning Nexus
in MPICH-G2 we also (unfortunately) lost that state machine and
we were forced to implement our own state machine for
TCP messaging and create our own message headers. For the first
release of MPICH-G2 we decided to make that state machine as simple
as possible (i.e., no optimization) and to focus our attention on the
improvements described above.
We made that decision for two main reasons: (1) we had the
Nexus state machine in hand and believed that we could later
revisit our state machine and make it equally good and (2) TCP
performance, particularly across a WAN, is so (relatively) poor that
inefficiencies introduced by our naive state machine might be
small in comparison with the significant improvements we gained
in intramachine messaging.
The net result (see MPICH-G2 vs MPICH-G graphs in
MPICH-G2 Performance Evaluation)
of all our TCP changes -- bidirectional sockets,
eliminated extra copy on sending side for basic MPI datatypes,
naive state machine, and increased message overhead -- appears
to be increased latency and bandwidth. This results
in slower communication for small messages and faster communication
for large messages when comparing MPICH-G2 to MPICH-G over TCP.
- Support for MPI_LONG_LONG and MPI-2 file
operations
Through the addition of "long long" support to the
Globus
Data Conversion library, MPICH-G2 now supports MPI_LONG_LONG, whose
presence enables MPICH-G2 to also support the MPI-2 I/O operations
(implemented in MPICH using ROMIO) distributed with MPICH.
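As a small illustration of these MPI-2 I/O operations, the following
sketch has each process write one integer to a shared file (the file
name and values are hypothetical examples, shown only to indicate the
kind of ROMIO-backed calls now available):
#include <mpi.h>
int main(int argc, char **argv)
{
    int rank, value;
    MPI_File fh;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    value = rank;
    /* Each process writes its rank as one int at a rank-determined offset. */
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, (MPI_Offset)(rank * sizeof(int)),
                      &value, 1, MPI_INT, &status);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}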
- Added C++ support
MPICH-G2 now supports applications written in C++ (mpiCC).
MPICH-G2 Performance Evaluation
We evaluated MPICH-G2 using the performance tool mpptest (distributed
with MPICH in examples/perftest) on
an SGI Origin 2000 (denali)
and an IBM SP (quad),
both at Argonne National Laboratory's
Center for
Computational Science and Technology (CCST), and, for LAN TCP/IP
evaluation, on a pair of SUN workstations in Argonne's
Mathematics and Computer Science Division. In all experiments
MPICH-G2 and MPICH-G were configured with non-threaded and no-debug
flavors of Globus v1.1.4, and unless otherwise noted, MPICH-G2 was
configured using mpi flavors of Globus.
On the SGI and IBM machines we conducted three separate sets of experiments,
each exercising a different MPICH-G2 receive behavior (see
How does MPICH-G2 work?):
- Specified
Each MPI_Recv explicitly specifies its source rank. In the absence of
any unsatisfied asynchronous requests (e.g., MPI_Irecv), this allows
MPICH-G2 to map the application's MPI_Recv directly to
the vendor's blocking MPI_Recv. This is the most favorable
circumstance under which an application's MPI_Recv can be performed.
- Specified-pending
Like Specified above, each MPI_Recv explicitly specifies its source
rank. However, at the very beginning of the program each process
posts an MPI_Irecv, also specifying a source rank, for a message
that is never sent, resulting in an ever-present unsatisfied
asynchronous request (see the sketch after this list). This forces
MPICH-G2 to continuously poll (MPI_Iprobe) the vendor's MPI for
incoming messages. This scenario results in less efficient MPICH-G2
performance in that the induced polling loop increases latency.
- Non-specified (MPI_ANY_SOURCE)
Here each MPI_Recv specifies MPI_ANY_SOURCE. This forces MPICH-G2
to continuously poll TCP/IP and the vendor's MPI. This is
the least efficient MPICH-G2 scenario in that the relatively large
cost of TCP/IP polling results in even greater latency.
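For reference, the specified-pending pattern mentioned above looks
roughly like the following sketch (the ranks, tags, and counts are
arbitrary examples, and it assumes at least two processes):
#include <mpi.h>
int main(int argc, char **argv)
{
    int rank, pending, buf[1];
    MPI_Request req;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Post a receive, naming a source, for a message that is never sent;
     * this leaves an ever-present unsatisfied asynchronous request. */
    MPI_Irecv(&pending, 1, MPI_INT, 0, 999, MPI_COMM_WORLD, &req);
    if (rank == 0)
    {
        buf[0] = 42;
        MPI_Send(buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    }
    else if (rank == 1)
    {
        /* The source is specified, but the outstanding request above
         * forces the receive through the polling path rather than the
         * vendor's blocking MPI_Recv. */
        MPI_Recv(buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    }
    /* Cancel and complete the never-satisfied request before finalizing. */
    MPI_Cancel(&req);
    MPI_Wait(&req, &status);
    MPI_Finalize();
    return 0;
}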
For each set of experiments we present the results in a sequence of 7 graphs,
each increasing the range of the independent variable Message Size:
first to 1KB, then 2KB, 16KB, 32KB, 64KB, 512KB, and finally 1MB. We chose to
present the results in this way to provide a better, more detailed view at
various message sizes. Here are convenience links to our graphs:
- SGI Origin 2000
Denali is an SGI Origin 2000 running IRIX 6.5.9 and SGI's
implementation of MPI, MPT 1.4. It is equipped with
90 250-MHz CPUs, each with 512MB of RAM.
- IBM SP
Quad is an IBM SP running AIX 4.2.1+ and IBM's implementation
of MPI, PSSP 2.3. It is equipped with 120-MHz P2SC nodes and
the TB3 high-performance switch.
- LAN TCP/IP
Finally, we present a set of experiments using two SUN workstations,
neither equipped with vendor-supplied implementations of MPI, connected
over a LAN. The first workstation (pitcairn) was equipped with 8 Sun
UltraSPARC-II 248 MHz CPUs and 1GB RAM, running SunOS 5.7,
and connected to the LAN via gigabit ethernet. The second workstation
(goshen) was equipped with 2 Sun UltraSPARC-II 296 MHz CPUs
running SunOS 5.7, and connected to the LAN via fast ethernet (100 Mb/s).
MPICH-G2 and MPICH-G were not configured with
mpi flavors of Globus. In these experiments we compare the TCP/IP
performance of MPICH-G2, MPICH-G, and MPICH configured with p4.
SGI Experiments - MPICH-G2,
MPICH-G, and SGI-MPI - Specified
Back to MPICH-G2 Performance Evaluation
SGI Experiments - MPICH-G2,
MPICH-G, and SGI-MPI - Specified-pending
Back to MPICH-G2 Performance Evaluation
SGI Experiments - MPICH-G2,
MPICH-G, and SGI-MPI - Non-specified (MPI_ANY_SOURCE)
Back to MPICH-G2 Performance Evaluation
IBM Experiments - MPICH-G2 and
IBM-MPI - Specified
Back to MPICH-G2 Performance Evaluation
IBM Experiments - MPICH-G2
and IBM-MPI - Specified-pending
Back to MPICH-G2 Performance Evaluation
IBM Experiments - MPICH-G2
and IBM-MPI - Non-specified (MPI_ANY_SOURCE)
Back to MPICH-G2 Performance Evaluation
LAN Experiments - TCP/IP - MPICH-G2,
MPICH-G, and MPICH with p4
Back to MPICH-G2 Performance Evaluation
Future Work
- Add a shared-memory protocol module
- Add option for dedicated, hidden TCP-forwarding/receiving CPUs
- Enhance mpirun to allow RSL augmentations to be specified in the
machines file (e.g., "m1.utech.edu" 10 "(project=foo)(queue=bar)
(environment=(FOO bar))")
- Extend GridFTP mechanism to a reliable UDP-based solution
for environments where the TCP protocol inhibits bandwidth
(e.g., so-called "long, fat pipes", see "High-Resolution
Remote Rendering of Large Datasets in a Collaborative
Environment" in MPI-Related
Papers).
- Improve TCP performance
- improve latency by reducing header size and number of
operating system calls
- increase bandwidth by eliminating copy on receive side when
the receive has already been posted and the sending and
receiving machines have the same data formats
- increase latency hiding by using threaded flavors of Globus
- Allow MPICH-G2 to use threaded flavors of Globus libraries
further improving TCP performance
- Add support for QoS-enabled inter-machine communication (see
"MPICH-GQ: Quality-of-Service for Message Passing
Programs " in
MPI-Related Papers and
GARA)
- Add optional warning messages for information loss resulting from data
conversion
- Provide advanced developers with more control over compile/link
options and library ordering
- Propagate exit code passed to MPI_Abort back to mpirun
- Flush stdout/stderr upon MPI_Abort
MPI-Related Papers
MPICH's list of
MPI-related papers
Our MPI-related papers:
-
MPICH-G2: A Grid-Enabled Implementation of the Message Passing
Interface,
N. Karonis, B. Toonen, and I. Foster,
Journal of Parallel and Distributed Computing (JPDC), Vol. 63, No. 5,
pp. 551-563, May 2003.
(
PDF (3MB),
Postscript (397KB),
Gzipped Postscript (101KB)
)
-
MPICH-GQ: Quality-of-Service for Message Passing Programs,
A. Roy, I. Foster, W. Gropp, N. Karonis, V. Sander, and B. Toonen,
Proc. SC00 (SC2000), no page numbers available,
Dallas, TX, November 4-10, 2000, (nominated best paper at conference).
(
Postscript (367KB),
Gzipped Postscript (98KB)
)
-
Exploiting Hierarchy in Parallel Computer Networks to Optimize
Collective Operation Performance,
N. Karonis, B. de Supinski, I. Foster, W. Gropp, E. Lusk, and J. Bresnahan,
Fourteenth International Parallel and Distributed Processing Symposium
(IPDPS '00), pp. 377-384, Cancun, Mexico, May 1-5, 2000,
(nominated best paper at conference).
(
Postscript (230KB),
Gzipped Postscript (84KB)
)
-
Accurately Measuring MPI Broadcasts in a Computational Grid,
B. de Supinski and N. Karonis,
Proc. 8th IEEE Symp. on High Performance Distributed Computing (HPDC-8),
pp. 29-37, Redondo Beach, CA, August 1999.
(
Postscript (180KB),
Gzipped Postscript (47KB)
)
-
A Grid-Enabled MPI: Message Passing in Heterogeneous Distributed Computing
Systems,
I. Foster and N. Karonis, Proc. Supercomputing 98 (SC98),
no page numbers available, Orlando, FL, November 1998.
(
Postscript (199KB),
Gzipped Postscript (58KB)
)
-
Wide-Area Implementation of the Message Passing Interface,
I. Foster, J. Geisler, W. Gropp, N. Karonis, E. Lusk, G. Thiruvathukal,
and S. Tuecke, Parallel Computing, Vol. 24, No. 12, pp. 1735-1749,
1998.
(
Postscript (212KB),
Gzipped Postscript (58KB)
)
Our Application papers:
-
Nektar, SPICE, and Vortonics: Using Federated Grids for Large
Scale Scientific Applications,
B. Boghosian, P. Coveney, S. Dong, L. Finn, S. Jha, G. Karniadakis,
and N. Karonis,
Challenges of Large Applications in Distributed Environments
(CLADE) 2006, Paris, France, June 19, 2006, to appear.
-
Simulating and Visualizing the Human Arterial System on the TeraGrid,
S. Dong, J. Insley, N.T. Karonis, M. Papka, J. Binns, and G.E. Karniadakis,
Future Generation of Computer Systems (FGCS), Vol. 22, No. 8,
pp. 1011-1017, 2006.
-
Grid Solutions for Biological and Physical Cross-site Simulations
on the TeraGrid,
S. Dong, N.T. Karonis, and G.E. Karniadakis,
IEEE International Parallel and Distributed Processing Symposium
(IPDPS) 2006 Conference, Rhodes Island, Greece, April 25-29, 2006,
to appear.
-
Grid Enabled Solution of Groundwater Inverse Problems
on the TeraGrid Network,
K. Mahinthakumar, M. Sayeed, and N. Karonis,
High-Performance Computing Symposium (HPC 2006)
Huntsville, AL, April 2-6, 2006, to appear.
-
CFD Cross-Site Computations on the TeraGrid,
S. Dong, N.T. Karonis, and G.E. Karniadakis,
Computing in Science & Engineering (CiSE) Magazine,
joint publication of the IEEE Computer Society and the American
Institute of Physics, invited article, Vol. 7, No. 5, pp. 14-23, 2005.
-
Development and Performance Analysis of a Simulation-Optimization
Framework on TeraGrid Linux Clusters,
B.Y. Mirghani, M.E. Tryby, D.A. Baessler, N.T. Karonis, R.S. Ranhthan,
and K.G. Mahinthakumar,
The 6th LCI International Conference on Linux Clusters: The HPC
Revolution 2005, Chapel Hill, NC, April 26-28, 2005.
(
PDF (237KB)
)
-
High-Resolution Remote Rendering of Large Datasets in a Collaborative
Environment,
N. Karonis, M. Papka, J. Binns, J. Bresnahan, J. Insley, D. Jones,
and J. Link,
Future Generation of Computer Systems (FGCS), Vol. 19, No. 6,
pp. 909-917, August 2003.
(
PDF (7.8MB),
Postscript (12.3MB),
Gzipped Postscript (3.8MB)
)
-
Supporting Efficient Execution in Heterogeneous Distributed Computing
Environments with Cactus and Globus,
G. Allen, T. Dramlitsch, I. Foster, N.T. Karonis, M. Ripeanu, E. Seidel,
and B. Toonen,
Proc. SC01 (SC2001), no page numbers available,
Denver, CO, November 10-16, 2001,
awarded Gordon Bell Prize.
(
Postscript (2MB),
Gzipped Postscript (262KB)
)
-
Multivariate Geographic Clustering in a Metacomputing Environment Using
Globus,
G. Mahinthakumar, F. M. Hoffman, W. W. Hargrove, and N. Karonis,
Proc. SC99, no page numbers available,
Portland, OR, November 1999.
(
Postscript (4MB),
Gzipped Postscript (819KB)
)
How to contact us
- For bug reporting, please use the
Globus problem report form and be
sure to select MPICH-G2 in the Problem deals with:
pulldown menu. Submitting a bug report requires you to
create an account in their bug reporting system. The
instructions are on that page.
- For non-bug related general questions/comments please send email to
mpi@globus.org.
Related Globus topics
Project Sponsors
MPICH-G2 is supported by the following government agencies.
Acknowledgements
MPICH-G2, specifically the globus2 device, was written by Nick Karonis and Brian Toonen.
Many of the MPICH-G2 precompile ideas that enabled the use of vendor-supplied
MPI for intramachine messaging came from discussions between Olle Mulmo,
Warren Smith, Nick Karonis, and Brian Toonen. In fact, the precompile
idea was prototyped by Olle Mulmo and first used by Warren Smith on
a Cray T3E.
MPICH-G2 is the successor of MPICH-G, which was originally designed and
implemented by Jonathan Geisler while he was in MCS at Argonne. Later MPICH-G
was adapted by George Thiruvathukal (also while he was in MCS at Argonne) and
further developed. Finally, MPICH-G was passed on to Nick Karonis.
We thank Bill Gropp,
Ewing (Rusty) Lusk,
Rajeev Thakur,
Debbie Swider, David
Ashton, and Anthony Chan of the MPICH group at ANL for their guidance,
assistance, insight, and many discussions. Their input was a valuable
contribution to this work.
We thank Sebastien Lacour for many of the additions in the MPICH v1.2.2.3
release. In particular, we thank him for implementing the topology-aware
collective operations, the topology discovery mechanisms, MPICH-G2 over
MPICH-based vMPI, and the perftest collective operations. His insight
and understanding of these issues coupled with his ingenuity proved
invaluable in the implementation of these additions.
We also thank Sebastien Lacour for his efforts in conducting the
performance evaluation and preparing all the graphs. We thank the
MCS Division at Argonne for the use of the resources in
their Center for
Computational Science and Technology (CCST) in conducting those
experiments. In particular we thank Sandra Bittner and the rest of the
MCS Systems Group
for their cooperation and patience in providing us exclusive access to CCST
resources so that we may collect our performance data.