The MCR Linux cluster is currently being built by Linux NetworX in Utah.


MCR Statement of Work
Contents
1 Background
1.1 Multiprogrammatic and Institutional Computing (M&IC)
1.2 Partnership with the Stockpile Stewardship Program (SSP)
1.3 M&IC Applications Overview
1.4 M&IC Scientific Software Development Environment
1.5 M&IC Applications Execution Environment
1.6 M&IC Operational Environment
1.7 Utilization of Existing Facilities
2 MCR Strategy and Architecture
2.1 LC Hardware and Software Strategy and MCR Architecture
2.1.1 LC Linux Strategy for HPTC Scalable Clusters
2.1.2 MCR Hardware Architecture
2.2 LC Software Environment for Linux Clusters
2.2.1 Clustered High Availability Operating System (CHAOS)
2.2.2 LLNL Cluster Tools
2.2.3 Simple Linux Utility for Resource Management
2.2.4 Distributed Production Control System (DPCS)
2.2.5 Lustre Lite Cluster Wide File System
2.2.6 The Livermore Computing Linux Cluster Support Model
2.2.7 Integration Testing
2.3 MCR Build Strategy

1 Background

1.1 Multiprogrammatic and Institutional Computing (M&IC)

Lawrence Livermore National Laboratory (LLNL) has identified high-performance computing as a critical competency necessary to meet the goals of LLNL's scientific and engineering programs. Leadership in scientific computing demands the availability of a stable, powerful, well-balanced computational infrastructure, and it requires research directed at advanced architectures, enabling numerical methods, and computer science.

M&IC was created to encourage all programs to benefit from the huge investment being made by the Advanced Simulation and Computing Program (ASCI) at LLNL and to provide a mechanism to facilitate multiprogrammatic leveraging of resources and access to high-performance equipment by researchers.

The Livermore Computing (LC) Center, a part of the Computations Directorate Integrated Computing and Communications (ICC) Department, can be viewed as composed of two facilities, one open and one secure. This acquisition is focused on the M&IC resources in the Open Computing Facility (OCF).

For the M&IC program, recent efforts and expenditures have focused on enhancing capacity and stabilizing the TeraCluster 2000 (TC2K) resource. Capacity is a measure of the ability to process a varied workload from many scientists simultaneously. Capability represents the ability to deliver a very large system to run scientific calculations at large scale.

In this procurement action, we intend to significantly increase the capability of the M&IC resource to address multiple teraFLOP/s problems, as well as to increase the capacity to do many 100 gigaFLOP/s calculations.

1.2 Partnership with the Stockpile Stewardship Program (SSP)

The M&IC platforms form part of the unclassified computing environment at LLNL. This environment is called the Open Computing Facility (OCF). Some of the platforms in the OCF represent primarily Stockpile Stewardship Program investments (for instance, the unclassified ASCI systems). Others, such as the shared memory multiprocessor (SMP) clusters, represent fairly evenly divided investments between multiple programs (including the SSP) and the institution.

The support infrastructure (everything but the compute platforms) is covered both by M&IC and by the NNSA SSP, and the partners share the resources. For instance, the IBM HPSS storage environment represents an ~80/20 split, with the SSP making the heavier investment. On the other hand, the NFS home space environment was procured by M&IC. The visualization environment is more heavily funded by the SSP, but the System Area Network is supported by the M&IC. The LC balances these investments according to the utilization of the broad support infrastructure by the partners. The end result is a more powerful OCF than any program (or the Institution) could afford alone.


1.3 M&IC Applications Overview

LLNL is in the forefront of the evolution toward effective and practical computational science in all its forms. To continue this role, we must continue to provide the computational tools that a wide user community needs to advance their scientific research.

LC recently conducted a detailed analysis of M&IC computing requirements, finding that our users support projects that address a wide range of important scientific issues and pressing national and international concerns. The total computing needs of these projects far exceed the current M&IC capability and capacity. The progress on many of these projects is being paced by the available computing resources, and access to additional computing cycles will result in faster progress. As shown in Table 1.3-1, these projects span the technical directorates and support the major programs of the Laboratory. The technical directorates noted in this table are Biology & Biotechnology Research Program (BBRP), Chemistry & Materials Science (C&MS), Computation (Comp), Defense & Nuclear Technologies (DNT), Energy & Environment Directorate (EED), Engineering (Eng), National Ignition Facility Programs (NIF), Nonproliferation, Arms Control & International Security (NAI), and Physics and Advanced Technologies (PAT).

Table 1.3-1. A recent analysis of M&IC computing requirements included projects from all nine technical directorates at the Laboratory.


Computation is now a mainstream method in theoretical science at LLNL—essential when highly simplified but analytically intractable models are explored or complex multiphysics phenomena must be understood quantitatively. As we understand more and more truly basic science, the Laboratory is looking to computation to make the vital numerical connections among disparate models that constitute the foundation of both pure and applied science.

M&IC projects also play an enormous role in experimental science. Many of these projects have achieved such sophistication that direct comparisons between full-scale experiment and simulations based on ab initio models are now possible. But it is often the case that these direct comparisons are limited by the available computing resources. As seen in Table 1.3-2, the vast majority of M&IC projects are closely tied to experiment, and nearly half provide some support to NIF, a major program at the Laboratory. Many projects have an even stronger experimental tie, being used to design experiments or facilities.

Table 1.3-2. These M&IC computationally intensive projects have close connection with experiment.


M&IC applications codes span the physical, environmental, and biological sciences. Typically these codes are complex, time-dependent simulations of multiple physical processes, where the processes are often tightly coupled and will require physics models linking microscale phenomena to macroscopic response. Generally, these simulations are multidimensional, with the trend toward full three-dimensional treatment of physical space. Table 1.3-3 illustrates that many of these applications fall into the following broad classifications:

Table 1.3-3. Many of these M&IC applications share common algorithmic approaches, while differing markedly in the details of the implementations.


Demanding national security issues are driving intense development of M&IC applications codes. Advanced numerical algorithms require innovative software techniques to achieve the necessary delivered performance on advanced computing platforms. Current application characteristics and expected trends are described below, although M&IC applications codes and numerical algorithms will continue to evolve rapidly.

The following are some of the major M&IC code characteristics essential to an understanding of the vision of an ideal computing environment.

The codes often model multiple types of physical/chemical/biological/environmental processes, generally in a single (usually monolithic) application, in a time-evolving manner with direct coupling between all simulated processes. They do so using a variety of computational methods, often through a separation or "split" of the various processes and coupling terms. This process often involves solving first one type of model, then the next, then another, and then repeating this sequence for every time step. Some algorithms are categorized as explicit in time while others are fully implicit or semi-implicit and typically involve iterative solvers of some form. Some special wave front "sweeps" are employed for specific direct-solve algorithms. Numerous research efforts are actively exploring novel and alternative methods and algorithms for possible application to problems of interest.
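To make the operator-splitting pattern concrete, the following minimal C sketch advances several coupled packages in sequence within each time step. It is not taken from any M&IC code; the physics routines and the State structure are hypothetical placeholders.

    /* Hypothetical sketch of an operator-split time step: each physics
     * package advances the shared state in turn, then the sequence repeats. */
    #include <stdio.h>

    typedef struct { double t, dt; /* ... field arrays ... */ } State;

    /* Placeholder physics packages; a real code would do far more here. */
    static void advance_hydro(State *s)     { (void)s; /* explicit update          */ }
    static void advance_diffusion(State *s) { (void)s; /* implicit/iterative solve */ }
    static void advance_transport(State *s) { (void)s; /* sweep-based direct solve */ }

    int main(void)
    {
        State s = { 0.0, 1.0e-3 };
        const double t_end = 1.0;

        while (s.t < t_end) {              /* one pass per time step      */
            advance_hydro(&s);             /* "split" step 1              */
            advance_diffusion(&s);         /* "split" step 2              */
            advance_transport(&s);         /* "split" step 3, then repeat */
            s.t += s.dt;
        }
        printf("finished at t = %g\n", s.t);
        return 0;
    }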

The calculations are of various sizes, with some treating millions of particles or spatial zones (cells), with an expected requirement for many applications to get to the point of using upwards of a billion or more particles/cells. The equations are typically solved by spatial discretization. Discretization over energy and/or angle, in addition, can increase the data space size by 10 to 1000 times. In the final analysis, thousands of variables will be associated with each zone. Monte Carlo algorithms will treat millions to billions of particles distributed throughout the problem domain. The parallelization strategy for many codes is based upon decomposition into spatial domains. For some applications, codes will use decomposition over angular or energy domains, as well.

Currently, almost all codes use the standard message passing interface (MPI) for parallel communication, even between processes running on the same SMP. In addition, some applications utilize OpenMP for SMP parallelism. The efficiency of OpenMP SMP parallelism depends highly on the underlying compiler implementation (i.e., the algorithms are highly sensitive to OpenMP overheads). Also, it is possible in the future that different physics models within the same application might use different communication models. For example, an MPI-only main program may call a module that uses the same number of MPI processes, but also uses threads (either explicitly or through OpenMP). In the ideal system, these models should interoperate as seamlessly as possible. Mixing such models mandates thread-safe MPI libraries. Alternative strategies may involve calling MPI from multiple threads with the expectation of increased parallelism in the communications; such use implies multithreaded MPI implementations as well.
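As an illustration of the mixed model, the following minimal C sketch requests full thread support from MPI and then spawns OpenMP threads inside each MPI task. It assumes an MPI-2 implementation that provides MPI_Init_thread and an OpenMP-capable compiler; it is a generic example, not an excerpt from an M&IC application.

    /* Minimal MPI + OpenMP hybrid sketch; requires a thread-safe MPI library. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        /* Ask for full multi-threaded MPI; the library reports what it gives. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (provided < MPI_THREAD_MULTIPLE && rank == 0)
            printf("warning: MPI provides thread level %d only\n", provided);

        /* SMP parallelism inside each MPI task via OpenMP threads. */
        #pragma omp parallel
        {
            #pragma omp critical
            printf("rank %d, thread %d of %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }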

Current codes are based on a single program multiple data (SPMD) approach to parallel computing. However, director/worker constructs are often used. Typically, data are decomposed and distributed across the system and the same execution image is started on all MPI processes and/or threads. Exchanges of remote data occur for the most part at regular points in the execution, and all processes/threads participate (or just pretend to) in each such exchange. Data are actually exchanged with individual MPI send-receive requests, but the exchange as a whole can be thought of as a "some-to-some" operation with the actual data transfer needs determined from the decomposition. Weak synchronization naturally occurs in this case because of these exchanges, while stronger synchronization occurs because of global operations, such as reductions and broadcasts (e.g., MPI_ALLREDUCE), which are critical parts of iterative methods. It is quite possible that future applications will use functional parallelism, but mostly in conjunction with the SPMD model. Parallel input-output (I/O) and visualization are areas that may use such an approach with functional parallelism at a high level to separate them from the physics simulation, yet maintain the SPMD parallelism within each subset. There is some interest in having visualization tools dynamically attach to running codes and then detach for interactive interrogation of simulation progress. Such mixed approaches are also under consideration for some physics models.
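The exchange-then-reduce structure described above might look roughly like the following C sketch for a one-dimensional periodic decomposition; the data and neighbor pattern are invented for illustration, not drawn from an actual application.

    /* SPMD sketch: 1-D domain decomposition with a boundary exchange
     * ("some-to-some") followed by a global reduction. Illustrative only. */
    #include <mpi.h>
    #include <stdio.h>
    #define N 1000

    int main(int argc, char **argv)
    {
        int rank, size;
        double u[N], left = 0.0, right = 0.0, local, global;
        MPI_Request req[4];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < N; i++) u[i] = rank;   /* fake local data */

        int lo = (rank - 1 + size) % size;         /* periodic neighbors */
        int hi = (rank + 1) % size;

        /* Post receives first, then sends; the pattern is determined by the
         * decomposition, and every task participates in the exchange. */
        MPI_Irecv(&left,  1, MPI_DOUBLE, lo, 0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&right, 1, MPI_DOUBLE, hi, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&u[0],   1, MPI_DOUBLE, lo, 1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&u[N-1], 1, MPI_DOUBLE, hi, 0, MPI_COMM_WORLD, &req[3]);
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

        /* Global reduction, e.g. for an iterative-solver convergence test. */
        local = u[0] + left + right;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0) printf("global sum = %g\n", global);
        MPI_Finalize();
        return 0;
    }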


Many applications use unstructured spatial meshes. Even codes with regular structured meshes may have unstructured data if they use cell-by-cell continuous adaptive mesh refinement (AMR). In an unstructured mesh, the neighbor of zone (i) is not zone (i+1), and one must use indirection or data pointers to define connectivity. Indirection has been implemented in several codes through libraries of gather-scatter functions that handle both on-processor and remote communication to access that neighbor information. The communication support is currently built on top of MPI and/or shared memory to get that neighbor information. These scatter-gather libraries are two-phased for good efficiency. In phase one, the gather-scatter pattern is presented and all local memory, remote memory, and communications structures are initialized. Then in phase two, the actual requests for data are made, usually many, many times. Thus, the patterns are reused extensively. Also, several patterns will coexist simultaneously during a time step for various data. Techniques like AMR and reconnecting meshes can lead to pattern changes at fixed points in time, possibly every cycle or maybe only after several cycles.
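A toy version of such a two-phase gather library is sketched below; the pattern structure and function names are hypothetical, and only the on-node case is shown (a real library would also initialize the MPI communication structures in phase one).

    /* Hypothetical two-phase gather: phase 1 builds a reusable pattern from an
     * indirection (neighbor) list; phase 2 applies it many times per time step. */
    #include <stdlib.h>
    #include <stdio.h>

    typedef struct { int n; int *index; } GatherPattern;

    /* Phase 1: analyze the indirection list once and cache it. */
    GatherPattern *pattern_setup(const int *neighbor_of, int n)
    {
        GatherPattern *p = malloc(sizeof *p);
        p->n = n;
        p->index = malloc(n * sizeof *p->index);
        for (int i = 0; i < n; i++) p->index[i] = neighbor_of[i];
        return p;
    }

    /* Phase 2: reuse the cached pattern to gather neighbor data. */
    void pattern_gather(const GatherPattern *p, const double *src, double *dst)
    {
        for (int i = 0; i < p->n; i++) dst[i] = src[p->index[i]];
    }

    int main(void)
    {
        int neighbor_of[4] = { 1, 2, 3, 0 };       /* unstructured connectivity */
        double zone[4] = { 10, 20, 30, 40 }, nbr[4];

        GatherPattern *p = pattern_setup(neighbor_of, 4);
        pattern_gather(p, zone, nbr);              /* called many times in practice */
        printf("zone 0's neighbor value = %g\n", nbr[0]);

        free(p->index); free(p);
        return 0;
    }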

Memory for arrays and/or data structures is typically allocated dynamically, avoiding the need to recompile with changed parameters for each simulation size. This allocation requires compilers, debuggers, and other tools that recognize and support such features as dynamic arrays and data structures, as well as memory allocation intrinsics and pointers in the various languages.

Many of the physics modules will have low compute-communications ratios. It is not always possible to hide latency through non-blocking asynchronous communication, as the data are usually needed to proceed with the calculation. Thus, a low-latency communications system is crucial.

Many of the physics models are memory intensive, and will perform only about 1 FLOP per load from memory. Thus, performance of the memory sub-system is crucial, as are compilers that optimize in terms of cache blocking, loop unrolling-rolling, loop nest analysis, etc. Many codes have loops over all points in an entire spatial decomposition domain. This coding style is preferred by many for ease of implementation and readability of the physics and algorithms. Although recognized as problematic, effective automatic optimization is preferred, where possible.
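For illustration only, the kind of cache blocking (tiling) referred to above, whether applied by the compiler or by hand, transforms a whole-domain loop nest as in the following sketch; the array and tile sizes are arbitrary.

    /* Generic illustration of loop tiling for locality (here a transpose,
     * where the strided accesses of the flat form thrash the cache). */
    #include <stdio.h>
    #define NX 1024
    #define NY 1024
    #define BLK 64                      /* assumed cache-friendly tile size */

    static double a[NX][NY], b[NX][NY];

    int main(void)
    {
        /* Flat whole-domain loop, as commonly written for readability. */
        for (int i = 0; i < NX; i++)
            for (int j = 0; j < NY; j++)
                a[i][j] = b[j][i];

        /* Blocked (tiled) form: each BLK x BLK tile stays cache-resident. */
        for (int ib = 0; ib < NX; ib += BLK)
            for (int jb = 0; jb < NY; jb += BLK)
                for (int i = ib; i < ib + BLK; i++)
                    for (int j = jb; j < jb + BLK; j++)
                        a[i][j] = b[j][i];

        printf("a[0][0] = %g\n", a[0][0]);
        return 0;
    }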

The multiple physics models embedded in a large application may have dramatically varying communication characteristics, i.e., one model may be bandwidth-sensitive, while another may be latency-sensitive. Even the communications characteristics of a single physics model may vary greatly during the course of a calculation as the spatial mesh evolves or different physical regimes are reached and the modeling requirements change. In the ideal system, the communications system should handle this disparity without requiring user tuning or intervention.

Although static domain decomposition is used for load balancing as much as possible, there are also definite needs for dynamic load balancing, in which the work is moved from one processor to another. One obvious example is for codes using AMR methods, where additional cells may be added or removed during the execution wherever necessary in the mesh. It is also expected that different physical processes will be regionally constrained and, as such, will lead to load imbalances that can change with time as different processes become "active" or more difficult to model. Any such dynamic load balancing is expected to be accomplished through associated data migration explicitly done by the application itself. This re-balancing might occur inside a time step, every few time steps, or infrequently, depending on the nature of the problem being run. In the future, code execution may also spawn and/or delete processes to account for the increase and/or decrease in the total amount of work the code is doing at that time.


1.4 M&IC Scientific Software Development Environment

The following are some of the major characteristics of the software development environment in an ideal scenario.

A high degree of code portability and longevity is a major objective. Many M&IC codes must execute at multiple sites. Development, testing and validation of 3D, full-physics, full system applications requires four to six years. The productive lifespan of these codes is at least ten years. Thus, these applications must span not only today's platforms but any possible future system. Codes will be developed in standards-conforming languages, so non-standard compiler features are of little interest unless they can be made transparent. The use of Cray Pointers in Fortran is an exception to our reliance on standard features. We also will not take advantage of any idiosyncratic features of optimization, unless they can be hidden from the codes (e.g., in a standard library). Non-standard "hand tuning" of codes for specific platforms is antithetical to this concept.

A high-performance, low-latency MPI environment that is robust and scalable is crucial to us. Today, applications are utilizing all the features of MPI 1.1 functionality. Many features of MPI-2 functionality are also in current use. Hence, a full, robust and efficient implementation of MPI-2 is of tremendous interest. A POSIX-compliant thread environment is also crucial and a Fortran95-threads interface is also important. All libraries need to be thread-safe. MPI should be multithreaded as well as thread-safe. We should not have to tune the MPI-runtime environment for different codes and different problem sizes. In our estimation, bandwidth of 0.2 bytes/second/peak OP/second/SMP and an end-to-end MPI ping-pong latency of 10 microseconds or better will provide the desired performance. Because we are talking about systems with tens of thousands of processors, it is vitally important that the MPI implementation scale to the full size of the system. This scaling is both in terms of efficiency (particularly of the MPI_ALLREDUCE functionality) and in terms of the efficient use of buffer memory. M&IC applications are carefully programmed so that MPI RECEIVE operations are posted before the corresponding SEND operation. This allows for minimal (and hence scalable) MPI buffer space allocations.
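The receive-before-send discipline mentioned above can be illustrated with the following generic C sketch; the pairing of ranks and the barrier used to enforce ordering are for demonstration only.

    /* Sketch of the receive-before-send discipline: every task posts its
     * MPI_Irecv before any matching send is issued, so messages can land
     * directly in user buffers rather than in MPI system buffers. */
    #include <mpi.h>
    #include <stdio.h>
    #define MSG 4096

    int main(int argc, char **argv)
    {
        int rank, size;
        double inbuf[MSG], outbuf[MSG];
        MPI_Request rreq, sreq;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < MSG; i++) outbuf[i] = rank;

        int partner = rank ^ 1;                 /* pair up even/odd ranks */
        if (partner < size) {
            /* 1. Post the receive first ... */
            MPI_Irecv(inbuf, MSG, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &rreq);
            /* 2. ... synchronize so all receives are posted ... */
            MPI_Barrier(MPI_COMM_WORLD);
            /* 3. ... then send; no unexpected-message buffering is needed. */
            MPI_Isend(outbuf, MSG, MPI_DOUBLE, partner, 0, MPI_COMM_WORLD, &sreq);
            MPI_Wait(&rreq, MPI_STATUS_IGNORE);
            MPI_Wait(&sreq, MPI_STATUS_IGNORE);
            if (rank == 0) printf("received %g from rank %d\n", inbuf[0], partner);
        } else {
            MPI_Barrier(MPI_COMM_WORLD);        /* unpaired last rank (odd size) */
        }

        MPI_Finalize();
        return 0;
    }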

M&IC applications require the ability for each MPI task to access up to 2.0 GiB of physical memory. The large memory sizes of MPI tasks require that nodes be configured with 2–4 GiB of real memory per processor.

We will expect the compilers to do the vast majority of code optimization through simple easy-to-use compiler switches (e.g., -On). Also, we will expect the compilers to have options to range check arrays under debug mode, as well as a way to trap underflow, overflow, divide by zero, etc. Parallelization through the OpenMP specifications is of particular interest and is expected for Fortran95, C, and C++. OpenMP parallelization must function correctly in programs that also use MPI. OpenMP Version 2.0 support for Fortran95, Version 1.0 for C/C++ is highly favored, while automatic parallelization is of some interest, if it is efficient and does not drive compile times to unreasonable lengths. Any information the compiler can provide about the optimizations it performed is useful. Compiler parallelism has to work in conjunction with MPI. All compilers must be fully ANSI-compliant.
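Exception trapping is usually exposed through compiler switches; on a Linux/glibc system the equivalent run-time control can be sketched as follows (feenableexcept is a GNU extension, and this is an illustrative example rather than a required interface).

    /* One Linux/glibc way to trap floating-point exceptions at run time;
     * compilers typically expose the same capability through switches.
     * Compile with something like "cc -O2 -o fptrap fptrap.c -lm". */
    #define _GNU_SOURCE
    #include <fenv.h>
    #include <stdio.h>

    int main(void)
    {
        /* Convert overflow, divide-by-zero, and invalid ops into SIGFPE. */
        feenableexcept(FE_OVERFLOW | FE_DIVBYZERO | FE_INVALID);

        volatile double x = 1.0, y = 0.0;
        printf("about to divide...\n");
        printf("%g\n", x / y);          /* raises SIGFPE instead of printing inf */
        return 0;
    }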


The availability of standard, platform-independent tools is necessary for a portable and powerful development environment. Examples of these tools are GNU software (especially GNU make, but others as well), TotalView debugger (the current debugger on all M&IC platforms), dependency builders (Fortran USE & INCLUDE as well as #include), preprocessors (CPP, M4), source analyzers (lint, flint, etc.), hardware counter libraries and communications profilers (VAMPIR, etc.). Tools that work with source code should fully support the most current language standards. A standard API to give debuggers and performance analyzers access to the state of a running code will allow us to develop our own tools or to use a variety of tools developed by others. The Dynamic Probe Class Library (DPCL) is an emerging public domain API that meets this need. These performance and debugging tools must not require privileged access modes, such as root user, nor compromise the security of the runtime environment.

We must have parallel debuggers that allow us to debug parallel applications within an SMP or node and that permit parallel application debugging utilizing multiple nodes or SMPs. This includes MPI-only as well as mixed MPI + threads and/or OpenMP codes. Ideally, the debugger will allow effective debugging of jobs using every CPU on the system. Practical use of large fractions of the machine by an application under the control of the debugger requires that the debugger be highly integrated into the system initiated parallel checkpoint/restart and Gang scheduling mechanisms. Some specific features of interest include the following:

The capability to visually examine slices and subsets of multidimensional arrays is a feature that has proven useful. The debugger should allow complex selections for data display to be expressible with Fortran95 and C language constructs and features. It should support applications written in a mixture of the baseline languages (Fortran95, C, and C++), support Cray-style pointers in Fortran77, and be able to dive on templated functions and handle complex template evaluation capabilities in C++. It should be able to debug compiler-optimized code since problems sometimes go away at debug levels, although less symbolic and contextual information will be available to the debugger at higher levels of optimization. Our build environment involves accessing source code from NFS-mounted file systems with likely compiling and linking of the executable in alternate directories. This process may have implications, depending on how the compiler tells the debugger to find the source code. The debugger currently used in the Tri-Laboratory ASCI PSE CDE is the TotalView debugger from Etnus.

Because most M&IC codes are memory-access intensive, optimizing the spatial and temporal locality of memory accesses is crucial for all levels of the memory hierarchy. To tune memory distribution in a NUMA machine, it is necessary to be able to specify where memory is allocated. To optimally use memory and to reuse data in cache, it is also necessary to cause threads to execute on CPUs that quickly access particular NUMA regions and particular caches. Expressing such affinities should be an unprivileged operation. Threads generated by a parallelizing compiler (OpenMP or otherwise) should be aware of memory-thread affinity issues as well.
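One unprivileged Linux mechanism for expressing such affinities is sched_setaffinity; the following minimal sketch (current Linux/glibc interface; NUMA memory placement would require an additional interface such as libnuma and is not shown) pins the calling process to a single CPU.

    /* Minimal Linux sketch of pinning the calling process to one CPU with
     * sched_setaffinity (an unprivileged call for one's own process). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);                       /* bind to CPU 0 */

        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("pid %d now restricted to CPU 0\n", (int)getpid());
        return 0;
    }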


Another ramification of the large memory footprint of M&IC codes is that they require large delivered memory bandwidth as seen by the application's actual memory reference patterns. This requirement stresses the memory subsystem even when the applications display regular memory reference patterns and have a high degree of cache utilization and cache line utilization for application memory reference payload delivery. In addition, many of these memory-access intensive codes have random memory access patterns (due to indirect addressing or complex C++ structure and method dereferencing brought about by implementing discretization of spatial variables on block structured or unstructured grids) and therefore access thousands to millions of standard UNIX 4 KB VM pages every time step. For these codes, "large page support" in the operating system is required so that the microprocessor's virtual-to-real memory translation functionality and caches are used efficiently. Hardware TLBs have a limited number of entries (caching additional entries in L1 cache helps but does not solve the problem), and a 256 MiB page size, say, would significantly reduce the number of TLB entries required for the VM-to-real-memory translations of large-memory M&IC codes. Because TLB misses (that are not cached in L1) are very expensive, this feature can significantly enhance M&IC application performance.

Many of our codes could benefit from a high-performance, standards-conforming, parallel I/O library such as MPI-I/O. Many M&IC applications development teams now consider the ability to do MPI-2 dynamic tasking an essential item for future M&IC code development efforts. In addition, low latency GET/PUT operations for transmission of single cache lines are viewed as essential for domain overloading on a single SMP or node. However, many implementations of the MPI-2 MPI_GET/MPI_PUT mechanisms do not have lower latency than MPI_SEND/MPI_RECV but do allow for multiple outstanding MPI_GET/MPI_PUT operations to be active at a time. This approach, although appealing to MPI-2 library developers, puts the onus of latency hiding on the applications developer, who would rather think about physics issues. Future M&IC applications require a very low latency (as close to the SMP memory copy hardware latency as possible) for GET/PUT operations.
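A minimal MPI-I/O sketch of the style of parallel I/O mentioned above is shown below: each task writes its block of one shared file at a rank-dependent offset using a collective call. The file name and sizes are arbitrary.

    /* Minimal MPI-I/O sketch: every task writes its own block of one shared
     * file at an offset computed from its rank (collective write). */
    #include <mpi.h>
    #include <stdio.h>
    #define COUNT 1024

    int main(int argc, char **argv)
    {
        int rank;
        double buf[COUNT];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < COUNT; i++) buf[i] = rank + 0.001 * i;

        MPI_File_open(MPI_COMM_WORLD, "dump.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Non-overlapping offsets: rank r owns bytes [r*COUNT*8, (r+1)*COUNT*8). */
        MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        if (rank == 0) printf("wrote dump.dat\n");
        MPI_Finalize();
        return 0;
    }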

It is advantageous to have support for translating big-endian, little-endian, and Cray Research PVP data representations to the system's native data forms. Especially useful will be automatic I/O filters on a file-by-file basis that will do this at read-write time.

Effectively tuning an application's performance requires detailed information on its timing and computation activities. On an SMP or node, a timer that is consistent between threads or tasks running on different CPUs in that same SMP or node is useful. Frequent use of the timer implies high-resolution (10 microseconds or better) and low overhead. In addition, other hardware performance monitoring information such as the number of cache misses, TLB misses, floating-point operations, etc., can be very helpful. All modern microprocessors contain counters that gather this kind of information. The data in these counters can be made available separately for each thread or process through tools or programming libraries accessible to the user. For portability, our tools are targeting the PAPI library for hardware counters. To limit instrumentation overhead, a version of these tools that supports multiplexing of hardware counters and sampling of instructions in the pipeline is easier to use. Note that this facility requires that the operating system context switch these counters at process or heavyweight (OS scheduled) thread level and that the POSIX or OpenMP runtime libraries context switch the counters on lightweight (library scheduled) thread level. Furthermore, these counters must be available to users that do not have privileged access, such as the root user. Per-thread OS statistics must be available to all users via a command-line utility as well as a system call. One example of such a feature is the kstat facility: a general-purpose mechanism for providing kernel statistics to users. Both hardware counter and OS statistics must provide virtualized information, so that users can make the correct attribution of performance data to application behaviors.
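As an illustration, a code section can be instrumented with PAPI's classic high-level counter interface roughly as follows (the exact API has evolved across PAPI versions, and event availability depends on the processor).

    /* Sketch using PAPI's (classic) high-level counter interface to measure a
     * code section; link with -lpapi. */
    #include <papi.h>
    #include <stdio.h>
    #define N 1000000

    int main(void)
    {
        int events[2] = { PAPI_FP_OPS, PAPI_L1_DCM };
        long_long counts[2];
        static double a[N], b[N];

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
            return 1;

        PAPI_start_counters(events, 2);          /* begin counting            */
        for (int i = 0; i < N; i++)              /* instrumented code section */
            a[i] = 2.5 * b[i] + 1.0;
        PAPI_stop_counters(counts, 2);           /* read and stop             */

        printf("FP ops: %lld   L1 D-cache misses: %lld\n",
               (long long)counts[0], (long long)counts[1]);
        return 0;
    }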

We need to have early access to new versions of system and development software, as well as other supplied software. Software releases of the various products should be synchronized with operating system releases to ensure compatibility and interoperability.


1.5 M&IC Applications Execution Environment

The following are some major characteristics of the M&IC ultra-scale applications execution environment.

It is crucial to be able to run a single parallel job on the full system using all resources available for a week or more at a time. This is called a "full-system run." Any form of interruption should be minimized. The capability for the system and application to "fail gracefully" and then recover quickly and easily is an extremely important issue for such calculations. We also expect to be running a large number of jobs on thousands of processors each for hundreds of hours. These will require significant system resources, but not the entire system. The capability of the system to "fail gracefully," so that a failure in one section of the system will only affect jobs running on that specific section, is important. From the applications perspective, the probability of failure should be proportional to the fraction of the system utilized. A failed section should be repairable without bringing down the entire system.

A single simulation may run over a period of months as separate restarted jobs in increments of days running on varying numbers of processors with different physics models activated. Output files produced by a code on one set of processors need to be efficiently accessible by another set of processors, or possibly even by a different number of processors, to restart the simulation. Thus an efficient cluster wide file system is essential. Ideally, file input and output between runs should be insensitive to the number of processors before and after a restart. It should be possible to restart a job across a larger or smaller number of processors than originally used, with only a slight difference in performance visible.

M&IC applications write many restart and visualization dumps during the course of a run. A single restart dump will be about the same size as the job's memory image, while visualization dumps will be perhaps 1–10% of that size. Restart dumps will typically be scheduled based on wall clock periods, while visualization dumps are scheduled entirely on the basis of internal physics simulation time. We usually create visualization dumps more frequently than restart dumps. System reliability will have a direct effect on the frequency of restart dumps; the less reliable the system is, the more frequently restart dumps will be made and the more sensitive we will be to I/O performance. We have observed on previous generation M&IC platforms that restart dumps comprise over 75% of the data written to disk. Most of this I/O is wasted in the sense that restart dumps are overwritten as the simulation progresses. However, this I/O must be done so that the simulation is not lost to a platform failure. This leads us to the notion that cluster wide file system (CWFS) I/O can be segregated into two portions: productive I/O and defensive I/O. Productive I/O is the writing of data that the user needs to do science (visualization dumps, traces of key physics variables over time, etc.). Defensive I/O is done to manage a large simulation run over a period of time much larger than the platform Mean Time Between Failure (MTBF). Thus, one would like to minimize the amount of resources devoted to defensive I/O and computation lost due to platform failure. This can be accomplished by procuring resources with a high MTBF.

Operationally, applications teams push the large restart and visualization dumps (already described) off to HPSS tertiary storage within the wall clock time between making these dumps. The disk space mentioned elsewhere in this document is insufficient to handle M&IC applications long-term storage needs. HPSS is the archive storage system of M&IC, and compatibility with it is needed. Thus, a highly usable mechanism is required for the parallel high-speed transport of one to tens of terabytes of data from the CWFS to HPSS.

We need a resource manager–job scheduler that deals with all aspects of the system's resources, not with just the processors and the time allocations. Factors that should be considered include processors, processes, memory, interconnects, disks, and visualization engines. It will be essential for this resource manager-scheduler to handle both batch and interactive execution of both serial and parallel programs (MPI and threaded) from a single processor to the full cluster. The manager-scheduler will provide a way to implement policies on selecting and executing various problems (problem size, problem runtime, timeslots, preemption, users' allocated share of machine, etc.). Also, a method will be provided for users to connect to executing batch jobs to query or change problem status or parameters. The tool(s) will schedule jobs to provide for process-to-processor affinity. We are currently using LLNL's Distributed Production Control System (DPCS) on the ASCI Blue-Pacific and White systems as well as existing Linux clusters and other M&IC resources.

Our codes and users will benefit from a robust, globally visible, high-performance, parallel file system. It is essential that all file systems have large file (64-bit file pointer) offsets. A 32-bit file pointer is clearly insufficient.
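For illustration, a C application on Linux of this era typically obtains 64-bit file offsets through standard large-file support as sketched below (glibc interfaces; nothing LLNL-specific is assumed, and the file name is arbitrary).

    /* Sketch of large-file (64-bit offset) access from C on Linux: defining
     * _FILE_OFFSET_BITS=64 before any include makes off_t 64 bits wide, so
     * files beyond 2 GiB can be addressed even from a 32-bit executable. */
    #define _FILE_OFFSET_BITS 64
    #define _LARGEFILE_SOURCE
    #include <stdio.h>
    #include <sys/types.h>

    int main(void)
    {
        FILE *fp = fopen("big.dat", "w+b");            /* arbitrary file name */
        if (!fp) { perror("fopen"); return 1; }

        off_t where = (off_t)3 * 1024 * 1024 * 1024;   /* 3 GiB offset */
        fseeko(fp, where, SEEK_SET);                   /* 64-bit-capable seek */
        fputc('x', fp);
        printf("wrote a byte at offset %lld\n", (long long)ftello(fp) - 1);

        fclose(fp);
        return 0;
    }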


1.6 M&IC Operational Environment

LC operates our production systems 24 hours per day, 7 days per week, including holidays. The prime shift is from 8 a.m. to 5 p.m. Pacific Time. LLNL local users access these systems via the 1 Gigabit Ethernet local-area network (LAN). MCR will operate in this environment.

The prime shift period will be devoted primarily to interactive applications development; interactive visualization; relatively short time-limit, large CPU count (e.g., over half the system CPUs), high priority production runs; and extremely long-running, smaller CPU count (e.g., 64-512), lower priority production runs. Night shifts, as well as the weekend and holiday periods, will be devoted to extremely long-running jobs. Checkpointing and restarting jobs will take place as necessary to schedule this heterogeneous mix of jobs under dynamic load and job priorities on MCR. Because the workload is so varied and the demands for CPU time oversubscribe the machine by several factors, resource scheduling is an essential production requirement. In addition to system-initiated checkpoint/restart, M&IC applications have the ability to do application-based restart dumps. These interim dumps, as well as visualization output, will be stored on HPSS-based archival systems or sent to the VIEWS visualization corridors via the system-area network (SAN) and external "Jumbo Frame" 1 Gigabit Ethernet interfaces. Depending upon system protocol support, IP version 4 and lightweight memory-to-memory protocol (e.g., iWARP) traffic will be carried in this environment.

A single point of system administration will allow the configuration of the entire system from a common server. The single server will control all aspects of system administration in aggregate. Examples of system administration functions include modifying configuration files, editing mail lists, software upgrades and patch (bug fix) installs, kernel parameter changes, file system-disk manipulation, reboots, user account activities (adding, removing, modifying), performance analysis, hardware changes, and network changes. A hardware and software configuration management system that profiles the system hardware and software configuration as a function of time and keeps track of who makes changes is essential.

The ability to dynamically monitor system functioning in real time and allow system administrators to quickly diagnose hardware, software (e.g., job scheduler) and workload problems and take corrective action is also essential. These monitoring tools must be fast and scalable and must display data in a hierarchical schema. The overhead of system monitoring and control must be low so as not to degrade the scalability (performance) of large jobs.

LLNL's DPCS will manage the queue of pending batch jobs, deciding when to initiate pending jobs so as to achieve LLNL management objectives. DPCS is a mature system that has been managing LLNL's supercomputer workloads since 1992. DPCS requires an underlying resource manager to allocate nodes in the cluster for jobs being initiated, initiate the required tasks, and establish the switch interconnects between these tasks. We intend to utilize the Quadrics Resource Management System (RMS) in this role initially. Our intent is to replace RMS with a more scalable, Open Source resource manager presently under development at LLNL: Simple Linux Utility for Resource Management (SLURM). SLURM will support initiating and managing MPI jobs utilizing QsNet and should be ready for deployment in late 2002.

The operating environment will conform to DOE security requirements for unclassified systems. Software modifications will be made in a timely manner to meet changes to those security requirements.


1.7 Utilization of Existing Facilities

An existing facility, the main computer floor of B439, will be used for siting the MCR system. This facility has approximately 8,000–9,000 ft² of floor space and 1.9 MW of power available for the computing system, its peripherals, and the associated cooling.


2 MCR Strategy and Architecture

This section describes the overall MCR hardware and software strategy and architecture, describes the Linux development and support strategy, and outlines a plan for the MCR build.

2.1 LC Hardware and Software Strategy and MCR Architecture

The University's scalable systems strategy (known as the Livermore Model, Figure 2.1-1) is to have a unified software environment available on all cluster systems we put into production. The main purpose of this strategy is to enable highly complex scientific simulation applications to be portable across multiple platforms at any given point in time and to provide a stable target environment over multiple generations of platforms. This strategy has been successful in providing a stable target applications environment since about 1992, when the Meiko CS-2 MPP was introduced at LLNL.

Figure 2.1-1. The Livermore Model provides a stable target environment for scientific simulation codes by abstracting the parallelism and I/O model. Parallelism is MPI tasks exchanging data over a high-speed, low-latency communication mechanism and a small number of OpenMP threads per MPI task. This model includes an OS on every node and three types of I/O. It also includes C, C++, and Fortran compilers, TotalView debugger and node hardware, and MPI/OpenMP performance analysis tools.


The basic idea of the Livermore Model is to abstract the parallelism and I/O model of the computing environment. At a high enough level of standards-based abstraction, the computing environment evolves fairly slowly, and most machines of a given era are roughly equivalent. The parallelism abstraction is based on SMPs interconnected with a high-speed, low-latency interconnect. Each SMP has a hierarchical shared memory: processor registers; on-chip and off-chip caches; (possibly NUMA) memory. Applications must thus utilize MPI to communicate between the distributed-memory elements. In addition, each MPI task can utilize compiler generated OpenMP threads for on SMP parallelism. Each SMP or node is assumed to have a local, full functioning POSIX-compliant operating system. In addition to the local disk for OS swapping, applications use highly scalable I/O by writing to the local disk. The drawback of this flavor of I/O is that it is local: data must be migrated to the local disk before application execution and retrieved after execution. Thus, local I/O is predominantly used for intermediate, temporary results. A second flavor of I/O that is heavily utilized is global serial (NFS) I/O. This has the advantage of being global but the disadvantage that the performance only scales to the limit of a single NFS server (currently 20-100 MB/s). This type of I/O can be utilized for home directory, application source, and binaries, but not parallel I/O. The third flavor of I/O is global parallel I/O. The advantage of this is that the I/O rate delivered to an application tends to scale up as the number of writers is increased (up to a point, and then performance either stays constant or begins to decrease). This type of I/O is utilized for parallel reading of the input (or restart dump) and for parallel writing of science data and restart dumps. The disadvantage of global parallel I/O is that good parallel file systems are hard to come by and Open Source parallel file systems are even scarcer.

In addition to the programming model abstraction, the Livermore Model assumes parallel cluster management tools; resource management and control with near-real-time accounting and job scheduling for allocation management; C, C++, and Fortran compilers; a scalable MPI/OpenMP GUI debugger; and performance analysis tools.

Ideally, the I/O subsystem will provide for a scalable, cluster-wide file system that provides fault-tolerant services to MCR with no single point of failure. The I/O subsystem will have a disk capacity of at least the baseline requirement and will support RAID level 5. Due to the amount of storage required for MCR, the architecture should provide an I/O subsystem with large MTBF characteristics. The system will have sufficient bandwidth to read and write in parallel large volumes of data, in two distinctly different usage patterns. Very large single files accessed by a large number of MPI tasks, one per CPU, in a non-overlapping fashion is one usage pattern. The second is one proportionally smaller file per MPI task, one per CPU, with all files in a single directory. This large-number-of-files situation requires a fast file-creation rate when a large number of MPI tasks open files from the same directory approximately contemporaneously. The I/O subsystem will also support a scalable parallel file system accessible from every node in the system. The ideal I/O subsystem will also be capable of providing services to requests beyond the MCR cluster, indicated in Figure 2.1-2 by the yellow switch bar that extends outside the cluster-wide box. As the parallel file system is a critical system resource, it will be highly reliable. For example, the Lustre Object File System is expected to provide the desired file-system characteristics and will leverage ASCI PathForward investments in Open Source software.

Figure 2.1-2. MCR SAN architecture. The MCR system architecture includes a clustered I/O model, local node disks, dedicated login nodes, dedicated visualization nodes and compute nodes, and network attached RAID disk resources. Scalable user applications (either batch or interactive) will only run in the "parallel batch/interactive nodes" compute partition. The login nodes host interactive login sessions and code development activities, but not MPI jobs. The visualization nodes host parallel batch and interactive visualization jobs.

It is our intention to use the MCR system as the vehicle for providing the first SAN in the LLNL unclassified computing infrastructure (see Figure 2.1-2); therefore, it is essential that the I/O subsystem connections for MCR be based on 1 Gb/s Ethernet. Our strategy is to accrete other systems (e.g., IBM SP-based High Performance Storage System) to this SAN environment after MCR is integrated. With Lustre as the cluster wide file system, the aim is to have MCR provide scalable cluster wide parallel file system service for all OCF resources over time. Thus, the Lustre Lite file system integrated with MCR will provide scalable global parallel performance to the other resources mentioned by having these other resources connect to the 1 Gb/s Ethernet SAN and by adding more OST resources to Lustre Lite.


2.1.1 LC Linux Strategy for HPTC Scalable Clusters

Our strategy is to extend the Livermore Model from proprietary systems in use at LLNL (e.g., IBM SP with AIX and PSSP, Compaq Sierra with Tru64 UNIX) to commodity (i.e., IA-32) nodes with Linux. The first implementation of this strategy was the Parallel Capacity Resource (PCR) procurement last year. That procurement produced two clusters (one with 128 dual-P4 nodes and one with 88 dual-P4 nodes) that are now migrating into production usage in the Secure Computing Facility (SCF) for use by the ASCI ongoing computing element.

Over the past 5 years, the Open Source community development model, popularized by the GNU project and the Linux OS development effort, has shown remarkable capability to deliver freely available software that satisfies a broad range of computing requirements in an astounding range of computing environments: from desktops to high-availability configurations and embedded systems to teraFLOP/s clusters. These efforts have had a significant impact on the high-performance technical computing (HPTC) landscape as witnessed by the fact that all major computer system manufacturers now offer Linux solutions. In the HPTC environment, the Open Source movement has created an environment in which multiple organizations can contribute software development and enhancements to cluster solutions. These development efforts have reached a critical mass and are now producing multiple cluster offerings that are competitive with other vendor proprietary solutions. In addition, the price performance of these solutions, when based on commodity hardware components, is extremely attractive for HPTC sites.

It is for these reasons that the University launched exploratory Linux efforts over three years ago. During this time, the power of the Open Source development model became even more persuasive. The benefits of Open Source for HPTC sites include:

LC is actively pursuing an Open Source development model to leverage these astounding benefits in a "Generally Available" or Production computing resource. To this end, our strategy is to focus on high-performance, scalable cluster computing environments with as much Open Source software as possible. The deployment strategy for Open Source technology is to start with small clusters and increase the number and size of clusters based on Open Source technology in Production over time. LC has demonstrated that this is a viable approach through the successful development of the PCR, a Linux-based set of clusters first deployed at the end of FY2001. With PCR, we used this strategy to fill the "capacity computing for capability (MPI parallelized) jobs" niche. The first step of this strategy allowed LC the flexibility, while still meeting SSP programmatic objectives, to start with small-sized clusters and then expand the capacity environment by adding clusters and by increasing the size of the clusters deployed over time as the Open Source software technology matures and LC gets more proficient at deploying Linux clusters into Production. This procurement represents the next step in this direction with the development of a capacity resource based on an Open Source development model.

The successful migration of LC computing to the commodity hardware and Open Source software technology can only be completed with LC fully contributing to and fully engaging in the community development and support model. However, this is only one aspect of the overall strategy. Past large-system deployments at LC have all been accomplished via long-term relationships with computing system manufacturers in the form of partnerships. This is a key methodology that LC has utilized to build the University's existing unclassified M&IC environment.

The current MCR effort builds on that vendor partnership model and represents a solicitation for partnering between LC and a vendor partner in Open Source development efforts described below or those of interest to the partner. From market survey discussions in preparation for this RFP, it became clear that the cluster strategy espoused by LC and concretized by this procurement represents a "productizable" model that multiple potential vendor partners find highly attractive. Thus, these Open Source cluster efforts with a vendor partner should accomplish the following long-range goals:

There remain challenges in the HPTC Linux cluster environment. The most prominent is the lack of an Open Source, scalable, high-performance parallel file system. The second most prominent is the lack of effective cluster scheduling (including checkpoint/restart and Gang scheduling) technology. We intend to address these issues with future Open Source development in this partnership.


2.1.2 MCR Hardware Architecture

The above M&IC requirements for an extremely cost-effective, large-scale scientific computing platform lead LC to a large cluster architecture based on IA-32, Quadrics QsNet and BlueArc OST. IA-32 based nodes were selected after running a series of M&IC applications benchmarks that indicate that Pentium 4/Xeon Foster and Prestonia processors deliver the best performance and cost performance of any option available. As indicated in Section 1, M&IC applications are floating point arithmetic and memory bandwidth intensive. However, given that IA-32 bus bandwidth at 3.2 GB/s (1.7–1.9 GB/s delivered) in Foster/i860/RDRAM or Prestonia/(e7500|ServerWorks GrandChampion LE)/PC200 DDR SDRAM based motherboards is exhausted for most M&IC applications with one or two active processes or HyperThreads (a few can utilize more processors), dual nodes have been selected by M&IC users for the MCR. The SuperMicro P4DPG motherboard with Intel e7500 chipset and 2.4 GHz Prestonia processors was selected as the compute node.

M&IC applications are quite demanding on delivered interconnect latency and bandwidth. Thus, Quadrics QsNet Elan3 was selected because it delivers high bandwidth (>300 MB/s) and low latency (<5.0 µs) at commodity interconnect pricing. In addition, M&IC plans to run the MCR as a combined capability and capacity machine and therefore a reduced minimum bisection bandwidth can be tolerated. Thus, we have chosen a QsNet configuration with 128-port (128-way) switch elements constructed as a two-stage, fat-tree, federated switch configuration with twelve first-level and four second-level switches: a total of sixteen 128-way switches (see Figure 2.1-3). The first-level switches are configured with 96 ports for nodes and 32 ports for connecting to other QsNet switches (96D32U). The second-level switches are configured with 96 ports for first level switches and 32 unused ports (96D32U). This configuration costs less than a 64D64U level one switch configuration, at the expense of reduced bisection bandwidth.

Figure 2.1-3. Quadrics QsNet Elan3 for MCR is based on 16 128-port Elan3 switches configured as a two-stage, fat-tree federated switch. Each of the twelve first-level Elan3 switches is configured with 96 ports for nodes and 32 ports for the second-level switches (96D32U). The four second-level switches are configured with 96 ports for connecting the first-level switches and 32 unused ports (96D32U).
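A back-of-the-envelope check of the port arithmetic implied by this configuration (our own reading of the numbers above, not part of the switch specification):

    first-level node ports:   12 switches x 96 ports = 1,152 ports for nodes
    first-level uplinks:      12 switches x 32 ports =   384 links to the second level
    second-level down ports:   4 switches x 96 ports =   384 ports (matches the uplinks)
    tapering:                 384 uplinks serving 1,152 node ports, i.e., roughly 3:1
                              oversubscription (about one-third of full bisection bandwidth)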

This switch configuration leads to a natural scalable unit size of 96 nodes. Thus, the University envisions building the MCR in ten equal size scalable units of 96 nodes each. From Figure 2.1-2 it is clear that the vast majority of the MCR nodes are compute nodes. Hence, if we define a single scalable unit, to be built first, called the First Scalable Unit (FSU), containing the gateway nodes, login nodes, management nodes (not on the QsNet switch), the Meta Data Server (MDS) fail-over pair, and enough compute nodes to fill out the remainder of the first scalable unit, then the nine remaining scalable units are identical. These nine scalable units are called Compute Node Scalable Units (CNSU).

The Lustre Lite file system has several ramifications on the MCR hardware architecture. First, Lustre Lite has file system clients that provide global file system access on every compute node. This implies that the high-speed, low-latency communication mechanism for the file system is QsNet. Second, Lustre Lite requires an MDS fail-over pair. These MDS nodes must be configured to support Kimberlite Linux High Availability fail-over schemes. This means a dedicated TTY and 100BaseT Ethernet for heartbeat and a shared disk for metadata. The MDS needs about 2.5% of object storage target capacity (or about 2.5 TB of usable RAID5 disk) for the meta data, accessible at 256 MB/s of delivered 512B block raw read/write performance (or 500,000 512B I/Os per second) from either node. 2 Gb/s FibreChannel-attached RAID5 disks are ideal for this. Third, as Lustre Lite migrates to full Lustre, the Lustre file system will be extended beyond the MCR cluster boundary. An extensible, interoperable, commodity SAN technology is required for extending Lustre into this heterogeneous environment. We plan to integrate the Federated Gb Ethernet switch infrastructure in FY03. Until then, the Lustre Lite Object Storage Targets (OST) will be direct attach or small port count Gb Ethernet switch connected for Gateway node fail-over tolerance. Fourth, Lustre Lite requires OST to manage creating, locking, writing, reading, and deleting of objects (aggregations of disk blocks). The chosen SAN-based OST (NAS) for MCR is the BlueArc Silicon Server Si7500 with OST enhancements on 1 Gb/s Ethernet.


2.2 LC Software Environment for Linux Clusters

To execute the LC Linux software strategy, LC provides a complete software environment for Linux clusters called CHAOS (Clustered High Availability Operating System) that meets the programmatic and operational requirements described in the sections above. In addition, LC currently is actively involved in the Open Source development of Linux clustering tools, the SLURM resource manager, the DPCS metabatch scheduler and resource accounting system, and the Lustre CWFS. These components and the rest of the CHAOS environment will be installed on the MCR cluster after it is delivered to LLNL.

2.2.1 Clustered High Availability Operating System (CHAOS)

Livermore Computing produces and supports CHAOS, a cluster operating environment based on RedHat Linux. At the core of CHAOS is a RedHat "boxed set" distribution. Some components of that distribution are modified to meet the demands of high-performance computing and the LC center. A number of additional cluster-aware components are added on.

A CHAOS distribution contains a set of RPM (RedHat Package Manager) files, RPM lists for each type of node (compute, management, Gateway, and login), and a methodology for installing and administering clusters. It is produced internally and therefore supports a short list of hardware and software. This focus on the LC environment permits the Laboratory to support CHAOS with a small staff and to be agile in planning its content and direction.

In addition to the products of Open Source development described below and the RedHat boxed set distribution, CHAOS includes the following software components:


2.2.1.1 CHAOS Status

The initial CHAOS release currently runs on the PCR systems procured in FY2001. These are 26-, 88-, and 128-way clusters based on dual 1.7 GHz Pentium 4/Xeon nodes and the Quadrics Elan3 interconnect. That release is based on RedHat 7.1 and the RedHat 2.4.9 errata kernel series.

CHAOS 1.0, the first "official" CHAOS release after formal integration testing, was installed on the PCR clusters in the July 2002 time frame. It is based on RedHat 7.3 and the RedHat 2.4.18 kernel.

CHAOS was extended to operate on the MCR cluster as needed to account for the different motherboards and LinuxBIOS.

2.2.2 LLNL Cluster Tools

The following Open Source cluster tools are under active development and have been deployed on the PCR clusters:


2.2.3 Simple Linux Utility for Resource Management

SLURM is an Open Source, fault-tolerant, and highly scalable cluster management and job scheduling system for clusters containing thousands of nodes. SLURM is presently under design and development at LLNL.

The primary functions of SLURM are:

SLURM will utilize Kerberos V5 based authentication. The design also includes a scalable, general-purpose communications infrastructure. APIs will be available to support all functions for ease of integration with external schedulers. SLURM is written in the C language, with a GNU autoconf configuration engine. While initially written for Linux and Quadrics Elan3 interconnects, our design calls for ease of portability. We anticipate having a functional version of SLURM available in August 2002.


2.2.4 Distributed Production Control System (DPCS)

The DPCS is an Open Source product of the LC Center. The DPCS project began in 1991 when LC started to convert all of its production computer systems to UNIX platforms. DPCS was in production in October 1992 and has continued to develop since then.

The primary purpose of DPCS is to allocate computer resources, according to resource delivery goals, for LC's UNIX-based production computer systems. This is accomplished through a complex hierarchy of:

DPCS lets organizations control who uses their computing resources and how rapidly those resources are used. It also manages an underlying batch system that actually runs production jobs guided by DPCS policies.

Resource Delivery Goals: Defined by LC management in coordination with program managers. Program managers oversee a group's access to production computer system resources. These goals are "programmed" into DPCS, which then attempts to meet those goals. In other words, DPCS is used to assure that the right people, projects, and organizations get appropriate access to a center's computer resources. The DPCS contains two major subsystems that work together to deliver resources as required. The Resource Allocation & Control System (RAC) provides mechanisms for allocating machine resources among diverse users and groups, while the Production Workload Scheduler (PWS) provides mechanisms for automatically scheduling batch (production) jobs on the machines.

The RAC system is used to declare production hosts, to create and manage recharge accounts, resource allocation partitions, resource allocation pools, and user resource allocations within the resource allocation pools. Caveat Emptor: a recharge account should not be confused with a user login account. DPCS uses the term bank as a synonym for "resource allocation pool."

The PWS is a set of daemons and utilities that work with the RAC system to schedule batch jobs on the DPCS production hosts. It employs policy as instructed through the mechanisms provided to DPCS managers and resource managers to prioritize and schedule production appropriately.


2.2.5 Lustre Lite Cluster Wide File System

The University plans to utilize the Lustre Lite Cluster Wide File System on the MCR cluster. To that end, LC and Cluster File Systems, Inc., have been actively engaged in developing a "lite" version of Lustre to run on MCR in summer 2002.

Lustre Lite's impact on the MCR architecture is substantial. This solicitation includes two MDS nodes in a fail-over configuration and about 2.5 TB of disk for Lustre Lite meta data. In addition, the University will supply 64 OSTs with a combined I/O rate of 4.48 GB/s and 100 TB of RAID5 disk, attached to the cluster via GbEthernet. The specified configuration includes 32 gateway nodes, each with two GbEthernet adapters and one QsNet adapter, to gateway Lustre file system data (both meta data and objects) between the OST, MDS, and Lustre Clients (compute and login nodes).

Because Lustre Lite is under active development, one of the first activities envisioned for the MCR cluster during factory build is the scaling testing and performance tuning of Lustre Lite. To this end, the FSU should be built first (hence the name) and attached to the Government Furnished Equipment (GFE) OST infrastructure to facilitate this testing activity as early as possible.

2.2.6 The Livermore Computing Linux Cluster Support Model

LC Open Source developers (Cluster Tools, DPCS, SLURM, Lustre Lite) work closely with system administrators and users to resolve problems on production systems. For any given software package, there is a designated package owner who handles any support issues that arise in production. Depending on the nature of the package, owners may be the primary developer and fix bugs themselves, or they may be the liaison to an external support resource.

External support relationships are primarily developer-to-developer. In the case of RedHat, we have a full-time RedHat engineer on site who can work directly with Livermore systems and support people and act as the liaison to RedHat for everything in the RedHat boxed set. In the case of Quadrics, an ongoing Cooperative Research and Development Agreement (CRADA) and a support subcontract are leveraged to get bugs fixed in production.

2.2.7 Integration Testing

Each CHAOS release is subject to integration testing that includes regression tests for past problems, basic functionality tests, and real users' applications. The software components are developed asynchronously, but they come together in system (CHAOS) releases and separate package (DPCS, Lustre Lite, SLURM, Cluster Tools) releases. Due to this separation, system and package testing and installation on production clusters can be scheduled and executed independently. A 26-node development cluster called DEV, which can be rapidly reinstalled into any past CHAOS environment as well as new prototype environments, is available for unit testing of individual software components, integration testing of a complete CHAOS release, and debugging of defects that arise in production. This cluster, along with the project CVS repository, can accommodate external collaborators working with LC on the Open Source projects described above.


2.3 MCR Build Strategy

The MCR build strategy is based on the scalable unit concept. Linux NetworX will build the FSU and install their HiPer Linux distribution including modifications for QsNet and Lustre Lite. The FSU, which contains all the login, management, gateway, Lustre Meta Data servers and disk, and a complement of compute nodes, will undergo initial functionality and performance testing and then be used as the vehicle to scale up Lustre Lite as MCR is built.

After the FSU is complete, Linux NetworX will assemble the remainder of the cluster consisting of eleven CNSU by adding CNSU to the Quadrics Federated QsNet switch in groups of three (up to nine CNSU) and allowing time for Lustre Lite scaling testing. After delivery to LLNL, Linux NetworX will add the final two CNSU to the configuration.

Once the MCR is built and Lustre Lite scaling to 960 nodes is complete, pre-ship testing commences. This level of testing consists of running a specific set of parallel applications across the machine. Once the pre-ship exit criteria defined in the pre-ship test plan are met, the machine is disassembled, shipped to the LLNL site, and reassembled. At LLNL, the same test suite is re-run as a post-ship test to verify that the hardware survived disassembly, shipment, and reassembly.

Once the hardware has been reassembled, passed the post-ship test, and been turned over to University personnel, the University will install the CHAOS environment with the aid of Linux NetworX, Quadrics, BlueArc, and Cluster File Systems, Inc. personnel. Acceptance testing will take place with the LLNL CHAOS 1.0 distribution installed.




UCRL-CR-148022