
"I really want to see the Compaq clustering code, the IBM DLM and OpenGFS in the 2.5 tree creating a real clustered Linux with true failover facilities. That will really open the door to the enterprise market."


Overview

The SSI project leverages Compaq's NonStop Clusters for UnixWare technology, along with other open source technology, to provide a full, highly available SSI environment for Linux. Goals for SSI clusters include availability, scalability and manageability, built from standard servers. Technology pieces will include: membership, single root and single init, cluster filesystems and DLM, single process space and process migration, load leveling, single and shared IPC space, device space and networking space, and single management space. The SSI project builds on the Cluster Infrastructure (CI) for Linux project.

Our Sourceforge.net project summary page is located here.

License [Top]

Both the SSI and CI code are being released under the GNU General Public License (GPL), version 2. This is the same license used by the Linux kernel.

Download [Top]

Contributed Code [Top]

You can find contributed patches and such here. See the mailing list archive for context.

Mailing List [Top]

  • Subscribe to the Single System Image Cluster mailing list
  • Read our archives (Geocrawler | MARC)
  • Missing messages between August 7 and 22, 2001 can be found here
  • Post a message

Bruce's Corner [Top]

Project List [Top]

Significant work already done
  • Single, persistent devfs-based clusterwide /dev (John Byrne)
  • MOSIX load leveler integration to SSI (Laura Ramirez)
  • Stackable Cluster Filesystem - CFS from NSC (Dave Zafman)
  • SSI of TCP/IP Networking and LVS functionality (Kai-Min Sung)
  • Subsecond failover time and modification of the node detection algorithm (Kai-Min Sung)
  • Port to Alpha (Aneesh Kumar)

Just starting or lots left to do
  • Single install and simple boot model - looking at Scyld (Brian Watson)
  • Integration with Samba to produce an HA, scalable Samba server (John Byrne)
  • Clusterwide /proc (Laura Ramirez)
  • UserMode Linux (UML) extensions to allow large clusters on a small number of nodes (Kitrick Sheets)
  • Socket and other IPC object migration (Bruce and Brian)

Wish list (all currently unassigned)
  • Faster interconnects - native Myrinet, InfiniBand, ...
  • Integration with the linux-cluster HA work
  • STOMITH integration - from Sistina/GFS
  • HA interconnect (there is some code from NSC)
  • Integration with FailSafe
  • Integration with Heartbeat
  • Integration with DLM
  • Integration with Beowulf
  • Scale to thousands of nodes
  • Hooks into Linux 2.5/2.6
  • IA-64 port
  • SSI of semaphores and unix-domain sockets
  • SSI of systems management
  • Cluster Logical Volume Manager
  • Timesync
  • Clusterwide run levels
  • Integration with the remote data mirroring capability
  • Clusters across subnets
  • Active/active data replication
  • Clusterwide paging model
  • Performance tools
  • Bproc library for SSI cluster

Demos [Top]

These are the demos we showed at Caldera Forum (8/20 - 8/22) and LinuxWorld (8/28 - 8/30). There are four separate demos, each exposing some of the technical features of the Open SSI Cluster project to date. Together, the demos illustrate:
  • process movement, inheriting open files, file offsets, and open devices
  • clusterwide pids; distributed process relationships and access
  • clusterwide job control
  • single clusterwide root and single init
  • clusterwide fifos
  • clusterwide device naming and access to remote devices
Demo Config:
  • 4-node cluster with shared FCAL storage
  • nodes connected by 100Mb Ethernet
  • a separate machine to display from and to run memexpd
  • 4 windows on the display machine, one per node, plus an extra window for node 1
Demo 1 (process movement with open files, inheriting files and devices):
  • get demoprog and build it by running make
  • determine the pty of the extra window for node 1 with the tty command
  • if the device is in /dev/pts, run demoprog with its corresponding device in /devfs/pts
  • start demoprog on node 1 with ./demoprog alphabet </devfs/pts/X >/devfs/pts/X 2>&1
  • demoprog processes records from the file given to it
  • demoprog is also listening for direction to move to another node; if you type a node number, it moves there and continues with the next record
  • if you type CR, demoprog will move on its own until you type something else
  • start 'top' on all nodes
  • you can observe the process movement in the 'top' windows
  • quit demoprog by typing 'q'
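Putting the Demo 1 steps together, a sample session on node 1 might look like the sketch below (the pty number 5 is illustrative; use whatever the tty command reports for the extra window):

    # in the extra window for node 1: find its pty
    tty                                        # e.g. prints /dev/pts/5

    # in the main window for node 1: build demoprog and attach it to the
    # clusterwide /devfs name that corresponds to that pty
    make
    ./demoprog alphabet </devfs/pts/5 >/devfs/pts/5 2>&1

    # while it runs: type a node number to move it there, press Enter (CR) to
    # let it keep moving on its own, or type 'q' to quit; watch the movement
    # in the per-node 'top' windows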
Demo 2 (clusterwide pids; distributed process relationships and access; clusterwide job control; single clusterwide root):
  • from node 1, run: onnode 2 vi /tmp/newfile
  • type some text and write it out
  • on node 2, run ps to see the vi process
  • on node 3, cat the file to show the clusterwide filesystem
  • on node 4, kill the vi on node 2 (just use its clusterwide pid)
  • the vi's death should show up back on node 1
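A condensed transcript of Demo 2 might look like this (the pid 1234 is illustrative; use the pid that ps reports for the vi):

    # node 1: start vi on node 2, editing a file on the shared root
    onnode 2 vi /tmp/newfile

    # node 2: the vi shows up in an ordinary ps, because pids are clusterwide
    ps -ef | grep vi

    # node 3: the file is visible through the single clusterwide root
    cat /tmp/newfile

    # node 4: kill the vi by its clusterwide pid; its death shows up on node 1
    kill 1234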
Demo 3 (clusterwide fifos):
  • make a fifo on the shared root (mkfifo /fifo)
  • echo something into the fifo on one node
  • cat the fifo on another node
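For example, with the fifo created as /fifo on the shared root ("hello" is just sample text):

    mkfifo /fifo              # on any node; /fifo lives on the shared root

    # in the window for one node: write into the fifo (blocks until a reader opens it)
    echo hello > /fifo

    # in the window for another node: read from the fifo
    cat /fifo                 # prints "hello"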
Demo 4 (clusterwide devices and remote device access):
  • determine the pty of the window on a given node with the tty command
  • if the device is in /dev/pts, use its corresponding device in /devfs/pts
  • write "Hello World" to that device from another node

Features [Top]

  1. Membership
    • courtesy of the CI project
    • includes libcluster and the cluster command (part of Cluster Tools)
  2. Internode Communication
  3. Filesystem
    • GFS is there as the root filesystem
    • reopen of files when processes move is supported
    • GFS is not yet integrated with DLM so you must have another node outside the cluster act as the lock manager node
    • going multiuser with GFS as root sometimes panics (Sistina has just added support for read/write mapped-file sharing)
    • not sure whether you can have other GFS filesystems mounted
    • there is no mount enforcement across nodes in the cluster
    • our Cluster File System (CFS) is not there yet
  4. Process Management
    • some, but not all, pieces are there:
      • clusterwide pids
      • process migration and distributed rexec() with reopen of files, sockets, pipes, devices, etc.
      • vprocs
      • clusterwide signalling, get/setpriority
      • capabilities
      • distributed process groups, session, controlling terminal
      • surrogate origin functionality
      • no single points of failure (cleanup code to deal with nodedowns)
    • some of the things that aren't there:
      • ptrace and thus strace only work on processes which are local
      • no rfork(), due to unresolved signal races
      • threads all stay on the same node
      • no load leveler (we are looking at adapting the Mosix one)
      • no clusterwide /proc
      • no clusterwide ps, etc. (a rough workaround is sketched after this feature list)
  5. Devices
    • there is a clusterwide device model via extensions to the devfs code
    • all kernels know about all devices on all nodes
    • a process on any node can open a device on any node
    • devices are reopened when processes move
    • devfs cannot be mounted on /dev, yet (there are unresolved naming issues)
    • devfs can be mounted somewhere else (e.g., /devfs) to allow opening of remote devices
    • the planned data file for device naming persistence across reboots isn't there
    • there are some problems with nodedown and node rejoin
  6. IPC
    • several IPC objects/mechanisms are clusterwide:
      • pipes
      • fifos
      • signalling
      • message queues
      • BSD-based ptys
    • some are not:
      • unix-domain sockets
      • semaphores
      • shared memory
      • /dev/pts-based ptys
    • reopen of sockets and clusterwide objects is there for process movement
    • nodedown handling is there for the objects that are clusterwide
  7. Clusterwide TCP/IP
    • not there, yet
    • each node is currently independent
    • sockets are reopened on process movement
  8. Paging/Swapping
    • not clusterwide
    • each node is independent
  9. Kernel Data Replication Service
    • it is in there (cluster/ssi/clreg)
    • not sure any subsystems are using it
  10. CLVM
    • not there
  11. Shared Storage
    • we were testing with shared SCSI and found filesystem corruption that we suspect is a timing problem in GFS
    • FCAL is working better for Sistina and for us
  12. HA interconnect
    • not supported
  13. DLM
    • not integrated and thus not there yet
  14. Sysadmin
    • nothing has been done to tweak the base sysadmin tools to present the SSI cluster as a single system
  15. Init, Booting and Run Levels
    • system runs with a single init which will failover/restart on another node if the node it is on dies (part of Cluster Tools 0.5.7 and later)
    • currently all but the first node do a very minimal bootup since the complete run-level support is not there
  16. Application Availability
    • application monitoring/restart provided by spawndaemon/keepalive (part of Cluster Tools 0.5.8 and later)
    • base daemons have not been investigated to determine how/if they should be restarted on failure of a node, or whether they should be run in parallel
  17. Timesync
    • nothing done here
    • should look at NTP for now
  18. Load Leveling
    • the daemon from NSC has not been adapted
    • looking at the Mosix algorithms
    • for connection load balancing, nothing is there yet
  19. Packaging/Install
    • primitive for now
  20. Object Interfaces
    • standard interfaces for objects work as expected
    • no new interfaces for object location or movement except for processes (rexec(), migrate(), and SIGMIGRATE to move a process)
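
As a shell-level illustration of the process interfaces above, here is a minimal sketch (the pid 1234 is made up; whether a stock kill(1) knows the SIGMIGRATE name, and how the destination node gets chosen, are assumptions not covered on this page):

    # ask process 1234 to move to another node by sending it SIGMIGRATE;
    # rexec() and migrate() give the same movement from within a program
    kill -s SIGMIGRATE 1234     # use the signal's number if kill does not know the name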
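
Separately, as noted under Process Management, there is no clusterwide ps or clusterwide /proc yet; a rough stand-in, assuming the 4-node demo configuration and the onnode command from Demo 2, is to ask each node in turn:

    # poor man's clusterwide ps: run ps on every node and label the output
    for node in 1 2 3 4
    do
        echo "=== node $node ==="
        onnode $node ps -ef
    done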

This file last updated on Friday, 01-Feb-2002 15:01:55 PST