Using SCI to Build Shared Memory Clusters
-----------------------------------------

presented by Helen Raizen and John Robinson
Dolphin Interconnect Solutions, Inc.

Scalable Coherent Interface (SCI) supports a wide range of cluster
topologies and organizations.  In between loosely-coupled clusters and
tightly-coupled CC-NUMA symmetric multiprocessors (SMP), Dolphin is
exploring what we call the Shared Memory Cluster (SMC), which, at the
operating system level, looks like a typical cluster but presents a
single system image to applications.  Dolphin's SMC uses an SCI-based
interconnect to provide coherent shared memory across the cluster in a
way that supports a separate OS on each node but allows applications
to use shared memory.  Software to support the SMC concept consists of
installable drivers and a library layer, allowing the cluster to run
on commodity operating systems.

The SMC cluster has advantages over both the loosely-coupled cluster
and the tightly-coupled SMP.  With a separate OS on each node,
availability can be enhanced, because a failure in one node does not
take down the entire cluster.  Having a separate instance of the OS on
each node reduces off-node references and thus reduces the load on the
interconnect.  Also, the OS only has to scale to the number of
processors within a node, rather than to the total number of
processors in the cluster.  These are typical advantages of a cluster.
But unlike most clusters, an SMC system presents programs with the
shared memory model expected by a wide range of existing applications.
Thus applications can take advantage of processors and memory on more
than one node without needing to be rewritten.  Coherent shared memory
also provides high performance message passing for applications that
need it.

This poster presents the software architecture and some of the
hardware support features that Dolphin is building as part of our SMC
system.  It also contrasts this approach with both SMP and
loosely-coupled clusters.

Our hardware includes:

- mapping hardware that separates the node's physical addresses from
  the global SCI addresses, supporting an independent physical address
  space on each node.

- support for a very large far-memory cache, reducing off-node
  references to coherence misses.

- fast node-to-node interrupts, supporting fast internode
  synchronization.

Our software includes:

- a driver which, in conjunction with the interconnect hardware,
  provides support for cluster-wide shared memory.

- a middleware layer which provides remote semantics for operating
  system objects by intercepting system calls and redirecting them to
  remote nodes when the object of the call is remote.  Objects which
  may be referenced remotely include files, processes and semaphores.
  Remote file system access is coherent (unlike common implementations
  of NFS).  Our middleware layer also supports access to cluster-wide
  shared memory via standard OS interfaces.

- administration tools to simplify management of the cluster.

The poster also presents some of the challenges in implementing an SMC
system, including maintaining the high availability that
loosely-coupled clusters can provide while still providing the
application scalability and transparency that SMP systems can deliver.
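
To make "access to cluster-wide shared memory via standard OS
interfaces" concrete, the minimal sketch below attaches to a shared
segment through the ordinary System V calls.  The intent on an SMC
cluster is that the same unmodified code would be backed by coherent
memory visible across nodes; the key (0x5C1) and segment size here are
arbitrary illustration values, not part of the SMC system itself.

    /* Minimal sketch: an ordinary application attaching to a shared
     * segment through the standard System V interface.  On an SMC
     * cluster the middleware would satisfy the same calls with
     * cluster-wide coherent memory; the application does not change. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        const key_t  key = 0x5C1;   /* arbitrary well-known key */
        const size_t len = 4096;    /* arbitrary segment size   */

        /* Create (or attach to) the segment exactly as on a single SMP. */
        int id = shmget(key, len, IPC_CREAT | 0600);
        if (id < 0) { perror("shmget"); return 1; }

        char *mem = shmat(id, NULL, 0);
        if (mem == (void *)-1) { perror("shmat"); return 1; }

        /* Processes mapping the same key see these stores; on an SMC
         * cluster coherence is maintained by the SCI interconnect. */
        strcpy(mem, "hello from this node");
        printf("wrote: %s\n", mem);

        shmdt(mem);
        return 0;
    }

The point of the sketch is that nothing cluster-specific appears in the
application: the single-system-image property lives below the standard
interface.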
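
The middleware's system-call interception can be illustrated with a
common library-layer technique on commodity Unix systems: a preloaded
shared library wraps a call such as open() and decides whether to
service it locally or forward it to the owning node.  This is only a
sketch of the general idea, not Dolphin's implementation; the helpers
smc_path_is_remote() and smc_forward_open() are hypothetical
placeholders standing in for the SMC middleware.

    /* Illustrative interposition sketch, assuming an LD_PRELOAD-style
     * library layer.  The smc_* helpers are hypothetical stubs. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdarg.h>
    #include <sys/types.h>

    /* Hypothetical placeholders for the SMC middleware. */
    static int smc_path_is_remote(const char *path)
    { (void)path; return 0; }
    static int smc_forward_open(const char *path, int flags, mode_t mode)
    { (void)path; (void)flags; (void)mode; return -1; }

    /* Interposed open(): local objects fall through to the real libc
     * call; remote objects are redirected to the node that owns them. */
    int open(const char *path, int flags, ...)
    {
        mode_t mode = 0;
        if (flags & O_CREAT) {
            va_list ap;
            va_start(ap, flags);
            mode = (mode_t)va_arg(ap, int);
            va_end(ap);
        }

        if (smc_path_is_remote(path))
            return smc_forward_open(path, flags, mode);

        static int (*real_open)(const char *, int, ...);
        if (!real_open)
            real_open = (int (*)(const char *, int, ...))
                        dlsym(RTLD_NEXT, "open");
        return real_open(path, flags, mode);
    }

Built as a shared object (e.g. cc -shared -fPIC -o smc_interpose.so
smc_interpose.c -ldl) and preloaded, such a wrapper sees every open()
an application makes without the application being recompiled, which
is one plausible way a library layer can provide remote semantics on
an unmodified commodity OS.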