Efficient Message Passing in the Avalanche Multiprocessor

Alan Davis, Ravindra Kuramkote, Leigh Stoller, Mark Swanson
Department of Computer Science
University of Utah

Abstract

The objective of the Avalanche project is the design and implementation of a communications interface for commodity workstations connected by a commodity communications fabric. The interface supports both message passing and distributed shared memory, but here we focus solely on the aspects that support efficient message passing. At the software level, we provide a lightweight, sender-based protocol (SBP) appropriate for high-performance system area networks (SANs). At the hardware level, we integrate the interface into the workstation memory hierarchy, and we provide simple but powerful support in the interface for the SBP. Together, these mechanisms enable extremely low-latency, low-overhead inter-workstation communication.

Of interest to the NOW/COW community are the following aspects of Avalanche:

-- the use of commodity, multiprocessor-capable Hewlett Packard workstations (J200/J210, PA-RISC based, using a 100/120 MHz Runway system bus);
-- implementation as a processor card (to be plugged into an empty processor slot);
-- use of a commodity fabric (Myrinet II);
-- provision of a cache coherent message passing mechanism;
-- avoidance of any changes to the workstation hardware;
-- preservation of the workstation's uniprocessor performance.

We have performed extensive, detailed architectural simulations of the processor, first-level cache, system bus, and network interface (using conservative delays for the latter, based on prototype VHDL simulations). Using a simulated single-issue processor running at 120 MHz and a Myrinet frequency of 160 MHz, we have measured user-to-user one-way message times of less than 4 microseconds for 32 bytes of data. This time includes accessing the 32 bytes of data within the processor.
Round-trip times (32-byte request and 4-byte reply) have been measured at 8 microseconds. The interface is currently being implemented as a 0.6 micron CMOS ASIC with a target frequency of 120 MHz, which matches the speed of HP's faster version of its Runway system bus. A cluster of between 32 and 64 workstations with these custom interfaces is to be constructed by the end of 1997.

Technical reports describing the sender-based protocol (Direct Deposit), the simulation environment (PAINT), and other aspects of Avalanche can be found via this URL: http://www.cs.utah.edu/projects/avalanche/avalanche-publications.html
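
To put the reported latencies in terms of processor time, the figures above can be converted into clock cycles at the simulated 120 MHz processor speed. The sketch below is illustrative arithmetic only. The function name `cycles` and the constant names are hypothetical, and the 4 microsecond one-way figure is the stated upper bound rather than an exact measurement.

```python
# Illustrative arithmetic only: convert the reported latencies into
# processor clock cycles. All names here are hypothetical.

CPU_MHZ = 120        # simulated processor clock, from the abstract
ONE_WAY_US = 4       # upper bound on the one-way time for 32 bytes
ROUND_TRIP_US = 8    # reported round-trip time (32-byte request, 4-byte reply)


def cycles(microseconds, clock_mhz):
    """Clock cycles elapsed in `microseconds` at `clock_mhz` (1 MHz = 1 cycle per microsecond)."""
    return microseconds * clock_mhz


one_way_cycles = cycles(ONE_WAY_US, CPU_MHZ)        # upper bound, since the time is "less than 4"
round_trip_cycles = cycles(ROUND_TRIP_US, CPU_MHZ)

print(one_way_cycles, round_trip_cycles)
```

Under these assumptions, a one-way 32-byte message costs fewer than 480 processor cycles end to end, and a round trip costs roughly 960.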