Title: Design and Implementation of Virtual Memory-Mapped Communication on Myrinet
Authors: Cezary Dubnicki, Angelos Bilas, Kai Li (Princeton University) and James F. Philbin (NEC Research Institute, Inc.)

Virtual memory-mapped communication (VMMC) is a communication model that provides direct data transfer between the sender's and receiver's virtual address spaces. The model eliminates operating system involvement in communication, provides full protection, supports user-level buffer management and zero-copy protocols, and minimizes software communication overhead. VMMC was designed and implemented for the SHRIMP multicomputer, where it delivers user-to-user latency and bandwidth close to the limits imposed by the underlying hardware. In this work we describe the design and implementation of VMMC for Myrinet. Our goals are to provide an implementation of VMMC on a commercially available hardware platform, to investigate how many of the VMMC benefits can be realized on the new hardware, and to investigate network interface design tradeoffs by comparing SHRIMP with Myrinet and their respective VMMC implementations.

The Myrinet implementation of VMMC uses a software TLB in the network interface SRAM to provide virtual-to-physical translation of send addresses. As in SHRIMP, we support multiple senders and implement protected, user-level send. Apart from making a few static symbols in the Linux source external, all system software is implemented at user level or as a loadable device driver. We currently have a 4-node system operational. Each node is a 166 MHz Pentium PC connected to a Myrinet switch through a Myrinet PCI network interface. The available network bandwidth is 1.28 Gb/s per connection. For one-word transfers we achieve a user-to-user latency of about 9.9 microseconds. The maximum user-to-user bandwidth is 108 MB/s, which is about 98% of the hardware bandwidth.
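The abstract states only that send addresses are translated by a software TLB held in network interface SRAM; it does not describe the table's organization. The C sketch below is one illustrative possibility, not the authors' design. It assumes a direct-mapped table, 4 KB pages, entries tagged with a process ID so that multiple senders can share the table without aliasing, and a hypothetical `host_resolve()` that stands in for the host driver supplying the physical frame on a miss. The table size, entry layout, and miss path are all assumptions.

```c
#include <stdint.h>

#define PAGE_SHIFT  12u                         /* 4 KB pages, as on x86 */
#define PAGE_MASK   ((1u << PAGE_SHIFT) - 1u)
#define TLB_ENTRIES 256u                        /* hypothetical size; power of two */

/* One entry of the hypothetical software TLB kept in NI SRAM. */
typedef struct {
    uint32_t vpn;    /* virtual page number */
    uint32_t pfn;    /* physical frame number */
    uint16_t pid;    /* owning sender process, so senders do not alias */
    uint8_t  valid;
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];
static unsigned tlb_misses;

/* Hypothetical stand-in for the host driver, which would pin the page
   and report its physical frame. Here it is a fake, deterministic map. */
static uint32_t host_resolve(uint16_t pid, uint32_t vpn) {
    return (vpn ^ ((uint32_t)pid << 16)) + 0x100u;
}

/* Direct-mapped index mixing page number and process ID. */
static unsigned tlb_index(uint16_t pid, uint32_t vpn) {
    return (vpn ^ pid) & (TLB_ENTRIES - 1u);
}

/* Translate a sender's virtual address to a physical address for DMA.
   On a miss, the entry is refilled from the (simulated) host. */
uint32_t tlb_translate(uint16_t pid, uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    tlb_entry_t *e = &tlb[tlb_index(pid, vpn)];
    if (!e->valid || e->vpn != vpn || e->pid != pid) {
        tlb_misses++;
        e->vpn = vpn;
        e->pid = pid;
        e->pfn = host_resolve(pid, vpn);
        e->valid = 1;
    }
    return (e->pfn << PAGE_SHIFT) | (vaddr & PAGE_MASK);
}
```

For example, `tlb_translate(1, 0x08049abc)` followed by `tlb_translate(1, 0x08049ff0)` touches the same page of the same process, so the first call misses and refills the entry, and the second call hits. Tagging entries with the process ID is what lets one table serve several protected user-level senders at once in this sketch.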
By comparing Myrinet with SHRIMP, we found, not surprisingly, that SHRIMP provides a better platform for implementing VMMC, at the cost of a specialized network interface and more OS modifications. However, the Myrinet implementation is also able to deliver latency and bandwidth quite close to the hardware limits. The price is greater use of network interface resources, including the LANai processor and the on-board SRAM.