Building a Teraflop (one million Megaflop) Windows NT Supercomputer and its Implications for Distributed Software

Professor Andrew A. Chien
Department of Computer Science and National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

The rapid maturation of desktop and high-volume technologies in microprocessors, high-speed networks, and operating systems makes commodity building blocks powerful elements for building supercomputing systems. In my research group, we have been researching core system software technologies to synthesize supercomputers from commodity hardware. These include lightweight, low-latency networking software (Fast Messages), coordinated processor and memory scheduling (Dynamic Coscheduling), and efficient implementations of commodity APIs, which are combined into High Performance Virtual Machines (HPVMs).

Delivered HPVM application performance includes: Message Passing Interface (MPI) with latencies of 15 microseconds and bandwidths of 80 megabytes/second; Global Memory (Shmem Put/Get) and Global Array interfaces with latencies of 4 and 27 microseconds and sustained bandwidths of >60 megabytes/second; and Remote Procedure Call (RPC, Java RMI, DCOM) performance of 47 microseconds (RT) and 35-70 megabytes/second. HPVM software runs on both Windows NT and Linux. Though significant software challenges remain, these technologies have revolutionary implications for supercomputing and local-area distributed computing.

To transfer and demonstrate these technologies, we have built a series of clusters (30-processor Pentium Pro, 92-processor Pentium II). We are now building a series of high-performance Windows NT clusters with NCSA. The HPVM/NCSA cluster (April 1998) will include 128 Pentium II processors running at 300 MHz, and have aggregate capabilities of 38.4 Gigaflops, 32 Gigabytes of DRAM, 10 Gigabytes/second bisection bandwidth, and 256 Gigabytes of disk storage.
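The aggregate figures quoted for the April 1998 cluster can be checked with simple arithmetic. The sketch below is only an illustration: it assumes a peak of one floating-point operation per processor per clock cycle (an assumption that happens to reproduce the quoted 38.4 Gigaflops, not a figure stated in the abstract) and binary gigabytes for DRAM. The function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the aggregate figures quoted for the
# April 1998 HPVM/NCSA cluster. Names here are illustrative only.

PROCESSORS = 128
CLOCK_HZ = 300e6        # 300 MHz Pentium II, as quoted
FLOPS_PER_CYCLE = 1     # ASSUMPTION: one floating-point op per cycle at peak


def aggregate_peak_gflops(processors, clock_hz, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak rate in Gigaflops: processors x clock rate x flops per cycle."""
    return processors * clock_hz * flops_per_cycle / 1e9


def share_per_processor(total, processors):
    """Evenly divide an aggregate resource across all processors."""
    return total / processors


peak_gflops = aggregate_peak_gflops(PROCESSORS, CLOCK_HZ)
# ASSUMPTION: 1 Gigabyte = 1024 Megabytes for the DRAM figure.
dram_mb_per_processor = share_per_processor(32 * 1024, PROCESSORS)
disk_gb_per_processor = share_per_processor(256, PROCESSORS)

print(f"aggregate peak: {peak_gflops:.1f} Gigaflops")
print(f"DRAM per processor: {dram_mb_per_processor:.0f} Megabytes")
print(f"disk per processor: {disk_gb_per_processor:.0f} Gigabytes")
```

Under these assumptions the quoted totals correspond to 256 Megabytes of DRAM and 2 Gigabytes of disk per processor. Sustained application performance would be lower than the peak figure.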
The second HPVM/NCSA cluster (late 1998) will have 512 Deschutes processors with aggregate capabilities of >220 Gigaflops (and 4-6x in the other dimensions). Teraflop systems will follow in 1999.

ANDREW A. CHIEN's research involves networks, network interfaces, and the interaction of communication and computation in high-performance systems. His work also involves compilation techniques for high-performance object systems. He is an Associate Professor of Computer Science at the University of Illinois, with joint appointments in Electrical and Computer Engineering and as a Senior Research Scientist at the National Center for Supercomputing Applications. Andrew received his undergraduate, master's, and doctoral degrees from the Massachusetts Institute of Technology and is a recipient of a 1994 National Science Foundation Young Investigator Award.