High Performance Virtual Machines
PI: Andrew A. Chien, co-PIs: Daniel Reed, David Padua
High Performance Virtual Machines (HPVMs) can increase the accessibility
and delivered performance of distributed computational resources for high
performance computing applications. Successful HPVMs will reduce the effort
required to build efficient parallel applications on distributed resources,
increase the performance delivered to those applications, and leverage
parallel software tools from existing parallel systems to distributed environments.
The rapidly increasing performance of low-cost computing systems has
produced a rich environment for desktop, distributed, and wide-area
computing. However, this wealth of computational resources has not been
effectively harnessed for high performance parallel computing.
High Performance Virtual Machines (HPVMs) are a new technology that
leverages the software tools and the understanding of parallel computation
developed on scalable parallel systems to exploit distributed computing
resources. The objective is to reduce the effort required to build high
performance applications on distributed systems.
High Performance Virtual Machines depend on building a uniform, portable
abstraction -- a virtual machine -- with predictable, high performance
characteristics. To successfully insulate application programs, a virtual
machine must (1) deliver a large fraction of the underlying hardware performance,
(2) virtualize resources to provide portability and to reduce the effort
in building application programs, and (3) deliver predictable, high performance.
The project is developing novel technology that leverages commodity components
(hardware and software) to deliver high performance communication over
cluster and wide area interconnects, predictable communication and computation,
coordinated scheduling, and uniform access to resources (e.g. files, mass
storage, embedded sensors). The HPVM project involves not only the development
of novel communication, scheduling, and resource management technologies,
but also dissemination of a series of software release which embody these
ideas.
For More Information:
See also the Concurrent Systems Architecture Group
Recent Highlights
- A large demonstration HPVM system of 256 processors (128 2-way multiprocessor nodes) has not only been demonstrated on a large number of applications but is currently in use for production computing at the National Computational Science Alliance (NCSA). This system is driving the rapid improvement of NT tools and libraries for scalable parallel computing, both academic and commercial packages. In addition, the visibility of the effort has catalyzed a significant fraction of the high performance computing community to adopt Windows NT more rapidly, improving the breadth and quality of application software available for high performance computing.
- Developed and distributed an improved HPVM software system, HPVM 1.2, which increases the performance, usability, and portability of the system technologies. Specifically, HPVM 1.2 includes an InstallShield installation package, providing convenient installation and configuration management for cluster nodes. This represents a major increase in usability. HPVM 1.2 also includes an improved version of Fast Messages (FM) which supports large numbers of concurrent users (32 contexts) and delivers 100+ MB/s bandwidth and latencies of 8.5 microseconds. Delivered MPI-FM performance reaches 92+ MB/s to the application.
- Developed an implementation architecture and design for Fast Messages on Compaq's Servernet cluster interconnect and Giganet's Virtual Interface Architecture. These implementations demonstrate the generality and portability of the Fast Messages communication substrate. Both FM ports deliver all of the underlying performance (32 MB/s and 18 microsecond latency for Servernet; 80 MB/s and 10 microsecond latency for Giganet), in some cases 10 times better performance than the standard commercial communication software.
- Completed the design of a dynamic FM communication architecture which preserves the high performance of the system while enabling dynamic configuration and adaptation. This allows single nodes and groups of nodes to leave and join the cluster without terminating ongoing jobs, enabling cluster/HPVM technologies to be adapted to resource environments with transient resource availability.
- Developed a prototype WAN bridge which allows an HPVM cluster to span an IP network. FM packets are transparently bridged through IP networks, using an architecture which hides the underlying unreliability of the IP network and efficiently bridges packets without memory copies. Current experiments with this infrastructure are limited by the speed of available IP implementations (~28 MB/s on the PC platforms), but future efforts will explore lower-level protocols and striping approaches to increase performance.
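The key idea in the WAN bridge highlight above, hiding the unreliability of the IP path from the FM layer, can be illustrated with a small sketch. All names here are hypothetical, and the model uses sequence numbers with acknowledgment and retransmission; the real FM bridge operates on raw packets without memory copies, which a Python model does not attempt to capture.

```python
import random

class LossyChannel:
    """Models an unreliable IP path: silently drops a fraction of packets."""
    def __init__(self, loss_rate=0.3, seed=42):
        self.rng = random.Random(seed)
        self.loss_rate = loss_rate
        self.queue = []

    def send(self, pkt):
        if self.rng.random() >= self.loss_rate:
            self.queue.append(pkt)

    def recv_all(self):
        pkts, self.queue = self.queue, []
        return pkts

class BridgeSender:
    """Stamps each FM packet with a sequence number; retransmits until acked."""
    def __init__(self, channel):
        self.channel = channel
        self.next_seq = 0
        self.unacked = {}  # seq -> payload, awaiting acknowledgment

    def send(self, payload):
        self.unacked[self.next_seq] = payload
        self.channel.send((self.next_seq, payload))
        self.next_seq += 1

    def handle_ack(self, seq):
        self.unacked.pop(seq, None)

    def retransmit(self):
        for seq, payload in self.unacked.items():
            self.channel.send((seq, payload))

class BridgeReceiver:
    """Buffers out-of-order arrivals and delivers packets in sequence order."""
    def __init__(self):
        self.buffer = {}
        self.expected = 0
        self.delivered = []

    def receive(self, pkt):
        seq, payload = pkt
        self.buffer[seq] = payload
        while self.expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
        return seq  # acknowledged sequence number

def run_bridge(messages, loss_rate=0.3):
    """Drive rounds of delivery, ack, and retransmit until all packets land."""
    chan = LossyChannel(loss_rate)
    tx, rx = BridgeSender(chan), BridgeReceiver()
    for m in messages:
        tx.send(m)
    for _ in range(100):
        for pkt in chan.recv_all():
            tx.handle_ack(rx.receive(pkt))
        if not tx.unacked:
            break
        tx.retransmit()
    return rx.delivered
```

Despite the lossy channel, the receiver delivers every packet exactly once and in order, which is the property the bridge must preserve so that the FM layer above it can remain unchanged.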
Current Plan
- Implement the dynamic HPVM architecture enabling both synchronous and asynchronous process departure
- Demonstrate a WAN bridged FM implementation which enables the federation of geographically separated HPVM clusters with high performance
- Complete implementation and testing of an SMP transport which delivers the full performance of the underlying memory system, and integrate it into the mainstream HPVM code base.
- Integrate and distribute an HPVM implementation which supports the varied physical transports (Myrinet, Giganet, Servernet, IP, shared memory, etc.)
- Continue to support demonstrations of large Windows NT clusters with the National Computational Science Alliance (NCSA) to drive the scalability, performance, and visibility of these platforms as viable large-scale computational elements.
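The dynamic-membership plan items above (synchronous and asynchronous process departure, nodes joining and leaving without terminating jobs) can be sketched with an epoch-stamped membership table. This is an illustrative assumption about how such tracking might work, not the actual HPVM design: each join or departure advances an epoch counter, so in-flight work stamped with a stale epoch can be detected and rerouted rather than aborted.

```python
class ClusterMembership:
    """Tracks cluster membership in numbered epochs: every join or
    departure starts a new epoch, letting the runtime distinguish
    messages addressed under an old configuration from current ones."""

    def __init__(self, nodes):
        self.epoch = 0
        self.nodes = set(nodes)

    def join(self, node):
        """Add a node and advance the epoch; returns the new epoch."""
        self.nodes.add(node)
        self.epoch += 1
        return self.epoch

    def leave(self, node, synchronous=True):
        """Remove a node and advance the epoch. A synchronous departure
        would drain the node's queues first (not modeled here); an
        asynchronous one removes it immediately."""
        self.nodes.discard(node)
        self.epoch += 1
        return self.epoch

    def valid(self, node, epoch):
        """A message is deliverable only if its target is still a member
        and it was stamped in the current epoch."""
        return node in self.nodes and epoch == self.epoch
```

A message stamped before a membership change fails the `valid` check and can be re-stamped and re-sent under the new epoch, which is how a job survives nodes leaving and joining mid-run.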
Last updated January 1999