This is a continuation of the HPVM-MinuteSort project, which on April 1st, 1999 established a new record in the MinuteSort category of disk-to-disk sorting, using 60 machines of the NCSA NT Supercluster in dedicated mode to sort 10.3 GB. This time, using the new HPVM cluster hardware that CSAG built at UCSD and the latest release of the HPVM software, the challenge was to maximize the utilization of a heterogeneous cluster, in which the systems do not have the same I/O and memory capabilities. The configuration of this new cluster is shown below:
| | Kayak | NetServer |
| --- | --- | --- |
| CPU | Dual 300 MHz Pentium | Dual 400 MHz Pentium II |
| SDRAM | 384 MB | 1 GB |
| Disk(s) | Four 20 GB 7200 RPM disks connected to a 3Ware controller; one IBM 20 GB 7200 RPM IDE disk (used as the system disk) | Two 18.2 GB 7200 RPM SCSI disks |
Given the hardware described, we calculated the theoretical and practical limits of the components, and set the following goals, projections, architecture, and performance targets:
Benchmark the capabilities of the cluster to move data among disks, memory, and the network, in order to maximize the data flow and pinpoint the hardware bottlenecks of the current heterogeneous configuration.
- Good overall system benchmark
- Parallel sort: stresses the I/O subsystem, CPU, and communication subsystem
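A minimal probe in this spirit can be sketched as follows (a Python sketch of our own, not the actual benchmark code; buffer sizes are illustrative, and a real measurement would bypass the OS page cache):

```python
import os
import tempfile
import time

def write_throughput(total_mb=64, chunk_mb=4):
    """Time sequential writes of total_mb MB to a temporary file
    and report MB/s. Without O_DIRECT this largely measures the
    page cache, so treat the result as an upper bound on raw
    disk bandwidth."""
    chunk = b"\0" * (chunk_mb << 20)
    fd, path = tempfile.mkstemp()
    try:
        t0 = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_mb // chunk_mb):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())   # force data to the device
        return total_mb / (time.perf_counter() - t0)
    finally:
        os.remove(path)

print(f"{write_throughput():.1f} MB/s")
```

Analogous probes for memory copies and network send/receive, run on each node type, are what expose the asymmetries exploited below.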
| Projection | Value |
| --- | --- |
| Kayak Read Rate | 50 MB/s |
| Kayak Send Rate | 100 MB/s |
| Kayak Receive Rate | 70 MB/s |
| In-core Sorting Rate | 189 MB/s |
| Total Writing Rate | 60 MB/s |
| Kayak Write Rate | 40 MB/s |
| NetServer Write Rate | 20 MB/s |
| Launch Time | 5-10 s |
| Total Time | < 52 s |
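As a rough sanity check, the projected per-node rates can be combined into a per-stage time budget (a sketch with our own simplifying assumptions: a hypothetical 350 MB slice per node, non-overlapping stages, network transfer hidden behind the disk reads, and each stage running at its single bottleneck rate):

```python
# Projected per-node rates from the table above (MB/s).
READ = 50          # Kayak read rate
SORT = 189         # in-core sorting rate
WRITE_TOTAL = 60   # aggregate write rate (Kayak 40 + NetServer 20)

def stage_time(mb_per_node, mb_per_s):
    """Seconds for one stage at a single bottleneck rate."""
    return mb_per_node / mb_per_s

MB = 350  # hypothetical per-node data slice (illustrative only)
total = sum(stage_time(MB, r) for r in (READ, SORT, WRITE_TOTAL))
print(round(total, 1))  # → 14.7, well inside the < 52 s target
```

Even with generous slack for skew and launch overhead, budgets of this shape are what motivated the < 52 s total-time target.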
The following picture depicts the data-movement model used to implement the MinuteSort benchmark. Simply stated, the idea is to move the data from the machines with greater I/O capacity but little memory (Kayaks) to the machines with more memory but poor I/O capability (NetServers). The model also takes advantage of two further differences between these systems: the processor speed, which obviously favors the NetServers, and the communication bandwidth when receiving data. The latter difference is due to the memory bandwidth of the Kayaks, which limits them to receiving at a maximum of 70 MB/s, whereas the NetServers make full use of the PCI bandwidth and top out at about 100 MB/s. The picture shows an example with four nodes, two Kayaks and two NetServers. The data initially flows from the Kayaks to the NetServers; the NetServers then perform a bucket sort; finally, the NetServers send most of the data back to the Kayaks, and also improve performance in this last stage by taking responsibility for writing a slice of the data to disk themselves.
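The three stages can be sketched in miniature (a single-process Python sketch; the node roles and the inter-node sends are simulated in memory rather than going through real FM communication, and the record format is a bare integer key):

```python
import random

def minute_sort_sketch(records, n_writers):
    """Miniature of the three-stage flow: records are partitioned
    by key range into one bucket per writer (stage 1: Kayaks read
    and distribute), each bucket is sorted in core (stage 2: the
    NetServer role), and the sorted buckets are concatenated in
    order, as if each writer wrote its slice to disk (stage 3)."""
    lo, hi = min(records), max(records) + 1
    width = (hi - lo + n_writers - 1) // n_writers  # key range per bucket
    buckets = [[] for _ in range(n_writers)]
    for r in records:                 # stage 1: distribute by key range
        buckets[(r - lo) // width].append(r)
    for b in buckets:                 # stage 2: in-core sort per bucket
        b.sort()
    out = []                          # stage 3: write slices in order
    for b in buckets:
        out.extend(b)
    return out

data = random.sample(range(10_000), 1_000)
assert minute_sort_sketch(data, 4) == sorted(data)
```

Because the buckets cover disjoint, ordered key ranges, no merge step is needed after stage 2; each writer's slice lands in its final position, which is what lets the write-back stage run at full aggregate disk bandwidth.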
The last result obtained before the April 1st deadline was 21.8 GB sorted in less than a minute. The breakdown of the sorting time is shown in the graph below, divided into the three stages described above and not including launch time. To launch the application we used Catapult, a utility obtained from the NOW group at UC Berkeley, which is faster than LSF, the utility used previously on the NT Supercluster. To this time we must add the FM initialization time, which on some occasions is greater and less predictable than the time taken by Catapult. Both times combined when launching 64 nodes are variable; our best time was 7 seconds, which is the time we are reporting. In this case, given the memory demands of the sorting algorithm, we ran out of memory on the NetServers before running out of time.