HP-RMI : High Performance Java RMI over FM

Geetanjali Sampemane, Luis Rivera, Lynn Zhang, Sudha Krishnamurthy
{geta, lrivera, l-zhang, krishnam}@cs.uiuc.edu

Abstract:

Java RMI [4] is a convenient way of making remote method calls in distributed object systems. However, from our experiments with Java RMI, we found that the current implementation over TCP/IP was not particularly fast. It is our contention that marshalling and the transport mechanism are the main sources of overhead. In this project, we aim to minimize the overheads incurred by the transport layer. We attempt to improve the performance of Java RMI by using Fast Messages [2] as the underlying transport and make a comparative study of the resulting performance.



Background and Motivation

Distributed systems require processes running in different address spaces to be able to communicate and exchange data. There are two basic ways in which this is achieved -- one is the sockets interface, which is flexible, but requires the application programmer to encode and decode messages sent over the network. Another is the Remote Procedure Call (RPC) mechanism which hides the network details and provides the abstraction of a procedure call. This is easier to use than the former.

Current distributed object systems however, need a variant of this -- they need to communicate by invoking methods of remote objects. The Java Remote Method Invocation (RMI) system has been written to provide this functionality for Java objects. From our experiments detailed in Section 3, we found that the Java RMI performs poorly in comparison to other RPC systems [3]. In this project, we attempt to minimize the overheads associated with the communication layer of Java RMI.



Proposed Implementation

A normal RMI invocation works as follows: a client application invokes a method on an object on the server. The parameters are marshalled and sent out over the network to the server machine, where the computations are performed, results are marshalled and sent back to the client. The two main areas where delays occur are the marshalling (``object serialization'') and the network overheads. We focus in this project on reducing network overheads.

To develop an RMI application, one has to create the client, server and the RMI interface definition (which are written in Java). After generating the Java class files, the RMI compiler ( rmic) runs on them and generates stubs and skeletons. When the server is started, the server binds itself to the registry and waits for a network connection. When the client is started, the stub contacts the registry to find out the address of the server, and then makes a network connection to the server.

The current implementation of RMI uses TCP/IP sockets as the default transport mechanism. Our first attempt at reducing the communication latency is to replace the heavy-weight TCP/IP transport layer with a leaner, faster messaging layer, namely the Illinois Fast Messages.

Illinois Fast Messages (FM) is a low-overhead, high-performance software messaging layer with an implementation for the Myrinet [1] network. Myrinet is a high speed local area network with full duplex 1.28 Gbps links. It decouples the host processor and the network by using a processor in the network interface to handle most of the processing. One of the primary goals of FM is to deliver as much of the underlying network bandwidth as possible, to the applications and higher messaging layers.

We would like our changes to be completely transparent to the user of RMI -- any application should be able to take advantage of this faster network by selecting this version of the network library at runtime. At this time, this seems feasible by creating a new ``socketFactory'' to be used by RMI.

By running our test applications using the two different network-layer implementations (TCP/IP and FM) and measuring the results, we should be able to measure the speedup obtained.

Our plans for implementation are as follows:



Choice of platform

We decided to select Windows NT as the platform since all the software we require is available on it. We plan to use the Pentium machines in the NT lab with NT4.0 and the Java Development Kit (jdk-1.1.4). We will interface this with FM (v2.1). We are working on having the environment setup with the necessary tools (C compilers, and other necessary tools for compiling jdk).

To obtain the best possible performance, we are trying to obtain a JIT compiler. We are investigating the freely available JIT compilers to see which one best suits our application -- most JIT compilers do not seem to support RMI yet.

Our final set of measurements will be on a Myrinet network with the same version of all the software. To be able to compare results, we will take all measurements again (ie both TCP/IP version of RMI and the FM version of RMI).



Experimental setup

We develop a test library that consists of the following functions:

We expect that the time taken for send and receive should be similar for similar sized data, in which case we will analyze only one of them.

We run the tests in two cases:

This will give us an estimate of the network overhead in the current setup.

We run the test suite on our lab setup (Windows NT4.0, jdk-1.1.4, jpp-1.1) to obtain measurements of the system as it stands now. They are presented in the next section.

We were unable to run the server using the JIT compiler since it encountered problems while trying to bind with the rmiregistry. We also tried out several public domain JIT compilers for Windows NT: kaffe, Supercede, grok. We haven't been successful so far since some of these do not support JDK1.1 and RMI yet. From the postings in the rmi-users FAQ, we understand that the Symantec JIT compiler does not work with RMI either. So we are still exploring the JIT compiler to be used and plan to try out the JIT compiler bundled along with Microsoft's Internet Explorer 3.0+.

Once we settle on a JIT compiler, we have to ensure that the results are repeatable, and garbage collection and any other such factors do not affect our results.



Implementation details

All the networking in the Java RMI interface is handled by the java.net library. While generating stubs and skeletons, RMI sets the ``RMISocketFactory'' to decide the type of sockets that will be used. If we can modify this to use a different networking library, the RMI system will use the FM socket library as the default instead of the TCP/IP one. This is the approach we plan to take.

We will need to develop a new socket interface over FM, and generate a DLL that can be linked into the application and provide the necessary socket functionality. The library will be linked to java using the Java Native Interface (JNI).



Preliminary Measurements

 

As an overall assessment of the RMI performance, we measured the elapsed time required to make a total of 10,000 invocations for sending and receiving different number of bytes between two machines running NT. The performance was measured for two cases: with the client and server running on the same machine (using the loopback interface) and with with them running on different machines on the same Ethernet LAN. The differences in timings between these two cases gives us an idea of the network overhead in RMI.

The experiments were conducted at a time when the machines as well as the network were unloaded. These experiments were conducted by running both the client and the server using the JVM as well as running the client using a JVM and the server using the JIT compiler from Sun. The results from the former are tabulated in Table 1 and Table 2 and those from the latter are tabulated in Table 3 and Table 4.

Elapsed time for 10,000 calls: client run on JVM, server run on JVM
Table 1: Passing bytes by value

Function} # of args # of results Local Call(sec) Remote Call(sec)
Null 0015.17217.345
sendbyte 1 015.33217.054
recvbyte 0115.29217.105

Table 2: Passing fixed length byte arrays
Function} # of bytes (args) # of bytes(results) Local Call(sec) Remote Call(sec)
senddata 1440 0 25.537 34.209
recvdata 0 1440 27.099 33.408

Elapsed time for 10,000 calls : client run on JVM, server run on JIT
Table 3: Passing bytes by value

Function} # of args # of results Local Call(sec) Remote Call(sec)
Null 0012.37717.976
sendbyte 1012.48817.735
recvbyte 0112.43817.746


Table 4: Passing fixed length byte arrays

Function # of bytes (args) # of bytes (results) Local Call(sec) Remote Call(sec)
senddata 14400 21.65135.511
recvdata 0144022.47333.988

Project Schedule

The timeline is attached.

Bi-Weekly meeting with Prof Chien on Wednesdays at 2pm.



References

1
K. Connelly and A. Chien. Fm-qos : Real-time communication using self-synchronizing schedules. In Proceedings of Supercomputing, 1997.

2
S. Pakin et al. HPVM 1.0 User Documentation, August 1997.

3
M.D. Schroeder and M. Burrows. Performance of firefly rpc. In ACM Symposium on Operating System Principles, December 1989.

4
Java RMI Specification. Available from http://www.javasoft.com/products/jdk/1.1/docs/guide/rmi/spec/rmiTOC.doc.html.

5
JNI Specification. Available from http://www.javasoft.com/products/jdk/1.1/docs/guide/jni/spec/jniTOC.doc.html.


Thu Nov 20 13:19:42 CST 1997