Geetanjali Sampemane, Luis Rivera, Lynn Zhang,
Sudha Krishnamurthy
{geta, lrivera, l-zhang, krishnam}@cs.uiuc.edu
Distributed systems require processes running in different address spaces to communicate and exchange data. There are two basic ways to achieve this: the sockets interface, which is flexible but requires the application programmer to encode and decode messages sent over the network, and the Remote Procedure Call (RPC) mechanism, which hides the network details behind the abstraction of a procedure call and is consequently easier to use.
Current distributed object systems, however, need a variant of this: they communicate by invoking methods of remote objects. The Java Remote Method Invocation (RMI) system provides this functionality for Java objects. Our experiments, detailed in Section 3, show that Java RMI performs poorly in comparison with other RPC systems [3]. In this project, we attempt to minimize the overheads associated with the communication layer of Java RMI.
A normal RMI invocation works as follows: a client application invokes a method on an object on the server. The parameters are marshalled and sent over the network to the server machine, where the computation is performed; the results are then marshalled and sent back to the client. The two main sources of delay are marshalling (``object serialization'') and network overheads. In this project we focus on reducing the network overheads.
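As an illustration of what marshalling involves, the following sketch (the class and payload are ours, purely for illustration) serializes a 1440-byte array the way RMI's marshalling layer does with each argument before transmission:

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;

    // Illustrative only: roughly what RMI does to each argument,
    // serializing it to a byte stream before it goes on the wire.
    class MarshalDemo {
        public static void main(String[] args) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(new byte[1440]);   // a typical test payload
            out.flush();
            System.out.println("marshalled size: " + bytes.size() + " bytes");
        }
    }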
To develop an RMI application, one creates the client, the server, and the RMI interface definition, all written in Java. After the Java class files are generated, the RMI compiler (rmic) runs on them and generates stubs and skeletons. When the server starts, it binds itself to the registry and waits for a network connection. When the client starts, its stub contacts the registry to find the address of the server and then makes a network connection to it.
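A minimal sketch of the pieces involved is shown below; the Echo interface and class names are our own illustration, not our actual test library. Running rmic on EchoServer would produce the stub and skeleton classes.

    import java.rmi.Naming;
    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.server.UnicastRemoteObject;

    // The remote interface definition, shared by client and server.
    interface Echo extends Remote {
        byte[] echo(byte[] data) throws RemoteException;
    }

    // The server: exports itself and binds a name in the rmiregistry.
    class EchoServer extends UnicastRemoteObject implements Echo {
        EchoServer() throws RemoteException { super(); }
        public byte[] echo(byte[] data) { return data; }
        public static void main(String[] args) throws Exception {
            Naming.rebind("//localhost/Echo", new EchoServer());
        }
    }

    // The client: obtains a stub from the registry and invokes the method.
    class EchoClient {
        public static void main(String[] args) throws Exception {
            Echo server = (Echo) Naming.lookup("//localhost/Echo");
            byte[] reply = server.echo(new byte[1440]);
        }
    }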
The current implementation of RMI uses TCP/IP sockets as the default transport mechanism. Our first attempt at reducing the communication latency is to replace the heavy-weight TCP/IP transport layer with a leaner, faster messaging layer, namely Illinois Fast Messages.
Illinois Fast Messages (FM) is a low-overhead, high-performance software messaging layer with an implementation for the Myrinet [1] network. Myrinet is a high-speed local area network with full-duplex 1.28 Gbps links. It decouples the host processor from the network by using a processor on the network interface to handle most of the protocol processing. One of the primary goals of FM is to deliver as much of the underlying network bandwidth as possible to applications and higher-level messaging layers.
We would like our changes to be completely transparent to the user of RMI -- any application should be able to take advantage of the faster network by selecting this version of the network library at run time. At this time, this seems feasible by creating a new ``socket factory'' to be used by RMI.
By running our test applications over the two network-layer implementations (TCP/IP and FM) and comparing the results, we can measure the speedup obtained.
Our plans for implementation are as follows:
We selected Windows NT as the platform since all the software we require is available on it. We plan to use the Pentium machines in the NT lab running NT 4.0 and the Java Development Kit (jdk-1.1.4), interfaced with FM (v2.1). We are setting up the environment with the necessary tools (C compilers and the other tools needed to compile the JDK).
To get the best possible performance, we are also trying to obtain a JIT compiler. We are investigating the freely available JIT compilers to see which one best suits our application -- most JIT compilers do not yet seem to support RMI.
Our final set of measurements will be taken on a Myrinet network with the same versions of all the software. To be able to compare results, we will repeat all the measurements there (i.e., with both the TCP/IP version of RMI and the FM version).
We develop a test library that consists of the following functions: a null call (no arguments or results); sendbyte and recvbyte, which pass a single byte as an argument or a result, respectively; and senddata and recvdata, which pass a fixed-length byte array as an argument or a result, respectively.
We expect the times for send and receive to be similar for data of the same size, in which case we will analyze only one of them.
We run the tests in two cases: with the client and server on the same machine (using the loopback interface), and with them on different machines on the same Ethernet LAN. The difference between the two gives us an estimate of the network overhead in the current setup.
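A sketch of the timing driver we have in mind follows, reusing the illustrative Echo interface from the earlier sketch (the names are hypothetical):

    import java.rmi.Naming;

    // Hypothetical driver: times n invocations of one test function
    // against a server named on the command line.
    class RMITimer {
        public static void main(String[] args) throws Exception {
            Echo server = (Echo) Naming.lookup("//" + args[0] + "/Echo");
            byte[] payload = new byte[1440];   // fixed-length test array
            int n = 10000;
            long start = System.currentTimeMillis();
            for (int i = 0; i < n; i++) {
                server.echo(payload);          // senddata-style call
            }
            long elapsed = System.currentTimeMillis() - start;
            System.out.println(n + " calls in " + (elapsed / 1000.0) + " sec");
        }
    }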
We ran the test suite on our lab setup (Windows NT 4.0, jdk-1.1.4, jpp-1.1) to obtain measurements of the system as it stands; the results are presented in the next section.
We were unable to run the server under the JIT compiler, since it encountered problems while trying to bind with the rmiregistry. We also tried several public-domain JIT compilers for Windows NT (kaffe, Supercede, grok), without success so far, since some of them do not yet support JDK 1.1 and RMI. From postings in the rmi-users FAQ, we understand that the Symantec JIT compiler does not work with RMI either. We are therefore still exploring which JIT compiler to use, and plan to try the one bundled with Microsoft's Internet Explorer 3.0+.
Once we settle on a JIT compiler, we have to ensure that the results are repeatable and that garbage collection and other such factors do not affect them.
All the networking in the Java RMI system is handled through the java.net library. The stubs and skeletons create their connections through an ``RMISocketFactory'', which decides the type of sockets to be used. If we install a factory based on a different networking library, the RMI system will use FM sockets instead of the default TCP/IP ones. This is the approach we plan to take.
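A sketch of such a factory is shown below, under the assumption that we will later supply FM-backed Socket and ServerSocket subclasses; since those classes do not exist yet, plain TCP sockets stand in for them here.

    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.rmi.server.RMISocketFactory;

    // Sketch of a replacement factory; the FM-backed socket classes
    // are still to be written, so TCP sockets act as placeholders.
    class FMSocketFactory extends RMISocketFactory {
        public Socket createSocket(String host, int port) throws IOException {
            return new Socket(host, port);     // would return an FM socket
        }
        public ServerSocket createServerSocket(int port) throws IOException {
            return new ServerSocket(port);     // would return an FM server socket
        }
    }

An application would install it with RMISocketFactory.setSocketFactory(new FMSocketFactory()) before making any RMI calls.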
We will need to develop a new socket interface over FM and generate a DLL that can be linked into the application to provide the necessary socket functionality. The library will be linked to Java using the Java Native Interface (JNI).
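One possible shape for the Java side of that library is sketched below; the class, the method names, and the fmsocket DLL are our assumptions, and the native bodies would be implemented in C over FM's API.

    // Hypothetical Java-side declaration of the FM socket DLL; the
    // native implementations would call into FM from C via JNI.
    class FMNative {
        static {
            System.loadLibrary("fmsocket");   // loads fmsocket.dll on NT
        }
        native int  fmConnect(String host, int port);
        native int  fmSend(int handle, byte[] buf, int len);
        native int  fmRecv(int handle, byte[] buf, int len);
        native void fmClose(int handle);
    }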
As an overall assessment of RMI performance, we measured the elapsed time required to make a total of 10,000 invocations sending and receiving different numbers of bytes between two machines running NT. The performance was measured for two cases: with the client and server running on the same machine (using the loopback interface), and with them running on different machines on the same Ethernet LAN. The difference in timings between these two cases gives us an idea of the network overhead in RMI.
The experiments were conducted at a time when both the machines and the network were unloaded. They were run with the client and the server both under the JVM, and also with the client under the JVM and the server under the JIT compiler from Sun. The results from the former are tabulated in Tables 1 and 2, and those from the latter in Tables 3 and 4.
Table 1: Null calls and single-byte transfers (client and server under the JVM)
| Function | # of args | # of results | Local Call (sec) | Remote Call (sec) |
| Null | 0 | 0 | 15.172 | 17.345 |
| sendbyte | 1 | 0 | 15.332 | 17.054 |
| recvbyte | 0 | 1 | 15.292 | 17.105 |
Table 2: Passing fixed-length byte arrays (client and server under the JVM)
| Function | # of bytes (args) | # of bytes (results) | Local Call (sec) | Remote Call (sec) |
| senddata | 1440 | 0 | 25.537 | 34.209 |
| recvdata | 0 | 1440 | 27.099 | 33.408 |
Table 3: Null calls and single-byte transfers (server under Sun's JIT compiler)
| Function | # of args | # of results | Local Call (sec) | Remote Call (sec) |
| Null | 0 | 0 | 12.377 | 17.976 |
| sendbyte | 1 | 0 | 12.488 | 17.735 |
| recvbyte | 0 | 1 | 12.438 | 17.746 |
Table 4: Passing fixed-length byte arrays (server under Sun's JIT compiler)
| Function | # of bytes (args) | # of bytes (results) | Local Call (sec) | Remote Call (sec) |
| senddata | 1440 | 0 | 21.651 | 35.511 |
| recvdata | 0 | 1440 | 22.473 | 33.988 |
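Dividing by the 10,000 invocations gives per-call figures: from Table 1, a null call takes about 1.52 ms locally (15.172 s / 10,000) and about 1.73 ms remotely, so the Ethernet round trip adds only about 0.2 ms per call. From Table 2, a remote senddata call with a 1440-byte argument takes about 3.42 ms, against roughly 2.55 ms locally.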
The timeline is attached.
Bi-weekly meetings with Prof. Chien are held on Wednesdays at 2 pm.