To measure the performance improvement brought about by object inlining, we compiled several of our chosen object-oriented benchmark programs with our Concert compiler, both with and without object inlining. To provide calibration, we also compiled the same programs with G++. G++ was run with -O2 both for the C++ programs and for the code generated by the Concert compiler. Measurements were taken on a SparcStation 20/60 and are the average of 10 runs.
Figure 17 is normalized to the performance of Concert code without object inlining. It shows that the Concert System, without inlining, gives roughly similar performance to G++ except on polyOver. Where Concert is slower, the major contributors are our use of C++ as a portable assembly language, unambitious array optimization, and, naturally, the cost of accessing uninlined objects; where Concert is faster, in Silo, the major reason is a highly tuned memory allocator.
Figure 17: Object Inlining Performance
The performance gain of object inlining is most dramatic on polyOver: both the array and list versions are roughly three times as fast as without it. This code makes the most aggressive use of inlining: polygons are inlined into arrays, tightening inner loops; result polygons are merged with the cons cells of their list, reducing dynamic allocation; and a list of cons cells is inline allocated, which also tightens loops. Combining the result polygons with their cons cells produces tighter data structures than the C++ code, which is why the array version is faster than G++; the list version should be faster for the same reason, but low-level code generation issues in the Concert compiler frustrate this.
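The merging of result polygons with their cons cells can be pictured with a minimal C++ sketch. It is only an illustration of the layout change, not code produced by the Concert compiler; the Polygon fields and all type and function names below are hypothetical.

```cpp
#include <cstddef>

// Hypothetical polygon payload; the field names are illustrative only.
struct Polygon {
    int xlo, ylo, xhi, yhi;
};

// Uninlined layout: every list element is two heap objects,
// the cons cell and the polygon it points to.
struct ConsCell {
    Polygon*  data;
    ConsCell* next;
};

// Inlined layout: the polygon's fields live inside the cons cell,
// so every element is one heap object and one pointer hop shorter.
struct InlinedConsCell {
    Polygon          data;
    InlinedConsCell* next;
};

// Heap allocations needed to build a list of n elements in each layout.
constexpr std::size_t allocations_uninlined(std::size_t n) { return 2 * n; }
constexpr std::size_t allocations_inlined(std::size_t n)   { return n; }

// Prepend one element; the uninlined version allocates twice.
inline ConsCell* push_uninlined(ConsCell* head, const Polygon& p) {
    return new ConsCell{new Polygon(p), head};
}

// Prepend one element; the inlined version allocates once.
inline InlinedConsCell* push_inlined(InlinedConsCell* head, const Polygon& p) {
    return new InlinedConsCell{p, head};
}

// Traversal dereferences two pointers per element (cell, then polygon).
inline long sum_xlo(const ConsCell* c) {
    long s = 0;
    for (; c != nullptr; c = c->next) s += c->data->xlo;
    return s;
}

// Traversal dereferences one pointer per element (the cell only).
inline long sum_xlo(const InlinedConsCell* c) {
    long s = 0;
    for (; c != nullptr; c = c->next) s += c->data.xlo;
    return s;
}
```

Both traversals compute the same result; the inlined form halves the number of allocations per element and removes one dependent load from the loop body, which is the kind of effect the paragraph above attributes to inlining.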
OOPACK is nearly twice as fast with inlining as without, and it ends up substantially faster than G++. This is due, in part, to inlining laying out the complex number array as parallel arrays (Fortran style) rather than by object, which seems to improve cache performance for this code. The 14% gain for Silo comes primarily from reducing dynamic allocation by merging cons cells with their data. Richards creates very few objects, and the 5% gain it shows derives partly from eliding pointer dereferences. Both Silo and Richards benefit from improved object field caching with object inlining.
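The difference between a by-object layout and a parallel-array layout for complex numbers can be sketched as follows. This is an assumed illustration in plain C++, not the OOPACK source or the Concert compiler's output; the type and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// By-object layout: the real and imaginary parts of each element
// are stored next to each other.
struct Complex {
    double re;
    double im;
};
using ComplexArray = std::vector<Complex>;

// Parallel-array (Fortran-style) layout: all real parts are contiguous,
// and all imaginary parts are contiguous in a second array.
struct ParallelComplexArray {
    std::vector<double> re;
    std::vector<double> im;

    explicit ParallelComplexArray(std::size_t n) : re(n), im(n) {}
    std::size_t size() const { return re.size(); }
};

// A loop that reads only the real parts. In the by-object layout it
// strides over the unused imaginary parts as well.
inline double sum_re(const ComplexArray& a) {
    double s = 0.0;
    for (const Complex& c : a) s += c.re;
    return s;
}

// The same loop over the parallel layout touches only the re array,
// so every cache line it loads holds nothing but useful data.
inline double sum_re(const ParallelComplexArray& a) {
    double s = 0.0;
    for (double x : a.re) s += x;
    return s;
}
```

Whether the parallel layout is actually faster depends on the access pattern: loops that use both parts of every element gain less than loops that sweep one component at a time.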
Thus, object inlining overall makes code run up to three times as fast as without inline allocation and matches the performance of code with inline allocation specified by hand.