Traditional object-oriented languages (e.g. SmallTalk [[13]])sport a simple, uniform, abstract programming model; all objects are accessed via references and all calls are dynamic message sends. This simplifies programming, as the programmer faces a single model of object behavior, and because different portions of programs can be isolated behind opaque interfaces. This abstract model, however, introduces overhead: even simple field accesses become dynamic dispatches, indirection through references is required to accesses every object, and individual functions shrink [5]. Hybrid languages like C++ [11] provide multitudinous low-level features (e.g virtual vs non-virtual functions, inline directives, and explicit stack, inline or heap allocation) so programmers can manually reduce overhead by removing unused abstraction.
However, the original notion of a clean, simple semantics is re-emerging in recent object-oriented languages (e.g. Java [27], NewtonScript [3], Cecil [7]). This leaves eliding abstraction overhead to aggressive implementations. Dynamic dispatches are optimized statically by type inference [1, 6, 21, 23], dynamically by inline caching [16] or with hybrid approaches like type feedback [17]. Static or hybrid type analysis is combined with method specialization [10, 24] to allow inlining, removing the small functions common in object-oriented code.
The combination of static type analysis and specialization also permits inline allocation of objects within containers, thereby eliminating many object references and reducing object allocation. We present object inlining, the first fully automatic optimization for doing inline allocation in an object-oriented language. We exploit the power of analysis and cloning that can handle data flow through object state [23, 24] to inline allocate child objects even for polymorphic containers and to systematically exploit such allocation wherever the child objects are used.
To make the problem concrete, consider the example in
Figure 1
. A direct implementation is shown in
Figure 2(a); inlined allocation, as in C++, would
produce Figure 2(b). Accesses to coordinates of
rectangles are cheaper in the inlined implementation, requiring one
dereference fewer. Furthermore, cache performance is likely to
benefit. Inline allocation also reduces object allocation costs, since
the sub-objects are allocated with the container. This especially
benefits languages like Java where objects have conceptually infinite
extent, necessitating heap allocation in general. Note that methods
such as Point::area in Figure 4 could cache
fields with inline allocation of Points since p and
this could not be aliased when the method is called from
Rectangle::area.
Figure 2: Two Rectangle Representations
Inline allocation creates a correspondence between container and containee that our analysis permits to be exploited as alias information. An example use of this is caching object fields in registers: more precise aliasing information concomitantly enables more thoroughgoing register allocation of object state as noted above. Furthermore, treating inline allocation as a global program transformation, as we do, allows customized data layouts, such as parallel arrays for an array of objects. We evaluated object inlining on several object-oriented benchmarks, and found it making programs up to three times as fast as without inline allocation, and matching code with inline allocation done by hand.
In subsequent sections, we first introduce our running example (Section 2) and provide background (Section 3). We next cover our analysis (Section 4) and program transformation (Section 5). We then evaluate our optimization (Section 6), discuss related work (Section 7) and conclude (Section 8). We consider future directions in Section 9.