Next: Conclusions
Up: Automatic Inline Allocation of
Previous: Discussion
The idea of rearranging data layout to improve performance is
widespread, and the work most relevant to ours comes from three camps:
the object-oriented realm, the functional community and Fortran array
optimization.
The impetus for object inlining came from the
ability to specify such inlining by hand in hybrid-object languages
like C++ and Oberon-2 [14, 20] that distinguish between objects
and object references. But the idea of doing inline allocation
automatically is by no means new; the Emerald [4, 18] object system was designed with
this goal in mind. Indeed, they wanted to automatically generate
specialized representations of objects and transform the code that
used them. This is exactly what we do. However, their compiler was
not capable of the requisite analysis (Section 4). The compiler did do type
inference, and they could inline allocate immediate types when they
had precise type information. They could do this because immediate
types in Emerald, as in most object-oriented languages, are
values-their contents are important but their identities are not-so
they could avoid our analyses by simply copying in and out of the
field.
There has been much work in the
functional community on unboxing, in which specialized
representations are used to reduce storage and access overhead. Of
particular relevance to our work is unboxing in the presence of
polymorphism [19, 15]. In [15], Cordelia Hall and company present
a transformation for Haskell that, upon being told what variables to
unbox, generates specialized code to exploit the unboxing, even for
polymorphic functions. They generate specialized function versions
using a partial evaluator to propagate ``unboxedness'' through the
program. The propagation of ``unboxedness'' resembles the propagation
of tags our use analysis defines; however, it only works for
immediate types.
In [25], Shao et al. unroll linked
lists-essentially inline allocating tail pointers-in a functional
subset of ML. Their analysis works using refinement types [12] that distinguish odd and even
length lists. These refined types are propagated using an abstract
interpretation, with rules for the refined types generated by
cons statements. All functions that take list parameters are
cloned and specialized with all possible combinations of refinement
types for their list parameters. Our analysis is similar is using an
abstract interpretation, but our field tags are more general, as they
handle arbitrary object structures, rather than lists. Also our
scheme, because our abstract interpretation is interprocedural, only
analyzes specializations that are actually used, whereas theirs
analyzes all possible combinations.
Since this work was done for functional languages, their analysis can
be more straightforward than ours: it need not handle
structure assignment, i.e. data flow through containers. This provides
two simplifications: it eliminates the difficulties created by
assigning inlined object into other containers, and it obviates the
need to prove inline allocation is safe.
Optimizing array layout for cache performance [2, 29] also involves transforming data
layout. Arrays and loops that operate upon them are tiled;
that is, the inner loops are strip-mined to operate upon small
portions of the arrays that will fit in cache. Because array
iterations cannot be reordered arbitrarily-due to data dependence-the
arrays are rearranged in memory so these small portions exhibit
spatial locality. Altering a given array's layout naturally affects
all code that uses it, and finding all such uses is analogous to our
use specialization analysis, although much simpler analyses suffice
for common Fortran code. This alteration of array layout is typically
done only when it can be expressed as an affine transformation on the
array indices and the analysis is generally not context sensitive.
Next: Conclusions
Up: Automatic Inline Allocation of
Previous: Discussion
Julian Dolby
dolby@cs.uiuc.edu