Next: Summary Up: ICC++: A C++ Dialect Previous: Performance

Discussion and Related Work

 

C++ Compatibility

  Our intention in designing ICC++ was to avoid gratuitous incompatibilities with sequential C++ programs. As a result, large sections of C++ programs, or even entire programs, can be incorporated into ICC++ programs with purely mechanical changes. However, there are two important differences between C++ and ICC++. First, ICC++ eliminates pointer arithmetic, requiring explicit array type declarations for collections (encapsulated arrays). This change ensures that pointers cannot be used to point into arbitrary locations, which would reduce the effectiveness of aggressive compiler analysis techniques. Second, ICC++ objects have well-defined concurrency control semantics. These semantics can result in deadlock for indirect recursive calls, which is clearly a departure from sequential C++ semantics. External C++ functions can be called easily from ICC++, and seamless interoperability will be achieved with the proposed CORBA IDL binding of C++. In summary, we believe ICC++ will allow many programs to be migrated from C++ with moderate effort, and the resulting ICC++ programs can serve as a single source for both sequential and parallel code versions. To support this, we are building a simple ICC++ to C++ translator which will support all but the guaranteed concurrency features.
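
The first difference can be illustrated in standard C++ (not ICC++ syntax, which is not shown here). The sketch below is only an analogy: the function names sum_ptr and sum_array are hypothetical, and std::array stands in for an explicitly declared array type. With a bare pointer, the callee's signature says nothing about the extent of the data; with an explicit array type, the extent is part of the type and is visible to analysis.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Pointer-arithmetic style: the callee receives a bare address and a count.
// Nothing in the type prevents q from walking outside the intended array.
int sum_ptr(const int* p, std::size_t n) {
  assert(p != nullptr || n == 0);
  int total = 0;
  for (const int* q = p; q != p + n; ++q) total += *q;
  return total;
}

// Explicit array type: the extent N is part of the parameter type, and
// elements are reached only by indexing, never by address arithmetic.
template <std::size_t N>
int sum_array(const std::array<int, N>& a) {
  int total = 0;
  for (std::size_t i = 0; i < N; ++i) total += a[i];
  return total;
}
```

Both functions compute the same sum for a well-formed call; the difference lies in what a compiler can conclude about aliasing and bounds from the declaration alone.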

Derivation and Concurrency

  The inheritance anomaly arises from the interaction of object state changes and the explicit expression of synchronization in program code structures, and is a well-known problem in the concurrent object community [29]. Because concurrent programs often include explicit synchronization, the anomaly makes reuse of class definitions and method code through derivation problematic. In general, subclass designers must examine the implementations of the base classes and perhaps redesign them before extending them with instance variables or member functions. To illustrate the problem, Program 1.7 uses ABC++ syntax. Derivation from the BoundedQueue class is limited by the inheritance anomaly.

class elt;
class BoundedQueue : public Pabc {
  int in, out;
  static int QUEUE_BOUND;
  elt buf[QUEUE_BOUND];
 public:
  void put(elt &i) {
    buf[in++] = i;
    if (in == QUEUE_BOUND) in = 0;
  }
  elt get(void) {
    elt& i = buf[out++];
    if (out == QUEUE_BOUND) out = 0;
    return i;
  }
  void main(void) {
    in = out = 0;
    if (in == out)                 // empty: only put may be accepted
      Paccept(put);
    else if (out == (in + 1))      // full: only get may be accepted
      Paccept(get);
    else
      Paccept(put,get);
  }
};

Program 1.7: Concurrent base class

In order for a subclass to add new methods, the BoundedQueue::main function must be rewritten. More importantly, rewriting main while preserving correctness requires understanding all of the methods defined for the queue. As illustrated in Program 1.8, the addition of BoundedQueue2::put2 requires the redefinition of BoundedQueue2::main, and the implementor of BoundedQueue2 must know about all of BoundedQueue's methods to write the new main.

class BoundedQueue2 : public BoundedQueue {
  void put2(elt &i, elt &j) {
    buf[in++] = i;
    if (in == QUEUE_BOUND) in = 0;
    buf[in++] = j;
    if (in == QUEUE_BOUND) in = 0;
  }
  void main(void) {
    in = out = 0;
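    if (in == out)                 // empty: either put variant may be accepted
      Paccept(put,put2);
    else if (out == (in + 1))      // full: only get may be accepted
      Paccept(get);
    else if (out != (in + 2))      // at least two free slots
      Paccept(put,put2,get);
    else                           // exactly one free slot: put2 excluded
      Paccept(put,get);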
  }
};

Program 1.8: Concurrent derived class

Adding new instance variables causes similar problems; in general, all of the methods must be considered to ensure consistency. In many cases, a significant fraction of the methods may have to be rewritten, dramatically circumscribing the benefits of reuse. Finally, if a concurrent object-oriented language uses method-oriented specification of synchronization (no main function), this merely distributes the concurrency control structure and does not eliminate the problem.
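
The last point can be sketched in standard C++, using a mutex and condition variable in place of ABC++'s Paccept. This is an illustrative analogue only, not ABC++ or ICC++ code: the names GuardedQueue, GuardedQueue2, and push_locked are hypothetical, int replaces elt, and an occupancy counter replaces the in/out comparison. Each method carries its own guard, so there is no central main, yet the derived class still cannot be written without knowing the base class's internals.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical analogue of BoundedQueue with per-method guards.
class GuardedQueue {
 protected:
  std::mutex m;
  std::condition_variable changed;
  std::vector<int> buf;
  std::size_t in = 0, out = 0, count = 0;

  // Caller must hold m and must already have checked for free space.
  void push_locked(int v) {
    assert(count < buf.size());
    buf[in] = v;
    in = (in + 1) % buf.size();
    ++count;
  }

 public:
  explicit GuardedQueue(std::size_t bound) : buf(bound) {}

  void put(int v) {
    std::unique_lock<std::mutex> lock(m);
    changed.wait(lock, [this] { return count < buf.size(); });  // guard: not full
    push_locked(v);
    changed.notify_all();
  }

  int get() {
    std::unique_lock<std::mutex> lock(m);
    changed.wait(lock, [this] { return count > 0; });  // guard: not empty
    int v = buf[out];
    out = (out + 1) % buf.size();
    --count;
    changed.notify_all();
    return v;
  }
};

class GuardedQueue2 : public GuardedQueue {
 public:
  using GuardedQueue::GuardedQueue;

  void put2(int a, int b) {
    std::unique_lock<std::mutex> lock(m);
    // New guard: two free slots. Writing it requires knowing how the base
    // class tracks occupancy and that get() signals the same condition.
    changed.wait(lock, [this] { return count + 2 <= buf.size(); });
    push_locked(a);
    push_locked(b);
    changed.notify_all();
  }
};
```

The synchronization logic has moved from one main function into the individual methods, but put2 still depends on the base class's lock, condition variable, and state representation, which is the dependency the anomaly describes.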

ICC++'s object consistency model addresses the inheritance anomaly by defining the synchronization required between methods solely in terms of local sequentializability. The code needed for synchronization among methods is determined by the system independently for each class, regardless of inheritance. Therefore, derivation which adds methods, overrides methods, or adds instance variables simply results in further synchronization constraints for the derived class. This allows classes to be extended and reused, but focusing only on local sequentializability provides a weaker consistency model than traditional Actors.

Parallel C++ Efforts

  The wide variety of parallel C++ efforts can be divided into two general categories: data-parallel and task-parallel extensions. Data-parallel extensions of C++, such as pC++ and C**, employ collections or aggregates [13, 12, 36] to describe parallelism, using objects to increase the flexibility of the data-parallel model. However, data-parallel languages cannot easily express more irregular and client-server forms of concurrency, which limits the domain of applications in which they can be used. Rewriting sequential programs as efficient data-parallel programs often requires significant reorganization, as aligning data structures to create efficient parallel collections can cause major disruptions to program structure.

The diversity of task-parallel extensions of C++ is much greater; these can be loosely categorized based on their treatment of objects and concurrency. First, there are languages (or libraries) that introduce concurrency without changing the object model, such as Presto [8], COOL, CHARM++, and ES-KIT++ [38]. These languages require the programmer to build concurrency control by convention, providing no language support for object consistency or building abstractions from larger collections of objects. Second, many languages (or libraries) use objects to encapsulate concurrency, exploiting objects to represent data parallel collections or coarse-grained tasks, including DOME [7] and Mentat. In these languages, concurrency control may be expressed explicitly in a library or implicitly via data flow dependences.

The encapsulation of concurrency in these models is generally expensive and is used only sparingly for coarse-grained abstractions. CC++ provides atomic functions, but as these are not allowed to access other objects, they can only be used to build single-object data abstractions. Finally, the meta-architecture of MPC++ provides a programmable language syntax and semantics capable of supporting data parallelism, control parallelism, and complex synchronization structures (as in Concurrent Aggregates, ABCL, or other Actor-based languages). However, because of its very flexibility, it is difficult to make specific comparisons to MPC++, and the impact of the meta-architecture on implementation efficiency is as yet undetermined [22]. In contrast to these language designs, ICC++ provides an object model that integrates both concurrency control and concurrency guarantees, and is extensible, supporting concurrent data abstractions built with several objects.

Another important distinction among parallel dialects of C++ is the scheduling or concurrency guarantees provided by the language. Data-parallel languages have sequential semantics, so the data-parallel C++ dialects provide no concurrency guarantees. Of the task-parallel C++ dialects, CHARM++ provides explicit control over scheduling [23], and CC++ [10] provides guaranteed fair thread scheduling for all par constructs.

In contrast, ICC++ emphasizes potential concurrency, and gives concurrency guarantees in a data-oriented form. This gives the implementation freedom to select an execution granularity (thread sizes) for efficiency, facilitating efficient sequential execution. If necessary, ICC++ provides a spawn statement (explicit thread creation) which can be used to guarantee concurrency [17].

Other Concurrent Object-Oriented Languages

  Though there is a wide variety of non-C++ concurrent object-oriented languages [42, 13, 4, 30, 25, 5], we focus on Actor-based languages [1] because they closely integrate object-oriented message passing with concurrency. This allows programmers to reason at the level of object operations and not memory references. However, the actor model provides neither a clear basis for building data abstractions from collections of objects, nor concurrency guarantees. In contrast, ICC++ includes both concurrency guarantees and language support for building abstractions from ensembles (structures or collections) of objects. In addition, to date, most implementations of Actor-based languages have been inefficient. Recent work in our group [32, 35, 34] and others [39] demonstrates that this need not be the case.

Illinois Concert Project

  ICC++ is the second language supported by the Concert project (the first is Concurrent Aggregates [12, 13]). ICC++ is fully described in [15, 17]. For the past five years, our group has investigated the design of concurrent object-oriented programs [16, 12, 11], and has built application programs totaling over 40,000 lines. In addition, we have studied the design of concurrent object-oriented languages and their implementation, exploring a variety of aggressive compiler and runtime techniques. The design of ICC++ was based on this experience and on an extensive survey of parallel object-oriented approaches.

The Illinois Concert system is a complete development environment for irregular parallel applications [14]. The Concert system supports a concurrent object-oriented programming model and includes a globally optimizing compiler, efficient runtime system, symbolic debugger, and an emulator for program development. This system employs novel compiler techniques [32, 35, 34] and runtime techniques [34, 24] to achieve efficient execution of fine-grained programs on both sequential and parallel platforms. The Concert system has demonstrated sequential performance matching C and surpassing C++ on demanding numerical benchmarks such as the Livermore Kernels [35], and superior speedups and absolute performance on a parallel molecular dynamics application called CEDAR [9] on the Cray T3D [18, 31] and Thinking Machines CM-5 [40]. Our ICC++ compiler exploits the same aggressive compiler analysis and code optimization.



Julian Dolby
dolby@cs.uiuc.edu