ICC++ provides two primary concurrency constructs: conc blocks and conc loops. Both are extensions to standard C++ syntax and are designed to enable incremental parallelization of existing code. Furthermore, both introduce potential concurrency, not guaranteed concurrency, and thus provide the implementation the freedom to serialize where appropriate for efficiency.
A conc block is a compound statement which defines the following partial order on its constituent statements: s_i ≺ s_j indicates that statement s_j depends upon s_i. Statements in a conc block are executed such that if s_i ≺ s_j, then s_i will be executed completely before s_j is begun. s_j depends upon s_i if any identifier which appears in both s_i and s_j is assigned in one of them, or if s_i contains a jump statement (i.e., a break, continue, or goto). The first rule allows concurrent operations upon objects via pointers, while preserving sequential semantics for local variables, including the pointers themselves. Thus:
conc { a->b(); a->c();}
would be concurrent but:
conc { a = new Foo(); a->b(); }
would be sequential. The second rule gives jump statements a natural
semantics by serializing the conc block around them. A
conc block is finished when all statements within it have
completed. For concurrency control purposes, nested blocks are treated
as single statements within the block in which they are nested.
These rules are designed to expose concurrency while preserving sequential semantics where it is essential. Serializing state changes for local variables allows existing compound statements that declare and use local variables to be transformed into conc blocks, exposing concurrency for object invocations within them. This exploits the distinction between builtin types (e.g., ints, floats, etc.) and user-defined objects in C++. Builtin types cannot provide concurrency control, so their uses are serializing, but user-defined objects all have concurrency control in ICC++, so operations on objects are allowed to execute concurrently. Similarly, providing natural semantics for jump statements allows conc to be applied to existing code where unstructured control is used.
By specifying potential concurrency rather than guaranteed concurrency, the conc block provides implementation flexibility. Statements in a block need not be scheduled fairly, enabling a compiler to serialize sections and inline procedure calls where appropriate for efficiency. In this manner, the implementation can choose appropriate grain sizes for efficient execution on a variety of machines.
The conc keyword can also be applied to loops, in which case iterations may be overlapped, subject to their dependences. For example, the loop:
conc while (j > 10){
a->foo(j);
j -= 1;
}
is implicitly expanded to:
if (j > 10) conc {
a->foo(j);
j0 = j - 1;
if (j0 > 10) conc {
a->foo(j0);
j1 = j0 - 1;
...
}
}
The design of concurrent loops is similar to that of conc blocks. Respecting scalar variable dependences and permitting control flow operations within loops simplifies the addition of concurrency. As with conc blocks, the conc looping constructs express potential concurrency; no guarantees of actual concurrency are provided. This allows actual implementations considerable latitude in scheduling iterations; in particular, subsets can be run sequentially.
Consider, for example, the following sequential quicksort:
void quicksort(poly_t vec[], int l, int r){
if ((r-l)>THRESHOLD) {
int i = partition(vec,l,r);
quicksort(vec,l,i);
quicksort(vec,i+1,r);
} else {
simple_sort(vec, l, r);
}
}
The quicksort code, excerpted from Section 1.6, can be parallelized with the addition of a single conc annotation:
void quicksort(poly_t vec[], int l, int r){
if ((r-l)>THRESHOLD)
conc {
int i = partition(vec, l, r);
quicksort(vec,l,i);
quicksort(vec, i+1, r);
}
else {
simple_sort(vec,l,r);
}
}
The quicksort example illustrates how conc can often be added without requiring structural program changes. In this case, the conc block respects the local dependence for i while allowing the recursive quicksort calls to execute concurrently.
The code below (also taken from the polygon overlay code) presents an example of how conc looping structures can be used. Again, the concurrency can be introduced with no disruption of the surrounding program.
conc for (il=0; il<leftlimit; il++){
conc for (ir=0; ir<rightlimit; ir++){
newLn = polyOverlayLn(left->vec[il], right->vec[ir]);
if (newLn != NULL){
newLn->link = outList;
outList = newLn;
}
}
}
The loops above represent a naïve parallelization of the simplest polygon overlay algorithm. Because all of the polyOverlayLn() calls are independent, they could be executed in parallel if they were the only statements in the loop body. However, the if statement accumulates the new polygons into a list, serializing the innermost loop. To actually obtain concurrency, the code would have to accumulate the new polygons in parallel (see Section 1.6).
The two loops below illustrate the basic uses of conc looping constructs.
conc for (i=0; i<SIZE; i++){ // loop #1
elements[i].doit(partners[i]);
}
conc while (i != NULL){ // loop #2
i->element_update();
i = i->next;
}
Loop #1 is parallel across all members of the elements and partners arrays. In effect, the semantics are identical to Fortran-90's doall: all loop iterations are executed in parallel. Loop #2 runs down a list, triggering calls to element_update() for each element. The semantics of this loop are similar to doacross in Fortran-90: loop iterations are partially overlapped, subject to the data dependences. In this case, the calls to element_update() will be concurrent, but will be launched one at a time as the loop progresses down the list. This shows the utility of enforcing scalar dependences, but with renaming semantics to expose concurrency. In both cases, the loops terminate when all of their iterations have completed.