As is no doubt typical with compiler projects, the amount of future work depends only on the creativity of the compiler architect. In this section, a few interesting directions for future work are outlined.

Dynamic-Routing Support

An intriguing issue is the extent to which hardware dynamic routing support is beneficial for certain applications. Adding a hardware dynamic router to the scheduled router would allow for applications with short-lived dynamic communications to exhibit about the same level of performance as with a traditional dynamic router.

Schedule Length Selection

Currently the schedule length (or periodicity) is given by the user, either as a compiler switch or in the COP input. Although this simplifies code generation, it also somewhat limits the flexibility of the compiler. A future extension to the compiler might consider a variety of criteria in an attempt to pick a good schedule size for an application:

the presence of schedule-generators that require a schedule to be an even multiple of their base schedule length;

the number of phases in the inner loop of an application, as compared to the total schedule memory size of the target;

or the likelihood of decreasing performance with decreasing schedule size, based on the number of streams passing through each node.
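As a rough illustration of how the criteria above might be combined, the following Python sketch picks the smallest length that is a multiple of every generator's base length, leaves at least one slot per stream at the busiest node, and lets all inner-loop phases fit in schedule memory. The function name, its parameters, and the one-slot-per-stream rule are assumptions made for this example; they are not part of the existing compiler.

```python
from math import lcm


def pick_schedule_length(base_lengths, max_streams_per_node,
                         schedule_memory, phases_in_inner_loop=1):
    """Return a candidate schedule length, or None if nothing fits.

    All parameters are hypothetical inputs for this sketch:
      base_lengths          base schedule lengths of the schedule generators
      max_streams_per_node  largest number of streams through any one node
      schedule_memory       total schedule slots available on the target
      phases_in_inner_loop  phases that should be resident at the same time
    """
    # Criterion 1: the length must be a multiple of every base length.
    step = lcm(*base_lengths)  # lcm() of no arguments is 1

    # Criterion 2: every inner-loop phase should fit in schedule memory.
    budget = schedule_memory // phases_in_inner_loop

    # Criterion 3 (assumed rule): at least one slot per stream at the
    # busiest node, so that shrinking the schedule does not starve a stream.
    length = step
    while length < max_streams_per_node:
        length += step

    return length if length <= budget else None
```

For example, with base lengths 4 and 6, at most 20 streams through a node, and 128 slots shared by two phases, the sketch returns 24.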

Though not easy, schedules of varying sizes might also be supported. For example, a phase with a large number of streams might benefit from a larger schedule, whereas the rest of the application might do fine with smaller schedules, leaving a smaller footprint in the router.

Trying to transfer to a phase with a longer schedule would require real synchrony from the nodes, rather than integer-multiple synchrony such as is now used. The phase change could consist of a reduction followed by a broadcast that switched the router’s schedule without processor intervention; this would require the schedule to be at the same address on all nodes.

Switching back to a phase with a length that evenly divided the current schedule’s length would not require any special handling.
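The divisibility argument can be illustrated with a small Python sketch. It assumes that "integer-multiple synchrony" means the nodes' clocks agree only modulo the current schedule length; both function names are hypothetical. The case of a shorter schedule whose length does not divide the current one is not discussed above, and the sketch conservatively treats it as requiring resynchronization.

```python
def slots_agree(node_clocks, schedule_length):
    """True if every node is at the same slot of a schedule of this length."""
    return len({t % schedule_length for t in node_clocks}) == 1


def transition_needs_resync(current_length, next_length):
    """True if switching schedules requires real synchrony first.

    If next_length evenly divides current_length, clocks that agree modulo
    current_length also agree modulo next_length, so no handling is needed.
    """
    return current_length % next_length != 0


# Clocks that differ by multiples of the current length (12 slots).
clocks = [0, 12, 36]
print(slots_agree(clocks, 6))    # length 6 divides 12
print(slots_agree(clocks, 24))   # longer schedule
```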

Operators and Implementations

There is a wide range of interesting implementations to test. For example, a permutation with a run-time argument must currently be managed via online routing. As mentioned in Appendix B, schedules for a run-time permutation could be built at run-time, thus improving latency dramatically, assuming the operator was used enough to justify the time spent deriving a schedule for it.

Meta-implementations are discussed in Section 4.1.3. Future work should include an implementation of this concept, and testing to compare it against existing implementations.

It might be worthwhile to make operators easier to add; for example, an ‘operator object’ could be defined that knew about the kinds of operands it took, and so forth. Then it would be as easy to add a new operator to the compiler as it is currently to add a new implementation.

Language Features

The definition of (runtime) could be extended to allow it to apply to the optional arguments presented to operators; for example, an array of streams could be specified with (runtime) maximum bandwidths, and then at load time some streams' bandwidths could be set to zero to give more bandwidth to the streams that will actually be used.

Taking that one step further suggests the ultimate run-time feature: eval. The entire compiler could be linked in with the application, which would pass text strings holding operators to the eval function; eval would return a tuple of function pointers corresponding to the necessary I/O functions and load function.

Appendix A COP

This appendix provides additional details on COP’s syntax and semantics.

A.1 Primitive COP Operators

(reduce LABEL FUNC DEST) (allreduce LABEL FUNC) (prefix LABEL FUNC)

functional. FUNC is a function that is called internally by the COP-generated code to ‘sum’ two data items. The function should be suitable for a reduction operator; it is passed values in the same manner as they are passed to the regular I/O functions. For array operands, the function should take three arguments (two inputs and an output); for non-array operands, the function takes two inputs and returns the output. DEST is the node that receives the answer. It can be constant, runtime, or runtime distributed. For plain reduce, all the nodes but the specified destination will read some invalid result, which they should discard.
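The intended results of these three operators can be modeled with a short Python sketch that operates on a list holding one value per node. This is a reference model only, not COP-generated code. The names cop_reduce, cop_allreduce, and cop_prefix are hypothetical. The invalid result read by non-destination nodes is represented as None. Combining values in node order, and treating prefix as an inclusive scan, are assumptions of this example.

```python
from functools import reduce as fold
from itertools import accumulate
from operator import add


def cop_reduce(values, func, dest):
    """Only node `dest` reads the combined value; others read None."""
    total = fold(func, values)
    return [total if node == dest else None for node in range(len(values))]


def cop_allreduce(values, func):
    """Every node reads the combined value."""
    total = fold(func, values)
    return [total] * len(values)


def cop_prefix(values, func):
    """Node i reads the combination of values 0..i (inclusive scan assumed)."""
    return list(accumulate(values, func))


per_node = [1, 2, 3, 4]
print(cop_reduce(per_node, add, dest=2))
print(cop_allreduce(per_node, add))
print(cop_prefix(per_node, add))
```

Here func follows the non-array convention described above: it takes two inputs and returns the output.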

(broadcast LABEL SRC)

directional. This operator sends data to all the nodes in the selected set from the specified SRC node. SRC may be constant, runtime, runtime distributed, or runtime dynamic.

(collect LABEL DEST)

directional. This operation (in some sense the opposite of broadcast) allows any of the nodes in the subset to write to the operator, and the destination is the only node that can read it. The operator is only guaranteed to work correctly if one node at a time writes to it, although it may be that in some circumstances multiple nodes can successfully write to this operator. The DEST can be constant, runtime, or runtime distributed.

(barrier LABEL)

functional. All nodes in the subset will pause until all nodes have called the function.

(cshift LABEL DISTANCE) (eoshift LABEL DISTANCE)

functional. These operator types shift data within the selected subset. cshift (“circular”) applies shifts modulo the size of the subset, whereas eoshift (“end-off”) discards any shifts that leave the subset. The distance argument may be constant or runtime.
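The difference between the two shifts can be seen in a small Python model in which position i of a list stands for node i of the subset. The direction convention (node i sends to node i + DISTANCE) and the use of None for positions that receive nothing under eoshift are assumptions of this sketch.

```python
def cshift(values, distance):
    """Circular shift: destinations wrap modulo the subset size."""
    n = len(values)
    out = [None] * n
    for src, value in enumerate(values):
        out[(src + distance) % n] = value
    return out


def eoshift(values, distance, fill=None):
    """End-off shift: data that would leave the subset is discarded."""
    n = len(values)
    out = [fill] * n
    for src, value in enumerate(values):
        dst = src + distance
        if 0 <= dst < n:
            out[dst] = value
    return out


data = ["a", "b", "c", "d"]
print(cshift(data, 1))    # ['d', 'a', 'b', 'c']
print(eoshift(data, 1))   # [None, 'a', 'b', 'c']
```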

(permute LABEL PERM)

functional. This operator specifies a permutation of the subset. For compile-time arguments, PERM must be a list of nodes the same length as the subset size; the nth node in the list is the node to which node n sends its data. The argument may also be runtime (in which case all nodes must pass the permutation to the load operator) or runtime distributed (in which case each node only knows the node to which it is writing).
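The meaning of PERM can be made concrete with a short Python model. It assumes the nodes of the subset are numbered from 0 and that PERM is given in that numbering; the function name and the validity check are additions made for this example.

```python
def permute(values, perm):
    """Node n sends values[n] to node perm[n]; return what each node reads."""
    n = len(values)
    # PERM must name every node of the subset exactly once.
    if len(perm) != n or sorted(perm) != list(range(n)):
        raise ValueError("PERM must be a permutation of the subset")
    out = [None] * n
    for src, dst in enumerate(perm):
        out[dst] = values[src]
    return out


# Node 0 sends to node 2, node 1 to node 0, node 2 to node 1.
print(permute(["a", "b", "c"], [2, 0, 1]))   # ['b', 'c', 'a']
```

In the runtime distributed case, each node would know only its own entry perm[n] rather than the whole list.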

(stream LABEL SRC DEST)

directional. This operator connects the SRC to the DEST. Either of the nodes may be runtime; additionally, the DEST may be runtime dynamic.

(spread LABEL SRC) (gather LABEL DEST)

directional. With the spread operator, the SRC node writes once to each node in the subset (excluding itself); the other nodes read once to get the appropriate value from the SRC node. For the gather operator, all the nodes in the subset (except DEST) write once, and the DEST node performs n - 1 reads to get all the values in order.
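A small Python model may help fix the write and read counts for these two operators. It assumes the subset's nodes are numbered 0 to n - 1 and that "in order" means ascending node order; the function names and data layout are hypothetical.

```python
def spread(items, src, n):
    """SRC writes n - 1 items, one per other node, in node order.

    Returns a mapping from each receiving node to the single value it reads.
    """
    targets = [node for node in range(n) if node != src]
    if len(items) != len(targets):
        raise ValueError("SRC must write exactly one item per other node")
    return dict(zip(targets, items))


def gather(values, dest):
    """Every node except DEST writes once; DEST performs n - 1 reads.

    `values` holds one entry per node; DEST's own entry is ignored.
    Returns the values in the order DEST reads them.
    """
    return [value for node, value in enumerate(values) if node != dest]


print(spread(["x", "y", "z"], src=1, n=4))   # {0: 'x', 2: 'y', 3: 'z'}
print(gather([10, 11, 12, 13], dest=2))      # [10, 11, 13]
```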