• Aucun résultat trouvé

3.3 Scheduled Communication

4.1.2 Implementation Types

This section covers the several high-level ways that an implementation can be written for sched-uled routing to provide the necessary functionality for an operator, and provides some insight into how the compiler can choose among them.

For primitive operator implementations (as opposed to those that are simply Lisp code), the implementation’s interface to the compiler is specified entirely through a single structure that includes a set of functions for characterizing and including the implementation. Table 4.1 presents the important functions that make up the structure (also included in the structure, but not shown, are miscellaneous flags fields and some other book-keeping data).

Name Description

metric suitability of implementation to operator replace return replacement operators (optional) streams return streams to route (optional) gen router set up schedules and VFSMs (optional) gen io emit I/O functions (read/write or func) gen load emit load-time startup code (optional) Table 4.1: Interface functions for operator implementations

The one function in the table common to all implementations is the metric function. This function is called from the compiler with a pointer to an operator and a by-reference structure to be filled in with information on how this implementation will perform on the given operator. A metric function can simply return ‘false’ if it can’t handle the given operator; this may be for a reason as simple as that it is the wrong type of operator, or that the implementation can’t handle run-time operand values, or for more complex reasons (e.g., a dense broadcast implementation would reject an operator with a sparse subset).

If the implementation can handle the given operator, it fills in the ‘implementation infor-mation’ structure. This structure holds best guesses as to the time to load the operator; the time to reload it on subsequent passes if one argument or the other has changed; the startup time for a message; and the inter-word time for a message. It also holds the number of I/O words required to run to completion, the number of VFSMs predicted to be used in each pipeline, the number of interface registers, and some other flags. All this information is made available to the compiler’s search engine for trying to determine the best implementations to pick.

The following subsections discuss the four main high-level techniques for implementing operators: as dynamic-routing, stream-alias, schedule-generator, or replacement implementa-tions.

Dynamic-Routing Implementations

Dynamic-routing implementations correspond to normal primitive implementations on a dy-namic router. Such an implementation defines a streams function in its interface, which when passed the operator returns an array of streams necessary to handle the communications for this implementation of that operator. For example, a simplestreamoperator with just one instance would return the single appropriate stream from this operator; a reduce operator running in parallel on

n

rows of an array would return

n

reduction trees’ worth of streams. For a dynamic implementation, the streams are used to guess traffic rates, as well as for dependency analysis.

The streams in a dynamic implementation will be single-source, single-destination streams;

however, the sources and destinations may be specified either as compile-time or as run-time with a range of possible values. (A generic ‘value’ structure is used within the compiler to represent both compile-time and run-time values.)

Dynamic implementations can be used for a dynamic-router substrate, unlike stream-alias or schedule-generator implementations, which are specific to scheduled routing. They can also be used with scheduled routing, however, at a performance penalty; this is discussed in detail in Section 4.2.

Stream-Alias Implementations

The basic implementation type for scheduled routing is the stream-alias implementation. It also uses the streams function; the returned streams are used by the scheduled-routing back-end to route actual streams, as well as to perform some depback-endency analysis. The streams are single-source but may be multiple-destination, implemented as a multicast in the router; how-ever, the endpoints must be specified at compile-time so they can be routed at compile-time.

If the implementation needs to edit the schedule generated by its streams, it may include a gen router function as part of its definition. This function is used to specify router-level information, above and beyond that given by the streams returned by the streams function.

For a stream-alias implementation, a gen router function typically isolates the portions of the router used by a given operator’s streams, and modifies the router information (e.g., VFSM sources and destinations) as necessary. For example, a stream with a run-time source can be implemented with a tree of point-to-point streams converging on the compile-time destination.

Once the streams are routed, the gen router function can update the router information such that data is forwarded automatically from one stream to the next, taking data to the destination without requiring intervention from the processors.

In addition to this compile-time modification, a stream-alias implementation may be run-time modifiable. This is a powerful technique, since a stream that is modified at run-run-time can be routed along with streams from other operators in a single phase, but such a stream is more powerful than a simple point-to-point stream. These operators rely on the gen load function specified in the implementation definition. This function supplies additional code to be executed by the load function returned by COP as part of the per-operator code. For fully compile-time operators no such function is specified, since no load-time changes to the router are necessary. The ‘Run-Time Specification’ example of a run-time broadcast in Section 4.1.1 is an example of load-time modified stream-alias implementations: the gen load function

han-dles modifying the router to accept data from the local processor, or to forward data from its neighbors.

Schedule-Generator Implementations

The other fundamental implementation type for scheduled routing is the schedule-generator implementation. The implementation has no streams function, and instead relies solely on the gen router function to generate the necessary VFSMs and schedules from scratch.

These implementations cannot be scheduled in parallel with other implementations, since each schedule-generator implementation assumes it has complete control over the schedule of all the nodes in its phase.

Schedule-generators are useful for a variety of reasons.

Speed. While a alias implementation has to depend on the whims of the stream-routing code to get good performance, a schedule generator can include hand-optimized code to move data as fast as possible through the mesh.

Power. A schedule generator can use every aspect of the underlying router’s capabili-ties, rather than trying to express the desired functionality using the streams abstraction.

For example, thesch_prefiximplementation sets up two tightly-coupled subphases where nodes communicate with neighbors in different subphases and switch between them as the prefix runs.

Run-time. A schedule generator can generate arbitrary schedule code at run time. Thus (as discussed above for ‘Run-Time Operator-Specific Scheduling’), a run-time permu-tation can be provided to the implemenpermu-tation, and it will create a schedule at run time, download it, and use it, rather than relying on a dynamic routing layer or trying to lay down

n

2 streams for all possible permutations.

All of the above implementations rely on the gen io function to return the processor code necessary to make the implementations work. This function implements the read and write, or the func, functions that the processor calls. For simple operators likestream, these functions may just read or write to an interface register and return. More complex operators like reduce and prefix include code for the necessary semantics in gen io.

Replacement Implementations

The final class of implementations consists of replacements. This class of implementation defines the replace function in the definition structure. When called, it returns an operator (or group of operators) that will carry out the function. For example, theallreduceoperator is implemented this way by mapping it toreduce+broadcast.

A related operator ‘implementation’ is the stub implementation. This is typically an im-plementation that is used on one of the operators returned by a previous replacement, and it calls the other operators in the replacement group. For example, abarrieroperator may be implemented as anallreduce along with a piece of stub code that calls theallreduce func function with an arbitrary argument and discards the result.