
Since performing synchronization to support data dependences is most applicable in the shared-memory programming model, the point-to-point synchronization constructs presented in the previous chapter also make use of a shared-memory model. However, communication using a cache-coherent shared-memory model incurs significant overhead that can be alleviated by using explicit message-passing. Recall that synchronization is done in the shared-memory environment by the source processor setting a variable to some value and the sink processor spin-locking until the variable reaches a particular value. Because of cache support, spin-locking produces no network traffic, and communication occurs only after the source processor asserts the value, which invalidates the copy in the sink processor's cache.
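As a concrete illustration, the following C sketch (with invented names; not code from any particular system) shows this shared-memory form of point-to-point synchronization: the source processor publishes a value into a shared flag, and the sink processor spin-waits until the flag holds the value it expects.

    #include <stdatomic.h>

    /* Synchronization variable shared by the source and sink processors.    */
    static atomic_int sync_var = 0;

    /* Source processor: assert the synchronization by publishing a new value.
     * Release ordering makes data written before this call visible to the
     * sink once it observes the new value.                                   */
    void sm_assert(int value)
    {
        atomic_store_explicit(&sync_var, value, memory_order_release);
    }

    /* Sink processor: spin until the variable reaches the expected value.
     * With caching, the loop spins on a locally cached copy; network traffic
     * occurs only when the source's store invalidates that copy.             */
    void sm_check(int expected)
    {
        while (atomic_load_explicit(&sync_var, memory_order_acquire) != expected)
            ;  /* spin */
    }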


The run-time differences of the two models are illustrated in Figure 5-1. In scenario 1, the source processor asserts the synchronization after the sink processor has begun checking for it. After the source asserts, four messages must be sent to update the caches before the sink processor sees the new value. In contrast, synchronizing by explicitly sending a message requires only the time for one message plus additional overhead for message processing on the sink processor. Even when the source processor asserts much earlier than the sink begins checking, as in scenario 2, messages allow synchronization to complete with only the message-processing overhead rather than the request-reply round trip of the shared-memory paradigm. Figure 5-2 shows the difference between the execution profiles of the shared-memory and message-passing synchronization mechanisms. Note that the gaps representing idle synchronization intervals are smaller under the message-passing scheme. However, the computation blocks also include the extra time required for processing incoming messages.

Figure 5-2: Execution of Figure 1-3 using different point-to-point synchronization schemes

The same request-reply protocol of the shared-memory interface that causes these disadvantages also makes it more flexible than one-way message sends. In cases where processor synchronization targets are not known at compilation time, one-way messages cannot be used as easily. For a particular synchronization, if the source processor depends on a run-time variable, the sink processor can determine the source processor at run time and then issue a memory request to check the synchronization variable. Accomplishing the same task using message-passing requires either following the same request-reply scheme or having each potential source processor compute whether it is the one that would have received the request. Although one can imagine cases where this computation can be done at reasonable cost, this section focuses only on implementing synchronization through message-passing when processor relationships can be determined statically.
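To illustrate why the shared-memory interface handles run-time-determined relationships so easily, the sketch below (NUM_PROCS and compute_source are hypothetical names) has the sink compute its source processor at run time and simply poll that processor's synchronization variable; a message-passing version would instead need every candidate source to work out whether it should send.

    #define NUM_PROCS 64                      /* assumed machine size         */

    /* One shared synchronization variable per processor.                    */
    volatile int sync_var[NUM_PROCS];

    /* Hypothetical helper: maps a run-time value to the source processor.   */
    int compute_source(int v);

    /* Sink processor: the source is known only at run time, but under
     * shared memory the sink just computes it and polls its variable.       */
    void sm_check_runtime_source(int v, int expected)
    {
        int source = compute_source(v);
        while (sync_var[source] != expected)
            ;  /* spin */
    }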

In the shared-memory model, each synchronization is performed by reading and writing a variable shared by the source and sink processors. The same general technique can be supported in the message-passing model by maintaining a copy of the variable on the sink processor. To assert a synchronization, the source processor sends the new variable value to the sink processor. Upon receiving each message, the sink processor updates its local variable to the new value carried in the message. To check for synchronization, the sink processor spins until its local variable reaches the desired value. These mechanisms assume a machine model where incoming messages are handled through processor interrupts. If messages must be explicitly received, then the sink processor merely spins until a message arrives that contains the desired value. Note that the requirements for avoiding deadlocks in the shared-memory model also allow the message-passing scheme to be implemented without danger of deadlock.
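A minimal sketch of this message-passing emulation, assuming a hypothetical interrupt-driven message layer (send_msg and the handler are invented stand-ins for whatever primitives the machine provides):

    /* Hypothetical one-way message primitive provided by the machine.       */
    void send_msg(int dest_proc, int value);

    /* Sink processor's local copy of the synchronization variable.          */
    static volatile int local_sync = 0;

    /* Source processor: assert the synchronization by sending the new
     * value directly to the sink processor.                                 */
    void mp_assert(int sink_proc, int value)
    {
        send_msg(sink_proc, value);
    }

    /* Sink processor: the message interrupt handler records the value
     * carried by each incoming synchronization message.                     */
    void mp_msg_handler(int value)
    {
        local_sync = value;
    }

    /* Sink processor: spin on the local copy; no remote memory requests
     * are generated while waiting.                                          */
    void mp_check(int expected)
    {
        while (local_sync != expected)
            ;  /* spin until the handler stores the expected value */
    }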

In the above scheme, sink processors are computed with respect to source processors, the reverse of the shared-memory model, in which source processors are computed with respect to sink processors. This is essentially equivalent to finding the inverse of the processor target function P(p; Tb) of the last chapter. One can also imagine inverting the temporal target function T(p; Tb) to compute the sink instance at which the synchronization is valid. Unfortunately, this inverse relationship does not completely generalize. As motivation, consider the case where the sink processor does not perform a synchronization in the shared-memory scheme. Even though the value asserted by the source processor is never read, the program still operates correctly. In the message-passing scenario, if the source processor does not send a message but the sink processor requires one, then deadlock occurs. To be safe, the source processor must always send if there is any chance that the sink processor needs to check the result. Thus, in cases where unknown expressions prevent processor relationships from being known, each source processor may be required to broadcast to all possible sink processors. Since these broadcasts may be very inefficient, the shared-memory interface provides a better solution in those cases.
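When the sink set cannot be determined, the conservative fallback looks roughly like the following sketch (reusing send_msg and NUM_PROCS from the sketches above; my_proc_id and may_need_sync are likewise invented helpers): the source sends to every processor that might check the synchronization, which is correct but costs up to one message per processor.

    /* Hypothetical helpers: the local processor number, and a conservative
     * test of whether processor p could be a sink for this synchronization. */
    int my_proc_id(void);
    int may_need_sync(int p);

    /* Source processor: with unknown sinks, broadcast the value to every
     * processor that might check it -- correct, but up to NUM_PROCS - 1
     * messages per assertion.                                                */
    void mp_assert_broadcast(int value)
    {
        for (int p = 0; p < NUM_PROCS; p++) {
            if (p != my_proc_id() && may_need_sync(p))
                send_msg(p, value);
        }
    }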

Implementing synchronization through message-passing is only applicable to machines that provide support for both the shared-memory and message-passing models, such as the MIT Alewife multiprocessor [Aga91]. On machines that support only message-passing, additional program transformations must be done to manage data sharing through explicit communication; synchronization can then be accomplished implicitly, since processors are specifically aware of data sharing with other processors. Other shared-memory multiprocessors also contain mechanisms to overcome the inefficiencies of cache-coherence protocols. The Stanford Dash multiprocessor [Len92] allows processors to write values directly into the caches of other processors. Although it is less general, such a mechanism may support point-to-point synchronization even better than message-passing schemes, since no message-processing overhead is required.