
From the document Message-Passing Systems (pages 108-112)

6.2 Termination detection

6.2.1 General computations

The distributed computation of interest in this section is initiated by any subset N0 of N and progresses through the exchange of messages generically referred to as comp_msg's. We take G to be a strongly connected directed graph, so that the case of an undirected G can also be handled in a straightforward manner. Our approach to termination detection in this section is based strongly on Algorithm A_Record_Global_State, and we therefore assume that G's edges are FIFO.

The approach we take goes essentially as follows. Before going idle, a node that "suspects" it may have terminated initiates the recording of a global state. This suspicion is of course highly dependent upon the particular computation at hand, so we let it be indicated by a Boolean variable suspectsi at ni ∈ N. This variable is set to either false or true after, in accordance with Algorithm A_Template, ni has computed and possibly sent out some messages, either spontaneously if ni ∈ N0 or upon the receipt of a comp_msg (the initial value assigned to the variable is then unimportant). A global state in which suspectsi = true for all ni ∈ N does not imply global termination in that global state (because there may be messages in transit), but we assume that global termination does imply that suspectsi = true for all ni ∈ N.
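The gap between the two conditions can be made concrete with a small Python sketch. It assumes that a recorded global state is available as one suspects flag per node plus the list of comp_msg's recorded on each edge, and that such a state indicates global termination when every flag is true and every edge is empty. The function name indicates_termination and the data layout are illustrative assumptions, not part of the algorithm as given.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (sender, receiver); a hypothetical encoding of (nj -> ni)


def indicates_termination(suspects: Dict[str, bool],
                          edge_states: Dict[Edge, List[str]]) -> bool:
    """Return True only if every node suspects termination AND no
    comp_msg is recorded as being in transit on any edge."""
    all_suspect = all(suspects.values())
    nothing_in_transit = all(len(msgs) == 0 for msgs in edge_states.values())
    return all_suspect and nothing_in_transit


# Every node suspects termination in both global states below.
suspects = {"n1": True, "n2": True}

# One comp_msg is still travelling on (n1 -> n2): no termination.
in_flight = {("n1", "n2"): ["comp_msg"], ("n2", "n1"): []}

# All edges are empty: the state indicates termination.
quiet = {("n1", "n2"): [], ("n2", "n1"): []}
```

Here indicates_termination(suspects, in_flight) is false even though all flags are true, whereas indicates_termination(suspects, quiet) is true.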

This recording of a global state proceeds entirely along the lines of Algorithm A_Record_Global_State, that is, through the exchange of marker messages, and may as in that case be initiated concurrently by more than one node if for such nodes the suspects variables become true concurrently. When the recording of a global state is completed at ni, ni sends what it recorded to n1 (the assumed leader), which, upon receiving similar information from every node, checks whether the global state that was recorded indicates global termination. If it does, then a terminate message is broadcast by n1 to all of G's nodes, which then terminate.

Clearly, this procedure may be wasteful: a node ni that receives a marker when suspectsi = false need not propagate markers onward, because the resulting global state cannot possibly indicate global termination. Aborting a global state recording is not something we have considered before, and there are a few implications to be considered. In our present context, the two problems that result from prematurely aborting a global state recording are the need to terminate the aborted recording properly and the possibility that n1 receives incomplete global states, which must somehow be dealt with. We tackle both problems simultaneously, as follows. Every marker is sent with a tag, and every node keeps a record of the greatest tag it has seen so far in a marker. If the tag a node attaches to a marker it sends out when initiating a new global state recording is strictly greater than any tag it has ever seen, then the rule for participating in global state recordings is very simple. A node only participates in a new global state recording if the tag accompanying the corresponding marker is strictly larger than every tag it has known of and in addition its suspects variable is true. Any marker received in different circumstances is ignored. The reader should note that this provides the necessary control for terminating aborted global state recordings, and also allows n1 to discard useless information it has recorded, provided that the information it receives from nodes on a recorded global state is itself accompanied by the tag that was attached to the markers during the recording of that global state.
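The tagging rule just described can be sketched in a few lines of Python. The function names next_initiation_tag and joins_new_recording are hypothetical helpers introduced only for illustration; tags are assumed to be integers.

```python
def next_initiation_tag(max_tag_seen: int) -> int:
    """Tag used by a node that initiates a new recording: strictly
    greater than every tag the node has seen so far."""
    return max_tag_seen + 1


def joins_new_recording(t: int, max_tag_seen: int, suspects: bool) -> bool:
    """A node joins the recording announced by marker(t) only if t is
    strictly larger than every tag it knows of and it suspects termination."""
    return t > max_tag_seen and suspects
```

For instance, a node whose greatest tag so far is 3 joins a recording tagged 4 only while its suspects variable is true, and never joins one tagged 3 or less as a new recording.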

This strategy is adopted by Algorithm A_Detect_Termination, given next. In addition to the Boolean variable suspectsi, node ni needs some of the variables employed by Algorithm A_Record_Global_State as well. These are edge_state_i^j, initialized to ∅, for each node nj ∈ I_Neigi, and the Booleans recordedi and received_i^j, all initialized to false.

The maximum tag ni has seen in a marker is denoted by max_tagi. Finally, another Boolean variable, terminatedi, initially equal to false, is used by ni to indicate whether a terminate message has been received from n1 (terminated1 is used to indicate that global termination has been detected). This variable, in Algorithm Task_t, can be used to exit the repeat…until loop.

In Algorithm A_Detect_Termination, marker messages are sent as marker(t) messages, where t is a positive tag. The initial value of max_tagi is then zero.

Algorithm A_Detect_Termination:

Variables:
    suspectsi;
    edge_state_i^j = ∅ for all nj ∈ I_Neigi;
    recordedi = false;
    received_i^j = false for all nj ∈ I_Neigi;
    max_tagi = 0;
    terminatedi = false;
    Other variables used by ni, and their initial values, are listed here.

Listing 6.6
Input:
    msgi = nil.
Action if ni ∈ N0:
    Do some computation;
    Send one comp_msg on each edge of a (possibly empty) subset of Outi;
    if suspectsi then
    begin
        max_tagi := max_tagi + 1;
        recordedi := true;
        Send marker(max_tagi) to all nj ∈ O_Neigi
    end.

Listing 6.7
Input:
    msgi = comp_msg such that origini(msgi) = (nj → ni).
Action:
    Do some computation;
    Send one comp_msg on each edge of a (possibly empty) subset of Outi;
    if recordedi then
        if not received_i^j then
            edge_state_i^j := edge_state_i^j ∪ {msgi};
    if suspectsi then
    begin
        edge_state_i^k := ∅ for all nk ∈ I_Neigi;
        received_i^k := false for all nk ∈ I_Neigi;
        max_tagi := max_tagi + 1;
        recordedi := true;
        Send marker(max_tagi) to all nj ∈ O_Neigi
    end.

Listing 6.8
Input:
    msgi = marker(t) such that origini(msgi) = (nj → ni).
Action:
    if t = max_tagi then
        received_i^j := true;
    if t > max_tagi then
    begin
        max_tagi := t;
        edge_state_i^k := ∅ for all nk ∈ I_Neigi;
        recordedi := false;
        received_i^k := false for all nk ∈ I_Neigi;
        if suspectsi then
        begin
            received_i^j := true;
            recordedi := true;
            Send marker(max_tagi) to all nk ∈ O_Neigi
        end
    end;
    if received_i^k for all nk ∈ I_Neigi then
        Send edge_state_i^k for all nk ∈ I_Neigi, along with max_tagi, to n1.

Listing 6.9
Input:
    msgi = terminate.
Action if ni ≠ n1:
    terminatedi := true.

This algorithm is, in essence, a blend of Algorithm A_Template on comp_msg's and Algorithm A_Record_Global_State. Specifically, (6.6) is (2.1) enlarged by (5.11) to initiate a global state recording if suspectsi = true. Similarly, (6.7) is (2.2) enlarged by (5.11) and (5.13), respectively to initiate a global state recording if suspectsi = true and to record the messages that comprise an edge's state in the global state being recorded. Finally, (6.8) is (5.12), conveniently adapted to abort ongoing global state recordings upon receipt of a marker carrying a tag strictly greater than the greatest one the node has seen.
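As a rough illustration of how actions (6.6) through (6.9) fit together at a single node, the following Python sketch mirrors them as methods. It is a sketch under stated assumptions rather than a definitive implementation: the class Node, its outbox list standing in for message sending, the message tuples, and the suspects_after argument (a stand-in for "Do some computation", which is computation-specific) are all hypothetical. Edge states are kept as lists instead of sets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    in_neigh: List[str]          # I_Neig_i
    out_neigh: List[str]         # O_Neig_i
    leader: str = "n1"
    suspects: bool = False
    max_tag: int = 0
    recorded: bool = False
    terminated: bool = False
    edge_state: Dict[str, List[str]] = field(default_factory=dict)
    received: Dict[str, bool] = field(default_factory=dict)
    outbox: List[Tuple[str, tuple]] = field(default_factory=list)  # (destination, message)

    def __post_init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        """Re-initialize the variables related to global state recording."""
        self.edge_state = {j: [] for j in self.in_neigh}
        self.received = {j: False for j in self.in_neigh}
        self.recorded = False

    def _send_markers(self) -> None:
        for k in self.out_neigh:
            self.outbox.append((k, ("marker", self.max_tag)))

    def _start_recording(self) -> None:
        self._reset()
        self.max_tag += 1
        self.recorded = True
        self._send_markers()

    def start(self, suspects_after: bool) -> None:
        """(6.6): spontaneous action of a node in N0."""
        self.suspects = suspects_after      # assumed outcome of the computation
        if self.suspects:
            self._start_recording()

    def on_comp_msg(self, sender: str, msg: str, suspects_after: bool) -> None:
        """(6.7): receipt of a comp_msg from sender."""
        self.suspects = suspects_after      # assumed outcome of the computation
        if self.recorded and not self.received[sender]:
            self.edge_state[sender].append(msg)
        if self.suspects:
            self._start_recording()

    def on_marker(self, sender: str, t: int) -> None:
        """(6.8): receipt of marker(t) from sender."""
        if t == self.max_tag:
            self.received[sender] = True
        if t > self.max_tag:
            self.max_tag = t
            self._reset()                   # abort any ongoing recording
            if self.suspects:
                self.received[sender] = True
                self.recorded = True
                self._send_markers()
        if all(self.received.values()):
            report = {j: list(msgs) for j, msgs in self.edge_state.items()}
            self.outbox.append((self.leader, ("report", self.max_tag, report)))

    def on_terminate(self) -> None:
        """(6.9): receipt of terminate at a node other than the leader."""
        if self.name != self.leader:
            self.terminated = True
```

A driver would deliver messages from each outbox over FIFO channels; that part, like the computation itself, is left out of the sketch.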

Let us consider the functioning of Algorithm A_Detect_Termination more carefully. Node ni performs computation, possibly with the sending of some comp_msg's, either spontaneously, if it is a member of N0, or upon receiving a comp_msg. In the former case, ni may also start a global state recording with markers carrying a tag increased by one with respect to the greatest tag it has seen (cf. (6.6)). In the latter case, ni may record the comp_msg as part of the state of the edge on which it was received, or it may, as in the other case, start the recording of a global state, after re-initializing its variables related to the recording of global states, or it may do both, in which case the recording of the comp_msg will have been in vain (cf. (6.7)). Notice that these two possibilities account for all the opportunities ni has to start a global state recording, and in both cases such a start is done with properly initialized variables. Because ni maintains only one set of variables for global state recording, it may only participate in the recording of one global state at a time, so that upon initiating its participation in a new recording it must quit its participation in whatever recording it may have been participating in so far. This is the reason for variable re-initialization in (6.7) if suspectsi becomes true.

The other occasion in which ni may have to forsake its current participation in a global state recording is upon receiving a marker(t) such that t > max_tagi. When this happens, ni re-initializes its variables related to global state recording and, if suspectsi = true, joins in the new global state recording, as in (6.8). If t < max_tagi, then the marker(t) is ignored (it clearly belongs to a long-forsaken global state recording), whereas if t = max_tagi, then it may correspond to a recording in which ni is currently engaged. The receipt of a marker in (6.8) may also imply that ni has finished its participation in the current global state recording, and then what it recorded is sent to the leader for analysis (it only sends the edge states, though, because by (6.6) through (6.8) ni does not participate in the recording of a global state if its local state is anything other than suspectsi = true). This information is sent to the leader along with the tag with which it was recorded, and the leader, upon having received information with the same tag from all nodes, decides whether global termination has been reached, in which case a terminate message is broadcast to all nodes. We have omitted from the algorithm the actions for n1 to perform its role as a leader, and we have also omitted any specific mention of how the broadcast of the terminate order is performed. Filling in these blanks should pose no difficulty, though, especially after our discussion of information propagation in Section 4.1 (cf. Exercise 3). The response of ni ≠ n1 to the terminate message is in (6.9).
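Since the leader's actions are omitted from the algorithm, the following Python sketch shows one possible way to fill in that blank. It is only an assumption-laden illustration: the class Leader, the method on_report, and the report format (a mapping from in-neighbor to the list of recorded comp_msg's) are hypothetical. Reports are grouped by tag, and a decision is made only once every node has reported under the same tag; termination is taken to hold when all reported edge states are empty.

```python
from typing import Dict, Iterable, List

EdgeStates = Dict[str, List[str]]  # in-neighbor name -> recorded comp_msg's


class Leader:
    def __init__(self, nodes: Iterable[str]) -> None:
        self.nodes = frozenset(nodes)
        # tag -> (reporting node -> edge states it recorded under that tag)
        self.by_tag: Dict[int, Dict[str, EdgeStates]] = {}

    def on_report(self, sender: str, tag: int, edge_states: EdgeStates) -> bool:
        """Store one report; return True if a terminate message should
        now be broadcast, False otherwise."""
        reports = self.by_tag.setdefault(tag, {})
        reports[sender] = edge_states
        if set(reports) != self.nodes:
            return False                 # global state for this tag incomplete
        del self.by_tag[tag]             # complete: decide and discard
        return all(len(msgs) == 0
                   for states in reports.values()
                   for msgs in states.values())
```

Reports carrying tags of aborted recordings never become complete and could be pruned; the sketch leaves them in place for simplicity. How the terminate message is then broadcast is likewise left open.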

The correctness of Algorithm A_Detect_Termination is based on Theorem 5.5 on the correctness of Algorithm A_Record_Global_State if G is strongly connected with FIFO edges, and on the following observation. Suppose a global state in which global termination holds does exist. As we assumed earlier in this section, at this global state it must hold that suspectsi = true for all ni ∈ N, so we may consider the greatest value of max_tagi over all of N when the corresponding suspectsi's became true for the last time. The nodes at which this greatest value occurred must by (6.6) and (6.7) have initiated a global state recording concurrently, and by (6.8) this global state recording must have been propagated by all nodes. Consequently, at least one global state recording is carried out to completion, including the recording of a global state in which global termination holds.

Before leaving this section, a couple of observations are worth making. The first observation concerns obvious possible simplifications to Algorithm A_Detect_Termination, especially as regards the reports that are sent to n1. We elaborate no further on the issue, but encourage the reader to investigate it further (cf. Exercise 4). The second observation relates to the treatment of computations for which global termination does not hold at any global state. Computations like these appear in Section 10.2, and because they ordinarily do not terminate by themselves, what we seek is to force their termination by detecting termination-related properties that appear in some global states with special characteristics. As we discussed briefly in Section 5.2.2, in this case a leader can be employed to search for the special global states, and upon finding one of them for which the desired termination-related properties do hold the leader then directs all other nodes to terminate. In Chapter 10, we address these issues in more detail.
