Diffusing computations - Termination detection

6.2 Termination detection

6.2.2 Diffusing computations

In this section, we concentrate on detecting the termination of asynchronous algorithms for which N0 has one single member, assumed to be n1, the leader. G is in this section taken to be an undirected graph. The approach we described in the previous section is of course applicable to this case as well (and, for that matter, so is the approach of this section applicable to cases in which N0 is not a singleton—cf. Exercise 5), although in that case n1

would no longer be required to be a member of N0.

Distributed computations for whichN0 is a singleton are referred to as diffusing computations, because in such computations the causality that the flow of messages induces is "diffused"

from one single node. Of course this same intuition is also present in the cases of larger sets of initiators, but the denomination as a diffusing computation is not generally used in those cases because it would seem unnatural to say that the computation is diffused from the members of N0 when such a set can be arbitrarily large, possibly equal to N.

The algorithm we saw in Section 4.1.1 to propagate information with feedback from n1 is an example of diffusing computations, and, as we will shortly see, it is an example of particular interest in the context of detecting the termination of diffusing computations in general. In Algorithm A_PIF, the role played by n1 can be thought of as being not only that of the original propagator of inf, but also that of the detector of when the propagation has terminated throughout all of G. Although, as we remarked earlier in Section 6.2, in that algorithm every node can decide upon its termination rather easily (without the need for intervention from n1), the general idea of having a wave of information collapse back to n1 upon global termination is quite useful for computations whose termination cannot be detected so simply.

One of the main motivations to look for a different solution in the case of diffusing

computations, rather than just employ the general technique of the previous section, is the potentially very high complexity of the methodology realized by Algorithm

A_Detect_Termination. Although, due to its generality, we did not attempt any analysis when presenting that algorithm, clearly its complexity depends on the number of global state recordings it performs, so that the overall complexities may be too high. The specialized solution we study in this section, on the other hand, allows global termination to be detected without affecting the complexities of the computation proper.

The following is an outline of Algorithm A_Detect_Termination_D ("D" for Diffusing) . Every comp_msg is acknowledged with an ack message. Node ni maintains a counter expectedi, initially equal to zero, to indicate the number of ack messages it expects from its neighbors (we assume that expectedi is automatically increased whenever a comp_msg is sent by ni).

As in the case of Algorithm A_PIF, ni also maintains a variable parenti, initialized to nil, to indicate the origin of a comp_msg received in a special situation to be described shortly. The behavior of ni is then the following. Whenever ni receives a comp_msg and expectedi > 0, an ack is immediately sent in response. If, on the other hand, a comp_msg is received and expectedi = 0, then the ack is withheld and sent only when expectedi becomes equal to zero again (if it at all changes with the computation ni does in response to the arriving comp_msg,

otherwise the ack is sent immediately after that computation). The variable parenti is in this case set to point to the node that sent the comp_msg until the ack can be sent. We say that ni has reached a state of tentative termination, or that ni has tentatively terminated when expectedi becomes zero and the pending ack, if any, is sent to parenti. This condition may, however, change many times during the computation, for expectedi may again acquire a positive value as a consequence of the reception of a comp_msg. Global termination is detected when n1 has tentatively terminated, which in the case of n1may happen only once.

The resulting algorithm is Algorithm A_Detect_Termination_D, presented next. As in the previous section, a terminate message is employed by n1 to broadcast the detection of global termination. A Boolean variable terminatedi, initially set to false, is employed by ni to signal that ni may exit the repeat … until loop in Algorithm Task_t. This variable is set to true by ni

upon detection of global termination, if ni = n1, or upon receipt of the terminate message, otherwise.

Algorithm A_Detect_Termination_D:

Variables:

expectedi = 0;

parenti = nil;

terminatedi = false.

Listing 6.10 Input:

msgi = nil.

Action if ni∈ N0:

Do some computation;

Send one comp-msg on each edge of a (possibly empty) subset of Inci.

Listing 6.11 Input:

msgi = comp_msg such that origini (msgi) = (ni, nj).

Action:

if expectedi > 0 then begin

Send ack to nj;

Do some computation;

Send one comp_msg on each edge of a (possibly empty) subset of Inci

end else begin

Do some computation;

Send one comp_msg on each edge of a (possibly empty) subset of Inci;

if expectedi > 0 then parenti := nj

else

Send ack to nj

end.

Listing 6.12 Input:

msgi = ack.

Action:

expectedi := expectedi - 1;

if expectedi = 0 then if parenti ≠ nil then Send ack to parenti.

Listing 6.13 Input:

msgi = terminate.

Action if ni ∉ N0: terminatedi := true.

In Algorithm A_Detect_Termination_D, (6.10) and (6.11) are, in essence, (2.1) and (2.2), respectively, in Algorithm A_Template on comp_msg's, while (6.12) and (6.13) deal with the reception of ack and terminate messages, respectively (the latter for ni ≠ n1). Together, (6.11) and (6.12) can be seen to be closely related to (4.4) in Algorithm A_PIF in that all of them are involved with withholding acknowledgements from a parent neighbor until it is appropriate for that acknowledgement to be sent. This similarity with those two algorithms allows Algorithm A_Detect_Termination_D to be interpreted as a general template for asynchronous diffusing computations in which n1, the computation's sole initiator, detects global termination upon being reached by a collapsing wave of acknowledgements. This view of a computation as a propagating wave is the same that we employed in various occasions in Chapter 4, and in the present context allows the following pictorial interpretation. In Algorithm

A_Detect_Termination_D, a wave is initiated by n1 in (6.10) and throughout G it propagates back and forth with respect to n1. It propagates away from n1 with comp_msg's and

backwards in the direction of n1 with ack's. When the wave hits ni in its forward propagation, it may bounce back immediately (if expectedi > 0 at the beginning of (6.11) or expectedi = 0 at the end of (6.11)) or it may continue further on from that node (otherwise). Node ni may in this case be n1 itself, in which case the wave is sure to bounce back at once. The wave that propagates backwards in the direction of n1 does so by means of ack messages, and

continues to propagate at each node ni that it encounters so long as expectedi becomes zero with its arrival. What differentiates the wave propagations in this case from those of

Algorithm A_PIF is that a node that has already seen the ack wave go by may be hit by a forward-moving wave again (that is, by a comp_msg), so that overall the picture is that of a wave that may oscillate back and forth several times, and in different patterns on the various portions of G, before it finally collapses back onto n1.

Before proceeding with a more formal analysis of this behavior, we mention that, as in the case of Algorithm A_Detect_Termination_D of the previous section, we have not in Algorithm A_Detect_Termination_D been complete to the point of specifying the termination of n1 and

the propagation of the terminate broadcast. The reader should work on providing the missing details (cf. Exercise 6).

The correctness of Algorithm A_Detect_Termination_D is established by the following theorem.

Theorem 6.1.

Every global state in which n1 has tentatively terminated in Algorithm A_Detect_Termination_D is a global state in which global termination holds.

Proof If n1 has tentatively terminated, then by (6.11) and (6.12) every node must have sent a finite number of comp_msg's. As these comp_msg's and the corresponding ack's were received, the value of expectedi for node ni, initially equal to zero, became positive and zero again, possibly several times. Whenever a transition occurred in the value of expectedi from zero to a positive value, parenti was set to point to the node that sent the corresponding comp_msg. Consider the system states in which every node ni is either in a state of positive expectedi following the last transition from zero of its value, if it ever sent a comp_msg during the diffusing computation, or in any state, otherwise. Clearly, at least one of these system states is a global state, as for example the one in which every node that ever sent comp_msg's is in its state that immediately precedes the reception of the last ack (Figure 6.1). In this global state, only ack's flow on the edges, none of which sent as a consequence of the reception of a last ack. Let us consider one of these global states.

In this global state, the variables parenti for ni ≠ n1 induce a tree that spans all nodes in G corresponding to nodes that sent at least one comp_msg during the diffusing computation.

(This tree is in fact dynamically changing with the progress of the algorithm, as parenti may point to several of ni's neighbors along the way; it is always a tree, nevertheless.) This tree is rooted at n1, and its leaves correspond to those nodes from which no other node ni received the comp_msg that triggered the last transition from zero to a positive value of expectedi. As in the proof of Theorem 4.1, we proceed by induction on the subtrees of this tree. Along the induction, the assertion to be shown is that every global state in which the subtree's root has tentatively terminated is a global state in which every other node in the subtree has also tentatively terminated.

The basis of the induction is given by the subtrees rooted at the leaves, and then the assertion clearly holds, as no leaf ni is such that ni = parentj for some

Figure 6.1: Edges in the precedence graph fragment shown in part (a) are drawn as either solid lines or dashed lines. Solid lines represent comp_msg's, dashed lines represent ack's, and the remaining edges of the precedence graph are omitted. In this case, system_state>(Ξ1, Ξ2) is clearly a global state, and is such that every node that ever sent a comp_msg during the diffusing computation (i.e., n1 and n3) is in the state that immediately precedes the reception of the last ack.

In part (b), the spanning tree formed by the variables parenti for each node ni in this global state is shown with directed edges that point from ni to nj to indicate that parenti = nj. In this case, the tree has n1 for root and its single leaf is n3.

node nj. As the induction hypothesis, assume the assertion for all the subtrees rooted at nodes nj such that parentj is n1. Then n1 receives expected1 ack's, at which time it has tentatively terminated, and by the induction hypothesis so have all other nodes.

Let us now return briefly to the question, raised earlier in this section, of the algorithm's complexities. Because exactly one ack is sent for each comp_msg, the message complexity of Algorithm A_Detect_Termination_D is exactly the message complexity that Algorithm A_Template would have to realize the same computation without having to detect global termination. The same holds with respect to the algorithms' time complexities, because the time that Algorithm A_Detect_Termination_D spends in addition to that already spent by the corresponding instance of Algorithm A_Template is used solely for the final collapsing of the ack wave onto n1. This additional time, clearly, does not exceed that of Algorithm

A_Template, as this wave that propagates backwards comes from as far as the corresponding forward-propagating wave got.

Dans le document Message-Passing Systems (Page 112-116)