Building a Spanning Tree - Parallel Traversal: Broadcast and Convergecast


1.2 Parallel Traversal: Broadcast and Convergecast

1.2.4 Building a Spanning Tree

This section presents a simple algorithm that (a) implements broadcast and convergecast, and (b) builds a spanning tree. This algorithm is sometimes called propagation of information with feedback. Once a spanning tree has been constructed, it can be used for future broadcasts and convergecasts involving the same distinguished process p_a.

Local Variables As before, each process p_i is provided with a set neighbors_i which defines its position in the communication graph and, at the end of the execution, its local variables parent_i and children_i will define its position in the spanning tree rooted at p_a.

To compute its position in the spanning tree rooted at p_a, each process p_i uses an auxiliary integer local variable denoted expected_msg_i. This variable contains the number of messages BACK() that p_i is still waiting for (one from each neighbor other than its parent) before sending a message BACK() to its parent.

Algorithm The broadcast/convergecast algorithm building a spanning tree is described in Fig. 1.7. To simplify the presentation, it is first assumed that the channels are FIFO (first in, first out). The distinguished process p_a is the only process which receives the external message START() (line 1). Upon its reception, p_a initializes parent_a, children_a and expected_msg_a and sends a message GO(data) to each of its neighbors (line 2).

When a process p_i receives a message GO(data) for the first time, it defines the sender p_j as its parent in the spanning tree, and initializes children_i to ∅ and expected_msg_i to the number of its neighbors other than p_j, i.e., |neighbors_i| − 1 (line 4). If its parent is its only neighbor, it sends back a message BACK() carrying the pair (i, v_i), thereby indicating to p_j that it is one of its children (lines 5–6). Otherwise, p_i forwards the message GO(data) to all its neighbors but its parent p_j (line 7).

If parent_i ≠ ⊥ when p_i receives GO(data), it has already determined its parent in the spanning tree and forwarded the message GO(data). It consequently sends by return to p_j the message BACK(∅), where ∅ is used to indicate to p_j that p_i is not one of its children (line 9).

When a process p_i receives a message BACK(val_set) from a neighbor p_j, it decreases expected_msg_i (line 11) and adds p_j to children_i if val_set ≠ ∅ (line 12).

Then, if p_i has received a message BACK() from all its neighbors (but its parent, line 13), it sends to its parent (lines 15–16) the set val_set containing its own pair (i, v_i) plus all the pairs (k, v_k) it has received from its children (line 14). Then, p_i has terminated its participation in the algorithm (its local variable expected_msg_i then becomes useless). If p_i is the distinguished process p_a, the set val_set contains a pair (x, v_x) per process p_x, and p_a can accordingly compute f(val_set) (where f() is the function whose result is the output of the computation).

when START() is received do   % only p_a receives this message %
(1)  parent_i ← i; children_i ← ∅; expected_msg_i ← |neighbors_i|;
(2)  for each j ∈ neighbors_i do send GO(data) to p_j end for.

when GO(data) is received from p_j do
(3)  if (parent_i = ⊥)
(4)    then parent_i ← j; children_i ← ∅; expected_msg_i ← |neighbors_i| − 1;
(5)         if (expected_msg_i = 0)
(6)           then send BACK({(i, v_i)}) to p_j
(7)           else for each k ∈ neighbors_i \ {j} do send GO(data) to p_k end for
(8)         end if
(9)    else send BACK(∅) to p_j
(10) end if.

when BACK(val_set) is received from p_j do
(11) expected_msg_i ← expected_msg_i − 1;
(12) if (val_set ≠ ∅) then children_i ← children_i ∪ {j} end if;
(13) if (expected_msg_i = 0) then   % a set val_set_x has been received from each child p_x %
(14)    let val_set = (⋃_{x ∈ children_i} val_set_x) ∪ {(i, v_i)}; let pr = parent_i;
(15)    if (pr ≠ i)
(16)      then send BACK(val_set) to p_pr   % local termination for p_i %
(17)      else p_i (= p_a) can compute f(val_set)   % global termination %
(18)    end if
(19) end if.

Fig. 1.7 Construction of a rooted spanning tree (code for p_i)
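The three handlers of Fig. 1.7 can be exercised with a small sequential simulation. The sketch below is only an illustration under stated assumptions: the names build_spanning_tree, neighbors and values are hypothetical, the four-process graph is the one suggested by the example of this section, and messages are delivered in the global order in which they were sent, which is just one of the many possible FIFO executions.

```python
from collections import deque

def build_spanning_tree(neighbors, root, values):
    """Simulate the GO/BACK algorithm of Fig. 1.7 (illustrative sketch).

    Assumption: a single queue delivers messages in sending order,
    which is one particular execution over FIFO channels.
    """
    parent = {i: None for i in neighbors}            # None plays the role of ⊥
    children = {i: set() for i in neighbors}
    expected = {i: 0 for i in neighbors}
    collected = {i: {(i, values[i])} for i in neighbors}  # pairs gathered so far
    in_flight = deque()                              # (sender, receiver, kind, payload)
    sent = 0
    result = None

    def send(src, dst, kind, payload=None):
        nonlocal sent
        in_flight.append((src, dst, kind, payload))
        sent += 1

    # START() is received by the root only (lines 1-2).
    parent[root] = root
    expected[root] = len(neighbors[root])
    for j in neighbors[root]:
        send(root, j, "GO")

    while in_flight:
        j, i, kind, payload = in_flight.popleft()
        if kind == "GO":
            if parent[i] is None:                    # first GO: lines 3-8
                parent[i] = j
                expected[i] = len(neighbors[i]) - 1
                if expected[i] == 0:
                    send(i, j, "BACK", frozenset(collected[i]))
                else:
                    for k in neighbors[i]:
                        if k != j:
                            send(i, k, "GO")
            else:                                    # line 9: not a child of p_j
                send(i, j, "BACK", frozenset())
        else:                                        # BACK: lines 11-19
            expected[i] -= 1
            if payload:                              # non-empty set: p_j is a child
                children[i].add(j)
                collected[i] |= payload
            if expected[i] == 0:
                if parent[i] != i:
                    send(i, parent[i], "BACK", frozenset(collected[i]))
                else:
                    result = set(collected[i])       # the root can compute f(val_set)
    return parent, children, result, sent

# Graph assumed from the example: p_1-p_2, p_1-p_3, p_2-p_3, p_3-p_4.
neighbors = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
values = {i: f"v{i}" for i in neighbors}
parent, children, result, sent = build_spanning_tree(neighbors, 1, values)
```

With this delivery order p_2 and p_3 both choose p_1 as their parent and p_4 chooses p_3; another delivery order may yield a different tree.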

Let us notice that, when the distinguished process p_a discovers that the algorithm has terminated, all the messages sent by the algorithm have been received and processed.

Cost Let us observe that a message BACK() is eventually sent as a response to each message GO(). Moreover, on each channel that does not belong to the spanning tree that is built, two messages GO() are sent (one in each direction), whereas a single message GO() is sent on each channel of the tree.

Let e be the number of channels of the underlying communication graph. It follows that the algorithm gives rise to 2(n − 1) messages which travel on the channels of the tree and 4(e − (n − 1)) messages which travel on the other channels, i.e., 2(2e − n + 1) messages in total. Then, once the tree is built, a broadcast/convergecast costs only 2(n − 1) messages.
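The arithmetic behind these counts can be checked with a few lines of code. This is only a sketch; the function names are hypothetical, and the values n = 4 and e = 4 correspond to the four-process graph suggested by the example of this section.

```python
def construction_messages(n, e):
    """Messages exchanged while the tree is being built."""
    tree_channels = n - 1
    on_tree = 2 * tree_channels             # one GO and one BACK per tree channel
    off_tree = 4 * (e - tree_channels)      # two GO and two BACK per other channel
    return on_tree + off_tree

def reuse_messages(n):
    """Messages of a later broadcast/convergecast on the built tree."""
    return 2 * (n - 1)

n, e = 4, 4
assert construction_messages(n, e) == 2 * (2 * e - n + 1)
print(construction_messages(n, e), reuse_messages(n))
```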

Assuming all messages take one time unit and local computations have zero duration, it is easy to see that the time complexity is 2D where D is the diameter of the communication graph. Once the tree is built, the time complexity of a broadcast/convergecast is 2D_a, where D_a is the longest distance from p_a to any other process.

Fig. 1.8 Left: Underlying communication graph; Right: Spanning tree

Fig. 1.9 An execution of the algorithm constructing a spanning tree
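The quantities D and D_a used in these bounds can be computed by breadth-first search. The sketch below is illustrative: the function names are hypothetical, distances are measured in the communication graph, and the graph is an assumption reconstructed from the description of the example (p_1 adjacent to p_2 and p_3, p_2 adjacent to p_3, p_3 adjacent to p_4).

```python
from collections import deque

def distances_from(neighbors, src):
    """Hop distance from src to every process, by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def longest_distance(neighbors, a):
    """D_a: longest distance from p_a to any other process."""
    return max(distances_from(neighbors, a).values())

def diameter(neighbors):
    """D: largest D_a over all processes."""
    return max(longest_distance(neighbors, p) for p in neighbors)

graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(2 * diameter(graph), 2 * longest_distance(graph, 1))
```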

An Example An execution of the algorithm described in Fig. 1.7 for the communication graph depicted in the left part of Fig. 1.8 is described in Fig. 1.9.

Figure 1.9 is a space-time diagram. The execution of a process p_i, 1 ≤ i ≤ 4, is represented by an axis oriented from left to right. An arrow from one axis to another represents a message transfer. In this picture, an arrow labeled GO_{x,y}() represents a message GO() sent by p_x to p_y. Similarly, an arrow labeled BACK_{x,y}() represents a message BACK() sent by p_x to p_y.

The process p_1 is the distinguished process that receives the external message START() and consequently will be the root of the tree. It sends a message GO() to its neighbors p_2 and p_3. When p_3 receives this message, it defines its parent as being p_1 and forwards the message GO() to its two other neighbors p_2 and p_4.

Since the first message GO() received by p_2 is the one sent by p_3, p_2 defines its parent as being p_3 and forwards the message GO() to its other neighbor, namely p_1. When p_1 receives a message GO() from p_2, it sends back a message BACK(∅) to p_2. In contrast, when p_4 receives the message GO() from p_3, it sends by return to p_3 a message BACK() carrying the pair (4, v_4). Moreover, when p_2 has received a message BACK() from p_1, it sends to its parent p_3 a message BACK() carrying the pair (2, v_2).

Finally, when p_3 receives the messages BACK() from p_2 and p_4, it discovers that these processes are its children and sends a message BACK() carrying the set {(2, v_2), (3, v_3), (4, v_4)} to its parent p_1. When p_1 receives this message, it discovers that p_3 is its only child. It can then compute f() on the vector [v_1, v_2, v_3, v_4]. The tree that has been built is represented at the right of Fig. 1.8.
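Once the parent pointers of this execution are known (p_3 is the child of p_1, and p_2 and p_4 are the children of p_3), a later convergecast only has to walk the tree. The following sketch is illustrative: the function names, the numeric values and the choice of sum as the function f are assumptions, and the recursion stands in for the BACK() messages flowing from the leaves to the root.

```python
def children_from_parents(parent):
    """Derive the children sets from the parent pointers."""
    children = {i: set() for i in parent}
    for i, p in parent.items():
        if p != i:                      # the root is its own parent
            children[p].add(i)
    return children

def convergecast(children, i, values):
    """Return the set of pairs (k, v_k) gathered in the subtree rooted at p_i."""
    pairs = {(i, values[i])}
    for c in children[i]:
        pairs |= convergecast(children, c, values)
    return pairs

# Tree built in the execution described above.
parent = {1: 1, 2: 3, 3: 1, 4: 3}
values = {1: 10, 2: 20, 3: 30, 4: 40}   # hypothetical values v_i
children = children_from_parents(parent)
val_set = convergecast(children, 1, values)
total = sum(v for _, v in val_set)      # f is assumed here to be a sum
```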

On the Parenthesized Structure of the Execution It is important to notice that the spanning tree that has been built depends on the speed of the messages GO(). Another execution of the same algorithm on the same network with the same distinguished process could produce a different tree rooted at p_1.

It is also interesting to observe that each message GO() can be seen as an opening bracket that can be associated with a message BACK(), which is the corresponding closing bracket. This appears on the figure as follows: GO_{x,y}() is an opening bracket whose associated closing bracket is BACK_{y,x}().
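This bracket association can be checked mechanically on a trace of sent messages. The sketch below is illustrative: match_brackets is a hypothetical helper, and the trace is an assumed sending order compatible with the example execution (it includes the message BACK(∅) that p_2 returns to p_1, which follows from the algorithm).

```python
def match_brackets(trace):
    """Pair each GO(x, y) with the BACK(y, x) that closes it.

    trace: list of (kind, sender, receiver) in sending order.
    Raises ValueError if a BACK has no open GO or a GO is never closed.
    """
    open_go = set()
    pairs = []
    for kind, x, y in trace:
        if kind == "GO":
            open_go.add((x, y))
        else:
            if (y, x) not in open_go:
                raise ValueError(f"BACK({x},{y}) closes no GO({y},{x})")
            open_go.remove((y, x))
            pairs.append((("GO", y, x), ("BACK", x, y)))
    if open_go:
        raise ValueError(f"unclosed GO messages: {sorted(open_go)}")
    return pairs

trace = [
    ("GO", 1, 2), ("GO", 1, 3), ("GO", 3, 2), ("GO", 3, 4), ("GO", 2, 1),
    ("BACK", 1, 2), ("BACK", 4, 3), ("BACK", 2, 1), ("BACK", 2, 3), ("BACK", 3, 1),
]
pairs = match_brackets(trace)
```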

The Case of Non-FIFO Channels Assuming non-FIFO channels and taking into account Fig. 1.9, let us consider that the message GO_{1,2}() arrives at p_2 after the message BACK_{1,2}(). It is easy to see that the algorithm remains correct (i.e., a spanning tree is built).

The only thing that changes is the meaning associated with line 16. When a process sends a message BACK() to its parent, it can no longer claim that its local computation is terminated. A process now needs to have received a message on each of its incident channels before claiming local termination.
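The robustness to reordering can be observed experimentally by delivering the messages in transit in an arbitrary order. The sketch below is only an illustration: run_non_fifo and is_spanning_tree are hypothetical names, the set val_set is abstracted by a Boolean flag telling whether it is non-empty, and a seeded pseudo-random choice stands in for the unpredictable channel behavior.

```python
import random

def run_non_fifo(neighbors, root, seed):
    """Run the GO/BACK algorithm with messages delivered in arbitrary order."""
    rng = random.Random(seed)
    parent = {i: None for i in neighbors}
    children = {i: set() for i in neighbors}
    expected = {}
    parent[root] = root
    expected[root] = len(neighbors[root])
    # Messages in transit: (sender, receiver, kind, carries_a_non_empty_set).
    bag = [(root, j, "GO", False) for j in neighbors[root]]
    while bag:
        src, dst, kind, non_empty = bag.pop(rng.randrange(len(bag)))
        if kind == "GO":
            if parent[dst] is None:
                parent[dst] = src
                expected[dst] = len(neighbors[dst]) - 1
                if expected[dst] == 0:
                    bag.append((dst, src, "BACK", True))
                else:
                    bag += [(dst, k, "GO", False) for k in neighbors[dst] if k != src]
            else:
                bag.append((dst, src, "BACK", False))   # BACK(∅)
        else:
            expected[dst] -= 1
            if non_empty:
                children[dst].add(src)
            if expected[dst] == 0 and parent[dst] != dst:
                bag.append((dst, parent[dst], "BACK", True))
    return parent, children

def is_spanning_tree(parent, root):
    """Check that every process reaches the root by following parent pointers."""
    for start in parent:
        node, seen = start, set()
        while node != root:
            if node in seen or parent[node] is None:
                return False
            seen.add(node)
            node = parent[node]
    return True

graph = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
trees = {tuple(sorted(run_non_fifo(graph, 1, s)[0].items())) for s in range(20)}
```

Collecting the results over several seeds, as in the last line, also shows that different delivery orders may produce different trees rooted at the same process.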

A Spanning Tree per Process The algorithm of Fig. 1.7 can be easily generalized to build n trees, each one associated with a distinct process which is its distinguished process. Then, when any process p_i wants to execute an efficient broadcast/convergecast, it has to use its associated spanning tree.

To build a spanning tree per process, the local variables parent_i, children_i, and expected_msg_i of each process p_i have to be replaced by the arrays parent_i[1..n], children_i[1..n] and expected_msg_i[1..n], and all messages have to carry the identity of the corresponding distinguished process. More precisely, when a process p_k receives a message START(), it uses its local variables parent_k[k], children_k[k], and expected_msg_k[k]. The corresponding messages GO(k, −) and BACK(k, −) carry the identity k and, when a process p_i receives such a message, it uses its local variables parent_i[k], children_i[k] and expected_msg_i[k].
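One possible way to organize these per-initiator variables is sketched below. It is an assumption-laden illustration: the classes TreeState and Process are hypothetical, a dictionary indexed by the identity k replaces the arrays indexed from 1 to n, and only the START() handler is shown.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TreeState:
    """State of one process with respect to the tree rooted at p_k."""
    parent: Optional[int] = None              # None plays the role of ⊥
    children: set = field(default_factory=set)
    expected_msg: int = 0

class Process:
    def __init__(self, ident, neighbors):
        self.ident = ident
        self.neighbors = list(neighbors)
        self.trees = {}                       # k -> TreeState, i.e. parent_i[k], ...

    def state_for(self, k):
        """Return (creating it if needed) the state used for initiator p_k."""
        return self.trees.setdefault(k, TreeState())

    def on_start(self):
        """START() received: become the root of the tree tagged with own identity."""
        k = self.ident
        state = self.state_for(k)
        state.parent = k
        state.expected_msg = len(self.neighbors)
        # Every outgoing message carries k so receivers select the right state.
        return [(j, ("GO", k)) for j in self.neighbors]

p3 = Process(3, [1, 2, 4])
outgoing = p3.on_start()
```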

Concurrent Initiators for a Single Spanning Tree The algorithm of Fig. 1.7 can be easily modified to build a single spanning tree while allowing several processes to independently start the execution of the algorithm, each initially receiving a message START(). To that end, each process manages an additional local variable max_id_i, initialized to 0, which contains the highest identity of a process competing to be the root of the spanning tree.

• If a process p_i receives a message START() while max_id_i ≠ 0, p_i discards this message (in that case, it already participates in the algorithm but does not compete to be the root). Otherwise, p_i starts executing the algorithm and all the corresponding messages GO() or BACK() carry its identity.

• Then, when a process p_i receives a message GO(j, −), p_i discards the message if j < max_id_i. Otherwise, p_i considers p_j as the process with the highest identity which is competing to be the root. It consequently sets max_id_i to j and continues executing the algorithm by using messages GO() and BACK() carrying the identity j.

It is easy to see that this simple application of the forward/discard strategy ensures that a single spanning tree will be constructed, namely the one rooted at p_j where j is such that, at the end of the execution, we have max_id_1 = · · · = max_id_n = j.

Fig. 1.10 Two different spanning trees built from the same communication graph
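The two filtering rules above can be sketched as pure functions on max_id_i. This is only an illustration under assumptions that the text does not state explicitly: process identities are positive integers (so that 0 means that no candidate is known yet), a process that accepts START() records its own identity in max_id_i, and the names on_start and on_go are hypothetical.

```python
def on_start(i, max_id):
    """Rule for START() at p_i. Returns (new max_id, whether p_i competes)."""
    if max_id != 0:
        return max_id, False        # already participating: discard START()
    return i, True                  # assumption: p_i records its own identity

def on_go(j, max_id):
    """Rule for GO(j, -) at a process. Returns (new max_id, action)."""
    if j < max_id:
        return max_id, "discard"    # a higher identity is already competing
    return j, "process"             # continue the algorithm with identity j

# A process with identity 5 starts, then hears about initiators 3 and 9.
max_id, competes = on_start(5, 0)
max_id, first = on_go(3, max_id)
max_id, second = on_go(9, max_id)
```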

From the document Distributed Algorithms for Message-Passing Systems (pages 40-44)