
5.5.12 Minimum-weight spanning tree (MST) in an asynchronous system

There are two approaches to designing the asynchronous MST algorithm.

In the first approach, the synchronous GHS algorithm is simulated in an asynchronous setting. In such a simulation, the same synchronous algorithm is run, but it is augmented with additional protocol steps and control messages that enforce synchrony. Observe from the synchronous GHS algorithm that the difficulty in making it asynchronous lies in step 2: if the two nodes at the ends of an unmarked edge are at different levels, the algorithm can go wrong.

Two possible ways to deal with this problem are as follows:

• After each round, an additional broadcast and an additional convergecast on the marked edges are performed in sequence. The newly identified leader broadcasts its ID and round number on the tree edges; the leaves then initiate the convergecast to acknowledge this broadcast. When the convergecast completes at the leader, the leader begins the next round. Now, in step 2, if the recipient of an EXAMINE message is in an earlier round, it simply delays its response to the EXAMINE, thus forcing synchrony.

This costs n·log n extra messages.

• When a node enters a new round, it informs each neighbor (reachable along unmarked or non-tree edges) of its new level. The node sends the EXAMINE message in step 2 only when all its neighbors along unmarked edges are in the same round.

This costs L·log n extra messages.
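Both options amount to a local gate on step 2. The following is a minimal sketch of that gating decision only, not of the rest of GHS; the function names handle_examine and may_send_examine are hypothetical, and rounds are assumed to be plain integers.

```python
def handle_examine(my_round, sender_round, deferred, sender):
    """First option: a recipient still in an earlier round defers its reply."""
    if my_round < sender_round:
        deferred.append(sender)   # answer once this node reaches sender_round
        return None
    return "REPLY"


def may_send_examine(my_round, unmarked_neighbor_rounds):
    """Second option: send EXAMINE only after every neighbor along an unmarked
    edge has announced the same round as this node."""
    return all(r == my_round for r in unmarked_neighbor_rounds.values())


deferred = []
print(handle_examine(2, 3, deferred, "x"), deferred)   # None ['x']
print(may_send_examine(3, {"b": 3, "c": 2}))            # False
```

The first option pushes the waiting onto the recipient of EXAMINE, while the second makes the sender wait; which one is cheaper depends on the message costs quoted above.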

The second approach to designing the asynchronous MST algorithm is to directly address all the difficulties that arise from the lack of synchrony. The original asynchronous GHS algorithm uses this approach, even though it is patterned on the synchronous GHS algorithm. Through careful engineering, the asynchronous algorithm achieves the same message complexity, O(n·log n + l), as the synchronous algorithm, and a time complexity of O(n·log n·l + d).

We do not present the algorithm here because it is a well-engineered algorithm with intricate details; rather, we only point out some of the difficulties in designing it:

• In step 2, if the two nodes are in different components or in different levels, there needs to be a mechanism to determine this.

• If components at different levels are permitted to combine, then in the worst case some component may keep combining with only single-node components, thereby increasing the complexity by changing the log n factor to a factor of n.


• The search for MWOEs by adjacent components at different levels needs to be coordinated carefully. Specifically, the rules for merging such components, as well as the rules for the concurrent search for the MWOE by these two components, need to be specified.
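The log n versus n effect in the second bullet can be made concrete with a toy count. This sketch assumes an illustrative cost model (ours, not the GHS accounting): every merge touches each node of the resulting component once, for example to search for the MWOE or to learn the new leader. The function names are hypothetical.

```python
def touches_absorbing_singletons(n):
    """One growing component absorbs a single-node component in every merge."""
    total, size = 0, 1
    for _ in range(n - 1):
        size += 1
        total += size        # every node of the merged component is touched
    return total


def touches_equal_levels(n):
    """Only equal-size (equal-level) components merge; n is a power of two."""
    total, size = 0, 1
    while size < n:
        size *= 2
        total += n           # at each level, every node is in some merging pair
    return total


for n in (8, 1024):
    print(n, touches_absorbing_singletons(n), touches_equal_levels(n))
# 8 35 24
# 1024 524799 10240
```

With equal-level merging each node is touched about log n times, whereas a component that keeps absorbing singletons makes its early nodes participate in up to n merges.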

5.6 Synchronizers

General observations on synchronous and asynchronous algorithms

From the spanning tree algorithms, shortest path routing algorithms, constrained flooding algorithms, and MST algorithms, it can be observed that it is much more difficult to design an algorithm for an asynchronous system than for a synchronous system. With few exceptions, this observation generalizes to all algorithms. The example algorithms also suggest that simulating the synchronous behavior of an algorithm designed for a synchronous system on an asynchronous system is often a direct way to realize such algorithms on asynchronous systems.

Given that typical distributed systems are asynchronous, the logical question to address is whether there is a general technique to convert an algorithm designed for a synchronous system to run on an asynchronous system. The generic class of transformation algorithms that run synchronous algorithms on asynchronous systems is called synchronizers. We make the following observations. (i) We consider only failure-free systems, whether synchronous or asynchronous. We will see later (in Chapter 14) that such transformations may not be possible in asynchronous systems in which either processes fail or channels are unreliable. (ii) Using a synchronizer provides a sure way to obtain an asynchronous algorithm. However, such an algorithm may have high complexity. Although more difficult, it may be possible to design more efficient asynchronous algorithms from scratch, rather than transforming the synchronous algorithms to run on asynchronous systems. (This was seen in the case of the GHS algorithm.) Thus, systematic algorithm design for asynchronous systems remains an open and challenging field.

Practically speaking, in an asynchronous system, a synchronizer is a mechanism that indicates to each process when it is safe to proceed to the next round of execution of the "synchronous" algorithm. Conceptually, the synchronizer signals to each process when it is sure that all messages to be received in the current round have arrived.

The message complexity M_a and time complexity T_a of the asynchronous algorithm are as follows:

$$M_a = M_s + M_{init} + \mathit{rounds} \cdot M_{round} \qquad (5.1)$$

$$T_a = T_s + T_{init} + \mathit{rounds} \cdot T_{round} \qquad (5.2)$$

Table 5.1 The message and time complexities for the simple, α, β, and γ synchronizers. h_c is the greatest height of a tree among all the clusters. L_c is the number of tree edges and designated edges in the clustering scheme for the γ synchronizer. d is the graph diameter.

             Simple synchronizer   α synchronizer   β synchronizer   γ synchronizer
  M_init     0                     0                O(n·log n)       O(k·n²)
  T_init     d                     0                O(n)             O(n·log n / log k)
  M_round    2L                    O(L)             O(n)             O(L_c) ≤ O(k·n)
  T_round    1                     O(1)             O(n)             O(h_c) ≤ O(log n / log k)

where:

• M_s is the number of messages in the synchronous algorithm;

• rounds is the number of rounds in the synchronous algorithm;

• T_s is the time for the synchronous algorithm. Assuming one unit (message hop) per round, this equals rounds;

• M_round is the number of messages needed to simulate a round;

• T_round is the number of sequential message hops needed to simulate a round;

• M_init and T_init are the number of messages and the number of sequential message hops, respectively, in the initialization phase in the asynchronous system.
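Equations (5.1) and (5.2) are simple additive formulas, so plugging in one column of Table 5.1 shows how the synchronizer overhead scales with the number of rounds. In this minimal sketch the function name async_cost and the numbers (L = 10 edges, diameter d = 3, 5 rounds, M_s = 12) are hypothetical; the per-round costs are those of the simple synchronizer.

```python
def async_cost(M_s, T_s, rounds, M_init, T_init, M_round, T_round):
    """Message and time complexity of the simulated algorithm."""
    M_a = M_s + M_init + rounds * M_round   # Eq. (5.1)
    T_a = T_s + T_init + rounds * T_round   # Eq. (5.2)
    return M_a, T_a


# Simple synchronizer: M_init = 0, T_init = d, M_round = 2L, T_round = 1.
L, d, rounds = 10, 3, 5
M_s = 12          # hypothetical message count of the synchronous algorithm
T_s = rounds      # one time unit per round
print(async_cost(M_s, T_s, rounds, 0, d, 2 * L, 1))   # (112, 13)
```

Here the rounds·M_round term (100 messages) dominates M_s, which is typical when the synchronous algorithm sends few messages per round.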

We now look at four standard synchronizers: the simple, the α, the β, and the γ synchronizers, proposed by Awerbuch [3]. The message and time complexities of these are summarized in Table 5.1.

The α, β, and γ synchronizers use the notion of process safety, defined as follows. A process i is said to be safe in round r if all messages sent by i in round r have been received. The α and β synchronizers are extreme cases of the γ synchronizer and form its building blocks.

A simple synchronizer

This synchronizer requires each process to send every neighbor one and only one message in each round. If no message is to be sent in the synchronous algorithm, an empty dummy message is sent in the asynchronous algorithm; if more than one message is sent in the synchronous algorithm, the messages are combined into one message in the asynchronous algorithm. In any round, when a process receives a message from each neighbor, it moves to the next round.

We make the following observations about this synchronizer.


• In physical time, any two adjacent processes can be at most one round apart. Thus, if process i is in round round_i, any adjacent process j must be in round round_i − 1, round_i, or round_i + 1.

• When process i is in round round_i, it can receive messages from its neighbors only from rounds round_i or round_i + 1.

Initialization

Any process may start a round. Within d time units, all processes will participate in that round. Hence, T_init = d. M_init = 0 because no explicit messages are required solely for initialization.

Complexity

Each round requires a message to be sent on each incident link in each direction. Hence, M_round = 2L and T_round = 1.

The α synchronizer

At any process i, the α synchronizer in round r moves the process to the next round r + 1 if all the neighboring processes are safe for round r.
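The simple synchronizer's rules and both observations above can be checked with a small event-driven simulation. This is a sketch under stated assumptions: asynchrony is modeled by delivering an arbitrary in-transit message at each step (chosen with random.Random), every message stands for the one bundled or dummy message per neighbor per round, and the function name simple_synchronizer is hypothetical.

```python
import random
from collections import defaultdict


def simple_synchronizer(adj, rounds, seed=0):
    """Simulate the simple synchronizer on an undirected graph.

    adj: dict node -> list of neighbors. Returns (messages sent,
    final round of every process, largest round gap seen across an edge).
    """
    rng = random.Random(seed)
    rnd = {v: 0 for v in adj}                  # current round of each process
    got = {v: defaultdict(set) for v in adj}   # got[v][r]: senders heard in round r
    in_transit = []                            # (dest, src, round) messages
    sent = 0
    max_gap = 0

    def send_round(v):
        nonlocal sent
        for w in adj[v]:                       # exactly one message per neighbor
            in_transit.append((w, v, rnd[v]))
            sent += 1

    for v in adj:
        send_round(v)
    while in_transit:
        # Asynchrony: deliver an arbitrary in-transit message.
        dest, src, r = in_transit.pop(rng.randrange(len(in_transit)))
        got[dest][r].add(src)
        # Enter the next round once a message from every neighbor has arrived.
        while rnd[dest] < rounds and len(got[dest][rnd[dest]]) == len(adj[dest]):
            del got[dest][rnd[dest]]
            rnd[dest] += 1
            if rnd[dest] < rounds:
                send_round(dest)
        for v in adj:
            for w in adj[v]:
                max_gap = max(max_gap, abs(rnd[v] - rnd[w]))
    return sent, rnd, max_gap


adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}   # L = 5 edges
print(simple_synchronizer(adj, 4))
```

The message count comes out as rounds·2L, matching M_round = 2L, and adjacent processes never drift more than one round apart, whatever delivery order the seed produces.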

A process can learn about the safety of its neighbor if every message sent by that neighbor must be acknowledged. Once a neighbor j has received acknowledgements for all the messages it sent, it sends a message informing i (and all its other neighbors) that it is safe.

Example The operation is illustrated in Figure 5.10. (step 1) Node A sends a message to nodes C and E, and receives messages from B and E in the same round. (step 2) These messages are acknowledged after they are received.

(step 3) Once node A receives the acknowledgements from C and E, it sends a message to all its neighbors to notify them that node A is safe. This allows the neighbors to not wait on A before proceeding to the next round. Node A itself can proceed to the next round only after it receives a safety notification from each of its neighbors, whether or not there was any exchange of application execution messages with them in that round.

Figure 5.10 An example showing steps of the α synchronizer. (a) Execution messages (step 1) and their acknowledgements (step 2). (b) "I am safe" messages (step 3).

Complexity

Every message sent in a round requires an ack. If l messages are sent in a round, l acks are needed, giving a message overhead of 2l thus far; however, it is assumed that an underlying transport-layer (or equivalent) protocol already uses acks, so these come for free. In addition, 2L messages are required so that each process can inform all its neighbors that it is safe. Thus the message complexity is M_round = 2L + 2l = O(L), since l ≤ 2L. The time complexity is T_round = O(1).
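The three steps of the α synchronizer (execution messages, acks, "I am safe" notifications) can be simulated in the same event-driven style. The following is a sketch under our own assumptions: the workload (each process messages each neighbor with probability 1/2 per round) and the random delivery order are illustrative, the function name alpha_synchronizer is hypothetical, and acks are counted explicitly even though the text treats them as free.

```python
import random
from collections import defaultdict


def alpha_synchronizer(adj, rounds, seed=0):
    """Returns (final rounds, message counts by kind, late receipts)."""
    rng = random.Random(seed)
    rnd = {v: 0 for v in adj}
    unacked = {v: defaultdict(int) for v in adj}    # round-r messages awaiting ack
    safe_from = {v: defaultdict(set) for v in adj}  # neighbors known safe in round r
    transit, counts = [], defaultdict(int)
    late = 0   # APP messages received after the receiver left their round

    def send(kind, dest, src, r):
        transit.append((kind, dest, src, r))
        counts[kind] += 1

    def announce_safe(v, r):                 # step 3: "I am safe" to all neighbors
        for w in adj[v]:
            send("SAFE", w, v, r)

    def start_round(v):
        r = rnd[v]
        # Hypothetical workload: message a random subset of neighbors (step 1).
        targets = [w for w in adj[v] if rng.random() < 0.5]
        for w in targets:
            send("APP", w, v, r)
        unacked[v][r] = len(targets)
        if not targets:
            announce_safe(v, r)              # nothing sent, trivially safe

    def try_advance(v):
        # Move on once every neighbor is known to be safe for the current round.
        while rnd[v] < rounds and len(safe_from[v][rnd[v]]) == len(adj[v]):
            rnd[v] += 1
            if rnd[v] < rounds:
                start_round(v)

    for v in adj:
        start_round(v)
    while transit:
        kind, dest, src, r = transit.pop(rng.randrange(len(transit)))
        if kind == "APP":
            if rnd[dest] > r:
                late += 1
            send("ACK", src, dest, r)        # step 2: acknowledge on receipt
        elif kind == "ACK":
            unacked[dest][r] -= 1
            if unacked[dest][r] == 0:
                announce_safe(dest, r)
        else:                                # SAFE
            safe_from[dest][r].add(src)
            try_advance(dest)
    return rnd, counts, late
```

A process may leave round r before its own acks return; this is harmless, because a neighbor cannot leave round r until that process is safe, which requires the neighbor to have received the message. The late counter checks exactly this property.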

Initialization

No explicit initialization is needed. A process that spontaneously wakes up and initializes the algorithm sends messages to (some of) its neighbors, who then acknowledge any message received, and also reply that they are safe.

The β synchronizer

This synchronizer assumes a rooted spanning tree. Safe leaf nodes initiate a convergecast; an intermediate node propagates the convergecast to its parent when all the nodes in its subtree, including itself, are safe. When the root becomes safe and receives the convergecast from all its children, it uses a tree broadcast to inform all the nodes to move to the next round.

Example Compared to the α synchronizer, steps 1 and 2 as described with respect to Figure 5.10 are the same to determine when to notify others about safety. The actual notification about safety uses the convergecast–broadcast sequence on a pre-established tree, instead of using step 3 of Figure 5.10.

Complexity

Just as for the α synchronizer, the β synchronizer requires an ack for each of the l messages sent in a round; these l acks can again be assumed to come for free, thanks to the transport layer or an equivalent lower-layer protocol. Instead of the 2L safety messages of the α synchronizer, only 2(n − 1) further messages are required, for the convergecast and the broadcast over the n − 1 tree edges. Hence, M_round = 2(n − 1). The convergecast and the broadcast each traverse the height of the tree, so T_round has an average-case delay of 2·log n (a balanced tree) and a worst-case delay of 2n (a tree that is a path).
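The per-round cost of the β synchronizer follows from walking the tree once upward and once downward. This minimal sketch assumes all nodes are already safe and that each message hop takes one time unit; the function name beta_round and the parent-map representation of the spanning tree are hypothetical.

```python
def beta_round(parent):
    """Cost of one beta-synchronizer round on a rooted spanning tree.

    parent: dict mapping node -> parent node (the root maps to None).
    Returns (messages, time) for the convergecast followed by the broadcast.
    """
    children = {v: [] for v in parent}
    root = None
    for v, p in parent.items():
        if p is None:
            root = v
        else:
            children[p].append(v)

    def wave(v):
        # One message per tree edge in v's subtree; completion time equals the
        # subtree height. The same numbers hold for the upward convergecast
        # and for the downward broadcast.
        msgs, t = 0, 0
        for c in children[v]:
            m, tc = wave(c)
            msgs += m + 1
            t = max(t, tc + 1)
        return msgs, t

    up_msgs, up_time = wave(root)      # convergecast: leaves -> root
    down_msgs, down_time = wave(root)  # broadcast: root -> leaves
    return up_msgs + down_msgs, up_time + down_time


path = {0: None, 1: 0, 2: 1, 3: 2, 4: 3}                        # worst case
balanced = {0: None, 1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
print(beta_round(path), beta_round(balanced))   # (8, 8) (12, 4)
```

Both trees use 2(n − 1) messages; only the time differs, growing with the tree height (2(n − 1) on a path, about 2·log n on a balanced tree).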

Initialization

There is an initialization cost, incurred by the setup of the spanning tree (the algorithms in Section 5.5). As noted in Section 5.5, this cost is O(n·log n + L) messages and O(n) time.

The γ synchronizer

The network is organized into a set of clusters, as shown in Figure 5.11. Within a cluster, a spanning tree hierarchy exists with a distinguished root node. The height of a clustering scheme, h_c, is the maximum height of the spanning trees across all of the clusters. Two clusters are neighbors if there is at least one edge between a node in one cluster and a node in the other; one of these edges is chosen as the designated edge for that pair of clusters. Within a cluster, the β synchronizer is executed; once a cluster is "stabilized," the α synchronizer is executed among the clusters, over the designated edges. To convey the results of the stabilization of the inter-cluster α synchronizer within each cluster, a convergecast and broadcast phase is then executed. Two message types support the inter-cluster α synchronizer: My_cluster_safe, exchanged over the designated inter-cluster edges, and Neighboring_cluster_safe, convergecast within each cluster afterwards; their semantics are self-evident. The details of the algorithm are given in Algorithm 5.12.

Figure 5.11 Cluster organization for the γ synchronizer, showing six clusters A–F. Only the tree edges within each cluster and the inter-cluster designated edges are shown. (Legend: root node, tree edge, designated inter-cluster edge.)

Complexity

• Let L_c be the total number of tree edges plus designated edges in the clustering scheme. In each round, there are four messages per tree edge (Subtree_safe, This_cluster_safe, Neighboring_cluster_safe, and Next_round) and two My_cluster_safe messages over each designated edge. Hence, M_round is O(L_c).

• Let h_c be the maximum height of any tree among the clusters. Then the time complexity component T_round is O(h_c). This is due to the four phases (convergecast, broadcast, convergecast, and broadcast) contributing 4h_c time, the two units of time needed for all processes to become safe, and one unit of time needed for the inter-cluster My_cluster_safe messages.
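The two bullets above give exact per-round counts, so they can be written out as a small function. This is a sketch; the function name gamma_round_cost and the example clustering (24 tree edges, 8 designated edges, h_c = 2) are hypothetical.

```python
def gamma_round_cost(tree_edges, designated_edges, h_c):
    """Per-round overhead of the gamma synchronizer, following the counts above.

    tree_edges: total number of intra-cluster tree edges.
    designated_edges: number of designated inter-cluster edges.
    h_c: maximum height of any cluster tree.
    """
    # Subtree_safe, This_cluster_safe, Neighboring_cluster_safe and Next_round
    # each cross every tree edge once; My_cluster_safe crosses each designated
    # edge once in each direction.
    m_round = 4 * tree_edges + 2 * designated_edges
    # Four tree waves of height h_c, two units for all processes to become
    # safe, and one unit for the My_cluster_safe messages.
    t_round = 4 * h_c + 2 + 1
    return m_round, t_round


print(gamma_round_cost(24, 8, 2))   # (112, 11)
# Alpha-like extreme: every cluster is a single node, all L = 5 edges designated.
print(gamma_round_cost(0, 5, 0))    # (10, 3)
```

The second call shows that single-node clusters collapse the cost to 2L messages and constant time, consistent with the α synchronizer column of Table 5.1.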

Exercise 5.25 asks you to work out a formal design of how to partition the nodes into clusters, how to choose a root and a spanning tree of appropriate depth for each cluster, and how to designate the preferred edges. The design should allow the complexity to be controlled by suitably tuning a parameter k. The γ_k synchronizer reduces to the α synchronizer when k = n − 1, i.e., each cluster contains a single node. The γ_k synchronizer reduces to the β synchronizer when k = 2, i.e., there is a single cluster. The construction will allow the γ_k synchronizer to be viewed as a parameterized synchronizer based on clustering.

(message types)

Subtree_safe               // β synchronizer phase's convergecast within cluster
This_cluster_safe          // β synchronizer phase's broadcast within cluster
My_cluster_safe            // embedded inter-cluster α synchronizer's messages
                           // across cluster boundaries
Neighboring_cluster_safe   // convergecast following inter-cluster
                           // α synchronizer phase
Next_round                 // broadcast following inter-cluster
                           // α synchronizer phase

for each round do

1. (β synchronizer phase) This phase aims to detect when all the nodes within a cluster are safe, and to inform all the nodes in that cluster.

   (a) Using the spanning tree, leaves initiate the convergecast of the "Subtree_safe" message towards the root of the cluster.

   (b) After the convergecast completes, the root initiates a broadcast of "This_cluster_safe" on the spanning tree within the cluster.

   (c) During this broadcast in the tree, as the nodes get engaged, the nodes also send "My_cluster_safe" messages on any incident designated inter-cluster edges.

   (d) Each node also awaits "My_cluster_safe" messages along any such incident designated edges.

2. (Convergecast and broadcast phase) This phase aims to detect when all neighboring clusters are safe, and to inform every node within this cluster.

   (a) (Convergecast)

      (i) After the broadcast of the earlier phase (1(b)) completes, the leaves initiate a convergecast using "Neighboring_cluster_safe" messages once they receive any expected "My_cluster_safe" messages (step 1(c)) on all the designated incident edges.

      (ii) An intermediate node propagates the convergecast once it receives the "Neighboring_cluster_safe" message from all its children, and also any expected "My_cluster_safe" message (as per step 1(c)) along designated edges incident on it.

   (b) (Broadcast) Once the convergecast completes at the root of the cluster, a "Next_round" message is broadcast in the cluster's tree to inform all the tree nodes to move to the next round.

Algorithm 5.12 The γ synchronizer.
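The timing of one round of Algorithm 5.12 can be traced with a small event-time calculation over the four tree waves and the designated edges. Assumptions (ours, for illustration): every node is safe at time 0, each message hop takes exactly one time unit, clusters are given as parent maps, and the function name gamma_round is hypothetical. Under these assumptions the returned time never exceeds 4·h_c + 1; adding the two units for processes to become safe gives the 4·h_c + 3 behind T_round = O(h_c).

```python
def gamma_round(clusters, designated):
    """clusters: list of parent maps (node -> parent, root -> None), one per
    cluster. designated: list of (u, v) inter-cluster edges.
    Returns (message counts per type, time the last node gets Next_round)."""
    parent = {}
    for cl in clusters:
        parent.update(cl)
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    def depth(v):
        return 0 if parent[v] is None else 1 + depth(parent[v])

    def root_of(v):
        return v if parent[v] is None else root_of(parent[v])

    def height(v):
        return max((1 + height(c) for c in children[v]), default=0)

    # Steps 1(a)-(b): Subtree_safe reaches the root at time height(root);
    # This_cluster_safe then reaches v after depth(v) more hops.
    bcast1 = {v: height(root_of(v)) + depth(v) for v in parent}
    # Steps 1(c)-(d): My_cluster_safe is sent when the broadcast reaches an
    # endpoint of a designated edge and arrives one unit later.
    mcs_arrival = {v: 0 for v in parent}
    for u, v in designated:
        mcs_arrival[u] = max(mcs_arrival[u], bcast1[v] + 1)
        mcs_arrival[v] = max(mcs_arrival[v], bcast1[u] + 1)

    # Step 2(a): v forwards Neighboring_cluster_safe once it has the broadcast,
    # its My_cluster_safe messages, and the reports of all its children.
    def ncs_ready(v):
        t = max(bcast1[v], mcs_arrival[v])
        for c in children[v]:
            t = max(t, ncs_ready(c) + 1)
        return t

    # Step 2(b): Next_round is broadcast from each cluster root.
    next_round = {v: ncs_ready(root_of(v)) + depth(v) for v in parent}

    tree_edges = sum(1 for p in parent.values() if p is not None)
    counts = {"Subtree_safe": tree_edges, "This_cluster_safe": tree_edges,
              "My_cluster_safe": 2 * len(designated),
              "Neighboring_cluster_safe": tree_edges, "Next_round": tree_edges}
    return counts, max(next_round.values())


A = {"a0": None, "a1": "a0", "a2": "a1"}   # cluster of height 2
B = {"b0": None, "b1": "b0"}               # cluster of height 1
print(gamma_round([A, B], [("a2", "b1")]))
```

In this example, cluster B finishes its own waves early but must wait for the My_cluster_safe message from the deeper cluster A; the waiting across designated edges is the coupling that the embedded α synchronizer introduces.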
