


4 Global state and snapshot recording algorithms

4.3 Snapshot algorithms for FIFO channels

This section presents the Chandy and Lamport algorithm [6], which was the first algorithm to record the global snapshot. We also present three variations of the Chandy and Lamport algorithm.

4.3.1 Chandy–Lamport algorithm

The Chandy–Lamport algorithm uses a control message, called a marker. After a site has recorded its snapshot, it sends a marker along all of its outgoing channels before sending out any more messages. Since channels are FIFO, a marker separates the messages in the channel into those to be included in the snapshot (i.e., channel state or process state) and those not to be recorded in the snapshot. This addresses issue I1. The role of markers in a FIFO system is to act as delimiters for the messages in the channels so that the channel state recorded by the process at the receiving end of the channel satisfies the condition C2.

Since all messages that follow a marker on channel Cij have been sent by process pi after pi has taken its snapshot, process pj must record its snapshot no later than when it receives a marker on channel Cij. In general, a process must record its snapshot no later than when it receives a marker on any of its incoming channels. This addresses issue I2.

The algorithm

The Chandy–Lamport snapshot recording algorithm is given in Algorithm 4.1.

A process initiates snapshot collection by executing the marker sending rule, by which it records its local state and sends a marker on each outgoing channel. A process executes the marker receiving rule on receiving a marker. If the process has not yet recorded its local state, it records the state of the channel on which the marker is received as empty and executes the marker sending rule to record its local state. Otherwise, the state of the incoming channel on which the marker is received is recorded as the set of computation messages received on that channel after recording the local state but before receiving the marker on that channel. The algorithm can be initiated by any process by executing the marker sending rule. The algorithm terminates after each process has received a marker on all of its incoming channels.

The recorded local snapshots can be put together to create the global snapshot in several ways. One policy is to have each process send its local snapshot to the initiator of the algorithm. Another policy is to have each process send the information it records along all outgoing channels, and to have each process receiving such information for the first time propagate it along its outgoing channels. All the local snapshots get disseminated to all other processes and all the processes can determine the global state.

Multiple processes can initiate the algorithm concurrently. In that case, each initiation needs to be distinguished by using unique markers. Different initiations by a process are identified by a sequence number.

Marker sending rule for process pi
(1) Process pi records its state.
(2) For each outgoing channel C on which a marker has not been sent, pi sends a marker along C before pi sends further messages along C.

Marker receiving rule for process pj
On receiving a marker along channel C:
    if pj has not recorded its state then
        Record the state of C as the empty set
        Execute the “marker sending rule”
    else
        Record the state of C as the set of messages received along C after pj's state was recorded and before pj received the marker along C

Algorithm 4.1 The Chandy–Lamport algorithm.
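The two rules of Algorithm 4.1, together with per-initiation markers, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names Marker, Process, send, deliver and done are hypothetical, process state is modeled as a single integer, application messages are integers added to the receiver's state, and in-memory deques stand in for FIFO channels.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    initiator: str  # process that started this snapshot (assumed naming)
    seq: int        # per-initiator sequence number


class Process:
    def __init__(self, name, state):
        self.name = name
        self.state = state
        self.out = {}       # destination name -> deque used as a FIFO channel
        self.incoming = []  # names of processes with a channel into this one
        self.snaps = {}     # Marker -> record kept for that snapshot instance
        self.next_seq = 0

    def send(self, dst, amount):
        # Application message: move `amount` out of the local state.
        self.state -= amount
        self.out[dst].append(amount)

    def initiate(self):
        marker = Marker(self.name, self.next_seq)
        self.next_seq += 1
        self._marker_sending_rule(marker)
        return marker

    def _marker_sending_rule(self, marker):
        # (1) Record the local state; (2) send the marker on every outgoing
        # channel before any further application message.
        self.snaps[marker] = {
            "state": self.state,
            "chan": {src: [] for src in self.incoming},
            "open": set(self.incoming),  # channels still being recorded
        }
        for channel in self.out.values():
            channel.append(marker)

    def deliver(self, src, msg):
        if isinstance(msg, Marker):
            # Marker receiving rule.
            if msg not in self.snaps:
                self._marker_sending_rule(msg)  # channel from src stays empty
            self.snaps[msg]["open"].discard(src)  # stop recording this channel
        else:
            self.state += msg
            for snap in self.snaps.values():
                if src in snap["open"]:
                    snap["chan"][src].append(msg)

    def done(self, marker):
        # Local termination: a marker has arrived on every incoming channel.
        return marker in self.snaps and not self.snaps[marker]["open"]


# Two processes that initiate concurrently.
p, q = Process("p", 10), Process("q", 20)
cpq, cqp = deque(), deque()
p.out["q"], q.out["p"] = cpq, cqp
p.incoming, q.incoming = ["q"], ["p"]

mp = p.initiate()   # p records 10
q.send("p", 5)      # in flight while p's snapshot is open
mq = q.initiate()   # q records 15 for its own snapshot
while cpq or cqp:
    if cpq:
        q.deliver("p", cpq.popleft())
    if cqp:
        p.deliver("q", cqp.popleft())
```

Keying the recorded data by marker keeps the two concurrent snapshots separate: in the run above, the snapshot started by p captures the in-flight value on the channel from q to p, while the one started by q sees both channels empty.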

Correctness

To prove the correctness of the algorithm, we show that a recorded snapshot satisfies conditions C1 and C2. Since a process records its snapshot when it receives the first marker on any incoming channel, no messages that follow markers on the channels incoming to it are recorded in the process's snapshot. Moreover, a process stops recording the state of an incoming channel when a marker is received on that channel. Due to the FIFO property of channels, it follows that no message sent after the marker on that channel is recorded in the channel state. Thus, condition C2 is satisfied. When a process pj receives message mij that precedes the marker on channel Cij, it acts as follows: if process pj has not taken its snapshot yet, then it includes mij in its recorded snapshot. Otherwise, it records mij in the state of the channel Cij. Thus, condition C1 is satisfied.

Complexity

The recording part of a single instance of the algorithm requires O(e) messages and O(d) time, where e is the number of edges in the network and d is the diameter of the network.
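The bound can be checked on a small example. The sketch below is only an illustration under simplifying assumptions: every channel has unit delay, all processes act in synchronous rounds, and each direction of a link counts as one edge. The function name marker_flood and the graph encoding are hypothetical.

```python
def marker_flood(channels, initiator):
    """Count markers and rounds for one snapshot instance.

    channels maps each process to the processes it has an outgoing
    channel to (an assumed adjacency-list encoding).
    """
    recorded = {initiator}
    frontier = [initiator]  # processes that recorded in the previous round
    markers = rounds = 0
    while frontier:
        rounds += 1
        next_frontier = []
        for sender in frontier:
            for receiver in channels[sender]:
                markers += 1  # one marker per outgoing channel
                if receiver not in recorded:
                    recorded.add(receiver)
                    next_frontier.append(receiver)
        frontier = next_frontier
    return markers, rounds


# Bidirectional ring of four processes: e = 8 directed channels, d = 2.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
markers, rounds = marker_flood(ring, 0)
```

Each process sends exactly one marker per outgoing channel, so the marker count equals e, and under the unit-delay assumption the last marker arrives one round after the farthest process has recorded its state.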

4.3.2 Properties of the recorded global state

The recorded global state may not correspond to any of the global states that occurred during the computation. Consider two possible executions of the snapshot algorithm (shown in Figure 4.3) for the money transfer example of Figure 4.2:

Figure 4.3 Timing diagram of two possible executions of the banking example.

1. (Markers shown using dashed-and-dotted arrows.) Let site S1 initiate the algorithm just after t1. Site S1 records its local state (account A = $550) and sends a marker to site S2. The marker is received by site S2 after t4. When site S2 receives the marker, it records its local state (account B = $170), the state of channel C12 as $0, and sends a marker along channel C21. When site S1 receives this marker, it records the state of channel C21 as $80. The $800 amount in the system is conserved in the recorded global state,

A=$550 B=$170 C12=$0 C21=$80

2. (Markers shown using dotted arrows.) Let site S1 initiate the algorithm just after t0 and before sending the $50 for S2. Site S1 records its local state (account A = $600) and sends a marker to site S2. The marker is received by site S2 between t2 and t3. When site S2 receives the marker, it records its local state (account B = $120), the state of channel C12 as $0, and sends a marker along channel C21. When site S1 receives this marker, it records the state of channel C21 as $80. The $800 amount in the system is conserved in the recorded global state,

A=$600 B=$120 C12=$0 C21=$80
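The two runs can be replayed with a short Python sketch. It rests on assumptions inferred from the amounts quoted above rather than on Figure 4.2 itself: initial balances A = $600 and B = $200, a $50 transfer from S1 to S2 at t1, an $80 transfer from S2 to S1 at t2, and their receipt at t3 and t4 respectively. The class Site and the function run are hypothetical names.

```python
from collections import deque

MARKER = object()  # sentinel standing in for the control message


class Site:
    def __init__(self, balance):
        self.balance = balance
        self.out = None          # the single outgoing FIFO channel (a deque)
        self.recorded = None     # recorded local state
        self.chan_state = []     # recorded state of the incoming channel
        self.recording = False   # True while the incoming channel is recorded

    def send_money(self, amount):
        self.balance -= amount
        self.out.append(amount)

    def record(self):
        # Marker sending rule: record local state, then send the marker.
        self.recorded = self.balance
        self.recording = True
        self.out.append(MARKER)

    def receive(self, channel):
        msg = channel.popleft()
        if msg is MARKER:
            # Marker receiving rule.
            if self.recorded is None:
                self.record()
            self.recording = False
        else:
            self.balance += msg
            if self.recording:
                self.chan_state.append(msg)


def run(initiate_before_transfer):
    s1, s2 = Site(600), Site(200)
    c12, c21 = deque(), deque()
    s1.out, s2.out = c12, c21
    if initiate_before_transfer:   # run 2 (dotted arrows)
        s1.record()                # just after t0
        s1.send_money(50)          # t1
        s2.send_money(80)          # t2
        s2.receive(c12)            # marker reaches S2 between t2 and t3
        s1.receive(c21)            # t3: S1 receives $80
        s2.receive(c12)            # t4: S2 receives $50
        s1.receive(c21)            # marker reaches S1
    else:                          # run 1 (dashed-and-dotted arrows)
        s1.send_money(50)          # t1
        s1.record()                # just after t1
        s2.send_money(80)          # t2
        s1.receive(c21)            # t3: S1 receives $80
        s2.receive(c12)            # t4: S2 receives $50
        s2.receive(c12)            # marker reaches S2 after t4
        s1.receive(c21)            # marker reaches S1
    return {"A": s1.recorded, "B": s2.recorded,
            "C12": sum(s2.chan_state), "C21": sum(s1.chan_state)}


run1 = run(initiate_before_transfer=False)
run2 = run(initiate_before_transfer=True)
```

Under these assumptions, run1 and run2 reproduce the two recorded global states listed above, and each sums to $800.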

In both these possible runs of the algorithm, the recorded global states never occurred in the execution. This happens because a process can change its state asynchronously before the markers it sent are received by other sites and the other sites record their states.

Nevertheless, as we discuss next, the system could have passed through the recorded global states in some equivalent executions. Suppose the algorithm is initiated in global state Si and it terminates in global state St. Let seq be the sequence of events that takes the system from Si to St. Let S be the global state recorded by the algorithm. Chandy and Lamport [6] showed that there exists a sequence seq′ which is a permutation of seq such that S is reachable from Si by executing a prefix of seq′ and St is reachable from S by executing the rest of the events of seq′.

A brief sketch of the proof is as follows: an event e is defined as a pre-recording/post-recording event if e occurs on a process p and p records its state after/before e in seq. A post-recording event may occur before a pre-recording event only if the two events occur on different processes. It is shown that a post-recording event can be swapped with an immediately following pre-recording event in a sequence without affecting the local states of either of the two processes on which the two events occur. By iteratively applying this operation to seq, the above-described permutation seq′ is obtained. It is then shown that S, the global state recorded by the algorithm for the processes and channels, is the state after all the pre-recording events have been executed, but before any post-recording event.


Thus, the recorded global state is a valid state in an equivalent execution and if a stable property (i.e., a property that persists such as termination or deadlock) holds in the system before the snapshot algorithm begins, it holds in the recorded global snapshot. Therefore, a recorded global state is useful in detecting stable properties.
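The swapping argument from the proof sketch can be illustrated on the first banking run. The Python sketch below is a hedged illustration, not the original proof: the event tuples (process, action, amount, kind), the helper names, and the initial balances A = $600 and B = $200 are assumptions chosen to match the amounts quoted earlier.

```python
def to_recorded_order(seq):
    """Swap each post-recording event with an immediately following
    pre-recording event until no such adjacent pair remains."""
    seq = list(seq)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(seq) - 1):
            first, second = seq[i], seq[i + 1]
            if first[3] == "post" and second[3] == "pre":
                # Such a pair can only involve two different processes.
                assert first[0] != second[0]
                seq[i], seq[i + 1] = second, first
                swapped = True
    return seq


def replay(events):
    """Return (A, B, C12, C21) after executing the given events."""
    balance = {"S1": 600, "S2": 200}
    in_flight = {"S1": 0, "S2": 0}  # money in transit, keyed by sender
    for proc, action, amount, _kind in events:
        other = "S2" if proc == "S1" else "S1"
        if action == "send":
            balance[proc] -= amount
            in_flight[proc] += amount
        else:  # "recv"
            balance[proc] += amount
            in_flight[other] -= amount
    return balance["S1"], balance["S2"], in_flight["S1"], in_flight["S2"]


# Run 1: S1 records just after t1, S2 records after t4.
seq = [
    ("S1", "send", 50, "pre"),   # t1
    ("S2", "send", 80, "pre"),   # t2
    ("S1", "recv", 80, "post"),  # t3, after S1 recorded its state
    ("S2", "recv", 50, "pre"),   # t4, before S2 recorded its state
]
seq_prime = to_recorded_order(seq)
prefix = [event for event in seq_prime if event[3] == "pre"]
state_after_prefix = replay(prefix)
final_state = replay(seq_prime)
```

Replaying only the pre-recording prefix of the permuted sequence yields the recorded state of the first run, and replaying the whole permuted sequence ends in the same final state as the original order.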

A physical interpretation of the collected global state is as follows: consider the two instants of recording of the local states in the banking example. If the cut formed by these instants is viewed as being an elastic band and if the elastic band is stretched so that it is vertical, then recorded states of all processes occur simultaneously at one physical instant, and the recorded global state occurs in the execution that is depicted in this modified space–time diagram. This is called the rubber-band criterion. For example, consider the two different executions of the snapshot algorithm, depicted in Figure 4.3.

For the execution for which the markers are shown using dashed-and-dotted arrows, the instants of the local state recordings are marked by squares. Applying the rubber-band criterion, these can be stretched to be vertical or instantaneous. Similarly, for the other execution for which the markers are shown using dotted arrows, the instants of local state recordings are marked by circles. Note that the system execution would have been like this, had the processors’ speeds and message delays been different.

Yet another physical interpretation of the collected global state is as follows: all the recorded process states are mutually concurrent – no recorded process state causally depends upon another. Therefore, logically we can view that all these process states occurred simultaneously even though they might have occurred at different instants in physical time.
