
On-the-Fly Determination of the Immediate Predecessors

The Notion of a Relevant Event At some abstraction level, only a subset of the events are relevant. As an example, in some applications only the modifications of some local variables, or the processing of specific messages, are relevant. Given a distributed computation $\widehat{H} = (H, \xrightarrow{ev})$, let $R \subseteq H$ be the subset of its events that are defined as relevant.

Let us accordingly define the causal precedence relation on these events, denoted $\xrightarrow{re}$, as follows:

$$\forall\, e_1, e_2 \in R:\quad (e_1 \xrightarrow{re} e_2) \stackrel{\mathrm{def}}{=} (e_1 \xrightarrow{ev} e_2).$$

This relation is nothing else than the projection of $\xrightarrow{ev}$ onto the elements of $R$. The pair $\widehat{R} = (R, \xrightarrow{re})$ constitutes an abstraction of the distributed computation $\widehat{H} = (H, \xrightarrow{ev})$.

Without loss of generality, we consider that the set of relevant events consists only of internal events. Communication events are low-level events which, without being themselves relevant, participate in the establishment of causal precedence on relevant events. If a communication event has to be explicitly observed as relevant, an associated relevant internal event can be generated just before or after it.

An example of a distributed execution with both relevant events and non-relevant events is described in Fig. 7.17. The relevant events are represented by black dots, while the non-relevant events are represented by white dots.

The management of vector clocks restricted to relevant events (Fig. 7.18) is a simplified version of the one described in Fig. 7.9. As we can see, despite the fact that they are not relevant, communication events participate in the tracking of causal precedence. Given a relevant event e timestamped (e.vc[1..n], i), the integer e.vc[j] is the number of relevant events produced by pj and known by pi.

The Notion of an Immediate Predecessor and the Immediate Predecessor Tracking Problem Given two events $e_1, e_2 \in R$, the relevant event $e_1$ is an immediate predecessor of the relevant event $e_2$ if

$$(e_1 \xrightarrow{re} e_2) \;\wedge\; \big(\nexists\, e \in R:\ (e_1 \xrightarrow{re} e) \wedge (e \xrightarrow{re} e_2)\big).$$


Fig. 7.17 Relevant events in a distributed computation

when producing a relevant internal event e do
(1) vci[i] ← vci[i] + 1;
(2) Produce the relevant event e. % The date of e is vci[1..n].

when sending MSG(m) to pj do
(3) send MSG(m, vci[1..n]) to pj.

when MSG(m, vc) is received from pj do
(4) vci[1..n] ← max(vci[1..n], vc[1..n]).

Fig. 7.18 Vector clock system for relevant events (code for process pi)
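As an illustration, here is a minimal Python sketch of the rules of Fig. 7.18 (the Process class and the way messages are modeled are assumptions made for this example, not part of the book's notation): only relevant events increment the local entry of the vector, while send and receive events merely propagate and merge vectors.

class Process:
    # Holds the vector clock vci[1..n] of Fig. 7.18 (processes are 0-indexed).
    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n                  # vci[1..n], initialized to zeros

    def relevant_event(self):
        self.vc[self.pid] += 1             # line 1: vci[i] <- vci[i] + 1
        return list(self.vc)               # line 2: the date of e is vci[1..n]

    def send(self, m):
        return (m, list(self.vc))          # line 3: piggyback vci[1..n]

    def receive(self, msg):
        m, vc = msg
        self.vc = [max(a, b) for a, b in zip(self.vc, vc)]   # line 4
        return m

# A relevant event on p0, then a message from p0 to p1, then one on p1.
p0, p1 = Process(0, 2), Process(1, 2)
e1 = p0.relevant_event()                   # e1 is dated [1, 0]
p1.receive(p0.send("m"))                   # communication events: no increment
e2 = p1.relevant_event()                   # e2 is dated [1, 1], so e1 ->re e2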

Fig. 7.19 From relevant events to Hasse diagram

The immediate predecessor tracking (IPT) problem consists in associating with each relevant event the set of relevant events that are its immediate predecessors.

Moreover, this has to be done on the fly and without adding control messages. The determination of the immediate predecessors consists in computing the transitive reduction (or Hasse diagram) of the partial order $\widehat{R} = (R, \xrightarrow{re})$. This reduction captures the essential causality of $\widehat{R}$.
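To make the target of this computation concrete, the following small Python sketch (an illustration under the assumption that the relation is given already transitively closed; it is not the book's algorithm) derives the Hasse diagram of a partial order by discarding every pair implied by a two-step chain:

def transitive_reduction(pairs):
    # pairs: a transitively closed set of (a, b) meaning a ->re b, with a != b.
    return {(a, b) for (a, b) in pairs
            if not any((a, c) in pairs and (c, b) in pairs
                       for c in {y for (x, y) in pairs if x == a})}

# Three relevant events with e1 ->re e2 ->re e3 (closure adds e1 ->re e3):
order = {("e1", "e2"), ("e2", "e3"), ("e1", "e3")}
print(transitive_reduction(order))         # {('e1', 'e2'), ('e2', 'e3')}

The interest of the IPT algorithm presented below is that it obtains the same information on the fly: each relevant event directly carries the identities of its immediate predecessors, without the full relation ever being built.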

The left of Fig. 7.19 represents the distributed computation of Fig. 7.17, in which only the relevant events are explicitly indicated, together with their vector dates and an identification pair (made up of a process identity plus a sequence number). The right of the figure shows the corresponding Hasse diagram.

An Algorithm Solving the IPT Problem: Local Variables The kth relevant event on process pk is unambiguously identified by the pair (k, vck), where vck is the value of vck[k] when pk has produced this event. The aim of the algorithm is consequently to associate with each relevant event e a set e.imp such that (k, vck) ∈ e.imp if and only if the corresponding event is an immediate predecessor of e.

when producing a relevant internal event e do
(1) vci[i] ← vci[i] + 1;
(2) Produce the relevant event e; let e.imp = {(k, vci[k]) | impi[k] = 1};
(3) impi[i] ← 1;
(4) for each j ∈ {1, . . . , n} \ {i} do impi[j] ← 0 end for.

when sending MSG(m) to pj do
(5) send MSG(m, vci[1..n], impi[1..n]) to pj.

when MSG(m, vc, imp) is received do
(6) for each k ∈ {1, . . . , n} do
(7)   case vci[k] < vc[k] then vci[k] ← vc[k]; impi[k] ← imp[k]
(8)        vci[k] = vc[k] then impi[k] ← min(impi[k], imp[k])
(9)        vci[k] > vc[k] then skip
(10)  end case
(11) end for.

Fig. 7.20 Determination of the immediate predecessors (code for process pi)

To that end, each process pi manages the following local variables:

• vci[1..n] is the local vector clock.

• impi[1..n] is a vector initialized to [0, . . . , 0]. Each variable impi[j] contains 0 or 1. Its meaning is the following: impi[j] = 1 means that the last relevant event produced by pj, as known by pi, is a candidate to be an immediate predecessor of the next relevant event produced by pi.

An Algorithm Solving the IPT Problem: Process Behavior The algorithm executed by each process pi is a combined management of both the vectors vci and impi. It is described in Fig. 7.20.

When a process pi produces a relevant event e, it increases its own vector clock (line 1). Then, it considers impi[1..n] to compute the immediate predecessors of e (line 2). According to the definition of each entry of impi, those are the events identified by the pairs (k, vci[k]) such that impi[k] = 1 (which indicates that the last relevant event produced by pk, as known by pi, is still a candidate to be an immediate predecessor of e).

Then, as it has produced a new relevant event e, pi must reset its local array impi[1..n] (lines 3–4). It resets (a) impi[i] to 1 (because e is a candidate to be an immediate predecessor of the relevant events that will appear in its causal future), and (b) each impi[j] (j ≠ i) to 0 (because no event that will be produced in the future of e can have them as immediate predecessors).

When pi sends a message, it attaches to this message all its local control information, namely, vci and impi (line 5). When it receives a message, pi updates its local vector clock as in the basic algorithm, so that the new value of vci is the component-wise maximum of vc and the previous value of vci.


Fig. 7.21 Four possible cases when updating impi[k], while vci[k] = vc[k]

The update of the array impi depends on how, for each entry k, vci[k] compares with vc[k].

• If vci[k] < vc[k], then pj (the sender of the message) has fresher information on pk than pi. Consequently, pi adopts what is known by pj, and sets impi[k] to imp[k] (line 7).

• If vci[k] = vc[k], the last relevant event produced by pk and known by pi is the same as the one known by pj. If this event is still a candidate to be an immediate predecessor of the next event produced by pi from both pi's point of view (impi[k] = 1) and pj's point of view (imp[k] = 1), then impi[k] remains equal to 1; otherwise, impi[k] is set to 0 (line 8). The four possible cases are depicted in Fig. 7.21.

• If vci[k] > vc[k], pi knows more about pk than pj. Hence, it does not modify the value of impi[k] (line 9).
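The following Python sketch is a direct transcription of Fig. 7.20 (the class, the method names, and the message tuples are assumptions made for this example; the numbered comments refer to the lines of the figure, with processes 0-indexed):

class IPTProcess:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.vc = [0] * n                   # vci[1..n]
        self.imp = [0] * n                  # impi[1..n], initialized to zeros

    def relevant_event(self):
        self.vc[self.pid] += 1                                   # line 1
        e_imp = {(k, self.vc[k])                                 # line 2
                 for k in range(self.n) if self.imp[k] == 1}
        self.imp[self.pid] = 1                                   # line 3
        for j in range(self.n):                                  # line 4
            if j != self.pid:
                self.imp[j] = 0
        return (self.pid, self.vc[self.pid]), e_imp

    def send(self, m):
        return (m, list(self.vc), list(self.imp))                # line 5

    def receive(self, msg):
        m, vc, imp = msg
        for k in range(self.n):                                  # line 6
            if self.vc[k] < vc[k]:                               # line 7
                self.vc[k], self.imp[k] = vc[k], imp[k]
            elif self.vc[k] == vc[k]:                            # line 8
                self.imp[k] = min(self.imp[k], imp[k])
            # line 9: vci[k] > vc[k], keep impi[k] unchanged (skip)
        return m

# p0 produces e1 and sends a message to p1; the next relevant event of p1
# then has e1, identified by (0, 1), as its only immediate predecessor.
p0, p1 = IPTProcess(0, 2), IPTProcess(1, 2)
e1_id, e1_imp = p0.relevant_event()         # e1_id = (0, 1), e1_imp = set()
p1.receive(p0.send("m"))
e2_id, e2_imp = p1.relevant_event()         # e2_id = (1, 1)
print(e2_imp)                               # {(0, 1)}

On this run e2.imp = {(0, 1)}, which is exactly the edge that the Hasse diagram of this two-event computation contains.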

7.3 On the Size of Vector Clocks

This section first shows that vector time has an inherent price, namely, the size of vector clocks cannot be less than the number of processes. Then it introduces the notion of a relevant event and presents a general technique that allows us to reduce the number of vector entries that have to be transmitted in each message. Finally, it presents the notions of approximation of the causality relation and approximate vector clocks.

7.3.1 A Lower Bound on the Size of Vector Clocks

A vector clock has one entry per process, i.e., n entries. It follows from the algorithm of Fig. 7.9 and Theorem 5 that vectors of size n are sufficient to capture causality and independence among events. Hence the question: Are vectors of size n necessary to capture causality and concurrency among the events produced by n asynchronous processes communicating by sending and receiving messages through a reliable asynchronous network?

This section shows that the answer to this question is "yes". This means that there are distributed executions in which causality and concurrency cannot be captured by vector clocks of size smaller than n. The proof of this result, which is due to B. Charron-Bost (1991), consists in building such a specific execution and showing a contradiction if vector clocks of size smaller than n are used to capture causality and independence of events.

The Basic Execution Let us consider an execution of n processes in which each process pi, 1 ≤ i ≤ n, executes a sending phase followed by a reception phase.

There is a communication channel between any pair of distinct processes, and the communication pattern is based on a logical ring defined as follows on the process identities: i, i+1, i+2, . . . , n, 1, . . . , i−1, i. The notations i+x and i−y are used to denote the xth successor and the yth predecessor of the identity i on the ring, respectively. More precisely, the behavior of each process pi is as follows.

• A process pi first sends, one after the other, a message to each process of the following "increasing" list of (n−2) processes: pi+1, pi+2, . . . , pn, p1, . . . , pi−2. The important points are the "increasing" order in the list of processes and the fact that pi does not send a message to pi−1.

• Then, pi receives, one after the other, the (n−2) messages sent to it, in the "decreasing" order on the identity of their senders, namely, pi receives first the message from pi−1, then the one from pi−2, . . . , p1, pn, . . . , pi+2. As before, the important points are the "decreasing" order in the list of processes and the fact that pi does not receive a message from pi+1.

This communication pattern is described in Fig. 7.22. (A simpler figure with only three processes is described in Fig. 7.23.)

Considering a process pi, let fsi denote its first send event (which is the sending of a message to pi+1), and lri denote its last receive event (which is the reception of a message from pi+2).

Lemma 1 $\forall\, i \in \{1, \ldots, n\}:\ lr_i \parallel fs_{i+1}$.

Proof As any process first sends messages before receiving any message, there is no causal chain involving more than one message. The lemma follows from this observation and the fact that pi+1 does not send a message to pi.

Lemma 2 $\forall\, i, j:\ 1 \le i \ne j \le n:\ fs_{i+1} \xrightarrow{ev} lr_j$.
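Both lemmas can be checked mechanically on the basic execution. The following Python sketch (an illustration; processes are 0-indexed, every event increments its owner's entry, and the test relies on the standard property that e →ev f if and only if vc(e) ≤ vc(f) component-wise with vc(e) ≠ vc(f)) simulates the sending and reception phases and verifies Lemmas 1 and 2 with full-size vector clocks:

def simulate(n):
    # Basic execution on n >= 3 processes, identities 0..n-1 on the ring.
    vc = [[0] * n for _ in range(n)]   # current vector clock of each process
    fs, lr, msgs = {}, {}, {}

    for i in range(n):                 # sending phase of pi: to p_{i+1},
        for x in range(1, n - 1):      # ..., p_{i+(n-2)}, skipping p_{i-1}
            vc[i][i] += 1
            if x == 1:
                fs[i] = list(vc[i])    # fs_i: first send event of pi
            msgs[(i, (i + x) % n)] = list(vc[i])

    for i in range(n):                 # reception phase of pi: from p_{i-1},
        for y in range(1, n - 1):      # ..., p_{i-(n-2)}, skipping p_{i+1}
            sender = (i - y) % n
            vc[i] = [max(a, b) for a, b in zip(vc[i], msgs[(sender, i)])]
            vc[i][i] += 1
        lr[i] = list(vc[i])            # lr_i: last receive event of pi
    return fs, lr

def precedes(u, v):                    # e ->ev f iff vc(e) < vc(f)
    return u != v and all(a <= b for a, b in zip(u, v))

n = 5
fs, lr = simulate(n)
# Lemma 1: lr_i || fs_{i+1}, i.e., neither event precedes the other.
assert all(not precedes(lr[i], fs[(i + 1) % n]) and
           not precedes(fs[(i + 1) % n], lr[i]) for i in range(n))
# Lemma 2: fs_{i+1} ->ev lr_j for all j != i.
assert all(precedes(fs[(i + 1) % n], lr[j])
           for i in range(n) for j in range(n) if j != i)
print("Lemmas 1 and 2 hold for n =", n)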
