African University of Science and Technology
Time, clocks and ordering of events
Pr. Ousmane THIARE
Outline
Modeling Distributed Executions Happen-Before
Global states and cuts
Global predicate evaluation Problem Definition Example: deadlock detection
Real clocks vs logical clocks Logical clocks Scalar logical clocks Vector logical clocks
Passive monitoring Passive monitoring, v.1 Passive monitoring, v.2 Passive monitoring, v.3
Modeling Distributed Executions
Distributed Execution
Definition (Distributed algorithm)
A distributed algorithm is a collection of distributed automata, one per process
Definition (Distributed execution)
Theexecution of a distributed algorithm is a sequence of eventsex- ecuted by the processes
Partial execution: a finite sequence of events
Infinite execution: a infinite sequence of events Possible events
Modeling Distributed Executions
Histories
Definition (Local History)
The local history of processpi is a (possibly infinite) sequence of events hi =ei0ei1ei2· · ·eimi (canonical enumeration)
Definition (Partial history)
The partial history up to event eik is denoted hki and is given by the prefix of the first k events ofhi
Modeling Distributed Executions
Histories
Local histories do not specify any relative timing between events belonging to different processes.
We need a notion of ordering between events, that could help us in deciding whether:
one event occurs before another
they are actually concurrent
Modeling Distributed Executions Happen-Before
Outline
Modeling Distributed Executions Happen-Before
Global states and cuts
Global predicate evaluation Real clocks vs logical clocks Passive monitoring
Modeling Distributed Executions Happen-Before
Happen-Before
Definition (Happen-before)
We say that an event ehappens-beforean event e’, and writee −→e0, if one of the following three cases is true:
∃pi ∈Q
:e =eir,e0 =eis,r <s
(if e and e’ are executed by the same process, e before e’)
e=send(m)∧ e’=receive(m)
(if e is the send event of a message m and e’ is the corresponding receive event)
∃e00:e −→e00−→e0
(in other words, −→ is transitive)
Modeling Distributed Executions Happen-Before
Happen-Before
Space-Time Diagram of a Distributed Computation
Modeling Distributed Executions Happen-Before
Happen-Before
Meaning of Happen-Before)
Ife −→e0, this means that we can find a series of eventse1e2e3· · ·en, wheree1 =e anden=e0, such that for each pair of consecutive events ei and ei+1:
ei andei+1 are executed on the same process, in this order
ei =send(m) andei+1=receive(m) Notes:
happen-before captures the concept of potential causal ordering
happen-before captures a flow of data between two events.
Modeling Distributed Executions Global states and cuts
Outline
Modeling Distributed Executions Happen-Before
Global states and cuts
Global predicate evaluation Real clocks vs logical clocks Passive monitoring
Modeling Distributed Executions Global states and cuts
Global States
Definition (Local state)
The local stateof process pi after the execution of eventeik is denoted σik
The local state contains all data items accessible by that process
Local state is completely private to the process
σ0i is theinitial stateof process pi Definition (Global state)
The global state of a distributed computation is an n-tuple of local states Σ = (σ1,· · · , σn), one for each process.
Modeling Distributed Executions Global states and cuts
Cut
Definition (Cut)
A cutof a distributed computation is the union of n partial histories, one for each process:
C =hc11∪hc22∪ · · · ∪hcnn
A cut may be described by a tuple (c1,c2,· · · ,cn), identifying the frontier of the cut, i.e. the set of last events, one per process.
Each cut (c1,c2,· · ·,cn) has a corresponding global state (σ1c1∪σ2c2∪ · · · ∪σncn).
Modeling Distributed Executions Global states and cuts
Cut
Modeling Distributed Executions Global states and cuts
Consistent Cut
Consider cuts C’ and C in the previous figure.
Is it possible that cut C correspond to a “real” state in the execution of a distributed algorithm?
Is it possible that cut C’ correspond to a “real” state in the execution of a distributed algorithm?
Modeling Distributed Executions Global states and cuts
Consistent Cut
Consider cuts C’ and C in the previous figure.
Definition (Consistent cut)
A cut C isconsistentis for all events e and e’, (e ∈C)∧(e0 −→e)⇒e0 ∈C Definition (Consistent global state)
A global state is consistent if the corresponding cut is consistent.
In other words
A consistent cut is left-closed with regard to (w.r.t.) the
Modeling Distributed Executions Global states and cuts
Consistent Cut
Consider cuts C’ and C in the previous figure.
In the previous figures, C is consistent and C’ is not.
In the space-time diagram, a cut C is consistent if all the arrows start on the left of the cut and finish on the right of the cut.
Consistent cuts represent the concept of scalar time in distributed computation: it is possible to distinguish between a “before” and an “after”.
Predicates can be evaluated in consistent cuts, because they correspond to potential global states that could have taken place during an execution.
Global predicate evaluation Problem Definition
Outline
Modeling Distributed Executions Global predicate evaluation
Problem Definition
Example: deadlock detection
Real clocks vs logical clocks
Global predicate evaluation Problem Definition
Introduction
Consider cuts C’ and C in the previous figure.
Definition (Global Predicate Evaluation)
The problem of detecting whether the global state of a distributed system satisfies some predicate Φ.
Motivation
Many important distributed problems require to react when when the global state of the system satisfies a given condition.
Monitoring: Notify an administrator in case of failures
Debugging: Verify whether an invariant is respected or not
Deadlock detection: can the computation continue?
Garbage collection: like Java, but distributed
Thus, the ability to construct a global state and evaluate a predicate over it is a core problem in distributed computing.
Global predicate evaluation Problem Definition
Examples
Global predicate evaluation Problem Definition
Why GPE is difficult
A global state obtained through remote observations could be
obsolete: represent an old state of the system.
Solution: build the global state more frequently
inconsistent: capture a global state that could never have been occurred in reality
Solution: build only consistent global states
incomplete: not “capture” every moment of the system Solution: build all possible consistent global states
Global predicate evaluation Problem Definition
Space-Time Diagram of a Distributed Computation
Global predicate evaluation Example: deadlock detection
Outline
Modeling Distributed Executions Global predicate evaluation
Problem Definition
Example: deadlock detection
Real clocks vs logical clocks Passive monitoring
Global predicate evaluation Example: deadlock detection
Example – Deadlock detection on a multi-tier system
Processes in the previous figures use RPCs:
Client sends a request for method execution; blocks.
Server receives request.
Server executes method; may invoke other methods in other servers, acting as a client.
Server sends reply to client
Clients receives reply; unblocks.
Such a system can deadlock, as RPCs are blocking. It is important to be able to detect when a deadlock occurs.
Global predicate evaluation Example: deadlock detection
Runs and consistent runs
Processes in the previous figures use RPCs:
Definition (Run)
Arunof global computation is a total ordering R that includes all the events in the local histories and that is consistent with each of them.
In other words, the events ofpi appear in R in the same order in which they appear inhi.
A run corresponds to the notion that events in a distributed computation actually occur in a total order
A distributed computation may correspond to many runs Definition (Consistent run)
A run R is said to be consistent if for all events e and e’, e −→ e0 implies that e appears before e’ in R.
Global predicate evaluation Example: deadlock detection
Runs and consistent runs
e11e21e31e12e22e32e31e32e33e41e43e51e53e61e63 ?
e11e21e31e12e32e33e31e41e32e43e22e51e53e61e63 ?
Global predicate evaluation Example: deadlock detection
Monitoring Distributed Computations
Assumptions:
There is a single processp0called monitorwhich is responsible for evaluating Φ
We assume that the monitorp0is distinct from the observed processes p1· · ·pn
Events executed on behalf of monitoring do not alter canonical enumeration of “real” events
In general, observed processes send notifications about local events to the monitor, which builds an observation.
Global predicate evaluation Example: deadlock detection
Observations
Definition (Observation)
The sequence of events corresponding to the order in which notification messages arrive at the monitor is called anobservation.
Given the asynchronous nature of our distributed system, any permutation of a run R is a possible observation of it.
Definition (Consistent observation)
An observation is consistent if it corresponds to a consistent run.
Real clocks vs logical clocks
How to obtain consistent observations
The happen-before relation captures the concept of potential causality
In the “day-to-day” life, causality/concurrency are tracked using physical time
We use loosely synchronized watches;
Example: I have withdrawn money from an ATM in Trento at 13.00 on 17th May 2006, so I can prove that I’ve not withdrawn money on the same day at 13.20 in Paris
In distributed computing systems:
the rate of occurrence of events is several magnitudes higher
event execution time is several magnitudes smaller
If physical clocks are not precisely synchronized, the causality/concurrence relations between events may not be accurately captured
Real clocks vs logical clocks Logical clocks
Outline
Modeling Distributed Executions Global predicate evaluation Real clocks vs logical clocks
Logical clocks Scalar logical clocks Vector logical clocks
Real clocks vs logical clocks Logical clocks
Logical clocks
Instead of using physical clocks, which are impossible to synchronize, we use logical clocks.
Every process has a logical clockthat is advanced using a set of rules
Its value is not required to have any particular relationship to any physical clock.
Every event is assigned a timestamp, taken from the logical clock
The causality relation between events can be generally inferred from their timestamps
Real clocks vs logical clocks Logical clocks
Logical clocks
Instead of using physical clocks, which are impossible to synchronize, we use logical clocks.
Definition (Logical clock)
A logical clock LC is a function that maps an event e from a distributed system execution to an element of a time domain T :
LC :H−→T Definition (Clock Consistency)
e −→e0 ⇒LC(e)<LC(e0)
Real clocks vs logical clocks Scalar logical clocks
Outline
Modeling Distributed Executions Global predicate evaluation Real clocks vs logical clocks
Logical clocks Scalar logical clocks Vector logical clocks
Passive monitoring
Real clocks vs logical clocks Scalar logical clocks
Scalar logical clocks (I)
Instead of using physical clocks, which are impossible to synchronize, we use logical clocks.
Definition (Scalar logical clocks)
A Lamport’s scalar logical clock is a monotonically increasing software counter
Each process pi keeps its own logical clock LCi
Thetimestampof event e executed by process pi is denotedLCi(e)
Messages carry the timestamp of their send event
Logical clocks are initialized to 0
Real clocks vs logical clocks Scalar logical clocks
Scalar logical clocks (II)
Update rule
Whenever an event e is executed by process pi, its local logical clock is updated as follows:
LCi =
LCi+ 1 If ei is an internal or send event max{LCi,TS(m)}+ 1 If ei=receive(m)
Real clocks vs logical clocks Scalar logical clocks
Scalar logical clocks (III)
Real clocks vs logical clocks Scalar logical clocks
Properties
Theorem
Scalar logical clocks satisfy Clock consistency, i.e.
e −→e0 ⇒LC(e)<LC(e0) Proof
This immediately follows from the update rules of the clock.
Real clocks vs logical clocks Scalar logical clocks
Scalar logical clocks
Theorem
Scalar logical clocks do not satisfy Strong clock consistency, i.e.
LC(e)<LC(e0)6⇒e −→e0
Real clocks vs logical clocks Vector logical clocks
Outline
Modeling Distributed Executions Global predicate evaluation Real clocks vs logical clocks
Logical clocks Scalar logical clocks Vector logical clocks
Passive monitoring
Real clocks vs logical clocks Vector logical clocks
Causal histories clocks
Definition (Causal History)
The causal history of an event e is the set of events that happen-before e, plus e itself.
θ(e) ={e0 ∈H|e0 −→e} ∪ {e}
Theorem
Causal histories satisfy Strong clock consistency Proof
0 0 0 0 0
Real clocks vs logical clocks Vector logical clocks
Example
Problem:
Causal histories tend to grow too much; they cannot be used as
“timestamps” for messages.
Real clocks vs logical clocks Vector logical clocks
Vector clocks
Causal history projection: θi(e) =θ(e)∩hi =hcii
θ(e) =θ1(e)∪θ2(e)∪ · · · ∪θ(e) =h1c1∪hc22∪ · · · ∪hcnn
In other words,θ(e) is a cut, which happens to be consistent.
Cuts can be represented by their frontiers: θ(e) = (c1,c2,· · ·,cn) Definition
The vector clock associated to event e is a n-dimensional vector VC(e) such that
VC(e)[i] =ciwhereθi(e) =hcii
Real clocks vs logical clocks Vector logical clocks
Vector clocks: Implementation
Each process pi maintains a vector clockVCi, initially all zeroes;
When eventei is executed,VCi assumes the value ofVC(ei);
Ifei =send(m), the timestamp of m isTS(m) =VC(ei);
Update rule
When eventei is executed by process pi,VCi is updated as follows:
Ifei is an internal or send event:
VCi[i] =VCi[i] + 1
Ifei =receive(m):
VCi[j] =max{VCi[j],TS(m)[j]} ∀j 6=i VCi[i] =VC[i] + 1
Real clocks vs logical clocks Vector logical clocks
Example
Real clocks vs logical clocks Vector logical clocks
Properties of Vector clocks (I)
“Less than” relation for Vector clocks
V <V0 ⇐⇒(V 6=V0)∧(∀k : 1≤k ≤n :V[k]≤V0[k]) Strong Clock Condition
e −→e0 ⇐⇒VC(e)<VC(e0)⇐⇒θ(e)⊂θ(e0) Simple Strong Clock Condition
ei −→ej ⇐⇒VC(ei)[i]≤VC(ej)[i]
Real clocks vs logical clocks Vector logical clocks
Properties of Vector clocks (II)
Definition (Concurrent events)
Eventsei andej are concurrent (i.e. ei||ej) if and only if:
(VC(ei)[i]>VC(ej)[i])∧(VC(ej)[j]>VC(ei)[j])
In other words, eventei does not happen-before ej, andej does not happen beforeei.
Real clocks vs logical clocks Vector logical clocks
Concurrent events
(VC(ei)[i]>VC(ej)[i])∧(VC(ej)[j]>VC(ei)[j])
Real clocks vs logical clocks Vector logical clocks
Properties of vector clocks
Definition (Pairwise Inconsistent)
Eventsei andej with i 6=j are pairwise inconsistent if and only if (VC(ei)[i]<VC(ej)[i])∨(VC(ej)[j]<VC(ei)[j]) In other words, two events are pairwise inconsistent if they cannot belong to the frontier of the same consistent cut. The formula characterize the fact that the cut include a receive event without including a send event.
Real clocks vs logical clocks Vector logical clocks
Properties of vector clocks
Pairwise Inconsistent
(VC(ei)[i]<VC(ej)[i])∨(VC(ej)[j]<VC(ei)[j])
Real clocks vs logical clocks Vector logical clocks
Properties of vector clocks
Definition (Consistent Cut)
A cut defined by (c1,· · ·,cn) is consistent if and only if: ∀i,j ∈[1· · ·n]:
VC(eici)[i]≥VC(ejcj)[i])
In other words, a cut is consistent if its frontier does not contain any pairwise inconsistent pair of events.
Passive monitoring
A Passive Approach to GPE
How it works
At each (relevant) event, each process sends anotification to the monitor describing it local state
The monitor collects notifications to reconstruct an observation of the global state.
An observation taken in this way can correspond to:
A consistent run
A run which is not consistent
No run at all
Can you find example of the three cases?
Can you explain why this happen?
Passive monitoring
Observations which are not runs
Problem
Observations may not correspond to a run because each notification sent by the process to the monitor may be delayed arbitrarily and thus arrive in any possible order
Solution
To adopt communication channels between the processes and the mon- itor that guarantee that messages are never re-ordered
Passive monitoring
Message ordering
Definition (FIFO Order)
Two messages sent by pi to pj must be delivered in the same order in which they were sent:
∀m,m0:sendi(m)−→sendi(m0)⇒deliverj(m)−→deliverj(m0) What is “deliver”?
Passive monitoring
Delivery Rules
How to order messages?
To be ordered, each message m carries a timestamp TS(m) containing “ordering” information
The act of providing the process with a message in the desired order is called delivery; the event deliver(m) is thus distinct from receive (m).
The rule describing which messages can be delivered among those received is called delivery rule
Passive monitoring
FIFO Order - Implementation
Each process maintains a local sequence number incremented at each notification sent
The timestamp of a notification message corresponds to the local sequence number of the sender at the time of sending
Definition (FIFO Delivery Rule)
If the last notification delivered byp0 frompj has timestamp s,p0 may deliver “any” message m received frompj with TS(m)=s+1.
Passive monitoring
Observations which are not consistent runs
Problem
If we use FIFO order between all processes and p0, all the observa- tions taken by p0 will be runs; but there is no guarantee that they are consistent runs.
Solution
To adopt communication channels that guarantee that notification ar- rives in an order that respects the happen-before relation..
Passive monitoring
Message ordering
Definition (Causal Order)
Two messages sent by pi and pj to pk must be delivered following the happen-before relation:
∀m,m0 :sendi(m)−→sendj(m0)⇒deliverk(m)−→deliverk(m0) Question
FIFO order among all channels...
Is it sufficient to obtain Causal delivery?
Passive monitoring
Example
Passive monitoring
Causal delivery and consistent observations
Theorem
Ifp0 uses a delivery rule satisfying Causal Order, then all of its obser- vations will be consistent.
Proof
Definition of Causal Order ? definition of a consistent observation Next, we will show three methods to implement a causal delivery rule
Passive monitoring Passive monitoring, v.1
Outline
Modeling Distributed Executions Global predicate evaluation Real clocks vs logical clocks Passive monitoring
Passive monitoring, v.1 Passive monitoring, v.2
Passive monitoring Passive monitoring, v.1
Passive monitoring with real-time
Initial assumptions
All processes have access to a real-time clock RC
Let RC(e) be the real-time at which e is executed
All messages are delivered within a timeδ
The timestamp of message m sent by an event e=send(m) is TS(m)=RC(e).
Definition (DR1: Real-time delivery rule)
At time t, delivery all received notification messages m in increasing timestamp order.
Theorem
Observation O constructed byp0 using DR1 is guaranteed to be con- sistent
Passive monitoring Passive monitoring, v.1
Stability of messages
Definition (Stability)
A notification message m received by p0 is stable at p0 if no message m’ with TS(m’)¡TS(m) can be received in the future byp0
Definition (DR1: Real-time delivery rule)
At time t, delivery all received notification messages m such that TS(m)≤t−δ in increasing timestamp order.
Passive monitoring Passive monitoring, v.1
Proof
?Safety: Clock Condition for RC
e −→e0 ⇒RC(e)<RC(e0)
Note that RC(e) < RC(e0) 6⇒ e −→ e0, but this rule is sufficient to obtain consistent observations, as two notificationse −→e0 are never delivered in the incorrect order.
Liveness: Stability
At time t, any message sent by timet−δ is stable.
Note that real-time clocks do not support stability; it is the maximum delay of messages that enables it.
Passive monitoring Passive monitoring, v.2
Outline
Modeling Distributed Executions Global predicate evaluation Real clocks vs logical clocks Passive monitoring
Passive monitoring, v.1 Passive monitoring, v.2
Passive monitoring Passive monitoring, v.2
Proof
Passive monitoring with logical clocks Initial assumptions
All processes have access to a logical clock LC; let LC(e) be the real-time at which e is executed
The timestamp of message m sent through an event e=send(m) is TS(m)=LC(e)
Definition (DR2: Deliver Rule for LC)
Deliver all received messages that are stable atp0 in increasing times- tamp order
Passive monitoring Passive monitoring, v.2
Passive monitoring with logical clocks
?Safety: Clock Condition for LC
e −→e0 ⇒LC(e)<LC(e0)
Note that LC(e) < LC(e0) 6⇒ e −→ e0, but this rule is sufficient to obtain consistent observations, as two notificationse −→e0 are never delivered in the incorrect order.
Passive monitoring Passive monitoring, v.2
Passive monitoring with logical clocks
Liveness: Stability
We need a way to reproduce the concept of δ in an asynchronous system, otherwise no notification message will be ever delivered.
Solution
Each process communicates with p0 using FIFO delivery
Whenp0 receives a message from pi describing an event e with timestamp TS(e), it is sure that it will never receive a message from pi describing an event e’ withTS(e0)≤TS(e)
Stability of message m at p0 can be guaranteed whenp0 has received at least one message from all other processes with a timestamp greater or equal than TS(m)
Passive monitoring Passive monitoring, v.2
Problems of Logical Clocks
They add unnecessary delays to observations
They require a constant flux of messages/events from all processes
Passive monitoring Passive monitoring, v.2
Passive Monitoring with Vector Clocks
Variables maintained atp0
Mthe set of notification messages received but not yet delivered byp0
an array D, initialized to 0’s, such that D[k] contains TS(m)[k]
where m is the last notification message delivered by p0 from process pk.
When a notification message is deliverable byp0?
A notification messagem∈ Mfrom processpj is deliverable as soon asp0 can verify that there is no other notification message m’
(neither inMnor in the channels) such that send(m0)−→send(m).
Passive monitoring Passive monitoring, v.3
Outline
Modeling Distributed Executions Global predicate evaluation Real clocks vs logical clocks Passive monitoring
Passive monitoring, v.1 Passive monitoring, v.2
Passive monitoring Passive monitoring, v.3
Implementing Causal Delivery with Vector Clocks
m∈ M: a notification message sent by pj top0
m’: the last notification message delivered from process pk,k 6=j Definition (Weak Gap Detection)
If TS(m0)[k] < TS(m)[k] for some k 6= j, then there exists event sendk(m00) such that
sendk(m0)−→sendk(m00)−→sendj(m)
Passive monitoring Passive monitoring, v.3
Implementing Causal Delivery with Vector Clocks
Two conditions to be verified:
There is no earlier message frompj that has not been delivered yet.
Causal Delivery Rule, Part 1: D[j]==TS(m)[j]-1
∀k 6=j, let m’ be the last message from pk delivered byp0
(D[k]=TS(m’)[k]); we must be sure that no message m” frompk exists such that: sendk(m0)−→sendk(m00)−→sendj(m)
Causal Delivery Rule, Part 2: ∀k 6=j :D[k]≥TS(m)[k] It follows from Weak Gap Detection
Passive monitoring Passive monitoring, v.3
Example
Passive monitoring Passive monitoring, v.3
Final Comments
We have seen how to implement Causal Delivery “many-to-one”
The same rules apply if we implement a mechanism for implementing “one-to-many” (reliable broadcast)