Time, clocks and ordering of events

(1)

African University of Science and Technology

Time, clocks and ordering of events

Pr. Ousmane THIARE

(2)

Outline

Modeling Distributed Executions
  Happen-Before
  Global states and cuts

Global predicate evaluation
  Problem Definition
  Example: deadlock detection

Real clocks vs logical clocks
  Logical clocks
  Scalar logical clocks
  Vector logical clocks

Passive monitoring
  Passive monitoring, v.1
  Passive monitoring, v.2
  Passive monitoring, v.3

(3)

Modeling Distributed Executions

Distributed Execution

Definition (Distributed algorithm)

A distributed algorithm is a collection of distributed automata, one per process

Definition (Distributed execution)

The execution of a distributed algorithm is a sequence of events executed by the processes

Partial execution: a finite sequence of events

Infinite execution: an infinite sequence of events

Possible events: send(m), receive(m), and internal events

(4)

Histories

Definition (Local History)

The local history of process $p_i$ is a (possibly infinite) sequence of events $h_i = e_i^0 e_i^1 e_i^2 \cdots e_i^{m_i}$ (canonical enumeration)

Definition (Partial history)

The partial history up to event $e_i^k$ is denoted $h_i^k$ and is given by the prefix of the first $k$ events of $h_i$

(5)

Histories

Local histories do not specify any relative timing between events belonging to different processes.

We need a notion of ordering between events, that could help us in deciding whether:

one event occurs before another

they are actually concurrent

(6)


(7)

Happen-Before

Definition (Happen-before)

We say that an event e happens-before an event e', and write $e \to e'$, if one of the following three cases is true:

$\exists p_i \in \Pi : e = e_i^r,\ e' = e_i^s,\ r < s$

(if e and e' are executed by the same process, e before e')

$e = \mathrm{send}(m) \land e' = \mathrm{receive}(m)$

(if e is the send event of a message m and e' is the corresponding receive event)

$\exists e'' : e \to e'' \to e'$

(in other words, $\to$ is transitive)
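The three cases of the definition can be turned into a small computation. The following is a minimal Python sketch, not part of the original slides: the function name `happens_before`, the event labels, and the representation of an execution as per-process lists plus (send, receive) pairs are assumptions chosen for illustration.

```python
from itertools import product

def happens_before(histories, messages):
    """Return the set of pairs (e, f) such that e happens-before f.

    histories: one list of event labels per process, in local order
    messages:  pairs (send_event, receive_event), one per message
    """
    hb = set()
    # Case 1: same process, e earlier than f in the local history
    for h in histories:
        for r in range(len(h)):
            for s in range(r + 1, len(h)):
                hb.add((h[r], h[s]))
    # Case 2: send(m) happens-before the matching receive(m)
    hb.update(messages)
    # Case 3: transitive closure (Warshall's algorithm, k is the outer loop)
    events = [e for h in histories for e in h]
    for k, a, b in product(events, repeat=3):
        if (a, k) in hb and (k, b) in hb:
            hb.add((a, b))
    return hb

# Hypothetical execution with three processes:
# a1 is a send received at b1, b2 is a send received at c1.
histories = [["a1", "a2"], ["b1", "b2"], ["c1"]]
messages = [("a1", "b1"), ("b2", "c1")]
hb = happens_before(histories, messages)
```

In this example `("a1", "c1")` is in `hb` through the chain a1, b1, b2, c1, while `a2` and `c1` are related in neither direction, which is exactly the situation the slides call concurrent.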

(8)

Happen-Before

Space-Time Diagram of a Distributed Computation

(9)

Happen-Before

Meaning of Happen-Before

If $e \to e'$, this means that we can find a series of events $e^1 e^2 e^3 \cdots e^n$, where $e^1 = e$ and $e^n = e'$, such that for each pair of consecutive events $e^i$ and $e^{i+1}$ one of the following holds:

$e^i$ and $e^{i+1}$ are executed on the same process, in this order

$e^i = \mathrm{send}(m)$ and $e^{i+1} = \mathrm{receive}(m)$

Notes:

happen-before captures the concept of potential causal ordering

happen-before captures a flow of data between two events.

(10)

Modeling Distributed Executions Global states and cuts


(11)

Global States

Definition (Local state)

The local state of process $p_i$ after the execution of event $e_i^k$ is denoted $\sigma_i^k$

The local state contains all data items accessible by that process

Local state is completely private to the process

$\sigma_i^0$ is the initial state of process $p_i$

Definition (Global state)

The global state of a distributed computation is an n-tuple of local states $\Sigma = (\sigma_1, \cdots, \sigma_n)$, one for each process.

(12)

Cut

Definition (Cut)

A cut of a distributed computation is the union of n partial histories, one for each process:

$C = h_1^{c_1} \cup h_2^{c_2} \cup \cdots \cup h_n^{c_n}$

A cut may be described by a tuple $(c_1, c_2, \cdots, c_n)$, identifying the frontier of the cut, i.e. the set of last events, one per process.

Each cut $(c_1, c_2, \cdots, c_n)$ has a corresponding global state $(\sigma_1^{c_1}, \sigma_2^{c_2}, \cdots, \sigma_n^{c_n})$.

(13)

Cut

(14)

Consistent Cut

Consider cuts C’ and C in the previous figure.

Is it possible that cut C corresponds to a “real” state in the execution of a distributed algorithm?

Is it possible that cut C’ corresponds to a “real” state in the execution of a distributed algorithm?

(15)

Consistent Cut

Definition (Consistent cut)

A cut C is consistent if for all events e and e', $(e \in C) \land (e' \to e) \Rightarrow e' \in C$

Definition (Consistent global state)

A global state is consistent if the corresponding cut is consistent.

In other words

A consistent cut is left-closed with regard to (w.r.t.) the happen-before relation.

(16)

Consistent Cut

In the previous figures, C is consistent and C’ is not.

In the space-time diagram, a cut C is consistent if no arrow starts on the right of the cut and finishes on its left; every arrow that crosses the cut goes from left to right.

Consistent cuts represent the concept of scalar time in distributed computation: it is possible to distinguish between a “before” and an “after”.

Predicates can be evaluated in consistent cuts, because they correspond to potential global states that could have taken place during an execution.
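The consistency condition can be checked mechanically. The sketch below is an illustration under stated assumptions, not the slides' own code: a cut is given as a tuple of prefix lengths, and `is_consistent_cut`, the event labels, and the message list are hypothetical names. Since each partial history is a prefix, events of the same process are already left-closed, so it is enough to check that every receive event in the cut has its send event in the cut.

```python
def is_consistent_cut(histories, messages, cut):
    """Check that a cut is left-closed w.r.t. happen-before.

    histories: one list of event labels per process, in local order
    messages:  pairs (send_event, receive_event)
    cut:       cut[i] is the number of events of process i inside the cut
    """
    included = {e for h, c in zip(histories, cut) for e in h[:c]}
    # A receive inside the cut whose send is outside the cut corresponds
    # to an arrow crossing the cut from right to left.
    return all(send in included
               for send, recv in messages
               if recv in included)

# Hypothetical execution: a2 is a send, b1 is the matching receive.
histories = [["a1", "a2"], ["b1", "b2"]]
messages = [("a2", "b1")]
```

With these inputs the cut `(2, 1)` is consistent, while `(1, 1)` is not, because it contains the receive `b1` without the send `a2`.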

(17)

Global predicate evaluation Problem Definition


(18)

Introduction

Definition (Global Predicate Evaluation)

The problem of detecting whether the global state of a distributed system satisfies some predicate Φ.

Motivation

Many important distributed problems require reacting when the global state of the system satisfies a given condition.

Monitoring: Notify an administrator in case of failures

Debugging: Verify whether an invariant is respected or not

Deadlock detection: can the computation continue?

Garbage collection: like Java, but distributed

Thus, the ability to construct a global state and evaluate a predicate over it is a core problem in distributed computing.

(19)

Examples

(20)

Why GPE is difficult

A global state obtained through remote observations could be

obsolete: represent an old state of the system.

Solution: build the global state more frequently

inconsistent: capture a global state that could never have occurred in reality

Solution: build only consistent global states

incomplete: not “capture” every moment of the system

Solution: build all possible consistent global states

(21)

Space-Time Diagram of a Distributed Computation

(22)

Global predicate evaluation Example: deadlock detection


(23)

Example – Deadlock detection on a multi-tier system

Processes in the previous figures use RPCs:

Client sends a request for method execution; blocks.

Server receives request.

Server executes method; may invoke other methods in other servers, acting as a client.

Server sends reply to client

Client receives reply; unblocks.

Such a system can deadlock, as RPCs are blocking. It is important to be able to detect when a deadlock occurs.

(24)

Runs and consistent runs


Definition (Run)

A run of a global computation is a total ordering R that includes all the events in the local histories and that is consistent with each of them.

In other words, the events of $p_i$ appear in R in the same order in which they appear in $h_i$.

A run corresponds to the notion that events in a distributed computation actually occur in a total order

A distributed computation may correspond to many runs

Definition (Consistent run)

A run R is said to be consistent if for all events e and e', $e \to e'$ implies that e appears before e' in R.
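The two definitions can be checked on a candidate ordering. This is a small Python sketch under assumptions of my own: the names `is_run` and `is_consistent_run` and the event labels are hypothetical. Because a run already respects every local history, consistency reduces to checking that each send appears before its matching receive; the transitive case then follows.

```python
def is_run(histories, order):
    """order is a run if it contains every event exactly once and
    respects the order of each local history."""
    all_events = [e for h in histories for e in h]
    if sorted(order) != sorted(all_events):
        return False
    pos = {e: k for k, e in enumerate(order)}
    return all(pos[h[k]] < pos[h[k + 1]]
               for h in histories
               for k in range(len(h) - 1))

def is_consistent_run(histories, messages, order):
    """A run is consistent if it also places each send before its receive."""
    if not is_run(histories, order):
        return False
    pos = {e: k for k, e in enumerate(order)}
    return all(pos[send] < pos[recv] for send, recv in messages)

# Hypothetical execution: a2 is a send, b1 is the matching receive.
histories = [["a1", "a2"], ["b1", "b2"]]
messages = [("a2", "b1")]
```

Here `["a1", "a2", "b1", "b2"]` is a consistent run, `["a1", "b1", "a2", "b2"]` is a run that is not consistent, and `["a2", "a1", "b1", "b2"]` is not a run at all.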

(25)

Runs and consistent runs

e₁¹e₂¹e₃¹e₁²e₂²e₃²e₃¹e₃²e₃³e₄¹e₄³e₅¹e₅³e₆¹e₆³ ?

e₁¹e₂¹e₃¹e₁²e₃²e₃³e₃¹e₄¹e₃²e₄³e₂²e₅¹e₅³e₆¹e₆³ ?

(26)

Monitoring Distributed Computations

Assumptions:

There is a single process $p_0$ called monitor which is responsible for evaluating Φ

We assume that the monitor $p_0$ is distinct from the observed processes $p_1 \cdots p_n$

Events executed on behalf of monitoring do not alter canonical enumeration of “real” events

In general, observed processes send notifications about local events to the monitor, which builds an observation.

(27)

Observations

Definition (Observation)

The sequence of events corresponding to the order in which notification messages arrive at the monitor is called an observation.

Given the asynchronous nature of our distributed system, any permutation of a run R is a possible observation of it.

Definition (Consistent observation)

An observation is consistent if it corresponds to a consistent run.

(28)

Real clocks vs logical clocks

How to obtain consistent observations

The happen-before relation captures the concept of potential causality

In the “day-to-day” life, causality/concurrency are tracked using physical time

We use loosely synchronized watches;

Example: I have withdrawn money from an ATM in Trento at 13.00 on 17th May 2006, so I can prove that I’ve not withdrawn money on the same day at 13.20 in Paris

In distributed computing systems:

the rate of occurrence of events is several orders of magnitude higher

event execution time is several orders of magnitude smaller

If physical clocks are not precisely synchronized, the causality/concurrency relations between events may not be accurately captured

(29)

Real clocks vs logical clocks Logical clocks


(30)

Logical clocks

Instead of using physical clocks, which cannot be perfectly synchronized, we use logical clocks.

Every process has a logical clock that is advanced using a set of rules

Its value is not required to have any particular relationship to any physical clock.

Every event is assigned a timestamp, taken from the logical clock

The causality relation between events can be generally inferred from their timestamps

(31)

Logical clocks

Definition (Logical clock)

A logical clock LC is a function that maps an event e from a distributed system execution to an element of a time domain T:

$LC : H \to T$

Definition (Clock Consistency)

$e \to e' \Rightarrow LC(e) < LC(e')$

(32)

Real clocks vs logical clocks Scalar logical clocks


(33)

Scalar logical clocks (I)

Definition (Scalar logical clocks)

Lamport’s scalar logical clock is a monotonically increasing software counter

Each process $p_i$ keeps its own logical clock $LC_i$

The timestamp of event e executed by process $p_i$ is denoted $LC_i(e)$

Messages carry the timestamp of their send event

Logical clocks are initialized to 0

(34)

Scalar logical clocks (II)

Update rule

Whenever an event $e_i$ is executed by process $p_i$, its local logical clock is updated as follows:

$LC_i = \begin{cases} LC_i + 1 & \text{if } e_i \text{ is an internal or send event} \\ \max\{LC_i, TS(m)\} + 1 & \text{if } e_i = \mathrm{receive}(m) \end{cases}$
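The update rule translates almost literally into code. The following is a minimal Python sketch; the class name `LamportClock` and its method names are assumptions made for illustration and do not come from the slides.

```python
class LamportClock:
    """Scalar logical clock of a single process, initialized to 0."""

    def __init__(self):
        self.value = 0

    def tick(self):
        """Internal or send event: LC = LC + 1.
        For a send event, the returned value is the timestamp TS(m)."""
        self.value += 1
        return self.value

    def receive(self, ts):
        """Receive event for a message with timestamp ts:
        LC = max(LC, TS(m)) + 1."""
        self.value = max(self.value, ts) + 1
        return self.value

p, q = LamportClock(), LamportClock()
p.tick()           # internal event on p: LC_p = 1
ts = p.tick()      # send event on p: LC_p = 2, so TS(m) = 2
q.receive(ts)      # receive event on q: max(0, 2) + 1 = 3
```

A third process that executes a single internal event gets timestamp 1, which is smaller than 3 although the two events are unrelated; this is the kind of case in which timestamps alone cannot establish causality.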

(35)

Scalar logical clocks (III)

(36)

Properties

Theorem

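Scalar logical clocks satisfy Clock consistency, i.e.

$e \to e' \Rightarrow LC(e) < LC(e')$

Proof

This immediately follows from the update rules of the clock: every event increases the local clock by at least 1, a receive event sets the clock above the timestamp of the matching send, and the general case follows by transitivity of <.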

(37)

Scalar logical clocks

Theorem

Scalar logical clocks do not satisfy Strong clock consistency, i.e.

$LC(e) < LC(e') \not\Rightarrow e \to e'$

(38)

Real clocks vs logical clocks Vector logical clocks


(39)

Causal histories clocks

Definition (Causal History)

The causal history of an event e is the set of events that happen-before e, plus e itself.

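$\theta(e) = \{e' \in H \mid e' \to e\} \cup \{e\}$

Theorem

Causal histories satisfy Strong clock consistency, i.e. $e \to e' \iff \theta(e) \subset \theta(e')$

Proof

If $e \to e'$, then by transitivity every event of $\theta(e)$ belongs to $\theta(e')$, while $e' \notin \theta(e)$, so the inclusion is strict. Conversely, if $\theta(e) \subset \theta(e')$, then $e \in \theta(e')$ and $e \ne e'$, hence $e \to e'$.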

(40)

Example

Problem:

Causal histories tend to grow too much; they cannot be used as “timestamps” for messages.

(41)

Vector clocks

Causal history projection: $\theta_i(e) = \theta(e) \cap h_i = h_i^{c_i}$

$\theta(e) = \theta_1(e) \cup \theta_2(e) \cup \cdots \cup \theta_n(e) = h_1^{c_1} \cup h_2^{c_2} \cup \cdots \cup h_n^{c_n}$

In other words, $\theta(e)$ is a cut, which happens to be consistent.

Cuts can be represented by their frontiers: $\theta(e) = (c_1, c_2, \cdots, c_n)$

Definition

The vector clock associated to event e is an n-dimensional vector VC(e) such that

$VC(e)[i] = c_i$ where $\theta_i(e) = h_i^{c_i}$

(42)

Vector clocks: Implementation

Each process $p_i$ maintains a vector clock $VC_i$, initially all zeroes;

When event $e_i$ is executed, $VC_i$ assumes the value of $VC(e_i)$;

If $e_i = \mathrm{send}(m)$, the timestamp of m is $TS(m) = VC(e_i)$;

Update rule

When event $e_i$ is executed by process $p_i$, $VC_i$ is updated as follows:

If $e_i$ is an internal or send event:

$VC_i[i] = VC_i[i] + 1$

If $e_i = \mathrm{receive}(m)$:

$VC_i[j] = \max\{VC_i[j], TS(m)[j]\} \quad \forall j \ne i$

$VC_i[i] = VC_i[i] + 1$
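The update rule can be sketched in a few lines of Python. The class name `VectorClock`, its methods, and the use of 0-based process indices are assumptions made for this illustration; the slides number processes from 1.

```python
class VectorClock:
    """Vector clock of process i in a system of n processes (0-based index)."""

    def __init__(self, i, n):
        self.i = i
        self.v = [0] * n   # initially all zeroes

    def tick(self):
        """Internal or send event: VC_i[i] = VC_i[i] + 1.
        For a send event, the returned tuple is the timestamp TS(m)."""
        self.v[self.i] += 1
        return tuple(self.v)

    def receive(self, ts):
        """Receive event: component-wise maximum for every j != i,
        then increment the local component."""
        for j in range(len(self.v)):
            if j != self.i:
                self.v[j] = max(self.v[j], ts[j])
        self.v[self.i] += 1
        return tuple(self.v)

p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
a = p0.tick()        # internal event on p0: (1, 0, 0)
b = p0.tick()        # send event on p0:     (2, 0, 0)
c = p1.tick()        # internal event on p1: (0, 1, 0)
d = p1.receive(b)    # receive event on p1:  (2, 2, 0)
```

The timestamp of `d` records that two events of `p0` and two events of `p1` (including `d` itself) belong to its causal history, which matches the frontier interpretation of vector clocks.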

(43)

Example

(44)

Properties of Vector clocks (I)

“Less than” relation for Vector clocks

$V < V' \iff (V \ne V') \land (\forall k : 1 \le k \le n : V[k] \le V'[k])$

Strong Clock Condition

$e \to e' \iff VC(e) < VC(e') \iff \theta(e) \subset \theta(e')$

Simple Strong Clock Condition

$e_i \to e_j \iff VC(e_i)[i] \le VC(e_j)[i]$ (for $e_i \ne e_j$)

(45)

Properties of Vector clocks (II)

Definition (Concurrent events)

Events $e_i$ and $e_j$ are concurrent (i.e. $e_i \parallel e_j$) if and only if:

$(VC(e_i)[i] > VC(e_j)[i]) \land (VC(e_j)[j] > VC(e_i)[j])$

In other words, event $e_i$ does not happen-before $e_j$, and $e_j$ does not happen-before $e_i$.
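These tests on vector timestamps are simple to express in code. The helpers below are a hedged sketch: the names `vc_less`, `simple_happens_before`, and `concurrent` are hypothetical, timestamps are plain tuples, process indices are 0-based, and the two events passed to `simple_happens_before` are assumed to be distinct.

```python
def vc_less(v, w):
    """V < W: V differs from W and every component of V is <= that of W."""
    return v != w and all(a <= b for a, b in zip(v, w))

def simple_happens_before(vc_e, i, vc_f):
    """Simple strong clock condition for distinct events, where the first
    event was executed by process i."""
    return vc_e[i] <= vc_f[i]

def concurrent(vc_e, i, vc_f, j):
    """The event of process i and the event of process j are concurrent
    if each knows more about its own process than the other does."""
    return vc_e[i] > vc_f[i] and vc_f[j] > vc_e[j]

# Hypothetical timestamps: b is a send on process 0, c is an internal event
# on process 1, d is the receive on process 1 of the message sent at b.
b, c, d = (2, 0, 0), (0, 1, 0), (2, 2, 0)
```

With these values `vc_less(b, d)` holds, so `b` happens-before `d`, while `b` and `c` are concurrent: neither vector is less than the other and `concurrent(b, 0, c, 1)` is true.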

(46)

Concurrent events

$(VC(e_i)[i] > VC(e_j)[i]) \land (VC(e_j)[j] > VC(e_i)[j])$

(47)

Properties of vector clocks

Definition (Pairwise Inconsistent)

Events $e_i$ and $e_j$ with $i \ne j$ are pairwise inconsistent if and only if

$(VC(e_i)[i] < VC(e_j)[i]) \lor (VC(e_j)[j] < VC(e_i)[j])$

In other words, two events are pairwise inconsistent if they cannot belong to the frontier of the same consistent cut. The formula characterizes the fact that the cut includes a receive event without including the corresponding send event.

(48)

Properties of vector clocks

Pairwise Inconsistent

$(VC(e_i)[i] < VC(e_j)[i]) \lor (VC(e_j)[j] < VC(e_i)[j])$

(49)

Properties of vector clocks

Definition (Consistent Cut)

A cut defined by $(c_1, \cdots, c_n)$ is consistent if and only if:

$\forall i, j \in [1 \cdots n] : VC(e_i^{c_i})[i] \ge VC(e_j^{c_j})[i]$

In other words, a cut is consistent if its frontier does not contain any pairwise inconsistent pair of events.
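The frontier condition gives a direct test for cut consistency. The sketch below assumes that the frontier is passed as a list with one vector timestamp per process (the all-zero vector for a process with no event in the cut), that indices are 0-based, and that the function name `is_consistent_frontier` is hypothetical.

```python
def is_consistent_frontier(frontier):
    """frontier[i] is the vector clock of the last event of process i
    included in the cut. The cut is consistent if no frontier event knows
    about more events of process i than process i itself has in the cut."""
    n = len(frontier)
    return all(frontier[i][i] >= frontier[j][i]
               for i in range(n)
               for j in range(n))

# Hypothetical timestamps: process 0 executed events stamped (1, 0, 0) and
# (2, 0, 0); the second one is a send received by process 1 at (2, 2, 0).
inconsistent = [(1, 0, 0), (2, 2, 0), (0, 0, 0)]
consistent = [(2, 0, 0), (2, 2, 0), (0, 0, 0)]
```

The first frontier fails the test because process 1 has already received a message that process 0, according to the cut, has not sent yet; the second frontier includes the send and passes.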

(50)

Passive monitoring

A Passive Approach to GPE

How it works

At each (relevant) event, each process sends a notification to the monitor describing its local state

The monitor collects notifications to reconstruct an observation of the global state.

An observation taken in this way can correspond to:

A consistent run

A run which is not consistent

No run at all

Can you find examples of the three cases?

Can you explain why this happens?

(51)


Observations which are not runs

Problem

Observations may not correspond to a run because each notification sent by the process to the monitor may be delayed arbitrarily and thus arrive in any possible order

Solution

To adopt communication channels between the processes and the monitor that guarantee that messages are never re-ordered

(52)


Message ordering

Definition (FIFO Order)

Two messages sent by $p_i$ to $p_j$ must be delivered in the same order in which they were sent:

$\forall m, m' : \mathrm{send}_i(m) \to \mathrm{send}_i(m') \Rightarrow \mathrm{deliver}_j(m) \to \mathrm{deliver}_j(m')$

What is “deliver”?

(53)


Delivery Rules

How to order messages?

To be ordered, each message m carries a timestamp TS(m) containing “ordering” information

The act of providing the process with a message in the desired order is called delivery; the event deliver(m) is thus distinct from receive(m).

The rule describing which messages can be delivered among those received is called delivery rule

(54)


FIFO Order - Implementation

Each process maintains a local sequence number incremented at each notification sent

The timestamp of a notification message corresponds to the local sequence number of the sender at the time of sending

Definition (FIFO Delivery Rule)

If the last notification delivered by $p_0$ from $p_j$ has timestamp s, $p_0$ may deliver “any” message m received from $p_j$ with TS(m) = s + 1.
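The FIFO delivery rule can be sketched as a small buffer at the monitor. This is an illustration under assumptions, not the slides' implementation: the class name `FifoMonitor`, its fields, and the convention that the first notification of each process carries sequence number 1 (so that "last delivered" starts at 0) are all choices made here.

```python
class FifoMonitor:
    """Monitor p0 that delivers notifications of each sender in sending order."""

    def __init__(self):
        self.last = {}         # sender -> timestamp of last delivered notification
        self.pending = {}      # sender -> {timestamp: payload}, received only
        self.observation = []  # payloads in delivery order

    def receive(self, sender, ts, payload):
        """receive(m): buffer m, then deliver every notification of this
        sender whose timestamp is exactly last + 1."""
        self.pending.setdefault(sender, {})[ts] = payload
        nxt = self.last.get(sender, 0) + 1
        while nxt in self.pending[sender]:
            self.observation.append(self.pending[sender].pop(nxt))
            self.last[sender] = nxt
            nxt += 1

monitor = FifoMonitor()
monitor.receive("p1", 2, "e2")   # arrives early: buffered, not delivered
monitor.receive("p1", 1, "e1")   # fills the gap: e1 then e2 are delivered
```

The example shows the distinction between receive and deliver: the notification with sequence number 2 is received first but is delivered only after number 1.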

(55)


Observations which are not consistent runs

Problem

If we use FIFO order between all processes and $p_0$, all the observations taken by $p_0$ will be runs; but there is no guarantee that they are consistent runs.

Solution

To adopt communication channels that guarantee that notifications arrive in an order that respects the happen-before relation.

(56)


Message ordering

Definition (Causal Order)

Two messages sent by $p_i$ and $p_j$ to $p_k$ must be delivered following the happen-before relation:

$\forall m, m' : \mathrm{send}_i(m) \to \mathrm{send}_j(m') \Rightarrow \mathrm{deliver}_k(m) \to \mathrm{deliver}_k(m')$

Question

FIFO order among all channels...

Is it sufficient to obtain Causal delivery?

(57)


Example

(58)


Causal delivery and consistent observations

Theorem

If $p_0$ uses a delivery rule satisfying Causal Order, then all of its observations will be consistent.

Proof

The definition of Causal Order directly implies the definition of a consistent observation.

Next, we will show three methods to implement a causal delivery rule

(59)

Passive monitoring Passive monitoring, v.1


(60)

Passive monitoring with real-time

Initial assumptions

All processes have access to a real-time clock RC

Let RC(e) be the real-time at which e is executed

All messages are delivered within a time δ

The timestamp of message m sent by an event e=send(m) is TS(m)=RC(e).

Definition (DR1: Real-time delivery rule)

At time t, deliver all received notification messages m in increasing timestamp order.

Theorem

Observation O constructed by $p_0$ using DR1 is guaranteed to be consistent

(61)

Stability of messages

Definition (Stability)

A notification message m received by $p_0$ is stable at $p_0$ if no message m' with TS(m') < TS(m) can be received in the future by $p_0$

Definition (DR1: Real-time delivery rule)

At time t, deliver all received notification messages m such that $TS(m) \le t - \delta$ in increasing timestamp order.
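The refined rule DR1 can be sketched with a priority queue ordered by timestamp. The class name `RealTimeMonitor`, its methods, and the numeric times used in the example are assumptions for illustration; the sketch also assumes that the caller supplies the current real time `now` and that δ is a known bound on message delay.

```python
import heapq
from itertools import count

class RealTimeMonitor:
    """Monitor p0 applying DR1 with a known maximum message delay delta."""

    def __init__(self, delta):
        self.delta = delta
        self.pending = []     # min-heap of (timestamp, arrival number, payload)
        self._seq = count()   # tie-breaker for equal timestamps

    def receive(self, ts, payload):
        """receive(m): buffer the notification with timestamp TS(m) = RC(e)."""
        heapq.heappush(self.pending, (ts, next(self._seq), payload))

    def deliver(self, now):
        """At time `now`, deliver every stable notification, i.e. every m
        with TS(m) <= now - delta, in increasing timestamp order."""
        out = []
        while self.pending and self.pending[0][0] <= now - self.delta:
            out.append(heapq.heappop(self.pending)[2])
        return out

monitor = RealTimeMonitor(delta=5)
monitor.receive(12, "b")
monitor.receive(10, "a")
```

Calling `monitor.deliver(14)` delivers nothing, because a notification with a timestamp below 10 could still be in transit; `monitor.deliver(15)` delivers `"a"`, and `"b"` becomes stable at time 17.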

(62)

Proof

Safety: Clock Condition for RC

$e \to e' \Rightarrow RC(e) < RC(e')$

Note that $RC(e) < RC(e') \not\Rightarrow e \to e'$, but this rule is sufficient to obtain consistent observations, as two notifications with $e \to e'$ are never delivered in the incorrect order.

Liveness: Stability

At time t, any message sent by time $t - \delta$ is stable.

Note that real-time clocks do not support stability; it is the maximum delay of messages that enables it.

(63)


(64)

Passive monitoring with logical clocks

Initial assumptions

All processes have access to a logical clock LC; let LC(e) be the value of the logical clock when e is executed

The timestamp of message m sent through an event e=send(m) is TS(m)=LC(e)

Definition (DR2: Deliver Rule for LC)

Deliver all received messages that are stable at $p_0$ in increasing timestamp order

(65)

Passive monitoring with logical clocks

Safety: Clock Condition for LC

$e \to e' \Rightarrow LC(e) < LC(e')$

Note that $LC(e) < LC(e') \not\Rightarrow e \to e'$, but this rule is sufficient to obtain consistent observations, as two notifications with $e \to e'$ are never delivered in the incorrect order.

(66)

Passive monitoring with logical clocks

Liveness: Stability

We need a way to reproduce the concept of δ in an asynchronous system, otherwise no notification message will ever be delivered.

Solution

Each process communicates with $p_0$ using FIFO delivery

When $p_0$ receives a message from $p_i$ describing an event e with timestamp TS(e), it is sure that it will never receive a message from $p_i$ describing an event e' with $TS(e') \le TS(e)$

Stability of message m at $p_0$ can be guaranteed when $p_0$ has received at least one message from all other processes with a timestamp greater than or equal to TS(m)
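The stability test for logical clocks can be sketched as follows. The class name `LogicalClockMonitor`, its fields, and the process names are hypothetical; the sketch assumes FIFO channels to the monitor, so that the timestamps received from each process are strictly increasing.

```python
class LogicalClockMonitor:
    """Monitor p0 applying DR2: deliver stable notifications in
    increasing (scalar) timestamp order."""

    def __init__(self, processes):
        # Highest timestamp received so far from each observed process
        self.latest = {p: 0 for p in processes}
        self.pending = []   # (timestamp, sender, payload), received only

    def _stable(self, ts, sender):
        """m is stable once every other process has sent a notification
        with timestamp >= TS(m)."""
        return all(self.latest[p] >= ts for p in self.latest if p != sender)

    def receive(self, sender, ts, payload):
        """receive(m): record it, then return the payloads that can now
        be delivered, in increasing timestamp order."""
        self.latest[sender] = max(self.latest[sender], ts)
        self.pending.append((ts, sender, payload))
        self.pending.sort()
        out = []
        while self.pending and self._stable(self.pending[0][0], self.pending[0][1]):
            out.append(self.pending.pop(0)[2])
        return out

monitor = LogicalClockMonitor(["p1", "p2"])
monitor.receive("p1", 1, "a")   # not stable yet: nothing heard from p2
monitor.receive("p1", 3, "b")   # still waiting for p2
```

A later call `monitor.receive("p2", 2, "c")` releases `"a"` and `"c"` but not `"b"`, which stays buffered until `p2` sends something with a timestamp of at least 3. This illustrates why the approach needs a steady flow of notifications from every process.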

(67)

Problems of Logical Clocks

They add unnecessary delays to observations

They require a constant flux of messages/events from all processes

(68)

Passive Monitoring with Vector Clocks

Variables maintained at $p_0$

$\mathcal{M}$: the set of notification messages received but not yet delivered by $p_0$

an array D, initialized to 0’s, such that D[k] contains TS(m)[k], where m is the last notification message delivered by $p_0$ from process $p_k$.

When is a notification message deliverable by $p_0$?

A notification message $m \in \mathcal{M}$ from process $p_j$ is deliverable as soon as $p_0$ can verify that there is no other notification message m' (neither in $\mathcal{M}$ nor in the channels) such that $\mathrm{send}(m') \to \mathrm{send}(m)$.

(69)


(70)

Implementing Causal Delivery with Vector Clocks

$m \in \mathcal{M}$: a notification message sent by $p_j$ to $p_0$

m': the last notification message delivered from process $p_k$, $k \ne j$

Definition (Weak Gap Detection)

If $TS(m')[k] < TS(m)[k]$ for some $k \ne j$, then there exists an event $\mathrm{send}_k(m'')$ such that

$\mathrm{send}_k(m') \to \mathrm{send}_k(m'') \to \mathrm{send}_j(m)$

(71)

Implementing Causal Delivery with Vector Clocks

Two conditions to be verified:

There is no earlier message from $p_j$ that has not been delivered yet.

Causal Delivery Rule, Part 1: $D[j] = TS(m)[j] - 1$

$\forall k \ne j$, let m' be the last message from $p_k$ delivered by $p_0$ ($D[k] = TS(m')[k]$); we must be sure that no message m'' from $p_k$ exists such that: $\mathrm{send}_k(m') \to \mathrm{send}_k(m'') \to \mathrm{send}_j(m)$

Causal Delivery Rule, Part 2: $\forall k \ne j : D[k] \ge TS(m)[k]$

It follows from Weak Gap Detection
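The two parts of the causal delivery rule can be combined into a small monitor. This is a sketch under explicit assumptions rather than the slides' implementation: the class name `CausalMonitor` is hypothetical, observed processes are indexed from 0, vector timestamps have one component per observed process, and every event that increments a vector clock is notified to the monitor.

```python
class CausalMonitor:
    """Monitor p0 implementing causal delivery with vector timestamps."""

    def __init__(self, n):
        self.D = [0] * n   # D[k]: TS(m)[k] of the last message delivered from p_k
        self.M = []        # received but not yet delivered: (sender, ts, payload)

    def _deliverable(self, j, ts):
        # Part 1: no earlier notification of p_j is missing
        # Part 2: no notification of another p_k causally precedes m
        return (self.D[j] == ts[j] - 1 and
                all(self.D[k] >= ts[k] for k in range(len(ts)) if k != j))

    def receive(self, j, ts, payload):
        """receive(m) from p_j: buffer m, then deliver as many buffered
        notifications as the rule allows. Returns the delivered payloads."""
        self.M.append((j, ts, payload))
        delivered = []
        progress = True
        while progress:
            progress = False
            for msg in list(self.M):
                sender, stamp, data = msg
                if self._deliverable(sender, stamp):
                    self.M.remove(msg)
                    self.D[sender] = stamp[sender]
                    delivered.append(data)
                    progress = True
        return delivered

# Hypothetical scenario: p0's event a (1, 0) is a send to p1, whose
# receive event d has timestamp (1, 1). The notification of d arrives first.
monitor = CausalMonitor(2)
early = monitor.receive(1, (1, 1), "d")   # blocked by Part 2: D[0] < 1
late = monitor.receive(0, (1, 0), "a")    # delivers a, which unblocks d
```

After the two calls, `early` is empty and `late` is `["a", "d"]`, so the observation respects the happen-before relation even though the notifications arrived in the opposite order.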

(72)

Example

(73)

Final Comments

We have seen how to implement Causal Delivery “many-to-one”

The same rules apply if we build a “one-to-many” mechanism (reliable broadcast)