HAL Id: inria-00260643
https://hal.inria.fr/inria-00260643
Submitted on 4 Mar 2008
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Revisiting Simultaneous Consensus with Crash Failures
Y. Moses, Michel Raynal
To cite this version:
Y. Moses, Michel Raynal. Revisiting Simultaneous Consensus with Crash Failures. [Research Report]
PI 1885, 2008, pp.17. �inria-00260643�
I
R
I
S
A
IN ST ITU T D E R EC HE RC HE EN INF OR MA TIQ UE ET SYS TÈM ES ALÉ ATOI RESP U B L I C A T I O N
I N T E R N E
N
oI R I S A
CAMPUS UNIVERSITAIRE DE BEAULIEU - 35042 RENNES CEDEX - FRANCE
ISSN 1166-8687
1885
REVISITING SIMULTANEOUS CONSENSUS WITH CRASH
FAILURES
http://www.irisa.fr
Revisiting Simultaneous Consensus with Crash Failures
Y. Moses*
M. Raynal**
Syst`emes communicantsProjet ASAP
Publication interne n ˚ 1885 — Mars 2008 — 17 pages
Abstract: This paper addresses the “consensus with simultaneous decision” problem in a synchronous system prone totprocess crashes. This problem requires that all the processes that do not crash decide on the same value (consensus) and that all decisions are made during the very same round (simultaneity). So, there is a double agreement, one on the decided value (data agreement) and one on the decision round (time agreement). This problem was first defined by Dwork and Moses who analyzed it and solved it using an analysis of the evolution of states of knowledge in a system with crash failures. The current paper presents a simple algorithm that optimally solves simultaneous consensus. Optimality means in this case that the simultaneous decision is taken in each and every run as soon as any protocol decides, given the same failure pattern and initial value. The design principle of this algorithm is simplicity, a first-class criterion. A new optimality proof is given that is stated in purely combinatorial terms.
Key-words: Consensus, Distributed algorithm, Fault-tolerance, Round-based computation, Simultaneous decision, Synchronous message-passing system.
(R´esum´e : tsvp)
*Department of Electrical Engineering, Technion, Haifa, 32000 Israel [email protected] **IRISA, Universit´e de Rennes 1, Campus de Beaulieu, 35042 Rennes Cedex, France [email protected]
Centre National de la Recherche Scientifique Institut National de Recherche en Informatique (UMR6074) Université de Rennes 1 – Insa de Rennes et en Automatique – unité de recherche de Rennes
D´ecision simultan´ee en environement synchrone avec crash de processus
R´esum´e : Ce rapport pr´esente un algorithme de consensus pour un syst`eme synchrone avec crash de processus, dans lequel les processus qui d´ecident le font `a la mˆeme ronde de calcul. L’accent erst mis sur la simplicit´e de conception de cet algorirthme.Mots cl´es : Syst`eme synchrone, algorithme distribu´e, consensus, crash de processus, d´ecision simulta´ee, mod`ele de calcul fond´e sur les rondes, syst`eme synchrone.
1
Introduction
The consensus problem Fault-tolerant systems often require a means by which processes or processors can arrive at an exact mutual agreement of some kind [15]. If the processes defining a computation have never to agree, that computation is actually made up of a set of independent computations, and consequently is not an inherently distributed computation. The agreement requirement is captured by the consensus problem that is one of the most important problems of fault-tolerant distributed computing. It actually occurs every time entities (usually called agents, processes -the word we use in the following-, nodes, sensors, etc.) have to agree. The consensus problem is surprisingly simple to state: each process is assumed to propose a value, and all the processes that are not faulty have to agree/decide (termination), on the same value (agreement), that has to be one of the proposed values (validity). The failure model considered in this paper is the process crash model.
While consensus is impossible to solve in pure asynchronous systems despite even a single process crash [6] (“pure asynchronous systems” means systems in which there is no upper bound on process speed and message transfer delay), it can be solved in synchronous systems (i.e., systems where there are such upper bounds) whatever the numbernof processes and the numbertof process crashes (t<n).
An important measure for a consensus algorithm is the time it takes for the non-faulty processes to decide. As a computation in a synchronous system can be abstracted as a sequence of rounds, the time complexity of a synchronous consensus algorithm is measured as the minimal number of rounds (R
t
) a process has to execute before deciding, in the worst case scenario. It has been shown (see, e.g., in [5, 12]) thatRt
=
t+ 1
. Moreover, that bound is tight: there exist algorithms (e.g., see [1, 9, 16]) where no process ever executes more thanRt
rounds (these algorithms are thus optimal with respect to that bound).Whilet
+ 1
rounds are needed in the worst case scenario, the major part of the executions have few failures or are even failure-free. So, an important issue is to be able to design early deciding algorithms, i.e., algorithms that direct the processes to decide “as early as possible” in good scenarios. Letf,0
f t, be the number of actual process crashes in an execution. It has been shown that the lower bound on the number of rounds is thenRtf
= min(
f+ 2
t+ 1)
(e.g., [2, 12, 17]). As before, this bound is tight: algorithms in which no process ever executes more thanRtf
exist (e.g., see [2, 7, 16]).Simultaneous decision Consensus agreement is a data agreement property, namely the processes have to agree on the same value. According to the actual failure pattern, and the way this pattern is perceived by the processes, it is possible for several processes to decide at distinct rounds. The only guarantee lies in the fact that this round can be bounded byR
t
(orRtf
).This uncertainty on the set of round numbers at which the processes decide, can be a serious drawback for the real-time oriented applications where agreement is required, not only on the decided value, but also on the time the decision is taken. More precisely, these applications require that the processes decide on the same value (data
agreement), during the very same round (time agreement). This property is also called simultaneous decision.
Among the algorithms that ensure simultaneous decision, there are trivially all the “classical” consensus algorithms where all the processes that do not crash decide systematically at the end of the roundR
t
=
t+ 1
. This observation suggests immediately the following question: “As far as the simultaneous decision property is concerned, are there early deciding algorithms, i.e., algorithms whose maximal number of rounds in the worst case scenario can be deter-mined fromf (andt)?” Unfortunately, it is shown in [2] that the answer to that question is negative: Rt
=
t+ 1
rounds is the best that can be done when both the parameterstandf are considered. At first glance, this can appear as counter-intuitive as it states thatt+1
is a bound for simultaneous decision whatever the value off(i.e., even when no process crashes)!Early simultaneous decision So, given an execution, a more refined analysis requires to consider not the parameters tandf, buttand the failure pattern that actually occurs in the execution. The previous question can now be translated as follows: “Which are the failure patterns that force a simultaneous decision consensus algorithm to decide ink rounds for
2
k t+ 1
”. This question was posed and answered in [3] where a bound is stated and proved (this bound is denotedR StF
in the following;Fstands for the failure pattern that occurs in the considered run).To better understand the intuition that underlies the boundR S
tF
, let us consider the particular failure pattern wheretprocesses have crashed before the execution starts. Then;tremaining processes define consequently a PI n ˚ 1885failure-free system. During the first round each non-crashed process can learn that, from then on, it is in a failure-free system ofn;tprocesses, and consequently all the processes can exchange their view of the system and decide the same value at the end of the second round. More generally, what makes things easier is when many crashes occur early in the computation. Roughly speaking, this is because a crash is stable, while the property “a process has not crashed” is not a stable property. This instability property and the occurrence of only a few crashes, makes an agreement on an early round for a simultaneous decision more difficult to obtain.
The previous discussion suggests that determining the smallest round at which the processes can simultaneously decide should take into account the pairs (round number, number of processes perceived as crashed until that round). This intuition is formalized in [3] as follows. LetC
r]
be the set of the processes that are seen as crashed by (at least) one of the processes that survive (i.e., do not crash before the end of) roundr. For anyr, letdr
= max(0
jCr]
j;r)
; as we shall see,dr
represents the number of rounds that could be saved with respect to the worst caset+ 1
bound, thanks to the failures that occurred and were seen by at least one process that terminate roundr. LetD= max
r
0
(
dr
)
; Drepresents the best saving in terms of rounds. Notice that the values ofdr
andDin a given run are determined only by its failure patternF. It is shown in [3] that the smallest round number at the end of which a common decision can be simultaneously taken isR StF
= (
t+ 1)
;D(whereD=
D(
F)
). The quantityDis considered in [3] to be the waste inherent in the failure patternF, since it specifies the number of rounds the adversary has “lost” in its quest to delay decision for as long as possible. This optimality proof is established in [3] from the theory of distributed (implicit) knowledge and common knowledge (and the way distributed knowledge becomes common knowledge in a distributed system) [4, 8]. An optimal binary consensus algorithm that directs the processes to decide simultaneously at the end of the roundR StF
= (
t+ 1)
;Dis also described in [3]. Its construction is derived based on a the same knowledge-based analysis.Content of the paper This paper presents a simple construction of an optimal simultaneous multi-valued consensus algorithm, that can be seen as a variant that revisits the algorithm presented in [3]. This variant is based on concepts and proofs introduced in [10, 11]. Here, the aim is not to design a brand new algorithm, but to construct a simple algorithm from a few definitions and simple observations. Moreover, a simpler matching lower bound proving the optimality of this algorithm is presented. The main criterion is design simplicity. Interestingly, not only the design, but also the proofs of the algorithm are simple. While the connection to knowledge is hinted at here and there, the development and proofs are purely combinatorial. We hope that the relative simplicity of the development in this paper will make the simultaneous agreement property and the simple (and elegant) concepts its implementation relies on more broadly accessible.
The paper is made up of 6 sections. The computation model and the problem are introduced in Section 2. The algorithm is presented in Section 3, while its proof is given in Section 4. Section 5 presents a new lower-bound proof (simplifying that of [3]) of the optimality of the algorithm presented in Section 3. Finally, Section 6 states a few concluding remarks.
2
Model and problem specification
2.1
Computation model
Round-based synchronous system The system model consists of a finite set of processes, namely,
=
fp 1:::p
n
g, that communicate and synchronize by sending and receiving messages through channels. (Sometimes we also usep andqto denote processes.) Every pair of processes is connected by a bi-directional reliable channel (which means that there is no creation, alteration, loss or duplication of message).The system is round-based synchronous. This means that each execution consists of a sequence of rounds. Those are identified by the successive integers
1
2
etc. For the processes, the current round number appears as a global variablerthat they can read, and whose progress is managed by the underlying system. A round is made up of three consecutive phases:A
send
phase in which each process sends the same message to all the processes (including itself). Areceive
phase in which each process receives messages.The fundamental property of the synchronous model lies in the fact that a message sent by a processp
i
to a processpj
at roundr, is received bypj
at the very same roundr.A
computation
phase during which each process processes the messages it received during that round and executes local computation.Process failure model A process is faulty during an execution if its behavior deviates from that prescribed by its algorithm, otherwise it is correct. We consider here the
crash
failure model, namely, a faulty process stops its execution prematurely. After it has crashed, a process does nothing. If a process crashes in the middle of a sending phase, only a subset of the messages it was supposed to send might actually be received.As already indicated, the model parametert(
1
t < n) denotes an upper bound on the number of processes that can crash in a run. A failure patternF is a list of at mostttripleshqkq
Bq
i. A triplehqkq
Bq
istates that the processqcrashes while executing the roundkq
(hence, it sends no messages after roundkq
), while the setBq
denotes the set of processes that do not receive the message sent byqduring the roundkq
.2.2
The simultaneous consensus problem
The problem has been informally stated in the introduction: every processp
i
proposes a valuevi
(called its initialvalue) and the processes have to decide, during the very same round, on the same value that has to be one of the
proposed values. This can be stated as a set of four properties that any algorithm solving the problem has to satisfy.
Decision
. Every correct process decides.
Validity
. A decided value is a proposed value.
Data agreement
. No two processes decide different values.
Simultaneous decision
(orTime agreement
). No two processes decide at distinct rounds.Given a setV of two or more values, an input configuration is an assignmentI
:
! V of an initial valuevi
to each processpi
. We are considering the simultaneous consensus problem under the assumption that alljVjn
input configurations are possible.Traditional consensus algorithms for the crash failure model are guaranteed to decide within at mostt
+1
rounds [2, 15]. Any such algorithm can be converted into a simultaneous consensus algorithm by delaying any early decision and having all deciding processes decide only in roundt+ 1
. Dwork and Moses showed that simultaneous decision can often be obtained much earlier than that [3]. Although every algorithm will have runs that decide int+ 1
rounds, simultaneous decision can be obtained as early as the second round in some cases. The goal, then, is to design a simple algorithm that direct the processes to decide both as early as possible and simultaneously.3
A simple optimal algorithm
3.1
Preliminary definitions
As the system model requires each process to send a message to all the processes at each round, process failures can be easily detected, and this detection is done as soon as possible. In addition to this very simple failure detection mechanism, the algorithm is based on other simple notions, namely, the notions of clean round (introduced in [3]), and the notion of horizon (introduced in [10]).
Failure discovery The failure of a processqis discovered (for the first time) in roundrifris the first round such that there is a processpthat (1) does not receive a roundrmessage fromq, and (2) survives (i.e., completes without crashing) roundr.
Clean round A roundris clean if no process is discovered faulty for the first time in that round. This means that a process that crashes during a clean roundrhas sent its roundrmessage to all the processes that proceed to the round r
+ 1
.Let us call an algorithm symmetric if a process never sends different messages to distinct processes in the same round. The following property is an immediate consequence of the previous definitions.1
1In a precise sense, a clean round can be used to ensure that the knowledge of the various processes is identical. Once this happens the processes are in agreement about initial values. They then need to discover this and coordinate their decisions.
Property 1 In a symmetric algorithm, if round ris clean, then all the processes that proceed to the roundr
+ 1
received, during the roundr, messages from the same set of processes (including at least all of them).
Let us observe that a clean round is not necessarily a failure-free round. It is possible that a processpcrashes in a clean roundrbut no process active at the end ofrhas noticed its crash (phas crashed after its sending phase and before the end of the roundr, or more generallypas crashed duringrafter sending its roundrmessage at least to the processes that survive roundr). Similarly a failure-free round is not necessarily clean. As an example a failure-free roundr
+ 1
that follows a clean roundrduring which a crash occurred is not clean.Horizon [10] Given a processp
i
and a roundr1
, letxbe the greatest number of process crashes that occurred between the round1
and the roundr;1
(included) and are known bypi
(to have crashed in the firstr;1
rounds) by the end ofr.The valueh
i
(
r) =
r+
t;xis called the horizon ofpi
at roundr. We havehi
(1) =
t+1
. If three crashes occurred by the end of the first round and are reported topi
during the second round, we havehi
(2) =
t;1
.As we will see, the horizon notion (of a processp
i
at roundr) is a key notion to determine the smallest round at the end of which the same value can be simultaneously decided. The following simple theorem (that will be exploited in the presentation of the algorithm) explains why this notion is crucial.Theorem 1 Letxbe defined as indicated above, andp
i
a process that survives roundr. There is a clean roundysuchthatr y h
i
(
r) =
r+
t;x.Proof Let us first observe that, as at leastx processes have been discovered as faulty between the round
0
and the roundr;1
(included), at mostt;xprocesses can be discovered as faulty between the roundr(included) and the roundr+
t;x(included). But there aret;x+ 1
rounds fromrtor+
t;x, from which we conclude that at leastone of these rounds is clean. 2
Theorem
1
3.2
Description of the algorithm
Local variables Each processp
i
manages the following local variables. Some variables are presented as belonging to an array. This is only for notational convenience, as such array variables can be implemented as simple variables.est
i
contains, at the end ofr,pi
’s current estimate of the decision value. Its initial value isvi
, the value proposed bypi
.f
i
r]
denotes the set of processes from whichpi
has not received a message during the roundr. (So, this variable is the best current estimate thatpi
can have of the processes that have crashed.)Letf
i
r] =
nfi
r]
(i.e., the set of processes from whichpi
has received a roundrmessage).f
0
i
r;1]
is a value computed bypi
during the roundr, but that refers to crashes that occurred up to the round r;1
(included), hence the notation. It is the valueS
p
j2
f
ir
]f
j
r;1]
, which means thatf 0i
r;1]
is the set of processes that were known as crashed at the end of the roundr;1
by at least one of the processes from which pi
has received a roundrmessage. This value is computed bypi
during the roundr. As a processpi
receives its own messages, we havefi
r;1]
f0
i
r;1]
.bh
i
r]
represents the best (smallest) horizon value known bypi
at roundr. It ispi
’s best estimate of the smallest round for a simultaneous decision. Initially,bhi
0] =
hi
(0) =
t+ 1
.Process behavior Each processp
i
not crashed at the beginning ofrsends to all the processes a message containing its current estimate of the decision value (esti
), and the setfi
r;1]
of processes it currently knows as faulty. After it has received the roundrmessages,pi
computes the new value ofesti
and the value ofbhi
r]
. The new value ofesti
is the smallest of the estimates values it has seen so far. As far as the value ofbhi
r]
is concerned, we have the following. The computation of bhi
r]
has to take into accounthi
(
r)
. This is required to benefit from Theorem 1 that states that there is a clean roundysuch thatr y hi
(
r)
. When this clean round will be executed, any two processespi
andpj
that execute it will haveesti
=
estj
, and (as they will receive messages from the same set of processes, see Property 1) will be such thatf0
i
r;1] =
f 0j
r;1]
. It follows that, we will havehi
(
y) =
hj
(
y)
, thereby creating correct “seeds” for determining the smallest round for a simultaneous decision. This allows the processes to determine rounds at which they can simultaneously decide.As we are looking for the first round where a simultaneous decision is possible,bh
i
r]
has to be set tomin
; hi
(0)
hi
(1)
:::hi
(
r)
, i.e.,bhi
r] = min
; bhi
r;1]
hi
(
r)
.Finally, according to the previous discussion, the algorithm directs a processp
i
to decide at the end of the first round rthat is equal to the best horizon currently known bypi
, i.e., whenr=
bhi
r]
.The resulting algorithm is presented in Figure 1, whereh
i
(
r)
(see line 08) is expressed as a function ofr;1
to emphasize the fact that it could be computed at the end of the roundr;1
by an external omniscient observer. The local boolean variabledecidedis used only to prove the optimality of the algorithm (see Section 5). Its suppression does not alter the algorithm.algorithm PROPOSE
(
v i)
: (01) est i v i; bh i0]
t+ 1
;f i0]
;decided false; % initialization % (02) whenr
= 1
2
:::do %r: round number %(03) begin round (04) send
(
est i f i r;1])
to all; % includingp iitself % (05) letf 0 ir;
1] =
the union of thef jr;
1]
sets received duringr;(06) letf
i
r
] =
the set of processes from whichpihas not received a message during r; (07) esti
min(
all theestjreceived duringr)
;(08) leth i
(
r) = (
r;1) + (
t+ 1
;jf 0 i r;1]
j)
; (09) bh i r] min
; bh i r;1]
h i(
r)
; (10) ifr=
bh ir
]
thendecided true;return(est i) end if (11) end roundFigure 1: Optimal simultaneous decision despite up totcrash failures (code forp
i
)4
Proof of the algorithm
Lemma 1
Validity property
. A decided value is a proposed value.Proof The proof is an immediate consequence of the initialization of theest
i
local variables (line 01), the reliability of the channels, and themin()
operation used at line 07. 2Lemma
1
Lemma 2 Letp
i
be a correct process.8r0
we havehi
(
r)
r. Proof Since the processes in the setf0
i
r;1]
are processes that have crashed by the end of the roundr;1
, it follows thatt;jf 0i
r;1]
j0
. Consequently,hi
(
r) =
r+
t;jf 0i
r;1]
jr. 2Lemma
2Notation: Considering an arbitrary execution, letp
i
be a process that is correct in that execution.Let BH
i
= min
r
0h
i
(
r)
. BHi is the smallest value ever attained by the function
h
i
(
r)
, i.e., the smallest horizon value determined bypi
.LetL
i
= max(
frjhi
(
r) =
BH ig
)
.Li
is the last round whose horizon value isBH i. It follows from these definitions that ifL0
>L
i
thenhi
(
L 0)
>h
i
(
Li
)
.Lemma 3 Lett < n. The roundL
i
is a clean round (i.e., no process is discovered faulty for the first time in thatround).
Proof Assume, by way of contradiction, thatL
i
is not clean (recall thatpi
is a correct process). This means there is a processpz
that is seen faulty for the first time in roundLi
by some processpy
. Notice thatpz
2=f0
i
Li
;1]
sincepz
was not discovered faulty in the previous rounds. There are two cases.Case 1:p
i
receives a message frompy
in roundLi
+ 1
.(This case includes the case wherep
i
andpy
are the same process). Aspy
does not receive a message from pz
duringLi
, and a crash is stable, we havepz
2fy
Li
]
. Moreover, due to the case assumption, and the fact that the roundLi
+ 1
message frompy
topi
carriesfy
Li
]
, it follows thatf0
i
Li
]
containsf 0i
Li
;1]
fpz
g. Consequently,jf 0i
Li
]
j>jf 0i
Li
;1]
j. It follows thathi
(
Li
+ 1)
hi
(
Li
)
, contradicting the definition ofLi
. PI n ˚ 1885Case 2:p
i
does not receive a message frompy
in roundLi
+ 1
.In that case, bothp
z
andpy
are seen faulty for the first time bypi
during the roundLi
+ 1
. So,fi
Li
+ 1]
containsf0
i
Li
;1]
fpy
pz
g. Sincef 0i
Li
+ 1]
(computed bypi
during the roundLi
+ 2
) containsfi
Li
+ 1]
, we havejf 0i
Li
+ 1]
jjf 0i
Li
;1]
j+ 2
. Thus, we have hi
(
Li
+ 2) = (
Li
+ 2) +
t; jf 0i
Li
+ 1]
j(
Li
+ 2) +
t;(
jf 0i
Li
;1]
j+ 2)
=
Li
+
t; jf 0i
Li
;1]
j=
hi
(
Li
)
which again contradicts the definition ofLi
.2
Lemma
3Lemma 4 Lett<n. Every correct process decides. Moreover, all processes that decide do so in the same round and
decide on the same value.
Proof
Decision property
: Let us consider a correct processpi
. Notice that, due to the initialization and line 09 we have 8r:
bhi
r]
t+ 1
, from which we concludeBHi
t
+ 1
. So, to prove thatpi
decides we have to show thatpi
does not miss the testr=
BHi at line 10. This could happen if the first round
` such thatbh
i
`;1]
>BHi and
bh
i
`] =
BHiis such that
`>BH
i. We prove that this cannot happen.
Let us observe that, due to Lemma 2, we haveh
i
(
`)
`. It then follows frombhi
`;1]
> BHi,hi
(
`)
`, bhi
`] =
BHi, and line 09, that BH
i
=
bh
i
`] = min(
bhi
`;1]
hi
(
`)) =
hi
(
`)
`, i.e.,BH i`, which establishes the result. It follows thatp
i
decides no later than roundt+ 1
.Simultaneous decision for the correct processes
: We first show that no two correct processes pi
andpj
decide at distinct rounds. Due to the algorithm, ifpi
andpj
decide, they decide at roundBHi and BH j, respectively. We show thatBH i
=
BHj. Due to Lemma 3, the round
L
i
is clean. Hence, during the roundLi
,pj
receives the same messages thatpi
receives (Property 1). Thusf0
i
Li
;1] =
f 0j
Li
;1]
and consequently,hi
(
Li
) =
hj
(
Li
)
. Since bhj
Li
]
hj
(
Li
)
by line 09, it follows thatbhj
Li
]
bhi
Li
] =
BHi, and thus BH
j BH
i. By symmetry the same reasoning yieldsBH
i BH
j, from which it follows that BH
i
=
BH
j. This proves that no two correct processes decide at distinct rounds.2
Simultaneous decision for the faulty processes
: BHbeing the round at which the correct processes decide, let us now consider the case of a faulty processpj
. Aspj
behaves as a correct process until it crashes, and as the correct processes decide in the same roundBH, it follows that no faulty process decides beforeBH, and ifpj
executes line 10 of roundBH, it does decide as if it was a correct process.Data agreement property
. The fact that no two processes decide different values comes from the existence of the clean roundLi
that appears before a process decision. During that round, all the processes that are alive at the end of this round have received the same set of estimate values (Property 1), and selected the smallest of them. It follows that, from the end of that round, there is a single estimate value in the system, which proves the data agreement property. 2Lemma
4
Definitions This paragraph recalls notions and results that have already been stated in the introduction. Given an execution, letF be the failure pattern that occurs in that execution.
LetS
r]
be the set of processes that survive (i.e., complete) roundr. LetCr] =
S
p
i2S
r
]f
i
r]
, i.e., the set of the processes that are known to have crashed by at least one of the processes that survives roundr. Observe thatf0
i
r]
Cr]
, for anypi
2Sr]
.2In [3, 8] it is shown that before performing such a simultaneous action the processes must attain common knowledge that they are doing so. In particular, they must have common knowledge that the decided valuevis one of the initial values in the run.
LetD
= max
r
0(
d
r
)
wheredr
=
jCr]
j;r. (In the introduction,Dhas been called the waste inherent inF, i.e., the number of rounds the adversary has lost in his quest to delay decision for as long as possible.)Notice that D
0
, sinceC0] = 0
andD d0
=
C
0]
;0 = 0
. The following optimality results are shown in [3]. The smallest number of roundsR StF
that any simultaneous decision consensus algorithm can achieve is R StF
= (
t+ 1)
;Dwhent<n;1
, andR StF
=
t;Dwhent=
n;1
.Theorem 2 Lett<n. The algorithm described in Figure 1 solves the consensus problem with simultaneous decision.
In a run with failure patternF, decision is reached in roundt
+ 1
;DwhereD=
D(
F)
is the waste inherent inF. Proof The proof of the validity, decision, simultaneous decision and data agreement properties follow from the Lemmas 1 and 4. We now show that the decision is obtained in roundt+ 1
;D. Let us consider an arbitrary run of the algorithm. It follows from Lemma 4 thatBHi
=
BH
j for any pair of processes
p
i
andpj
that decide. LetBH denote this round. The proof of the claim amounts to showing thatBH t+ 1
;DandBH t+ 1
;D.Letp
i
be a process that decides andRthe last round such thatjCR]
j;R=
D(i.e.,jCR+
x]
j;(
R+
x)
<D=
jCR]
j;R, for anyx>0
). Let us observe that, due to the lines 08-10 of the algorithm,BH is attained at the round numbers that make the functionhi
(
r) = (
r;1) +
t;jf0
i
r]
j+ 1
minimal. Moreover, it follows from the definition ofDandRthatjCR+ 1]
j jCR]
j. SinceCR]
CR+ 1]
, it follows thatCR] =
CR+ 1]
, i.e., no new process failure is discovered in roundR+ 1
, so the roundR+ 1
is clean and we havejf0
i
R]
j=
jCR]
j. Due to line 08 of the roundR+ 1
, we havehi
(
R+ 1) =
R+
t+ 1
;jf 0i
R]
j= (
t+ 1)
;(
jf 0i
R]
j;R) =
t+ 1
;D, from which we concludeBH t+ 1
;D.For the other direction, let us recall that, due to Lemma 3, the roundL
i
>0
is clean. It follows thatf 0i
Li
;1] =
CLi
;1]
, since anypi
hears in roundLi
from all processes that survived roundLi
;1
. Therefore,BH=
t+ 1
;(
jf 0i
Li
;1]
j;(
Li
;1)) =
t+ 1
;(
jCLi
;1]
j;(
Li
;1)) =
t+ 1
;d (L
i ;1)t
+ 1
;D, which completes theproof of the theorem. 2
Theorem
2
5
On the optimality of the algorithm:
t+1;Dis a lower bound
This section proves that the PROPOSEalgorithm is optimal: In a synchronous system prone to up totprocess crashes (witht<n;
1
), there is no deterministic algorithm that can ever solve the simultaneous consensus problem in fewer thant+ 1
;Drounds. The proof given here is new (and simpler than the first proof given in [3]). It relies on notions introduced in [11]. It also uses notations introduced in Section 4.The problem of simultaneous consensus is closely related to the knowledge-theoretic notion of similarity among runs at a given time. This notion is captured by the following definitions. For later use, these definitions are made with respect to an arbitrary round-based synchronous deterministic algorithmP (they consequently apply in particular to the PROPOSEalgorithm described in Figure 1).
5.1
Preliminary definitions and lemmas
For ease of exposition, the runs of an arbitrary deterministic algorithmPare denoted by 0
etc.S
r]
denotes the set of processes that survive roundrof, whilels(
pr)
denotes the local state ofpat the end ofrin the run(i.e., its set of local variables and their current values).Definition 1 Given a deterministic algorithmP, a processp, and a roundr, the runsand 0
ofPare indistinguish-able topafter roundr
(
denotedr
p
0
)
if both (i) p2S
r]
\Sr 0]
(i.e.,phas survived roundrin both runs),
and (ii) ls
(
pr) =
ls(
pr 0)
.
Definition 2 The runs and 0
are connected at the end of roundr, denoted
r
0
, if there is a sequence of runs and processes such that
=
0
r
p
0 1r
p
1r
p
k ;1k
=
0 .In other words, an undirected graphG(Pr)(in shortG(r)) can be associated with each roundrof a protocolP. This graph is calledP’s similarity graph for roundr. Its vertices are the runs ofP and there is an edge connecting
and
0
if there is a processqsuch that
r
q
0
. It is easy to see that
r
0 holds if and 0 belong to the PI n ˚ 1885same connected component inG(r). BecauseG(r)is undirected, being connected (
r
) is an equivalence relation.
3
Moreover, observe that if we can show that some propertyAis maintained under
r
q
for allq2, then whenever has propertyAandr
0
, we are guaranteed that 0
has propertyAas well. Lemma 5 Letand
0
be runs of a deterministic algorithmP that solves simultaneous consensus. If some process
decides on valuevin roundrofand
r
, then the processes inS
r 0]
decide the same valuevin the same round rof
0
.
Proof It suffices to show the claim for any two runs 0 such that
r
q
0 for someq2Sr]
\Sr 0]
. In this case,qdecidesvin roundrof. BecauseP is deterministic andqhas the same local state at the end of roundrof and0
,qdecidesv at the end of roundrin 0
. Finally, asP solves simultaneous consensus (lemma assumption) it follows that all processes inS
r0
]
decidevin roundr(and no other process decides a different value in a different
round). 2
Lemma
5
An immediate consequence of Lemma 5 is captured by the following corollary.
Corollary 1 LetP be a deterministic algorithm that solves simultaneous consensus. If 0
is a run ofP such that
(1) no initial value in 0 isv, and (2)
r
0, then no process can decidevin roundrof. Proof Sincen > t, the setS
r0
]
is nonempty. By Lemma 5, if some processqdecidesv in roundrof, the processes inS
r0
]
decidevin roundrof 0
. But this contradicts the Validity property of the simultaneous consensus algorithmP, since the decision valuevis not one of the initial values in
0
. 2
Corollary
1
5.2
A full-information algorithm
For the purpose of proving optimality, we make use of a full-information algorithm, denoted FIP and described in Figure 2.
The algorithm In the first round, each process sends its initial valuev
i
to all processes (including to itself). The algorithm then constructs an array inpi
1
::n]
containing the incoming message from each of the processes (itself included). Ifpi
does not receive a message frompj
then it sets inpi
j]
to the default value?. In each of the later rounds, every processpi
first sends inpi
to all others, and then uses the incoming messages of the current round to construct an updated array inpi
in the same way as in the first round. The local state of the process at the end of roundr is identified simply with the contents of its array inpi
.This algorithm is introduced in order to establish, for each failure patternF, times at which simultaneous consensus cannot be reached by any algorithm whatsoever. Optimality will then be established by showing that the PROPOSE algorithm described in Figure 1 decides as soon as possible, for each and every possible failure pattern (and initial configuration).
algorithm FIP:
(01) forj2f
1
:::ngnfigdo inp ij
]
?end for; inp ii
]
vi; (02) whenr
= 1
2
:::do(03) begin round
(04) sendinpito all; % including to
piitself % (05) forj2f
1
:::ngdo(06) inpi
j]
message received fromp jduringrif any, otherwise? (07) end for
(08) end round
Figure 2: The full-information algorithm FIP(code forp
i
)
Observe that a deterministic algorithmP, an initial configurationI (set of initial values), and a failure patternF determine a run
=
P(
IF)
ofP.3In the knowledge terminology, process
qknows a factAat the end of roundrinif it is true of all runs 0 satisfying 0 r q ; it is common knowledge there ifAholds at all runs
0 r
. Thus, the set of runs connected todetermines what is common knowledge inat the end of roundr.
Definition 3 A runofP corresponds to a run of an algorithmP 0
if, for some initial configurationI and failure
patternF, it is the case that
=
P(
IF)
and=
P 0(
IF
)
.The next lemma shows, in a precise sense, that the connected components of the similarity graph for FIP refine those of any other deterministic algorithm.4
Lemma 6 LetP be a deterministic algorithm for simultaneous consensus. Let us assume that the runsand 0
ofP
correspond to the runs and 0 of FIP, respectively. Then
(
i)
ifr
q
0 thenr
q
0 , and(
ii)
ifr
0 thenr
0 .Proof Let us consider the runs and 0
ofP and the corresponding runs and 0
of FIP. Since
r
is the transitive closure of the relationsfr
p
gp
2, claim (ii) follows from (i). Consequently, it suffices to prove claim (i). Let us observe that, due to the definition of “corresponds to”, the fact thatcorresponds to implies thatSr] =
Sr]
for all roundsr. The proof thatr
q
0 )r
q
0 , is by induction onr.Base case. For the base case, let us consider the initial configuration that corresponds to the fictitious roundr
= 0
. Since the initial state of each process (underP as well as under FIP) is fully determined by its initial value, the fact that0
q
0
implies thatqhas the same initial state in both. Since corresponds to and 0
corresponds to 0
, it follows thatqhas the same initial value inand
0 , and so 0
q
0 .Induction case:r>
0
. Let us assume that item (i) holds for every process at roundr;1
(induction assumption). Moreover, let us assume thatr
q
0 . As before,and 0 correspond to and 0, respectively. We need to show that
r
q
0 .
Sinceqsurvives roundrin both the runs and 0
, and as this depends only on their failure patternsF andF 0
, which are also the failure patterns inand
0
, respectively, we haveq2S
r]
\Sr 0]
. Let us recall that the local state of a processqat the end of a roundrof a run of a deterministic algorithm such asP (namely, the local state ls
(
qr)
) is a function of its local state in roundr;1
(ls(
qr;1
)
) and the messages that it receives in roundr. We have to show that the local statesls(
qr)
andls(
qr0
)
are the same. Since
r
q
0
, it follows that the arrays inp
q
are the same in both the runs and 0at the end of roundr. So,
inp
q
q]
in both runs have the same value at the end ofr; letXbe that value. The fact thatq survives roundr in both and0
means, in particular, that it receives its own roundrmessage sent at line 04 of FIP in both and
0
. The value of this message in is ls
(
qr;1
)
which is the value ofinp
q
q]
at the end ofr, i.e.,ls(
qr;1
) =
X. A similar reasoning shows thatls(
qr;1
0) =
X. It follows fromls(
qr;1
) =
ls(
qr;1
0) =
X thatr
;1q
0and by the inductive hypothesis we obtain
r
;1
q
0 . Consequently,qhas the same local state at the end of roundr;
1
in bothand0
, i.e.,ls
(
qr;1
) =
ls(
qr;1
0)
. It remains to show thatqreceives exactly the same messages during roundrin bothand
0
. Suppose thatq receives messagefrom processqbin roundrof. It follows from the failure patternF that the message sent byqbto qduring roundris received byq. Since corresponds to and in FIPprocessqbsends messages to all processes in every round, we have thatqreceives a message fromqbin roundrof as well. As above (case of the message that, at each round, a process sends to itself), this message in containsls
(
bqr;1
)
(the local state ofqbat roundr;1
of the run ). Fromr
q
0
we have thatqreceives the same message fromqbin 0
. As a result, we have that
r
;1 bq
0 , and by the inductive assumption forr;1
andqbwe obtain thatr
;1b
q
0
. Since the messageis determined byP as a function of the local state ofqbatr;
1
in, we have that the same messageis also sent bybqtoqin roundrof0 . As the roundrmessage sent byqbtoqin
0
is received byq, and 0
corresponds to 0
, we obtain thatqreceivesfrom b
qin 0
as well. It follows that every message received byqin roundrofis received by it in the same round of 0
. By symmetry, the messages received in
0
are also received in. Finally, sinceqhas the same local state in roundr;
1
ofand0
(namely,ls
(
qr;1
) =
ls(
qr;1
0)
), and receives the same messages in roundrof bothand 0 , we obtain that
r
q
0, which concludes the proof of the lemma. 2
Lemma
64In the sequel, we interpretr and
r
among runs of an algorithmPin terms of the similarity graphG(rP)defined on the runs ofP. Thus, the interpretation ofr
and r
in statements such as that of Lemma 6 is always with respect to the algorithm generating the related runs.
On failure patterns and full-information algorithms Observe that in both the FIPand PROPOSEalgorithms, a cor-rect process is required to send a message to each process in every round. As a result, in runs of both FIPand PROPOSE, a processpknows by the end of roundrthatqhas crashed if the failure patternFis such thatqhas crashed before it sent its roundrmessage top.
The setf
i
r]
of the processes thatrhas not heard from in roundrcan be directly computed from the local state in FIPasfpj
jinpi
j] =
?g. Sincepi
sends inpi
to all other processes, the setf0
i
r;1]
of processes thatpi
knows atr to have been discovered as crashed by roundr;1
is thus easily computed from the messages it receives in roundr. Since for corresponding runs of FIPand PROPOSEthese sets coincide (in fact, their values depend only on the failure pattern), we find it convenient to talk about the values offi
rF]
,f0
i
rF]
,CrF]
,DF]
, etc. for runs of FIPas well. Since the Validity property states that it is illegal to decidevin a run that does not containv as one of its initial values, Corollary 1 implies that it is impossible to decide onvas long as there is a connected run that does not containv as one of its initial values. In light of this, we can now show that the algorithm FIPreaches a decision as soon as it possibly can.5.3
Premature rounds
It has been shown in Theorem 2 that the algorithm PROPOSEdecides on a value in roundBH
=
t+1
;D. LetBH(
)
denote the value of the round numberBHof the the run=
PROPOSE(
IF)
. Recall that the value ofBHis solely a function of ’s failure patternF(and not of the initial configuration).Definition 4 A round`is premature
5in
Fif `<t
+ 1
;D=
BH(
)
for every run=
PROPOSE(
IF)
. Ashi
(
r+ 1) =
r+ (
t+ 1
;jf0
i
r]
j)
, andhi
(
r+ 1)
BH(
)
, this means that, for everyrsuch thatr+ 1
`, the propertyr+
t+ 1
;jCrF]
j>`holds. Notice that the failure patternFthat occurs during the run determines whether or not`is premature: In all runs of PROPOSEwith the sameF the setsfi
rF]
andCrF]
are the same for everyiandr, and so areDandBH.Lemma 7 Lett <n;
1
,`0
,=
FIP(
IF)
and0
=
FIP(
IF 0)
. If`
0then`is premature inF iff `is
premature inF 0
.
Proof Let
=
FIP(
IF)
and0
=
FIP
(
I 0F 0
)
. As in the proof of item (ii) of Lemma 6, it suffices to show that, for all q, if
`
q
0
then` is premature inF iff` is premature in F 0
. Thus, let us assume that
`
q
0 . Let=
PROPOSE(
IF)
and 0=
PROPOSE(
I 0 F 0)
be the runs of PROPOSEcorresponding to the runs
=
FIP(
IF)
and 0=
FIP(
I 0 F 0)
, respectively.By item (ii) of Lemma 6, it follows that
`
q
0
. Round`is premature inFiffBH
(
)
>`, which by Theorem 2 implies that q does not decide in by the end of round`. Thus, since the if test on line 10 of PROPOSE fails,decided
q
=
falsecontinues to hold in. The fact that`
q
0
implies thatdecided
q
=
falseholds at the end of the round`of0
as well. It follows thatBH
(
0)
>`, and consequently`is premature atF 0
, as desired. The ‘only-if’ direction of the lemma is obtained by a symmetric argument. 2
Lemma
7
Definition 5 A process is silent in a roundrof a run if it has crashed before sending its roundrmessages.
Definition 6 Given a failure patternF, a processqand a round r, letF
qr
be the failure pattern that satisfies thefollowing four conditions:
(
i)
Fqr
coincides withF for the firstr;1
rounds,(
ii)
in roundr exactly the failuresdetected inC
rF]
occur inFqr
,(
iii)
processqis silent from roundr+ 1
on, and(
iv)
no process other thanqfailsafter roundr.
Lemma 8 If`is premature inFandk<`, then no more thantprocesses crash inF
qk
.Proof Let H
rF] =
r+
t+ 1
;jCrF]
j. Let us observe that the number of processes that crash in Fqk
is at mostjCkF]
j+ 1
. It suffices to show thatjCkF]
j < t. As `is premature andk < `, we have HkF] =
5This is short for premature for simultaneous consensus, a term that will be justified by the technical analysis in this section.
k
+
t+ 1
;jCkF]
j>`. Sincek<`we have that`k+ 1
. We thus obtain thatt+
k+ 1
;jCkF]
j>k+ 1
, which implies thatt;jCkF]
j>0
, andtjCkF]
j+ 1
, as desired.2
Lemma
8Definition 7 Given a run
=
FIP(
IF)
, letqk
be the runqk
=
FIP(
IFqk
)
.Let us notice that, due to Lemma 8, if is a run of FIPin which at mosttprocesses fail, then so is
qk
.Lemma 9 Lett <n;
1
and fix` >0
. Moreover, let=
FIP(
IF)
and letq 2 . If`is premature inF, then`
qk
for allksatisfying1
k `.Proof Let ` >
0
. We prove the lemma for all runs , by induction ond=
`;k. For the base case, assume that d= 0
, and so k=
`. Choose q 2 , and let p 2 S`]
. Since` is premature in , we havehp
(
`) =
(
`;1) +
t+ 1
;jfp
`;1
F]
j>`, i.e.,jfp
`;1
F]
j t;1
n;3
. Letp 02S
`]
nfpg(such a processp 0is guaranteed to exist sincet n;
2
).Let 0
be the (prefix of a) run identical to up to and including round`;
1
, and in round`processpreceives the same messages as in , but process p0
(who is non-faulty in 0
too) receives messages from all processes in
nfp
`;1
F]
. There are exactlyjfp
`;1
F]
j<tfailures in 0andp2S
` 0]
has the same local state at the end of round`in both and
0 . Hence, we have
`
p
0 . Ifqis silent in round`in 0then we are done. Otherwise, let 00
be a run that is the same as 0
except thatqcrashes in round`by not sending a round`message top. At mostjf
p
`;1
F]
j+ 1
t;1 + 1 =
tprocesses fail in00 , and p
0
has the same state in 0 as in 00 . Hence, 0
`
p
0 00. Finally, observe that
q`
;1is identical to
00
except thatqis silent in round`. Processpdoes not distinguish
00
from
q`
;1since in both it receives the same messages in round `. Thus, 00
`
p
q`
;1, and by definition of`
we have that`
q`
;1, completing the base case.Induction step. Letk<`and assume that the lemma holds for roundk
+1
in all runs in which`is premature. We prove the lemma for roundk. Let be a run in which`is premature, and choose an arbitrary processq2. We will use the induction assumption to find a connected run that coincides with for the firstk;1
rounds where no process crashes in roundk. Ifqis silent in this run, then it will beqk
as desired. Otherwise, we use the induction assumption again to show that this run is connected toqk
, as desired.Let 0
be a run that coincides with for the firstk;
1
rounds where no process crashes in roundk, and processpn
is silent from roundk+ 1
on. We show that`
0
. Define 1
:::
n
wheren=
jjto be runs such that inj
the firstk;1
rounds are identical to , processpj
is silent from roundk+ 1
on, no process other than (possibly)pj
fails in roundsk+ 1
:::`, and no new failure in roundkis seen by processesp1 :::p
j
. Denoting=
0 , we prove by induction onjthat`
j
. The casej= 0
is immediate, since0
=
. Letj >
0
and assume inductively that`
j
;1
. Since`is premature inF
=
F(
, it follows from Lemma 7 that`is premature inj
;1. By the induction assumption fork
+ 1
we have thatj
;1
`
b , where b
= (
j
;1)
p
jk
. In particular, Lemma 8 implies that there are at mosttfailures in b. Observe that (i)p
j
is silent from roundk+ 1
in b, (ii) every process other thanpj
has the same local state at the end of roundkin both bandj
, and (iii) the same messages are sent from roundk+ 1
in both runs, and (iv) since no more processes fail inj
than do in b, there are at mosttfailures inj
. It thus follows that b`
j
, completing the induction argument. We conclude that`
n
=
0 , as claimed. Observe that 0coincides with for the firstk;
1
rounds and no process crashes in roundk. Otherwise, define1
:::
n
to be runs of FIPsuch that inj
the firstk;1
rounds are identical to 0, processp
j
is silent from round k+ 1
on, no process other than (possibly)pj
fails in roundsk+ 1
:::`, and roundkis identical to0
except that processqcrashes in roundkand does not send messages top
1 :::p
j
. Defining 0=
0 an induction argument identical to the one in the previous paragraph shows that0
`
n
. Notice thatqis silent from roundkinn
. Finally, it follows from the induction hypothesis fork+ 1
thatn `
nqk
+1
=
qk
, which completes the proof of the lemma. 2Lemma
9
Based on Lemma 9, we can modify the initial configuration arbitrarily as long as the round number`is premature. Specifically, if there are at least two possible initial values, and alljVj
n
initial configurations are possible, then simul-taneous agreement onv is impossible: The current run is still connected inG(`)to a run in which all initial values arev6
=
v:6
Lemma 10 Lett;
1
<n, and letvv2V be distinct initial values (i.e.,
v6
=
v). Let=
FIP(
IF)
and let`be apremature round inF. There is a run 0
all of whose initial states are
v, such that
`
0
.
Proof Let
=
FIP(
IF)
and let`be a premature round inF. Thanks to Lemma 9 we can silence the processes one by one from the first round, and change their initial values tov. Let us then define
1
:::
n
to be the runs of FIP such that inj
=
FIP(
Ij
Fp
j
1
)
, whereI
j
coincides withI on the initial values ofpj
+1:::p
n
, andvi
=
vfor all i j, and whereFp
j
1is obtained from Definition 6 with
q
=
pj
andr= 1
. Moreover, denote0
=
. We prove by induction onj that
`
j
. Forj= 0
the claim is trivial, since=
0. Assuming that the claim is true forj, we show that it holds forj+ 1
. Since`is premature inF and`
j
, we have by Lemma 7 that`is premature inFp
j
1, which is the failure pattern of
j
. Instantiating Definition 7 withq=
pj
+1and
k
= 1
, let us define b
= (
j
)
p
j+11. By Lemma 9 we have that
j `
b. Notice that bdiffers fromj
+1only in the initial state ofp
j
+1. Sincepj
+1is silent in b
andt<nit follows that b
`
j
+1
. By transitivity ofwe obtain that
`
j
+1
, completing the induction step. We finally obtain that
`
n
. Since all initial values inn
arev, the lemma holds for
0
=
n
. 2Lemma
10
5.4
Optimality of the proposed algorithm
Lemma 10 combined with Corollary 1 prove that no deterministic algorithm can decide in a run with failure patternF at a round that is premature in F. Since we have shown in Theorem 2 that PROPOSE is guaranteed to decide in roundBH
=
BHF]
we can conclude that PROPOSEis an optimal algorithm for simultaneous consensus. Hence the following theorem.Theorem 3 LetP be a deterministic algorithm for simultaneous consensus. Let be a run ofP and let be a run
of PROPOSEcorresponding to. If the correct processes decide in roundrinand in roundkin , thenk r.
6
A few concluding remarks
On the message size of the algorithm PROPOSE Letbbe the number of bits required to encode a proposed value. The size of each message is consequentlyb
+
nbits. Let us also observe that a processpi
can send a default value? instead ofesti
when its value is the same as in the previous round. This can reduce the size of some messages when bis big. Another simple improvement to reduce the message size consists in sending, at each roundr, the differential valuefi
r;1] =
fi
r;1]
;fi
r;2]
instead offi
r;1]
(fi
;1]
being initialized to).On optimality Usually an algorithm is said to be optimal when its performance matches the worst-case lower bound (given a failure model, every execution behaves at least as well as the worst case execution must). Here, the optimality notion is stronger. More precisely, whent < n;
1
, the proposed algorithm is optimal in the sense that, for every particular pattern of failures, no other algorithm can decide in fewer rounds in an execution with the same pattern. So optimality for our algorithm is in each and every run, rather than in the worst case run.It is shown in [4] that the bound on the minimal number of rounds required for a simultaneous decision is
(
t+1)
;D whent <n;1
, andt;Dwhent=
n;1
. The algorithm presented in Figure 1, that is optimal whent <n;1
, can be easily modified to be optimal also whent=
n;1
. Essentially, whent=
n;1
a process that sees that all others have crashed can immediately decide, and be guaranteed not to violate simultaneity since it is the only process remaining.6In the knowledge-theoretic terminology, the lemma claims that the existence of an initial value of
vamong the run’s initial values is not common knowledge as long as the round is premature.
Circumventing the lower bound An interesting open issue is the following: “Are there additional assumptions that would allow to circumvent the lower boundR S
tF
= (
t+ 1)
;D?” Recall that we have used the fact that alljVjn
initial configurations are possible, as are all ways in whichtor less processes can crash. Better performance should be attainable if we assume that the set of initial conditions is more restricted. This is the condition-based approach introduced in [13], and used in [14] to characterize the sets (each set is characterized by a degreed) of the input vectors for which themin(
t+1
f+2)
(non-simultaneous) consensus lower bound can be bypassed (it is shown that no more thanmin(
t+ 1
f+ 2
d+ 1)
rounds are necessary when the input vector belongs to a set characterized by the degree d).References
[1] Attiya H. and Welch J., Distributed Computing: Fundamentals, Simulations and Advanced Topics (2nd Edition), Wiley Interscience, 414 pages, 2004.
[2] Dolev D., Reischuk R. and Strong R., Early Stopping in Byzantine Agreement. Journal of the ACM, 37(4):720-741, April 1990.
[3] Dwork C. and Moses Y., Knowledge and Common Knowledge in a Byzantine Environment: Crash Failures. Information and
Computation, 88(2):156-186, 1990.
[4] Fagin R., Halpern J.Y., Moses Y. and Vardi M.Y., Reasoning about Knowledge, MIT Press, 533 pages, 2003.
[5] Fischer M.J. and Lynch N., A Lower Bound for the Time to Assure Interactive Consistency. Information Processing Letters, 71:183-186, 1982.
[6] Fischer M.J., Lynch N. and Paterson M.S., Impossibility of Distributed Consensus with One Faulty Process. Journal of the
ACM, 32(2):374-382, 1985.
[7] Garg V.K., Elements of Distributed Computing, Wiley, 423 pages, 2002.
[8] Halpern J.Y. and Moses Y., Knowledge and Common Knowledge in a Distributed Environment. Journal of the ACM, 37(3):549-587, 1990.
[9] Lynch N.A., Distributed Algorithms. Morgan Kaufmann Pub., San Francisco (CA), 872 pages, 1996.
[10] Mizrahi T. and Moses Y., Continuous Consensus via Common Knowledge. Distributed Computing, 20(5):305-321, 2008. [11] Mizrahi T. and Moses Y., Continuous Consensus with Failures and Recoveries. Unpublished manuscript, 2008.
[12] Moses Y. and Rajsbaum S., A Layered Analysis of Consensus. SIAM Journal of Computing, 31(4):989-1021, 2002. [13] Most´efaoui A., Rajsbaum S. and Raynal M., Conditions on Input Vectors for Consensus Solvability in Asynchronous
Dis-tributed Systems. Journal of the ACM, 50(6):922-954, 2003.
[14] Most´efaoui A., Rajsbaum S. and Raynal M., Synchronous Condition-Based Consensus. Distributed Computing, 18(5):325-343, 2006.
[15] Pease L., Shostak R. and Lamport L., Reaching Agreement in Presence of Faults. Journal of the ACM, 27(2):228-234, 1980. [16] Raynal M., Consensus in Synchronous Systems: a Concise Guided Tour. Proc. 9th IEEE Pacific Rim Int’l Symposium on
Dependable Computing (PRDC’02), IEEE Computer Press, pp. 221-228, 2002.
[17] Wang X., Teo Y.M. and Cao J., A Bivalency Proof of the Lower bound for Uniform Consensus. Information Processing
Letters, 96:167-174, 2005.