Master Recherche en Informatique - November 2010

Fault-Tolerance: Agreement Problems in Distributed Asynchronous Systems

Achour Mostéfaoui

Irisa/Ifsic, Université de Rennes achour@irisa.fr

http://www.irisa.fr/asap/

Checkpointing Distributed Computations 1

Some Failure Types

• Software errors

• Process failures

⋆ Crash failure

⋆ Send/Receive omission

⋆ Arbitrary (Byzantine)

• Link failures

⋆ Omission/duplication failure

• Clock/Performance failures

Asynchronous Distributed Systems

• A set Π of n processes: p1, p2, ..., pn

• A reliable communication network

• No bound on message transfer delays

• No upper bound on the time required by a process to execute a step

• Failures: At most t processes may crash

How to Tolerate Failures?

• Debugging/Validation.

• Duplication of processors and memories.

• Fault-tolerant software.

⋆ Fault-tolerant services (Consensus, NBAC, etc.)

⋆ Checkpointing/Rollback-Recovery.

Fault-Tolerant Services: Agreement Services

• Try to keep computing and make consistent decisions even though some processes have crashed or are slow.

There is a need for agreement services.

• Consensus

• Atomic broadcast

• Non blocking atomic commit

• Election

• Renaming, etc.

The Consensus Problem

Each process pi proposes a value vi and tries to decide.

• Termination: Every correct process eventually decides some value.

• Validity: If a process decides v, then v was proposed by some process.

• Agreement: No two correct processes decide differently.

• Uniform Agreement: No two (correct or not) processes decide differently.
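The four properties can be read as predicates over one finished run. The sketch below is only an illustration: the function name `check_consensus` and the dictionary-based encoding of a run are assumptions, not part of the slides. Termination is a liveness property, so on a finite run the check can only say whether every correct process had decided by the time the run was cut off.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(
    proposals: Dict[int, Hashable],
    decisions: Dict[int, Optional[Hashable]],
    correct: Set[int],
) -> Dict[str, bool]:
    """Evaluate the consensus properties on one finished run.

    proposals[p] is the value proposed by process p.
    decisions[p] is the value decided by p, or None if p never decided.
    correct is the set of processes that did not crash.
    (This encoding is a hypothetical one chosen for the example.)
    """
    decided = {p: v for p, v in decisions.items() if v is not None}
    proposed = set(proposals.values())
    return {
        # Every correct process has decided (checked at the end of the run).
        "termination": all(decisions.get(p) is not None for p in correct),
        # Every decided value was proposed by some process.
        "validity": all(v in proposed for v in decided.values()),
        # Correct processes decided at most one distinct value.
        "agreement": len({v for p, v in decided.items() if p in correct}) <= 1,
        # All processes, crashed or not, decided at most one distinct value.
        "uniform_agreement": len(set(decided.values())) <= 1,
    }
```

For example, a run in which p3 decides a different value and then crashes satisfies Agreement but violates Uniform Agreement.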

The Main Theoretical Result

Fischer-Lynch-Paterson’s Impossibility Result (1985)

There is no deterministic protocol that solves the consensus problem in an asynchronous system that is subject to even a single process crash failure.

Too bad: this result extends to many other agreement problems.

How to Circumvent the Impossibility Result?

• Randomized Protocols

⋆ The termination property becomes: With probability 1, every correct process eventually decides.

• Equip the system with Additional Properties:

⋆ Partial synchrony

⋆ Failure Detectors

An always Safe Consensus Protocol (t < n/2)

pi initially proposes vi

repeat

• send my value to all processes

• wait for n−t values from different processes

• if all received values are equal to the same value v, send v to all processes; otherwise send ⊥

• wait for n−t values from different processes

⋆ if all received values are equal to v: decide v

⋆ if all received values are equal to ⊥: adopt any of the proposed values

⋆ if v and ⊥ are both received: adopt v

endrepeat
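One round of the loop above can be simulated in a few lines. This is a minimal sketch under stated assumptions, not the author's implementation: asynchrony is modelled by letting each receiver hear from an arbitrary set of n−t senders, crashes are not modelled separately (a crashed process behaves like one that nobody hears from), and the names `BOTTOM`, `sample_quorum` and `run_round` are hypothetical.

```python
import random
from typing import List, Optional, Tuple

BOTTOM = "⊥"  # marker for "no single value seen"; proposals are assumed to be ints


def sample_quorum(values: List, t: int, rng: random.Random) -> List:
    """Return the values of n - t arbitrarily chosen senders."""
    n = len(values)
    senders = rng.sample(range(n), n - t)
    return [values[s] for s in senders]


def run_round(
    estimates: List[int], t: int, rng: random.Random
) -> Tuple[List[int], List[Optional[int]]]:
    """Run one round; return (new estimates, decisions), None meaning undecided."""
    n = len(estimates)
    assert 2 * t < n, "the protocol requires t < n/2"

    # First exchange: send v if all n - t received values equal v, else ⊥.
    aux = []
    for _ in range(n):
        received = sample_quorum(estimates, t, rng)
        aux.append(received[0] if len(set(received)) == 1 else BOTTOM)

    # Second exchange: decide, adopt v, or adopt any proposed value.
    new_estimates: List[int] = []
    decisions: List[Optional[int]] = []
    for _ in range(n):
        received = sample_quorum(aux, t, rng)
        non_bottom = {v for v in received if v != BOTTOM}
        if not non_bottom:
            # Only ⊥ received: adopt any of the values currently proposed.
            new_estimates.append(rng.choice(estimates))
            decisions.append(None)
        else:
            # With t < n/2 at most one non-⊥ value can circulate.
            (v,) = non_bottom
            new_estimates.append(v)
            decisions.append(v if BOTTOM not in received else None)
    return new_estimates, decisions
```

Calling `run_round([1, 1, 1, 1, 1], 2, random.Random(0))` makes every process decide 1, since every set of n−t values is unanimous. With mixed proposals the outcome depends on which senders each process hears from.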

A Few Features of this Algorithm

• If any process decides v during a round, then all processes that end that round will either decide v or adopt v.

• If all processes start a round with v then all processes that end the round will decide v (this is not the only case where processes can decide).

This is known as the Abort/Commit algorithm.
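Both features rest on a counting argument that the slides leave implicit. A short derivation, where $Q_1$ and $Q_2$ are notation introduced here for any two sets of $n-t$ processes a receiver may hear from:

```latex
\[
  |Q_1 \cap Q_2| \;\ge\; |Q_1| + |Q_2| - n \;=\; 2(n-t) - n \;=\; n - 2t \;>\; 0
  \qquad \text{since } t < n/2 .
\]
```

Any two such sets therefore share at least one process. In the first exchange this means two processes cannot see unanimous sets for two different values, so at most one non-⊥ value v is sent in the second exchange. In the second exchange, a process that decides v received v from n−t processes, so every other process that ends the round receives at least one v and adopts or decides it.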

Does this Algorithm Work?

• It is always safe but does not always terminate

This algorithm terminates with a very high probability.

• If the forever loop is changed to a fixed number of rounds, the algorithm will always terminate but safety may be violated

The new algorithm ensures safety with a very high probability

How to get a Correct Algorithm?

• Synchrony properties: for example, if the system is eventually synchronous

During a reception phase, a process waits for n−t messages and for at least a timeout that increases from round to round.

• Randomization: replace the statement “any of the proposed values” in the algorithm by the statement “a random value”.
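The randomized variant can be sketched as a complete loop. The code below is an illustration only, with several assumptions that are not in the slides: values are binary (so that a coin flip is always a legitimate proposal when both 0 and 1 were proposed), each receiver hears from an arbitrary set of n−t senders, crashes are not modelled, and the names `quorum` and `randomized_consensus` as well as the `max_rounds` cap are hypothetical.

```python
import random
from typing import List, Optional, Tuple

BOTTOM = "⊥"


def quorum(values: List, t: int, rng: random.Random) -> List:
    """Values of n - t arbitrarily chosen senders (models asynchrony)."""
    n = len(values)
    return [values[s] for s in rng.sample(range(n), n - t)]


def randomized_consensus(
    proposals: List[int], t: int, rng: random.Random, max_rounds: int = 1000
) -> Tuple[List[Optional[int]], int]:
    """Repeat rounds until every process has decided or max_rounds is reached.

    Returns (decisions, number of rounds executed).
    """
    n = len(proposals)
    assert 2 * t < n, "the protocol requires t < n/2"
    est = list(proposals)
    decided: List[Optional[int]] = [None] * n

    for rnd in range(1, max_rounds + 1):
        # First exchange: forward v only if the received set is unanimous.
        aux = []
        for _ in range(n):
            got = quorum(est, t, rng)
            aux.append(got[0] if len(set(got)) == 1 else BOTTOM)

        # Second exchange: decide, adopt v, or flip a coin.
        for i in range(n):
            got = quorum(aux, t, rng)
            seen = {v for v in got if v != BOTTOM}
            if not seen:
                est[i] = rng.choice([0, 1])  # "a random value"
            else:
                (v,) = seen  # unique because t < n/2
                est[i] = v
                if BOTTOM not in got and decided[i] is None:
                    decided[i] = v

        if all(d is not None for d in decided):
            return decided, rnd
    return decided, max_rounds
```

The cap on rounds only keeps the simulation finite. The protocol itself has no such bound, which is why termination holds with probability 1 rather than within a fixed number of rounds.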

Byzantine Processes

• How does the proposed algorithm behave if the t faulty processes can exhibit a malicious behavior?

⋆ a malicious process can disseminate wrong information

⋆ a malicious process can send different values to different processes

⋆ the adversary can delay messages and/or processes

• A few examples

How to Deal with Byzantine Processes

• Adapt the specification of the problem

• Have smaller values for t than for crash failures

• Use many more messages

• Use certificates

• Use cryptography

• etc.

The Byzantine Consensus Problem

• Any solution to Byzantine consensus (even in synchronous systems) needs t < n/3.

Each process pi proposes a value vi and tries to decide.

• Termination: Every correct process eventually decides some value.

• Validity: If all correct processes propose v, then only v can be decided.

• Agreement: No two correct processes decide differently.
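To make the resilience bounds concrete, the small sketch below computes the largest number of faulty processes allowed by t < n/2 (the crash-tolerant protocol seen earlier) and by t < n/3 (Byzantine consensus). The function names are hypothetical and chosen only for this example.

```python
def max_crash_faults(n: int) -> int:
    """Largest integer t with t < n/2."""
    return (n - 1) // 2


def max_byzantine_faults(n: int) -> int:
    """Largest integer t with t < n/3."""
    return (n - 1) // 3


# With n = 3, one crash can be tolerated but no Byzantine process.
# With n = 4, one Byzantine process can be tolerated.
for n in (3, 4, 7, 10):
    print(n, max_crash_faults(n), max_byzantine_faults(n))
```

Put the other way round, tolerating t Byzantine processes takes at least 3t + 1 processes, against 2t + 1 for crash failures.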

Adapting the Previous Algorithm

• Consider the binary consensus

• t < n/5

• Replace the statement “any of the proposed values” in the algorithm by the statement “a random value” among 0 and 1.

• Replace the statement “all received values” by the statement “at least n−2t received values”.

We get the Byzantine randomized algorithm of Rabin.
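The threshold n−2t has a simple reading: among the n−t values a process waits for, up to t may come from Byzantine processes, so only n−2t are guaranteed to come from correct ones. The helper below sketches one possible reading of the modified test; the name `supported_value` and its signature are assumptions, not taken from the slides.

```python
from collections import Counter
from typing import Hashable, List, Optional


def supported_value(received: List[Hashable], n: int, t: int) -> Optional[Hashable]:
    """Return the value carried by at least n - 2t of the n - t received
    messages, or None if no value reaches that threshold.

    With t < n/5 the threshold n - 2t exceeds half of n - t, so at most
    one value can qualify.
    """
    assert 5 * t < n, "this variant assumes t < n/5"
    assert len(received) == n - t, "a process waits for exactly n - t values"
    value, count = Counter(received).most_common(1)[0]
    return value if count >= n - 2 * t else None
```

With n = 6 and t = 1, a process waits for 5 values and needs 4 matching ones: `supported_value([1, 1, 1, 1, 0], 6, 1)` returns 1, while `supported_value([1, 1, 1, 0, 0], 6, 1)` returns None.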
