• Aucun résultat trouvé

Structure du m´ emoire de th` ese

Dans le document Formal verification of the Pastry protocol (Page 38-42)

La suite de ce m´emoire est structur´ee comme suit. Le chapitre 2 donne un ´etat de l’art concernant les syst`emes P2P et les techniques de v´erification formelle n´ecessaire `a la compr´ehension de ce qui suit.

Le chapitre 3 introduit le protocole Pastry au niveau conceptuel, partant du pseudo-code pour CastroPastry, fond´e sur Castro et al. (2004), en passant par des dia-grammes de flot pour HaeberlenPastry, inspir´e par Haeberlen et al. (2005) et Ideal-Pastry en tant que premier mod`ele formellement v´erifi´e, et aboutissant `a LuPastry, introduit `a nouveau par du pseudo-code et des ´el´ements de sa preuve.

Au chapitre 4 nous introduisons le mod`ele formel de LuPastry et ses propri´et´es de correction. Le chapitre 5 illustre l’utilisation du model checker TLC pour la valida-tion des mod`eles et l’analyse de la propri´et´e de correction. Quelques contre-exemples illustratifs sont pr´esent´es en d´etail. La d´emarche de la preuve d´eductive est introduite au chapitre 6 et appliqu´ee `a la v´erification de HaeberlenPastry, IdealPastry et LuPastry.

Le chapitre 7 introduit d’autres syst`emes P2P r´ealisant des DHT et compare le travail men´e dans cette th`ese avec d’autres approches pour la v´erification de protocoles de r´eseaux.

Au chapitre 8 nous r´esumons cette th`ese en rappelant ses contributions, et nous ´enon¸cons quelques perspectives pour des travaux futurs.

1 Introduction

Pastry (Rowstron and Druschel (2001), Castro et al. (2004), Haeberlen et al. (2005)) is a structured P2P algorithm realizing a Distributed Hash Table (DHT , by Hellerstein (2003)) over an underlying virtual ring of nodes. Several implementations of Pastry are available and have been applied in practice, but no attempt has so far been made to formally describe the algorithm or to verify its properties. Since Pastry is a typical realization of a DHT which combines rather complex data structures, asynchronous communication, concurrency, resilience to churn, i.e. concurrent join and departure of nodes, it makes an interesting target for verification.

This thesis models Pastry’s core algorithms, which provide the correct lookup ser-vice in the presence of churn and maintain a local data structures to adapt the dynamic updates of neighborhood. This chapter starts with the motivation of the research inter-ests, then states explicitly the research goals and explains how these goals are achieved by the work. At the end, a structural guidance will be given for reading the thesis.

1.1 Motivation

In our day-to-day life, disruptions of software systems such as Skype (Microsoft (2013)) directly disturb everyone’s daily life when people cannot make calls, or calls are dropped in the middle of a conversation. Skype is known and used world-wide by billions of users for making phone-calls over Internet. On Thursday, 16th August 2007, the Skype peer-to-peer (P2P ) network became unstable and suffered a critical disruption despite of its peer-to-peer network with an inbuilt ability to self-heal. “This event revealed a previously unseen software bug within the network resource allocation algorithm which prevented the self-healing function from working quickly.” as reported in Arak (2007). Unfortunately on 23rdDecember 2010, millions of users could not make phone calls again due to the departure of several “super nodes”. Is there any fundamental problems of such network design that causes unexpected outage again and again? If not, is there a fundamental proof of its correctness?

Although it is not published how Skype uses the idea of P2P algorithm in its deploy-ments of services, there is no doubt that the scalability of Skype benefits significantly from the adoption of P2P networks, making it feasible to provide billions of phone calls simultaneously world-wide, according to Microsoft (2013). P2P systems have become popular since beginning of 21stcentury with their self-organization and decentralization properties. In particular, structured P2P systems implement Distributed Hash Tables (DHTs), which are supposed to provide

1 Introduction

• distributed maintenance of structure with low costs;

• and resilience to concurrent joins and departures of network members.

For these reasons, DHT is typically used in large-scale distributed systems such as Dynamo (DeCandia et al. (2007)), a storage substrate that Amazon uses internally for many services. Skype may implement DHT for its peers to find each other correctly and efficiently.

However, the correctness of a DHT system relies heavily on its realizing algorithms. Pastry is one of the successful realization1 of DHT . It realizes DHT by mapping object keys (e.g. identifier for a piece of distributed data) to overlay nodes (e.g. the computer connected to the Internet) and offers a lookup primitive to route a message to the node responsible for a key.

Questions About Pastry

Since Pastry implements all properties of a DHT , it is interesting and crucial to under-stand the fundamental mechanisms of Pastry and to analyze and prove its correctness properties. Castro et al. (2004) introduces Pastry on the message level using pseudocode. Starting from this paper, it will be interesting to know the insights of Pastry:

• How does Pastry work, in particular, how does the protocol realized the DHT ? • What does “dependable routing” formally mean? Does Pastry guarantee

depend-able routing? If so, to what extent, and how? Is there any fundamental design flaw in Pastry, in particular Castro et al. (2004), with respect to its correctness properties, such as dependable routing? If yes, how does the problem occur? If not, is there a formal proof?

• Are there other interesting properties of such a system implementing a DHT ? Are they interrelated?

How does Pastry Work

In Pastry, the overlay nodes are assigned logical identifiers from an Id space of naturals in the interval [0, 2M − 1] for some M . The Id space is considered as a ring as shown in Figure 1.1, i.e. 2M − 1 is the neighbor of 0.

The Ids are also used as object keys, such that an overlay node is in particular responsible for keys that are numerically close to its Id, i.e. it provides the primary storage for the hash table entries associated with these keys. Key responsibility is divided equally according to the distance between two neighbor nodes. If a node is responsible for a key we say it covers the key, as illustrated in Figure 1.1.

1

This thesis distinguishes the level of abstraction of an algorithm. Therefore, “realization” is used to described the refinement from DHT to a real algorithm Pastry and “implementation” is used to describe different versions of this algorithm. Both of the words describe different level of details of design and neither of these two words are used in the meaning of executable software.

1.1 Motivation

Figure 1.1: Pastry ring.

The most important sub-protocols of Pastry are join and lookup. The join protocol eventually adds a new node with an unused network Id to the ring. The lookup protocol delivers the hash table entry (the responsible node) for a given key. Pastry maintains its correct key mapping and it is supposed to provide the correct lookup service in the presence of churn, i.e. spontaneous join and departure of nodes.

Since neighbors of a Pastry node are changing all the time due to churn, they are locally decided based on the leaf sets of the node, a local state that each Pastry node maintains. Leaf sets, as shown in Figure 1.1, consist of a left set and a right set of the same length which is a parameter of the algorithm. The nodes in leaf sets are updated when new nodes join or failed nodes are detected using maintenance protocol. In order to achieve efficient routing, a Pastry node also maintains a routing table to store more distant nodes. In the example of Figure 1.1, node a received a lookup message for key k . The key is outside node a’s coverage and furthermore, it doesn’t lie between the leftmost node and the rightmost node of its leaf sets. Querying its routing table, node a finds node b, whose identifier matches the longest prefix with the destination key and then forwards the message to that node. Node b repeats the process and finally, the lookup message is answered by node c, which is the closest node to the key k , i.e. it covers key k . In this case, we say that node c delivers the lookup request for key k .

Result of This Thesis

This thesis separates different statuses (“dead”, “waiting”, “ok” and “ready”) of an overlay node according to its stage during join and readiness for delivering a key. Only “ready” nodes are supposed to have consistent key mapping among each other and are allowed to deliver a message. This thesis defines the “dependable routing” as the

1 Introduction

correctness property CorrectDelivery, requiring that there is always at most one node that can deliver a lookup request for a key. This property is non-trivial to obtain in the presence of churn. Using formal method, this thesis proves that Correct Key Delivery holds for all “ready” nodes under assumption that no nodes leave the network, despite concurrent joins of new nodes in arbitrary region. The assumption is not further relaxed due to possible separation of network problem caused by particular departure of nodes.

Dans le document Formal verification of the Pastry protocol (Page 38-42)

Documents relatifs