
Laboratoire de l’Informatique du Parallélisme

École Normale Supérieure de Lyon
Unité Mixte de Recherche CNRS-INRIA-ENS LYON-UCBL no 5668

A Repair Mechanism for Fault-Tolerance for Tree-Structured Peer-to-Peer Systems

Eddy Caron, Frédéric Desprez, Charles Fourdrignier, Franck Petit, Cédric Tedeschi

October 2006

Research Report No 2006-34

École Normale Supérieure de Lyon
46 Allée d’Italie, 69364 Lyon Cedex 07, France
Phone: +33(0)4.72.72.80.37
Fax: +33(0)4.72.72.80.80
E-mail: lip@ens-lyon.fr


Abstract

Facing the limits of traditional resource management tools within computational grids (related to the scale, dynamicity, etc. of the newly considered platforms), new approaches based on peer-to-peer technologies are emerging. Resource discovery, and service discovery in particular, is concerned by this evolution. Among the solutions, a promising one is the indexing of resources using trie structures, and more particularly prefix trees. The major advantage of trie-structured approaches is the ability to support search queries on ranges of values with a latency growing logarithmically in the number of nodes in the trie. These techniques are easy to extend to multicriteria searches. One drawback of tries is their inherently poor robustness in a dynamic environment, where nodes join and leave the network: the tree may split into a forest, which makes it impossible to route requests. In most recent approaches, fault-tolerance is a preventive mechanism, often replication-based. Replication can be costly in terms of required resources. In this paper, we propose a fault-tolerance protocol that reconnects subtrees a posteriori, after crashes, to obtain a connected graph again, and then reorders the nodes to rebuild a consistent tree.

Keywords: Fault tolerance, peer-to-peer, prefix trees

[…] computational grids (poor scalability, network dynamicity not taken into account, etc.), alternatives based on peer-to-peer technologies are emerging. The discovery of resources, and in particular of computing services, is affected by this evolution. Among these solutions, there are promising approaches based on lexicographic trees. The appeal of such approaches lies in the ability to run queries over ranges of values, as well as to perform automatic completion of search strings, in time logarithmic in the size of the tree. These techniques extend easily to multicriteria searches. However, the tree structure is fragile and can break up into a forest if one of the nodes leaves the network, making it impossible to route some requests and offering the client only a partial view of the services. In most of these approaches, fault tolerance, which is indispensable in large-scale dynamic environments, is preventive (carried out a priori) and relies on replication, which is costly in terms of resources and time. In this paper, we present a protocol that tolerates node failures in lexicographic trees and is complementary to replication. It is based on the a posteriori reconnection and repair of trees that have lost one or more nodes.

Keywords: Fault tolerance, peer-to-peer, prefix trees

1 Introduction

The last few years have seen the development of large-scale grids connecting distributed resources (computation resources, storage facilities, computation libraries, etc.) in a seamless way. Such grids are now an efficient alternative to supercomputers for solving large problems in areas such as high energy physics, simulation, bioinformatics, etc. However, existing grid middleware usually requires a stable and centralized infrastructure, and tends to lose performance on dynamic, large-scale platforms without centralized resource management. To cope with the characteristics of these emerging platforms, it has been suggested to use peer-to-peer technologies within computational grids [8].

Peer-to-peer technologies offer algorithms for searching and retrieving objects over the net (data items, files, services, etc.). Among these technologies, Distributed Hash Tables (DHTs) were initially designed for very large-scale platforms, for example to share files over the Internet. However, DHTs have several major drawbacks. Among them, their discovery mechanism usually only supports exact searches on a given key. Some work has therefore been done to allow complex requests to be submitted over DHTs or, more generally, in structured peer-to-peer systems, i.e., systems based on request routing. Some of this work is based on tries (also called prefix trees). A trie structure supports range queries in time logarithmic in the number of nodes of the trie.

Fault-tolerance is a mandatory feature of peer-to-peer systems, both to avoid the loss of data stored on nodes and to allow correct routing of messages. The crash of one or several nodes in a trie leads to the loss of the object references stored in the trie and to the split of the trie into several subtries, also called a forest. Fault-tolerance within structured peer-to-peer systems usually relies on replication. With such an approach, each node and each link of the trie would have to be duplicated $k$ times, $k$ being the replication factor. Maintaining such a structure is costly, mainly in terms of the resources used. The problem then becomes finding the value of $k$ that gives the right trade-off between the replication cost and the robustness of the system. In this paper, we study an alternative to replication, based on the reconnection of the subtries and the a posteriori reordering of the nodes into a consistent trie. When the trie is disconnected, a first solution consists in rebuilding a trie by adding the nodes of the remaining subtries one by one. This naive method can lead to a prohibitive cost when the number of remaining nodes is large (which is usually the case in peer-to-peer systems). For example, losing one node can lead to a complete reconstruction of the trie. A second approach consists in reconnecting the subtries to get the original trie back at minimum cost. This is the kind of algorithm we describe in this paper, in a distributed and asynchronous environment. It can also be used to complement the replication process.

A brief history of peer-to-peer technologies is provided in Section 2, followed by the formal description of the particular trie structure we use and of the distributed system we consider (Section 3). We focus our study on the fault-tolerance mechanisms related to them. Then, in Section 4, we present the repair algorithm we designed and give its proof, before a section on conclusions and future work.

2 Related Work

With the spread of peer-to-peer technologies going along with file sharing over the […] of unstructured mechanisms, i.e., based on the flooding of search requests [10, 9]. These mechanisms resulted in overloading the network while providing non-exhaustive responses. Addressing both the scalability and the exhaustiveness issues within peer-to-peer systems, distributed hash tables [13, 14, 18, 20], a.k.a. the structured peer-to-peer group, are highly scalable in the sense that both the number of logical hops required to route a request and the local state grow logarithmically with the number of nodes participating in the system. Moreover, DHTs avoid losing routing paths and object references through replication and periodic scans. Unfortunately, DHTs present several major drawbacks (homogeneous capacity assumptions, topology awareness, etc.). Among them, the rigidity of the requesting mechanism, i.e., exact match on a given key, hinders their use in real search systems.

A series of works allows more flexible means of retrieval over structured peer-to-peer networks. A first achievement in this direction was the ability to describe resources with a semi-structured language such as XML, as described in [3]. [19] enhances DHTs with traditional database operations. Several approaches based on space-filling curves, such as Squid [15] or [17], support multi-dimensional range queries. [1] maps a one-dimensional data space to a d-dimensional Cartesian space using the inverse Hilbert mapping. Built on top of multiple DHTs, SWORD [11] is an information service aiming at discovering computing resources on the grid by answering multi-attribute range queries.

We focus in this work on trie-structured retrieval solutions, which also support range queries but outperform the previous approaches in the sense that logarithmic (or constant, if we assume an upper bound on the depth of the trie) latency is achieved by parallelizing the resolution of the query over the several branches of the trie. Prefix Hash Tree (PHT) [12] builds a trie of the entire key-space on top of a DHT. The purpose of this architecture is to use the trie as a logical layer allowing complex searches on top of any DHT-like network. The architecture of PHT results in the multiplication of the complexities of the trie and of the underlying DHT. The Skip Graphs structure proposed in [2] is similar to a trie but is built with the skip list technology, allowing the use of its inherent fault-tolerance properties. But again, the number of messages generated to process a range query is in $O(m \log(n))$, $m$ being the number of nodes covered by the range and $n$ the total number of nodes in the graph.

Other approaches propose to rely on a trie for every purpose, i.e., indexing the key-space, mapping the nodes of the trie onto the network, and routing the requests. Among them, NodeWiz [4] assumes a set of static reliable nodes to host the trie, which is unfortunately hard to ensure on peer-to-peer platforms. P-Grid [7] builds a trie on the whole key-space (i.e., the whole set of potential keys). Each leaf of this trie corresponds to a subset of the key-space. Fault-tolerance is achieved by probabilistic replication.

As a more general consideration, none of these approaches addresses the topology/physical locality awareness issue, i.e., no information about the underlying network is taken into account when building the logical (overlay) network. This can raise a significant performance problem, since physical locality is broken when the logical network is built. Moreover, the existing fault-tolerance solutions are mostly replication-based, or DHT-based, which also involves heavy replication mechanisms.

Initially designed for service discovery over dynamic computational grids, and attempting to solve the above drawbacks of existing approaches, a novel architecture that we recently developed is based on a logical Greatest Common Prefix Tree, formally described in Section 3, that is dynamically built as objects (services, but extensible to data items, files, […]

3 Preliminaries

Greatest Common Prefix Tree. Let an ordered alphabet $A$ be a finite set of letters, together with an order on $A$. A non-empty word $w$ over $A$ is a finite sequence of letters $a_1, \ldots, a_i, \ldots, a_l$, $l > 0$. The concatenation of two words $u$ and $v$, denoted $u \circ v$ or simply $uv$, is equal to the word $a_1, \ldots, a_i, \ldots, a_k, b_1, \ldots, b_j, \ldots, b_l$ such that $u = a_1, \ldots, a_i, \ldots, a_k$ and $v = b_1, \ldots, b_j, \ldots, b_l$. Let $\epsilon$ be the empty word, such that for every word $w$, $w\epsilon = \epsilon w = w$. The length of a word $w$, denoted by $|w|$, is equal to the number of letters of $w$; $|\epsilon| = 0$.

A word $u$ is a prefix (respectively, proper prefix) of a word $v$ if there exists a word $w$ such that $v = uw$ (resp., $v = uw$ and $u \neq v$). The Greatest Common Prefix (resp., Proper Greatest Common Prefix) of a collection of words $w_1, w_2, \ldots, w_i, \ldots$ ($i \geq 2$), denoted $GCP(w_1, w_2, \ldots, w_i, \ldots)$ (resp., $PGCP(w_1, w_2, \ldots, w_i, \ldots)$), is the longest prefix $u$ shared by all of them (resp., such that $\forall i \geq 1$, $u \neq w_i$). A [Proper] Greatest Common Prefix Tree ([P]GCP Tree, also a particular kind of trie) is a labeled rooted tree such that both of the following properties hold for every node of the tree:

1. The node label is a proper prefix of any label in its subtree;
2. The node label is the Proper Greatest Common Prefix of all its son labels.

In the following, we use the word trie to designate our PGCP tree.
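The definitions above can be illustrated with a minimal Python sketch. It assumes that words are plain strings and that a tree is given as a dictionary mapping a label to the list of its son labels. The function names `gcp`, `pgcp` and `is_pgcp_tree` are hypothetical helpers introduced here for illustration, and Property 2 is only checked for nodes with at least two sons, since the (P)GCP is defined for collections of at least two words.

```python
from os.path import commonprefix  # character-wise longest common prefix of strings


def gcp(*words):
    """Greatest Common Prefix: the longest prefix shared by all the words."""
    return commonprefix(list(words))


def pgcp(*words):
    """Proper GCP: the longest shared prefix that differs from every word."""
    u = gcp(*words)
    # If the GCP is itself one of the words, dropping its last letter gives
    # the longest shared prefix that is different from all of them.
    if u in words and u != "":
        u = u[:-1]
    return u


def is_pgcp_tree(sons, root):
    """Check both PGCP-tree properties on a tree given as {label: [son labels]}."""
    stack = [root]
    while stack:
        label = stack.pop()
        children = sons.get(label, [])
        # Property 1: the label is a proper prefix of each son label (and so,
        # by transitivity, of every label in its subtree).
        for child in children:
            if not (child.startswith(label) and child != label):
                return False
        # Property 2: the label is the PGCP of its son labels
        # (assumption: only checked when there are at least two sons).
        if len(children) >= 2 and pgcp(*children) != label:
            return False
        stack.extend(children)
    return True
```

For instance, with the binary labels `"101"`, `"1011"` and `"100"`, `gcp` returns `"10"`, whereas `gcp("101", "1011")` is `"101"` and `pgcp("101", "1011")` is `"10"`.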

Distributed Lexicographic Placement Table. The distributed system considered in this paper consists of a set of asynchronous physical nodes organized in a Distributed Hash Table (DHT). Each physical node maintains one or more nodes of the logical PGCP Tree. Note that a DHT is used, but it can be replaced by any system, distributed or not, allowing the retrieval of any node from any other node. We also consider that any fault-tolerance mechanisms provided by this layer are not used within our architecture. In this paper, we propose a fault-tolerance mechanism at the PGCP Tree level.

When one wants to insert an object labeled $o$ into the trie, a message containing $o$ is generated and routed within the trie according to $o$ until it reaches the node labeled $v$ such that $v$ is the smallest label in the trie sharing with $o$ the greatest common prefix that any node of the trie shares with $o$. More formally, if $L$ denotes the whole set of labels currently in the trie, consider the set $U = \{l \in L \mid GCP(l, o) = p\}$, where $p = \max_{|m|} \{m = PGCP(l, o),\ l \in L\}$. The label of the target node is $t = \min_{|w|} \{u \in U \mid u = pw\}$. Once found, the target node performs the insertion. If $t \neq o$, node(s) are created. If $o = tu$ ($u \neq \epsilon$), a new node labeled $o$ is created as a new son of the node labeled $t$. If $t = ou$ ($u \neq \epsilon$), a new node is created as the father of the node labeled $t$. Finally, if none of these conditions is satisfied, $o$ and $t$ must be siblings, but no node in the trie is labeled by their common prefix. Thus two nodes are created: a node labeled $GCP(o, t)$, father of the node labeled $t$, and the node labeled $o$, which is the other son of this new father. The distributed routing algorithm (which also performs the creation and the mapping of nodes) requires a number of hops bounded by twice the depth of the trie [5].
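The case analysis performed by the target node can be sketched in Python as follows. This is only a local illustration under the assumption that labels are strings and that the target label `t` has already been found by the routing; the function name `insertion_case` and the returned tags are hypothetical and do not come from the protocol itself.

```python
from os.path import commonprefix  # character-wise longest common prefix


def insertion_case(t, o):
    """Classify the insertion of object label o at the target node labeled t.

    Returns a pair (tag, label), where label is the label of the node that
    would be created (or t itself when nothing has to be created).
    """
    if o == t:
        # The label is already in the trie: no node is created.
        return ("exists", t)
    if o.startswith(t):
        # o = t u with u non-empty: o becomes a new son of t.
        return ("new_son", o)
    if t.startswith(o):
        # t = o u with u non-empty: o becomes the new father of t.
        return ("new_father", o)
    # Otherwise o and t are siblings: a node labeled GCP(o, t) is created as
    # their common father, together with the node labeled o.
    return ("new_siblings_under", commonprefix([o, t]))
```

For example, inserting `"100"` when the target is `"101"` falls into the last case and creates an intermediate node labeled `"10"`.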

Physical nodes communicate by message passing. We assume two sending functions. The former, simply referred to as SEND, is used by any physical node to send a message to another node asynchronously, i.e., without waiting for any acknowledgement. The latter, called SYNC-SEND, waits for an acknowledgement for each message sent. We assume that each physical […]

4 Protocol

In this section, we give a detailed explanation of how the protocol works. We divide the algorithm code into two parts. The first part covers the first phase of our technique, during which a unique trie is recovered without considering any lexicographic property. During the second phase, the trie is reorganized to eventually form a distributed greatest common prefix tree.

4.1 Trie Recovery

After a node $p$ detects the loss of its father ($p.father$), it searches for a new father to link to. By traversing the DHT, Node $p$ collects in Variable $PN$ the addresses of all remaining physical nodes. From the addresses in $PN$, $p$ builds $N$, the set of logical nodes stored by the physical nodes in $PN$. Next, using a PIF (Propagation of Information with Feedback) protocol [6, 16], $p$ computes $T$, the set of logical nodes in its subtrie, which is made of its real descendants and its temporarily relinked descendants. This first step of the recovery protocol ends when $p$ chooses a temporary father ($p.tmpfather$) in the subset $N \setminus T$. When a node $q$ is linked to a node $p$, $p$ considers $q$ as a temporary son, stored in $p.tmpsons$. Note that Variable $p.tmpsons$ is required to compute $T$ using a PIF in the subtrie of $p$. If $N \setminus T = \emptyset$ (i.e., there is no node to which $p$ may link), then $p$ is considered to be the root of the trie.
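The choice of the temporary father can be sketched as below, assuming that the sets $N$ and $T$ have already been collected and are available locally as Python sets of node identifiers. The function name `choose_tmp_father` and the `rng` parameter are hypothetical; the random choice mirrors the "random choice among N \ T" step of the recovery algorithm.

```python
import random


def choose_tmp_father(N, T, rng=random):
    """Pick a temporary father among the logical nodes outside p's subtrie.

    N: identifiers of all remaining logical nodes.
    T: identifiers of the logical nodes in p's own subtrie.
    Returns None when N \\ T is empty, i.e. when p is considered the root.
    """
    # Sorting only makes the candidate order reproducible for a seeded rng.
    candidates = sorted(set(N) - set(T))
    if not candidates:
        return None
    return rng.choice(candidates)
```

Excluding $T$ from the candidates prevents $p$ from linking to one of its own descendants, but it does not prevent cycles between several fatherless nodes choosing in parallel.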

The above technique suffers from a drawback: several fatherless nodes may simultaneously make choices that turn out to be bad. In particular, they can choose as a temporary father a node belonging to the subtrie of another node in the same situation. When this happens in parallel, cycles may appear. Our strategy is to detect and break such cycles a posteriori, as follows.

After the choice of its temporary father $tf$, a node $p$ sends a message HELLO with its ID ($p.id$) to $tf$. In the next step, $tf$ transmits the message to its own father, and so on. Step by step, one of the two following situations eventually arises:

1. The real root of the trie receives the message HELLO. In that case, the root notifies $p$ that it is not involved in a cycle.
2. The message is received by a false root, i.e., a node that has also lost its own father. The false root propagates the message to its temporary father.

Note that, in the latter case, due to the asynchrony of the network, it is possible that the false root receives the message HELLO sent by $p$ before it has executed its own recovery phase. In that case, the false root is still without a temporary father. The message HELLO is then delayed until the false root chooses its own temporary father.

Therefore, the message HELLO sent by $p$ keeps circulating among its ancestors, carrying the list of the IDs of the false roots met during its traversal. Upon receipt of a message HELLO, if the first item of the list carried by the message is equal to the ID of the receiver, then a cycle is detected. In that case, a leader is elected among the IDs of the list, e.g., by choosing the smallest ID. The leader becomes the root of the subtrie, breaks the link with its father, and executes the recovery phase again. (The other false roots involved in the cycle remain connected to the subtrie rooted at the leader.) Note that a cycle may […] least one subtrie becomes the subtrie of one false root. In other words, the number of cycles is periodically divided by at least $2$. Therefore, the system eventually contains only one (rooted) trie.
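The circulation of a HELLO message can be simulated sequentially with the following Python sketch. It assumes a static snapshot in which every false root has already chosen its temporary father, so the delaying of messages is not modeled. The dictionary layout, the function name `route_hello` and the `"UNDECIDED"` outcome are assumptions made for this illustration; the latter covers the situation where the initiator hangs below a cycle it does not belong to, which would presumably be detected by the HELLO messages of the cycle's own members.

```python
def route_hello(nodes, start):
    """Follow the HELLO message sent by the false root `start`.

    nodes maps an id to {"father": id or None, "tmpfather": id or None}.
    Returns ("NOCYCLE", start), ("CYCLE", leader) or ("UNDECIDED", None).
    """
    ids = [start]                        # list of false-root IDs carried along
    receiver = nodes[start]["tmpfather"]
    # A message visiting more than len(nodes) nodes cannot be on a simple path.
    for _ in range(len(nodes) + 1):
        if ids[0] == receiver:
            # The initiator got its own message back: a cycle is detected and
            # the leader is, for example, the smallest ID in the list.
            return ("CYCLE", min(ids))
        node = nodes[receiver]
        if node["father"] is not None:
            receiver = node["father"]            # regular node: forward upwards
        elif node["tmpfather"] is not None:
            ids.append(receiver)                 # false root: record its ID
            receiver = node["tmpfather"]
        else:
            return ("NOCYCLE", ids[0])           # real root reached
    return ("UNDECIDED", None)
```

With two false roots `a` and `b` that each chose a node in the other's subtrie, the message sent by `a` comes back to `a` and the smallest ID is elected as leader.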

4.2 Trie Reorganization

The trie reorganization is initiated once the trie recovery is done. Each node $p$ having a temporary son $q$ (i.e., $q$ is a false root with its subtrie) initiates a routing mechanism close to the original key insertion [5]. Let us consider the following cases:

1. The value $p.val$ is a prefix of the value of $q$ (Figure 1, Case ($i$)). In that case, $q$ (and its subtrie) is placed in the subtrie of $p$ following one of the four cases shown in Figure 1, Cases ($a$) to ($d$).
2. The value $p.val$ is not a prefix of the value of $q$. Then, $p$ moves $q$ to its father, which now has the responsibility of placing $q$.

[Figure 1, five panels, each showing a node $p$ with sons $s_1, \ldots, s_i, \ldots, s_k$ and the false root $q$:
($i$) $p.val = prefix(q)$ and $p.val = PGCP(s_1, \ldots, s_k)$.
($a$) There exists $s_i$ such that $s_i.val = prefix(q.val)$.
($b$) There exists $s_i$ such that $q.val = prefix(s_i.val)$.
($c$) There exists $s_i$ such that $PGCP(q.val, s_i.val) > p.val$; a new son of $p$ is created.
($d$) $p.val = prefix(q.val)$; $q$ becomes the son $s_{k+1}$ of $p$.]

Figure 1: A false root $q$ is linked to a node $p$ such that $p.val = prefix(q.val)$.
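The four placement cases of Figure 1 can be told apart locally with a sketch such as the following. It assumes string values, checks the cases in the order ($a$), ($b$), ($c$), ($d$), and interprets "$PGCP(q.val, s_i.val) > p.val$" as "the common prefix of the two values is longer than $p.val$", which is an assumption about the intended ordering. The function name `placement_case` is hypothetical.

```python
from os.path import commonprefix  # character-wise longest common prefix


def placement_case(p_val, q_val, son_vals):
    """Return (case, son) for a false root q handled by a node p (Figure 1).

    Precondition, Case (i): p_val is a proper prefix of q_val.
    """
    if not (q_val.startswith(p_val) and q_val != p_val):
        raise ValueError("p_val must be a proper prefix of q_val")
    for s in son_vals:
        if q_val.startswith(s):          # (a): q belongs to the subtrie of s
            return ("a", s)
    for s in son_vals:
        if s.startswith(q_val):          # (b): s belongs to the subtrie of q
            return ("b", s)
    for s in son_vals:
        # (c): q and s share a prefix longer than p_val, so a new intermediate
        # son of p would be created above both of them.
        if len(commonprefix([q_val, s])) > len(p_val):
            return ("c", s)
    return ("d", None)                   # (d): q simply becomes a new son of p
```

For example, with `p_val = "1"` and sons `["101", "11"]`, the value `"100"` leads to Case ($c$) with the son `"101"`, since both share the prefix `"10"`.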

Note that new services may keep being inserted during the trie reconstruction. So, a new subtrie may have been created at the place where the false root initially was. Thus, our method must take into account that any false root being placed in the trie can meet a node having the same value. In that case, the two tries must be merged. That is the aim of the merging […] node $p$ executes Procedure $Gluing(q)$, which moves the sons of $q$ to $p$ before withdrawing $q$ from the trie (including from the sons of $q$'s father). Then, if necessary, $p$ recursively restarts mergings and placements among its sons, in order to eventually merge both subtries.
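A local, non-distributed reading of the gluing step is sketched below. It assumes that the trie is held in two dictionaries, `sons` (node id to set of son ids) and `father` (node id to father id), and that `p` and `q` are distinct node identifiers carrying the same value. These data structures and the function name `gluing` are assumptions for illustration; the procedure in the protocol operates on distributed nodes through messages.

```python
def gluing(sons, father, p, q):
    """Move the sons of q under p, then withdraw q from the trie (in place)."""
    # Re-attach every son of q to p.
    for s in list(sons.get(q, ())):
        sons.setdefault(p, set()).add(s)
        father[s] = p
    # Withdraw q: drop its own son set and remove it from its father's sons.
    sons.pop(q, None)
    fq = father.pop(q, None)
    if fq is not None:
        sons.get(fq, set()).discard(q)
```

After this step, $p$ may hold sons with equal or prefix-related values, which is why the protocol then restarts mergings and placements among the sons of $p$.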

4.3 Correctness Proof

In this subsection, we discuss the correctness of our protocol. To do so, we first need to make the realistic assumption that, in the considered context, the crash frequency is low enough for the trie to be fully built at some point. (Otherwise, the trie could never be built and would be unusable most of the time. More generally, it is impossible to say anything about termination otherwise.) In other words, we fairly assume that no crash occurs after a crash until the trie is fully rebuilt, i.e., no two consecutive crashes interfere with each other at a given time.

Assumption 1 If a node crashes at time $t$, then for every $t' > t$, no crash occurs.

Lemma 2 Under Assumption 1, the recovery protocol (Algorithm 1) terminates, and when this occurs, the system contains one trie only.

Proof. The validation mainly consists in showing that the protocol terminates and that the reorganization of the trie is eventually initiated (by sending a message NOCYCLE).

Assume by contradiction that, under Assumption 1, no node ever sends a message NOCYCLE. So, neither Line 1.35 nor Line 1.37 of Algorithm 1 is executed. Note that in the first case (Line 1.35), the node becomes the real root after the crash of its father. So, in both cases, this means that HELLO never reaches the real root of the trie. The height of the trie being finite, this means that every message HELLO traverses cycles only. When a message HELLO is received by its initiator, the cycle is broken by the node elected among the false roots participating in the cycle (Lines 1.16 to 1.21). Therefore, cycles are created infinitely often. Let $C$ be the number of created cycles. In the worst case, a cycle is made of at least two nodes. So, $C$ is initially bounded by $F/2$, where $F$ is the number of false roots created by the crash. When a cycle is broken, at most one leader is elected. So, at most $C/2$ leaders are able to link to another node again. In the next phase, the number of cycles is less than or equal to $C/2$. Since, under Assumption 1, cycles may be created only when false roots are linked to other nodes (executing Lines 1.10 and 1.11), $C$ never grows and is eventually equal to $0$. This contradicts the fact that cycles are created infinitely often. □
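The counting argument of this proof can be summarized as a short derivation. The index $k$ (number of cycle-breaking phases elapsed) and the notation $C_k$ are introduced here only for illustration and do not appear in the original proof.

```latex
C_0 \le \frac{F}{2}, \qquad C_{k+1} \le \frac{C_k}{2}
\quad\Longrightarrow\quad
C_k \le \frac{F}{2^{k+1}} .
```

Since $C_k$ is an integer, $C_k = 0$ as soon as $2^{k+1} > F$, i.e., after at most $\lfloor \log_2 F \rfloor$ phases under these bounds.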

We now consider the phase of trie reorganization shown in Algorithm 2.

Lemma 3 Under Assumption 1 and assuming that the system contains one trie only, the reorganization protocol (Algorithm 2) terminates, and when this occurs, the trie is a $PGCP$ tree.

Proof. Clearly, each trie of the forest following the crash of a node is a $PGCP$ tree. So, it remains to show that, by executing Algorithm 2, the whole trie eventually satisfies the conditions to be a $PGCP$ tree.

From the algorithm, it is easy to observe that, in the absence of merging, there are only two cases to consider, depending on the value of Node $p$ and its false son $fs$:

Algorithm 1 Recovery Protocol for each node p

1.01  upon receipt of <Disconnected from Father> do
1.02      PN := Physical Node Set in the DHT (collected by a DHT traversal);
1.03      N := Logical Node Set in PN (collected by polling the nodes in PN);
1.04      T := Logical Node Set in my subtrie (collected using a PIF wave)
1.05           using p.sons ∪ p.tmpsons;
1.06      if p.tmpfather ≠ ⊥ then send <DISCONNECT> to p.tmpfather;
1.07      if N \ T = ∅
1.08      then // I am the root
1.09          p.father := ⊥; p.tmpfather := ⊥;
1.10      else p.tmpfather := random choice among N \ T;
1.11          send-sync <LINK> to p.tmpfather;
1.12          send <HELLO, p.id> to p.tmpfather;
1.13      endif
1.14  upon receipt of <HELLO, list> from q do
1.15      if First(list) = p.id
1.16      then // A cycle is detected
1.17          leader := LeaderElection(list);
1.18          if p = leader
1.19          then Executes upon receipt of <Disconnected from Father> do,
1.20               except PN and N;
1.21          endif
1.22      elseif p.father ≠ ⊥
1.23      then send <HELLO, list> to p.father;
1.24      elseif p.tmpfather ≠ ⊥
1.25      then list := list + p.id;
1.26          send <HELLO, list> to p.tmpfather
1.27      elseif p.father = ⊥
1.28      then // Both father and tmpfather are unknown, i.e.,
1.29           // I am a false root which is still not linked
1.30          Executes upon receipt of <Disconnected from Father> do
1.31              if it is still not working;
1.32          if tmpfather ≠ ⊥
1.33          then list := list + p.id;
1.34              send <HELLO, list> to p.tmpfather;
1.35          else send <NOCYCLE> to First(list);
1.36      else // I am the real root, so there is no cycle.
1.37          send <NOCYCLE> to First(list);
1.38      endif
1.39  upon receipt of <NOCYCLE> from q do
1.40      send <MOVE, p> to p.tmpfather;
1.41      send-sync <UNLINK> to p.tmpfather;
1.42      p.tmpfather := ⊥;
1.43  upon receipt of <LINK> from q do
1.44      tmpsons := tmpsons ∪ {q};
1.45  upon receipt of <UNLINK> from q do
1.46      tmpsons := tmpsons \ {q};

Algorithm 2 Reorganization Protocol for each node p

1.01  upon receipt of <MOVE, fs> from q do
1.02      if fs.val = p.val
1.03      then // I send to myself that a fusion is needed.
1.04          send <MERGE, fs> to p
1.05      elseif p.val = prefix(fs.val)
1.06      then if ∃ s ∈ p.sons | s.val = prefix(fs.val)
1.07          then // fs is in the subtrie of s, Case (a) in Figure 1
1.08              send <MOVE, fs> to s;
1.09          elseif ∃ s ∈ p.sons | fs.val = prefix(s.val)
1.10          then // s is in the subtrie of fs, Case (b) in Figure 1
1.11              p.sons := p.sons ∪ {fs}; p.sons := p.sons \ {s};
1.12              send <MOVE, s> to fs;
1.13          elseif ∃ s ∈ p.sons | p.val < PGCP(s.val, fs.val)
1.14          then // fs and s have a PGCP which is greater than p.val
1.15               // Case (c) in Figure 1
1.16              Newnode(PGCP(fs.val, s.val), s, fs); p.sons := p.sons \ {s};
1.17          else // fs is one of my sons, Case (d) in Figure 1
1.18              p.sons := p.sons ∪ {fs};
1.19          endif
1.20      else if p.father ≠ ⊥
1.21          then send <MOVE, fs> to p.father
1.22          else if fs.val = prefix(p.val)
1.23              then // I am in the subtrie of fs
1.24                  send <MOVE, p> to fs;
1.25              else // p and fs are brothers
1.26                  p.sons := p.sons ∪ Newnode(PGCP(fs.val, p.val), fs, p);
1.27              endif
1.28          endif
1.29      endif

2.01  upon receipt of <MERGE, fs> from q do
2.02      Gluing(q);
2.03      Sorting of p.sons in the lexicographic order in Table t_s;
2.04      for i = 0 to t_s.length() do
2.05          if t_s[i].val = t_s[i+1].val
2.06          then send <MERGE, t_s[i+1]> to t_s[i];
2.07              i := i + 1;
2.08          elseif t_s[i].val = prefix(t_s[i+1].val)
2.09          then send <MOVE, t_s[i+1]> to t_s[i];
2.10              p.sons := p.sons \ {t_s[i+1]};
2.11              i := i + 1
2.12          elseif p.val < PGCP(t_s[i].val, t_s[i+1].val)
2.13          then p.sons := p.sons ∪ Newnode(PGCP(t_s[i].val, t_s[i+1].val),
2.14                                          t_s[i], t_s[i+1]);
2.15              p.sons := p.sons \ {t_s[i], t_s[i+1]};
2.16              i := i + 1;
2.17          endif
2.18      done

1. The value of $p$ is a prefix of $fs$'s value (Line 1.05). In that case, following the four cases described in Figure 1, $fs$ is eventually placed at the right place in the subtrie of $p$ (refer to Lines 1.06 to 1.19). The resulting trie is a $PGCP$ tree.

2. The value of $p$ is not a prefix of $fs$. Again, there are two cases to consider:

(a) Node $p$ has no father ($p.father = \bot$), Lines 1.22 to 1.28. In that case, if $fs.val$ is a prefix of $p$, then $p$ (and its subtrie) becomes the node to be placed in $fs$ (Line 1.24). Otherwise, $p$ and $fs$ become the two sons of a new root node $q$ such that $q.val = PGCP(p, fs)$ (Line 1.26). The trie is then clearly a $PGCP$ tree.

(b) Node $p$ has a father. Then, $fs$ is moved to the father of $p$ (Line 1.21). By induction on the above discussion, either $fs$ eventually moves to a node $q$ such that $q.val = prefix(fs.val)$, or $fs$ eventually reaches the root of the trie. The former case is equivalent to Case 1, the latter to Case 2a.

If $p$ and $fs$ merge, then there are four cases to consider after $p$ and $fs$ are glued together into $p$:

1. There exists a pair of sons $s_i$, $s_j$ of $p$ such that $s_i.val$ is a prefix of $s_j.val$. Then, $s_j$ is moved toward $s_i$ (Lines 2.08 to 2.11). This case is similar to the above Case 1 (Cases ($a$) or ($b$) in Figure 1).

2. There exists a pair of sons $s_i$, $s_j$ of $p$ such that $PGCP(s_i, s_j) > p.val$. Then, $s_i$ and $s_j$ become the two sons of a new son $q$ of $p$ such that $q.val = PGCP(s_i, s_j)$ (Lines 2.12 to 2.16). This case is also similar to the above Case 1 (Case ($c$) in Figure 1).

3. There exists a pair of sons $s_i$, $s_j$ of $p$ such that $s_i.val = s_j.val$. This case is solved by initiating a recursive merging between $s_i$ and $s_j$ (Lines 2.05 to 2.07), i.e., by induction on $s_i$ and $s_j$.

4. There exists no pair of sons $s_i$, $s_j$ of $p$ satisfying either Case 1, 2, or 3. In that case, the subtrie of $p$ clearly satisfies the properties of a $PGCP$ tree. □

From Lemmas 2 and 3 follows:

Theorem 1 Under Assumption 1, Algorithm 1 and Algorithm 2 provide a $PGCP$ tree reconstruction after the crash of a physical node.

5 Conclusion and Future Work

In this paper, we have presented a fault-tolerant protocol handling node crashes in a Proper Greatest Common Prefix tree search system. This protocol can be coupled with a replication strategy to lower the costs related to high replication factors. It allows the reconnection and repair of subtries after the crash of one or more nodes. The algorithm guarantees that a consistent PGCP tree is recovered after a finite time, and thus partially avoids the need for replication.

Our future work will consist in connecting the two mechanisms (replication and repair) in order to minimize the cost of fault-tolerance on dynamic platforms. We will also develop and experimentally validate the mechanisms presented in this paper on the Grid'5000 platform of the […] of the repair algorithm and to see its capacity to answer clients' requests when facing different levels of dynamicity. Moreover, we will be able to see from which level of dynamicity the repair mechanism alone is no longer efficient, and then how we can progressively inject some replication as the dynamicity level increases.

References

[1] A. Andrzejak and Z. Xu. Scalable, Efficient Range Queries for Grid Information Services. In Peer-to-Peer Computing, pages 33-40, 2002.

[2] J. Aspnes and G. Shah. Skip Graphs. In Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 384-393, January 2003.

[3] M. Balazinska, H. Balakrishnan, and D. Karger. INS/Twine: A Scalable Peer-to-Peer Architecture for Intentional Resource Discovery. In Proceedings of Pervasive 2002, 2002.

[4] S. Basu, S. Banerjee, P. Sharma, and S. Lee. NodeWiz: Peer-to-Peer Resource Discovery for Grids. In 5th International Workshop on Global and Peer-to-Peer Computing (GP2PC) in conjunction with CCGrid, May 2005.

[5] E. Caron, F. Desprez, and C. Tedeschi. A dynamic prefix tree for the service discovery within large scale grids. In IEEE, editor, The Sixth IEEE International Conference on Peer-to-Peer Computing, P2P2006, Cambridge, UK, September 6-8 2006.

[6] E.J.H. Chang. Echo Algorithms: Depth Parallel Operations on General Graphs. IEEE Trans. on Software Engineering, SE-8:391-401, 1982.

[7] A. Datta, M. Hauswirth, R. John, R. Schmidt, and K. Aberer. Range Queries in Trie-Structured Overlays. In The Fifth IEEE International Conference on Peer-to-Peer Computing, 2005.

[8] I. Foster and A. Iamnitchi. On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing. In IPTPS'03, pages 118-128, 2003.

[9] Gnutella. http://www.gnutella.com.

[10] KaZaA 2005. The KaZaA Web Site. http://www.kazaa.com.

[11] D. Oppenheimer, J. Albrecht, D. Patterson, and A. Vahdat. Distributed Resource Discovery on PlanetLab with SWORD. In Proceedings of the ACM/USENIX Workshop on Real, Large Distributed Systems (WORLDS), December 2004.

[12] S. Ramabhadran, S. Ratnasamy, J. M. Hellerstein, and S. Shenker. Prefix Hash Tree: An Indexing Data Structure over Distributed Hash Tables. In Proceedings of the 23rd ACM Symposium on Principles of Distributed Computing, St. John's, Newfoundland, Canada, July 2004.

[13] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. A Scalable Content-[…]

[14] A. Rowstron and P. Druschel. Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-To-Peer Systems. In International Conference on Distributed Systems Platforms (Middleware), November 2001.

[15] C. Schmidt and M. Parashar. Enabling Flexible Queries with Guarantees in P2P Systems. IEEE Internet Computing, 8(3):19-26, 2004.

[16] A. Segall. Distributed Network Protocols. IEEE Transactions on Information Theory, IT-29:23-35, 1983.

[17] Y. Shu, B.-C. Ooi, K.-L. Tan, and A. Zhou. Supporting Multi-Dimensional Range Queries in Peer-to-Peer Systems. In Peer-to-Peer Computing, pages 173-180, 2005.

[18] I. Stoica, R. Morris, D. Karger, M. Kaashoek, and H. Balakrishnan. Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In ACM SIGCOMM, pages 149-160, 2001.

[19] P. Triantafillou and T. Pitoura. Towards a Unifying Framework for Complex Query Processing over Structured Peer-to-Peer Data Networks. In DBISP2P, 2003.

[20] B. Y. Zhao, L. Huang, J. Stribling, S. C. Rhea, A. D. Joseph, and J. D. Kubiatowicz. Tapestry: A Resilient Global-scale Overlay for Service Deployment. IEEE Journal on Selected Areas in Communications, 22(1):41-53, January 2004.
