Finding Good Partners in Availability-aware P2P Networks


HAL Id: inria-00432741

https://hal.inria.fr/inria-00432741

Submitted on 17 Nov 2009


Stevens Le Blond, Fabrice Le Fessant, Erwan Le Merrer

To cite this version:

Stevens Le Blond, Fabrice Le Fessant, Erwan Le Merrer. Finding Good Partners in Availability-aware

P2P Networks. International Symposium on Stabilization, Safety, and Security of Distributed Systems

(SSS’09), Nov 2009, Lyon, France. ⟨inria-00432741⟩

Stevens Le Blond¹, Fabrice Le Fessant², Erwan Le Merrer³

¹ INRIA Sophia Antipolis, stevens.le_blond@inria.fr
² INRIA Saclay, fabrice.le_fessant@inria.fr
³ INRIA Rennes, elemerre@irisa.fr

We study the problem of finding peers matching a given availability pattern in a peer-to-peer (P2P) system. Motivated by practical examples, we specify two formal problems of availability matching that arise in real applications: disconnection matching, where peers look for partners expected to disconnect at the same time, and presence matching, where peers look for partners expected to be online simultaneously in the future. As a scalable and inexpensive solution, we propose to use epidemic protocols for topology management; we provide corresponding metrics for both matching problems. We evaluated this solution by simulating two P2P applications, task scheduling and file storage, over a new trace of the eDonkey network, the largest available with availability information. We first proved the existence of regularity patterns in the sessions of 14M peers over 27 days. We also showed that, using only 7 days of history, a simple predictor could select predictable peers and successfully predict their online periods for the next week. Finally, simulations showed that our simple solution provided good partners fast enough to match the needs of both applications, and that, consequently, these applications performed as efficiently at a much lower cost. We believe that this work will be useful for many P2P applications for which it has been shown that choosing good partners, based on their availability, drastically improves their performance and stability.

1 Introduction

Churn is one of the most critical characteristics of peer-to-peer (P2P) networks, as the permanent flow of peer connections and disconnections can seriously hamper the efficiency of applications [9]. Fortunately, it has been observed that peers exhibit availability patterns [21,22,2], so that their availability can be predicted from the uptime history of those peers [18].

To take advantage of these predictions, applications need to be able to dynamically find good partners for peers, according to these availability patterns, even in large-scale unstructured networks. The intrinsic constitution of those networks makes pure random matching techniques time-inefficient in the face of churn. Basic use of prediction based on node availability exists in the literature, e.g. for file replication [16].

In this paper, we study a generic technique to discover such partners, and apply it to two particular matching problems: disconnection matching, where peers look for partners expected to disconnect at the same time, and presence matching, where peers look for partners expected to be online simultaneously in the future. These problems are specified in Section 2.

We then propose to use standard epidemic protocols for topology management to solve these problems (see e.g. [12,24]); such protocols have proven to be efficient for a large panel of applications, from overlay slicing [13] to IP-TV overlay maintenance [14], for example. However, in order to converge to the desired state or topology (here, matched peers), those protocols require good metrics to compute the distance between peers. Such metrics and a well-known epidemic protocol, T-Man [12], are described in Section 3.

To evaluate the efficiency of our proposal, we simulated an application for each matching problem: a task scheduling application, where tasks of multiple remote jobs are started by all the peers in the network (disconnection matching), and a P2P file-system application, where peers replicate files on other peers to have them highly available (presence matching). These applications are specified in Section 5.

To run our simulations on a realistic workload, we collected a new trace of peer availability on the eDonkey file-sharing network. With the connections and disconnections of 14M peers over 27 days, this trace is the largest available workload concerning peers' availability. We show that peers in this trace exhibit availability patterns, and, using a simple 7-day predictor, that it is possible to select predictable peers and successfully predict their behavior over the following week. The new eDonkey trace and this simple predictor are studied in Section 4.

Our simulation results showed that our T-Man based solution is able to provide good partners to all peers, for both applications. Using [...] partners. Moreover, T-Man is scalable and inexpensive, making the solution usable for any application and network size. These results are detailed in Section 6.

We believe that many P2P systems and applications can benefit from this work, as a lot of availability-aware applications have been proposed in the literature [3,8,20,5,25]. Close to our work, Godfrey et al. [9] show that strategies based on the longest current uptime are more efficient than uptime-agnostic strategies for replica placement; Mickens et al. [18] introduce sophisticated availability predictors and show that they can be very successful. However, to the best of our knowledge, this paper is the first to deal with the problem of finding the best partners according to availability patterns in a large-scale network. Moreover, previous results are often computed on synthetic traces or small traces of P2P networks.

2 Problem Specification

This section presents two availability matching problems, disconnection matching and presence matching. Each problem is abstracted from the needs of a practical P2P application that we describe afterward. But first, we start by introducing our system model.

2.1 System and Network Model

We assume a fully-connected asynchronous P2P network of $N$ nodes, with $N$ usually ranging from thousands to millions of nodes. We assume that there is a constant bound $n_c$ on the number of simultaneous connections that a peer can engage in, typically much smaller than $N$. When peers leave the system, they disconnect silently. However, we assume that disconnections are detected after a time $\Delta_{disc}$, for example 30 seconds with TCP keep-alive.

For each peer $x$, we assume the existence of an availability prediction $Pr_x(t)$, starting at the current time $t$ and for a period $T$ in the future, such that $Pr_x(t)$ is a set of non-overlapping intervals during which $x$ is expected to be online. Since these predictions are based on previous measures of availability for peer $x$, we assume that such measures are reliable, even in the presence of malicious peers [19,17].

We denote by $\bigcup Pr_x(t)$ the set defined by the union of the intervals of $Pr_x(t)$ [...]

Fig. 1. Disconnection Matching: peer $y$ is a better match than peer $z$ for peer $x$.

2.2 The Problem of Disconnection Matching

Intuitively, the problem of Disconnection Matching is, for a peer online at a given time, to find a set of other online peers who are expected to disconnect at the same time.

Formally, for a peer $x$ online at time $t$, an online peer $y$ is a better match for Disconnection Matching than an online peer $z$ if $|t_x - t_y| < |t_x - t_z|$, where $[t, t_x[ \in Pr_x(t)$, $[t, t_y[ \in Pr_y(t)$ and $[t, t_z[ \in Pr_z(t)$. The problem of Disconnection Matching $DM(n)$ is to discover the $n$ best matches of online peers at any time.
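As an illustration of this definition, the following Python sketch ranks candidate peers by the gap between predicted disconnection times. It is a minimal example under stated assumptions, not the protocol used in this paper: the function name `best_disconnection_matches`, the dictionary of predicted session ends and the sample values are all hypothetical.

```python
from typing import Dict, List


def best_disconnection_matches(
    peer: str, disconnect_time: Dict[str, float], n: int
) -> List[str]:
    """Return the n online peers y minimising |t_x - t_y|, where t_y is
    the predicted end of the current session of peer y."""
    t_x = disconnect_time[peer]
    candidates = [p for p in disconnect_time if p != peer]
    # list.sort is stable, so ties keep the dictionary's insertion order.
    candidates.sort(key=lambda p: abs(t_x - disconnect_time[p]))
    return candidates[:n]


# Hypothetical predictions: end of the current session, in minutes from now.
predicted = {"x": 120.0, "y": 130.0, "z": 300.0, "w": 90.0}
print(best_disconnection_matches("x", predicted, 2))  # ['y', 'w']
```

This centralised ranking needs global knowledge of all predictions; the point of the rest of the paper is to approximate the same result in a decentralised way.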

The problem of disconnection matching typically arises in applications where a peer tries to find partners with whom it wants to collaborate until the end of its session, in particular when starting such a collaboration might be expensive in terms of resources.

An example of such an application is task scheduling in P2P networks. In Zorilla [7] for example, a peer can submit a computation task of $n$ jobs to the system. In such a case, the peer tries to locate $n$ online peers (with expanding ring search) to become partners for the task, and executes the $n$ jobs on these partners. When the computation is over, the peer collects the $n$ results from the $n$ partners. With disconnection matching, such a system becomes much more efficient: by choosing partners who are likely to disconnect at the same time as the peer, the system increases the probability that:

- If the peer does not disconnect too early, its partners will have time to finish executing their jobs before disconnecting and it will be able to collect the results;
- If the peer disconnects before the end of the computation, partners will not waste unnecessary resources as they are also likely to disconnect at the same time.

Fig. 2. Presence Matching: peer $y$ is a better match than peer $z$ for peer $x$.

2.3 The Problem of Presence Matching

Intuitively, the problem of Presence Matching is, for a peer online at a given time, to find a set of other online peers who are expected to be connected at the same time in the future.

Formally, for a peer $x$ online at time $t$, an online peer $y$ is a better match for Unfair Presence Matching than an online peer $z$ if:

$$\left\| \bigcup Pr_z(t) \cap \bigcup Pr_x(t) \right\| < \left\| \bigcup Pr_y(t) \cap \bigcup Pr_x(t) \right\|$$

This problem is called unfair, since peers who are always online appear to be best matches for all other peers in the system, whereas only other always-on peers are best matches for them. Since some fairness is wanted in most P2P systems, offline periods should also be considered. Consequently, $y$ is a better match than $z$ for Presence Matching if:

$$\frac{\left\| \bigcup Pr_z(t) \cap \bigcup Pr_x(t) \right\|}{\left\| \bigcup Pr_z(t) \cup \bigcup Pr_x(t) \right\|} < \frac{\left\| \bigcup Pr_y(t) \cap \bigcup Pr_x(t) \right\|}{\left\| \bigcup Pr_y(t) \cup \bigcup Pr_x(t) \right\|}$$

The problem of Presence Matching $PM(n)$ is to discover the $n$ best matches of online peers at any time.
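To make the difference between the unfair and the fair criterion concrete, here is a small Python sketch operating directly on interval sets. It assumes that the norm in the formulas denotes the total duration of a set of intervals, and that each prediction is a sorted list of non-overlapping half-open intervals; the helper names and the sample schedules are illustrative only.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # half-open interval [start, end[


def total_length(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of non-overlapping intervals."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance the list whose current interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def unfair_score(x: List[Interval], y: List[Interval]) -> float:
    """Time during which both peers are predicted online."""
    return total_length(intersection(x, y))


def fair_score(x: List[Interval], y: List[Interval]) -> float:
    """Co-availability divided by the time at least one peer is online."""
    inter = unfair_score(x, y)
    union = total_length(x) + total_length(y) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical daily schedules, in hours.
x_online = [(8.0, 12.0), (20.0, 24.0)]
y_online = [(8.0, 12.0), (20.0, 23.0)]
always_on = [(0.0, 24.0)]

print(unfair_score(x_online, always_on), unfair_score(x_online, y_online))
print(fair_score(x_online, always_on), fair_score(x_online, y_online))
```

In this example the always-on peer wins under the unfair criterion (8 hours of overlap against 7), while the fair criterion prefers the peer with a similar schedule (0.875 against about 0.33).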

The problem of presence matching arises in applications where a peer wants to find partners that will be available at the same time in other sessions. This is typically the case when a huge amount of data has to be transferred, and partners will have to communicate a lot to use that data.

An example of such an application is storage of files in P2P networks [4]. For example, in Pastiche [6], each peer in the system has to find other peers to store its files. Since files can only be used when the peer is online, the best partners for a peer (at equivalent stability) are the peers who are expected to be online when the peer itself is online.

Moreover, in a P2P backup system [8], peers usually replace the replicas that cannot be reached for a given period, to maintain a given level of data redundancy. Using presence matching, such applications can increase the probability of being able to connect to all their partners, thus reducing [...]

We think that epidemic protocols [12,23,15,24] are good approximate solutions for these matching problems. Here, we present one of these protocols, T-Man [12], and, since such protocols rely heavily on appropriate metrics, we propose a metric for each matching problem.

3.1 Distributed Matching with T-Man

T-Man is a well-known epidemic protocol, usually used to associate each peer in the network with a set of good partners, given a metric (distance function) between peers. Even in large-scale networks, T-Man converges fast, and provides a good approximation of the optimal solution in a few rounds, where each round costs only four messages on average per peer.

In T-Man, each peer maintains two small sets, its random view and its metric view, which are, respectively, some random neighbors, and the current best candidates for partnership, according to the metric in use. During each round, every peer updates its views: with one random peer in its random view, it merges the two random views, and keeps the most recently seen peers in its random view; with the best peer in its metric view, it merges all the views, and keeps only the best peers, according to the metric, in its metric view.

This double scheme guarantees a permanent shuffle of the random views, while ensuring fast convergence of the metric views towards the optimal solution. Consequently, the choice of a good metric is very important. We propose such metrics for the two availability matching problems in the next part.
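The round described above can be sketched in Python as follows. This is a simplified, single-process approximation written for illustration, not the T-Man reference implementation: real peers exchange messages over the network, whereas here views are merged directly; "most recently seen" is approximated by list order; and the names `Peer`, `merge_recent`, `merge_best` and `tman_round`, as well as the toy distance, are assumptions of this sketch.

```python
import random
from typing import Callable, Dict, Iterable, List


class Peer:
    def __init__(self, pid: int, random_view: List[int]):
        self.pid = pid
        self.random_view = list(random_view)  # newest entries first
        self.metric_view: List[int] = []      # closest entries first


def merge_recent(newer: List[int], older: List[int], me: int, size: int) -> List[int]:
    """Keep the `size` most recently seen peers, without duplicates or `me`."""
    merged: List[int] = []
    for p in newer + older:
        if p != me and p not in merged:
            merged.append(p)
    return merged[:size]


def merge_best(views: Iterable[List[int]], me: int, size: int,
               dist: Callable[[int, int], float]) -> List[int]:
    """Keep the `size` peers closest to `me` according to `dist`."""
    candidates = {p for view in views for p in view if p != me}
    return sorted(candidates, key=lambda p: (dist(me, p), p))[:size]


def tman_round(peers: Dict[int, Peer], dist: Callable[[int, int], float],
               size: int, rng: random.Random) -> None:
    order = list(peers)
    rng.shuffle(order)
    for pid in order:
        me = peers[pid]
        # Random exchange: merge random views with one random neighbour.
        other = peers[rng.choice(me.random_view)]
        mine, theirs = me.random_view, other.random_view
        me.random_view = merge_recent([other.pid] + theirs, mine, me.pid, size)
        other.random_view = merge_recent([me.pid] + mine, theirs, other.pid, size)
        # Metric exchange: merge all views with the current best candidate
        # (falling back to the random view while the metric view is empty).
        best = peers[(me.metric_view or me.random_view)[0]]
        pool = [me.random_view, me.metric_view,
                best.random_view, best.metric_view, [me.pid, best.pid]]
        me.metric_view = merge_best(pool, me.pid, size, dist)
        best.metric_view = merge_best(pool, best.pid, size, dist)


def total_cost(peers: Dict[int, Peer], dist: Callable[[int, int], float]) -> float:
    return sum(dist(p.pid, q) for p in peers.values() for q in p.metric_view)


# Toy run: 100 peers on a line, distance = difference of identifiers.
rng = random.Random(42)
SIZE = 4
ids = list(range(100))
peers = {i: Peer(i, rng.sample([j for j in ids if j != i], SIZE)) for i in ids}
dist = lambda a, b: abs(a - b)

tman_round(peers, dist, SIZE, rng)
cost_after_first_round = total_cost(peers, dist)
for _ in range(20):
    tman_round(peers, dist, SIZE, rng)
cost_after_last_round = total_cost(peers, dist)
print(cost_after_first_round, cost_after_last_round)
```

Because each metric view is replaced by the best entries of a superset of itself, the total distance to the selected partners can only decrease from one round to the next in this sketch, which mirrors the convergence behavior described above.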

3.2 Metrics for Availability Matching

To efficiently compute the distance between peers, the prediction $Pr_x(t)$ is approximated by a bitmap of size $m$, $pred_x$, where entry $pred_x[i]$ is 1 if $[i \times T/m, (i+1) \times T/m[$ is included in an interval of $Pr_x(t)$, for $0 \le i < m$. Note that these metrics can be used with any epidemic protocol, not only with T-Man.
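The bitmap approximation can be illustrated with a short Python sketch. It assumes that the intervals of the prediction are expressed as offsets from the current time $t$, in the same unit as $T$; the function name `to_bitmap` and the sample values are hypothetical.

```python
from typing import List, Tuple


def to_bitmap(intervals: List[Tuple[float, float]], T: float, m: int) -> List[int]:
    """pred[i] = 1 iff the slot [i*T/m, (i+1)*T/m[ lies entirely inside
    one of the predicted online intervals."""
    slot = T / m
    bitmap = []
    for i in range(m):
        lo, hi = i * slot, (i + 1) * slot
        inside = any(start <= lo and hi <= end for start, end in intervals)
        bitmap.append(1 if inside else 0)
    return bitmap


# Hypothetical prediction over T = 24 hours with m = 24 one-hour slots.
prediction = [(8.0, 12.5), (20.0, 24.0)]
pred_x = to_bitmap(prediction, T=24.0, m=24)
print(pred_x)
```

Note that the slot starting at hour 12 is set to 0 here: the peer is only predicted online for half of it, and the definition requires the whole slot to be covered.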

Disconnection Matching. The metric computes the time between the disconnections of two peers. In case of equality, the PM-distance (defined below) is used to prefer peers with the same availability periods:

$$\text{DM-distance}(x, y) = |I_x - I_y| + \text{PM-distance}(x, y)$$

where

$$I_x = \min\{0 \le i < m \mid pred_x[i] = 1 \wedge pred_x[i+1] = 0\}$$

[Figure 3: "Global System Availability"; x-axis: Days (0 to 30); y-axis: Number of Peers (0 to 600,000); series: Online Peers]

Fig. 3. Diurnal patterns are clearly visible when we plot the number of online peers at any time in our 27-day eDonkey trace. Depending on the time of the day, between 300,000 and 600,000 users are connected to a single eDonkey server.

Presence Matching. The metric first computes the ratio of co-availability (time where both peers were simultaneously online) to total availability (time where at least one peer was online). Since the distance should be close to 0 when peers are close, we then reverse the value on [0,1]:

$$\text{PM-distance}(x, y) = 1 - \frac{\sum_{0 \le i < m} \min(pred_x[i], pred_y[i])}{\sum_{0 \le i < m} \max(pred_x[i], pred_y[i])}$$

Note that, while the PM-distance value is in [0,1], the DM-distance value is in [0,m].
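Both metrics can be written in a few lines of Python on top of the bitmaps. The sketch below is illustrative and rests on assumptions that the text does not settle: the DM-distance is taken to add the PM-distance to $|I_x - I_y|$ as a tie-breaker, two peers that are never predicted online are treated as maximally distant, and a bitmap with no 1-to-0 transition inside the window is rejected. The function names are hypothetical.

```python
from typing import List, Optional


def pm_distance(pred_x: List[int], pred_y: List[int]) -> float:
    """1 minus the ratio of co-availability to total availability."""
    both = sum(min(a, b) for a, b in zip(pred_x, pred_y))
    either = sum(max(a, b) for a, b in zip(pred_x, pred_y))
    if either == 0:
        return 1.0  # assumption: no predicted presence at all
    return 1.0 - both / either


def first_disconnection(pred: List[int]) -> Optional[int]:
    """I_x: smallest i such that pred[i] = 1 and pred[i + 1] = 0."""
    for i in range(len(pred) - 1):
        if pred[i] == 1 and pred[i + 1] == 0:
            return i
    return None


def dm_distance(pred_x: List[int], pred_y: List[int]) -> float:
    """Gap between first predicted disconnections, PM-distance as tie-breaker."""
    i_x, i_y = first_disconnection(pred_x), first_disconnection(pred_y)
    if i_x is None or i_y is None:
        raise ValueError("no predicted disconnection in the bitmap")
    return abs(i_x - i_y) + pm_distance(pred_x, pred_y)


# Hypothetical bitmaps with m = 8 slots.
x = [1, 1, 1, 0, 0, 1, 1, 0]
y = [1, 1, 1, 0, 0, 0, 0, 0]
z = [1, 1, 1, 1, 1, 0, 0, 0]

print(pm_distance(x, y), pm_distance(x, z))
print(dm_distance(x, y), dm_distance(x, z))
```

Here $y$ disconnects in the same slot as $x$, so its DM-distance is below 1 and is decided by the PM-distance alone, while $z$ disconnects two slots later and is ranked behind $y$.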

4 Simulation Settings

We evaluated our solution based on T-Man on two applications, one for each matching problem. In this section, we describe our simulation settings. In particular, we describe the characteristics of the trace we collected for the needs of this study, with more than 300,000 online peers over 27 days. With a few thousand peers online at the same time, most other traces collected on P2P systems [21,10,2] lack massive connection and disconnection trends, for the study of availability patterns on a large scale.

[Figure 4: "Distribution of the Best Sizes of Patterns"; x-axis: Best pattern size (days), 0 to 14; y-axis: Number of peers (log scale)]

Fig. 4. Peers achieve their best auto-correlation (resemblance between sessions after a given period) for a one-day period or a one-week period. Consequently, peers are highly likely to connect at almost the same time the next day or the next week.

4.1 A new eDonkey Trace

In 2007, we collected the connection and disconnection events from the logs of one of the main eDonkey servers in Europe. eDonkey is currently the most used P2P file-sharing network in the world. Our trace, available on our website [1], contains more than 200 million connections by more than 14 million peers, over a period of 27 days. To analyse this trace, we first filtered out useless connections (shorter than 10 minutes) and suspicious ones (too repetitive, simultaneous or with changing identifiers), leading to a filtered trace of 12 million peers.

The number of peers online at the same time in the filtered trace is usually more than 300,000, as shown by Fig. 3. Global diurnal patterns of around 100,000 users are also clearly visible: as shown by previous studies [11], most eDonkey users are located in Europe, and so, their daily offline periods are only partially compensated by connections from other continents.

For every peer in the filtered trace, the auto-correlation on its availability periods was computed on 14 days, with a step of one minute. For a given peer, the period for which the auto-correlation is maximum gives its best pattern size. The number of peers with a given best pattern size is plotted on Fig. 4, and shows, as could be expected, that the best pattern size is typically one day or one week.
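The search for a best pattern size can be illustrated as follows. The exact auto-correlation formula used for the trace is not given in the text, so this Python sketch assumes a simple definition: for a lag, the fraction of positions at which the peer is online both at that position and one lag later. The hourly sampling, the function names and the synthetic peer are also assumptions made for the example.

```python
from typing import Sequence


def autocorrelation(series: Sequence[int], lag: int) -> float:
    """Fraction of positions i where the peer is online at i and at i + lag."""
    n = len(series) - lag
    if n <= 0:
        raise ValueError("lag must be shorter than the series")
    return sum(series[i] * series[i + lag] for i in range(n)) / n


def best_pattern_size(series: Sequence[int], lags: Sequence[int]) -> int:
    """Candidate lag with the highest auto-correlation; on ties, max()
    keeps the first candidate, i.e. the smallest lag."""
    return max(lags, key=lambda lag: autocorrelation(series, lag))


# Hypothetical peer: online every day from 20:00 to 24:00,
# sampled once per hour over 14 days.
day = [0] * 20 + [1] * 4
series = day * 14
lags = range(1, 7 * 24 + 1)  # candidate periods from one hour to one week

print(best_pattern_size(series, lags))  # 24
```

For this perfectly regular peer, every multiple of 24 hours reaches the same maximum, and the smallest one (one day) is reported as the best pattern size.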

Our goal in these simulations was to evaluate the efficiency of our matching protocol, and not the efficiency of availability predictors, as already done in [18]. As a consequence, we implemented a very straightforward predictor, which uses a 7-day window of availability history to compute the daily pattern of a peer: for each interval of 10 minutes in a day, its value is the number of days in the week where the peer was available during that full interval:

$$pattern_p[i] = \sum_{d \in [0:6]} history_p[d \times 24 \times 60 / 10 + i]$$
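The pattern computation translates directly into Python. In this minimal sketch, the history is assumed to be a flat list of 0/1 values, one per 10-minute interval over 7 days, where 1 means the peer was online during the full interval; the names `SLOTS_PER_DAY` and `daily_pattern` and the synthetic history are hypothetical.

```python
from typing import List

SLOTS_PER_DAY = 24 * 60 // 10  # 144 ten-minute intervals per day
DAYS = 7


def daily_pattern(history: List[int]) -> List[int]:
    """pattern[i] = number of days of the week during which the peer was
    online for the whole i-th ten-minute interval of the day."""
    if len(history) != DAYS * SLOTS_PER_DAY:
        raise ValueError("history must cover exactly 7 days")
    return [
        sum(history[d * SLOTS_PER_DAY + i] for d in range(DAYS))
        for i in range(SLOTS_PER_DAY)
    ]


# Hypothetical peer: online from 20:00 to 22:00 on the first five days.
history = [0] * (DAYS * SLOTS_PER_DAY)
for d in range(5):
    for i in range(120, 132):  # slots 120..131 cover 20:00-22:00
        history[d * SLOTS_PER_DAY + i] = 1

pattern = daily_pattern(history)
print(max(pattern), pattern[120], pattern[0])  # 5 5 0
```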

This predictor has two purposes:

- It should help the application to decide which peers are predictable, and thus, which peers can benefit from an improved quality of service. This gives an incentive for peers to participate regularly in the system;
- It should help the application to predict future connections and disconnections of the selected peers.

To select predictable peers, the predictor computes, for each peer, the maximum and the mean covariance of the peer's daily pattern. For these simulations, we computed a set, called the predictable set, containing peers with the following properties:

- The maximum value in $pattern$ is at least 5: each peer was available at least five days during the last week at exactly the same time;
- The average covariance in $pattern$ is greater than 28: each peer has a sharply-shaped behavior;
- Peer availability is greater than 0.1: peers have to contribute enough to the system;
- Peer availability is smaller than 0.9: peers which are always online would positively bias our simulations.

In our eDonkey trace, this predictable set contains 19,600 peers. Note that this relatively small number of peers, compared with the total number of peers in our trace, does not mean that eDonkey peers are not predictable: our trace covers only part of the eDonkey users at measurement time (around 10%, those connected to eDonkey Server N.2). Users that leave may join another server (e.g. Server N.1, a larger one), which makes them invisible in our trace, even though they are still using eDonkey. For every peer in the set, the predictor predicts that the peer will be online in a given [...] otherwise predicts nothing (we never predict that a peer will be offline). The ratio of successful predictions after a week for the full following week is plotted on Fig. 5. It shows that predictions cannot be explained by accidental availability alone, and proves the presence of availability patterns in the trace.

[Figure 5: "Predictability of selected peers"; x-axis: CDF of Peers (normalized); y-axes: Prediction Success and Availability; curves: prediction (predictable peers), prediction (randomized bitmaps), Availability]

Fig. 5. Whereas availability determines the prediction with random bitmaps, daily patterns improve the prediction with real bitmaps (e.g. for 60% of peers (x=0.4), 50% of predictions (y=0.5) are successful, but only 25% with random bitmaps).

We purposely chose a very simple predictor, as we are interested in showing that patterns of presence are visible and can benefit applications, even with a worst-case approach. Therefore, we expect that better results would be achieved using more sophisticated predictors, such as described in [18], and for an optimal pattern size of one day instead of a week.

4.3 General Simulation Setup

A simulator was developed from scratch to run the simulations on a Linux 3.2 GHz Xeon computer, for the 19,600 peers of the predictable set from Section 4.2. Their behaviors over 14 days were extracted from the eDonkey trace: the first 7 days were used to compute a prediction, and that prediction, without updates, was used to execute the protocol on the following seven days. During one round of the simulator, all online peers, in random order, evaluate one T-Man round, corresponding to one minute of the trace. As explained later, both applications were delayed by a period [...] two hours and 6 GB of memory footprint.

5 Simulated Applications

In this section, we describe the two applications that we used to illustrate the need for an efficient protocol for distributed availability matching. Our goal is not to improve the performance of these applications, as this can be done by an aggressive greedy algorithm, but to save resources using availability information.

5.1 Disconnection Matching: Task Scheduling

To evaluate the efficiency of T-Man and the DM-distance metric, we simulated a distributed task scheduling application. In this application, every peer starts a task after 10 minutes online: a task is composed of 3 jobs of 4 hours on remote partners, and is completed if the peer and its partners are still online after 4 hours to collect the results.

The first 2 hours of each job are devoted to the download of the data needed for the computation from a central server. As a consequence, a peer can decide not to start a task to save the bandwidth of the central server. In our simulation, such a decision is taken when the prediction of the peer's availability shows that the peer is going to go offline before completion of the task.
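The start-or-skip decision can be sketched in Python on top of a prediction bitmap. This is an illustration under assumptions the text does not fix: slots are taken to last 10 minutes (so that a 4-hour task spans 24 slots), and a task that would run past the end of the prediction window is not started. The function name `should_start_task` is hypothetical.

```python
from typing import List

TASK_SLOTS = 4 * 60 // 10  # a 4-hour task spans 24 ten-minute slots


def should_start_task(pred: List[int], now: int, duration: int = TASK_SLOTS) -> bool:
    """Start a task only if the peer is predicted online for every slot
    from `now` until the task completes."""
    end = now + duration
    if end > len(pred):
        return False  # assumption: no prediction available, do not start
    return all(pred[now:end])


# Hypothetical prediction: online for 5 hours (30 slots), then offline for 1 hour.
pred = [1] * 30 + [0] * 6

print(should_start_task(pred, now=0))   # True: the task ends at slot 24
print(should_start_task(pred, now=10))  # False: the peer goes offline at slot 30
```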

5.2 Presence Matching: P2P File-Storage

To evaluate the efficiency of T-Man and the PM-distance metric, we simulated a P2P file storage application. In this application, every peer replicates its data to its partners, ten minutes after coming online for the first time, in the hope that it will be able to use this remote data the next time it is online.

The size of the data of each peer is supposed to be large, hundreds of megabytes for example. As a consequence, it is important for the system to use as little redundancy as possible to achieve high co-availability of data (i.e. availability of the peer and at least one of its data replicas). Finding good partners in the network is expected to provide replicas which are more likely to be available at the same time as the peer, thus decreasing [...]

[Figure 6: "Impact of Disconnection Matching for P2P scheduling"; y-axis: Number of Tasks (0 to 30,000); bars: Completed Tasks, Aborted Tasks; strategies: Uptime, Random; groups: Day + 1, Day + 7, Week Mean]

Fig. 6. A task is a set of three remote jobs of 4 hours started by every peer, ten minutes after coming online. A task is successful when the peer and its partners are still online after 4 hours to collect the results. Using availability predictions, a peer can decide not to start a task expected to abort, leading to fewer aborted tasks. Using disconnection matching, it can find good partners and it can still complete almost as many tasks as the much more expensive random strategy.

6 Simulation Results

In this section, we present the results of our simulations of the two applications. We are not interested in the raw performance of these applications, but in the savings that could be achieved by using availability information and partner matching.

6.1 Results for Disconnection Matching

We compared Disconnection Matching with a Random choice of partners (actually, using partners within the T-Man random view) for the distributed task scheduling application. The number of completed tasks and the number of aborted tasks are plotted on Fig. 6, for the first day, the 7th day and the whole week.

Prediction of availability decreased the number of aborted tasks by 68% on average over a week, corresponding to 50% of bandwidth savings on the data server, while decreasing the number of completed tasks by only 17%.

These results were largely improved using one-day prediction, since [...] in Section 4.1). Indeed, bandwidth savings were about 43% for Disconnection Matching, while completing 20% more tasks. Thus, it is much more interesting from a performance point of view to use one-day prediction every day instead of one-week prediction, although savings are still possible with one-week predictions.

[Figure 7: "Impact of Presence Matching for P2P File Storage"; y-axis: Co-Availability (0.3 to 1); curves: 1 to 9 replicas; strategies: Uptime, Random; groups: Day + 1, Day + 7, Week]

Fig. 7. 10 minutes after coming online for the first time, each peer creates a given number of replicas for its data. Co-availability is defined by the simultaneous presence of the peer and at least one replica. Using presence matching, fewer replicas are needed to achieve better results than using a random choice of partners. Even on the 7th day, using a 6-day old prediction, the system still performs much more efficiently, almost compensating for the general loss in availability.

6.2 Results for Presence Matching

We compared Presence Matching with a Random choice of replica locations for the P2P file-system application. The co-availability of the peer and at least one replica is plotted on Fig. 7, for different numbers of replicas.

Using presence matching, fewer replicas were needed to achieve better results than using a random choice of partners. For example, 1 replica with Presence Matching gives a better co-availability than 2 replicas with Random Choice; 5 replicas with Presence Matching give a co-availability of 95%, which is only achieved using 9 replicas with Random Choice. As for the other application, week-old predictions still performed better than [...]

In this paper, we showed that epidemic protocols for topology management can be efficient to find good partners in availability-aware networks. Simulations proved that, using one of these protocols and appropriate metrics, such applications can be less expensive and still perform with an equivalent or better quality of service. We used a worst-case scenario: a simple predictor, and a trace collected from a highly volatile file-sharing network, where only a small subset of peers provide predictable behaviors. Consequently, we expect that a real application would benefit even more from availability matching protocols.

In particular, until this work, availability-aware applications were limited to using predictions or availability information to better choose among a limited set of neighbors. This work opens the door to new availability-aware applications, where best partners are chosen among all available peers in the network. It is a useful complement to the work done on measuring availability [19,17] and using these measures to predict future availability [18].

References

1. Trace. http://fabrice.lefessant.net/traces/edonkey2.
2. Bhagwan, R., Savage, S., and Voelker, G. Understanding availability. In IPTPS, Int'l Work. on Peer-to-Peer Systems (2003).
3. Bhagwan, R., Tati, K., Cheng, Y.-C., Savage, S., and Voelker, G. M. Total recall: system support for automated availability management. In NSDI, Symp. on Networked Systems Design and Implementation (2004).
4. Busca, J.-M., Picconi, F., and Sens, P. Pastis: A highly-scalable multi-user peer-to-peer file system. In Proceedings of Euro-Par (2005).
5. Chun, B.-G., Dabek, F., Haeberlen, A., Sit, E., Weatherspoon, H., Kaashoek, M. F., Kubiatowicz, J., and Morris, R. Efficient replica maintenance for distributed storage systems. In NSDI, Symp. on Networked Systems Design and Implementation (2006).
6. Cox, L. P., Murray, C. D., and Noble, B. D. Pastiche: Making backup cheap and easy. In OSDI, Symp. on Operating Systems Design and Implementation (2002).
7. Drost, N., van Nieuwpoort, R. V., and Bal, H. E. Simple locality-aware co-allocation in peer-to-peer supercomputing. In GP2P, Int'l Work. on Global and Peer-2-Peer Computing (2006).
8. Duminuco, A., Biersack, E. W., and En Najjary, T. Proactive replication in distributed storage systems using machine availability estimation. In CoNEXT, Int'l Conf. on emerging Networking EXperiments and Technologies (2007).
9. Godfrey, P. B., Shenker, S., and Stoica, I. Minimizing churn in distributed systems. In SIGCOMM, Conf. on Applications, Technologies, Architectures, and Protocols for Computer Communications (2006).
10. Guha, S., Daswani, N., and Jain, R. An Experimental Study of the Skype [...]
11. [...] and Patarin, S. Peer sharing behaviour in the edonkey network, and implications for the design of server-less file sharing systems. In EuroSys (2006).
12. Jelasity, M., and Babaoglu, O. T-Man: Gossip-based overlay topology management. In ESOA, Int'l Work. on Engineering Self-Organising Systems (2005).
13. Jelasity, M., and Kermarrec, A.-M. Ordered slicing of very large-scale overlay networks. IEEE International Conference on Peer-to-Peer Computing (2006), 117-124.
14. Kermarrec, A.-M., Le Merrer, E., Liu, Y., and Simon, G. Surfing peer-to-peer IPTV: Distributed channel switching. Proceedings of Euro-Par (2009).
15. Killijian, M.-O., Courtès, L., and Powell, D. A Survey of Cooperative Backup Mechanisms. Tech. Rep. 06472, LAAS, 2006.
16. Kim, K. Lifetime-aware replication for data durability in p2p storage network. IEICE Transactions 91-B, 12 (2008), 4020-4023.
17. Le Fessant, F., Sengul, C., and Kermarrec, A.-M. Pacemaker: Fighting Selfishness in Availability-Aware Large-Scale Networks. Tech. Rep. RR-6594, INRIA, 2008.
18. Mickens, J. W., and Noble, B. D. Exploiting availability prediction in distributed systems. In NSDI, Symp. on Networked Systems Design and Implementation (2006).
19. Morales, R., and Gupta, I. AVMON: Optimal and scalable discovery of consistent availability monitoring overlays for distributed systems. In ICDCS, Int'l Conf. on Distributed Computing Systems (2007).
20. Sacha, J., Dowling, J., Cunningham, R., and Meier, R. Discovery of stable peers in a self-organising peer-to-peer gradient topology. In DAIS, Int'l Conf. on Distributed Applications and Interoperable Systems (2006).
21. Saroiu, S., Gummadi, P. K., and Gribble, S. A measurement study of peer-to-peer file sharing systems. In MMCN, Multimedia Computing and Networking (2002).
22. Stutzbach, D., and Rejaie, R. Understanding churn in peer-to-peer networks. In IMC, Internet Measurement Conf. (2006).
23. Voulgaris, S., Gavidia, D., and van Steen, M. CYCLON: Inexpensive membership management for unstructured P2P overlays. J. Network Syst. Manage. 13, 2 (2005).
24. Voulgaris, S., van Steen, M., and Iwanicki, K. Proactive gossip-based management of semantic overlay networks: Research articles. Concurr. Comput.: Pract. Exper. 19, 17 (2007), 2299-2311.
25. Xin, Q., Schwarz, T., and Miller, E. L. Availability in global peer-to-peer