HAL Id: inria-00352529
https://hal.inria.fr/inria-00352529
Submitted on 13 Jan 2009
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Finding Good Partners in Availability-aware P2P
Networks
Stevens Le Blond, Fabrice Le Fessant, Erwan Le Merrer
To cite this version:
Stevens Le Blond, Fabrice Le Fessant, Erwan Le Merrer. Finding Good Partners in Availability-aware
P2P Networks. [Research Report] RR-6795, INRIA. 2009, pp.17. �inria-00352529�
a p p o r t
d e r e c h e r c h e
IS
S
N
0
2
4
9
-6
3
9
9
IS
R
N
IN
R
IA
/R
R
--6
7
9
5
--F
R
+
E
N
G
Thème COM
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
Finding Good Partners in Availability-aware P2P
Networks
Stevens Le Blond — Fabrice Le Fessant — Erwan Le Merrer
N° 6795
Centre de recherche INRIA Saclay – Île-de-France
Parc Orsay Université
4, rue Jacques Monod, 91893 ORSAY Cedex
Téléphone : +33 1 72 92 59 00
Networks Stevens Le Blond∗
, Fabri eLe Fessant†
,Erwan LeMerrer‡
ThèmeCOMSystèmes ommuni ants Équipes-ProjetsAsapet Planète
Rapportdere her he n°6795January200914pages
Abstra t:
Inthispaper,westudytheproblemofndingpeersmat hingagiven avail-abilitypatternin apeer-to-peer(P2P) system. Werstprovetheexisten eof su hpatternsinanewtra eoftheeDonkeynetwork, ontainingthesessionsof 14Mpeersover27days. Wealsoshowthat,usingonly7daysofhistory,a sim-ple predi tor an sele t predi table peers and su essfully predi ttheir online periods for thenextweek. Then, motivated bypra ti alexamples, wespe ify twoformalproblemsofavailabilitymat hingthatariseinrealappli ations: dis- onne tion mat hing, where peers look for partners expe ted to dis onne t at thesametime, andpresen e mat hing, wherepeerslook forpartnersexpe ted to be online simultaneously in the future. As a s alable and inexpensive so-lution, we propose to use epidemi proto ols for topologymanagement, su h asT-Man;weprovide orrespondingmetri sfor both mat hingproblems. Fi-nally, weevaluated this solutionby simulatingtwo P2Pappli ations overour real tra e: task s heduling and le storage. Simulationsshowed that our sim-ple solution provided good partners fast enough to mat h the needs of both appli ations, andthat onsequently,these appli ationsperformedase iently at amu h lower ost. Webelieve that this work will beuseful formany P2P appli ationsforwhi hithasbeenshownthat hoosinggoodpartners,basedon theiravailability,drasti allyimprovestheire ien y.
Key-words: availability,peer-to-peer,mat hing,epidemi ,proto ols
∗
INRIASophiaAntipolis
†
INRIASa lay
‡
réseau pair-à-pair en fon tion de sa disponibilité
Résumé:
Dans e papier, nousétudions leproblématique detrouverdes partenaires suivantun ritèrededisponibilitédansunréseaupair-à-pair.Nous ommençons parmontrer l'existen e derégularitésde disponibilité dans une nouvelletra e duréseaueDonkey, ontenantlessessions de14Mde pairssur27 jours. Nous montrons aussi que, en utilisant 7 jours d'historique, une prédi teur simple peut séle tionnerdes pairsprévisibles et prédire ave su ès leurspériodesde disponibilitésur lasemainesuivante. Ensuite, nousspé ionsdeux problèmes formelsde séle tionen fon tionde ladisponibilité, qui seprésentent dansdes appli ationsréelles: laséle tionpourladé onnexion,quire her helespairsqui sedé onne terontprobablementenmêmetemps,etlaséle tionpourlaprésen e, qui re her helespairs quiserontprobablement présentsen même tempsdans lefutur. Commesolution peu oûteuse et passantà l'é helle, nous proposons d'utiliser desproto olesépidémiques degestion detopologie, telsque T-Man; nousfournissonslesmétriques orrespondantànosdeuxproblèmes. Finalement, nousavonsévalué ette solutionen simulantdeux appli ationspair-à-pair sur notretra eréelle. Lessimulationsontmontréquenotresimplesolutionfournit debonspartenairessusammentvitepourlesbesoinsdesdeuxappli ations,et qu'en onséquen e, es appli ations fon tionnent aussi e a ement àun oût bienmoindre. Nouspensonsque etravailserautilepourtouteslesappli ations pair-à-pairpourlesquelsil aétémontré quele hoixdebonspartenairespeut augmenter onsidérablementlesperforman es.
0
100000
200000
300000
400000
500000
600000
0
5
10
15
20
25
30
Number of Peers
Days
Global System Availability
Online Peers
(a)SystemAvailability
1000
10000
100000
0
1
2
3
4
5
6
7
8
9
10 11 12 13 14
Number of peers (log scale)
Best pattern size (days)
Distribution of the Best Sizes of Patterns
(b) Auto-Correlation on Ses-sions
0
0.2
0.4
0.6
0.8
1
0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Prediction Success
Availability
CDF of Peers (normalized)
Predictability of selected peers
prediction (predictable peers)
prediction (randomized bitmaps)
Availability
( ) Predi tion versusA vailabil-ity
Figure1: (a): Diurnalpatternsareobviouslyvisibleontheglobalsystem avail-ability. (b) The auto- orrelation on the sessions shows that the best pattern size is oneday, followedbyoneweek. ( ) Whereas availabilitydeterminesthe predi tion with random bitmaps, daily patterns improve the predi tion with real bitmaps (e.g. for 60% of peers (x=0.4), 50% of predi tions (y=0.5) are su essful,but only25%withrandombitmaps).
1 Introdu tion
Churnisoneofthemost riti al hara teristi sofpeer-to-peer(P2P)networks, as the permanent ow of peer onne tions and dis onne tions an seriously hamperthee ien y of appli ations[9℄. Fortunately, it hasbeenshownthat, formanypeers,theseeventsgloballyobeysomeavailabilitypatterns([18,19,2℄), andso, anbepredi tedfromtheuptimehistoryofthosepeers[15℄.
Totakeadvantageofthese predi tions,appli ationsneed tobeableto dy-nami allyndgoodpartnersforpeers,a ordingtotheseavailabilitypatterns, even in large-s aleunstru tured networks. The intrinsi onstitution of those networksmakespure randommat hingte hniquesto betime-ine ientfa ing hurn.
Inthis paper,we study ageneri te hniqueto dis oversu h partners, and applyitfortwoparti ularmat hing problems: dis onne tion mat hing, where peerslookfor partnersexpe ted to dis onne tat thesametime, andpresen e mat hing, where peers look for partners expe ted to beonline simultaneously in thefuture. These problemsarespe iedin Se tion 3. Wethenexplainthat T-Man[12℄,astandardepidemi algorithmfortopologymanagement,isagood andidatetosolvetheseproblems. However,inorderto onvergetothedesired state or topology (here mat hed peers), T-Man needs an a urate metri to ompute the distan e between peers. In Se tion 4, we des ribe how T-Man worksandproposeaparti ularmetri forea hof ourmat hingproblems.
Toevaluatethee ien yofourproposal,wesimulateanappli ationforea h mat hingproblem: anappli ationoftasks heduling,wheretasksofmultiple
motejobsarestartedbyallthepeersin thenetwork(dis onne tionmat hing), andanappli ationofP2Ple-system,wherepeersrepli atelesonotherpeers tohavethemhighlyavailable(presen emat hing). Torunoursimulationsona realisti workload,we olle tedanewtra eofpeeravailabilityontheeDonkey le-sharingnetwork. Withthe onne tionsanddis onne tionof14Mpeersover 27days,thistra eisthelargestavailableworkload, on erningpeers' availabil-ity. InSe tion2,weshowthat peersinthis tra eexhibit availabilitypatterns, and,usingasimple7-daypredi tor,thatitispossibletosele tpredi tablepeers andsu essfullypredi ttheirbehavioroverthefollowingweek.
Oursimulationresults,inSe tion5,showthatourT-Manbasedsolutionis ableto providegood partners to all peers, forbothappli ations. Using avail-abilitypatterns,bothappli ationsareabletokeepthesameperforman e,while onsuming 30% less resour es, ompared to a random sele tion of partners. Moreover, T-Man is s alable and inexpensive, making the solutionusable for anyappli ationandnetworksize.
We believe that manyP2P systemsand appli ations anbenet from this work,asalot of availability-awareappli ations havebeen proposed in the lit-erature[3,8,17, 5,22℄. Close to ourwork,[9℄ showsthat strategiesbased on the longest urrent uptime are more e ient than uptime-agnosti strategies for repli a pla ement; [15℄ introdu es sophisti ated availabilitypredi tors and showsthat they anbeverysu essful. However,tothebestofourknowledge, this paper is the rst to deal with the problem of nding the best partners a ordingto availabilitypatternsin alarge-s alenetwork. Moreover,previous resultsareoften omputedonsyntheti tra esorsmalltra esofP2Pnetworks.
2 Availability Patterns in eDonkey
Inthisse tion,wedes ribethe hara teristi softhetra ewe olle tedforthe needsofthis study. With afew thousandpeersonlineat thesametime, most other tra es olle tedonP2P systems[18, 10, 2℄la k massive onne tionand dis onne tiontrends,forthestudyofavailabilitypatternsonalarges ale.
2.1 The eDonkey Tra e
In 2007, we olle ted the onne tion and dis onne tion events from the logs of one of the main eDonkey servers in Europe. Our tra e, available on our website [1℄, ontains more than 200 millions of onne tions by more than 14 millions of peers, over a period of 27 days. To analyse this tra e, we rst ltereduseless onne tions(shorterthan 10minutes) andsuspi ious ones(too repetitive,simultaneousorwith hangingidentiers),leadingtoalteredtra e of12millionpeers.
Thenumberofpeersonlineatthesametimeinthelteredtra eisusually morethan 300,000,asshown byFig. 1(a). Global diurnalpatterns of around 100,000 usersare also learlyvisible: asshownby previousstudies [11℄, most eDonkeyusersarelo atedinEurope,andso,theirdailyoineperiodsareonly partially ompensatedby onne tionsfromother ontinents.
Foreverypeer in theltered tra e, the auto- orrelationon its availability periodswas omputedon14days,withastepofone minute. Foragivenpeer, the period for whi h the auto- orrelation is maximum gives its best pattern
size. ThenumberofpeerswithagivenbestpatternsizeisplottedonFig.1(b), andshows,as ouldbeexpe ted,that thebest patternsizeisaday,andmu h further,aweek.
2.2 Filtering and Predi tion
Weimplementedastraightforwardpredi tor,thatusesa7-daywindowof avail-ability historyto omputethe daily pattern of apeer: for ea h interval of 10 minutesin aday, itsvalue is the numberof daysin theweek where the peer wasavailableduring thatfullinterval.
Thispredi torhastwopurposes: (1)Itshouldhelptheappli ationtode ide whi hpeersarepredi table,and thus, anbenetfromanimprovedqualityof servi e. Thisgivesanin entiveto peersto parti ipateregularlytothesystem; (2) itshouldhelp theappli ation topredi tfuture onne tionsand dis onne -tionsofthesele tedpeers. Tosele tpredi tablepeers,thepredi tor omputes, forea hpeer, themaximumandthemean ovarian eofthepeerdailypattern. Forthispaper,we omputedaset, alledpredi tableset, ontaining19,600peers whosemaximumisatleast5(predi tionthreshold),andwhosemean ovarian e isgreaterthan28( learbehavior). Wealsoremovedthepeerswhose availabil-ity was smaller than 0.1 (useless peers) or greater then 0.9 (they would bias positivelyourexperiments).
For everypeer in the predi table set, the predi tor predi ts that the peer willbeonlineinagivenintervalifthepeer'sdailypatternvalueforthatinterval isat least5,and otherwisepredi tsnothing (weneverpredi tthat apeerwill beoine). Theratioofsu essfulpredi tionsafteraweekforthefullfollowing weekisplottedonFig.1( ). Itshowsthatpredi tions annotbeonlyexplained bya identalavailability,andprovethepresen eofavailabilitypatternsinthe tra e.
Wepurposely hoseaverysimplepredi tor,asweareinterestedinshowing that patterns of presen eare visibleand anbenetappli ations, even witha worst- aseapproa h. Therefore,weexpe tthatbetterresultswouldbea hieved usingmoresophisti atedpredi tors,su hasdes ribedin[15℄,andforanoptimal patternsize ofonedayinsteadofaweek.
3 Problem Spe i ation
Thisse tionpresentstwoavailabilitymat hingproblems,dis onne tion mat h-ing and presen e mat hing. Ea h problem is abstra ted from the needs of a pra ti al P2P appli ation that we des ribe afterward. But rst, we start by introdu ingoursystemmodel.
3.1 System and Network Model
We assumea fully- onne tedasyn hronous P2P network of
N
nodes, withN
usuallyrangingfromthousandstomillionsofnodes. Weassumethatthereisa onstantboundn
c
onthenumberofsimultaneous onne tionsthat apeer an engage in,typi allymu hsmallerthanN
. Whenpeersleavethesystem,they dis onne tsilently. However,weassumethatdis onne tionsare dete tedafter atime∆
disc
,forexamplethirtyse ondswithTCPkeep-alive.Forea hpeer
x
,weassumetheexisten eofanavailabilitypredi tionP r
x
(t)
, startingatthe urrenttime
t
andforaperiodT
inthefuture,su hthatP r
x
(t)
is a set of non-overlapping intervals during whi hx
is expe ted to be online. Sin ethese predi tionsare basedonprevious measuresof availabilityforpeerx
,weassumethatsu hmeasuresarereliable,eveninthepresen eofmali ious peers[16, 14℄.We note
S
P r
x
(t)
the set dened by the unionof theintervals of
P r
x
(t)
, and
||S||
thesizeofasetS
.3.2 The Problem of Dis onne tion Mat hing
Intuitively, the problem of Dis onne tion Mat hing is, for a peer online at a giventime,tondasetofotheronlinepeerswhoareexpe tedtodis onne tat thesametime.
Formally, for a peer
x
online at timet
, an online peery
is a better mat h forDis onne tion Mat hingthananonlinepeerz
if|t
x
− t
y
| < |t
x
− t
z
|
,where[t, t
x
[∈ P r
x
(t)
,[t, t
y
[∈ P r
y
(t)
and[t, t
z
[∈ P r
z
(t)
. TheproblemofDis onne tion Mat hing
DM (n)
isto dis overthen
best mat hesofonlinepeersatanytime.Theproblem ofdis onne tion mat hingarises inappli ations whereapeer tries to nd partners with whom it wants to ollaborate until the end of its session.
An exampleof su h an appli ation is task s heduling in P2Pnetworks. In Zorilla[7℄,apeer ansubmita omputationtaskof
n
jobstothesystem. Insu h a ase, thepeertries to lo aten
online peers (withexpanding ringsear h)to be omepartnersforthetask,andexe utesthen
jobsonthesepartners. When the omputation is over, the peer olle ts then
results from then
partners. With dis onne tionmat hing, su hasystembe omesmu h moree ient: by hoosing partners who are likely to dis onne t at the sametime asthe peer, thesystemin reasestheprobabilitythat(1)ifthepeerdoesnotdis onne ttoo early, its partners will havetime to nish exe uting their jobs before dis on-ne tingandhewillbeableto olle ttheresults,and(2)ifthepeerdis onne ts beforetheendofthe omputation,partnerswillnotwasteunne essaryresour es astheyarealsolikelytodis onne tat thesametime.3.3 The Problem of Presen e Mat hing
Intuitively, theproblem of Presen e Mat hing is, for a peer online at agiven time, to nd aset of other online peers who are expe ted to be onne ted at thesametimeinthefuture.
Formally, fora peer
x
onlineat timet
, an online peery
is abetter mat h forUnfair Presen eMat hingthananonlinepeerz
if:||
[
P r
z
(t) ∩
[
P r
x
(t)|| < ||
[
P r
y
(t) ∩
[
P r
x
(t)||
Thisproblemisqualiedasunfair,sin epeerswhoarealwaysonlineappear tobebestmat hesforallotherpeersinthesystem,whereasonlyother always-onpeersarebestmat hesforthem. Sin esomefairnessiswantedinthesystem, oineperiodsshouldalsobe onsidered. Consequently,
y
isabettermat hthanz
forPresen eMat hingif:||
S
P r
z
(t) ∩
S
P r
x
(t)||
||
S
P r
z
(t) ∪
S
P r
x
(t)|
<
||
S
P r
y
(t) ∩
S
P r
x
(t)||
||
S
P r
y
(t) ∪
S
P r
x
(t)||
TheproblemofPresen eMat hing
P M (n)
istodis overthen
bestmat hes ofonlinepeersatanytime.Theproblemofpresen emat hingarisesinappli ationswhereapeerwants to ndpartnersthat willbeavailableat thesametimein othersessions. This istypi allythe asewhenhugeamountofdatahavetobetransferred,andthat partnerswill haveto ommuni atealottousethatdata.
An example of su h an appli ation is storage of les in P2P networks [4℄. Forexample, in Pasti he [6℄, ea h peer in the systemhas to nd other peers to storeitsles. Sin eles anonlybeused whenthepeer isonline, thebest partners fora peer (atequivalentstability) arethe peers whoare expe tedto beonlinewhen thepeeritselfisonline.
Moreover,inaP2Pba kup system[8℄,peersusuallyrepla etherepli athat annotbe onne tedforagivenperiod,tomaintainagivenlevelofdata redun-dan y. Usingpresen emat hing,su happli ations anin reasetheprobability of beingableto onne tto alltheirpartners, thus redu ingtheirmaintenan e ost.
4 Uptime Mat hing with Epidemi Proto ols
Wethinkthatepidemi proto ols[20,21,13℄aregoodapproximatesolutionsfor these mat hing problems. Here,wepresentoneof these proto ols, T-Man[12℄ and, sin e su h proto ols rely heavily on appropriate metri s, we propose a metri forea hmat hingproblem.
4.1 Distributed Mat hing with T-Man
T-Man is awell-knownepidemi proto ol,usuallyused to asso iate ea h peer in thenetwork with aset ofgoodpartners, given ametri (distan e fun tion) betweenpeers. Eveninlarge-s alenetworks,T-Man onvergesfast,andprovides agoodapproximationoftheoptimalsolutioninafewrounds,whereea hround ostsonlyfourmessagesin averageperpeer.
InT-Man,ea hpeermaintainstwosmallsets,itsrandomviewanditsmetri view, whi h are, respe tively, some random neighbors, and the urrent best andidatesforpartnership,a ordingtothemetri inuse. Duringea hround, every peer updates its views: with one random peer in its random view, it
merges the two random views, and keeps the most re ently seen peers in its randomview;withthebestpeerin itsmetri view,itmergesalltheviews,and keepsonlythebestpeers,a ordingto themetri ,initsmetri view.
This double s heme guarantees a permanent shue of the random views, whileensuringfast onvergen eofthemetri viewstowardstheoptimalsolution. Consequently, the hoi eofagood metri isveryimportant. Wepropose su h metri sforthetwoavailabilitymat hingproblemsin thenextpart.
4.2 Metri s for AvailabilityMat hing
To ompute e iently the distan e between peers in T-Man, the predi tion
P r
x
(t)
is approximatedbyabitmap ofsize
m
,pred
x
,where entry
pred
x
[i]
is 1 if[i × T /m, (i + 1) × T /m[
is in ludedin anintervalofP r
x
(t)
for0 ≤ i < m
. 4.2.1 Dis onne tion Mat hingThe metri omputes the time between the dis onne tions of two peers. In aseofequality,thePM-distan eof4.2.2isusedtopreferpeerswiththesame availabilityperiods:
DM-distan e
(x, y) = |I
x
− I
y
|+
PM-distan e
(x, y)
whereI
x
= min{0 ≤ i < m|pred
x
[i] = 1 ∧ pred
x
[i + 1] = 0}
4.2.2 Presen e Mat hing
The metri rst omputes the ratio of o-availability (time where bothpeers were simultaneouslyonline)ontotalavailability(time whereat least onepeer was online). Sin ethe distan e should be loseto 0when peers are lose, we thenreversethevalueon[0,1℄:
PM-distan e
(x, y) = 1 −
P
0≤i<m
min(pred
x
[i],pred
y
[i])
P
0
≤i<m
max(pred
x
[i],pred
y
[i])
Notethat,whilethePM-distan evalueisin [0,1℄,theDM-distan evalueis in[0,m℄.
5 Simulations and Results
We evaluated the performan e of T-Man plus the metri s of Se tion 4.2, by simulatingthetwoappli ationsofSe tion3ontheeDonkeytra eofSe tion 2.
5.1 General Simulation Setup
Asimulator wasdevelopedfrom s rat hto run thesimulationsonaLinux3.2 GHzXeon omputer,forthe19,600peersofthepredi tablesetfromSe tion2.2. Theirbehaviorson14-dayswereextra tedfrom theeDonkeytra e: therst7 dayswereusedto omputeapredi tion,andthat predi tion,withoutupdates, wasusedtoexe utetheproto olonthefollowingsevendays. Duringoneround ofthe simulator, all onlinepeers in random orderevaluate oneT-Man round, orrespondingtooneminuteofthetra e. Asexplainedlater,bothappli ations weredelayedbyaperiodof10minutesafterapeerwould omeonlinetoallow
0
5000
10000
15000
20000
25000
30000
UptimeRandom
UptimeRandom
UptimeRandom
Number of Tasks
Impact of Disconnection Matching for P2P scheduling
Completed Tasks
Aborted Tasks
Week Mean
Day +7
Day + 1
Figure 2: A task isa setof threeremotejobs of4 hoursstarted by everypeer, ten minutesafter omingonline. Ataskis su essfulwhenthe peeranditspartnersare still onlineafter 4hours to olle t the results. Using availability predi tions, apeer ande idenottostartataskexpe tedtoabort,leadingtofewerabortedtasks. Using dis onne tionmat hing,it anndgoodpartnersandit anstill ompletealmostas manytasksasthemu hmoreexpensiverandomstrategy.
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Uptime
Random
Uptime
Random
Uptime
Random
Co-Availability
Impact of Presence Matching for P2P File Storage
1 repl
2 repl
3 repl
4 repl
5 repl
6 repl
7 repl
8 repl
9 repl
Week
Day + 7
Day + 1
Figure3: 10minutesafter omingonlineforthersttime,ea hpeer reatesagiven numberofrepli aforitsdata.Co-availabilityisdenedbythesimultaneouspresen e ofthepeerandatleastonerepli a. Usingpresen emat hing,fewerrepli asareneeded toa hievebetter resultsthanusingarandom hoi eof partners. Eventhe 7thday, usinga6-day oldpredi tion, the systemstill performsmu hmore e iently,almost ompensatingthegeneralloss inavailability.
T-Mantoprovideausefulmetri view. The omputationofa ompleterundid notex eedtwohoursand 6GBofmemoryfootprint.
5.2 Evaluation of Dis onne tion Mat hing
The task s heduling appli ation of Se tion 3.2 wassimulated to evaluate the performan eof T-Manand theDM-distan e metri . Inthesimulations,every peerstartedataskafter10minutesonline: ataskranthree jobsof4hourson remotepartners,andwas ompletedifthepeeranditspartnerswerestillonline after 4 hoursto olle t theresults. A peer ouldde ide notto start atask if the predi tion of its own availability fore ast that he would go oine before ompletion of the task. The number of aborted/ ompleted tasks is plotted on Fig. 2, for the rst day, the seventh day and the whole week for either dis onne tion mat hing(uptime) orrandom hoi e(randompeer hosenin T-Manrandomview).
Predi tionof availabilityde reasedalot thenumberof abortedtasks,and, with fewer started tasks, dis onne tion mat hing ompleted almost the same numberoftasksasrandommat hing,evenoverthefullweek,whenthe predi -tionwassupposedtobelessa urate(seeauto- orrelationinSe tion 2.1).
5.3 Evaluation of Presen e Mat hing
The P2P le storage of Se tion 3.3 was also simulated with T-Man and the PM-distan emetri . Everypeerrepli ateditsdataonitspartners,tenminutes after omingonlinefor thersttime, inthe hopeof usingitsremote datathe next time itwould be online. The o-availability of thepeer and at leastone repli aisplotted onFig.3,fordierentnumberofrepli as.
Usingpresen emat hing,fewerrepli aswereneededtoa hievebetterresults thanusingarandom hoi eofpartners. Asintheprevioussimulations,week-old predi tionsperformedstillbetterthanrandom hoi e.
6 Dis ussion and Con lusion
Inthispaper,weshowedthatepidemi proto olsfortopologymanagement an be e ient to nd good partners in availability-aware networks. Simulations provedthat,usingoneoftheseproto olsandappropriatemetri s,su h appli a-tions anbelessexpensiveandstillperformwithanequivalentorbetterquality of servi e. Weusedaworst- ases enario: asimplepredi tor,and atra e ol-le ted from a highly volatile le-sharing network, where only a small subset of peers provide predi table behaviors. Consequently, we expe t that a real appli ationwouldtakeevenmorebenetfromavailabilitymat hingproto ols. Inparti ular,untilthiswork,availability-awareappli ationswerelimitedto usingpredi tionsoravailabilityinformationtobetter hooseamongalimitedset ofneighbors. Thisworkopensthedoorto newavailability-awareappli ations, wherebestpartnersare hosenamongallavailablepeersinthenetwork. Itisa useful omplementtotheworkdoneonmeasuringavailability[16,14℄andusing these measurestopredi tfutureavailability[15℄.
Referen es
[1℄ Tra e. http://fabri e.lefessant.net/tra es/edonkey2.
[2℄ Ranjita Bhagwan, Stefan Savage, and Georey Voelker. Understanding availability. InIPTPS,Int'lWork. onPeer-to-Peer Systems,2003.
[3℄ RanjitaBhagwan,KiranTati,Yu-ChungCheng, StefanSavage,and Geof-frey M. Voelker. Total re all: system support for automated availability management. In NSDI, Symp. on Networked Systems Design and Imple-mentation,2004.
[4℄ Jean-Mi helBus a,FabioPi oni,andPierreSens.Pastis: Ahighly-s alable multi-userpeer-to-peerlesystem. InEuro-Par,2005.
[5℄ Byung-Gon Chun, Frank Dabek, Andreas Haeberlen, Emil Sit, Hakim Weatherspoon, M. Frans Kaashoek, John Kubiatowi z, and Robert Mor-ris. E ientrepli a maintenan efordistributed storagesystems. InNSDI, Symp.on NetworkedSystemsDesignandImplementation,2006.
[6℄ Landon P. Cox, Christopher D. Murray, and Brian D. Noble. Pasti he: Making ba kup heap and easy. In OSDI, Symp. on Operating Systems DesignandImplementation, 2002.
[7℄ Niels Drost, Rob V. van Nieuwpoort, and Henri E. Bal. Simple lo ality-aware o-allo ationin peer-to-peer super omputing. InGP2P, Int'lWork. onGlobal andPeer-2-Peer Computing,2006.
[8℄ AlessandroDuminu o,ErnstWBiersa k,andTaoukEnNajjary.Proa tive repli ationindistributedstoragesystemsusingma hineavailability estima-tion. In CoNEXT, Int'l Conf. on emerging Networking EXperiments and Te hnologies,2007.
[9℄ P. Brighten Godfrey, S ott Shenker, and Ion Stoi a. Minimizing hurn in distributed systems. In SIGCOMM, Conf. on Appli ations, Te hnologies, Ar hite tures, andProto ols for ComputerCommuni ations, 2006.
[10℄ SaikatGuha,NeilDaswani,andRaviJain. An ExperimentalStudyofthe SkypePeer-to-Peer VoIP System. InIPTPS, Int'l Work. on Peer-to-Peer Systems,2006.
[11℄ SidathB.Handurukande,Anne-MarieKermarre ,Fabri eLeFessant, Lau-rentMassoulié,and SimonPatarin. Peersharingbehaviourin theedonkey network,andimpli ations forthedesignof server-lessle sharingsystems. InEuroSys, 2006.
[12℄ Mark Jelasity and Ozalp Babaoglu. T-man: Gossip-based overlay topol-ogy management. In ESOA, Intl'l Work. on Engineering Self-Organising Systems,2005.
[13℄ Mar -Olivier Killijian, Ludovi Courtès, and David Powell. A Surveyof CooperativeBa kup Me hanisms. Te hni alReport06472,LAAS,2006.
[14℄ Fabri e Le Fessant, Cigdem Sengul, and Anne-Marie Kermarre . Pa e-maker: Fighting Selshness in Availability-Aware Large-S ale Networks. Te hni alReportRR-6594,INRIA, 2008.
[15℄ JamesW. Mi kensandBrianD. Noble. Exploiting availabilitypredi tion in distributed systems. InNSDI, Symp. onNetworkedSystemsDesign and Implementation,2006.
[16℄ Ramses MoralesandIndranilGupta. AVMON:Optimaland s alable dis- overyof onsistentavailabilitymonitoringoverlaysfordistributedsystems. InICDCS, Int'lConf.onDistributedComputingSystems, 2007.
[17℄ Jan Sa ha, Jim Dowling, Raymond Cunningham, and René Meier. Dis- overy of stable peers in a self-organising peer-to-peer gradient topology. InDAIS,Int'lConf.onDistributedAppli ations andInteroperableSystems, 2006.
[18℄ S. Saroiu,P. KrishnaGummadi,andS.D.Gribble. Ameasurementstudy ofpeer-to-peerlesharingsystems. InMMCN, MultimediaComputingand Networking,2002.
[19℄ Daniel Stutzba h and Reza Rejaie. Understanding hurn in peer-to-peer networks. InIMC, InternetMeasurement Conf.,2006.
[20℄ Spyros Voulgaris, Daniela Gavidia, and Maarten van Steen. CYCLON: Inexpensive membership management for unstru tured P2P overlays. J. NetworkSyst. Manage., 13(2),2005.
[21℄ Spyros VoulgarisandMaarten vanSteen. An epidemi proto olfor man-aging routingtables in verylarge peer-to-peer networks. In DSOM, Int'l Work. onDistributedSystems: Operations andManagement,,2003.
[22℄ Qin Xin, Thomas S hwarz, and Ethan L. Miller. Availability in global peer-to-peer storage systems. In WDAS, Work. on Distributed Data and Stru tures,2004.
Contents
1 Introdu tion 3
2 Availability Patterns in eDonkey 4 2.1 TheeDonkeyTra e. . . 4 2.2 FilteringandPredi tion . . . 5
3 ProblemSpe i ation 5
3.1 SystemandNetworkModel . . . 5 3.2 TheProblemofDis onne tionMat hing . . . 6 3.3 TheProblemofPresen eMat hing . . . 6
4 UptimeMat hingwith Epidemi Proto ols 7
4.1 DistributedMat hing withT-Man . . . 7
4.2 Metri sforAvailabilityMat hing . . . 8
4.2.1 Dis onne tion Mat hing . . . 8
4.2.2 Presen eMat hing . . . 8
5 Simulationsand Results 8 5.1 GeneralSimulationSetup . . . 8
5.2 EvaluationofDis onne tionMat hing . . . 11
5.3 EvaluationofPresen eMat hing . . . 11
6 Dis ussionand Con lusion 11
Centre de recherche INRIA Saclay – Île-de-France
Parc Orsay Université - ZAC des Vignes
4, rue Jacques Monod - 91893 Orsay Cedex (France)
Centre de recherche INRIA Bordeaux – Sud Ouest : Domaine Universitaire - 351, cours de la Libération - 33405 Talence Cedex
Centre de recherche INRIA Grenoble – Rhône-Alpes : 655, avenue de l’Europe - 38334 Montbonnot Saint-Ismier
Centre de recherche INRIA Lille – Nord Europe : Parc Scientifique de la Haute Borne - 40, avenue Halley - 59650 Villeneuve d’Ascq
Centre de recherche INRIA Nancy – Grand Est : LORIA, Technopôle de Nancy-Brabois - Campus scientifique
615, rue du Jardin Botanique - BP 101 - 54602 Villers-lès-Nancy Cedex
Centre de recherche INRIA Paris – Rocquencourt : Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex
Centre de recherche INRIA Rennes – Bretagne Atlantique : IRISA, Campus universitaire de Beaulieu - 35042 Rennes Cedex
Centre de recherche INRIA Sophia Antipolis – Méditerranée : 2004, route des Lucioles - BP 93 - 06902 Sophia Antipolis Cedex
Éditeur
INRIA - Domaine de Voluceau - Rocquencourt, BP 105 - 78153 Le Chesnay Cedex (France)
http://www.inria.fr