Analyse des Systèmes Distribués par Théorie des Jeux : Conception et Incitation
Game Theoretic Analysis of Distributed Systems: Design and Incentives
László Toka
Thesis presented to obtain the degree of Doctor of Télécom ParisTech, defended on 31 January 2011
Thesis advisors:
Dr. Attila Vidács, HighSpeed Networks Lab, Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics
Dr. Pietro Michiardi, Networking and Security Department, Eurecom, Télécom ParisTech
2011
This dissertation studies incentive aspects of distributed systems in which limited private or public resources must be allocated among selfish autonomic participants. Our goal is to design mechanisms which ensure the efficiency and fairness of resource allocation in such systems. We employ selfish user models and we investigate the results of our proposed schemes. We also design distributed optimization algorithms intended for practical implementation.
First, we target backup services in peer-to-peer systems, i.e., distributed networks of functionally equal peers, where users save their backup data on the underutilized storage devices of one another over the Internet. As a main characteristic, no scalability problems arise since more users provide larger overall storage space and bandwidth. Furthermore, spatial and ownership diversity of storage hosts assure the availability of backed up data. However, the management of users not willing to share their local resources with other participants is extremely important to keep the system operational. Moreover, ensuring high quality of service in such a peer-to-peer network requires careful system design. Our novel data redundancy and peer selection policies provide reliable backup service in return for fair resource contribution of the users.
Second, we examine the potential of a dynamic spectrum management framework that enables sequential allocation of frequency bands for wireless service providers. We present our distributed system design on allocation and pricing with the goal of achieving efficient spectrum utilization, flexible allocations and incentive-compatibility, considering physical interference among frequency licensees. Our work provides insights on emerging optimization problems related to the allocation. We suggest heuristic solutions to these problems based on our analytic results. We evaluate the proposed framework and algorithms numerically, and we conclude that our proposed heuristics can be the cornerstones of a flexible distributed dynamic allocation system.
I wish to express my gratitude to my advisors Matteo Dell'Amico, Pietro Michiardi and Attila Vidács for their guidance. I am also thankful for the support of my colleagues at Eurecom and at the HighSpeed Networks Lab.
1 Introduction
1.1 Research background
1.1.1 Related work
1.1.2 Research goals
1.1.3 Motivation
1.2 Methodology
1.2.1 Game theory
1.2.2 Incentive mechanism design
1.2.3 Matching theory
1.3 Outline of the dissertation
I Peer-to-Peer Backup System
2 Introduction
2.1 Online data storage services
2.2 Backup versus storage in a P2P system
2.3 Focus of the work
3 Related work
3.1 Data redundancy in P2P storage
3.1.1 Replication
3.1.2 Erasure coding
3.1.3 Maintaining redundancy
3.1.4 Feasibility of P2P storage
3.2 Data placement
3.2.1 Central or distributed mapping
3.2.2 Peer monitoring
3.2.3 Fairness
3.3.1 Data transfers
3.3.2 Security
4 System design
4.1 Data backup and retrieval
4.2 Redundancy scheme
4.2.1 Data structure
4.2.2 Adaptive redundancy rate
4.2.3 Redundancy maintenance scheme
4.2.4 Assisted repairs
4.3 Grouping peers by design
4.4 Assisted backup
4.4.1 Data center storage
4.4.2 Data placement during backup
5 User-driven peer selection
5.1 Selection based on grades
5.2 User satisfaction
5.3 The exchange game
5.4 Stable fixtures problem
5.5 Stable stratification
5.6 Grade improvement
6 Scheduling data transfers
6.1 Scheduling problem with full information
6.2 Random scheduling without full information
6.3 Evaluation of random scheduling
7 Evaluation of the system design
7.1 Simulated user settings
7.2 Fixed and adaptive redundancy rates
7.2.1 Prompt data availability and TTR
7.2.2 Adaptive redundancy rate scheme
7.2.3 Data loss results
7.3 Evaluation of a grouped P2P system
7.4 Evaluation of scheduling policies
7.5 Choice of parameters
7.5.1 Fragment size
7.5.2 Simulated user parameters
8 Conclusions
8.1 Reliability
8.2 Fairness
8.3 Efficiency
8.4 Perspectives
II Distributed Dynamic Spectrum Allocation
9 Introduction
9.1 Static versus dynamic spectrum allocation
9.2 Focus of the work
10 Related work
10.1 Central allocation
10.2 Distributed allocation
10.3 Spectrum auctions
10.4 Secondary spectrum usage
11 System model
11.1 Distributed spectrum allocation model
11.1.1 Node description
11.1.2 Interference model
11.1.3 Distributed allocation one-way exclusion
11.2 Pricing directives
11.2.1 Second-price auctions
11.2.2 Utility-based pricing and rationality
11.2.3 Incentive compatibility
11.2.4 Fairness and efficiency
11.3 Node exclusion strategies and their consequences
11.3.1 Node exclusion problem
11.3.2 Insights about exclusions in a simplified scenario
11.3.3 The saturation of a frequency slot
11.4.1 The frequency band selection and node-exclusion algorithm
11.4.2 Optimization heuristics
11.4.3 Implemented heuristic algorithms
12 Evaluation
12.1 Simulation setting
12.2 Evaluation metrics
12.3 Simulation results
13 Conclusions and perspectives
14 Summary
14.1 The design and analysis of a P2P backup system
14.2 Distributed dynamic spectrum allocation
Bibliography
A Summary in French
A.1 Introduction
A.1.1 The context
A.1.2 Motivation
A.2 Goals
A.3 Methodology
A.4 Results
A.4.1 The design of a P2P backup system
A.4.2 User-driven peer selection in a P2P backup system
A.4.3 Distributed dynamic spectrum allocation
A.5 Application of the results
List of Figures
6.1 Maximum flow problem formulation of data transfer scheduling
6.2 Backup and retrieval inefficiency with synthetic online phases yielding 0.36 average availability
6.3 Backup and retrieval inefficiency with correlated online phases yielding 0.36 average availability
7.1 Number of online peers
7.2 Peer behavior inputs
7.3 Peer connectivity inputs
7.4 Online redundancy with fixed-rate
7.5 Analysis of adaptive- and fixed-rate redundancy schemes
7.6 Fixed and adaptive redundancy schemes
7.7 Redundancy rates and data losses
7.8 Fatal fraction of peer crashes with adaptive-rates (top) and fixed-rates (bottom); loss events are classed according to Table 7.1
7.9 The effects of assisted repairs
7.10 Distribution of grades
7.11 Fairness in grouped peer selection
7.12 Cost of fairness in grouped peer selection
7.13 Uplink underutilization
7.14 Benefits of assisted backup
7.15 Data center involvement in different data placement schemes
7.16 Data center traffic
7.17 Redundancy rate distribution with different fragment sizes
7.18 Fragment size analysis
11.1 Sets $F_f$, $C_i^f$, $D_i^f$ and $E_i^f$
12.1 Nodes in the second scenario
12.3 Authority income and node lifetimes with different frequency band selection strategies in the second scenario
A.1 The backup application, running on the user's computer connected to the Internet, stores the user's data securely by distributing copies to other participating computers. After a loss of local data, the software retrieves the requested data from the storage partners.
List of Tables
7.1 Data loss types
12.1 Node parameters in the simplified scenario
12.2 Technology coupling between nodes
12.3 Transmission power and interference tolerance thresholds [mW/kHz]
12.4 Node utilities per type
Introduction
Our research work aims at designing distributed systems with a focus on fairness and incentives in resource provisioning. The dissertation consists of game-theoretic models of peer-to-peer (P2P, a distributed network of functionally equal peers) backup and decentralized radio spectrum sharing solutions. The inherent selfish behavior of participants is mitigated by well-suited incentive schemes. Analytic and numeric results reflect the performance evaluation of the proposed system design frameworks.
1.1 Research background
Nowadays, more and more information technology services and applications turn to the distributed paradigm because of scalability issues. Self-organizing and distributed systems appear in every domain of telecommunications; these systems, however different in most technical aspects, exhibit similar incentive issues. Besides the extensive research on technical solutions, the related economic characteristics have also received growing attention recently.
1.1.1 Related work
Many distributed services currently rely upon altruistic behavior from their users. The phenomenon of selfish individuals who opt out of a voluntary contribution to the common welfare of the group has been widely studied, and is known as the free-rider problem. For example, uncontrolled or excessive free-riding in a P2P file sharing system leads to network congestion at some hot-spot peers and to the degradation of system performance; this phenomenon is indeed a real issue in P2P networks in general. It is thus important to design mechanisms that encourage peers to contribute resources and that reduce free-riding behavior in distributed systems.
A vast research literature tackles the aforementioned problems and proposes solutions for distributed systems, such as network access sharing [96,97], P2P file sharing [11,29], network routing [50], packet forwarding in ad-hoc networks [10,23,152], spectrum allocation [79], P2P storage and backup [11,33,95,130,139], network content caching [85,86], and network formation [30,49,84]. Designing incentive-compatible system frameworks is considered to be a hot topic in the research community.
1.1.2 Research goals
Our research aims at building specifically tailored designs for two different types of distributed systems: P2P backup and distributed radio frequency allocation systems. In both cases the non-cooperative selfish behavior of participants can jeopardize the operation. Based on user models, our goal is to design optimal economic incentive solutions that ensure desirable quality of service by fostering cooperation among users or by distributing shared resources efficiently. In the P2P backup system we introduce a barter-based scheme; for spectrum allocation we impose monetary flows between users and the authority. We evaluate the proposed incentive schemes from the user and from the system perspectives, through both analytic and numeric investigations.
In order to reach our goals,
• we model the rational behavior of selfish participants in the investigated systems, accounting for user benefits in terms of application performance, user cost of resource sharing (if applicable), and the heterogeneity of user characteristics relevant for the service, e.g., the heterogeneity of shared user resources, interference relations among users;
• based on the models, we build system designs with incentive solutions;
• in order to analyze the system models and the proposed incentive schemes, we utilize a broad class of analytic tools, e.g., matching theory models for the P2P backup system, and auction theory for the allocation of radio spectrum;
• we decompose the arising optimization problems and develop distributed algorithms to solve them;
• to validate the theoretic models, we perform simulations as proofs of concept, and we provide numeric performance evaluation of the proposed solutions.
We strive to create potential implementation solutions for practical, feasible and scalable applications. The frameworks to be designed, with embedded incentive schemes and analyzed according to the assumed selfish user behavior, must ensure favorable outcomes in robust systems.
1.1.3 Motivation
The motivation behind this research work is that the lack of reasonable incentives in such a system sets back the possibility of reaching a socially optimal outcome in its functioning. Application-specific schemes could enhance the way these distributed systems work.
The first part of the thesis addresses a relevant field of research: there is a growing need for seamless, secure, reliable and easily accessible online backup, as everyday electronic devices are more and more integrated into the Internet and increasing transmission rates make it possible to move large amounts of data. Since users exploit the resources of one another instead of public or central resources, resource provisioning requires a well-suited incentive scheme.
The second part of the thesis targets radio spectrum allocation, studying auction-based management schemes in a distributed fashion. The main motivation behind the approach we investigate is that the sequence of central auctions to reallocate public resources should be transformed into a more scalable framework. We let the participants trade the acquired resources among themselves in a distributed design without the intervention of a central auctioneer.
1.2 Methodology
Game theory offers a tool-set for modeling individual user preferences, strategies, costs and valuations in distributed systems. We employ graph theory and matching theory to analyze the incentive mechanisms that we propose. Furthermore, we perform numerical evaluations with simulations written in MATLAB.
1.2.1 Game theory
Distributed systems consist of autonomic participants and limited public or private resources to be distributed among them. It seems reasonable to assume that every user is selfish, i.e., sensitive only to the quality of the experienced service, regardless of the effects of its actions on the other users. On the other hand, the quality of service a user receives depends on the generosity of other users: each user benefits either from the shared capacity of others or from its share of the public resources. The framework of game theory [56,69,105,122] is therefore particularly well-suited to study this kind of situation as a non-cooperative game played among users. Analytical investigations tackle user behavior, the existence of best-response strategies, Nash equilibrium, Pareto-optimal outcomes, incentives, and social welfare.
Since in premature system designs there is no direct incentive to offer one's own capacity to others or to fairly share the common wealth, users are motivated to free-ride [53]. If the sharing efforts do not receive some kind of proof of appreciation, nobody has an interest in cooperating, and in the extreme case the distributed service fails to exist. Therefore it is necessary to tackle the economic aspects of such systems and to properly design incentive mechanisms for the service in question [12,31,52].
core of game theory that attempts to mathematically capture behavior in strategic situations, in which the success of an individual in making choices depends on the choices of others. Rational users change their behavior to maximize their own benefit when taking part in the system. In reality, there may be other types of individuals, e.g., altruistic or malicious, but the predominant majority is assumed to behave strategically.
The combination of universal cooperation leading to optimal overall utility, an individual incentive to defect, and rational behavior provides the essential tension that results in the tragedy of the commons [82] without properly designed incentive schemes. Designing the latter requires complete knowledge about the participants; moreover, to attain optimality, each peer should be offered a personalized scheme, and where necessary, participation should be enforced (or payments made to ensure it). None of these conditions is likely to be achievable in a practical distributed system, where not only the preferences of the individual peers but even their identities might be unknown [12].
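The tension between cooperation and the individual incentive to defect can be illustrated with a textbook linear public-goods game. The following is only a minimal sketch: the function name `payoff`, the number of users, the contribution cost and the multiplier are all illustrative assumptions and are not taken from the models of this dissertation.

```python
def payoff(contributes: bool, others_contributing: int, n: int,
           cost: float = 1.0, multiplier: float = 3.0) -> float:
    """Payoff of one user in a linear public-goods game.

    All parameters are illustrative assumptions, not values from the thesis.
    """
    contributors = others_contributing + (1 if contributes else 0)
    # Every contributed unit is multiplied and the result is shared by all n users.
    share = multiplier * cost * contributors / n
    return share - (cost if contributes else 0.0)


N_USERS = 10

# Whatever the others do, a user earns more by withholding its contribution
# (this holds whenever multiplier < n), so free-riding is a dominant strategy.
free_riding_dominates = all(
    payoff(False, k, N_USERS) > payoff(True, k, N_USERS) for k in range(N_USERS)
)

all_cooperate = payoff(True, N_USERS - 1, N_USERS)  # everyone contributes
all_defect = payoff(False, 0, N_USERS)              # nobody contributes

print(free_riding_dominates, all_cooperate, all_defect)
```

Under these assumed parameters every user is better off when all contribute than when nobody does, yet each individual gains by defecting, which is the kind of outcome an incentive scheme is meant to prevent.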
As far as economic efficiency is concerned, besides the lack of information concerning the identities and preferences of the individual participants, which is required for the computation of the optimal allocation of resources and cost in a distributed system, an additional challenging issue is the complicated economic modeling of individuals. In several cases, the existence of externalities makes the adoption of a free-market approach using market-defined prices inefficient [12]. Also, different technical constraints limit the design variety of many distributed systems, e.g., currency-based incentive schemes are hard to implement in current file sharing P2P networks. In addition to the challenging economic resource allocation problem, a system designer usually has to deal with the inability to rely on trusted software or on central entities that can monitor and account for peer transactions to ensure that they contribute and consume the amount of resources dictated by an underlying incentive model.
1.2.2 Incentive mechanism design
Incentive schemes must be deployed to align selfish participant behavior with the goals of the system design, e.g., in P2P systems it is essential that peers be compliant with the protocol specification [64,82,102,148]. However, modeling the economic transactions carried out in a distributed system is, in general, a very complex task. The main reason is that participants should contribute different types of resources (money, power, bandwidth, storage, CPU cycles, content, etc.) with various characteristics, the provision of which generates complex costs taking into account the time spent using the system and in some cases extreme components such as legal risks [12,13]. Early works [54,64,139] made efforts to model the utilities and costs associated with the participation in a P2P file sharing system using game theoretic analysis: they analyze the free-riding problem and the equilibrium of user strategies under several micro-payment mechanisms,
Monetary-based incentive mechanisms are widely studied in every area of distributed systems. Currency has a well-defined uniform valuation for every participant and supports flexibility in terms of the time and the amount of counter-contribution for any given resource. In P2P systems the predominant payment scheme approaches belong to micro-payment solutions with clearance infrastructure or to credit-based systems. When a publicly available limited resource must be allocated among a set of individuals, auctions seem to be the most suitable frameworks. Extensive research targets a whole range of such systems [79,96].
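As an illustration of why auctions suit this setting, the generic sealed-bid second-price rule can be sketched in a few lines. This is a textbook single-item sketch under stated assumptions, not the pricing mechanism developed later in the dissertation; the function names, the bidder label `"me"` and all bid values are hypothetical.

```python
def second_price_auction(bids: dict[str, float]) -> tuple[str, float]:
    """Return (winner, price): the highest bidder wins and pays the second-highest bid."""
    if len(bids) < 2:
        raise ValueError("at least two bidders are required")
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


def utility(valuation: float, my_bid: float, other_bids: dict[str, float]) -> float:
    """Utility of the hypothetical bidder "me": valuation minus price if it wins, else 0."""
    bids = dict(other_bids)
    bids["me"] = my_bid
    winner, price = second_price_auction(bids)
    return valuation - price if winner == "me" else 0.0


others = {"a": 7.0, "b": 4.0}        # hypothetical competing bids
truthful = utility(10.0, 10.0, others)   # bid equals the true valuation
underbid = utility(10.0, 5.0, others)    # shading the bid loses the item
overbid = utility(10.0, 15.0, {"a": 12.0})  # overbidding can win at a loss
print(truthful, underbid, overbid)
```

In this sketch the price paid does not depend on the winner's own bid, so neither underbidding nor overbidding improves on bidding the true valuation; this is the standard argument for the incentive compatibility of second-price auctions.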
When a given distributed system excludes the possibility of applying monetary tools in order to provide incentives for the participants, barter-based solutions arise. Collaboration among peers is then motivated by resource exchanges, tit-for-tat strategies [11,29,101], reputation systems [65,82,104], or penalty policies [102]. However, in these models, as a result of excluding monetary means, determining the objective value of resources that are subject to barter appears as an additional issue.
Distributed Algorithmic Mechanism Design (DAMD) aims to create incentives for Internet applications. In an ideal system design selfish nodes would maximize the common welfare [12]. If no centralized authority with total knowledge can make decisions about the system resources, building an incentive scheme becomes a DAMD problem, combining computer science with the incentive-compatible mechanism design of economics. DAMD provides a useful framework to enforce the proper provision of resources in a distributed system [51].
1.2.3 Matching Theory
Matching theory [58,72,73], a field of combinatorial optimization, provides useful tools to analyze, among several other possible targets, peer selection in P2P file sharing systems [57,88].
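A central notion of matching theory is stability: a matching is stable if no two participants would both rather be paired with each other than stay with their current partners. The sketch below checks this for a one-to-one matching among peers. It is a generic illustration, assuming complete preference lists in which every current partner appears; the peer names and the preference lists are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def blocking_pairs(prefs: Dict[str, List[str]],
                   partner: Dict[str, Optional[str]]) -> List[Tuple[str, str]]:
    """List the pairs of peers that would both prefer each other to their current partner.

    prefs[a] ranks the other peers from most to least preferred (assumed complete).
    partner[a] is a's current partner, or None if a is unmatched.
    """
    def prefers(a: str, b: str) -> bool:
        # True if peer a ranks b strictly above its current partner.
        if b not in prefs[a]:
            return False
        current = partner.get(a)
        if current is None:
            return True
        return prefs[a].index(b) < prefs[a].index(current)

    peers = sorted(prefs)
    return [(a, b)
            for i, a in enumerate(peers)
            for b in peers[i + 1:]
            if partner.get(a) != b and prefers(a, b) and prefers(b, a)]


# Hypothetical example: every peer ranks the others in the same order A > B > C > D.
prefs = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
stratified = {"A": "B", "B": "A", "C": "D", "D": "C"}
mixed = {"A": "D", "D": "A", "B": "C", "C": "B"}

print(blocking_pairs(prefs, stratified))  # no blocking pair: the matching is stable
print(blocking_pairs(prefs, mixed))       # A would rather exchange with B or C
```

When all peers share a common ranking, as assumed here, only the matching that pairs similarly ranked peers survives the stability check.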
1.3 Outline of the dissertation
The first part of the dissertation focuses on P2P systems for backup. Users participate in the system operation by offering private resources (such as storage, online time, bandwidth) for the benefit of the community with the ultimate goal of improving application performance.
We study important design options of data redundancy, data maintenance and data transfer over the network in Chapter 4. We suggest a new procedure for determining data redundancy, and we evaluate its performance compared to currently known techniques in Chapter 7. For their comparison, we define novel metrics that describe the quality of service, e.g., the duration of archiving data and of recovery processes, the probability of backup loss. Determining redundancy is based on the time required to retrieve the backed up data and it guarantees high service quality levels while significantly reducing the applied redundancy, thus storage and bandwidth
We show the settings that provide high quality of service for backup purposes and, at the same time, require the least possible shared resources. However, a system with a low number of users might not be able to guarantee the appropriate quality of service based only on the shared user resources. Therefore, we examine the effects of introducing a central storage server in order to avoid such situations: we show the cost implications of persistent quality guarantees. In such a hybrid system the central, highly available server might be used to store data in exchange for the reimbursement of costs. Furthermore, it can also be used to restore data from the remaining copies when backup is lost on peers that leave the system. In Chapter 7 we arrive at the conclusion that such a hybrid system significantly improves the quality of service at relatively low cost.
Since the extent of user contributions affects the system, an incentive scheme is required in order to maintain the service: participating users must share disk space, bandwidth and online time, connected to the Internet. In Chapter 5 we present a symmetric peer selection scheme in which users have the ability to selfishly select the remote peers they want to exchange data with. Peer characteristics (e.g., online availability, dedicated bandwidth) play an important role and are reflected in the model through a single parameter, termed grade. We show that selecting remote peers selfishly, based on their profiles, creates incentives for users to improve their contribution to the system.
In Chapter 6, we show an efficient algorithm to compute the optimal data transfer scheduling solution for hypothetical cases where future peer uptimes are known. We use the optimal results to evaluate the applied scheduling policy, and we propose practical settings in which the performance of random decisions is close to optimal.
The second part of the dissertation investigates the possibility of allocating radio spectrum among multiple applicants dynamically in a distributed manner. The distributed dynamic spectrum allocation framework, if frequency leasers are coordinated by a well-suited scheme, provides efficient methods for the allocation of the scarce underlying resources, with respect to the general interference conditions.
In Chapter 11, we present a model and the related framework of allocation and pricing that offers a distributed mechanism design, adapted to practical employment issues. Our model handles interference effects without any restricting assumptions, and our framework is scalable and incentive-compatible. We provide both analytical and numerical evaluation (in Chapter 12) of the proposed framework, and in either case we prove the latter to be a suitable approach to efficient and flexible spectrum utilization.
Peer-to-Peer Backup System
Introduction
Nowadays, accumulated information results in a large and continuously increasing amount of digital data, a significant part of which is personal and cannot be reproduced easily; its safe storage is therefore essential. Numerous methods and applications are available to simultaneously produce and back up copies, but the difficulty and high costs of the archiving process prevent most users from keeping their data safe.
We study online backup services that offer safe storage of personal data for users with an Internet connection. We suggest an innovative and viable approach to realize this service via a P2P architecture which exploits user resources (storage, online time, bandwidth) with the goal of decreasing the usual cost of backing up data online. Simulations show that such a system has performance and reliability similar to those of a costly central storage server.
The reliability of a backup service can be established to different extents. If the system provides data availability, then the data is accessible online whenever a request is issued. This implies that any arbitrary part of the data can be fetched promptly by the data owner, therefore guaranteeing low access latency. On the other hand, if the durability of data is ensured, the data is safely stored in the system as long as the service is required, but it is not necessarily always accessible, at least not all parts of it. A reliable system recovers the stored data despite failures, without violating the level of data accessibility it guarantees.
2.1 Online data storage services
The advent of cloud computing as a new paradigm that enables service providers to deploy cost-effective solutions has favored the development of a range of new services, including online storage applications. Due to the economy of scale of cloud-based storage services, the costs incurred by end-users to hand over their data to a remote storage location in the Internet have decreased significantly. Furthermore, the process of backing up user data online is usually performed automatically by a client application: in contrast to on-site backup solutions, user interaction is minimal, and in case of data loss due to an accident, restoring the original data is a seamless operation.
Commercial online data storage services [2,3,6,7,147] can be broken down into those based solely on capital-intensive and energy-consuming [5,84] server farms [2,3] and those embracing the P2P paradigm [6,7,147].
Amazon S3 [2] is based on large parks of commodity hardware (i.e., a data center) running a custom-built distributed data structure discussed in [37]. Data availability and durability do not come for free: end users are compelled to pay for the amount of space they occupy in the data center and the amount of traffic their contents generate. Furthermore, cloud-based approaches can be subject to failures, as reported in [1]. Many companies base their online storage services (e.g., Dropbox [3]) on the Amazon cloud by building a user-friendly interface to it.
The centralized component that ensures storage for users of Wuala [7] is complemented by storage capacity at all available peers taking part in the application. Generally, data is placed both on servers and on a P2P network with a predetermined amount of redundancy. Servers coordinate the data placement by selecting storage nodes: users must offer an amount of local space inversely proportional to their average online availability, and a minimal dedicated bandwidth capacity [65]. The same servers are involved in constantly checking that these constraints are satisfied.
The long-term storage costs of online services, which are particularly dominant in the context of backup applications, may easily exceed those of traditional offline solutions. Additionally, while data availability is a key feature that large-scale data center deployments guarantee, its durability is questionable [147], as reported in [140]. A P2P storage system is a viable alternative to cloud-based solutions: user commodity storage devices are shared (together with some bandwidth resources) with a number of remote users to form a distributed storage system that is resilient to local failures.
2.2 Backup versus storage in a P2P system
General purpose online storage systems optimize latency to individual file access, since users upload their data to the system as a replacement of a local hard drive. Maintaining high data availability in a P2P storage system puts a high burden on the users due to the intermittent online appearances of storing peers. The online behavior of users is unpredictable and, at large scale, crashes and failures are the norm rather than the exception. As a consequence, a P2P storage application stores large amounts of redundant data to cope with such unfavorable events, i.e., storage space is sacrificed for low access latency.
Despite the enormous amount of work carried out in the research of P2P storage during the
[...] this is that a favorable trade-off between the performance and the resource requirement of
such a service is hard to find. As Blake et al. argue in [20], it is prohibitive to have enough
redundancy to keep all data available at all times, although without this, the performance is
questionable. The high demand for peer resources might be unsatisfiable by the contribution of
the participant peers, even if well-suited incentives are in place to encourage them to share.
On the other hand, an online backup service, as a particular case of online storage, involves
the bulk transfer of potentially large quantities of data, both during regular data backups, and
especially in case of retrievals. As a consequence, short backup and retrieve times are more
important goals to achieve than low access latency. Furthermore, as the backup service assumes
a local copy of the data at the user, retrieving the backup is required only when the locally stored
copy is lost. At these rather rare events, access latency to specific parts of the backup is simply
not an issue since the bulk traffic needed to restore the whole of it would need hours anyway.
Rather than aiming for another general purpose P2P storage system, we therefore focus
specifically on a simpler to implement and yet very useful application: backing up the important
data of users. Given the above considerations, our backup system design optimizes backup and
retrieval times, while guaranteeing data durability, i.e., ensuring that the loss of backup remains
an unlikely event.
We show in the thesis that providing data durability is less difficult to achieve, in terms
of required peer resources, than data availability. We present a drastic reduction of data
redundancy that still suffices to ensure the above quality of service guarantees. Furthermore,
we demonstrate the possibility of avoiding separate design layers that provide user incentives:
our application fosters cooperation among peers by design. Moreover, we propose a hybrid
design with a data center that improves the system performance to the same level as a costly
centralized service, even if our optimizations and incentives fail to ensure the necessary peer
resources. Striving to relieve users from payments, our data transfer scheduling policies keep the
bandwidth and storage costs of the data center at a minimum.
2.3 Focus of the work
The thesis presents our work on online data backup systems in which users store copies of their
files for the long term. We propose a system design that involves user edge devices that confederate
by pooling their local resources in a P2P network, and a central storage facility, i.e., a data
center. We define a set of performance metrics that describe important quality of service aspects
(e.g., data durability, backup and retrieve time). We analyze different design choices of data
redundancy, data placement and data transfer scheduling by evaluating the system performance.
We review related works in Chapter 3, organized according to the structure of the thesis. We
[...] and/or orthogonal to our contributions, but important for a complete system design.
In Chapter 4, we overview our system design and discuss its key components. We describe
in detail our application scenario, and we show why the assumptions underlying a backup
application can simplify many problems addressed in the literature.
In order to maintain the durability of backed up data, the degree of redundancy, i.e., the
amount of additional data in the P2P system that guarantees a backup operation to be considered
complete and safe, must be chosen wisely. We present a novel redundancy policy in Section 4.2
that, rather than focusing on short-term data availability, targets short data retrieve times. As
such, our method alleviates the storage burden of large amounts of redundant data on client
machines.
In order to enhance the performance of the system, we propose to employ a data center to
complement the resources offered by peers, if they are not sufficient. For example, detecting a
faulty external hard-drive may not be immediate, or obtaining a new piece of equipment upon
a crash may require some time; in these cases an assisted approach to repair data redundancy
on peers, which involves a cloud-based storage service, can significantly reduce the probability
of data loss at an affordable cost. Similarly, we show that an assisted backup process mitigates
the negative effects of low peer quality, decreasing the data loss probability along with the time
to backup. The suggested assisted policies offload the data center by transferring backup data
to peers as soon as possible, thereby minimizing data center costs (i.e., storage and bandwidth
burden) while sustaining the quality of the backup service.
In Chapter 5, we focus on the peer selection process, during which peers choose where to
place fragments of backup data they need to store. A global ranking is built among the peers
in terms of the quality of storage space, representing the online availability and bandwidth, they
offer to the P2P system. We model the process as a game in which users selfishly optimize the
utility and cost they bear for joining the system by adjusting their shared storage and qualities.
We present the objective function that drives the behavior of peers in the game, and we study
the algorithmic perspective of the uncoordinated peer selection process.
We show that the game reaches an equilibrium in which the system is stratified: peers with
similar characteristics cooperate by building bi-lateral links that are used to exchange and store
data. As peers improve their contributions, the service they receive from one another becomes
less costly, thus, the consequence of system stratification is a natural incentive for peers to
improve the quality of resources they offer to other peers.
We further discuss the scheduling policies, i.e., deciding how to allocate data transfers
between peers. In Chapter 6 we formalize the problem of exchanging multiple pieces of data with
intermittently available peers during uploading and retrieving operations with full knowledge
of future peer uptime, and we show that it can be solved in polynomial time by reducing it
[...] prove that a randomized approach to scheduling completes transfers nearly optimally in terms
of duration as long as the system is sufficiently large.
In Chapter 7 we corroborate our theoretical findings with simulations, driven by real
availability traces from an instant messaging application where on-line appearances of users show
daily and weekly patterns. Extensive numerical simulations, with a distributed algorithm for
the peer selection process, show that our techniques are effective in such a realistic scenario
with heterogeneous peer availabilities and bandwidths. With the simulation of a complete P2P
backup system we show that our proposed design is viable in practical scenarios and we illustrate
its benefits in terms of increased performance compared to other system designs.
Our contribution, discussed in detail in Chapter 8, is three-fold:
• we present a P2P backup system design in which data redundancy and its maintenance are
  adapted to the requirements of a backup service driven by our novel performance metrics,
  resulting in a significant decrease of resource usage compared to general purpose storage;
• through user-driven peer selection, we introduce embedded incentives for peers to share
  their local resources and we show the stable configuration of such a system;
• we show the time and bandwidth constraints of data transfers between peers in a P2P
  system, and we suggest to diminish those inefficiencies for a low cost by integrating a data
  center storage service in the system.
Related work
Online data storage solutions, allowing users to store and access their data from any point
on the Internet, are commercially popular products. These applications have received much
attention from the research community that studied their many facets. In particular, research
on distributed P2P storage applications has proliferated in the literature. The complete design
of such systems requires considering several problems. Here, we focus on those works that are
closely related to our contributions, but also give an overview on topics that are not addressed
in the thesis.
The challenge in the design of P2P data storage systems is that a reliable service must be
built on many unreliable peers. The emphasis is on the high number of participants, because
the underlying concept exploits the diversity of peer characteristics. Indeed, the data stored in
the system remains available online if it is scattered on so many peers that even if most of them
are temporarily disconnected, there are enough peers connected to the Internet to serve any
data read request. Similarly, the durability of storage is ensured if permanent loss, deletion or
corruption of data on peers can be corrected by maintaining the spread data with different repair
techniques. The availability and durability of stored data are achievable only if events that make
data parts unavailable or lost are not positively correlated. In this case a system containing
numerous peers to store data parts can provide both features by the law of large numbers.
In the following sections we revisit the collection [44] of major ideas that tackle the amount
of data redundancy and its maintenance, the placement of data parts on peers, and the related
data indexing, security and incentive issues.
3.1 Data redundancy in P2P storage
During the last decade, research focused on the design of general-purpose P2P storage that
provides features of traditional file systems. Therefore, a significant amount of work has been
devoted to implementing systems with low latency, the most difficult task when building
distributed storage. Numerous solutions emerged, all proposing to add some redundancy to the
stored data, but with different methods and extents depending on the design perspectives. We
give a brief overview of existing redundancy schemes, we summarize the techniques that are
applied to maintain the redundancy, then we argue on the differences between storage and backup
systems.
3.1.1 Replication
The simplest redundancy scheme stores copies of data on different peers. If r replicas of the
same file are placed on r different peers, the file is available even if r − 1 of them are offline.
Its simplicity made this scheme wide-spread in the first P2P storage system designs, e.g., PAST [43]
adopts the replication of files, CFS [35] divides files in blocks and performs replication at the
block level.
As described in the following, the redundancy scheme is intertwined with, among other
aspects, its repair technique. When a copy is lost, the applicable repair consists of simply
copying an available replica to another peer, based on the data placement policy.
The weakness of replication is that the amount of redundant data is significantly larger than
in more sophisticated schemes, and it therefore imposes a high storage burden on peers to meet
the same data availability and durability guarantees.
3.1.2 Erasure coding
The more sophisticated redundancy schemes apply erasure coding on the data to store. The basic
description of the scheme is as follows: data is organized into backup objects, each of which is cut
into k equal size original fragments, which are then transformed into an arbitrary number n > k
of encoded fragments, again of the same size, to be stored on n remote peers. Any k of these
encoded fragments are sufficient to reconstruct the k original fragments, thus the backup object.
Therefore the redundancy scheme resists the erasure of n − k encoded fragments stored on peers.
The benefit of erasure coding is that less redundancy (defined as r = n/k) is required, thus
less storage space is consumed than with replication, when providing the same level of reliability
[113,141]. As an illustrative example, let us consider replication with r = 3 copies, and an
erasure coding scheme with k = 3, n = 5. Both schemes ensure data durability when only two peers
lose their assigned data, however the data redundancy rates r are different: r = 3 with replication
and less than 2 with erasure coding. Note that both schemes imply that replicas or fragments
are placed on distinct peers, in order to maximize the diversity of storage locations.
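The comparison in this example can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and it only evaluates the redundancy rate r = n/k and the number of tolerated losses, assuming every replica or fragment sits on a distinct peer.

```python
from fractions import Fraction


def replication_scheme(r):
    """Return (redundancy rate, tolerated peer losses) for r full replicas."""
    if r < 1:
        raise ValueError("need at least one replica")
    return Fraction(r), r - 1


def erasure_scheme(k, n):
    """Return (redundancy rate n/k, tolerated peer losses n - k) for a k-of-n code."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return Fraction(n, k), n - k


if __name__ == "__main__":
    for label, (rate, losses) in (
        ("replication, r = 3", replication_scheme(3)),
        ("erasure coding, k = 3, n = 5", erasure_scheme(3, 5)),
    ):
        print(f"{label}: rate {float(rate):.2f}, tolerates {losses} lost peers")
```

Both schemes tolerate two lost peers, while the erasure coded object occupies 5/3 of its original size instead of 3 times.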
The inconvenience of erasure coding schemes is that repairs are more complex than trans-
[...] one), the encoding process has to be performed again, for which the original data or k
redundant fragments are needed. If none of these choices are ensured at one location, the repair of a sole
encoded fragment imposes the transfer of a data amount equal to the size of the backup object
on the system.
The most widely used erasure codes are based on linear operations on Galois fields; these
solutions are called Reed-Solomon codes [112]. The main drawback of the latter is the
computational complexity of the encoding and decoding processes; a remedy is proposed
by LT codes [93] that provide linear coding and decoding times, although the reconstruction of a
backup object requires slightly more than k encoded fragments. Furthermore, these codes do not
predetermine the parameter n, hence the name: rate-less (or fountain) codes. As an alternative,
rate-less erasure codes can be constructed based on random linear codes from the area of network
coding [55].
The system designs that apply erasure coding to build data redundancy are numerous [8,
17,18,40,67,81,91]. The popularity of the scheme is due to its high storage efficiency since, as
mentioned above, erasure codes are able to provide the same level of reliability as replication,
consuming much smaller storage space. The major drawback of erasure coding is that accesses
and repairs have higher latency than in the replication scheme. If this becomes more important
than the savings in storage space, designers apply replication, e.g., for active and for
meta-data [18,81].
3.1.3 Maintaining redundancy
As mentioned earlier, redundant data does not guarantee the availability and durability of data,
unless it is maintained during its lifetime: fragments that are lost on peers that crashed or
abandoned the system must be repaired by building and storing new fragments.
The repair policy determines when a fragment repair must be performed. Most of the existing
techniques have the target of ensuring prompt availability at all times. In these P2P storage
systems, either eager [35,43] or lazy repairs are adopted [18]: offline fragments are repaired either
immediately, or only after some delay, respectively. The first policy is simple, but inefficient,
since all peer disconnections are considered as losses; the second one is the opposite: it efficiently
takes into account both transient and permanent disconnections, reintegrating fragments on
peers that come back online after a short period to avoid unnecessary repairs, but it also requires
more sophisticated decision-making.
The decisions that drive lazy repair techniques must ensure the amount of redundancy that
guarantees the required availability, and most importantly durability of data. Generally,
thresholds on the necessary number of fragments are determined after observing statistics of online,
and supposedly alive, but offline peers. System designs apply reactive [18,27] and/or proactive
[...] higher redundancy than the repair threshold, and proactive repairs are performed to ensure that
the rate always remains above, or reactive repairs are performed when the rate falls below it.
The motivation for the reactive policy is to avoid waste of bandwidth, i.e., to perform repairs
when they seem to be necessary, while the reasoning behind the proactive scheme is to smooth
out bandwidth utilization during the repair process, since reactive repairs might come in bursts.
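As an illustration of a lazy, reactive policy, the sketch below counts the fragments still presumed available and triggers repairs only when that count falls below a threshold. The names `timeout`, `threshold` and the choice of restoring the full n fragments are assumptions made for the example, not parameters taken from the cited systems.

```python
def presumed_available(online, offline_durations, timeout):
    """Count fragments on online peers plus fragments on peers that have
    been offline for less than `timeout` (lazy policy: a short absence is
    treated as transient, not as a loss)."""
    return online + sum(1 for d in offline_durations if d < timeout)


def reactive_repairs(available, n, threshold):
    """Return how many fragments to regenerate under a reactive policy.

    No repair is issued while `available` stays at or above `threshold`;
    once it falls below, the redundancy is restored to n fragments
    (restoring to n is an assumption of this sketch)."""
    if not 0 <= available <= n:
        raise ValueError("available must lie in [0, n]")
    if not 0 <= threshold <= n:
        raise ValueError("threshold must lie in [0, n]")
    if available >= threshold:
        return 0
    return n - available


if __name__ == "__main__":
    # 4 fragments on online peers, 3 peers offline for 1, 5 and 30 hours.
    alive = presumed_available(4, [1, 5, 30], timeout=24)
    print(alive, reactive_repairs(alive, n=10, threshold=7))
```

An eager policy corresponds to `timeout = 0` combined with `threshold = n`: every disconnection counts as a loss and is repaired at once.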
3.1.4 Feasibility of P2P storage
Among many other approaches, the storage system design of TotalRecall [18] adopts a binomial
formula to calculate the erasure coding redundancy rate that guarantees low latency through prompt
data availability. This rate, as ensuring prompt data availability in general, requires high storage
and bandwidth contributions from peers in typical settings.
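To give a feel for why such rates are high, the following sketch evaluates the standard binomial tail for a k-of-n code: the probability that at least k of n fragments are online when each peer is online independently with probability p. It is a generic textbook computation under the independence assumption, not necessarily the exact formula used in [18]; the function names and the `max_n` bound are hypothetical.

```python
from math import comb


def availability(n, k, p):
    """Probability that at least k of n fragments are online, assuming each
    storing peer is online independently with probability p."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_fragments(k, p, target, max_n=10_000):
    """Smallest n whose availability reaches `target` (linear search)."""
    for n in range(k, max_n + 1):
        if availability(n, k, p) >= target:
            return n
    raise ValueError("target not reachable within max_n fragments")


if __name__ == "__main__":
    k, p, target = 32, 0.3, 0.99
    n = min_fragments(k, p, target)
    print(f"k={k}, p={p}: n={n}, redundancy rate {n / k:.2f}")
```

With low peer availability p, the required n/k grows quickly, which matches the observation that prompt availability demands large storage and bandwidth contributions.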
As an example, in Wuala [7], the usage of a data center is inevitable because the overall
storage capacity at peers is not sufficient to sustain the expected data availability for every
stored file. As Wuala offers low latency storage services, the system must store data on servers,
and the load on peers serves only caching purposes: popular file parts, stored on peers, can
be retrieved from multiple sources for free, hence the advantage of the P2P overlay in terms of
traffic [94].
Proportionally with the amount of redundant data to store, the repair policy in P2P storage
systems necessitates large communication bandwidth from peers. The authors in [20,21,113]
argue that such a system might be unsustainable in common Internet scenarios with high peer
failure rate and low peer bandwidth capacities. The reason for this is, again, that the repair of
each lost encoded fragment requires the transfer of a larger amount of data.
To remedy this disadvantage of erasure coding, hybrid schemes might use replicas to perform
repairs of encoded fragments avoiding the transfer of the whole backup object. Dimakis et al. [39]
discuss the benefits of using network coding in alleviating the costs of data maintenance
as opposed to approaches based on erasure coding. On the other hand, they argue that the
complexity of the system grows, since different maintenance policies must be applied for replicas
and for erasure coded fragments. In fact, redundancy schemes, applying solely erasure coding,
might become feasible with more sophisticated codes, built with the need for repairs in mind.
When maintenance of redundancy is delegated to peers that do not have a local copy of the
backup objects, hierarchical [45] or regenerating [39,46,144] coding schemes can be used to lower
the amount of required data transfers.
Although storage systems are more often envisaged in the literature than backup systems,
data availability requirements, which receive particular importance, unfortunately cannot
always be fulfilled. On the other hand, P2P backup systems, in which data durability is far more
important than availability, can be operated for much lower price in terms of redundancy and
[...]
The authors of [89] advocate the use of a timer-based repair policy that separates durability
from availability. Moreover, various works [27,107] determine redundancy as a function of node
failure rate in order to guarantee only data durability at the expense of low data availability.
Since these works aim to decrease the amount of redundant data in order to alleviate the storage
and traffic burden on peers, their focus shifts to the maintenance of the lower, therefore less safe
redundancy rate, i.e., to data repair techniques.
Many works devoted to P2P storage target the almost Herculean task of backing up the
entire contents of a hard drive, including operating system files. These works propose convergent
encryption, a technique to achieve data summarization that avoids storing multiple times pieces
of data that are common to many users [16,32,83], thus ensuring that storage space does not
get wasted by saving multiple copies of the same file across the system. In contrast, personal
backup usually involves only an important subset of user data to save, thus the amount of
overlapping data that could be summarized is plausibly very little. Instead, user backups should
encode incremental differences between archive versions. Recently, various techniques have been
proposed to optimize computational time and size of these differences [126].
3.2 Data placement
The P2P storage system is built up by the shared resources of participant peers. Their storage
space contribution can be organized into an architecture in many different ways, and the structure
(or the lack of it) plays a very important role in the system operation, from repair policies to
peer incentives. In this section we give an overview on these aspects.
3.2.1 Central or distributed mapping
The first and simplest solution to organize data stored in the system is a centrally coordinated
approach. The notion of tracker, a server that registers and monitors peers, and furthermore
dictates and organizes data placements, was first adapted in P2P file sharing systems [4,29],
where the tracker has a database about the exchanged files, their locations, and the status of
peers.
The simplicity of tracker-based design is overshadowed by issues of scalability, robustness and
security: even if the server does not store data replicas or fragments [91], as a single point of
failure, any overload due to intensive queries or malicious attacks can render it dysfunctional. As
such, no data is lost in case the tracker fails in the short term, however since it coordinates data
placement and repairs, a long term failure significantly affects the performance of the system.
In F2F [90] and FriendStore [137], peers store their data at their friends to improve data
durability. The authors argue that choosing storage partners based on existing social relationships
reduces the cost of maintaining redundancy. The cost of this approach is decreased flexibility,
and backup capacity.
One step toward a distributed placement algorithm is presented in [8]: peers are randomly
assigned to groups that manage different partitions of the overall storage. Every peer that belongs
to the same group has a consistent view of the attributed partition. This architecture requires
trusted peers, therefore it targets corporate networks instead of a common P2P scenario.
The appearance of Distributed Hash Tables (DHTs) [111,116,123,150] made a completely
distributed system design for P2P storage possible. DHTs provide consistent mapping between
keys and values [77] in a completely decentralized fashion: e.g., in Chord [123] each peer and
stored object is identified by a key, and is logically positioned on a ring composed by the possible
key values, then each peer is responsible to store the objects, whose keys fall between its key and
the preceding key of another peer.
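The ring mapping described here can be sketched in a few lines. The example is a simplified, centralized illustration of the key-to-peer rule only: it keeps the whole ring in one sorted list and omits the distributed routing of a real DHT. The ring size `RING_BITS`, the helper names and the use of SHA-1 for deriving keys are assumptions of this sketch.

```python
import bisect
import hashlib

RING_BITS = 16  # assumed ring size for the example: keys in [0, 2**16)


def ring_key(name):
    """Map a peer or object name onto the ring by hashing it."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def responsible_peer(peer_keys, object_key):
    """Return the key of the peer that stores object_key: the first peer
    key that is >= object_key, wrapping around the ring. Each peer thus
    owns the keys between the preceding peer's key (exclusive) and its
    own key (inclusive)."""
    ring = sorted(peer_keys)
    if not ring:
        raise ValueError("the ring has no peers")
    idx = bisect.bisect_left(ring, object_key)
    return ring[idx % len(ring)]


if __name__ == "__main__":
    peers = {ring_key(f"peer-{i}"): f"peer-{i}" for i in range(5)}
    key = ring_key("backup-fragment-42")
    print(key, "->", peers[responsible_peer(peers.keys(), key)])
```

When a peer joins or leaves, only the objects in the interval adjacent to its key change owner, which is what makes the mapping consistent.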
Various DHTs focus on different performance guarantees, and propose design choices
accordingly, although the basic mapping functionalities are the same: every peer creates and maintains
(when peers connect and disconnect) some links to remote peers in the DHT in order to constitute
a structured overlay for fast lookups through simple routing protocols. Furthermore, the peer
that is responsible for a storage object replicates it on some remote peers with hash keys close
to its own, which intuitively makes replication be applied in DHT-based P2P storage [35,43].
In Pastiche [32], storage peers are selected based on the similarity of their backup: peers save
storage space by de-duplicating common backup data.
3.2.2 Peer monitoring
The repair policy, in almost every system design, requires knowledge about the status of peers.
In order to determine whether a peer is online, temporarily or permanently offline, an active or
passive monitoring system must be in place. This connectivity information is used in order to
highlight data parts that are in danger and need repairs.
If a tracker organizes the data placement [91], it is also able to perform monitoring
operations easily. In an unstructured distributed P2P system, decentralized monitoring might be
probabilistic and the results can be aggregated via a gossip-based protocol [60], which floods the
system with peer monitoring information.
In structured overlay networks, distributed monitoring relations among peers may follow the
overlay structure. DHTs offer an intuitive policy: each peer monitors a set of its neighbors
which also store replicas of storage data assigned to the peer, therefore issues related to data
placement, peer monitoring and repair policy, that reacts to the results of monitoring, are solved
at once [35,43,67,149].
Correlated online behavior of peers, e.g., daily and weekly online patterns of peers in the
[...] assumption about independent and uncorrelated uptimes, but also renders the measurement of
peer availabilities in a distributed manner more difficult.
3.2.3 Fairness
One of the most investigated peculiarities of P2P systems is the need for peer incentives. If
well-suited mechanisms do not force peers to contribute at least as many resources as they
consume, then selfish peers exploit the shared resources of more altruistic participants without
sharing their own. This phenomenon is referred to as free-riding and the change of identities in
order to escape from badly designed incentives is called white-washing [52].
It is obvious that free-riding leads to the collapse of the P2P service in the long term, affecting
P2P storage even more drastically than other types of services. If peers insert more data into the
system than the storage space they share, the service becomes unsustainable. Enforcing storage
fairness is therefore imperative: either symmetric storage exchanges must be made between
peers in a tit-for-tat fashion [32,91], or multilateral symmetry must be established through a
transferable currency [33,64,71]. While bilateral barters are simple to enforce and control, a
currency-based design imposes fewer constraints on data placement, however, requires a central
authority.
A structured approach to data placement, e.g., based on a DHT, renders peer selection
implicit because data is uniformly stored on peers following the consistent hashing concept.
Therefore, DHT-based approaches achieve load balancing by spreading data on every peer with
uniform probability, irrespective of their characteristics. As a consequence, peer heterogeneity
in terms of the amount of resources they dedicate to the system has to be taken into account
by an additional system layer to elicit their cooperation. This is further complicated by the
asymmetry in that a peer might store data chunks for a remote peer that is not necessarily
storing its data.
3.2.4 Incentives
In effect, peer characteristics directly determine the quality of service in the system. For example,
low peer availability would require extensive use of redundancy in order to make data accessible
despite massive peer disconnections. In general, large differences can be seen among peers in
terms of connectivity, i.e., up-time and communication bandwidth, which may motivate
non-random data placement policies that take into account this heterogeneity. Many works [42,108,
127] improve data availability by carefully selecting peers that will store replicas or fragments
of data. Somewhat as a counter-example, [62] addresses peer selection strategies under churn,
and the authors present a stochastic model of a P2P system and argue on the positive effects of
[...]
If data placement strategies put more storage on peers with better characteristics, selfish
peers will not strive to show long uptimes and high connectivity, because in return they would
receive more and more data to store, saturating their connection links and storage capacity.
Therefore, besides enforcing fairness in terms of used and shared storage capacities, i.e., any
peer should offer an amount of local storage space equal to the load it imposes on the system,
the fairness should be extended to other resources as well.
A recurrent problem for P2P storage applications is creating incentives to encourage nodes
to contribute more online time and bandwidth resources [33,83,91]. This can be achieved via
reputation systems [76] or virtual currency [139]. In general, the idea is to impose fairness
spanning every resource type. A commercial example is Wuala, where the amount of local space
to be shared for a given amount of online storage is inversely proportional to the peer availability
[7]. However, in this scheme highly available peers may have to store data on unavailable peers
offering most of the cumulative available storage space.
Toka et al. [95] compare two possible mechanisms to manage a P2P storage system: they
suggest that either the service use of each peer should be limited to its contribution level (symmetric
schemes), or that storage space should be bought from and sold to peers by a system operator
that seeks to maximize profit. Using a non-cooperative game model to take into account user
selfishness, the authors study these mechanisms with respect to the social welfare performance
measure, and give necessary and sufficient conditions for one scheme to socially outperform the
other.
While peer selection in most related works is dictated by well-known data indexing-retrieval
techniques (e.g., DHT, locality-based), our symmetric data placement scheme (Chapter 5) is
based on shared resources. Every user picks remote peers selectively, i.e., peers with high quality
are preferred over peers with low quality. The resulting overlay creation process resembles
network creation games [49], especially the pairwise version of them [30].
Similarly to [101], our key observation is that a network creation game with bilateral
agreements can be studied with the tools of matching theory [115]. We introduce a generalized stable
matching problem that we use as a theoretical basis for peer selection. Studies of peer
selection in P2P content sharing applications [57] and in P2P storage [118] have pinpointed that
user-driven peer selection leads to system stratification. We show that this result provides
fairness and embedded incentives for users to offer the necessary amount of resources.
An additional benefit of user-driven peer selection is that efficient techniques to perform fast
look-up of individual backup files are unnecessary: a restore operation can be completed with
[...]
3.3 Reliability of the system
In contrast to storage services, our goal is to ensure data durability with a P2P backup system
design that neither requires user payments nor external incentive mechanisms to provide safe
backup in exchange for as little shared resources as possible. Besides the issues and solutions
discussed so far, the reliability of the service highly depends on the speed of data transfers and
on the security aspects.
3.3.1 Data transfers
Compared to the amount and placement of redundant data, organizing data transfers has received
less attention from the research community. The authors of [19] analyzed random backup scheduling
by modeling peer uptime as a Markovian process. Their study reached a conclusion that is
analogous to what we obtain in Chapter 6: in backups, the completion time of random scheduling
converges to the theoretical minimum as the system size grows.
BitTorrent [29] uses fixed-size fragments, and adaptive upper limits on parallel transfers in
order to avoid unfinished transfers and the appearance of bottlenecks respectively. Inspired by
these design elements of BitTorrent, we applied similar techniques in our backup P2P system.
As mentioned in Chapter 2, we suggest to employ a data center as a remedy for the
temporary lack of peer resources. Little work has been done on hybrid approaches to mitigate the
shortcomings of P2P systems. To the best of our knowledge no prior work tackles data backup
and/or repair operations assisted by a central entity.
AmazingStore [147] improves data availability of cloud-based storage services and reduces
their costs by augmenting centralized clouds with an efficient client-side storage system. Peers
backup at other peers, besides the servers, with different online patterns to improve the data
availability and to serve read requests within the P2P network. Therefore the hybrid system
mitigates the issue of the centralized point of failure, and provides resilience to large-scale failures.
FS2You [92] is a peer-assisted system that provides temporary storage and seeding for files
in a BitTorrent-like content distribution system with a hybrid structure consisting of peers and
servers. FS2You does not guarantee data persistence; while its goal is to minimize bandwidth
costs, we focus instead on minimizing the storage costs that will be dominant in the long run for
a storage system.
3.3.2 Security
While correlation among peer uptimes is a natural phenomenon, losing data on peers in masses
is somewhat suspicious. The most probable reason for many peers to lose the data stored on
them simultaneously is that they all run the same software, e.g., operating system, or they had
[...] be a vulnerability, creating high risk on the reliability of the system, we see them as a security
issue.
Many works [67,74,75,100,142] tackled this issue, and proposed to place data on peers that
are less likely to be correlated: those who use different software configurations, who are connected
through different network service providers and are far from each other geographically.
Along with losses due to peer failures, replicas or fragments can be destroyed on peers
accidentally and voluntarily. In order to detect these events, data integrity checks must be performed
periodically. These verifications also ensure that peers are really storing the data assigned to
them, hence enforcing fairness in storage.
While common checksums signal accidental corruptions, detecting malicious data deletion
requires more sophisticated operations. Two main approaches have been proposed: either peers
create self-verifying data blocks with signatures that are cryptographic collision-resistant hash
functions of the blocks themselves [119,143], or they perform probabilistic challenges via cryptographic
protocols toward storage peers that can be answered only by holding the data block [14,91,106].
With the first option, to detect any data modification the peer has to download the block and
recompute the signature to perform the comparison.
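The second option can be illustrated with a deliberately simplified challenge-response exchange. This sketch is not one of the protocols from the cited works: it assumes the data owner precomputes a small number of (nonce, expected answer) pairs before uploading a block, so that it can later test a storage peer without downloading the block again. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def answer_challenge(block, nonce):
    """Computed by whoever holds the block: hash of nonce followed by the block.
    A peer that discarded the block cannot produce this for a fresh nonce."""
    return hashlib.sha256(nonce + block).digest()


def precompute_challenges(block, count):
    """Run by the data owner before upload: draw random nonces and record
    the expected answers, so the block itself need not be kept for checking."""
    challenges = []
    for _ in range(count):
        nonce = secrets.token_bytes(16)
        challenges.append((nonce, answer_challenge(block, nonce)))
    return challenges


def verify(expected, response):
    """Constant-time comparison of the stored and the received answer."""
    return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    block = b"encoded fragment bytes"
    nonce, expected = precompute_challenges(block, count=1)[0]
    print(verify(expected, answer_challenge(block, nonce)))        # honest peer
    print(verify(expected, answer_challenge(b"garbage", nonce)))   # block lost
```

Each precomputed pair should be used only once, since a peer could otherwise cache the answer and delete the block; this is why the number of checks in this simple variant is bounded by `count`.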
Since in our backup system we can assume to have a single data owner authorized to read
and write it, data confidentiality and access control can be achieved with standard cryptographic
techniques between the storing couples. Moreover, for the same reason, consistency guarantees
for multiple readers and writers, and anonymity for data publishers and readers are not needed.
System design
The goal of a P2P backup system is to store data for users safely on remote peers. Every system
participant runs a client application on its device, e.g., computer or set-top box, with shared
storage capacity, connected to the Internet. The system design must cope with the unavailability
of peers and the unpredictable amount of resources dedicated to the system.
We have to consider many design aspects when building our system. This thesis presents
innovative elements regarding the data redundancy scheme, the peer selection with related data
placement strategies, and the data transfer scheduling policy. In those fields where known
techniques provide reasonable solutions, given the purposes and the assumptions of a P2P backup
system, our work refers to and reuses mechanisms published in the literature, e.g., erasure coding
and repairs.
4.1 Data backup and retrieval
The system design that we present contains a central server, called tracker, which is operated
by the backup service provider to supply various system management operations, such as peer
registration and monitoring. We note that they can also be implemented in a distributed
way using well-known techniques (e.g., DHTs). Such a task is however outside the scope of the
thesis. Therefore for simplicity we assume that all of them are carried out by the tracker.
When a peer has to save new data, the backup phase begins. During it, the data owner:
• establishes its data to be backed up (stored locally indefinitely unless the device of the
  peer crashes), divides it into fixed size fragments, then creates additional fragments by encoding
  the original fragments by erasure coding (Section 4.2);
• queries the tracker for a set of remote peers that are willing to store a fragment of the peer
  on their devices, i.e., have sufficient unallocated storage capacity, and announces itself at
  the tracker as a storage candidate in the meanwhile (Section 4.3);