Game theoretic analysis of distributed systems : design and incentives



Game Theoretic Analysis of Distributed Systems: Design and Incentives

László Toka

Thesis submitted for the degree of Doctor of Télécom ParisTech

Defended on 31 January 2011

Thesis advisors:

Dr. Attila Vidács, High Speed Networks Lab, Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics

Dr. Pietro Michiardi, Networking and Security Department, Eurecom, Télécom ParisTech

2011




This dissertation studies incentive aspects of distributed systems in which limited private or public resources must be allocated among selfish autonomous participants. Our goal is to design mechanisms that ensure the efficiency and fairness of resource allocation in such systems. We employ selfish user models and investigate the outcomes of our proposed schemes. We also design distributed optimization algorithms intended for practical implementation.

First, we target backup services in peer-to-peer systems, i.e., distributed networks of functionally equal peers, where users save their backup data on one another's underutilized storage devices over the Internet. A key characteristic is that no scalability problems arise, since more users provide larger overall storage space and bandwidth. Furthermore, the spatial and ownership diversity of storage hosts assures the availability of backed-up data. However, managing users who are not willing to share their local resources with other participants is extremely important to keep the system operational. Moreover, ensuring a high quality of service in such a peer-to-peer network requires careful system design. Our novel data redundancy and peer selection policies provide a reliable backup service in return for a fair resource contribution from the users.

Second, we examine the potential of a dynamic spectrum management framework that enables sequential allocation of frequency bands for wireless service providers. We present our distributed system design for allocation and pricing, with the goal of achieving efficient spectrum utilization, flexible allocations and incentive compatibility, while accounting for physical interference among frequency licensees. Our work provides insights into emerging optimization problems related to the allocation. We suggest heuristic solutions to these problems based on our analytic results. We evaluate the proposed framework and algorithms numerically, and we conclude that our proposed heuristics can be the cornerstones of a flexible distributed dynamic allocation system.




I wish to express my gratitude to my advisors Matteo Dell'Amico, Pietro Michiardi and Attila Vidács for their guidance. I am also thankful for the support of my colleagues at Eurecom and at the High Speed Networks Lab.


1 Introduction
1.1 Research background
1.1.1 Related work
1.1.2 Research goals
1.1.3 Motivation
1.2 Methodology
1.2.1 Game theory
1.2.2 Incentive mechanism design
1.2.3 Matching theory
1.3 Outline of the dissertation

I Peer-to-Peer Backup System
2 Introduction
2.1 Online data storage services
2.2 Backup versus storage in a P2P system
2.3 Focus of the work
3 Related work
3.1 Data redundancy in P2P storage
3.1.1 Replication
3.1.2 Erasure coding
3.1.3 Maintaining redundancy
3.1.4 Feasibility of P2P storage
3.2 Data placement
3.2.1 Central or distributed mapping
3.2.2 Peer monitoring
3.2.3 Fairness
3.3.1 Data transfers
3.3.2 Security
4 System design
4.1 Data backup and retrieval
4.2 Redundancy scheme
4.2.1 Data structure
4.2.2 Adaptive redundancy rate
4.2.3 Redundancy maintenance scheme
4.2.4 Assisted repairs
4.3 Grouping peers by design
4.4 Assisted backup
4.4.1 Data center storage
4.4.2 Data placement during backup
5 User-driven peer selection
5.1 Selection based on grades
5.2 User satisfaction
5.3 The exchange game
5.4 Stable fixtures problem
5.5 Stable stratification
5.6 Grade improvement
6 Scheduling data transfers
6.1 Scheduling problem with full information
6.2 Random scheduling without full information
6.3 Evaluation of random scheduling
7 Evaluation of the system design
7.1 Simulated user settings
7.2 Fixed and adaptive redundancy rates
7.2.1 Prompt data availability and TTR
7.2.2 Adaptive redundancy rate scheme
7.2.3 Data loss results
7.3 Evaluation of a grouped P2P system
7.4 Evaluation of scheduling policies
7.5 Choice of parameters
7.5.1 Fragment size
7.5.2 Simulated user parameters
8 Conclusions
8.1 Reliability
8.2 Fairness
8.3 Efficiency
8.4 Perspectives

II Distributed Dynamic Spectrum Allocation
9 Introduction
9.1 Static versus dynamic spectrum allocation
9.2 Focus of the work
10 Related work
10.1 Central allocation
10.2 Distributed allocation
10.3 Spectrum auctions
10.4 Secondary spectrum usage
11 System model
11.1 Distributed spectrum allocation model
11.1.1 Node description
11.1.2 Interference model
11.1.3 Distributed allocation one-way exclusion
11.2 Pricing directives
11.2.1 Second-price auctions
11.2.2 Utility-based pricing and rationality
11.2.3 Incentive compatibility
11.2.4 Fairness and efficiency
11.3 Node exclusion strategies and their consequences
11.3.1 Node exclusion problem
11.3.2 Insights about exclusions in a simplified scenario
11.3.3 The saturation of a frequency slot
11.4.1 The frequency band selection and node-exclusion algorithm
11.4.2 Optimization heuristics
11.4.3 Implemented heuristic algorithms
12 Evaluation
12.1 Simulation setting
12.2 Evaluation metrics
12.3 Simulation results
13 Conclusions and perspectives
14 Summary
14.1 The design and analysis of a P2P backup system
14.2 Distributed dynamic spectrum allocation

Bibliography
A Summary in French
A.1 Introduction
A.1.1 Background
A.1.2 Motivation
A.2 Goals
A.3 Methodology
A.4 Results
A.4.1 The design of a P2P backup system
A.4.2 User-driven peer selection in a P2P backup system
A.4.3 Distributed dynamic spectrum allocation
A.5 Application of the results


6.1 Maximum flow problem formulation of data transfer scheduling
6.2 Backup and retrieval inefficiency with synthetic online phases yielding 0.36 average availability
6.3 Backup and retrieval inefficiency with correlated online phases yielding 0.36 average availability
7.1 Number of online peers
7.2 Peer behavior inputs
7.3 Peer connectivity inputs
7.4 Online redundancy with fixed rate
7.5 Analysis of adaptive- and fixed-rate redundancy schemes
7.6 Fixed and adaptive redundancy schemes
7.7 Redundancy rates and data losses
7.8 Fatal fraction of peer crashes with adaptive rates (top) and fixed rates (bottom); loss events are classified according to Table 7.1
7.9 The effects of assisted repairs
7.10 Distribution of grades
7.11 Fairness in grouped peer selection
7.12 Cost of fairness in grouped peer selection
7.13 Uplink underutilization
7.14 Benefits of assisted backup
7.15 Data center involvement in different data placement schemes
7.16 Data center traffic
7.17 Redundancy rate distribution with different fragment sizes
7.18 Fragment size analysis
11.1 Sets $F^f$, $C_i^f$, $D_i^f$ and $E_i^f$
12.1 Nodes in the second scenario
12.3 Authority income and node lifetimes with different frequency band selection strategies in the second scenario

A.1 The backup application, running on the user's computer connected to the Internet, stores the user's data securely by distributing copies to other participating computers. After a local data loss, the software retrieves the requested data from the storage partners.


7.1 Data loss types
12.1 Node parameters in the simplified scenario
12.2 Technology coupling between nodes
12.3 Transmission power and interference tolerance thresholds (mW/kHz)
12.4 Node utilities per type


Introduction

Our research work aims at designing distributed systems with a focus on fairness and incentives in resource provisioning. The dissertation consists of game theoretic models of peer-to-peer (P2P, a distributed network of functionally equal peers) backup and of decentralized radio spectrum sharing solutions. The inherent selfish behavior of participants is mitigated by well-suited incentive schemes. Analytic and numerical results provide the performance evaluation of the proposed system design frameworks.

1.1 Research background

Nowadays, more and more information technology services and applications turn to the distributed paradigm because of scalability issues. Self-organizing and distributed systems appear in every domain of telecommunications; these systems, however different in most of their technical aspects, share similar incentive issues. Besides the extensive research work on technical solutions, the related economic characteristics have also received growing attention recently.

1.1.1 Related work

Many distributed services currently rely upon altruistic behavior from their users. The phenomenon of selfish individuals who opt out of a voluntary contribution to the common welfare of the group has been widely studied and is known as the free-rider problem. For example, uncontrolled or excessive free-riding in a P2P file sharing system leads to network congestion at some hot-spot peers and to the degradation of system performance; this phenomenon is a real issue in P2P networks in general. It is thus important to design mechanisms that encourage peers to contribute resources and reduce free-riding behavior in distributed systems.

A vast research literature tackles the aforementioned problems and proposes solutions for distributed systems, such as network access sharing [96,97], P2P file sharing [11,29], network routing [50], packet forwarding in ad-hoc networks [10,23,152], spectrum allocation [79], P2P storage and backup [11,33,95,130,139], network content caching [85,86], and network formation [30,49,84]. Designing incentive-compatible system frameworks is considered a hot topic in the research community.

1.1.2 Research goals

Our research aims at building specially tailored designs for two different types of distributed systems: P2P backup and distributed radio frequency allocation systems. In both cases the non-cooperative, selfish behavior of participants can jeopardize the operation. Based on user models, our goal is to design optimal economic incentive solutions that ensure the desired quality of service by fostering cooperation among users or by distributing shared resources efficiently. In the P2P backup system we introduce a barter-based scheme; for spectrum allocation we impose monetary flows between users and the authority. We evaluate the proposed incentive schemes from both the user and the system perspectives, through analytic and numerical investigations.

In order to reach our goals,

• we model the rational behavior of selfish participants in the investigated systems, accounting for user benefits in terms of application performance, the user cost of resource sharing (if applicable), and the heterogeneity of user characteristics relevant for the service, e.g., the heterogeneity of shared user resources and the interference relations among users;

• based on the models, we build system designs with incentive solutions;

• in order to analyze the system models and the proposed incentive schemes, we utilize a broad class of analytic tools, e.g., matching theory models for the P2P backup system and auction theory for the allocation of radio spectrum;

• we decompose the arising optimization problems and develop distributed algorithms to solve them;

• to validate the theoretic models, we perform simulations as proofs of concept, and we provide a numerical performance evaluation of the proposed solutions.

We strive to create solutions that can be implemented in practical, feasible and scalable applications. The frameworks to be designed with embedded incentive schemes, and analyzed according to the assumed selfish user behavior, must ensure favorable outcomes in robust systems.

1.1.3 Motivation

The motivation behind this research work is the lack of reasonable incentives


such a system impairs the possibility of reaching a socially optimal outcome in its operation. Application-specific schemes could enhance the way these distributed systems work.

The first part of the thesis targets a relevant field of research: there is a growing need for seamless, secure, reliable and easily accessible online backup, as everyday electronic devices are more and more integrated into the Internet and increasing transmission rates make it possible to move large amounts of data. Since users exploit one another's resources instead of public or central ones, resource provisioning requires a well-suited incentive scheme.

The second part of the thesis targets radio spectrum allocation, studying auction-based management schemes in a distributed fashion. The main motivation behind the approach we investigate is that the sequence of central auctions used to reallocate public resources should be transformed into a more scalable framework. We let the participants trade the acquired resources among themselves in a distributed design, without the intervention of a central auctioneer.

1.2 Methodology

Game theory offers a tool-set for modeling individual user preferences, strategies, costs and valuations in distributed systems. We employ graph theory and matching theory to analyze the incentive mechanisms that we propose. Furthermore, we perform numerical evaluations with simulations written in MATLAB.

1.2.1 Game theory

Distributed systems consist of autonomous participants and limited public or private resources to be distributed among them. It seems reasonable to assume that every user is selfish, i.e., sensitive only to the quality of the experienced service, regardless of the effects of its actions on the other users. On the other hand, the quality of service a user receives depends on the generosity of other users: each user benefits either from the shared capacity of others or from its share of the public resources. The framework of game theory [56,69,105,122] is therefore particularly well-suited to study this kind of situation as a non-cooperative game played among users. Analytical investigations tackle user behavior, the existence of best-response strategies, Nash equilibria, Pareto-optimal outcomes, incentives, and social welfare.

Since premature system designs offer no direct incentive to offer one's own capacity to others or to fairly share the common wealth, users are motivated to free-ride [53]. If sharing efforts are not rewarded in some way, nobody has an interest in cooperating, and in the extreme case the distributed service cannot exist at all. Therefore it is necessary to tackle the economic aspects of such systems and to properly design incentive mechanisms for the service in question [12,31,52].


core of game theory, which attempts to mathematically capture behavior in strategic situations, in which the success of an individual in making choices depends on the choices of others. Rational users change their behavior to maximize their own benefit when taking part in the system. In reality, there may be other types of individuals, e.g., altruistic or malicious ones, but the predominant majority is assumed to behave strategically.

The combination of universal cooperation leading to optimal overall utility, an individual incentive to defect, and rational behavior provides the essential tension that results in the tragedy of the commons [82] in the absence of properly designed incentive schemes. Designing such schemes requires complete knowledge about the participants; moreover, to attain optimality, each peer should be offered a personalized scheme and, where necessary, participation should be enforced (or ensured through payments). None of these conditions is likely to be achievable in a practical distributed system, where not only the preferences of the individual peers but even their identities might be unknown [12].
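To make the tension between cooperation and defection concrete, the following Python sketch enumerates the pure-strategy Nash equilibria of a two-player contribution game. The payoffs (a benefit `B` to the other player per contribution, a cost `C` to the contributor) are hypothetical values chosen for illustration and are not taken from the models of this dissertation.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical payoffs: contributing costs the contributor C and gives the
# other player a benefit B, with B > C > 0.
B, C = 3.0, 1.0
ACTIONS = ("contribute", "free-ride")


def payoff(me: str, other: str) -> float:
    """Utility of a player given its own action and the opponent's action."""
    gain = B if other == "contribute" else 0.0
    cost = C if me == "contribute" else 0.0
    return gain - cost


def pure_nash_equilibria() -> List[Tuple[str, str]]:
    """Action profiles in which no player gains by deviating unilaterally."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        best1 = all(payoff(a1, a2) >= payoff(d, a2) for d in ACTIONS)
        best2 = all(payoff(a2, a1) >= payoff(d, a1) for d in ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


def social_welfare(a1: str, a2: str) -> float:
    """Sum of both players' utilities."""
    return payoff(a1, a2) + payoff(a2, a1)


print(pure_nash_equilibria())                      # [('free-ride', 'free-ride')]
print(social_welfare("contribute", "contribute"))  # 4.0
print(social_welfare("free-ride", "free-ride"))    # 0.0
```

With `B > C > 0` the game is a prisoner's dilemma: mutual free-riding is the only equilibrium, even though mutual contribution yields higher social welfare. An incentive scheme has to close exactly this gap.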

As far as economic efficiency is concerned, besides the lack of information about the identities and preferences of the individual participants, which is required to compute the optimal allocation of resources and costs in a distributed system, an additional challenge is the complicated economic modeling of individuals. In several cases, the existence of externalities makes a free-market approach using market-defined prices inefficient [12]. Also, various technical constraints limit the design options of many distributed systems; e.g., currency-based incentive schemes are hard to implement in current file sharing P2P networks. In addition to the challenging economic resource allocation problem, a system designer usually has to deal with the inability to rely on trusted software or on central entities that can monitor and account for peer transactions to ensure that peers contribute and consume the amount of resources dictated by an underlying incentive model.

1.2.2 Incentive mechanism design

Incentive schemes must be deployed to align selfish participant behavior with the goals of the system design; e.g., in P2P systems it is essential that peers comply with the protocol specification [64,82,102,148]. However, modeling the economic transactions carried out in a distributed system is, in general, a very complex task. The main reason is that participants should contribute different types of resources (money, power, bandwidth, storage, CPU cycles, content, etc.) with various characteristics, whose provision generates complex costs that take into account the time spent using the system and, in some cases, extreme components such as legal risks [12,13]. Early works [54,64,139] attempted to model the utilities and costs associated with participation in a P2P file sharing system using game theoretic analysis: they analyze the free-riding problem and the equilibrium of user strategies under several micro-payment mechanisms,


Monetary incentive mechanisms are widely studied in every area of distributed systems. Currency has a well-defined, uniform valuation for every participant and provides flexibility in both the timing and the amount of counter-contribution for any given resource. In P2P systems the predominant payment schemes are micro-payment solutions with a clearance infrastructure or credit-based systems. When a publicly available limited resource must be allocated among a set of individuals, auctions seem to be the most suitable framework. Extensive research targets a whole range of such systems [79,96].
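As an illustration of the auction-based approach, the sketch below implements a single-item sealed-bid second-price (Vickrey) auction, in which bidding one's true valuation is a weakly dominant strategy. The function name and the optional reserve price are illustrative assumptions; this is a textbook mechanism, not the allocation and pricing framework proposed in Part II.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(bids: Dict[str, float],
                         reserve: float = 0.0) -> Optional[Tuple[str, float]]:
    """Single-item sealed-bid second-price (Vickrey) auction.

    The highest bidder wins and pays the highest competing bid, or the
    reserve price if no other bid reaches the reserve. Returns None if no
    bid reaches the reserve. Ties go to the bid inserted first.
    """
    valid = {name: bid for name, bid in bids.items() if bid >= reserve}
    if not valid:
        return None
    # sorted() is stable even with reverse=True, so equal bids keep insertion order
    ranked = sorted(valid.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, price


print(second_price_auction({"A": 10.0, "B": 7.0, "C": 4.0}))               # ('A', 7.0)
print(second_price_auction({"A": 10.0, "B": 7.0, "C": 4.0}, reserve=8.0))  # ('A', 8.0)
```

The winner's payment does not depend on its own bid, so misreporting a valuation can only change whether a bidder wins, never lower the price it pays. This is why truthful bidding is weakly dominant.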

When a given distributed system excludes the possibility of applying monetary tools to provide incentives for the participants, barter-based solutions arise. Collaboration among peers is then motivated by resource exchanges, tit-for-tat strategies [11,29,101], reputation systems [65,82,104], or penalty policies [102]. However, because these models exclude monetary means, determining the objective value of the resources subject to barter becomes an additional issue.
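A minimal sketch of tit-for-tat in a repeated resource exchange between two peers follows. The per-round payoffs are hypothetical (sharing costs 1 and gives the partner 3), and the strategy functions are illustrative. They are not taken from any of the systems cited above.

```python
from typing import Callable, Dict, List, Tuple

# A strategy maps the opponent's past moves to the next move:
# "C" = share resources, "D" = free-ride.
Strategy = Callable[[List[str]], str]

# Hypothetical per-round payoffs (own move, opponent move) -> own utility:
# sharing costs 1 and gives the partner a benefit of 3.
PAYOFF: Dict[Tuple[str, str], int] = {
    ("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0,
}


def tit_for_tat(opponent_history: List[str]) -> str:
    """Share in the first round, then repeat the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history: List[str]) -> str:
    """Never share, whatever the opponent does."""
    return "D"


def play(s1: Strategy, s2: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Total utilities of two peers exchanging resources for `rounds` rounds."""
    h1: List[str] = []
    h2: List[str] = []
    score1 = score2 = 0
    for _ in range(rounds):
        m1, m2 = s1(h2), s2(h1)
        score1 += PAYOFF[(m1, m2)]
        score2 += PAYOFF[(m2, m1)]
        h1.append(m1)
        h2.append(m2)
    return score1, score2


print(play(tit_for_tat, tit_for_tat))    # (20, 20)
print(play(tit_for_tat, always_defect))  # (-1, 3)
```

Against a tit-for-tat partner, a permanent free-rider gains only in the first round: it collects 3 over ten rounds instead of the 20 that mutual sharing would yield. Reciprocity alone can therefore sustain cooperation without money.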

Distributed Algorithmic Mechanism Design (DAMD) aims to create incentives for Internet applications. In an ideal system design, selfish nodes would maximize the common welfare [12]. If no centralized authority with total knowledge can make decisions about the system resources, building an incentive scheme becomes a DAMD problem, combining computer science with the incentive-compatible mechanism design of economics. DAMD provides a useful framework to enforce the proper provision of resources in a distributed system [51].

1.2.3 Matching Theory

Matching theory [58,72,73], a field of combinatorial optimization, provides useful tools to analyze, among several other possible targets, peer selection in P2P file sharing systems [57,88].
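The classic matching-theoretic tool is the deferred acceptance algorithm of Gale and Shapley, which always finds a stable matching in a two-sided (bipartite) market. The sketch below assumes complete, strict preference lists and equally sized sides, and the peer names are hypothetical. The peer selection problem studied in Part I is a non-bipartite stable fixtures problem, to which this algorithm does not apply directly. Non-bipartite variants such as the stable roommates problem require other methods, e.g., Irving's algorithm.

```python
from typing import Dict, List


def deferred_acceptance(proposer_prefs: Dict[str, List[str]],
                        receiver_prefs: Dict[str, List[str]]) -> Dict[str, str]:
    """Gale-Shapley deferred acceptance (proposer-optimal stable matching).

    Assumes complete, strict preference lists and equally sized sides.
    Returns the matching as {receiver: proposer}.
    """
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}
    free = list(proposer_prefs)
    engaged: Dict[str, str] = {}
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(r)
        if current is None:
            engaged[r] = p              # r was unmatched: accept tentatively
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p              # r trades up; its old partner is free again
            free.append(current)
        else:
            free.append(p)              # r rejects p; p will propose to its next choice
    return engaged


owners = {"A": ["X", "Y"], "B": ["X", "Y"]}
hosts = {"X": ["B", "A"], "Y": ["A", "B"]}
print(deferred_acceptance(owners, hosts))  # {'X': 'B', 'Y': 'A'}
```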

1.3 Outline of the dissertation

The first part of the dissertation focuses on P2P systems for backup. Users participate in the system operation by offering private resources (such as storage, online time and bandwidth) for the benefit of the community, with the ultimate goal of improving application performance.

We study important design options of data redundancy, data maintenance and data transfer over the network in Chapter 4. We suggest a new procedure for determining data redundancy, and we evaluate its performance against currently known techniques in Chapter 7. For this comparison, we define novel metrics that describe the quality of service, e.g., the duration of the archiving and recovery processes and the probability of backup loss. Redundancy is determined based on the time required to retrieve the backed-up data; this guarantees high service quality levels while significantly reducing the applied redundancy, thus storage and bandwidth


We show the settings that provide high quality of service for backup purposes and, at the same time, require the least possible shared resources. However, a system with a low number of users might not be able to guarantee the appropriate quality of service based only on the shared user resources. Therefore, we examine the effects of introducing a central storage server to avoid such situations, and we show the cost implications of persistent quality guarantees. In such a hybrid system, the central, highly available server might be used to store data in exchange for the reimbursement of its costs. Furthermore, it can also be used to restore data from the remaining copies when backups are lost on peers that leave the system. In Chapter 7 we conclude that the relatively low costs incurred by such a hybrid system significantly improve the quality of service.

Since the extent of user contributions affects the system, an incentive scheme is required to maintain the service: participating users must share disk space, bandwidth and online time while connected to the Internet. In Chapter 5 we present a symmetric peer selection scheme in which users have the ability to selfishly select the remote peers they want to exchange data with. Peer characteristics (e.g., online availability and dedicated bandwidth) play an important role and are reflected in the model through a single parameter, termed grade. We show that selecting remote peers selfishly, based on their profiles, creates incentives for users to improve their contribution to the system.
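When every peer ranks the others by the same grade, selfish partner selection leads peers to pair with others of similar grade. The following sketch builds such a partner set greedily, from the highest grade down. It rests on hypothetical assumptions: grades are distinct, each peer accepts at most `capacity` partners, and partnerships are mutual. With globally consistent preferences this greedy construction leaves no pair of peers that would both rather be partners with each other. It only illustrates the stratification effect and is not the model of Chapter 5.

```python
from typing import Dict, Set


def grade_based_partners(grades: Dict[str, float], capacity: int) -> Dict[str, Set[str]]:
    """Greedy mutual partner selection when all peers rank others by grade.

    Peers are processed from the highest grade down; each one links with the
    best-graded lower peers that still have a free slot, until it holds
    `capacity` partners. While a peer still has a free slot, every
    higher-graded peer is either already its partner or already full.
    """
    order = sorted(grades, key=lambda p: grades[p], reverse=True)
    partners: Dict[str, Set[str]] = {p: set() for p in order}
    for i, p in enumerate(order):
        for q in order[i + 1:]:
            if len(partners[p]) >= capacity:
                break
            if len(partners[q]) < capacity:
                partners[p].add(q)
                partners[q].add(p)
    return partners


grades = {"A": 0.9, "B": 0.8, "C": 0.6, "D": 0.5, "E": 0.3, "F": 0.2}
for peer, linked in grade_based_partners(grades, capacity=2).items():
    print(peer, sorted(linked))
# A ['B', 'C']
# B ['A', 'C']
# C ['A', 'B']
# D ['E', 'F']
# E ['D', 'F']
# F ['D', 'E']
```

With six peers and two partner slots each, the three best-graded peers end up partnered exclusively among themselves. This illustrates why a higher grade, obtained by contributing more, is worth having.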

In Chapter 6, we present an efficient algorithm to compute the optimal data transfer schedule for hypothetical cases where future peer uptimes are known. We use the optimal results to evaluate the applied scheduling policy, and we propose practical settings in which the performance of random decisions is close to optimal.
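If future uptimes were known, the amount of data a peer can upload within a time horizon could be computed as a maximum-flow value. The sketch below shows one possible formulation, assumed here for illustration and not necessarily the one used in Chapter 6:

- the source feeds each time slot in which the data owner is online, with the owner's upload capacity;
- each slot feeds every remote peer online in that slot, with that peer's download capacity;
- each remote peer feeds the sink, with its storage quota.

Edmonds-Karp computes the maximum flow. The input values and the function `schedulable_data` are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, Dict[Hashable, float]]


def max_flow(cap: Graph, source: Hashable, sink: Hashable) -> float:
    """Edmonds-Karp: augment along shortest residual paths found by BFS."""
    residual: Graph = {}
    for u, edges in cap.items():
        for v, c in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)  # reverse edge used to cancel flow
    flow = 0
    while True:
        parent: Dict[Hashable, Hashable] = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Walk back from the sink to collect the path, then push the bottleneck.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck


def schedulable_data(owner_online: List[bool], owner_up: float,
                     peers: Dict[str, Tuple[List[bool], float, float]]) -> float:
    """Max data the owner can place on peers within the horizon, given known uptimes.

    peers maps a name to (per-slot online flags, download per slot, storage quota).
    """
    cap: Graph = {"src": {}}
    for t, on in enumerate(owner_online):
        if on:
            cap["src"][("slot", t)] = owner_up
    for name, (online, down, quota) in peers.items():
        for t, on in enumerate(online):
            if on and owner_online[t]:
                cap.setdefault(("slot", t), {})[("peer", name)] = down
        cap.setdefault(("peer", name), {})["dst"] = quota
    return max_flow(cap, "src", "dst")


peers = {"p1": ([True, False, False, True], 1, 2),
         "p2": ([False, True, True, False], 3, 4)}
print(schedulable_data([True, True, False, True], 2, peers))  # 4
```

In the example the owner could upload 6 units over its three online slots. The limited overlap with its partners' online periods caps the schedulable transfer at 4 units.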

The second part of the dissertation investigates the possibility of allocating radio spectrum among multiple applicants dynamically and in a distributed manner. If frequency leasers are coordinated by a well-suited scheme, the distributed dynamic spectrum allocation framework provides efficient methods for allocating the scarce underlying resources under general interference conditions.

In Chapter 11, we present a model and the related framework of allocation and pricing, which offers a distributed mechanism design adapted to practical deployment issues. Our model handles interference effects without any restrictive assumptions, and our framework is scalable and incentive-compatible. We provide both an analytical and a numerical evaluation (in Chapter 12) of the proposed framework, and in both cases we show it to be a suitable approach to efficient and flexible spectrum utilization.
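Admitting non-interfering bidders to a frequency slot is a maximum-weight independent set problem on the interference graph, which is NP-hard in general; this is one reason why heuristics are needed. The sketch below assumes a symmetric conflict relation (the interference model of Chapter 11 is more general). It compares a greedy highest-bid-first heuristic with a brute-force optimum on a toy instance. Names and bid values are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def greedy_slot_allocation(bids: Dict[str, float],
                           conflicts: Set[Tuple[str, str]]) -> List[str]:
    """Admit bidders in decreasing bid order, skipping any that conflict
    with a bidder already admitted to the frequency slot."""
    neighbours: Dict[str, Set[str]] = {b: set() for b in bids}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)
    admitted: List[str] = []
    for bidder in sorted(bids, key=lambda b: bids[b], reverse=True):
        if not neighbours[bidder] & set(admitted):
            admitted.append(bidder)
    return admitted


def optimal_slot_allocation(bids: Dict[str, float],
                            conflicts: Set[Tuple[str, str]]) -> Tuple[List[str], float]:
    """Exhaustive search for the conflict-free set with maximum total bid
    (exponential time: only usable on toy instances)."""
    pairs: Set[FrozenSet[str]] = {frozenset(c) for c in conflicts}
    best: List[str] = []
    best_value = 0.0
    names = list(bids)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            if any(frozenset(p) in pairs for p in combinations(combo, 2)):
                continue
            value = sum(bids[n] for n in combo)
            if value > best_value:
                best, best_value = list(combo), value
    return best, best_value


bids = {"A": 10.0, "B": 8.0, "C": 7.0, "D": 2.0}
conflicts = {("A", "B"), ("A", "C")}
print(greedy_slot_allocation(bids, conflicts))   # ['A', 'D']
print(optimal_slot_allocation(bids, conflicts))  # (['B', 'C', 'D'], 17.0)
```

Here the greedy rule admits A and D (total bid 12), while the optimum admits B, C and D (total 17). A locally attractive choice can thus exclude several mutually compatible bidders.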


Peer-to-Peer Backup System


Introduction

Nowadays, accumulated information results in a large and continuously increasing amount of digital data, a significant part of which is personal and cannot easily be reproduced; its safe storage is therefore essential. Numerous methods and applications are available to produce backup copies, but the difficulty and high costs of the archiving process prevent most users from keeping their data safe.

We study online backup services that offer safe storage of personal data for users with an Internet connection. We suggest an innovative and viable approach to realize this service via a P2P architecture that exploits user resources (storage, online time, bandwidth) with the goal of decreasing the usual cost of backing up data online. Simulations show that such a system has performance and reliability similar to those of a costly central storage server.

The reliability of a backup service can be established to different extents. If the system provides data availability, the data is accessible online whenever a request is issued. This implies that any arbitrary part of the data can be fetched promptly by the data owner, which guarantees low access latency. On the other hand, if data durability is ensured, the data is safely stored in the system as long as the service is required, but it is not necessarily always accessible, at least not all parts of it. A reliable system recovers the stored data despite failures, without violating the level of data accessibility it guarantees.

2.1 Online data storage services

The advent of cloud computing as a new paradigm enabling service providers to deploy cost-effective solutions has favored the development of a range of new services, including online storage applications. Due to the economy of scale of cloud-based storage services, the costs incurred by end-users to hand over their data to a remote storage location on the Internet have decreased significantly. Furthermore, backing up user data online is usually performed automatically by a client application: in contrast to on-site backup solutions, user interaction is minimal, and when data is lost due to an accident, restoring the original data is a seamless operation.

Commercial online data storage services [2,3,6,7,147] can be broken down into those based solely on capital-intensive and energy-consuming [5,84] server farms [2,3] and those embracing the P2P paradigm [6,7,147].

Amazon S3 [2] is based on large parks of commodity hardware (i.e., a data center) running a custom-built distributed data structure discussed in [37]. Data availability and durability do not come for free: end users are compelled to pay for the amount of space they occupy in the data center and for the amount of traffic their contents generate. Furthermore, cloud-based approaches can be subject to failures, as reported in [1]. Many companies base their online storage services (e.g., Dropbox [3]) on the Amazon cloud by building a user-friendly interface to it.

The centralized component that provides storage for users of Wuala [7] is complemented by the storage capacity of all available peers taking part in the application. Generally, data is placed both on servers and on a P2P network with a predetermined amount of redundancy. Servers coordinate the data placement by selecting storage nodes: users must offer an amount of local space inversely proportional to their average online availability, as well as a minimal dedicated bandwidth capacity [65]. The same servers constantly check that these constraints are satisfied.

The long-term storage cost of online services, which dominates in the context of backup applications, may easily exceed that of traditional offline solutions. Additionally, while data availability is a key feature that large-scale data-center deployments guarantee, data durability is questionable [147], as reported in [140]. A P2P storage system is a viable alternative to cloud-based solutions: users' commodity storage devices are shared (together with some bandwidth resources) with a number of remote users to form a distributed storage system that is resilient to local failures.

2.2 Backup versus storage in a P2P system

General-purpose online storage systems optimize the latency of individual file accesses, since users upload their data to the system as a replacement for a local hard drive. Maintaining high data availability in a P2P storage system puts a heavy burden on the users due to the intermittent online presence of storing peers. The online behavior of users is unpredictable and, at large scale, crashes and failures are the norm rather than the exception. As a consequence, a P2P storage application stores large amounts of redundant data to cope with such unfavorable events, i.e., storage space is sacrificed for low access latency.

Despite the enormous amount of work carried out in the research of P2P storage during the […] this is that a favorable trade-off between the performance and the resource requirements of such a service is hard to find. As Blake et al. argue in [20], it is prohibitive to have enough redundancy to keep all data available at all times, although without this, the performance is questionable. The high demand for peer resources might not be satisfiable by the contributions of the participant peers, even if well-suited incentives are in place to encourage them to share.

On the other hand, an online backup service, as a particular case of online storage, involves the bulk transfer of potentially large quantities of data, both during regular data backups and especially in case of retrievals. As a consequence, short backup and retrieval times are more important goals to achieve than low access latency. Furthermore, as the backup service assumes a local copy of the data at the user, retrieving the backup is required only when the locally stored copy is lost. During these rather rare events, access latency to specific parts of the backup is simply not an issue, since the bulk traffic needed to restore the whole of it would take hours anyway.

Rather than aiming for another general-purpose P2P storage system, we therefore focus specifically on an application that is simpler to implement and yet very useful: backing up the important data of users. Given the above considerations, our backup system design optimizes backup and retrieval times while guaranteeing data durability, i.e., ensuring that the loss of a backup remains an unlikely event.

We show in the thesis that providing data durability is less difficult to achieve, in terms of required peer resources, than providing data availability. We present a drastic reduction of data redundancy that still suffices to ensure the above quality of service guarantees. Furthermore, we demonstrate the possibility of avoiding separate design layers that provide user incentives: our application fosters cooperation among peers by design. Moreover, we propose a hybrid design with a data center that raises the system performance to the level of a costly centralized service, even if our optimizations and incentives fail to ensure the necessary peer resources. Striving to relieve users from payments, our data transfer scheduling policies keep the bandwidth and storage costs of the data center to a minimum.

2.3 Focus of the work

The thesis presents our work on online data backup systems in which users store copies of their files for the long term. We propose a system design that involves user edge devices, which confederate by pooling their local resources in a P2P network, and a central storage facility, i.e., a data center. We define a set of performance metrics that describe important quality of service aspects (e.g., data durability, backup and retrieval times). We analyze different design choices for data redundancy, data placement and data transfer scheduling by evaluating the system performance.

We review related works in Chapter 3, organized according to the structure of the thesis. We […] and/or orthogonal to our contributions, but important for a complete system design.

In Chapter 4, we overview our system design and discuss its key components. We describe our application scenario in detail, and we show why the assumptions underlying a backup application can simplify many problems addressed in the literature.

In order to maintain the durability of backed-up data, the degree of redundancy, i.e., the amount of additional data in the P2P system required for a backup operation to be considered complete and safe, must be chosen wisely. In Section 4.2 we present a novel redundancy policy that, rather than focusing on short-term data availability, targets short data retrieval times. As such, our method alleviates the storage burden of large amounts of redundant data on client machines.

In order to enhance the performance of the system, we propose to employ a data center to complement the resources offered by peers when these are not sufficient. For example, detecting a faulty external hard drive may not be immediate, and obtaining a new piece of equipment after a crash may take some time; in these cases an assisted approach to repairing data redundancy on peers, which involves a cloud-based storage service, can significantly reduce the probability of data loss at an affordable cost. Similarly, we show that an assisted backup process mitigates the negative effects of low peer quality, decreasing the data loss probability along with the time to back up. The suggested assisted policies offload the data center by transferring backup data to peers as soon as possible, thereby minimizing data center costs (i.e., storage and bandwidth burden) while sustaining the quality of the backup service.

In Chapter 5, we focus on the peer selection process, during which peers choose where to place the fragments of backup data they need to store. A global ranking is built among the peers in terms of the quality of the storage space they offer to the P2P system, representing their online availability and bandwidth. We model the process as a game in which users selfishly optimize the utility and cost they bear for joining the system by adjusting their shared storage and its quality. We present the objective function that drives the behavior of peers in the game, and we study the algorithmic perspective of the uncoordinated peer selection process.

We show that the game reaches an equilibrium in which the system is stratified: peers with similar characteristics cooperate by building bilateral links that are used to exchange and store data. As peers improve their contributions, the service they receive from one another becomes less costly; thus, system stratification acts as a natural incentive for peers to improve the quality of the resources they offer to other peers.

We further discuss the scheduling policies, i.e., deciding how to allocate data transfers between peers. In Chapter 6 we formalize the problem of exchanging multiple pieces of data with intermittently available peers during upload and retrieval operations with full knowledge of future peer uptime, and we show that it can be solved in polynomial time by reducing it […] prove that a randomized approach to scheduling completes transfers nearly optimally in terms of duration as long as the system is sufficiently large.

In Chapter 7 we corroborate our theoretical findings with simulations driven by real availability traces from an instant messaging application, in which the online presence of users shows daily and weekly patterns. Extensive numerical simulations, with a distributed algorithm for the peer selection process, show that our techniques are effective in such a realistic scenario with heterogeneous peer availabilities and bandwidths. With the simulation of a complete P2P backup system we show that our proposed design is viable in practical scenarios, and we illustrate its benefits in terms of increased performance compared to other system designs.

Our contribution, discussed in detail in Chapter 8, is threefold:

• we present a P2P backup system design in which data redundancy and its maintenance are adapted to the requirements of a backup service, driven by our novel performance metrics, resulting in a significant decrease in resource usage compared to general-purpose storage;
• through user-driven peer selection, we introduce embedded incentives for peers to share their local resources, and we show the stable configuration of such a system;
• we show the time and bandwidth constraints of data transfers between peers in a P2P system, and we suggest mitigating those inefficiencies at a low cost by integrating a data center storage service into the system.


Related work

Online data storage solutions, allowing users to store and access their data from any point on the Internet, are commercially popular products. These applications have received much attention from the research community, which has studied their many facets. In particular, research on distributed P2P storage applications has proliferated in the literature. The complete design of such systems requires considering several problems. Here, we focus on those works that are closely related to our contributions, but we also give an overview of topics that are not addressed in the thesis.

The challenge in the design of P2P data storage systems is that a reliable service must be built on many unreliable peers. The emphasis is on the high number of participants, because the underlying concept exploits the diversity of peer characteristics. Indeed, the data stored in the system remains available online if it is scattered on so many peers that, even if most of them are temporarily disconnected, there are enough peers connected to the Internet to serve any data read request. Similarly, the durability of storage is ensured if permanent loss, deletion or corruption of data on peers can be corrected by maintaining the spread data with different repair techniques. The availability and durability of stored data are achievable only if the events that make data parts unavailable or lost are not positively correlated. In this case a system containing numerous peers to store data parts can provide both features by the law of large numbers.

In the following sections we revisit the collection [44] of major ideas that tackle the amount of data redundancy and its maintenance, the placement of data parts on peers, and the related data indexing, security and incentive issues.

3.1 Data redundancy in P2P storage

During the last decade, research focused on the design of general-purpose P2P storage that provides the features of traditional file systems. Therefore, a significant amount of work has been devoted to implementing systems with low latency, the most difficult task when building distributed storage. Numerous solutions emerged, all proposing to add some redundancy to the stored data, but with different methods and extents depending on the design perspectives. We give a brief overview of existing redundancy schemes, we summarize the techniques that are applied to maintain the redundancy, and then we discuss the differences between storage and backup systems.

3.1.1 Replication

The simplest redundancy scheme stores copies of data on different peers. If $r$ replicas of the same file are placed on $r$ different peers, the file is available even if $r-1$ of them are offline. Its simplicity made this scheme widespread in the first P2P storage system designs, e.g., PAST [43] adopts the replication of files, while CFS [35] divides files into blocks and performs replication at the block level.

As described in the following, the redundancy scheme is intertwined with, among other aspects, its repair technique. When a copy is lost, the applicable repair consists of simply copying an available replica to another peer, chosen according to the data placement policy.

The weakness of replication is that the amount of redundant data is significantly larger than in more sophisticated schemes, which imposes a high storage burden on peers to meet the same data availability and durability guarantees.

3.1.2 Erasure coding

The more sophisticated redundancy schemes apply erasure coding to the data to store. The basic description of the scheme is as follows: data is organized into backup objects, each of which is cut into $k$ equal-size original fragments, which are then transformed into an arbitrary number $n > k$ of encoded fragments, again of the same size, to be stored on $n$ remote peers. Any $k$ of these encoded fragments are sufficient to reconstruct the $k$ original fragments, and thus the backup object. Therefore the redundancy scheme tolerates the erasure of $n-k$ encoded fragments stored on peers.
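To make the "any $k$ of $n$" property concrete, the following is a minimal sketch of a systematic Reed-Solomon-style code. It assumes arithmetic in the prime field GF(257) rather than the GF(2^8) arithmetic used by production codecs, and the functions `encode` and `decode` are hypothetical names. Each stripe of $k$ original symbols defines a polynomial of degree below $k$; encoded fragment $x$ holds the value of that polynomial at point $x$, and Lagrange interpolation from any $k$ fragments recovers the originals.

```python
P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)


def lagrange_eval(points, x):
    """Value at x of the unique polynomial of degree < len(points) through `points`, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # pow(den, P-2, P) is 1/den
    return total


def encode(data: bytes, k: int, n: int) -> dict:
    """Cut `data` into k original fragments and return n encoded fragments keyed by x = 1..n.

    Fragment x holds, for every stripe position s, the value at x of the polynomial
    through the k original symbols placed at points 1..k, so fragments 1..k equal the
    originals (systematic code). Symbols are ints because a parity value may be 256.
    """
    assert 0 < k < n < P
    size = -(-len(data) // k)  # fragment length, rounded up
    padded = data.ljust(size * k, b"\0")
    originals = [padded[i * size:(i + 1) * size] for i in range(k)]
    return {
        x: [lagrange_eval([(i + 1, originals[i][s]) for i in range(k)], x) for s in range(size)]
        for x in range(1, n + 1)
    }


def decode(available: dict, k: int, length: int) -> bytes:
    """Rebuild the first `length` bytes of the object from any k (or more) encoded fragments."""
    chosen = sorted(available.items())[:k]
    assert len(chosen) == k, "at least k fragments are needed"
    size = len(chosen[0][1])
    out = bytearray()
    for i in range(1, k + 1):  # original fragment i sits at point i
        out.extend(lagrange_eval([(x, frag[s]) for x, frag in chosen], i) for s in range(size))
    return bytes(out[:length])


msg = b"backup object"
fragments = encode(msg, k=3, n=5)
survivors = {x: fragments[x] for x in (2, 4, 5)}  # fragments 1 and 3 were lost
assert decode(survivors, k=3, length=len(msg)) == msg
```

The interpolation costs grow quadratically with $k$ per symbol, which hints at why the computational complexity of Reed-Solomon coding is a concern in practice.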

The benefit of erasure coding is that less redundancy (defined as $r = n/k$) is required, thus less storage space is consumed than with replication when providing the same level of reliability [113,141]. As an illustrative example, let us consider replication with $r = 3$ copies, and an erasure coding scheme with $k = 3$, $n = 5$. Both schemes ensure data durability when only two peers lose their assigned data; however, the data redundancy rates $r$ are different: $r = 3$ with replication and $r = 5/3 < 2$ with erasure coding. Note that both schemes imply that replicas or fragments are placed on distinct peers, in order to maximize the diversity of storage locations.
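The arithmetic of this example can be checked with a few lines of Python. The sketch below only counts the tolerated losses and the redundancy rate $r$ as defined above; the helper names are hypothetical.

```python
from fractions import Fraction


def replication(r: int) -> tuple:
    """r full copies: redundancy rate r, survives the loss of up to r - 1 copies."""
    return Fraction(r), r - 1


def erasure_coding(k: int, n: int) -> tuple:
    """k original -> n encoded fragments, any k suffice: rate n/k, tolerates n - k losses."""
    return Fraction(n, k), n - k


rep_r, rep_tol = replication(3)
ec_r, ec_tol = erasure_coding(3, 5)
print(rep_tol, ec_tol)  # 2 2
print(rep_r, ec_r)      # 3 5/3
```

For a 300 MB backup object, this means storing 900 MB across peers with replication versus 500 MB with the $(k = 3, n = 5)$ code.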

The inconvenience of erasure coding schemes is that repairs are more complex than trans- […] one), the encoding process has to be performed again, for which the original data or $k$ encoded fragments are needed. If neither of these is available at a single location, the repair of a single encoded fragment requires transferring an amount of data equal to the size of the whole backup object across the system.

The most widely used erasure codes are based on linear operations over Galois fields; these solutions are called Reed-Solomon codes [112]. Their main drawback is the computational complexity of the encoding and decoding processes; a remedy is proposed by LT codes [93], which provide nearly linear encoding and decoding times, although the reconstruction of a backup object requires slightly more than $k$ encoded fragments. Furthermore, these codes do not predetermine the parameter $n$, hence the name rateless (or fountain) codes. As an alternative, rateless erasure codes can be constructed from random linear codes, borrowed from the area of network coding [55].

The system designs that apply erasure coding to build data redundancy are numerous [8,17,18,40,67,81,91]. The popularity of the scheme is due to its high storage efficiency since, as mentioned above, erasure codes are able to provide the same level of reliability as replication while consuming much less storage space. The major drawback of erasure coding is that accesses and repairs have higher latency than in the replication scheme. If this becomes more important than the savings in storage space, designers apply replication, e.g., for active data and for metadata [18,81].

3.1.3 Maintaining redundancy

As mentioned earlier, redundant data does not guarantee the availability and durability of data unless it is maintained during its lifetime: fragments that are lost on peers that crashed or abandoned the system must be repaired by building and storing new fragments.

The repair policy determines when a fragment repair must be performed. Most of the existing techniques aim to ensure prompt availability at all times. In these P2P storage systems, either eager [35,43] or lazy [18] repairs are adopted: offline fragments are repaired either immediately or only after some delay, respectively. The first policy is simple but inefficient, since all peer disconnections are considered as losses; the second is the opposite: it efficiently takes into account both transient and permanent disconnections, reintegrating fragments on peers that come back online after a short period to avoid unnecessary repairs, but it also requires more sophisticated decision-making.
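The difference between the two policies can be sketched as a single decision function. This is an illustration under stated assumptions: the `Fragment` record, the `timeout` parameter and its default value are hypothetical and are not taken from the cited systems [18,35,43].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    peer: str
    offline_since: Optional[float] = None  # None while the hosting peer is online


def fragments_to_repair(fragments, now, policy, timeout=3600.0):
    """Return the fragments to rebuild at time `now` (seconds).

    eager: every offline fragment counts as lost and is repaired at once.
    lazy:  a fragment counts as lost only after `timeout` seconds offline, so
           peers that reconnect within the timeout cause no repair traffic.
    """
    if policy == "eager":
        timeout = 0.0
    elif policy != "lazy":
        raise ValueError(f"unknown policy: {policy}")
    return [f for f in fragments
            if f.offline_since is not None and now - f.offline_since >= timeout]


frags = [Fragment("a"), Fragment("b", offline_since=9_000.0), Fragment("c", offline_since=5_000.0)]
print([f.peer for f in fragments_to_repair(frags, 10_000.0, "eager")])  # ['b', 'c']
print([f.peer for f in fragments_to_repair(frags, 10_000.0, "lazy")])   # ['c']
```

Peer `b` has been offline for only 1000 s, so the lazy policy gives it a chance to return before spending bandwidth on a repair.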

The decisions that drive lazy repair techniques must ensure the amount of redundancy that guarantees the required availability and, most importantly, durability of data. Generally, thresholds on the necessary number of fragments are determined after observing statistics of online peers and of supposedly alive but offline peers. System designs apply reactive [18,27] and/or proactive […] higher redundancy than the repair threshold, and either proactive repairs are performed to ensure that the rate always remains above it, or reactive repairs are performed when the rate falls below it. The motivation for the reactive policy is to avoid wasting bandwidth, i.e., to perform repairs only when they seem to be necessary, while the reasoning behind the proactive scheme is to smooth out bandwidth utilization during the repair process, since reactive repairs might come in bursts.
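The contrast between bursty reactive repairs and smoothed proactive repairs can be sketched with a toy round-based model. The fragment counts `TARGET` and `THRESHOLD`, the one-fragment-per-round `budget` and the loss pattern are hypothetical values chosen for illustration, not parameters of [18,27].

```python
TARGET, THRESHOLD = 12, 8  # hypothetical fragment counts for one backup object


def reactive(available: int) -> int:
    """Repair nothing until fewer than THRESHOLD fragments remain, then restore TARGET at once."""
    return TARGET - available if available < THRESHOLD else 0


def proactive(available: int, budget: int = 1) -> int:
    """Rebuild at most `budget` fragments per round, keeping the count close to TARGET."""
    return min(budget, max(0, TARGET - available))


def simulate(losses, policy):
    """Apply per-round fragment losses and return the number of repairs made in each round."""
    available, repairs = TARGET, []
    for lost in losses:
        available -= lost
        done = policy(available)
        available += done
        repairs.append(done)
    return repairs


print(simulate([1] * 8, reactive))   # [0, 0, 0, 0, 5, 0, 0, 0]
print(simulate([1] * 8, proactive))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

The reactive policy transfers five fragments in a single burst, while the proactive one spreads eight single repairs over the same period: it keeps redundancy higher at the price of more total traffic.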

3.1.4 Feasibility of P2P storage

Among many other approaches, the storage system design of TotalRecall [18] adopts a binomial formula to calculate the erasure coding redundancy rate that guarantees low latency through prompt data availability. This rate, like any rate ensuring prompt data availability, requires high storage and bandwidth contributions from peers in typical settings.
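A binomial availability calculation of this kind can be sketched as follows, assuming that each peer is online independently with the same probability $a$. This is not necessarily the exact formula used by TotalRecall [18], and the function names are hypothetical; the required redundancy rate is $r = n/k$ for the smallest $n$ that meets the availability target.

```python
from math import comb


def availability(k: int, n: int, a: float) -> float:
    """P(at least k of n fragments sit on online peers), each peer online independently w.p. a."""
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def min_fragments(k: int, a: float, target: float, n_max: int = 1000):
    """Smallest n >= k whose availability reaches `target`, or None if n_max is not enough.

    n_max stays below ~1030 so that comb(n, i) can still be converted to a float.
    """
    for n in range(k, n_max + 1):
        if availability(k, n, a) >= target:
            return n
    return None


n = min_fragments(k=8, a=0.5, target=0.999)
print(n, n / 8)  # number of encoded fragments and the redundancy rate r = n/k
```

With $k = 1$ (plain replication), $a = 0.5$ and a 99% target, seven copies are needed, since $1 - 0.5^7 \approx 0.992$ while $1 - 0.5^6 \approx 0.984$. The independence assumption is optimistic: correlated uptimes, such as daily online patterns, push the required redundancy higher.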

As an example, in Wuala [7], the use of a data center is inevitable because the overall storage capacity at peers is not sufficient to sustain the expected data availability for every stored file. As Wuala offers low-latency storage services, the system must store data on servers, and the load on peers serves only caching purposes: popular file parts stored on peers can be retrieved from multiple sources for free, hence the advantage of the P2P overlay in terms of traffic [94].

In proportion to the amount of redundant data to store, the repair policy in P2P storage systems requires large communication bandwidth from peers. The authors of [20,21,113] argue that such a system might be unsustainable in common Internet scenarios with high peer failure rates and low peer bandwidth capacities. The reason for this is, again, that the repair of each lost encoded fragment requires the transfer of a larger amount of data.

To remedy this disadvantage of erasure coding, hybrid schemes might use replicas to perform repairs of encoded fragments, avoiding the transfer of the whole backup object. Dimakis et al. [39] discuss the benefits of using network coding to alleviate the costs of data maintenance, as opposed to approaches based on erasure coding. On the other hand, they argue that the complexity of hybrid systems grows, since different maintenance policies must be applied to replicas and to erasure-coded fragments. In fact, redundancy schemes that apply solely erasure coding might become feasible with more sophisticated codes, designed with the need for repairs in mind. When maintenance of redundancy is delegated to peers that do not have a local copy of the backup objects, hierarchical [45] or regenerating [39,46,144] coding schemes can be used to lower the amount of required data transfers.

Although storage systems are more often envisaged in the literature than backup systems, their data availability requirements, which receive particular importance, unfortunately cannot always be fulfilled. On the other hand, P2P backup systems, in which data durability is far more important than availability, can be operated at a much lower price in terms of redundancy and […]

The authors of [89] advocate the use of a timer-based repair policy that separates durability from availability. Moreover, various works [27,107] determine redundancy as a function of the node failure rate in order to guarantee only data durability, at the expense of low data availability. Since these works aim to decrease the amount of redundant data in order to alleviate the storage and traffic burden on peers, their focus shifts to maintaining the lower, and therefore less safe, redundancy rate, i.e., to data repair techniques.

Many works devoted to P2P storage target the almost Herculean task of backing up the entire contents of a hard drive, including operating system files. These works propose convergent encryption, a technique that enables data deduplication by avoiding storing multiple times pieces of data that are common to many users [16,32,83], thus ensuring that storage space is not wasted by saving multiple copies of the same file across the system. In contrast, personal backup usually involves only an important subset of user data, so the amount of overlapping data that could be deduplicated is plausibly very small. Instead, user backups should encode incremental differences between archive versions. Recently, various techniques have been proposed to optimize the computation time and size of these differences [126].
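The mechanism behind convergent encryption can be sketched as follows: the key is derived from the content itself, so identical files yield identical ciphertexts that can be stored only once. The stream cipher below is a toy built from SHA-256 in counter mode, assumed here only because Python's standard library has no block cipher; it is not a vetted construction, and a real system would use a standard cipher such as AES.

```python
import hashlib


def toy_keystream(key: bytes, length: int) -> bytes:
    """TOY stream cipher: SHA-256(key || counter) blocks. Illustration only, not vetted crypto."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def convergent_encrypt(plaintext: bytes):
    """Derive the key from the content, so equal plaintexts give equal ciphertexts."""
    key = hashlib.sha256(plaintext).digest()
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, toy_keystream(key, len(plaintext))))
    storage_id = hashlib.sha256(ciphertext).hexdigest()  # name under which the data is stored
    return key, ciphertext, storage_id


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return bytes(c ^ s for c, s in zip(ciphertext, toy_keystream(key, len(ciphertext))))


store = {}
for user_file in [b"shared OS library", b"shared OS library", b"private thesis draft"]:
    key, ciphertext, storage_id = convergent_encrypt(user_file)
    store.setdefault(storage_id, ciphertext)  # the second copy of the shared file is deduplicated
print(len(store))  # 2
```

Each owner keeps the content-derived key, while the storage side sees only ciphertexts; the saving appears only when users back up byte-identical files, which, as argued above, is rare for personal backups.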

3.2 Data placement

The P2P storage system is built from the shared resources of participant peers. Their storage space contributions can be organized into an architecture in many different ways, and the structure (or the lack of it) plays a very important role in the system operation, from repair policies to peer incentives. In this section we give an overview of these aspects.

3.2.1 Central or distributed mapping

The first and simplest solution to organize the data stored in the system is a centrally coordinated approach. The notion of a tracker, a server that registers and monitors peers and furthermore dictates and organizes data placements, was first adopted in P2P file sharing systems [4,29], where the tracker holds a database of the exchanged files, their locations, and the status of peers.

The simplicity of tracker-based design is overshadowed by issues of scalability, robustness and security: even if the server does not store data replicas or fragments [91], it is a single point of failure, and any overload due to intensive queries or malicious attacks can render it dysfunctional. Since the tracker holds no data, a short-term failure causes no data loss; however, since it coordinates data placement and repairs, a long-term failure significantly affects the performance of the system.

In F2F [90] and FriendStore [137], peers store their data at their friends to improve data durability. The authors argue that choosing storage partners based on existing social relationships reduces the cost of maintaining redundancy. The price of this approach is decreased flexibility and backup capacity.

One step toward a distributed placement algorithm is presented in [8]: peers are randomly assigned to groups that manage different partitions of the overall storage. Every peer that belongs to the same group has a consistent view of the attributed partition. This architecture requires trusted peers; therefore it targets corporate networks instead of a common P2P scenario.

The appearance of Distributed Hash Tables (DHTs) [111,116,123,150] made a completely distributed system design for P2P storage possible. DHTs provide a consistent mapping between keys and values [77] in a completely decentralized fashion: e.g., in Chord [123] each peer and each stored object is identified by a key and is logically positioned on a ring composed of the possible key values; each peer is then responsible for storing the objects whose keys fall between its own key and the key of the preceding peer.
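This Chord-style assignment can be sketched with a sorted list of peer identifiers: an object belongs to the first peer whose identifier is greater than or equal to the object key, wrapping around the ring. The sketch omits finger tables, routing and churn; the 16-bit identifier space and the `Ring` class are illustrative assumptions.

```python
import bisect
import hashlib

M = 16  # bits of the identifier space (illustrative; Chord hashes with SHA-1)


def key_of(name: str) -> int:
    """Hash a peer address or an object name onto the ring [0, 2**M)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (1 << M)


class Ring:
    """Static Chord-like ring: no finger tables, routing or churn handling."""

    def __init__(self, peers):
        self.by_id = {key_of(p): p for p in peers}  # a hash collision would drop a peer
        self.ids = sorted(self.by_id)

    def responsible(self, obj: str) -> str:
        """Successor of the object key: first peer id >= key, wrapping past 2**M - 1."""
        i = bisect.bisect_left(self.ids, key_of(obj))
        return self.by_id[self.ids[i % len(self.ids)]]


ring = Ring([f"peer{i}" for i in range(8)])
print(ring.responsible("holiday-photos.tar"))
```

Because each peer owns only the arc ending at its identifier, a peer joining or leaving moves only the keys of that arc to or from its successor, which is what makes the mapping consistent.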

Various DHTs focus on different performance guarantees and propose design choices accordingly, although the basic mapping functionalities are the same: every peer creates and maintains (as peers connect and disconnect) links to some remote peers in the DHT in order to constitute a structured overlay for fast lookups through simple routing protocols. Furthermore, the peer that is responsible for a storage object replicates it on some remote peers with hash keys close to its own, which makes replication the intuitive choice in DHT-based P2P storage [35,43].

In Pastiche [32], storage peers are selected based on the similarity of their backups: peers save storage space by deduplicating common backup data.

3.2.2 Peer monitoring

The repair policy, in almost every system design, requires knowledge about the status of peers. In order to determine whether a peer is online, or temporarily or permanently offline, an active or passive monitoring system must be in place. This connectivity information is used to identify the data parts that are in danger and need repairs.

If a tracker organizes the data placement [91], it can also perform monitoring operations easily. In an unstructured distributed P2P system, decentralized monitoring might be probabilistic and the results can be aggregated via a gossip-based protocol [60], which floods the system with peer monitoring information.

In structured overlay networks, distributed monitoring relations among peers may follow the overlay structure. DHTs offer an intuitive policy: each peer monitors a set of its neighbors, which also store replicas of the data assigned to the peer; therefore the issues of data placement, peer monitoring and the repair policy that reacts to the monitoring results are solved at once [35,43,67,149].

Correlated online behavior of peers, e.g., daily and weekly online patterns of peers in the […] assumption of independent and uncorrelated uptimes, but also makes the measurement of peer availabilities in a distributed manner more difficult.

3.2.3 Fairness

One of the most investigated peculiarities of P2P systems is the need for peer incentives. If well-suited mechanisms do not force peers to contribute at least as many resources as they consume, then selfish peers exploit the shared resources of more altruistic participants without sharing their own. This phenomenon is referred to as free-riding, and the change of identities in order to escape badly designed incentives is called whitewashing [52].

Free-riding clearly leads to the collapse of the P2P service in the long term, affecting P2P storage even more drastically than other types of services. If peers insert more data into the system than the storage space they share, the service becomes unsustainable. Enforcing storage fairness is therefore imperative: either symmetric storage exchanges must be made between peers in a tit-for-tat fashion [32,91], or multilateral symmetry must be established through a transferable currency [33,64,71]. While bilateral barters are simple to enforce and control, a currency-based design imposes fewer constraints on data placement but requires a central authority.
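A bilateral tit-for-tat exchange can be sketched as a per-partner ledger kept by each peer. The class, its method names and the `slack` credit that lets a new partnership start are hypothetical; the cited systems [32,91] may account for storage differently.

```python
from collections import defaultdict


class BarterLedger:
    """One peer's bilateral storage accounts (hypothetical tit-for-tat sketch).

    stored_for[p]: bytes this peer stores on behalf of partner p
    stored_at[p]:  bytes partner p stores on behalf of this peer
    A request from p is accepted only while the exchange with p stays symmetric,
    up to a small initial credit `slack` that lets new partnerships start.
    """

    def __init__(self, slack: int = 0):
        self.slack = slack
        self.stored_for = defaultdict(int)
        self.stored_at = defaultdict(int)

    def accept_request(self, partner: str, size: int) -> bool:
        if self.stored_for[partner] + size <= self.stored_at[partner] + self.slack:
            self.stored_for[partner] += size
            return True
        return False

    def record_stored_at(self, partner: str, size: int) -> None:
        self.stored_at[partner] += size


alice = BarterLedger(slack=1_000)
alice.record_stored_at("bob", 50_000)       # Bob now holds 50 kB of Alice's data
print(alice.accept_request("bob", 40_000))  # True: still within what Bob stores for Alice
print(alice.accept_request("bob", 20_000))  # False: 60 kB would exceed 50 kB + 1 kB credit
```

No third party is involved: each peer enforces symmetry locally, which is why bilateral barters are simple to control, but also why data can only be placed on peers that host data in return.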

A structured approach to data placement, e.g., based on a DHT, renders peer selection implicit, because data is uniformly stored on peers following the consistent hashing concept. Therefore, DHT-based approaches achieve load balancing by spreading data on every peer with uniform probability, irrespective of their characteristics. As a consequence, peer heterogeneity in terms of the amount of resources dedicated to the system has to be taken into account by an additional system layer to elicit cooperation. This is further complicated by an asymmetry: a peer might store data chunks for a remote peer that is not necessarily storing its data.

3.2.4 Incentives

In effect, peer characteristics directly determine the quality of service in the system. For example, low peer availability would require extensive use of redundancy in order to make data accessible despite massive peer disconnections. In general, large differences can be seen among peers in terms of connectivity, i.e., uptime and communication bandwidth, which may motivate non-random data placement policies that take this heterogeneity into account. Many works [42,108,127] improve data availability by carefully selecting the peers that will store replicas or fragments of data. Somewhat as a counter-example, the authors of [62] address peer selection strategies under churn: they present a stochastic model of a P2P system and discuss the positive effects of […]

If data placement strategies put more storage on peers with better characteristics, selfish peers will not strive to show long uptimes and high connectivity, because in return they would receive more and more data to store, saturating their connection links and storage capacity. Therefore, besides enforcing fairness in terms of used and shared storage capacities, i.e., any peer should offer an amount of local storage space equal to the load it imposes on the system, fairness should be extended to other resources as well.

A recurrent problem for P2P storage applications is creating incentives to encourage nodes to contribute more online time and bandwidth resources [33,83,91]. This can be achieved via reputation systems [76] or virtual currency [139]. In general, the idea is to impose fairness spanning every resource type. A commercial example is Wuala, where the amount of local space to be shared for a given amount of online storage is inversely proportional to the peer availability [7]. However, in this scheme highly available peers may have to store data on poorly available peers, which offer most of the cumulative available storage space.

Toka et al. [95] compare two possible mechanisms to manage a P2P storage system: either the service use of each peer is limited to its contribution level (symmetric schemes), or storage space is bought from and sold to peers by a system operator that seeks to maximize profit. Using a non-cooperative game model to take user selfishness into account, the authors study these mechanisms with respect to the social welfare performance measure, and give necessary and sufficient conditions for one scheme to socially outperform the other.

While peer selection in most related works is dictated by well-known data indexing and retrieval techniques (e.g., DHT-based or locality-based), our symmetric data placement scheme (Chapter 5) is based on shared resources. Every user picks remote peers selectively, i.e., peers with high quality are preferred over peers with low quality. The resulting overlay creation process resembles network creation games [49], especially their pairwise version [30].

Similarly to [101], our key observation is that a network creation game with bilateral agreements can be studied with the tools of matching theory [115]. We introduce a generalized stable matching problem that we use as a theoretical basis for peer selection. Studies of peer selection in P2P content sharing applications [57] and in P2P storage [118] have pinpointed that user-driven peer selection leads to system stratification. We show that this result provides fairness and embedded incentives for users to offer the necessary amount of resources.
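With a single global ranking, the stratified outcome can be reproduced by a simple greedy procedure. The sketch below is a simplified stand-in for the generalized stable matching problem of Chapter 5, assuming that all peers agree on the ranking (index 0 is best) and that each accepts the same number `quota` of partners; the function name is hypothetical.

```python
def stable_links(n_peers: int, quota: int) -> dict:
    """Greedy stable b-matching when all peers share one global ranking (0 = best).

    Peers are processed from best to worst; each fills its free slots with the
    best-ranked lower peers that still have free slots. With a common ranking this
    leaves no blocking pair: no two unlinked peers would both rather link to each
    other than keep a free slot or their worst current partner.
    """
    links = {p: set() for p in range(n_peers)}
    for p in range(n_peers):
        for q in range(p + 1, n_peers):
            if len(links[p]) == quota:
                break
            if len(links[q]) < quota:
                links[p].add(q)
                links[q].add(p)
    return links


print(stable_links(6, 2))
# {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
```

Peers link only within their own stratum: the three best peers form one group and the next three another, so improving one's rank is the only way to obtain better partners.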

An additional benefit of user-driven peer selection is that efficient techniques to perform fast lookups of individual backup files are unnecessary: a restore operation can be completed with […]

3.3 Reliability of the system

In contrast to storage services, our goal is to ensure data durability with a P2P backup system design that requires neither user payments nor external incentive mechanisms to provide safe backup in exchange for as few shared resources as possible. Besides the issues and solutions discussed so far, the reliability of the service depends heavily on the speed of data transfers and on security aspects.

3.3.1 Data transfers

Compared to the amount and placement of redundant data, the organization of data transfers has received less attention from the research community. The authors of [19] analyzed random backup scheduling by modeling peer uptime as a Markovian process. Their study reached a conclusion analogous to the one we obtain in Chapter 6: for backups, the completion time of random scheduling converges to the theoretical minimum as the system size grows.

BitTorrent [29] uses fixed-size fragments and adaptive upper limits on parallel transfers in order to avoid unfinished transfers and the appearance of bottlenecks, respectively. Inspired by these design elements of BitTorrent, we applied similar techniques in our P2P backup system.

As mentioned in Chapter 2, we suggest employing a data center as a remedy for the temporary lack of peer resources. Little work has been done on hybrid approaches to mitigate the shortcomings of P2P systems. To the best of our knowledge, no prior work tackles data backup and/or repair operations assisted by a central entity.

AmazingStore [147] improves the data availability of cloud-based storage services and reduces their costs by augmenting centralized clouds with an efficient client-side storage system. Besides storing on the servers, peers back up their data at other peers with different online patterns, to improve data availability and to serve read requests within the P2P network. The hybrid system therefore mitigates the issue of a centralized point of failure and provides resilience to large-scale failures.

FS2You [92] is a peer-assisted system that provides temporary storage and seeding for files in a BitTorrent-like content distribution system with a hybrid structure consisting of peers and servers. FS2You does not guarantee data persistence; moreover, while its goal is to minimize bandwidth costs, we focus instead on minimizing the storage costs, which will be dominant in the long run for a storage system.

3.3.2 Security

While correlation among peer uptimes is a natural phenomenon, peers losing data en masse is somewhat suspicious. The most probable reason for many peers to simultaneously lose the data stored on them is that they all run the same software, e.g., the same operating system, or they had […] be a vulnerability, creating a high risk for the reliability of the system, we see them as a security issue.

Many works [67,74,75,100,142] tackled this issue and proposed placing data on peers that are less likely to be correlated: those who use different software configurations, who are connected through different network service providers, and who are geographically far from each other.

Along with losses due to peer failures, replicas or fragments can be destroyed on peers accidentally or intentionally. In order to detect these events, data integrity checks must be performed periodically. These verifications also ensure that peers are really storing the data assigned to them, hence enforcing fairness in storage.

While common checksums reveal accidental corruption, detecting malicious data deletion requires more sophisticated operations. Two main approaches have been proposed: either peers create self-verifying data blocks, with signatures computed by a collision-resistant cryptographic hash function over the blocks themselves [119, 143], or they send probabilistic challenges, via cryptographic protocols, to storage peers that can be answered only by holding the data block [14, 91, 106]. With the first option, to detect any data modification the peer has to download the block and recompute the signature to perform the comparison.
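Both verification approaches can be sketched in Python with the standard `hashlib` and `hmac` modules. The first pair of functions names a block by its SHA-256 digest, so any downloaded copy can be checked by recomputing the hash. The challenge part is a deliberately simplified stand-in, assumed here for illustration and not the protocols of [14, 91, 106]: before uploading, the owner precomputes a few nonce/answer pairs, keeps only these, and spends one pair per audit. All function names are hypothetical.

```python
import hashlib
import hmac
import os


def block_id(block: bytes) -> str:
    """Self-verifying name: the block is identified by its SHA-256 digest."""
    return hashlib.sha256(block).hexdigest()


def verify_downloaded(block: bytes, expected_id: str) -> bool:
    """First approach: after downloading the block, recompute and compare its hash."""
    return hmac.compare_digest(block_id(block), expected_id)


def prepare_challenges(block: bytes, count: int) -> list[tuple[bytes, bytes]]:
    """Owner side, before upload (simplified scheme): precompute (nonce, answer) pairs."""
    pairs = []
    for _ in range(count):
        nonce = os.urandom(16)  # unpredictable, single-use challenge
        pairs.append((nonce, hashlib.sha256(nonce + block).digest()))
    return pairs


def answer_challenge(stored_block: bytes, nonce: bytes) -> bytes:
    """Storage-peer side: computing this requires the whole block."""
    return hashlib.sha256(nonce + stored_block).digest()


def check_answer(expected: bytes, answer: bytes) -> bool:
    """Owner side: constant-time comparison of the peer's answer."""
    return hmac.compare_digest(expected, answer)
```

Each nonce must be used only once, since a peer that has seen a nonce could cache its answer and then discard the block; in this simplified scheme the owner therefore runs out of audits after `count` challenges, a limitation that more elaborate cryptographic protocols are designed to avoid.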

Since in our backup system we can assume a single data owner authorized to read and write its data, data confidentiality and access control can be achieved with standard cryptographic techniques between the data owner and each peer storing its data. For the same reason, neither consistency guarantees for multiple readers and writers nor anonymity for data publishers and readers are needed.


System design

The goal of a P2P backup system is to store users' data safely on remote peers. Every system participant runs a client application on its Internet-connected device, e.g., a computer or a set-top box, which shares part of its storage capacity. The system design must cope with the unavailability of peers and with the unpredictable amount of resources dedicated to the system.

Many design aspects have to be considered when building our system. This thesis presents innovative elements regarding the data redundancy scheme, peer selection with the related data placement strategies, and the data transfer scheduling policy. In those fields where known techniques provide reasonable solutions, given the purposes and assumptions of a P2P backup system, our work refers to and reuses mechanisms published in the literature, e.g., erasure coding and repairs.

4.1 Data backup and retrieval

The system design that we present contains a central server, called the tracker, which is operated by the backup service provider to carry out various system management operations, such as peer registration and monitoring. We note that these operations can also be implemented in a distributed way using well-known techniques (e.g., DHTs); such a task, however, is outside the scope of this thesis. Therefore, for simplicity, we assume that all of them are carried out by the tracker.
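A minimal in-memory sketch of the tracker's role follows, assuming that peers report their free storage in periodic heartbeats and are considered offline after a timeout. The `Tracker` class, its methods and the `timeout` parameter are hypothetical; they only illustrate registration, monitoring, and the lookup of storage candidates that a peer performs when it starts a backup.

```python
from dataclasses import dataclass


@dataclass
class PeerRecord:
    free_bytes: int   # unallocated storage the peer currently offers
    last_seen: float  # time of the most recent heartbeat


class Tracker:
    """Hypothetical central tracker: registration, monitoring, candidate lookup."""

    def __init__(self, timeout: float = 60.0):
        self.timeout = timeout  # assumed: peers silent for longer are offline
        self.peers: dict[str, PeerRecord] = {}

    def register(self, peer_id: str, free_bytes: int, now: float) -> None:
        """A peer joins and announces itself as a storage candidate."""
        self.peers[peer_id] = PeerRecord(free_bytes, now)

    def heartbeat(self, peer_id: str, free_bytes: int, now: float) -> None:
        """Monitoring: refresh liveness and the advertised free capacity."""
        rec = self.peers[peer_id]
        rec.free_bytes, rec.last_seen = free_bytes, now

    def is_online(self, peer_id: str, now: float) -> bool:
        return now - self.peers[peer_id].last_seen <= self.timeout

    def candidates(self, requester: str, fragment_size: int, now: float) -> list[str]:
        """Online peers, other than the requester, with room for one fragment."""
        return [
            pid
            for pid, rec in self.peers.items()
            if pid != requester
            and rec.free_bytes >= fragment_size
            and self.is_online(pid, now)
        ]
```

Passing the current time explicitly keeps the sketch deterministic; a deployed tracker would read a clock and persist its state.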

When a peer has to save new data, the backup phase begins. During this phase, the data owner:

• selects the data to be backed up (which stays stored locally indefinitely unless the peer's device crashes), divides it into fixed-size fragments, and then creates additional fragments by encoding the original ones with erasure coding (Section 4.2);

• queries the tracker for a set of remote peers that are willing to store a fragment of its data on their devices, i.e., that have sufficient unallocated storage capacity, and meanwhile announces itself at the tracker as a storage candidate (Section 4.3);


$$r = \frac{n}{k}$$
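The first backup step divides the data into k fixed-size fragments and adds redundant fragments by erasure coding, giving n > k fragments in total and a redundancy rate r = n/k. As a self-contained illustration only, the Python sketch below implements a systematic Reed-Solomon-style (k, n) code over the prime field GF(257) by Lagrange interpolation; the field, the function names, and the quadratic-time decoding are assumptions chosen for brevity, not the coding library used in the system.

```python
P = 257  # prime modulus: every byte value 0..255 is an element of GF(P)


def _lagrange_at(points: list[tuple[int, int]], x: int) -> int:
    """Value at x of the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse of den
    return total


def encode(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Split data into k fragments and add n - k parity fragments (systematic code)."""
    assert 0 < k < n < P
    size = max(1, -(-len(data) // k))            # fragment length, rounded up
    padded = data + bytes(size * k - len(data))  # zero-pad the last fragment
    # Original fragments are the polynomial's values at points 1..k.
    frags = {x: list(padded[(x - 1) * size:x * size]) for x in range(1, k + 1)}
    for x in range(k + 1, n + 1):                # parity fragments at points k+1..n
        frags[x] = [
            _lagrange_at([(xi, frags[xi][j]) for xi in range(1, k + 1)], x)
            for j in range(size)
        ]
    return frags


def decode(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original data from any k surviving fragments."""
    if len(available) < k:
        raise ValueError("fewer than k fragments survive; data is lost")
    chosen = sorted(available)[:k]
    size = len(available[chosen[0]])
    out = bytearray()
    for x in range(1, k + 1):                    # re-evaluate at the original points
        for j in range(size):
            out.append(_lagrange_at([(xi, available[xi][j]) for xi in chosen], x))
    return bytes(out[:length])


if __name__ == "__main__":
    frags = encode(b"hello backup", k=3, n=5)
    survivors = {x: frags[x] for x in (2, 4, 5)}  # fragments 1 and 3 lost
    print(decode(survivors, k=3, length=12))      # b'hello backup'
```

With k = 3 and n = 5, losing any n − k = 2 fragments is tolerated at a storage overhead of r = 5/3, whereas plain replication needs r = 3 copies to tolerate the same r − 1 = 2 losses. Parity symbols can equal 256, which is why fragments are kept as integer lists rather than `bytes`.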