Object replication strategies in content distribution networks

(1)

Jussi Kangasharju

Institut Eurecom

SophiaAntipolis

France

James Roberts

France Telecom R &D

Issy-les-Moulineaux

France

Keith W. Ross

Institut Eurecom

Sophia Antipolis

France

Abstract

Recentlythe Internethaswitnessed theemergenceof

contentdistribution networks(CDNs). In this paper

westudy theproblem ofoptimallyreplicatingobjects

in CDN servers. In our model, each Internet Au-

tonomousSystem(AS)isanodewithnitestorageca-

pacityforreplicatingobjects. Theoptimization prob-

lemistoreplicateobjectssothatwhenclientsfetchob-

jectsfrom thenearestCDNserverwiththerequested

object, the average numberof ASs traversed ismini-

mized. Weformulatethis problemas acombinatorial

optimization problem. We show that this optimiza-

tionproblemisNPcomplete. Wedevelopfournatural

heuristicsandcomparethemnumericallyusingrealIn-

ternettopologydata. Wendthatthebestresultsare

obtainedwithheuristicsthathavealltheCDNservers

cooperating in making the replication decisions. We

also develop amodel for studying the benetsof co-

operationbetweennodes,which provides insightinto

peer-to-peercontentdistribution.

Keywords: Optimal replication strategies, Con-

tent Distribution Networks, Peer-to-peer networks,

Cooperation

1 Introduction

Recentlythe Internethaswitnessed theemergenceof

contentdistribution networks(CDNs). CDNsaretar-

getedforspeedingupthedeliveryofnormalWebcon-

tent and reduce the load on the origin servers and

the network. CDNs, such as Akamai [1] or Digital

Island[4], distribute contentby placingitoncontent

serverswhich are located near the users. A content

providercansignupfortheserviceandhaveitscon-

tent placed on the content servers. The content is

replicatedeitheron-demand whenusersrequestit,or

it can be replicated beforehand, by pushingthe con-

tentonthecontentservers.

In this paper we study the problem of optimally

replicatingobjectsin CDNservers. Weconsider each

AS as a node in a graph with one CDN server with

nitestoragecapacityforreplicatingobjects. Theop-

timization problemis to replicateobjectsso that the

average number of ASs traversed is minimized when

clientsfetchobjectsfromthenearestCDNservercon-

tainingtherequestedobject. Weformulate thisprob-

lemasacombinatorialoptimizationproblemandshow

thatthisoptimizationproblemisNPcomplete. Wede-

velopfournaturalheuristicsandcomparethemnumer-

icallyusing real Internet topologydata. Our results

showthatthebestresultsareobtainedwithheuristics

thathaveallthe CDNserverscooperatingin making

thereplicationdecisions. Wealsodevelopamodelfor

studyingthebenetsofcooperationbetweennodesin

apeer-to-peernetworkingcontext.

This paper is organized as follows. Section 2

presentsthenetworktopologiesweused. InSection3

we develop our cost model. Section 4 presents the

replicationheuristicswehavedevelopedandSection5

presentsourevaluationmethodsandresults. Section6

presents peer-to-peer content distribution and devel-

ops and evaluates a model for cooperation in peer-

to-peer networks. Section 7 discusses related work.

Finally, Section 8 concludes the paper and presents

directionsforfuturework.

2 Network Model

Our network model is based on the actual Internet

AS topology. To construct thetopology, wewill use

dataprovided byNLANR [12]. This data represents

summaries of Internet routing data collected in the

Route Views Project [15] from 1997 to beginning of

2000. These summaries provide information about

which ASs are connected to each other. From these

summaries we constructed a graph where the nodes

are the ASs and the edges are the inter-AS connec-

tions. Wethencalculatedtheshortestpathsbetween

allthe node pairs, and used this data to form adis-

tancematrixforthenetwork.

Asdiscussedin[5],thismethodisslightlyinaccurate

because wecannot infer that all the connections are

valid. For example, consider a small ISP that buys

(2)

providers, our graph will show an edge between the

small ISP and the provider, but in reality the small

ISPwould not route traÆc from one of its providers

toanother. Thisisnotlikelytobeaproblem,though,

sinceinmostcasesthetwoprovidersinthiscaseeither

haveadirectlinkbetweenthemorareabletoconnect

viaacommonprovider.

A CDN tries to put its content servers asclose to

usersaspossible. Byplacingtheclientsandthecon-

tentservers(storagenodesinournetwork)inthesame

nodes,wecansimulatetheidealredirectionwhere all

clientsarealwaysredirectedtotheclosestserver(the

redirectioninthiscaseisstatic). Bystudyingdierent

replicationstrategies wecan determinewhich strate-

giesaCDNshouldusewhendecidingwhichobjectsto

replicate. BecauseaCDN hascompleteknowledgeof

itsnetwork,wecanusestrategieswhichrequireglobal

informationorcooperation.

2.1 Network Topologies

We downloaded several dierent les from

NLANR [12], each describing the AS topology

on a dierent day. Table 1 shows the dierent

topologies we used, the number of ASs in each one,

the number of leaf ASs, the average length of the

shortestpathbetweenanytwonodes,andtheaverage

distance of the shortest path between any two leaf

nodes.

Fromtopologiesin Table1andothertopologieswe

downloaded(not shown),weseethat theaveragedis-

tancebetweennodestends toincreaseasthenumber

ofnodesinthenetworkincreases. Itisalsoimportant

tonotethatthesetopologiesdonot containallofthe

ASsintheInternet,butonlyasubsetofthem. There-

fore,thefullInternetAStopologywouldlikelyhavea

higheraveragedistancethananyofthetopologieswe

used.

3 Cost Model

We consider a network where the nodes are the au-

tonomous systems (ASs). For simplicity, we assume

that oneAS is the sameas oneISP. We haveI ASs

in the network. AS i, i 2 1;2;:::;I, hasS

i

bytes of

storagecapacityand hasclients that request objects

ataggregaterate

i .

We have J objects. Object j has asize of b

j , j 2

f1;2;:::;Jgandarequestprobabilityp

j

whichisthe

probability that aclientwill request this object. We

assumethat clientrequestpatternsarehomogeneous,

i.e.,thep

j

'sarethesameforallASs. Butthismodel

canbeextendedto include other requestpatterns by

ij

objectj fromAS i.

Wehavethefollowingvariables:

x

ij

= (

1 ifobjectj isstoredatAS i;

0 otherwise:

Thestorageisconstrainedbythespaceavailableat

ASi,that is

J

X

j=1 b

j x

ij S

i

i=1;:::;I

Thegoalistochoosethex

ij

'ssothatagivenperfor-

mancemetricisminimized. In thispaperourgoalis

tominimizetheaveragenumberofinter-AShopsthat

a request must traverse. This reects the download

timeofanobjecttosomedegreeandcanthusbeused

asanindicatoroftheuserperceivedlatency.

Wedenotethematrixofallx

ij

'sbyx. Furthermore,

weassumethat each objectj isinitiallyplacedonan

originserver; we denote by O

j

the AS that contains

thisoriginserver. Weassumethatalloftheobjectsare

always available in their origin servers, regardlessof

theplacementx. Wedenotetheplacementofobjects

to origin serversas x

o

. Note that this origin server

placementdoesnotcountagainstthestoragecapacity

S

Oj .

The average number of hops that a request must

traversefromAS iis

C

i (x)=

J

X

j=1 p

j d

ij

(x) (1)

whered

ij

(x) istheshortest distanceto acopyof ob-

jectj from ASiunder theplacementx. Thisnearest

copy is either in the origin AS O

j

, or in another AS

wheretheobjecthasbeenreplicated. Weassumethat

theclientis alwaysredirectedtothenearestcopy. In

thispaperwedonotconsiderthemechanismsusedto

redirectclients,butinsteadassumethatsuchamech-

anismisinplace.

Let = P

i

bethe totalrequestrateof allASs.

Theaveragenumberofhopsfrom allASs isthen

C(x)= 1

I

X

i=1

i C

i (x)

= 1

I

X

i=1 J

X

j=1

i p

j d

ij (x)

= I

X

i=1 J

X

j=1 s

ij d

ij

(x) (2)

(3)

Date nodes leaf nodes distance distance

97/12/21 3184 1423 3.76 4.34

99/01/11 549 136 3.40 4.18

99/12/08 767 222 3.03 3.38

99/12/11 1477 502 3.45 4.10

Table1: AStopologies

wheres

ij

=

i p

j

=. Theplacementxissubjectto

J

X

j=1 b

j x

ij S

i

i=1;:::;I

Thiscostfunction representsthelongtermaverage

cost. ForalargenumberofobjectsandASs,itisnot

feasibletosolvethisproblemoptimally;in fact,aswe

showinthenextsection,thisproblemisNP-complete.

3.1 Proving NP-Completeness

InordertoprovethatouroptimizationproblemisNP-

complete,werstformulatetheproblemasadecision

problem. Givenatarget numberofhops T,weaskis

thereaplacementxsuchthat

I

X

i=1 J

X

j=1 s

ij d

ij

(x)T

subjectto

J

X

j=1 b

j x

ij S

i

i=1;:::;I

We prove the NP-completeness of this problem by

showingthatitbelongsinNPandthenwereducethe

knapsack problem to a special case of our problem.

ThisprovestheNP-completeness.

The problem is easily seen to be in NP. Given a

placementx and numberof hopsT, we can verify in

polynomialtime whether the placementresults in an

averagecostoflessthanT hops.

Next, we consider the special case where S

1

= S,

S

i

= 0, i = 2;:::;I,

i

= , i = 1;:::;I, p

j

= p,

j = 1;:::;J, i.e., we have only one AS on which to

placeobjects,allASshavethesamerequestrate,and

allobjectsareequallypopular.

Recall that each objectj is alwaysavailable at the

originASO

j

. Thecostofgettingobjectj foraclient

inASiisd

ij (x

o

). Becauseallclientsalwaysgotothe

nearestcopy,placing copieson theonly available AS

canonlydecreasethecostforanyclient,i.e.,d

ij (x)

d

ij (x

o

)foralliandj. Wedenetheutilityofplacing

objectj on AS i as u(j)= P

i [d

ij (x

o ) d

ij

(x)], i.e.,

thedecreasein numberofhopswewouldobtainifwe

placedobjectj in theAS.

GiventargetdecreaseinnumberofhopsT 0

wenow

askifthere isaset ofobjectsJ 0

suchthat

X

j2J 0

b

j

S and X

j2J 0

u(j)T 0

This problem is identical to the well-known NP-

completeknapsackproblem[6]. Giventhatourplace-

ment problem belongs in NP and that the knapsack

problem reduces to it, we know that our placement

problemisNP-complete.

4 Replication Heuristics

Because our optimization problem is NP-complete,

nding the optimal solution is not feasible. There-

forewe havedesigned several heuristics that use the

availableinformationin dierentwaysin orderto get

thebestresults.

Inoursimulationsweusethefollowingheuristics.

1. Random. Assignsobjectstostoragenodesran-

domlysubjecttothestorageconstraints. Wepick

oneobjectwithuniformprobabilityandonenode

withuniformprobability,andwestoretheobject

in that node. Ifthenodealreadystoresthat ob-

ject, wepickanewobjectand anewnode. Asa

result,anobjectcanbeassignedtoseveralnodes,

but anodewillhaveatmaximumonecopyofan

object.

2. Popularity. Eachnodestoresthemostpopular

objectsamongitsclients. Thenodesortstheob-

jectsin decreasingorder ofpopularityandstores

as manyobjectsin thisorder asthestoragecon-

straintallows. Thenodecanestimatethe popu-

larities byobservingtherequestsitreceivesform

its clients. This heuristic does not require the

nodeto get any information from outside of the

node. Notethatin ourcase,theobjectpopulari-

tiesp

j

arethesameacrossallnodes,henceallthe

(4)

subjectto dierentstorageconstraints.

3. Greedy-Single. Each node i calculates C

ij

=

p

j d

ij (x

o

) for each object j. This represents the

contribution of an individual object to (1) un-

der the initial placement. The node then sorts

the objectsin decreasingorder ofC

ij

and stores

as manyobjectsin thisorder asthestoragecon-

straintallows. Thepopularitiesareobtainedasin

thePopularityheuristics,buttheCDNalsoneeds

information aboutthenetwork topologyinorder

to estimatethed

ij

's. Notethat theC

ij

'sarecal-

culatedonlyonceundertheplacementx

o

andare

notadjustedwhenobjectsarestored. Thismeans

that every node stores objects independently of

all the other nodes and no cooperation between

nodesisrequired.

4. Greedy-Global TheCDNrstcalculatesC

ij

=

i p

j d

ij (x

o

) for all nodes i and objects j. Then

theCDNpicksthenode-object-pairwhichhasthe

highest C

ij

and storesthat object in that node.

This results in a new placement x

1

. Then the

CDN re-calculates the costs C

ij

under the new

placementandpickthenode-object-pairthathas

the highest cost. We store that object in that

nodeandobtainanewplacementx

2

. Weiterate

this untilallthestoragenodeshavebeenlled.

Wedonotconsideranyvariantsofthesebaseheuris-

tics, such as using popularity divided by object size,

butevaluateonlytheperformanceofthebaseheuris-

tics. Dierent variants of these heuristics have been

studiedinthecontextofwebcaching(see[8]and ref-

erencestherein) andanysuchimprovementscouldbe

useddirectlytoenhancetheperformanceofourheuris-

tics.

5 Evaluation of Heuristics

Weevaluatedtheperformancesofourheuristicsusing

thetopologiesfromSection2.1. Weraneachheuristic

oneachtopologyusingdierentparametersforobject

popularityandstoragecapacityinthenodes.

We assigned the popularities to the objects from

aZipf-like distribution where we varied the parame-

terfrom 0.6 to 1.0. Valuesbetween 0.6and 0.8 have

beentypicallyobservedinWebproxytraÆc[2]. Much

higher values, up to 1.4, have also been discovered

in the context of popular Web servers [13]. Object

sizes were randomly drawn from a uniform distribu-

tion. Thestorage capacityat eachnodewasxed to

somepercentageofthetotalsize ofthesetofobjects.

ments. Alltheclientnodeshadthesamerequestrate

.

Weplacedallthecontentserversattheleafsofthe

network,i.e.,forallnon-leafASswesetS

i

=0. Even

thoughthisassignmentofstoragecapacityisarticial,

itallowsustostudytheperformancesofourheuristics

better. Byplacingallthestoragecapacityandobjects

at the edges, the average number of hops needed to

obtainan object is higherthan with amore realistic

assignment. This makes it more important to repli-

catetherightobjectsandletsusseemoreclearlythe

dierencesbetweenourheuristics.

The baseline for our experiments was the initial

placementx

o

which weobtainedbyrandomlyassign-

ing objects to storage nodes. We compared the per-

formanceofeachoftheheuristicstothisbaselineand

report the relative performance obtained with each

heuristic. Because the memory requirements for the

experiments grow with the product IJ, we were not

abletorun allexperimentsforallthetopologies.

Figure 1 shows the results from experiments with

1000objects. Weonlyshowresultsfortwotopologies,

butweobservedthattheresultsfortheremainingtwo

were similar to the two shown here. On the x-axis

we plotthe amount of storage at a node aspercent-

ageofthetotal size oftheobjects. Onthe y-axiswe

plottheperformancerelativetothebaseline. Ineach

graph,weplotdierentcurvesfordierentheuristics

and dierent values of the Zipf-parameter (0.6, and

1.0). Note that we only plot the Random heuristic

withZipf-parameter1.0. TheperformanceoftheRan-

dom heuristic was similar for the other Zipf-values.

Thecurvesof theother heuristics for Zipf-valuesbe-

tween0.6and1.0was betweenthecurvesplotted.

FromFigure1wecanseethatGreedy-Global isthe

best performingheuristic. Thesecondbest isGreedy-

Single,followedcloselybyPopularity. Randomheuris-

ticisconsistentlytheworstanddoesnotachievesub-

stantialreductionsin numberof hops, even for large

storagecapacities. Aswementioned,theperformance

of Random didnot change much with dierent Zipf-

parametervalues.

AsFigure1shows,thegainsincreaselogarithmically

withincreased storagecapacity. Themain determin-

ingfactoristheZipf-parametervalue. Thelargerthis

valueis, the smalleris the number of objects gener-

ating alargeamount ofrequests. Thus, it is easyto

signicantly reduce the cost if only a small number

of objects is verypopular. To reduce the numberof

hopsby50%,weneedonlyasmallamountofstorage

forparameter1.0andupto 25%of totaldataset for

parametervalue0.6.

In Figure 2 we plot the results from experiments

(5)

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Cache size, % of data set

Relative performance

Random − 1.0 Pop. − 0.6 G−S − 0.6 G−G − 0.6 Pop. − 1.0 G−S − 1.0 G−G − 1.0

(a)99/01/11

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Cache size, % of data set

Relative performance

Random − 1.0 Pop. − 0.6 G−S − 0.6 G−G − 0.6 Pop. − 1.0 G−S − 1.0 G−G − 1.0

(b)99/12/08

Figure1: Experimentswith1,000objects

with 10,000objects. Due to memory limitations, we

werenotableto runGreedy-Global forallthetopolo-

gies, thereforeitisnotshownontheplots.

The resultsare verysimilarto theresultsfrom the

previous experiment. Random is still the worst in

termsofperformance,butthedierencebetweenPop-

ularity and Greedy-Single has decreased. This is be-

cause as thenumber ofobjectsgrows, theindividual

object popularities will get smaller. Therefore, the

contribution of an individual object's retrieval cost

to theglobal cost(2) will get proportionally smaller.

Hence,thepopularityoftheobjectbecomesmoreim-

portantindeterminingthecost. Theproductofpop-

ularity and distance used by Greedy-Single is still a

slightlybetterindicator,butthedierenceisminimal.

Fromourresultswecanconcludethatthebestper-

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Cache size, % of data set

Relative performance

Random − 1.0 Pop. − 0.6 G−S − 0.6 Pop. − 1.0 G−S − 1.0

(a)99/12/08

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

Cache size, % of data set

Relative performance

Random − 1.0 Pop. − 0.6 G−S − 0.6 Pop. − 1.0 G−S − 1.0

(b)99/12/11

Figure2: Experimentswith10,000objects

formance is obtained when the object replication is

coordinatedbyasinglesource,inourcase,thiswould

be the CDN. The dierence in performance between

Greedy-Global andtheothertwoheuristicsisquitesig-

nicant,especiallyforsmallZipf-parametervalues. In

Figure1wecansee thatthelargestimprovementsin

performanceare upto24%. This resultshowsthat a

CDNshoulduseacoordinatedreplicationstrategyand

notlettheCDN serversactontheirown.

6 Peer-to-Peer Content Distri-

bution

Wenowintroduceanewformofcontentdistribution,

namelypeer-to-peercontentdistribution. Peer-to-peer

(6)

tent distribution and are mainly used to share indi-

vidual les between users. In peer-to-peer networks,

such asNapster [11],orGnutella[7],individual users

decideto share leswith others. With the help of a

directoryservice,userscan determinewhere dierent

lescanbedownloadedfrom. While this newmodel

does not directly compete with the traditional Web

browsingmodel,itisestablishingitselfasameansfor

distributing larger les, such as applications or mu-

sic,betweenusers. Becausepeer-to-peernetworks are

madeupof individualusers,wecannotusestrategies

which requireglobal information and coordinationof

nodes as with CDNs. Instead we must restrict our-

selvestostrategiesthat needonlylocallyavailablein-

formation;oneexampleofsuch strategyis thePopu-

larity heuristic.

As before, weassume that oneAS in ournetwork

correspondsto one ISP. We canviewthe storageca-

pacityinanASastheaggregationofthepeer-to-peer

storageoered by the users in that AS. In asimilar

vein,x

ij

=1meansthatatleastoneuserinASihasa

copyofobjectj. Weassumethatthecostofretrieving

objectsfromother usersin thesameAS isnegligible;

thisisconsistentwithourdenitionofretrievalcostin

Section3. Inthissectionwewillinvestigatethebene-

tsofcooperationbetweentheusersin dierentASs.

Our goalis to see if the users in apeer-to-peer net-

workcouldgainanythingfromcooperatingwithother

nearbyusers.

We will now develop a cooperation model for

peer-to-peer content distribution under the popular-

ity heuristic. Forsimplicityand ease of notation,we

shall assumethroughout this section that all objects

havethesamesize,i.e.,b

j

=b forj=1;:::;J.

Consider two ASs, A and B. Denote the shortest

path between them by D

AB

. Let K be the number

ofobjectsthat each AScanstore. Wedonotassume

anythingabouttherelationshipbetweenthetwoASs,

except that the distance between them is D

AB . In

particular, we do not assume that one is the access

provideroftheotherone. Weassumethat

A

=

B .

If both ASs act independently, they would both

cache the K most popular objects and the average

numberofhopsforrequestsfromAandB wouldbe

A

A +

B J

X

j=K+1 p

j d

Aj +

A

A +

B J

X

j=K+1 p

j d

Bj

= 1

2 J

X

j=K+1 p

j d

Aj +

1

2 J

X

j=K+1 p

j d

Bj

(3)

We now consider the case where the two ASs co-

operate and do not necessarily both store the same

of theL most popular objects(L K), and that in

additionAstoresobjects(L+1);:::;K,andB stores

objects (K+1);:::;(2K L). Because the request

ratesareidentical,itdoesnotmatterhowtheobjects

(L+1);:::;(2K L) are shared between A and B.

Notethatbecauseofthesamereason,weonlyneedto

considerreplicatingobjectsinAandB andnotinthe

intermediatenodes.

TheaveragenumberofhopsforrequestsfromAand

B underthisschemeis

1

2 2K L

X

j=K+1 p

j D

AB +

1

2 J

X

j=2K L+1 p

j d

Aj

+ 1

2 K

X

j=L+1 p

j D

AB +

1

2 J

X

j=2K L+1 p

j d

Bj

= 1

2 2K L

X

j=L+1 p

j D

AB +

1

2 J

X

j=2K L+1 (p

j d

Aj +p

j d

Bj ) (4)

Thedierencebetween(3)and(4)is

1

2 2K L

X

j=K+1 (p

j d

Aj +p

j d

Bj )

1

2 2K L

X

j=L+1 p

j D

AB (5)

If (5) is greater than zero, then it is better for A

andB to cooperate. Byassuming d

Aj

=d

Bj

=d

avg ,

wecancalculate the valueof d

avg

where cooperation

becomesthepreferredstrategy. Theequationbecomes

1

2 2K L

X

j=K+1 (p

j d

avg +p

j d

avg )

1

2 2K L

X

j=L+1 p

j D

AB

>0 (6)

Solvingford

avg

,weget

d

avg

>

D

AB

2 P

2K L

j=L+1 p

j

P

2K L

j=K+1 p

j

(7)

If(7)holds,thenitisbetterforAandBtocooperate.

Becaused

avg

islikelytobereasonablystableoverlong

intervalsand becausep

j

'sare known tobothparties,

A and B can use (7) as a quick test to see whether

theycouldgainanythingbycooperating. Forthetest,

AandB eitherneedtospecifythevalueofLorverify

equation(7)overseveralvaluesofL.

In the following, we will consider two values for

D

AB

, namely 1and 2. If D

AB

is equal to 1,then A

andB areneighbors. Weplotted (5)forseveralZipf-

parametervalues,averagedistancesd

avg

,andvaluesof

K,andinallcases,cooperationisalmostalwayssupe-

riortoanon-cooperatingstrategy. Insomeinstances,

however,thedierencebetweenthetwoissmall.

(7)

5 10 15 20 25 30 35 40 45 50

−0.25

−0.2

−0.15

−0.1

−0.05 0 0.05 0.1 0.15 0.2 0.25

Number of shared objects, L

Difference in hops

d avg = 3 d avg = 4 d avg = 5 d avg = 6

(a)Zipf0.6

5 10 15 20 25 30 35 40 45 50

−0.25

−0.2

−0.15

−0.1

−0.05 0 0.05 0.1 0.15 0.2 0.25

Number of shared objects, L

Difference in hops

d avg = 3 d avg = 4 d avg = 5 d avg = 6

(b)Zipf0.8

Figure 3: Gain from cooperation for K = 50, 1000

objects

In Figure 3we show typical plots of (5). In these

plots,wehave1,000objectsavailableandbothnodes

havecapacity to store 50 objects, or 5% of the data

set. Weshowtwoplotsfortwodierentvaluesofthe

Zipf-distributionparameter. Onthex-axisweplotthe

valueofL,i.e.,thenumberofobjectsstoredatbothA

andB,andonthey-axisweshowthevalueof(5),that

is, the dierence between individual and cooperative

strategies in hops. If the dierence is positive, then

cooperationisbetter.

As wecanseefrom theplotsin Figure3,there are

always somevalues of L for which cooperation gives

better results, but in somecases thegains are small.

Wecanseethatasd

avg

getssmaller,thegainsbecome

smaller. Thisis nosurprise sinceasmall d

avg means

that the requested objectsare typically already very

A C

B

Figure4: 3-waycooperation

closeand cooperationwould nothelp much. Wealso

seethat asd

avg

increases,thepotentialgainsincrease

signicantly.

Eventhoughagainof0.2hopsmaynotseemmuch,

it is important to note that the dierence between

Popularity heuristic and Greedy-Single was typically

less than0.05hops. Thedierence betweenPopular-

ity andGreedy-Global was0.1{0.15hops. Therefore,

two cooperating nodes using the Popularity heuris-

ticwouldobtainsignicantlybetterperformancethan

theycouldhopetoobtainbyactingindependently. We

canconcludethat cooperation ismuch more eÆcient

atreducing the cost thanchangingaheuristicfrom a

popularitybasedheuristicintoagreedyone.

If D

AB

is equal to 2, then A and B are separated

byone AS.Oneexampleof thiscaseisshown inFig-

ure4where wehavethree nodes,oneparentandtwo

children. In thiscase, thecooperation would happen

betweenthe children. Figure 5 shows thegraphs for

thiscase.

ComparingFigures3and5wecanseethatthegains

get smaller as D

AB

increases. When D

AB

increases

evenfurther,cooperationwillnolongerbebenecialto

AandB. Thisistobeexpected,sinceinthenetwork

topologiesweused, theaveragedistance betweenleaf

nodeswasaround4andthereforethedistancebetween

A and B would be roughlythe sameas thedistance

totheoriginserver.

We also investigated cases where there are a very

largenumberofobjectsinthenetwork. Inthesecases,

the general form of the gains matches those in Fig-

ures 3 and 5, but the actual gain is slightly lower.

Thisisbecausethestoragenodescanonlyholdavery

smallfraction of theobjects,and even twonodesco-

operating cannot hold enough objects to reduce the

average number of hops signicantly. However, even

in these cases, cooperation yielded a smaller average

number of hops than not cooperating. For example,

with1millionobjectsandKequalto2000,themaxi-

mumgainwasaround0.2hopsasopposedto0.27with

1000objectsandK=50(ford

avg

=6).

(8)

5 10 15 20 25 30 35 40 45 50

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1 0 0.1

Number of shared objects, L

Difference in hops

d avg = 3 d _avg = 4 d avg = 5 d _avg = 6

(a)Zipf0.6

5 10 15 20 25 30 35 40 45 50

−0.9

−0.8

−0.7

−0.6

−0.5

−0.4

−0.3

−0.2

−0.1 0 0.1

Number of shared objects, L

Difference in hops

d avg = 3 d _avg = 4 d avg = 5 d _avg = 6

(b)Zipf0.8

Figure 5: Gain from cooperationfor D

AB

= 2,K =

50, 1000objects

7 Related Work

Related work on replicating content has mostly con-

centratedontheproblemofplacingthereplicaservers

foroneoriginserver. Inthispaperweconsideramore

global casewhere wehavecontentfrom severalorigin

serversandwedecidewhichobjectstoreplicateonthe

replicaservers.

Inboth[10]and[3],theauthorsconsider theprob-

lemof placingreplicas foroneoriginserveron atree

topology. While the real Internet topology is not a

tree, this simplerapproach allowsthe authors to de-

velopoptimalalgorithms. Thetree-approachmaynot

generalize, because it would require that the trees

fordierent originserversoverlapat thereplica sites

tionofthereplicasites. Also,in [10],thealgorithmis

ofahighcomputationalcomplexityO(N 3

M 2

).

In[14]theauthorspresenttheiralgorithmsforplac-

ing server replicas in a CDN. They assume that the

replicas are complete replicas and they do notstudy

replicating individual objects; in our work we make

replicationdecisionsonaper-objectgranularity. They

formulatetheproblem as theNP-completeK-median

problem,developheuristicsandevaluatetheirperfor-

mance.Theyonlyconsiderplacingreplicasforasingle

originserver;ourheuristics replicateobjectsfrom all

theoriginservers. Alloftheirheuristicsrequireinfor-

mation about the network topology as well as client

requestloads. Also,intheirwork,allthereplicas act

independentlyandtheydonotstudyanycooperating

schemes.

In [9] the authors consider the placement of inter-

ceptingproxiesinsidethenetworktoreducethedown-

loadtime. Theypresent optimalsolutionsfor simple

topologies,suchaslineandring,andconsiderthecase

ofplacingproxiesforasingleserverinatreetopology.

Relyingon intercepting proxiesrequires that routing

isstable duringthelifetimeoftheconnection.

8 Conclusion

In this paper we have studied the problem of opti-

mally replicating objects in CDN servers. We treat

eachAS asanodewithnite capacityforstoringob-

jects. Our optimization problem is to replicate ob-

jectssothatwhenclientsfetch objectsfromthenear-

estCDNserver,theaveragenumberof ASstraversed

is minimized. We haveformulated this problem asa

combinatorial optimization problem and have shown

itto beNPcomplete.

Wehavedevelopedfournaturalheuristicsandcom-

pared them numerically using real Internet topol-

ogy data. Our results show that the best perform-

ingheuristicis Greedy-Global which has alltheCDN

serverscooperating. Thedierenceinperformancebe-

tweenGreedy-Global andthesimplerheuristicswasup

to24%.

Wehavealsostudied peer-to-peercontentdistribu-

tionanddevelopedamodelforstudyingthebenetsof

cooperationbetweennodes. Ourevaluationoftheco-

operationmodelshowsthatnodesusingsimpleheuris-

ticsandintelligentcooperationcangetsignicantper-

formancegains.

The eld of content distribution and peer-to-peer

networks hassignicantpotentialforfuture research.

Ourfutureworkwilllookmorecloselyintointer-node

cooperationandinvestigatetheoptimizationproblem

(9)

achievable performance. We also plan to extend our

costmodeltoincludeotherimportantfactors,suchas

networktraÆc orserverload.

Acknowledgements

Theroutingdatasummaries usedin constructingthe

network topologies were provided by National Sci-

ence Foundation Cooperative Agreement No. ANI-

9807479, and the National Laboratory for Applied

NetworkResearch. Wewould alsoliketo thankAnat

Bremler-BarrofTelAvivUniversityforthediscussions

aboutthetopologymodels.

References

[1] Akamai. <http://www.akamai.com>.

[2] L. Breslau, P. Cao, L. Fan, G. Phillips, and

S. Shenker. Web caching and Zipf-like distribu-

tions: Evidenceand implications. InProceedings

of IEEE Infocom, New York, NY, March 21{25,

1999.

[3] I. Cidon, S. Kutten, and R. Soer. Optimal

allocation of electronic content. In Proceedings

of IEEE Infocom, Anchorage, AK, April 22{26,

2001.

[4] DigitalIslandInc. <http://www.digisle.net>.

[5] L. Gao. On inferring autonomous system re-

lationships in the internet. In Proceedings of

IEEEGlobalInternet,SanFrancisco,CA,Novem-

ber28{30,2000.

[6] M. R. Garey and D. S. Johnson. Computers

andIntractability: AGuidetotheTheory ofNP-

Completeness. Freeman,1979.

[7] Gnutellawebsite.

<http://www.gnutella.co.uk/>.

[8] S. Jin and A. Bestavros. GreedyDual* web

cachingalgorithm: Exploitingthetwosourcesof

temporallocalityinwebrequeststreams. InPro-

ceedingsof 5thWebCachingandContentDistri-

bution Workshop, Lisbon, Portugal, May 22{24,

2000.

[9] P. Krishnan,D. Raz, and Y.Shavitt. Thecache

location problem. IEEE/ACM Transactions on

Networking,8(5):568{582,October2000.

On the optimalplacementof web proxiesin the

internet. In Proceedings of IEEE Infocom, New

York,NY,March21{25,1999.

[11] Napster. <http://www.napster.com>.

[12] NLANR measurement and operations analysis

team. <http://moat.nlanr.net>.

[13] V.N.PadmanabhanandL.Qiu.Thecontentand

accessdynamicsofabusywebsite: Findingsand

implications. InProceedingsofACMSIGCOMM,

Stockholm, Sweden, August 28 { September 1,

2000.

[14] L. Qiu,V.N.Padmanabhan,andG.M.Voelker.

On the placement of web server replicas. In

Proceedings of IEEE Infocom, Anchorage, AK,

April22{26,2001.

[15] Routeviews projecthomepage.

<http://www.antc.uoregon.edu/rout e-vi ews/>.