Jussi Kangasharju
Institut Eurecom
SophiaAntipolis
France
James Roberts
France Telecom R &D
Issy-les-Moulineaux
France
Keith W. Ross
Institut Eurecom
Sophia Antipolis
France
Abstract
Recentlythe Internethaswitnessed theemergenceof
contentdistribution networks(CDNs). In this paper
westudy theproblem ofoptimallyreplicatingobjects
in CDN servers. In our model, each Internet Au-
tonomousSystem(AS)isanodewithnitestorageca-
pacityforreplicatingobjects. Theoptimization prob-
lemistoreplicateobjectssothatwhenclientsfetchob-
jectsfrom thenearestCDNserverwiththerequested
object, the average numberof ASs traversed ismini-
mized. Weformulatethis problemas acombinatorial
optimization problem. We show that this optimiza-
tionproblemisNPcomplete. Wedevelopfournatural
heuristicsandcomparethemnumericallyusingrealIn-
ternettopologydata. Wendthatthebestresultsare
obtainedwithheuristicsthathavealltheCDNservers
cooperating in making the replication decisions. We
also develop amodel for studying the benetsof co-
operationbetweennodes,which provides insightinto
peer-to-peercontentdistribution.
Keywords: Optimal replication strategies, Con-
tent Distribution Networks, Peer-to-peer networks,
Cooperation
1 Introduction
Recentlythe Internethaswitnessed theemergenceof
contentdistribution networks(CDNs). CDNsaretar-
getedforspeedingupthedeliveryofnormalWebcon-
tent and reduce the load on the origin servers and
the network. CDNs, such as Akamai [1] or Digital
Island[4], distribute contentby placingitoncontent
serverswhich are located near the users. A content
providercansignupfortheserviceandhaveitscon-
tent placed on the content servers. The content is
replicatedeitheron-demand whenusersrequestit,or
it can be replicated beforehand, by pushingthe con-
tentonthecontentservers.
In this paper we study the problem of optimally
replicatingobjectsin CDNservers. Weconsider each
AS as a node in a graph with one CDN server with
nitestoragecapacityforreplicatingobjects. Theop-
timization problemis to replicateobjectsso that the
average number of ASs traversed is minimized when
clientsfetchobjectsfromthenearestCDNservercon-
tainingtherequestedobject. Weformulate thisprob-
lemasacombinatorialoptimizationproblemandshow
thatthisoptimizationproblemisNPcomplete. Wede-
velopfournaturalheuristicsandcomparethemnumer-
icallyusing real Internet topologydata. Our results
showthatthebestresultsareobtainedwithheuristics
thathaveallthe CDNserverscooperatingin making
thereplicationdecisions. Wealsodevelopamodelfor
studyingthebenetsofcooperationbetweennodesin
apeer-to-peernetworkingcontext.
This paper is organized as follows. Section 2
presentsthenetworktopologiesweused. InSection3
we develop our cost model. Section 4 presents the
replicationheuristicswehavedevelopedandSection5
presentsourevaluationmethodsandresults. Section6
presents peer-to-peer content distribution and devel-
ops and evaluates a model for cooperation in peer-
to-peer networks. Section 7 discusses related work.
Finally, Section 8 concludes the paper and presents
directionsforfuturework.
2 Network Model
Our network model is based on the actual Internet
AS topology. To construct thetopology, wewill use
dataprovided byNLANR [12]. This data represents
summaries of Internet routing data collected in the
Route Views Project [15] from 1997 to beginning of
2000. These summaries provide information about
which ASs are connected to each other. From these
summaries we constructed a graph where the nodes
are the ASs and the edges are the inter-AS connec-
tions. Wethencalculatedtheshortestpathsbetween
allthe node pairs, and used this data to form adis-
tancematrixforthenetwork.
Asdiscussedin[5],thismethodisslightlyinaccurate
because wecannot infer that all the connections are
valid. For example, consider a small ISP that buys
providers, our graph will show an edge between the
small ISP and the provider, but in reality the small
ISPwould not route traÆc from one of its providers
toanother. Thisisnotlikelytobeaproblem,though,
sinceinmostcasesthetwoprovidersinthiscaseeither
haveadirectlinkbetweenthemorareabletoconnect
viaacommonprovider.
A CDN tries to put its content servers asclose to
usersaspossible. Byplacingtheclientsandthecon-
tentservers(storagenodesinournetwork)inthesame
nodes,wecansimulatetheidealredirectionwhere all
clientsarealwaysredirectedtotheclosestserver(the
redirectioninthiscaseisstatic). Bystudyingdierent
replicationstrategies wecan determinewhich strate-
giesaCDNshouldusewhendecidingwhichobjectsto
replicate. BecauseaCDN hascompleteknowledgeof
itsnetwork,wecanusestrategieswhichrequireglobal
informationorcooperation.
2.1 Network Topologies
We downloaded several dierent les from
NLANR [12], each describing the AS topology
on a dierent day. Table 1 shows the dierent
topologies we used, the number of ASs in each one,
the number of leaf ASs, the average length of the
shortestpathbetweenanytwonodes,andtheaverage
distance of the shortest path between any two leaf
nodes.
Fromtopologiesin Table1andothertopologieswe
downloaded(not shown),weseethat theaveragedis-
tancebetweennodestends toincreaseasthenumber
ofnodesinthenetworkincreases. Itisalsoimportant
tonotethatthesetopologiesdonot containallofthe
ASsintheInternet,butonlyasubsetofthem. There-
fore,thefullInternetAStopologywouldlikelyhavea
higheraveragedistancethananyofthetopologieswe
used.
3 Cost Model
We consider a network where the nodes are the au-
tonomous systems (ASs). For simplicity, we assume
that oneAS is the sameas oneISP. We haveI ASs
in the network. AS i, i 2 1;2;:::;I, hasS
i
bytes of
storagecapacityand hasclients that request objects
ataggregaterate
i .
We have J objects. Object j has asize of b
j , j 2
f1;2;:::;Jgandarequestprobabilityp
j
whichisthe
probability that aclientwill request this object. We
assumethat clientrequestpatternsarehomogeneous,
i.e.,thep
j
'sarethesameforallASs. Butthismodel
canbeextendedto include other requestpatterns by
ij
objectj fromAS i.
Wehavethefollowingvariables:
x
ij
= (
1 ifobjectj isstoredatAS i;
0 otherwise:
Thestorageisconstrainedbythespaceavailableat
ASi,that is
J
X
j=1 b
j x
ij S
i
i=1;:::;I
Thegoalistochoosethex
ij
'ssothatagivenperfor-
mancemetricisminimized. In thispaperourgoalis
tominimizetheaveragenumberofinter-AShopsthat
a request must traverse. This reects the download
timeofanobjecttosomedegreeandcanthusbeused
asanindicatoroftheuserperceivedlatency.
Wedenotethematrixofallx
ij
'sbyx. Furthermore,
weassumethat each objectj isinitiallyplacedonan
originserver; we denote by O
j
the AS that contains
thisoriginserver. Weassumethatalloftheobjectsare
always available in their origin servers, regardlessof
theplacementx. Wedenotetheplacementofobjects
to origin serversas x
o
. Note that this origin server
placementdoesnotcountagainstthestoragecapacity
S
Oj .
The average number of hops that a request must
traversefromAS iis
C
i (x)=
J
X
j=1 p
j d
ij
(x) (1)
whered
ij
(x) istheshortest distanceto acopyof ob-
jectj from ASiunder theplacementx. Thisnearest
copy is either in the origin AS O
j
, or in another AS
wheretheobjecthasbeenreplicated. Weassumethat
theclientis alwaysredirectedtothenearestcopy. In
thispaperwedonotconsiderthemechanismsusedto
redirectclients,butinsteadassumethatsuchamech-
anismisinplace.
Let = P
i
i
bethe totalrequestrateof allASs.
Theaveragenumberofhopsfrom allASs isthen
C(x)= 1
I
X
i=1
i C
i (x)
= 1
I
X
i=1 J
X
j=1
i p
j d
ij (x)
= I
X
i=1 J
X
j=1 s
ij d
ij
(x) (2)
Date nodes leaf nodes distance distance
97/12/21 3184 1423 3.76 4.34
99/01/11 549 136 3.40 4.18
99/12/08 767 222 3.03 3.38
99/12/11 1477 502 3.45 4.10
Table1: AStopologies
wheres
ij
=
i p
j
=. Theplacementxissubjectto
J
X
j=1 b
j x
ij S
i
i=1;:::;I
Thiscostfunction representsthelongtermaverage
cost. ForalargenumberofobjectsandASs,itisnot
feasibletosolvethisproblemoptimally;in fact,aswe
showinthenextsection,thisproblemisNP-complete.
3.1 Proving NP-Completeness
InordertoprovethatouroptimizationproblemisNP-
complete,werstformulatetheproblemasadecision
problem. Givenatarget numberofhops T,weaskis
thereaplacementxsuchthat
I
X
i=1 J
X
j=1 s
ij d
ij
(x)T
subjectto
J
X
j=1 b
j x
ij S
i
i=1;:::;I
We prove the NP-completeness of this problem by
showingthatitbelongsinNPandthenwereducethe
knapsack problem to a special case of our problem.
ThisprovestheNP-completeness.
The problem is easily seen to be in NP. Given a
placementx and numberof hopsT, we can verify in
polynomialtime whether the placementresults in an
averagecostoflessthanT hops.
Next, we consider the special case where S
1
= S,
S
i
= 0, i = 2;:::;I,
i
= , i = 1;:::;I, p
j
= p,
j = 1;:::;J, i.e., we have only one AS on which to
placeobjects,allASshavethesamerequestrate,and
allobjectsareequallypopular.
Recall that each objectj is alwaysavailable at the
originASO
j
. Thecostofgettingobjectj foraclient
inASiisd
ij (x
o
). Becauseallclientsalwaysgotothe
nearestcopy,placing copieson theonly available AS
canonlydecreasethecostforanyclient,i.e.,d
ij (x)
d
ij (x
o
)foralliandj. Wedenetheutilityofplacing
objectj on AS i as u(j)= P
i [d
ij (x
o ) d
ij
(x)], i.e.,
thedecreasein numberofhopswewouldobtainifwe
placedobjectj in theAS.
GiventargetdecreaseinnumberofhopsT 0
wenow
askifthere isaset ofobjectsJ 0
suchthat
X
j2J 0
b
j
S and X
j2J 0
u(j)T 0
This problem is identical to the well-known NP-
completeknapsackproblem[6]. Giventhatourplace-
ment problem belongs in NP and that the knapsack
problem reduces to it, we know that our placement
problemisNP-complete.
4 Replication Heuristics
Because our optimization problem is NP-complete,
nding the optimal solution is not feasible. There-
forewe havedesigned several heuristics that use the
availableinformationin dierentwaysin orderto get
thebestresults.
Inoursimulationsweusethefollowingheuristics.
1. Random. Assignsobjectstostoragenodesran-
domlysubjecttothestorageconstraints. Wepick
oneobjectwithuniformprobabilityandonenode
withuniformprobability,andwestoretheobject
in that node. Ifthenodealreadystoresthat ob-
ject, wepickanewobjectand anewnode. Asa
result,anobjectcanbeassignedtoseveralnodes,
but anodewillhaveatmaximumonecopyofan
object.
2. Popularity. Eachnodestoresthemostpopular
objectsamongitsclients. Thenodesortstheob-
jectsin decreasingorder ofpopularityandstores
as manyobjectsin thisorder asthestoragecon-
straintallows. Thenodecanestimatethe popu-
larities byobservingtherequestsitreceivesform
its clients. This heuristic does not require the
nodeto get any information from outside of the
node. Notethatin ourcase,theobjectpopulari-
tiesp
j
arethesameacrossallnodes,henceallthe
subjectto dierentstorageconstraints.
3. Greedy-Single. Each node i calculates C
ij
=
p
j d
ij (x
o
) for each object j. This represents the
contribution of an individual object to (1) un-
der the initial placement. The node then sorts
the objectsin decreasingorder ofC
ij
and stores
as manyobjectsin thisorder asthestoragecon-
straintallows. Thepopularitiesareobtainedasin
thePopularityheuristics,buttheCDNalsoneeds
information aboutthenetwork topologyinorder
to estimatethed
ij
's. Notethat theC
ij
'sarecal-
culatedonlyonceundertheplacementx
o
andare
notadjustedwhenobjectsarestored. Thismeans
that every node stores objects independently of
all the other nodes and no cooperation between
nodesisrequired.
4. Greedy-Global TheCDNrstcalculatesC
ij
=
i p
j d
ij (x
o
) for all nodes i and objects j. Then
theCDNpicksthenode-object-pairwhichhasthe
highest C
ij
and storesthat object in that node.
This results in a new placement x
1
. Then the
CDN re-calculates the costs C
ij
under the new
placementandpickthenode-object-pairthathas
the highest cost. We store that object in that
nodeandobtainanewplacementx
2
. Weiterate
this untilallthestoragenodeshavebeenlled.
Wedonotconsideranyvariantsofthesebaseheuris-
tics, such as using popularity divided by object size,
butevaluateonlytheperformanceofthebaseheuris-
tics. Dierent variants of these heuristics have been
studiedinthecontextofwebcaching(see[8]and ref-
erencestherein) andanysuchimprovementscouldbe
useddirectlytoenhancetheperformanceofourheuris-
tics.
5 Evaluation of Heuristics
Weevaluatedtheperformancesofourheuristicsusing
thetopologiesfromSection2.1. Weraneachheuristic
oneachtopologyusingdierentparametersforobject
popularityandstoragecapacityinthenodes.
We assigned the popularities to the objects from
aZipf-like distribution where we varied the parame-
terfrom 0.6 to 1.0. Valuesbetween 0.6and 0.8 have
beentypicallyobservedinWebproxytraÆc[2]. Much
higher values, up to 1.4, have also been discovered
in the context of popular Web servers [13]. Object
sizes were randomly drawn from a uniform distribu-
tion. Thestorage capacityat eachnodewasxed to
somepercentageofthetotalsize ofthesetofobjects.
ments. Alltheclientnodeshadthesamerequestrate
.
Weplacedallthecontentserversattheleafsofthe
network,i.e.,forallnon-leafASswesetS
i
=0. Even
thoughthisassignmentofstoragecapacityisarticial,
itallowsustostudytheperformancesofourheuristics
better. Byplacingallthestoragecapacityandobjects
at the edges, the average number of hops needed to
obtainan object is higherthan with amore realistic
assignment. This makes it more important to repli-
catetherightobjectsandletsusseemoreclearlythe
dierencesbetweenourheuristics.
The baseline for our experiments was the initial
placementx
o
which weobtainedbyrandomlyassign-
ing objects to storage nodes. We compared the per-
formanceofeachoftheheuristicstothisbaselineand
report the relative performance obtained with each
heuristic. Because the memory requirements for the
experiments grow with the product IJ, we were not
abletorun allexperimentsforallthetopologies.
Figure 1 shows the results from experiments with
1000objects. Weonlyshowresultsfortwotopologies,
butweobservedthattheresultsfortheremainingtwo
were similar to the two shown here. On the x-axis
we plotthe amount of storage at a node aspercent-
ageofthetotal size oftheobjects. Onthe y-axiswe
plottheperformancerelativetothebaseline. Ineach
graph,weplotdierentcurvesfordierentheuristics
and dierent values of the Zipf-parameter (0.6, and
1.0). Note that we only plot the Random heuristic
withZipf-parameter1.0. TheperformanceoftheRan-
dom heuristic was similar for the other Zipf-values.
Thecurvesof theother heuristics for Zipf-valuesbe-
tween0.6and1.0was betweenthecurvesplotted.
FromFigure1wecanseethatGreedy-Global isthe
best performingheuristic. Thesecondbest isGreedy-
Single,followedcloselybyPopularity. Randomheuris-
ticisconsistentlytheworstanddoesnotachievesub-
stantialreductionsin numberof hops, even for large
storagecapacities. Aswementioned,theperformance
of Random didnot change much with dierent Zipf-
parametervalues.
AsFigure1shows,thegainsincreaselogarithmically
withincreased storagecapacity. Themain determin-
ingfactoristheZipf-parametervalue. Thelargerthis
valueis, the smalleris the number of objects gener-
ating alargeamount ofrequests. Thus, it is easyto
signicantly reduce the cost if only a small number
of objects is verypopular. To reduce the numberof
hopsby50%,weneedonlyasmallamountofstorage
forparameter1.0andupto 25%of totaldataset for
parametervalue0.6.
In Figure 2 we plot the results from experiments
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Cache size, % of data set
Relative performance
Random − 1.0 Pop. − 0.6 G−S − 0.6 G−G − 0.6 Pop. − 1.0 G−S − 1.0 G−G − 1.0
(a)99/01/11
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Cache size, % of data set
Relative performance
Random − 1.0 Pop. − 0.6 G−S − 0.6 G−G − 0.6 Pop. − 1.0 G−S − 1.0 G−G − 1.0
(b)99/12/08
Figure1: Experimentswith1,000objects
with 10,000objects. Due to memory limitations, we
werenotableto runGreedy-Global forallthetopolo-
gies, thereforeitisnotshownontheplots.
The resultsare verysimilarto theresultsfrom the
previous experiment. Random is still the worst in
termsofperformance,butthedierencebetweenPop-
ularity and Greedy-Single has decreased. This is be-
cause as thenumber ofobjectsgrows, theindividual
object popularities will get smaller. Therefore, the
contribution of an individual object's retrieval cost
to theglobal cost(2) will get proportionally smaller.
Hence,thepopularityoftheobjectbecomesmoreim-
portantindeterminingthecost. Theproductofpop-
ularity and distance used by Greedy-Single is still a
slightlybetterindicator,butthedierenceisminimal.
Fromourresultswecanconcludethatthebestper-
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Cache size, % of data set
Relative performance
Random − 1.0 Pop. − 0.6 G−S − 0.6 Pop. − 1.0 G−S − 1.0
(a)99/12/08
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Cache size, % of data set
Relative performance
Random − 1.0 Pop. − 0.6 G−S − 0.6 Pop. − 1.0 G−S − 1.0
(b)99/12/11
Figure2: Experimentswith10,000objects
formance is obtained when the object replication is
coordinatedbyasinglesource,inourcase,thiswould
be the CDN. The dierence in performance between
Greedy-Global andtheothertwoheuristicsisquitesig-
nicant,especiallyforsmallZipf-parametervalues. In
Figure1wecansee thatthelargestimprovementsin
performanceare upto24%. This resultshowsthat a
CDNshoulduseacoordinatedreplicationstrategyand
notlettheCDN serversactontheirown.
6 Peer-to-Peer Content Distri-
bution
Wenowintroduceanewformofcontentdistribution,
namelypeer-to-peercontentdistribution. Peer-to-peer
tent distribution and are mainly used to share indi-
vidual les between users. In peer-to-peer networks,
such asNapster [11],orGnutella[7],individual users
decideto share leswith others. With the help of a
directoryservice,userscan determinewhere dierent
lescanbedownloadedfrom. While this newmodel
does not directly compete with the traditional Web
browsingmodel,itisestablishingitselfasameansfor
distributing larger les, such as applications or mu-
sic,betweenusers. Becausepeer-to-peernetworks are
madeupof individualusers,wecannotusestrategies
which requireglobal information and coordinationof
nodes as with CDNs. Instead we must restrict our-
selvestostrategiesthat needonlylocallyavailablein-
formation;oneexampleofsuch strategyis thePopu-
larity heuristic.
As before, weassume that oneAS in ournetwork
correspondsto one ISP. We canviewthe storageca-
pacityinanASastheaggregationofthepeer-to-peer
storageoered by the users in that AS. In asimilar
vein,x
ij
=1meansthatatleastoneuserinASihasa
copyofobjectj. Weassumethatthecostofretrieving
objectsfromother usersin thesameAS isnegligible;
thisisconsistentwithourdenitionofretrievalcostin
Section3. Inthissectionwewillinvestigatethebene-
tsofcooperationbetweentheusersin dierentASs.
Our goalis to see if the users in apeer-to-peer net-
workcouldgainanythingfromcooperatingwithother
nearbyusers.
We will now develop a cooperation model for
peer-to-peer content distribution under the popular-
ity heuristic. Forsimplicityand ease of notation,we
shall assumethroughout this section that all objects
havethesamesize,i.e.,b
j
=b forj=1;:::;J.
Consider two ASs, A and B. Denote the shortest
path between them by D
AB
. Let K be the number
ofobjectsthat each AScanstore. Wedonotassume
anythingabouttherelationshipbetweenthetwoASs,
except that the distance between them is D
AB . In
particular, we do not assume that one is the access
provideroftheotherone. Weassumethat
A
=
B .
If both ASs act independently, they would both
cache the K most popular objects and the average
numberofhopsforrequestsfromAandB wouldbe
A
A +
B J
X
j=K+1 p
j d
Aj +
A
A +
B J
X
j=K+1 p
j d
Bj
= 1
2 J
X
j=K+1 p
j d
Aj +
1
2 J
X
j=K+1 p
j d
Bj
(3)
We now consider the case where the two ASs co-
operate and do not necessarily both store the same
of theL most popular objects(L K), and that in
additionAstoresobjects(L+1);:::;K,andB stores
objects (K+1);:::;(2K L). Because the request
ratesareidentical,itdoesnotmatterhowtheobjects
(L+1);:::;(2K L) are shared between A and B.
Notethatbecauseofthesamereason,weonlyneedto
considerreplicatingobjectsinAandB andnotinthe
intermediatenodes.
TheaveragenumberofhopsforrequestsfromAand
B underthisschemeis
1
2 2K L
X
j=K+1 p
j D
AB +
1
2 J
X
j=2K L+1 p
j d
Aj
+ 1
2 K
X
j=L+1 p
j D
AB +
1
2 J
X
j=2K L+1 p
j d
Bj
= 1
2 2K L
X
j=L+1 p
j D
AB +
1
2 J
X
j=2K L+1 (p
j d
Aj +p
j d
Bj ) (4)
Thedierencebetween(3)and(4)is
1
2 2K L
X
j=K+1 (p
j d
Aj +p
j d
Bj )
1
2 2K L
X
j=L+1 p
j D
AB (5)
If (5) is greater than zero, then it is better for A
andB to cooperate. Byassuming d
Aj
=d
Bj
=d
avg ,
wecancalculate the valueof d
avg
where cooperation
becomesthepreferredstrategy. Theequationbecomes
1
2 2K L
X
j=K+1 (p
j d
avg +p
j d
avg )
1
2 2K L
X
j=L+1 p
j D
AB
>0 (6)
Solvingford
avg
,weget
d
avg
>
D
AB
2 P
2K L
j=L+1 p
j
P
2K L
j=K+1 p
j
(7)
If(7)holds,thenitisbetterforAandBtocooperate.
Becaused
avg
islikelytobereasonablystableoverlong
intervalsand becausep
j
'sare known tobothparties,
A and B can use (7) as a quick test to see whether
theycouldgainanythingbycooperating. Forthetest,
AandB eitherneedtospecifythevalueofLorverify
equation(7)overseveralvaluesofL.
In the following, we will consider two values for
D
AB
, namely 1and 2. If D
AB
is equal to 1,then A
andB areneighbors. Weplotted (5)forseveralZipf-
parametervalues,averagedistancesd
avg
,andvaluesof
K,andinallcases,cooperationisalmostalwayssupe-
riortoanon-cooperatingstrategy. Insomeinstances,
however,thedierencebetweenthetwoissmall.
5 10 15 20 25 30 35 40 45 50
−0.25
−0.2
−0.15
−0.1
−0.05 0 0.05 0.1 0.15 0.2 0.25
Number of shared objects, L
Difference in hops
d avg = 3 d avg = 4 d avg = 5 d avg = 6
(a)Zipf0.6
5 10 15 20 25 30 35 40 45 50
−0.25
−0.2
−0.15
−0.1
−0.05 0 0.05 0.1 0.15 0.2 0.25
Number of shared objects, L
Difference in hops
d avg = 3 d avg = 4 d avg = 5 d avg = 6
(b)Zipf0.8
Figure 3: Gain from cooperation for K = 50, 1000
objects
In Figure 3we show typical plots of (5). In these
plots,wehave1,000objectsavailableandbothnodes
havecapacity to store 50 objects, or 5% of the data
set. Weshowtwoplotsfortwodierentvaluesofthe
Zipf-distributionparameter. Onthex-axisweplotthe
valueofL,i.e.,thenumberofobjectsstoredatbothA
andB,andonthey-axisweshowthevalueof(5),that
is, the dierence between individual and cooperative
strategies in hops. If the dierence is positive, then
cooperationisbetter.
As wecanseefrom theplotsin Figure3,there are
always somevalues of L for which cooperation gives
better results, but in somecases thegains are small.
Wecanseethatasd
avg
getssmaller,thegainsbecome
smaller. Thisis nosurprise sinceasmall d
avg means
that the requested objectsare typically already very
A C
B
Figure4: 3-waycooperation
closeand cooperationwould nothelp much. Wealso
seethat asd
avg
increases,thepotentialgainsincrease
signicantly.
Eventhoughagainof0.2hopsmaynotseemmuch,
it is important to note that the dierence between
Popularity heuristic and Greedy-Single was typically
less than0.05hops. Thedierence betweenPopular-
ity andGreedy-Global was0.1{0.15hops. Therefore,
two cooperating nodes using the Popularity heuris-
ticwouldobtainsignicantlybetterperformancethan
theycouldhopetoobtainbyactingindependently. We
canconcludethat cooperation ismuch more eÆcient
atreducing the cost thanchangingaheuristicfrom a
popularitybasedheuristicintoagreedyone.
If D
AB
is equal to 2, then A and B are separated
byone AS.Oneexampleof thiscaseisshown inFig-
ure4where wehavethree nodes,oneparentandtwo
children. In thiscase, thecooperation would happen
betweenthe children. Figure 5 shows thegraphs for
thiscase.
ComparingFigures3and5wecanseethatthegains
get smaller as D
AB
increases. When D
AB
increases
evenfurther,cooperationwillnolongerbebenecialto
AandB. Thisistobeexpected,sinceinthenetwork
topologiesweused, theaveragedistance betweenleaf
nodeswasaround4andthereforethedistancebetween
A and B would be roughlythe sameas thedistance
totheoriginserver.
We also investigated cases where there are a very
largenumberofobjectsinthenetwork. Inthesecases,
the general form of the gains matches those in Fig-
ures 3 and 5, but the actual gain is slightly lower.
Thisisbecausethestoragenodescanonlyholdavery
smallfraction of theobjects,and even twonodesco-
operating cannot hold enough objects to reduce the
average number of hops signicantly. However, even
in these cases, cooperation yielded a smaller average
number of hops than not cooperating. For example,
with1millionobjectsandKequalto2000,themaxi-
mumgainwasaround0.2hopsasopposedto0.27with
1000objectsandK=50(ford
avg
=6).
5 10 15 20 25 30 35 40 45 50
−0.9
−0.8
−0.7
−0.6
−0.5
−0.4
−0.3
−0.2
−0.1 0 0.1
Number of shared objects, L
Difference in hops
d avg = 3 d avg = 4 d avg = 5 d avg = 6
(a)Zipf0.6
5 10 15 20 25 30 35 40 45 50
−0.9
−0.8
−0.7
−0.6
−0.5
−0.4
−0.3
−0.2
−0.1 0 0.1
Number of shared objects, L
Difference in hops
d avg = 3 d avg = 4 d avg = 5 d avg = 6
(b)Zipf0.8
Figure 5: Gain from cooperationfor D
AB
= 2,K =
50, 1000objects
7 Related Work
Related work on replicating content has mostly con-
centratedontheproblemofplacingthereplicaservers
foroneoriginserver. Inthispaperweconsideramore
global casewhere wehavecontentfrom severalorigin
serversandwedecidewhichobjectstoreplicateonthe
replicaservers.
Inboth[10]and[3],theauthorsconsider theprob-
lemof placingreplicas foroneoriginserveron atree
topology. While the real Internet topology is not a
tree, this simplerapproach allowsthe authors to de-
velopoptimalalgorithms. Thetree-approachmaynot
generalize, because it would require that the trees
fordierent originserversoverlapat thereplica sites
tionofthereplicasites. Also,in [10],thealgorithmis
ofahighcomputationalcomplexityO(N 3
M 2
).
In[14]theauthorspresenttheiralgorithmsforplac-
ing server replicas in a CDN. They assume that the
replicas are complete replicas and they do notstudy
replicating individual objects; in our work we make
replicationdecisionsonaper-objectgranularity. They
formulatetheproblem as theNP-completeK-median
problem,developheuristicsandevaluatetheirperfor-
mance.Theyonlyconsiderplacingreplicasforasingle
originserver;ourheuristics replicateobjectsfrom all
theoriginservers. Alloftheirheuristicsrequireinfor-
mation about the network topology as well as client
requestloads. Also,intheirwork,allthereplicas act
independentlyandtheydonotstudyanycooperating
schemes.
In [9] the authors consider the placement of inter-
ceptingproxiesinsidethenetworktoreducethedown-
loadtime. Theypresent optimalsolutionsfor simple
topologies,suchaslineandring,andconsiderthecase
ofplacingproxiesforasingleserverinatreetopology.
Relyingon intercepting proxiesrequires that routing
isstable duringthelifetimeoftheconnection.
8 Conclusion
In this paper we have studied the problem of opti-
mally replicating objects in CDN servers. We treat
eachAS asanodewithnite capacityforstoringob-
jects. Our optimization problem is to replicate ob-
jectssothatwhenclientsfetch objectsfromthenear-
estCDNserver,theaveragenumberof ASstraversed
is minimized. We haveformulated this problem asa
combinatorial optimization problem and have shown
itto beNPcomplete.
Wehavedevelopedfournaturalheuristicsandcom-
pared them numerically using real Internet topol-
ogy data. Our results show that the best perform-
ingheuristicis Greedy-Global which has alltheCDN
serverscooperating. Thedierenceinperformancebe-
tweenGreedy-Global andthesimplerheuristicswasup
to24%.
Wehavealsostudied peer-to-peercontentdistribu-
tionanddevelopedamodelforstudyingthebenetsof
cooperationbetweennodes. Ourevaluationoftheco-
operationmodelshowsthatnodesusingsimpleheuris-
ticsandintelligentcooperationcangetsignicantper-
formancegains.
The eld of content distribution and peer-to-peer
networks hassignicantpotentialforfuture research.
Ourfutureworkwilllookmorecloselyintointer-node
cooperationandinvestigatetheoptimizationproblem
achievable performance. We also plan to extend our
costmodeltoincludeotherimportantfactors,suchas
networktraÆc orserverload.
Acknowledgements
Theroutingdatasummaries usedin constructingthe
network topologies were provided by National Sci-
ence Foundation Cooperative Agreement No. ANI-
9807479, and the National Laboratory for Applied
NetworkResearch. Wewould alsoliketo thankAnat
Bremler-BarrofTelAvivUniversityforthediscussions
aboutthetopologymodels.
References
[1] Akamai. <http://www.akamai.com>.
[2] L. Breslau, P. Cao, L. Fan, G. Phillips, and
S. Shenker. Web caching and Zipf-like distribu-
tions: Evidenceand implications. InProceedings
of IEEE Infocom, New York, NY, March 21{25,
1999.
[3] I. Cidon, S. Kutten, and R. Soer. Optimal
allocation of electronic content. In Proceedings
of IEEE Infocom, Anchorage, AK, April 22{26,
2001.
[4] DigitalIslandInc. <http://www.digisle.net>.
[5] L. Gao. On inferring autonomous system re-
lationships in the internet. In Proceedings of
IEEEGlobalInternet,SanFrancisco,CA,Novem-
ber28{30,2000.
[6] M. R. Garey and D. S. Johnson. Computers
andIntractability: AGuidetotheTheory ofNP-
Completeness. Freeman,1979.
[7] Gnutellawebsite.
<http://www.gnutella.co.uk/>.
[8] S. Jin and A. Bestavros. GreedyDual* web
cachingalgorithm: Exploitingthetwosourcesof
temporallocalityinwebrequeststreams. InPro-
ceedingsof 5thWebCachingandContentDistri-
bution Workshop, Lisbon, Portugal, May 22{24,
2000.
[9] P. Krishnan,D. Raz, and Y.Shavitt. Thecache
location problem. IEEE/ACM Transactions on
Networking,8(5):568{582,October2000.
On the optimalplacementof web proxiesin the
internet. In Proceedings of IEEE Infocom, New
York,NY,March21{25,1999.
[11] Napster. <http://www.napster.com>.
[12] NLANR measurement and operations analysis
team. <http://moat.nlanr.net>.
[13] V.N.PadmanabhanandL.Qiu.Thecontentand
accessdynamicsofabusywebsite: Findingsand
implications. InProceedingsofACMSIGCOMM,
Stockholm, Sweden, August 28 { September 1,
2000.
[14] L. Qiu,V.N.Padmanabhan,andG.M.Voelker.
On the placement of web server replicas. In
Proceedings of IEEE Infocom, Anchorage, AK,
April22{26,2001.
[15] Routeviews projecthomepage.
<http://www.antc.uoregon.edu/rout e-vi ews/>.