The replication theory and algorithms in the previous sections maximize the availability
Figure 7.13: Hit-rate as a function of L and K for Zipf = .8 and 160 objects of per-node
storage. Panels: (a) node up probability .2; (b) node up probability .5; (c) node up
probability .9.
For example, suppose that j is among the most popular objects, and that $Y_1$, the
first-place winner for j, is always up. Then the algorithms of the previous sections will
create exactly one replica of j and store it in $Y_1$.
In this section we modify the theory and algorithms of the previous sections to better
respond to hot spots. First consider the optimization theory of Section 7.4. Suppose now
that each peer i announces a value $\sigma_i$, which is the maximum average rate at which
it is willing to serve content to the rest of the community. Also let $\lambda$ be the
aggregate request rate for objects.
Suppose a copy of object j is in peer i. Let us calculate the transmission rate of peer
i due to object j. We suppose that the aggregate transmission rate (across all peers) for
object j is equally shared among all the up peers that have a copy of j. Let $x_{kj}$
equal 1 if peer k holds a copy of object j and 0 otherwise, let $p_k$ be the probability
that peer k is up, and let $q_j$ be the request probability of object j. The average
number of up peers with a copy of j is $\sum_k p_k x_{kj}$. Thus the transmission rate
for peer i due to type-j objects is
$$\frac{\lambda q_j}{\sum_k p_k x_{kj}}$$
Summing up over all objects j held by peer i gives the following constraint:
$$\sum_{j=1}^{J} \frac{\lambda q_j x_{ij}}{\sum_k p_k x_{kj}} \le \sigma_i \qquad (7.5)$$
Therefore, the integer programming formulation of the optimal replication problem with
explicit load balancing is to maximize (7.1) subject to (7.2) and (7.5). We will address
this problem in our future research.
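The load-balancing constraint can be checked numerically for a candidate placement. The following is a minimal sketch, assuming the equal-sharing model described above; the names `x`, `p`, `q`, `lam`, and `sigma` are chosen here for illustration and are not taken from the original formulation.

```python
def peer_loads(x, p, q, lam):
    """Average transmission rate of each peer under equal sharing.

    ASSUMPTIONS (illustrative names, not from the source):
      x[i][j] = 1 if peer i holds a copy of object j, else 0
      p[i]    = probability that peer i is up
      q[j]    = request probability of object j
      lam     = aggregate request rate for objects
    """
    num_peers = len(x)
    loads = [0.0] * num_peers
    for j in range(len(q)):
        # Average number of up peers that hold a copy of object j.
        expected_up_holders = sum(p[k] * x[k][j] for k in range(num_peers))
        if expected_up_holders == 0:
            continue  # no copy in the community, so no peer serves j
        per_holder_rate = lam * q[j] / expected_up_holders
        for i in range(num_peers):
            if x[i][j]:
                loads[i] += per_holder_rate
    return loads


def satisfies_load_constraint(x, p, q, lam, sigma):
    """True if every peer's load stays within its announced maximum sigma[i]."""
    return all(load <= cap for load, cap in zip(peer_loads(x, p, q, lam), sigma))
```

For instance, with two peers and two objects, `peer_loads([[1, 0], [1, 1]], [0.5, 1.0], [0.75, 0.25], 12)` charges the shared load of the first object to both peers and the load of the second object to the second peer only.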
Now consider the case when all the $p_i$'s are equal, with common value p. A natural
constraint is that the average transmission load due to any object j placed on any peer
be less than some fixed value, say, $\delta/p$. This constraint can easily be included in
the dynamic programming formulation. (However, we recommend rounding the right-hand side,
thereby allowing for zero copies of less popular objects.)
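To see why rounding matters, the homogeneous constraint can be written as a lower bound on the number of copies. The following is a sketch under stated assumptions: $n_j$ (the number of copies of object j), $\lambda$ (the aggregate request rate), $q_j$ (the request probability of j), and $\delta$ (the numerator of the fixed bound) are symbols chosen here for illustration.

```latex
% With all up probabilities equal to p, the average number of up peers
% holding object j is p n_j, so each holder carries an average load of
\[
  \frac{\lambda q_j}{p\, n_j} \;\le\; \frac{\delta}{p}
  \quad\Longleftrightarrow\quad
  n_j \;\ge\; \frac{\lambda q_j}{\delta}.
\]
% Illustrative numbers (hypothetical): \lambda = 100, \delta = 2.
%   q_j = 0.05  gives  n_j \ge 2.5
%   q_j = 0.004 gives  n_j \ge 0.2
```

The common factor $1/p$ cancels, which is why the bound is convenient to state as $\delta/p$. Under one reading of the recommendation, rounding the right-hand side to the nearest integer turns the second bound into $n_j \ge 0$, so an unpopular object is not forced to have a copy, whereas rounding up would require at least one copy of every object.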
Adaptive Algorithms
Replicate: When the peers holding a hot object are overloaded, the peers should
push the object to the next-place winners. For example, if the first- and
second-place winners are overloaded with requests for object o, they should push a copy
of o to third (and possibly fourth, fifth, etc.) place winners.
Spread Requests: When a peer X wants an object o, it should ping the top-K
winners. If more than one of the K peers has o, it should choose one at random and
request the object from that peer. In the case of content caches, if no peer has o,
X should request o from the first-place winner $Y_1$ (which will request o from outside
the community on behalf of X).
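The request-spreading rule can be sketched as follows. This is an illustrative sketch only: `choose_source`, `has_object`, and `is_content_cache` are hypothetical names, and the ping is modeled as a simple callable rather than a network exchange.

```python
import random


def choose_source(top_k_winners, has_object, is_content_cache=True, rng=random):
    """Pick the peer from which X should request object o.

    top_k_winners: peers ordered by winner placement (first-place winner first).
    has_object:    callable peer -> bool, standing in for the ping reply
                   (ASSUMPTION: a down peer simply answers False).
    Returns the chosen peer, or None when the request is a miss.
    """
    holders = [peer for peer in top_k_winners if has_object(peer)]
    if holders:
        # Spread the load: choose uniformly at random among the holders.
        return rng.choice(holders)
    if is_content_cache and top_k_winners:
        # No peer has o: the first-place winner fetches it from outside.
        return top_k_winners[0]
    return None  # content club: nothing can be fetched from outside
```

Choosing uniformly among the holders is what spreads the load: once the Replicate step has created additional copies, each copy receives on average an equal share of the requests.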
We propose the following algorithm for replication. When a peer Y sees that it is
overloaded with transmissions for an object o, it determines the current top-R winners
and asks these winners if they have o. Let $i_1, i_2, \ldots, i_r$ be the peers that
indicate that they do not have o, ordered by their winner placement for object o. Y will
then push o into $i_1, \ldots, i_s$ ($s \le r$), with the value of s depending on the
extent of overload and the number of top-K peers that currently have o.
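The push step can be sketched as below. The text leaves the exact choice of s open, so `extra_copies_needed` is a hypothetical rule introduced here for illustration only (it scales the number of holders in proportion to the overload, assuming equal sharing); all function and parameter names are likewise assumptions.

```python
import math


def extra_copies_needed(observed_load, capacity, current_holders):
    """HYPOTHETICAL rule for s: add enough holders that, under equal
    sharing, the per-holder load would drop to at most `capacity`."""
    if observed_load <= capacity:
        return 0
    target_holders = math.ceil(current_holders * observed_load / capacity)
    return target_holders - current_holders


def select_push_targets(top_r_winners, has_object, num_copies):
    """Return i_1, ..., i_s: the best-placed winners lacking o, with s <= r.

    top_r_winners: peers ordered by winner placement for object o.
    has_object:    callable peer -> bool, standing in for the query reply.
    """
    # i_1, ..., i_r: winners that report not having o, in placement order.
    missing = [peer for peer in top_r_winners if not has_object(peer)]
    s = max(0, min(num_copies, len(missing)))
    return missing[:s]
```

A peer Y that observes a load of 30 against a capacity of 10, with two current holders, would ask for four more copies under this rule; if only three of the top-R winners lack o, all three receive a copy.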
Chopping Objects into Small Pieces
Given that the focus of this chapter is on large multimedia objects, another approach to
load balancing is to chop up each object into small pieces, and give each piece a unique
name. This will cause each new object introduced into the community to be scattered
among the peers of the community. When a peer accesses an object, the transmission load
will be shared by a large number of peers.
One drawback of this approach is that if only one piece is unavailable from the peer
community, then there will be a miss. Such a miss event is particularly unfortunate for a
content club, since the missing piece cannot be retrieved from the outside. Therefore, it
is useful to introduce redundant pieces so that, say, any n of n+m pieces suffice to
reconstruct the object. This will be the subject of future research.
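The benefit of redundant pieces can be illustrated with a small calculation. This is a sketch under a simplifying assumption that is not made in the text: each piece is available independently with the same probability. The function and parameter names are hypothetical.

```python
from math import comb


def object_hit_probability(n, m, piece_availability):
    """Probability that at least n of the n + m pieces are available.

    ASSUMPTION: pieces are available independently, each with
    probability `piece_availability`. With m = 0 every piece is required.
    """
    total = n + m
    a = piece_availability
    # Binomial tail: sum over k = n, ..., n + m available pieces.
    return sum(
        comb(total, k) * a**k * (1 - a) ** (total - k)
        for k in range(n, total + 1)
    )
```

Without redundancy (m = 0) the expression reduces to the piece availability raised to the power n, which shrinks quickly as an object is chopped into more pieces; adding redundant pieces raises the probability because up to m missing pieces can be tolerated.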
7.8 Conclusion
In this chapter we have studied content replication in peer-to-peer communities. We
distinguish between two types of coordinated peer communities: content caches, for which
content can be retrieved from outside the community if unavailable from within the
community; and content clubs, for which all content must be retrieved from inside the
community. We have formulated an integer programming problem that provides us with the
exact optimal solution for both content caches and content clubs. We have developed
several adaptive, distributed algorithms for replicating content on-the-fly. Through
extensive experiments, we have found that our algorithms combined with a distributed
least-frequently-used replacement policy provide near-optimal performance, both in terms
of hit-rate and number of replicas.
The research in this chapter provides many opportunities for future research. We
plan to explore integer programming heuristics to better estimate the optimal solution.
We will also study whether chopping the large objects into small pieces can help improve
performance.
Distribution of Layered Encoded Video
8.1 Overview
The efficient distribution of stored information has become a major concern in the
Internet, which has increasingly become a vehicle for the transport of stored video.
Because of the highly heterogeneous access to the Internet, researchers and engineers
have argued for layered encoded video. In this chapter we investigate delivering layered
encoded video using caches. Based on a stochastic knapsack model we develop a model for
the layered video caching problem. We propose heuristics to determine which videos and
which layers in the videos should be cached. We evaluate the performance of our
heuristics through extensive numerical experiments. We also consider two intuitive
extensions to the initial problem.
8.2 Introduction
In recent years, the efficient distribution of stored information has become a major
concern in the Internet. In the late 1990s numerous companies, including Cisco, Microsoft,
Netscape, Inktomi, and Network Appliance, began to sell Web caching products, enabling
ISPs to deliver Web documents faster and to reduce the amount of traffic sent to and from
other ISPs. More recently the Internet has witnessed the emergence of content
distribution network companies, such as Akamai and Sandpiper, which work directly with
content providers to cache and replicate the providers' content close to the end users.
In parallel to all of this caching and content distribution activity, the Internet has
increasingly become a vehicle for the transport of stored video. Many of the Web caching
and content distribution companies have recently announced new products for the efficient
distribution of stored video.
Access to the Internet is, of course, highly heterogeneous, and includes 28K modem
connections, 64K ISDN connections, shared-bandwidth cable modem connections, xDSL
connections with downstream rates in the 100K-6M range, and high-speed switched Ethernet
connections at 10 Mbps. Researchers and engineers have therefore argued that layered
encoded video is appropriate for the Internet. When a video is layered encoded, the number
of layers that are sent to the end user is a function of the user's downstream bandwidth.
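As a small illustration of this dependence, the sketch below selects how many layers fit into a given downstream rate. The function name and the layer rates used in the example are hypothetical and are not taken from this chapter; the sketch only assumes that layers must be delivered in order, base layer first.

```python
def layers_for_bandwidth(layer_rates_kbps, downstream_kbps):
    """Number of layers (base layer first) whose cumulative rate fits
    within the user's downstream bandwidth.

    ASSUMPTION: layer k is only useful if layers 1..k-1 are also sent,
    so layers are added in order until the next one no longer fits.
    """
    cumulative_rate = 0.0
    layers_sent = 0
    for rate in layer_rates_kbps:
        if cumulative_rate + rate > downstream_kbps:
            break
        cumulative_rate += rate
        layers_sent += 1
    return layers_sent
```

With hypothetical layer rates of 64, 128, 256, and 512 kbps, a 64K ISDN user would receive only the base layer, while a 10 Mbps Ethernet user would receive all four layers.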
An important research issue is how to efficiently distribute stored layered video from
servers (including Web servers) to end users. As with Web content, it clearly makes sense
to insert intermediate caches between the servers and clients. This will allow users to
access much of the stored video content from nearby servers, rather than accessing the
video from a potentially distant server. Given the presence of a caching and/or content
distribution network infrastructure, and of layered video in origin servers, a
fundamental problem is to determine which videos and which layers in the videos should be
cached. Intuitively, we will want to cache the more popular videos, and will want to give
preference to the lower base layers rather than to the higher enhancement layers.
In this chapter we present a methodology for selecting which videos and which layers
should be stored at a finite-capacity cache. The methodology could be used, for example,
by a cable or ADSL access company with a cache at the root of the distribution tree.
Specifically, we suppose that the users have high-speed access to the cache, but the cache
has limited storage capacity and a limited bandwidth connection to the Internet at large.
For example, the ISP might have a terabyte cache with a 45 Mbps connection to its parent
ISP. Thus, the video caching problem has two constrained resources, the cache size and the
transmission rate of the access link between the ISP and its parent ISP. Our methodology
is based on a stochastic knapsack model of the 2-resource problem. We suppose that the
cache operator has a good estimate of the popularities of the video layers. The problem,
in essence, is to determine which videos and which layers within the video should be
cached so that customer demand can best be met.
This chapter is organized as follows. In Section 8.3 we present our layered video
streaming model. In Section 8.4 we present our utility heuristics and evaluate their
performance. Section 8.5 extends our caching model by adding the possibility to negotiate
the delivered stream quality. Section 8.6 considers a queueing scheme for managing client
requests. Section 8.7 considers the usefulness of partial caching. Section 8.8 presents an
overview of related work and Section 8.9 concludes the chapter.