Dealing with Hot Spots - Internet content distribution

The replication theory and algorithms inthe previous sections maximize the availability

0

(a)Nodeupprobability.2

0

(b)Nodeupprobability.5

0

(c)Nodeupprobability.9

Figure 7.13: Hit-rate as function of L and K for Zipf = .8 and 160 objects of per-node

storage

example, suppose that j is among themost popular objects, and that Y

, the rst-place

winnerforj,isalwaysup. Thenthealgorithmsoftheprevioussectionswillcreateexactly

one replica for o and store it in Y

In this sectionwe modify the theoryand algorithms oftheprevious sections to better

respondto hot spots. First considerthe optimization theoryofSection 7.4. Supposenow

that each peer iannounces a value

, which is the maximumaverage rate at which it is

willing to servecontent to therest of the community. Alsolet betheaggregate request

rate forobjects.

Suppose acopy ofobject j is inpeeri. Let us calculate thetransmission rate of peer

i due to object j. We suppose thatthe aggregate transmissionrate (across all peers) for

object j is equally shared among all the up peers that have a copy of j. The average

number of up peers with a copy of j is

. Thus the transmission rate for peer i

due to type-j objects is

Summing upoverall objectsj givesthefollowing constraint:

Therefore,theintegerprogrammingformulationoftheoptimalreplicationproblemwith

explicitloadbalancingistomaximize(7.1)subjectto(7.2)and(7.5). Wewilladdressthis

probleminour future research.

Now consider the case when all the p

's are equal. A natural constraint is that the

average transmissionloaddue to anyobjectj placedon anypeer be lessthan some xed

value, say,

=p. Thisconstraintcan easily be included inthedynamicprogramming

formulation. (However, we recommend roundingtheright-handside, thereby allowing for

zerocopies of lesspopular objects.)

Adaptive Algorithms

Replicate: When the peers holding a hot object are overloaded, the peers should

push the object to the next-place winners. For example, if the rst- and

second-place winners areoverloaded with requestsfor object o, they should push acopy of

oto third (and possibly fourth,fth, etc.) place winners.

Spread Requests: When a peer X wants an object o, it should ping the top-K

winners. Ifmorethan oneof theK peershaso, itshouldchooseone atrandom and

request the object from that peer. In the case of content caches, if no peer has o,

X shouldrequest ofrom therst-place winnerY

(whichwill requestofromoutside

thecommunity onthe behalfofX.

We propose the following algorithm for replication. When a peer Y sees that it is

overloadedwithtransmissionsforanobjecto,itdeterminesthethecurrenttop-R winners

andasksthesewinnersiftheyhaveo. Leti

;:::;i

bethepeersthatindicatethatthey

do not have o, ordered by their winner placement for object o. Y will then push o into

;::: ;i

(sr),withthevalueofsdependingon theextent ofoverloadandthenumber

of top-K peersthatcurrently have o.

Chopping Objects into Small Pieces

Given the focus of this chapter is on large multimedia objects, another approach to

load balancing is to chop up each object into small pieces, and give each piece a unique

name. This will cause each new object introduced into the community to be scattered

amongthepeersofthecommunity. Whenapeeraccessesanobject,thetransmissionload

will be shared bya large numberofpeers.

One drawback of this approach is that if only one piece is unavailable from the peer

community, then there will be miss. Such a miss event is particularly unfortunate for a

content club, since the missing piece cannot be retrieved from the outside. Therefore, it

is useful to introduce redundant pieces sothat, say,any n ofn+m pieces are neededto

reconstruct the object. Thiswill be the subjectof futureresearch.

7.8 Conclusion

In this chapter we have studied content replication inpeer-to-peer communities. We

dis-tinguish between two types of coordinated peer communities: content caches, for which

content can beretrieved from outsidethecommunity ifunavailable from withinthe

com-munity; andpeerclubs,forwhichallcontentmustberetrievedfrominsidethecommunity.

Wehaveformulatedanintegerprogrammingproblemthatprovidesuswiththeexact

opti-mal solutionfor both peer cachesandcontent clubs. We havedeveloped several adaptive,

distributed algorithms for replicating content on-the-y. Through extensive experiments,

we have found that our algorithms combined with a distributed least-frequently-used

re-placement policyprovide near-optimalperformance, both intermsof hit-rateandnumber

of replicas.

The research in this chapter provides for many opportunities for future research. We

plan to explore integer programming heuristics to better estimate the optimal solution

will also study if chopping the large objects into small pieces can help in improving the

performance.

Distribution of Layered Encoded

Video

8.1 Overview

TheecientdistributionofstoredinformationhasbecomeamajorconcernintheInternet

which hasincreasinglybecome a vehiclefor the transport ofstored video. Because ofthe

highly heterogeneous access to the Internet, researchers and engineers have argued for

layered encoded video. In this chapter we investigate delivering layered encoded video

using caches. Based on a stochastic knapsack model we develop a model for the layered

video cachingproblem. We proposeheuristicstodetermine whichvideosandwhichlayers

in the videos should be cached. We evaluate the performance of our heuristics through

extensive numerical experiments. We also consider two intuitive extensions to theinitial

problem.

8.2 Introduction

In recent years, the ecient distribution of stored information has become a major

con-cern inthe Internet. Inthe late 1990s numerous companies including Cisco, Microsoft,

Netscape,Inktomi, andNetworkAppliancebegantosellWebcachingproducts,enabling

ISPsto deliverWebdocumentsfasterandtoreducetheamountoftracsent toandfrom

other ISPs. More recently the Internet has witnessed the emergence of content

distribu-tionnetwork companies,such asAkamaiandSandpiper, whichworkdirectlywithcontent

providerstocacheandreplicatetheproviders'contentclosetotheendusers. Inparallelto

allofthiscachingandcontent distributionactivity,theInternethasincreasinglybecomea

vehicleforthetransportofstoredvideo. ManyoftheWebcachingandcontentdistribution

companies have recently announced new products for the ecient distribution of stored

video.

Access to the Internet is, of course, highly heterogeneous, and includes 28K modem

connections, 64K ISDN connections, shared-bandwidth cable modem connections, xDSL

connections withdownstreamrates in100K-6M range, andhigh-speed switched Ethernet

connections at 10 Mbps. Researchers and engineers have therefore argued that layered

encodedvideoisappropriatefortheInternet. Whenavideoislayeredencoded,thenumber

of layersthat aresent to the enduser isa functionof theuser'sdownstreambandwidth.

An important research issue is how to eciently distribute stored layered video from

servers (includingWeb servers) to endusers. AswithWebcontent, itclearly makes sense

toinsertintermediatecachesbetweentheserversandclients. Thiswillallowuserstoaccess

muchofthestoredvideo contentfromnearbyservers, ratherthanaccessingthevideofrom

a potentially distant server. Given thepresence of a cachingand/or content distribution

network infrastructure,andoflayeredvideo inoriginservers, afundamental problemisto

determine which videos and which layers in the videos should be cached. Intuitively, we

will want to cache themore popularvideos, andwill want to give preferenceto thelower

baselayersrather than to the higherenhancement layers.

In this chapter we present a methodology for selecting which videosand which layers

should be stored at anite-capacitycache. Themethodology couldbeused, for example,

by a cable or ADSL access company with a cache at the root of the distribution tree.

Specically,we suppose thatthe users have high-speed accessto thecache,but thecache

haslimitedstorage capacity and alimited bandwidthconnectionto theInternet at large.

Forexample,theISPmighthaveaterabytecachewitha45Mbpsconnectiontoitsparent

ISP.Thus,thevideocachingproblemhastwoconstrainedresources,thecachesizeandthe

transmissionrateofthe accesslinkbetween theISPand itsparent ISP.Our methodology

is based on astochastic knapsack modelof the 2-resource problem. We suppose thatthe

cacheoperatorhasagoodestimateofthepopularitiesofthethevideolayers. Theproblem,

inessence,istodeterminewhichvideosandwhichlayerswithinthevideoshouldbecached

sothatcustomer demand canbestbemet.

Thischapterisorganizedasfollows. InSection8.3wepresentourlayeredvideo

stream-ingmodel. InSection 8.4wepresent ourutilityheuristics andevaluatetheir performance.

Section 8.5extendsourcachingmodelbyaddingthepossibilityto negotiatethedelivered

streamquality. Section 8.6considersaqueueingschemeformanaging clientrequests.

Sec-tion 8.7 considers the usefulness of partial caching. Section 8.8 presents an overview of

relatedwork and Section 8.9concludesthechapter.

Dans le document Internet content distribution (Page 140-146)