
From the document: Internet content distribution (Page 36-0)


2.3 Content distribution network

In early 1999 and in the years since, several companies have started to operate their
content distribution networks, or CDNs. These companies include Adero, Akamai, Digital
Island, Mirror Image, and SandPiper [2,3,16,43,76]. This new model of content distribution
has become very popular among the large content providers.

Figure 2.3 shows the architecture of a CDN. A CDN makes agreements with the content
providers (O1, O2, and O3) to distribute their content to the users. The CDN operates
content servers (C) that are typically placed near the users, for example at the dial-up
ISPs. The user requests are redirected to these content servers, which are able to serve
the content fast. The internal network of the CDN connects the origin servers and content
servers and is used to transfer content from the origin servers to content servers (or to
move content from one content server to another).

The main difficulty in creating a CDN is redirecting the clients to the content servers.
Ideally, one would want this to be completely transparent to the clients so that no
modifications to client software are needed. Modern CDNs achieve this by manipulating
information in the Domain Name System (DNS), and in the following we present the two
methods, the full replication and partial replication schemes.

Figure 2.3: Content distribution network

Because CDNs charge a high price for their services, they are mostly applicable to larger
companies wanting to distribute their content. Individuals or small organizations would
typically not be able to afford the services of a CDN, hence they would have to rely on
client-side caching to help distribute their content. CDNs are therefore not a replacement
for the traditional client-server model or client-side caching, but a complementary
approach which allows for efficient delivery of very popular content to a large number of
interested users.

2.3.1 Full Replication

In the full replication scheme the CDN takes control of the DNS mapping of the content
provider's server, say www.example.com. When a client wants to request an object from
this server, it first has to do a DNS lookup on www.example.com to get the server's IP
address. The information in the DNS for the domain example.com points to a name server
in the CDN's network. When this name server receives the client's DNS lookup request, it
determines which content server is best placed to handle this request and it returns the
IP address of that content server as the IP address of www.example.com. When the client
receives the DNS reply, it will attempt to connect to the content server. Because the
content server is closer to the client than the origin server, the client will receive the
requested object much faster.
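
The selection step performed by the CDN's name server can be sketched as follows. This is
only an illustration under stated assumptions: the prefix table, the addresses, and the
function name `resolve` are hypothetical, and a real CDN would use richer measurements
than a static prefix match on the address that sent the DNS request.

```python
import ipaddress

# Hypothetical names served by the CDN's name server.
HOSTED_NAMES = {"www.example.com"}

# Hypothetical table: source prefix of the DNS request -> content server
# assumed to be closest to that prefix (documentation address ranges).
CONTENT_SERVERS = {
    ipaddress.ip_network("192.0.2.0/25"): "198.51.100.10",
    ipaddress.ip_network("192.0.2.128/25"): "198.51.100.20",
}

# Assumed fallback when no prefix matches, for example the origin server.
DEFAULT_SERVER = "203.0.113.1"


def resolve(hostname: str, requester_ip: str) -> str:
    """Return the address handed out as the IP address of `hostname`.

    `requester_ip` is the address of the machine that sent the DNS lookup,
    which is the only location information the name server sees.
    """
    if hostname not in HOSTED_NAMES:
        raise LookupError(f"{hostname} is not served by this CDN")
    source = ipaddress.ip_address(requester_ip)
    for prefix, server in CONTENT_SERVERS.items():
        if source in prefix:
            return server
    return DEFAULT_SERVER
```

Two lookups for the same name from different source addresses can therefore receive
different answers, which is what makes the redirection transparent to the client software.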

The downside of this approach is that each content server must be able to handle all
requests: either the content server fully replicates the contents of all content providers
with which the CDN has an agreement, or the content server acts as a surrogate proxy [14].

The benefit of this mechanism is that all client requests are always sent to the content
servers. We will study this issue in more depth in Chapter 5.

2.3.2 Partial Replication

In the partial replication scheme, only a subset of the objects on the content provider's
origin server are placed on the content servers. Client redirection goes as follows. The
client retrieves the home page from the origin server www.example.com. This page may
contain references to images which have been placed on the content servers. The URLs of
these images have been changed, for example the URL http://www.example.com/title.gif
could become http://www.cdn.net/example/title.gif. From the client's point of view,
this image looks like any normal image, except that it has to open a connection to a
new server. When it opens this connection, it has to perform a DNS lookup to get the
IP address and this DNS lookup is handled in the same way as above. The client gets
redirected to a nearby content server and asks it for the image.
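
The URL change described above can be sketched as a small rewriting function. The host
name www.cdn.net and the path prefix "example" come from the example in the text; the
function name `rewrite_url` and the `replicated` set, which stands for the provider's
choice of objects to replicate, are assumptions made for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

CDN_HOST = "www.cdn.net"  # CDN host name from the example in the text
CUSTOMER = "example"      # per-provider path prefix from the example in the text


def rewrite_url(url: str, replicated: set[str]) -> str:
    """Point `url` at the CDN if its path is in the replicated set.

    URLs of objects that are not replicated are returned unchanged, so the
    client keeps fetching them from the origin server.
    """
    parts = urlsplit(url)
    if parts.path not in replicated:
        return url
    new_path = "/" + CUSTOMER + parts.path
    return urlunsplit((parts.scheme, CDN_HOST, new_path, parts.query, parts.fragment))
```

For instance, with `replicated = {"/title.gif"}`, the image URL from the text is rewritten
to the CDN host while the home page URL is left alone.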

The benefit of this mechanism is that only objects that need to be replicated are placed
on the content servers, thus reducing the storage requirements. But the downside of this
mechanism is that someone, typically the content provider, has to decide which objects
are to be replicated. This means that the system as a whole is slow to react to hot-spots
which may occur when some content on the origin server suddenly becomes extremely
popular. Also, the cost associated with opening a new connection to the content server is
non-negligible, as we show in Chapter 5.

One weakness that is shared by both replication mechanisms is that the redirection
decisions are based on the IP address of the machine which sent the DNS lookup request.
This is typically the name server for the client and may or may not be topologically close
to the client. The effectiveness of DNS-based server selection has been studied in more
detail in [79]. Their results indicate that, especially for dial-up users, the cost of DNS
redirection can be very high.

2.4 Peer-to-Peer Networks

The latest development in content distribution is peer-to-peer networks. The first
peer-to-peer network was Napster [51], which allowed users to share MP3 files with each
other. The main application for peer-to-peer networks has been file sharing, in which
users make some files available on their computers and others can download these files.
In order for users to be able to find out which users are offering which content, the
network needs some kind of a lookup service which maps object names into the machines
serving these files. Below we will discuss some possible approaches for building such a
lookup service.

What sets peer-to-peer networks apart from the traditional forms of content distribution,
caching and CDNs, is that in a peer-to-peer network every node is both a client and a
server. [...] network is served by a small minority of users and a large number of users
do not offer any files [1]. Regardless of this, peer-to-peer networks have become
extremely popular for sharing files between users.

The main problem faced by Napster was not technical, but legal. Napster was made to share
only music encoded in MP3 format and most users used it to share and retrieve copyrighted
music without permission from the copyright owners. This prompted a long legal battle
between Napster and record companies and as a result Napster was forced to shut down.

Regardless of this problem, Napster had shown the advantages of the peer-to-peer content
distribution model and several new systems and protocols have been developed, for
example, Gnutella [28], Freenet [23], FastTrack and Morpheus [21,48], and MojoNation [47].
There has also been considerable interest in the research community, shown by the large
number of peer-to-peer projects, such as CAN [63], Chord [83], Pastry [75], Tapestry [95],
OceanStore [39], and FarSite [20]. We will review these projects in detail in Chapter 7.

We will now present the main approaches for building a lookup service for a peer-to-peer
network, namely centralized, distributed, and hybrid.

2.4.1 Centralized Architecture

Figure 2.4 shows an example of a centralized peer-to-peer architecture. The centralized
architecture requires that some central authority operates a single central server. (This
server would typically be a server farm, but the users would see it as a single server.)
This central server is responsible for answering the queries, hence all the query traffic
is directed to it.

The lookup service in Napster was based on a centralized architecture. When a user
wanted to locate an object, the Napster software would contact a server, operated by
Napster, that kept track of who was online and which files each user was sharing. The
central server would perform a simple keyword search with the search terms given by the
user, and would return a list of potential matches. The user would see this list and would
have to choose one peer from whom to download the file. The list would include hints,
such as other peers' network bandwidth and round-trip times, to help users choose peers
which are close by or well connected. However, these hints were not verified, and any user
could indicate any connection bandwidth he desired. The actual file transfer would happen
directly between the peers and not via the central server.
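
A centralized lookup service of this kind can be sketched in a few lines. This is not
Napster's actual protocol or data model; the class `CentralIndex`, its method names, and
the substring-based keyword match are assumptions chosen to mirror the description above,
including the fact that the bandwidth hint is self-reported and never checked.

```python
class CentralIndex:
    """Hypothetical central server: tracks online peers and their shared files."""

    def __init__(self):
        self.files = {}  # peer -> set of file names the peer is sharing
        self.hints = {}  # peer -> self-reported bandwidth hint (unverified)

    def register(self, peer, bandwidth_hint, file_names):
        """Called when a peer comes online and announces its files."""
        self.files[peer] = set(file_names)
        self.hints[peer] = bandwidth_hint

    def unregister(self, peer):
        """Called when a peer goes offline."""
        self.files.pop(peer, None)
        self.hints.pop(peer, None)

    def search(self, *keywords):
        """Return (file name, peer, hint) for files matching every keyword."""
        words = [k.lower() for k in keywords]
        matches = []
        for peer, names in self.files.items():
            for name in names:
                if all(w in name.lower() for w in words):
                    matches.append((name, peer, self.hints[peer]))
        return sorted(matches)
```

The index only answers the query; the user still picks a peer from the returned list and
the download itself would take place directly between the two peers.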

The main drawback of this architecture is that the central server becomes a potential
bottleneck and is a single point of failure. Using a server farm we can overcome the
bottleneck problem, but if all the servers are co-located, a single network failure could
take out all the servers. The servers can be geographically dispersed and peers redirected
to them based on some criteria, but in this case the quality of service obtained by the
peers depends on which server they happened to be redirected to. In this model, each peer
would have to register with the central server when the software was installed, and also
notify the central server when the peer goes up or down, and which files it has.

Figure 2.4: Centralized peer-to-peer network. Dashed lines show query traffic and solid
lines show how objects are transferred.

The advantage of this model is that the queries are simple to perform and they do
not consume many network resources. Each client request results in one query message,
from the peer to the central server, and one longer response, the list sent from the
central server to the requesting peer.

2.4.2 Distributed Architecture

The other main lookup architecture used in peer-to-peer networks is a distributed
architecture, such as the one used in Gnutella [28]. Figure 2.5 shows a distributed
peer-to-peer network. The main advantage of a distributed architecture is that all peers
are equal and no peers hold any permanent information about which objects are stored
where; also there is no directory of the peers which are a part of the network.

When a user wants to join such a network, he must typically first obtain the IP address
(or hostname) of a peer that is already a member of the network. This could be achieved
through some well-known bootstrap nodes, and the peers can obtain their addresses either
from the DNS (the approach in CAN [63]) or simply with out-of-band methods, such as
publishing the addresses of bootstrap nodes on a web page. Note that there can be one or
[...] bootstrap node, but this still requires that the new peer is able to obtain the
address of at least one peer in the network.

Figure 2.5: Distributed peer-to-peer network. Dashed lines show query traffic and solid
lines show how objects are transferred.

The first peer-to-peer network to use the distributed architecture was Gnutella. When a
Gnutella peer wants to retrieve an object, it queries as follows. The requesting peer has
some number of neighboring peers in the peer-to-peer network (see footnote 4). It sends
its query to all of its neighbors. If a neighbor has the requested object, it will reply
to the requesting peer and inform it that it has a copy. If a neighbor does not have a
copy, it will, in turn, send the same query to all of its neighbors, excluding the
neighbor who sent the original query.

This way the query eventually propagates to all peers in the network and the requesting
peer is able to find out if any peer has a copy of the object. This also means that every
query needs to be flooded to all of the network, which puts a considerable strain on it.
Gnutella attempts to alleviate this flooding problem by using a limit (time-to-live, or
TTL) on how many times a query can be forwarded. This TTL is set by the requesting peer,
hence a user can increase it if a query did not find the object. If some user sets this
TTL very high, he could receive potentially thousands of replies which could clog up his
network connection. Hence, users will learn, through trial and error, what the best
setting for this TTL is.

Footnote 4: Neighbor relations in the peer-to-peer overlay network do not imply that the
two peers are near each other [...]

Because the query might not propagate to all peers, it is possible that a request ends in
a failure, even though the object is available from some peers. In the centralized
architecture this is not possible, because the central server is aware of all objects in
the network.
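
The flooding search and the effect of the TTL can be illustrated with a small simulation.
This is a sketch of the behavior described in the text, not of the Gnutella wire protocol:
the function `flood_query`, the dictionary-based overlay, and the rule that a peer drops a
query it has already seen are assumptions made to keep the example short.

```python
def flood_query(neighbors, holdings, start, obj, ttl):
    """Flood a query for `obj` from `start`, forwarding at most `ttl` times.

    neighbors: peer -> list of neighboring peers in the overlay
    holdings:  peer -> set of objects stored at that peer
    Returns (set of peers that reply, number of query messages sent).
    """
    seen = {start}              # peers that have already handled this query
    frontier = [(start, None)]  # (peer, neighbor the query arrived from)
    hits = set()
    messages = 0
    while frontier and ttl > 0:
        next_frontier = []
        for peer, came_from in frontier:
            for nb in neighbors[peer]:
                if nb == came_from:
                    continue    # never send the query back to its sender
                messages += 1
                if nb in seen:
                    continue    # assumed duplicate suppression
                seen.add(nb)
                if obj in holdings.get(nb, ()):
                    hits.add(nb)  # the peer replies and does not forward
                else:
                    next_frontier.append((nb, peer))
        frontier = next_frontier
        ttl -= 1
    return hits, messages
```

On a chain of four peers A, B, C, D where only D holds the object, a query from A with a
TTL of 2 stops at C and fails, while a TTL of 3 reaches D. This is the failure case noted
above: the object exists in the network but lies beyond the reach of the query.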

Most of the current research on peer-to-peer networks has been aimed at improving
query performance in distributed peer-to-peer networks [63,75,83]. We will review these
projects in more detail in Chapter 7.

2.4.3 Hybrid Architectures

FastTrack [21] and Morpheus [48] use a hybrid architecture for querying. This hybrid
architecture attempts to strike a balance between the accuracy of the centralized
architecture and the lower load of the distributed architecture. In the Morpheus
architecture, some peers have been designated as supernodes by the bootstrap node. Normal
peers are assigned a supernode by the bootstrap node. When a peer wants to request an
object, it sends its query to its supernode. Each supernode maintains a directory of all
the objects in the peers under it; this provides for Napster-like behavior within the
peers under a supernode. The supernode can also forward the query to other supernodes,
which reply directly to the requesting peer and send the addresses of peers under them
which have the requested object. This provides for a wide coverage, like Gnutella, but
with considerably fewer resources.
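
The two-level lookup can be sketched as follows. The class `Supernode` and its methods
are hypothetical and do not reflect the actual FastTrack or Morpheus protocols, which are
not specified here. As a simplification, the answers of the other supernodes are merged
into one return value instead of being sent directly to the requesting peer.

```python
class Supernode:
    """Hypothetical supernode holding a directory for the peers under it."""

    def __init__(self, name):
        self.name = name
        self.directory = {}  # object name -> set of peers under this supernode
        self.others = []     # other supernodes the query can be forwarded to

    def register(self, peer, objects):
        """Record the objects offered by a peer assigned to this supernode."""
        for obj in objects:
            self.directory.setdefault(obj, set()).add(peer)

    def local_lookup(self, obj):
        """Napster-like lookup restricted to the peers under this supernode."""
        return set(self.directory.get(obj, ()))

    def query(self, obj):
        """Answer from the local directory, then ask each other supernode once."""
        result = self.local_lookup(obj)
        for sn in self.others:
            result |= sn.local_lookup(obj)
        return result
```

Each query costs one message to the peer's own supernode plus one per forwarded supernode,
rather than one per peer as in flooding.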

There have not yet been any studies comparing the performance of the hybrid architecture
with that of the centralized or distributed architectures.

2.5 Conclusion

This chapter has presented an overview of how content distribution technologies have
evolved on the Internet in the past few years. Starting off with the basic client-server
model, the first step was client-side caching. This was initially implemented with caching
proxies, installed locally at ISPs or institutions, and later on these caches were used to
create caching hierarchies. Client-side caching did not, however, allow the content
provider any control over how the content would be cached. Content distribution networks
emerged to remedy this problem and they have become the de facto content distribution
method for most commercial web content. Finally, we also presented a new content
distribution paradigm, namely peer-to-peer networks. These networks differ from the
traditional model in that each peer in the network is both a client and a server. We also
presented different architectures for building a lookup service for a peer-to-peer network.

Client Redirection

Locating Copies of Objects

3.1 Overview

In order to reduce average delay and bandwidth usage in the Web, geographically dispersed
servers often store copies of popular objects. For example, with network caching, the
origin server stores a master copy of the object and geographically dispersed cache
servers pull and store copies of the object. With site replication, objects stored at the
master site are replicated into secondary sites. In this chapter we propose a new network
application, Location Data System (LDS), that allows an arbitrary host to obtain the IP
addresses of the servers that store a specified URL. Our networking application is an
extension to the Domain Name System (DNS), requires only small changes to the domain name
servers, and can be deployed incrementally. For the case of network Web caching, we
elaborate on our proposal to allow a cache to (i) update a distributed database when it
stores or evicts objects, and (ii) push objects to parent caches in order to improve delay
and bandwidth usage. For the case of mirrored servers, we show how a client can obtain a
list of all servers mirroring all or part of the desired site. LDS applied to partially
mirrored sites generates substantially less DNS traffic than LDS applied to caching.
Finally, we discuss how a host can use the location data in order to make intelligent
decisions about where to retrieve desired objects.

3.2 Introduction

Network caching of documents has become a standard way of reducing network traffic
and latency in the Web. Caches are currently employed in institutional, local, regional
and national ISPs. Cache hierarchies, created when caches in lower-level ISPs point to
caches in higher-level ISPs, are currently prevalent in the Internet [52,68]. Today's
cache hierarchies use static, manually configured pointers to define the hierarchy tree.
Cache hierarchies operate as follows. When a browser requests a document, it sends a
request to a leaf cache. This cache then either serves the document (if it is cached) or
forwards the request to its parent in the hierarchy. The process is repeated along a
static chain of caches until the root of the hierarchy is reached. If there is also a
cache miss at the root, the root forwards the request directly to the origin server. A
response is returned along the cache chain in the reverse direction. Cooperating caches in
cache hierarchies often use ICP (Internet Cache Protocol) to improve and enlarge the scope
of the search [91].
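
The request path through a static hierarchy can be sketched as below. The class `Cache`
and the dictionary standing in for the origin server are hypothetical, sibling queries via
ICP are left out, and the sketch assumes that each cache keeps a copy of the document as
the response travels back down the chain.

```python
class Cache:
    """Hypothetical cache with a static, manually configured parent pointer."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None marks the root of the hierarchy
        self.store = {}       # url -> cached document

    def get(self, url, origin):
        """Return (document, list of servers visited) for a request."""
        if url in self.store:
            return self.store[url], [self.name]
        if self.parent is not None:
            doc, path = self.parent.get(url, origin)  # miss: go up the chain
        else:
            doc, path = origin[url], ["origin"]       # miss at the root
        self.store[url] = doc  # assumption: keep a copy on the way back
        return doc, [self.name] + path
```

A first request for an uncached document visits every level before reaching the origin
server, whereas a repeated request is answered by the leaf cache alone.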

Caching hierarchies have several problems. First, requests for less popular documents
will experience misses at all caches in the cache chain. For deep hierarchies, these
misses lead to poor latency performance [85]; moreover, ICP can further degrade
performance, since the cache must wait for a reply from all sibling and parent caches, or
until a two-second timeout, before proceeding up the hierarchy. Second, today's caching
hierarchies are static: they do not permit a browser or a cache to choose the subsequent
cache according to current topology or traffic conditions. Manual optimizations, such as
sending requests for certain top-level domains to designated parent caches, are possible,
but even with these optimizations, the hierarchy for a given URL remains static. Third,
caching hierarchies do not allow the chain of caches to extend beyond the root cache; if
there is a miss at the root server, the request is forwarded directly to the origin
server, and never to a cache that lies somewhere in between the root server and the origin
server. This is a problem because a cache near the origin might be able to serve an object
much faster than the origin server, especially when the origin server runs on a slow
machine or has a low-bandwidth connection.

In this chapter we propose a new cooperative caching scheme that has the following
features: (1) at most two servers (including the origin server) are visited in the request
chain; (2) the chain of caches depends on the requested URL and can change dynamically
as a function of current network topology and traffic conditions; (3) an arbitrary cache
in the Internet can be queried, including a cache that is far from the browser but close
to the origin server; (4) the scheme can be incrementally deployed with minor changes to
DNS servers. Furthermore, although a thorough performance study is still required, we feel
the scheme should lead to a substantial reduction in delay and network traffic as compared
to
