Locating copies of objects using the Domain Name System

(1)

Using the Domain Name System

Jussi Kangasharju, Keith W. Ross

Institut Eurecom

Sophia Antipolis, France

fkangasha,rossg@eurecom.fr

James W. Roberts

France Telecom { CNET

Issy les Moulineaux, France

Abstract

Inordertoreduceaveragedelayandband-

width usage in theWeb, geographicallydis-

persed servers often store copies of popular

objects. Forexample,with network caching,

the origin server stores a master copy of

theobjectandgeographicallydispersedcache

servers pull and store copies of the object.

With sitereplication, objectsstored at mas-

terarereplicatedintosecondarysites. Inthis

paperweproposeanewnetworkapplication,

LocationDataSystem(LDS),that allowsan

arbitrary hostto obtainthe IP addresses of

the servers that store a specied URL. Our

networkingapplicationisanextensiontothe

Domain Name System (DNS), requires only

small changes to the domain name servers,

and can be deployed incrementally. For the

case of network Web caching, we elaborate

on our proposal to allow a cache to (i) up-

dateadistributeddatabasewhenitstoresor

evictsobjects,and(ii)pushobjectstoparent

caches in order to improve delay and band-

widthusage. Forthecaseofmirroredservers,

we showhow aclient canobtainalistof all

servers mirroring all or part of the desired

site. LDSapplied to partially mirroredsites

generatessubstantiallylessDNStraÆc than

LDS applied to caching. Finally, we discuss

howahostcanusethelocationdatainorder

to make intelligentdecisionsaboutwhere to

retrievedesiredobjects.

1 Introduction

Networkcachingofdocumentshasbecome

a standard way of reducing network traÆc

and latency in the Web. Caches are cur-

rently employed in institutional, local, re-

gional and national ISPs. Cache hierar-

chies,createdwhencachesinlower-levelISPs

pointto cachesin higher-levelISPs,are cur-

rentlyprevalentintheInternet[8,9]. Today's

cache hierarchies use static, manually con-

guredpointersto dene the hierarchytree.

Cache hierarchies operate asfollows. When

abrowserrequestsadocument,itsendsare-

questto aleafcache. Thiscachetheneither

servesthe document (if it is cached) or for-

wardsthedocumenttoitsparentinthehier-

archy. Theprocessisrepeatedalongastatic

chainofcachesuntiltherootofthehierarchy

isreached. Ifthereisalsoacachemissatthe

root, the root forwards the request directly

to the originserver. A response is returned

alongthecachechaininthereversedirection.

Cooperatingcachesincachehierarchiesoften

useICP(InternetCacheProtocol)toimprove

andenlargethescopeofthesearch[15].

Cachinghierarchieshaveseveralproblems.

First,requestsforlesspopulardocumentswill

experience misses at all caches in the cache

chain. Fordeephierarchies,thesemisseslead

to poorlatency performance [13]; moreover,

ICP can further degrade performance, since

thecache mustwait forareplyfrom allsib-

lingand parentcachesoruntil atwosecond

(2)

|theydonotpermitabrowseroracacheto

choosethesubsequentcacheaccordingtocur-

rent topology or traÆc conditions. Manual

optimizations, such as sending requests for

certaintop level domainsto designated par-

ent caches,are possible, butevenwiththese

optimizations,thehierarchyforagivenURL

remainsstatic. Third,caching hierarchiesdo

not allow the chain of caches to extend be-

yond theroot cache;if thereisamissat the

rootserver,therequestisforwardeddirectly

totheoriginserver,andnevertoacachethat

liessomewhereinbetweentherootserverand

theoriginserver. This isa problembecause

a cache nearby the origin might be able to

serve an object much faster than the origin

server,especiallywhentheoriginserverruns

on a slow machine or has a low bandwidth

connection.

Inthispaperweproposeanewcooperative

caching scheme that has the following fea-

tures. (1)Atmosttwoservers(includingthe

originserver)arevisitedintherequestchain;

(2) The chain of caches depends on the re-

questedURLandcanchangedynamicallyas

a function of current network topology and

traÆc conditions; (3) An arbitrary cache in

theInternetcanbequeried,includingacache

that isfarfrom thebrowserbut closeto the

origin server. (4) The scheme can be in-

crementallydeployed with minor changesto

DNSservers. Furthermore, althoughathor-

oughperformance study is still required, we

feel the scheme should leadto a substantial

reductionindelayandnetworktraÆcascom-

paredto traditionalhierarchicalcaching.

Our caching scheme makes use of a new

networkapplication, theLocationData Sys-

tem(LDS),whichwealsodeneinthispaper.

TheLDS,denedasanextensionofDNS,al-

lows an arbitrary host to obtain the IP ad-

dresses of the servers that store a specied

URL.TheLDSisofindependentinterest,and

canbeused forotherapplications, including

choosing the best mirrored site for a given

URL. For the case of network Web caching,

wespecifyhowacacheupdatestheLDSdis-

tributed database when it stores and evicts

objects, and how a cache pushes objects to

parent cachesin orderto improvedelay and

bandwidthusage.

caching signicantlyincreasesthenumberof

DNS messages. In this paper we also show

how LDS can be applied to replicated and

partially replicatedservers. Inthiscase, the

amountofDNSmessagesdoesnotincrease.

Thispaperisorganizedasfollows. InSec-

tion 2 we dene the LDS. In particular, we

show how DNS can be extended to provide

the LDSservice. In Section 3weshowhow

network cachescanexploit the LDS; in par-

ticular,wediscusshowthecachesupdatethe

LDSdistributeddatabase,andhowcachesat

higherlevelsin thecaching hierarchycanbe

populated. In Section 4we show howdocu-

mentandsitereplicationcanexploittheLDS.

InSection 5wediscusshowahostcanmake

routingdecisions based on the results of an

LDS query. In Section 6 we discuss related

research on cooperative caching. Section 7

presents directions for future work and Sec-

tion8concludesthepaper.

2 Location Data System (LDS)

Copies of an object, with each object ref-

erenced by the same URL, are often avail-

able from dierent servers in the Inter-

net. These servers include origin servers,

replicated servers (also called mirrored

sites) and cache servers(also called proxy

servers). The origin server for an object is

the serverat which the objectoriginates; it

alwayscontainsanup-to-datecopyoftheob-

ject. A replicated server contains copies of

objects that have been placed into it (typi-

cally manually or by pulling); typically, ob-

jectsin replicated serversare up-to-date. A

replicated server may replicate entire Web

sites or may replicate only portions of vari-

ousWebsites. Cacheserversobtaincopiesof

objectsby pullingthem ondemand. Inpar-

ticular,whenacacheserverreceivesarequest

foranobject,andiftheobjectisnotcached,

thecacheserverretrievestheobjectfroman-

otherserver(whichmaybetheobject'sorigin

server, areplicated server, oranother cache

server), storesacopy ofthe object,and for-

wardsacopytotherequestor. Acachedcopy

(3)

thispaperwerefertoallthreetypesofservers

asobject servers.

ItishighlydesirableforahostintheInter-

nettobeabletodeterminethelocations(i.e.,

theIPaddresses)ofalltheobjectserversthat

containcopiesofaspeciedURL.Inparticu-

lar,ahostwouldliketobeabletogiveanet-

workapplicationaURLandreceivefromthe

applicationalistofalltheobjectserversthat

containtheURL.Inadditiontoreceivingthe

IPaddressesofalltheserversthatcontainthe

object, it is desirable to receive information

about thefreshness of each copy, e.g., when

each copy was last modied orthe \age" of

theobject(asdened intheHTTP/1.1[2]).

In this section weoutline a new network-

ing application that provides the service of

mapping a URL to a list of object servers

that contain the URL, with each server on

thelist having associated freshnessinforma-

tion. We refer to this system as the Loca-

tion Data System (LDS). Wehavetaken

aratherpragmaticapproachindesigningthe

LDS. Our principlegoal has been to design

a systemthat can be rapidly deployed with

incremental changes to the existing Internet

infrastructure. A second goal is scalability,

i.e., asystem that is decentralized and hier-

archical. A third goal is performance, that

is,asystemthat quicklyreturnsthelocation

data while injectinga minimumof overhead

traÆcintothenetwork.

Figure 1 shows the basicmapping service

providedbyLDS.Inthegure,theclienton

the left has a URL, it gives it to the LDS

systemwhich returnsalist of objectservers

thatcontaintheobject. Insteadofthewhole

URL,theclientcanaswellgiveonlyaprex

oftheURLtotheLDSblackboxwhichthen

returns a list of object servers that contain

URLs matching the prex. In the simplest

case, the prex is only the hostname of the

URL;inthiscase,theLDSserviceisidentical

to theDomainName System,operationalin

theInternetforover20years.

Given the similarities in the service pro-

vided by both DNS and LDS, we have de-

cided to base the design of LDS on DNS.

In fact, LDS can be implemented by mak-

ing minor extensions to BINDname servers

Black LDS

List of object Box servers that contain object

LDS

URL Client

Figure1: LDSservice

and to the DNS protocol. Recall that DNS

provides a mapping of hostnames to IP ad-

dresses. (DNScan returnmorethan one IP

address for a hostname for mirrored sites.)

DNSusesahierarchyofnameserversto im-

plement a distributed database of resource

records, whereby a resource record contains

amapping.

We now present an overview of the LDS

design;weprovidemoredetailsin thesubse-

quent subsections. As does DNS, LDS uses

a hierarchy of servers to implement a dis-

tributeddatabaseofresourcerecords. Were-

ferto theLDS serversaslocationservers.

EachLDSresourcerecordmapsaURLtoan

objectserver,withassociatedfreshnessinfor-

mation. EachURLhasone(ormore)author-

itativelocation servers(To conveythe main

idea,we initially assumethat each URLhas

exactly one authoritative server.) The au-

thoritative location server contains a list of

resourcerecords for the URL, that is, a list

ofobjectserversthatcontaintheURL(along

withthefreshness information). Otherloca-

tionserversmaycontaincachedcopiesofthe

listof resourcerecords. In ourbasic design,

hostsquerylocationserversinamannerthat

isfully analogousto theDNS protocol. The

sequence of query and reply events is illus-

tratedin Figure2.

In Figure 2, C is the querying host (e.g.,

a browser), O

0

is the origin server for the

queriedobject,machinesO

1 {O

4

areotherob-

jectservers(e.g., caches), andmachinesL

1 {

L

4

arelocationservers.

Thequerying host sends a location query

to its local location server (L

1 ). If L

1 does

(4)

L ₂

C O ₀

L ₁

O ₁ O 3

O ₂

O ₄ L ₃

L ₄

1 8

2

7 9 3

9 6

9 5 4

Figure 2: Sequence of Location Queries and

Replies

sends a query to a root location server

(L

2

). If L

2

does not have the location in-

formation cached, it returns the address of

alocation server responsible for the domain

of the origin server in the query (L

3 ). L

1

sends a new location query to L

3

and if L

3

doesnothavethelocationinformation,itre-

turns the address of the authoritative loca-

tionserver(L

4

). Finally, L

1

queriesthe au-

thoritativelocationserverandreceivesthelo-

cationinformation. LocationserverL

1 then

sends the information to the querying host

andalsocachestheinformation. (In thisex-

ample we assumed that all queries between

LDSserversareiterative;recursiveandcom-

binations of recursiveand iterative can also

be used. Like DNS,LDS isnotbased onei-

therquerytypebeingused. Wealsoassumed

thatthereisoneintermediatelocationserver

betweentherootandauthoritativeservers;in

practicethere canmoreorless.)

Suppose that thereply indicated that ob-

ject servers O

0 , O

1

, and O

3

contain copies

oftheURL.Thequeryinghostthenchooses

toretrievetheURLfrom oneoftheseobject

servers. Ideally, the hostchooses theobject

serverthatwilldelivertheobjectthefastest.

(We will discuss how this choice might be

madeinSection5.)

Giventhesimilaritiesbetweenourbasicde-

sign of the LDS and the DNS, we now ad-

dress whether the LDS can implemented in

the existing DNS, or within a slightly ex-

leadto rapiddeploymentoftheLDS.

2.1 Resource Records, Query

Messages and Reply Messages

Forlocation requestsanew DNSresource

record type is needed. It is similar to the

standardA-typeresourcerecord,whichmaps

a hostname to an IP address. We now de-

scribethisnewDNSresourcerecordtype.

Each resource record of this type must

contain an object identier. In order to

be compatible with standard DNS queries,

object identiers must be encoded in a

standardized way. Using this encoding,

the query appears like a normal DNS

A-query and can be resolved without

changing any parsing routines in DNS

servers. Figure 3 shows how the URL

http://www.eurecom.fr/~bob/index.html

wouldbeencoded.

t p 3 w w w 7 e u r e c o m 2 f r 0

4 h t m l 5 i n d e x 4 ~ b o b 4 h t

Figure3: RepresentationofaURL

ThisencodingallowsforeÆcientcompres-

sionofidentierssharingacommonpart,just

aswhen queryingthe addressesof all of the

serversinagivendomain. Sincetheidentier

canbeviewedasageneralizedhostname,all

thenormalfacilitiesofDNScanbeused.

We mentionthat there is a 255character

limitonhostnamesinDNSquerieswhichim-

plies that URLs longer than 255 characters

will have to be truncated. Also, the length

ofasinglelabel(e.g.,onecomponentofhost-

nameorURL path)is limited to 63 charac-

ters. We do not expect these limits to be

aproblem. Wealso mention that URLs are

case-sensitive but DNS does not guarantee

case-sensitivetreatment. Nonetheless,wedo

notexpectthistobeamajorissuesincemost

URLs cannot be confused with other URLs

evenifcaseisnotpreserved.

We studied the access log from one of

(5)

ceededthese limits. Evenifthese 122URLs

were truncated to conform to DNS limita-

tions, we did not observe any collisions be-

tween two dierent URLs. Neither did we

ndanyURLswherecase-insensitivitywould

havepresentedanyproblems. Therefore, we

donotproposeanymethodsforhandlingsuch

URLsbeyondsimpletruncation.

The reply will include several resource

records, one for each object server holding

a copy of the URL. Each of these resource

recordshas atime-to-live(TTL) eld which

species how long that resource record can

be cached at a DNS server. This TTL-

informationcaneitherbespeciedbytheau-

thoritativelocationserveror,forabetteres-

timate, by the object server. If the location

serversetsthisTTL-eld,itwillbethesame

for allresource recordsfrom that server. If,

ontheotherhand,thiseld issetbytheob-

jectserver,itallowstheobjectservertocom-

municateinformationonhowlongtheobject

islikelyto beavailableat thatserver.

The actual data section of the resource

record (RDATA) contains the IP-address of

theobjectserverandsomefreshnessinforma-

tionabouttheobject.Thisinformationcould

be for examplethe last modication date of

theobject,whichprovidesaneasywayofdis-

tinguishing between possible stale copies of

theobject. Inthiscase,theauthoritativelo-

cationserverwouldhavetoperiodicallyquery

theoriginservertogetanup-to-datemodi-

cationdatefortheoriginalobject.

Anotherpossiblepieceofinformationthat

couldbeincluded in theresourcerecord,in-

stead of the last modication date, is the

\age"oftheobject,as denedinHTTP/1.1.

Anobjectservercoulddeterminethislocally,

and a small age would refer to a relatively

freshcopy. Thismethodwouldnot,however,

oerthesameguaranteesonobjectfreshness

aswouldthelast modicationdate.

Yetanotherpossibilityistoincludesimple

TTL-information, but this would be redun-

dant, sincethe resourcerecord by denition

includesaTTL-eld. ATTL-estimateisuse-

fulwhenthequeryinghostdecideswhichob-

jectserverto use. Objectserverswithshort

cisionmakingprocess.

Because all location servers are allowed

to cache LDS replies, some location servers

may havestale location information in their

caches. Whenthestatusofanobjectchanges

atanobjectserver,theobjectservernoties

the authoritative location server. This up-

dateis not,however,reectedonthecached

copies of the location information. If stale,

cached locationinformationis deliveredto a

queryinghost,the queryinghostmaydecide

to forwardthe request to a Web cache that

hasevictedtheobject.

Onesolutionistousetheonly-if-cached

directive of HTTP/1.1 when requesting ob-

jectsfrom Web caches. If the distant cache

hasthe objectcached, itwill returnthe ob-

jectfrom the cache; otherwiseit will return

anerrortothequeryinghostindicatingthat

theobjectisnolongercached. Thequerying

host will then choose another object server

from the list. We are currently looking into

waysofreducingtheamountofstalelocation

informationinLDS.

2.2 UpdatingtheLocationServers

Theobjectserversneedto keeptheinfor-

mation at the authoritative location server

up-to-date. Forthis we needanewprotocol

that is used to exchange information about

newcopies of objects. Forexample, when a

cache caches a new object, it must send a

messageto the authoritative location server

responsible for this object, indicating that

the object has been cached. This message

shouldalsoincludetheinformationwhichwill

be present in the resource record (e.g., last

modicationdate)aswellasaTTL-estimate.

Similarly,whenthecacheremovesanobject,

itmustsendamessagetothelocationserver

indicatingthattheobjectisnolongercached.

Since the location data is managed over

DNS,thedynamicupdatefacilityofDNS[14]

can be used to dynamically update the lo-

cation information. With this method, it is

possible for ahostto add ordelete resource

(6)

desired.

UsingthedynamicupdatefacilityofDNS,

anobjectserversendsanupdatemessageev-

erytimethestatusoftheobjectchanges. For

a Web cache this change of status could be

thecaching ofanewobject,aremovalofan

objectorcachinganewversionoftheobject

aftertheobjecthasbeenmodiedattheori-

ginserver. Inorder toreduce theamountof

update traÆc, object servers can send their

reports in batches. This isespecially bene-

cialfor Web caches that can send theinfor-

mationaboutanHTMLpageandallinlined

imagesin onemessage,therebyreducingthe

overheadconsiderably.

2.3 Implementation

Implementing the LDS is relatively

straight-forwardand can be done incremen-

tally. Since the system is an extension to

DNS,nonewserversneedtobecreatedand

introduced.

For a DNS server to be LDS-capable,

it needs to be able to interpret a new

querytypecodeand the associatedresource

records. DNSalreadyhandlesnumerous dif-

ferent query and resource record types, so

addingonemoreissimple. Furthermore,the

authoritative serverhas to beable to main-

tain the location information database and

constructresourcerecordsfromthisdatabase

toincludeinrepliestoqueries.Althoughthis

URLdatabasecontainsmoredatathananor-

mal DNS database, it can be maintained in

asimilar way. For aWeb cache to be LDS-

capable,itsimplyhasto beableto sendthe

updatemessagesto theauthoritativeserver.

Thearchitectureallowsforincrementalde-

ployment,sinceifacontentproviderdoesnot

operate a location server, no LDS resource

recordswillexist. IfnoLDSresourcerecords

are available, the querying host would for-

wardallrequestsusingastaticpolicy,forex-

ample,forwardingthemdirectlytotheorigin

server or a parent in a cache hierarchy. If

LDS recordsexist,then LDS-enabledclients

normal, static methods for nding objects.

Intheearlyphasesofdeployment,thetradi-

tional hierarchyshould bekeptasafallback

forclientsandserversthat donotimplement

thenewprotocols.

3 Network Web Caching

In this section we propose a new cooper-

ativecachingschemethat exploits theLDS.

We assume that browsers forward their re-

quests through aproxy cache in addition to

thebrowser cache. Thismethod is currently

widely used by ISPs and other institutions,

suchascompaniesanduniversities,tooera

betterservicetotheirclientsaswellastore-

ducethe out-goingtraÆc. Also, ifa rewall

isused, thenall requestsmust go througha

proxy at the rewall in any case, so adding

a cache there does not present a signicant

overhead.

We now describe our cooperative caching

scheme. A browser rst sends anHTTP re-

questforanobjecttoitsproxycache. Ifthe

proxy cache does not have the object, the

proxy cache invokes LDS to obtain alist of

allthe object servers(i.e., originserverand

someothercachesin the network) that con-

taintheobject. Theproxycachethenchooses

the\best"objectserverfromthelistandfor-

wardstheHTTPrequesttothisobjectserver

(seeSection5). Theproxyreceivestheobject

fromthebestobjectserverandforwardsthe

objecttothebrowser.

Ourcachingschemehas several appealing

features. First,atmosttwoserversarevisited

to retrieve an object. Standard hierarchical

cachingschemes(suchasNLANR[8]andRe-

nater[9])cancauserequeststopassthrough

alargenumberofobjectserversbeforeacopy

oftheobjectisfound,which canseverelyin-

creaseobjectretrievaltime[13]. Second,the

schemeallowstheproxyservertochoosefrom

alltheobjectserversthatcontaintheobject.

Letuslookatacoupleofscenarios:

LDSreturnsalistofobjectservers,with

(7)

andalltheother objectserversonmore

distant ISPs. In this case, the proxy

server would choose the cache in the

neighboringISP.

onecacheserveronthesamecontinentas

theproxyserver,andalltheotherobject

serversontheother sideoftransoceanic

links. In this case, the proxy server

choosesthecacheinitscontinent.

alltheobjectserversfarfrom theproxy

and near or in the origin server's ISP.

In this case, the proxy may still prefer

tochooseoneofthecachesovertheori-

ginserver: Theoriginservermayrunon

a slow machine orhave a slow network

connection.

Itisimportanttonotethattraditionalhierar-

chicalcachingdoesnotpermitthelastoption.

Withhierarchicalcaching,ifthereisamissat

therootcache,therootcacheforwardsthere-

questdirectlytotheoriginserver. Thus our

scheme enables the proxy to select from all

caches that are on the \route" between the

proxyandtheoriginserver.

Ourcachingscheme doeshaveanunusual

peculiarity. Intheschemejustdescribed,the

proxy server will always retrieve an object

either from the origin server or from some

other proxy server. Because all the proxy

servers are near the bottom of the Internet

hierarchy(in institutionalISPsorlocal resi-

dentialISPs),withtheexceptionoftheorigin

server,all copiesof an object arestored, es-

sentially,intheleavesoftheInternet. There

are, however, several compelling reasons to

also store copies of objects higher upin the

Internet hierarchy, in particular in regional

and national ISPcaches. First,the request-

ing proxy may be ableto retrieve anobject

morequicklyifitisavailablefromacommon

parent(orgrandparent)ofsomeother proxy

cache that has the object. Second, hierar-

chicalcachingprovidesanasynchronousmul-

ticast infrastructure, which can signicantly

reducebandwidthusageintheInternet[10].

Giventhatcacheserversarepresentatre-

gional and national caches into hierarchies,

wenowmodifyourcachingschemetoexploit

thehigher-levelcaches.

3.1 Populating Caches

Inordertoexploit thehigher-levelcaches,

we must make sure that the higher level

caches obtain copies of the objects cached

at lowerlevels. We propose that lower level

cachespushobjectstotheirhigherlevelpar-

ents. This way the objectsare easily avail-

ableforsiblings under the sameparent,and

requestsfromdistanthostscanbesatisedat

ahigherlevelinthenetwork.

The details of our pushing strategy are

as follows. A low level cache periodically

sends a list of new objects (with last modi-

cationdates) to itsparent. Theparent de-

cideswhichoftheobjectsinthelisttocache

andretrievesthesefromthechildcache. This

is done at every level in the hierarchy and,

in the end, the caches are lled up as they

would have been using traditional hierarchi-

calcaching. Cachesatthelowestlevelshould

send informationaboutevery object, but at

higherlevelsacache mightuse somelocally

denedpolicy(e.g.,objectsthatarecachedin

multiplechildcaches)todecidewhichobjects

toreportto itsparent.

3.2 Network TraÆc

Since LDS increases the number of mes-

sagessentoverthe network,itmayoverload

somelinksorserversandinfactdegradeper-

formance. We are currently performing an

analysis of how much LDS increases current

networktraÆc. Thefollowingarethekeyfac-

tors:

1. LDS queries and replies: Although

thesearenormalDNSmessages,anLDS

query is sent every time a proxy needs

toretrieveaURLthatitisdoesn'thave

cached. Caching LDS recordsat thelo-

cation servers will reduce some of this

(8)

ordinaryDNS,since aURLreference is

morespecicthanahostnamereference.

2. Update traÆc: Whenthestatusofan

objectchangesinacache,thecachemust

send an update messageto theobject's

authoritativelocationserver. Forpopu-

lar, widelycached objects, this may re-

sult in too much traÆc directed at the

authoritativeserver.

In order to reduce the amount of LDS

lookup traÆc in the Internet, we propose a

variation to our basic LDS scheme. In this

variation, once we getthe locationinforma-

tion for an HTML-page we assume that all

theinlinedobjectsonthepagearealsoavail-

ablefromthesamesetofobjectservers. One

advantage of this scheme is that it heavily

cutsdownontheamountofLDStraÆccom-

paredtothatbasicschemesincewenowonly

send one LDS query for each HTML-page

instead of each URL. The disadvantage of

thevariationis that there arenoguarantees

thattheinlinedobjectsareavailablefromthe

sameobjectservers. Originserversandrepli-

cated servers should have the objectsbut a

cachemayhavealreadypurgedsomeorallof

them. Ifthe objectserverdoesnothavethe

requested object, we will do an LDS query

forthat objecttoobtainup-to-date location

information.

InSection4wewillshowhowtheamount

of LDS traÆc can be signicantly reduced

when the LDS system is restricted to repli-

catedservers.

To address the issue of update traÆc, we

proposethat anycachethat hasaparentre-

frain from sending update information. In-

stead, the update information is forwarded

totheparentduringthepushingphase. The

highestparentthat doesnot havea copy of

the object sends to the authoritative server

an update messagethat containsupdate in-

formationforitself anditsupdatedchildren.

Thebenetisthattherearesignicantlyless

higherlevelcachesthanlowlevelcaches,thus

much lessupdate traÆc.

Intheabovescheme,thehighestparentin

the hierarchy sends update messages for all

itschildrenwhicharethusallincludedinthe

databaseatthe locationserver. As aresult,

institutional caches would also be included

in the list returned by the location server,

whichmeansthattheycouldreceiverequests

for cached objects from hosts outside their

own network. It is desirableto keepthe in-

stitutional caches othe list oflocations for

tworeasons. First,institutions likely donot

wantoutsidehosts using theirbandwidthto

retrieve objects. Second, objectsat institu-

tional caches are likely to be cached in the

parentcachesand outsidehosts canretrieve

objectsfasterfromtheparentcache.

Cache digests [11] are used in traditional

hierarchies to represent cache contents in a

compressedform. Acachefetchesthedigests

ofitsneighborcachesandwhenarequestcan-

notbesatisedlocally, the cache checks the

digestsoftheneighborcachestoseewhether

anyofthemhavetheobjectcached. Ifso,the

objectisretrievedfromthatcache;otherwise

therequestisforwardedtotheparentcache.

In LDS-based caching, having the digest of

the parent cache is useful because this way

weavoidtheLDS-lookupforobjectsthatare

cached at theparentand would in any case

beretrievedfrom there.

4 Replicated Servers

TheLDSschemeisnotlimitedtocaching.

Itcanalsobeusedtodisseminateinformation

aboutreplicated ormirroredobjects. When

an object is replicated at another server,

this information is added into the location

databaseattheauthoritativelocationserver.

Whenaclientrequeststhisreplicatedobject,

it doesan LDS lookup and gets a list of all

serversholdingacopyoftheobject. Because

theinformationisnotdynamicallyreplicated

asincaching,butratherplacedatwell-chosen

locations,thereisrarelyneedtonotifythelo-

cationserversofnewcopies. Likewise,there

(9)

line.

With LDS, a user addresses a replicated

objectbythe object'suniqueURL,LDS re-

turns the list of servers that have the ob-

ject. Thebrowserthentransparentlychooses

one of these (the best in some sense). The

user would not have to know about the ac-

tualphysicallocationoftheobject.

4.1 Reducing LDS Query TraÆc

The scheme just described for replicated

serversstillrequiresthatclientssendanLDS

lookupforeachURL.Inorder toreduce the

numberofLDSmessagesweproposethefol-

lowingmethodofreplicatingserversandstor-

ing information about the replicated URLs.

In this scheme, the LDS no longer keeps

track ofcachedobjects. It onlytracksrepli-

catedandpartiallyreplicatedservers. Were-

quire that replicated serversreplicate either

whole sites orcomplete sub-treesof servers.

An example sub-tree of an origin server is

all the objects on the origin server below

a given URL prex, e.g., all objects under

http://cnn.com/sports/.

Suppose that a client wants to retrieve a

URL. Normally, a client would send a DNS

lookuptoobtaintheIPaddressoftheorigin

serverandretrievetheobjectfromthere. Us-

ingLDStheclientproceedsasfollows. First,

the client sends an LDS queryfor the host-

nameintheURL.TheLDSsystemreturnsa

listofallserversthatmirroreitherthewhole

siteoracompletesub-treefromthesite. We

need to extend the resource record dened

in Section 2.1 to include a prex indicating

the sub-tree that is mirrored by that par-

ticular server. The client then chooses the

\best" serveramong those serversthat mir-

rorthedesiredsub-treeandforwardstheac-

tual HTTP-request to this server. A server

may mirror multiple sub-trees from a single

originserver; in this case LDS would return

oneresourcerecordforeachsub-tree.

Comparedtothecachingschemediscussed

in Sections 2 and 3, this scheme has sev-

onlyone LDS lookupfor each server;in the

cachingscheme,theclientmustsendanLDS

lookup for each URL. The caching scheme

drastically increases DNS traÆc in the net-

work whilethemirroring schemekeepsDNS

traÆc at its current level. (Recall that a

client would in any case have to do a DNS

lookup to get the IP address of the ori-

gin server.) Second, objects at mirrored

serverstendtostayatthemirroredserversfor

longperiods of time while objects in caches

are cached and purged dynamically. Hence,

LDS information for a mirrored server can

be cached longer at location servers which

greatly reduces lookup traÆc. When the

authoritative LDS server sends information

about a mirrored site, it should rst send

outinformationaboutmirrorsthatmirrorthe

mostofthesite. ThisisbecauseDNSreplies

are by default sent over UDP and the mes-

sagesize is severely limited (512 bytes). By

sendinginformation aboutmirrorsthat mir-

rorlargesubtrees,wemaximizethelikelihood

thattheclienthasthenecessaryinformation

inthereply.

ComparedtothestandardDNS-wayofac-

cessing objects, our mirroring scheme does

notincreasethenumberof DNSmessagesin

thenetwork. Thereplymessagesareslightly

larger,however,sinceweneedtoindicatethe

prexin theresourcerecord. We donotex-

pectthisincreaseinsizetobesignicantsince

URLprexescanbeeÆcientlyencodedusing

theencodingspeciedinSection2.1andDNS

resourcerecordpointers.

5 Routing Decision

Whenthecachehasreceivedareplyto its

LDS-lookupcontainingseveralobjectservers

(cachesandoriginservers),thenextquestion

is: \Whichoneofthepossibilitiestochoose?"

Determiningthebestalternativeisanimpor-

tanttopicofourongoingresearch.

Possiblesolutionsinclude:

All querying hosts measure connection

(10)

PassQoSinformationalong in LDSup-

datesandreplies.

Use explicit routing information from

BGP.

We will now provide some details on how

thesemethodscouldbeused.

Intherstoption, allqueryinghosts(e.g.

low level Web caches) measure the time it

takestofetchobjectsfromotherservers(Web

serversorothercaches). Thisdownloadtime

gives a crude estimate of the actual band-

width between the two hosts. The query-

inghostkeepsalistof serverswithwhich it

has had good connections and prefers those

serversto otherswhen bothtypes ofservers

arepresentin theLDSreply. Ofcourse, the

actualnetworkconditionschangeallthetime,

but the results presented in [7] on perfor-

mance characteristics of mirrorserversindi-

cate,that from alarge setof mirrorservers,

onlyasmallnumberneedtobeconsideredas

candidatesfordownload. Althoughthestudy

in[7]usesaxednumberofmirrorsites,we

believethattheresultscanbeappliedtosit-

uationswhere thenumberofserverschanges

dynamically.

Thesecond optionistopasssomeQoSin-

formationin the LDS repliesalong with the

age information about the object. For ex-

ample, a Web server can measure the con-

nection times of incoming connections, esti-

mate theconnectionbandwidthand takean

averageoftheestimatedbandwidthsoverall

connections. Thisaveragebandwidthcanbe

seenas thebandwidth that a randomclient

somewhere in the network could expect to

getwhenrequestingobjectsfromthisserver.

Likewise,cachescanmeasurethebandwidths

ofallout-goingconnectionsandcalculatethe

averagebandwidth.

The third option uses explicit routing in-

formation obtained overBGP by talking to

localrouters. Thisinformationgivesthehost

atopological map of the network which can

beusedtondouthowfareachoftheservers

intheLDSreplyare. Unfortunately,routing

informationgivesonlyreachability,notqual-

al.[4]havestudiedusingroutinginformation

in request routingand have decided against

usingBGPinformationbecauseofcongura-

tionandsecurityissues,amountofdataand

diÆculty of obtaining it. Their approach is

basedonusingwhois-services.

Regardless of the approach chosen, any

queryinghostcan implement any local poli-

cies necessary when deciding where to for-

ward arequest. For example, acache oper-

ator may want to forward requests to other

caches based on the domain of the origin

server.

6 Related Research

In[3]Gadde etal. comparetraditionalhi-

erarchicalcachingwithanarchitecturewhere

asingle centralized serverholds information

aboutwhereeachdocumentiscached. When

a low level cache wants to nd out where

an object is cached, it sends a message to

this central mapping server which responds

by either redirecting the request to a peer

serverorbyforwardingittotheoriginserver.

This scheme resultsin allof the objectsbe-

ing cached at institutional caches. The au-

thors alsoperformsimulationsusingasmall

numberofcachesand nd outthat theper-

formanceofthecentralizedsolutionisonav-

erage better than that of the traditional hi-

erarchy. The numberof caches in the simu-

lations is low, only 8 and 32 cache congu-

rations are simulated. The authors express

theirdoubtsaboutthescalabilityoftheirso-

lution,butpresentsomeideasforreplicating

anddistributingthelocationinformation.

Inour approach, the queryinghostgets a

listofallservers holdingacopyoftheobject

insteadofbeingredirectedtooneofthembya

mappingserver. Thisputs thequeryinghost

incontrol overwheretherequestwillbefor-

warded. This decisionmight greatlydepend

onlocal conditionsand policies unknown to

a central server. Another dierence in our

approach is that the location information is

distributedoverDNSwhichisubiquitousand

(11)

where data is cached near the browser and

location hints are passed through a meta-

data hierarchy. Their architecture arranges

thecachesinvirtualhierarchies,oneforeach

object, for distributing the metadata infor-

mation. In their architecture, when acache

cachesacopyofanobject,itsendsoutaloca-

tionhintindicatingtheURLanditsaddress.

This hint ispropagated in the virtualmeta-

datahierarchyandcouldeventuallyreachall

caches in the system. Caches keep track of

allthehintstheyhavereceivedandusethem

toforwardrequests. If acachereceivesare-

questforanobjectwhichisnotcachedlocally

but thehintsindicate another cache holding

the object, the request is forwarded to this

cache. Thevirtual metadatahierarchies are

constructedusing IP-addressesandURLs in

away which tries to minimize the distances

betweenparentsandchildren. Also,thehier-

archyguaranteesthatacachecanreceiveonly

onehintforanyobject. Thishintislikelyto

befromthenearestcacheholdingtheobject,

butthisisnotguaranteed.

One dierence between their and our ap-

proaches is the way caches are placed. In

their architecture only institutional caches

holdobjectsandeveryrequestforwardedus-

ingahintwouldbeforwardedtoanotherin-

stitutionalcache. Thisbehaviormightbeun-

acceptable to some people who do notwant

more external traÆc on their networks. In

our approach, objects are pushed to higher

levels andthis eliminates theneed for going

to other institutionalcaches. Secondimpor-

tantdierenceisthatusingLDSforlocating

objects,aqueryinghostgetsalistofallpos-

sible locations. It can then choose from the

list accordingto alocal policy orby adapt-

ing to current network conditions. In [13] a

cache receivesonly one location hint. Since

the virtual metadatatrees are based on IP-

addressesandURLs,therearenoguarantees

thatthishintindicatesagoodserver. Weare

currentlyperformingacomparisonofthetwo

schemes.

Amiret al.[1]study three dierent meth-

odsforredirectingrequeststomirroredsites{

HTTPredirection,DNSroundtriptimemea-

surements,and sharedIPaddresses. Intheir

DNS-basedsolutiontheyuseDNSroundtrip

replicated server for a client. They place

the replicated servers near an authoritative

DNS server which gives the address of the

replicatedserverwhenqueried fortheorigin

server.Thisresultsinclientsbeingeventually

redirected to their closest replicated server.

Therearetwomajordierencesbetweentheir

approachand ourreplicated serverapproach

(Section4.1). First,intheirsystemreplicated

serversmust always replicate the entire site

while in our system a server may replicate

onlysub-trees oftheoriginalsite. Second,in

their system the replicated servers must be

placedincloseproximityoftheauthoritative

DNSserver. Ourarchitectureplacesnocon-

straintsontheplacementoftheservers.

7 Future Work

WewillcontinueourworkonLDSbyper-

forming quantitativecomparisons of the dif-

ferent architectures presented in this paper.

In particular we will concentrate on evalu-

ating the amount of new traÆc in the net-

work causedby thelookupand updatemes-

sages. We will also compare the traditional

cachinghierarchywith ourdierentLDSar-

chitectures to nd out how much objectac-

cesslatencycanbereducedthroughLDS.We

willalsocloselystudytherequestroutingis-

sueinordertondouthowmuchinformation

isneededto makegoodroutingdecisions.

8 Conclusion

InthispaperwehavepresentedtheLoca-

tionDataSystem,anewnetworkapplication

that can be used to locate where copies of

objectsarestoredontheWeb. LDSprovides

a service similar to DNS and can be imple-

mentedasanextensiontoDNSanddeployed

incrementally. Wehavepresentedtwoappli-

cationsof LDS, locating objectsin mirrored

servers and locating objects in Web caches.

We have also discussed how clients can de-

cide the best server to forward requests to

(12)

fromthenetwork.

Acknowledgments

Wewould liketothankNeilze Dortafrom

INRIA forthediscussionswehadon access-

ingmirroredsites and herinput onthesub-

ject. TheloglesusedinSection2.1werepro-

videdbyNationalScienceFoundation(grants

NCR-9616602 and NCR-9521745), and the

NationalLaboratoryforAppliedNetworkRe-

search.

References

[1] Y. Amir, A. Peterson, and D. Shaw,

\Seamlessly Selecting the Best Copy

from Internet-Wide Replicated Web

Servers", in Proc. 12th International

Symposium on Distributed Computing

(DISC'98), Andros, Greece, September

1998.

[2] R. Fielding, J. Gettys, J. Mogul, H.

Frystyk,andT.Berners-Lee,\Hypertext

Transfer Protocol { HTTP/1.1", RFC

2068,January1997.

[3] S.Gadde,J.Chase,andM.Rabinovich,

\DirectoryStructuresforScalableInter-

netCaches",TechnicalReportCS-1997-

18, Department of Computer Science,

DukeUniversity,1997.

[4] C. Grimm, J.-S. Vockler, H. Pralle,

\Request Routing in Cache Meshes",

in Proc. 3rd Web Caching Workshop,

Manchester,UK, June1998.

[5] P.Mockapetris,\DomainNames{Con-

ceptsandFacilities",RFC1034,Novem-

ber1987.

[6] P. Mockapetris, \Domain Names { Im-

plementation and Specication", RFC

1035,November1987.

[7] A. Myers, P. Dinda, and H. Zhang,

port CMU-CS-98-157, School of Com-

puter Science, CarnegieMellon Univer-

sity,1997.

[8] A Distributed Testbed for Na-

tional Information Provisioning,

<URL:http://ircache.nlanr.net>.

[9] Reseau National de telecommuncations

pour la Technologie,

l'Enseignement et la recherche,

<URL:http://www.renater.fr>.

[10] P. Rodriguez, K. W. Ross,E. W. Bier-

sack,\ImprovingtheWWW:Cachingor

Multicast?", in Proc. 3rd Web Caching

Workshop,Manchester,UK,June1998.

[11] A. Rousskov, D. Wessels, \Cache Di-

gests",inProc.3rdWebCaching Work-

shop, Manchester, UK,June1998.

[12] Squid Internet Object Cache,

<URL:http://squid.nlanr.net>.

[13] R. Tewari, M. Dahlin, H. M. Vin,

and J. S. Kay, \Beyond Hierarchies:

Design Considerations for Distributed

CachingontheInternet",TechnicalRe-

portTR98-04,DepartmentofComputer

Sciences, UniversityofTexasatAustin.

[14] P. Vixie, S. Thomson, Y. Rekhter,and

J.Bound,\DynamicUpdatesintheDo-

main Name System (DNS UPDATE)",

RFC2136, April1997.

[15] D. Wessels, K. Clay, \Internet Cache

Protocol (ICP), version 2", RFC 2186,

September1997.