Using the Domain Name System
Jussi Kangasharju, Keith W. Ross
Institut Eurecom
Sophia Antipolis, France
fkangasha,rossg@eurecom.fr
James W. Roberts
France Telecom { CNET
Issy les Moulineaux, France
Abstract
Inordertoreduceaveragedelayandband-
width usage in theWeb, geographicallydis-
persed servers often store copies of popular
objects. Forexample,with network caching,
the origin server stores a master copy of
theobjectandgeographicallydispersedcache
servers pull and store copies of the object.
With sitereplication, objectsstored at mas-
terarereplicatedintosecondarysites. Inthis
paperweproposeanewnetworkapplication,
LocationDataSystem(LDS),that allowsan
arbitrary hostto obtainthe IP addresses of
the servers that store a specied URL. Our
networkingapplicationisanextensiontothe
Domain Name System (DNS), requires only
small changes to the domain name servers,
and can be deployed incrementally. For the
case of network Web caching, we elaborate
on our proposal to allow a cache to (i) up-
dateadistributeddatabasewhenitstoresor
evictsobjects,and(ii)pushobjectstoparent
caches in order to improve delay and band-
widthusage. Forthecaseofmirroredservers,
we showhow aclient canobtainalistof all
servers mirroring all or part of the desired
site. LDSapplied to partially mirroredsites
generatessubstantiallylessDNStraÆc than
LDS applied to caching. Finally, we discuss
howahostcanusethelocationdatainorder
to make intelligentdecisionsaboutwhere to
retrievedesiredobjects.
1 Introduction
Networkcachingofdocumentshasbecome
a standard way of reducing network traÆc
and latency in the Web. Caches are cur-
rently employed in institutional, local, re-
gional and national ISPs. Cache hierar-
chies,createdwhencachesinlower-levelISPs
pointto cachesin higher-levelISPs,are cur-
rentlyprevalentintheInternet[8,9]. Today's
cache hierarchies use static, manually con-
guredpointersto dene the hierarchytree.
Cache hierarchies operate asfollows. When
abrowserrequestsadocument,itsendsare-
questto aleafcache. Thiscachetheneither
servesthe document (if it is cached) or for-
wardsthedocumenttoitsparentinthehier-
archy. Theprocessisrepeatedalongastatic
chainofcachesuntiltherootofthehierarchy
isreached. Ifthereisalsoacachemissatthe
root, the root forwards the request directly
to the originserver. A response is returned
alongthecachechaininthereversedirection.
Cooperatingcachesincachehierarchiesoften
useICP(InternetCacheProtocol)toimprove
andenlargethescopeofthesearch[15].
Cachinghierarchieshaveseveralproblems.
First,requestsforlesspopulardocumentswill
experience misses at all caches in the cache
chain. Fordeephierarchies,thesemisseslead
to poorlatency performance [13]; moreover,
ICP can further degrade performance, since
thecache mustwait forareplyfrom allsib-
lingand parentcachesoruntil atwosecond
|theydonotpermitabrowseroracacheto
choosethesubsequentcacheaccordingtocur-
rent topology or traÆc conditions. Manual
optimizations, such as sending requests for
certaintop level domainsto designated par-
ent caches,are possible, butevenwiththese
optimizations,thehierarchyforagivenURL
remainsstatic. Third,caching hierarchiesdo
not allow the chain of caches to extend be-
yond theroot cache;if thereisamissat the
rootserver,therequestisforwardeddirectly
totheoriginserver,andnevertoacachethat
liessomewhereinbetweentherootserverand
theoriginserver. This isa problembecause
a cache nearby the origin might be able to
serve an object much faster than the origin
server,especiallywhentheoriginserverruns
on a slow machine or has a low bandwidth
connection.
Inthispaperweproposeanewcooperative
caching scheme that has the following fea-
tures. (1)Atmosttwoservers(includingthe
originserver)arevisitedintherequestchain;
(2) The chain of caches depends on the re-
questedURLandcanchangedynamicallyas
a function of current network topology and
traÆc conditions; (3) An arbitrary cache in
theInternetcanbequeried,includingacache
that isfarfrom thebrowserbut closeto the
origin server. (4) The scheme can be in-
crementallydeployed with minor changesto
DNSservers. Furthermore, althoughathor-
oughperformance study is still required, we
feel the scheme should leadto a substantial
reductionindelayandnetworktraÆcascom-
paredto traditionalhierarchicalcaching.
Our caching scheme makes use of a new
networkapplication, theLocationData Sys-
tem(LDS),whichwealsodeneinthispaper.
TheLDS,denedasanextensionofDNS,al-
lows an arbitrary host to obtain the IP ad-
dresses of the servers that store a specied
URL.TheLDSisofindependentinterest,and
canbeused forotherapplications, including
choosing the best mirrored site for a given
URL. For the case of network Web caching,
wespecifyhowacacheupdatestheLDSdis-
tributed database when it stores and evicts
objects, and how a cache pushes objects to
parent cachesin orderto improvedelay and
bandwidthusage.
caching signicantlyincreasesthenumberof
DNS messages. In this paper we also show
how LDS can be applied to replicated and
partially replicatedservers. Inthiscase, the
amountofDNSmessagesdoesnotincrease.
Thispaperisorganizedasfollows. InSec-
tion 2 we dene the LDS. In particular, we
show how DNS can be extended to provide
the LDSservice. In Section 3weshowhow
network cachescanexploit the LDS; in par-
ticular,wediscusshowthecachesupdatethe
LDSdistributeddatabase,andhowcachesat
higherlevelsin thecaching hierarchycanbe
populated. In Section 4we show howdocu-
mentandsitereplicationcanexploittheLDS.
InSection 5wediscusshowahostcanmake
routingdecisions based on the results of an
LDS query. In Section 6 we discuss related
research on cooperative caching. Section 7
presents directions for future work and Sec-
tion8concludesthepaper.
2 Location Data System (LDS)
Copies of an object, with each object ref-
erenced by the same URL, are often avail-
able from dierent servers in the Inter-
net. These servers include origin servers,
replicated servers (also called mirrored
sites) and cache servers(also called proxy
servers). The origin server for an object is
the serverat which the objectoriginates; it
alwayscontainsanup-to-datecopyoftheob-
ject. A replicated server contains copies of
objects that have been placed into it (typi-
cally manually or by pulling); typically, ob-
jectsin replicated serversare up-to-date. A
replicated server may replicate entire Web
sites or may replicate only portions of vari-
ousWebsites. Cacheserversobtaincopiesof
objectsby pullingthem ondemand. Inpar-
ticular,whenacacheserverreceivesarequest
foranobject,andiftheobjectisnotcached,
thecacheserverretrievestheobjectfroman-
otherserver(whichmaybetheobject'sorigin
server, areplicated server, oranother cache
server), storesacopy ofthe object,and for-
wardsacopytotherequestor. Acachedcopy
thispaperwerefertoallthreetypesofservers
asobject servers.
ItishighlydesirableforahostintheInter-
nettobeabletodeterminethelocations(i.e.,
theIPaddresses)ofalltheobjectserversthat
containcopiesofaspeciedURL.Inparticu-
lar,ahostwouldliketobeabletogiveanet-
workapplicationaURLandreceivefromthe
applicationalistofalltheobjectserversthat
containtheURL.Inadditiontoreceivingthe
IPaddressesofalltheserversthatcontainthe
object, it is desirable to receive information
about thefreshness of each copy, e.g., when
each copy was last modied orthe \age" of
theobject(asdened intheHTTP/1.1[2]).
In this section weoutline a new network-
ing application that provides the service of
mapping a URL to a list of object servers
that contain the URL, with each server on
thelist having associated freshnessinforma-
tion. We refer to this system as the Loca-
tion Data System (LDS). Wehavetaken
aratherpragmaticapproachindesigningthe
LDS. Our principlegoal has been to design
a systemthat can be rapidly deployed with
incremental changes to the existing Internet
infrastructure. A second goal is scalability,
i.e., asystem that is decentralized and hier-
archical. A third goal is performance, that
is,asystemthat quicklyreturnsthelocation
data while injectinga minimumof overhead
traÆcintothenetwork.
Figure 1 shows the basicmapping service
providedbyLDS.Inthegure,theclienton
the left has a URL, it gives it to the LDS
systemwhich returnsalist of objectservers
thatcontaintheobject. Insteadofthewhole
URL,theclientcanaswellgiveonlyaprex
oftheURLtotheLDSblackboxwhichthen
returns a list of object servers that contain
URLs matching the prex. In the simplest
case, the prex is only the hostname of the
URL;inthiscase,theLDSserviceisidentical
to theDomainName System,operationalin
theInternetforover20years.
Given the similarities in the service pro-
vided by both DNS and LDS, we have de-
cided to base the design of LDS on DNS.
In fact, LDS can be implemented by mak-
ing minor extensions to BINDname servers
Black LDS
List of object Box servers that contain object
LDS
URL Client
Figure1: LDSservice
and to the DNS protocol. Recall that DNS
provides a mapping of hostnames to IP ad-
dresses. (DNScan returnmorethan one IP
address for a hostname for mirrored sites.)
DNSusesahierarchyofnameserversto im-
plement a distributed database of resource
records, whereby a resource record contains
amapping.
We now present an overview of the LDS
design;weprovidemoredetailsin thesubse-
quent subsections. As does DNS, LDS uses
a hierarchy of servers to implement a dis-
tributeddatabaseofresourcerecords. Were-
ferto theLDS serversaslocationservers.
EachLDSresourcerecordmapsaURLtoan
objectserver,withassociatedfreshnessinfor-
mation. EachURLhasone(ormore)author-
itativelocation servers(To conveythe main
idea,we initially assumethat each URLhas
exactly one authoritative server.) The au-
thoritative location server contains a list of
resourcerecords for the URL, that is, a list
ofobjectserversthatcontaintheURL(along
withthefreshness information). Otherloca-
tionserversmaycontaincachedcopiesofthe
listof resourcerecords. In ourbasic design,
hostsquerylocationserversinamannerthat
isfully analogousto theDNS protocol. The
sequence of query and reply events is illus-
tratedin Figure2.
In Figure 2, C is the querying host (e.g.,
a browser), O
0
is the origin server for the
queriedobject,machinesO
1 {O
4
areotherob-
jectservers(e.g., caches), andmachinesL
1 {
L
4
arelocationservers.
Thequerying host sends a location query
to its local location server (L
1 ). If L
1 does
L 2
C O 0
L 1
O 1 O 3
O 2
O 4 L 3
L 4
1 8
2
7
9 3
9 6
9 5 4
Figure 2: Sequence of Location Queries and
Replies
sends a query to a root location server
(L
2
). If L
2
does not have the location in-
formation cached, it returns the address of
alocation server responsible for the domain
of the origin server in the query (L
3 ). L
1
sends a new location query to L
3
and if L
3
doesnothavethelocationinformation,itre-
turns the address of the authoritative loca-
tionserver(L
4
). Finally, L
1
queriesthe au-
thoritativelocationserverandreceivesthelo-
cationinformation. LocationserverL
1 then
sends the information to the querying host
andalsocachestheinformation. (In thisex-
ample we assumed that all queries between
LDSserversareiterative;recursiveandcom-
binations of recursiveand iterative can also
be used. Like DNS,LDS isnotbased onei-
therquerytypebeingused. Wealsoassumed
thatthereisoneintermediatelocationserver
betweentherootandauthoritativeservers;in
practicethere canmoreorless.)
Suppose that thereply indicated that ob-
ject servers O
0 , O
1
, and O
3
contain copies
oftheURL.Thequeryinghostthenchooses
toretrievetheURLfrom oneoftheseobject
servers. Ideally, the hostchooses theobject
serverthatwilldelivertheobjectthefastest.
(We will discuss how this choice might be
madeinSection5.)
Giventhesimilaritiesbetweenourbasicde-
sign of the LDS and the DNS, we now ad-
dress whether the LDS can implemented in
the existing DNS, or within a slightly ex-
leadto rapiddeploymentoftheLDS.
2.1 Resource Records, Query
Messages and Reply Messages
Forlocation requestsanew DNSresource
record type is needed. It is similar to the
standardA-typeresourcerecord,whichmaps
a hostname to an IP address. We now de-
scribethisnewDNSresourcerecordtype.
Each resource record of this type must
contain an object identier. In order to
be compatible with standard DNS queries,
object identiers must be encoded in a
standardized way. Using this encoding,
the query appears like a normal DNS
A-query and can be resolved without
changing any parsing routines in DNS
servers. Figure 3 shows how the URL
http://www.eurecom.fr/~bob/index.html
wouldbeencoded.
t p 3 w w w 7 e u r e c o m 2 f r 0
4 h t m l 5 i n d e x 4 ~ b o b 4 h t
Figure3: RepresentationofaURL
ThisencodingallowsforeÆcientcompres-
sionofidentierssharingacommonpart,just
aswhen queryingthe addressesof all of the
serversinagivendomain. Sincetheidentier
canbeviewedasageneralizedhostname,all
thenormalfacilitiesofDNScanbeused.
We mentionthat there is a 255character
limitonhostnamesinDNSquerieswhichim-
plies that URLs longer than 255 characters
will have to be truncated. Also, the length
ofasinglelabel(e.g.,onecomponentofhost-
nameorURL path)is limited to 63 charac-
ters. We do not expect these limits to be
aproblem. Wealso mention that URLs are
case-sensitive but DNS does not guarantee
case-sensitivetreatment. Nonetheless,wedo
notexpectthistobeamajorissuesincemost
URLs cannot be confused with other URLs
evenifcaseisnotpreserved.
We studied the access log from one of
ceededthese limits. Evenifthese 122URLs
were truncated to conform to DNS limita-
tions, we did not observe any collisions be-
tween two dierent URLs. Neither did we
ndanyURLswherecase-insensitivitywould
havepresentedanyproblems. Therefore, we
donotproposeanymethodsforhandlingsuch
URLsbeyondsimpletruncation.
The reply will include several resource
records, one for each object server holding
a copy of the URL. Each of these resource
recordshas atime-to-live(TTL) eld which
species how long that resource record can
be cached at a DNS server. This TTL-
informationcaneitherbespeciedbytheau-
thoritativelocationserveror,forabetteres-
timate, by the object server. If the location
serversetsthisTTL-eld,itwillbethesame
for allresource recordsfrom that server. If,
ontheotherhand,thiseld issetbytheob-
jectserver,itallowstheobjectservertocom-
municateinformationonhowlongtheobject
islikelyto beavailableat thatserver.
The actual data section of the resource
record (RDATA) contains the IP-address of
theobjectserverandsomefreshnessinforma-
tionabouttheobject.Thisinformationcould
be for examplethe last modication date of
theobject,whichprovidesaneasywayofdis-
tinguishing between possible stale copies of
theobject. Inthiscase,theauthoritativelo-
cationserverwouldhavetoperiodicallyquery
theoriginservertogetanup-to-datemodi-
cationdatefortheoriginalobject.
Anotherpossiblepieceofinformationthat
couldbeincluded in theresourcerecord,in-
stead of the last modication date, is the
\age"oftheobject,as denedinHTTP/1.1.
Anobjectservercoulddeterminethislocally,
and a small age would refer to a relatively
freshcopy. Thismethodwouldnot,however,
oerthesameguaranteesonobjectfreshness
aswouldthelast modicationdate.
Yetanotherpossibilityistoincludesimple
TTL-information, but this would be redun-
dant, sincethe resourcerecord by denition
includesaTTL-eld. ATTL-estimateisuse-
fulwhenthequeryinghostdecideswhichob-
jectserverto use. Objectserverswithshort
cisionmakingprocess.
Because all location servers are allowed
to cache LDS replies, some location servers
may havestale location information in their
caches. Whenthestatusofanobjectchanges
atanobjectserver,theobjectservernoties
the authoritative location server. This up-
dateis not,however,reectedonthecached
copies of the location information. If stale,
cached locationinformationis deliveredto a
queryinghost,the queryinghostmaydecide
to forwardthe request to a Web cache that
hasevictedtheobject.
Onesolutionistousetheonly-if-cached
directive of HTTP/1.1 when requesting ob-
jectsfrom Web caches. If the distant cache
hasthe objectcached, itwill returnthe ob-
jectfrom the cache; otherwiseit will return
anerrortothequeryinghostindicatingthat
theobjectisnolongercached. Thequerying
host will then choose another object server
from the list. We are currently looking into
waysofreducingtheamountofstalelocation
informationinLDS.
2.2 UpdatingtheLocationServers
Theobjectserversneedto keeptheinfor-
mation at the authoritative location server
up-to-date. Forthis we needanewprotocol
that is used to exchange information about
newcopies of objects. Forexample, when a
cache caches a new object, it must send a
messageto the authoritative location server
responsible for this object, indicating that
the object has been cached. This message
shouldalsoincludetheinformationwhichwill
be present in the resource record (e.g., last
modicationdate)aswellasaTTL-estimate.
Similarly,whenthecacheremovesanobject,
itmustsendamessagetothelocationserver
indicatingthattheobjectisnolongercached.
Since the location data is managed over
DNS,thedynamicupdatefacilityofDNS[14]
can be used to dynamically update the lo-
cation information. With this method, it is
possible for ahostto add ordelete resource
desired.
UsingthedynamicupdatefacilityofDNS,
anobjectserversendsanupdatemessageev-
erytimethestatusoftheobjectchanges. For
a Web cache this change of status could be
thecaching ofanewobject,aremovalofan
objectorcachinganewversionoftheobject
aftertheobjecthasbeenmodiedattheori-
ginserver. Inorder toreduce theamountof
update traÆc, object servers can send their
reports in batches. This isespecially bene-
cialfor Web caches that can send theinfor-
mationaboutanHTMLpageandallinlined
imagesin onemessage,therebyreducingthe
overheadconsiderably.
2.3 Implementation
Implementing the LDS is relatively
straight-forwardand can be done incremen-
tally. Since the system is an extension to
DNS,nonewserversneedtobecreatedand
introduced.
For a DNS server to be LDS-capable,
it needs to be able to interpret a new
querytypecodeand the associatedresource
records. DNSalreadyhandlesnumerous dif-
ferent query and resource record types, so
addingonemoreissimple. Furthermore,the
authoritative serverhas to beable to main-
tain the location information database and
constructresourcerecordsfromthisdatabase
toincludeinrepliestoqueries.Althoughthis
URLdatabasecontainsmoredatathananor-
mal DNS database, it can be maintained in
asimilar way. For aWeb cache to be LDS-
capable,itsimplyhasto beableto sendthe
updatemessagesto theauthoritativeserver.
Thearchitectureallowsforincrementalde-
ployment,sinceifacontentproviderdoesnot
operate a location server, no LDS resource
recordswillexist. IfnoLDSresourcerecords
are available, the querying host would for-
wardallrequestsusingastaticpolicy,forex-
ample,forwardingthemdirectlytotheorigin
server or a parent in a cache hierarchy. If
LDS recordsexist,then LDS-enabledclients
normal, static methods for nding objects.
Intheearlyphasesofdeployment,thetradi-
tional hierarchyshould bekeptasafallback
forclientsandserversthat donotimplement
thenewprotocols.
3 Network Web Caching
In this section we propose a new cooper-
ativecachingschemethat exploits theLDS.
We assume that browsers forward their re-
quests through aproxy cache in addition to
thebrowser cache. Thismethod is currently
widely used by ISPs and other institutions,
suchascompaniesanduniversities,tooera
betterservicetotheirclientsaswellastore-
ducethe out-goingtraÆc. Also, ifa rewall
isused, thenall requestsmust go througha
proxy at the rewall in any case, so adding
a cache there does not present a signicant
overhead.
We now describe our cooperative caching
scheme. A browser rst sends anHTTP re-
questforanobjecttoitsproxycache. Ifthe
proxy cache does not have the object, the
proxy cache invokes LDS to obtain alist of
allthe object servers(i.e., originserverand
someothercachesin the network) that con-
taintheobject. Theproxycachethenchooses
the\best"objectserverfromthelistandfor-
wardstheHTTPrequesttothisobjectserver
(seeSection5). Theproxyreceivestheobject
fromthebestobjectserverandforwardsthe
objecttothebrowser.
Ourcachingschemehas several appealing
features. First,atmosttwoserversarevisited
to retrieve an object. Standard hierarchical
cachingschemes(suchasNLANR[8]andRe-
nater[9])cancauserequeststopassthrough
alargenumberofobjectserversbeforeacopy
oftheobjectisfound,which canseverelyin-
creaseobjectretrievaltime[13]. Second,the
schemeallowstheproxyservertochoosefrom
alltheobjectserversthatcontaintheobject.
Letuslookatacoupleofscenarios:
LDSreturnsalistofobjectservers,with
andalltheother objectserversonmore
distant ISPs. In this case, the proxy
server would choose the cache in the
neighboringISP.
LDSreturnsalistofobjectservers,with
onecacheserveronthesamecontinentas
theproxyserver,andalltheotherobject
serversontheother sideoftransoceanic
links. In this case, the proxy server
choosesthecacheinitscontinent.
LDSreturnsalistofobjectservers,with
alltheobjectserversfarfrom theproxy
and near or in the origin server's ISP.
In this case, the proxy may still prefer
tochooseoneofthecachesovertheori-
ginserver: Theoriginservermayrunon
a slow machine orhave a slow network
connection.
Itisimportanttonotethattraditionalhierar-
chicalcachingdoesnotpermitthelastoption.
Withhierarchicalcaching,ifthereisamissat
therootcache,therootcacheforwardsthere-
questdirectlytotheoriginserver. Thus our
scheme enables the proxy to select from all
caches that are on the \route" between the
proxyandtheoriginserver.
Ourcachingscheme doeshaveanunusual
peculiarity. Intheschemejustdescribed,the
proxy server will always retrieve an object
either from the origin server or from some
other proxy server. Because all the proxy
servers are near the bottom of the Internet
hierarchy(in institutionalISPsorlocal resi-
dentialISPs),withtheexceptionoftheorigin
server,all copiesof an object arestored, es-
sentially,intheleavesoftheInternet. There
are, however, several compelling reasons to
also store copies of objects higher upin the
Internet hierarchy, in particular in regional
and national ISPcaches. First,the request-
ing proxy may be ableto retrieve anobject
morequicklyifitisavailablefromacommon
parent(orgrandparent)ofsomeother proxy
cache that has the object. Second, hierar-
chicalcachingprovidesanasynchronousmul-
ticast infrastructure, which can signicantly
reducebandwidthusageintheInternet[10].
Giventhatcacheserversarepresentatre-
gional and national caches into hierarchies,
wenowmodifyourcachingschemetoexploit
thehigher-levelcaches.
3.1 Populating Caches
Inordertoexploit thehigher-levelcaches,
we must make sure that the higher level
caches obtain copies of the objects cached
at lowerlevels. We propose that lower level
cachespushobjectstotheirhigherlevelpar-
ents. This way the objectsare easily avail-
ableforsiblings under the sameparent,and
requestsfromdistanthostscanbesatisedat
ahigherlevelinthenetwork.
The details of our pushing strategy are
as follows. A low level cache periodically
sends a list of new objects (with last modi-
cationdates) to itsparent. Theparent de-
cideswhichoftheobjectsinthelisttocache
andretrievesthesefromthechildcache. This
is done at every level in the hierarchy and,
in the end, the caches are lled up as they
would have been using traditional hierarchi-
calcaching. Cachesatthelowestlevelshould
send informationaboutevery object, but at
higherlevelsacache mightuse somelocally
denedpolicy(e.g.,objectsthatarecachedin
multiplechildcaches)todecidewhichobjects
toreportto itsparent.
3.2 Network TraÆc
Since LDS increases the number of mes-
sagessentoverthe network,itmayoverload
somelinksorserversandinfactdegradeper-
formance. We are currently performing an
analysis of how much LDS increases current
networktraÆc. Thefollowingarethekeyfac-
tors:
1. LDS queries and replies: Although
thesearenormalDNSmessages,anLDS
query is sent every time a proxy needs
toretrieveaURLthatitisdoesn'thave
cached. Caching LDS recordsat thelo-
cation servers will reduce some of this
ordinaryDNS,since aURLreference is
morespecicthanahostnamereference.
2. Update traÆc: Whenthestatusofan
objectchangesinacache,thecachemust
send an update messageto theobject's
authoritativelocationserver. Forpopu-
lar, widelycached objects, this may re-
sult in too much traÆc directed at the
authoritativeserver.
In order to reduce the amount of LDS
lookup traÆc in the Internet, we propose a
variation to our basic LDS scheme. In this
variation, once we getthe locationinforma-
tion for an HTML-page we assume that all
theinlinedobjectsonthepagearealsoavail-
ablefromthesamesetofobjectservers. One
advantage of this scheme is that it heavily
cutsdownontheamountofLDStraÆccom-
paredtothatbasicschemesincewenowonly
send one LDS query for each HTML-page
instead of each URL. The disadvantage of
thevariationis that there arenoguarantees
thattheinlinedobjectsareavailablefromthe
sameobjectservers. Originserversandrepli-
cated servers should have the objectsbut a
cachemayhavealreadypurgedsomeorallof
them. Ifthe objectserverdoesnothavethe
requested object, we will do an LDS query
forthat objecttoobtainup-to-date location
information.
InSection4wewillshowhowtheamount
of LDS traÆc can be signicantly reduced
when the LDS system is restricted to repli-
catedservers.
To address the issue of update traÆc, we
proposethat anycachethat hasaparentre-
frain from sending update information. In-
stead, the update information is forwarded
totheparentduringthepushingphase. The
highestparentthat doesnot havea copy of
the object sends to the authoritative server
an update messagethat containsupdate in-
formationforitself anditsupdatedchildren.
Thebenetisthattherearesignicantlyless
higherlevelcachesthanlowlevelcaches,thus
much lessupdate traÆc.
Intheabovescheme,thehighestparentin
the hierarchy sends update messages for all
itschildrenwhicharethusallincludedinthe
databaseatthe locationserver. As aresult,
institutional caches would also be included
in the list returned by the location server,
whichmeansthattheycouldreceiverequests
for cached objects from hosts outside their
own network. It is desirableto keepthe in-
stitutional caches othe list oflocations for
tworeasons. First,institutions likely donot
wantoutsidehosts using theirbandwidthto
retrieve objects. Second, objectsat institu-
tional caches are likely to be cached in the
parentcachesand outsidehosts canretrieve
objectsfasterfromtheparentcache.
Cache digests [11] are used in traditional
hierarchies to represent cache contents in a
compressedform. Acachefetchesthedigests
ofitsneighborcachesandwhenarequestcan-
notbesatisedlocally, the cache checks the
digestsoftheneighborcachestoseewhether
anyofthemhavetheobjectcached. Ifso,the
objectisretrievedfromthatcache;otherwise
therequestisforwardedtotheparentcache.
In LDS-based caching, having the digest of
the parent cache is useful because this way
weavoidtheLDS-lookupforobjectsthatare
cached at theparentand would in any case
beretrievedfrom there.
4 Replicated Servers
TheLDSschemeisnotlimitedtocaching.
Itcanalsobeusedtodisseminateinformation
aboutreplicated ormirroredobjects. When
an object is replicated at another server,
this information is added into the location
databaseattheauthoritativelocationserver.
Whenaclientrequeststhisreplicatedobject,
it doesan LDS lookup and gets a list of all
serversholdingacopyoftheobject. Because
theinformationisnotdynamicallyreplicated
asincaching,butratherplacedatwell-chosen
locations,thereisrarelyneedtonotifythelo-
cationserversofnewcopies. Likewise,there
line.
With LDS, a user addresses a replicated
objectbythe object'suniqueURL,LDS re-
turns the list of servers that have the ob-
ject. Thebrowserthentransparentlychooses
one of these (the best in some sense). The
user would not have to know about the ac-
tualphysicallocationoftheobject.
4.1 Reducing LDS Query TraÆc
The scheme just described for replicated
serversstillrequiresthatclientssendanLDS
lookupforeachURL.Inorder toreduce the
numberofLDSmessagesweproposethefol-
lowingmethodofreplicatingserversandstor-
ing information about the replicated URLs.
In this scheme, the LDS no longer keeps
track ofcachedobjects. It onlytracksrepli-
catedandpartiallyreplicatedservers. Were-
quire that replicated serversreplicate either
whole sites orcomplete sub-treesof servers.
An example sub-tree of an origin server is
all the objects on the origin server below
a given URL prex, e.g., all objects under
http://cnn.com/sports/.
Suppose that a client wants to retrieve a
URL. Normally, a client would send a DNS
lookuptoobtaintheIPaddressoftheorigin
serverandretrievetheobjectfromthere. Us-
ingLDStheclientproceedsasfollows. First,
the client sends an LDS queryfor the host-
nameintheURL.TheLDSsystemreturnsa
listofallserversthatmirroreitherthewhole
siteoracompletesub-treefromthesite. We
need to extend the resource record dened
in Section 2.1 to include a prex indicating
the sub-tree that is mirrored by that par-
ticular server. The client then chooses the
\best" serveramong those serversthat mir-
rorthedesiredsub-treeandforwardstheac-
tual HTTP-request to this server. A server
may mirror multiple sub-trees from a single
originserver; in this case LDS would return
oneresourcerecordforeachsub-tree.
Comparedtothecachingschemediscussed
in Sections 2 and 3, this scheme has sev-
onlyone LDS lookupfor each server;in the
cachingscheme,theclientmustsendanLDS
lookup for each URL. The caching scheme
drastically increases DNS traÆc in the net-
work whilethemirroring schemekeepsDNS
traÆc at its current level. (Recall that a
client would in any case have to do a DNS
lookup to get the IP address of the ori-
gin server.) Second, objects at mirrored
serverstendtostayatthemirroredserversfor
longperiods of time while objects in caches
are cached and purged dynamically. Hence,
LDS information for a mirrored server can
be cached longer at location servers which
greatly reduces lookup traÆc. When the
authoritative LDS server sends information
about a mirrored site, it should rst send
outinformationaboutmirrorsthatmirrorthe
mostofthesite. ThisisbecauseDNSreplies
are by default sent over UDP and the mes-
sagesize is severely limited (512 bytes). By
sendinginformation aboutmirrorsthat mir-
rorlargesubtrees,wemaximizethelikelihood
thattheclienthasthenecessaryinformation
inthereply.
ComparedtothestandardDNS-wayofac-
cessing objects, our mirroring scheme does
notincreasethenumberof DNSmessagesin
thenetwork. Thereplymessagesareslightly
larger,however,sinceweneedtoindicatethe
prexin theresourcerecord. We donotex-
pectthisincreaseinsizetobesignicantsince
URLprexescanbeeÆcientlyencodedusing
theencodingspeciedinSection2.1andDNS
resourcerecordpointers.
5 Routing Decision
Whenthecachehasreceivedareplyto its
LDS-lookupcontainingseveralobjectservers
(cachesandoriginservers),thenextquestion
is: \Whichoneofthepossibilitiestochoose?"
Determiningthebestalternativeisanimpor-
tanttopicofourongoingresearch.
Possiblesolutionsinclude:
All querying hosts measure connection
PassQoSinformationalong in LDSup-
datesandreplies.
Use explicit routing information from
BGP.
We will now provide some details on how
thesemethodscouldbeused.
Intherstoption, allqueryinghosts(e.g.
low level Web caches) measure the time it
takestofetchobjectsfromotherservers(Web
serversorothercaches). Thisdownloadtime
gives a crude estimate of the actual band-
width between the two hosts. The query-
inghostkeepsalistof serverswithwhich it
has had good connections and prefers those
serversto otherswhen bothtypes ofservers
arepresentin theLDSreply. Ofcourse, the
actualnetworkconditionschangeallthetime,
but the results presented in [7] on perfor-
mance characteristics of mirrorserversindi-
cate,that from alarge setof mirrorservers,
onlyasmallnumberneedtobeconsideredas
candidatesfordownload. Althoughthestudy
in[7]usesaxednumberofmirrorsites,we
believethattheresultscanbeappliedtosit-
uationswhere thenumberofserverschanges
dynamically.
Thesecond optionistopasssomeQoSin-
formationin the LDS repliesalong with the
age information about the object. For ex-
ample, a Web server can measure the con-
nection times of incoming connections, esti-
mate theconnectionbandwidthand takean
averageoftheestimatedbandwidthsoverall
connections. Thisaveragebandwidthcanbe
seenas thebandwidth that a randomclient
somewhere in the network could expect to
getwhenrequestingobjectsfromthisserver.
Likewise,cachescanmeasurethebandwidths
ofallout-goingconnectionsandcalculatethe
averagebandwidth.
The third option uses explicit routing in-
formation obtained overBGP by talking to
localrouters. Thisinformationgivesthehost
atopological map of the network which can
beusedtondouthowfareachoftheservers
intheLDSreplyare. Unfortunately,routing
informationgivesonlyreachability,notqual-
al.[4]havestudiedusingroutinginformation
in request routingand have decided against
usingBGPinformationbecauseofcongura-
tionandsecurityissues,amountofdataand
diÆculty of obtaining it. Their approach is
basedonusingwhois-services.
Regardless of the approach chosen, any
queryinghostcan implement any local poli-
cies necessary when deciding where to for-
ward arequest. For example, acache oper-
ator may want to forward requests to other
caches based on the domain of the origin
server.
6 Related Research
In[3]Gadde etal. comparetraditionalhi-
erarchicalcachingwithanarchitecturewhere
asingle centralized serverholds information
aboutwhereeachdocumentiscached. When
a low level cache wants to nd out where
an object is cached, it sends a message to
this central mapping server which responds
by either redirecting the request to a peer
serverorbyforwardingittotheoriginserver.
This scheme resultsin allof the objectsbe-
ing cached at institutional caches. The au-
thors alsoperformsimulationsusingasmall
numberofcachesand nd outthat theper-
formanceofthecentralizedsolutionisonav-
erage better than that of the traditional hi-
erarchy. The numberof caches in the simu-
lations is low, only 8 and 32 cache congu-
rations are simulated. The authors express
theirdoubtsaboutthescalabilityoftheirso-
lution,butpresentsomeideasforreplicating
anddistributingthelocationinformation.
Inour approach, the queryinghostgets a
listofallservers holdingacopyoftheobject
insteadofbeingredirectedtooneofthembya
mappingserver. Thisputs thequeryinghost
incontrol overwheretherequestwillbefor-
warded. This decisionmight greatlydepend
onlocal conditionsand policies unknown to
a central server. Another dierence in our
approach is that the location information is
distributedoverDNSwhichisubiquitousand
where data is cached near the browser and
location hints are passed through a meta-
data hierarchy. Their architecture arranges
thecachesinvirtualhierarchies,oneforeach
object, for distributing the metadata infor-
mation. In their architecture, when acache
cachesacopyofanobject,itsendsoutaloca-
tionhintindicatingtheURLanditsaddress.
This hint ispropagated in the virtualmeta-
datahierarchyandcouldeventuallyreachall
caches in the system. Caches keep track of
allthehintstheyhavereceivedandusethem
toforwardrequests. If acachereceivesare-
questforanobjectwhichisnotcachedlocally
but thehintsindicate another cache holding
the object, the request is forwarded to this
cache. Thevirtual metadatahierarchies are
constructedusing IP-addressesandURLs in
away which tries to minimize the distances
betweenparentsandchildren. Also,thehier-
archyguaranteesthatacachecanreceiveonly
onehintforanyobject. Thishintislikelyto
befromthenearestcacheholdingtheobject,
butthisisnotguaranteed.
One dierence between their and our ap-
proaches is the way caches are placed. In
their architecture only institutional caches
holdobjectsandeveryrequestforwardedus-
ingahintwouldbeforwardedtoanotherin-
stitutionalcache. Thisbehaviormightbeun-
acceptable to some people who do notwant
more external traÆc on their networks. In
our approach, objects are pushed to higher
levels andthis eliminates theneed for going
to other institutionalcaches. Secondimpor-
tantdierenceisthatusingLDSforlocating
objects,aqueryinghostgetsalistofallpos-
sible locations. It can then choose from the
list accordingto alocal policy orby adapt-
ing to current network conditions. In [13] a
cache receivesonly one location hint. Since
the virtual metadatatrees are based on IP-
addressesandURLs,therearenoguarantees
thatthishintindicatesagoodserver. Weare
currentlyperformingacomparisonofthetwo
schemes.
Amiret al.[1]study three dierent meth-
odsforredirectingrequeststomirroredsites{
HTTPredirection,DNSroundtriptimemea-
surements,and sharedIPaddresses. Intheir
DNS-basedsolutiontheyuseDNSroundtrip
replicated server for a client. They place
the replicated servers near an authoritative
DNS server which gives the address of the
replicatedserverwhenqueried fortheorigin
server.Thisresultsinclientsbeingeventually
redirected to their closest replicated server.
Therearetwomajordierencesbetweentheir
approachand ourreplicated serverapproach
(Section4.1). First,intheirsystemreplicated
serversmust always replicate the entire site
while in our system a server may replicate
onlysub-trees oftheoriginalsite. Second,in
their system the replicated servers must be
placedincloseproximityoftheauthoritative
DNSserver. Ourarchitectureplacesnocon-
straintsontheplacementoftheservers.
7 Future Work
WewillcontinueourworkonLDSbyper-
forming quantitativecomparisons of the dif-
ferent architectures presented in this paper.
In particular we will concentrate on evalu-
ating the amount of new traÆc in the net-
work causedby thelookupand updatemes-
sages. We will also compare the traditional
cachinghierarchywith ourdierentLDSar-
chitectures to nd out how much objectac-
cesslatencycanbereducedthroughLDS.We
willalsocloselystudytherequestroutingis-
sueinordertondouthowmuchinformation
isneededto makegoodroutingdecisions.
8 Conclusion
InthispaperwehavepresentedtheLoca-
tionDataSystem,anewnetworkapplication
that can be used to locate where copies of
objectsarestoredontheWeb. LDSprovides
a service similar to DNS and can be imple-
mentedasanextensiontoDNSanddeployed
incrementally. Wehavepresentedtwoappli-
cationsof LDS, locating objectsin mirrored
servers and locating objects in Web caches.
We have also discussed how clients can de-
cide the best server to forward requests to
fromthenetwork.
Acknowledgments
Wewould liketothankNeilze Dortafrom
INRIA forthediscussionswehadon access-
ingmirroredsites and herinput onthesub-
ject. TheloglesusedinSection2.1werepro-
videdbyNationalScienceFoundation(grants
NCR-9616602 and NCR-9521745), and the
NationalLaboratoryforAppliedNetworkRe-
search.
References
[1] Y. Amir, A. Peterson, and D. Shaw,
\Seamlessly Selecting the Best Copy
from Internet-Wide Replicated Web
Servers", in Proc. 12th International
Symposium on Distributed Computing
(DISC'98), Andros, Greece, September
1998.
[2] R. Fielding, J. Gettys, J. Mogul, H.
Frystyk,andT.Berners-Lee,\Hypertext
Transfer Protocol { HTTP/1.1", RFC
2068,January1997.
[3] S.Gadde,J.Chase,andM.Rabinovich,
\DirectoryStructuresforScalableInter-
netCaches",TechnicalReportCS-1997-
18, Department of Computer Science,
DukeUniversity,1997.
[4] C. Grimm, J.-S. Vockler, H. Pralle,
\Request Routing in Cache Meshes",
in Proc. 3rd Web Caching Workshop,
Manchester,UK, June1998.
[5] P.Mockapetris,\DomainNames{Con-
ceptsandFacilities",RFC1034,Novem-
ber1987.
[6] P. Mockapetris, \Domain Names { Im-
plementation and Specication", RFC
1035,November1987.
[7] A. Myers, P. Dinda, and H. Zhang,
port CMU-CS-98-157, School of Com-
puter Science, CarnegieMellon Univer-
sity,1997.
[8] A Distributed Testbed for Na-
tional Information Provisioning,
<URL:http://ircache.nlanr.net>.
[9] Reseau National de telecommuncations
pour la Technologie,
l'Enseignement et la recherche,
<URL:http://www.renater.fr>.
[10] P. Rodriguez, K. W. Ross,E. W. Bier-
sack,\ImprovingtheWWW:Cachingor
Multicast?", in Proc. 3rd Web Caching
Workshop,Manchester,UK,June1998.
[11] A. Rousskov, D. Wessels, \Cache Di-
gests",inProc.3rdWebCaching Work-
shop, Manchester, UK,June1998.
[12] Squid Internet Object Cache,
<URL:http://squid.nlanr.net>.
[13] R. Tewari, M. Dahlin, H. M. Vin,
and J. S. Kay, \Beyond Hierarchies:
Design Considerations for Distributed
CachingontheInternet",TechnicalRe-
portTR98-04,DepartmentofComputer
Sciences, UniversityofTexasatAustin.
[14] P. Vixie, S. Thomson, Y. Rekhter,and
J.Bound,\DynamicUpdatesintheDo-
main Name System (DNS UPDATE)",
RFC2136, April1997.
[15] D. Wessels, K. Clay, \Internet Cache
Protocol (ICP), version 2", RFC 2186,
September1997.