Cache Management - Caching Name Server Data

Caching Name Server Data

7.1 Cache Management

7.1.1 Caching for performance enhancements


Performing a name service operation may involve several interactions with name servers that are dispersed throughout a large internet environment. The high cost of resolving an object's name, however, can be substantially reduced if clients maintain local caches of recently acquired name server data that is likely to be reused in the future. A cache is an unauthoritative repository of object attributes. If a client consults the cache before querying the name service, the initial cost of utilizing the name service can be amortized over several object references, assuming the cost of accessing cached data is significantly lower than that of normal query operations.

In Chapter 5, the expected cost of a name server query, $E(L_u)$, is formulated in Equation 5.1 as a function of the client's access patterns and the cost of retrieving information

XEROX PARC, CSL-85-1, FEBRUARY 1985

With a cache, the expected cost of a lookup becomes

$$E(C) = C_{cache} + P_{miss} \, E(L_u)$$

where $P_{miss}$ is the probability that the desired information does not currently reside in the cache and $C_{cache}$ is the cost of accessing the cache. Observe that $E(C) < E(L_u)$ if the cache hit ratio, $1 - P_{miss}$, is greater than $C_{cache} / E(L_u)$. Thus, if the cache access cost is much less than the expected cost of a name server query, then caching results in significant gains, even for low cache hit ratios.
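The break-even condition can be checked numerically. The following is a minimal sketch, assuming the expected cost with a cache is the cache access cost plus the miss probability times the query cost. The function names and the millisecond figures are hypothetical and chosen only for illustration.

```python
def expected_cost_with_cache(c_cache: float, p_miss: float, e_lu: float) -> float:
    """Expected lookup cost when the cache is consulted first (assumed model)."""
    return c_cache + p_miss * e_lu


def break_even_hit_ratio(c_cache: float, e_lu: float) -> float:
    """Hit ratio above which the cache lowers the expected lookup cost."""
    return c_cache / e_lu


# Made-up costs: 1 ms per cache access, 200 ms per name server query.
c_cache, e_lu = 1.0, 200.0
print(f"break-even hit ratio: {break_even_hit_ratio(c_cache, e_lu):.3f}")
for hit_ratio in (0.001, 0.05, 0.5):
    cost = expected_cost_with_cache(c_cache, 1.0 - hit_ratio, e_lu)
    print(f"hit ratio {hit_ratio:.3f}: cost {cost:.1f} ms, beneficial: {cost < e_lu}")
```

With these assumed numbers, a hit ratio of only half a percent is already enough to break even, which illustrates why a cheap cache pays off even when most lookups miss.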

Two main factors contribute to a cache's low access time in comparison with a typical name server query. First, since the cached data is stored physically close to the users of that data, the large delays in conversing with distant name servers are avoided. Second, the expensive name resolution process for locating an authoritative server for the named object in question is unnecessary.

Caches are unauthoritative in that they are used for performance enhancement only; the maintainer of a cache may store or discard cached object attributes freely without disrupting the basic name service. Caches can reside in fast volatile storage since the loss of cached data, in the event of a processor crash for instance, does not adversely affect the functional operation of the distributed name service.

7.1.2 Hints vs. strong consistency

If the name service database were immutable so that no existing database entries were ever modified, then caching data in a distributed environment could accrue all of the performance benefits and add no complexity to the clients. Realistically, the information about an object may change under normal operating conditions. For instance, an object may migrate to a new machine in order to balance the loads across machines or because its original processor crashed; in this case, the "InternetAddress" attribute maintained by the name service for the object should be updated to reflect its new location.

One approach to maintaining cache consistency would be for the name servers to inform caches whenever data is updated. However, this requires elaborate cooperation between servers and clients and generates many extraneous messages. In very large internet environments, expecting a name server to know about all clients that may have cached data handed out by that server does not seem feasible. It would be difficult for the servers to maintain reliable records of what information was cached by whom. Such information would need to be maintained in stable storage so that it survives server crashes, and it might consume unreasonable amounts of storage space. Because of this difficulty in maintaining the validity of cached data, distributed systems designers often avoid caching.

An alternative approach is to treat the cached data as hints, which are not assumed to be completely accurate. Clients of a cache must be prepared to deal with updates to the name service database that do not automatically propagate to the cache. The detection of inaccurate cache entries and subsequent recovery must be done, in an application-specific way, by the applications that use the data. Application-level recovery is necessary since the appropriate action to take depends on the semantics of the data and how it is being used by the name service client.

Caches of hints have been advocated in the past [Clark 82] [Lampson 83]. The R* catalog manager [Lindsay 80] and the Grapevine mail service [Birrell et al. 82] both make extensive use of hints. Generally, hints about the location and availability of various services registered with a name service can be verified when clients attempt to make use of these services.
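The hint discipline described above can be sketched as follows. This is an illustrative sketch only, not code from Grapevine or R*: the class `HintCache`, the `works` callback, and the dictionary standing in for the name service are all hypothetical names introduced here.

```python
from typing import Callable, Dict, Optional


class HintCache:
    """Cache of hints: entries may be stale and are verified when used."""

    def __init__(self, authoritative_lookup: Callable[[str], str]) -> None:
        self._lookup = authoritative_lookup  # stands in for a name service query
        self._hints: Dict[str, str] = {}

    def resolve(self, name: str, works: Callable[[str], bool]) -> str:
        """Return a usable attribute value for `name`.

        `works` is the application-specific check that detects a bad hint,
        for example by trying to contact the service at the hinted address.
        """
        hint: Optional[str] = self._hints.get(name)
        if hint is not None and works(hint):
            return hint                  # the hint was accurate
        value = self._lookup(name)       # miss or stale hint: ask the name service
        self._hints[name] = value        # replace the hint with fresh data
        return value


# Hypothetical usage: a dictionary plays the role of the name service database.
database = {"printer": "host-A"}
cache = HintCache(lambda name: database[name])
is_live = lambda address: address == database["printer"]

first = cache.resolve("printer", is_live)    # cache miss, fetched from the database
database["printer"] = "host-B"               # the object migrates; the cache is not told
second = cache.resolve("printer", is_live)   # stale hint detected, then refreshed
print(first, second)
```

The cache never needs to be notified of the update; the stale hint is discovered only when the application tries to use it, which is the property that makes hints workable without server-side bookkeeping.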

7.1.3 Cache accuracy

At any given point in time, each cache entry is either invalid or valid depending on whether or not the corresponding name service database entry has been modified unbeknownst to the cache manager. The accuracy level of a cache is defined to be the percentage of cache entries that are currently valid. This static measure of accuracy can be obtained by comparing a snapshot of a given cache with the name service database.

The percentage of cache lookups that return valid data to a client determines the observed accuracy level. This is a more dynamic notion of cache accuracy, but is difficult to quantify since it depends on the access patterns of clients over time. The observed accuracy level varies from client to client, whereas the static cache accuracy level remains independent of client behavior.

As with most caches, the hit ratio denotes the percentage of lookup requests that can be answered by cached data [Smith 82], regardless of the data's accuracy. With caches of hints, clients are perhaps more interested in the accurate-hit ratio obtained by multiplying the hit ratio by the accuracy level. Both of these measures are highly dependent on client reference patterns and the cache management strategy.
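The three measures defined in this section can be computed from a snapshot of a cache, the database, and a trace of lookups. The sketch below assumes both the cache and the database are simple name-to-value dictionaries; the function names and sample data are hypothetical.

```python
from typing import Dict, Sequence


def accuracy_level(cache: Dict[str, str], database: Dict[str, str]) -> float:
    """Fraction of cache entries that still match the name service database."""
    if not cache:
        return 1.0  # assumption: an empty cache holds no invalid entries
    valid = sum(1 for name, value in cache.items() if database.get(name) == value)
    return valid / len(cache)


def hit_ratio(cache: Dict[str, str], lookups: Sequence[str]) -> float:
    """Fraction of lookups answered from the cache, regardless of accuracy."""
    if not lookups:
        return 0.0
    return sum(1 for name in lookups if name in cache) / len(lookups)


def accurate_hit_ratio(cache: Dict[str, str], database: Dict[str, str],
                       lookups: Sequence[str]) -> float:
    """Hit ratio multiplied by the static accuracy level, as defined in the text."""
    return hit_ratio(cache, lookups) * accuracy_level(cache, database)


# Made-up snapshot: entry "c" has changed in the database since it was cached.
database = {"a": "10.0.0.1", "b": "10.0.0.2", "c": "10.0.0.9",
            "d": "10.0.0.4", "e": "10.0.0.5"}
cache = {"a": "10.0.0.1", "b": "10.0.0.2", "c": "10.0.0.3", "d": "10.0.0.4"}
lookups = ["a", "b", "e", "e"]

print(accuracy_level(cache, database))
print(hit_ratio(cache, lookups))
print(accurate_hit_ratio(cache, database, lookups))
```

Note that the product uses the static accuracy level; the observed accuracy for this particular trace would differ, since neither of the two hits touches the stale entry.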

7.1.4 A new approach to cache management

This chapter concentrates on techniques for managing cached data that may not be completely accurate, that is, caches of hints. Because of the distributed nature of the system and


outside of the realm of the name service, is not known by the authoritative name servers.

The individual applications or hosts that choose to cache name server data must unilaterally maintain the validity of that data since they do not participate in the usual name service maintenance operations.

The performance benefits obtained from a cache depend on the cost of accessing the cache, $C_{cache}$; the cost of detecting invalid cache entries for various client applications and types of data, $C_{detect}$; the cost of accessing the name service, $C_{NS} = E(L_u)$, obtained from Chapter 5; the update activity to the name service database; clients' referencing behavior; and the way in which the cache is managed. Suppose that the accuracy of the cache is expressed by the probability $P_{correct}$ and the hit ratio is given by $P_{hit}$; the expected cost of a name service query becomes

$$E(C) = C_{cache} + (1 - P_{hit}) \, C_{NS} + P_{hit} \, (1 - P_{correct}) \, (C_{detect} + C_{NS}) \qquad (7.25)$$

where $C_{detect}$ depends on the particular application. The cache management algorithm must determine what information should be maintained in the cache and what should be discarded so as to maximize the benefit of the cache to its clients.
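Equation 7.25 translates directly into a small function, which makes it easy to experiment with different parameter values. This is a sketch; the function name and the sample costs are hypothetical, and the parameters mirror the symbols of the equation.

```python
def expected_cost(c_cache: float, c_ns: float, c_detect: float,
                  p_hit: float, p_correct: float) -> float:
    """Expected cost of a name service query through a cache of hints (Eq. 7.25)."""
    miss_cost = (1.0 - p_hit) * c_ns                               # cache miss
    bad_hit_cost = p_hit * (1.0 - p_correct) * (c_detect + c_ns)   # hit on invalid data
    return c_cache + miss_cost + bad_hit_cost


# Made-up costs in milliseconds.
for p_correct in (0.5, 0.9, 1.0):
    cost = expected_cost(c_cache=1.0, c_ns=200.0, c_detect=50.0,
                         p_hit=0.8, p_correct=p_correct)
    print(f"P_correct = {p_correct:.1f}: expected cost is about {cost:.1f} ms")
```

A valid hit contributes only the cache access cost, which every lookup pays; the two remaining terms cover misses and hits on invalid entries, both of which end in a full name service lookup.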

Current cache memories for modern computer systems attempt to maximize the hit ratio for a fixed-size cache by utilizing intelligent cache replacement algorithms [Smith 82]. Many distributed systems that cache hints, such as Grapevine or R*, allow the size of the cache to grow indefinitely (by storing it on secondary storage); entries are only purged from the cache when detected invalid. Essentially, these systems also maximize the cache hit ratio.

However, this simple scheme, which ignores the cache accuracy, may not be optimal, and may perform quite badly for data that changes frequently.

As a demonstration of why maximizing the hit ratio, or even the accurate-hit ratio, is suboptimal, suppose one cache experiences a hit ratio of $P_{hit}$ while a second maintains a slightly higher hit ratio of $P_{hit} + \Delta$. Assume that both caches have the same accuracy level (though, in reality, the accuracy level is probably a decreasing function of the hit ratio for a variable-size cache). The client observing the higher hit ratio gets a lower lookup cost if

$$C_{cache} + (1 - P_{hit} - \Delta) \, C_{NS} + (P_{hit} + \Delta)(1 - P_{correct})(C_{detect} + C_{NS}) < C_{cache} + (1 - P_{hit}) \, C_{NS} + P_{hit} (1 - P_{correct})(C_{detect} + C_{NS})$$

$$\Longrightarrow \; -C_{NS} + (1 - P_{correct}) \, C_{detect} + (1 - P_{correct}) \, C_{NS} < 0$$

$$\Longrightarrow \; C_{detect} < \frac{P_{correct}}{1 - P_{correct}} \, C_{NS}$$

In other words, increasing the hit ratio increases the amount of invalid data returned to a client as well as improving the accurate-hit ratio. Thus, whether benefits are obtained from higher hit ratios depends on the cost of recovering from invalid data relative to the cost of straight name service lookups.
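The trade-off can be made concrete by evaluating the marginal effect of the hit ratio on the expected cost. The sketch below assumes the accuracy level stays fixed as the hit ratio grows, as in the derivation; the function names and numbers are hypothetical.

```python
def marginal_cost_of_hit_ratio(c_ns: float, c_detect: float, p_correct: float) -> float:
    """Change in expected cost per unit increase in hit ratio, accuracy held fixed."""
    return -c_ns + (1.0 - p_correct) * (c_detect + c_ns)


def detect_cost_threshold(c_ns: float, p_correct: float) -> float:
    """Detection cost below which a higher hit ratio lowers the expected cost."""
    if p_correct >= 1.0:
        return float("inf")  # a perfectly accurate cache always gains from more hits
    return p_correct / (1.0 - p_correct) * c_ns


# Made-up costs: 200 ms name service lookup, cache entries valid half of the time.
c_ns, p_correct = 200.0, 0.5
print("threshold:", detect_cost_threshold(c_ns, p_correct))
for c_detect in (100.0, 300.0):
    slope = marginal_cost_of_hit_ratio(c_ns, c_detect, p_correct)
    verdict = "helps" if slope < 0 else "hurts"
    print(f"C_detect = {c_detect}: a higher hit ratio {verdict}")
```

A negative marginal cost means extra hits pay off; once the detection cost exceeds the threshold, each additional hit costs more on average than going straight to the name service.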

Optimal cache management involves maintaining a level of cache accuracy and a hit ratio that maximizes the benefit of the cache to its clients. Optimizing Equation 7.25, however, is difficult since the two variables, $P_{hit}$ and $P_{correct}$, are not independent. For a variable-size cache in which only the most accurate information is retained, they are related through the size of the cache: a higher accuracy results in a smaller cache, which results in a smaller hit ratio; unfortunately, the relation cannot be easily quantified.

This chapter proposes a new approach to caching hints that guarantees a performance benefit from the cache, but does not attempt to derive an optimal management strategy. The agent managing the cache simply maintains a minimum level of cache accuracy. Initially, the size of the cache is limited only by the desired accuracy level. The minimum level of cache accuracy can be derived by observing that, at the very least, the cost incurred by using cached data should be less than the cost of retrieving the data directly from the name service. That is, $E(C)$ should be less than $C_{NS}$.

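Assuming the cost of accessing the cache is negligible compared to the cost of a name service lookup, $C_{cache} \ll C_{NS}$,

$$(1 - P_{hit}) \, C_{NS} + P_{hit} (1 - P_{correct})(C_{detect} + C_{NS}) < C_{NS}$$

$$\Longrightarrow \; P_{correct} > 1 - \frac{C_{NS}}{C_{detect} + C_{NS}}$$

$$\Longrightarrow \; P_{correct} > \frac{C_{detect}}{C_{detect} + C_{NS}}$$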

This inequality therefore gives a lower bound for the desired cache accuracy. Generally speaking, the level of accuracy should be based on the cost of recovering from invalid cache data to achieve a successful cache management policy. If the detection cost is substantial, then the cache manager should make an effort to keep a high level of cache accuracy.
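The lower bound is simple to evaluate for a given application. The following sketch assumes, as the derivation does, that the cache access cost is negligible; the function names and the cost figures are hypothetical.

```python
def min_accuracy(c_detect: float, c_ns: float) -> float:
    """Lower bound on cache accuracy for the cache to beat direct lookups."""
    return c_detect / (c_detect + c_ns)


def cache_is_beneficial(p_correct: float, c_detect: float, c_ns: float) -> bool:
    """True if the given accuracy level exceeds the lower bound."""
    return p_correct > min_accuracy(c_detect, c_ns)


# Made-up costs in milliseconds for a 200 ms name service lookup.
print(min_accuracy(c_detect=50.0, c_ns=200.0))    # cheap detection
print(min_accuracy(c_detect=600.0, c_ns=200.0))   # expensive detection
```

With cheap detection the cache can tolerate mostly stale entries, whereas an application whose recovery costs three times a lookup needs at least three out of four entries to be valid.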


techniques for estimating the accuracy of particular cache entries based on information about the lifetime of named objects. To maintain the desired accuracy level, cached data that is suspected of being invalid should be either purged or revalidated. Section 7.3 examines general techniques for revalidation of cache entries. The next section discusses mechanisms for using and caching name service data in more detail.
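One possible purging policy, shown purely as an illustration, drops the entries judged least likely to be valid until the average estimate reaches the desired accuracy level. This is an assumption made for the sketch, not the estimation technique developed in the source: the per-entry validity estimates are taken as given, and `purge_to_accuracy` is a hypothetical name.

```python
from typing import Dict


def purge_to_accuracy(estimates: Dict[str, float], target: float) -> Dict[str, float]:
    """Drop the least trustworthy entries until the mean estimate meets `target`.

    `estimates` maps each cached name to an assumed probability that its
    entry is still valid. How such estimates are obtained is outside this sketch.
    """
    kept = dict(estimates)
    # Consider entries in order of increasing estimated validity.
    for name in sorted(estimates, key=estimates.get):
        if not kept or sum(kept.values()) / len(kept) >= target:
            break
        del kept[name]
    return kept


# Made-up validity estimates for four cached names.
estimates = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.5}
print(sorted(purge_to_accuracy(estimates, target=0.8)))
```

Revalidating a suspect entry instead of purging it would keep the hit ratio higher at the price of an extra name service interaction, which is the choice the text leaves open.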

From the document Distributed Name Servers: Naming and Caching in Large Distributed Computing Environments (pages 132-137)