

From the document Replication Control in Distributed B-Trees (pages 129-137)

Dynamic Replication

7.4 Dynamic Algorithm – Improved Re-mapping

The re-mapping algorithm used in the previous section assumed that each B-tree node does not know the locations of all copies of its parent and children. As a result, the parent must be involved in processing all re-mappings when the replication of a node changes. In this section we explore the potential benefits from allowing the master copy of a node to know (within the limits of this knowledge being kept up to date) the location of all copies of its parent.

In this modification, knowledge of parent locations is kept up to date by sending a copy of the location map to the master copy of each child when the replication of a tree node changes.

If each node has this information about its parent, then when the replication of a tree node changes, the master copy of the node can directly perform the re-mapping of parent copies to its own copies, without involving the master copy of the parent. We also made one additional change: rather than telling each copy of the parent about only one copy to which it can forward descending tree operations, we send each copy the full location map of its children and allow a random selection from the full set of copies each time an operation is forwarded.
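The two changes above can be sketched in a few lines. This is our own illustrative Python, not code from the simulator; the names NodeCopy, forward, and on_replication_change are hypothetical stand-ins for the simulator's message handlers.

```python
import random

class NodeCopy:
    """One copy of a B-tree node holding the full location map of each child.

    Sketch of the improved scheme: instead of a single forwarding target per
    child, every copy keeps the child's full location map and picks a
    destination at random for each descending operation.
    """

    def __init__(self, node_id, child_maps):
        self.node_id = node_id
        # child id -> list of processors holding a copy of that child
        self.child_maps = child_maps

    def forward(self, child_id):
        # Random selection from the full set of copies spreads load evenly.
        return random.choice(self.child_maps[child_id])

def on_replication_change(master, new_locations, parent_copies, child_masters):
    """When a node's replication changes, its master re-maps every copy of the
    parent directly (the parent's master is not involved) and sends the new
    location map to the master copy of each child."""
    for parent in parent_copies:
        parent.child_maps[master.node_id] = list(new_locations)
    for child in child_masters:
        child.parent_locations = list(new_locations)
```

The extra messages to each child's master are the source of the slightly higher overhead reported below for the uniform access portion of the simulation.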

The results for a time lag of 10,000 and several different values of the threshold are shown in figure 7-13. Performance for the uniform access portion of the simulation is very similar to, but slightly lower than, that of our initial model. There is slightly more overhead in sending location maps and making forwarding decisions, and this updated algorithm must also send a message to the master copy of each child.

When access is limited to 10% of the search space, the updated algorithm exhibits better performance for all values of the access threshold. For the cases with large values of the access threshold, the throughput shows a similarly shaped curve, but with consistently higher throughput. For the simulations with lower access thresholds, throughput no longer tails off as the simulation progresses. With the elimination of the re-mapping bottleneck at the "pseudo-root", throughput is significantly higher and can continue to grow as the cache contents are adjusted.

[Figure 7-13: Throughput, Improved Re-mapping, Time lag = 10,000. Axes: Operations (10^4) vs. Throughput (10^-3 Operations/Cycle); one curve per Access Threshold = 5, 10, 20, 50, 100.]

7.5 Future Directions

In algorithms of the type presented in this chapter, when the cache reaches "steady state", overhead does not drop to zero. Instead, nodes are added and removed from caches with no significant net change in the use of replication, merely a shuffling of the cache contents. We have begun to explore "centralized" control of replication to reduce this steady-state overhead. It is based on the distributed capture of access counts at each copy of a node, but replication change decisions are made centrally by the master copy of a node. For much of the time this new algorithm is active, the only overhead is the accumulation of access counts. When it is time to review and possibly change replication (determined by a time interval or a number of accesses to a tree node), rebalancing of the B-tree is started at the root node. The root node polls each of its copies for their local access count, which is then reset to zero. The sum of the counts indicates the number of operations that have passed through the root node since the last rebalance and serves as the measure for 100% relative frequency of access. As in the algorithm tested earlier, the root would generally be kept fully replicated. When


any necessary changes in the replication of the root are completed, the new location map of the root and the count of the total number of operations are passed to each of its children. Each child begins a process similar to that performed at the root. It first polls its copies for their access counts and sums the results. The ratio of that sum to the total operations through the system gives the relative frequency of access to the tree node. Relative frequency of access is translated into the desired number of copies using curves such as those developed in chapter 6.

If more copies are desired than currently exist, additional copies are sent to randomly selected processors not currently holding copies. If fewer copies are desired than currently exist, some processors are instructed to remove their copies. When these replication adjustments have been made, the node then remaps the copies of its parent to its own copies. Finally, it forwards its new location map and the total operation count to its own children.
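The top-down rebalancing pass described above can be sketched as follows. This is our own illustrative Python, with message passing collapsed into direct calls; desired_copies is a hypothetical stand-in for the chapter 6 curves, and the re-mapping of parent copies is noted only in a comment.

```python
class TreeNode:
    """A B-tree node in the sketched centralized rebalancing scheme.

    Each copy accumulates a local access count; the master polls and resets
    these counts as the rebalance sweeps down from the root.
    """

    def __init__(self, node_id, children=()):
        self.node_id = node_id
        self.children = list(children)
        self.copy_counts = {}   # processor -> accesses since last rebalance
        self.locations = []     # processors currently holding a copy

    def poll_and_reset(self):
        # Poll each copy for its local access count, then reset it to zero.
        total = sum(self.copy_counts.values())
        self.copy_counts = {p: 0 for p in self.copy_counts}
        return total

def desired_copies(relative_freq, max_copies):
    # Placeholder for the chapter 6 curves; simple proportionality here.
    return max(1, round(relative_freq * max_copies))

def rebalance(node, total_ops, max_copies, free_processors):
    """Adjust one node's replication, then recurse to its children."""
    count = node.poll_and_reset()
    rel_freq = count / total_ops if total_ops else 0.0
    target = desired_copies(rel_freq, max_copies)
    while len(node.locations) < target and free_processors:
        node.locations.append(free_processors.pop())  # send an extra copy
    while len(node.locations) > target:
        node.locations.pop()                          # remove a copy
    # Here the node would re-map its parent's copies to node.locations and
    # forward its new location map plus total_ops to each child.
    for child in node.children:
        rebalance(child, total_ops, max_copies, free_processors)

def rebalance_root(root, max_copies, free_processors):
    # The root's own count defines 100% relative frequency of access;
    # the root itself is generally kept fully replicated.
    total = root.poll_and_reset()
    for child in root.children:
        rebalance(child, total, max_copies, free_processors)
    return total
```

Between sweeps, the only work in this sketch is incrementing copy_counts, which matches the claim that overhead between rebalancings is virtually zero.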

While this algorithm can introduce a potentially heavy burden while it rebalances, between rebalancings it has virtually no overhead. Further, if there is little or no need for change during a rebalancing, overhead remains quite low. This algorithm would be weakest when the pattern of access changes quickly and dramatically.

7.6 Summary

In this chapter we have taken the results of prior chapters that indicated how replication could be optimally used given a static access pattern, and successfully applied those results using a dynamic replication control algorithm. We introduced a simple algorithm for dynamic control of B-tree replication in response to observed access patterns. Through simulation we showed that it does respond to observed access patterns and that it produces a replicated B-tree that, with the overhead of dynamic cache management turned off, matches the throughput produced by the best of our static replication algorithms. When dynamic cache management is active, of course, the overhead of management does reduce the throughput. We also introduced an update to this simple algorithm to eliminate potential bottlenecks and demonstrated that the update had a noticeably beneficial effect.

Conclusions

Our objective in starting the work described in this report was to investigate two hypotheses:

1. Static Performance: Given a network, a B-Tree and a static distribution of search keys, it is possible to predict the performance provided by a static replication strategy.

2. Dynamic Balancing: Under certain changing load patterns, it is possible to apply the knowledge of static performance and change dynamically the replication of B-Tree nodes to increase overall performance.

In this work we have shown both of these hypotheses to be true. In doing so we have expanded on prior knowledge and assumptions on how replication can best be used with distributed B-trees.

In investigating the first hypothesis, we demonstrated and described, through modeling and simulation, the trade-off between replication and performance in a distributed B-tree. Earlier work had used heuristics to select a single point for the appropriate amount of replication to use. We developed insights into the optimal relationship between the relative frequency of access to a node and the number of copies to make of it. While prior work assumed that replication should be proportional to relative frequency of access, we showed that the optimal relationship appears to be a slight variation of that: more copies should be made of frequently used nodes and fewer copies of less frequently accessed nodes. We also showed that B-trees built using the prior heuristics, or any static placement algorithm, provided good performance (as measured by throughput) only when the pattern of access is fairly uniform. Finally, we showed

that, particularly for large B-trees, the prior heuristic approaches can use far more space than appears appropriate for the additional increase in performance.

We used the results from our analysis of static algorithms to direct our investigation of our second hypothesis on dynamic replication control. We introduced a simple algorithm for dynamic control of processor caches and demonstrated that dynamic replication control for B-trees is practical. This initial work presented the continuing challenge of lowering the overhead necessary to support B-tree caching.

The main avenue for future work is in dynamic control of replication. There are two directions in which future work can proceed. First, algorithms such as the one presented here can be fine-tuned and adjusted to reduce overhead. They can also be extended to dynamically adapt the values of the controlling parameters in response to changing operation load. Second, radically different approaches such as the "centralized" balancing algorithm described in section 7.5 can be explored. In both cases the objective is to create an algorithm that can react quickly to changes in the access pattern, but present low overhead when the access pattern is stable.

An additional direction for future work extends from our comments in chapter 6 that B-tree performance can be improved by creating a more balanced distribution of nodes and copies than random placement can provide. Future work on any dynamic replication control algorithm, and particularly the "centralized" approach of section 7.5, would benefit from additional work on low-cost load balancing techniques.

"Ideal" Path-to-Root Space Usage

In chapter 2 we indicated that the "ideal" path-to-root model will use space such that, on average, the number of copies per node n levels above the leaves, for a tree of depth h and branch factor BF, distributed across P processors, is:

average number of copies = P · BF^(n−h) + 1 − P / BF^h

To prove this result we first introduce the symbol m to stand for the number of descendant leaf nodes below an intermediate node, and the symbol lp to stand for the average number of leaf nodes per processor. Given a node with m descendant leaf nodes, our objective is to determine the number of processors that one or more of the m leaves will be found on, and thus the total number of copies that must be made of the intermediate-level node.

"Ideal" placement means that there are lp leaf nodes on each processor and that the logically first lp nodes are on the first processor, the logically second lp nodes are on the second processor, and so on. An "ideal" placement of m leaves covers a minimum of ⌈m/lp⌉ processors. Similarly, it covers a maximum of ⌈m/lp⌉ + 1 processors.

We call an alignment the pattern of distribution of m nodes across processors, defined by the number of nodes placed on the first processor in the sequence. For example, if 7 nodes are placed on processors with 4 nodes per processor, there are 4 distinct patterns possible:

4 nodes on the first processor in sequence, 3 on the next processor;

3 on the first processor, 4 on the next processor;

2 on the first processor, 4 on the next processor, 1 on the next after that;

1 on the first processor, 4 on the next processor, 2 on the next after that.

[Figure A-1: Alignments Covering Maximum Processors]

There are always lp possible alignments, after which the cycle repeats. The maximum number of processors is covered for (m − 1) mod lp of the alignments. When an alignment has only one leaf node on the right-most processor it is covering, it will be covering the maximum number of processors. (The only exception is if (m − 1) mod lp = 0, in which case all alignments cover the minimum number of processors.) As the alignment is shifted right, there would be (m − 2) mod lp additional alignments covering the maximum number of processors. (See figure A-1.) The minimum number of processors is covered by the rest of the alignments, or lp − (m − 1) mod lp of the alignments.

Combining these pieces produces:

average number of copies = [⌈m/lp⌉ · (lp − (m − 1) mod lp) + (⌈m/lp⌉ + 1) · ((m − 1) mod lp)] / lp

or

average number of copies = [⌈m/lp⌉ · lp + (m − 1) mod lp] / lp

We evaluate the numerator for two cases. First, when m mod lp = 0 (and m, lp > 0), ⌈m/lp⌉ · lp = m and (m − 1) mod lp = lp − 1, the sum being m + lp − 1. Second, when m mod lp ≠ 0, ⌈m/lp⌉ · lp = m + lp − (m mod lp) and (m − 1) mod lp = (m mod lp) − 1, the sum again being m + lp − 1.

This yields:

average number of copies = (m + lp − 1) / lp
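The alignment counting above can be confirmed by brute force. The following sketch (our own verification code, not part of the report) slides a block of m consecutive leaves across processors holding lp leaves each and averages the number of processors covered over the lp distinct alignments:

```python
def processors_covered(start, m, lp):
    """Number of processors touched by m consecutive leaves beginning at
    leaf index `start`, with lp leaves stored per processor."""
    first = start // lp
    last = (start + m - 1) // lp
    return last - first + 1

def average_copies(m, lp):
    # Average over the lp distinct alignments of the block of m leaves.
    return sum(processors_covered(s, m, lp) for s in range(lp)) / lp
```

For the worked example above (m = 7, lp = 4), the four alignments cover 2, 2, 3, and 3 processors, averaging 2.5, which equals (7 + 4 − 1) / 4.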

For a tree of depth h, with branch factor BF, on P processors, the average number of leaf nodes per processor is BF^h / P. The number of descendant leaf nodes for a node n levels above the leaves is BF^n, thus:

average number of copies = (BF^n + BF^h/P − 1) / (BF^h/P)

or

average number of copies = P · BF^(n−h) + 1 − P / BF^h
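The final substitution can be checked numerically. The sketch below (our own, using arbitrary small example parameters) compares the closed form against the per-level average (m + lp − 1) / lp with m = BF^n and lp = BF^h / P:

```python
def avg_copies_closed(P, BF, n, h):
    # Closed-form expression for copies of a node n levels above the leaves.
    return P * BF ** (n - h) + 1 - P / BF ** h

def avg_copies_from_leaves(P, BF, n, h):
    m = BF ** n        # descendant leaf nodes of the intermediate node
    lp = BF ** h / P   # average leaf nodes per processor
    return (m + lp - 1) / lp
```

Both expressions agree for every level of, say, a depth-5 tree with BF = 4 on 16 processors.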
