Object replication problems

5 Terminology and basic algorithms

5.12 Object replication problems

We now describe a real-life graph problem based on web/data replication, which also requires dynamic distributed solutions.

1. Consider a weighted graph (N, L), wherein k users are situated at some $N_k \subseteq N$ nodes, and r replicas of a data item can be placed at some $N_r \subseteq N$. What is the optimal placement of the replicas if k > r and the users access the data item in read-only mode?

A solution requires evaluating all placements of $N_r$ among the nodes in N to identify

\[ \min \sum_{i \in N_k,\; r_i \in N_r} \mathit{dist}_{i, r_i}, \]

where $\mathit{dist}_{i, r_i}$ is the cost from node i to $r_i$, the replica nearest to i.

2. If we assume that the read accesses from each of the users in $N_k$ have a certain frequency (or weight), the minimization function changes: each distance term is weighted by the access frequency of the corresponding user.

3. If each edge has a certain bandwidth or capacity, that too has to be taken into account in identifying a feasible solution.

4. Now assume that a user access to the shared data is a read operation with probability x, and an update operation with probability 1 − x. An update operation also requires all replicas to be updated. What is the optimal placement of the replicas if k > r?

Many such graph problems have no known polynomial-time solution, even in the static case. With dynamically changing input parameters, an optimal solution appears even further out of reach. Fortunately, heuristics can often be used to provide good solutions.
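The exhaustive evaluation described for problems 1 and 2 can be sketched as follows. This is a minimal illustration, not a method given in the text: the function names, the dictionary-based graph encoding, and the example graph are all assumptions, and the search is exponential in r, so it is only practical for very small inputs.

```python
from itertools import combinations

def all_pairs_shortest_paths(nodes, edges):
    """Floyd-Warshall on an undirected weighted graph given as {(u, v): weight}."""
    inf = float("inf")
    dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for m in nodes:
        for u in nodes:
            for v in nodes:
                if dist[u][m] + dist[m][v] < dist[u][v]:
                    dist[u][v] = dist[u][m] + dist[m][v]
    return dist

def best_placement(nodes, edges, users, r, freq=None):
    """Try every r-subset of nodes; return (cost, placement) with minimum total cost.

    freq (hypothetical parameter) maps each user to a read frequency, as in
    problem 2; it defaults to 1 for every user, which gives problem 1.
    """
    dist = all_pairs_shortest_paths(nodes, edges)
    freq = freq or {i: 1 for i in users}
    best = None
    for placement in combinations(nodes, r):
        # Each user reads from its nearest replica.
        cost = sum(freq[i] * min(dist[i][p] for p in placement) for i in users)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best

# Hypothetical 5-node path a - b - c - d - e with unit edge weights.
nodes = ["a", "b", "c", "d", "e"]
edges = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 1}
users = ["a", "b", "d", "e"]
cost, placement = best_placement(nodes, edges, users, r=2)
```

Ties are broken in favor of the first placement enumerated, so several equally good placements may exist for a given input.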

5.12.1 Problem definition

In a large distributed system, data replication is useful for rapid access to data and for fault-tolerance. Here we look at Wolfson et al.’s optimal data replication strategy, which is dynamic in that it adapts to the read and write patterns from the different nodes [37]. Let the network be modeled by the graph (V, E), and let us focus on a single object for simplicity. Define a replication scheme as a subset R of V such that each node in R has a replica of the object. Let $r_i$ and $w_i$ denote the rates of reads and writes issued by node i.

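Let $c_r^i$ and $c_w^i$ denote the cost of a read and of a write issued by node i. Let $\mathcal{R}$ denote the set of all possible replication schemes. The goal is to minimize the cost of the replication scheme:

\[ \min_{R \in \mathcal{R}} \left[ \sum_{i \in V} r_i \cdot c_r^i + \sum_{i \in V} w_i \cdot c_w^i \right] \tag{5.3} \]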

The algorithm assumes one-copy serializability, which can be implemented by the read-one-write-all (ROWA) policy. ROWA can be strictly implemented in conjunction with a concurrency control mechanism such as two-phase locking; however, lazy propagation can also be used for weaker semantics.
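To make the objective concrete, the sketch below evaluates the cost of a replication scheme and searches all schemes exhaustively. It assumes that the cost of a scheme is the rate-weighted sum of per-node read and write costs, and it adopts a simple cost model that is not taken from the text: a read costs the hop distance to the nearest replica, and a ROWA write sends one message to every replica along shortest paths. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def hop_distances(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def scheme_cost(adj, scheme, reads, writes):
    """Sum over nodes i of reads[i] * read_cost(i) + writes[i] * write_cost(i).

    Assumed cost model (not from the text): a read costs the hop distance to
    the nearest replica; a write sends one message to every replica.
    """
    total = 0
    for i in adj:
        d = hop_distances(adj, i)
        c_read = min(d[x] for x in scheme)
        c_write = sum(d[x] for x in scheme)
        total += reads.get(i, 0) * c_read + writes.get(i, 0) * c_write
    return total

def optimal_scheme(adj, reads, writes):
    """Exhaustively search all non-empty subsets of V (exponential in |V|)."""
    nodes = sorted(adj)
    candidates = (set(c) for k in range(1, len(nodes) + 1)
                  for c in combinations(nodes, k))
    return min(candidates, key=lambda s: scheme_cost(adj, s, reads, writes))

# Hypothetical path a - b - c: heavy readers at the ends, one writer in the middle.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
read_heavy = optimal_scheme(adj, reads={"a": 5, "c": 5}, writes={"b": 1})
write_heavy = optimal_scheme(adj, reads={"a": 1, "c": 1}, writes={"b": 10})
```

Under this model, read-dominated workloads pull replicas towards the readers, while write-dominated workloads favor a single replica near the writer, which matches the trade-off the adaptive algorithm is designed to track.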

5.12.2 Algorithm outline

For arbitrary graph topologies, minimizing the cost as in Eq. (5.3) is NP-complete. So we assume a tree topology T, as shown in Figure 5.14.

The nodes in the replication scheme R are shown in the ellipse. If T is allowed to be a tree overlay on the network topology, then all algorithm communication is confined to the overlay. Conceptually, the set of nodes R containing the replicas is an amoeba-like connected subgraph that moves around the overlay tree T towards the “center of gravity” of the read and write activity. The amoeba-like subgraph expands when the relative cost of the reads is more than that of writes, and shrinks when the relative cost of writes is more than that of reads, reaching an equilibrium under steady-state activity.

This equilibrium-state subgraph for the replication scheme is optimal. The algorithm executes in steps that are separated by predetermined time periods or “epochs.” Irrespective of the initial replication scheme, the algorithm converges to the optimal replication scheme in (diameter + 1) steps once the read-and-write pattern stabilizes.

5.12.3 Reads and writes

Read

A read operation is performed from the closest replica on the tree T. If the node issuing the read query or receiving a forwarded read query is not in R, it forwards the query towards the nodes in R along the tree edges; for this, it suffices that a parent pointer points in the direction of the subgraph R. Once the query reaches a node in R, the value read is returned along the same path.

Figure 5.14 The tree topology and the replication scheme R. Nodes inside the ellipse belong to the replication scheme.

Write

A write is performed to every replica in the current replication scheme R.

If a write operation is issued by a node not in R, the operation request is propagated to the closest node in R, as for a read request. Once a write operation reaches a node i in R, the local replica is updated, and the operation is propagated to all neighbors of i that belong to R. To implement this, a node needs to track the set of its neighbors that belong to R. This is done using a variable, R-neighbor.

Implementation

To execute a read or write operation, a node needs to know (i) whether it is in R (so it can read/write from the local replica), (ii) which of its neighbors are in R (to propagate write requests), and (iii) if the node is not in R, then which of its neighbors is the unique node that leads on the tree to R (so it can propagate read and write requests). After appropriate initialization, this information is always locally available by tracking the status of the neighbor nodes.
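The three pieces of local information listed above are enough to route reads and writes. The following is a minimal single-process sketch, assuming hypothetical field names (`in_R`, `r_neighbors`, `toward_R`) and modeling message passing as direct function calls; it also assumes a write is not sent back to the neighbor it arrived from, which the text does not state explicitly.

```python
class Node:
    """Local state a node keeps: membership in R, R-neighbors, pointer toward R."""
    def __init__(self, name):
        self.name = name
        self.in_R = False
        self.r_neighbors = []   # neighbors known to be in R
        self.toward_R = None    # unique neighbor leading to R (used when not in R)
        self.value = None       # local replica, meaningful only when in_R

def read(node):
    """Forward along toward_R pointers until a replica is reached; return (value, path)."""
    path = [node.name]
    while not node.in_R:
        node = node.toward_R
        path.append(node.name)
    return node.value, path

def write(node, value):
    """Route the write to the nearest replica, then propagate it over R-neighbors."""
    while not node.in_R:
        node = node.toward_R
    updated = []

    def propagate(n, sender):
        n.value = value
        updated.append(n.name)
        for m in n.r_neighbors:
            if m is not sender:      # assumption: do not echo back to the sender
                propagate(m, n)

    propagate(node, None)
    return updated

# Hypothetical path a - b - c - d with R = {b, c}.
a, b, c, d = (Node(n) for n in "abcd")
b.in_R = c.in_R = True
b.r_neighbors, c.r_neighbors = [c], [b]
a.toward_R, d.toward_R = b, c

updated = write(d, 7)    # d is outside R, so the write is routed to c first
value, path = read(a)    # a is outside R, so the read is forwarded to b
```

Because R is a connected subtree of T, propagation that never returns to the sender visits each replica exactly once and terminates.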

5.12.4 Converging to an optimal replication scheme

Within the replication scheme R, three types of nodes are defined:

• R-neighbor: Such a node i belongs to R but has at least one neighbor j that does not belong to R.

• R-fringe: Such a node i belongs to R and has only one neighbor j that belongs to R. Thus, i is a leaf node in the subgraph of T induced by R, and j is the parent of i.

• singleton: |R| = 1 and i ∈ R.

Example In Figure 5.14, node C is an R-fringe node, nodes A and E are both R-fringe and R-neighbor nodes, and node D is an R-neighbor node.
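The three node types can be computed from purely local information. The sketch below assumes an adjacency-list encoding and a hypothetical function name `classify`. The example tree is invented so that the labels match those quoted in the example above; it is not a reconstruction of Figure 5.14.

```python
def classify(adj, R, i):
    """Return the labels among {'R-neighbor', 'R-fringe', 'singleton'} that apply to i."""
    if i not in R:
        return set()
    inside = [j for j in adj[i] if j in R]       # neighbors holding a replica
    outside = [j for j in adj[i] if j not in R]  # neighbors without a replica
    labels = set()
    if outside:
        labels.add("R-neighbor")
    if len(inside) == 1:
        labels.add("R-fringe")
    if len(R) == 1:
        labels.add("singleton")
    return labels

# Hypothetical tree; nodes p, q, s lie outside the replication scheme.
adj = {
    "p": ["a"],
    "a": ["p", "d"],
    "d": ["a", "c", "e", "s"],
    "c": ["d"],
    "e": ["d", "q"],
    "q": ["e"],
    "s": ["d"],
}
R = {"a", "c", "d", "e"}
labels = {i: classify(adj, R, i) for i in sorted(R)}
```

When R contains a single node, that node has no neighbor in R, so it is never labeled R-fringe.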

The algorithm uses the following three tests to adjust the replication scheme to converge to the optimal scheme:

Expansion test An R-neighbor node i examines each neighbor j that is not in R to determine whether j can be included in the replication scheme, using an expansion test. Node j is included in the replication scheme if the volume of reads coming from and via j is more than the volume of writes that would have to be propagated to j from i if j were included in the replication scheme.

(variables)
integer Neighbors[1..b_i];        // b_i neighbors in tree T topology
integer Read_Received[1..b_i];    // jth element gives # reads from Neighbors[j]
integer Write_Received[1..b_i];   // jth element gives # writes from Neighbors[j]
integer write_i, read_i;          // # writes and # reads issued locally
boolean success;

(1) P_i determines which tests to execute at the end of each epoch:
(1a) if i is R-neighbor and R-fringe then
(1b)     if expansion test fails then
(1c)         contraction test
(1d) else if i is R-neighbor and singleton then
(1e)     if expansion test fails then
(1f)         switch test
(1g) else if i is R-neighbor and not R-fringe and not singleton then
(1h)     expansion test
(1i) else if i is not R-neighbor and R-fringe then
(1j)     contraction test.

(2) P_i executes expansion test:
(2a) for j from 1 to b_i do
(2b)     if Neighbors[j] not in R then
(2c)         if Read_Received[j] > (write_i + Σ_{k=1..b_i, k≠j} Write_Received[k]) then
(2d)             send a copy of the object to Neighbors[j]; success ← 1;
(2e) return(success).

(3) P_i executes contraction test:
(3a) let Neighbors[j] be the only neighbor in R;
(3b) if Write_Received[j] > (read_i + Σ_{k=1..b_i, k≠j} Read_Received[k]) then
(3c)     seek permission from Neighbors[j] to exit from R;
(3d)     if permission received then
(3e)         success ← 1; inform all neighbors;
(3f) return(success).

(4) P_i executes switch test:
(4a) for j from 1 to b_i do
(4b)     if (Read_Received[j] + Write_Received[j]) > (Σ_{k=1..b_i, k≠j} (Read_Received[k] + Write_Received[k]) + read_i + write_i) then
(4c)         transfer object copy to Neighbors[j]; success ← 1; inform all neighbors;
(4d) return(success).

Algorithm 5.15 Adaptive data replication algorithm executed by a node $P_i$ in replication scheme R. All variables except Neighbors are reset at the end of each epoch. R stabilizes in diameter + 1 epochs after the read–write rates stabilize.
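The three tests reduce to comparisons over the per-epoch counters. The following sketch restates them as pure Python functions, assuming zero-based neighbor indices and hypothetical parameter names. It only decides the outcome of each test: the permission handshake of the contraction test and the actual transfer of object copies are left out.

```python
def expansion_test(in_R, read_recv, write_recv, write_i):
    """Return the indices of neighbors outside R that should receive a copy.

    in_R[j]       -- True if neighbor j is already in R
    read_recv[j]  -- reads received from neighbor j this epoch
    write_recv[j] -- writes received from neighbor j this epoch
    write_i       -- writes issued locally this epoch
    """
    total_writes = write_i + sum(write_recv)
    # Writes that would be propagated to j: local writes plus writes from all k != j.
    return [j for j in range(len(in_R))
            if not in_R[j] and read_recv[j] > total_writes - write_recv[j]]

def contraction_test(j, read_recv, write_recv, read_i):
    """True if an R-fringe node should ask its only neighbor in R (index j) to leave R."""
    reads_to_forward = read_i + sum(read_recv) - read_recv[j]
    return write_recv[j] > reads_to_forward

def switch_test(read_recv, write_recv, read_i, write_i):
    """Return the index of the neighbor that should take over the replica, or None."""
    total = read_i + write_i + sum(read_recv) + sum(write_recv)
    for j in range(len(read_recv)):
        via_j = read_recv[j] + write_recv[j]
        if via_j > total - via_j:   # j carries more than half of all requests
            return j
    return None

# Hypothetical counters for a node with three neighbors; neighbor 0 is in R.
grow = expansion_test([True, False, False], [0, 9, 2], [4, 1, 0], write_i=3)
leave = contraction_test(0, read_recv=[1, 2], write_recv=[10, 0], read_i=3)
move = switch_test(read_recv=[5, 1], write_recv=[4, 0], read_i=1, write_i=1)
```

The switch condition holds only when one neighbor accounts for more than half of all requests, so at most one neighbor can satisfy it.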

Figure 5.15 Adaptive data replication tests executed by node i. (a) Expansion test. (b) Contraction test. (c) Switch test.

Example In Figure 5.15(a), node i includes j in the replication scheme if r > w.

Contraction test An R-fringe node i examines whether it can exclude itself from the replication scheme, using a contraction test. Node i excludes itself from the replication scheme if the volume of writes being propagated to it from j is more than the volume of reads that i would have to forward to j if i were to exit the replication scheme. Before exiting, node i must seek permission from j to prevent a situation where R = {i, j} and both i and j simultaneously have a successful contraction test and exit, leaving no copies of the object.

Example In Figure 5.15(b), node i excludes itself from the replication scheme if w > r.

Switch test A singleton node i executes the switch test to determine if it can transfer its replica to some neighbor to optimize the objective function. A singleton node transfers its replica to a neighbor j if the volume of requests being forwarded by that neighbor is greater than the volume of requests the node would have to forward to that neighbor if the replica were shifted from itself to that neighbor. If such a node j exists, observe that it is uniquely identified among the neighbors of node i.

Example In Figure 5.15(c), node i transfers its replica to j if r + w being forwarded by j is greater than r + w that node i receives from all other nodes.

The various tests are executed at the end of each “epoch.” An R-neighbor node may also be an R-fringe node or a singleton node; in either case, the expansion test is executed first and if it fails, then the contraction test or the switch test is executed. Note that a singleton node cannot be an R-fringe node.

The code is given in Algorithm 5.15.

Implementation

Each node needs to be able to determine whether it is in R, whether it is an R-neighbor node, an R-fringe node, or a singleton node. This can be
