• Aucun résultat trouvé

Scheduling/Data Management Heuristics

N/A
N/A
Protected

Academic year: 2021

Partager "Scheduling/Data Management Heuristics"

Copied!
18
0
0

Texte intégral

(1)

HAL Id: hal-00759546

https://hal.inria.fr/hal-00759546

Preprint submitted on 3 Dec 2012

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Scheduling/Data Management Heuristics

Frédéric Desprez, Sylvain Gault, Frédéric Suter

To cite this version:

(2)

MapReduce ANR-10-SEGI-001 Scalable data management for

Map/Reduce-based data-intensive applications on clouds and hybrid infrastructures.

D3.1 Scheduling/Data Management Heuristics

Frédéric DESPREZ Inria Grenoble Rhône-Alpes [email protected]

Sylvain GAULT Inria Grenoble Rhône-Alpes [email protected]

Frédéric SUTER CC IN2P3 [email protected]

Abstract

Data volume produced by scientic applications increase at a high speed. Some are expected to produce several petabyte per year. In order to process this amount of data, the computing power of several hundreds or thousands of machines have to be used at the same time. Regarding this, one of the biggest challenge is : how to program these machines in order to make them to collaborate for the same computation ?

One answer brought by Google is the MapReduce paradigm [7]. MapReduce has the

(3)

1 Introduction

Since the dawn of computers, the volume of data to be processed by applications has conti-nuously increased. This makes the processing requirement always bigger. Nowadays, we have reached the point where applications may have hundreds of terabytes to petabytes of data to process. For instance, the Large Synoptic Survey Telescope (LSST) is expected to produce 30

terabyte per night [9], and the Large Hadron Collider (LHC) is expected to produce around 15

petabytes per year [5]. Web indexing has also big requirement on computing power since a lare

amount of new data is produced and updated daily.

Processing huge amount of data is a real challenge by itself since the data have to be spread over several nodes just as the computation. In order to avoid the pain of reinventing the wheel for every processing task, some library already exists to help programmers in the production

of parallel and distributed applications, such as MPI [10] which make the usage of the network

simpler. Some framework have also been proposed to provide higher level of abstraction of the

distributed computation such as DIET [4], Dryad [6], BOINC [1] or MapReduce [7]. Providing

a high level of abstraction to the user require the software to be smart. Specically, for data-intensive computation, data management and scheduling are the biggest challenges.

Scheduling and resource management has been one of the main research subjects around Grids and Clouds. The MapReduce programming paradigm introduces new problems and it raises several research issues linked to data management as well as task scheduling. Below we mention some preliminary eorts concerning Cloud infrastructures. For Desktop Grids, no clear research results can be mentioned at this point for scheduling for MapReduce applications, since simply enabling MapReduce on Desktop Grids is still at an embryonic stage.

2 MapReduce

MapReduce is a framework for distributed computation introduced by Google in 2004. It aims at performing computation on a large number of nodes (several thousands) for processing large amount of data. Although the Google's implementation is not available to the public, some free implementations exists, especially Hadoop which has been reported to be used by several

organizations such as Yahoo ! [11] and Facebook [12].

MapReduce framework is based on the functions Map and Reduce which are commonly used in functional programming. In functional programming languages map is a high order function, it usually take a function and a list as argument. Then, map apply the function on every element of the list, and construct a new list. For instance, map could be use to multiply by 2 every element of a list. The reduce function (also known as fold) is another high order function which usually take a function and a list as argument. But unlike map, argument function must take two arguments, one is the result of a previous call of this function, the other is an element of the list. This function can be used to sum every element a list.

Within MapReduce, the spirit of these function is still there, but things are a bit dierent.

It works in three steps as follow as seen on Figure1.

 The input data is split into records composed of a key and a value. These key/value pairs are processed by the map function which produce 0, 1 or several key/value pairs. All the nodes which have been assigned a map task will process concurrently several parts of the input data. This is called the map phase.

 The data created by the map will then be physically grouped by key. This means that every node may send some data to every other node. This step is called the shue.  Once the data have been grouped by key, for each key, the reduce function is applied on

(4)

Data Map Data Map Data Map Data Map Reduce Reduce Reduce Result Result Result

Map phase Shuffle Reduce phase

Figure 1: MapReduce Workow.

set is usually reduced to one single value. Every node which have been assigned a reduce task will process the data it has received from the mappers. This is called, the reduce phase.

One classical example of MapReduce application is the wordcount where the user wants to count the number of occurrence of each word in a large set of documents. For this, the user could

write a map and a reduce function like the pseudo code given in Figure2.

Map (String key, String value) { /∗ key is the document name

∗ value is the document contents ∗/ for each word in value {

output the pair (word, 1); }

}

Reduce (String key, ListOfInteger list) { Integer result = 0;

for each value in list { result += value; }

output the list (result); }



Figure 2: Word Count Example.

In this pseudo code, the map function will output one pair (word, 1) per word in the docu-ment, which means that several pairs will be produced if the same word appear several time. Then the reduce function just sum the values up, thus counting the number of occurrence for each word. The nal result will be a list of pairs (word, count) which hold the wanted result.

(5)

More information can be found in the Google's publication.

But to make MapReduce work so easily from the user's point of view, the framework has to be optimized on several points. Indeed, it has to be fault tolerant, it should avoid data transfers between nodes for map computation, and it should schedule the transfers between the Map and the Reduce functions correctly in order to avoid that two nodes try to transfer the data to the same other node at the same time. This would increase the transfer time while better solutions exists.

3 Related Work

There already are some work around scheduling in MapReduce. Most of them focus on locality

of data for the Map computation. For instance, a delay scheduler [13] has been introduced to

improve the data locality by scheduling tasks on the y. It works by delaying tasks which cannot be run on a node holding a copy of the data and will allow gradually the data transfer to be rst rack-local, and nally completely remote.

Another approach for this data locality issue is BAR [8] which starts from an initial allocation

of tasks to the nodes for a given data distribution and then tune it to reach a better locality access. It takes into account various parameters like node workload and network usage to allow more or less data transfers between nodes.

The divisible load theory [3] is a methodology involving continuous models for dividing

workload over several processing nodes. Using continuous models usually allow simpler solving of problems than a discreet one, while still giving solutions close to optimal. This divisible load

theory has been applied to MapReduce scheduling by Berlinska and Drozdowski [2]. This is the

work we took as the basis.

4 Description of Previous Work by Berlinska and Drozdowski

4.1 Approach

4.1.1 Overview

In their paper [2], Berlinska and Drozdowski model a whole MapReduce job execution and

aim at optimizing the whole schedule length. This is done by assigning the right amount of data for each node to process. Moreover, their model include a bandwidth limitation which is taken into account by carefully and strictly scheduling data transfers between mappers and reducers.

4.1.2 Model

B & D described two views of the same model. The rst one describes the MapReduce jobs in terms of meaningful and actually measurable variables like communication startup time or computing rate. This is called microscopic view in their paper. As this view is quite hard to manipulate, a macroscopic view is derived from the previous one where variables or constant may actually be composed of several constants.

Table 1 is a list of useful variables used in the model.

In their model there are m nodes performing Map computations over a total of V bytes, and r nodes performing Reduce operations. For the sake of simplicity they are considered as distinct sets of machines, thought they could be the same with some transfers shorter than others because of localhost data transfers.

(6)

V Total amount of data to process

m Number of Map nodes

r Number of Reduce nodes

S Sequential startup time in seconds

C Transfer rate between Map nodes and Reduce nodes in seconds per byte

l Maximum number of simultaneous transfers

Ai Computing rate for i-th node in second per byte

αi Amount of data processed by i-th node

γ Data ratio between mapper output and input

Table 1: Main Variables of the Berlinska and Drozdowski's Model.

related to the time taken to upload the code to the nodes. For simplicity purpose, S is considered to be the same among all the nodes. Although not mentioned, most of the results of this report would still hold for initialization time distinct for each node as long as the startup order of nodes is the same.

In this model, the bandwidth is limited for each node and is named C (in seconds per byte). Since the data are supposed to be already present on the computing nodes (as usual in MapReduce) there is no need to transfer the data to the computing nodes. Only a rebalancing may be necessary but the time to transfer data for this, is not part of the model.

In addition to the bandwidth limitation, there is a bisection width limitation. That means that only l transfers at the same time can occur between the mappers and the reducers. This can happen in real situation in the case of a switch connecting more nodes that it can handle data transfers at full speed at the same time, or in the case of a tree-like network for which only the worst case is taken into account.

Every node i has a computing rate Ai expressed in seconds per byte. This is a macroscopic

variable that hides a computation overhead and a data transfer (to read from the disk) in addition to the actual pure-computing rate.

The αi variable is the amount of data (in bytes) to be processed by the i-th node. These

variables are actually computed by the algorithms in the paper.

Eventually, the γ variable is the ratio between the amount of input data to the mappers, and the amount of output data of the mappers. Its value depends on the kind of computation performed by the Map operation. It is computed as follows.

γ = output_size

input_size

Simplications and constraints. As a simplication, this paper considers that every mapper will start to transfer its data to mappers only after the end of the Map computation. Moreover, each mapper will send the same amount of data to every reducer. This is a strong assumption but it should statistically be met. This last simplication has the implication that every reducer will get the same total amount of data (one chunk from every mapper). Thus, the reducer computation time is not worth mentioning.

(7)

4.1.3 Resolution

The goal is to reduce the completion time. As the previous simplications say, the compu-tation time of the reducers is not taken into account. Thus, only the mapper compucompu-tation time and the transfer time are to be computed. In order to compute the transfer time, the transfer scheduling has to be known and deterministic. That's why Berlinska and Drozdowski chosed a time-interval based algorithm.

In this algorithm, a transfer from one mapper to one reducer cannot be interrupted. The length of the time intervals is determined by the largest transfer to do during this interval. And nally, the l mappers allowed to transfer at a given time are chosed by a simple round-robin.

This simple algorithm allows to express the completion time as a formula depending on the previously given parameters. Especially, the time to perform all the transfers depends on the largest transfer from a mapper to a reducer that occurs in every time-interval. That is why the

αi have to be computed by a linear program.

4.1.4 Issues

Because the number of constraints and variables of the linear program can be quite large. The number of variables is O(mr/l) and the number of constraint is O(mr). That makes the scheduler take up to several minutes or hours for a few hundred nodes with LP-solve on an Intel quadcore at 2.4GHz. This time is not acceptable for a real-life MapReduce framework.

Furthermore, as the parameters of the model may vary, some others parameters are hard to

predict. In particular, the Ai (computing throughput) may depend on the data itself. And the

γ parameter (data ratio of the mappers) is just impossible to predict in most cases. A way to

solve this is to continuously update the schedule with respect to the measurements reported by the nodes. But this can only be useful if the schedule can be recomputed fast enough.

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute t1t2t3t4t5t6t7t8t9t10t11t12t13t14t15t16t17t18 t19 t20 t21 t22 t23 t24 t25 t26t27t28t29t30t31t32t33t34t35t36t37t38 t39 t40 t41 t42 t43 t44 t45 t46t47t48t49t50t51t52t53t54t55t56t57t58 t59 t60 t61 t62 t63 t64 t65 t66t67t68t69t70t71t72t73t74t75t76t77t78 t79 t80 t81 t82

Figure 3: Schedule example as computed by the algorithm of Berlinska and Drozdowski.

And the chosen transfer scheduling algorithm, thought easy to write as a formula, uses

ineciently the network bandwidth. Figure3is a Gantt chart of occupation of mapper nodes as

(8)

5 Description of Our Algorithms

As previously said, the algorithm has to be made more dynamic because of the imprecision of the forecast that would be made about the computing time. But this dynamicity imply that the schedule has to be computed fast enough in order to apply its modications to the physical system before the computation is done.

The main reason why the problem has to be expressed as a linear program which takes so

much time to solve, is that it the transfer scheduler imply a dependence between every αi for

every time-interval.

5.1 Using a Linear System

One thing that can be remarked with the linear program, is that, sometime, it computes the

αi such that one transfer from the i-th machine will end just at the same time when αi+1 ends

its computation. Formally, this means that αi and αi+1 are computed such that :

iS + αiAi+

αiγC

r = (i + 1)S + αi+1Ai+1

This especially happens when every Ai are equals and Srm < γV C, which occurs quite often

since S is usually very small (a second or less) and V is usually quite big (in the order of TB).

When every Ai are equals, the above formula imply that αi < αi+1.

In a such case, the αi can be computed by a linear system as follows :

iS + αiAi+ αiγC r = (i + 1)S + αi+1Ai+1 for i < m m X i=1 αi= V

This system can be solved in O(m) time and O(m) space.

Experiments show that in such a case the linear system compute the same αi as the linear

program except for the oat rounding errors. Figure 4(a) shows the schedule computed by

the linear program (correcting some minor bugs from the original algorithm from Berlinska and Drozdowski), the numbers on the gray transfers are the target reducers for the transfers.

Figure 4(b)shows the schedule computed by the linear system above. As it can be seen, there

is no noticeable dierence between the results as long as αi< αi+1 holds.

When the condition Srm < γV C does not hold, the results of the linear system are very

dierent from the ones computed by the linear program. Figure 5(a)shows the behavior of the

linear program on another set of parameters, and Figure 5(b) shows the results of the linear

system. It can be noticed that the completion time is greater with the linear system than with the linear program. This is mainly due to the gap between the end of computation and the start

of the rst transfer. But another transfer scheduler exhibits better performance (see Section5.2).

It should also be noticed that less nodes are used by the linear system than the linear program.

This is because the computation of the amount of data to process αi may output negative results.

(9)

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute 01 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 6 5 4 3 2 7 6 5 4 3 8 7 6 5 4 9 8 7 6 5 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 12 11 10 9 8 13 12 11 10 9 14 13 12 11 10 15 14 13 12 11 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 18 17 16 15 14 19 18 17 16 15 19 18 17 16 19 18 17 19 18 19 1819 1819 1819 1819 1819 1819 1819 1819 1819 1819 18 19 18 19 18 19 18 19 18 19 t1t2t3t4t5t6t7t8t9t10t11t12t13t14t15t16t17t18 t19 t20 t21 t22 t23 t24 t25 t26t27t28t29t30t31t32t33t34t35t36t37t38 t39 t40 t41 t42 t43 t44 t45 t46t47t48t49t50t51t52t53t54t55t56t57t58 t59 t60 t61 t62 t63 t64 t65 t66t67t68t69t70t71t72t73t74t75t76t77t78 t79 t80 t81 t82

(a) linear program

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute 01 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 6 5 4 3 2 7 6 5 4 3 8 7 6 5 4 9 8 7 6 5 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 12 11 10 9 8 13 12 11 10 9 14 13 12 11 10 15 14 13 12 11 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 18 17 16 15 14 19 18 17 16 15 19 18 17 16 19 18 17 19 18 19 1819 1819 1819 1819 1819 1819 1819 1819 1819 1819 18 19 18 19 18 19 18 19 18 19 t1t2t3t4t5t6t7t8t9t10t11t12t13t14t15t16t17t18 t19 t20 t21 t22 t23 t24 t25 t26t27t28t29t30t31t32t33t34t35t36t37t38 t39 t40 t41 t42 t43 t44 t45 t46t47t48t49t50t51t52t53t54t55t56t57t58 t59 t60 t61 t62 t63 t64 t65 t66t67t68t69t70t71t72t73t74t75t76t77t78 t79 t80 t81 t82 (b) linear system

(10)

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 P31 P32 P33 P34 P35 P36 P37 P38 P39 P40 P41 S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute 0 1 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 10 9 8 7 6 5 4 3 2 1 0 11 10 9 8 7 6 5 4 3 2 1 0 12 11 10 9 8 7 6 5 4 3 2 1 0 13 12 11 10 9 8 7 6 5 4 3 2 1 0 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 40 39 38 37 36 35 34 33 32 31 30 29 28 27 40 39 38 37 36 35 34 33 32 31 30 29 28 40 39 38 37 36 35 34 33 32 31 30 29 40 39 38 37 36 35 34 33 32 31 30 40 39 38 37 36 35 34 33 32 31 40 39 38 37 36 35 34 33 32 40 39 38 37 36 35 34 33 40 39 38 37 36 35 34 40 39 38 37 36 35 40 39 38 37 36 40 39 38 37 40 39 38 40 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11t12 t13t14 t15 t16t17 t18t19 t20 t21t22 t23t24 t25 t26t27 t28t29 t30t31t32t33t34t35t36t37t38t39t40t41t42t43t44t45t46t47t48t49t50t51t52t53t54t55t56t57t58t59t60t61t62t63t64t65t66t67t68t69t70t71t72t73t74t75t76t77t78t79t80t81t82t83t84t85t86t87t88t89t90t91t92t93t94t95t96t97t98t99t100t101t102t103t104t105t106t107t108t109t110t111t112t113t114t115t116t117t118t119t120t121t122t123t124

(a) linear program

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 P31 P32 P33 P34 P35 P36 P37 P38 P39 P40 P41 S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute unsable node unsable node unsable node unsable node unsable node unsable node 0 1 021 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 10 9 8 7 6 5 4 3 2 1 0 11 10 9 8 7 6 5 4 3 2 1 0 12 11 10 9 8 7 6 5 4 3 2 1 0 13 12 11 10 9 8 7 6 5 4 3 2 1 0 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 40 39 38 37 36 35 34 33 32 31 30 29 28 27 40 39 38 37 36 35 34 33 32 31 30 29 28 40 39 38 37 36 35 34 33 32 31 30 29 40 39 38 37 36 35 34 33 32 31 30 40 39 38 37 36 35 34 33 32 31 40 39 38 37 36 35 34 33 32 40 39 38 37 36 35 34 33 40 39 38 37 36 35 34 40 39 38 37 36 35 40 39 38 37 36 40 39 38 37 40 39 38 40 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10t11t12t13t14t15t16t17t18t19t20t21t22t23t24t25t26t27t28t29t30t31t32t33t34t35t36t37t38t39t40t41t42t43t44t45t46t47t48t49t50t51t52t53t54t55t56t57t58t59t60t61t62t63t64t65t66t67t68t69t70t71t72t73t74t75t76t77t78t79t80t81t82t83t84t85t86t87t88t89t90t91t92t93t94t95t96t97t98t99t100t101t102t103t104t105t106 (b) linear system

(11)

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute 01 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 6 5 4 3 2 7 6 5 4 3 8 7 6 5 4 9 8 7 6 5 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 6 11 10 9 8 7 12 11 10 9 8 13 12 11 10 9 14 13 12 11 10 15 14 13 12 11 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 12 17 16 15 14 13 18 17 16 15 14 19 18 17 16 15 19 18 17 16 19 18 17 19 18 19 1819 1819 1819 1819 1819 1819 1819 1819 1819 1819 18 19 18 19 18 19 18 19 18 19 t1t2t3t4t5t6t7t8t9t10t11t12t13t14t15t16t17t18 t19 t20 t21 t22 t23 t24 t25 t26t27t28t29t30t31t32t33t34t35t36t37t38 t39 t40 t41 t42 t43 t44 t45 t46t47t48t49t50t51t52t53t54t55t56t57t58 t59 t60 t61 t62 t63 t64 t65 t66t67t68t69t70t71t72t73t74t75t76t77t78 t79 t80 t81 t82

(a) original scheduler

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute 01 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 3 2 0 7 1 6 5 4 3 1 0 8 2 7 6 5 4 2 1 3 9 8 0 7 6 5 4 3 2 10 1 0 9 8 7 6 5 4 3 2 11 1 10 0 9 8 7 6 5 4 3 12 2 11 1 10 0 9 8 7 6 5 4 3 13 2 12 1 11 10 0 9 8 7 6 5 4 3 14 13 2 12 1 11 10 0 9 8 7 6 5 4 3 15 2 14 13 1 12 11 10 0 9 8 7 6 5 4 3 16 15 2 14 13 1 12 11 0 10 9 8 7 6 5 4 3 17 16 2 15 14 1 13 12 11 0 10 9 8 7 6 5 4 18 3 17 2 16 15 14 1 13 12 11 10 0 9 8 7 6 5 4 19 3 18 17 2 16 15 1 14 13 12 11 0 10 9 8 7 6 5 4 19 3 18 17 2 16 15 14 1 13 12 11 10 0 9 8 7 6 19 5 4 18 17 3 16 15 14 2 13 12 11 1 10 9 8 7 6 19 5 18 17 4 16 15 14 3 13 12 2 11 10 9 8 7 19 18 6 17 5 16 15 14 4 13 3 12 11 10 9 8 19 7 18 17 6 16 15 5 14 13 4 12 11 10 9 19 8 18 17 7 16 15 6 14 13 5 12 11 10 19 9 18 17 8 16 7 15 14 6 13 12 11 10 19 18 9 17 16 8 15 7 14 13 12 11 19 18 10 17 16 9 15 8 14 13 19 12 11 18 17 10 16 15 9 14 13 19 18 12 11 17 16 15 10 14 19 13 18 12 17 16 19 15 14 11 18 13 17 16 19 15 18 12 17 14 16 19 18 13 15 17 19 16 18 14 19 17 15 18 16 19 17 18 19 t0t1t2t3t4t5 t6 t7 t8 t9 t10 t11 (b) new scheduler

(12)

5.2 Transfer Scheduling

Figure 6(a)shows the bandwidth usage over the time just under the classic Gantt schedule

for the time-interval based scheduler. This chart shows a lot of holes which could be lled with a better transfer scheduling algorithm. As a simplication purpose, every mapper does its transfers in the same order, starting from the reducer 1 to the reducer r. And a transfer from mapper i to reducer j cannot start before the transfer from mapper i − 1 to reducer j has nished. The last constraint is an easy way to enforce the constraint that two mappers should not transfer to the same reducer at the same time while reducing the complexity of scheduling. In order to avoid a bad bandwidth usage and still enforce the above constraints, a rst heuristic could be to try to perform the transfer of the latest nodes as soon as possible, which means, try to make every node transfer to the same reducer at the same time. But given the constraints, an implementation of this heuristic could easily lead to a situation where the rst mapper has been delayed and when it starts its transfers, every other node has to wait for it. That's why a better heuristic is to try to keep the quantity i + j among all the nodes constant, with i the mapper number and j the target reducer for the transfer last started. Which means that when mapper 1 perform its transfers to reducer 5, mapper 2 would transfer to reducer 4.

Sketch of the algorithm : When a transfer from mapper i to reducer j ends, for every idle

i′ mapper and its next target reducer j′ compute pi = i+ j′. Then select the nodes with the

lowest previously computed pi. These nodes are considered the most late and their transfers

should start as soon as possible. If several nodes are equally late, then the one with the biggest

αi is chosen because, among all these selected nodes, the ones with the biggest amount of data

to transfer will be most probably the ones to delay the end of all transfers.

This second heuristic should limit the case when dependencies prevent a full bandwidth usage. And actually, the case does not happen often within the simulations. The result of this

second heuristic can be seen in Figure 6(b)for the case with αi < αi+1. In this example, a 30%

improvement has been achieved and the bandwidth usage show no hole and is at its maximum capacity during the main part of its transfers.

When αi >= αi+1 dierent things happens. First, when the amount of data to be processed

by each node αi is computed by the linear program, it may happen that the maximum network

bandwidth cannot be reached. Figure 7(a) show the original schedule along with its implied

bandwidth usage and Figure 7(b) show the behavior of the new transfer scheduler when the

αi are yet computed by the linear program. It can be seen on the later that the maximum

bandwidth shown by the red line is never reached. Actually, the αi have been computed for the

rst time-interval based transfer scheduler. That's why we obtain such a result. Still, in this example, it achieves a 16% improvment over the completion time.

When the αi are computed by the linear system and the transfers are scheduled by the

original time-interval based one, the completion time is not much dierent from the original

transfer scheduler with the αi computed by the linear program. The result can be seen in

Figure 8(a), and shows a performance decrease of less than 1%. Just like the reverse hybrid

case (shown in Figure 7(b)) the computation of the αi has not been designed for this transfer

scheduler. And nally, Figure 8(b) show the result when the amount of data per node αi is

computed by the linear system and the transfers are scheduled by the new algorithm. It achieve an improvment of 17% as compared to the original algorithm.

5.3 Transfer Preemption

(13)

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 P31 P32 P33 P34 P35 P36 P37 P38 P39 P40 P41 S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute 0 1 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 10 9 8 7 6 5 4 3 2 1 0 11 10 9 8 7 6 5 4 3 2 1 0 12 11 10 9 8 7 6 5 4 3 2 1 0 13 12 11 10 9 8 7 6 5 4 3 2 1 0 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 40 39 38 37 36 35 34 33 32 31 30 29 28 27 40 39 38 37 36 35 34 33 32 31 30 29 28 40 39 38 37 36 35 34 33 32 31 30 29 40 39 38 37 36 35 34 33 32 31 30 40 39 38 37 36 35 34 33 32 31 40 39 38 37 36 35 34 33 32 40 39 38 37 36 35 34 33 40 39 38 37 36 35 34 40 39 38 37 36 35 40 39 38 37 36 40 39 38 37 40 39 38 40 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11t12 t13t14 t15 t16t17 t18t19 t20 t21t22 t23t24 t25 t26t27 t28t29 t30t31t32t33t34t35t36t37t38t39t40t41t42t43t44t45t46t47t48t49t50t51t52t53t54t55t56t57t58t59t60t61t62t63t64t65t66t67t68t69t70t71t72t73t74t75t76t77t78t79t80t81t82t83t84t85t86t87t88t89t90t91t92t93t94t95t96t97t98t99t100t101t102t103t104t105t106t107t108t109t110t111t112t113t114t115t116t117t118t119t120t121t122t123t124

(a) original scheduler

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 P31 P32 P33 P34 P35 P36 P37 P38 P39 P40 P41 S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute 0 1 2 3 0 4 152637 0 4 8 1 5 9 2 6 10 3 7 11 0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15 0 4 8 12 16 1 5 9 13 17 2 6 10 14 18 3 7 11 15 19 0 4 8 12 1620 1 5 9 13 17 21 2 6 10 14 1822 3 7 11 15 19 0 23 4 8 12 16 20 1 24 5 9 13 17 21 2 25 6 10 14 18 22 3 26 7 11 15 19 0 23 4 27 8 12 16 20 1 24 5 28 9 13 17 21 2 25 6 29 10 14 18 22 3 26 7 30 11 15 19 0 23 4 27 8 31 12 16 20 1 24 5 28 9 32 13 17 21 2 25 6 29 10 33 14 18 22 3 26 7 30 11 34 15 19 0 23 4 27 8 31 12 35 16 20 1 24 5 28 9 32 13 36 17 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 1822 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 1822 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 1822 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 1822 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 1 24 5 28 9 32 13 36 17 40 21 2 25 6 29 10 33 14 37 18 22 3 26 7 30 11 34 15 38 19 0 23 4 27 8 31 12 35 16 39 20 0 1 24 5 28 9 32 13 36 17 40 21 0 1 2 25 6 29 10 33 14 37 18 1 0 22 2 3 26 7 30 11 34 15 38 1 19 0 2 23 3 4 27 8 31 12 35 16 39 1 2 20 0 3 24 4 5 28 9 32 13 36 1 2 17 40 0 3 21 4 25 5 6 29 10 2 33 1 14 37 3 18 0 4 22 5 26 6 7 2 30 11 3 34 1 15 38 4 19 0 5 23 6 2 3 27 7 8 31 1 12 4 35 16 39 5 3 0 20 6 2 24 7 4 28 8 9 1 32 13 5 36 3 17 40 0 6 4 2 21 7 25 8 5 29 1 9 10 3 4 33 14 6 37 18 2 0 7 5 22 4 8 26 1 3 9 6 30 10 11 5 34 15 7 2 38 4 19 8 6 23 5 3 9 27 10 7 31 11 4 5 12 6 35 16 8 39 20 5 9 7 24 6 10 28 11 8 32 12 6 13 7 36 17 9 40 21 6 10 8 25 7 11 29 12 9 33 13 7 14 8 37 18 10 22 7 11 9 26 8 12 30 13 10 34 14 8 15 9 38 19 11 23 8 12 10 27 9 13 31 14 11 35 15 9 16 10 39 20 12 24 9 13 11 28 10 14 32 15 12 36 16 10 17 11 40 21 13 25 10 14 12 29 11 15 33 16 13 37 17 11 18 12 22 14 26 11 15 13 30 12 16 34 17 14 38 18 12 19 13 23 15 27 12 16 14 31 13 17 35 18 15 39 19 13 20 14 24 16 28 13 17 15 32 14 18 36 19 16 40 20 14 21 15 25 17 29 14 18 16 33 15 19 37 20 17 21 15 22 16 26 18 30 15 19 17 34 16 20 38 21 18 22 16 23 17 27 19 31 16 20 18 35 17 21 39 22 19 23 17 24 18 28 20 32 17 21 19 36 18 22 40 23 20 24 18 25 19 29 21 33 18 22 20 37 19 23 24 21 25 19 26 20 30 22 34 19 23 21 38 20 24 25 22 26 20 27 21 31 23 35 20 24 22 39 21 25 26 23 27 21 28 22 32 24 36 21 25 23 40 22 26 27 24 28 22 29 23 33 25 37 22 26 24 23 27 28 25 29 23 30 24 34 26 38 23 27 25 24 28 29 26 30 24 31 25 35 27 39 24 28 26 25 29 30 27 31 25 32 26 36 28 40 25 29 27 26 30 31 28 32 26 33 27 37 29 26 30 28 27 31 32 29 33 27 34 28 38 30 27 31 29 28 32 33 30 34 28 35 29 39 31 28 32 30 29 33 34 31 35 29 36 30 40 32 29 33 31 30 34 35 32 36 30 37 31 33 30 34 32 31 35 36 33 37 31 38 32 34 31 35 33 32 36 37 34 38 32 39 33 35 32 36 34 33 37 38 35 39 33 40 34 36 33 37 35 34 38 39 36 40 34 35 37 34 38 36 35 39 40 37 35 36 38 35 39 37 36 40 38 36 37 39 36 40 38 37 39 37 38 40 37 39 38 40 38 39 38 40 39 39 40 39 40 40 40 t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11t12 t13t14 t15t16 t17t18 t19t20 t21t22 t23t24 t25t26 t27t28 t29t30 t31t32 t33t34 t35t36 t37t38 t39t40 t41t42 t43t44 t45t46 t47t48t49t50t51t52t54t53t55t56t57t58t59t60t61t62t63t64t65t66t67t70t69t68t71t72t73t74t75t76t77t78t79t80t81t84t83t82t85t86t87t88t89t90t91t92t93t94t95t96t97t98t99t100t101t102t103t104t105t106t107t108t109t110t111t112t113t114t115t116t117t118t119t120t121t122t123t124t125t126t127t128t129t130t131t132t133t134t135t136t137t138t139t140t141t142t143t144t145t146t147t148t149t150t151t152t153t154t155t156t157t158t159

(b) new transfer scheduler

(14)

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 P31 P32 P33 P34 P35 P36 P37 P38 P39 P40 P41 S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute unsable node unsable node unsable node unsable node unsable node unsable node 0 1 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 10 9 8 7 6 5 4 3 2 1 0 11 10 9 8 7 6 5 4 3 2 1 0 12 11 10 9 8 7 6 5 4 3 2 1 0 13 12 11 10 9 8 7 6 5 4 3 2 1 0 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 40 39 38 37 36 35 34 33 32 31 30 29 28 27 40 39 38 37 36 35 34 33 32 31 30 29 28 40 39 38 37 36 35 34 33 32 31 30 29 40 39 38 37 36 35 34 33 32 31 30 40 39 38 37 36 35 34 33 32 31 40 39 38 37 36 35 34 33 32 40 39 38 37 36 35 34 33 40 39 38 37 36 35 34 40 39 38 37 36 35 40 39 38 37 36 40 39 38 37 40 39 38 40 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10t11t12t13t14t15t16t17t18t19t20t21t22t23t24t25t26t27t28t29t30t31t32t33t34t35t36t37t38t39t40t41t42t43t44t45t46t47t48t49t50t51t52t53t54t55t56t57t58t59t60t61t62t63t64t65t66t67t68t69t70t71t72t73t74t75t76t77t78t79t80t81t82t83t84t85t86t87t88t89t90t91t92t93t94t95t96t97t98t99t100t101t102t103t104t105t106

(a) original transfer scheduler

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 P31 P32 P33 P34 P35 P36 P37 P38 P39 P40 P41 S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute S compute unsable node unsable node unsable node unsable node unsable node unsable node 0 0 1 0 1 2 0 1 2 3 0 1 2 3 4 0 1 2 3 4 5 0 1 2 3 4 5 6 0 1 2 3 4 5 6 7 0 1 23 4 5 6 7 8 0 1 2 34 5 0 6 7 8 9 1 2 3 4 0 5 6 1 7 8 9 10 2 3 0 4 5 1 6 7 2 8 9 10 11 0 3 4 1 5 6 2 0 7 8 3 9 10 11 12 1 4 5 2 0 6 7 3 1 8 9 4 10 11 0 12 13 2 5 6 3 1 7 0 8 4 2 9 10 5 11 12 1 13 14 3 6 0 7 4 2 8 1 9 5 0 3 10 11 6 12 13 2 14 15 4 7 1 0 8 5 3 9 2 10 6 0 1 4 11 12 7 13 14 3 15 16 5 8 0 2 1 9 6 4 10 0 3 11 7 1 2 5 12 13 8 14 0 15 4 16 17 6 9 1 3 2 10 0 7 5 11 1 4 12 8 0 2 3 6 13 14 9 0 15 1 16 5 17 18 7 10 2 4 0 3 11 1 8 6 0 12 2 0 5 13 9 1 3 4 0 7 14 0 15 10 0 1 16 0 2 17 6 18 19 8 11 3 5 1 4 12 2 9 7 1 13 3 1 6 14 10 2 4 5 1 8 15 1 16 11 1 2 17 1 3 18 7 19 20 9 12 4 6 2 5 13 3 10 8 2 14 4 2 7 15 11 3 5 6 2 9 16 2 17 12 2 3 18 2 4 19 8 20 21 10 13 5 7 3 6 14 4 11 9 3 15 5 3 8 16 12 4 6 7 3 10 17 3 18 13 3 4 19 3 5 20 9 21 22 11 14 6 8 4 7 15 5 12 10 4 16 6 4 9 17 13 5 7 8 4 11 18 4 19 14 4 5 20 4 6 21 10 22 23 12 15 7 9 5 8 16 6 13 11 5 17 7 5 10 18 14 6 8 9 5 12 19 5 20 15 5 6 21 5 7 22 11 23 24 13 16 8 10 6 9 17 7 14 12 6 18 8 6 11 19 15 7 9 10 6 13 20 6 21 16 6 7 22 6 8 23 12 24 25 14 17 9 11 7 10 18 8 15 13 7 19 9 7 12 20 16 8 10 11 7 14 21 7 22 17 7 8 23 7 9 24 13 25 26 15 18 10 12 8 11 19 9 16 14 8 20 10 8 13 21 17 9 11 12 8 15 22 8 23 18 8 9 24 8 10 25 14 26 27 16 19 11 13 9 12 20 10 17 15 9 21 11 9 14 22 18 10 12 13 9 16 23 9 24 19 9 10 25 9 11 26 15 27 28 17 20 12 14 10 13 21 11 18 16 10 22 12 10 15 23 19 11 13 14 10 17 24 10 25 20 10 11 26 10 12 27 16 28 29 18 21 13 15 11 14 22 12 19 17 11 23 13 11 16 24 20 12 14 15 11 18 25 11 26 21 11 12 27 11 13 28 17 29 30 19 22 14 16 12 15 23 13 20 18 12 24 14 12 17 25 21 13 15 16 12 19 26 12 27 22 12 13 28 12 14 29 18 30 31 20 23 15 17 13 16 24 14 21 19 13 25 15 13 18 26 22 14 16 17 13 20 27 13 28 23 13 14 29 13 15 30 19 31 32 21 24 16 18 14 17 25 15 22 20 14 26 16 14 19 27 23 15 17 18 14 21 28 14 29 24 14 15 30 14 16 31 20 32 33 22 25 17 19 15 18 26 16 23 21 15 27 17 15 20 28 24 16 18 19 15 22 29 15 30 25 15 16 31 15 17 32 21 33 34 23 26 18 20 16 19 27 17 24 22 16 28 18 16 21 29 25 17 19 20 16 23 30 16 31 26 16 17 32 16 18 33 22 34 35 24 27 19 21 17 20 28 18 25 23 17 29 19 17 22 30 26 18 20 21 17 24 31 17 32 27 17 18 33 17 19 34 23 35 36 25 28 20 22 18 21 29 19 26 24 18 30 20 18 23 31 27 19 21 22 18 25 32 18 33 28 18 19 34 18 20 35 24 36 37 26 29 21 23 19 22 30 20 27 25 19 31 21 19 24 32 28 20 22 23 19 26 33 19 34 29 19 20 35 19 21 36 25 37 38 27 30 22 24 20 23 31 21 28 26 20 32 22 20 25 33 29 21 23 24 20 27 34 20 35 30 20 21 36 20 22 37 26 38 39 28 31 23 25 21 24 32 22 29 27 21 33 23 21 26 34 30 22 24 25 21 28 35 21 36 31 21 22 37 21 23 38 27 39 40 29 32 24 26 22 25 33 23 30 28 22 34 24 22 27 35 31 23 25 26 22 29 36 22 37 32 22 23 38 22 24 39 28 40 30 33 25 27 23 26 34 24 31 29 23 35 25 23 28 36 32 24 26 27 23 30 37 23 38 33 23 24 39 23 25 40 29 31 34 26 28 24 27 35 25 32 30 24 36 26 24 29 37 33 25 27 28 24 31 38 24 39 34 24 25 40 24 26 30 32 35 27 29 25 28 36 26 33 31 25 37 27 25 30 38 34 26 28 29 25 32 39 25 40 35 25 26 25 27 31 33 36 28 30 26 29 37 27 34 32 26 38 28 26 31 39 35 27 29 30 26 33 40 26 36 26 27 26 28 32 34 37 29 31 27 30 38 28 35 33 27 39 29 27 32 40 36 28 30 31 27 34 27 37 27 28 27 29 33 35 38 30 32 28 31 39 29 36 34 28 40 30 28 33 37 29 31 32 28 35 28 38 28 29 28 30 34 36 39 31 33 29 32 40 30 37 35 29 31 29 34 38 30 32 33 29 36 29 39 29 30 29 31 35 37 40 32 34 30 33 31 38 36 30 32 30 35 39 31 33 34 30 37 30 40 30 31 30 32 36 38 33 35 31 34 32 39 37 31 33 31 36 40 32 34 35 31 38 31 31 32 31 33 37 39 34 36 32 35 33 40 38 32 34 32 37 33 35 36 32 39 32 32 33 32 34 38 40 35 37 33 36 34 39 33 35 33 38 34 36 37 33 40 33 33 34 33 35 39 36 38 34 37 35 40 34 36 34 39 35 37 38 34 34 34 35 34 36 40 37 39 35 38 36 35 37 35 40 36 38 39 35 35 35 36 35 37 38 40 36 39 37 36 38 36 37 39 40 36 36 36 37 36 38 39 37 40 38 37 39 37 38 40 37 37 37 38 37 39 40 38 39 38 40 38 39 38 38 38 39 38 40 39 40 39 39 40 39 39 39 40 39 40 40 40 40 40 40 40 t0 t1 t2 t3t4t5 t6t7t8 t9 t10t11t12t13t14t15t16t17t18t19t20t21t22t23t24t25t26t27t28t29t30t31t32t33t34t35t36t37t38t39t40t41t42t43t44t45t46t47t48t49t50t51t52t53t54t55t56t57t58t59t60t61t62t63t64t65t66t67t68t69t70t71t72t73t74t75t76t77t78t79t80t81t82t83t84t85t86t87t88t89t90t91 (b) new scheduler

(15)

One way to x this priority inversion is to allow transfer preemption. This means that when a node wants to start a transfer, the scheduler will start it if there is currently less than l transfers occuring. If the transfer count limit is already reached, then the scheduler will search for a transfer with a lower priority that will be suspended. This transfer suspension can actually happen in two cases. The rst case is when a mapper nish its computation and want to start its rst transfer, a transfer with a lower priority can be preempted. The second case is when

a mapper i with a large αi cannot transfer its data to reducer j because the mapper i − 1 is

already transferring to the same reducer j. Then the scheduler starts a transfer with a lower priority from a mapper k, and when the transfer from i − 1 to j is done, then the mapper i can start its transfer to reducer j and mapper i − 1 can start its transfer to reducer j + 1. since transfers from i and i − 1 have a higher priority than those from k, the transfer from k can be suspended.

In the current model, this algorithm with transfer preemption cannot perform worse than the previous one. Indeed, the above-mentioned case where preemption happen are times where a node with big amount of data would wait for a node with a small one. Keeping in mind the constraints on the order of the transfers, the last transfer will always be the one from mapper m to reducer r. This implies that the only thing that can happen is the big transfers start earlier thus making the completion time better.

On Figures 9(a) and 9(b), it can be seen that the algorithm with preemption performs

slightly better since the bandwidth usage is maintained at is maximum capacity a bit longer. This represent a bit less than 2% improvement. This is, indeed, negligible but cannot be worse that not preempting transfers in the current model. Actually the model does not take latency into account, thus, the gain of preempting unimportant transfers may be mitigated by the cost of the preemption itself.

Moreover, on Figure 9(b), it can be seen that the rst transfers do not start just after the

computation as nished whereas the αi have been computed such that the computation would

end just when its rst transfer can actually occur. This is due to the fact that the network has already reached its maximal capacity and is used by transfer with higher or equal priority.

In order to deal with this problem and make the big transfers start as soon as the computation nishes, we tried to relax the constraints on the order of transfers between nodes. Which means that when a big transfer is ready, it is allowed to interrupt any other transfer. The resulting

schedule is shown in Figure10(b). This actually show performance quite similar to the previous

algorithms whose result has been repeated in Figure10(a) for comparison.

6 Conclusion and Future Work

To conclude, these algorithms for either computing the αi parameters and for scheduling the

transfers outperform the ones from Berlinska and Drozdowski on several points. The computation

of the αi is faster since it relies on a linear system solved in time O(m) instead of a linear

program. Although in some cases the results are quite dierent, this does not aect much the overall performance.

The transfer scheduler also performs better than the one from Berlinska and Drozdowski since it does as much as possible to maximize bandwidth usage. Moreover, it does not involve synchronization barriers which would be quite costly in term of latency. On the other side, it is hard to predict anything and proving anything about it looks non-trivial.

(16)

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute 01 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 3 2 0 7 1 6 5 4 3 1 0 8 2 7 6 5 4 2 1 3 9 8 0 7 6 5 4 3 2 10 1 0 9 8 7 6 5 4 3 2 11 1 10 0 9 8 7 6 5 4 3 12 2 11 1 10 0 9 8 7 6 5 4 3 13 2 12 1 11 10 0 9 8 7 6 5 4 3 14 13 2 12 1 11 10 0 9 8 7 6 5 4 3 15 2 14 13 1 12 11 10 0 9 8 7 6 5 4 3 16 15 2 14 13 1 12 11 0 10 9 8 7 6 5 4 3 17 16 2 15 14 1 13 12 11 0 10 9 8 7 6 5 4 18 3 17 2 16 15 14 1 13 12 11 10 0 9 8 7 6 5 4 19 3 18 17 2 16 15 1 14 13 12 11 0 10 9 8 7 6 5 4 19 3 18 17 2 16 15 14 1 13 12 11 10 0 9 8 7 6 19 5 4 18 17 3 16 15 14 2 13 12 11 1 10 9 8 7 6 19 5 18 17 4 16 15 14 3 13 12 2 11 10 9 8 7 19 18 6 17 5 16 15 14 4 13 3 12 11 10 9 8 19 7 18 17 6 16 15 5 14 13 4 12 11 10 9 19 8 18 17 7 16 15 6 14 13 5 12 11 10 19 9 18 17 8 16 7 15 14 6 13 12 11 10 19 18 9 17 16 8 15 7 14 13 12 11 19 18 10 17 16 9 15 8 14 13 19 12 11 18 17 10 16 15 9 14 13 19 18 12 11 17 16 15 10 14 19 13 18 12 17 16 19 15 14 11 18 13 17 16 19 15 18 12 17 14 16 19 18 13 15 17 19 16 18 14 19 17 15 18 16 19 17 18 19 t0t1t2t3t4t5 t6 t7 t8 t9 t10 t11

(a) without transfer preemption

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute 01 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 6 3 7 2 7 1 7 0 6 5 4 3 5 6 2 6 1 6 0 5 7 4 5 3 2 4 1 4 0 3 6 7 8 3 2 1 5 4 0 2 6 7 9 8 1 0 2 3 5 4 6 7 8 10 9 0 5 4 3 2 1 6 7 8 9 10 11 5 4 3 2 1 0 6 7 8 9 10 11 12 4 3 2 1 5 0 8 7 6 9 10 11 13 12 3 2 5 1 4 0 6 7 9 10 8 11 12 13 14 2 5 4 1 3 0 9 7 8 6 10 11 13 12 14 15 5 4 1 3 2 0 6 7 9 8 11 10 14 12 13 15 16 5 4 3 2 0 1 6 7 8 9 10 11 13 12 14 15 16 17 5 4 3 2 1 0 6 7 8 9 10 11 14 12 13 15 16 18 17 5 4 3 2 1 0 8 9 6 7 10 13 11 14 15 12 16 19 18 17 6 5 3 4 2 1 7 8 10 9 11 12 15 14 13 16 17 18 19 7 5 6 4 3 2 10 9 11 8 12 15 14 17 16 13 18 19 7 6 8 5 4 11 3 10 9 12 13 18 14 16 17 15 19 8 9 7 6 5 11 12 10 4 13 14 15 17 18 16 19 10 9 8 7 6 12 11 5 13 14 15 18 17 19 16 11 10 9 8 7 6 12 13 14 15 16 17 19 18 12 11 10 9 8 7 13 14 15 16 17 18 19 13 12 11 10 9 8 16 15 17 14 18 19 13 14 12 11 10 9 11 17 16 18 15 19 14 15 13 12 11 13 17 10 14 18 16 19 16 15 14 13 15 12 15 18 17 19 11 18 14 17 16 13 15 17 19 18 12 19 16 15 17 18 14 19 18 17 13 16 19 15 18 19 17 14 16 19 18 15 17 19 18 16 19 17 18 19 t0t1t2t3t4t5 t6 t7 t8 t9 t10 t11

(b) with transfer preemption

(17)

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute 01 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 6 3 7 2 7 1 7 0 6 5 4 3 5 6 2 6 1 6 0 5 7 4 5 3 2 4 1 4 0 3 6 7 8 3 2 1 5 4 0 2 6 7 9 8 1 0 2 3 5 4 6 7 8 10 9 0 5 4 3 2 1 6 7 8 9 10 11 5 4 3 2 1 0 6 7 8 9 10 11 12 4 3 2 1 5 0 8 7 6 9 10 11 13 12 3 2 5 1 4 0 6 7 9 10 8 11 12 13 14 2 5 4 1 3 0 9 7 8 6 10 11 13 12 14 15 5 4 1 3 2 0 6 7 9 8 11 10 14 12 13 15 16 5 4 3 2 0 1 6 7 8 9 10 11 13 12 14 15 16 17 5 4 3 2 1 0 6 7 8 9 10 11 14 12 13 15 16 18 17 5 4 3 2 1 0 8 9 6 7 10 13 11 14 15 12 16 19 18 17 6 5 3 4 2 1 7 8 10 9 11 12 15 14 13 16 17 18 19 7 5 6 4 3 2 10 9 11 8 12 15 14 17 16 13 18 19 7 6 8 5 4 11 3 10 9 12 13 18 14 16 17 15 19 8 9 7 6 5 11 12 10 4 13 14 15 17 18 16 19 10 9 8 7 6 12 11 5 13 14 15 18 17 19 16 11 10 9 8 7 6 12 13 14 15 16 17 19 18 12 11 10 9 8 7 13 14 15 16 17 18 19 13 12 11 10 9 8 16 15 17 14 18 19 13 14 12 11 10 9 11 17 16 18 15 19 14 15 13 12 11 13 17 10 14 18 16 19 16 15 14 13 15 12 15 18 17 19 11 18 14 17 16 13 15 17 19 18 12 19 16 15 17 18 14 19 18 17 13 16 19 15 18 19 17 14 16 19 18 15 17 19 18 16 19 17 18 19 t0t1t2t3t4t5 t6 t7 t8 t9 t10 t11

(a) previous priority policy

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute Scompute 01 0 2 1 0 3 2 1 0 4 3 2 1 0 5 4 3 2 1 0 6 5 4 6 3 7 2 7 1 7 0 6 5 4 7 3 7 6 2 7 1 7 0 6 5 4 7 6 3 7 2 7 1 7 0 6 5 4 7 6 3 7 2 7 1 7 0 6 5 6 4 7 3 7 2 7 6 1 0 7 5 7 4 7 3 7 6 2 1 5 0 7 4 7 3 7 6 2 5 1 0 7 4 7 3 7 6 2 5 1 0 7 4 7 6 3 2 5 1 7 0 7 4 7 6 3 8 2 8 5 8 7 1 8 0 7 4 7 6 3 8 7 2 8 5 8 1 8 0 7 4 7 6 7 3 8 5 8 2 8 7 1 7 4 7 0 8 6 8 3 8 5 8 2 8 7 1 7 4 7 0 8 6 8 3 8 5 8 7 2 7 1 7 4 7 8 0 9 6 9 3 9 8 5 8 7 8 2 7 9 1 10 4 10 8 6 8 9 3 8 10 5 11 7 11 9 2 7 10 11 4 12 8 12 6 12 10 11 3 8 12 9 13 5 13 7 13 11 12 10 7 13 4 14 8 14 12 6 8 13 11 8 14 9 15 13 14 7 8 5 7 12 7 15 10 16 14 13 7 15 8 8 16 6 17 11 17 15 14 8 16 17 9 18 15 12 8 16 7 8 17 18 10 19 16 13 7 17 18 8 8 19 17 18 11 7 8 14 7 19 8 8 18 7 9 7 19 8 15 7 12 7 8 9 8 7 7 19 9 7 8 16 7 9 10 7 8 13 7 8 9 7 7 10 8 9 7 7 17 7 10 9 8 7 7 9 14 7 10 11 8 9 10 9 18 8 11 9 10 9 11 10 8 7 15 7 9 10 11 12 7 19 10 7 11 8 10 12 9 10 11 10 16 12 11 10 11 10 12 13 11 12 11 13 11 12 17 11 13 12 11 12 11 13 14 12 13 12 14 18 12 13 12 14 13 12 13 12 14 15 13 19 14 13 15 13 14 13 15 14 13 14 13 15 16 14 15 14 16 14 15 14 16 15 14 15 14 16 17 15 16 15 17 15 16 15 17 16 15 16 15 17 18 16 17 16 18 16 17 16 18 17 16 17 16 18 19 17 18 17 19 17 18 17 19 18 17 18 17 19 18 19 18 18 19 18 19 18 19 18 19 19 19 19 19 19 t0t1t2t3t4t5 t6 t7t8 t9t10 t11t12t13 t14t15 t16t17 t18t19 t20 t21 t22 t23

(b) transfer size priority policy

(18)

needed to do the actual transfers, but the synchronization between the mapper nodes and the master node may induce an unknown latency. Then we plan on implementing these algorithms with an iterative schedule recomputation. Indeed, as some parameters are unknown and/or platform dependent getting them as we get feedback from the computing nodes and adjusting the forecasted schedule and its platform counterpart. Starting from this, it could be quite easy to extend the model and software to handle iterative MapReduce computation where the next iteration starts as soon as possible.

Références

[1] D.P. Anderson. BOINC : A system for public-resource computing and storage. In Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on, pages 410. IEEE, 2004.

[2] J. Berlinska and M. Drozdowski. Scheduling divisible MapReduce computations. Journal of Parallel and Distributed Computing, 71(3) :450459, March 2010.

[3] Veeravalli Bharadwaj. Scheduling divisible loads in parallel and distributed systems. Wiley-IEEE Computer Society Pr, Los Alamitos-Washington-Brussels-Tokyo, 1996.

[4] Eddy Caron and Frédéric Desprez. Diet : A scalable toolbox to build network enabled servers on the grid. International Journal of High Performance Computing Applications, 20(3) :335, 2006.

[5] CERN. Worldwide LHC Computing Grid, 2011.

[6] Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly. Dryad : distributed data-parallel programs from sequential building blocks. In Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007, pages 5972. ACM, 2007.

[7] Dean Jerey and Ghemawat Sanjay. MapReduce : Simplied data processing on large clusters. In In OSDI'04 : Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation, volume 51. USENIX Association, 2004.

[8] Jiahui Jin, Junzhou Luo, Aibo Song, Fang Dong, and Runqun Xiong. BAR : An Ecient Data Locality Driven Task Scheduling Algorithm for Cloud Computing. In Cluster, Cloud and Grid Computing (CCGrid), 2011 11th IEEE/ACM International Symposium on, pages 295304. IEEE, 2011.

[9] LSST. Large Synoptic Survey Telescope.

[10] Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra. MPI-The Complete Reference, Volume 1 : The MPI Core. MIT Press, Cambridge, MA, USA, 2nd. (revi edition, 1998.

[11] Yahoo ! Hadoop at Yahoo !, 2012. [12] Paul Yang. Facebook uses Hadoop, 2011.

Références

Documents relatifs

In this paper I will investigate the relation between the affective dimension of these normative pressures and their moral dimension by arguing that an important moral

I will consider h ow individuals make sense of concepts like surrendering to their disease and reaching out to a 'higher power' as well as their experiences with performing each of

However, the 724 indigenous malaria cases in 2017 is nearly double the number reported in 2016, demonstrating the difficulty in achieving elimination even in a

(Je n'ai pas besoin de faire cela.. PETER: 'I'm sorry.. Modaux - verbes défectifs, révisons les pas à pas. Les modaux, ces verbes défectifs. Révisons pas

Therefore, this paper presents the software architecture of CrizLAB TM , a middleware based on Event- Driven Architecture that proposes a unified approach

Their original findings showed that specific executive functions (e.g., flexibility) were more sensitive to high intensity interval training as compared to regular endurance training

• even taking into account the right to be forgotten as it is defined in the GDPR, the decision does not seem to be applying it, because under Article 17 of the GDPR the data

Using the separator algebra inside a paver will allow us to get an inner and an outer approximation of the solution set in a much simpler way than using any other interval approach..