
Proceedings of the Workshops of the EDBT/ICDT 2014 Joint Conference

K. Selçuk Candan, Arizona State University, USA

Sihem Amer-Yahia, CNRS – LIG, France

Nicole Schweikardt, University of Frankfurt, Germany

Vassilis Christophides, University of Crete & FORTH-ICS, Greece

Vincent Leroy, University of Grenoble – CNRS, France

March 28th, 2014


The International Conference on Extending Database Technology (EDBT) and the International Conference on Database Theory (ICDT) are two prestigious forums for the exchange of the latest research results in data management and the theoretical foundations of database systems. While sharing the overarching goal of presenting cutting-edge results, ideas, techniques, and theoretical advances in databases, the workshops of the EDBT/ICDT joint conference have a separate task: they focus on emerging topics that complement the areas covered by the main technical program.

This year, our program includes workshops focusing on eight exciting topics:

• Algorithms for MapReduce and Beyond (BeyondMR) workshop, aiming to explore algorithms and computational models for systems that need large scale parallelization and systems designed to support efficient parallelization and fault tolerance,

• Bidirectional Transformations (BX) workshop, bringing together researchers and practitioners, established and new, interested in bidirectional transformations from different perspectives,

• Energy Data Management (EnDM) workshop, focusing on conceptual and system architecture issues related to the management of very large-scale data sets specifically in the context of the energy domain,

• Exploratory Search in Databases and the Web (ExploreDB) workshop, aiming to promote novel discovery methods that provide highly expressive discovery capabilities over large amounts of entity-relationship data, which are yet intuitive for end-users,

• Linked Web Data Management (LWDM) workshop, aiming to stimulate discussion of data management issues related to Linked Data and its relationships with other Semantic Web technologies, while also offering a glance at new issues,

• Multimodal Social Data Management (MSDM) workshop, bringing together experts in social network analysis, natural language processing, multimodal data management and integration, scalable data analysis, machine learning, to discuss how research contributions in different computer science areas can help better explain social data and build new applications,

• Privacy and Anonymity in the Information Society (PAIS) workshop, which provides a platform for researchers and practitioners from computer science and other fields that are interacting with computer science in the privacy area, such as statistics, healthcare informatics, and law, to discuss and present current research challenges and advances in data privacy and anonymity research, and

• Querying Graph Structured Data (GraphQ) workshop, which aims to encourage discussions about how to efficiently and effectively support graph queries in different application domains and seeks to provide the opportunity for cross-fertilization among teams working on graph-structured data, with a particular focus on querying issues.


Before concluding, we would like to acknowledge those who have contributed to the success of the workshops program. First of all, we would like to thank all workshop organizers, who have put together an exciting program, as well as all authors who submitted their work to the workshops.

We specially thank the authors of the accepted papers and the invited speakers who presented their works in the workshops program. Needless to say, we are grateful to the members of the workshop program committees and external reviewers who have helped put together a high-quality workshops program and we would like to acknowledge the conference organizers and many student volunteers for their invaluable help at various stages of the process. We would also like to give our thanks to the sponsors who have financially supported the workshops and the editors of the CEUR Workshop Proceedings (CEUR-WS.org) who have agreed to host these proceedings.

Sincerely,

K. Selçuk Candan, Workshops Chair

Sihem Amer-Yahia and Nicole Schweikardt, EDBT and ICDT Program Chairs

Vassilis Christophides, General Chair


Algorithms for MapReduce and Beyond (BeyondMR)

Foto N. Afrati (National Technical University of Athens, Greece)

Phokion G. Kolaitis (UC Santa Cruz & IBM Research, USA)

Jeffrey D. Ullman (Stanford University, USA)


Scheduling MapReduce Jobs on Unrelated Processors∗

D. Fotakis

National Technical University of Athens

[email protected]

I. Milis

Athens University of Economics and Business

[email protected]

E. Zampetakis

National Technical University of Athens

[email protected]

G. Zois

Université Pierre et Marie Curie and Athens University of Economics and Business

[email protected]

ABSTRACT

The MapReduce framework is established as the standard approach for parallel processing of massive amounts of data. In this work, we extend the model of MapReduce scheduling on unrelated processors (Moseley et al., SPAA 2011) and deal with the practically important case of jobs with any number of Map and Reduce tasks. We present a polynomial-time $(32 + \epsilon)$-approximation algorithm for minimizing the total weighted completion time in this setting. To the best of our knowledge, this is the most general setting of MapReduce scheduling for which an approximation guarantee is known.

Moreover, this is the first time that a constant approximation ratio is obtained for minimizing the total weighted completion time on unrelated processors under a nontrivial class of precedence constraints.

Keywords

MapReduce, Scheduling, Unrelated Processors

1. INTRODUCTION

Scheduling in MapReduce environments has become increasingly important in recent years, as MapReduce has been established as the standard programming model to implement massive parallelism in large data centers [5].

Applications of MapReduce such as search indexing, web analytics and data mining, involve the concurrent execution of several MapReduce jobs on a system like Google’s MapReduce or Apache Hadoop. When a MapReduce job is executed, a number of Map and Reduce tasks are created.

∗This work was supported by the project Handling Uncertainty in Data Intensive Applications, co-financed by the European Union (European Social Fund - ESF) and Greek national funds, through the Operational Program "Education and Lifelong Learning", under the program THALES, and by the project Heracleitus II.

(c) 2014, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2014 Joint Conference (March 28, 2014, Athens, Greece) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

Each Map task operates on a portion of the input elements, translating them into a number of key-value pairs. Next, all key-value pairs are transmitted to the Reduce tasks, so that all pairs with the same key are available together at the same task. The Reduce tasks operate on the key-value pairs, combine the values associated with a key, and generate the final result. In addition to the many practical applications of MapReduce, there has been a significant interest in developing appropriate cost models and a computational complexity theory for MapReduce computation (see e.g., [3, 6]), in understanding the basic principles underlying the design of efficient MapReduce algorithms (see e.g., [1, 7]), and in obtaining upper and lower bounds on the performance of MapReduce algorithms for some fundamental computational problems (see e.g. [2] and the references therein).
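The Map, shuffle and Reduce steps described above can be illustrated with a small sketch. It is a single-process toy, assuming a word-count job; the function names `map_task`, `shuffle`, `reduce_task` and `run_job` are hypothetical and do not correspond to any real MapReduce API.

```python
from collections import defaultdict


def map_task(chunk):
    """Map task: translate a portion of the input into key-value pairs."""
    return [(word, 1) for word in chunk.split()]


def shuffle(pairs):
    """Group pairs so that all pairs with the same key reach the same Reduce task."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce task: combine the values associated with one key."""
    return key, sum(values)


def run_job(chunks):
    # Conceptually, every chunk is handled by its own Map task.
    pairs = [kv for chunk in chunks for kv in map_task(chunk)]
    groups = shuffle(pairs)
    # Conceptually, every key group is handled by some Reduce task.
    return dict(reduce_task(key, values) for key, values in groups.items())
```

In a real system the Map tasks and the Reduce tasks run in parallel on different processors, which is exactly what makes their assignment and scheduling a nontrivial problem.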

Motivation and Previous Work. Many important advantages of MapReduce are due to the fact that the Map tasks or the Reduce tasks can be executed in parallel and essentially independently of each other. However, to best exploit the massive parallelism available in typical MapReduce systems, one has to carefully allocate and schedule Map and Reduce tasks to actual processors (or computational resources, in general). This important and delicate task is performed in a centralized manner, by a process running in the master node. A major concern of the scheduler, among others, is to satisfy the dependencies among the tasks of the same MapReduce job: all the Map tasks must finish before the execution of any Reduce task of the same job. During the assignment and scheduling process, a number of different needs must be taken into account, e.g., the transfer of the intermediate data (shuffle), data locality, and data skew, which give rise to the study of new scheduling problems.

Despite the importance and the challenging nature of scheduling in MapReduce environments, and despite the extensive investigation of a large variety of scheduling problems in parallel computing systems (see e.g., [13]), less attention has been paid to MapReduce scheduling problems. In fact, most of the previous work on scheduling in MapReduce systems concerns the experimental evaluation of scheduling heuristics, mostly from the viewpoint of finding good trade-offs between different objectives (see e.g., [14]). From a theoretical viewpoint, only a few results on MapReduce scheduling have appeared so far [11, 4]. These are based on simplified abstractions of MapReduce scheduling, closely related to some variants of the classical Open Shop and Flow Shop scheduling models, that capture issues such as task dependencies, data locality, shuffle, and task assignment, under the key objective of minimizing the total weighted completion time of a set of MapReduce jobs.

In this direction, the theoretical model of Moseley et al. [11] generalizes a variant of the Flow Shop scheduling model, referred to as 2-stage Flexible Flow Shop (FFS), which is known to be strongly $\mathcal{NP}$-hard, even for jobs with a single Map and a single Reduce task and a single map and a single reduce processor (see [11]). They consider the cases of both identical and unrelated processors, and the goal is to minimize the total completion time of the jobs. For identical processors, they present a 12-approximation algorithm and an $O(1/\epsilon^2)$-competitive online algorithm, for any $\epsilon \in (0,1)$, under the assumption that the processors used by the online algorithm are $1+\epsilon$ times faster than the processors used by the optimal schedule. Since the identical processors setting fails to capture issues such as data locality and to model communication costs between the Map and the Reduce tasks, Moseley et al. also consider the case of unrelated processors, which provides a more expressive theoretical model of scheduling in MapReduce environments. Nevertheless, they only consider the very restricted (and practically not so interesting) case where each job has a single Map and a single Reduce task, and present a 6-approximation algorithm and an $O(1/\epsilon^5)$-competitive online algorithm, for any $\epsilon \in (0,1)$, under the assumption that the processors of the online algorithm are $1+\epsilon$ times faster.

A similar model of MapReduce scheduling so as to minimize the total completion time was proposed by Chen et al. [4]. In contrast with the model of [11], they assume that tasks are preassigned to processors and, in this restricted setting, they present an LP-based 8-approximation algorithm.

Moreover, they deal with the shuffle phase in MapReduce systems and present a 58-approximation algorithm.

Contribution and Results. We adopt the theoretical model of [11] and consider MapReduce scheduling on unrelated processors. However, departing from [11], we deal with the general (and practically interesting) case where each job has any number of Map and Reduce tasks and we succeed in obtaining a polynomial-time constant approximation algorithm for minimizing the total weighted completion time.

More specifically, we consider a set of MapReduce jobs to be executed on a set of unrelated processors. Each job consists of a set of Map tasks, which can be executed only on Map processors, and a set of Reduce tasks, which can be executed only on Reduce processors. Each task has a different processing time for each processor and is associated with a positive weight, representing its importance. All jobs are available at time zero. Map or Reduce tasks can run simultaneously on different processors and, for each job, every Reduce task can start its execution only after the completion of all the job's Map tasks. The goal is to find an assignment of the tasks to processors and to schedule them non-preemptively so as to minimize their total weighted completion time.

In terms of classical scheduling, the model we consider in this work is a special case of total weighted completion time minimization on unrelated processors under precedence constraints. Despite its importance and generality, only a few results are known for this problem. These results concern only the case of treelike precedence constraints [8]. More specifically, in [8], Kumar et al. propose a polylogarithmic approximation algorithm for the case where the undirected graph underlying the precedence constraints is a forest (a.k.a. treelike precedences). Their algorithm is based on a reduction from total weighted completion time minimization to an appropriate collection of makespan minimization problems. Based on ideas of [8], we present a $(32+\epsilon)$-approximation algorithm for this problem that operates in two steps. In the first step, our algorithm computes an $(8+\epsilon)$-approximate schedule for the Map tasks (resp. Reduce tasks) by combining a time-indexed LP relaxation of the problem with a well-known approximation algorithm for the makespan minimization problem on unrelated processors [9]. In fact, the makespan minimization algorithm runs on each time interval of the LP solution and computes an assignment of the Map (resp. Reduce) tasks to processors.

In the second step, based on an idea from [11], we merge the two schedules, produced for the Map tasks and the Reduce tasks, into a single schedule that respects the precedence constraints. Using techniques from [11], we show that the merging step increases the approximation ratio by a factor of at most 4.

On the practical side, the theoretical model of [11] for MapReduce scheduling on unrelated processors deals with most of the important aspects of the problem. So, considering jobs with any number of Map and Reduce tasks in this model is particularly important for practical applications, since the basic idea behind MapReduce computation is that each job is split into a large number of Map and Reduce tasks that can be executed in parallel (see e.g., [3, 6, 1, 2]). On the theoretical side, to the best of our knowledge, this is the first time that a constant approximation ratio is obtained for the problem of minimizing the total weighted completion time on unrelated processors under a nontrivial class of precedence constraints.

Notation. We consider a set $\mathcal{J} = \{1, 2, \ldots, n\}$ of $n$ MapReduce jobs to be executed on a set $\mathcal{P} = \{1, 2, \ldots, m\}$ of $m$ unrelated processors. Each job is available at time zero, is associated with a positive weight $w_j$ and consists of a set $\mathcal{M}$ of Map tasks and a set $\mathcal{R}$ of Reduce tasks. Each task is denoted by $T_{k,j} \in \mathcal{M} \cup \mathcal{R}$, where $k \in \mathbb{N}$ is the task index of job $j \in \mathcal{J}$, and is associated with a vector of non-negative processing times $\{p_{i,k,j}\}$, one for each processor $i \in \mathcal{P}_b$, where $b \in \{\mathcal{M}, \mathcal{R}\}$. Let $\mathcal{P}_M$ and $\mathcal{P}_R$ be the sets of map and reduce processors, respectively. Each job has at least one Map and one Reduce task that can run simultaneously on different processors, and every Reduce task can start its execution only after the completion of all Map tasks of the same job.

For a given schedule, we denote by $C_j$ and $C_{k,j}$ the completion times of each job $j \in \mathcal{J}$ and each task $T_{k,j} \in \mathcal{M} \cup \mathcal{R}$, respectively. Note that, due to the precedence constraints between Map and Reduce tasks, $C_j = \max_{T_{k,j} \in \mathcal{R}} \{C_{k,j}\}$. By $C_{\max} = \max_{j \in \mathcal{J}} \{C_j\}$ we denote the makespan of the schedule, i.e., the completion time of the job which finishes last.

Our goal is to schedule non-preemptively all Map tasks on processors of $\mathcal{P}_M$ and all Reduce tasks on processors of $\mathcal{P}_R$, with respect to their precedence constraints, so as to minimize the total weighted completion time of the schedule, i.e., $\sum_{j \in \mathcal{J}} w_j C_j$. We refer to this problem as the MapReduce scheduling problem.
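To make the objective concrete, the following minimal sketch evaluates $C_j$, $C_{\max}$ and $\sum_j w_j C_j$ for a given schedule. The dictionary layout and the function names are assumptions made for illustration only.

```python
def job_completion_times(task_completion, reduce_tasks):
    """C_j = max over the Reduce tasks T_{k,j} of job j of C_{k,j}.

    task_completion: {(k, j): C_{k,j}}   (hypothetical layout)
    reduce_tasks:    {j: [k, ...]}       indices of the Reduce tasks of job j
    """
    return {
        j: max(task_completion[(k, j)] for k in ks)
        for j, ks in reduce_tasks.items()
    }


def makespan(task_completion, reduce_tasks):
    """C_max: completion time of the job that finishes last."""
    return max(job_completion_times(task_completion, reduce_tasks).values())


def total_weighted_completion(weights, task_completion, reduce_tasks):
    """Objective value: sum over jobs of w_j * C_j."""
    completion = job_completion_times(task_completion, reduce_tasks)
    return sum(weights[j] * completion[j] for j in weights)
```

For example, with two jobs of weights 2 and 1 whose last Reduce tasks finish at times 5 and 6, the objective is 2·5 + 1·6 = 16 and the makespan is 6.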

2. A CONSTANT APPROXIMATION ALGORITHM

In this section, we present a $(32 + \epsilon)$-approximation algorithm, for $\epsilon \in (0,1)$, executed in the following two steps: (i) it computes an $(8 + \epsilon)$-approximate schedule for assigning and scheduling all Map tasks (resp. Reduce tasks) on processors of the set $\mathcal{P}_M$ (resp. $\mathcal{P}_R$), and (ii) it merges the two schedules into one, with respect to the precedence constraints between Map and Reduce tasks of each job, increasing the approximation ratio by a factor of 4.

2.1 Scheduling Map and Reduce Tasks

Next, we propose an algorithm for the problem of minimizing the total weighted completion time of all Map (resp. Reduce) tasks on processors of the set $\mathcal{P}_M$ (resp. $\mathcal{P}_R$). For notational convenience, we use a variable $b \in \{\mathcal{M}, \mathcal{R}\}$ to refer to either the Map or the Reduce set of tasks.

We define $(0, t_{\max}]$, with $t_{\max} = \sum_{T_{k,j} \in b} \max_{i \in \mathcal{P}_b} p_{i,k,j}$, to be the time horizon of potential completion times, where $t_{\max}$ is an upper bound on the makespan of a feasible schedule. We discretize the time horizon into intervals $[1,1], (1,(1+\delta)], ((1+\delta),(1+\delta)^2], \ldots, ((1+\delta)^{L-1},(1+\delta)^L]$, where $\delta \in (0,1)$ is a small constant, and $L$ is the smallest integer such that $(1+\delta)^{L-1} \ge t_{\max}$. Let $I_\ell = ((1+\delta)^{\ell-1},(1+\delta)^\ell]$, for $0 \le \ell \le L$, and $\mathcal{L} = \{0, 1, 2, \ldots, L\}$. Note that the number of intervals is polynomial in the size of the instance and in $1/\delta$. For each processor $i \in \mathcal{P}_b$, task $T_{k,j} \in b$ and $\ell \in \mathcal{L}$, we introduce a variable $y_{i,k,j,\ell}$ that denotes the fraction of task $T_{k,j}$ assigned to processor $i$ in time interval $I_\ell$. Furthermore, for each task $T_{k,j} \in b$, we introduce a variable $C_{k,j}$ corresponding to its completion time, and a variable $z_{k,j}$ corresponding to its fractional processing time. For every job $j \in \mathcal{J}$, we also introduce a dummy task $D_j$, with zero processing time on every processor, which has to be processed after the completion of every other task $T_{k,j} \in b$.
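The discretization can be sketched as follows. The code assumes $t_{\max} \ge 1$ and follows the definition $I_\ell = ((1+\delta)^{\ell-1}, (1+\delta)^{\ell}]$ for $0 \le \ell \le L$; the function names and the input layout are hypothetical.

```python
def horizon(processing_times):
    """t_max: sum over tasks of the largest processing time over all processors.

    processing_times: {task: [p on processor 1, p on processor 2, ...]}
    """
    return sum(max(times) for times in processing_times.values())


def geometric_intervals(t_max, delta):
    """Return (L, intervals), where L is the smallest integer with
    (1 + delta)^(L - 1) >= t_max and intervals[l] holds the endpoints
    ((1 + delta)^(l - 1), (1 + delta)^l) of I_l for l = 0, ..., L.
    Assumes t_max >= 1 and delta > 0.
    """
    L = 1
    while (1 + delta) ** (L - 1) < t_max:
        L += 1
    intervals = [((1 + delta) ** (l - 1), (1 + delta) ** l) for l in range(L + 1)]
    return L, intervals
```

Since the interval endpoints grow geometrically, $L$ is roughly $\log_{1+\delta} t_{\max}$, which is where the polynomial bound on the number of intervals comes from.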

LP(b) is an interval-indexed linear programming relaxation of our problem.

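\begin{align}
\text{LP}(b):\quad \text{minimize}\quad & \sum_{j \in \mathcal{J}} w_j C_{D_j} \notag\\
\text{subject to:}\quad & \sum_{i \in \mathcal{P}_b,\, \ell \in \mathcal{L}} y_{i,k,j,\ell} = 1, && \forall T_{k,j} \in b \tag{1}\\
& z_{k,j} = \sum_{i \in \mathcal{P}_b} p_{i,k,j} \sum_{\ell \in \mathcal{L}} y_{i,k,j,\ell}, && \forall T_{k,j} \in b \tag{2}\\
& C_{D_j} \ge C_{k,j} + z_{k,j}, && \forall j \in \mathcal{J},\ T_{k,j} \in b \tag{3}\\
& \sum_{i \in \mathcal{P}_b} \sum_{\ell \in \mathcal{L}} (1+\delta)^{\ell-1} y_{i,k,j,\ell} \le C_{k,j} \le \sum_{i \in \mathcal{P}_b} \sum_{\ell \in \mathcal{L}} (1+\delta)^{\ell} y_{i,k,j,\ell}, && \forall T_{k,j} \in b \tag{4}\\
& \sum_{T_{k,j} \in b} p_{i,k,j} \sum_{t \le \ell} y_{i,k,j,t} \le (1+\delta)^{\ell}, && \forall i \in \mathcal{P}_b,\ \ell \in \mathcal{L} \tag{5}\\
& p_{i,k,j} > (1+\delta)^{\ell} \Rightarrow y_{i,k,j,\ell} = 0, && \forall i \in \mathcal{P}_b,\ T_{k,j} \in b,\ \ell \in \mathcal{L} \tag{6}\\
& y_{i,k,j,\ell} \ge 0, && \forall i \in \mathcal{P}_b,\ T_{k,j} \in b,\ \ell \in \mathcal{L} \tag{7}
\end{align}

Our objective is to minimize the sum of weighted completion times of all jobs. Constraint (1) ensures that each task is entirely assigned to processors of the set $\mathcal{P}_b$, and constraint (2) defines its fractional processing time. Constraint (3) ensures that, for each job $j \in \mathcal{J}$, the completion of each task $T_{k,j}$ precedes the completion of task $D_j$. Constraint (4) imposes a lower and an upper bound on the completion time of each task. For each $\ell \in \mathcal{L}$, constraints (5) and (6) are validity constraints which state, respectively, that the total fractional processing time on each processor is at most $(1+\delta)^{\ell}$, and that if it takes more than $(1+\delta)^{\ell}$ time to process a task $T_{k,j}$ on a processor $i \in \mathcal{P}_b$, then $T_{k,j}$ should not be scheduled on $i$.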

Assignment and Scheduling. Let $(\bar{y}_{i,k,j,\ell}, \bar{z}_{k,j}, \bar{C}_{k,j})$ be an optimal (fractional) solution to LP$(b)$. For each $2 \le \ell \le L$, we define the set of tasks $S(\ell) = \{T_{k,j} \in b \mid (1+\delta)^{\ell-2}/2 \le \bar{C}_{k,j} \le (1+\delta)^{\ell-1}/2\}$, which complete their execution within the interval $I_\ell$. By definition, for each task $T_{k,j} \in S(\ell)$, it must hold that $2(1+\delta)\bar{C}_{k,j} \le (1+\delta)^{\ell}$.

We will assign all tasks of each set $S(\ell)$ to processors in $\mathcal{P}_b$ according to the following algorithm.

Algorithm Makespan

1: Compute a basic feasible solution $(\bar{x}_{i,k,j})$ to LP$(b, T^*)$.

2: Assign all tasks having integral values to processors of $\mathcal{P}_b$ as in $(\bar{x}_{i,k,j})$.

3: Let $G = (A \cup \mathcal{P}_b, E)$ be a graph, where $A = \{T_{k,j} \mid 0 < \bar{x}_{i,k,j} < 1\}$ and $E = \{\{T_{k,j}, i\} \mid T_{k,j} \in A,\ i \in \mathcal{P}_b \text{ and } 0 < \bar{x}_{i,k,j} < 1\}$. Compute a perfect matching $M$ on $G$.

4: Assign each $T_{k,j} \in A$ to $i \in \mathcal{P}_b$, as indicated by $M$.

5: for each assigned task $T_{k,j}$ do

6: Schedule $T_{k,j}$ as early as possible, non-preemptively, with processing time $p_{i,k,j}$ on the processor $i \in \mathcal{P}_b$ that it is assigned to. Let $C_{k,j}$ be the completion time of $T_{k,j}$.

Algorithm Makespan has been proposed in a seminal paper by Lenstra et al. [9] and is based on the so-called parametric pruning technique in an LP setting. More specifically, if $T$ is an estimate of the optimal makespan of a schedule of the tasks in $S(\ell)$, then by pruning away all task-processor pairs for which $p_{i,k,j} > T$, we are able to define a set of variables corresponding only to triples of the set $Q_T = \{(i,k,j) \mid p_{i,k,j} \le T\}$; note that this pruning process has already been taken into consideration by constraints (6) of LP$(b)$. Since $T \in \bigcup_{\ell' \le \ell} I_{\ell'}$, using binary search on $\bigcup_{\ell' \le \ell} I_{\ell'}$ with $T$ as the search variable, we can find the minimum value of $T$ such that the following system of linear constraints is feasible.

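\begin{align}
\text{LP}(b,T):\quad & \sum_{i:\,(i,k,j) \in Q_T} x_{i,k,j} = 1, && \forall T_{k,j} \in b \tag{8}\\
& \sum_{T_{k,j}:\,(i,k,j) \in Q_T} x_{i,k,j}\, p_{i,k,j} \le T, && \forall i \in \mathcal{P}_b \tag{9}\\
& x_{i,k,j} \ge 0, && \forall (i,k,j) \in Q_T \notag
\end{align}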

Each variable $x_{i,k,j}$ denotes the fractional processor assignment of each task $T_{k,j} \in S(\ell)$. Now, if $T^*$ is the minimum value for which LP$(b,T)$ is feasible, then $T^*$ is a lower bound on the optimal integral makespan.

Similarly as in [9], it can be proved that a basic feasible solution to LP$(b,T)$ has at most $|b| + |\mathcal{P}_b|$ non-zero variables, of which at least $|b| - |\mathcal{P}_b|$ must be set integrally. Then, the number of fractional $x_{i,k,j}$ values must be at most $2|\mathcal{P}_b|$. If we form a bipartite graph $G = (A \cup \mathcal{P}_b, E)$, where $A$ is the set of tasks having fractional $x_{i,k,j}$ values and $E = \{\{T_{k,j}, i\} \mid T_{k,j} \in A,\ i \in \mathcal{P}_b \text{ and } 0 < x_{i,k,j} < 1\}$, then, according to the latter property, we deduce that $G$ is a connected graph with at most $2|\mathcal{P}_b|$ vertices and at most $2|\mathcal{P}_b|$ edges. However, this means that $G$ has the special topology of a pseudo-forest (a collection of trees with one possible extra edge), which enables the computation of a perfect matching on it. Hence, by executing steps 2-6 of Algorithm Makespan, a non-preemptive schedule of the tasks in $S(\ell)$ can be found.

The following lemma provides a tight upper bound on the makespan of the schedule computed by Algorithm Makespan.

Lemma 1. Algorithm Makespan is a 2-approximation algorithm for scheduling the tasks of the set $S(\ell)$ so as to minimize their makespan.

In the next lemma, using filtering [10], we modify the $y_{i,k,j,\ell}$ values of the solution to LP$(b)$ to find an upper bound on the value of $T^*$.

Lemma 2. Consider a feasible solution to LP$(b,T)$. For each set of tasks $S(\ell)$ that complete their execution within the interval $I_\ell$, it holds that $T^* \le 2(1+\delta)^{\ell}$, for $\delta \in (0,1)$.

As a consequence of filtering in Lemma 2, the completion time of each task in $S(\ell)$ is increased by a factor of 4; this result has already been proven to be tight (see Section 2 in [12]).

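Algorithm TaskScheduling(b)

1: Compute an optimal solution $(\bar{y}_{i,k,j,\ell}, \bar{z}_{k,j}, \bar{C}_{k,j})$ to LP$(b)$.

2: for each $\ell \in \mathcal{L}$ do

3: compute $S(\ell) = \{T_{k,j} \in b \mid (1+\delta)^{\ell-2}/2 \le \bar{C}_{k,j} \le (1+\delta)^{\ell-1}/2\}$

4: for each $\ell$ such that $S(\ell) \ne \emptyset$ do

5: Schedule all tasks in $S(\ell)$ by running Algorithm Makespan.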
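Steps 2-3 of the algorithm above can be sketched as below, given the fractional completion times $\bar{C}_{k,j}$ of an LP solution. The function name and input layout are hypothetical, and one choice is an assumption not made in the text: a task whose $\bar{C}_{k,j}$ falls exactly on a shared endpoint is placed in the set with the smallest index $\ell$, so that every task is scheduled once.

```python
def group_tasks(c_bar, delta, L):
    """Partition tasks into the sets S(l), for l = 0, ..., L.

    S(l) = { task : (1+delta)^(l-2) / 2 <= c_bar[task] <= (1+delta)^(l-1) / 2 }

    c_bar: {task: fractional completion time from the LP solution}
    Returns {l: [tasks]} containing only the non-empty sets.
    """
    groups = {}
    for task, c in c_bar.items():
        for l in range(L + 1):
            lo = (1 + delta) ** (l - 2) / 2
            hi = (1 + delta) ** (l - 1) / 2
            if lo <= c <= hi:
                # Assumption: ties on a shared endpoint go to the smallest l.
                groups.setdefault(l, []).append(task)
                break
    return groups
```

Each non-empty group would then be passed to a makespan routine such as Algorithm Makespan, which is not implemented here.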

Running Algorithm TaskScheduling(b), we compute a schedule for all Map (resp. Reduce) tasks such that:

Theorem 1. TaskScheduling(b) is an $(8+\varepsilon)$-approximation algorithm for scheduling a set of Map (Reduce) tasks on a set of unrelated processors $\mathcal{P}_M$ ($\mathcal{P}_R$), in order to minimize their total weighted completion time, for $\varepsilon \in (0,1)$.

Proof Sketch. Let $C_{k,j}$ be the completion time of a task $T_{k,j} \in S(\ell)$ in the schedule of Algorithm TaskScheduling(b), and let $C_{\max}(\ell)$ be the makespan of the schedule of Algorithm Makespan on the tasks in $S(\ell)$. Since $C_{k,j} \le C_{\max}(\ell)$ for all $T_{k,j} \in b$, it suffices to prove that $C_{k,j} \le 8(1+\delta)^2 \bar{C}_{k,j}$: we combine Lemma 1 and Lemma 2 with the definition of the set $S(\ell)$. Then, as we can select an $\varepsilon$ such that $(1+\delta)^2 \le (1+\varepsilon)$, the theorem follows. Note that this ratio is tight.
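The combination of Lemma 1, Lemma 2 and the definition of $S(\ell)$ mentioned in the proof sketch can be written out as one chain of inequalities. This is a reconstruction from the statements above, not a quotation of the authors' full proof.

```latex
\begin{align*}
C_{k,j} &\le C_{\max}(\ell)
  && \text{($T_{k,j} \in S(\ell)$ is scheduled by Algorithm Makespan)} \\
&\le 2\,T^{*}
  && \text{(Lemma 1, with $T^{*}$ a lower bound on the optimal makespan)} \\
&\le 4\,(1+\delta)^{\ell}
  && \text{(Lemma 2: $T^{*} \le 2(1+\delta)^{\ell}$)} \\
&\le 8\,(1+\delta)^{2}\,\bar{C}_{k,j}
  && \text{(since $\bar{C}_{k,j} \ge (1+\delta)^{\ell-2}/2$ for $T_{k,j} \in S(\ell)$)}
\end{align*}
```

The last step uses only the lower endpoint in the definition of $S(\ell)$, rewritten as $(1+\delta)^{\ell} \le 2(1+\delta)^{2}\bar{C}_{k,j}$.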

2.2 Merging Task Schedules

Let $\sigma_M$, $\sigma_R$ be two schedules computed by two runs of Algorithm TaskScheduling(b), for $b = \mathcal{M}$ and $b = \mathcal{R}$, respectively. Let also $C_j^{\sigma_M} = \max_{T_{k,j} \in \mathcal{M}} \{C_{k,j}\}$ and $C_j^{\sigma_R} = \max_{T_{k,j} \in \mathcal{R}} \{C_{k,j}\}$ be the completion times of all Map and all Reduce tasks of a job $j \in \mathcal{J}$ within these schedules, respectively. Depending on these completion time values, we assign each job $j \in \mathcal{J}$ a width equal to $\omega_j = \max\{C_j^{\sigma_M}, C_j^{\sigma_R}\}$. The following algorithm computes a feasible schedule.

Algorithm MRS. At each time instant where a processor $i \in \mathcal{P}_b$ becomes available, it processes either the Map task with the minimum width among those assigned to $i \in \mathcal{P}_M$ in $\sigma_M$, or the available (w.r.t. its precedence constraints) Reduce task with the minimum width among those assigned to $i \in \mathcal{P}_R$ in $\sigma_R$.
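The merging rule can be simulated with a short sketch. It assumes the processor assignments from $\sigma_M$ and $\sigma_R$ and the widths $\omega_j$ are already given; the function name `mrs` and the input layout are hypothetical, and a Reduce processor is assumed to stay idle until the next Map completion when none of its tasks is available.

```python
def mrs(map_assign, reduce_assign, width):
    """Simulate the merging rule; returns {job: completion time}.

    map_assign / reduce_assign: {processor: [(job, processing_time), ...]}
    width: {job: omega_j}
    """
    # Map tasks have no predecessors, so each Map processor simply runs
    # its assigned tasks back to back in order of increasing width.
    map_done = {}
    for tasks in map_assign.values():
        t = 0
        for job, p in sorted(tasks, key=lambda task: width[task[0]]):
            t += p
            map_done[job] = max(map_done.get(job, 0), t)

    done = {}
    for tasks in reduce_assign.values():
        pending, t = list(tasks), 0
        while pending:
            # A Reduce task is available once all Map tasks of its job are done.
            ready = [task for task in pending if map_done.get(task[0], 0) <= t]
            if not ready:
                # Idle until the earliest time some pending task becomes available.
                t = min(map_done.get(task[0], 0) for task in pending)
                continue
            job, p = min(ready, key=lambda task: width[task[0]])
            pending.remove((job, p))
            t += p
            done[job] = max(done.get(job, 0), t)
    return done
```

As a usage example, take one Map processor with tasks of jobs 1 and 2 (processing times 2 and 1), one Reduce processor with tasks of jobs 1 and 2 (processing times 3 and 2), and widths $\omega_1 = 5$, $\omega_2 = 3$. The simulation completes job 2 at time 3 and job 1 at time 6, both within $2\omega_j$.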

By an analysis similar to that in [11], we can prove that:

Theorem 2. Algorithm MRS is a $(32+\epsilon)$-approximation for the MapReduce scheduling problem, for $\epsilon \in (0,1)$.

Proof Sketch. The feasibility of the schedule produced by Algorithm MRS can be easily verified. To prove the theorem, it suffices to prove that in such a schedule, $\sigma$, all tasks of a job $j \in \mathcal{J}$ are completed by time $2\max\{C_j^{\sigma_M}, C_j^{\sigma_R}\}$. Let $C_j^{\sigma}$ be the completion time of a job $j \in \mathcal{J}$ in $\sigma$. Note that, for each of the Map tasks of $j$, the completion time is upper bounded by $\omega_j$. On the other hand, the completion time of each Reduce task is upper bounded by a quantity equal to $r + \omega_j$, where $r$ is the earliest time when the task is available to be scheduled in $\sigma$. However, $r = C_j^{\sigma_M} \le \omega_j$ and thus $C_j^{\sigma} \le 2\omega_j = 2\max\{C_j^{\sigma_M}, C_j^{\sigma_R}\}$. By applying Theorem 1, and as we can select an $\epsilon$ such that $\epsilon \le 4\varepsilon$, the theorem follows.
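One way to see where the constant 32 comes from is the following hedged reconstruction. It assumes that the optimal values of LP$(\mathcal{M})$ and LP$(\mathcal{R})$ are each a lower bound on the optimal value OPT of the MapReduce scheduling problem, and it bounds the maximum by the sum; the authors' own argument may be organized differently.

```latex
\begin{align*}
\sum_{j \in \mathcal{J}} w_j C_j^{\sigma}
  &\le 2 \sum_{j \in \mathcal{J}} w_j \max\{C_j^{\sigma_M}, C_j^{\sigma_R}\}
   \le 2 \sum_{j \in \mathcal{J}} w_j \left( C_j^{\sigma_M} + C_j^{\sigma_R} \right) \\
  &\le 2 \left( (8+\varepsilon)\,\mathrm{OPT} + (8+\varepsilon)\,\mathrm{OPT} \right)
   = (32 + 4\varepsilon)\,\mathrm{OPT}
\end{align*}
```

The factor 2 in the first step is the bound $C_j^{\sigma} \le 2\omega_j$ from the proof sketch, and the second factor 2 comes from applying Theorem 1 once to $\sigma_M$ and once to $\sigma_R$, which together give the factor 4 attributed to the merging step.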

3. REFERENCES

[1] F. Afrati, D. Fotakis, and J. Ullman. Enumerating subgraph instances using MapReduce. IEEE-ICDE: 62-73, 2013.

[2] F. Afrati, A. D. Sarma, S. Salihoglu, and J. Ullman. Upper and Lower Bounds on the Cost of a MapReduce Computation. VLDB: 6(4):277-288, 2013.

[3] F. Afrati and J. Ullman. Optimizing multiway joins in a map-reduce environment. IEEE-TKDE: 23(9):1282-1298, 2011.

[4] F. Chen, M. S. Kodialam, and T. V. Lakshman. Joint scheduling of processing and shuffle phases in mapreduce systems. INFOCOM: 1143-1151, 2012.

[5] J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. OSDI: 137-150, 2004.

[6] H. Karloff, S. Suri, and S. Vassilvitskii. A Model of Computation for MapReduce. SODA: 938-948, 2010.

[7] R. Kumar, B. Moseley, S. Vassilvitskii, and A. Vattani. Fast greedy algorithms in MapReduce and streaming. ACM-SPAA: 1-10, 2013.

[8] V. S. A. Kumar, M. V. Marathe, S. Parthasarathy, and A. Srinivasan. Scheduling on unrelated machines under tree-like precedence constraints. Algorithmica: 55(1):205-226, 2009.

[9] J. K. Lenstra, D. B. Shmoys, and É. Tardos. Approximation algorithms for scheduling unrelated parallel machines. Mathematical Programming: 46:259-271, 1990.

[10] J. Lin and J. S. Vitter. epsilon-approximations with minimum packing constraint violation. SODA: 771-782, 1992.

[11] B. Moseley, A. Dasgupta, R. Kumar, and T. Sarlós. On scheduling in map-reduce and flow-shops. ACM-SPAA: 289-298, 2011.

[12] J. R. Correa, M. Skutella, and J. Verschae. The Power of Preemption on Unrelated Machines and Applications to Scheduling Orders. Math. Oper. Res.: 379-398, 2012.

[13] M. Pinedo. Scheduling: theory, algorithms, and systems. Springer, 2012.

[14] D.-J. Yoo and K. M. Sim. A comparative review of job scheduling for mapreduce. IEEE-ICCIS: 353-358, 2011.


Binary Theta-Joins using MapReduce: Efficiency Analysis and Improvements

Ioannis K. Koumarelas

Dept. of Informatics Aristotle University Thessaloniki, Greece

[email protected]

Athanasios Naskos

[email protected]

Anastasios Gounaris

[email protected]

ABSTRACT

We deal with binary theta-joins in a MapReduce environment, and we make two contributions. First, we show that the best known algorithm to date for this problem can reach the optimal trade-off between the size of the input a reducer can receive and the incurred communication cost when the join selectivity is high. Second, when the join selectivity is low, we present improvements upon the state-of-the-art with a view to decreasing the communication cost and the maximum load a reducer can receive, taking also into account the load imbalance across the reducers.

1. INTRODUCTION

Data analysis on voluminous data, such as clickstream data or data derived from scientific experiments and simulations, has given rise to the establishment of MapReduce as the most popular framework for large-scale processing. Analytical database queries remain a useful tool for big data analyses; however, such queries are being investigated in the MapReduce context rather than within a traditional DBMS environment. Analytical query processing in MapReduce has attracted a lot of interest, and the relevant work has investigated several issues, including indexing, data placement and layouts, optimizations, iterative processing, fair load allocation and interactive processing, to name some of them [5]. In this work, we focus on improving the efficiency of join queries executed in MapReduce, for which several proposals already exist [7, 2, 9]. More specifically, we target binary theta-joins, where the join condition between two datasets is arbitrarily complex rather than a simple equation.

Nevertheless, most of the proposals to date tend to be developed on a best-effort basis, without systematically analyzing the inherent trade-offs. Two recent remedies to that have been proposed in [1, 8]. The work in [8] introduces the notion of minimal MapReduce algorithms, which are algorithms accompanied by guarantees (up to a small constant) regarding several aspects, such as memory consumption and communication cost. The number of MapReduce rounds may be bounded, but it can be more than one. The work in [1] is complementary and presents a way to compute the lower bounds on communication cost as a function of the maximum input a reducer is allowed to receive for specific problems. This allows one to define the trade-off between the load on the reducer side and the replication rate. The replication rate is defined as the average ratio of output to input key-value pairs on the map side, and is used as a metric of the communication cost. Further, the work in [1] examines whether known algorithms for those problems can match the lower bounds, provided that they consist of a single MapReduce round.

The algorithms 1-Bucket-Theta and M-Bucket in [7] form the basis of our work. Our first contribution is that we analyze the lower bounds for the binary theta-join problem and we show that the worst-case behaviour of 1-Bucket-Theta matches those bounds. However, such behaviour is expected only when the join selectivity is high. For low selectivities, and with the help of histograms, the more efficient M-Bucket-I and M-Bucket-O algorithms are presented in [7], which aim at minimizing the maximum reducer input and output, respectively. Our second contribution is that we enhance those algorithms through the clustering of histogram buckets. In that way, we can achieve more efficient partitioning of histogram buckets to reducers. The efficiency is measured in terms of the replication rate, the maximum reducer input, and the imbalance across reducers. We show that we can improve the replication rate (i.e., reduce the communication cost) and the maximum reducer input (i.e., reduce the longest running time and the space requirements of reducers) with insignificant impact on load imbalance.

The remainder of this extended abstract is structured as follows. In Sec. 2 we briefly present the 1-Bucket-Theta and M-Bucket algorithms, which we analyze in Sec. 3 and enhance in Sec. 4, respectively. In Sec. 5, we conclude and describe next steps.

2. BACKGROUND

Figure 1: Partitioning the JM in 1-Bucket-Theta (left) and M-Bucket (right).

In [7], the problem of performing binary theta-joins $S \bowtie_\theta T$ on MapReduce is studied. The core of the approach lies in how the workload is partitioned across reducers. To represent the workload, a join matrix (JM) is used. In JMs, each cell corresponds to a pair of tuples, one from each input dataset, to be processed. The JM is split into several regions, where each region is mapped to a reducer. For each region, we can compute the number of tuples that belong to it, which is the input cost of that region and is directly related to the computation and memory load of the associated reducer. For perfect load balancing, we want these regions to have equal input cost. To accomplish this objective, two main algorithms are presented: 1-Bucket-Theta and M-Bucket-I (and its variation M-Bucket-O).
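To make the JM and the notion of input cost concrete, the following minimal Python sketch builds a tuple-level JM for two tiny relations and evaluates one rectangular region. The relation contents, the band-style predicate and the helper names (`join_matrix`, `input_cost`, `region_output`) are illustrative assumptions and are not taken from [7].

```python
def join_matrix(s_tuples, t_tuples, theta):
    """Return the JM as a list of rows; cell [i][j] is True when the pair
    (s_tuples[i], t_tuples[j]) satisfies the join predicate theta."""
    return [[theta(s, t) for t in t_tuples] for s in s_tuples]


def input_cost(rows, cols):
    """Input cost of a rectangular region: the reducer that owns it must
    receive every S tuple in `rows` and every T tuple in `cols`."""
    return len(rows) + len(cols)


def region_output(jm, rows, cols):
    """Number of result pairs produced inside the region."""
    return sum(jm[i][j] for i in rows for j in cols)


# Hypothetical inputs, chosen only for illustration.
S = [1, 3, 5, 7]
T = [2, 4, 6, 8]
jm = join_matrix(S, T, lambda s, t: abs(s - t) <= 1)

rows, cols = range(0, 2), range(0, 4)   # upper half of the JM
cost = input_cost(rows, cols)           # 2 S tuples + 4 T tuples
output = region_output(jm, rows, cols)  # result pairs in that region
```

The input cost depends only on the shape of the region, whereas the output depends on which cells satisfy the predicate; this is the distinction that separates input-oriented from output-oriented partitioning.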

2.1 1-Bucket-Theta

1-Bucket-Theta is the most generic algorithm, since it examines all tuple pairs (as in the Cartesian product) and requires minimal statistical information, namely just the cardinalities of the input. The strong point of the algorithm is the principled way in which it partitions the JM, so that all JM cells are covered and, at the same time, the maximum reducer input is minimized. The algorithm is shown to be more suitable for high join selectivities (e.g., above 50%). Fig. 1(left) shows an example partitioning across 3 reducers, where there are 6 tuples from each of S and T, and the input cost of each reducer is 7 (4 tuples from S and 3 from T), 7 (4 from S and 3 from T) and 8 (2 from S and 6 from T), respectively.
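The costs quoted for the example can be reproduced with a short Python sketch. The exact position of the three regions in the figure is not recoverable from the text, so the layout below is an assumption that is merely consistent with the stated tuple counts; `covers_all_cells` is a hypothetical helper.

```python
def covers_all_cells(regions, n_rows, n_cols):
    """Check that every JM cell belongs to exactly one region."""
    seen = set()
    for rows, cols in regions:
        for i in rows:
            for j in cols:
                if (i, j) in seen:
                    return False  # overlapping regions
                seen.add((i, j))
    return len(seen) == n_rows * n_cols


# Assumed layout for a 6 x 6 JM split across 3 reducers.
regions = [
    (range(0, 4), range(0, 3)),  # 4 S tuples, 3 T tuples
    (range(0, 4), range(3, 6)),  # 4 S tuples, 3 T tuples
    (range(4, 6), range(0, 6)),  # 2 S tuples, 6 T tuples
]

costs = [len(rows) + len(cols) for rows, cols in regions]
max_reducer_input = max(costs)
complete = covers_all_cells(regions, 6, 6)
```

Under this layout the three input costs are 7, 7 and 8, every one of the 36 cells is assigned to exactly one reducer, and the maximum reducer input is 8.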

2.2 M-Bucket-I

In cases where histograms are available, so that we can safely reason as to whether a specific combination of tuples can satisfy the join condition, and the join selectivity is small, M-Bucket-I outperforms 1-Bucket-Theta. The histograms are equi-depth ones and are produced in a separate MapReduce phase, as explained in [7]. Then, the JM is constructed, where each cell corresponds to a pair of histogram buckets rather than a pair of tuples. As such, the size of a JM need not grow as the size of the input data increases, at the expense of histogram buckets of higher depth. From the JM and the join condition, it is straightforward to identify pairs that do not contribute to the result (depicted as white cells in Fig. 1(right)). During the partitioning step, a heuristic method is followed, which is not accompanied by guarantees as in 1-Bucket-Theta but yields better results, since it benefits from the fact that most of the JM cells are not valid candidate pairs.
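As a rough illustration of how candidate cells can be identified from bucket boundaries, the sketch below marks a bucket pair as a candidate when the two value ranges can possibly satisfy a band condition $|s - t| \leq \varepsilon$. The bucket boundaries, the choice of a band condition and the function names are assumptions made for this example; [7] should be consulted for the actual procedure.

```python
def may_join(s_bucket, t_bucket, eps):
    """True when some s in s_bucket and some t in t_bucket can satisfy
    |s - t| <= eps. Buckets are closed intervals (low, high)."""
    s_lo, s_hi = s_bucket
    t_lo, t_hi = t_bucket
    # Smallest possible distance between a point of each interval.
    gap = max(s_lo - t_hi, t_lo - s_hi, 0)
    return gap <= eps


def bucket_jm(s_buckets, t_buckets, eps):
    """Bucket-level JM: one cell per pair of histogram buckets."""
    return [[may_join(sb, tb, eps) for tb in t_buckets] for sb in s_buckets]


# Hypothetical bucket boundaries for both inputs.
s_buckets = [(0, 9), (10, 19), (20, 29)]
t_buckets = [(0, 9), (10, 19), (20, 29)]
jm = bucket_jm(s_buckets, t_buckets, eps=2)
```

Here the two corner cells, which pair the lowest bucket with the highest one, are ruled out, so no reducer has to process them. The JM stays 3 × 3 regardless of how many tuples fall into each bucket.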

The difference between M-Bucket-I and M-Bucket-O is that the former targets the minimization of the maximum reducer input, whereas the latter targets the minimization of the maximum reducer output. Note that estimating the reducer output based on histograms is prone to significant errors, even when the histograms are accurate.

3. ON THE OPTIMALITY OF 1-BUCKET-THETA

First we define the lower bound on the communication cost of any 1-round MapReduce algorithm for binary theta-joins. As already mentioned, the communication cost is measured using the replication rate metric. Let us examine the steps of the short version of the generic recipe for deriving such bounds from [1]. Given two relations $S$ and $T$, with sizes $|S|$ and $|T|$, respectively, we have:

• Size (number) of inputs and outputs:

– Inputs: $|I| = |S| + |T|$

– Outputs: $|O| = |S||T|$ (accounting for the worst case, which is the Cartesian product)

• Deriving $g(q)$: The upper bound on the number of outputs a reducer can produce given $q$ inputs, denoted as $g(q)$, is attained when $q$ is equally divided between $S$ and $T$, i.e., $q/2$ tuples from $S$ and $q/2$ tuples from $T$. The theta-join of these two sets yields the most results when it is a Cartesian product, thus $g(q) = q^2/4$.

• Replication rate $r(q)$: The quantity $g(q)/q$ equals $\frac{q^2/4}{q} = \frac{q}{4}$, which is monotonically increasing in $q$. Therefore, the replication rate can be bounded using the formula $r(q) \geq \frac{q\,|O|}{g(q)\,|I|} = \frac{4|S||T|}{q(|S|+|T|)}$. So, the lower bound on $r$, $r_{lb}$, is $\frac{4|S||T|}{q(|S|+|T|)}$.

The above formula illustrates the exact trade-off between parallelism and communication cost in binary theta-joins. By increasing the degree of parallelism in order to decrease the input $q$ each reducer receives, the communication cost increases, since, for the lower bound, $q$ and $r$ are inversely proportional to each other.
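The inverse proportionality can be checked numerically. The following Python sketch evaluates the lower bound $4|S||T| / (q(|S|+|T|))$ for a few reducer input sizes; the relation sizes and the values of $q$ are arbitrary numbers chosen for illustration.

```python
def replication_lower_bound(q, size_s, size_t):
    """Lower bound on the replication rate, 4|S||T| / (q (|S| + |T|)),
    which follows from g(q) = q^2 / 4."""
    return 4 * size_s * size_t / (q * (size_s + size_t))


# Hypothetical relation sizes.
size_s, size_t = 1000, 1000

bounds = {q: replication_lower_bound(q, size_s, size_t) for q in (100, 200, 400)}
```

Doubling the reducer input $q$ halves the bound: with these sizes the values are 20, 10 and 5 for $q$ equal to 100, 200 and 400, respectively.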

The next step is to find the upper bound on the replication rate of 1-Bucket-Theta. In [7], three partitioning cases are presented, based on the sizes $|S|$ and $|T|$ and the number of available reducer processors $p$. Due to limited space, we examine only the first case in detail.

The first case corresponds to the scenario where the JM can be exactly covered by $c_S \times c_T$ squares of side-length $\sqrt{|S||T|/p}$. This means that the following conditions hold: $|S| = c_S\sqrt{|S||T|/p}$ and $|T| = c_T\sqrt{|S||T|/p}$, where $c_S, c_T$ are positive integers. For example, if $p = 4$, then the JM in Fig. 1(left) can be exactly covered by 4 squares of side-length 3.

Then we have:

• Replication rate of 1-Bucket-Theta ($r_{1BT}$): $r_{1BT} \leq \frac{|S|c_T + |T|c_S}{|S|+|T|} = \frac{\frac{|S||T|}{\sqrt{|S||T|/p}} + \frac{|T||S|}{\sqrt{|S||T|/p}}}{|S|+|T|} = \frac{2|S||T|}{\sqrt{|S||T|/p}\,(|S|+|T|)}$

• Reducer input: $q_{1BT} = 2\sqrt{|S||T|/p}$

• Combining $r_{1BT}$ and $q_{1BT}$: $r_{1BT}\,q_{1BT} \leq \frac{2|S||T|}{\sqrt{|S||T|/p}\,(|S|+|T|)} \cdot 2\sqrt{|S||T|/p} = \frac{4|S||T|}{|S|+|T|}$, which implies that $r_{1BT}(q_{1BT}) \leq r_{lb}$

So, the upper bound of the first case of 1-Bucket-Theta is at most as high as the lower bound of the problem, which means that, for that case, the algorithm is optimal.
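As a sanity check, the first case can be evaluated for the small example mentioned above (6 tuples in each relation and $p = 4$). The Python sketch below plugs these numbers into the case-1 expressions and compares the result with the lower bound; the function name is a hypothetical helper.

```python
import math


def one_bucket_theta_case1(size_s, size_t, p):
    """Replication-rate bound and reducer input for case 1, where the JM is
    covered exactly by squares of side sqrt(|S||T|/p)."""
    side = math.sqrt(size_s * size_t / p)
    c_s = size_s / side  # number of squares along the S dimension
    c_t = size_t / side  # number of squares along the T dimension
    r = (size_s * c_t + size_t * c_s) / (size_s + size_t)
    q = 2 * side
    return r, q


size_s, size_t, p = 6, 6, 4
r_1bt, q_1bt = one_bucket_theta_case1(size_s, size_t, p)
r_lb = 4 * size_s * size_t / (q_1bt * (size_s + size_t))
```

For this instance the squares have side 3, each reducer receives 6 tuples, and both the case-1 bound and the lower bound evaluate to a replication rate of 2.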

Following the same reasoning, the other two cases (Theorems 2 and 3 in [7], respectively), which correspond to different formulas for $c_S$ and $c_T$, can be examined, for which we have:

• Case 2: $r_{1BT} \leq \frac{4|T||S|}{q(|S|+|T|)} = r_{lb}$

• Case 3: $r_{1BT} \leq \frac{8|S||T|}{q(|S|+|T|)} = 2r_{lb}$

Overall, the upper bound of the replication rate is at most two times the lower bound, and as such is optimal up to a constant factor. In [3], it is shown that the lower bound can be met for self-joins, which are a special case of binary joins.

4. REDUCING THE REPLICATION RATE IN M-BUCKET

The partitioner of the M-Bucket-I algorithm operates on a join matrix (JM), where each cell corresponds to a pair of histogram buckets. It tries to fit the cells in rectangular regions; each region is associated with a single reducer. The rationale of our approach is to permute the JM's rows and columns, in order to improve the quality of the partitioning phase.

The problem of cell rearrangement can be addressed with several algorithm families, such as clustering (e.g., hierarchical, array-based, and so on), combinatorial optimization (e.g., bin packing, knapsack) and bandwidth reduction. Here, we examine the impact of array-based clustering algorithms and, more specifically, we employ the Bond Energy clustering algorithm (BEA) [6], due to its efficiency [4]. The purpose of BEA is to identify natural clusters that occur in complex data arrays, such as JMs. This task is accomplished by permuting the rows and columns of the JM in a way that the numerically larger array elements are clustered together. As the JM of our interest comprises a two-dimensional bitmap array, i.e., the cell values are either 0 or 1 to indicate whether the processing of the corresponding pairs is meaningful or not, we expect all the non-zero values to be grouped as close as possible. The intuition is that, if the JM contains more empty sub-matrices, the mapping of the remaining sub-matrices to reducers will improve.
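The idea of permuting rows and columns so that non-zero cells end up next to each other can be illustrated with the Python sketch below. It is a simplified greedy procedure in the spirit of BEA, not the algorithm of [6] in full and not necessarily the implementation used in this work: columns (and then rows) are inserted one at a time, in their original order, at the position that adds the most bond energy. All function names are assumptions.

```python
def bond(u, v):
    """Bond between two vectors: sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def greedy_order(vectors):
    """Insert vectors one at a time at the position that adds the most
    bond energy with the new neighbours. Returns the chosen index order."""
    order = [0]
    for j in range(1, len(vectors)):
        best_pos, best_gain = 0, None
        for pos in range(len(order) + 1):
            left = vectors[order[pos - 1]] if pos > 0 else None
            right = vectors[order[pos]] if pos < len(order) else None
            gain = 0
            if left is not None:
                gain += bond(left, vectors[j])
            if right is not None:
                gain += bond(vectors[j], right)
            if left is not None and right is not None:
                gain -= bond(left, right)  # this adjacency is broken up
            if best_gain is None or gain > best_gain:
                best_pos, best_gain = pos, gain
        order.insert(best_pos, j)
    return order


def rearrange(jm):
    """Permute the columns, then the rows, of a 0/1 JM."""
    columns = [list(col) for col in zip(*jm)]
    col_order = greedy_order(columns)
    jm = [[row[j] for j in col_order] for row in jm]
    row_order = greedy_order(jm)
    return [jm[i] for i in row_order]


def energy(jm):
    """Sum of products over horizontally and vertically adjacent cells."""
    horizontal = sum(r[j] * r[j + 1] for r in jm for j in range(len(r) - 1))
    vertical = sum(bond(jm[i], jm[i + 1]) for i in range(len(jm) - 1))
    return horizontal + vertical


# A checkerboard-like JM in which no two non-zero cells are adjacent.
jm = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
clustered = rearrange(jm)
```

On this input the rearranged JM consists of two dense 2 × 2 blocks on the diagonal and two empty 2 × 2 blocks, so the adjacency energy rises from 0 to 8. This greedy variant gives no guarantee of reaching the best permutation in general.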

Our work adds a preliminary analysis step to the M-Bucket-I/O algorithm, just after the histograms are built and the initial JM is produced. It thus takes place before the actual execution on a MapReduce platform. The quality of a JM is assessed with the help of the following three metrics:

1. replication rate (rep), defined as in the Introduction and [1];

2. maximum reducer input (mri); and

3. input imbalance (imb), defined as the ratio of mri to the average reducer input, considering only the non-idle reducers.

Note that the metrics above can be accurately computed from the JM, without requiring the real execution to be completed. Thus, if the JM rearrangement is not considered beneficial, the execution can switch back to the original JM. That is, it is straightforward to add a post-processing phase, in order to guarantee that we choose the best partitioning between the one based on the original and the one based on the rearranged JM. Consequently, our proposal never leads to performance degradation; in fact, it can lead to significant improvements according to our experiments.
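A minimal Python sketch of the three metrics and of the fallback to the original JM is given below. It assumes that the per-reducer input sizes have already been derived from a partitioning; the input numbers are hypothetical, and the selection rule (keep the rearranged JM only when rep strictly decreases) is an assumption, since the text does not fix the exact criterion.

```python
def partition_metrics(reducer_inputs, total_input):
    """reducer_inputs: tuples sent to each reducer (0 for idle reducers).
    total_input: |S| + |T|."""
    busy = [x for x in reducer_inputs if x > 0]  # ignore idle reducers
    rep = sum(busy) / total_input                # replication rate
    mri = max(busy)                              # maximum reducer input
    imb = mri / (sum(busy) / len(busy))          # input imbalance
    return {"rep": rep, "mri": mri, "imb": imb}


def pick_partitioning(original, rearranged):
    """Assumed rule: fall back to the original JM unless the rearranged
    one strictly lowers the replication rate."""
    return rearranged if rearranged["rep"] < original["rep"] else original


# Hypothetical reducer inputs for 4 reducers and |S| + |T| = 12.
original = partition_metrics([7, 7, 8, 0], total_input=12)
rearranged = partition_metrics([6, 6, 6, 0], total_input=12)
chosen = pick_partitioning(original, rearranged)
```

Because both sets of metrics are available before any MapReduce job is launched, the comparison costs nothing at execution time.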

Figure 2: Example JMs before (left) and after (right) applying BEA. [Six panels in three rows; the axes S and T each range from 0 to 100.]

As an example, we extracted a sample of 64M tuples from the Cloud dataset described in http://cdiac.ornl.gov/ftp/ndp026c/ndp026c.pdf. Fig. 2(top) shows the initial and rearranged JM for a self-join query that retrieves record pairs for which the absolute difference of the sea level is between 0 and 2, or between 22 and 24, or between 50 and 52, or between 80 and 82, as an example of a complex range query. The rearranged JM yields 21% lower rep and 19% lower mri at the expense of 4% higher imb. Next, we proceed to more systematic experiments on synthetic data.

4.1 Experimental Evaluation

We focus on band joins, a type of theta-join that can significantly benefit from M-Bucket. In band joins, the condition has the form $R.A - \varepsilon \leq S.A \leq R.A + \varepsilon$.

The experimental setup is as follows. We randomly generate synthetic JMs so that the produced JMs vary in the following aspects: join selectivity, number of band conditions, and size of JMs. Then, we compute the statistics of the resulting partitioning to reducers both when we cluster the JM and when we do not. In the first experiment, we assume that the dimensions of the JM are 100×100. We vary the number of available reducers from 10 to 40. Also, the numbers of band conditions examined are 1, 3 and 5. For each band condition, we examined selectivity values of 1%, 5% and 10%.

Fig. 2 shows two more examples of JM rearrangement. From the left column of the middle and bottom rows, we can see the typical form of the original synthetic JMs. For each band condition, there is a diagonal stripe of cells for which the join condition holds. The gaps between such stripes are randomly shifted, so that the JMs are not symmetric; for each condition the selectivity is set to 1%. As we can observe, the effect of the BEA algorithm looks very different in the two examples, but in both cases there were significant improvements, which we discuss below.
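The shape of such synthetic JMs can be sketched in Python as follows. The generator used in the experiments is not described in detail, so this is only an approximation under stated assumptions: each band condition is modelled as a diagonal stripe at a given offset from the main diagonal, and the offsets are passed explicitly instead of being drawn at random. The names `band_jm` and `selectivity` are hypothetical.

```python
def band_jm(n, offsets, half_width=0):
    """n x n 0/1 JM with one diagonal stripe per band condition.
    Cell (i, j) is set when j - i lies within half_width of some offset."""
    return [
        [int(any(abs((j - i) - d) <= half_width for d in offsets))
         for j in range(n)]
        for i in range(n)
    ]


def selectivity(jm):
    """Fraction of JM cells that are candidate pairs."""
    cells = sum(len(row) for row in jm)
    return sum(map(sum, jm)) / cells


single = band_jm(100, offsets=[0])            # one band on the main diagonal
triple = band_jm(100, offsets=[-30, 0, 25])   # three shifted bands
```

A single stripe on the main diagonal of a 100×100 JM sets 100 of the 10,000 cells, i.e., a selectivity of 1%. Stripes that are shifted away from the main diagonal are shorter, so in this simplified model their selectivity is slightly below 1%.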

                             rep     mri     imb     coverage
Overall                      0.846   0.880   1.029   59.26%
Band selectivity
  1%                         0.717   0.735   1.028   66.67%
  5%                         0.920   0.949   1.014   66.67%
  10%                        0.928   0.996   1.056   44.45%
Number of band conditions
  1                          0.987   0.967   0.964   33.34%
  3                          0.821   0.835   1.010   44.45%
  5                          0.810   0.873   1.058   100%

Table 1: Average ratio of the BEA-produced JM metrics to the original JM metrics.

                             rep     mri     imb
Overall                      0.634   0.649   1.023
Band selectivity
  1%                         0.634   0.649   1.023
  5%                         0.833   0.875   1.050
  10%                        0.848   0.900   1.050
Number of band conditions
  1                          0.979   1       0.988
  3                          0.737   0.733   0.995
  5                          0.634   0.649   1.023

Table 2: Ratio of the BEA-produced JM metrics to the original JM metrics for the maximum rep drop observed.

The average impact of BEA on the metrics examined is summarized in Table 1. The rightmost column of the table shows the percentage of cases in which the rearranged JM led to improvements in the replication rate. Table 2 refers to the maximum improvements observed regarding replication. From these two tables, we can draw the following conclusions. On average, our proposal improves the partitioning in approximately 59% of the cases. In those cases, the average decrease in the replication rate is 15%, but it can reach 37%. The improvements become more significant as the number of band conditions increases and the selectivity becomes lower. On average, when the band selectivity is 1%, the replication rate drops by 28%, while the maximum reducer input decreases by 26%. There is a slight increase in the relative imbalance, though. Similarly, we can observe that, when the number of band conditions is 5, there are improvements in all the cases examined.
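The percentages quoted in the text follow directly from the ratios in the tables: a ratio $x$ of the BEA-produced metric to the original one corresponds to a reduction of $(1 - x) \cdot 100\%$. The short Python sketch below performs this conversion for a few rep entries; the dictionary keys are labels chosen for this example.

```python
def percent_drop(ratio):
    """Convert a BEA-to-original metric ratio into a percentage reduction."""
    return round((1 - ratio) * 100, 1)


# rep ratios copied from Table 1.
table1_rep = {"overall": 0.846, "selectivity 1%": 0.717, "5 conditions": 0.810}
drops = {name: percent_drop(ratio) for name, ratio in table1_rep.items()}

# Overall rep ratio from Table 2 (maximum rep drop observed).
best_case = percent_drop(0.634)
```

The overall ratio of 0.846 corresponds to the roughly 15% average decrease, 0.717 to the 28% decrease at 1% selectivity, and 0.634 to the maximum decrease of about 37%.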

We also investigated the impact of the number of reducers, but this was not found to be significant. Finally, note that, in order to compute mri and imb, we considered only the cases where the replication rate is strictly lower than that with the original JMs. The average values of these two metrics are slightly different if all the measurements are considered.

We conducted an additional experiment, where we increased the dimensions of the JM to 1000×1000 and further decreased the minimum selectivity of each band condition to 0.1%. The main purpose was to verify our hypothesis that our proposal is more suitable for band joins with multiple conditions, each having a low selectivity. Indeed, in 100% of the cases examined when the selectivity was 0.1% and the number of band conditions was 3 and 5, there was a significant decrease in the replication rate (28.1% on average). The maximum reducer input was also decreased by the same amount, whereas the imbalance remained similar.

Overall, when the selectivity is low, there is more room for BEA to yield empty sub-matrices, whereas, when there are fewer band conditions, the differences from the original JMs are less significant.

5. CONCLUSIONS AND FURTHER WORK

We investigate the execution of binary theta-joins using MapReduce. First, we analyze the efficiency of the state of the art and, second, we propose the usage of a pre-processing clustering algorithm in order to help the partitioning of the map output to reducers. Our proposal was shown to yield significant reductions in the communication cost and the maximum input received by each reducer when the theta clause comprises several conditions, each of low selectivity.

A strong point of our approach is that it is not intrusive, in the sense that it can be easily incorporated into the current state-of-the-art proposal in [7], as a pre-processing phase before the actual execution on a MapReduce platform begins. In addition, it is straightforward to assess whether our approach is beneficial for a specific setting, and thus our proposal does not lead to overall performance degradation.

In the future, we plan to focus on more elaborate types of array rearrangement algorithms. Scalability is also an issue, since algorithms such as BEA do not scale to matrices with very large dimensions. Another avenue for further work is to investigate more sophisticated partitioning algorithms to be coupled with JM rearrangement. Harder problems include the investigation of provably optimal techniques for multiway theta-joins and efficient histogram construction when there are multiple attributes participating in the theta-join condition.

Acknowledgments. This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) - Research Funding Program: Thales. Investing in knowledge society through the European Social Fund.

6. REFERENCES

[1] F. N. Afrati, A. D. Sarma, S. Salihoglu, and J. D. Ullman. Upper and lower bounds on the cost of a map-reduce computation. PVLDB, 6(4):277–288, 2013.

[2] F. N. Afrati and J. D. Ullman. Optimizing multiway joins in a map-reduce environment. IEEE Trans. Knowl. Data Eng., 23(9):1282–1298, 2011.

[3] F. N. Afrati and J. D. Ullman. Matching bounds for the all-pairs mapreduce problem. In IDEAS, pages 3–4, 2013.

[4] S. Climer and W. Zhang. Rearrangement clustering: Pitfalls, remedies, and applications. Journal of Machine Learning Research, 7:919–943, 2006.

[5] C. Doulkeridis and K. Nørvåg. A survey of large-scale analytical query processing in mapreduce. The VLDB Journal, pages 1–26, 2013.

[6] W. T. McCormick, P. J. Schweitzer, and T. W. White. Problem decomposition and data reorganization by a clustering technique. Operations Research, 20(5):993–1009, 1972.

[7] A. Okcan and M. Riedewald. Processing theta-joins using mapreduce. In SIGMOD, pages 949–960, 2011.

[8] Y. Tao, W. Lin, and X. Xiao. Minimal mapreduce algorithms. In SIGMOD, pages 529–540, 2013.

[9] X. Zhang, L. Chen, and M. Wang. Efficient multi-way theta-join processing using mapreduce. PVLDB, 5(11):1184–1195, 2012.

On the design space of MapReduce ROLLUP aggregates

Duy-Hung Phan

EURECOM

Matteo Dell’Amico

EURECOM

Pietro Michiardi

EURECOM

ABSTRACT

We define and explore the design space of efficient algorithms to compute ROLLUP aggregates, using the MapReduce programming paradigm. Using a modeling approach, we explain the non-trivial trade-off between parallelism and communication costs that is inherent to a MapReduce implementation of ROLLUP. Furthermore, we design a new family of algorithms that, through a single parameter, make it possible to find a “sweet spot” in the parallelism vs. communication cost trade-off. We complement our work with an experimental approach, wherein we overcome some limitations of the model we use. Our results indicate that efficient ROLLUP aggregates require striking the right balance between parallelism and communication for both one-round and chained algorithms.

1. INTRODUCTION

Online analytical processing (OLAP) is a fundamental approach to study multi-dimensional data involving the computation of, for example, aggregates on data that are accumulated in traditional data warehouses. When operating on massive amounts of data, it is typical for business intelligence and reporting applications to require data summarization, which is achieved using standard SQL operators such as GROUP BY, ROLLUP, CUBE, and GROUPING SETS.

Despite the tremendous amount of work carried out in the database community to come up with efficient ways of computing data aggregates, little work has been done to extend these lines of work to cope with massive scale. Indeed, the main focus of prior works in this domain has been on single server systems or small clusters executing a distributed database, providing efficient implementations of CUBE and ROLLUP operators, in line with the expectations of low-latency access to data summaries [6, 8, 11, 13, 14, 19]. Only recently has the community devoted attention to solving the problem of computing data aggregates at massive scales using data-intensive, scalable computing engines such as MapReduce [10]. In support of the growing interest in computing data aggregates on batch-oriented systems, several high-level languages built on top of MapReduce, such as PIG [3] and HIVE [2], support simple implementations of, for example, the ROLLUP operator.

The endeavor of this work is to take a systematic approach to study the design space of the ROLLUP operator: besides being widely used on its own, ROLLUP is also a fundamental building block used to compute CUBE and GROUPING SETS [7]. We study the problem of defining the design space of algorithms to implement ROLLUP through the lens of a recent model of MapReduce-like systems [4]. The model explains the trade-offs that exist between the degree of parallelism that is possible to achieve and the communication costs that are inherently present when using the MapReduce programming model. In addition, we overcome current limitations of the model we use (which glosses over important aspects of MapReduce computations) by extending our analysis with an experimental approach. We present instances of algorithmic variants of the ROLLUP operator that cover several points in the design space, and we implement and evaluate them using a Hadoop cluster.
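For readers less familiar with the operator, the following single-machine Python sketch shows what ROLLUP computes according to its standard SQL semantics: one aggregate for every prefix of the grouping dimensions, down to the grand total. It is not one of the MapReduce algorithms studied in this work; the sample records, the field names and the use of SUM as the aggregate are assumptions made for illustration.

```python
from collections import defaultdict


def rollup(records, dims, measure):
    """Sum `measure` for every prefix of `dims`, from the full grouping
    down to the empty grouping (the grand total)."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        for k in range(len(dims), -1, -1):
            totals[key[:k]] += rec[measure]
    return dict(totals)


# Hypothetical input records.
sales = [
    {"year": 2013, "month": 1, "amount": 10},
    {"year": 2013, "month": 2, "amount": 5},
    {"year": 2014, "month": 1, "amount": 7},
]
result = rollup(sales, ["year", "month"], "amount")
```

With two dimensions, every input record contributes to three groups: its (year, month) group, its year group and the grand total, which is keyed by the empty tuple.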

In summary, our contributions are the following:

• We study the design space that exists to implement ROLLUP and show that, while it may appear deceivingly simple, it is not a straightforward embarrassingly parallel problem. We use modeling to obtain bounds on parallelism and communication costs.

• We design and implement new ROLLUP algorithms that can match the bounds we derived, and that sweep the design space we were able to define.

• We pinpoint the essential role of combiners (an optimization allowing pre-aggregation of data, which is available in real instances of the MapReduce paradigm, such as Hadoop [1]) for the practical relevance of some algorithm instances, and proceed with an experimental evaluation of several variants of ROLLUP implementations, both in terms of their performance (runtime) and their efficient use of cluster resources (total amount of work).

• Finally, our ROLLUP implementations exist in Java MapReduce and have been integrated in our experimental branch of PIG; both are available in a public repository.¹

¹ https://bitbucket.org/bigfootproject/rollupmr
