• Aucun résultat trouvé

Parallelizing Algorithms in MapReduce

N/A
N/A
Protected

Academic year: 2022

Partager "Parallelizing Algorithms in MapReduce"

Copied!
17
0
0

Texte intégral

(1)

Parallelizing Algorithms in MapReduce

Mauro Sozio

[email protected]

(2)

Matrix Multiplication

• Input: A,B with n x K and K x m elem. resp.

• Output: C = A x B, where . i

j C

ij =

x

A B

(3)

Matrix Multiplication in MapReduce

• Main problem: A and/or B might not fit into one single machine’s main memory.

• Solution: split A and B into small blocks, so

product can be computed in main memory.

(4)

Splitting Matrices into Submatrices

X

X

A B

A11 A12

A21 A22

B11 B12

B12 B22

A B

where A,B,C are submatrices.

=

C

C

C11 C12

=

C21 C22

(5)

Block Matrices

Multiplication

(6)

MapReduce Jobs

Two MapReduce jobs.

First job. One reducer Rijk for each i,j,k computing Aik x Bkj. Mappers must route a copy of each Aik

and a copy of each Bkj to all reducers Rijk.

Second job computes the sum.

(7)

First MapReduce Job

A11 A12

A21 A22

A

B11 B12

B12 B22

B

R

111

R

112

R

121

R

122

R

211

R

212

R

221

R

222

M

1

M

2

A

11

x B

11

Map

A

12

x B

21

Reduce

Reducer Rijk

comp. Aik x Bkj

(8)

Second MapReduce Job

R

11

R

12

R

21

R

22

M

1

M

2

A

11

x B

11

A

12

x B

21

Map Reduce

A

11

x B

12

A

22

x B

22

C11 = A11 x B11 + A12 x B21 Reducer Rij

computes Cij

(9)

MapReduce Jobs

Two MapReduce jobs.

First job. One reducer Rijk computing Aik x Bkj, for each i,j,k. Mappers must route a copy of each Aik

and a copy of each Bkj to all reducers Rijk.

Second job computes the sum.

+ There are many reducers -> good parallelization.

- It requires a large amount of network traffic.

(10)

Minimum Spanning Tree

Definition: Given a graph G=(V,E) and a cost

function c : E -> R+, we seek to find a subset T of E such that:

T is connected.

T does not contain cycles.

T has minimum total cost.

There is a simple algorithm (greedy) computing the optimum.

(11)

Minimum Spanning Tree

(12)

Minimum Spanning Tree

(13)

Greedy algo for MST

Order the edges e1,..,em by non-decreasing cost.

At each step t include et, if there is no cycle.

Iterate until all edges have been processed.

Can we parallelize it efficiently in MapReduce?

(14)

Greedy2

• At each step, find a cycle C and remove an edge with maximum cost in C.

• Iterate until there is no cycle.

• Much easier to parallelize.

(15)

MRGreedy

Assumption: main memory is at least 4n bytes (n=

number of nodes).

Partition edges randomly.

Each reducer executes Greedy2 on “its” set of edges.

At each iteration, we halve the number of

machines and iterate, until only one reducer is left.

(16)

MST in MapReduce

1

2

3

(17)

MRGreedy

Assumption: main memory is at least 8n bytes (n= number of nodes).

Partition edges randomly.

Each reducer executes Greedy2 on “its” set of edges.

At each iteration, we halve the number of reducers and iterate, until only one reducer is left.

Questions:

do we compute a minimum spanning tree?

Is the total main memory sufficient?

how many iterations do we need?

Références

Documents relatifs

public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { String s = value.toString();. String[] tab

Can an infinite left-product of nonnegative matrices be expressed in terms of infinite left-products of stochastic ones?... BE EXPRESSED IN TERMS OF INFINITE LEFT-PRODUCTS OF

The Forum was organized by Memorial University of Newfoundland, specifically by the Leslie Harris Centre of Regional Policy and Development (based on the St.

Furthermore, as shown by Exam- ple 1, in order to determine this differential Hilbert function it may be necessary to compute derivatives of the original equations up to order ν,

The change-points appearing in the different rows are claimed to be change-points for the two-dimensional data either if they appear at least in one row the associated ROC curve for

The results show that, using ideal trackers, two of the methods that bin the list-mode data directly in the image space (DR and BTF) offer a better spatial resolution at the borders

The model proposed in this paper describes a delivery system based on the use of small autonomous electrical vehicles for delivering goods in inner city centers collaborating with

In this section, we aim to prove the US property for the quasi-inverse (Id − H M,s ) −1 of the constrained transfer operator with uniform bounds with re- spect to M.. We have