Matrix Multiplication
• Input: matrices A and B with n x K and K x m elements, respectively.
• Output: C = A x B, where $C_{ij} = \sum_{k=1}^{K} A_{ik} B_{kj}$.
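As a point of reference, the definition can be written as a direct single-machine computation. This is only a minimal sketch in plain Python; the function name `matmul` and the example matrices are illustrative and not part of the slides.

```python
def matmul(A, B):
    """Multiply an n x K matrix A by a K x m matrix B (both lists of rows)."""
    n, K, m = len(A), len(B), len(B[0])
    assert all(len(row) == K for row in A), "inner dimensions must agree"
    # C[i][j] is the sum over k of A[i][k] * B[k][j].
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(m)]
            for i in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]      # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]       # 3 x 2
C = matmul(A, B)     # 2 x 2
```

This version needs all of A and B in one machine's memory, which is exactly the limitation the MapReduce formulation tries to avoid.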
Matrix Multiplication in MapReduce
• Main problem: A and/or B might not fit into a single machine’s main memory.
• Solution: split A and B into small blocks, so that each block
product can be computed in main memory.
Splitting Matrices into Submatrices
[ A11 A12 ]   [ B11 B12 ]   [ C11 C12 ]
[ A21 A22 ] x [ B21 B22 ] = [ C21 C22 ]
where the Aik, Bkj and Cij are submatrices (block matrix multiplication).
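The block identity can be checked on a small example. The sketch below assumes a 2 x 2 grid of blocks and plain Python lists; the helper names `split`, `join` and `block_matmul` are hypothetical and chosen only for illustration.

```python
def matmul(A, B):
    # Plain product: rows of A against columns of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def add(X, Y):
    # Entry-wise sum of two matrices of equal shape.
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M, r, c):
    """Split M into a 2 x 2 grid of blocks, cutting after row r and column c."""
    top, bottom = M[:r], M[r:]
    return [[[row[:c] for row in top], [row[c:] for row in top]],
            [[row[:c] for row in bottom], [row[c:] for row in bottom]]]


def join(blocks):
    """Reassemble a 2 x 2 grid of blocks into one matrix."""
    return [left + right
            for block_row in blocks
            for left, right in zip(block_row[0], block_row[1])]


def block_matmul(A, B, r, s, c):
    # A is cut after row r and column s, B after row s and column c,
    # so that the inner dimensions of Aik and Bkj agree.
    Ab, Bb = split(A, r, s), split(B, s, c)
    Cb = [[add(matmul(Ab[i][0], Bb[0][j]), matmul(Ab[i][1], Bb[1][j]))
           for j in range(2)]
          for i in range(2)]
    return join(Cb)


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
B = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
same = block_matmul(A, B, 2, 2, 2) == matmul(A, B)
```

Each call to `matmul` inside `block_matmul` only touches two blocks, which is what lets the work be spread over machines with limited memory.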
MapReduce Jobs
• Two MapReduce jobs.
• First job: one reducer Rijk for each i,j,k, computing Aik x Bkj. Mappers must route a copy of each Aik to the reducers Rijk for all j, and a copy of each Bkj to the reducers Rijk for all i.
• Second job computes the sum.
First MapReduce Job
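[Figure: mappers M1 and M2 read the blocks A11, A12, A21, A22 of A and B11, B12, B21, B22 of B and route them to the eight reducers R111, R112, R121, R122, R211, R212, R221, R222. For example, R111 outputs A11 x B11 and R112 outputs A12 x B21.]
Reducer Rijk computes Aik x Bkj.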
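The routing in the first job can be simulated in memory. This is a sketch under stated assumptions, not a real MapReduce program: the record layout `(name, row, col, block)`, the 0-based block indices, and the functions `map_job1`, `reduce_job1` and `run_job1` are all hypothetical.

```python
from collections import defaultdict


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]


def map_job1(record, num_block_rows_a, num_block_cols_b):
    name, row, col, block = record
    if name == "A":
        i, k = row, col
        # Block Aik is needed by reducer Rijk for every j.
        for j in range(num_block_cols_b):
            yield (i, j, k), ("A", block)
    else:
        k, j = row, col
        # Block Bkj is needed by reducer Rijk for every i.
        for i in range(num_block_rows_a):
            yield (i, j, k), ("B", block)


def reduce_job1(key, values):
    blocks = dict(values)          # {"A": Aik, "B": Bkj}
    return key, matmul(blocks["A"], blocks["B"])


def run_job1(records, num_block_rows_a, num_block_cols_b):
    groups = defaultdict(list)     # stands in for the shuffle phase
    for record in records:
        for key, value in map_job1(record, num_block_rows_a, num_block_cols_b):
            groups[key].append(value)
    return dict(reduce_job1(key, values) for key, values in groups.items())


# A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]], cut into 1 x 1 blocks.
records = [("A", 0, 0, [[1]]), ("A", 0, 1, [[2]]),
           ("A", 1, 0, [[3]]), ("A", 1, 1, [[4]]),
           ("B", 0, 0, [[5]]), ("B", 0, 1, [[6]]),
           ("B", 1, 0, [[7]]), ("B", 1, 1, [[8]])]
partial = run_job1(records, 2, 2)  # keys are (i, j, k)
```

With a 2 x 2 grid of blocks every block is copied to two reducers, which is where the network traffic of this scheme comes from.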
Second MapReduce Job
[Figure: mappers M1 and M2 route the block products from the first job, such as A11 x B11 and A12 x B21, to the four reducers R11, R12, R21, R22.]
Example: C11 = A11 x B11 + A12 x B21.
Reducer Rij computes Cij.
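The second job can be simulated the same way. Again this is only an in-memory sketch: the input is assumed to be a dictionary from (i, j, k) to the block product Aik x Bkj with 0-based indices, and `map_job2`, `reduce_job2` and `run_job2` are hypothetical names.

```python
from collections import defaultdict


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def map_job2(key, product):
    i, j, _k = key
    # Drop k so that all products contributing to Cij meet at reducer Rij.
    yield (i, j), product


def reduce_job2(key, products):
    total = products[0]
    for product in products[1:]:
        total = add(total, product)
    return key, total


def run_job2(partial_products):
    groups = defaultdict(list)     # stands in for the shuffle phase
    for key, product in partial_products.items():
        for out_key, value in map_job2(key, product):
            groups[out_key].append(value)
    return dict(reduce_job2(key, values) for key, values in groups.items())


# Block products for A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]] (1 x 1 blocks).
partial = {(0, 0, 0): [[5]],  (0, 0, 1): [[14]],
           (0, 1, 0): [[6]],  (0, 1, 1): [[16]],
           (1, 0, 0): [[15]], (1, 0, 1): [[28]],
           (1, 1, 0): [[18]], (1, 1, 1): [[32]]}
c_blocks = run_job2(partial)       # keys are (i, j)
```

Here the mapper is almost the identity; its only role is to change the key from (i, j, k) to (i, j).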
MapReduce Jobs
• + There are many reducers -> good parallelization.
• - It requires a large amount of network traffic.
Minimum Spanning Tree
• Definition: Given a graph G=(V,E) and a cost function c : E -> R+, we seek a subset T of E such that:
  • T connects all nodes in V.
  • T does not contain cycles.
  • T has minimum total cost.
• A simple greedy algorithm computes the optimum.
Greedy algorithm for MST
• Order the edges e1,...,em by non-decreasing cost.
• At each step t, include et if it does not create a cycle.
• Iterate until all edges have been processed.
• Can we parallelize it efficiently in MapReduce?
Greedy2
• At each step, find a cycle C and remove an edge with maximum cost in C.
• Iterate until there is no cycle.
• Much easier to parallelize.
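The two greedy variants can be compared side by side. The sketch below makes several assumptions not stated in the slides: edges are distinct `(cost, u, v)` tuples, nodes are numbered 0..n-1, `greedy` uses a union-find structure for its cycle test, and `greedy2` looks for cycles by adding edges one at a time to a forest, which is just one possible way of applying the rule.

```python
def greedy(n, edges):
    """Greedy: scan edges by non-decreasing cost, keep those closing no cycle."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # endpoints not yet connected
            parent[ru] = rv
            tree.append((cost, u, v))
    return tree


def greedy2(edges):
    """Greedy2: whenever a cycle appears, drop a maximum-cost edge on it."""
    kept = set()                            # always a cycle-free edge set

    def path(src, dst):
        # Depth-first search for the src-dst path inside the forest `kept`.
        stack, seen = [(src, [])], {src}
        while stack:
            node, used = stack.pop()
            if node == dst:
                return used
            for e in kept:
                _, a, b = e
                if node in (a, b):
                    nxt = b if node == a else a
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append((nxt, used + [e]))
        return None                         # src and dst are not connected

    for e in edges:                         # the processing order is arbitrary
        _, u, v = e
        cycle = path(u, v)
        kept.add(e)
        if cycle is not None:               # e closes the cycle `cycle + [e]`
            kept.remove(max(cycle + [e]))
    return sorted(kept)


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
tree_a = greedy(4, edges)
tree_b = greedy2(edges)
```

Note that `greedy2` never sorts the edges: it only needs to see one cycle at a time, which is what makes it easier to run independently on arbitrary subsets of the edges.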
MRGreedy
• Assumption: main memory is at least 4n bytes (n = number of nodes).
• Partition edges randomly.
• Each reducer executes Greedy2 on “its” set of edges.
• After each round, halve the number of machines and repeat, until only one reducer is left.
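The rounds of MRGreedy can be simulated on one machine. This is a rough sketch with several assumptions: the surviving edges are re-partitioned at random in every round, edges are `(cost, u, v)` tuples, and the local step `local_msf` uses a sort plus union-find as a stand-in for Greedy2 (both return a minimum spanning forest of the local edge set). The names `local_msf` and `mr_greedy` and the `seed` parameter are hypothetical.

```python
import random


def local_msf(edges):
    """Minimum spanning forest of `edges` (stand-in for Greedy2 on one reducer)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    forest = []
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((cost, u, v))
    return forest


def mr_greedy(edges, num_machines, seed=0):
    rng = random.Random(seed)               # fixed seed for reproducibility
    surviving = list(edges)
    while True:
        # Map: send every surviving edge to a random reducer.
        parts = [[] for _ in range(num_machines)]
        for e in surviving:
            parts[rng.randrange(num_machines)].append(e)
        # Reduce: each reducer keeps only a spanning forest of its edges.
        surviving = [e for part in parts for e in local_msf(part)]
        if num_machines == 1:
            return sorted(surviving)
        num_machines = max(1, num_machines // 2)


edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = mr_greedy(edges, num_machines=4)
```

Every reducer outputs at most n-1 edges, so the number of edges shrinks from round to round; an edge is only discarded when it is a maximum-cost edge on some cycle, so the final reducer still sees every edge it needs.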