
5.3.4 Duality between Merge Sort and Distribution Sort

In the case of merging R runs together, if the number R of runs is small enough so that we have RD block-sized prefetch buffers in internal memory, then it is easy to see that the merge can proceed in an optimum manner. However, this constraint limits the size of R, as in disk striping, which can be suboptimal for sorting. The challenge is to make use of substantially fewer prefetch buffers so that we can increase R to be as large as possible. The larger R is, the faster we can do merge sort, or equivalently, the larger the files that we can sort in a given number of passes. We saw in Section 5.2.1 that Θ(D log D) prefetch buffers suffice for SRM to achieve optimal merge sort performance.

A tempting approach is duality: We know from Section 5.1.3 that we need only Θ(D) output buffers to do a distribution pass if we lay out the buckets on the disks in a randomized cycling (RCD) pattern. If we can establish duality, then we can merge runs using Θ(D) prefetch buffers, assuming the runs are stored on the disks using randomized cycling.
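To make the layout concrete, the following is a minimal Python sketch of one common formulation of randomized cycling: each run (or bucket) independently draws a random ordering of the D disks and assigns its blocks to the disks cyclically through that ordering. The function names and interface are ours, for illustration only.

```python
import random

def randomized_cycling_layout(num_runs, num_disks, seed=None):
    """Each run draws an independent random permutation of the disks;
    block i of run r is stored on disk perms[r][i mod D]."""
    rng = random.Random(seed)
    perms = [rng.sample(range(num_disks), num_disks) for _ in range(num_runs)]
    def disk_for(run, block_index):
        return perms[run][block_index % num_disks]
    return disk_for

# Example: with D = 6 disks, consecutive blocks of each run cycle through
# all six disks, but different runs cycle in different random orders.
disk_for = randomized_cycling_layout(num_runs=8, num_disks=6, seed=1)
print([disk_for(0, i) for i in range(12)])  # run 0 visits each disk twice
```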

Figure 5.8 illustrates the duality between merging and distribution.

However, one issue must first be resolved before we can legitimately apply duality. In each merge pass of merge sort, we merge R runs at a time into a single run. In order to apply duality, which deals with read and write sequences, we need to predetermine the read order Σ for the merge. That is, if we can specify the proper read order Σ of the blocks, then we can legitimately apply Theorem 5.2 to the write problem on Σ^R.

[Figure 5.8: streams of blocks are read in Σ order (merging) and written in Σ^R order (distribution); each prefetching step in the lazy read-once schedule corresponds to an output step in the greedy write-once schedule, with prefetch buffers on the read side and output buffers on the write side.]

Fig. 5.8 Duality between merging with R = 8 runs and distribution with S = 8 buckets, using D = 6 disks. The merge of the R runs proceeds from bottom to top. Blocks are input from the disks, stored in the prefetch buffers, and ultimately read into the merging buffers. The blocks of the merged run are moved to the output buffers and then output to the disks. The order in which blocks enter the merging buffers determines the sequence Σ, which can be predetermined by ordering the blocks based upon their smallest key values. The distribution into S buckets proceeds from top to bottom. Blocks are input from the disks into input buffers and moved to the partitioning buffers. The blocks of the resulting buckets are written in the order Σ^R to the output buffers and then output to the appropriate disks.

The solution to determining Σ is to partition internal memory so that it consists not only of several prefetch buffers but also of R merging buffers, where R is the number of runs. Each merging buffer stores a (partially filled) block from a run that is participating in the merge. We say that a block is read when it is moved from a prefetch buffer to a merging buffer, where it stays until its items are exhausted by the merging process. When a block expires, it is replaced by the next block in the read sequence Σ (unless Σ has expired) before the merging is allowed to resume. The first moment that a block absolutely must be read and moved to the merging buffer is when its smallest key value enters into the merging process. We therefore define the read priority of a block b to be its smallest key value. We can sort the smallest key values (one per block) to form the read order Σ. Computing the read sequence Σ is fast to do because sorting N/B key values is a considerably smaller problem than sorting the entire file of N records.
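Concretely, the read order Σ can be formed by sorting one key per block. Here is a minimal Python sketch (our illustration; the names and interface are ours):

```python
def read_order(runs, B):
    """runs: list of sorted key sequences; returns (run, block#) pairs in
    the read order Sigma, i.e., sorted by each block's smallest key value."""
    priorities = [(run[i], r, i // B)          # (read priority, run, block#)
                  for r, run in enumerate(runs)
                  for i in range(0, len(run), B)]
    priorities.sort()                          # sorts N/B keys, not N records
    return [(r, b) for _, r, b in priorities]

runs = [[1, 4, 9, 12], [2, 3, 10, 11], [5, 6, 7, 8]]
print(read_order(runs, B=2))
# [(0, 0), (1, 0), (2, 0), (2, 1), (0, 1), (1, 1)]
```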

A subtle point is to show that this Σ ordering actually "works," namely, that at each step of the merging process, the item r with the smallest key value not yet in the merged run will be added next to the merged run. It may be, for example, that the R merging buffers contain multiple blocks from one run but none from another. However, at the time when item r should be added to the merged run, every block read before the block containing r has a smallest key value at most r and so is already partially consumed, and each of the other R − 1 runs can have at most one partially consumed (hence nonempty) block in the merging buffers. Therefore, since there are R merging buffers and since the merging proceeds only when all R merging buffers are nonempty (unless Σ has expired), it will always be the case that the block containing r is resident in one of the merging buffers before the merging proceeds.
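This claim can also be checked in a toy simulation. The sketch below is our own in-memory model (no disks or prefetch buffers): it refills the merging buffers in Σ order before each merge step, exactly as described above, and asserts that the merged output is globally sorted.

```python
from collections import deque

def merge_in_sigma_order(runs, B):
    """Merge sorted runs, reading blocks in the precomputed Sigma order."""
    R = len(runs)
    blocks = [deque(run[i:i+B]) for run in runs for i in range(0, len(run), B)]
    sigma = deque(sorted(blocks, key=lambda blk: blk[0]))  # read order Sigma
    buffers, out = [], []
    while sigma or buffers:
        # Merging resumes only once all R merging buffers are occupied
        # (unless Sigma has expired).
        while len(buffers) < R and sigma:
            buffers.append(sigma.popleft())
        # One merge step: emit the globally smallest buffered item.
        smallest = min(buffers, key=lambda blk: blk[0])
        out.append(smallest.popleft())
        if not smallest:              # block expired: its buffer is freed
            buffers.remove(smallest)
    return out

runs = [[1, 4, 9, 12], [2, 3, 10, 11], [5, 6, 7, 8]]
assert merge_in_sigma_order(runs, B=2) == sorted(sum(runs, []))
```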

We need to use a third partition of internal memory to serve as output buffers so that we can output the merged run in a striped fashion to the D disks. Knuth [220, problem 5.4.9–26] has shown that we may need as many output buffers as prefetch buffers, but about 3D output buffers typically suffice. So the remaining m′ = m − R − 3D blocks of internal memory are used as prefetch buffers.

We get an optimum merge schedule for read sequence Σ by computing the greedy output schedule for the reverse sequence Σ^R. Figure 5.8 shows the flow through the various components in internal memory.
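Under simplifying assumptions (unit-time parallel I/O steps, at most one block per disk per step, a shared pool of k buffers), this computation can be sketched in a few lines of Python; the model and names are ours, not the text's.

```python
from collections import deque, defaultdict

def greedy_write_schedule(blocks, k):
    """blocks: (block_id, disk) pairs in arrival order. Blocks wait in a
    pool of k buffers; each output step writes at most one block per disk."""
    queues, pooled, steps = defaultdict(deque), 0, []
    def output_step():
        nonlocal pooled
        written = [q.popleft() for q in queues.values() if q]
        pooled -= len(written)
        steps.append(written)
    for block_id, disk in blocks:
        while pooled >= k:            # pool full: forced to do an output step
            output_step()
        queues[disk].append(block_id)
        pooled += 1
    while pooled:                     # flush whatever remains
        output_step()
    return steps

def lazy_prefetch_schedule(sigma, k):
    """Duality: run the greedy write-once schedule on the reversed read
    sequence Sigma^R, then reverse the steps to get the prefetch schedule."""
    return list(reversed(greedy_write_schedule(list(reversed(sigma)), k)))

sigma = [("b1", 0), ("b2", 1), ("b3", 0), ("b4", 2), ("b5", 0)]
print(lazy_prefetch_schedule(sigma, k=3))
# [['b1'], ['b3', 'b2'], ['b5', 'b4']]
```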

When the runs are stored on the disks using randomized cycling, the length of the greedy output schedule corresponds to the performance of a distribution pass in RCD, which is optimal. We call the resulting merge sort randomized cycling merge sort (RCM). It has the identical I/O performance bound (5.2) as does RCD, except that each level of merging requires some extra overhead to fill the prefetch buffers to start the merge, corresponding to the additive terms in Theorem 5.2.

For any parameters ε, δ > 0, assuming that m ≥ D(ln 2 + δ)/ε + 3D, the average number of I/Os for RCM is

    (2 + ε + O(e^{−δD})) (n/D) log_{m − D(ln 2 + δ)/ε − 3D} n,    (5.8)

and we can choose ε and δ appropriately to bound (5.8) as follows, with a constant of proportionality of 2:

    ∼ (2n/D) log_{αm} n,    (5.9)

for some constant 0 < α < 1.
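For concreteness, here is a hypothetical instantiation of the assumption (the numbers are ours): with D = 10 disks, ε = 0.5, and δ = 1, we need m ≥ 10(ln 2 + 1)/0.5 + 30 ≈ 64 blocks of internal memory, a modest requirement in practice.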

Dementiev and Sanders [136] show how to overlap computation effectively with I/O in the RCM method. We can apply the duality approach to other methods as well. For example, we could get a simple randomized distribution sort that is dual to the SRM method of Section 5.2.1.