BATCHER SORTING NETWORKS - Selection Networks

Selection Networks

7.4. BATCHER SORTING NETWORKS

Batcher’s ingenious constructions date back to the early 1960s (published a few years later) and constitute some of the earliest examples of parallel algorithms. It is remarkable

136

SORTING AND SELECTION NETWORKS 137 that in more than three decades, only small improvements to his constructions have been made.

One type of sorting network proposed by Batcher is based on the idea of an (m , m ' )-merger and uses a technique known as even–odd merge or odd-even merge. An (m , m ' )-merger is a circuit that merges two sorted sequences of lengths m and m' into a single sorted sequence of length m + m'. Let the two sorted sequences be

If m = 0 or m' = 0, then nothing needs to be done. For m = m' = 1, a single comparator can do the merging. Thus, we assume mm' > 1 in what follows. The odd–even merge is done by merging the even- and odd-indexed elements of the two lists separately:

and

are merged to get

and

are merged to get

If we now compare–exchange the pairs of elements the resulting sequence will be completely sorted. Note that v₀, which is known to be the smallest element overall, is excluded from the final compare–exchange operations.

An example circuit for merging two sorted lists of sizes 4 and 7 using the odd–even merge technique is shown in Fig. 7.9. The three circuit segments, separated by vertical dotted lines, correspond to a (2, 4)-merger for even-indexed inputs, a (2, 3)-merger for odd-indexed inputs, and the final parallel compare–exchange operations prescribed above. Each of the smaller mergers can be designed recursively in the same way. For example, a (2, 4)-merger consists of two (1, 2)-mergers for even- and odd-indexed inputs, followed by two parallel compare–exchange operations. A (1, 2)-merger is in turn built from a (1, 1)-merger, or a single comparator, for the even-indexed inputs, followed by a single compare–exchange operation. The final (4, 7)-merger in Fig. 7.9 uses 16 modules and has a delay of 4 units.

It would have been quite difficult to prove the correctness of Batcher’s even–odd merger were it not for the zero–one principle that allows us to limit the proof of correctness to only 0 and 1 inputs. Suppose that the sorted x sequence has k 0s and m – k 1s. Similarly, let there be k' 0s and m' – k' 1s in the sorted y sequence. When we merge the even-indexed terms, the v sequence will have k_even= k / 2 + k '/ 2 0s. Likewise, the w sequence resulting from the merging of odd-indexed terms will have k_odd = k / 2+ k' /2 0s. Only three cases are possible:

Case a: k_even= k_odd The sequence is already sorted Case b: k_even= k_odd+ 1 The sequence is already sorted

138 INTRODUCTION TO PARALLEL PROCESSING

Figure 7.9. Batcher's even–odd merging network for 4 + 7 inputs.

Case c : k_even= k_odd+ 2

In the last case, the sequence has only a pair of elements that are not in sorted order, as shown in the example below (the out-of-order pair is underlined).

The problem will be fixed by the compare–exchange operations between w_iand v_{i +1}. Batcher’s (m, m) even–odd merger, when m is a power of 2, is characterized by the following delay and cost recurrences:

Armed with an efficient merging circuit, we can design an n-sorter recursively from two n /2-sorters and an (n/2, n/2)-merger, as shown in Fig. 7.10. The 4-sorter of Fig. 7.4 is an instance of this design: It consists of two 2-sorters followed by a (2, 2)-merger in turn built from two (1, 1)-mergers and a single compare–exchange step. A larger example, correspond-ing to an 8-sorter, is depicted in Fig. 7.11. Here, 4-sorters are used to sort the first and second halves of the inputs separately, with the sorted lists then merged by a (4, 4)-merger composed of an even (2, 2)-merger, an odd (2, 2)-merger, and a final stage of three comparators.

Batcher sorting networks based on the even–odd merge technique are characterized by the following delay and cost recurrences:

SORTING AND SELECTION NETWORKS 139

Figure 7.10. The recursive structure of Batcher's even–odd merge sorting network.

A second type of sorting network proposed by Batcher is based on the notion of bitonic sequences. A bitonic sequence is defined as one that “rises then falls”

, “falls then rises” , or is

obtained from the first two categories through cyclic shifts or rotations. Examples include 1 3 3 4 6 6 6 2 2 1 0 0 Rises then falls

8 7 7 6 6 6 5 4 6 8 8 9 Falls then rises

8 9 8 7 7 6 6 6 5 4 6 8 The previous sequence, right-rotated by 2

Batcher observed that if we sort the first half and second half of a sequence in opposite directions, as indicated by the vertical arrows in Fig. 7.12, the resulting sequence will be bitonic and can thus be sorted by a special bitonic-sequence sorter. It turns out that a

Figure 7.11. Batcher's even–odd merge sorting network for eight inputs.

140 INTRODUCTION TO PARALLEL PROCESSING

Figure 7.12. The recursive structure of Batcher's bitonic sorting network.

bitonic-sequence sorter with n inputs has the same delay and cost as an even–odd ( n/ 2 , n /2)-merger. Therefore, sorters based on the notion of bitonic sequences (bitonic sorters) have the same delay and cost as those based on even–odd merging.

A bitonic-sequence sorter can be designed based on the assertion that if in a bitonic sequence, we compare–exchange the elements in the first half with those in the second half, as indicated by the dotted comparators in Fig. 7.12, each half of the resulting sequence will be a bitonic sequence and each element in the first half will be no larger than any element in the second half. Thus, the two halves can be independently sorted by smaller bitonic-se-quence sorters to complete the sorting process. Note that we can reverse the direction of sorting in the lower n-sorter if we suitably adjust the connections of the dotted comparators in Fig. 7.12. A complete eight-input bitonic sorting network is shown in Fig. 7.13.

While asymptotically suboptimal, Batcher sorting networks are quite efficient. Attempts at designing faster or less complex networks for specific values of n have yielded only marginal improvements over Batcher’s construction when n is large.

Figure 7.13. Batcher's bitonic sorting network for eight inputs.

SORTING AND SELECTION NETWORKS 141 7.5. OTHER CLASSES OF SORTING NETWORKS

A class of sorting networks that possess the same asymptotic Θ(log²n) delay and Θ(n log²n ) cost as Batcher sorting networks, but that offer some advantages, are the periodic balanced sorting networks [Dowd89]. An n-sorter of this type consists of log₂n identical stages, each of which is a (log₂n)-stage n-input bitonic-sequence sorter. Thus, the delay and cost of an n-sorter of this type are (log₂n)²and n (log₂n)²/2, respectively. Figure 7.14 shows an eight-input example. The 8-sorter of Fig. 7.14 has a larger delay (9 versus 6) and higher cost (36 versus 19) compared with a Batcher 8-sorter but offers the following advantages:

1. The structure is regular and modular (easier VLSI layout).

2. Slower, but more economical, implementations are possible by reusing the blocks.

In the extreme, log n passes through a single block can be used for cost-efficient₂ sorting.

3. Using an extra block provides tolerance to some faults (missed exchanges).

4. Using two extra blocks provides tolerance to any single fault (a missed or incorrect exchange).

5. Multiple passes through a faulty network can lead to correct sorting (graceful degradation).

6. Single-block design can be made fault-tolerant by adding an extra stage to the block.

Just as we were able to obtain a sorting network based on odd–even transposition sort on a linear array, we can base a sorting network on a sorting algorithm for a 2D array. For example, the two 8-sorters shown in Figs. 7.15 and 7.16 are based on shearsort (defined in Section 2.5) with snakelike order on 2 × 4 and 4 × 2 arrays, respectively. Compared with Batcher 8-sorters, these are again slower (7 or 9 versus 6 levels) and more complex (24 or 32 versus 19 modules). However, they offer some of the same advantages enumerated for periodic In general, an rc-sorter can be designed based on shearsort on an r × c mesh. It will have balanced sorting networks.

log₂r identical blocks, each consisting of r parallel c-sorters followed by c parallel r -sorters, followed at the end by a special block composed of r parallel c-sorters. However, such networks are usually not competitive when r is large.

Figure 7.14. Periodic balanced sorting network for eight inputs.

142 INTRODUCTION TO PARALLEL PROCESSING

Figure 7.15. Design of an 8-sorter based on shearsort on 2 × 4 mesh.

Figure 7.16. Design of an 8-sorter based on shearsort on 4 × 2 mesh.

Dans le document Introduction to Parallel Processing (Page 161-167)