- EXTERNAL SORT OR MERGE

Phase 2 consists of one of four overlays, depending on the order of merge and the input record type:

• DSORT201 (4-way merge, fixed-length), for an order of merge from 1 to 4 and either a fixed-length record sort or the ADDROUT option for fixed- or variable-length records.

• DSORT202 (7-way merge, fixed-length), for an order of merge from 5 to 7 and either a fixed-length record sort or the ADDROUT option for fixed- or variable-length records.

• DSORT203 (3-way merge, variable-length), for an order of merge from 1 to 3 and variable-length records.

• DSORT204 (6-way merge, variable-length), for an order of merge from 4 to 6 and variable-length records.

The overlay to be used in phase 2 is determined and called in by phase 1. Phase 2 then calls in the corresponding overlay for phase 3.
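The overlay selection described above can be sketched as a small lookup. This is an illustration only: the function name and its parameters are hypothetical, and the assumption that ADDROUT always routes to the fixed-length overlays is taken from the overlay list, not from the actual phase 1 code.

```python
def phase2_overlay(order_of_merge: int, variable_length: bool, addrout: bool = False) -> str:
    """Return the phase 2 overlay name for a given order of merge and record type.

    Hypothetical helper: the name and signature are not part of the program.
    """
    # ADDROUT sorts use the fixed-length overlays for either record type.
    if addrout or not variable_length:
        if 1 <= order_of_merge <= 4:
            return "DSORT201"  # 4-way merge, fixed-length
        if 5 <= order_of_merge <= 7:
            return "DSORT202"  # 7-way merge, fixed-length
    else:
        if 1 <= order_of_merge <= 3:
            return "DSORT203"  # 3-way merge, variable-length
        if 4 <= order_of_merge <= 6:
            return "DSORT204"  # 6-way merge, variable-length
    raise ValueError("order of merge out of range for this record type")
```

For example, `phase2_overlay(5, variable_length=True)` selects the 6-way variable-length overlay, while the same order of merge with ADDROUT selects the 7-way fixed-length overlay.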

For the purpose of describing the program logic, this phase has been divided into two general categories:

• Fixed-length records (7-way and 4-way merges), Charts CA through CM

• Variable-length records (6-way and 3-way merges), Charts CN through CY

This introduction serves for both categories. Where necessary, duplicate figures and charts are provided (with the required differences, if any) so that each category is complete in itself. For example, there are two major-component charts (03) and two main storage layout figures.

The sequences created in phase 1 are merged in phase 2. The order of merge (M) represents the number of input sequences that will be merged into one output sequence during each merge. The order of merge, which was determined by the assignment phase, is the one considered to give the fastest possible sort for the input file.

Strings or sequences are read from the input portion of the work area, merged together, then written in the output portion of the work area. When all strings from the input portion of the work area have been merged into the output portion of the work area, a pass is complete. For the next pass, the input half of the work area becomes the output half and vice versa.

At the start of phase 2, pass 1, the sequences developed in phase 1 reside in the input portion of the disk work area. A test is made comparing the number of sequences (at the start of each pass) against the order of merge. If the number of sequences is equal to or less than the order of merge, the upcoming pass is the last one and phase 3 is fetched from the core image library. If the upcoming pass is not the last one, merging begins in phase 2.
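The last-pass test can be illustrated with a short calculation. The sketch below assumes that each phase 2 pass merges groups of M sequences, so N sequences become ceil(N / M); the function name is hypothetical and the sketch ignores interleaving and work-area details.

```python
import math


def phase2_passes(n_sequences: int, m: int) -> tuple[int, int]:
    """Count phase 2 passes and the sequences left for the final (phase 3) merge.

    Assumes every pass turns N sequences into ceil(N / M) sequences.
    """
    passes = 0
    # The test is made at the start of each pass: M or fewer sequences
    # means the upcoming pass is the last one and belongs to phase 3.
    while n_sequences > m:
        n_sequences = math.ceil(n_sequences / m)
        passes += 1
    return passes, n_sequences
```

Under these assumptions, 38 sequences with an order of merge of 3 shrink to 13, then 5, then 2, so phase 2 runs three passes and phase 3 merges the last two sequences.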

Figure 27. Sort Blocks (the input file passes through the sort phase and emerges as sequence 1, sequence 2, sequence 3, and so on)

As stated in phase 1, sequences are placed in the disk work area in a blocked format. Block 1 is the lowest alphameric block of each sequence in an ascending order (Figure 27).

The size of each input area depends on the type of record being processed. For fixed-length records, each input area is equal to the length of a block from the input portion of the disk work area, as illustrated in the merge example in Figure 28.

For variable-length records, each input area has an additional overflow area immediately preceding it (see Figure 29, item 1). The overflow area is equal in size to the maximum logical record length minus one (LMAX-1).


Figure 29. Variable-Length Record Input Area

The figure shows the following steps:

General layout. An overflow area equal to LMAX-1 precedes the read-in area, which begins at the read-in point and is the size of a sort block.

Read-in area filled (records 1, 2, and 3). A split record has a hex C in the first four bits of the RLI (record length indicator). It can occur only at the end of a block.

Split record is detected and the first part is moved to the overflow area (records 1, 2, and 3 depleted). When the first part of a split record is moved to the overflow area, the split record indicator is restored to a hex 0.

Next block is read in and the split record is "compacted" (labels: record 4, second part, records 5, 6, and 7). Upon reading the next block, the split record becomes a whole record and comparing can continue.

Because variable-length records are read and written in fixed-length blocks, phase 2 (as well as phase 3) must process split records in the overflow areas. A split record is one which is divided into two portions with one portion in each of two blocks. Before such a record can be compared for merging, it must be rejoined or compacted.

Phase 1, in outputting sequences, creates a split record (identified by a hexadecimal C in the first four bits of the RLI) whenever a winning record cannot wholly fit into an output block. As many bytes as can fit into the unfilled portion of the block are moved. The record is then marked as a split record, the block is written, and the remaining portion is then moved to the start of the next block.

In phase 2, each time a winning record is moved to the output area, the next record in that input block is tested for a split condition. If it is a split record, the remaining bytes within the block (a split record can be encountered only as the last record in a block) are moved to the overflow area; the split condition is erased and the next block of the sequence is read into main storage, compacting the split record. The merging process can then continue (see Figure 29, items 2, 3, and 4).
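The split-record handling on both sides can be sketched as follows. The record layout here is an assumption made for illustration: a 2-byte RLI holding the total record length, with hex C in its first four bits marking a split, zero padding at the end of a block, and LMAX (including the RLI) no larger than the block size. None of these details, nor the function names, are taken from the program itself.

```python
RLI_SIZE = 2   # hypothetical 2-byte record length indicator (RLI)
SPLIT = 0xC0   # hex C in the first four bits of the RLI


def make_record(data: bytes) -> bytes:
    """Prefix data with an RLI holding the total length (RLI included)."""
    return (RLI_SIZE + len(data)).to_bytes(RLI_SIZE, "big") + data


def write_blocks(records, block_size):
    """Phase 1 side: pack records into fixed-length blocks, splitting as needed."""
    blocks, cur = [], bytearray()
    for rec in records:
        free = block_size - len(cur)
        if free < RLI_SIZE + 1:              # assumption: pad rather than split an RLI
            cur.extend(bytes(free))
            blocks.append(bytes(cur))
            cur, free = bytearray(), block_size
        if len(rec) <= free:
            cur.extend(rec)
        else:
            first = bytearray(rec[:free])    # as many bytes as fit
            first[0] |= SPLIT                # mark as a split record
            cur.extend(first)
            blocks.append(bytes(cur))        # write the block
            cur = bytearray(rec[free:])      # remainder starts the next block
    if cur:
        cur.extend(bytes(block_size - len(cur)))
        blocks.append(bytes(cur))
    return blocks


def read_records(blocks, block_size, lmax):
    """Phase 2 side: read blocks, rejoining split records via the overflow area."""
    if not blocks:
        return []
    ovf = lmax - 1                           # overflow area size (LMAX-1)
    buf = bytearray(ovf + block_size)        # overflow area precedes the read-in area
    buf[ovf:] = blocks[0]
    out, pos, nxt = [], ovf, 1
    while True:
        remaining = len(buf) - pos
        if remaining < RLI_SIZE + 1 or buf[pos:pos + RLI_SIZE] == b"\x00\x00":
            if nxt == len(blocks):           # block depleted and no more blocks
                return out
            buf[ovf:] = blocks[nxt]
            nxt, pos = nxt + 1, ovf
            continue
        if buf[pos] & 0xF0 == SPLIT:
            buf[pos] &= 0x0F                 # erase the split condition
            start = ovf - remaining
            buf[start:ovf] = buf[pos:]       # first part moves to the overflow area
            buf[ovf:] = blocks[nxt]          # next block completes the record
            nxt, pos = nxt + 1, start
        total = int.from_bytes(buf[pos:pos + RLI_SIZE], "big")
        out.append(bytes(buf[pos + RLI_SIZE:pos + total]))
        pos += total
```

Placing the first part at the high end of the overflow area makes it contiguous with the start of the read-in area, so the rejoined record can be processed like any whole record. Since a split first part is always shorter than the full record, LMAX-1 bytes are enough to hold it.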

An example of a 2-way merge is illustrated in Figure 28. The first blocks (B1) from each of the first M sequences (M = 2) are read from the input portion of the disk work area into the main storage input areas.

Merging of the two sequences will begin with the first record of S1B1 (sequence 1, block 1) being compared against the first record in S2B1. Working on an ascending order, the lowest record is moved into the output area (Figure 30, step 1).

Figure 30. Main Storage Output Area (step 1: sequence 1 block 1 and sequence 2 block 1 are read from the disk work area into main storage, alongside the output area)

The second record in S1B1 is now compared against the first record in S2B1 (Figure 28, step 2). Again, the lowest is moved to the output area. Merging of records in the input areas continues until the output area is full, or until one of the input areas is depleted.

When the output area is full, it is written in the output disk work area as a block of this new sequence.

Figure 28, step 7, shows S1B1 (sequence 1, block 1) being depleted as a result of a compare operation. The next sequential block of the sequence using that area is read in. Merging continues with the first record of S1B2 being compared against the appropriate record of S2B1 (Figure 28, step 8).

This cycle continues until all blocks of the M sequences now using the main storage input areas have been merged into one sequence. This is called a merge within a pass.

At this time, the next M sequences are read in and merged as described. This is repeated until all sequences have been read from the input disk work area into main storage input areas, merged, and written in the output portion of the disk work area.

This is the end of a pass.
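The block-by-block merging described above can be sketched in a few lines. This is a simplified model, not the program's logic: records are plain integers, a block is a list of records, the disk work area is a list of sequences, and the function names are hypothetical. Interleaved output and checkpointing are left out.

```python
def merge_group(group, block_size):
    """Merge up to M sequences (each a non-empty list of blocks) into one sequence."""
    areas = [list(seq[0]) for seq in group]   # input areas, loaded with each B1
    next_block = [1] * len(group)             # next block to read per sequence
    out_area, out_seq = [], []
    while any(areas):
        # Lowest first record among the input areas that still hold records.
        i = min((k for k, a in enumerate(areas) if a), key=lambda k: areas[k][0])
        out_area.append(areas[i].pop(0))
        if len(out_area) == block_size:       # output area full: write the block
            out_seq.append(out_area)
            out_area = []
        if not areas[i] and next_block[i] < len(group[i]):
            # Input area depleted: read the next block of that sequence.
            areas[i] = list(group[i][next_block[i]])
            next_block[i] += 1
    if out_area:                              # write a final, partly filled block
        out_seq.append(out_area)
    return out_seq


def merge_pass(sequences, m, block_size):
    """One pass: merge the input sequences M at a time into output sequences."""
    return [merge_group(sequences[i:i + m], block_size)
            for i in range(0, len(sequences), m)]
```

Each call to `merge_group` corresponds to one "merge within a pass"; `merge_pass` repeats it until every input sequence has been consumed.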

A checkpoint record is updated at the start of each phase 2 pass.

When an end-of-pass condition exists but the upcoming pass is not the last pass, the phase 2 mainline is reinitialized for a new pass. Pointers to the input and output work areas are reversed so that the output strings of the previous pass become the input strings for the upcoming pass.

If the end-of-pass condition exists and the upcoming pass is the last pass, phase 3 is fetched into main storage.

The "compare tree" in Figure 31 illustrates the compare operation used for merging.

With M blocks in the input areas, the first record of each block is compared. When the winning record is found, it is moved to the output area. Figure 31 shows four input areas, D, C, B, and A, filled with two records for each block. Using the compare tree and the input area example 1, record 1 of sequence D is compared against the first record of the other sequences and found to be the winning record. Because D was the winning record, the next compare must start again with D:C.

In example two of Figure 31, the sequence C record is found to be the winner. The next compare must start again with D:C. In example three of Figure 31, the sequence B record is the winning record; in this case, because the sequence C record was compared to the sequence B record, the C record must be lower than the D record. Therefore, the next compare is started at level 2, C:B. In example four, the sequence A record is the winning record. Comparing is returned to level 1 (B:A) because the sequence B record was already found to be lower than either the sequence D or C record.
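The re-entry rule of the compare tree can be modeled as a chain of remembered winners. The sketch below is an interpretation of the four examples, not the program's code: it assumes numeric keys, ascending order, and sequences listed in the order D, C, B, A, and the function name is hypothetical. It returns the merged records together with the number of compares made before each record was moved.

```python
import math


def tree_merge(seqs):
    """Merge sorted lists (ordered D, C, B, A) using the compare-tree re-entry rule."""
    n = len(seqs)
    pos = [0] * n

    def head(i):
        # Current first record of sequence i; infinity once it is exhausted.
        return seqs[i][pos[i]] if pos[i] < len(seqs[i]) else math.inf

    # survivor[k] is the sequence with the lowest head among seqs[0..k].
    survivor = [0] * n
    out, compares_per_record = [], []
    start = 1                                 # first compare is D:C
    total = sum(len(s) for s in seqs)
    while len(out) < total:
        compares = 0
        for k in range(start, n):
            compares += 1
            prev = survivor[k - 1]
            survivor[k] = prev if head(prev) <= head(k) else k
        w = survivor[n - 1]                   # the winning sequence
        out.append(head(w))
        pos[w] += 1
        compares_per_record.append(compares)
        # Only compares involving the winner's sequence must be redone:
        # D or C restarts at D:C, B restarts one level later, A at the last level.
        start = max(w, 1)
    return out, compares_per_record
```

With four sequences this reproduces the counts given with Figure 31: three compares after a D or C record is moved, two after a B record, and one after an A record.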

The interleaved output technique that is used in this program is illustrated in Figure 32, using a 3-way merge as an example. In phase 1, the input file was sorted into a total of 38 sequences. The gap factor or interleave factor is, for the most part, equal to the order of merge. It may, however, be reduced in the later stages of a pass.

Figure 31. Compare Tree (third-level compare D:C; second-level compares D:B and C:B; first-level compares D:A, C:A, and B:A; examples 1 through 4 for sequences D, C, B, and A)

If D or C is moved to the output area, 3 compares are required before another record can be moved.

If B is moved to the output area, 2 compares are required before another record can be moved.

If A is moved to the output area, 1 compare is required before another record can be moved.

[Figure 32: interleaved output for an order of merge of 3, with input sequences 1 through 38. The notes accompanying the figure survive only as fragments:]

Note that a gap exists between S38B1 and ...

... reduction in output interleaving may have to be implemented.

... sequences remain before attempting to merge M² sequences. In other words, pass 1 checked when there were 38, 29, 20, 11, and 2 sequences remaining on input. The first time that M² or fewer sequences were detected was when there were two sequences remaining.

Output interleave factor is reduced to one, while the input factor is ...

... interleave factor changes when the remaining sequences are M² or less, or when ...

... sequences. Therefore, the output interleave factor starts at less than three ...

... consecutively on the user-specified unit, tape or disk.

Figures 33 and 34 illustrate phase 2 main storage layouts for fixed- and variable-length records, respectively.

Figure 33. Phase 2 Main Storage Layout for Fixed-Length Records (not drawn to scale). The figure shows main storage at four points, with the areas listed in the order shown:

• Start of phase 2: Supervisor; Mainline Constants; Mainline; Merge-Merge Routine; Patch Area, Equal Routine, Move Routine; Phase 2 Initialization; Relocator; Pass-Pass Routine; Remainder of Core Storage

• Pass-pass time: Supervisor; Mainline Constants; Mainline; Merge-Merge Routine; Patch Area, Equal Routine, Move Routine, Checkpoint Record; Phase 2 Input/Output; Pass-Pass Routine; Phase 2 I/O Area

• Merging with equal routine: Supervisor; Mainline Constants; Mainline; Merge-Merge Routine; Patch Area, Equal Routine, Move Routine; Phase 2 Input/Output Areas

• Merging with no equal routine: Supervisor; Mainline Constants; Mainline; Merge-Merge Routine; Patch Area, Move Routine; Phase 2 Input/Output Areas

Figure 34. Phase 2 Main Storage for Variable-Length Records (not drawn to scale; the move routine is incorporated in the mainline). The figure shows main storage at four points, with the areas listed in the order shown:

• Start of phase 2: Supervisor; Constants; Mainline; Merge-Merge Routine; Patch Area, Equal Routine; Phase 2 Initialization; Pass-Pass Routine

• Pass-pass time: Supervisor; Constants; Mainline; Merge-Merge Routine; Patch Area, Equal Routine, Checkpoint Record; Phase 2 Input/Output Areas; Pass-Pass Routine; I/O Area

• Merging with equal routine: Supervisor; Constants; Mainline; Merge-Merge Routine; Patch Area, Equal Routine; Phase 2 Input/Output Areas

• Merging with no equal routine: Supervisor; Constants; Mainline; Merge-Merge Routine; Patch Area; Phase 2 Input/Output Areas

PHASE 2 INITIALIZATION, FIXED-LENGTH

... routine then initializes:

• Merge-merge routine

• Pass-pass routine

• Sequence compare loops

• Interleave address routine

• Relocatable routines

• Constants

Merge-merge. Initialized according to the order of merge that was calculated by the assignment phase (from 2 to 7).

Pass-pass. Initialized according to the order of merge to be used during phase 2.

... either ascending or descending sequence.

Interleave address routine. Factors are calculated for the input/output disk address interleaving for phase 2.

Relocatable routines. The equal routine is not relocated. However, if the equal ... the relocator, initialization continues and the pass-pass routine is written out on the ... the sort-block size. The input/output channel programs are then initialized with the number of bytes to be transferred ... routine continues to RDCPOK.

RDCPOK, CA-D2

A test is made to determine if the equal routine is required (more than one control field per record). If so, a branch is made to COMFIT; if not, the branches in the mainline compare loops are initialized to
