Phase 2 consists of one of four overlays, depending on the order of merge and the input record type:
• DSORT201 (4-way merge, fixed-length), for an order of merge from 1 to 4 and either a fixed-length record sort or the ADDROUT option for fixed- or variable-length records.
• DSORT202 (7-way merge, fixed-length).
for an order of merge from 5 to 7 and either fixed-length record sort or the ADDROUT option for fixed- or
variable-Iengt h records.
• DSORT203 (3-way merge,
variable-length), for an order of merge from 1 to 3 and variable-length
records.
• DSORT204 (6-way merge,
variable-length), for an order of merge from 4 to 6 and variable-length
records.
The overlay to be used in phase 2 is determined and called in by phase 1. Phase 2 then calls in the corresponding overlay for phase 3.
For the purpose of describing the
program logic, this phase has been divided into two general categories:
• Fixed-length records (7-way and 4~way
merges), Charts CA through CM
• variable-length records (6-way and 3-way merges), Charts CN through CY
This introduction serves for both
categories. Where necessary, duplicate figures and charts are provided (with the required differences, i f any) so that each ca tegory is complete in itself. For
example, there are two major-component charts (03) and two main sto.rage layout figures.
The sequences created in phase 1 are merged in phase 2. The order of merge (M) represents the number of input sequences that will be merged into one output
sequence during each merge. The order of merge, which was determined by the
assignment phase, is considered to be the fastest possible sort .for the input file.
Strings or sequences are read from the input portion of the work area, merged together, then written in the output
portion of the work area. When all strings from the input portion of the work area have been merged into the output portion of thew-ork area, a pass is complete. .For the next pass, the input half of the work area becomes the output half and vice-versa.
At the start of phase 2, pass 1, the sequences developed in phase 1 reside in
the input portion of the disk work area. A test is made comparing the number of
sequences (at the start of each pass) against the order of merge. If the number of sequences is equal to or less than the order of merge, the upcoming pass is the last one and phase 3 is fetched from the core image library. If the upcoming pass is not the last one. merging begins in phase 2.
Input File ...
...
...
...
...
Phose Sort
...
...
...
Figure 27. Sort Blocks
Sequence 1 ...
...
... Sequence 2
""
Sequence 3
As stated in phase 1, sequences are placed in the disk work area in a blocked format. Block 1 is the lowest alphameric block of each sequence in an ascending order (Figure 27).
The size of each input area depends on the type of cecord being processed. For fixed-length records, each input area is
Block
1
I 21 4 1 7
1
318
I9 113
1118
1Block 2
11511 71 38 142
11251431551731 129135158166
1equal to the length of a block from the input portion of the disk work area. as illustrated in the merge example in Figure
28.
For variable-length records, each input area has an additional overflow area, immediately preceding it (see Figure29,
item 1 ) . The overflow area is equal in size to the maximum logical record length minus one (LMAX-1).
External Sort or Merge 87
Steps
General Layout Overflow Area Equal to LMAX-l
Read-In Area Filled
Read-In Point
,/"
Record 1
Read-in Area
Record 2
Sort Block Size
Record 3
. /
A Split Record has on HEX C in the First Four Bits of the RLI (Record Length Indicator). It can Only Occur at the End of a Block.
Spl it Record is Detected and the lst Port is Moved to the Overflow Area.
---
- - - - -Record 1Depleted
Record 2 Depleted
-Record 3 Depleted
---When the First Part of a Split Record is Moved to the Overflow Nea, the Split Record Indicator is Restored to a HEX O.
Next Block is Read in and the Split Record is "Compacted".
~
Record 4
2nd Port Record 5 Record 6 Record 7
Upan Reading the Next Block, the Split Record Becomes a Whole Record and Comparing can Continue.
Figure 29. Variable-Length Record Input Area Because variable-length records are read and written in fixed-length blocks, phase 2
(as well as phase 3) must process split records in the overflow a.reas. A split record is one which is divided into two portions with one portion in each of two blocks. Before such a record can be compared for merging, i t must be rejoined or compacted.
Phase 1, in outputting sequences, creates a split record (identified by a hexadecimal C in the first four bits of the RLI) whenever a winning record cannot
wholly fit into an output block. As many bytes as can f i t into the unfilled portion of the block are moved. The record is then marked as a split record, the block is written, and the remaining portion is then moved to the start of the next block.
In phase 2, each time a winning record is moved to the output area the next record
in that input block is tested for a split condition. If i t is a split record, the remaining bytes within the block ( a split record can be encountered only as the last record in a block) are moved to the
overflow area; the split condition is erased and the next block of the sequence is read into main storage, compacting the split record. The merging process can then continue (see Figure 29, items 2, 3, and
4) •
An example of a 2-way merge is
illustrated in Figure 28. The first blocks (B1) from each of the first M sequences (M = 2) are read from the input portion of the disk work area into the main storage input areas.
Merging of the two sequences will begin with the first record of 51B1 (sequence 1
block 1) being compared against the first record in 52B1. working on an ascending
order, the lowest record is moved into the output area (Figure 30, step 1).
Step 1
2
from DISK
Work Area
1.
Main StorageI. \ /'~"' A,~
Sequence--l Sequence--2
Block--l Block--l Output 12141711311£18191181
"
..._-/ ~ ,
...Figure 30. Main Storage Output Area
-The second record ill S1 Bl is now
compared against the first record in S2Bl (Figure 28, step 2). Again, the lowest is moved to the output area. Merging of records in the input areas continues until the output area is full, or until one of the input areas is depleted.
When the output area is full, it is written in the output disk work area as a block of this new sequence.
Figure 28, step 7, shows SlBl (sequence 1, block 1) being depleted as a result of a compare operation. Tqe next sequential block of the sequence using that a.rea is
read in. Merging continues with the first record of 81B2 being compared against the appropriate record of S2B1 (Figure 28, step
8).
This cycle continues until all blocks of the M sequences now using the main storage input areas have been merged into one sequence. This is called a merge within a pass.
At this time, the next M sequences are read in and merged as described. This is
repeated until all sequences have been read from the input disk work area into main storage input areas, merged, .and written in the output portion of the disk work area.
This is the end of a pass.
A checkpoint record is updated at the start of each phase 2 pass.
When an end-of-pass condition exists but the upcoming pass is not the last pass, the phase 2 mainline is reinitialized for a new pass. Pointers to the input and output work areas are reversed so that "the output strings of the previous pass become the input strings for the upcoming pass.
If the end-of-pass condition exists and the upcoming pass is the last pass, phase 3 is fetched into main storage.
The "compare tree" in Figure 31
illustrates the compare operation used for merging.
With M blocks in the input areas, the first record of each block is compared.
When the winning record is found it is moved to the output area. Figure 31 shows four input areas, D, C, B, and A, filled with two records for each block. Using the compare tree and the input area example 1,
reco.rd 1 of sequence D is compared against the first .record of the other sequences and found to be the winning record. Because D was the winning record, the next compare l'IRlst start again with D:C.
In example two of Figure 31, the sequence C record is found to be the
winner. The next compare must start again with D:C. In example three of Figure 31, sequence B is the winning record; in this case, because sequence C record was
compared to sequence B record, C must be lower than record D. Therefore, the next compare is started at level 2, C:B. In example four, sequence record A is the winning record. Compa.ring is returned to level 1 (B:A) because sequence B record was al ready found to be lower than either sequence D or C record.
The interleaved output technique that is used in this program is illustrated in Figure 32, using a 3-way merge as an example. In phase 1, the input file was sorted into a total of 38 sequences. The gap factor or interleave factor is, for the most part, equal to the order of merge. It may, however, be reduced in the later
stages of a pass.
3rd Level Compare D:C
(D-Lowl
7/\
I (C-Low) (Equol)2nd Leve I Compare D:B
1/ \
C:B: :
-I>; 1\
(D-Low) / , (C-Low)
I
I / \ /'
D:A / \ C:A
~-~!~~,~~~~~~
ht Level Compare
Example 1
2
3
4
(D-Low) / ... / " (B-Low) (A Low) (Equol)
/';:':< 7~~E:~\ " : 'h;.~,~;\ -\
D
l
to 3 Leve I ) C I to 2 Leve I1
Bl
to 1 Leve I ) A , Com pore I " Compare I , Compore I/
/
j... / , , / " /
,
'- _ / '- _ / , /Sequence D C B A
~
QiU7~
Output Record D-- (8)
C--(7)
B-- (6)
A-- IS)
If D or C is moved to the output area, 3 compares are required before another record can be moved.
If B is moved to the output area, 2 compares are required before another record can be moved.
If A is moved to the output areo, 1 compare is required before another record can be moved.
Figure 31. Compare Tree
Order of Merge = 3
J
1 1213141516171 819110 1111121131141151161171181191201211221231241251261271281291301311321331341351361371381I \" --v-- A _ - v - _ A - - v - - A - - y __ ..A.. __ y _ -.A..--v-_A._-v--.J\.--v--A_-y_-A--V- _ A __ y __ A __ y_ -.A.-v • .J
Note that a gap exists between S38B1 and
reduction in output interleaviing may have to be implemented.
sequences remain before attempting to merge M2 sequences. In other words, pass 1
checked when there were 38, 29, 20, 11, and 2 sequences remaining on input. The first time that Ma or less sequences were
detected was when there were two sequences remaining. Output interleave factor is reduced to one, while the input factor is interleave factor changes when the
remaining sequences are M2 or less, or when sequences. Therefore, the output
interleave factor starts at less than three
consecutively on the user-specified unit, ta pe or disk.
Figures 33 and 34 illustrate phase 2 main storage layouts for fixed- and variable-length records, respectively.
Supervisor
Mainline Constants
Mainline
Merge-Merge Routine
Patch Area Equal Routine Move Routine
Phase 2
Initiali~ation
Relocator
Pass-Pass Routine
Remainder of Core Storage
Start of Phase '}
Note: Not drown to sca Ie.
Supervisor
Mainline Constants
Mainline
Merge-Merge Routine
Patch Area Equal Routine Move Routine Checkpoint Record
Phase 2 Input/Output
Pass-Pass Routine
Phase 2
I/O Area
Pass-Pass Time
Supervisor
Mainline Constants
Mainline
Merge-Merge Routine
Patch Areo Equal Routine Move Routine
Phase 2 Input/Output Areas
Merging With Equal Routine
Supervisor
Mainline Constants
Mainline
Merge-Merge Routine
Patch Area Move Routine
Phase 2 Input/Output Areas
Merging With No Equal Routine
Figure 33. Phase 2 Main storage Layout for Fixed":,,Length Records
Move Routine Incorporated in Mainline
Supervisor
Constants
Mainline
Merge-Merge Routine
Patch Area Equal Routine
Phase 2 Initial ization
Pass-Pass Routine
Start of Phase 2
Note: Not drawn to scale.
Supervisor
Constants
Mainline
Merge-Merge Routine
Patch Area Equal Routine Checkpoint Record
Phase 2 Input/Output Areas
Pass-Pass Routine
I/O Area
Pass-Pass Time
Supervisor
Constants
Mainline
Merge-Merge Routine
Patch Area Equal Routine
Phase 2 Input/Output Areas
Merging With Equal Routine
Figure 34. Phase 2 Main storage for Variable-Length Records
Supervisor
Constants
Mainline
Merge-Merge Routine
Patch Area
Phase 2 Input/Output Areas
Merging With No Equal Routine
PHASE 2 INITIALIZATION, FIXED-LENGTH routine then initializes:
• Merge-merge routine
• Pass-pass routine
• Sequence compare loops
•
Interleave address routine•
Relocatable routines• Constants
Merge-merge. Initialized according to the order of merge that was calculated by the assignment phase (from 2 to 7).
Pass-pass. Initialized according to the order of merge to be used during phase 2. either ascending or descending sequence.
Interleave address routine. Factors are calculated for the input/output disk
address interleaving for phase 2.
Relocatable routines. The equal routine is not relocated. However, if the equal the relocator, initialization continues and the pass-pass routine is written out on the the sort-block size. The input/output channel programs are then initialized with the number of bytes to be transferred
routine continues to RDCPOK.
RDCPOK, CA-D2
A test is made to determine if the equal routine is required (more than one control field per record). If so, a branch is made
to COMFIT; if not. the branches in the mainline compare loops are initialized to