DATA SETS AND FILES - OS PL/I Optimizing Compiler:

DATA SETS

DATA SET NAMES

This chapter describes briefly the nature and organization of data sets, the data management services provided by the operating system, the record formats acceptable for auxiliary storage devices, and the way in which data sets are associated with PL/I files. It also describes some ENVIRONMENT options used in file declarations to describe the data set to PL/I.

Methods of creating and accessing data sets are given in Chapter 5, "Defining Data Sets for Stream Files" on page 134, Chapter 6, "Using Consecutive, Indexed, Regional, and Teleprocessing Data Sets" on page 149, and Chapter 7, "Using VSAM Data Sets from PL/I" on page 222.

Chapter 7, "Using VSAM Data Sets from PL/I" on page 222 describes VSAM data sets. These differ significantly from other data set types; VSAM users will find that much of the information in this chapter is irrelevant.

A data set is any collection of data that can be created by a program and accessed by the same or another program. A data set may be a deck of punched cards, a series of items recorded on magnetic tape, or data recorded on a direct-access device; it may also be input from, or output to, your terminal. A printed listing produced by a program is a data set, but it cannot be accessed by a program.

A volume is a physical unit of auxiliary storage (for example, a reel of magnetic tape or a disk pack) that can be written on or read by an input/output device; a serial number identifies each volume (other than a magnetic-tape volume either without labels or with nonstandard labels).

A magnetic-tape or direct-access volume can contain more than one data set; conversely, a single data set can span two or more magnetic-tape or direct-access volumes.

A data set on a direct-access device must have a name so that the operating system can refer to it. If you do not supply a name, the operating system will supply a temporary one. A data set on a magnetic-tape device must have a name if the tape has IBM standard labels (see "Labels" on page 105). Names can be unqualified, qualified, temporary, or generation names, as described in your JCL manual. Data sets on punched cards, paper tape, unlabeled magnetic tape, or nonstandard labeled magnetic tape do not have names.

You can place the name of a data set, with information identifying the volume on which it resides, in a catalog. Such a data set is termed a cataloged data set. To catalog a data set, use the CATLG subparameter of the DISP parameter of the DD statement. To retrieve a cataloged data set, you need only specify the name of the data set and its disposition. The operating system searches the catalog for information associated with the name and uses this information to request the operator to mount the volume containing your data set.

100 OS PL/I Optimizing Compiler: Programmer's Guide

BLOCKS AND RECORDS

RECORD FORMATS

The items of data in a data set are arranged in blocks separated by interblock gaps (IBG). (Some manuals refer to these as interrecord gaps.)

A block is the unit of data transmitted to and from a data set.

Each block contains one record, part of a record, or several records. A block could also contain a prefix field of up to 99 bytes in length depending on the information interchange code (ASCII or EBCDIC) in which the data is recorded (see "Information Interchange Codes"). Specify the block size in the BLKSIZE parameter of the DD statement or in the BLKSIZE option of the ENVIRONMENT attribute.

A record is the unit of data transmitted to and from a program.

When writing a PL/I program, you need consider only the records that you are reading or writing; but when you describe the data sets that your program will create or access, you must be aware of the relationship between blocks and records.

If a block contains two or more records, the records are said to be blocked. Blocking conserves storage space in a volume because it reduces the number of interblock gaps, and it may increase efficiency by reducing the number of input/output operations required to process a data set. Records are blocked and deblocked by the data management routines.

Specify the record length in the LRECL parameter of the DD statement or in the RECSIZE option of the ENVIRONMENT attribute.
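The effect of blocking on the number of blocks, and therefore on the number of interblock gaps and transfers, can be checked with simple arithmetic. The following Python sketch is an illustration only; the function names and the sample values (1000 records of 80 bytes) are assumptions and do not come from this guide.

```python
import math


def blocks_needed(record_count: int, lrecl: int, blksize: int) -> int:
    """Return how many blocks hold record_count fixed-length records."""
    if lrecl <= 0 or blksize < lrecl:
        raise ValueError("block size must hold at least one record")
    records_per_block = blksize // lrecl
    return math.ceil(record_count / records_per_block)


def interblock_gaps(block_count: int) -> int:
    """Return the number of gaps that separate consecutive blocks."""
    return max(block_count - 1, 0)


# Hypothetical data set: 1000 records of 80 bytes each.
unblocked = blocks_needed(1000, lrecl=80, blksize=80)
blocked = blocks_needed(1000, lrecl=80, blksize=800)
print(unblocked, interblock_gaps(unblocked))
print(blocked, interblock_gaps(blocked))
```

With ten records per block, the same data needs one tenth of the blocks, which is the saving in gaps and transfers that the text describes.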

INFORMATION INTERCHANGE CODES: The normal code in which data is recorded is the Extended Binary Coded Decimal Interchange Code (EBCDIC), although source input can optionally be coded in Binary Coded Decimal (BCD). However, for magnetic tape only, the system accepts data recorded in the American Standard Code for Information Interchange (ASCII). Use the ASCII and BUFOFF options of the ENVIRONMENT attribute if you are reading or writing data sets recorded in ASCII.

A prefix field up to 99 bytes in length may be present at the beginning of each block in an ASCII data set. The use of this field is controlled by the BUFOFF option of the ENVIRONMENT attribute. For a full description of the options used for ASCII data sets, see "Consecutive Data Sets" on page 149.

Each character in the ASCII code is represented by a 7-bit pattern and there are 128 such patterns. The ASCII set includes a substitute character (the SUB control character) that is used to represent EBCDIC characters having no valid ASCII code. The ASCII substitute character is translated to the EBCDIC SUB character, which has the bit pattern 00111111.
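The substitution rule can be illustrated in Python. This is a sketch under stated assumptions: Python's built-in cp037 codec stands in for the EBCDIC code page (the translation tables actually used by the system may differ), and the function names are hypothetical.

```python
ASCII_SUB = 0x1A          # ASCII SUB control character
EBCDIC_SUB = 0b00111111   # EBCDIC SUB bit pattern quoted in the text


def ebcdic_to_ascii(data: bytes) -> bytes:
    """Translate EBCDIC bytes to 7-bit ASCII, using SUB where no code exists."""
    out = bytearray()
    for ch in data.decode("cp037"):        # cp037 is an assumed stand-in
        code = ord(ch)
        out.append(code if code < 128 else ASCII_SUB)
    return bytes(out)


def ascii_to_ebcdic(data: bytes) -> bytes:
    """Translate 7-bit ASCII bytes to EBCDIC (assumed code page 037)."""
    return data.decode("ascii").encode("cp037")


# In code page 037, X'C1' is the letter A and X'4A' is the cent sign,
# which has no 7-bit ASCII equivalent.
ascii_bytes = ebcdic_to_ascii(b"\xc1\x4a")
print(ascii_bytes.hex())
print(ascii_to_ebcdic(ascii_bytes).hex())
```

Note that the round trip is lossy: once a character has been replaced by SUB, the original EBCDIC character cannot be recovered.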

The records in a data set must be one of the following:

• Fixed-length

• Variable-length

• Undefined-length

Records can be blocked if required, but only fixed-length and variable-length records are deblocked by the system; undefined-length records must be deblocked by your program.

Fixed-Length Records

You can specify the following formats for fixed-length records.

F     Fixed-length, unblocked
FB    Fixed-length, blocked
FS    Fixed-length, unblocked, standard
FBS   Fixed-length, blocked, standard

In a data set with fixed-length records, as shown in Figure 39, all records have the same length. If the records are blocked, each block usually contains an equal number of fixed-length records (although a block may be truncated). If the records are unblocked, each record constitutes a block.

Figure 39. Fixed-length Records (diagram not reproduced: in unblocked F-format, each block holds one record and blocks are separated by IBGs; in blocked FB-format, each block holds several records and blocks are separated by IBGs)

Because it can base blocking and deblocking on a constant record length, the operating system can process fixed-length records faster than it can variable-length records.
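A constant record length means that deblocking is nothing more than slicing each block at a fixed stride. The following Python sketch illustrates the idea; it is not the operating system's routine, and the function names and sample data are assumptions made for the example.

```python
def block_fixed(records: list[bytes], lrecl: int, blksize: int) -> list[bytes]:
    """Group fixed-length records into blocks; the last block may be short."""
    if lrecl <= 0 or blksize % lrecl != 0:
        raise ValueError("block size must be a multiple of the record length")
    if any(len(record) != lrecl for record in records):
        raise ValueError("every record must be exactly lrecl bytes long")
    per_block = blksize // lrecl
    return [
        b"".join(records[i:i + per_block])
        for i in range(0, len(records), per_block)
    ]


def deblock_fixed(blocks: list[bytes], lrecl: int) -> list[bytes]:
    """Split blocks back into records by stepping lrecl bytes at a time."""
    records = []
    for block in blocks:
        if len(block) % lrecl != 0:
            raise ValueError("block length is not a multiple of lrecl")
        records.extend(block[i:i + lrecl] for i in range(0, len(block), lrecl))
    return records


# Five hypothetical 4-byte records, two records per block.
sample = [bytes([65 + i]) * 4 for i in range(5)]
blocks = block_fixed(sample, lrecl=4, blksize=8)
print([len(block) for block in blocks])
print(deblock_fixed(blocks, lrecl=4) == sample)
```

No length fields are needed inside the block, which is why this format is cheaper to process than the variable-length formats described later.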

The use of "standard" (FS-format and FBS-format) records further optimizes the sequential processing of a data set on a direct-access device. A standard format data set must contain fixed-length records and must have no embedded empty tracks or short blocks (apart from the last block). With a standard format data set, the operating system can predict whether the next block of data will be on a new track and, if necessary, can select a new read/write head in anticipation of the transmission of that block. A PL/I program never places embedded short blocks in a data set with fixed-length records. A data set containing fixed-length records can be processed as a standard data set even if it is not created as such, provided that it contains no embedded short blocks or empty tracks.

Variable-Length Records

You can specify the following formats for variable-length records:

V     Variable-length, unblocked
VB    Variable-length, blocked
VS    Variable-length, unblocked, spanned
VBS   Variable-length, blocked, spanned
D     Variable-length, unblocked, ASCII
DB    Variable-length, blocked, ASCII


V-format permits both variable-length records and variable-length blocks. The first 4 bytes of each record and of each block contain control information for use by the operating system (including the length in bytes of the record or block).

Because of these control fields, variable-length records cannot be read backward. Illustrations of variable-length records are shown in Figure 40.

Figure 40. Variable-Length Records (diagram not reproduced: panels show V-format, VB-format, VS-format, and VBS-format, with spanned records divided into first and last segments; C1: block control information; C2: record or segment control information)

V-format signifies unblocked variable-length records. Each record is treated as a block containing only one record; the first 4 bytes of the block contain block control information, and the next 4 contain record control information.

VB-format signifies blocked variable-length records. Each block contains as many complete records as it can accommodate. The first 4 bytes of the block contain block control information, and the first 4 bytes of each record contain record control information.
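The nesting of block and record control information can be sketched in Python. The text above states only that each 4-byte field includes a length; the layout used here (a 2-byte big-endian length that counts the control field itself, followed by 2 zero bytes) is an assumption based on the usual OS convention, and the function names are hypothetical.

```python
import struct


def build_vb_block(records: list[bytes]) -> bytes:
    """Prefix each record and the whole block with a 4-byte control field."""
    body = b"".join(
        struct.pack(">HH", len(record) + 4, 0) + record  # record control info
        for record in records
    )
    return struct.pack(">HH", len(body) + 4, 0) + body   # block control info


def parse_vb_block(block: bytes) -> list[bytes]:
    """Walk a block using the lengths in the control fields."""
    block_len, _ = struct.unpack_from(">HH", block, 0)
    if block_len != len(block):
        raise ValueError("block length field does not match the block")
    records, pos = [], 4
    while pos < block_len:
        rec_len, _ = struct.unpack_from(">HH", block, pos)
        if rec_len < 4 or pos + rec_len > block_len:
            raise ValueError("bad record length field")
        records.append(block[pos + 4:pos + rec_len])
        pos += rec_len
    return records


block = build_vb_block([b"AB", b"CDEFG"])
print(block.hex())
print(parse_vb_block(block))
```

A V-format block is the special case of this sketch in which the list passed to build_vb_block holds exactly one record.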

SPANNED RECORDS: A spanned record is a variable-length record in which the length of the record can exceed the size of a block. If you specify the record format as either VS or VBS, such a record is divided into segments and accommodated in two or more consecutive blocks. Segmentation and reassembly are handled by the operating system. The use of spanned records allows you to select a block size, independently of record length, that will combine optimum use of auxiliary storage with maximum efficiency of transmission.

VS-format is similar to V-format. Each block contains only one record or segment of a record. The first 4 bytes of the block contain block control information, and the next 4 contain record or segment control information (including an indication of whether the record is complete or is a first, intermediate, or last segment).

With REGIONAL(3) organization, the use of VS-format removes the limitations on block size imposed by the physical characteristics of the direct-access device. If the record length exceeds the size of a track, or if there is no room left on the current track for the record, the record will be spanned over one or more tracks.

VBS-format differs from VS-format in that each block contains as many complete records or segments as it can accommodate; each block is, therefore, approximately the same size (although there can be a variation of up to 4 bytes, since each segment must contain at least 1 byte of data).
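Segmentation and reassembly for VBS-format can be sketched as follows. This Python example is illustrative only and is not the operating system's algorithm. The layout of the control fields and the numeric segment codes (0 complete, 1 first, 2 last, 3 intermediate) are assumptions based on the usual OS convention; the text above says only that the field indicates which kind of segment follows. All names are hypothetical.

```python
import struct

COMPLETE, FIRST, LAST, INTERMEDIATE = 0, 1, 2, 3  # assumed segment codes


def pack_vbs(records: list[bytes], blksize: int) -> list[bytes]:
    """Fill each block with as many records or segments as it can hold."""
    if blksize < 9:  # 4 (block) + 4 (segment) + at least 1 data byte
        raise ValueError("block size too small")
    blocks, current = [], bytearray()

    def flush() -> None:
        if current:
            blocks.append(struct.pack(">HH", len(current) + 4, 0) + bytes(current))
            current.clear()

    for record in records:
        if not record:
            raise ValueError("each segment needs at least 1 byte of data")
        offset, started = 0, False
        while offset < len(record):
            room = blksize - 4 - len(current) - 4  # data bytes that still fit
            if room < 1:
                flush()
                continue
            chunk = record[offset:offset + room]
            offset += len(chunk)
            finished = offset == len(record)
            if not started:
                code = COMPLETE if finished else FIRST
            else:
                code = LAST if finished else INTERMEDIATE
            current += struct.pack(">HBB", len(chunk) + 4, code, 0) + chunk
            started = True
    flush()
    return blocks


def unpack_vbs(blocks: list[bytes]) -> list[bytes]:
    """Reassemble complete records from the segments in each block."""
    records, partial = [], bytearray()
    for block in blocks:
        block_len, _ = struct.unpack_from(">HH", block, 0)
        pos = 4
        while pos < block_len:
            seg_len, code, _ = struct.unpack_from(">HBB", block, pos)
            partial += block[pos + 4:pos + seg_len]
            if code in (COMPLETE, LAST):
                records.append(bytes(partial))
                partial.clear()
            pos += seg_len
    return records


data = [b"A" * 3, b"B" * 20, b"C" * 2]
packed = pack_vbs(data, blksize=16)
print([len(block) for block in packed])
print(unpack_vbs(packed) == data)
```

Because a new segment needs 4 bytes of control information plus at least 1 byte of data, a block is closed as soon as fewer than 5 bytes remain. Every block except the last is therefore within 4 bytes of the block size, which matches the variation described above.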

ASCII RECORDS: For data sets that are recorded in ASCII, use D-format as follows:

• D-format records are similar to V-format, except that the data they contain is recorded in ASCII.

• DB-format records are similar to VB-format, except that the data they contain is recorded in ASCII.

Undefined-Length Records

U-format permits the processing of records that do not conform to F- and V-formats. The operating system and the compiler treat each block as a record; your program must perform any required blocking or deblocking.

DATA SET ORGANIZATION

The data management routines of the operating system can handle a number of types of data sets, which differ in the way data is stored within them and in the permitted means of access to the data. The three main types of non-VSAM data sets and the corresponding keywords describing their PL/I organization² are as follows:

Type of Data Set        PL/I Organization
Sequential              CONSECUTIVE
Indexed sequential      INDEXED
Direct                  REGIONAL

The compiler recognizes a fourth type, teleprocessing, by the file attribute TRANSIENT.

A fifth type, partitioned, has no corresponding PL/I organization. VSAM also provides a number of alternatives.

In a sequential (or CONSECUTIVE) data set, records are placed in physical sequence. Given one record, the location of the next record is determined by its physical position in the data set.

Sequential organization is used for all magnetic tapes, and may be selected for direct-access devices. Paper tape, punched cards, terminal, and printed output are sequentially organized.

An indexed sequential (or INDEXED) data set must reside on a direct-access volume. An index or set of indexes maintained by the operating system gives the location of certain principal records. This permits direct retrieval, replacement, addition, and deletion of records, as well as sequential processing.

A direct (or REGIONAL) data set must reside on a direct-access volume. The records within the data set can be organized in three ways: REGIONAL(1), REGIONAL(2), and REGIONAL(3); in each case, the data set is divided into regions, each of which contains one or more records. A key that specifies the region number and, for REGIONAL(2) and REGIONAL(3), identifies the record, permits direct access to any record; sequential processing is also possible.

A teleprocessing data set (associated with a TRANSIENT file in a PL/I program) must reside in storage. Records are placed in physical sequence.

In a partitioned data set, independent groups of sequentially organized data, each called a member, reside in a direct-access data set. The data set includes a directory that lists the location of each member. Partitioned data sets are often called libraries. The compiler includes no special facilities for creating and accessing partitioned data sets. Each member can be processed as a CONSECUTIVE data set by a PL/I program. The use of partitioned data sets as libraries is described in Chapter 8, "Libraries of Data Sets" on page 264.

2 Do not confuse the terms "sequential" and "direct" with the PL/I file attributes SEQUENTIAL and DIRECT. The attributes refer to how the file is to be processed, and not to the way the corresponding data set is organized.

LABELS

The operating system uses labels to identify magnetic-tape and direct-access volumes, and to store data set attributes (for example, record length and block size). The attribute information must originally come from a DD statement or from your program. Once the label is written, you need not specify the information again.

Magnetic-tape volumes can have IBM standard or nonstandard labels, or they can be unlabeled. IBM standard labels have two parts: the initial volume label, and header and trailer labels.

The initial volume label identifies a volume and its owner; the header and trailer labels precede and follow each data set on the volume. Header labels contain system information, device-dependent information (for example, recording technique), and data-set characteristics. Trailer labels are almost identical with header labels, and are used when magnetic tape is read backward.

Direct-access volumes have IBM standard labels. Each volume is identified by a volume label, which is stored on the volume.

This label contains a volume serial number and the address of a volume table of contents (VTOC). The table of contents, in turn, contains a label, termed a data set control block (DSCB), for each data set stored on the volume.
