OPERATIONS ON FULL VECTORS

The following examples illustrate operations on full vectors, where both zero and nonzero elements are represented in storage. Vectors in storage are accessed by sequential addressing.

The first three examples use three different methods of controlling the sectioning loop.

Contiguous Vectors

Two contiguous vectors A and B in storage are added, and the result is stored in contiguous vector C. The number of elements in each is specified by N. All vectors are in the long floating-point format.

    *        C = A + B
    *
             L     G0,N        Vector length to GR0
             LA    G1,A        Address of A to GR1
             LA    G2,B        Address of B to GR2
             LA    G3,C        Address of C to GR3
    LP       VLVCU G0          Load VCT, update GR0
             VLD   V0,G1       Load section of A
             VAD   V0,V0,G2    Add section of B
             VSTD  V0,G3       Store section in C
             BC    2,LP        Test condition code set by VLVCU, branch if not last section

Assuming, for purposes of illustration, a vector-section size of 8 and a vector length of 20, the above program would process three sections in turn (two full sections of eight elements and one partial section of four elements) before ending the loop. One section of A and one section of B are added in vector-register pair 0 and 1. The result is stored in a section of C, as illustrated below:

    Storage     Stored in
    Address     Loop         Elements
    C           1            8
    C+64        2            8
    C+128       3            4
    C+160       (end of vector C)

    Vector registers: 0, 1
    Section size: 8
    Vector C length: 20
    Elements: 8 bytes
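The sectioning described above can be modeled in a few lines of Python. This is only an illustrative sketch, not System/370 code: the function name add_in_sections and the default section size of 8 are assumptions chosen to match the example, and Python lists stand in for vectors in storage.

```python
def add_in_sections(a, b, section_size=8):
    """Add two equal-length vectors one section at a time.

    Returns the result vector and the list of section lengths processed.
    Hypothetical helper for illustration; not part of any IBM interface.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    c = [0.0] * len(a)
    sections = []
    start = 0
    remaining = len(a)
    while remaining > 0:
        count = min(section_size, remaining)    # plays the role of the vector count
        for i in range(start, start + count):   # one sectioned "vector" operation
            c[i] = a[i] + b[i]
        sections.append(count)
        start += count
        remaining -= count
    return c, sections


a = [float(i) for i in range(20)]
b = [10.0] * 20
c, sections = add_in_sections(a, b)
print(sections)  # [8, 8, 4]
```

With a length of 20 and a section size of 8, the model processes two full sections and one partial section of four elements, matching the illustration.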

Since all vectors are stored contiguously, the stride for the three vector instructions VLD, VAD, and VSTD is set to one by specifying a value of zero in the RT2 subfield. This may be done in the assembler language either by placing a zero inside the parentheses of the stride subfield, as in:

Mnemonic

or by omitting the subfield, including the parentheses, altogether:

Mnemonic

Each of these instructions automatically updates the storage address in the designated general register to the value that will be needed for the next time, if any, around the loop.
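The interplay between the vector count, the condition code, and the automatic address update can be traced with a small Python model. This is a sketch under stated assumptions, not a definition of the instruction: the function names vlvcu and trace are hypothetical, the section size of 8 and element size of 8 bytes come from the example, and the distinction between condition codes 0 and 1 (zero versus negative remaining length) is an assumption that the example itself does not exercise.

```python
def vlvcu(gr, section_size=8):
    """Model of the loop-control step: return (cc, vct, updated_gr)."""
    vct = min(section_size, gr) if gr > 0 else 0
    gr -= vct
    if vct == 0:
        cc = 0 if gr == 0 else 1   # assumption: CC 1 marks a negative length
    elif gr > 0:
        cc = 2                     # more sections remain after this one
    else:
        cc = 3                     # this is the last section
    return cc, vct, gr


def trace(n, element_bytes=8, section_size=8):
    """Rows of (loop, cc, vct, gr0, byte_offset) for the contiguous-add loop."""
    rows = []
    gr0, offset, loop = n, 0, 0
    while True:
        cc, vct, gr0 = vlvcu(gr0, section_size)
        offset += vct * element_bytes   # each vector instruction advances its address
        loop += 1
        rows.append((loop, cc, vct, gr0, offset))
        if cc != 2:                     # the branch on condition code 2 falls through
            break
    return rows


for row in trace(20):
    print(row)
# (1, 2, 8, 12, 64)
# (2, 2, 8, 4, 128)
# (3, 3, 4, 0, 160)
```

In this model a single offset stands for the addresses in GR1, GR2, and GR3, which all advance by the same amount because the three vectors are contiguous and have the same element size.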

The BRANCH ON CONDITION (BC) instruction tests the condition code set by VLVCU, because none of the intervening instructions change the condition code. If an instruction setting the condition code had intervened, the instruction "LTR G0,G0" inserted before the BC instruction would test the contents of GR0; BC would test for condition code 2 in either case.

The following figure shows the condition-code setting (CC), the vector count (VCT), and the contents of the general registers at the start, before executing the first VLVCU instruction, and at the end of each loop thereafter.

    Loop     CC   VCT   GR0   GR1     GR2     GR3
    Start    -    -     20    A       B       C
    End 1    2    8     12    A+64    B+64    C+64
    End 2    2    8     4     A+128   B+128   C+128
    End 3    3    4     0     A+160   B+160   C+160

Vectors with Stride

This example modifies the previous example in four ways. All vector elements are in the short floating-point format. The result of the addition is returned to the storage location of vector B. Vector B is assumed to be stored with a stride T. Finally, a BC instruction which tests for the end of the loop is placed immediately after the VLVCU instruction, and the loop is closed with an unconditional branch.

This method, which could be used if additional instructions were to change the condition code later in the loop, allows the loop to be bypassed when the initial vector count is zero. (Note, however, that the previous loop control also works with a vector count of zero, because no elements would be processed if vector instructions were executed with a zero vector count.)

    *        B = A + B
    *
             L     G0,N           Vector length to GR0
             LA    G1,A           Address of A to GR1
             LA    G2,B           Address of B to GR2
             LR    G3,G2          Copy address in GR3
             L     G4,T           Stride for B to GR4
    LP       VLVCU G0             Load VCT, update GR0
             BC    12,NXT         Exit loop if VCT=0
             VLE   V0,G1          Load section of A
             VAE   V0,V0,G2(G4)   Add section of B
             VSTE  V0,G3(G4)      Return section to B
             BC    15,LP          Branch to loop start
    NXT      Next instruction

Two registers, GR2 and GR3, are used to specify the current address of B, so that the two instructions VAE and VSTE in the sectioning loop will refer to the same section. Each of the two instructions updates its separate copy of the address. (If a vector in storage is referred to more than twice within a sectioning loop, the address could be copied inside the loop for each use except the last, so as to reduce the number of general registers needed.)
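The use of two separately advancing address copies for B can be illustrated with a Python model. This is a sketch, not System/370 code: the function name add_strided_in_place is hypothetical, storage is modeled as a flat list indexed by element rather than by byte address, and the stride is expressed in elements.

```python
def add_strided_in_place(storage, a_start, b_start, n, stride, section_size=8):
    """Compute B = A + B, where A is contiguous and B has the given stride.

    Hypothetical helper; indexes are element indexes, not byte addresses.
    """
    a_addr = a_start     # like GR1
    b_read = b_start     # like GR2, advanced by the add
    b_write = b_start    # like GR3, advanced by the store
    remaining = n        # like GR0
    while True:
        count = min(section_size, remaining) if remaining > 0 else 0
        remaining -= count
        if count == 0:   # exit test placed right after the loop-control step
            break
        # Load a section of A (stride of one element).
        reg = [storage[a_addr + i] for i in range(count)]
        a_addr += count
        # Add the matching section of B (strided); this advances the read copy.
        reg = [reg[i] + storage[b_read + i * stride] for i in range(count)]
        b_read += count * stride
        # Return the section to B; this advances the write copy.
        for i in range(count):
            storage[b_write + i * stride] = reg[i]
        b_write += count * stride
    return storage


storage = [0.0] * 40
for i in range(10):
    storage[i] = float(i + 1)       # A occupies elements 0..9
    storage[10 + 3 * i] = 100.0     # B starts at element 10 with stride 3
add_strided_in_place(storage, a_start=0, b_start=10, n=10, stride=3)
print([storage[10 + 3 * i] for i in range(10)])
```

Because the add and the store each advance their own copy of B's address, both refer to the same section on every pass. When n is zero the body is skipped entirely, as with the exit test in the example.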

Vector and Scalar Operands

This example illustrates the use of both vector and scalar operands. It also shows how the three-operand arithmetic vector instructions can sometimes be used to avoid a separate vector-load instruction. A third loop-control method is used here.

A and B are vectors of length N, and S is a scalar. All are in the long floating-point format.

    *        B = A * (S-A)

[...] loop-control instructions, one before entry into the loop and one at the end.

Note that the QST-format arithmetic instruction (VSDS) saves a separate load instruction at the expense of having to access storage twice for the same vector section A. Depending on the model, a separate load instruction followed by QV-format arithmetic instructions may be more efficient in some circumstances, particularly when the stride is greater [...] comment applies to the corresponding QV-format instructions.)
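The trade-off described above, reading section A from storage twice instead of loading it once into a register, can be seen in a small Python model of B = A * (S-A). This is a sketch only: the function name scaled_product is hypothetical, the section size of 8 is an assumption, and the plain while loop does not reproduce the loop-control method of the original example.

```python
def scaled_product(a, s, section_size=8):
    """Return B with B[i] = A[i] * (S - A[i]), computed section by section.

    Hypothetical helper for illustration only.
    """
    b = [0.0] * len(a)
    start = 0
    while start < len(a):
        count = min(section_size, len(a) - start)
        # First storage access to the section of A: scalar minus vector.
        reg = [s - a[start + i] for i in range(count)]
        # Second storage access to the same section of A: multiply.
        reg = [reg[i] * a[start + i] for i in range(count)]
        b[start:start + count] = reg
        start += count
    return b


print(scaled_product([1.0, 2.0, 3.0], 4.0))  # [3.0, 4.0, 3.0]
```

A variant that first copied the section of A into a second register list would read storage only once per section, at the cost of the extra load step.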

Sum of Products

The use of MULTIPLY AND ACCUMULATE and related instructions is illustrated by computing the inner product of a row [...]

Then the sectioning loop accumulates partial sums: The VLD instruction loads a section of row A (with stride) into VR2. The VMCD instruction multiplies the elements of row A in VR2 by elements [...] the model, because the instructions VZPSD, VLVCU, VMCD, and VSPSD take care of these dependencies automatically.
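The general idea of accumulating partial sums across sections and combining them at the end can be sketched in Python. This is a loose model under stated assumptions, not a description of the instructions themselves: the function name inner_product is hypothetical, the number of partial sums (4 here) is an arbitrary stand-in for a model-dependent value, the way products are assigned to partial sums is assumed, and the stride of row A is omitted for brevity.

```python
def inner_product(a, b, section_size=8, partial_sums=4):
    """Inner product accumulated through several partial sums.

    `partial_sums` is a hypothetical parameter standing in for a
    model-dependent value; 4 is chosen only for illustration.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    sums = [0.0] * partial_sums            # zero the partial sums first
    start = 0
    while start < len(a):
        count = min(section_size, len(a) - start)
        for i in range(count):             # multiply and accumulate one section
            sums[i % partial_sums] += a[start + i] * b[start + i]
        start += count
    return sum(sums)                       # combine the partial sums at the end


v = [float(i) for i in range(20)]
print(inner_product(v, v))  # 2470.0
```

The caller never needs to know how many partial sums exist, which mirrors the point made above that the program does not depend on the model.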

Compare and Swap Vector Elements

Two vectors A and B, both of length N, are to be compared and their elements swapped so that vector A will have the smaller element of each pair and vector B the larger. The elements are 32-bit signed binary integers and stored contiguously.

             L     G0,N        Vector length to GR0
             LA    G1,A        Address of A to GR1
             LR    G2,G1       Copy address in GR2
             LA    G3,B        Address of B to GR3
             LR    G4,G3       Copy address in GR4
    LP       VLVCU G0          Load VCT, update GR0
             VL    V0,G1       Section of A to VR0
             VL    V1,G3       Section of B to VR1
             VCR   2,V0,V1     Check where A>B
             VSTM  V0,G4       Store greater in B
             VSTM  V1,G2       Store lesser in A
             BC    2,LP        Branch back if GR0>0
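The compare-then-store-under-mask pattern of this example can be modeled in Python. This is an illustrative sketch, not System/370 code: the function name compare_and_swap is hypothetical, the section size of 8 is an assumption, and a list of booleans stands in for the mask produced by the compare.

```python
def compare_and_swap(a, b, section_size=8):
    """Swap elements in place so that a[i] <= b[i] for every i.

    Hypothetical helper; works one section at a time.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    start = 0
    while start < len(a):
        count = min(section_size, len(a) - start)
        v0 = a[start:start + count]               # section of A
        v1 = b[start:start + count]               # section of B
        mask = [x > y for x, y in zip(v0, v1)]    # set where A > B
        for i in range(count):
            if mask[i]:                           # stores happen only under the mask
                b[start + i] = v0[i]              # greater element goes to B
                a[start + i] = v1[i]              # lesser element goes to A
        start += count


a = [5, 1, 7, 3]
b = [2, 4, 7, 9]
compare_and_swap(a, b)
print(a, b)  # [2, 1, 7, 3] [5, 4, 7, 9]
```

Elements where A is not greater than B are left untouched in both vectors, since neither masked store writes to those positions.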
