Guide to FP Instructions - Floating-Point Support


7.9 Guide to FP Instructions

This section gives a summary of FP instructions by function. FP instructions are listed in mnemonic order in Table 8.4.

We’ve divided the instructions up into the following categories:

• Load/store: Moving data directly between FP registers and memory.

• Move between registers: Data movement between FP and general-purpose registers.


• Three-operand operations: The regular add, multiply, etc.

• Multiply-add operations: Fancy (and distinctly non-RISC) high-performance instructions, introduced with the MIPS IV ISA. (If you think this is complicated, just wait for MIPS IV...)

• Sign changing: Simple operations, separated out because their dumb implementation means no IEEE exceptions.

• Conversion operations: Conversion between single, double, and integer values.

• Conditional branch and test instructions: Where the FP unit meets the integer pipeline again.

7.9.1 Load/Store

These operations load or store 32 or 64 bits of memory in or out of an FP register.¹ On loads and stores, note the following points:

• The data is unconverted and uninspected, so no exception will occur even if it does not represent a valid FP value.

• These operations can specify the odd-numbered FP registers; on the 32-bit CPUs this is required to load the second half of 64-bit (double-precision) floating-point values. For the 32-bit CPUs, these data movements are the only instructions that ever access odd-numbered registers.

• The load operation has a delay of one clock cycle, and (like loading to an integer register) this is not interlocked before MIPS III. The compiler and/or assembler will usually take care of this for you, but it is invalid for an FP load to be immediately followed by an instruction using the loaded value.

• When writing assembler, the synthetic instructions are preferred; they can be used for all CPUs, and the assembler will use multiple instructions for CPUs that don’t implement the machine instruction. You can feed them any addressing mode that the assembler can understand (as described in Section 9.4 below).

• The address for an FP load/store operation must be aligned to the size of the object being loaded — on a 4-byte boundary for single-precision or word values or an 8-byte boundary for double-precision or 64-bit integer type.

¹ The 64-bit loads appear only from the MIPS III ISA and R4000 CPU forward.

Machine instructions (disp is signed 16 bit):

lwc1 fd, disp(rs) fd = *(rs + disp);

swc1 fs, disp(rs) *(rs + disp) = fs;

From MIPS III ISA onward we get 64-bit loads/stores:

ldc1 fd, disp(rs) fd = (double)*(rs + disp);

sdc1 fd, disp(rs) *(rs + disp) = (double)fd;

From MIPS IV ISA onward we get indexed addressing, with two registers:

lwxc1 fd, ri(rs) fd = *(rs + ri);

swxc1 fd, ri(rs) *(rs + ri) = fd;

ldxc1 fd, ri(rs) fd = (double)*(rs + ri);

sdxc1 fd, ri(rs) *(rs + ri) = (double)fd;

But in fact you don’t have to remember any of these when you’re writing assembler. Instead, “addr” can be any address mode the assembler understands:

l.d fd, addr fd = (double)*addr;

l.s fd, addr fd = (float)*addr;

s.d fs, addr (double)*addr = fs;

s.s fs, addr (float)*addr = fs;

The assembler will generate the appropriate instructions, including allowing a choice of valid address modes. Double-precision loads on a 32-bit CPU will assemble to two load instructions.
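As a rough illustration of the last point and of the alignment rule, here is a minimal C sketch. The helper names load_double_as_two_words and fp_access_is_aligned are hypothetical, invented for this example; the code only models the behavior described above on a host with 64-bit IEEE doubles and is not what any assembler actually emits.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: mimic a double-precision load carried out as two
 * 32-bit word loads, as the assembler arranges on a 32-bit CPU. The words
 * are copied as raw bits, with no conversion and no inspection. */
static double load_double_as_two_words(const void *addr)
{
    uint32_t word[2];
    double value;

    memcpy(&word[0], addr, 4);                            /* first 32-bit load  */
    memcpy(&word[1], (const unsigned char *)addr + 4, 4); /* second 32-bit load */
    memcpy(&value, word, sizeof value);                   /* pair now holds the double */
    return value;
}

/* Hypothetical helper: the alignment rule for FP loads and stores.
 * size is 4 for single-precision or word data, 8 for double-precision
 * or 64-bit integer data. */
static int fp_access_is_aligned(uintptr_t addr, unsigned size)
{
    return (addr % size) == 0;
}
```

Because the two halves are copied in memory order, the sketch reproduces the original value on either byte order; which half lands in the odd-numbered register on real hardware depends on the CPU's endianness and is not modeled here.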

7.9.2 Move between Registers

No data conversion is done here (bit patterns are copied as is) and no exception results from any value. These instructions can specify the odd-numbered FP registers:

Between integer and FP registers:

mtc1 rs, fd fd = rs; /* 32b uninterpreted */

mfc1 rd, fs rd = fs;

dmtc1 rs, fd fd = (long long) rs; /* 64 bits */

dmfc1 rs, fd rs = (long long) fd;
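The "uninterpreted" nature of these moves can be sketched in C as a plain bit copy. The function names mtc1_model and mfc1_model are made up for this illustration, and the sketch assumes the host stores float in 32-bit IEEE single format.

```c
#include <stdint.h>
#include <string.h>

/* Model of mtc1: copy 32 bits from an integer register into an FP
 * register without interpreting them as a number. */
static float mtc1_model(uint32_t rs)
{
    float fd;
    memcpy(&fd, &rs, sizeof fd);
    return fd;
}

/* Model of mfc1: copy the 32-bit pattern of an FP register into an
 * integer register, again with no conversion. */
static uint32_t mfc1_model(float fs)
{
    uint32_t rd;
    memcpy(&rd, &fs, sizeof rd);
    return rd;
}
```

For instance, moving 1.0f to an integer register yields the pattern 0x3F800000 rather than the integer 1; getting the integer 1 would need a separate conversion instruction.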

Between FP registers:

mov.d fd, fs fd = fs; /* move 64b between register pairs */

mov.s fd, fs fd = fs; /* 32b between registers */


Conditional moves (added in MIPS IV) — the .s versions are omitted to save space:

movt.d fd, fs, cc if(fpcondition(cc)) fd = fs;

movf.d fd, fs, cc if(!fpcondition(cc)) fd = fs;

movz.d fd, fs, rt if(rt == 0) fd = fs; /* rt is an integer register */

movn.d fd, fs, rt if(rt != 0) fd = fs;

The FP condition code called fpcondition(cc) is a hard-to-avoid forward reference; you’ll see more in Section 7.9.7. If you want to know why conditional move instructions are useful, see Section 8.4.3.

7.9.3 Three-Operand Arithmetic Operations

Note the following points:

• All arithmetic operations can cause any IEEE exception type and may result in an unimplemented trap if the hardware is not happy with the operands.

• All these instructions come in single-precision (32-bit, C float) and double-precision (64-bit, C double) versions; the instructions are distinguished by “.s” or “.d” on the op-code. We’ll only show the double-precision version. Note that you can’t mix formats; both source values and the result will all be either single or double. To mix singles and doubles you need to use explicit conversion operations.

In all ISA versions:

add.d fd, fs1, fs2 fd = fs1 + fs2;

div.d fd, fs1, fs2 fd = fs1 / fs2;

mul.d fd, fs1, fs2 fd = fs1 * fs2;

sub.d fd, fs1, fs2 fd = fs1 - fs2;

Added in MIPS II:

sqrt.d fd, fs fd = squarerootof(fs);

Added in MIPS IV for speed, and not IEEE accurate:

recip.d fd, fs fd = 1/fs;

rsqrt.d fd, fs fd = 1/(squarerootof(fs));
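The rule that sources and result share one format has a visible effect on results. The following C sketch models add.s, add.d, and the explicit single-to-double conversion with hypothetical functions add_s, add_d, and cvt_d_s; it assumes a host with IEEE single and double arithmetic, and uses volatile only to force the single-precision result to be rounded to 32 bits.

```c
/* Model of add.s: both sources and the result are single precision. */
static float add_s(float fs1, float fs2)
{
    volatile float fd = fs1 + fs2; /* volatile forces rounding to single */
    return fd;
}

/* Model of add.d: both sources and the result are double precision. */
static double add_d(double fs1, double fs2)
{
    return fs1 + fs2;
}

/* Model of cvt.d.s: the explicit conversion needed before single values
 * can take part in a double-precision operation. */
static double cvt_d_s(float fs)
{
    return (double)fs;
}
```

With these models, add_s(16777216.0f, 1.0f) stays at 16777216.0f because the exact sum does not fit in a single, while converting both operands with cvt_d_s and then using add_d gives 16777217.0.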

7.9.4 Multiply-Add Operations

These appeared in the MIPS IV version of the ISA, in response to Silicon Graphics’s interest in achieving supercomputer-like performance in very high-end graphics systems (related to the 1995 SGI acquisition of Cray Research, Inc.). IBM’s PowerPC chips seemed to get lots of FP performance out of their multiply-add, too. Although it’s against RISC principles to have a single instruction doing two jobs, a combined multiply-add is widely used in common repetitive FP operations (typically the manipulation of matrices or vectors).

Moreover, it saves a significant amount of time by avoiding the intermediate rounding and renormalization step that IEEE mandates when a result gets written back into a register.

Multiply-add comes in various forms, all of which take three register operands and an independent result register:

madd.d fd, fs1, fs2, fs3 fd = fs2 * fs3 + fs1;

msub.d fd, fs1, fs2, fs3 fd = fs2 * fs3 - fs1;

nmadd.d fd, fs1, fs2, fs3 fd = -(fs2 * fs3 + fs1);

nmsub.d fd, fs1, fs2, fs3 fd = -(fs2 * fs3 - fs1);

IEEE754 does not rule specifically for multiply-add operations, but to conform to the standard the result produced should be identical to that coming out of a two-instruction multiply-then-add sequence. Since every FP operation may involve some rounding, this means that IEEE754 mandates somewhat poorer precision for multiply-add than could be achieved. The MIPS R8000 supercomputer chip set falls into this trap, and its multiply-add instructions do not meet (but exceed) the accuracy prescribed by IEEE. The R10000 and all subsequent implementations are IEEE compatible.
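The difference between the two behaviors can be made concrete with a small C sketch in single precision. The function names are hypothetical. The "fused" variant is only an emulation: it relies on the fact that the product of two singles is exact in a double, and the final add-then-narrow step can still round twice for some inputs, so it should not be read as a model of any particular chip.

```c
/* Two-instruction sequence (mul.s then add.s): the product is rounded to
 * single precision before the add, and the sum is rounded again. */
static float mul_then_add_s(float fs1, float fs2, float fs3)
{
    volatile float product = fs2 * fs3; /* first rounding  */
    volatile float sum = product + fs1; /* second rounding */
    return sum;
}

/* Emulated multiply-add with no intermediate rounding of the product:
 * two 24-bit significands multiply exactly within a double's 53 bits. */
static float fused_madd_s(float fs1, float fs2, float fs3)
{
    double exact_product = (double)fs2 * (double)fs3;
    return (float)(exact_product + (double)fs1);
}
```

Taking fs2 = 1 + 2^-13, fs3 = 1 - 2^-13, and fs1 = -1, the exact product is 1 - 2^-26. The two-step version rounds that product to 1.0 and returns 0, while the emulated fused version returns -2^-26.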

7.9.5 Unary (Sign-Changing) Operations

Although nominally arithmetic functions, these operations only change the sign bit and so can’t produce most IEEE exceptions. They can produce an invalid trap if fed with a signalling NaN value. They are as follows:

abs.d fd, fs fd = abs(fs);

neg.d fd, fs fd = -fs;
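Since these operations touch only the sign bit, they can be sketched in C as bit manipulation on the 64-bit pattern. The helper names below are invented for this example, the code assumes the host uses IEEE double format, and it does not model the invalid trap on signalling NaNs.

```c
#include <stdint.h>
#include <string.h>

#define SIGN_BIT (UINT64_C(1) << 63) /* top bit of an IEEE double */

/* Raw 64-bit pattern of a double. */
static uint64_t bits_of(double x)
{
    uint64_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

/* Double with the given raw 64-bit pattern. */
static double double_of(uint64_t b)
{
    double x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* Model of neg.d: flip the sign bit, leave everything else alone. */
static double neg_d(double fs)
{
    return double_of(bits_of(fs) ^ SIGN_BIT);
}

/* Model of abs.d: clear the sign bit, leave everything else alone. */
static double abs_d(double fs)
{
    return double_of(bits_of(fs) & ~SIGN_BIT);
}
```

Note that neg_d(0.0) produces negative zero, whose pattern has only the sign bit set.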

7.9.6 Conversion Operations

Note that “convert from single to double” is written “cvt.d.s”; as usual, the destination is specified first. Conversion operators work between data in the FP registers: when converting data from CPU integer registers, the move between CPU and FP registers must be coded separately from the conversion operation. Conversion operations can result in any IEEE exception that makes sense in the context.

Originally, all this was done by the one family of instructions

cvt.x.y fd, fs

where x and y specify the destination and source format, respectively, as one of the following:

s C float, IEEE single, 32-bit floating point

d C double, IEEE double, 64-bit floating point

w C int, “word”, 32-bit integer

l C long, “long”, 64-bit integer (available in MIPS III and higher CPUs only)

The instructions are as follows:

cvt.s.d fd, fs /* double fs -> float, leave in fd */

cvt.w.s fd, fs /* float fs -> int, leave in fd */

cvt.d.l fd, fs /* long long fs -> double, leave in fd */
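Because conversions only operate on FP registers, turning an integer held in a CPU register into a float takes two steps: an uninterpreted move, then a conversion. The C sketch below models this with a hypothetical type fpreg32 standing for the raw contents of a 32-bit FP register, and hypothetical functions mtc1_model and cvt_s_w_model; it assumes a host with 32-bit IEEE floats.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t fpreg32; /* raw 32-bit contents of an FP register */

/* Step 1, model of mtc1: the integer's bits arrive unconverted. */
static fpreg32 mtc1_model(int32_t rs)
{
    fpreg32 fd;
    memcpy(&fd, &rs, sizeof fd);
    return fd;
}

/* Step 2, model of cvt.s.w: treat the register contents as a 32-bit
 * integer and replace them with the equivalent single-precision value. */
static fpreg32 cvt_s_w_model(fpreg32 fs)
{
    int32_t word;
    float converted;
    fpreg32 fd;

    memcpy(&word, &fs, sizeof word);
    converted = (float)word;
    memcpy(&fd, &converted, sizeof fd);
    return fd;
}

/* Read a register's contents as a C float, for inspection only. */
static float as_float(fpreg32 reg)
{
    float f;
    memcpy(&f, &reg, sizeof f);
    return f;
}
```

Skipping the second step leaves the integer bit pattern in the register: as_float(mtc1_model(1)) is a tiny denormal, not 1.0f.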

There’s more than one reasonable way of converting from floating-point to integer formats, and the result depends on the current rounding mode (as set up in the FCR31 register, described in Section 7.7). But FP calculations quite often want to round to the integer explicitly (for example, the ceiling operator rounds upward), and it’s a nuisance trying to generate code to modify and restore FCR31. So at MIPS II, explicit rounding conversions were introduced.

Conversions to integer with explicit rounding:

round.x.y fd, fs /* round to nearest */

trunc.x.y fd, fs /* round toward zero */

ceil.x.y fd, fs /* round up */

floor.x.y fd, fs /* round down */

These instructions are only valid with x representing an integer format.
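The four rounding behaviors can be compared with a short C sketch for the double-to-word case. The function names are hypothetical, chosen to echo the mnemonics. The sketch assumes that "round to nearest" breaks ties toward the even integer, as the IEEE default nearest mode does, and it is only meaningful for inputs that are not NaN and whose result fits in an int; the exceptions raised by real hardware are not modeled.

```c
/* Model of trunc.w.d: round toward zero (a C cast does exactly this). */
static int trunc_w_d(double fs)
{
    return (int)fs;
}

/* Model of floor.w.d: round down, toward minus infinity. */
static int floor_w_d(double fs)
{
    int t = (int)fs;
    return (fs < (double)t) ? t - 1 : t;
}

/* Model of ceil.w.d: round up, toward plus infinity. */
static int ceil_w_d(double fs)
{
    int t = (int)fs;
    return (fs > (double)t) ? t + 1 : t;
}

/* Model of round.w.d: round to nearest, ties to the even integer
 * (an assumption of this sketch). */
static int round_w_d(double fs)
{
    int lower = floor_w_d(fs);
    double frac = fs - (double)lower; /* in [0, 1) */

    if (frac > 0.5)
        return lower + 1;
    if (frac < 0.5)
        return lower;
    return (lower & 1) ? lower + 1 : lower; /* exact tie: pick even */
}
```

For an input of -2.5 the four models give -2 (round), -2 (trunc), -2 (ceil), and -3 (floor); for 2.5 they give 2, 2, 3, and 2.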
