Software Exception Handling - Unmasked Exceptions

Most masked exceptions in Streaming SIMD Extensions are handled by hardware without penalty; the exceptions are denormals and underflow. These, too, can be handled without penalty if flush-to-zero mode is used.
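As a minimal sketch of enabling flush-to-zero mode: the flush-to-zero control is bit 15 of the MXCSR. The helper names below are hypothetical, and the live-register update assumes a compiler that provides the `_mm_getcsr`/`_mm_setcsr` intrinsics in `<xmmintrin.h>`.

```c
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* MXCSR bit 15 is the flush-to-zero (FZ) control bit. */
#define MXCSR_FLUSH_TO_ZERO 0x8000u

/* Hypothetical helper: return an MXCSR image with flush-to-zero enabled. */
uint32_t mxcsr_with_flush_to_zero(uint32_t mxcsr)
{
    return mxcsr | MXCSR_FLUSH_TO_ZERO;
}

/* Hypothetical helper: apply the setting to the live register.
   Does nothing when the compiler does not target SSE. */
void enable_flush_to_zero(void)
{
#if defined(__SSE__)
    _mm_setcsr(mxcsr_with_flush_to_zero(_mm_getcsr()));
#endif
}
```

Calling `enable_flush_to_zero()` once at start-up, rather than around each computation, keeps the number of MXCSR writes low.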

Your application must ensure that the operating system supports unmasked exceptions before unmasking any of the exceptions in the MXCSR (see “Checking for Processor Support of Streaming SIMD Extensions and MMX™ Technology” in Chapter 3).

If the processor detects a condition for an unmasked Streaming SIMD Extensions application exception, a software handler is invoked immediately at the end of the excepting instruction. The handler is invoked through the Streaming SIMD Extensions exception interrupt (vector 19), irrespective of the state of the CR0.NE flag. If an exception is unmasked but Streaming SIMD Extensions unmasked exceptions are not enabled (CR4.OSXMMEXCPT = 0), an invalid opcode fault is generated instead. However, the corresponding exception bit is still set in the MXCSR, as it would be if CR4.OSXMMEXCPT = 1, because the invalid opcode handler or the user needs to determine the cause of the exception.
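To illustrate how a handler might determine the cause from the MXCSR, the following sketch isolates the exception flags whose mask bits are clear. It relies on the MXCSR layout (flags in bits 0-5, masks in bits 7-12, in the same order); the function name is hypothetical.

```c
#include <stdint.h>

#define MXCSR_FLAG_BITS  0x003Fu  /* bits 0-5: IE, DE, ZE, OE, UE, PE */
#define MXCSR_MASK_SHIFT 7        /* bits 7-12: IM, DM, ZM, OM, UM, PM */

/* Hypothetical helper: return the flags that are set while their
   corresponding mask bit is clear, that is, the unmasked causes. */
uint32_t mxcsr_unmasked_pending(uint32_t mxcsr)
{
    uint32_t flags = mxcsr & MXCSR_FLAG_BITS;
    uint32_t masks = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAG_BITS;
    return flags & ~masks;
}
```

For example, an MXCSR image of 0x1D84 (divide-by-zero unmasked, ZE flag set) yields 0x04, while the same flag with all exceptions masked (0x1F84) yields 0.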

A typical action of the exception handler is to store x87 floating-point and Streaming SIMD Extensions state information in memory (with the fxsave/fxrstor instructions) so that it can evaluate the exception and formulate an appropriate response. Other typical exception handler actions can include:

• Examining stored x87 floating-point and Streaming SIMD Extensions state information to determine the nature of the error.

5

Intel Architecture Optimization Reference Manual

• Taking action to correct the condition that caused the error.

• Clearing the exception bits in the x87 floating-point status word (FSW) or the Streaming SIMD Extensions control register (MXCSR).

• Returning to the interrupted program and resuming normal execution.
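The step of clearing the exception bits in the MXCSR can be sketched as follows. The helper names are hypothetical, and the live-register variant assumes the `_mm_getcsr`/`_mm_setcsr` intrinsics are available; it is an illustration of the bit manipulation, not a complete handler.

```c
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

#define MXCSR_FLAG_BITS 0x003Fu  /* bits 0-5: IE, DE, ZE, OE, UE, PE */

/* Hypothetical helper: clear the six exception flags and leave the
   masks, rounding control, and flush-to-zero bit untouched. */
uint32_t mxcsr_clear_flags(uint32_t mxcsr)
{
    return mxcsr & ~MXCSR_FLAG_BITS;
}

/* Hypothetical helper: clear the flags in the live register.
   Does nothing when the compiler does not target SSE. */
void clear_simd_exception_flags(void)
{
#if defined(__SSE__)
    _mm_setcsr(mxcsr_clear_flags(_mm_getcsr()));
#endif
}
```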

In lieu of writing recovery procedures, the exception handler can do the following:

• Increment an exception counter in software for later display or printing.

• Print or display diagnostic information (such as the Streaming SIMD Extensions register state).

• Halt further program execution.
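A software exception counter of the kind mentioned above might look like the sketch below, which keeps one counter per MXCSR exception flag. The array and function names are hypothetical, and how the handler obtains the MXCSR image is left out.

```c
#include <stdint.h>

#define SIMD_EXCEPTION_KINDS 6  /* IE, DE, ZE, OE, UE, PE in bits 0-5 */

/* Hypothetical per-exception counters, indexed by MXCSR flag bit. */
unsigned long simd_exception_counts[SIMD_EXCEPTION_KINDS];

/* Hypothetical helper: increment the counter of every flag that is
   set in the given MXCSR image. */
void count_simd_exceptions(uint32_t mxcsr)
{
    for (int bit = 0; bit < SIMD_EXCEPTION_KINDS; ++bit) {
        if (mxcsr & (1u << bit)) {
            ++simd_exception_counts[bit];
        }
    }
}
```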

When an unmasked exception occurs, the processor will not alter the contents of the source register operands prior to invoking the unmasked handler. Similarly, the integer EFLAGS will also not be modified if an unmasked exception occurs while executing the comiss or ucomiss instructions. Exception flags will be updated according to the following rules:

• Exception flag updates are generated by a logical-OR of exception conditions for all sub-operand computations, where the OR is done independently for each type of exception. For packed computations, this means four sub-operands; for scalar computations, this means one sub-operand (the lowest one).

• In the case of only masked exception conditions, all flags will be updated.

• In the case of an unmasked pre-computation type of exception condition (that is, denormal input), all flags relating to all pre-computation conditions (masked or unmasked) will be updated, and no subsequent computation is performed (that is, no post-computation condition can occur if there is an unmasked pre-computation condition).

• In the case of an unmasked post-computation exception condition, all flags relating to all post-computation conditions (masked or unmasked) will be updated; all pre-computation conditions, which must be masked, will also be reported.
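The rules above can be modeled in software as a sketch. The model assumes that invalid operation, denormal operand, and divide-by-zero are the pre-computation conditions and that overflow, underflow, and precision are the post-computation conditions; this split and the function name are assumptions for illustration, and the model only predicts the flag bits, not the handler invocation.

```c
#include <stddef.h>
#include <stdint.h>

#define MXCSR_FLAG_BITS  0x003Fu  /* bits 0-5: IE, DE, ZE, OE, UE, PE */
#define MXCSR_MASK_SHIFT 7        /* bits 7-12: matching mask bits */
#define PRE_COMPUTATION  0x0007u  /* IE | DE | ZE (assumed split) */

/* Hypothetical model: conditions[i] holds the exception conditions that
   sub-operand i would raise (4 entries for packed, 1 for scalar).
   Returns the MXCSR image after the flag update. */
uint32_t model_flag_update(uint32_t mxcsr, const uint32_t *conditions,
                           size_t count)
{
    uint32_t masks = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAG_BITS;
    uint32_t raised = 0;

    /* Logical OR across sub-operands, independently per exception type. */
    for (size_t i = 0; i < count; ++i) {
        raised |= conditions[i] & MXCSR_FLAG_BITS;
    }

    uint32_t pre = raised & PRE_COMPUTATION;
    if (pre & ~masks) {
        /* Unmasked pre-computation condition: only pre-computation flags
           are updated, because no computation takes place. */
        return mxcsr | pre;
    }

    /* Only masked conditions, or an unmasked post-computation condition:
       every raised flag is reported. */
    return mxcsr | raised;
}
```

With the denormal exception unmasked (MXCSR image 0x1E80) and sub-operand conditions {DE, PE, none, OE}, the model sets only DE, giving 0x1E82; with all exceptions masked (0x1F80), the same inputs give 0x1FAA.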

Optimizing Floating-point Applications

Interaction with x87 Numeric Exceptions

The Streaming SIMD Extensions control/status register was separated from its x87 floating-point counterparts to allow for maximum flexibility.

Consequently, the Streaming SIMD Extensions architecture is independent of the x87 floating-point architecture, but has the following implications for x87 floating-point applications that call Streaming SIMD Extensions-enabled libraries:

• The x87 floating-point rounding mode specified in FCW will not apply to calls in a Streaming SIMD Extensions library, unless the rounding control in MXCSR is explicitly set to the same mode.

• x87 floating-point exception observability may not apply to a Streaming SIMD Extensions library.

• An application that expects to catch x87 floating-point exceptions that occur in an x87 floating-point library will not be notified if an exception occurs in a corresponding Streaming SIMD Extensions library, unless the exception masks, enabled in FCW, have also been enabled in MXCSR.

• An application will not be able to unmask exceptions after returning from a Streaming SIMD Extensions library call to detect if an error occurred. A Streaming SIMD Extensions exception flag that is already set at the moment the corresponding exception is unmasked will not generate a fault; only the next occurrence of that exception will generate an unmasked fault.

NOTE. In certain cases, if any numerical exception is unmasked, the retirement rate might be reduced. This might happen when Streaming SIMD Extensions code is scheduled with few dependency stalls and with the intention of reaching the maximum execution rate. Usually such code consists of balanced operations such as packed floating-point multiply, add, and load or store (or a mix that balances two arithmetic operations per load or store with MMX technology or integer instructions).


• An application that checks FSW to determine if any masked exception flags were set during an x87 floating-point library call will also need to check MXCSR in order to observe a similar occurrence of a masked exception within a Streaming SIMD Extensions library.
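The implications listed above can be illustrated with a sketch that mirrors the FCW rounding control and exception masks into an MXCSR image, and that combines the exception flags of FSW and MXCSR. It relies on the register layouts (FCW masks in bits 0-5 and rounding control in bits 10-11; MXCSR masks in bits 7-12 and rounding control in bits 13-14; FSW and MXCSR flags both in bits 0-5). The function names are hypothetical, and reading the live FCW or FSW is not shown.

```c
#include <stdint.h>

#define MXCSR_FLAG_BITS     0x003Fu  /* bits 0-5: IE, DE, ZE, OE, UE, PE */
#define MXCSR_MASK_BITS     0x1F80u  /* bits 7-12: exception masks */
#define MXCSR_ROUNDING_BITS 0x6000u  /* bits 13-14: rounding control */

/* Hypothetical helper: copy the rounding mode and exception masks of an
   x87 control word into an MXCSR image; other MXCSR bits are preserved. */
uint32_t mxcsr_mirror_fcw(uint32_t mxcsr, uint16_t fcw)
{
    uint32_t rounding = ((uint32_t)fcw >> 10) & 0x3u;  /* FCW bits 10-11 */
    uint32_t masks = (uint32_t)fcw & 0x3Fu;            /* FCW bits 0-5 */

    mxcsr &= ~(MXCSR_ROUNDING_BITS | MXCSR_MASK_BITS);
    return mxcsr | (rounding << 13) | (masks << 7);
}

/* Hypothetical helper: exception flags raised in either the x87 status
   word or the MXCSR. */
uint32_t combined_exception_flags(uint16_t fsw, uint32_t mxcsr)
{
    return ((uint32_t)fsw | mxcsr) & MXCSR_FLAG_BITS;
}
```

For instance, mirroring an FCW of 0x0F7F (truncate rounding, all exceptions masked) into the default MXCSR image 0x1F80 gives 0x7F80.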

Use of CVTTPS2PI/CVTTSS2SI Instructions

The cvttps2pi and cvttss2si instructions encode the truncate/chop rounding mode implicitly in the instruction, thereby taking precedence over the rounding mode specified in the MXCSR register. This behavior can eliminate the need to change the rounding mode from round-nearest, to truncate/chop, and then back to round-nearest to resume computation.

Frequent changes to the MXCSR register should be avoided since there is a penalty associated with writing this register; typically, through the use of the cvttps2pi and cvttss2si instructions, the rounding control in MXCSR can always be set to round-nearest.
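As a minimal sketch of truncating without touching the MXCSR, the wrapper below uses the `_mm_cvttss_si32` intrinsic, which corresponds to cvttss2si, when the compiler targets SSE. The wrapper name is hypothetical, and the fallback to a C cast (which also truncates toward zero) is an assumption added so the example builds elsewhere.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical wrapper: convert a float to int with truncation,
   independent of the rounding control currently set in MXCSR. */
int truncate_to_int(float x)
{
#if defined(__SSE__)
    return _mm_cvttss_si32(_mm_set_ss(x));  /* cvttss2si */
#else
    return (int)x;  /* C conversion truncates toward zero */
#endif
}
```

Because the truncation is encoded in the instruction, the rounding control can stay at round-nearest for the surrounding computation.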

From the document Intel Architecture Optimization (pages 175-178)