• Aucun résultat trouvé

Worst Cases for Correct Rounding of the Elementary Functions in Double Precision

N/A
N/A
Protected

Academic year: 2021

Partager "Worst Cases for Correct Rounding of the Elementary Functions in Double Precision"

Copied!
23
0
0

Texte intégral

(1)Worst Cases for Correct Rounding of the Elementary Functions in Double Precision Vincent Lefèvre, Jean-Michel Muller. To cite this version: Vincent Lefèvre, Jean-Michel Muller. Worst Cases for Correct Rounding of the Elementary Functions in Double Precision. [Research Report] RR-4044, LIP RR-2000-35, INRIA,LIP. 2000. �inria-00072594�. HAL Id: inria-00072594 https://hal.inria.fr/inria-00072594 Submitted on 24 May 2006. HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.. L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés..

(2) INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE. Worst Cases for Correct Rounding of the Elementary Functions in Double Precision Vincent Lef`evre , Jean-Michel Muller. No 4044 Novembre 2000. ISSN 0249-6399. ISRN INRIA/RR--4044--FR+ENG. ` THEME 2. apport de recherche.

(3)

(4) Worst Cases for Correct Rounding of the Elementary Functions in Double Precision . Vincent Lefèvre , Jean-Michel Muller Thème 2 — Génie logiciel et calcul symbolique Projet Arénaire Rapport de recherche n˚4044 — Novembre 2000 — 19 pages. Abstract: We give here the results of a four-year search for the worst cases for correct rounding of the major elementary functions in double precision. These results allow the design of reasonably fast routines that will compute these functions with correct rounding, at least in some interval, for any of the four rounding modes specified by the IEEE-754 standard. They will also allow to easily test libraries that are claimed to provide correctly rounded functions. Key-words: Elementary functions, Computer arithmetic, Table Maker’s Dilemma, Correct rounding, Floating-Point Arithmetic.. (Résumé : tsvp). . . INRIA, Projet POLKA, UR INRIA Lorraine CNRS, Projet CNRS/ENS Lyon/INRIA ARENAIRE, Ecole Normale Supérieure de Lyon. Unit´e de recherche INRIA Rhˆone-Alpes 655, avenue de l’Europe, 38330 MONTBONNOT ST MARTIN (France) T´el´ephone : 04 76 61 52 00 - International: +33 4 76 61 52 00 T´el´ecopie : 04 76 61 52 52 - International: +33 4 76 61 52 52.

(5) Pires Cas pour l’Arrondi Correct de Fonctions Élémentaires en Double Précision Résumé : Nous donnons dans ce rapport les résultats de quatre ans de recherche des pires cas pour l’arrondi correct des principales fonctions élémentaires en double précision. Ces résultats permettent de construire des programmes raisonnablement rapides calculant ces fonctions avec arrondi correct – au moins dans un domaine donné – pour chacun des quatre modes d’arrondi spécifiés par la norme IEEE-754. Ils permettent également de tester des bibliothèques censées fournir l’arrondi correct de ces fonctions. Mots-clé : Fonctions élémentaires, arithmétique des ordinateurs, Dilemme du fabricant de tables, Arrondi correct, Virgule flottante..

(6) Worst Cases for Correct Rounding of the Elementary Functions. 3. 1 Introduction In general, the result of an arithmetic operation on two floating-point numbers is not exactly representable in the same floating-point format: it must be rounded. In a floating-point system that follows the IEEE 754 standard [2, 3, 6], the user can choose an active rounding  mode from: rounding towards  , rounding towards  , rounding towards  and rounding to the nearest (with a special convention if  is exactly between two machine numbers). The IEEE-754 standard requires that the system should behave as if the result of an arith metic operation ( , ,  ,  ) were first computed exactly, with “infinite precision”, and then rounded accordingly to the active rounding mode. Operations that satisfy this property are called “exactly rounded” or “correctly rounded”. There is a similar requirement for the square root. Unfortunately, there is no such requirement for the elementary functions 1 , probably because it has been believed for many years that exact rounding of the elementary functions would be much too expensive for double precision (for single precision, since checking  . input numbers requires a few days only, there already exist libraries that provide correct rounding. See for instance [13, 14]). Requiring correctly rounded results (that is, the “best possible” results) in some standard would not only improve the accuracy of computations: it would help to make numerical software more portable. Moreover, as noticed by Agarwal et al. [1], correct rounding facilitates the preservation of useful mathematical properties such as monotonicity, symmetry and important identities. See [12] for more details. Before going further, let us start with definitions. We call Infinite mantissa of a nonzero real number  the number.  

(7)    . In other words, the infinite mantissa of  is the real number  such that  !"$#% and '&(  )+* , where , is an integer. If  is a floating-point number, then its infinite mantissa coincides with its floating-point mantissa. If - and . belong to the same “binade” (they have the same sign and satisfy 0/132 -426572 .829:;/0<>= , where ? is an integer), we call their Mantissa distance the distance between their infinite mantissas (that is, 2 -@A.82B+ / ). Let C be an elementary function and  a floating-point number. Unless  is a very special case – e.g., DFE8G HI0J or KMLON4HPQJ –, RS&TCUHVWJ cannot be exactly represented. The only thing we can do is to compute an approximation R to R . If we wish to provide correctly rounded functions, the problem is to know what the accuracy of this approximation should be to make sure that rounding R is equivalent to rounding R . 1 By elementary functions we mean the radix X , Y and and hyperbolic functions.. RR n˚4044. ZI[. logarithms and exponentials, and the trigonometric.

(8) 4. Vincent Lefèvre and Jean-Michel Muller. In other words, from R and the known bounds on the approximation, the only information we have is that R belongs to some interval . Let us call the rounding function. Let us call a “breakpoint” a value where the rounding changes (that is, if = and are # # real numbers satisfying then "H J # "H J ). For “directed” rounding modes = =. (i.e., rounding upwards, downwards, or towards  ), the breakpoints are the floating-point numbers. For rounding to the nearest mode, the breakpoints are the exact middle of two consecutive floating-point numbers. If contains a breakpoint, then we cannot provide "HVR J : the computation must be carried again with a larger accuracy. There are two ways of solving that problem:.   . .    . . . . .  . iteratively increase the accuracy of the approximation, until interval no longer contains a breakpoint2 . The problem is that it is difficult to predict how many iterations will be necessary; compute, in advance and once for all, the smallest relative nonzero distance 3 between the image of a floating-point number and breakpoint. This will allow to deduce the relative accuracy with which CUHVWJ must be approximated to make sure that rounding the approximation is equivalent to rounding the exact result.. 

(9) . The first solution was suggested by Ziv [15]. It has been implemented in a library, called ml4j, available through the internet 4 . As a matter of fact, the last iteration uses bits of precision. There is no formal proof that this suffices (the results presented in this paper actually give the proof for the functions and domains considered here!), but probabilistic arguments[4, 5, 12] may be used to show that requiring a larger precision is extremely unlikely. We decided to implement the second solution, since the only way to implement the first one is to overestimate the accuracy that is needed in the worst cases. The basic principle of our algorithm for searching the worst cases was outlined in [10, 11]. We now present properties that have allowed to fasten the search, as well as the results obtained after having run our algorithms for years on several workstations, and interesting properties that can be obtained from our results..    ! "$#% '# &(,)"$#' * '+ -.%/0213 %+ + .   -.. This is not possible if is equal to a breakpoint. However one can show that [ is the only input value for which , , , and Y have a finite radix-2 representation – and Z is the only input value for which the breakpoints do have finite representations –, has a finite representation. Concerning X and ZI[ , they have a finite representation if and only if is an integer. Also, (resp. ) have a finite representation if and only if is an integer power of X (resp. ZI[ ). All  these cases are straightforwardly handled separately, so we do not discuss them in the sequel of the paper. 3 In fact, the mantissa distance. 4 http://www.alphaWorks.ibm.com/tech/mathlibrary4java. 2. -%/ . INRIA.

(10) Worst Cases for Correct Rounding of the Elementary Functions. 5. The results we have obtained are “worst cases for the Table Maker’s Dilemma”, that is, floating point numbers whose image is closest (for the “mantissa distance”) to a floatingpoint number (i.e., a breakpoint for a directed rounding mode) or to the exact middle of two consecutive floating-point numbers (i.e., a breakpoint for rounding to the nearest). For instance, the worst case for the natural logarithm in the full double precision range is reached for.        

(11)      

(12)              

(13)          !" "   " # " " $$"%& '() in decimal. whose logarithm is. ./ bits 0 *,+-    

(14)    

(15)  

(16)      13 2     65 5357  4  2         .  4:80 5 5359      1    ,;  zeroes. this worst case is a “difficult case” in a directed rounding mode, since it is very near a double precision floating-point number. The two worst cases for radix-2 exponentials in the full IEEE-754 double precision range are reached for. =<>?, 

(17)               

(18)     .      

(19) >@A'  #  $

(20)   B "! # ""! "  #CD  @A in decimal. and. ' E ;    

(21)  

(22)    

(23)   

(24)  

(25)     

(26)    

(27)  8  @< (    

(28)  $"B# # "B "!$ "    " !  B"  in decimal. whose radix-2 exponentials are. ./ bits 0  FGH         13 2    

(29) 

(30)   H5 557 4 2            4:0  85 5359     1    ;, ( zeroes./ bits F I> 0         13 2    655 5J 4 82        6435 0 55K       1   ,, .L ones. The former is a difficult case for directed rounding modes, whereas the latter is a difficult case for rounding to the nearest mode (since it is very close to the exact middle of two consecutive floating-point numbers).. RR n˚4044.

(31) 6. Vincent Lefèvre and Jean-Michel Muller. 2 Our algorithms for finding the worst cases 2.1 Basic principles The basic principles have been given in [11], so we only quickly describe them and focus on new aspects. Assume we wish to look for the worst cases for function C in double precision. Let us call test number a number that is representable in floating-point with 54 bits of mantissa (a test number is either a double precision number or a number that lies exactly halfway between two consecutive double precision numbers). The test numbers are the values that are breakpoints for one of the rounding modes. Our problem of finding worst cases now reduces to the problem of finding double precision numbers  such that CUHV4J is closest (for the mantissa distance) to a test number. We proceed in two steps: we first use a fast “filtering” method that eliminates all points whose distance to the closest breakpoint is above a given threshold. The value of the threshold is chosen so that this filtering method does not require highly accurate computations, and so that the number of values that remain to be checked after the filtering is so small that an accurate computation of the value of the function at each remaining value is possible. Details on the choice of parameters are given in [9]. In [11], we suggested to perform the filtering as follows:.  . first, the domain where we look for worst cases is split into “large subdomains” where all input values have the same exponent;. . each large subdomain is split into “small subdomains” that are small enough so that in each of these subdomains, within the accuracy of the filtering, the function can be approximated by a linear function. Hence in each small subdomain, our problem now is to find a point on a grid that is closest to a straight line. We solve a slightly different problem: given a “threshold” we just try to know if there can be a point of the grid that is at distance less than from the straight line. The value of is chosen so that for one given small subdomain this event is very unlikely. using a variant to the Euclidean algorithm suggested by V. Lefèvre [8], we solve that problem. If we find that there can be a point of the grid at distance less than from the straight line, we check all points of the small subdomain.. 2.2 Optimization:. . and. . simultaneously. Now, let us present a useful optimization of that method. Instead of finding double precision numbers  such that CUHVWJ is closest (for the mantissa distance) to a test number, we solve a slightly different problem: we look for test numbers  such that CUHVWJ is closest to a test. INRIA.

(32) Worst Cases for Correct Rounding of the Elementary Functions. 7. number. This allows to compute worst cases for C and for its inverse C = in one pass only (since the image CUHP- J of a breakpoint - is very near a breakpoint . if and only if C = H . J is very near - ). One could object that by checking the images of test numbers 5 instead of checking the double precision numbers only, we double the number of points that are examined. So getting in one pass the results for two functions ( C and C = ) seems to be a no-win no-loss operation. This is not quite true, since there are sometimes much less values to check for one of the two functions than for the other one.  with input domain &   Consider as an example the radix- exponential and logarithm, +5  for  (which corresponds to input domain %& 0B+ 5  for DFE8G HVR J .). The two. following strategies would lead to the same final result: they would give the worst cases for    in and for DFE8G HVR J in  .. . 1. check   for every test number  in ; 2. check DFE8G. HVR J for every test number R in  .. If we use the first strategy, we need to check all test numbers of exponent between 6 . and  . Hence we have to check    

(33)  numbers. If we use the second strategy, we need to check all positive test numbers of exponent equal to  or  , that is, )1

(34)  numbers. The second strategy is approximately  times faster than the first one.  If we separately check all double precision numbers in and all double precision num )S

(35) . The second strategy is approximately  times bers in  , we check   1

(36) . faster than this last method. Hence,  in the considered domain, it is much better to check DFE8G HVR J for every test number R in 0B+ 5  . In other domains, the converse holds: since the exponential of a large number is an overflow, when we want to check both functions in the domain defined by   (for   ) or R  (for DFE8G HVR J ), we only have to consider   values of the exponent. (hence  @ 

(37)  test numbers) if we check   for every test number in the domain, whereas we would have to consider  8 values of the exponent (hence  8  

(38)  test numbers) if we decided to check DFE8G HV4J for the test numbers in the corresponding domain.. The decision whether it is better to base our search for worst cases on the examination   of C in a given domain or C = in the corresponding domain  & CUH J can be helped by examining the value of HVWJ & 2 S C  HVWJ B+CUHV4J 2 in the considered domain. If  HV4J is. . . . 5. At the end of the test, we will suppress the values for which the input number is not a double precision number and the output number is close to a test number that is not a double precision number. These values (statistically, Z of the obtained values) do not correspond to a worst case for any of the rounding modes and ). any function ( or 6 For smaller numbers, there is no longer any problem of implementation: their radix-2 exponential is Z or Z SZ PZ X or Z SZ PZ depending on their sign and the rounding mode..   0  ! " #*   $  . . RR n˚4044. &%' " #*. .

(39) 8. Vincent Lefèvre and Jean-Michel Muller. Table 1: Some results for small values in double precision, assuming rounding to the nearest. These results make finding worst cases useless for negative exponents of large absolute value. This function  . can be replaced by. H J5   H J5   K LON H J 5

(40)  K LONH J ;EK H J   8N>H J 5

(41)  8N9H J.   . when. # 

(42)  2 2  

(43)  2 2    2 2      .  2 2   . . . much larger than  , then will contain less test numbers than  , so it will be preferable to  check C in . If it is much less than  , it will be preferable to check C = in  . When   HV4J is close to  , a more thorough examination is necessary. In all cases, another important point is which of the two functions is simpler to approximate.. 2.3 Optimization: special input values For most functions, it is not necessary to perform tests for the very small arguments (i.e., arguments whose exponent is negative and has large absolute value). As an example, consider the exponential of a very small positive number , on a floating-point format with ? -bit mantissas, assuming rounding to nearest. If #( / then (since is a ? -bit number),   / A  / . Hence, .  . .  H  / A  / J . . . H  / A  / J. . # . .  / . therefore  H J is less than  HI0B+J6? HI0J . Thus, the correctly rounded value of  H J is  . A similar reasoning can be done for other functions and rounding modes. Some results are given in Table 1.. 2.4 Normal and denormal numbers Our algorithm for finding worst cases assumes that input and output numbers are normalized floating-point values. Hence, we have to separately handle the case of input denormal numbers, and to check whether there exist normalized floating-point numbers  such that CUHV4J is so small that it should be represented by a denormal number.. INRIA.

(44) Worst Cases for Correct Rounding of the Elementary Functions. 2.4.1. 9. Can the output value be a denormal number?. A method based on the continued fraction theory, and originally designed for finding the worst cases for range reduction was suggested by Kahan [7]. Using this method, one can find the normalized floating point number that is closest to an integer multiple of B+ different from zero. This number is:.

(45)    )  I A&

(46)  +       8    

(47) 8     in single precision;  &

(48)   

(49)   8       in double precision. Therefore, the numbers  & 2 ;EK7H  J 2 3   3     S     and

(50) & 2 ;EK+H  J 2  

(51)        =  are lower bounds on the absolute value of the sine, cosine and tangent. . & . .  . .

(52). of normalized single precision (for ) and double precision (for ) floating-point numbers. These values are much larger than the smallest normalized floating-point numbers. Therefore when the input arguments to sines, cosine and tangent are not so small that the results of Table 1 could be used, then their values are representable as normalized floating-point numbers.. 3 Implementation of the method 3.1 Overview of the implementation The tests are implemented in three steps: 1. As said previously, the first step, which is by far the most time-consuming one, is a filter. It consists in eliminating most of the tested arguments. This step amounts to testing if 8 (in general) consecutive bits are all zeroes 7 thus keeping one argument out of   , in average. This step is very slow and needs to be parallelized. 2. The second step consists in reducing the number of worst cases obtained from the first step and grouping all the results together in a same file. This is done with a slower but more accurate test than in the first step. As the number of arguments has been reduced, this step is fast enough to be performed on a single machine. 3. The third step is run by the user to restrict the number of worst cases when he needs them. Results on the inverse function are also obtained. As the number of arguments is small, this step is very fast. 7. These are the bits following the first changes in the tested domain.. RR n˚4044. . bits of the mantissa, unless the exponent of the output values.

(53) 10. Vincent Lefèvre and Jean-Michel Muller. Most programs are written in Perl (text data handling, process control. . . ). Concerning the calculations, the tests of the first step are currently written in Sparc assembly language, as they need to be as fast as possible; and for the other calculations, we currently use Maple with an interval arithmetic package.. 3.2 The First Step Let us give more details about the first step. The user chooses a function C , an exponent, a mantissa size (usually  ) and some other parameters, and the first step starts as follows. 8   First, the tested interval is split into  =  subintervals  containing   test numbers  and the function C is approximated by polynomials of degree  (  4 to 20) on  . We have chosen to use Taylor expansions, as the error can easily be bounded and this approximation suffices for us. For each  , we start with  >&! , and successively  increase  until the error due to the approximation is small enough. is expressed modulo the distance between two consecutive test numbers, as we only need to know information about the bits following the mantissa.  Then, each interval  is split into subintervals

(54)  containing  =

(55) arguments and is approximated by degree-  polynomials 

(56)  on 

(57)  , with 64-bit precision.. .  . On 

(58)  : 

(59)  is approximated by a degree-  polynomial (by ignoring the degree-  coefficient) and the variant of the Euclidean algorithm is used. If it fails, that is, if the obtained distance is too small, one has to perform more tests: – 

(60)  is split into. subintervals 

(61) 

(62) .. *. – For each , : the Euclidean algorithm is used on 

(63) 

(64) , and if it fails, the ar* guments are tested the one after the other, using two 64-bit additions for each argument. The program performs the first point (thanks to Maple and the interval arithmetic package), generates a C/assembly source for the following points, then compiles and executes it. The first step requires much more time than the other steps, thus it is run on several machines (we have used around one hundred machines, in background). As the calculations in different intervals are totally independent, there is no need for communications between 8 The numbers that are given here are just those that are generally chosen; the user or the program may choose other values for particular cases.. INRIA.

(65) Worst Cases for Correct Rounding of the Elementary Functions. 11. different machines. We only have a server on a particular machine to distribute intervals to each client (the program that performs the tests). It is more interesting to run the program on a network of workstations than on a dedicated machines, and these workstations belong to users, who work on them. We must not disturb them. So, the programs were written so that they can.  . run with a low priority (nice),. . automatically stop after a given time, automatically detect when a machine is used (in particular the keyboard and the mouse) and stop if this is the case.. 4 Results: natural (radix- ) exponentials and logarithms These functions have been among the most time-consuming during our searches, since there is no known way of deducing the worst cases in a domain from the worst cases in another domain. And yet, we have obtained the worst cases for all possible double precision floating-point inputs. They are given in Tables 2 and 3. From these results we can deduce the following properties. Property 1 (Computation of exponentials) Let R be the exponential of a double-precision number  . Let R be an approximation to R such that the mantissa distance 9 between R and R is bounded by .  for 2 92  , if  

(66) 

(67) &  = = then for any rounding mode of the IEEE-754 standard, rounding R is equivalent to rounding R ;   for 2 92 #  , if  

(68)  =  &  =

(69)  then rounding R is equivalent to rounding R .. . . . Property 2 (Computation of logarithms) Let R be the natural (radix-  ) logarithm of a double-precision number  . Let R be an approximation to R such that the mantissa distance between R and R is bounded by . If  

(70)   &  = =  then for any rounding mode of the IEEE-754 standard, rounding R is equivalent to rounding R . 9. If one prefers to think in terms of relative error, one can use the following well-known results: if the mantissa distance between and is less than then their relative distance is less than . If the relative distance between and is less than then their mantissa distance is less than X .. . RR n˚4044. . . . . . . .         . .

(71) 12. Vincent Lefèvre and Jean-Michel Muller. * @ < (). . Table2: Worst cases for the exponential function in the full range. Exponentials of numbers less. *  < (').  are underflows (a routine should return or the smallest non zero positive reprethan   sentable number, depending on the rounding mode). Exponentials of numbers larger than are overflows.. -.* X  20 1    0  [ Z Z  ZUZ ZZZ Z Z ZZZ Z [ ZZ Z Z Z [ ZZ Z [Z[0Z;Z Z ZIZ;[ Z[Z [0Z;Z ZZ Z;[ Z [;Z[ ZMZ [ Z;[ ZI[ [7[0Z Z Z[ Z;[7Z ZIZI[0[7Z;Z ZIZM[;[ [0Z ZZMZ;[Z [;[;[0[ ZIZ [;Z;[0ZIZ [;Z [ Z [ Z ZMZ [ ZM[ [ Z [7Z ZI[ X Z  M[ [ 0 0  UZ [[ [[ [ [[ [[ [ [[ [[ [ [ [ [ [ [ [;[[ [;[[;[ [[;[ [;[[ [;[[;[ [[;[ [ [ [ [ [ [;[[ [;[[7Z X     [3  [ Z Z ZZ ZZ Z ZZ ZZ Z ZZ ZZ;Z Z;ZZ Z;ZZ;Z ZZ;Z Z Z Z Z Z Z;ZZ Z;ZZ;Z ZZ;Z Z;ZZ Z;ZZ;Z ZZM[ [ [ [ 0211 ZI[ O[ %UX  1  Z [ P [P Z Z[[[ Z[Z Z [ZZ [Z ZZ[Z [ZZZ[Z [ZZ [Z ZZ[Z [ZZ[Z Z [;ZZ Z;[ Z[;Z Z [[Z;[7[ Z[;Z Z Z;[Z;[;ZZZ[ Z;Z;[Z [;ZZZ;[ Z;[Z Z [ZZ Z;[ Z Z[ Z [ZZ Z;[;Z Z [Z;[0[ ZZ[;Z Z[Z;Z [;Z[[ Z;[;[Z [[;Z[[ Z;[;[;Z [[Z[;[ Z;[[;Z [[Z;[;[;Z[0[ Z ZM[Z;[[;Z[0[ Z;ZMZ [Z[ Z Z Z X X 02 1  [0 !Z   Z [ P[ Z [[ZI[[[[ [[ [[[[ [[[[ [[ [[[[ [[[[;[;[ [0[;ZI[[7[ [;Z Z;[[;ZZ[ Z;[[;ZZ;[ [Z [ZZ;[ [0Z ZZ Z;[7Z Z[0Z ZZ;Z Z[0Z;ZZ Z;ZZZ;ZZ Z;Z;ZZZ;Z ZZ;ZZIZ;[7Z Z [7[[;ZZ[ ZM[[ [ [ X  Z  Z   Z [ P [ Z [[ZI[[[[ [Z [Z [Z[Z [[ [Z [[[[ [Z [;ZZM[ [[;[[ZM[ [0[;Z;[[;ZZ [ [7[[;ZZ;[ [Z [ZZ;[ [0Z ZZ ZMZM[0[[Z [0[;Z[Z [;[[ [;[[[;[[ [7[;ZI[[;[ [[;[[ [;ZM[[0[;Z[Z;Z Z [7ZZ Z Z X  [ 0[  X  1 -. X 021    Z [ [ [[ P[Z [ Z [ [ [[[ [[ [ [[[0[Z[ Z [Z [;ZI[[0[;ZI[[0[ Z [;[ [[0[;Z [[0[Z [;Z;[ Z[ Z [[;[0[ ZZ;Z Z [;[ [Z;ZZ Z;Z;ZZIZM[0[0ZMZI[ [;Z;[0Z Z;ZI[7ZZ Z [;ZI[[;ZM[0[0Z ZZ Z;Z ZZ Z ZM[Z3Z [;[[  Z [  ZZ [ Z [0ZZ [[ [0ZI[0Z [[ [[7Z [7ZZ [7ZI[;[ [[;[ [0Z [ [0Z Z;ZI[ [7ZI[;[ [ ZM[0ZM[ Z Z;ZI[7Z ZZM[ [ [ Z [. Interval . worst case (binary) .

(72) . . .

(73) .  . . 

(74) . .  . . #". . '". $. . (. . (. . $. %. . . &. &. '". . . &. '". . .  ". '". .  . . . .  . . !. .  . . . . &. &. Table 3: Worst cases for the natural (radix ) ) logarithm in the full range. Interval. 0X  21 0 . .

(75) . *. X 021 .  /.. . .. . +. &. . (. . . ,+. &. ". (. . &. +. . -. '". . . . . . * . #". $. . . * . PZ. . . 0 IZ . . -.U /ZIP[ Z [[ Z [0[[0ZI[ ZI[[0ZZ [Z [ [ [ Z Z Z Z [ ZIZ [0ZMZ[Z [;[7[ Z[Z [7ZMZ [Z [;[ [0[0ZZ ZM[0[ ZM[[ [;Z [ Z;[7ZZIZ [ [7[7ZIZ[;ZM[ [ [ [ Z;Z;Z Z Z;[0ZIZ [0[0Z;Z ZIZ [;Z;[0ZZZ ZM[;[0[Z [7[0Z Z ZZ Z3X [ 1 Z [ -.U /ZIP[0Z ZZ Z [ Z[Z [[ [Z [[0ZI[ [ [ [ Z Z [0Z ZIZI[ [ [7[ZI[0[7Z;Z ZZIZ [7[7Z Z[0ZMZ [ [ [[0[;Z [0ZMZI[[7[0Z Z;Z;ZIZI[0[ ZM[7[ ZZ;Z;Z Z [ [ ZMZ;[ Z [;Z [[ [0[ ZM[ [ [ Z;[;Z [Z[0Z;ZMZ [Z [;Z [ Z [[ Z X Z 1 1 [ [ -.U /ZIP[0Z ZI[[ [[ [Z [[[[0Z ZIZI[0[0ZIZ[0Z Z Z [ [ Z Z ZM[[[0[7Z;Z Z[ Z Z;[;Z [[ [7[ Z [ [[0[7Z Z [;ZI[ [7Z Z [7[;Z[ Z Z Z;ZMZI[[;[7[0Z Z[ZM[7[0Z ZMZ [Z [ Z [;[ [ [0ZMZM[ [ [Z [;ZM[0[Z [;[ [ [ [[ [ X [  1 Z  [  U-. Z /

(76) -%Z Z/Z ZO[ZZ Z [Z Z [0ZIZZI[0Z [ Z Z[ ZZ [ Z Z Z [ ZZZ Z Z [0ZIZ Z[0ZZ ZMZM[ [ [0Z Z ZIZ [;[7Z[ Z ZMZ;[ [0Z [ ZZ[ ZMZ;[0[0Z Z Z Z;ZM[ Z[ [0Z Z Z ZM[7Z [ZI[;[;[7[[0Z [ ZI[[7[7[;ZZ [ ZM[ [;[ ZM[[ [ [ ZM[0[;[0Z [ Z;[ ZMZ[ [ Z [0[ Z;Z;ZMZZ[ ZMZ [[ ZM[;[[ [0[7Z;Z Z Z Z Z Z[ Z Z [0[ Z [ Z [0Z Z [ [3X[   Z Z  UZ [ [0ZI[0ZI[ [0ZI[ [[ [0ZZ ZM[ ZM[0ZI[;[ [ [ [0Z [ [7ZZ Z;ZI[ [;[ Z;Z [ Z;Z [;[ Z [;[[7Z ZZ;Z [ [ [ Z Z [0Z -. /P ZZ [ [ Z [ [[0ZZ [0ZI[0ZI[ [ [0Z Z [0Z Z Z Z Z;ZZ Z;ZI[7Z [ ZM[0ZM[[ [;[[;[0ZZ;Z Z Z Z Z Z [7ZZ Z' Z Z Z [ Z [ Z Z [[ [ Z Z [[7Z ZI[7Z Z [0Z [0Z [7ZI[ [;[ ZM[ [[;[ [;[[ [7ZZ;Z [ Z;Z Z Z [0Z [0ZM[[0Z Z Z [ [ -.Z/Z ZIP [0Z Z [ [ Z Z Z ZI[[ [ [[0Z ZI[0[ [ZI[0[ ZIZ;[0Z Z Z[ZM[[ [7[0ZIZ [ Z [;Z [Z [7[7ZZIZI[0[;Z;[ Z[ ZM[0[0Z ZI[[;[0[0Z;Z;ZIZ[0Z Z;Z;ZIZI[;[;[[0[ ZIZM[;[0[0ZMZ [[0[0Z Z;Z ZIZ [7[;Z[[0ZI[7Z Z Z [ [ [ X Z Z . worst case (binary). . . . . . . . . . . 0. . +. +. *. . &'. X. . #". &. &. 5 Results: radix-2 exponentials and logarithms 5.1 Radix-2 exponentials Using the identity # 1  <  & 21   allows to efficiently fasten the search. First, getting the worst cases for 43 + 5 J allows to derive all worst cases for  #!  and   , since an. INRIA.

(77) Worst Cases for Correct Rounding of the Elementary Functions. 13. -mantissa-bit number belonging to these domains is the sum of an integer and a number with at most mantissa bits. The worst cases for 2 92 #  were obtained through the radix-  logarithm in HI0B+ 5 J . These results, given in Table 4, allow to deduce the following property. Property 3 (Computation of radix-2 exponentials) Let R be the radix-2 exponential   of a double-precision number  . Let R be an approximation to R such that the mantissa distance between R and R is bounded by . If  

(78) 

(79) &  = = then for any rounding mode of the IEEE-754 standard, rounding R is equivalent to rounding R .. . 5.2 Radix-2 logarithms Concerning radix-2 logarithms, let us show that it suffices to test the input numbers greater than  , and whose exponent is a power of  : First, let us show that it suffices to test the input numbers whose exponent is a positive  8/ , power of  to get the worst cases for all input values greater than  . Consider  &  DFE8G H J , so that the infinite mantissa of with ?   . The radix-2 logarithm of  is R &A?. R begins with / & VDFE8G H6?WJ bits that represent ? , followed by the binary representation of. DOE8G H J . Let ?" be the largest power of  that is less than or equal to ? . Since PDOE8G H , J &. VDOE8G H6?WJ & / , we deduce that the infinite mantissa of the radix-2 logarithm R" of  &. 1;/ has the same bits as the infinite mantissa of R after position / . Hence, there is a chain of , consecutive ones (or zeroes) after bit  of the infinite mantissa of R if and only if there is a chain of , consecutive ones (or zeroes) after bit  of the infinite mantissa of R  . Therefore, from the worst cases for an exponent equal to  we easily deduce the worst  cases for exponents between   and  <>= 1 . For instance, in Table 5, we only give one of the worst cases: the input value has exponent   . All worst cases are easily deduced: they have the same mantissa, and exponents between   and   . Now, let us show how to deduce the worst cases for numbers less than  from the worst cases for numbers greater than  . This was done as follows: consider a floating  / , with   #  , and ? 3 .  is less than  . Its radix-2 point number  &  logarithm is RS& U? DFE8G H J . The integer part of the absolute value of R is ?'  and. its fractional part is  DFE8G H J . So the infinite mantissa of R begins with PDFE8G H6? 0J.  , followed by the bits that represent  DFE8G H J . Now, consider bits that represent ?. the floating point number   &   / = .  is greater than  . Its radix-2 logarithm is  R  & H6?) 0J DFE8G H J . So, the infinite mantissa of R  begins with PDFE8G H6? 0J bits that. represent ?  (that is, the same as for R ), followed by the bits that represent DFE8G H J .. But the bits that represent  DOE8G H J are obtained by complementation10 of the bits that. . . .  .  . . . . . 10. Z. is replaced by [ and. RR n˚4044. . [. . .

(80) . . is replaced by Z .. . .   . . . . . . . . .

(81) 14. Vincent Lefèvre and Jean-Michel Muller. . represent DFE8G H J . Hence, there is a chain of , consecutive ones (or zeroes) after bit  of. the infinite mantissa of R if and only if there is a chain of , consecutive zeroes (or ones) after bit  of the infinite mantissa of R  . Therefore,  is the worst case for input values less than  if and only if   is the worst case for input values greater than  . This is illustrated in Table 5: the infinite mantissa of the worst case for input values greater than  starts with the same bit chain (  88888888 as the mantissa of the worst case for  #T , then the bits that follow are complemented ( Z [[ [ Z [ [[0ZZ Z ZZ ZI[0Z [ [0Z [0Z Z;ZZ Z;ZI[;[ [[7Z [7ZZ [;[ ZM[ [[7Z Z [ [ [%Z ZI[ [ for the case  #  and [0ZZ ZI[0Z ZZ [[ [ [[ [ Z [7ZZM[0ZI[;[ [;[[ [7ZZ;Z ZI[7Z [ [0Z Z [0Z;ZZ [;[ Z Z Z % [ [ Z Z& for the case    ). Using these properties, we rather quickly obtained the worst cases for the radix-  logarithm of all possible double precision input values: it sufficed to run our algorithm for the input numbers of exponents  ,  ,  , , ,  , . . .   . These results, given in Table 5, allow to deduce the following property.. . Property 4 (Computation of radix-2 logarithms) Let R be the radix-2 logarithm DFE8G HV4J. of a double-precision number  . Let R be an approximation to R  such that the mantissa distance between R and R is bounded by . If  

(82) 

(83) 

(84) &  = then for any rounding mode of the IEEE-754 standard, rounding R is equivalent to rounding R .. Table 4: Worst cases for the radix-2 exponential function. BF. in the full range. Integer values of. are omitted. Interval. .  ZUZ ZZ Z[Z [0ZZIZ [0Z Z Z[Z [ Z[Z [0ZMZ [ZI[7[ Z [ZI[0[;Z;[0ZZMZM[ [0Z ZI[;[7[Z [7[7Z ZZZ Z;ZMZ [ Z ZM[0[0Z Z[ Z;[ Z [0[ Z ZMZ [0[;Z [[0[ Z [;Z [ Z Z;ZMZ [ ZZ ZMZ;[0ZZ;ZMZ[0Z ZZ ZM[ [7[ ZI[ [ X Z  Z0 Z [   UZ [ Z [[ [ [[0ZI[0Z ZI[0ZZ Z;ZI[7Z ZI[7Z ZM[[ [7ZZM[ [ ZM[ [[7Z [ [ [0Z [0Z;ZI[0ZM[ Z;Z [[7Z Z;ZZ X   1 . . . worst case (binary). . . UZ [  [3. X   '[ZZ X '[ ZZ X '[ ZZ X '[ ZZ X SZ  [[ X SZ  [[. . . O[ ZI[;X . . . . . (. . . . . &. $. ZZ Z ZZ ZZ Z ZZ ZZ Z;ZZ;Z ZZM[ [7ZI[ [;[[7Z [[7Z Z [ [0Z [0Z [0Z Z;ZI[0ZM[[7Z ZI[;[0Z;ZZ [ Z Z [ [[ [& UZ [[ [[ [0ZI[0ZI[0Z [ Z [ Z ZM[[;[ [[;[ [;[ Z Z;ZI[;[0ZI[;[ [ ZM[0Z [0Z [0Z ZM[[0Z;ZZ;Z ZZM[ [;[ Z$ X '"  ZZ Z ZZ ZZ Z ZZ ZZ Z;ZZ;Z ZZ;Z Z;ZZ Z;ZZ;Z ZZ;Z Z [0Z [ [0Z [0Z [7ZZ [7ZZM[0ZZM[ [;[[0Z Z Z  [ [[ [& UZ  [[ [ Z Z [[ [[0Z [ Z ZI[0Z;ZZM[ [[7Z ZM[ Z ZM[ Z;Z [ Z;Z [ Z;Z [0Z [0Z [0Z;ZI[ [;[[;[ [ Z;Z ZM[ Z$ X '"%" ZZ Z ZZ ZZ Z ZZ ZZ Z;ZZ;Z ZZ;Z Z;ZZ Z;ZZ;Z ZZ;Z Z Z [ [0Z Z Z Z [7ZZ [7ZI[7Z [ Z;Z Z;ZI[ [ [ [  Z ZI[ [& PZ  ZI[0Z ZZ ZZ Z ZI[0ZZ Z [ Z;Z ZZM[0Z Z Z Z [ [7ZI[ [;[ ZM[ [ Z;Z ZM[ Z ZM[ Z;Z ZZ;Z Z Z [ [ [0ZM[ Z- X    [[ [ [[ [[ [ [[ [[ [;[[;[ [[;[ [;[[0ZM[[7Z ZI[7Z Z [ [0Z [0Z Z [ [;[[0Z;ZZM[ [[;[0ZM[ Z [ [

(85) ;Z [0ZZ PZ  ZZ Z [[0ZI[ [ [ Z [ Z Z [[7Z [ Z;Z [ [0Z [0Z [;[ Z [;[ Z;Z [ ZM[0Z;ZZ Z;ZZ;Z [[7Z [0Z [ [0Z ZM[ Z- X  [[ [ [[ [[0Z [ Z [[0Z;ZZ;Z ZZ;Z ZM[[ [;[ ZM[0ZZ;Z [0Z Z [ [ [ [0Z [7ZI[0Z;ZI[7Z [ ZM[ [7ZZ [ ,Z 

(86) M[0Z [[ . . . ! 021 . '. . INRIA.

(87) Worst Cases for Correct Rounding of the Elementary Functions. Table 5: Worst cases for the radix-2 logarithm function are integer powers of 2 are omitted.. *;+ - ' . . 15. in the full range. Values of. that. -U Z / [  [P[Z [[ [ Z [ Z [[[ [ [ [[0ZI[Z [[[0Z [Z [Z [0[ [0ZI[7Z Z Z [0Z;Z Z[0Z ZZM[Z Z;ZM[Z[Z [7ZM[0ZZ;Z;Z Z[Z ZMZ;[0ZZZM[ZM[[[;[;[[0Z Z [;[0[Z [;Z [0[ [7ZI[7ZI[Z [;Z [0[ ZZ;Z Z [0[ ZM[[ [ [ X  Z 0 ZI[  [ . PZ X X 021   S-.Z  [ / [ [ P[Z [ [[ [Z [ZI[[ [ [ [Z [0ZZIZ [0[7Z Z[ Z Z ZM[ [Z [;[7[ ZI[[7[;Z [0[ ZMZ;[ Z Z Z;ZMZ[ Z Z [7[;Z[Z;[;Z [ [ [ZM[7[0Z Z Z Z Z [ Z [ [0[0Z ZM[;[[ [ Z [;ZM[ [ ZMZ;[0Z ZZIZM[;[0[0Z;Z ZI[0ZMZ [[ Z X [ [0  Z  Z. Interval. O[ Z X . worst case (binary) . . (. . ". . %. . (. . &. . %. &. 6 Results: trigonometric functions The results given in Tables 6 to 11 give the worst cases for functions KMLON ,

(88) KMLON , ;EK ,

(89)  ;EK ,   8N and

(90) 8N . For these functions, we have worst cases in some bounded domain only, because trigonometric functions are more difficult to handle than the other functions. And yet, it is sometimes possible to prune the search. Let us consider the arc-tangent function of large values. The double precision number that is closest to B+ is. . &.

(91) Q     8    

(92) 8   Q  . . Assuming rounding to the nearest, the breakpoint that is immediately below. . . 3 3    +  

(93)

(94)           . &.   8 .  8N>HV4J is larger than. . is. . . For any real number  , if

(95)  then the correctly rounded (to the near est) value that should be returned when evaluating

(96)  8N HVWJ in double precision is . We deduce from this that for    +      8 , we should return . A similar calculation shows that for  between    +   Q and  +        , we should 6? H J . return For rounded to nearest arc-tangent, the worst case for input numbers larger than        = is. .

(97)

(98) 

(99)  

(100) 3

(101)

(102)

(103) . . 83

(104)    

(105)  . + . .

(106)

(107) 

(108)  .

(109) 

(110)   +      0 . & 8 . whose arc-tangent is.  ". bits. . Z Z [[0Z [[0ZI[ [ [[0ZZ Z Z Z ZM[ Z MZ [ MZ [0 ZI[7Z [;[[0ZM[[;[0ZI[;[ [ [0Z [ [0ZM[ Z [7ZI[;[0ZZM[ [ Z [  Z Z ZI[0ZZ , . RR n˚4044. .

(111) 16. Vincent Lefèvre and Jean-Michel Muller. Property 5 (Computation of sines) Let R be the sine of a double-precision number  satisfying 0B 8S 2 U24  . Let R be an approximation to R such that the mantissa distance between R and R is bounded by . If  

(112)  

(113) &  = = then for any rounding mode of the IEEE-754 standard, rounding R is equivalent to rounding R . Property 6 (Computation of arc-sines) Let R be the arc-sine of a double-precision number  satisfying KMLON HI0B 8J  2 92   . Let R be an approximation to R such that the mantissa distance between R and R is bounded by . If  

(114)   &  = =  then for any rounding mode of the IEEE-754 standard, rounding R is equivalent to rounding R .  Similar properties are deduced for the other trigonometric functions: 1  = for cosines between 0B and   8B   &    ,    = =

(115) for arc-cosine between   ;EK7HI  8B  J % 88 8 and ;EK7HI0J T  ; % = = for tangent between 0B 8     and

(116)  8N>H J ; and   = for arc-tangent between 8NHI0B 8J and  ..

(117)

(118)   .

(119)

(120)   .  .    . Table 6: Worst cases for the sine function in the range    .. 0  IZ. '[ [ Z Z.Z Z [ [Z [Z Z [Z [ [Z[ ZZ Z ZIZ [[Z[0[0Z Z ZIZ[[7Z [ [Z [[[ [0ZZMZ [ZZ[Z Z;[7[ Z [ZZ[0ZMZM[Z [0[Z;Z Z;ZIZ [Z [0[7ZMZ [Z[0ZZMZ [Z;[0[ZIZ [[;[;[7[0[ZZ;[0Z;ZIZM[Z [[[;[;[7[ [0ZZMZI[Z [;[Z [ [0[7[;Z Z [ [Z Z [;[0[7[Z Z[0[ ZM[0Z;[0ZZMZI[Z;[;Z Z [ [[7[0Z;Z Z Z Z [7[ ZZ [ Z [ [7ZZ. Interval  " . Z X. worst case (binary) . . . '[. . . . ZZ Z ZZ ZZ Z ZZ ZZ Z ZZ;Z ZZ;Z Z Z Z Z Z Z;ZZ Z;ZZ;Z ZZ;Z Z;ZZ Z;ZZ;Z ZZ;Z Z Z Z Z Z Z;ZZ Z. Table 7: Worst cases for the arc-sine function in the range. Z, [0Z&.      

(121)    !>   ..   . .  . 0  IZ ' [ [0#'Z&(Z Z.Z >Z [ ZZ [0ZIZ[ Z [0Z ZZIZ [0ZMZI[ [ Z [0ZMZ[Z [7[Z [0ZZ ZM[ [0ZMZ [0Z ZIZ [;[ [ [0[ Z;[ Z[0Z Z [;Z [ ZMZ;[Z [0ZIZ;[7ZIZ [;[;[ [[[ [7[7Z ZZMZ [Z;[ ZI[7[7ZIZ [;ZI[ [7[ Z Z;Z Z [0[0Z Z [ [0[ Z [7Z[ Z [ Z [. Interval . Z. [[. + . ". worst case (binary) . . . +. &. Conclusion The worst cases we have obtained will allow the design of reasonably fast routines for evaluating most common mathematical functions with correct rounding (at least in some intervals) in the four rounding modes specified by the IEEE-754 standard. We are extending the domains for the functions for which we have not yet obtained the worst cases in. INRIA.

(122) Worst Cases for Correct Rounding of the Elementary Functions. 17.  "    $"B $

(123)  !  .   $"  $ !B. Table 8: Worst cases for the cosine function in the range .  .. slightly less than . 0 Z Z 0 0 . '[ Z ZQZ Z ZZ Z [Z [0Z ZIZ Z [0ZZ[0Z ZZZ Z[Z Z[0Z Z [ZI[[0Z;[ ZZ ZZZMZM[Z [[Z [0[ ZMZ;[[ZIZ [7[7[0Z Z Z ZI[[[7[7[0Z ZZ[0ZZ Z Z[;Z Z [Z [0[0[0Z ZMZM[;[[ [[7Z Z Z[;[7Z[ZZM[;ZM[[0[[7Z[ZZ;[7ZZ Z Z;[;[;ZI[[[7[ [ Z [7[7[ ZIZZ;[7Z;ZZ Z Z[ [ZZM[;Z [ [0[[ Z[[ Z[;Z[ ZZ Z [;[[;Z X [0 IZ [0Z. Interval . . +. . *. . . +. . -. . . . Z. .     "  

(124)  #    .. . +. ,%. SZ [ [ Z ZI[ [0ZZ [ Z Z ZZ ZZ;Z Z;ZI[ [;[ ZM[0ZZM[0Z Z [ [ [ [;[ Z Z;ZI[;[0ZI[7Z ZM[[ [7ZI[7Z [[7Z [. Table 9: Worst cases for the arc-cosine function in the range. . . [ ZI[' X. '". +   $"B $

(125)  !    +   . . worst case (binary).    *. *. . +. . . 0. .  B. Table 10: Worst cases for the tangent function in the range   .  ; B $ .. +. . .  .  , with.    . Z +  [ [&.  .  . worst case (binary) . ". .  . 00. . X. 0 00 "$#%PZ [0Z [ Z [[ [ [[0ZI[ [0ZI[ [;[[0Z;ZI[7Z [ Z;Z [ [0Z [0Z Z;ZZ ZM[[;[ [ Z;Z ZM[[ [;[[;[ [ ZM[0Z [ [ X    S Z [0ZI[0Z [[ [[ [0ZZ ZZ [ [ [0Z Z [ [7ZZ ZM[ ZM[0ZZ;Z Z;ZZ Z;ZZ;Z ZI[;[0Z Z Z [ [ [7ZZ ZM[[7Z [ Z3[ Z [ $" #%PZ Z [0ZI[ [[0Z ZI[ [ Z Z ZZ Z;ZI[ [7ZZM[ [ ZM[0Z [0Z [0Z ZM[[ [7ZI[7Z ZZM[ [7ZZ ZM[ ZM[0ZZM[0Z [0Z X   0 0 "$#%  0 0 SZ Z [ Z [0ZI[ [ Z [ [ Z ZI[ [0Z Z Z Z Z Z;ZZ ZM[[;[ [ ZM[0Z;ZZ [7ZZM[0ZI[7Z Z [ [0Z Z [7ZZ ZM[ ZM[0Z [ [ Z [  "$#'  0 0 "$#%  0 X ' [ Z $" [ #%Z Z OZ[ Z [Z [0Z ZZI[ZI[0[[0ZI[ Z [0ZI[0Z [ZI[0[0Z Z [[ [7Z ZMZ[Z [Z;[;Z[ZM[0Z;ZZ [Z;Z;Z [;Z [Z [Z [0Z;Z ZI[;[ [;[0[ZZMZ [;[ [[ [ZM[[ [0[[;Z [7[0Z;ZI[ZI[7[0ZMZ[Z;ZMZ [ZI[[ Z;Z [0Z Z Z [3Z  [[. Interval .  .  .  % 0 0 %'PZ   #'&(Z )Z  [ [ Z Z [Z[ Z [0Z ZZZ Z ZZIZ [0[ Z [[ [ Z [Z[;Z [ [ [[ [;Z [ ZM[7[ ZZ Z Z;[7ZZZMZM[0[0ZZIZ;[7Z Z Z [ Z [ [0[ Z [ [ [0[7Z;ZIZ[ Z [;[7[ ZZMZM[0[ ZI[[;[7[ Z[7ZMZI[ [0Z ZMZM[ [Z;[;Z [ ZZM[ Z[ [. Interval . is. worst case (binary). . *. . + . . . . . . . '". !. &'. %. . . . . . the full range. More examples can be obtained through the URL http://www.enslyon.fr/

(126) jmmuller/TMD.html. These “worst cases” will also be good test cases for checking whether a library provides correct rounding or not.. References [1] R. C. Agarwal, J. C. Cooley, F. G. Gustavson, J. B. Shearer, G. Slishman, and B. Tuckerman. New scalar and vector elementary functions for the IBM system/370. IBM. RR n˚4044. X.  X. . &'. . '".

(127) 18. Vincent Lefèvre and Jean-Michel Muller. 

(128)     "

(129)  .. Table 11: Worst cases for the arc-tangent function in the range. . ". . .       , with      . . . $" #' 0   0 S Z #!ZI&([0)ZI"$#%[ [ [ PZZ ZIZ [ [ [0Z Z[ Z Z Z[ Z [ Z Z Z [[ [0[7Z ZZIZ [ [;[ [ Z;ZMZ [0ZZIZ;[7Z Z Z;[7ZZZ Z ZM[;[[[;[7[ Z [ [ ZMZ;[0Z Z Z Z [ Z [0[0Z Z Z ZMZM[ [ Z Z [7[7ZZZMZM[ [0[ ZIZ;[;Z [ [7ZZ Z ZM[ ZMZ [0Z I[0Z X   X  0  X  [ !#ZI&([0)ZI"$#%[ [ [ O[ Z ZIZ [0[ Z Z [ ZZ Z [ Z Z [ Z Z [ Z[ Z [;[0[ ZIZ [ ZM[ [ ZMZ;[ Z [ ZIZM[;[ [0[7ZMZ[Z [ Z;[7ZZIZM[;[0[0ZZIZ;[;Z [ [ [ [0[0Z Z Z Z Z [0[;ZM[ [ Z Z ZM[;[[ [;Z;[ Z [[ [7Z;Z Z [7[ ZI[ [7[ ZZ;Z [ ZZ' Z Z '. Interval . . worst case (binary) . . . !. ,%. #". &'. #". . . . %. &. Journal of Research and Development, 30(2):126–144, March 1986. [2] American National Standards Institute and Institute of Electrical and Electronic Engineers. IEEE standard for binary floating-point arithmetic. ANSI/IEEE Standard, Std 754-1985, New York, 1985. [3] J. T. Coonen. An implementation guide to a proposed standard for floating-point arithmetic. Computer, January 1980. [4] C. B. Dunham. Feasibility of “perfect” function evaluation. SIGNUM Newsletter, 25(4):25–26, October 1990. [5] S. Gal and B. Bachelis. An accurate elementary mathematical library for the IEEE floating point standard. ACM Transactions on Mathematical Software, 17(1):26–45, March 1991. [6] D. Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, 23(1):5–47, March 1991. [7] W. Kahan. Minimizing q*m-n, text accessible electronically at http://http.cs.berkeley.edu/  wkahan/. At the beginning of the file "nearpi.c", 1983. [8] V. Lefèvre. Developments in Reliable Computing, chapter An Algorithm That Computes a Lower Bound on the Distance Between a Segment and , pages 203–212. Kluwer, Dordrecht, Netherlands, 1999. [9] V. Lefèvre. Moyens Arithmétiques Pour un Calcul Fiable. PhD thesis, École Normale Supérieure de Lyon, Lyon, France, 2000.. INRIA.

(130) Worst Cases for Correct Rounding of the Elementary Functions. 19. [10] V. Lefèvre, J. M. Muller, and A. Tisserand. Towards correctly rounded transcendentals. In Proceedings of the 13th IEEE Symposium on Computer Arithmetic, Asilomar, USA, 1997. IEEE Computer Society Press, Los Alamitos, CA. [11] V. Lefèvre, J.M. Muller, and A. Tisserand. Toward correctly rounded transcendentals. IEEE Transactions on Computers, 47(11):1235–1243, November 1998. [12] J.M. Muller. Elementary Functions, Algorithms and Implementation. Birkhauser, Boston, 1997. [13] M. Schulte and E. E. Swartzlander. Exact rounding of certain elementary functions. In E. E. Swartzlander, M. J. Irwin, and G. Jullien, editors, Proceedings of the 11th IEEE Symposium on Computer Arithmetic, pages 138–145, Windsor, Canada, June 1993. IEEE Computer Society Press, Los Alamitos, CA. [14] M. J. Schulte and E. E. Swartzlander. Hardware designs for exactly rounded elementary functions. IEEE Transactions on Computers, 43(8):964–973, August 1994. [15] A. Ziv. Fast evaluation of elementary mathematical functions with correctly rounded last bit. ACM Transactions on Mathematical Software, 17(3):410–423, September 1991.. RR n˚4044.

(131) Unit´e de recherche INRIA Lorraine, Technopˆole de Nancy-Brabois, Campus scientifique, ` NANCY 615 rue du Jardin Botanique, BP 101, 54600 VILLERS LES Unit´e de recherche INRIA Rennes, Irisa, Campus universitaire de Beaulieu, 35042 RENNES Cedex Unit´e de recherche INRIA Rhˆone-Alpes, 655, avenue de l’Europe, 38330 MONTBONNOT ST MARTIN Unit´e de recherche INRIA Rocquencourt, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex Unit´e de recherche INRIA Sophia-Antipolis, 2004 route des Lucioles, BP 93, 06902 SOPHIA-ANTIPOLIS Cedex. ´ Editeur INRIA, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex (France) http://www.inria.fr ISSN 0249-6399.

(132)

Figure

Table 1: Some results for small values in double precision, assuming rounding to the nearest
Table 3: Worst cases for the natural (radix ) ) logarithm in the full range.
Table 4: Worst cases for the radix-2 exponential function BF in the full range. Integer values of are omitted.

Références

Documents relatifs