
computation

Fabienne Jezequel

Laboratoire d'Informatique de Paris 6 - CNRS UMR7606,

4 place Jussieu, 75252 Paris cedex 05, France

Abstract

Under some assumptions on the speed of convergence of a sequence, the significant digits of one of its iterates in common with the exact limit can be determined by comparing this iterate with the next one. Using a finite precision arithmetic, if computations are performed until the difference between two successive iterates is insignificant, the global error on the last iterate is minimal. Furthermore, for sequences converging at least linearly, we can determine in the result obtained which exact significant digits, i.e. not affected by round-off errors, are in common with the exact limit. This strategy can be used for the computation of integrals with the trapezoidal or Simpson's rule. A sequence is then generated by halving the step value at each iteration, while the difference between two successive iterates is a significant value. The exact significant digits of the last iterate are in common with the exact value of the integral, up to one bit. This kind of strategy is then extended to numerical algorithms involving several sequences, such as the approximation of integrals on an infinite interval.

Key words: converging sequences, numerical validation, quadrature methods, trapezoidal rule, Simpson's rule, CESTAC method, Discrete Stochastic Arithmetic

1 Introduction

In a numerical method which involves the computation of a converging sequence, the limit is approximated by one of the iterates. It may be difficult to estimate in the chosen iterate the global error, consisting of the truncation error and the round-off error. The optimal iterate, i.e. the approximation for which the global error is minimal, can be computed dynamically [14]. In this

Email address: Fabienne.Jezequel@lip6.fr (Fabienne Jezequel).


iterate, which are affected neither by the truncation error, nor by the round-off error. In section 2, we present theorems established from the truncation error which enable one to determine the significant digits of an iterate in common with the exact limit. As round-off errors must also be taken into account, in section 3, we briefly review methods and concepts which enable one to estimate round-off error propagation with a probabilistic approach: the CESTAC method, the principles of stochastic arithmetic and the implementation provided by Discrete Stochastic Arithmetic (DSA). We also present theoretical results established in stochastic arithmetic for the control of arithmetical operations. In section 4, we describe a strategy to control both the truncation and the round-off error during the computation of a converging sequence. More precisely, under some assumptions on the speed of convergence of the sequence, we can determine in the optimal approximation the exact significant digits, i.e. not affected by round-off errors, which are in common with the exact limit. In section 5, we show how the theorems established in the previous sections can be combined to control sequences in which each term is the limit of another sequence. We describe a strategy which can be used for the computation of improper integrals. The last section presents numerical experiments carried out using DSA.

2 Theoretical results on converging sequences

2.1 Preliminary definitions

The theorems presented here have been established for sequences having a linear or an exponential convergence speed. Therefore we recall properties which characterize these two types of convergence speed.

Definition 1 A sequence $(I_n)$ converges to $I$ with a linear speed if

$$I_n - I = K\alpha^n + o(\alpha^n), \quad \text{where } K \in \mathbb{R} \text{ and } 0 < |\alpha| < 1.$$

With a sequence having a linear convergence, the number of iterations required to obtain an approximation of the limit with one more exact digit is quasi-constant.

Definition 2 A sequence $(I_n)$ converges to $I$ with an exponential speed if

$$I_n - I = K\alpha^{p^n} + o\left(\alpha^{p^n}\right), \quad \text{where } K \in \mathbb{R},\ 0 < |\alpha| < 1 \text{ and } p > 1.$$

With a sequence having an exponential convergence, at each iteration the number of exact digits is quasi-multiplied by $p$.

The theoretical results presented in this section require the notion of significant digits common to two real numbers. Therefore we need the following definition.

Definition 3 Let $a$ and $b$ be two real numbers, the number of significant digits that are common to $a$ and $b$ can be defined in $\mathbb{R}$ by

(1) for $a \neq b$, $C_{a,b} = \log_{10}\left|\dfrac{a+b}{2(a-b)}\right|$,

(2) $\forall a \in \mathbb{R}$, $C_{a,a} = +\infty$.

Then $|a-b| = \left|\dfrac{a+b}{2}\right| 10^{-C_{a,b}}$. For instance, if $C_{a,b} = 3$, the relative difference between $a$ and $b$ is of the order of $10^{-3}$, which means that $a$ and $b$ have three significant digits in common.

Remark 4 The value of $C_{a,b}$ can seem surprising if we consider the decimal notations of $a$ and $b$. For example, if $a = 2.4599976$ and $b = 2.4600012$, then $C_{a,b} \approx 5.8$. The difference due to the sequences of "0" or "9" is illusive. The significant decimal digits of $a$ and $b$ are really different from the sixth position.

2.2 On sequences with a linear convergence

Let us consider a sequence $(I_n)$ converging linearly to $I$. From the number of significant digits common to two successive iterates, $I_n$ and $I_{n+1}$, the following theorem enables one to determine the number of significant digits common to $I_n$ and the exact limit $I$.

Theorem 5 Let $(I_n)$ be a sequence converging linearly to $I$, i.e. which satisfies $I_n - I = K\alpha^n + o(\alpha^n)$ where $K \in \mathbb{R}$ and $0 < |\alpha| < 1$, then

$$C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left|\frac{1}{1-\alpha}\right| + o(1).$$

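PROOF.

$$I_n - I = K\alpha^n + o(\alpha^n) \tag{1}$$

By using the same formula for $I_{n+1}$, one obtains

$$I_n - I_{n+1} = K\alpha^n(1-\alpha) + o(\alpha^n) \tag{2}$$

From equation (1), we deduce

$$\frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^n\,(1+o(1))} \tag{3}$$

$$\frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^n}\,(1+o(1)) \tag{4}$$

Therefore

$$\frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^n} + o\left(\frac{1}{\alpha^n}\right) \tag{5}$$

Then

$$\frac{I_n + I}{2(I_n - I)} = \frac{I_n}{I_n - I} - \frac{1}{2} = \frac{I_n}{K\alpha^n} + o\left(\frac{1}{\alpha^n}\right) \tag{6}$$

Similarly, from equation (2), we deduce

$$\frac{I_n + I_{n+1}}{2(I_n - I_{n+1})} = \frac{I_n}{I_n - I_{n+1}} - \frac{1}{2} = \frac{I_n}{K\alpha^n}\,\frac{1}{1-\alpha} + o\left(\frac{1}{\alpha^n}\right) \tag{7}$$

From definition 3 and equation (6) we deduce

$$C_{I_n,I} = \log_{10}\left|\frac{I_n}{K\alpha^n}\,(1+o(1))\right| \tag{8}$$

$$C_{I_n,I} = \log_{10}\left|\frac{I_n}{K\alpha^n}\right| + \log_{10}\left|1+o(1)\right| \tag{9}$$

Therefore

$$C_{I_n,I} = \log_{10}\left|\frac{I_n}{K\alpha^n}\right| + o(1) \tag{10}$$

Similarly, from definition 3 and equation (7) we deduce

$$C_{I_n,I_{n+1}} = \log_{10}\left|\frac{I_n}{K\alpha^n}\,\frac{1}{1-\alpha}\right| + o(1) \tag{11}$$

Finally

$$C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left|\frac{1}{1-\alpha}\right| + o(1) \tag{12}$$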

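Let us say that the convergence zone is reached when the term $o(1)$ becomes negligible. In this case, from the significant digits in common between $I_n$ and $I_{n+1}$, we can deduce the significant digits in common between $I_n$ and the exact limit $I$.

If $-1 < \alpha < 0$, then $-\log_{10} 2 < \log_{10}\left|\frac{1}{1-\alpha}\right| < 0$. In this case, if the convergence zone is reached, the significant digits in common between $I_n$ and $I_{n+1}$ are also in common with $I$.

$\forall \alpha \in\, ]0,1[$, $\exists k$ such that $0 < \alpha \le 1 - 10^{-k}$ and therefore $0 < \log_{10}\left|\frac{1}{1-\alpha}\right| \le k$. If the convergence zone is reached, the significant digits in common between $I_n$ and $I_{n+1}$ are also in common with $I$, up to $k$ digits. The lower $\alpha$ is, the faster the convergence of the sequence is and the lower $k$ is.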

Remark 6 If $0 < \alpha \le \frac{1}{2}$, then $0 < \log_{2}\left|\frac{1}{1-\alpha}\right| \le 1$. In this case, if the convergence zone is reached, the significant bits in common between $I_n$ and $I_{n+1}$ are also in common with $I$, up to one.
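The statement of Theorem 5 can be checked numerically on a model sequence. The sketch below is only an illustration under stated assumptions: the sequence $I_n = I + K\alpha^n$ with $I = K = 1$, the iteration indices and the helper names `common_digits` and `gap` are all hypothetical choices, not taken from the text.

```python
import math

def common_digits(a, b):
    """C_{a,b} of Definition 3 (decimal digits common to a and b)."""
    if a == b:
        return math.inf
    return math.log10(abs((a + b) / (2.0 * (a - b))))

def gap(alpha, n, limit=1.0, K=1.0):
    """C_{I_n,I_{n+1}} - C_{I_n,I} for the model sequence I_n = limit + K*alpha**n."""
    I_n = limit + K * alpha**n
    I_next = limit + K * alpha**(n + 1)
    return common_digits(I_n, I_next) - common_digits(I_n, limit)

# (alpha, n) pairs chosen so that alpha**n is small but still well above
# the rounding level of double precision.
cases = [(0.5, 20), (0.9, 100), (-0.5, 20)]
for alpha, n in cases:
    predicted = math.log10(abs(1.0 / (1.0 - alpha)))
    print(alpha, gap(alpha, n), predicted)
```

For $\alpha = 0.9$ the predicted gap is one decimal digit ($k = 1$), whereas for $\alpha = -0.5$ it is negative, so that no digit is lost.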

2.3 On the trapezoidal and Simpson's rules

Theorem 5 can be used for the evaluation of integrals with the trapezoidal or Simpson's rule. Indeed a sequence which converges linearly can be generated by halving the step value at each iteration.

Let $f$ be a real function which is $C^k$ over $[a,b]$ where $k \ge 3$. Let $I_n$ be the approximation of $I = \int_a^b f(x)\,dx$ computed using the trapezoidal rule with step $h = \frac{b-a}{2^n}$. If $f'(a) \neq f'(b)$, the development of the error up to order 4 is [1,8,9]:

$$I_n - I = \frac{h^2}{12}\left[f'(b) - f'(a)\right] + O(h^4) \tag{13}$$

As the sequence $(I_n)$ satisfies $I_n - I = K\alpha^n + O(\alpha^{2n})$, with $K = \frac{(b-a)^2}{12}\left[f'(b) - f'(a)\right]$ and $\alpha = \frac{1}{4}$, theorem 5 could apply. However the following property has been established in [5]:

$$C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left(\frac{4}{3}\right) + O\left(\frac{1}{4^n}\right). \tag{14}$$

Let $f$ be a real function which is $C^k$ over $[a,b]$ where $k \ge 5$. Let $I_n$ be the approximation of $I = \int_a^b f(x)\,dx$ computed using Simpson's rule with step $h = \frac{b-a}{2^n}$. If $f^{(3)}(a) \neq f^{(3)}(b)$, the development of the error up to order 6 is:

$$I_n - I = \frac{h^4}{180}\left[f^{(3)}(b) - f^{(3)}(a)\right] + O(h^6). \tag{15}$$

The sequence $(I_n)$ satisfies $I_n - I = K\alpha^n + O\left(\alpha^{\frac{3}{2}n}\right)$, with $K = \frac{(b-a)^4}{180}\left[f^{(3)}(b) - f^{(3)}(a)\right]$ and $\alpha = \frac{1}{16}$. Therefore, as for the trapezoidal rule, theorem 5 could apply. The following property has actually been established in [5]:

$$C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left(\frac{16}{15}\right) + O\left(\frac{1}{4^n}\right). \tag{16}$$

If the convergence zone is reached, $O\left(\frac{1}{4^n}\right) \ll 1$. Furthermore $\log_{10}\left(\frac{4}{3}\right)$ and $\log_{10}\left(\frac{16}{15}\right)$ represent at most one bit. Indeed, for both rules, $\alpha < \frac{1}{2}$. Therefore, if the convergence zone is reached, the significant digits common to $I_n$ and $I_{n+1}$ are also common to $I$, the exact value of the integral, up to one bit.
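Property (14) can be observed on a simple integral. The following Python sketch is an illustration only: the integrand $e^x$ on $[0,1]$ (for which $f'(0) \neq f'(1)$), the range of $n$ and the helper names are assumptions, and ordinary double precision is used, so round-off errors are ignored here.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with step h = (b - a) / 2**n."""
    m = 2**n
    h = (b - a) / m
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, m))
    return s * h

def common_digits(x, y):
    """C_{x,y} of Definition 3."""
    return math.inf if x == y else math.log10(abs((x + y) / (2.0 * (x - y))))

exact = math.e - 1.0                 # integral of exp over [0, 1]
expected = math.log10(4.0 / 3.0)     # constant term of property (14)
gaps = {}
for n in range(3, 9):
    I_n = trapezoid(math.exp, 0.0, 1.0, n)
    I_next = trapezoid(math.exp, 0.0, 1.0, n + 1)
    gaps[n] = common_digits(I_n, I_next) - common_digits(I_n, exact)
    print(n, gaps[n], expected)
```

The gap settles near $\log_{10}(4/3) \approx 0.125$, i.e. well below one bit ($\log_{10} 2 \approx 0.301$).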

2.4 On sequences with an exponential convergence

Theoretical results similar to theorem 5 may be established for sequences with an exponential convergence.

Theorem 7 Let $(I_n)$ be a sequence converging to $I$ with an exponential speed, i.e. which satisfies $I_n - I = K\alpha^{p^n} + o\left(\alpha^{p^n}\right)$ where $K \in \mathbb{R}$, $0 < |\alpha| < 1$ and $p > 1$, then

$$C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left|\frac{1}{1-\alpha^{p^n(p-1)}}\right| + o(1).$$

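PROOF.

$$I_n - I = K\alpha^{p^n} + o\left(\alpha^{p^n}\right) \tag{17}$$

By using the same formula for $I_{n+1}$, one obtains

$$I_n - I_{n+1} = K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right) + o\left(\alpha^{p^n}\right) \tag{18}$$

From equation (17), we deduce

$$\frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^{p^n}\,(1+o(1))} \tag{19}$$

$$\frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^{p^n}}\,(1+o(1)) \tag{20}$$

Therefore

$$\frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^{p^n}} + o\left(\frac{1}{\alpha^{p^n}}\right) \tag{21}$$

Then

$$\frac{I_n + I}{2(I_n - I)} = \frac{I_n}{I_n - I} - \frac{1}{2} = \frac{I_n}{K\alpha^{p^n}} + o\left(\frac{1}{\alpha^{p^n}}\right) \tag{22}$$

Similarly, from equation (18), we deduce

$$\frac{I_n}{I_n - I_{n+1}} = \frac{I_n}{K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right)(1+o(1))} \tag{23}$$

Therefore

$$\frac{I_n}{I_n - I_{n+1}} = \frac{I_n}{K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right)} + o\left(\frac{1}{\alpha^{p^n}}\right) \tag{24}$$

Then

$$\frac{I_n + I_{n+1}}{2(I_n - I_{n+1})} = \frac{I_n}{I_n - I_{n+1}} - \frac{1}{2} = \frac{I_n}{K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right)} + o\left(\frac{1}{\alpha^{p^n}}\right) \tag{25}$$

From definition 3 and equation (22) we deduce

$$C_{I_n,I} = \log_{10}\left|\frac{I_n}{K\alpha^{p^n}}\,(1+o(1))\right| \tag{26}$$

Therefore

$$C_{I_n,I} = \log_{10}\left|\frac{I_n}{K\alpha^{p^n}}\right| + o(1) \tag{27}$$

Similarly, from definition 3 and equation (25) we deduce

$$C_{I_n,I_{n+1}} = \log_{10}\left|\frac{I_n}{K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right)}\,(1+o(1))\right| \tag{28}$$

Therefore

$$C_{I_n,I_{n+1}} = \log_{10}\left|\frac{I_n}{K\alpha^{p^n}\left(1 - \alpha^{p^n(p-1)}\right)}\right| + o(1) \tag{29}$$

Finally

$$C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left|\frac{1}{1-\alpha^{p^n(p-1)}}\right| + o(1) \tag{30}$$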

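If the convergence zone is reached, the significant digits in common between $I_n$ and $I_{n+1}$ are also common to the exact limit $I$, up to $\log_{10}\left|\frac{1}{1-\alpha^{p^n(p-1)}}\right|$.

If $0 < |\alpha| \le M_n$, with $M_n = \left(\frac{9}{10}\right)^{\frac{1}{p^n(p-1)}}$, then $0 < \log_{10}\left|\frac{1}{1-\alpha^{p^n(p-1)}}\right| \le 1$. The significant digits common to $I_n$ and $I_{n+1}$ are also common to $I$, up to one. As the number $n$ of iterations increases, $M_n$ also increases and the condition that $\alpha$ must satisfy in order to have $\log_{10}\left|\frac{1}{1-\alpha^{p^n(p-1)}}\right| \le 1$ becomes less and less strict. For example, if the sequence $(I_n)$ has a quadratic convergence, which is characterized by $p = 2$, then $M_1 > 0.94$ and $M_5 > 0.99$. Similarly, as $p$ increases, the speed of convergence increases and $M_n$ also increases.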

Remark 8 If the convergence zone is reached, the significant bits in common between $I_n$ and $I_{n+1}$ are also common to the exact limit $I$, up to $\log_{2}\left|\frac{1}{1-\alpha^{p^n(p-1)}}\right|$.

If $0 < |\alpha| \le 2^{\frac{1}{p^n(1-p)}}$, then $0 < \log_{2}\left|\frac{1}{1-\alpha^{p^n(p-1)}}\right| \le 1$. This condition on $\alpha$ is easily satisfied. Indeed in the case of a quadratic convergence (i.e. for $p = 2$), if $n = 5$, $2^{\frac{1}{p^n(1-p)}} > 0.97$.
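The thresholds quoted above are easy to reproduce. The sketch below only evaluates the two closed-form bounds for a quadratic convergence; the function names `M` and `M_bits` are hypothetical.

```python
import math

def M(n: int, p: float) -> float:
    """Largest |alpha| for which the correction term is at most one decimal digit."""
    return (9.0 / 10.0) ** (1.0 / (p**n * (p - 1.0)))

def M_bits(n: int, p: float) -> float:
    """Largest |alpha| for which the correction term is at most one bit (Remark 8)."""
    return 2.0 ** (1.0 / (p**n * (1.0 - p)))

def correction(alpha: float, n: int, p: float) -> float:
    """log10 |1 / (1 - alpha**(p**n * (p - 1)))|, the term of Theorem 7."""
    return math.log10(abs(1.0 / (1.0 - alpha ** (p**n * (p - 1.0)))))

for n in (1, 5):
    print(n, M(n, 2), M_bits(n, 2), correction(M(n, 2), n, 2))
```

At $\alpha = M_n$ the correction term equals one decimal digit, which is the limiting case of the condition.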

The theoretical results presented in this section have been established by taking into account only the truncation error on two successive iterates of a sequence. However computed results are also affected by round-off error propagation. The next section describes how round-off errors can be estimated with a probabilistic approach in order to determine the exact significant digits of any computed result.

3 Stochastic approach of round-off errors

3.1 The CESTAC method

The CESTAC (Contrôle et Estimation Stochastique des Arrondis de Calculs) method, which has been developed by La Porte and Vignes [10,12,13], enables one to estimate the number of exact significant digits of any computed result. This method is based on a probabilistic approach of round-off errors using a random rounding mode defined below.

Definition 9 Each real number $x$, which is not a floating-point number, is bounded by two consecutive floating-point numbers: $X^-$ (rounded down) and $X^+$ (rounded up). The random rounding mode defines the floating-point number $X$ representing $x$ as being one of the two values $X^-$ or $X^+$ with the probability 1/2.
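The random rounding mode can be emulated for a single real number. The following Python sketch is only an illustration and makes several assumptions: the real $x$ is given exactly as a `Fraction`, the floating-point format is IEEE double precision, and the names `neighbours` and `random_round` are hypothetical. It requires Python 3.9 or later for `math.nextafter`.

```python
import math
import random
from fractions import Fraction

def neighbours(x: Fraction) -> tuple[float, float]:
    """Return (X_minus, X_plus), the consecutive doubles bounding the real x."""
    nearest = float(x)                      # correctly rounded conversion
    if Fraction(nearest) == x:              # x is itself a floating-point number
        return nearest, nearest
    if Fraction(nearest) < x:
        return nearest, math.nextafter(nearest, math.inf)
    return math.nextafter(nearest, -math.inf), nearest

def random_round(x: Fraction, rng: random.Random) -> float:
    """Random rounding mode: X_minus or X_plus, each with probability 1/2."""
    lo, hi = neighbours(x)
    return lo if rng.random() < 0.5 else hi

rng = random.Random(0)
x = Fraction(1, 10)                         # 1/10 is not representable in binary
lo, hi = neighbours(x)
draws = [random_round(x, rng) for _ in range(8)]
print(lo, hi, draws)
```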

With this random rounding mode, running the same code several times provides different results, due to different round-off errors.

It has been proved [2] that a computed result $R$ is modelled to the first order in $2^{-p}$ as:

$$R \approx Z = r + \sum_{i=1}^{n} g_i(d)\, 2^{-p} z_i \tag{31}$$

where $r$ is the exact result, $g_i(d)$ are coefficients depending exclusively on the data and on the code, $p$ is the number of bits in the mantissa and $z_i$ are independent uniformly distributed random variables on $[-1,1]$.

From equation (31), we deduce that:

(1) the mean value of the random variable $Z$ is the exact result $r$,

(2) under some assumptions, the distribution of $Z$ is a quasi-Gaussian distribution.

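Then by identifying $R$ and $Z$, i.e. by neglecting all the second order terms, Student's test can be used to determine the accuracy of $\bar{R}$. Thus from $N$ samples $R_i$, $i = 1, 2, \ldots, N$, the number of decimal significant digits common to $\bar{R}$ and $r$ can be estimated with the following equation.

$$C_{\bar{R}} = \log_{10}\left(\frac{\sqrt{N}\,\left|\bar{R}\right|}{\sigma\,\tau_\beta}\right), \tag{32}$$

where

$$\bar{R} = \frac{1}{N}\sum_{i=1}^{N} R_i \quad \text{and} \quad \sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(R_i - \bar{R}\right)^2. \tag{33}$$

$\tau_\beta$ is the value of Student's distribution for $N-1$ degrees of freedom and a probability level $1-\beta$.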

Thus the implementation of the CESTAC method in a code providing a result $R$ consists in:

- performing $N$ times this code with the random rounding mode, which is obtained by using randomly the rounding mode towards $-\infty$ or $+\infty$; we then obtain $N$ samples $R_i$ of $R$,
- choosing as the computed result the mean value $\bar{R}$ of $R_i$, $i = 1, \ldots, N$,
- estimating with equation (32) the number of exact decimal significant digits of $\bar{R}$.

In practice $N = 2$ or $N = 3$ and $\beta = 0.05$. Note that for $N = 2$, $\tau_\beta = 12.706$ and for $N = 3$, $\tau_\beta = 4.303$.
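The last two steps above can be sketched in a few lines. The following Python fragment is an illustration only, not the CADNA implementation: the function name `exact_digits`, the handling of the degenerate cases ($\sigma = 0$ or $\bar{R} = 0$) and the three sample values are assumptions. The samples are made-up numbers standing for three runs of the same code under random rounding.

```python
import math

# Student's t values for beta = 0.05 and N - 1 degrees of freedom.
TAU = {2: 12.706, 3: 4.303}

def exact_digits(samples):
    """Equation (32): decimal digits of the mean not affected by round-off."""
    N = len(samples)
    mean = sum(samples) / N
    sigma = math.sqrt(sum((r - mean) ** 2 for r in samples) / (N - 1))
    if sigma == 0.0:
        return math.inf        # assumption: identical samples, no visible round-off
    if mean == 0.0:
        return -math.inf       # assumption: a zero mean has no significant digit
    return math.log10(math.sqrt(N) * abs(mean) / (sigma * TAU[N]))

samples = [1.0000001, 0.9999999, 1.0000002]   # hypothetical N = 3 results
result = sum(samples) / len(samples)
digits = exact_digits(samples)
print(result, digits)
```

With these samples the estimate is between 6 and 7 decimal digits, which matches the position at which the three values start to differ.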

The hypotheses underlying this estimation are:

(1) the round-off errors $\alpha_i$ are independent, centered uniformly distributed random variables,

(2) the approximation to the first order in $2^{-p}$ is legitimate.

Concerning the first hypothesis, with the use of the random arithmetic, round-off errors $\alpha_i$ are random variables; however, in practice, they are not rigorously centered and in this case Student's test gives a biased estimation of the computed result. It has been proved [6] that, with a bias of a few $\sigma$, the error on the estimation of the number of exact significant digits of $\bar{R}$ is less than one decimal digit. Therefore even if the first hypothesis is not rigorously satisfied, the reliability of the estimation obtained with equation (32) is not altered if it is considered as exact up to one digit.

Concerning the second hypothesis, the approximation to the first order only concerns multiplications and divisions. Indeed the round-off error generated by an addition or a subtraction does not contain any term of higher order. It has been shown [2,4] that, if a computed result becomes insignificant, i.e. if the round-off error it contains is of the same order of magnitude as the result itself, then the first order approximation may not be legitimate. In practice the validation of the CESTAC method requires a dynamic control of multiplications and divisions, during the execution of the code. This leads to the synchronous implementation of the method, i.e. to the parallel computation of the $N$ samples $R_i$, and also to the concept of computational zero, also named informatical zero [11].

Definition 10 During the run of a code using the CESTAC method, an intermediate or a final result $R$ is a computational zero, denoted by @.0, if one of the two following conditions holds:

- $\forall i$, $R_i = 0$,
- $C_{\bar{R}} \le 0$.

Any computed result $R$ is a computational zero if either $R = 0$, $R$ being significant, or $R$ is insignificant. A computational zero is a value that cannot be differentiated from the mathematical zero because of its round-off error.

From the synchronous implementation of the CESTAC method and the concept of computational zero, stochastic arithmetic [4,7,13] has been defined. Two types of stochastic arithmetic actually exist: it can be either continuous or discrete.

3.2.1 Continuous stochastic arithmetic

Continuous stochastic arithmetic is a modelling of the synchronous implementation of the CESTAC method. By using this implementation, so that the $N$ runs of a code take place in parallel, the $N$ results of each arithmetical operation can be considered as realizations of a Gaussian random variable centered on the exact result. One can therefore define a new number, called stochastic number, and a new arithmetic, called (continuous) stochastic arithmetic, applied to these numbers. An equality concept and order relations, which take into account the number of exact significant digits of stochastic operands, have also been defined.

A stochastic number $X$ is denoted by $(m, \sigma^2)$, where $m$ is the mean value of $X$ and $\sigma$ its standard deviation. Stochastic arithmetical operations ($s+$, $s-$, $s\times$, $s/$) correspond to terms to the first order in $\frac{\sigma}{m}$ of operations between two independent Gaussian random variables.

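Definition 11 Let $X_1 = (m_1, \sigma_1^2)$ and $X_2 = (m_2, \sigma_2^2)$. Stochastic arithmetical operations on $X_1$ and $X_2$ are defined as:

$$X_1 \; s+ \; X_2 = \left(m_1 + m_2,\ \sigma_1^2 + \sigma_2^2\right) \tag{34}$$

$$X_1 \; s- \; X_2 = \left(m_1 - m_2,\ \sigma_1^2 + \sigma_2^2\right) \tag{35}$$

$$X_1 \; s\times \; X_2 = \left(m_1 m_2,\ m_2^2\sigma_1^2 + m_1^2\sigma_2^2\right) \tag{36}$$

$$X_1 \; s/ \; X_2 = \left(\frac{m_1}{m_2},\ \left(\frac{\sigma_1}{m_2}\right)^2 + \left(\frac{m_1\sigma_2}{m_2^2}\right)^2\right) \quad \text{with } m_2 \neq 0. \tag{37}$$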
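The four operations of Definition 11 translate directly into code. The sketch below is a minimal illustration, assuming a stochastic number is stored as a pair (mean, variance); the class name `Stochastic` and its field names are hypothetical and do not correspond to any existing library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stochastic:
    """Stochastic number (m, sigma^2): mean value m and variance var."""
    m: float
    var: float

    def __add__(self, other):                      # equation (34)
        return Stochastic(self.m + other.m, self.var + other.var)

    def __sub__(self, other):                      # equation (35)
        return Stochastic(self.m - other.m, self.var + other.var)

    def __mul__(self, other):                      # equation (36)
        return Stochastic(self.m * other.m,
                          other.m**2 * self.var + self.m**2 * other.var)

    def __truediv__(self, other):                  # equation (37)
        if other.m == 0.0:
            raise ZeroDivisionError("s/ requires m2 != 0")
        return Stochastic(self.m / other.m,
                          self.var / other.m**2
                          + self.m**2 * other.var / other.m**4)

x1 = Stochastic(2.0, 1e-12)
x2 = Stochastic(4.0, 4e-12)
print(x1 + x2, x1 - x2, x1 * x2, x1 / x2)
```

Note that the variances add for both $s+$ and $s-$: the subtraction of two close values keeps the absolute uncertainty while the mean shrinks, which is how cancellation shows up in this model.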

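An accuracy can be associated to any stochastic number. If $X = (m, \sigma^2)$, $\lambda_\beta$ exists (depending only on $\beta$) such that

$$P\left(X \in [m - \lambda_\beta\sigma,\ m + \lambda_\beta\sigma]\right) = 1 - \beta; \tag{38}$$

$I_{\beta,X} = [m - \lambda_\beta\sigma,\ m + \lambda_\beta\sigma]$ is the confidence interval of $m$ at $1-\beta$. The number of decimal significant digits common to all the elements of $I_{\beta,X}$ and to $m$ is lower bounded by

$$C_{\beta,X} = \log_{10}\left(\frac{|m|}{\lambda_\beta\,\sigma}\right). \tag{39}$$

The following definition is the modelling of the concept of computational zero, previously introduced.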

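Definition 12 A stochastic number $X$ is a stochastic zero, denoted by $\underline{0}$, if and only if

$$C_{\beta,X} \le 0 \quad \text{or} \quad X = (0,0).$$

In accordance with the concept of stochastic zero, a new equality concept and new order relations have been defined.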

Definition 13 Let $X_1 = (m_1, \sigma_1^2)$ and $X_2 = (m_2, \sigma_2^2)$ be two stochastic numbers.

Stochastic equality, denoted by $s=$, is defined as:

$X_1 \; s= \; X_2$ if and only if $X_1 \; s- \; X_2 = \underline{0}$.

Stochastic inequalities, denoted by $s>$ and $s\ge$, are defined as:

$X_1 \; s> \; X_2$ if and only if $m_1 > m_2$ and $X_1 \; s\neq \; X_2$,

$X_1 \; s\ge \; X_2$ if and only if $m_1 \ge m_2$ or $X_1 \; s= \; X_2$.
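The accuracy bound and the stochastic relations can be sketched as follows. This is an illustration under stated assumptions: stochastic numbers are plain (mean, variance) pairs, the value of $\lambda_\beta$ is taken as the two-sided Gaussian quantile for $\beta = 0.05$, and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def lambda_beta(beta: float = 0.05) -> float:
    """Two-sided Gaussian quantile: P(|X - m| <= lambda * sigma) = 1 - beta."""
    return NormalDist().inv_cdf(1.0 - beta / 2.0)

def accuracy(m, var, beta=0.05):
    """Lower bound on the number of decimal significant digits of (m, var)."""
    sigma = math.sqrt(var)
    if m == 0.0:
        return -math.inf
    if sigma == 0.0:
        return math.inf
    return math.log10(abs(m) / (lambda_beta(beta) * sigma))

def is_stochastic_zero(m, var, beta=0.05):
    return (m == 0.0 and var == 0.0) or accuracy(m, var, beta) <= 0.0

def s_eq(x1, x2, beta=0.05):
    """X1 s= X2: the stochastic difference of X1 and X2 is a stochastic zero."""
    (m1, v1), (m2, v2) = x1, x2
    return is_stochastic_zero(m1 - m2, v1 + v2, beta)

def s_gt(x1, x2, beta=0.05):
    return x1[0] > x2[0] and not s_eq(x1, x2, beta)

def s_ge(x1, x2, beta=0.05):
    return x1[0] >= x2[0] or s_eq(x1, x2, beta)

a = (1.0, 1e-14)          # standard deviation 1e-7
b = (1.0000001, 1e-14)    # differs from a by about one standard deviation
c = (1.001, 1e-14)        # differs from a far beyond the uncertainty
print(accuracy(*a), s_eq(a, b), s_gt(b, a), s_ge(a, b), s_gt(c, a))
```

Here `b` has a larger mean than `a`, yet `s_gt(b, a)` is false because the difference is not significant; `c` is significantly greater than `a`.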

Continuous stochastic arithmetic is a modelling of the computer arithmetic, which takes into account round-off errors. The properties of continuous stochastic arithmetic [3,4] have pointed out the theoretical differences between the approximative arithmetic of a computer and exact arithmetic.

3.2.2 Discrete Stochastic Arithmetic

Discrete Stochastic Arithmetic (DSA) has been defined from the synchronous implementation of the CESTAC method. With DSA, a real number becomes an $N$-dimensional set and any operation on these $N$-dimensional sets is performed element per element using the random rounding mode. The number of exact significant digits of such an $N$-dimensional set can be estimated from equation (32). From the concept of computational zero previously introduced, an equality concept and order relations have been defined for DSA.

Definition 14 Let $X$ and $Y$ be $N$-samples provided by the CESTAC method.

Discrete stochastic equality, denoted by $ds=$, is defined as:

$X \; ds= \; Y$ if and only if $X - Y = @.0$.

Discrete stochastic inequalities, denoted by $ds>$ and $ds\ge$, are defined as:

$X \; ds> \; Y$ if and only if $\bar{X} > \bar{Y}$ and $X \; ds\neq \; Y$,

$X \; ds\ge \; Y$ if and only if $\bar{X} \ge \bar{Y}$ or $X \; ds= \; Y$.

Order relations in DSA are essential to control branching statements. Because of round-off errors, if $A$ and $B$ are two floating-point numbers and $a$ and $b$ the corresponding exact values,

$$a > b \nRightarrow A > B \quad \text{and} \quad A > B \nRightarrow a > b.$$
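The discrete relations can be sketched on $N$-samples stored as tuples. The fragment below is an illustration only, not the CADNA implementation: the function names, the handling of a zero standard deviation and the sample values are assumptions, and the inequalities compare the sample means.

```python
import math

TAU = {2: 12.706, 3: 4.303}   # Student's t, beta = 0.05, N - 1 degrees of freedom

def sample_mean(X):
    return sum(X) / len(X)

def exact_digits(samples):
    """Estimate of the number of exact decimal digits of the mean of an N-sample."""
    N = len(samples)
    mean = sample_mean(samples)
    sigma = math.sqrt(sum((r - mean) ** 2 for r in samples) / (N - 1))
    if sigma == 0.0:
        return math.inf       # assumption: identical samples
    if mean == 0.0:
        return -math.inf
    return math.log10(math.sqrt(N) * abs(mean) / (sigma * TAU[N]))

def is_computational_zero(samples):
    """All samples are zero, or the mean has no exact significant digit."""
    return all(r == 0.0 for r in samples) or exact_digits(samples) <= 0.0

def ds_eq(X, Y):
    return is_computational_zero([x - y for x, y in zip(X, Y)])

def ds_gt(X, Y):
    return sample_mean(X) > sample_mean(Y) and not ds_eq(X, Y)

def ds_ge(X, Y):
    return sample_mean(X) >= sample_mean(Y) or ds_eq(X, Y)

X = (1.0000001, 0.9999999, 1.0)          # hypothetical 3-samples
Y = (1.0, 1.0000001, 0.9999999)
Z = (2.0, 2.0000001, 1.9999999)
print(X[0] > Y[0], ds_eq(X, Y), ds_gt(X, Y), ds_ge(X, Y), ds_gt(Z, X))
```

The first sample of `X` is greater than the first sample of `Y`, so a classical test on one run would take the "greater" branch, whereas the discrete stochastic relations report that the two values cannot be distinguished.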

Such discrepancies can cause, for example, unsatisfied stopping criteria or infinite loops in algorithmic geometry. Taking into account the numerical quality of the operands in order relations makes it possible to partially solve these problems [3].

Therefore DSA makes it possible to estimate the impact of round-off errors on any result of a scientific code and also to check that no anomaly occurred during the run, especially in branching statements. DSA is implemented in the CADNA library¹.

The accuracy of a stochastic number can be related to the number of exact significant digits of an $N$-sample provided by the CESTAC method. Indeed, when $N$ is a small value (2 or 3), which is the case in practice, the values obtained with equations (32) and (39) are very close. They represent in a computed result the number of significant digits which are not affected by round-off errors. So the two types of stochastic arithmetic are coherent. Properties established in the theoretical framework of continuous stochastic arithmetic can be applied on a computer via the practical use of DSA.

3.3 Theoretical results on stochastic operations

The theoretical results presented here have been established in continuous stochastic arithmetic. They enable one to compare results of arithmetical stochastic operations with those provided by the corresponding classical operations performed on exact values.

Let us consider a numerical method which aims to approximate an exact value $x_1$. This method may consist for example in computing an iterate of a sequence $(u_n)$ such that $\lim_{n\to\infty} u_n = x_1$. Even using an arithmetic with infinite precision, the value obtained is not $x_1$, but an approximation which is affected by a truncation error. We compare here the results obtained using such numerical methods in stochastic arithmetic with the exact values they approximate.

Theorem 15 Let $X_1 = (m_1, \sigma_1^2)$ be the approximation of an exact value $x_1$ in stochastic arithmetic. Let us assume that the exact significant bits of $X_1$, i.e. not affected by round-off errors, are in common with $x_1$, up to $p$: the number of significant bits of $X_1$ in common with $x_1$ is lower bounded by $\log_2\left(\frac{|m_1|}{\lambda_\beta\sigma_1}\right) - p$.

Similarly let $X_2 = (m_2, \sigma_2^2)$ be an approximation obtained in stochastic arithmetic of an exact value $x_2$, such that its exact significant bits are in common with $x_2$, up to $q$.

Let $\circ$ be an exact arithmetical operator: $\circ \in \{+, -, \times, /\}$ and $s\circ$ the corresponding stochastic operator: $s\circ \in \{s+, s-, s\times, s/\}$.

Then the exact significant bits of $X_1 \; s\circ \; X_2$ are in common with the exact value $x_1 \circ x_2$, up to $\max(p,q)$.

¹ URL address: http://www.lip6.fr/cadna/

PROOF. From equation (39), the number of exact significant bits of $X_1$, i.e. not affected by round-off errors, is lower bounded by $\log_2\left(\frac{|m_1|}{\lambda_\beta\sigma_1}\right)$. As the number of significant bits of $X_1$ in common with the exact value $x_1$ is lower bounded by $\log_2\left(\frac{|m_1|}{\lambda_\beta\sigma_1}\right) - p = \log_2\left(\frac{|m_1|}{\lambda_\beta 2^p\sigma_1}\right)$, to take into account both the truncation error and the round-off error on $X_1$, one has to consider not the variance $\sigma_1^2$, but $(2^p\sigma_1)^2$.

Similarly the number of significant bits of $X_2$ in common with the exact value $x_2$ is lower bounded by $\log_2\left(\frac{|m_2|}{\lambda_\beta\sigma_2}\right) - q = \log_2\left(\frac{|m_2|}{\lambda_\beta 2^q\sigma_2}\right)$.

From equations (34) and (39), the number of exact significant bits of $X_1 \; s+ \; X_2$ is lower bounded by $\log_2\left(\frac{|m_1+m_2|}{\lambda_\beta\sqrt{\sigma_1^2+\sigma_2^2}}\right)$. To take into account both the truncation error and the round-off error on $X_1 \; s+ \; X_2$, one has to consider not the variance $\sigma_1^2+\sigma_2^2$, but $(2^p\sigma_1)^2 + (2^q\sigma_2)^2$. Therefore a lower bound for the number of significant bits of $X_1 \; s+ \; X_2$ in common with the exact value $x_1 + x_2$ is $\log_2\left(\frac{|m_1+m_2|}{\lambda_\beta\sqrt{(2^p\sigma_1)^2+(2^q\sigma_2)^2}}\right)$, which can be itself lower bounded by $\log_2\left(\frac{|m_1+m_2|}{\lambda_\beta\sqrt{\sigma_1^2+\sigma_2^2}}\right) - \max(p,q)$. Then the exact significant bits of $X_1 \; s+ \; X_2$ are in common with $x_1 + x_2$, up to $\max(p,q)$.

As $X_1 \; s- \; X_2 = (m_1 - m_2,\ \sigma_1^2+\sigma_2^2)$, the proof for the subtraction is similar to the one for the addition.

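From equations (36) and (39), the number of exact significant bits of $X_1 \; s\times \; X_2$ is lower bounded by $\log_2\left(\frac{|m_1 m_2|}{\lambda_\beta\sqrt{m_2^2\sigma_1^2+m_1^2\sigma_2^2}}\right)$. To take into account both the truncation error and the round-off error on $X_1 \; s\times \; X_2$, one has to consider not the variance $m_2^2\sigma_1^2+m_1^2\sigma_2^2$, but $2^{2p}m_2^2\sigma_1^2 + 2^{2q}m_1^2\sigma_2^2$. Therefore a lower bound for the number of significant bits of $X_1 \; s\times \; X_2$ in common with the exact value $x_1 \times x_2$ is

$$\log_2\left(\frac{|m_1 m_2|}{\lambda_\beta\sqrt{2^{2p}m_2^2\sigma_1^2 + 2^{2q}m_1^2\sigma_2^2}}\right) \ge \log_2\left(\frac{|m_1 m_2|}{\lambda_\beta\sqrt{m_2^2\sigma_1^2+m_1^2\sigma_2^2}}\right) - \max(p,q).$$

Then the exact significant bits of $X_1 \; s\times \; X_2$ are in common with $x_1 \times x_2$, up to $\max(p,q)$.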

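From equations (37) and (39), the number of exact significant bits of $X_1 \; s/ \; X_2$ is lower bounded by

$$\log_2\left(\frac{\left|\frac{m_1}{m_2}\right|}{\lambda_\beta\sqrt{\left(\frac{\sigma_1}{m_2}\right)^2 + \left(\frac{m_1\sigma_2}{m_2^2}\right)^2}}\right).$$

To take into account both the truncation error and the round-off error on $X_1 \; s/ \; X_2$, one has to consider not the variance $\left(\frac{\sigma_1}{m_2}\right)^2 + \left(\frac{m_1\sigma_2}{m_2^2}\right)^2$, but $\left(\frac{2^p\sigma_1}{m_2}\right)^2 + \left(\frac{2^q m_1\sigma_2}{m_2^2}\right)^2$. Therefore a lower bound for the number of significant bits of $X_1 \; s/ \; X_2$ in common with the exact value $x_1/x_2$ is

$$\log_2\left(\frac{\left|\frac{m_1}{m_2}\right|}{\lambda_\beta\sqrt{\left(\frac{2^p\sigma_1}{m_2}\right)^2 + \left(\frac{2^q m_1\sigma_2}{m_2^2}\right)^2}}\right) \ge \log_2\left(\frac{\left|\frac{m_1}{m_2}\right|}{\lambda_\beta\sqrt{\left(\frac{\sigma_1}{m_2}\right)^2 + \left(\frac{m_1\sigma_2}{m_2^2}\right)^2}}\right) - \max(p,q).$$

Then the exact significant bits of $X_1 \; s/ \; X_2$ are in common with $x_1/x_2$, up to $\max(p,q)$.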
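The lower bound by $\max(p,q)$ used for each operation rests on one elementary inequality. It is spelled out here for two non-negative terms $s_1$, $s_2$ and any $A > 0$, which are generic symbols introduced only for this sketch, with $r = \max(p,q)$ and assuming $p, q \ge 0$:

```latex
\begin{align*}
(2^{p} s_1)^2 + (2^{q} s_2)^2 &\le 2^{2r}\left(s_1^2 + s_2^2\right),
  \qquad r = \max(p,q),\\[4pt]
\log_2\frac{A}{\sqrt{(2^{p} s_1)^2 + (2^{q} s_2)^2}}
  &\ge \log_2\frac{A}{2^{r}\sqrt{s_1^2 + s_2^2}}
   = \log_2\frac{A}{\sqrt{s_1^2 + s_2^2}} - r .
\end{align*}
```

Taking $s_1 = \sigma_1$, $s_2 = \sigma_2$ gives the case of the addition; $s_1 = |m_2|\sigma_1$, $s_2 = |m_1|\sigma_2$ gives the multiplication, and $s_1 = \sigma_1/|m_2|$, $s_2 = |m_1|\sigma_2/m_2^2$ the division.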

Theorem 15 enables one to control arithmetical operations performed on computed results of numerical methods. This theorem has been proved for stochastic arithmetical operations, which are a modelling of the operations performed in the synchronous implementation of the CESTAC method. In practice, theorem 15 is used, according to 3.2.2, for results obtained in DSA. In the next section, we present, in accordance with theorem 15 and the theoretical results presented in section 2, a strategy to dynamically control converging sequences computed in DSA.

4 A strategy for a dynamical control of converging sequences

When a numerical algorithm requires the evaluation of the limit of a sequence, this limit is approximated by one of the iterates. As the number of iterations increases, the truncation error usually decreases, but the round-off error increases. Therefore the choice of the optimal iterate may be problematic.

DSA enables one to estimate the number of exact significant digits of any computed result, i.e. its significant digits which are not affected by round-off error propagation. Let us consider the computation of a sequence $(I_n)$ in DSA and let us assume that the convergence zone is reached. If discrete stochastic equality is achieved for two successive iterates, i.e. $I_n - I_{n+1} = @.0$, the difference between $I_n$ and $I_{n+1}$ is only due to round-off errors and further iterations are useless. The optimal iterate $I_{n+1}$ can therefore be dynamically determined at run time. Furthermore, if the sequence $(I_n)$ converges at least linearly to $I$, from section 2, the exact significant digits of $I_{n+1}$ are in common with $I$, up to $k$ digits. The value $k$, which depends on the convergence speed of $(I_n)$, can be determined from theorem 5 or 7.
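The stopping strategy can be sketched with a toy emulation. The Python fragment below is not CADNA and makes strong simplifying assumptions, all of them hypothetical choices: random rounding is emulated in software on a 24-bit mantissa (function `rround`), the $N = 3$ samples are computed one after the other rather than synchronously, and the sequence is the trapezoidal rule applied to $e^x$ on $[0,1]$ with step halving. The loop stops as soon as the difference between two successive iterates is a computational zero.

```python
import math
import random

TAU = {2: 12.706, 3: 4.303}    # Student's t values for beta = 0.05
P = 24                         # emulated mantissa length (assumption)

def rround(x, rng, p=P):
    """Round x to a p-bit mantissa, up or down with probability 1/2."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                       # x = m * 2**e, 0.5 <= |m| < 1
    scaled = m * 2.0**p
    lo = math.floor(scaled)
    if lo == scaled:                           # already representable on p bits
        return x
    return math.ldexp(float(lo + (rng.random() < 0.5)), e - p)

def exact_digits(samples):
    N = len(samples)
    mean = sum(samples) / N
    sigma = math.sqrt(sum((r - mean) ** 2 for r in samples) / (N - 1))
    if sigma == 0.0:
        return math.inf
    if mean == 0.0:
        return -math.inf
    return math.log10(math.sqrt(N) * abs(mean) / (sigma * TAU[N]))

def is_computational_zero(samples):
    return all(r == 0.0 for r in samples) or exact_digits(samples) <= 0.0

def trapezoid_samples(f, a, b, n, rng, N=3):
    """N randomly rounded evaluations of the trapezoidal rule, step (b-a)/2**n."""
    h = (b - a) / 2**n
    out = []
    for _ in range(N):
        s = rround(0.5 * (f(a) + f(b)), rng)
        for i in range(1, 2**n):
            s = rround(s + rround(f(a + i * h), rng), rng)
        out.append(rround(s * h, rng))
    return out

def optimal_iterate(f, a, b, rng, max_n=16):
    """Halve the step until two successive iterates differ by a computational zero."""
    prev = trapezoid_samples(f, a, b, 1, rng)
    for n in range(2, max_n + 1):
        cur = trapezoid_samples(f, a, b, n, rng)
        if is_computational_zero([x - y for x, y in zip(prev, cur)]):
            return n, cur
        prev = cur
    raise RuntimeError("no insignificant difference found")

rng = random.Random(1)
n, samples = optimal_iterate(math.exp, 0.0, 1.0, rng)
value = sum(samples) / len(samples)
print(n, value, exact_digits(samples))
```

No tolerance is supplied by the user: the loop terminates when the truncation error has fallen to the level of the emulated round-off error, which is the idea behind the strategy of this section.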

This strategy can be applied to the computation of integrals using the trapezoidal or Simpson's rule with the technique of step halving previously described. If the convergence zone is reached and computations are performed until the difference between two successive iterates is insignificant, then, from section 2, the exact significant bits of the last iterate are in common with the exact value of the integral, up to one.

More generally, if a sequence $(I_n)$ converging at least linearly to $I$ is computed using DSA, the optimal iterate can be dynamically determined and the number of significant digits it has in common with the exact limit $I$ can be evaluated. If operations on limits of sequences are required in a numerical algorithm, a similar strategy, based on the following theorem, can be used.

Theorem 16 Let us consider the computation in DSA of two sequences $(I_k)$ and $(J_k)$ converging at least linearly to $I$ and $J$ respectively.

Let $I_n$ (respectively $J_m$) be an iterate such that its exact significant bits are in common with $I$ up to $p$ (respectively $J$ up to $q$).

If we denote by $\circ$ an exact arithmetical operator, then the exact significant bits of $I_n \circ J_m$ are in common with the exact value $I \circ J$, up to $\max(p,q)$.

PROOF. From section 2, as the sequence $(I_k)$ converges at least linearly to $I$, if it is computed until the difference between two successive iterates is insignificant, i.e. $I_{n-1} - I_n = @.0$, then we can determine the value $p$ such that the exact significant bits of $I_n$ are in common with $I$, up to $p$. Similarly if the sequence $(J_k)$ is computed until $J_{m-1} - J_m = @.0$, then we can determine the value $q$ such that the exact significant bits of $J_m$ are in common with $J$, up to $q$. According to the application of theorem 15 in DSA, if an arithmetical operation is performed on $I_n$ and $J_m$, the exact significant bits of the result are those obtained with the same operation performed on $I$ and $J$, up to $\max(p,q)$.

Remark 17 According to section 2, if the convergence of the sequences $(I_k)$ and $(J_k)$ is sufficiently fast, then $p = q = 1$. In this case, the exact significant bits of the result obtained are those provided by the same operation on the limits, up to one.

More generally, in a numerical algorithm involving the computation of several sequences, if each sequence is computed until the difference between two successive iterates is insignificant, each limit is approximated by the optimal iterate. According to section 2, if each sequence converges at least linearly, we can evaluate the number of significant digits common between the limit and its approximation. If arithmetical operations are performed on these approximations, we can likewise determine which exact significant digits of the result are common with the result of the same operations performed on the limits.

5 Dynamical control of combined sequences

This section shows how to approximate the limit of a sequence by its optimal iterate, this iterate being itself the limit of another sequence. The theorems presented in sections 2 and 3 can be combined to determine the number of digits of the approximation obtained which are in common with the exact result. In the strategies described in this section, small letters denote exact values and capital letters the corresponding approximations computed using DSA.

5.1 A strategy to compute combined sequences

We consider a sequence in which each term $u_m$ is the limit of another sequence. More precisely, let $(u_m)$ be a sequence converging at least linearly to $u$ and, for all $m$, let $(u_{m,n})$ be a sequence converging at least linearly to $u_m$.

For all $m$, let $U_m$ be the approximation of $u_m$ computed using DSA. $U_m$ is obtained by computing the sequence $(u_{m,n})$ until, in the convergence zone, the difference between two successive iterates is insignificant.

As for all $m$, the sequence $(u_{m,n})$ converges at least linearly to $u_m$, according to section 2, one can determine the value $q$ such that the exact significant bits of $U_m$ are common to $u_m$, up to $q$.

Figure 1 represents the significant bits of $U_m$ and $U_{m+1}$ if the difference $U_m - U_{m+1}$ is insignificant. In this case, the exact significant bits of $U_{m+1}$ are common to $U_m$ and are also common to $u_m$ and $u_{m+1}$, up to $q$.

As the sequence $(u_m)$ converges at least linearly to $u$, one can determine the value $p$ such that the bits common to $u_m$ and $u_{m+1}$ are common with $u$, up to $p$.

Consequently if the difference $U_m - U_{m+1}$ is insignificant, the exact significant bits of $U_{m+1}$ are common with $u$, up to $p+q$.

Fig. 1. Significant bits of $U_m$ and $U_{m+1}$. (Diagram labels: bits of $U_m$ common with $u_m$; bits of $U_{m+1}$ common with $u_{m+1}$; bits common with $u$; $p$ bits; $q$ bits; significant bits not affected by round-off errors; significant bits not affected by round-off errors and common to $U_m$ and $U_{m+1}$.)

5.2 Dynamical control of integrals on an infinite domain

Let us consider the computation of an improper integral $g = \int_0^{\infty} \phi(x)\,dx$. The infinite interval of integration is partitioned into finite intervals of length $L$.

Let $f_j = \int_{jL}^{(j+1)L} \phi(x)\,dx$ and $g_m = \sum_{j=0}^{m} f_j$, $\lim_{m\to\infty} g_m = g$.

$g$ can be numerically approximated by an iterate $g_m$, $m$ being sufficiently high. The optimal number of iterates to compute can be determined dynamically using DSA.

Let $F_{j,n}$ be the approximation of $f_j$ computed using the trapezoidal or Simpson's rule with step $\frac{L}{2^n}$. For all $j$, the sequence $(F_{j,n})$ is computed until the difference between two successive iterates is insignificant. This is not achieved at the same iteration for all values of $j$. Let $n_j$ be the iteration at which $F_{j,n_j-1} - F_{j,n_j} = @.0$.

According to section 2, for all $j$, the exact significant bits of $F_{j,n_j}$ are in common with $f_j$, up to one. Let $G_m = \sum_{j=0}^{m} F_{j,n_j}$. According to theorem 16, the exact significant bits of $G_m$ are in common with $g_m$, up to one.

Figure 2 represents the significant bits of $G_m$ and $G_{m+1}$ if the difference $G_m - G_{m+1}$ is insignificant. In this case, the exact significant bits of $G_{m+1}$ are common to $G_m$ and are also common to $g_m$ and $g_{m+1}$, up to one.

We assume that the sequence $(g_m)$ converges at least linearly to $g$. According to section 2, if the convergence zone is reached, $C_{g_m,g_{m+1}}$ and $C_{g_m,g}$ differ by a term which represents $p$ bits. Therefore the bits common to $g_m$ and $g_{m+1}$ are common with $g$, up to $p$.

Fig. 2. Significant bits of $G_m$ and $G_{m+1}$. (Diagram labels: bits of $G_m$ common with $g_m$; bits of $G_{m+1}$ common with $g_{m+1}$; bits common with $g$; $p$ bits; significant bits not affected by round-off errors; significant bits not affected by round-off errors and common to $G_m$ and $G_{m+1}$.)

Consequently if the difference $G_m - G_{m+1}$ is insignificant, the exact significant bits of $G_{m+1}$ are common with $g$, up to $p+1$.
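The number $p$ of bits can be made concrete on an integrand for which the partial sums are known in closed form. The sketch below is an illustration under stated assumptions: the integrand is taken to be $e^{-ax}$, the values $a = 1$, $L = 1$ and $m = 15$ are arbitrary choices, and the exact partial sums are used, so neither quadrature nor round-off errors are involved. In that case $g_m - g = -\frac{1}{a}e^{-a(m+1)L}$, so $(g_m)$ converges linearly with ratio $e^{-aL}$.

```python
import math

def common_digits(x, y):
    """C_{x,y} of Definition 3."""
    return math.inf if x == y else math.log10(abs((x + y) / (2.0 * (x - y))))

a, L = 1.0, 1.0                  # illustrative values, not taken from the text
g = 1.0 / a                      # exact value of the improper integral
ratio = math.exp(-a * L)         # ratio of the linear convergence of (g_m)

def g_m(m):
    """Exact partial sum: integral of exp(-a x) over [0, (m + 1) L]."""
    return (1.0 - math.exp(-a * (m + 1) * L)) / a

m = 15
gap = common_digits(g_m(m), g_m(m + 1)) - common_digits(g_m(m), g)
predicted = math.log10(1.0 / (1.0 - ratio))
p_bits = math.log2(1.0 / (1.0 - ratio))
print(gap, predicted, p_bits)
```

With these values the correction is below one bit; a shorter interval length $L$ brings the ratio closer to 1 and increases $p$.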

6 Numerical experiments

Numerical experiments have been carried out using DSA implemented in the CADNA library. Two examples are presented: the computation of a definite integral and the computation of an integral on an infinite interval.

6.1 Computation of a definite integral

Let us consider the integral $I = \int_0^1 \frac{6x^3 - 15x^2 - 28x + 22}{9x^2 + 12x + 4}\,dx = 1$.
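The value $I = 1$ can be checked by hand. The short derivation below takes the integrand to be $(6x^3 - 15x^2 - 28x + 22)/(9x^2 + 12x + 4)$; the minus signs are an assumption, retained because they reproduce the stated value. It uses $9x^2 + 12x + 4 = (3x+2)^2$ and a polynomial division.

```latex
\begin{align*}
\frac{6x^3 - 15x^2 - 28x + 22}{9x^2 + 12x + 4}
  &= \frac{2}{3}x - \frac{23}{9} + \frac{290}{9\,(3x+2)^2}, \\
I &= \left[ \frac{x^2}{3} - \frac{23}{9}x - \frac{290}{27\,(3x+2)} \right]_0^1
   = \frac{1}{3} - \frac{23}{9} - \frac{58}{27} + \frac{145}{27}
   = 1 .
\end{align*}
```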

$I$ has been estimated with the trapezoidal and Simpson's rules using the strategy described in section 2. Approximations $I_n$ have been computed with step $\frac{1}{2^n}$ until the difference $I_n - I_{n+1}$ is insignificant. From section 2, we can guarantee that the exact significant bits of the last iterate $I_N$ are in common with the exact value of $I$, up to one.
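The step-halving loop can be illustrated with a minimal Python sketch. It assumes the integrand $(6x^3 - 15x^2 - 28x + 22)/(9x^2 + 12x + 4)$ on $[0, 1]$ and, since DSA is not available in plain Python, replaces the test "the difference is insignificant" with a relative tolerance `rel_tol` (an assumption, not the CADNA criterion). The function name `halve_until_stable` is hypothetical.

```python
import math


def integrand(x):
    """Integrand of I; its exact integral over [0, 1] is 1."""
    return (6 * x**3 - 15 * x**2 - 28 * x + 22) / (9 * x**2 + 12 * x + 4)


def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with 2**n subintervals."""
    k = 2 ** n
    h = (b - a) / k
    inner = math.fsum(f(a + i * h) for i in range(1, k))
    return h * (0.5 * (f(a) + f(b)) + inner)


def simpson(f, a, b, n):
    """Composite Simpson's rule with 2**n subintervals (n >= 1)."""
    k = 2 ** n
    h = (b - a) / k
    odd = math.fsum(f(a + i * h) for i in range(1, k, 2))
    even = math.fsum(f(a + i * h) for i in range(2, k, 2))
    return h / 3.0 * (f(a) + f(b) + 4.0 * odd + 2.0 * even)


def halve_until_stable(rule, f, a, b, rel_tol=1e-9, max_n=22):
    """Halve the step until two successive iterates agree to rel_tol.

    Returns (N, I_N), the index and value of the last iterate computed.
    """
    prev = rule(f, a, b, 1)
    for n in range(2, max_n + 1):
        cur = rule(f, a, b, n)
        # Assumption: tolerance test instead of the DSA computational zero.
        if abs(prev - cur) <= rel_tol * abs(cur):
            return n, cur
        prev = cur
    return max_n, prev


n_trap, i_trap = halve_until_stable(trapezoid, integrand, 0.0, 1.0)
n_simp, i_simp = halve_until_stable(simpson, integrand, 0.0, 1.0)
print("trapezoidal:", n_trap, i_trap)
print("Simpson:    ", n_simp, i_simp)
```

With a fixed tolerance the stopping indices depend on `rel_tol`, so they are not expected to coincide with those obtained with DSA; the sketch only shows the structure of the loop and the faster convergence of Simpson's rule.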

Table 1 presents for both rules the approximations of $I$ obtained in single and double precision. The number of exact significant digits of each result has been estimated using DSA. For each sequence, the exact significant digits of the last iterate are reported in table 1.

Table 1
Approximations of $I$

rule          in single precision       in double precision
trapezoidal   $I_9 = 0.10000E+01$       $I_{21} = 0.100000000000E+001$
Simpson       $I_8 = 0.100000E+01$      $I_{13} = 0.1000000000000E+001$

We can notice that the exact significant digits of each approximation obtained are in common with $I$. The number of iterations required for the stopping criterion to be satisfied depends of course on the precision chosen, but also on the quadrature method used. Whatever the precision is, fewer iterations are performed with Simpson's rule than with the trapezoidal rule. This is due to the different convergence speeds of the computed sequences. Indeed the approximation of $I$ is of order 2 with the trapezoidal rule and of order 4 with Simpson's rule. For each rule, the error on the last iterate $|I_N - I|$ is insignificant. Because of round-off error propagation, the computer cannot distinguish $I_N$ from $I$.

6.2 Computation of an improper integral

Let us consider the improper integral $g = \int_0^{\infty} e^{-ax}\,dx = \frac{1}{a}$, where $a > 0$.

$g$ has been estimated using the strategy described in 5.2. Using the same notations as in 5.2, let $g_m = \sum_{j=0}^{m} f_j$, where $f_j = \int_{jL}^{(j+1)L} e^{-ax}\,dx$. The approximations of the integrals $f_j$ are computed with Simpson's rule using DSA. For every $j$, a sequence is computed until the difference between two successive iterates is insignificant.

As $g_m - g = -\int_{(m+1)L}^{\infty} e^{-ax}\,dx = -\frac{\alpha^{m+1}}{a}$, where $\alpha = e^{-aL}$, the sequence $(g_m)$ converges linearly to $g$. Therefore theorem 5 applies: if the convergence zone is reached, the significant bits common to two successive iterates are also common to $g$, up to $\log_2\left(\frac{1}{1-\alpha}\right)$.

Let $G_m$ be the approximation of $g_m$ computed using DSA. The sequence $(G_m)$ is computed until the difference between two successive iterates is insignificant. We denote by $M$ the iteration at which $G_{M-1} - G_M = @.0$. According to section 5.2, the exact significant bits of $G_M$ are in common with $g$, up to $\log_2\left(\frac{1}{1-\alpha}\right) + 1$. Therefore the exact significant decimal digits of $G_M$ are in common with $g$ up to $\delta$, where $\delta = \log_{10}\left(\frac{2}{1-\alpha}\right)$.
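As a quick numerical illustration, the bound can be evaluated for given $a$ and $L$. The sketch below assumes the ratio of the linearly converging sequence is $e^{-aL}$ and that the bound is $\log_{10}(2/(1 - e^{-aL}))$; the function name `delta` is hypothetical, and `math.expm1` is used only to evaluate $1 - e^{-aL}$ accurately when $aL$ is small.

```python
import math


def delta(a, L):
    """Decimal-digit bound log10(2 / (1 - exp(-a*L))) for a > 0 and L > 0."""
    one_minus_ratio = -math.expm1(-a * L)  # equals 1 - exp(-a*L)
    return math.log10(2.0 / one_minus_ratio)


# Values of L used with a = 1, followed by values used with a = 1e-5.
for a, lengths in ((1.0, (1e-2, 1e-1, 1.0, 10.0, 50.0)),
                   (1e-5, (1e2, 1e3, 1e4, 1e5, 1e6))):
    for L in lengths:
        d = delta(a, L)
        print(f"a={a:g}  L={L:g}  delta={d:.1f}  ceil(delta)={math.ceil(d)}")
```

The bound depends on $a$ and $L$ only through the product $aL$: a short interval length gives a ratio close to 1, hence slow convergence and a larger number of digits that may differ from the limit.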

Table 2 presents for $a = 1$ and different values of $L$ the approximations $G_M$ obtained in double precision. The number of exact significant digits of $G_M$ not in common with $g$ is approximated by $\delta$. As the length $L$ increases, the number $M$ of integrals $f_j$ to be approximated decreases. Only the exact significant digits of $G_M$ are reported, i.e. those not affected by round-off error propagation. We notice that the number of exact significant digits obtained (from thirteen to fifteen) is satisfactory for computations carried out in double precision. The exact significant digits which are not in common with the exact value $g = 1$ can easily be identified. For example, if $L = 10^{-1}$, among the fourteen exact significant digits of $G_M$, the last two digits are not in common with $g$. We notice that, for every approximation $G_M$ reported in table 2, its exact significant digits are in common with $g$ up to $\lceil \delta \rceil$.

Table 2
Results obtained with Simpson's rule for $a = 1$

$L$         $\delta$   $M$     $G_M$
$10^{-2}$   2.3        2335    0.9999999999276E+000
$10^{-1}$   1.3        284     0.99999999999953E+000
$1$         0.5        33      0.999999999999996E+000
$10$        0.3        4       0.99999999999999E+000
$50$        0.3        2       0.10000000000004E+001

Table 3 presents for $a = 10^{-5}$ and different values of $L$ the exact significant digits of the approximations $G_M$ obtained in double precision. As in table 2, we notice that if the length $L$ increases, the number $M$ of integrals $f_j$ to be approximated decreases. For each approximation $G_M$ obtained, we can easily identify its exact significant digits which are in common with the exact value $g = 10^5$. As in table 2, we notice that the exact significant digits of $G_M$ are in common with $g$ up to $\lceil \delta \rceil$.

Table 3
Results obtained with Simpson's rule for $a = 10^{-5}$

$L$        $\delta$   $M$      $G_M$
$10^2$     3.3        19136    0.999999995109E+005
$10^3$     2.3        2346     0.9999999999352E+005
$10^4$     1.3        279      0.99999999999923E+005
$10^5$     0.5        33       0.999999999999995E+005
$10^6$     0.3        5        0.99999999999999E+005

7 Conclusion

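Discrete Stochastic Arithmetic can be used to dynamically determine the optimal iterate of a converging sequence. Furthermore, if the sequence converges at least linearly, the number of exact significant digits of this iterate which are not in common with the limit can be estimated. This number depends on the speed of convergence of the sequence.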

If an arithmetical operation is performed on the optimal iterates of two sequences, we can determine the significant digits of the computed result common with the exact result of the same operation performed on the two limits. This allows a dynamical control of numerical algorithms involving the computation of several sequences. Integrals on an infinite interval can be approximated by computing several converging sequences. By controlling dynamically each sequence, we can determine the significant digits of the approximation common with the exact value of the integral.

The sequences examined in this paper all converge to a scalar value. A possible extension of this work would be the numerical validation of sequences of vectors, involved for example in iterative methods for solving linear systems.

References

[1] R. L. Burden and J. D. Faires, Numerical analysis, 7th ed., Brooks-Cole Publishing, 2001.

[2] J.-M. Chesneaux, Study of the computing accuracy by using probabilistic approach, in: Contribution to computer arithmetic and self-validating numerical methods, C. Ullrich ed., IMACS, New Brunswick, NJ, 1990, pp. 19-30.

[3] J.-M. Chesneaux, The equality relations in scientific computing, Num. Algo. 7 (1994) 129-143.

[4] J.-M. Chesneaux, L'arithmétique stochastique et le logiciel CADNA, Habilitation à diriger des recherches, Université Pierre et Marie Curie, Paris, 1995.

[5] J.-M. Chesneaux and F. Jézéquel, Dynamical control of computations using the Trapezoidal and Simpson's rules, J. Univ. Comput. Sci. 4 (1) (1998) 2-10.

[6] J.-M. Chesneaux and J. Vignes, Sur la robustesse de la méthode CESTAC, C. R. Acad. Sci. Paris Sér. I Math. 307 (1988) 855-860.

[7] J.-M. Chesneaux and J. Vignes, Les fondements de l'arithmétique stochastique, C. R. Acad. Sci. Paris Sér. I Math. 315 (1992) 1435-1440.

[8] M. K. Jain, R. K. Jain and S. R. K. Iyengar, Numerical methods for scientific and engineering computation, Halsted Press, 1985.

[9] J. H. Mathews, Numerical methods for mathematics, science and engineering, 2nd ed., Prentice-Hall, 1992.

[10] [...] Processing 74, North-Holland, 1974.

[11] J. Vignes, Zéro mathématique et zéro informatique, C. R. Acad. Sci. Paris Sér. I Math. 303 (1986) 997-1000; also: La Vie des Sciences 4 (1) (1987) 1-13.

[12] J. Vignes, Estimation de la précision des résultats de logiciels numériques, La Vie des Sciences 7 (2) (1990) 93-145.

[13] J. Vignes, A stochastic arithmetic for reliable scientific computation, Math. Comput. Simulation 35 (1993) 233-261.

[14] J. Vignes, A stochastic approach to the analysis of round-off error propagation. A survey of the CESTAC method, in: Proc. 2nd Real Numbers and Computers conference, Marseille, France, 1996, pp. 233-251.