computation
Fabienne Jezequel
Laboratoire d'Informatique de Paris 6 - CNRS UMR7606,
4 place Jussieu, 75252 Paris cedex 05, France
Abstract
Under some assumptions on the speed of convergence of a sequence, the significant digits of one of its iterates in common with the exact limit can be determined by comparing this iterate with the next one. Using a finite precision arithmetic, if computations are performed until the difference between two successive iterates is insignificant, the global error on the last iterate is minimal. Furthermore, for sequences converging at least linearly, we can determine in the result obtained which exact significant digits, i.e. not affected by round-off errors, are in common with the exact limit. This strategy can be used for the computation of integrals with the trapezoidal or Simpson's rule. A sequence is then generated by halving the step value at each iteration, while the difference between two successive iterates is a significant value. The exact significant digits of the last iterate are in common with the exact value of the integral, up to one bit. This kind of strategy is then extended to numerical algorithms involving several sequences, such as the approximation of integrals on an infinite interval.
Key words: converging sequences, numerical validation, quadrature methods, trapezoidal rule, Simpson's rule, CESTAC method, Discrete Stochastic Arithmetic
1 Introduction
In a numerical method which involves the computation of a converging sequence, the limit is approximated by one of the iterates. It may be difficult to estimate in the chosen iterate the global error, consisting of the truncation error and the round-off error. The optimal iterate, i.e. the approximation for which the global error is minimal, can be computed dynamically [14]. In this paper, we show how to determine the significant digits of this iterate which are affected neither by the truncation error, nor by the round-off error. In section 2, we present theorems established from the truncation error which enable one to determine the significant digits of an iterate in common with the exact limit. As round-off errors must also be taken into account, in section 3, we briefly review methods and concepts which enable one to estimate round-off error propagation with a probabilistic approach: the CESTAC method, the principles of stochastic arithmetic and the implementation provided by Discrete Stochastic Arithmetic (DSA). We also present theoretical results established in stochastic arithmetic for the control of arithmetical operations. In section 4, we describe a strategy to control both the truncation and the round-off error during the computation of a converging sequence. More precisely, under some assumptions on the speed of convergence of the sequence, we can determine in the optimal approximation the exact significant digits, i.e. not affected by round-off errors, which are in common with the exact limit. In section 5, we show how the theorems established in the previous sections can be combined to control sequences in which each term is the limit of another sequence. We describe a strategy which can be used for the computation of improper integrals. The last section presents numerical experiments carried out using DSA.
Email address: Fabienne.Jezequel@lip6.fr (Fabienne Jezequel).
2 Theoretical results on converging sequences
2.1 Preliminary definitions
The theorems presented here have been established for sequences having a linear or an exponential convergence speed. Therefore we recall properties which characterize these two types of convergence speed.
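Definition 1 A sequence $(I_n)$ converges to $I$ with a linear speed if
\[ I_n - I = K\alpha^n + o(\alpha^n), \quad \text{where } K \in \mathbb{R} \text{ and } 0 < |\alpha| < 1. \]
With a sequence having a linear convergence, the number of iterations required to obtain an approximation of the limit with one more exact digit is quasi-constant.
Definition 2 A sequence $(I_n)$ converges to $I$ with an exponential speed if
\[ I_n - I = K\alpha^{p^n} + o\left(\alpha^{p^n}\right), \quad \text{where } K \in \mathbb{R},\ 0 < |\alpha| < 1 \text{ and } p > 1. \]
With a sequence having an exponential convergence, at each iteration the number of exact digits is quasi-multiplied by $p$.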
The theoretical results presented in this section require the notion of significant digits common to two real numbers. Therefore we need the following definition.
Definition 3 Let $a$ and $b$ be two real numbers. The number of significant digits that are common to $a$ and $b$ can be defined in $\mathbb{R}$ by
(1) for $a \neq b$, $C_{a,b} = \log_{10} \left| \frac{a+b}{2(a-b)} \right|$,
(2) $\forall a \in \mathbb{R}$, $C_{a,a} = +\infty$.
Then $|a-b| = \left| \frac{a+b}{2} \right| 10^{-C_{a,b}}$. For instance, if $C_{a,b} = 3$, the relative difference between $a$ and $b$ is of the order of $10^{-3}$, which means that $a$ and $b$ have three significant digits in common.
Remark 4 The value of $C_{a,b}$ can seem surprising if we consider the decimal notations of $a$ and $b$. For example, if $a = 2.4599976$ and $b = 2.4600012$, then $C_{a,b} \approx 5.8$. The difference due to the sequences of "0" or "9" is illusive. The significant decimal digits of $a$ and $b$ are really different from the sixth position.
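Definition 3 and the example of Remark 4 can be checked with a short sketch. The function name `common_digits` is an assumption introduced here for illustration; it is not taken from the paper or from any library.

```python
import math

def common_digits(a: float, b: float) -> float:
    """Number of significant decimal digits common to a and b (Definition 3)."""
    if a == b:
        return math.inf            # case (2): C_{a,a} = +infinity
    if a + b == 0.0:
        return -math.inf           # log10(0): no digit in common
    return math.log10(abs((a + b) / (2.0 * (a - b))))

# Values of Remark 4: the two numbers differ from the sixth digit on.
c = common_digits(2.4599976, 2.4600012)
print(f"C = {c:.2f}")
```

The printed value is close to 5.8, although the decimal notations of the two numbers share only two leading digits.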
2.2 On sequences with a linear convergence
Let us consider a sequence $(I_n)$ converging linearly to $I$. From the number of significant digits common to two successive iterates, $I_n$ and $I_{n+1}$, the following theorem enables one to determine the number of significant digits common to $I_n$ and the exact limit $I$.
Theorem 5 Let $(I_n)$ be a sequence converging linearly to $I$, i.e. which satisfies $I_n - I = K\alpha^n + o(\alpha^n)$ where $K \in \mathbb{R}$ and $0 < |\alpha| < 1$, then
\[ C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left(\frac{1}{1-\alpha}\right) + o(1). \]
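PROOF.
\[ I_n - I = K\alpha^n + o(\alpha^n) \tag{1} \]
By using the same formula for $I_{n+1}$, one obtains
\[ I_n - I_{n+1} = K\alpha^n(1-\alpha) + o(\alpha^n) \tag{2} \]
From equation (1), we deduce
\[ \frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^n\,(1+o(1))} \tag{3} \]
\[ \frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^n}\,(1+o(1)) \tag{4} \]
Therefore
\[ \frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^n} + o\left(\frac{1}{\alpha^n}\right) \tag{5} \]
Then
\[ \frac{I_n + I}{2(I_n - I)} = \frac{I_n}{I_n - I} - \frac{1}{2} = \frac{I_n}{K\alpha^n} + o\left(\frac{1}{\alpha^n}\right) \tag{6} \]
Similarly, from equation (2), we deduce
\[ \frac{I_n + I_{n+1}}{2(I_n - I_{n+1})} = \frac{I_n}{I_n - I_{n+1}} - \frac{1}{2} = \frac{I_n}{K\alpha^n}\,\frac{1}{1-\alpha} + o\left(\frac{1}{\alpha^n}\right) \tag{7} \]
From definition 3 and equation (6) we deduce
\[ C_{I_n,I} = \log_{10}\left| \frac{I_n}{K\alpha^n}\,(1+o(1)) \right| \tag{8} \]
\[ C_{I_n,I} = \log_{10}\left| \frac{I_n}{K\alpha^n} \right| + \log_{10}|1+o(1)| \tag{9} \]
Therefore
\[ C_{I_n,I} = \log_{10}\left| \frac{I_n}{K\alpha^n} \right| + o(1) \tag{10} \]
Similarly, from definition 3 and equation (7) we deduce
\[ C_{I_n,I_{n+1}} = \log_{10}\left| \frac{I_n}{K\alpha^n} \right| + \log_{10}\left(\frac{1}{1-\alpha}\right) + o(1) \tag{11} \]
Finally
\[ C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left(\frac{1}{1-\alpha}\right) + o(1) \tag{12} \]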
The convergence zone is reached when the term $o(1)$ becomes negligible. In this case, from the significant digits in common between $I_n$ and $I_{n+1}$, we can deduce the significant digits in common between $I_n$ and the exact limit $I$.
If $-1 < \alpha < 0$, then $-\log_{10} 2 < \log_{10}\left(\frac{1}{1-\alpha}\right) < 0$. In this case, if the convergence zone is reached, the significant digits in common between $I_n$ and $I_{n+1}$ are also in common with $I$.
$\forall \alpha \in\, ]0,1[$, $\exists k$ such that $0 < \alpha \le 1 - 10^{-k}$ and therefore $0 < \log_{10}\left(\frac{1}{1-\alpha}\right) \le k$. If the convergence zone is reached, the significant digits in common between $I_n$ and $I_{n+1}$ are also in common with $I$, up to $k$ digits. The lower $\alpha$ is, the faster the convergence of the sequence is and the lower $k$ is.
Remark 6 If $0 < \alpha \le \frac{1}{2}$, then $0 < \log_2\left(\frac{1}{1-\alpha}\right) \le 1$. In this case, if the convergence zone is reached, the significant bits in common between $I_n$ and $I_{n+1}$ are also in common with $I$, up to one.
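Theorem 5 can be illustrated numerically. The sketch below uses a purely linear model sequence $I_n = I + K\alpha^n$ with values of $I$, $K$ and $\alpha$ chosen arbitrarily for illustration (they are assumptions, not data from the paper), and compares the observed gap $C_{I_n,I_{n+1}} - C_{I_n,I}$ with $\log_{10}\frac{1}{1-\alpha}$.

```python
import math

def common_digits(a, b):
    """C_{a,b} of Definition 3 (decimal digits common to a and b)."""
    if a == b:
        return math.inf
    return math.log10(abs((a + b) / (2.0 * (a - b))))

# Hypothetical model sequence: I_n = I + K * alpha**n (exactly linear).
I, K, ALPHA = 1.0, 0.5, 0.25

def iterate(n):
    return I + K * ALPHA ** n

n = 10
gap = common_digits(iterate(n), iterate(n + 1)) - common_digits(iterate(n), I)
predicted = math.log10(1.0 / (1.0 - ALPHA))
print(f"observed gap = {gap:.6f}, log10(1/(1-alpha)) = {predicted:.6f}")
```

With $\alpha = 1/4$ the gap is about 0.125 decimal digit, i.e. less than one bit, in agreement with Remark 6.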
2.3 On the trapezoidal and Simpson's rules
Theorem 5 can be used for the evaluation of integrals with the trapezoidal or Simpson's rule. Indeed a sequence which converges linearly can be generated by halving the step value at each iteration.
Let $f$ be a real function which is $C^k$ over $[a,b]$ where $k \ge 3$. Let $I_n$ be the approximation of $I = \int_a^b f(x)\,dx$ computed using the trapezoidal rule with step $h = \frac{b-a}{2^n}$. If $f'(a) \neq f'(b)$, the development of the error up to order 4 is [1,8,9]:
\[ I_n - I = \frac{h^2}{12}\left[f'(b) - f'(a)\right] + O(h^4) \tag{13} \]
As the sequence $(I_n)$ satisfies $I_n - I = K\alpha^n + O(\alpha^{2n})$, with $K = \frac{(b-a)^2}{12}\left[f'(b) - f'(a)\right]$ and $\alpha = \frac{1}{4}$, theorem 5 could apply. However the following property has been established in [5]:
\[ C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left(\frac{4}{3}\right) + O\left(\frac{1}{4^n}\right). \tag{14} \]
Let $f$ be a real function which is $C^k$ over $[a,b]$ where $k \ge 5$. Let $I_n$ be the approximation of $I = \int_a^b f(x)\,dx$ computed using Simpson's rule with step $h = \frac{b-a}{2^n}$. If $f^{(3)}(a) \neq f^{(3)}(b)$, the development of the error up to order 6 is:
\[ I_n - I = \frac{h^4}{180}\left[f^{(3)}(b) - f^{(3)}(a)\right] + O(h^6). \tag{15} \]
The sequence $(I_n)$ satisfies $I_n - I = K\alpha^n + O\left(\alpha^{\frac{3}{2}n}\right)$, with $K = \frac{(b-a)^4}{180}\left[f^{(3)}(b) - f^{(3)}(a)\right]$ and $\alpha = \frac{1}{16}$. Therefore, as for the trapezoidal rule, theorem 5 could apply. The following property has actually been established in [5]:
\[ C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left(\frac{16}{15}\right) + O\left(\frac{1}{4^n}\right). \tag{16} \]
If the convergence zone is reached, $O\left(\frac{1}{4^n}\right) \ll 1$. Furthermore $\log_{10}\left(\frac{4}{3}\right)$ and $\log_{10}\left(\frac{16}{15}\right)$ represent at most one bit. Indeed, for both rules, $\alpha < \frac{1}{2}$. Therefore, if the convergence zone is reached, the significant digits common to $I_n$ and $I_{n+1}$ are also common to $I$, the exact value of the integral, up to one bit.
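Property (14) can be observed on a simple test case. The sketch below is an illustration only: the integrand $e^x$ on $[0,1]$ and the helper names `trapezoid` and `common_digits` are assumptions chosen for the example, and ordinary double precision is used rather than DSA.

```python
import math

def common_digits(a, b):
    """C_{a,b} of Definition 3 (decimal digits common to a and b)."""
    if a == b:
        return math.inf
    return math.log10(abs((a + b) / (2.0 * (a - b))))

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule with step h = (b - a) / 2**n."""
    m = 2 ** n
    h = (b - a) / m
    inner = sum(f(a + i * h) for i in range(1, m))
    return h * (0.5 * (f(a) + f(b)) + inner)

exact = math.e - 1.0                      # integral of exp over [0, 1]
n = 6
t_n = trapezoid(math.exp, 0.0, 1.0, n)
t_next = trapezoid(math.exp, 0.0, 1.0, n + 1)
gap = common_digits(t_n, t_next) - common_digits(t_n, exact)
print(f"gap = {gap:.5f}, log10(4/3) = {math.log10(4 / 3):.5f}")
```

Two successive iterates therefore share only about 0.125 decimal digit more than an iterate shares with the exact integral, which is less than one bit.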
2.4 On sequences with an exponential convergence
Theoretical results similar to theorem 5 may be established for sequences with an exponential convergence.
Theorem 7 Let $(I_n)$ be a sequence converging to $I$ with an exponential speed, i.e. which satisfies $I_n - I = K\alpha^{p^n} + o\left(\alpha^{p^n}\right)$ where $K \in \mathbb{R}$, $0 < |\alpha| < 1$ and $p > 1$, then
\[ C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left(\frac{1}{1 - \alpha^{p^n(p-1)}}\right) + o(1). \]
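PROOF.
\[ I_n - I = K\alpha^{p^n} + o\left(\alpha^{p^n}\right) \tag{17} \]
By using the same formula for $I_{n+1}$, one obtains
\[ I_n - I_{n+1} = K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right) + o\left(\alpha^{p^n}\right) \tag{18} \]
From equation (17), we deduce
\[ \frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^{p^n}\,(1+o(1))} \tag{19} \]
\[ \frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^{p^n}}\,(1+o(1)) \tag{20} \]
\[ \frac{I_n}{I_n - I} = \frac{I_n}{K\alpha^{p^n}} + o\left(\frac{1}{\alpha^{p^n}}\right) \tag{21} \]
Then
\[ \frac{I_n + I}{2(I_n - I)} = \frac{I_n}{I_n - I} - \frac{1}{2} = \frac{I_n}{K\alpha^{p^n}} + o\left(\frac{1}{\alpha^{p^n}}\right) \tag{22} \]
Similarly, from equation (18), we deduce
\[ \frac{I_n}{I_n - I_{n+1}} = \frac{I_n}{K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right)(1+o(1))} \tag{23} \]
Therefore
\[ \frac{I_n}{I_n - I_{n+1}} = \frac{I_n}{K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right)} + o\left(\frac{1}{\alpha^{p^n}}\right) \tag{24} \]
Then
\[ \frac{I_n + I_{n+1}}{2(I_n - I_{n+1})} = \frac{I_n}{I_n - I_{n+1}} - \frac{1}{2} = \frac{I_n}{K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right)} + o\left(\frac{1}{\alpha^{p^n}}\right) \tag{25} \]
From definition 3 and equation (22) we deduce
\[ C_{I_n,I} = \log_{10}\left| \frac{I_n}{K\alpha^{p^n}}\,(1+o(1)) \right| \tag{26} \]
Therefore
\[ C_{I_n,I} = \log_{10}\left| \frac{I_n}{K\alpha^{p^n}} \right| + o(1) \tag{27} \]
Similarly, from definition 3 and equation (25) we deduce
\[ C_{I_n,I_{n+1}} = \log_{10}\left| \frac{I_n}{K\left(\alpha^{p^n} - \alpha^{p^{n+1}}\right)}\,(1+o(1)) \right| \tag{28} \]
Therefore
\[ C_{I_n,I_{n+1}} = \log_{10}\left| \frac{I_n}{K\alpha^{p^n}\left(1 - \alpha^{p^n(p-1)}\right)} \right| + o(1) \tag{29} \]
Finally
\[ C_{I_n,I_{n+1}} = C_{I_n,I} + \log_{10}\left(\frac{1}{1 - \alpha^{p^n(p-1)}}\right) + o(1) \tag{30} \]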
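If the convergence zone is reached, the significant digits common between $I_n$ and $I_{n+1}$ are also common to the exact limit $I$, up to $\log_{10}\left(\frac{1}{1 - \alpha^{p^n(p-1)}}\right)$.
If $0 < |\alpha| \le M_n$, with $M_n = \left(\frac{9}{10}\right)^{\frac{1}{p^n(p-1)}}$, then $0 < \log_{10}\left(\frac{1}{1 - \alpha^{p^n(p-1)}}\right) \le 1$. The significant digits common to $I_n$ and $I_{n+1}$ are also common to $I$, up to one. As the number $n$ of iterations increases, $M_n$ also increases and the condition that $\alpha$ must satisfy in order to have $\log_{10}\left(\frac{1}{1 - \alpha^{p^n(p-1)}}\right) \le 1$ becomes less and less strict. For example, if the sequence $(I_n)$ has a quadratic convergence, which is characterized by $p = 2$, then $M_1 > 0.94$ and $M_5 > 0.99$. Similarly, as $p$ increases, the speed of convergence increases and $M_n$ also increases.
Remark 8 If the convergence zone is reached, the significant bits in common between $I_n$ and $I_{n+1}$ are also common to the exact limit $I$, up to $\log_2\left(\frac{1}{1 - \alpha^{p^n(p-1)}}\right)$. If $0 < |\alpha| \le 2^{\frac{1}{p^n(1-p)}}$, then $0 < \log_2\left(\frac{1}{1 - \alpha^{p^n(p-1)}}\right) \le 1$. This condition on $\alpha$ is easily satisfied. Indeed in the case of a quadratic convergence (i.e. for $p = 2$) if $n = 5$, $2^{\frac{1}{p^n(1-p)}} > 0.97$.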
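The thresholds quoted above for quadratic convergence can be recomputed directly. In this sketch the function names `digit_threshold` and `bit_threshold` are illustrative assumptions; they simply evaluate $M_n$ and the bound of Remark 8.

```python
import math

def digit_threshold(p: float, n: int) -> float:
    """M_n = (9/10)**(1 / (p**n * (p - 1))): bound on |alpha| for a loss of at most one digit."""
    return 0.9 ** (1.0 / (p ** n * (p - 1)))

def bit_threshold(p: float, n: int) -> float:
    """2**(1 / (p**n * (1 - p))): bound on |alpha| for a loss of at most one bit."""
    return 2.0 ** (1.0 / (p ** n * (1 - p)))

for n in (1, 5):
    print(n, round(digit_threshold(2, n), 4), round(bit_threshold(2, n), 4))

# At alpha = M_n the correction term of Theorem 7 equals exactly one decimal digit.
m1 = digit_threshold(2, 1)
loss = math.log10(1.0 / (1.0 - m1 ** (2 ** 1 * (2 - 1))))
```

Both thresholds approach 1 as $n$ grows, so the condition on $\alpha$ quickly becomes harmless.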
The theoretical results presented in this section have been established by taking into account only the truncation error on two successive iterates of a sequence. However computed results are also affected by round-off error propagation. The next section describes how round-off errors can be estimated with a probabilistic approach in order to determine the exact significant digits of any computed result.
3 Stochastic approach of round-off errors
3.1 The CESTAC method
The CESTAC (Contrôle et Estimation Stochastique des Arrondis de Calculs) method, which has been developed by La Porte and Vignes [10,12,13], enables one to estimate the number of exact significant digits of any computed result. This method is based on a probabilistic approach of round-off errors using a random rounding mode defined below.
Definition 9 Each real number $x$, which is not a floating-point number, is bounded by two consecutive floating-point numbers: $X^-$ (rounded down) and $X^+$ (rounded up). The random rounding mode defines the floating-point number $X$ representing $x$ as being one of the two values $X^-$ or $X^+$ with the probability 1/2.
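The random rounding mode of Definition 9 can be sketched in Python for a real number given exactly as a `Fraction`. This is only an illustration under stated assumptions: the helper name `random_round` is hypothetical, `math.nextafter` requires Python 3.9 or later, and overflow and other special cases are ignored.

```python
import math
import random
from fractions import Fraction

def random_round(x: Fraction, rng: random.Random) -> float:
    """Return X- or X+ (the floats bracketing x), each with probability 1/2."""
    nearest = float(x)                       # correctly rounded to nearest
    if Fraction(nearest) == x:
        return nearest                       # x is itself a floating-point number
    if Fraction(nearest) < x:
        lower, upper = nearest, math.nextafter(nearest, math.inf)
    else:
        lower, upper = math.nextafter(nearest, -math.inf), nearest
    return lower if rng.random() < 0.5 else upper

rng = random.Random(0)
x = Fraction(1, 10)                          # 1/10 is not representable in binary
values = {random_round(x, rng) for _ in range(200)}
print(sorted(values))
```

Repeated calls return the two consecutive doubles that bracket 1/10, which is the source of the differing samples exploited by the CESTAC method.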
With this random rounding mode, the same code run several times provides different results, due to different round-off errors.
It has been proved [2] that a computed result $R$ is modelled to the first order in $2^{-p}$ as:
\[ R \approx Z = r + \sum_{i=1}^{n} g_i(d)\, 2^{-p}\, z_i \tag{31} \]
where $r$ is the exact result, $g_i(d)$ are coefficients depending exclusively on the data and on the code, $p$ is the number of bits in the mantissa and $z_i$ are independent uniformly distributed random variables on $[-1,1]$.
From equation (31), we deduce that:
(1) the mean value of the random variable $Z$ is the exact result $r$,
(2) under some assumptions, the distribution of $Z$ is a quasi-Gaussian distribution.
Then by identifying $R$ and $Z$, i.e. by neglecting all the second order terms, Student's test can be used to determine the accuracy of $\overline{R}$. Thus from $N$ samples $R_i$, $i = 1, 2, \ldots, N$, the number of decimal significant digits common to $\overline{R}$ and $r$ can be estimated with the following equation.
\[ C_{\overline{R}} = \log_{10}\left( \frac{\sqrt{N}\,\left|\overline{R}\right|}{\sigma\,\tau_\beta} \right), \tag{32} \]
where
\[ \overline{R} = \frac{1}{N}\sum_{i=1}^{N} R_i \quad \text{and} \quad \sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} \left(R_i - \overline{R}\right)^2. \tag{33} \]
$\tau_\beta$ is the value of Student's distribution for $N-1$ degrees of freedom and a probability level $1-\beta$.
Thus the implementation of the CESTAC method in a code providing a result $R$ consists in:
- performing $N$ times this code with the random rounding mode, which is obtained by using randomly the rounding mode towards $-\infty$ or $+\infty$; we then obtain $N$ samples $R_i$ of $R$,
- choosing as the computed result the mean value $\overline{R}$ of $R_i$, $i = 1, \ldots, N$,
- estimating with equation (32) the number of exact decimal significant digits of $\overline{R}$.
In practice $N = 2$ or $N = 3$ and $\beta = 0.05$. Note that for $N = 2$, $\tau_\beta = 12.706$ and for $N = 3$, $\tau_\beta = 4.303$.
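The last step above, the estimation of the number of exact digits with equation (32), can be sketched as follows. The function name `exact_decimal_digits` and the sample values are assumptions made for illustration; the Student quantiles used are the standard two-sided 95% values for 1 and 2 degrees of freedom (12.706 and 4.303).

```python
import math
from statistics import mean, stdev

# Student quantiles for beta = 0.05, indexed by the number of samples N.
TAU_BETA = {2: 12.706, 3: 4.303}

def exact_decimal_digits(samples):
    """Estimate of equation (32) from N = 2 or 3 samples of the same result."""
    n = len(samples)
    r_bar = mean(samples)
    sigma = stdev(samples)                 # uses the 1/(N-1) normalisation of (33)
    if sigma == 0.0:
        return math.inf if r_bar != 0.0 else -math.inf
    if r_bar == 0.0:
        return -math.inf
    return math.log10(math.sqrt(n) * abs(r_bar) / (sigma * TAU_BETA[n]))

# Hypothetical samples: three runs that differ around the tenth digit.
digits = exact_decimal_digits([1.0, 1.0 + 1e-10, 1.0 - 1e-10])
print(f"estimated exact digits: {digits:.1f}")
```

For these samples the estimate is about 9.6 decimal digits, which matches the size of the perturbation.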
The estimation given by equation (32) relies on two hypotheses:
(1) the round-off errors $\alpha_i$ are independent, centered uniformly distributed random variables,
(2) the approximation to the first order in $2^{-p}$ is legitimate.
Concerning the first hypothesis, with the use of the random arithmetic, round-off errors $\alpha_i$ are random variables; however, in practice, they are not rigorously centered and in this case Student's test gives a biased estimation of the computed result. It has been proved [6] that, with a bias of a few $\sigma$, the error on the estimation of the number of exact significant digits of $\overline{R}$ is less than one decimal digit. Therefore even if the first hypothesis is not rigorously satisfied, the reliability of the estimation obtained with equation (32) is not altered if it is considered as exact up to one digit.
Concerning the second hypothesis, the approximation to the first order only concerns multiplications and divisions. Indeed the round-off error generated by an addition or a subtraction does not contain any term of higher order. It has been shown [2,4] that, if a computed result becomes insignificant, i.e. if the round-off error it contains is of the same order of magnitude as the result itself, then the first order approximation may not be legitimate. In practice the validation of the CESTAC method requires a dynamic control of multiplications and divisions, during the execution of the code. This leads to the synchronous implementation of the method, i.e. to the parallel computation of the $N$ samples $R_i$, and also to the concept of computational zero, also named informatical zero [11].
Definition 10 During the run of a code using the CESTAC method, an intermediate or a final result $R$ is a computational zero, denoted by @.0, if one of the two following conditions holds:
- $\forall i$, $R_i = 0$,
- $C_{\overline{R}} \le 0$.
Any computed result $R$ is a computational zero if either $R = 0$, $R$ being significant, or $R$ is insignificant. A computational zero is a value that cannot be differentiated from the mathematical zero because of its round-off error.
From the synchronous implementation of the CESTAC method and the concept of computational zero, stochastic arithmetic [4,7,13] has been defined. Two types of stochastic arithmetic actually exist: it can be either continuous or discrete.
3.2.1 Continuous stochastic arithmetic
Continuous stochastic arithmetic is a modelling of the synchronous implementation of the CESTAC method. By using this implementation, so that the $N$ runs of a code take place in parallel, the $N$ results of each arithmetical operation can be considered as realizations of a Gaussian random variable centered on the exact result. One can therefore define a new number, called stochastic number, and a new arithmetic, called (continuous) stochastic arithmetic, applied to these numbers. An equality concept and order relations, which take into account the number of exact significant digits of stochastic operands, have also been defined.
A stochastic number $X$ is denoted by $(m, \sigma^2)$, where $m$ is the mean value of $X$ and $\sigma$ its standard deviation. Stochastic arithmetical operations ($s+$, $s-$, $s\times$, $s/$) correspond to terms to the first order in $\frac{\sigma}{m}$ of operations between two independent Gaussian random variables.
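Definition 11 Let $X_1 = (m_1, \sigma_1^2)$ and $X_2 = (m_2, \sigma_2^2)$. Stochastic arithmetical operations on $X_1$ and $X_2$ are defined as:
\[ X_1\ s+\ X_2 = \left(m_1 + m_2,\ \sigma_1^2 + \sigma_2^2\right) \tag{34} \]
\[ X_1\ s-\ X_2 = \left(m_1 - m_2,\ \sigma_1^2 + \sigma_2^2\right) \tag{35} \]
\[ X_1\ s\times\ X_2 = \left(m_1 m_2,\ m_2^2\sigma_1^2 + m_1^2\sigma_2^2\right) \tag{36} \]
\[ X_1\ s/\ X_2 = \left(\frac{m_1}{m_2},\ \left(\frac{\sigma_1}{m_2}\right)^2 + \left(\frac{m_1\sigma_2}{m_2^2}\right)^2\right) \quad \text{with } m_2 \neq 0. \tag{37} \]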
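Equations (34) to (37) translate directly into a small class. The sketch below is an illustration of the definition only; the class name `Stochastic` and its field names are assumptions, and it is not the CADNA implementation.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Stochastic:
    """Stochastic number (m, sigma^2) with the first-order operations of Definition 11."""
    m: float      # mean value
    var: float    # variance sigma^2

    def __add__(self, other):                       # equation (34)
        return Stochastic(self.m + other.m, self.var + other.var)

    def __sub__(self, other):                       # equation (35)
        return Stochastic(self.m - other.m, self.var + other.var)

    def __mul__(self, other):                       # equation (36)
        var = other.m ** 2 * self.var + self.m ** 2 * other.var
        return Stochastic(self.m * other.m, var)

    def __truediv__(self, other):                   # equation (37)
        if other.m == 0.0:
            raise ZeroDivisionError("m_2 must be non-zero")
        var = self.var / other.m ** 2 + self.m ** 2 * other.var / other.m ** 4
        return Stochastic(self.m / other.m, var)

x = Stochastic(2.0, 0.01)
y = Stochastic(4.0, 0.04)
print(x + y, x - y, x * y, x / y)
```

Note that addition and subtraction both add the variances, whereas multiplication and division weight each variance by the square of the other operand's mean.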
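An accuracy can be associated to any stochastic number. If $X = (m, \sigma^2)$, $\lambda_\beta$ exists (depending only on $\beta$) such that
\[ P\left(X \in [m - \lambda_\beta\sigma,\ m + \lambda_\beta\sigma]\right) = 1 - \beta; \tag{38} \]
$I_{\beta,X} = [m - \lambda_\beta\sigma,\ m + \lambda_\beta\sigma]$ is the confidence interval of $m$ at $1-\beta$. The number of decimal significant digits common to all the elements of $I_{\beta,X}$ and to $m$ is lower bounded by
\[ C_{\beta,X} = \log_{10}\left(\frac{|m|}{\lambda_\beta\sigma}\right). \tag{39} \]
The following definition is the modelling of the concept of computational zero, previously introduced.
Definition 12 A stochastic number $X$ is a stochastic zero, denoted by $\underline{0}$, if and only if
\[ C_{\beta,X} \le 0 \quad \text{or} \quad X = (0,0). \]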
In accordance with the concept of stochastic zero, a new equality concept and new order relations have been defined.
Definition 13 Let $X_1 = (m_1, \sigma_1^2)$ and $X_2 = (m_2, \sigma_2^2)$ be two stochastic numbers.
Stochastic equality, denoted by $s=$, is defined as:
$X_1\ s=\ X_2$ if and only if $X_1\ s-\ X_2 = \underline{0}$.
Stochastic inequalities, denoted by $s>$ and $s\ge$, are defined as:
$X_1\ s>\ X_2$ if and only if $m_1 > m_2$ and $X_1\ s\neq\ X_2$,
$X_1\ s\ge\ X_2$ if and only if $m_1 \ge m_2$ or $X_1\ s=\ X_2$.
Continuous stochastic arithmetic is a modelling of the computer arithmetic, which takes into account round-off errors. The properties of continuous stochastic arithmetic [3,4] have pointed out the theoretical differences between the approximative arithmetic of a computer and exact arithmetic.
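The stochastic zero and the relations of Definition 13 can be sketched on (mean, variance) pairs. The value $\lambda_\beta = 1.96$ used below is an assumption: it is the usual Gaussian two-sided quantile for $\beta = 0.05$, and the function names are hypothetical.

```python
import math

LAMBDA_BETA = 1.96   # assumed Gaussian quantile for beta = 0.05

def is_stochastic_zero(m: float, var: float) -> bool:
    """True if X = (0, 0) or C_{beta,X} <= 0, i.e. |m| <= lambda_beta * sigma."""
    if m == 0.0 and var == 0.0:
        return True
    return var > 0.0 and abs(m) <= LAMBDA_BETA * math.sqrt(var)

def s_eq(x1, x2) -> bool:
    """Stochastic equality: X1 s- X2 is a stochastic zero."""
    return is_stochastic_zero(x1[0] - x2[0], x1[1] + x2[1])

def s_gt(x1, x2) -> bool:
    return x1[0] > x2[0] and not s_eq(x1, x2)

def s_ge(x1, x2) -> bool:
    return x1[0] >= x2[0] or s_eq(x1, x2)

a = (1.00, 1e-4)     # mean 1.00, standard deviation 0.01
b = (1.01, 1e-4)     # differs from a by less than the uncertainty
c = (2.00, 1e-4)
print(s_eq(a, b), s_gt(b, a), s_gt(c, a), s_ge(a, b))
```

Here `a` and `b` are stochastically equal although their means differ, so `b` is not stochastically greater than `a`, while `a` is stochastically greater than or equal to `b`.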
3.2.2 Discrete Stochastic Arithmetic
Discrete Stochastic Arithmetic (DSA) has been defined from the synchronous implementation of the CESTAC method. With DSA, a real number becomes an N-dimensional set and any operation on these N-dimensional sets is performed element per element using the random rounding mode. The number of exact significant digits of such an N-dimensional set can be estimated from equation (32). From the concept of computational zero previously introduced, an equality concept and order relations have been defined for DSA.
Definition 14 Let $X$ and $Y$ be N-samples provided by the CESTAC method.
Discrete stochastic equality, denoted by $ds=$, is defined as:
$X\ ds=\ Y$ if and only if $X - Y = @.0$.
Discrete stochastic inequalities, denoted by $ds>$ and $ds\ge$, are defined as:
$X\ ds>\ Y$ if and only if $\overline{X} > \overline{Y}$ and $X\ ds\neq\ Y$,
$X\ ds\ge\ Y$ if and only if $\overline{X} \ge \overline{Y}$ or $X\ ds=\ Y$.
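The computational zero and the discrete stochastic relations can be sketched on plain lists of $N = 3$ samples. This is an illustration only, not the CADNA implementation: the function names and the sample values are assumptions, and 4.303 is the standard Student quantile for 2 degrees of freedom at $\beta = 0.05$.

```python
import math
from statistics import mean, stdev

TAU_BETA = {2: 12.706, 3: 4.303}   # Student quantiles for beta = 0.05

def is_computational_zero(samples) -> bool:
    """@.0 test: all samples are 0, or the estimated number of exact digits is <= 0."""
    if all(s == 0.0 for s in samples):
        return True
    n = len(samples)
    # C <= 0 is equivalent to sqrt(N) * |mean| <= sigma * tau_beta.
    return math.sqrt(n) * abs(mean(samples)) <= stdev(samples) * TAU_BETA[n]

def ds_eq(x, y) -> bool:
    return is_computational_zero([a - b for a, b in zip(x, y)])

def ds_gt(x, y) -> bool:
    return mean(x) > mean(y) and not ds_eq(x, y)

def ds_ge(x, y) -> bool:
    return mean(x) >= mean(y) or ds_eq(x, y)

x = [1.0, 1.0000001, 0.9999999]
y = [1.0000001, 1.0, 1.0]          # same value as x up to round-off noise
z = [2.0, 2.0000001, 1.9999999]
print(ds_eq(x, y), ds_gt(y, x), ds_gt(z, x), ds_ge(x, y))
```

The difference between `x` and `y` is dominated by its own dispersion, so the two N-samples are considered equal, whereas `z` is clearly greater than `x`.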
Order relations in DSA are essential to control branching statements. Because of round-off errors, if $A$ and $B$ are two floating-point numbers and $a$ and $b$ the corresponding exact values,
\[ a > b \not\Rightarrow A > B \quad \text{and} \quad A > B \not\Rightarrow a > b. \]
This can lead, for example, to unsatisfied stopping criteria or infinite loops in algorithmic geometry. Taking into account the numerical quality of the operands in order relations enables one to partially solve these problems [3].
Therefore DSA enables one to estimate the impact of round-off errors on any result of a scientific code and also to check that no anomaly occurred during the run, especially in branching statements. DSA is implemented in the CADNA library (see footnote 1).
The accuracy of a stochastic number can be related to the number of exact significant digits of an N-sample provided by the CESTAC method. Indeed, when $N$ is a small value (2 or 3), which is the case in practice, the values obtained with equations (32) and (39) are very close. They represent in a computed result the number of significant digits which are not affected by round-off errors. So the two types of stochastic arithmetic are coherent. Properties established in the theoretical framework of continuous stochastic arithmetic can be applied on a computer via the practical use of DSA.
3.3 Theoretical results on stochastic operations
The theoretical results presented here have been established in continuous stochastic arithmetic. They enable one to compare results of arithmetical stochastic operations with those provided by the corresponding classical operations performed on exact values.
Let us consider a numerical method which aims to approximate an exact value $x_1$. This method may consist for example in computing an iterate of a sequence $(u_n)$ such that $\lim_{n \to \infty} u_n = x_1$. Even using an arithmetic with infinite precision, the value obtained is not $x_1$, but an approximation which is affected by a truncation error. We compare here the results obtained using such numerical methods in stochastic arithmetic with the exact values they approximate.
Theorem 15 Let $X_1 = (m_1, \sigma_1^2)$ be the approximation of an exact value $x_1$ in stochastic arithmetic. Let us assume that the exact significant bits of $X_1$, i.e. not affected by round-off errors, are in common with $x_1$, up to $p$: the number of significant bits of $X_1$ in common with $x_1$ is lower bounded by $\log_2\left(\frac{|m_1|}{\lambda_\beta\sigma_1}\right) - p$.
Similarly let $X_2 = (m_2, \sigma_2^2)$ be an approximation obtained in stochastic arithmetic of an exact value $x_2$, such that its exact significant bits are in common with $x_2$, up to $q$.
Let $\omega$ be an exact arithmetical operator: $\omega \in \{+, -, \times, /\}$ and $s\omega$ the corresponding stochastic operator: $s\omega \in \{s+, s-, s\times, s/\}$.
Then the exact significant bits of $X_1\ s\omega\ X_2$ are in common with the exact value $x_1\ \omega\ x_2$, up to $\max(p,q)$.
Footnote 1: URL address: http://www.lip6.fr/cadna/
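PROOF. From equation (39), the number of exact significant bits of $X_1$, i.e. not affected by round-off errors, is lower bounded by $\log_2\left(\frac{|m_1|}{\lambda_\beta\sigma_1}\right)$. As the number of significant bits of $X_1$ in common with the exact value $x_1$ is lower bounded by $\log_2\left(\frac{|m_1|}{\lambda_\beta\sigma_1}\right) - p = \log_2\left(\frac{|m_1|}{2^p\lambda_\beta\sigma_1}\right)$, to take into account both the truncation error and the round-off error on $X_1$, one has to consider not the variance $\sigma_1^2$, but $(2^p\sigma_1)^2$.
Similarly the number of significant bits of $X_2$ in common with the exact value $x_2$ is lower bounded by $\log_2\left(\frac{|m_2|}{\lambda_\beta\sigma_2}\right) - q = \log_2\left(\frac{|m_2|}{2^q\lambda_\beta\sigma_2}\right)$.
From equations (34) and (39), the number of exact significant bits of $X_1\ s+\ X_2$ is lower bounded by $\log_2\left(\frac{|m_1 + m_2|}{\lambda_\beta\sqrt{\sigma_1^2 + \sigma_2^2}}\right)$. To take into account both the truncation error and the round-off error on $X_1\ s+\ X_2$, one has to consider not the variance $\sigma_1^2 + \sigma_2^2$, but $(2^p\sigma_1)^2 + (2^q\sigma_2)^2$. Therefore a lower bound for the number of significant bits of $X_1\ s+\ X_2$ in common with the exact value $x_1 + x_2$ is $\log_2\left(\frac{|m_1 + m_2|}{\lambda_\beta\sqrt{(2^p\sigma_1)^2 + (2^q\sigma_2)^2}}\right)$, which can be itself lower bounded by $\log_2\left(\frac{|m_1 + m_2|}{\lambda_\beta\sqrt{\sigma_1^2 + \sigma_2^2}}\right) - \max(p,q)$. Then the exact significant bits of $X_1\ s+\ X_2$ are in common with $x_1 + x_2$, up to $\max(p,q)$.
As $X_1\ s-\ X_2 = (m_1 - m_2,\ \sigma_1^2 + \sigma_2^2)$, the proof for the subtraction is similar to the one for the addition.
From equations (36) and (39), the number of exact significant bits of $X_1\ s\times\ X_2$ is lower bounded by $\log_2\left(\frac{|m_1 m_2|}{\lambda_\beta\sqrt{m_2^2\sigma_1^2 + m_1^2\sigma_2^2}}\right)$. To take into account both the truncation error and the round-off error on $X_1\ s\times\ X_2$, one has to consider not the variance $m_2^2\sigma_1^2 + m_1^2\sigma_2^2$, but $2^{2p}m_2^2\sigma_1^2 + 2^{2q}m_1^2\sigma_2^2$. Therefore a lower bound for the number of significant bits of $X_1\ s\times\ X_2$ in common with the exact value $x_1 \times x_2$ is $\log_2\left(\frac{|m_1 m_2|}{\lambda_\beta\sqrt{2^{2p}m_2^2\sigma_1^2 + 2^{2q}m_1^2\sigma_2^2}}\right)$, which can be itself lower bounded by $\log_2\left(\frac{|m_1 m_2|}{\lambda_\beta\sqrt{m_2^2\sigma_1^2 + m_1^2\sigma_2^2}}\right) - \max(p,q)$. Then the exact significant bits of $X_1\ s\times\ X_2$ are in common with $x_1 \times x_2$, up to $\max(p,q)$.
From equations (37) and (39), the number of exact significant bits of $X_1\ s/\ X_2$ is lower bounded by $\log_2\left(\frac{\left|\frac{m_1}{m_2}\right|}{\lambda_\beta\sqrt{\left(\frac{\sigma_1}{m_2}\right)^2 + \left(\frac{m_1\sigma_2}{m_2^2}\right)^2}}\right)$. To take into account both the truncation error and the round-off error on $X_1\ s/\ X_2$, one has to consider not the variance $\left(\frac{\sigma_1}{m_2}\right)^2 + \left(\frac{m_1\sigma_2}{m_2^2}\right)^2$, but $\left(\frac{2^p\sigma_1}{m_2}\right)^2 + \left(\frac{2^q m_1\sigma_2}{m_2^2}\right)^2$. Therefore a lower bound for the number of significant bits of $X_1\ s/\ X_2$ in common with the exact value $x_1/x_2$ is $\log_2\left(\frac{\left|\frac{m_1}{m_2}\right|}{\lambda_\beta\sqrt{\left(\frac{2^p\sigma_1}{m_2}\right)^2 + \left(\frac{2^q m_1\sigma_2}{m_2^2}\right)^2}}\right)$, which can be itself lower bounded by $\log_2\left(\frac{\left|\frac{m_1}{m_2}\right|}{\lambda_\beta\sqrt{\left(\frac{\sigma_1}{m_2}\right)^2 + \left(\frac{m_1\sigma_2}{m_2^2}\right)^2}}\right) - \max(p,q)$. Then the exact significant bits of $X_1\ s/\ X_2$ are in common with $x_1/x_2$, up to $\max(p,q)$.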
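The key inequality of the proof, shown here for the addition, can be checked numerically. In this sketch all numerical values are arbitrary assumptions, $\lambda_\beta$ is set to 1.96 for illustration only, and the function name `sum_bit_bounds` is hypothetical.

```python
import math

def sum_bit_bounds(m1, s1, m2, s2, p, q, lam=1.96):
    """Return (bound with widened variances, plain bound minus max(p, q)) for X1 s+ X2."""
    widened = math.log2(abs(m1 + m2) / (lam * math.hypot(2 ** p * s1, 2 ** q * s2)))
    plain = math.log2(abs(m1 + m2) / (lam * math.hypot(s1, s2)))
    return widened, plain - max(p, q)

# Hypothetical operands: standard deviations s1, s2 and bit losses p, q.
widened, shifted = sum_bit_bounds(1.0, 1e-8, 2.0, 3e-8, p=1, q=3)
print(f"widened bound = {widened:.3f} bits, shifted bound = {shifted:.3f} bits")
```

The widened bound is never smaller than the shifted one, because $\sqrt{(2^p\sigma_1)^2 + (2^q\sigma_2)^2} \le 2^{\max(p,q)}\sqrt{\sigma_1^2 + \sigma_2^2}$.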
Theorem 15 enables one to control arithmetical operations performed on computed results of numerical methods. This theorem has been proved for stochastic arithmetical operations, which are a modelling of the operations performed in the synchronous implementation of the CESTAC method. In practice, theorem 15 is used, according to 3.2.2, for results obtained in DSA. In the next section, we present, in accordance with theorem 15 and the theoretical results presented in section 2, a strategy to dynamically control converging sequences computed in DSA.
4 A strategy for a dynamical control of converging sequences
When a numerical algorithm requires the evaluation of the limit of a sequence, this limit is approximated by one of the iterates. As the number of iterations increases, the truncation error usually decreases, but the round-off error increases. Therefore the choice of the optimal iterate may be problematic.
DSA enables one to estimate the number of exact significant digits of any computed result, i.e. its significant digits which are not affected by round-off error propagation. Let us consider the computation of a sequence $(I_n)$ in DSA and let us assume that the convergence zone is reached. If discrete stochastic equality is achieved for two successive iterates, i.e. $I_n - I_{n+1} = @.0$, the difference between $I_n$ and $I_{n+1}$ is only due to round-off errors and further iterations are useless. The optimal iterate $I_{n+1}$ can therefore be dynamically determined at run time. Furthermore, if the sequence $(I_n)$ converges at least linearly to $I$, from section 2, the exact significant digits of $I_{n+1}$ are in common with $I$, up to $k$ digits. The value $k$, which depends on the convergence speed of $(I_n)$, can be determined from theorem 5 or 7.
This strategy can be used for the computation of integrals with the trapezoidal or Simpson's rule, with the technique of step halving previously described. If the convergence zone is reached and computations are performed until the difference between two successive iterates is insignificant, then, from section 2, the exact significant bits of the last iterate are in common with the exact value of the integral, up to one.
More generally, if a sequence $(I_n)$ converging at least linearly to $I$ is computed using DSA, the optimal iterate can be dynamically determined and the number of significant digits it has in common with the exact limit $I$ can be evaluated.
If operations on limits of sequences are required in a numerical algorithm, a similar strategy, based on the following theorem, can be used.
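The stopping strategy can be sketched in pure Python with a deliberately crude emulation of DSA. Everything below is an assumption made for illustration: `perturb` moves a result by one ulp up or down at random instead of performing true random rounding, only the accumulation steps are perturbed, the integrand is taken as $(6x^3 - 15x^2 - 28x + 22)/(9x^2 + 12x + 4)$ on $[0,1]$ (whose integral is 1), and `math.nextafter` requires Python 3.9 or later. It is not the CADNA library.

```python
import math
import random
from statistics import mean, stdev

TAU_BETA = 4.303      # Student quantile for N = 3 samples, beta = 0.05
N_SAMPLES = 3

def perturb(x, rng):
    """Crude stand-in for random rounding: shift x by one ulp, up or down."""
    return math.nextafter(x, math.inf if rng.random() < 0.5 else -math.inf)

def is_computational_zero(samples):
    if all(s == 0.0 for s in samples):
        return True
    return math.sqrt(len(samples)) * abs(mean(samples)) <= stdev(samples) * TAU_BETA

def simpson_sample(f, a, b, n, rng):
    """One randomly perturbed evaluation of Simpson's rule with 2**n subintervals."""
    m = 2 ** n
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        weight = 4.0 if i % 2 else 2.0
        total = perturb(total + weight * f(a + i * h), rng)
    return perturb(total * h / 3.0, rng)

def controlled_integral(f, a, b, max_iter=14, seed=0):
    """Halve the step until the difference of two successive iterates is @.0."""
    rng = random.Random(seed)
    prev = [simpson_sample(f, a, b, 1, rng) for _ in range(N_SAMPLES)]
    for n in range(2, max_iter + 1):
        cur = [simpson_sample(f, a, b, n, rng) for _ in range(N_SAMPLES)]
        if is_computational_zero([p - c for p, c in zip(prev, cur)]):
            return n, mean(cur)
        prev = cur
    return max_iter, mean(prev)

def f(x):
    return (6 * x**3 - 15 * x**2 - 28 * x + 22) / (9 * x**2 + 12 * x + 4)

n_stop, value = controlled_integral(f, 0.0, 1.0)
print(n_stop, value)
```

The loop stops by itself once the difference between two successive iterates is dominated by the injected round-off noise, so no user-supplied tolerance is needed; `max_iter` is only a safety bound.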
Theorem 16 Let us consider the computation in DSA of two sequences $(I_k)$ and $(J_k)$ converging at least linearly to $I$ and $J$ respectively.
Let $I_n$ (respectively $J_m$) be an iterate such that its exact significant bits are in common with $I$ up to $p$ (respectively $J$ up to $q$).
If we denote by $\omega$ an exact arithmetical operator, then the exact significant bits of $I_n\ \omega\ J_m$ are in common with the exact value $I\ \omega\ J$, up to $\max(p,q)$.
PROOF. From section 2, as the sequence $(I_k)$ converges at least linearly to $I$, if it is computed until the difference between two successive iterates is insignificant, i.e. $I_{n-1} - I_n = @.0$, then we can determine the value $p$ such that the exact significant bits of $I_n$ are in common with $I$, up to $p$. Similarly if the sequence $(J_k)$ is computed until $J_{m-1} - J_m = @.0$, then we can determine the value $q$ such that the exact significant bits of $J_m$ are in common with $J$, up to $q$. According to the application of theorem 15 in DSA, if an arithmetical operation is performed on $I_n$ and $J_m$, the exact significant bits of the result are those obtained with the same operation performed on $I$ and $J$, up to $\max(p,q)$.
Remark 17 According to section 2, if the convergence of the sequences $(I_k)$ and $(J_k)$ is sufficiently fast, then $p = q = 1$. In this case, the exact significant bits of the result obtained are those provided by the same operation on the limits, up to one.
More generally, in a numerical algorithm involving the computation of several sequences, if each sequence is computed until the difference between two successive iterates is insignificant, each limit is approximated by the optimal iterate. According to section 2, if each sequence converges at least linearly, we can evaluate the number of significant digits common between the limit and its approximation. If arithmetical operations are performed on these approximations, we can determine which exact significant digits of the result are common with the result of the same operations performed on the limits.
5 Dynamical control of combined sequences
This section shows how to approximate the limit of a sequence by its optimal iterate, this iterate being itself the limit of another sequence. The theorems presented in sections 2 and 3 can be combined to determine the number of digits of the approximation obtained which are in common with the exact result. In the strategies described in this section, small letters denote exact values and capital letters the corresponding approximations computed using DSA.
5.1 A strategy to compute combined sequences
We consider a sequence in which each term $u_m$ is the limit of another sequence. More precisely, let $(u_m)$ be a sequence converging at least linearly to $u$ and, for all $m$, let $(u_{m,n})$ be a sequence converging at least linearly to $u_m$.
For all $m$, let $U_m$ be the approximation of $u_m$ computed using DSA. $U_m$ is obtained by computing the sequence $(u_{m,n})$ until, in the convergence zone, the difference between two successive iterates is insignificant.
As for all $m$, the sequence $(u_{m,n})$ converges at least linearly to $u_m$, according to section 2, one can determine the value $q$ such that the exact significant bits of $U_m$ are common to $u_m$, up to $q$.
Figure 1 represents the significant bits of $U_m$ and $U_{m+1}$ if the difference $U_m - U_{m+1}$ is insignificant. In this case, the exact significant bits of $U_{m+1}$ are common to $U_m$ and are also common to $u_m$ and $u_{m+1}$, up to $q$.
As the sequence $(u_m)$ converges at least linearly to $u$, one can determine the value $p$ such that the bits common to $u_m$ and $u_{m+1}$ are common with $u$, up to $p$.
Consequently if the difference $U_m - U_{m+1}$ is insignificant, the exact significant bits of $U_{m+1}$ are common with $u$, up to $p+q$.
[Figure: bit diagram of $U_m$ and $U_{m+1}$, showing the significant bits not affected by round-off errors and common to $U_m$ and $U_{m+1}$, the bits common with $u_m$ and with $u_{m+1}$ (all but $q$ bits), and the bits common with $u$ (all but a further $p$ bits).]
Fig. 1. Significant bits of $U_m$ and $U_{m+1}$
5.2 Dynamical control of integrals on an infinite domain
Let us consider the computation of an improper integral $g = \int_0^{\infty} \phi(x)\,dx$. The infinite interval of integration is partitioned into finite intervals of length $L$. Let $f_j = \int_{jL}^{(j+1)L} \phi(x)\,dx$ and $g_m = \sum_{j=0}^{m} f_j$, so that $\lim_{m \to \infty} g_m = g$.
$g$ can be numerically approximated by an iterate $g_m$, $m$ being sufficiently high. The optimal number of iterates to compute can be determined dynamically using DSA.
Let $F_{j,n}$ be the approximation of $f_j$ computed using the trapezoidal or Simpson's rule with step $\frac{L}{2^n}$. For all $j$, the sequence $(F_{j,n})$ is computed until the difference between two successive iterates is insignificant. This is not achieved at the same iteration for all values of $j$. Let $n_j$ be the iteration at which $F_{j,n_j-1} - F_{j,n_j} = @.0$.
According to section 2, for all $j$, the exact significant bits of $F_{j,n_j}$ are in common with $f_j$, up to one. Let $G_m = \sum_{j=0}^{m} F_{j,n_j}$. According to theorem 16, the exact significant bits of $G_m$ are in common with $g_m$, up to one.
Figure 2 represents the significant bits of $G_m$ and $G_{m+1}$ if the difference $G_m - G_{m+1}$ is insignificant. In this case, the exact significant bits of $G_{m+1}$ are common to $G_m$ and are also common to $g_m$ and $g_{m+1}$, up to one.
We assume that the sequence $(g_m)$ converges at least linearly to $g$. According to section 2, if the convergence zone is reached, $C_{g_m,g_{m+1}}$ and $C_{g_m,g}$ differ by a term which represents $p$ bits. Therefore the bits common to $g_m$ and $g_{m+1}$ are common with $g$, up to $p$.
[Figure: bit diagram of $G_m$ and $G_{m+1}$, showing the significant bits not affected by round-off errors and common to $G_m$ and $G_{m+1}$, the bits common with $g_m$ and with $g_{m+1}$, and the bits common with $g$ (all but a further $p$ bits).]
Fig. 2. Significant bits of $G_m$ and $G_{m+1}$
Consequently if the difference $G_m - G_{m+1}$ is insignificant, the exact significant bits of $G_{m+1}$ are common with $g$, up to $p+1$.
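The two nested loops of this strategy can be sketched in ordinary double precision. This is a simplified illustration, not the DSA-based method: the "insignificant difference" tests are replaced by assumed stand-ins (a relative tolerance of a few machine epsilons for the inner Simpson sequence, and exact floating-point equality of two successive partial sums for the outer sequence), and the integrand $e^{-x}$ and all function names are chosen for the example.

```python
import math
import sys

EPS = sys.float_info.epsilon

def simpson(f, a, b, n):
    """Composite Simpson's rule with 2**n subintervals."""
    m = 2 ** n
    h = (b - a) / m
    total = f(a) + f(b)
    total += sum((4.0 if i % 2 else 2.0) * f(a + i * h) for i in range(1, m))
    return total * h / 3.0

def sub_integral(f, a, b, max_n=12):
    """Inner sequence F_{j,n}: halve the step until two iterates agree to rounding level."""
    prev = simpson(f, a, b, 1)
    for n in range(2, max_n + 1):
        cur = simpson(f, a, b, n)
        if abs(cur - prev) <= 4.0 * EPS * abs(cur):
            return cur
        prev = cur
    return prev

def improper_integral(f, length, max_terms=10_000):
    """Outer sequence G_m: add sub-integrals until the partial sum stops changing."""
    g = 0.0
    for j in range(max_terms):
        new_g = g + sub_integral(f, j * length, (j + 1) * length)
        if new_g == g:
            return j, new_g
        g = new_g
    return max_terms, g

terms, g = improper_integral(lambda x: math.exp(-x), 1.0)
print(terms, g)
```

Unlike DSA, this sketch gives no estimate of how many digits of the result are exact; it only shows how the inner and outer sequences are combined.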
6 Numerical experiments
Numerical experiments have been carried out using DSA implemented in the CADNA library. Two examples are presented: the computation of a definite integral and the computation of an integral on an infinite interval.
6.1 Computation of a definite integral
Let us consider the integral
\[ I = \int_0^1 \frac{6x^3 - 15x^2 - 28x + 22}{9x^2 + 12x + 4}\,dx = 1. \]
$I$ has been estimated with the trapezoidal and Simpson's rules using the strategy described in section 2. Approximations $I_n$ have been computed with step $\frac{1}{2^n}$ until the difference $I_n - I_{n+1}$ is insignificant. From section 2, we can guarantee that the exact significant bits of the last iterate $I_N$ are in common with the exact value of $I$, up to one.
Table 1 presents for both rules the approximations of $I$ obtained in single and double precision. The number of exact significant digits of each result has been estimated using DSA. For each sequence, the exact significant digits of the last iterate are reported in table 1.
Table 1
Approximations of I
rule          in single precision        in double precision
trapezoidal   I_9 = 0.10000E+01          I_21 = 0.100000000000E+001
Simpson       I_8 = 0.100000E+01         I_13 = 0.1000000000000E+001
We can notice that the exact significant digits of each approximation obtained are in common with $I$. The number of iterations requested for the stopping criterion to be satisfied depends of course on the precision chosen, but also on the quadrature method used. Whatever the precision is, fewer iterations are performed with Simpson's rule than with the trapezoidal rule. This is due to the different convergence speeds of the computed sequences. Indeed the approximation of $I$ is of order 2 with the trapezoidal rule and of order 4 with Simpson's rule. For each rule, the error on the last iterate $|I_N - I|$ is insignificant. Because of round-off error propagation, the computer cannot distinguish $I_N$ from $I$.
6.2 Computation of an improper integral
Letus onsider the improperintegral g = Z
1
0 e
ax
dx= 1
a
, wherea >0.
$g$ has been estimated using the strategy described in section 5.2. Using the
same notations as in section 5.2, let
\[ g_m = \sum_{j=0}^{m} f_j, \quad \text{where } f_j = \int_{jL}^{(j+1)L} e^{-ax}\,dx. \]
The approximations of the integrals $f_j$ are computed with Simpson's rule
using DSA. For every $j$, a sequence is computed until the difference between
two successive iterates is insignificant.
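
The nested structure of this computation (an inner sequence for each integral
$f_j$, an outer sequence of partial sums) can be sketched as follows. This is
an illustration under stated assumptions, not the DSA implementation: both
stopping tests use a hypothetical relative tolerance `rel_tol` instead of the
detection of an insignificant difference, and all function names are
hypothetical.

```python
import math


def simpson(func, lo, hi, n):
    """Composite Simpson's rule on [lo, hi] with 2**n subintervals (n >= 1)."""
    k = 2**n
    h = (hi - lo) / k
    odd = sum(func(lo + i * h) for i in range(1, k, 2))
    even = sum(func(lo + i * h) for i in range(2, k, 2))
    return h / 3.0 * (func(lo) + func(hi) + 4.0 * odd + 2.0 * even)


def integrate_segment(func, lo, hi, rel_tol, n_max=20):
    """Inner sequence: halve the step until two successive iterates agree."""
    previous = simpson(func, lo, hi, 1)
    for n in range(2, n_max + 1):
        current = simpson(func, lo, hi, n)
        if abs(current - previous) <= rel_tol * abs(current):
            return current
        previous = current
    return previous


def improper_integral(func, length, rel_tol=1e-12, m_max=10_000):
    """Outer sequence: add f_0, f_1, ... until a new term no longer matters.

    rel_tol is a stand-in (assumption) for the DSA test on the difference
    between two successive partial sums. Returns (M, G_M).
    """
    total = 0.0
    for m in range(m_max):
        term = integrate_segment(func, m * length, (m + 1) * length, rel_tol)
        total += term
        if abs(term) <= rel_tol * abs(total):
            return m, total
    return m_max - 1, total


if __name__ == "__main__":
    a = 1.0
    m_last, g_approx = improper_integral(lambda x: math.exp(-a * x), 1.0)
    print(f"M = {m_last}, G_M = {g_approx:.15f}, exact = {1.0 / a}")
```

Because the tolerance is fixed rather than determined by the arithmetic, the
value of M returned by this sketch need not coincide with the values obtained
with DSA.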
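As
\[ g_m - g = -\int_{(m+1)L}^{\infty} e^{-ax}\,dx = -\frac{\alpha^{m+1}}{a}, \quad \text{where } \alpha = e^{-aL}, \]
the sequence $(g_m)$ converges linearly to $g$. Therefore theorem 5 can apply:
if the convergence zone is reached, the significant bits common to two
successive iterates are also common to $g$, up to
$\log_2\left(\frac{1}{1-\alpha}\right)$.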
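
The linear convergence can be made explicit by writing the tail of the
integral in closed form. In the following sketch, $\alpha = e^{-aL}$ denotes
the ratio between two successive errors:
\[
g - g_m = \int_{(m+1)L}^{\infty} e^{-ax}\,dx = \frac{e^{-a(m+1)L}}{a}
        = \frac{\alpha^{m+1}}{a},
\qquad
\frac{g_{m+1} - g}{g_m - g} = \alpha \in (0,1).
\]
Each additional interval of length $L$ therefore reduces the error by the
constant factor $\alpha$. The closer $\alpha$ is to 1 (small $aL$), the slower
the convergence and the larger the term $\log_2\left(\frac{1}{1-\alpha}\right)$.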
Let $G_m$ be the approximation of $g_m$ computed using DSA. The sequence
$(G_m)$ is computed until the difference between two successive iterates is
insignificant.
We denote by $M$ the iteration at which $G_{M-1} - G_M = @.0$. According to
section 5.2, the exact significant bits of $G_M$ are in common with $g$, up to
$\log_2\left(\frac{1}{1-\alpha}\right) + 1$. Therefore the exact significant
decimal digits of $G_M$ are in common with $g$ up to $\delta$, where
$\delta = \log_{10}\left(\frac{2}{1-\alpha}\right)$.
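
The bound $\delta$ depends only on the product $aL$ and is easy to evaluate.
The short sketch below computes it with the ratio taken as $e^{-aL}$; the
function name `delta` is hypothetical, and `math.expm1` is used only to
evaluate $1 - e^{-aL}$ accurately when $aL$ is small.

```python
import math


def delta(a, length):
    """Return log10(2 / (1 - exp(-a * length))), the bound on decimal digits."""
    one_minus_ratio = -math.expm1(-a * length)  # 1 - exp(-a * length)
    return math.log10(2.0 / one_minus_ratio)


if __name__ == "__main__":
    for length in (1e-2, 1e-1, 1.0, 10.0, 50.0):
        print(f"a = 1, L = {length:g}: delta = {delta(1.0, length):.1f}")
```

For $a = 1$, rounding to one decimal gives the values of $\delta$ listed in
table 2. As $L$ increases, $\delta$ tends to $\log_{10} 2 \approx 0.3$.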
Table 2 presents for $a = 1$ and different values of $L$ the approximations
$G_M$ obtained in double precision. The number of exact significant digits of
$G_M$ not in common with $g$ is approximated by $\delta$. As the length $L$
increases, the number $M$ of integrals $f_j$ to be approximated decreases.
Only the exact significant digits of $G_M$ are reported, i.e. the digits not
affected by round-off error propagation. We notice that the number of exact
significant digits obtained (from thirteen to fifteen) is satisfactory for
computations carried out in double precision. The exact significant digits
which are not in common with the exact value $g = 1$ can easily be identified.
For example, if $L = 10^{-1}$, among the fourteen exact significant digits of
$G_M$, the last two digits are not in common with $g$. We notice that, for
every approximation $G_M$ reported in table 2, its exact significant digits
are in common with $g$ up to $\lceil \delta \rceil$.
Table 2
Results obtained with Simpson's rule for a = 1
$L$         $\delta$   $M$     $G_M$
$10^{-2}$   2.3        2335    0.9999999999276E+000
$10^{-1}$   1.3        284     0.99999999999953E+000
1           0.5        33      0.999999999999996E+000
10          0.3        4       0.99999999999999E+000
50          0.3        2       0.10000000000004E+001
Table 3 presents for $a = 10^{-5}$ and different values of $L$ the exact
significant digits of the approximations $G_M$ obtained in double precision.
As in table 2, we notice that if the length $L$ increases, the number $M$ of
integrals $f_j$ to be approximated decreases. For each approximation $G_M$
obtained, we can easily identify its exact significant digits which are in
common with the exact value $g = 10^5$. As in table 2, we notice that the
exact significant digits of $G_M$ are in common with $g$ up to
$\lceil \delta \rceil$.
Table 3
Results obtained with Simpson's rule for a = 10^{-5}
$L$        $\delta$   $M$      $G_M$
$10^{2}$   3.3        19136    0.999999995109E+005
$10^{3}$   2.3        2346     0.9999999999352E+005
$10^{4}$   1.3        279      0.99999999999923E+005
$10^{5}$   0.5        33       0.999999999999995E+005
$10^{6}$   0.3        5        0.99999999999999E+005
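
The statement that the exact significant digits of $G_M$ agree with $g$ up to
$\lceil \delta \rceil$ can be checked on the values transcribed from tables 2
and 3. The sketch below is one possible way of counting, chosen for
illustration: the error is expressed in units of the last reported digit and
the number of decimal digits of that integer is compared with
$\lceil \delta \rceil$. The function name `digits_not_in_common` is
hypothetical, and exact decimal arithmetic is used to avoid adding round-off
errors to the check.

```python
import math
from decimal import Decimal


def digits_not_in_common(mantissa, exponent, exact):
    """Count trailing reported digits of mantissa * 10**exponent differing from exact.

    mantissa is a string such as "0.99999999999953"; the error is measured in
    units of its last digit, and the digit count of that integer is returned
    (0 when the error is below one unit).
    """
    n_digits = len(mantissa.split(".")[1])
    value = Decimal(mantissa).scaleb(exponent)
    unit = Decimal(1).scaleb(exponent - n_digits)  # weight of the last digit
    error_units = int(abs(value - Decimal(exact)) / unit)
    return len(str(error_units)) if error_units > 0 else 0


# (delta, mantissa, decimal exponent), transcribed from tables 2 and 3.
TABLE_2 = [(2.3, "0.9999999999276", 0), (1.3, "0.99999999999953", 0),
           (0.5, "0.999999999999996", 0), (0.3, "0.99999999999999", 0),
           (0.3, "0.10000000000004", 1)]
TABLE_3 = [(3.3, "0.999999995109", 5), (2.3, "0.9999999999352", 5),
           (1.3, "0.99999999999923", 5), (0.5, "0.999999999999995", 5),
           (0.3, "0.99999999999999", 5)]

if __name__ == "__main__":
    for exact, table in ((1, TABLE_2), (100000, TABLE_3)):
        for d, mantissa, exponent in table:
            count = digits_not_in_common(mantissa, exponent, exact)
            print(f"{mantissa}E{exponent:+04d}: {count} digit(s), "
                  f"ceil(delta) = {math.ceil(d)}")
```

For every row of both tables, the count obtained in this way does not exceed
$\lceil \delta \rceil$.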
7 Conclusion
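Discrete Stochastic Arithmetic can be used to dynamically determine the
optimal iterate of a converging sequence. Furthermore, if the sequence
converges at least linearly, the number of exact significant digits of this
iterate in common with the limit can be estimated. This number depends on the
speed of convergence of the sequence.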
If an arithmetical operation is performed on the optimal iterates of two
sequences, we can determine the significant digits of the computed result in
common with the exact result of the same operation performed on the two
limits. This allows a dynamical control of numerical algorithms involving the
computation of several sequences. Integrals on an infinite interval can be
approximated by computing several converging sequences. By dynamically
controlling each sequence, we can determine the significant digits of the
approximation in common with the exact value of the integral.
The sequences examined in this paper all converge to a scalar value. A
possible extension of this work is the numerical validation of sequences of
vectors, involved for example in iterative methods for solving linear systems.
References
[1] R. L. Burden and J. D. Faires, Numerical analysis, 7th ed., Brooks-Cole
Publishing, 2001.
[2] J.-M. Chesneaux, Study of the computing accuracy by using probabilistic
approach, in: Contribution to computer arithmetic and self-validating
numerical methods, C. Ullrich ed., IMACS, New Brunswick, NJ, 1990, pp. 19-30.
[3] J.-M. Chesneaux, The equality relations in scientific computing, Num.
Algo. 7 (1994) 129-143.
[4] J.-M. Chesneaux, L'arithmétique stochastique et le logiciel CADNA,
Habilitation à diriger des recherches, Université Pierre et Marie Curie,
Paris, 1995.
[5] J.-M. Chesneaux and F. Jézéquel, Dynamical control of computations using
the Trapezoidal and Simpson's rules, J. Univ. Comput. Sci. 4 (1) (1998) 2-10.
[6] J.-M. Chesneaux and J. Vignes, Sur la robustesse de la méthode CESTAC,
C. R. Acad. Sci. Paris Sér. I Math. 307 (1988) 855-860.
[7] J.-M. Chesneaux and J. Vignes, Les fondements de l'arithmétique
stochastique, C. R. Acad. Sci. Paris Sér. I Math. 315 (1992) 1435-1440.
[8] M. K. Jain, R. K. Jain and S. R. K. Iyengar, Numerical methods for
scientific and engineering computation, Halsted Press, 1985.
[9] J. H. Mathews, Numerical methods for mathematics, science and
engineering, 2nd ed., Prentice-Hall, 1992.
Processing 74, North-Holland, 1974.
[11] J. Vignes, Zéro mathématique et zéro informatique, C. R. Acad. Sci.
Paris Sér. I Math. 303 (1986) 997-1000; also: La Vie des Sciences 4 (1)
(1987) 1-13.
[12] J. Vignes, Estimation de la précision des résultats de logiciels
numériques, La Vie des Sciences 7 (2) (1990) 93-145.
[13] J. Vignes, A stochastic arithmetic for reliable scientific computation,
Math. Comput. Simulation 35 (1993) 233-261.
[14] J. Vignes, A stochastic approach to the analysis of round-off error
propagation. A survey of the CESTAC method, in: Proc. 2nd Real Numbers and
Computers conference, Marseille, France, 1996, pp. 233-251.