Chapter 2. Finite Precision Arithmetic
2.4 Rounding Errors
2.4.1 Standard Model of Arithmetic
Let $\tilde a$ and $\tilde b$ be two machine numbers. Then $c = \tilde a \times \tilde b$ will in general not be a machine number anymore, since the product of two numbers contains twice as many digits. The computed result will therefore be rounded to a machine number $\tilde c$ which is closest to $c$.
As an example, consider the 8-digit decimal numbers
$$\tilde a = 1.2345678 \quad\text{and}\quad \tilde b = 1.1111111,$$
whose product is
$$c = 1.37174198628258 \quad\text{and}\quad \tilde c = 1.3717420.$$
The absolute rounding error is the difference $r_a = \tilde c - c = 1.371742\mathrm{e}{-8}$, and
$$r = \frac{r_a}{c} = 1\mathrm{e}{-8}$$
is called the relative rounding error.
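This 8-digit example can be reproduced with Python's `decimal` module, which lets us choose the working precision explicitly (a sketch; the variable names are ours):

```python
from decimal import Decimal, getcontext

getcontext().prec = 20                # enough digits for the exact product
a = Decimal("1.2345678")
b = Decimal("1.1111111")
c = a * b                             # exact product: 1.37174198628258
getcontext().prec = 8                 # now work with 8-digit machine numbers
c_tilde = +c                          # unary plus rounds c to 8 digits
getcontext().prec = 20                # compute the errors exactly
r_a = c_tilde - c                     # absolute rounding error
r = r_a / c                           # relative rounding error, about 1e-8
```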
On today's computers, basic arithmetic operations obey the standard model of arithmetic: for $a, b \in M$, we have
$$a \,\tilde\oplus\, b = (a \oplus b)(1 + r), \qquad (2.2)$$
where $r$ is the relative rounding error with $|r| < \mathrm{eps}$, the machine precision. Here $\oplus \in \{+, -, \times, /\}$ denotes the exact basic operation and $\tilde\oplus$ the equivalent computer operation.
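The standard model (2.2) can be checked empirically in IEEE double precision by comparing a machine operation against exact rational arithmetic. The following Python sketch (Python stands in for the MATLAB used elsewhere in this chapter; the operand values are illustrative) does this for multiplication:

```python
from fractions import Fraction
import sys

eps = sys.float_info.epsilon              # machine precision for doubles
a, b = 0.1, 0.3                           # two machine numbers
computed = a * b                          # machine multiplication
exact = Fraction(a) * Fraction(b)         # exact product of the same machine numbers
r = (Fraction(computed) - exact) / exact  # relative rounding error in (2.2)
```

By construction $|r| < \mathrm{eps}$, as the standard model asserts.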
Another interpretation of the standard model of arithmetic is due to Wilkinson. In what follows, we will no longer use the multiplication symbol $\times$ for the exact operation; it is common practice in algebra to denote multiplication without any symbol: $ab \Longleftrightarrow a \times b$. Writing $\tilde a = a + ar$ and $\tilde b = b + br$, consider the operations
$$\begin{aligned}
\text{Addition:} \quad & a \mathbin{\tilde+} b = (a+b)(1+r) = (a+ar) + (b+br) = \tilde a + \tilde b\\
\text{Subtraction:} \quad & a \mathbin{\tilde-} b = (a-b)(1+r) = (a+ar) - (b+br) = \tilde a - \tilde b\\
\text{Multiplication:} \quad & a \mathbin{\tilde\times} b = ab(1+r) = a(b+br) = a\tilde b\\
\text{Division:} \quad & a \mathbin{\tilde/} b = (a/b)(1+r) = (a+ar)/b = \tilde a/b
\end{aligned}$$
In each of the above, the operation satisfies
Wilkinson’s Principle
The result of a numerical computation on the computer is the result of an exact computation with slightly perturbed initial data.
For example, the numerical result of the multiplication aטb is the exact result a˜b with a slightly perturbed operand ˜b = b+br. As a consequence of Wilkinson’s Principle, we need to study the effect that slightly perturbed data have on the result of a computation. This is done in Section 2.6.
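Wilkinson's Principle can be illustrated numerically: starting from the computed product, we can recover the perturbed operand $\tilde b$ for which the computed result is the exact product. A Python sketch, with illustrative operand values:

```python
from fractions import Fraction
import sys

a, b = 0.1, 0.3
computed = a * b                           # machine result of the multiplication
# Backward view: find b_tilde such that computed == a * b_tilde EXACTLY.
b_tilde = Fraction(computed) / Fraction(a)
# The perturbation of the operand is tiny, below machine precision.
rel_perturbation = abs(b_tilde - Fraction(b)) / Fraction(b)
```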
2.4.2 Cancellation
A special kind of rounding error is called cancellation. If we subtract two almost equal numbers, the leading digits cancel. Consider the following two numbers with 5 decimal digits:
1.2345e0
−1.2344e0
0.0001e0 = 1.0000e−4
If the two numbers were exact, the result delivered by the computer would also be exact. But if the first two numbers had been obtained by previous calculations and were affected by rounding errors, then the result would at best be 1.XXXXe−4, where the digits denoted by X are unknown.
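The same effect is easy to provoke in double precision. In the following Python sketch, the value 1 + 1e−12 plays the role of a result of a previous computation that carries a rounding error of order eps; subtracting 1 promotes that error into the leading digits of the difference:

```python
a = 1.0 + 1e-12          # stored with a rounding error of order eps
b = 1.0
diff = a - b             # the subtraction itself is exact here
# Relative deviation of the computed difference from the intended 1e-12:
# only a few of the ~16 decimal digits survive the cancellation.
rel_err = abs(diff - 1e-12) / 1e-12
```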
This is exactly what happened in our example at the beginning of this chapter. To compute $\sin(\alpha_n/2)$ from $\sin\alpha_n$, we used the recurrence (2.1):
$$\sin\frac{\alpha_n}{2} = \sqrt{\frac{1 - \sqrt{1 - \sin^2\alpha_n}}{2}}.$$
Since $\sin\alpha_n \to 0$, the numerator on the right hand side is $1 - \sqrt{1 - \varepsilon^2}$, with small $\varepsilon = \sin\alpha_n$,
and suffers from severe cancellation. This is the reason why the algorithm performed so badly, even though the theory and program are both correct.
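The cancellation can be observed directly in a Python transcription of the unstable recurrence (a sketch; the iteration count of 30 is illustrative, and Python stands in for the chapter's MATLAB):

```python
import math

s, n = math.sqrt(3) / 2, 6        # sine value and side count of the hexagon
areas = []
for _ in range(30):
    s = math.sqrt((1 - math.sqrt(1 - s * s)) / 2)  # unstable half-angle step
    n *= 2
    areas.append(n / 2 * s)       # area of the inscribed n-gon

# The areas first approach pi, then cancellation destroys the result.
best = min(abs(A - math.pi) for A in areas)
final = abs(areas[-1] - math.pi)
```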
It is possible in this case to rearrange the computation and avoid cancellation: multiplying numerator and denominator under the square root by $1 + \sqrt{1 - \sin^2\alpha_n}$ gives
$$\sin\frac{\alpha_n}{2} = \frac{\sin\alpha_n}{\sqrt{2\left(1 + \sqrt{(1+\sin\alpha_n)(1-\sin\alpha_n)}\right)}}.$$
This last expression no longer suffers from cancellation, and we obtain the new program:
Algorithm 2.2. Computation of π, Stable Version
oldA=0; s=sqrt(3)/2; newA=3*s; n=6; % initialization
z=[newA-pi n newA s]; % store the results
while newA>oldA % quit if area does not increase
  oldA=newA;
  s=s/sqrt(2*(1+sqrt((1+s)*(1-s)))); % new sine value
  n=2*n; newA=n/2*s; z=[z; newA-pi n newA s];
end
This time we do converge to the correct value of π (see Table 2.2). Notice also the elegant termination criterion: since the area of the next polygon grows, we theoretically have
$$A_6 < \cdots < A_n < A_{2n} < \pi.$$
However, this cannot be true forever in finite precision arithmetic, since there is only a finite set of machine numbers. Thus, the situation $A_n \ge A_{2n}$ must occur at some stage, and this is the condition to stop the iteration. Note that this condition is independent of the machine, in the sense that the iteration will always terminate as long as we have finite precision arithmetic, and when it does terminate, it always gives the best possible approximation for the precision of the machine. More examples of machine-independent algorithms can be found in Section 2.8.1.
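The stable iteration, including its machine-independent stopping test, can be sketched in Python (a transcription of Algorithm 2.2; Python replaces the MATLAB of the text):

```python
import math

oldA, n = 0.0, 6
s = math.sqrt(3) / 2                  # sine of the hexagon's half-angle
newA = 3 * s                          # area of the inscribed hexagon
while newA > oldA:                    # stop when the area no longer grows
    oldA = newA
    s = s / math.sqrt(2 * (1 + math.sqrt((1 + s) * (1 - s))))  # stable form
    n *= 2
    newA = n / 2 * s
# On exit, oldA is the best area the machine precision allows.
```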
A second example in which cancellation occurs is the problem of numerical differentiation (see Chapter 8). Given a twice continuously differentiable
        n    An                  An − π
        6    2.598076211353316   −0.543516442236477
       12    3.000000000000000   −0.141592653589793
       24    3.105828541230249   −0.035764112359544
       48    3.132628613281238   −0.008964040308555
       96    3.139350203046867   −0.002242450542926
      192    3.141031950890509   −0.000560702699284
      384    3.141452472285462   −0.000140181304332
      768    3.141557607911857   −0.000035045677936
     1536    3.141583892148318   −0.000008761441475
     3072    3.141590463228050   −0.000002190361744
     6144    3.141592105999271   −0.000000547590522
    12288    3.141592516692156   −0.000000136897637
    24576    3.141592619365383   −0.000000034224410
    49152    3.141592645033690   −0.000000008556103
    98304    3.141592651450766   −0.000000002139027
   196608    3.141592653055036   −0.000000000534757
   393216    3.141592653456104   −0.000000000133690
   786432    3.141592653556371   −0.000000000033422
  1572864    3.141592653581438   −0.000000000008355
  3145728    3.141592653587705   −0.000000000002089
  6291456    3.141592653589271   −0.000000000000522
 12582912    3.141592653589663   −0.000000000000130
 25165824    3.141592653589761   −0.000000000000032
 50331648    3.141592653589786   −0.000000000000008
100663296    3.141592653589791   −0.000000000000002
201326592    3.141592653589794    0.000000000000000
402653184    3.141592653589794    0.000000000000001
805306368    3.141592653589794    0.000000000000001

Table 2.2. Stable Computation of π
function $f : \mathbb{R} \to \mathbb{R}$, suppose we wish to calculate the derivative $f'(x_0)$ at some point $x_0$ using the approximation
$$f'(x_0) \approx D_{x_0,h}(f) = \frac{f(x_0+h) - f(x_0)}{h}.$$
This approximation is useful if, for instance, $f(x)$ is the result of a complex simulation, for which an analytic formula is not readily available. If we expand $f(x)$ in a Taylor series around $x = x_0$, we see that
$$f(x_0+h) = f(x_0) + h f'(x_0) + \frac{h^2}{2} f''(\xi), \quad\text{where } |\xi - x_0| \le h,$$
so that
$$\frac{f(x_0+h) - f(x_0)}{h} = f'(x_0) + \frac{h}{2} f''(\xi). \qquad (2.3)$$
Thus, we expect the error to decrease linearly with $h$ as we let $h$ tend to zero.
As an example, consider the problem of evaluating $f'(x_0)$ for $f(x) = e^x$ with $x_0 = 1$. We use the following code to generate a plot of the approximation error:
>> h=10.^(-15:0);
>> f=@(x) exp(x);
>> x0=1;
>> fp=(f(x0+h)-f(x0))./h;
>> loglog(h,abs(fp-exp(x0)));
[Figure 2.3. Results of numerical differentiation for $f(x) = e^x$, $x_0 = 1$: log-log plot of the finite difference error versus $h$.]
Figure 2.3 shows the resulting plot. For relatively large $h$, i.e., for $h > 1\mathrm{e}{-8}$, the error is indeed proportional to $h$, as suggested by (2.3). However, the plot clearly shows that the error is minimal for $h \approx 1\mathrm{e}{-8}$, and then the error increases again as $h$ decreases further. This is again due to severe cancellation: when $h$ is small, we have $f(x_0+h) \approx f(x_0)$. In particular, since $f(x_0) = f'(x_0) = 2.71828\ldots$ is of moderate size, we expect for $h = 10^{-t}$ that $f(x_0+h)$ differs from $f(x_0)$ by only $|\log_{10}(\mathrm{eps})| - t$ digits, i.e., $t$ digits are lost due to finite precision arithmetic. Thus, when $h < 10^{-8}$, we lose more digits due to roundoff error than the accuracy gained by a better Taylor approximation. In general, the highest relative accuracy that can be expected from this approximation is about $\sqrt{\mathrm{eps}}$, which is a far cry from the eps precision promised by the machine.
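The behavior near the optimal step size can be checked numerically; the following Python sketch of the experiment compares the forward difference error at $h = 10^{-8}$ (near $\sqrt{\mathrm{eps}}$) with a much smaller step, where cancellation dominates (the threshold values are illustrative):

```python
import math

x0 = 1.0
exact = math.exp(x0)                  # f'(x0) = e for f(x) = exp(x)

def fd_error(h):
    """Absolute error of the forward difference approximation of f'(x0)."""
    return abs((math.exp(x0 + h) - math.exp(x0)) / h - exact)

err_opt = fd_error(1e-8)              # near the optimal step sqrt(eps)
err_tiny = fd_error(1e-12)            # cancellation dominates here
```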
We observe that in the first example, we obtain bad results due to an unstable formula, but a better implementation can be devised to remove the instability and obtain good results. In the second example, however, it is unclear how to rearrange the computation without knowing the exact formula for $f(x)$; one might suspect that the problem is inherently harder. In order to quantify what we mean by easy or hard problems, we need to introduce the notion of conditioning.