
2.8 Stopping Criteria

2.8.3 Check the Residual

Another possibility to check whether an approximate solution is good enough is to insert this approximation into the equation to be solved, so that one can measure the amount by which the approximation fails to satisfy the equation.

This discrepancy is called the residual $r$. For example, in the case of the square root above, one might want to check whether $r = x_k^2 - a$ is small in absolute value.

In the case of a system of linear equations, $Ax = b$, one checks how small the residual

$$r = b - Ax_k$$

becomes in some norm for an approximate solution $x_k$.

Unfortunately, a small residual does not guarantee that we are close to a solution either! Take as an example the linear system

$$Ax = b, \qquad A = \begin{pmatrix} 0.4343 & 0.4340 \\ 0.4340 & 0.4337 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$

The exact solution is

$$x = \frac{1}{9}\begin{pmatrix} -43370000 \\ 43400000 \end{pmatrix} = \begin{pmatrix} -4.81888\ldots \\ 4.82222\ldots \end{pmatrix}\cdot 10^6.$$

The entries of the matrix $A$ are decimal numbers with 4 digits. The best 4-digit decimal approximation to the exact solution is

$$x_4 = \begin{pmatrix} -4819000 \\ 4822000 \end{pmatrix}.$$

Now if we compute the residual of that approximation, we obtain the rather large residual

$$r_4 = b - Ax_4 = \begin{pmatrix} 144.7 \\ 144.6 \end{pmatrix}.$$

We can easily guess solutions with smaller residuals, but which are clearly not better solutions. For example, we could have proposed as solution

$$x_1 = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \qquad r_1 = b - Ax_1 = \begin{pmatrix} 1.0003 \\ 0.0003 \end{pmatrix},$$

which clearly has a smaller residual. In fact, the residual of $x = (0,0)^\top$ is $r = b = (1,0)^\top$, which is even smaller! Thus, we cannot trust small residuals to always imply that we are close to a solution.
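
The following MATLAB lines are a minimal sketch (not from the book) that reproduces the numbers of this example:

    A  = [0.4343 0.4340; 0.4340 0.4337];
    b  = [1; 0];
    x4 = [-4819000; 4822000];     % best 4-digit approximation of the solution
    x1 = [-1; 1];                 % a guess that is far from the solution
    r4 = b - A*x4                 % large residual, about (144.7, 144.6)
    r1 = b - A*x1                 % small residual, about (1.0003, 0.0003)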

2.9 Problems

Problem 2.1. Verify with some examples the standard model of arithmetic (2.2). For this purpose, write a Maple program and use extended arithmetic for the exact operations. Truncate the operands to "machine numbers" by using a statement like am:=evalf[7](a) to get a 7-digit number. Do the same for the machine operation.

Problem 2.2. Single precision numbers can be defined in Matlab with the function single. Run the statements

>> format hex

>> y=single(6.5)

Convert the hexadecimal number to the corresponding binary number by expanding each hexadecimal digit into 4 binary digits (bits). Finally, interpret the 32 bits as a single precision floating point number. Verify that it really represents the decimal number 6.5.
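
One possible way (not required by the problem) to obtain the 32 bits directly in MATLAB, shown here only as a check of the hand conversion:

    bits = dec2bin(typecast(single(6.5),'uint32'),32)   % 32-character string of 0s and 1s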


Problem 2.3. Try to compute the parameters that define the finite arithmetic in Matlab. Compare your results with the machine constants of the IEEE standard. Write Matlab programs to compute

1. the machine precision eps. Hint: Use the definition that eps is the smallest positive machine number such that numerically 1 + eps > 1 holds. Compare your result with the Matlab constant eps (a possible loop is sketched after this problem).

2. the smallest positive normalized machine number α. Compare your result with realmin.

3. the smallest positive denormalized number. How can this number be computed from the IEEE constants?

4. the largest positive machine number γ. This is not easy to compute. An approximation is given by 1/α. Compare this value with the Matlab constant realmax.
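
A minimal sketch for item 1 (one possible loop, assuming round-to-nearest binary arithmetic):

    myeps = 1;
    while 1 + myeps/2 > 1     % halve until adding it no longer changes 1
      myeps = myeps/2;
    end
    myeps                     % compare with the built-in constant eps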

Problem 2.4. Do the same as in the previous problem but for the finite precision arithmetic used in Maple. Use the standard value for the precision Digits:=10.

1. Explain first why the following Maple program to compute the machine precision

eps:=1.0;

for i from 1 while 1.0+eps>1.0 do eps:=eps/2.0;

end do;

does not work. Change the program and make it work! Recall that Maple uses decimal rather than binary arithmetic.

2. Hint: To find the Maple realmin, use in your loop the statement realmin:=realmin/1.0e100000000;

otherwise you will wait too long! Then refine your guess iteratively by dividing by smaller factors. Convince yourself by an experiment that there are no denormalized numbers in Maple.

3. Verify that realmax = 1/realmin = $1.0 \times 10^{2147483646}$.

Problem 2.5. (IEEE Quadruple Precision Arithmetic) In problems where more precision is needed, the IEEE standard provides a specification for quadruple precision numbers, which occupy 128 bits of storage. Look up this specification on the web, in particular the number of bits attributed to the mantissa and the exponent, the machine precision eps, and the range of representable numbers. Give the sequence of bits representing the real number 11.25 in quadruple precision.
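
As a starting point (not part of the problem statement), the binary expansion of 11.25 is

$$11.25 = (1011.01)_2 = (1.01101)_2 \cdot 2^3,$$

so only the encoding of the exponent and the padding of the mantissa remain to be worked out.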

Problem 2.6. (Matrix 2-norm) In this exercise, we will show that $\|A\|_2$ is the square root of the largest eigenvalue of $A^\top A$.

1. $A^\top A$ is clearly a square symmetric matrix. By the spectral theorem, there exists a complete set of orthonormal eigenvectors $v_1, \ldots, v_n$ with corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$. Show that $\lambda_i \ge 0$ for all $i$.

2. Let $x = \sum_{i=1}^n c_i v_i$ be an arbitrary nonzero vector in $\mathbb{R}^n$, expressed in the basis of eigenvectors. Show that

$$\|Ax\|_2^2 = x^\top A^\top A x = \sum_{i=1}^n c_i^2 \lambda_i. \qquad (2.20)$$

3. Assume without loss of generality that the eigenvalues are arranged in descending order, i.e., $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. Deduce from (2.20) that

$$\|Ax\|_2^2 \le \lambda_1 \sum_{i=1}^n c_i^2 = \lambda_1 \|x\|_2^2.$$

Conclude that $\|A\|_2 \le \sqrt{\lambda_1}$.

4. Give an example of a vector $x$ satisfying $\|Ax\|_2^2 = \lambda_1$. Conclude that $\|A\|_2 = \sqrt{\lambda_1}$.
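
A quick numerical sanity check of this result in Matlab (a sketch, not part of the problem):

    A    = randn(5,3);          % any rectangular real matrix
    n2   = norm(A,2);           % built-in matrix 2-norm
    lmax = max(eig(A'*A));      % largest eigenvalue of A'*A
    abs(n2 - sqrt(lmax))        % should be of the order of eps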

Problem 2.7. (Big- and Little-O Notation) Prove the properties of $\mathcal{O}(\cdot)$ and $o(\cdot)$ given in Lemma 2.1. Hint: For the first property, let $f_1 = \mathcal{O}(g_1)$ and $f_2 = \mathcal{O}(g_2)$ be two functions. Then use the definition of $\mathcal{O}(\cdot)$ to show that $|f_1 + f_2| \le C(|g_1| + |g_2|)$ for some constant $C$. Proceed similarly for the other properties.

Problem 2.8. Determine the condition number of the elementary operations $/$ and $\sqrt{\phantom{x}}$.

Problem 2.9. (Condition Number and Derivatives) Consider the problem $P : x \mapsto f(x)$, in which $f : U \subset \mathbb{R} \to \mathbb{R}$ is a differentiable function on the open subset $U$. Give a formula for the condition number $\kappa$ in terms of the function $f(x)$ and its derivative $f'(x)$. Hint: You may find (2.11) helpful.

Problem 2.10. Consider the problem $P : D \subset \mathbb{R}^2 \to \mathbb{R}^2$:

$$\begin{pmatrix} a \\ b \end{pmatrix} \mapsto \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad (2.21)$$


where $x_1$ and $x_2$ are the roots of the second-degree polynomial $p(x) = x^2 + 2ax + b$ and $D = \{(a,b) \in \mathbb{R}^2 \mid b < a^2\}$. Is this problem well conditioned?

Use the infinity norm to calculate the condition number $\kappa$.

Problem 2.11. Study the forward and backward stability of the algorithm

$$x_{1,2} = -a \pm \sqrt{a^2 - b}$$

for solving the problem (2.21). Also study the forward and backward stability of the improved algorithm in Problem 2.14 that uses Vieta’s formula, and explain why the latter algorithm is preferable.
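
The effect can already be seen with a quick experiment (a sketch, not part of the problem); here the small root of $x^2 + 2ax + b$ is computed both naively and via Vieta's formula:

    a = 1e7; b = 1;
    x_naive = -a + sqrt(a^2 - b)       % suffers from cancellation
    x_big   = -a - sqrt(a^2 - b);      % the large root is computed accurately
    x_vieta = b/x_big                  % small root via x1*x2 = b, accurate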

Problem 2.12. (Monotonicity) Suppose we are operating in a base-10 finite precision arithmetic with two significant digits in the mantissa. Let $M$ be the set of machine numbers. Show that the function $f : M \to M$ defined by $x \mapsto x \,\tilde{\times}\, x$ is not strictly monotonic by finding two machine numbers $a, b \in M$ such that $a \neq b$, but $a \,\tilde{\times}\, a = b \,\tilde{\times}\, b$.

Problem 2.13. Write a Matlab function [r,phi] = topolar(x,y) to convert the Cartesian coordinates of a point $(x, y)$ to polar coordinates $(r, \varphi)$. In other words, the function should solve the equations

$$x = r\cos\varphi, \qquad y = r\sin\varphi$$

for $r$ and $\varphi$. Hint: study the Matlab function atan2 and avoid under- and overflow.
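
One possible building block (a sketch, assuming the built-ins atan2 and hypot may be used):

    function [r,phi] = topolar(x,y)
      r   = hypot(x,y);       % sqrt(x^2+y^2) without intermediate under-/overflow
      phi = atan2(y,x);       % angle in (-pi,pi], correct in all quadrants
    end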

Problem 2.14. Write a fool-proof Matlab program to solve the quadratic equation

$$x^2 + px + q = 0,$$

where $p$ and $q$ can be arbitrary machine numbers. Your program has to compute the solutions $x_1$ and $x_2$ if they lie in the range of machine numbers.

Hints: consider the well-known formula

$$x_{1,2} = -\frac{p}{2} \pm \sqrt{\left(\frac{p}{2}\right)^2 - q}.$$

If $|p|$ is large, you might get overflow when you square it. Furthermore, there is cancellation if $|p| \gg |q|$ for $x_1$.

Avoid overflow by rewriting the formula appropriately. Avoid cancellation by using the relation $x_1 x_2 = q$ (Vieta's formula).

Problem 2.15. (Law of Cosines) Given one angle $\gamma$ and the two adjacent sides $a$ and $b$ of a triangle, the opposite side can be determined using the law of cosines:

$$c = \sqrt{a^2 + b^2 - 2ab\cos\gamma}.$$

1. Numerically we can expect problems for a small angle $\gamma$ and if $a \approx b \gg c$. The result $c$ will be affected by cancellation in this case.

Change the formula to avoid cancellation by introducing $-2ab + 2ab$ in the square root and using the half-angle formula

$$\sin^2\frac{\alpha}{2} = \frac{1 - \cos\alpha}{2}.$$

You should obtain the new and more reliable expression

$$c = \sqrt{(a-b)^2 + 4ab\sin^2\frac{\gamma}{2}}.$$

2. Simulate on a pocket computer a 2-decimal-digit arithmetic by rounding the result after each operation to 2 decimal digits. Use the values $a = 5.6$, $b = 5.7$ and $\gamma = 5^\circ$, and compute the side $c$ with the two formulas for the law of cosines.
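
A related double-precision experiment (a sketch, not the 2-digit simulation asked for above) shows the same cancellation for a very small angle:

    a = 1; b = 1; gamma = 1e-9;                      % nearly degenerate triangle
    c_naive  = sqrt(a^2 + b^2 - 2*a*b*cos(gamma))    % returns 0: cancellation
    c_stable = sqrt((a-b)^2 + 4*a*b*sin(gamma/2)^2)  % about 1e-9, correct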

Problem 2.16. The law of cosines can be used to compute the circumference of an ellipse.

1. Represent the ellipse in polar coordinates

$$r(\varphi) = \frac{b}{\sqrt{1 - \varepsilon^2\cos^2\varphi}}, \qquad \varepsilon^2 = \frac{a^2 - b^2}{a^2}, \quad a \ge b.$$

2. Consider now a partition $\varphi_n = \frac{2\pi}{n}$ and the triangle with angle $\varphi_n$ and the two adjacent sides $r(k\varphi_n)$ and $r((k+1)\varphi_n)$.

Compute the third side of this triangle (a chord on the ellipse) using the law of cosines. Sum up all the $n$ chords to obtain this way an approximation of the ellipse circumference.

3. Compare your approximation as $n \to \infty$ with the "exact" value, which you can obtain by numerically integrating the elliptic integral

$$U = a\int_0^{2\pi} \sqrt{1 - \varepsilon^2\cos^2 t}\; dt, \qquad \varepsilon^2 = \frac{a^2 - b^2}{a^2}, \quad a \ge b.$$

Notice the difference that you obtain using the textbook formula and the stable formula for the law of cosines.

4. Implement the Matlab function

function [U,n]=Circumference(a,b)
% CIRCUMFERENCE computes the circumference of an ellipse
%   [U,n]=Circumference(a,b) computes the circumference U of the ellipse
%   with semiaxes a and b and returns the number n of chords used to
%   approximate the circumference.


Use the stable formula and an elegant machine-independent termination criterion: start with n = 4 and double n in each step. The sequence of approximations should increase monotonically. Stop the iteration when the monotonicity is lost. Be careful to implement this algorithm efficiently; it takes a large number of operations. Avoid recomputations as much as possible.

Problem 2.17. The function $\ln(1+x)$ is evaluated inaccurately for small $|x|$.

1. Evaluate this function in Matlab for $x = 0.1, 0.01, \ldots, 10^{-11}$.

2. Check the obtained values by computing the same in Maple using Digits := 20.

3. Program now in Matlab and evaluate for the same arguments the function

$$\ln(1+x) = \begin{cases} x & \text{if } 1 + x = 1 \text{ numerically}, \\[6pt] \dfrac{x\,\ln(1+x)}{(1+x) - 1} & \text{if } 1 + x \ne 1 \text{ numerically}. \end{cases}$$

Comment on your results. Can you explain why this works? This transformation is another clever idea by W. Kahan.
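
A direct transcription of this formula into MATLAB could look as follows (a sketch; the function name log1p_kahan is chosen here only for illustration):

    function y = log1p_kahan(x)
      w = 1 + x;                % rounded sum
      if w == 1
        y = x;                  % 1+x = 1 numerically
      else
        y = x*log(w)/(w - 1);   % Kahan's correction
      end
    end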

Problem 2.18. We have seen that computing $f(x) = e^x$ using its Taylor series is not feasible for $x = -20$ because of catastrophic cancellation. However, the series can be used without problems for small $|x| < 1$.

1. Try therefore the following idea:

$$e^x = \Bigl(\cdots\bigl(\bigl(e^{x/2^m}\bigr)^2\bigr)^2\cdots\Bigr)^2.$$

This means that we first compute a number $m$ such that

$$z = \frac{x}{2^m}, \qquad \text{with } |z| < 1.$$

Then we use the series to compute $e^z$ and finally we get the result by squaring $m$ times.

Write a Matlab function function y=es(x) that computes $e^x$ this way and compare the results with the Matlab function exp.

2. Perform an error analysis of this algorithm.
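
A possible skeleton for part 1 (a sketch under the stated idea, not a reference solution):

    function y = es(x)
      m = 0;  z = x;
      while abs(z) >= 1          % scale the argument: z = x/2^m with |z| < 1
        z = z/2;  m = m + 1;
      end
      y = 1;  t = 1;  n = 1;
      while y + t ~= y           % sum the Taylor series of e^z until terms no longer contribute
        t = t*z/n;  y = y + t;  n = n + 1;
      end
      for k = 1:m                % undo the scaling by squaring m times
        y = y^2;
      end
    end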

Problem 2.19. The function

$$f(x) = \ln\bigl((1 + x^4)^2 - 1\bigr)$$

is computed inexactly in IEEE arithmetic (the results may even be completely wrong) for small positive values of $x$. Already for $x \approx 10^{-3}$, we obtain only about 8 correct decimal digits. For $x \approx 10^{-4}$ we get in Matlab -Inf.

Write a Matlab function y=f(x) which computes the correct function values for all $\text{realmin} \le x < 10^{-3}$.

Problem 2.20. When evaluating

$$f(x) = \frac{e^x - 1 - x}{x^2}$$

on the computer, we observe a large relative error for values $x \approx 0$.

1. Explain what happens.

2. Find a method to compute $f$ for $|x| < 1$ to machine precision and write a Matlab function for computing $f$.

Problem 2.21. When evaluating

$$f(x) = \frac{x^2}{(\cos(\sin x))^2 - 1}$$

on the computer, we observe a large relative error for values $x \approx 0$.

1. Explain what happens.

2. Find a method to compute $f$ for $|x| < 1$ to machine precision and write a Matlab function for computing $f$.

Problem 2.22. (Checking successive iterates as stopping criterion)

Assume that the fixed-point iteration $x_{k+1} = F(x_k)$ yields a linearly convergent sequence $x_k \to s$ and that for the error $e_k = |x_k - s|$, the relation $e_{k+1} \le c\,e_k$ holds with $0 < c < 1$. Investigate for what values of $c$ we can conclude that if $|x_{k+1} - x_k| \le \varepsilon$ holds, then also $e_{k+1} < \varepsilon$.

Problem 2.23. Write a Matlab function to compute the sine function in a machine-independent way using its Taylor series. Since the series is alternating, cancellation will occur for large $|x|$.

To avoid cancellation, reduce the argument $x$ of $\sin(x)$ to the interval $[0, \frac{\pi}{2}]$. Then sum the Taylor series and stop the summation with the machine-independent criterion $s_n + t_n = s_n$, where $s_n$ denotes the partial sum and $t_n$ the next term. Compare the exact values for $[\sin(10 + k/100)]_{k=0,\ldots,2000}$


with the ones you obtain from your Matlab function and plot the relative error.

Problem 2.24. The function

$$f(x) = \frac{x^2}{1!} + \frac{7x^4}{2!} + \frac{17x^6}{3!} + \cdots = \sum_{n=1}^{\infty} \frac{(2n^2 - 1)\,x^{2n}}{n!}$$

should be evaluated to machine precision for $x$ in the range $0 < x < 25$.

Write a Matlab function for this purpose. Pay particular attention to

a) compute the result to machine precision with an elegant stopping criterion;

b) avoid any potential overflows.

Problem 2.25. We would like to compute the integrals

$$y_n = \int_0^1 \frac{x^n}{x + a}\,dx$$

for $n = 0, 1, 2, \ldots, 30$ and $a > 0$.

1. Show that the following recurrence holds:

$$y_n = \frac{1}{n} - a\,y_{n-1}, \qquad y_0 = \log\frac{1+a}{a}. \qquad (2.22)$$

2. Compute upper and lower bounds for the values $y_n$ by choosing $x = 0$, respectively $x = 1$, in the denominator of the integrand.

3. Compute for $a = 10$ the sequence $\{y_n\}$ for $n = 1, \ldots, 30$ using (2.22) repeatedly (a short sketch of this forward recursion is given after this problem). Print a table with the values and their bounds.

4. Solve (2.22) for $y_{n-1}$ and compute again the sequence for $a = 10$, this time backwards, starting from $n = 30$. Take as starting value the lower bound for $y_{30}$.

5. Finally, check your results by computing the integrals directly using the Matlab function quad.
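
A sketch of the forward recursion in item 3 (illustration only; observing its behaviour is part of the exercise):

    a = 10;
    y = log((1+a)/a);          % y_0
    for n = 1:30
      y = 1/n - a*y;           % forward recurrence (2.22); each step amplifies errors by a factor a
      fprintf('%2d  %g\n', n, y);
    end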

Problem 2.26. Given an integer n, one often needs to compute the function values

$$s_k = \sin(k\varphi), \quad k = 1, \ldots, n \qquad \text{for } \varphi = \frac{2\pi}{n}.$$

Instead of invoking the sine function $n$ times to calculate $s_k$, one could also calculate them recursively using the trigonometric identities

$$\begin{pmatrix} \cos((k+1)\varphi) \\ \sin((k+1)\varphi) \end{pmatrix} = \begin{pmatrix} \cos\varphi & -\sin\varphi \\ \sin\varphi & \cos\varphi \end{pmatrix} \begin{pmatrix} \cos(k\varphi) \\ \sin(k\varphi) \end{pmatrix}. \qquad (2.23)$$

Perform some experiments to compare the computing time versus the accuracy, then design a fast "mixed" algorithm.

Problem 2.27. Vector and matrix norms

1. Show that for vectors $x$ the infinity norm $\|x\|_\infty$ and the one norm $\|x\|_1$ are norms according to Definition 2.1.

2. Show that for matrices $A$ the Frobenius norm $\|A\|_F$, the infinity norm $\|A\|_\infty$ and the one norm $\|A\|_1$ are norms according to Definition 2.2. Show that each norm is submultiplicative.

3. Two vector norms $\|\cdot\|_A$ and $\|\cdot\|_B$ are called equivalent if there are positive constants $C_1$ and $C_2$ such that for all $x$ we have

$$C_1\|x\|_A \le \|x\|_B \le C_2\|x\|_A.$$

Show that for $x \in \mathbb{R}^n$,

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2, \qquad \|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty, \qquad \|x\|_\infty \le \|x\|_1 \le n\,\|x\|_\infty.$$

(Hint: Squaring the inequalities helps for the two norm.)
