Theorem 4.8 For any invertible n × n matrix A and any matrix norm, the condition number of A indicates the relative distance of A from the nearest noninvertible n × n matrix. Specifically,

$$\min\left\{\frac{\|A - B\|}{\|A\|} : B \text{ noninvertible}\right\} = \frac{1}{\operatorname{cond}(A)}$$

A complete proof of this theorem is beyond the scope of this book (but see Exercise 4.6-5). We only show that

$$\min\left\{\frac{\|A - B\|}{\|A\|} : B \text{ noninvertible}\right\} \ge \frac{1}{\operatorname{cond}(A)}$$

i.e., that for any noninvertible n × n matrix B,

$$\frac{\|A - B\|}{\|A\|} \ge \frac{1}{\operatorname{cond}(A)} \qquad (4.44)$$

Indeed, if B is not invertible, then by Theorem 4.4, there is a nonzero n-vector x such that Bx = 0. But then

$$\|x\| = \|A^{-1}Ax\| = \|A^{-1}(A - B)x\| \le \|A^{-1}\|\,\|A - B\|\,\|x\|$$

using (4.38), and since ||x|| ≠ 0, we can divide by ||A^{-1}|| ||A|| ||x|| to obtain (4.44).

The argument just given establishes the following useful corollary.

Corollary If A is invertible and B is a matrix such that

$$\frac{\|A - B\|}{\|A\|} < \frac{1}{\operatorname{cond}(A)}$$

then B is invertible.

To give an example, consider the matrix A of Example 4.5. Subtracting from A a suitable matrix of max-norm 0.02 produces a matrix that is not invertible, so the distance from A to the nearest noninvertible matrix is at most 0.02 in the max-norm. Hence, by (4.44), cond(A) ≥ ||A||∞/0.02, and we get that cond(A) ≥ 100. A different example is provided by invertible triangular matrices. If A is triangular, we know from Theorem 4.6 that all diagonal entries of A are nonzero, and that replacing any diagonal entry of A by 0 makes A noninvertible.

Consequently, if A is triangular, then

$$\operatorname{cond}(A) \ge \frac{\|A\|}{\min_i |a_{ii}|}$$

since setting the absolutely smallest diagonal entry to 0 changes A by a matrix whose max-norm is min_i |a_ii| while destroying invertibility.
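As a quick numerical sanity check (mine, not the book's; NumPy is assumed), one can verify this lower bound on a small triangular matrix:

```python
# A minimal check of Theorem 4.8's lower bound: for a triangular A, zeroing
# its smallest diagonal entry gives a noninvertible B, so
# cond(A) >= ||A|| / ||A - B||.
import numpy as np

A = np.array([[4.0, 2.0, 1.0],
              [0.0, 3.0, 5.0],
              [0.0, 0.0, 0.01]])   # upper triangular, one tiny pivot

cond = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)

B = A.copy()
B[2, 2] = 0.0                      # zero diagonal entry -> B noninvertible
lower = np.linalg.norm(A, np.inf) / np.linalg.norm(A - B, np.inf)

print(cond, lower)                 # cond(A) must dominate the lower bound
assert cond >= lower
```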

The condition number also plays a role in the analysis of a further complication in solving linear systems. If the linear system Ax = b derives from a practical problem, we must expect the coefficients of this system to be subject to error, either because they result from other calculations or from physical measurement, or simply because of the roundoff incurred in converting them to binary representation during read-in. Hence, assuming for the moment that the right side is accurate, we are, in fact, solving the linear system

$$\hat{A}\hat{x} = b \qquad (4.45)$$

instead of Ax = b, where Â = A + E, the matrix E containing the errors in the coefficients. Even if all calculations are carried out exactly, we still compute only the solution x̂ of (4.45) rather than the solution x of Ax = b.

Now, we have x = A^{-1}b; hence, assuming that (4.45) has a solution x̂,

$$x = A^{-1}b = A^{-1}\hat{A}\hat{x} = A^{-1}(A + E)\hat{x} = \hat{x} + A^{-1}E\hat{x}$$

Therefore, with (4.38),

$$\|x - \hat{x}\| \le \|A^{-1}\|\,\|E\|\,\|\hat{x}\|$$

Hence, dividing by ||x̂|| and writing ||A^{-1}|| = cond(A)/||A||, we obtain the final result

$$\frac{\|x - \hat{x}\|}{\|\hat{x}\|} \le \operatorname{cond}(A)\,\frac{\|E\|}{\|A\|} \qquad (4.46)$$

In words, the change x − x̂ in the solution, relative to the size of x̂, can be as large as cond(A) times the relative change ||E||/||A|| in the coefficient matrix. If the coefficients of the linear system Ax = b are known to be accurate only to about 10^(−s) (relative to the size of A) and cond(A) is about 10^t, then there is no point in calculating the solution to a relative accuracy better than 10^(t−s).
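The bound (4.46) is easy to probe numerically. The following sketch (an illustration of mine; the matrix and the perturbation E are hypothetical) compares the observed relative change in the solution with the bound:

```python
# A small experiment illustrating (4.46): the relative change in the
# solution is at most cond(A) times the relative change in A.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0],
              [1.0, 1.01]])              # mildly ill-conditioned, for illustration
b = np.array([2.0, 2.01])

x = np.linalg.solve(A, b)
E = 1e-6 * rng.standard_normal((2, 2))   # plays the role of E in (4.45)
x_hat = np.linalg.solve(A + E, b)

cond = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)
lhs = np.linalg.norm(x - x_hat, np.inf) / np.linalg.norm(x_hat, np.inf)
rhs = cond * np.linalg.norm(E, np.inf) / np.linalg.norm(A, np.inf)
print(lhs, rhs)                          # lhs <= rhs, as (4.46) predicts
```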

Example 4.7 Consider once more the linear system (4.30) in Example 4.5. We found earlier that cond(A) = 100 for its coefficient matrix A. By (4.46), a 1 percent change in the coefficients of the system could therefore change its solution drastically. Indeed, a 1 percent change (in the right direction) produces a linear system which has no solution at all, for its coefficient matrix then fails to be invertible.
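A small illustration in the same spirit (the matrix below is a hypothetical stand-in, not the system (4.30) itself) shows how a change of about 1 percent in one coefficient can destroy invertibility:

```python
# Illustration: a ~1 percent change in one coefficient of a nearly
# dependent matrix makes it noninvertible.
import numpy as np

A = np.array([[1.00, 1.00],
              [1.00, 1.01]])      # rows nearly dependent; hypothetical example
B = A.copy()
B[1, 1] = 1.00                    # the change "in the right direction"

print(np.linalg.matrix_rank(A))   # 2: invertible
print(np.linalg.matrix_rank(B))   # 1: noninvertible -> Bx = b generally unsolvable
```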

The preceding analysis can be put to good use in gauging, with the aid of backward error analysis, the effect of the roundoff errors incurred during elimination and back-substitution on the accuracy of the computed solution. In this, we will make use of the terminology and notation introduced in Sec. 1.3.

Theorem 4.9 Suppose that, in order to obtain a factorization PLU for the nth-order matrix A and, from this, the solution of the linear system Ax = b, we use Algorithms 4.2 and 4.4, but employ floating-point arithmetic with unit roundoff u < 0.01, getting the computed factors L̂ and Û and the computed solution x̂. Then x̂ satisfies exactly the perturbed equation

$$(A + E)\hat{x} = b \qquad (4.47)$$

with

$$|E| \le c\,n\,u\,P\,|\hat{L}|\,|\hat{U}| \qquad (4.48)$$

for some modest constant c.

Here, we denote by |B| the matrix obtained from B = (b_ij) by replacing all its entries by their absolute values,

$$|B| := (|b_{ij}|)$$

Also, we write

$$B \le C$$

for two matrices B and C in case B and C are of the same order and b_ij ≤ c_ij for all i and j.
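In NumPy terms (my gloss, not the book's), these two notations read as follows:

```python
# |B| and the elementwise order B <= C, as defined above, in NumPy terms.
import numpy as np

B = np.array([[1.0, -2.0], [-3.0, 4.0]])
C = np.array([[2.0,  2.0], [ 3.0, 5.0]])

abs_B = np.abs(B)                    # the matrix |B| = (|b_ij|)
elementwise_le = (abs_B <= C).all()  # "|B| <= C" holds iff it holds entrywise
print(abs_B, elementwise_le)
```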

The theorem states that if n is not “too large” and if |L̂||Û| is about the size of |A|, then we can account for the errors in the computed solution by adjustments in the equations of the same order of magnitude as the changes we had to make merely to get the equations into the machine. In other words, the error in the computed solution caused by the use of floating-point arithmetic is then no worse than the error we had to accept from the outset because we were forced to round the entries of A to floating-point numbers.

Of course, should the matrix |L̂||Û| be much larger than |A|, then the errors in the computed x̂ may be much larger than those due to the conversion of the problem to machine floating-point numbers. Note that one could actually calculate the matrix |L̂||Û| (at some expense) and go to higher-precision arithmetic in case the resulting bound on the perturbation matrix E exceeds the tolerance to which the entries of A are known to be accurate. But more important, since the pivot order may materially affect the size of |L̂||Û|, we draw from Theorem 4.9 the important conclusion that a pivoting strategy should try to keep the matrix |L̂||Û| small.
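The matrix |L̂||Û| is cheap to inspect once the factors are at hand. A sketch, with SciPy's lu standing in for Algorithm 4.2:

```python
# Computing the matrix |L||U| whose size, by Theorem 4.9, governs the
# backward error bound (SciPy's LU with partial pivoting as a stand-in).
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

P, L, U = lu(A)                    # A = P @ L @ U, partial pivoting
LU_abs = np.abs(L) @ np.abs(U)     # the matrix |L||U| of Theorem 4.9

# If |L||U| is about the size of |A|, the elimination was backward stable
# for this A; a much larger |L||U| would signal possible trouble.
print(np.abs(A).max(), LU_abs.max())
```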

We now indicate the simple proof of Theorem 4.9, using the notation and terms introduced in Sec. 1.3. First, we deal with the interchanges made (as recorded in the permutation matrix P) by applying Algorithm 4.2 without interchanges to the matrix A' := P^{-1}A (as we did in Sec. 4.4).

Thus, we compute the interesting entries of the factors L and U according to (4.23) by

$$u_{kj} = a'_{kj} - \sum_{m=1}^{k-1} l_{km}\,u_{mj}, \qquad j \ge k$$

$$l_{ik} = \Big(a'_{ik} - \sum_{m=1}^{k-1} l_{im}\,u_{mk}\Big)\Big/ u_{kk}, \qquad i > k$$
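For concreteness, here is a minimal Python rendering of these compact-elimination (Doolittle) formulas; a sketch without the pivoting and storage details of the book's Algorithm 4.2:

```python
# Compact elimination per the formulas above: row k of U, then column k
# of L, without pivoting (a sketch, not the book's Algorithm 4.2).
import numpy as np

def doolittle(A):
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for k in range(n):
        for j in range(k, n):                     # entries u_kj, j >= k
            U[k, j] = A[k, j] - L[k, :k] @ U[:k, j]
        for i in range(k + 1, n):                 # entries l_ik, i > k
            L[i, k] = (A[i, k] - L[i, :k] @ U[:k, k]) / U[k, k]
    return L, U

A = np.array([[2.0, 1.0], [6.0, 4.0]])
L, U = doolittle(A)
assert np.allclose(L @ U, A)                      # A = LU recovered
```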

Consequently, by Sec. 1.3, especially by comparison of (1.12) with (1.13), the entries l̂_ik and û_kj of the factors L̂ and Û as computed in floating-point arithmetic satisfy the equations (4.23) perturbed by a factor of the form 1 + δ for each addition, multiplication, or division performed. Here, each δ stands for some number with |δ| < u, the unit roundoff. To simplify these equations, we next observe that for any such numbers δ_1, …, δ_r, there exists a number θ with |θ| ≤ 1.06ru so that

$$(1 + \delta_1)^{\pm 1}(1 + \delta_2)^{\pm 1} \cdots (1 + \delta_r)^{\pm 1} = 1 + \theta$$

as long as ru ≤ 0.1. This shows that

$$(\hat{L}\hat{U})_{ik} = a'_{ik} + f_{ik} \qquad\text{with}\qquad |f_{ik}| \le 1.06\,n\,u\,\big(|\hat{L}|\,|\hat{U}|\big)_{ik}$$

and therefore

$$\hat{L}\hat{U} = A' + F \qquad (4.49)$$

with

$$|F| \le 1.06\,n\,u\,|\hat{L}|\,|\hat{U}| \qquad (4.50)$$

This shows that the computed factors L̂ and Û for A' are the exact factors for a perturbed matrix A' + F, with the error matrix F of the order of the roundoff in the entries of A, provided the matrix |L̂||Û| is not much larger than |A|.
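This statement, too, can be checked numerically; in the sketch below, SciPy's lu again stands in for Algorithm 4.2, and the factor 1.06nu comes from (4.50):

```python
# Numerical check of (4.49)-(4.50): the computed factors are exact factors
# of A' + F, with |F| bounded elementwise by 1.06*n*u*|L||U|.
import numpy as np
from scipy.linalg import lu

A = np.array([[3.0, 1.0, 2.0],
              [6.0, 3.0, 4.0],
              [3.0, 1.0, 5.0]])
P, L, U = lu(A)                      # A = P @ L @ U
A_prime = P.T @ A                    # A' = P^{-1} A (P is a permutation)
F = L @ U - A_prime                  # the error matrix F of (4.49)

n = A.shape[0]
u = np.finfo(np.float64).eps / 2     # unit roundoff of double precision
bound = 1.06 * n * u * np.abs(L) @ np.abs(U)
print(np.all(np.abs(F) <= bound + 1e-30))   # elementwise check of (4.50)
```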

The computational steps used in Algorithm 4.4, i.e., in the solving phase, are rather similar to those above. One can, therefore, show in the same way that the computed vector ŷ satisfies exactly the perturbed lower-triangular system

$$(\hat{L} + H)\hat{y} = b' \qquad\text{with}\qquad |H| \le 1.06\,n\,u\,|\hat{L}|$$

where b' := P^{-1}b, while the computed solution x̂ satisfies exactly the perturbed upper-triangular system (Û + K)x̂ = ŷ, with |K| ≤ 1.06nu|Û|. We conclude that the computed solution x̂ satisfies

$$(\hat{L} + H)(\hat{U} + K)\hat{x} = b'$$

But now

$$(\hat{L} + H)(\hat{U} + K) = \hat{L}\hat{U} + H\hat{U} + \hat{L}K + HK = A' + F + H\hat{U} + \hat{L}K + HK =: A' + E'$$

where

$$|E'| \le |F| + |H|\,|\hat{U}| + |\hat{L}|\,|K| + |H|\,|K| \le c\,n\,u\,|\hat{L}|\,|\hat{U}|$$

for a modest constant c, up to terms of order (nu)². Since A' = P^{-1}A and b' = P^{-1}b, this says that x̂ satisfies exactly (A + E)x̂ = b with E := PE', which proves the theorem.

The bound (4.48) is conservative. If partial pivoting is used, then the bound

$$\|E\|_\infty \le c\,n\,u\,\|A\|_\infty \qquad (4.50a)$$

is often much more realistic. In any event, such a bound gives some insight into the effect of the precision used in the calculations on the accuracy of the computed solution. For |L̂||Û| of about the size of |A|, we get, for example, from (4.46) and (4.50) that the error of the computed solution relative to the size of this solution is usually bounded as follows:

$$\frac{\|x - \hat{x}\|}{\|\hat{x}\|} \le c\,n\,u\,\operatorname{cond}(A) \qquad (4.51)$$

Quite loosely, the linear system Ax = b is often called ill-conditioned if cond(A) is “large.” Somewhat more to the point, one should say that the linear system is ill-conditioned with respect to the precision used if cond(A) is about 1/u, for then, by (4.51), a computed solution might well bear no resemblance to the (exact) solution of the system.
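The rule of thumb embodied in (4.51) can be turned into a crude diagnostic. A sketch (the matrix is hypothetical, and the modest factor cn is absorbed into a rough threshold):

```python
# Crude diagnostic: a system is ill-conditioned *with respect to the
# precision used* when cond(A) approaches 1/u.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-15]])   # nearly singular, for illustration
u = np.finfo(np.float64).eps / 2     # unit roundoff of IEEE double precision
cond = np.linalg.norm(A, np.inf) * np.linalg.norm(np.linalg.inv(A), np.inf)

if cond * u >= 0.1:                  # rough threshold standing in for c*n*u*cond(A) ~ 1
    print("ill-conditioned w.r.t. double precision: expect few correct digits")
else:
    print(f"expect roughly {-np.log10(cond * u):.0f} correct decimal digits")
```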

Example 4.8 Consider the linear system (4.52). We attempt to solve this system by the elimination Algorithm 4.2, using two-decimal-digit floating-point arithmetic and scaled partial pivoting. The pivoting order turns out to be p^T = [1 2 3], leaving a final working array whose entries record the computed factorization. Continuing the calculations, we find by back-substitution an approximate solution with a small residual; comparison with the exact solution, however, shows the computed solution to be in error in the first significant digit.

The max-norm ||A||∞ of the coefficient matrix A of this system is easily read off as the largest absolute row sum. Further, there is a noninvertible matrix B (its first column is 0.7 times its second column) with ||A − B||∞ = 0.012. Hence we get from Theorem 4.8 that

$$\operatorname{cond}(A) \ge \frac{\|A\|_\infty}{0.012}$$

This system is therefore very ill-conditioned with respect to the precision used, and the very large error in the computed solution is not surprising.

Next, we repeat the calculations, using three-decimal-digit floating-point arithmetic this time. Since cond(A) is still large relative to the precision used, we still do not expect a very accurate computed solution.

After Algorithm 4.2, the working matrix has the content recorded in (4.53), and back-substitution gives the computed solution exactly; i.e., we get the (exact) solution, even though the system is still somewhat ill-conditioned with respect to the precision used. This becomes evident when we change the right side of (4.52) slightly. Using the factorization (4.53), we calculate by Algorithm 4.4 the (approximate) solution for the changed right side (still using three-decimal-digit floating-point arithmetic) and compute its residual. Comparison with the exact solution of the changed system shows that our computed solution has about 10 percent error, which is compatible with (4.51).

As this example shows, a large condition number relative to the precision used may lead to a relatively large error in the computed solution but is not guaranteed to do so.

Whether or not a given linear system is ill-conditioned with respect to the precision used can be conveniently ascertained [even without knowledge of cond(A)] during iterative improvement, which we now discuss.

With e := x − x̂ the (unknown) error in the approximate solution x̂ for Ax = b, we found in Sec. 4.5 that

$$Ae = r \qquad (4.54)$$

where r := b − Ax̂ is the computable residual for x̂. Here we have, then, a linear system whose solution is the error e and whose coefficient matrix agrees with the coefficient matrix of the original system. If x̂ is obtained by the elimination Algorithm 4.2, we can solve (4.54) rather quickly by the substitution Algorithm 4.4. Let ê be the (approximate) solution for (4.54) so computed. Then ê will, in general, not agree with e. But at the very least, ê should give an indication of the size of e. If

$$\frac{\|\hat{e}\|}{\|\hat{x}\|} \approx 10^{-s}$$

we conclude that the first s decimal places of x̂ probably agree with those of x. We would then also expect ê to be that accurate an approximation to e. Hence we expect x̂ + ê to be a better approximation to x than is x̂. We can now, if necessary, compute the new residual b − A(x̂ + ê) and solve (4.54) again to obtain a new correction and a new approximation to x. The number of places in agreement in the successive approximations, as well as an examination of the successive residuals, should give an indication of the accuracy of these approximate solutions. One normally carries out this iteration until ||ê||/||x̂|| ≤ 10^(−t), if t decimal places are carried during the calculations. The number of iteration steps necessary to achieve this end can be shown to increase with cond(A). When cond(A) is “very large,” the corrections may never decrease in size, thus signaling extreme ill-conditioning of the original system.

For the success of iterative improvement, it is absolutely mandatory that the residuals be computed as accurately as possible. If, as is usual, floating-point arithmetic is used, the residual should always be calculated in double-precision arithmetic.

Algorithm 4.5: Iterative improvement
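The book's pseudocode for Algorithm 4.5 is not reproduced here; the following is a minimal Python sketch of the procedure just described, with SciPy's lu_factor and lu_solve standing in for Algorithms 4.2 and 4.4 and the residual accumulated in extended precision, as required above:

```python
# Iterative improvement: factor once, then repeatedly solve A e = r for
# the correction e, with the residual r = b - A x computed in higher
# precision (a sketch, not the book's Algorithm 4.5 verbatim).
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_improvement(A, b, max_steps=10, tol=1e-14):
    lu_piv = lu_factor(A)                  # factorization (Algorithm 4.2's role)
    x = lu_solve(lu_piv, b)                # first approximate solution
    A_hi = A.astype(np.longdouble)         # extended precision where available
    b_hi = b.astype(np.longdouble)
    for _ in range(max_steps):
        r = np.asarray(b_hi - A_hi @ x, dtype=np.float64)  # r = b - A x-hat
        e = lu_solve(lu_piv, r)            # solve A e = r (Algorithm 4.4's role)
        x = x + e                          # improved approximation
        if np.linalg.norm(e, np.inf) <= tol * np.linalg.norm(x, np.inf):
            break                          # correction no longer significant
    return x

# Example use (hypothetical data):
# A = np.array([[1.0, 1.0], [1.0, 1.01]]); b = np.array([2.0, 2.01])
# x = iterative_improvement(A, b)
```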