
4.5 ERROR AND RESIDUAL OF AN APPROXIMATE SOLUTION; NORMS

Any computed solution of a linear system must, because of roundoff, be considered an approximate solution. In this section, we discuss the difficult problem of ascertaining the error of an approximate solution (without knowing the solution). In the discussion, we introduce and use norms as a convenient means of measuring the “size” of vectors and matrices.

If $\hat{x}$ is a computed solution for the linear system $Ax = b$, then its error is the difference

$$e = x - \hat{x}$$

This error is, of course, usually not known to us (for otherwise, we would know the solution $x$, making any further discussions unnecessary). But we can always compute the residual

$$r = b - A\hat{x}$$

Note that $r = Ax - A\hat{x} = Ae$, since $Ax$ is just the right side $b$. The residual then measures how well $\hat{x}$ satisfies the linear system $Ax = b$. If $r$ is the zero vector, then $\hat{x}$ is the (exact) solution; that is, $e$ is then zero. One would expect each entry of $r$ to be small, at least in a relative sense, if $\hat{x}$ is a good approximation to the solution $x$.
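As a small illustration (a hypothetical 2×2 system, chosen only for the demonstration), the residual is always computable, while the error would require knowing the exact solution:

```python
# Error vs. residual for a hypothetical 2x2 system Ax = b.
# e = x - xh needs the exact solution x; r = b - A*xh is always computable.

def matvec(A, x):
    """Multiply an n x n matrix (list of rows) by an n-vector."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[1.01, 0.99],
     [0.99, 1.01]]            # assumed coefficient matrix
b = [2.0, 2.0]                # with this right side the exact solution is (1, 1)
x = [1.0, 1.0]                # exact solution (known here only for comparison)
xh = [2.0, 0.0]               # an approximate solution

Axh = matvec(A, xh)
r = [b[i] - Axh[i] for i in range(2)]    # residual, roughly (-0.02, 0.02)
e = [x[i] - xh[i] for i in range(2)]     # error, (-1, 1): much larger
print("residual:", r)
print("error:   ", e)
```

Note that the residual here is tiny relative to $b$ even though the error is as large as the solution itself, which is exactly the caveat the example below develops.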

Example 4.5 Consider the simple linear system

$$\begin{aligned} 1.01x_1 + 0.99x_2 &= 2 \\ 0.99x_1 + 1.01x_2 &= 2 \end{aligned}$$

whose unique solution $x$ has the entries $x_1 = x_2 = 1$. The approximate solution $\hat{x} = (1.01, 1.01)^T$ has the residual $r = (-0.02, -0.02)^T$ and the error $e = (-0.01, -0.01)^T$, so that a “small” residual (relative to the right side) corresponds to a relatively “small” error in this case.

On the other hand, the approximate solution $\hat{x} = (2, 0)^T$ has error $e = (-1, 1)^T$ but residual $r = (-0.02, 0.02)^T$, hence still a relatively “small” residual, while the error is now relatively “large.” By taking a different right side, we can achieve the opposite effect. The linear system

$$\begin{aligned} 1.01x_1 + 0.99x_2 &= 2 \\ 0.99x_1 + 1.01x_2 &= -2 \end{aligned}$$

has the unique solution $x_1 = 100$, $x_2 = -100$. The approximate solution $\hat{x} = (101, -99)^T$ has error $e = (-1, -1)^T$ but residual $r = (-2, 0)^T$; hence the residual is now relatively “large,” while the error is relatively “small” (only 1 percent of the solution).
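The opposite effect can be checked numerically as well. The matrix entries below are an assumption consistent with the values quoted in this example (exact solution $(100, -100)$ and a 1 percent error):

```python
# Large relative residual, small relative error (matrix assumed for illustration).
A = [[1.01, 0.99],
     [0.99, 1.01]]
b = [2.0, -2.0]
x = [100.0, -100.0]           # exact solution quoted in the text
xh = [101.0, -99.0]           # off by 1 in each entry: 1% of the solution

r = [b[i] - sum(A[i][j] * xh[j] for j in range(2)) for i in range(2)]
e = [x[i] - xh[i] for i in range(2)]

rel_res = max(abs(v) for v in r) / max(abs(v) for v in b)
rel_err = max(abs(v) for v in e) / max(abs(v) for v in x)
print("relative residual:", rel_res)   # about 1: as large as b itself
print("relative error:   ", rel_err)   # 0.01: only 1 percent
```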

As this example shows, the size of the residual of an approximate solution is not always a reliable indicator of the size of the error in this approximate solution. Whether or not a “small” residual implies a “small” error depends on the “size” of the coefficient matrix and of its inverse, in a manner to be made precise below. For this discussion, we need a means of measuring the “size” of n-vectors and n × n matrices.

The absolute value provides a convenient way to measure the “size” of real numbers or even of complex numbers. It is much less certain how one should measure the size of an n-vector or an n × n matrix. There is certainly not any one way of doing this which is acceptable in all situations.

For example, a frequently used measure for the size of an n-vector a is the nonnegative number

$$\|a\|_\infty = \max_{1 \le i \le n} |a_i| \qquad (4.31)$$

Assume now that the computed solution $\hat{x}$ to $Ax = b$ is known to have six-place accuracy in this way of measuring size; i.e.,

$$\|x - \hat{x}\|_\infty \le 10^{-6} \qquad (4.32)$$

Then this would indicate a very satisfactory computed solution in case the unknowns are, say, approximate values of the well-behaved solution of a certain differential equation. But if one of the unknowns happens to be your annual income while another is the gross national product, then (4.32) gives no hint as to whether or not $\hat{x}$ is a satisfactory computed solution (as far as you are concerned), since, with (4.32) holding, the error in your computed yearly income (even if received for only one year) might make you independently wealthy or put you in debt for life. A measure like

$$\frac{|x_1 - \hat{x}_1|}{|x_1|}$$

(assuming your yearly income to be the first unknown) would give you much more information, as would certain measures of size which use several numbers (rather than just one nonnegative number) to describe the “size” of an n-vector.

For most situations, however, it suffices to measure the size of an n-vector by a norm. A norm retains certain properties of the absolute value for numbers. Specifically, a norm assigns to each n-vector a a real number $\|a\|$, called the norm of a, subject to the following reasonable restrictions:

$$\begin{aligned} &\text{(i)} && \|a\| > 0 \text{ for } a \ne 0, \text{ and } \|0\| = 0 \\ &\text{(ii)} && \|\alpha a\| = |\alpha|\,\|a\| \text{ for every scalar } \alpha \\ &\text{(iii)} && \|a + b\| \le \|a\| + \|b\| \end{aligned} \qquad (4.33)$$

The first restriction forces all n-vectors but the zero vector to have positive “length.” The second restriction states, for example, that a and its negative -a have the same “length” and that the length of 3a is three times the length of a. The third restriction is the triangle inequality, so called since it states that the sum of the lengths of two sides of a triangle is never smaller than the length of the third side.

The student is presumably familiar with the euclidean length or norm,

$$\|a\|_2 = \left( \sum_{i=1}^n |a_i|^2 \right)^{1/2}$$

of the n-vector $a = (a_i)$, at least for the case n = 2 or n = 3. But, for a reason made clear below, we prefer to use, in the numerical examples below, the maximum norm (4.31) as a way to measure the size or length of the n-vector a. It is not difficult to verify that (4.31) defines a norm, i.e., that $\|a\|_\infty$ satisfies the three properties of a norm listed in (4.33). As to (i), $\|a\|_\infty$ is the maximum of nonnegative quantities, hence nonnegative; also, $\|a\|_\infty = 0$ if and only if, for all i, $|a_i| = 0$, which is the same as saying that $a = 0$. Further, if $\alpha$ is any scalar, then

$$\|\alpha a\|_\infty = \max_i |\alpha a_i| = |\alpha| \max_i |a_i| = |\alpha|\,\|a\|_\infty$$

proving (ii). Finally,

$$\|a + b\|_\infty = \max_i |a_i + b_i| \le \max_i \bigl( |a_i| + |b_i| \bigr) \le \max_i |a_i| + \max_i |b_i| = \|a\|_\infty + \|b\|_\infty$$

proving (iii).
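The three properties are easy to spot-check numerically on sample vectors (a sketch, of course, not a proof):

```python
# Spot-check the norm properties (4.33) for the max-norm on sample vectors.
def max_norm(a):
    return max(abs(ai) for ai in a)

a = [3.0, -4.0, 1.0]
b = [-1.0, 2.0, 5.0]
alpha = -2.5

# (i) positivity
assert max_norm(a) > 0 and max_norm([0.0, 0.0, 0.0]) == 0.0
# (ii) homogeneity: ||alpha * a|| = |alpha| * ||a||
assert max_norm([alpha * ai for ai in a]) == abs(alpha) * max_norm(a)
# (iii) triangle inequality: ||a + b|| <= ||a|| + ||b||
s = [ai + bi for ai, bi in zip(a, b)]
assert max_norm(s) <= max_norm(a) + max_norm(b)
print(max_norm(a), max_norm(b), max_norm(s))
```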

Other vector norms in frequent use include the 1-norm

$$\|a\|_1 = \sum_{i=1}^n |a_i|$$

and various instances of the weighted p-norm

$$\|a\| = \left( \sum_{i=1}^n w_i |a_i|^p \right)^{1/p}$$

where p is some number between 1 and $\infty$, and the numbers $w_1, \ldots, w_n$ are fixed positive quantities. The case p = 2, $w_i = 1$ (all i) leads to the familiar euclidean norm.
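For concreteness, here is how these vector norms evaluate on a sample vector (the weights and the value of p below are arbitrary choices for illustration):

```python
# 1-norm, euclidean norm, and a weighted p-norm for a sample vector.
def weighted_p_norm(a, w, p):
    """(sum_i w_i |a_i|^p)^(1/p) with fixed positive weights w_i."""
    return sum(wi * abs(ai) ** p for wi, ai in zip(w, a)) ** (1.0 / p)

a = [3.0, -4.0]
one_norm = sum(abs(ai) for ai in a)              # 1-norm: 7.0
euclid   = weighted_p_norm(a, [1.0, 1.0], 2)     # p = 2, w_i = 1: 5.0
weighted = weighted_p_norm(a, [2.0, 0.5], 3)     # an arbitrary weighted 3-norm
print(one_norm, euclid, weighted)
```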

Once a vector norm is chosen, we then measure the corresponding size of an n × n matrix A by comparing the size of Ax with the size of x.

Precisely, we define the corresponding matrix norm of A by

$$\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|} \qquad (4.34)$$

where the maximum is taken over all (nonzero) n-vectors x. It can be shown that this maximum exists for every n × n matrix A (and any choice of the vector norm). The matrix norm $\|A\|$ is characterized by the following two facts:

$$\text{for all } x, \quad \|Ax\| \le \|A\|\,\|x\|; \qquad \text{for some } x \ne 0, \quad \|Ax\| = \|A\|\,\|x\| \qquad (4.35)$$

Of course, (4.35) implies at once that there is no x with $\|Ax\| > \|A\|\,\|x\|$. Further, the following properties can be shown to hold for the matrix norm (4.34):

$$\begin{aligned} &\text{(i)} && \|A\| > 0 \text{ for } A \ne 0, \text{ and } \|0\| = 0 \\ &\text{(ii)} && \|\alpha A\| = |\alpha|\,\|A\| \text{ for every scalar } \alpha \\ &\text{(iii)} && \|A + B\| \le \|A\| + \|B\| \end{aligned} \qquad (4.36)$$

so that the term “norm” for the number $\|A\|$ is justified.

In addition,

$$\|AB\| \le \|A\|\,\|B\| \qquad (4.37)$$

Finally, if the matrix A is invertible, then $x = A^{-1}(Ax)$; hence $\|x\| \le \|A^{-1}\|\,\|Ax\|$. Combining this with (4.35), one gets

$$\frac{\|x\|}{\|A^{-1}\|} \le \|Ax\| \le \|A\|\,\|x\| \qquad (4.38)$$

and both inequalities are sharp; i.e., each can be made an equality by an appropriate choice of a (nonzero) x.

As it turns out, the matrix norm

$$\|A\|_2 = \max_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2}$$

based on the euclidean vector norm, is usually quite difficult to calculate, while the matrix norm

$$\|A\|_\infty = \max_{x \ne 0} \frac{\|Ax\|_\infty}{\|x\|_\infty}$$

based on the maximum norm, can be calculated quite easily, it being the number

$$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}| \qquad (4.39)$$

To prove this, we have to show that the number

$$m = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|$$

satisfies the two statements in (4.35), i.e., that

for all n-vectors x, $\|Ax\|_\infty \le m\,\|x\|_\infty$

and

for some nonzero x, $\|Ax\|_\infty = m\,\|x\|_\infty$

But for an arbitrary x,

$$\|Ax\|_\infty = \max_i \Bigl| \sum_j a_{ij} x_j \Bigr| \le \max_i \sum_j |a_{ij}|\,|x_j| \le \Bigl( \max_i \sum_j |a_{ij}| \Bigr) \max_j |x_j| = m\,\|x\|_\infty$$

which proves the first statement. As to the second statement, let $i_0$ be an integer between 1 and n so that

$$\sum_j |a_{i_0 j}| = m$$

and let x be an n-vector of max-norm 1 such that

$$\sum_j a_{i_0 j} x_j = \sum_j |a_{i_0 j}|$$

e.g., take

$$x_j = \begin{cases} 1 & \text{if } a_{i_0 j} \ge 0 \\ -1 & \text{if } a_{i_0 j} < 0 \end{cases}$$

Then, for this clearly nonzero vector x, $\|x\|_\infty = 1$ and

$$\|Ax\|_\infty \ge \Bigl| \sum_j a_{i_0 j} x_j \Bigr| = \sum_j |a_{i_0 j}| = m = m\,\|x\|_\infty$$

which, combined with the first statement, proves the second statement.
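The row-sum formula (4.39), and the maximizing vector constructed in the proof, can be checked on an arbitrary example matrix:

```python
# ||A||_inf as the largest absolute row sum (4.39), and the maximizing
# vector from the proof: x_j = +/-1 matching the signs of the maximizing row.
def inf_norm(A):
    return max(sum(abs(aij) for aij in row) for row in A)

A = [[1.0, -2.0, 3.0],
     [4.0, 0.5, -1.0],
     [-2.0, 2.0, 2.0]]                        # arbitrary example matrix

m = inf_norm(A)                               # row sums 6.0, 5.5, 6.0 -> 6.0
i0 = max(range(3), key=lambda i: sum(abs(v) for v in A[i]))
x = [1.0 if A[i0][j] >= 0 else -1.0 for j in range(3)]   # max-norm 1
Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
print(m, max(abs(v) for v in Ax))             # equal: the bound is attained
```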

Example 4.5a For the coefficient matrix A of Example 4.5, one readily finds

$$\|A\|_\infty = \max\{|1.01| + |0.99|,\; |0.99| + |1.01|\} = 2$$

We have seen that

$$A^{-1} = \begin{bmatrix} 25.25 & -24.75 \\ -24.75 & 25.25 \end{bmatrix}$$

Consequently, $\|A^{-1}\|_\infty = \max\{|25.25| + |-24.75|,\; |-24.75| + |25.25|\} = 50$.

For this example, then, (4.38) states that, for all 2-vectors x,

$$\frac{\|x\|_\infty}{50} \le \|Ax\|_\infty \le 2\,\|x\|_\infty$$

Choosing $x = (1, 1)^T$, we get $Ax = (2, 2)^T$, so $\|Ax\|_\infty = 2 = 2\,\|x\|_\infty$, and the second inequality becomes equality. Choosing $x = (1, -1)^T$, we get $Ax = (0.02, -0.02)^T$, so $\|Ax\|_\infty = 0.02 = \|x\|_\infty/50$, and the first inequality is an equality for this choice.
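Both sharp cases can be verified numerically, assuming the 2×2 matrix with entries 1.01 and 0.99 consistent with the inverse entries 25.25 and −24.75 quoted above:

```python
# Check the two sharp cases of (4.38) in the max-norm for an assumed matrix.
A = [[1.01, 0.99], [0.99, 1.01]]   # assumed; its inverse has entries 25.25, -24.75

def vnorm(x): return max(abs(v) for v in x)
def mnorm(M): return max(sum(abs(a) for a in row) for row in M)

nA = mnorm(A)                                         # 2.0
nAinv = mnorm([[25.25, -24.75], [-24.75, 25.25]])     # 50.0

for x in ([1.0, 1.0], [1.0, -1.0]):
    Ax = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    lower = vnorm(x) / nAinv          # ||x|| / ||A^-1||
    upper = nA * vnorm(x)             # ||A|| ||x||
    print(lower, vnorm(Ax), upper)    # middle value hits one of the two bounds
```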

We now return to our discussion of the relationship between the error $e = x - \hat{x}$ in the approximate solution $\hat{x}$ of $Ax = b$ and the residual $r = b - A\hat{x}$. We have

$$r = b - A\hat{x} = Ax - A\hat{x} = A(x - \hat{x}) = Ae$$

Hence $e = A^{-1}r$. Therefore, remembering that $(A^{-1})^{-1} = A$, we get from (4.38)

$$\frac{\|r\|}{\|A\|} \le \|e\| \le \|A^{-1}\|\,\|r\| \qquad (4.40)$$

This gives an upper and a lower bound on the relative error $\|e\|/\|x\|$ in terms of the relative residual $\|r\|/\|b\|$, namely

$$\frac{\|b\|}{\|A\|\,\|x\|}\,\frac{\|r\|}{\|b\|} \le \frac{\|e\|}{\|x\|} \le \frac{\|A^{-1}\|\,\|b\|}{\|x\|}\,\frac{\|r\|}{\|b\|} \qquad (4.41)$$

Here, one can estimate $\|x\|$ from a computed solution $\hat{x}$ for the system $Ax = b$. Else, use (4.40) in the special case $e = x$, $r = b$, i.e.,

$$\frac{\|b\|}{\|A\|} \le \|x\| \le \|A^{-1}\|\,\|b\|$$

to conclude from (4.41) that

$$\frac{1}{\|A\|\,\|A^{-1}\|}\,\frac{\|r\|}{\|b\|} \le \frac{\|e\|}{\|x\|} \le \|A\|\,\|A^{-1}\|\,\frac{\|r\|}{\|b\|} \qquad (4.42)$$

The bounds (4.41) and (4.42) are sharp in the following sense.

Whatever A and b might be, there are nonzero choices for e or r for which one or the other of the inequalities in (4.41) becomes an equality. If one wants equality in one of the inequalities in (4.42), one would have to choose a particular x as well, but such choices are always possible.

Because of their importance, we state (4.41) and (4.42) in words: The relative error in an approximate solution for the linear system Ax = b can be as large as $\|A\|\,\|A^{-1}\|$ times, or, more precisely, as large as $\|A^{-1}\|\,\|b\|/\|x\|$ times, its relative residual, but it can also be as small as $1/(\|A\|\,\|A^{-1}\|)$ times, or, more precisely, as small as $\|b\|/(\|A\|\,\|x\|)$ times, its relative residual. Hence, if $\|A\|\,\|A^{-1}\|$ is close to 1, then the relative error and relative residual are always of the same size, and the relative residual can then be safely used as an estimate for the relative error. But the larger $\|A\|\,\|A^{-1}\|$ is, the less information about the relative error can be obtained from the relative residual.

The number $\|A\|\,\|A^{-1}\|$ is called the condition number of A and is at times abbreviated

$$\operatorname{cond}(A) = \|A\|\,\|A^{-1}\|$$

Note that the condition number cond(A) for A depends on the matrix norm used and can, for some matrices, vary considerably as the matrix norm is changed. On the other hand, the condition number is always at least 1, since $\|I\| = \max_{x \ne 0} \|x\|/\|x\| = 1$ for the identity matrix I, and so, by (4.37), $1 = \|I\| = \|AA^{-1}\| \le \|A\|\,\|A^{-1}\|$.
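As a sketch, the condition number in the max-norm is easy to compute for a 2×2 matrix via the explicit inverse formula (the matrix below is an assumed example, chosen to be badly conditioned):

```python
# cond(A) = ||A|| * ||A^{-1}|| in the max-norm, via the explicit 2x2 inverse.
def inv2(A):
    """Inverse of a 2x2 matrix by the adjugate/determinant formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def inf_norm(M):
    return max(sum(abs(v) for v in row) for row in M)

A = [[1.01, 0.99], [0.99, 1.01]]       # assumed, nearly singular matrix
cond = inf_norm(A) * inf_norm(inv2(A))
print(cond)                             # about 100, up to roundoff
```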

Example 4.6 We find from earlier calculations that, for the coefficient matrix A of Example 4.5, $\operatorname{cond}_\infty(A) = \|A\|_\infty\,\|A^{-1}\|_\infty = 2 \cdot 50 = 100$. Further, we saw in Example 4.5 that indeed the relative error of an approximate solution can be as large as 100 times its relative residual, but can also be just 1/100 of its relative residual.

The bounds (4.41) and (4.42) require the number $\|A^{-1}\|$, which is not readily available. But, in typical situations, a good estimate for $\|e\|$ is the number $\|\hat{e}\|$, with $\hat{e}$ the computed solution of the linear system $Ae = r$. Since $\hat{x}$ is usually obtained by Gauss elimination, a factorization for A is available, and $\hat{e}$ can therefore be obtained (by SUBST) with much less computational effort than was needed to obtain $\hat{x}$. This presupposes that r is calculated in double precision. The vector $\hat{x} + \hat{e}$ so obtained is the first iterate in iterative improvement (Algorithm 4.5) discussed in the next section.
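The estimate can be sketched as follows; a direct 2×2 solve stands in for the SUBST step that would reuse the factorization of A, and all numeric values are illustrative assumptions:

```python
# Estimate the error in xh by solving A e = r (first step of iterative improvement).
def solve2(A, b):
    """Direct 2x2 solve by Cramer's rule (stand-in for forward/back substitution)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - b[0] * a21) / det]

A = [[1.01, 0.99], [0.99, 1.01]]      # assumed system
b = [2.0, 2.0]
xh = [2.0, 0.0]                       # approximate solution; exact x = (1, 1)

r = [b[i] - sum(A[i][j] * xh[j] for j in range(2)) for i in range(2)]
eh = solve2(A, r)                     # computed estimate of the error e = x - xh
improved = [xh[i] + eh[i] for i in range(2)]
print(eh)         # close to the true error (-1, 1)
print(improved)   # close to the true solution (1, 1)
```

In practice r must be accumulated in double precision, as noted above; otherwise cancellation in $b - A\hat{x}$ destroys the information the estimate depends on.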

EXERCISES

4.5-1 Verify that

$$\|a\|_1 = \sum_{i=1}^n |a_i|$$

defines a norm for all n-vectors a.

4.5-2 Prove that the matrix norm $\|A\|_1$ associated with the vector norm $\|a\|_1$ of Exercise 4.5-1 can be calculated by

$$\|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^n |a_{ij}|$$

4.5-3 If we interpret a 2-vector a as a point in the plane with coordinates $(a_1, a_2)$, then its 2-norm $\|a\|_2$ is the euclidean distance of this point from the origin. Further, the set of all vectors of euclidean norm 1 forms a circle around the origin of radius 1. Draw the “circle of radius 1 around the origin” when the distance of the “point” a from the origin is measured by (a) the 1-norm $\|a\|_1$, (b) the norm $\|a\|_{3/2}$, (c) the euclidean norm $\|a\|_2$, (d) the norm $\|a\|_4$, (e) the max-norm $\|a\|_\infty$.

4.5-4 With the same interpretation of 2-vectors as points in the plane as used in Exercise 4.5-3, show that, for any two 2-vectors a and b, the three “points” 0, a, and a + b are the vertices of a triangle with sides of (euclidean) length $\|a\|_2$, $\|b\|_2$, and $\|a + b\|_2$, and explain the term “triangle inequality” for property (iii) of norms [Eq. (4.33)].

4.5-5 Show that, for any 2-vectors a and b and any particular vector norm, $\bigl|\,\|a\| - \|b\|\,\bigr| \le \|a - b\|$.

4.5-6 Show that, for any 2-vectors a and b, and any number $\alpha$ between 0 and 1, $\|\alpha a + (1 - \alpha)b\| \le \alpha\|a\| + (1 - \alpha)\|b\|$.

4.5-7 Show that the matrix norm $\|A\| = \max_{x \ne 0}(\|Ax\|/\|x\|)$ can also be calculated as $\|A\| = \max_{\|x\| = 1} \|Ax\|$.

4.5-8 Prove all the statements in (4.36) regarding matrix norms.

4.5-9 Use Exercise 4.5-7 to calculate ||A||2, where

(Hint: A 2-vector x has 2-norm ||x||2 = 1 if and only if

4.5-10 Use Exercise 4.4-2 to calculate the condition number of the coefficient matrix A of the system of Exercise 4.2-8; then discuss relative error and relative residuals of the solutions calculated in Exercises 4.2-8 and 4.3-4 in terms of this condition number. Also, calculate the error estimate $\hat{e}$ for these solutions (with r calculated in double precision), using just the substitution Algorithm 4.4.

4.6 BACKWARD ERROR ANALYSIS AND ITERATIVE IMPROVEMENT