

In the document Scientific Computing (Page 103-107)


Chapter 3. Linear Systems of Equations

3.3 Condition of a System of Linear Equations

What can we say about the accuracy of the solution when we solve a system of linear equations numerically? A famous early investigation of this question was done by John von Neumann and H. H. Goldstine in 1947 [145]. They conclude their error analysis (over 70 pages long!) with quite pessimistic remarks on the size of linear systems that can be solved on a computer in finite precision arithmetic, see Table 3.1. The number of necessary multiplications they indicate is roughly $n^3$, which is the operation count for matrix inversion.

machine precision    $10^{-8}$    $10^{-10}$    $10^{-12}$
$n$                  $< 15$       $< 50$        $< 150$
# multiplications    3'500        120'000       3'500'000

Table 3.1. Pessimistic error analysis by John von Neumann and H. H. Goldstine

It is the merit of Jim Wilkinson, who first discovered by experiment that these bounds were too pessimistic, and who developed the backward error analysis which explains much better what happens with calculations on the computer.

Consider the system of linear equations

$$Ax = b, \quad \text{with } A \in \mathbb{R}^{n \times n} \text{ nonsingular}.$$

A perturbed system is $\hat{A}\hat{x} = \hat{b}$, and we assume that $\hat{a}_{ij} = a_{ij}(1+\varepsilon_{ij})$ and $\hat{b}_i = b_i(1+\varepsilon_i)$, with $|\varepsilon_{ij}| \le \tilde\varepsilon_A$ and $|\varepsilon_i| \le \tilde\varepsilon_b$. This perturbation could come from roundoff errors in finite precision arithmetic, or from measurements, or any other source; we are only interested in how much the solution $\hat{x}$ of the perturbed system differs from the solution $x$ of the original system. The element-wise perturbations imply

$$\|\hat{A} - A\| \le \varepsilon_A \|A\|, \qquad \|\hat{b} - b\| \le \varepsilon_b \|b\|, \qquad (3.46)$$


and if the norm is the 1-norm or the infinity norm, we have $\varepsilon_A = \tilde\varepsilon_A$ and $\varepsilon_b = \tilde\varepsilon_b$; otherwise they differ just by a constant.

Theorem 3.5. (Conditioning of the Solution of Linear Systems) Consider two linear systems of equations $Ax = b$ and $\hat{A}\hat{x} = \hat{b}$ satisfying (3.46), and assume that $A$ is invertible. If $\varepsilon_A \cdot \kappa(A) < 1$, then we have

$$\frac{\|\hat{x} - x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \varepsilon_A \kappa(A)}\,(\varepsilon_A + \varepsilon_b), \qquad (3.47)$$

where $\kappa(A) := \|A\|\,\|A^{-1}\|$ is the condition number of the matrix $A$.

Proof. From

$$\hat{b} - b = \hat{A}\hat{x} - Ax = (\hat{A} - A)\hat{x} + A(\hat{x} - x),$$

we obtain

$$\hat{x} - x = A^{-1}\big((A - \hat{A})\hat{x} + \hat{b} - b\big),$$

and therefore, using Assumption (3.46),

$$\|\hat{x} - x\| \le \|A^{-1}\|\big(\varepsilon_A \|A\|\,\|\hat{x}\| + \varepsilon_b \|b\|\big).$$

Now we can estimate $\|\hat{x}\| = \|\hat{x} - x + x\| \le \|\hat{x} - x\| + \|x\|$, and $\|b\| = \|Ax\| \le \|A\|\,\|x\|$, which leads to

$$\|\hat{x} - x\| \le \|A^{-1}\|\,\|A\|\big(\varepsilon_A(\|x\| + \|\hat{x} - x\|) + \varepsilon_b \|x\|\big),$$

and thus

$$\|\hat{x} - x\|\,(1 - \varepsilon_A \kappa(A)) \le \kappa(A)\,\|x\|\,(\varepsilon_A + \varepsilon_b),$$

which concludes the proof. ∎
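The bound (3.47) can be tried out numerically. The following Python sketch (a hypothetical 2×2 system, infinity norm, random element-wise perturbations of relative size $10^{-6}$; the book's own sessions are in Matlab) checks that the observed error stays below the bound:

```python
import random

A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [3.0, 4.0]
eps_A = eps_b = 1e-6          # element-wise relative perturbation sizes

def inv_2x2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[ s / det, -q / det],
            [-r / det,  p / det]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def norm_inf_vec(v):
    return max(abs(t) for t in v)

def norm_inf_mat(M):
    # infinity norm: maximum absolute row sum
    return max(abs(M[0][0]) + abs(M[0][1]), abs(M[1][0]) + abs(M[1][1]))

# Perturb each entry by a random relative amount, as in (3.46)
random.seed(1)
Ahat = [[a * (1 + random.uniform(-eps_A, eps_A)) for a in row] for row in A]
bhat = [v * (1 + random.uniform(-eps_b, eps_b)) for v in b]

x = matvec(inv_2x2(A), b)
xhat = matvec(inv_2x2(Ahat), bhat)

kappa = norm_inf_mat(A) * norm_inf_mat(inv_2x2(A))
lhs = norm_inf_vec([xhat[0] - x[0], xhat[1] - x[1]]) / norm_inf_vec(x)
rhs = kappa / (1 - eps_A * kappa) * (eps_A + eps_b)
print(lhs <= rhs)   # the bound (3.47) holds
```

For the infinity norm the element-wise assumption directly gives (3.46), so the theorem applies with $\varepsilon_A = \tilde\varepsilon_A$ and $\varepsilon_b = \tilde\varepsilon_b$.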

This theorem shows that the condition of the problem of solving a linear system of equations is tightly connected to the condition number of the matrix $A$, regardless of whether the perturbation appears in the matrix or in the right hand side of the linear system. An alternative derivation of the perturbed solution when only the matrix is slightly changed is given in Section 6.4.

If $\kappa(A)$ is large, then the solutions $\hat{x}$ and $x$ may differ significantly. According to Wilkinson's Principle (see Section 2.7), the result of a numerical computation is the exact result with slightly perturbed initial data. Thus $\hat{x}$ will be the result of a linear system with perturbed data of order eps. As a rule of thumb, we have to expect a relative error of eps $\cdot\,\kappa(A)$ in the solution.

Computing the condition number is in general more expensive than solving the linear system. For example, if we use the spectral norm, the condition number can be computed by

$$\kappa_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)} = \texttt{cond(A)} \text{ in Matlab}.$$
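The identity $\|A\|_2\,\|A^{-1}\|_2 = \sigma_{\max}(A)/\sigma_{\min}(A)$ can be checked directly for a 2×2 matrix, where the singular values are computable by hand from the eigenvalues of $A^{\top}A$. A Python sketch (the helper `svals_2x2` is our own, not a library routine):

```python
import math

def svals_2x2(A):
    """Singular values of a 2x2 matrix: square roots of the
    eigenvalues of the symmetric 2x2 matrix A^T A."""
    (a, b), (c, d) = A
    p = a*a + c*c          # (A^T A)[0][0]
    q = a*b + c*d          # (A^T A)[0][1] = (A^T A)[1][0]
    r = b*b + d*d          # (A^T A)[1][1]
    mean = (p + r) / 2.0
    disc = math.sqrt(((p - r) / 2.0) ** 2 + q * q)
    return math.sqrt(mean + disc), math.sqrt(max(mean - disc, 0.0))

def inv_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

A = [[3.0, 1.0],
     [1.0, 2.0]]
smax, smin = svals_2x2(A)
kappa = smax / smin                 # sigma_max / sigma_min

# Independent check: ||A||_2 ||A^{-1}||_2, with ||A^{-1}||_2 computed
# from the singular values of the explicit inverse
ismax, ismin = svals_2x2(inv_2x2(A))
kappa_check = smax * ismax

print(kappa, kappa_check)
```

Both quantities agree up to roundoff; for this symmetric positive definite $A$ the exact value is $(3+\sqrt{5})/2 \approx 2.618$.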

Example 3.8. We consider the matrix

>> A=[21.6257 51.2930  1.5724 93.4650
       5.2284 83.4314 37.6507 84.7163
      68.3400  3.6422  6.4801 52.5777
      67.7589  4.5447 42.3687  9.2995];

choose the exact solution and compute the right hand side

>> x=[1:4]’; b=A*x;

Now we solve the system

>> xa=A\b;

and compare the numerical solution to the exact one.

>> format long e
>> [xa x]
ans =
   0.999999999974085   1.000000000000000
   1.999999999951056   2.000000000000000
   3.000000000039628   3.000000000000000
   4.000000000032189   4.000000000000000
>> cond(A)
ans =
   6.014285987206616e+05
>> eps*cond(A)
ans =
   1.335439755935445e-10

We see that the numerically computed solution has about 5 incorrect decimal digits, which corresponds very well to the predicted error due to the large condition number $6.014 \times 10^5$ of the matrix $A$. The factor eps $\cdot\,\kappa(A) = 1.33 \times 10^{-10}$ indicates well the error after the 10th decimal digit.
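Example 3.8 can be reproduced outside Matlab. The Python sketch below solves the same system by Gaussian elimination with partial pivoting and compares the observed relative error with the rule-of-thumb prediction eps $\cdot\,\kappa(A)$; it uses the infinity-norm condition number $\kappa_\infty$ rather than Matlab's 2-norm `cond`, so the numbers differ by a modest factor but show the same behavior:

```python
import sys

def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def norm_inf(A):
    return max(sum(abs(v) for v in row) for row in A)

def cond_inf(A):
    """kappa_inf(A) = ||A||_inf * ||A^{-1}||_inf; the inverse is built
    column by column by solving A z = e_j."""
    n = len(A)
    cols = [solve(A, [1.0 if i == j else 0.0 for i in range(n)])
            for j in range(n)]
    Ainv = [[cols[j][i] for j in range(n)] for i in range(n)]
    return norm_inf(A) * norm_inf(Ainv)

A = [[21.6257, 51.2930,  1.5724, 93.4650],
     [ 5.2284, 83.4314, 37.6507, 84.7163],
     [68.3400,  3.6422,  6.4801, 52.5777],
     [67.7589,  4.5447, 42.3687,  9.2995]]
x = [1.0, 2.0, 3.0, 4.0]
b = [sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]

xa = solve(A, b)
rel_err = max(abs(xa[i] - x[i]) for i in range(4)) / max(abs(v) for v in x)
prediction = sys.float_info.epsilon * cond_inf(A)   # eps * kappa(A)
print(rel_err, prediction)
```

The observed relative error lands around $10^{-11}$, the same order of magnitude as the prediction, just as in the Matlab session above.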

Examples of matrices with a very large condition number are the Hilbert matrix $H$ with $h_{ij} = 1/(i+j-1)$, $i,j = 1, \ldots, n$ (H = hilb(n) in Matlab), and Vandermonde matrices, which appear for example in interpolation and whose columns are powers of a vector $v$, that is $a_{ij} = v_i^{\,n-j}$ (A = vander(v) in Matlab). Table 3.2 shows the condition numbers for $n = 3, \ldots, 8$, where we have used v = [1:n] for the Vandermonde matrices.
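The explosive growth for the Hilbert matrix can be observed without roundoff polluting the answer: the following Python sketch computes the 1-norm condition number $\kappa_1(H)$ exactly with rational arithmetic (all helpers are our own; Matlab's `cond` uses the 2-norm, so these values differ somewhat from Table 3.2 but grow at the same rate):

```python
from fractions import Fraction

def hilb(n):
    """Hilbert matrix h_ij = 1/(i+j-1), built with exact rationals."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def inverse(A):
    """Exact inverse by Gauss-Jordan elimination over the rationals
    (no pivoting needed: H is symmetric positive definite, so all
    pivots are nonzero in exact arithmetic)."""
    n = len(A)
    M = [row[:] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        piv = M[k][k]
        M[k] = [v / piv for v in M[k]]
        for i in range(n):
            if i != k and M[i][k] != 0:
                f = M[i][k]
                M[i] = [vi - f * vk for vi, vk in zip(M[i], M[k])]
    return [row[n:] for row in M]

def norm1(A):
    """1-norm: maximum absolute column sum."""
    n = len(A)
    return max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))

def cond1_hilb(n):
    H = hilb(n)
    return float(norm1(H) * norm1(inverse(H)))

for n in range(3, 9):
    print(n, cond1_hilb(n))
```

Working over `Fraction` sidesteps the very ill-conditioning being measured: the inverse of the Hilbert matrix is known to be badly contaminated when computed in floating point. For $n = 3$ the exact value is $\kappa_1(H) = \frac{11}{6} \cdot 408 = 748$.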

Matrices with a small condition number are for example orthogonal matrices $U$, such that $U^{\top}U = I$, which gives for the spectral condition number $\kappa_2(U) = \|U\|_2\,\|U^{-1}\|_2 = \|U\|_2\,\|U^{\top}\|_2 = 1$, since orthogonal transformations preserve Euclidean length.
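A quick sanity check of the length-preservation argument (a Python sketch with a 2×2 rotation matrix, the simplest orthogonal matrix):

```python
import math

theta = 0.7                                   # any angle gives a rotation
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def norm2(x):
    return math.sqrt(sum(v * v for v in x))

# U^T U = I (the inverse rotation is the transpose), and Euclidean
# lengths are preserved, so ||U||_2 = ||U^{-1}||_2 = 1 and kappa_2(U) = 1.
x = [3.0, -4.0]
print(norm2(matvec(U, x)), norm2(x))   # both 5.0 up to roundoff
```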

The condition number $\kappa(A)$ satisfies several properties:

1. $\kappa(A) \ge 1$, provided the matrix norm satisfies the submultiplicative property, since $1 = \|I\| = \|A^{-1}A\| \le \|A\|\,\|A^{-1}\| = \kappa(A)$.

n   cond(hilb(n))   cond(vander([1:n]))
3   5.2406e+02      7.0923e+01
4   1.5514e+04      1.1710e+03
5   4.7661e+05      2.6170e+04
6   1.4951e+07      7.3120e+05
7   4.7537e+08      2.4459e+07
8   1.5258e+10      9.5211e+08

Table 3.2. Matrices with large condition numbers

2. $\kappa(\alpha A) = \kappa(A)$ for any scalar $\alpha \neq 0$, since $\|\alpha A\| = |\alpha|\,\|A\|$ and $\|(\alpha A)^{-1}\| = \|A^{-1}\|/|\alpha|$.

3. $\kappa(A) = \dfrac{\max_{\|y\|=1} \|Ay\|}{\min_{\|z\|=1} \|Az\|}$, since

$$\|A^{-1}\| = \max_{x \neq 0} \frac{\|A^{-1}x\|}{\|x\|} = \max_{z \neq 0} \frac{\|z\|}{\|Az\|} = \left( \min_{z \neq 0} \frac{\|Az\|}{\|z\|} \right)^{-1},$$

where we substituted $x = Az$.

Another useful property of the condition number is that it measures how far $A$ is from a singular matrix, in a relative sense.

Theorem 3.6. Let $E \in \mathbb{R}^{n \times n}$. If $\|E\| < 1$, then $I+E$ is non-singular, and we have

$$\frac{1}{1 + \|E\|} \le \|(I+E)^{-1}\| \le \frac{1}{1 - \|E\|}. \qquad (3.48)$$
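Both inequalities in (3.48) are easy to observe numerically. A small Python sketch with the infinity norm (maximum absolute row sum) and an arbitrarily chosen 2×2 perturbation $E$ with $\|E\|_\infty < 1$:

```python
def norm_inf(A):
    """Infinity norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)

def inv_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

E = [[0.20, -0.10],
     [0.05,  0.10]]                 # ||E||_inf = 0.3 < 1
I_plus_E = [[1 + E[0][0], E[0][1]],
            [E[1][0], 1 + E[1][1]]]

nE = norm_inf(E)
nInv = norm_inf(inv_2x2(I_plus_E))
# lower bound <= ||(I+E)^{-1}|| <= upper bound
print(1 / (1 + nE), nInv, 1 / (1 - nE))
```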

Proof. To show that $I+E$ is non-singular, let us argue by contradiction by assuming that $I+E$ is singular. Then there exists a non-zero vector $x$ such that $(I+E)x = 0$. Then we have

$$x = -Ex \implies \|x\| = \|Ex\| \le \|E\|\,\|x\|.$$

Since $x \neq 0$, we can divide both sides by $\|x\|$ and conclude that $\|E\| \ge 1$, which contradicts the hypothesis that $\|E\| < 1$. Hence $I+E$ is non-singular.

To show the first inequality in (3.48), we let $y = (I+E)^{-1}x$. Then

$$\|x\| = \|(I+E)y\| \implies \|x\| \le (1 + \|E\|)\,\|y\|,$$

which implies

$$\frac{\|(I+E)^{-1}x\|}{\|x\|} = \frac{\|y\|}{\|x\|} \ge \frac{1}{1 + \|E\|}.$$

Since this is true for all $x \neq 0$, we can take the maximum of the left hand side over all $x \neq 0$ and obtain

$$\|(I+E)^{-1}\| \ge \frac{1}{1 + \|E\|},$$

as required. For the other inequality, note that

$$\|y\| \le \|x\| + \|y - x\| \le \|x\| + \|{-Ey}\| \le \|x\| + \|E\|\,\|y\|,$$

which implies

$$(1 - \|E\|)\,\|y\| \le \|x\|.$$

Since $1 - \|E\| > 0$, we can divide both sides by $(1 - \|E\|)\,\|x\|$ without changing the inequality sign. This yields

$$\frac{\|(I+E)^{-1}x\|}{\|x\|} = \frac{\|y\|}{\|x\|} \le \frac{1}{1 - \|E\|}.$$

Maximizing the left hand side over all $x \neq 0$ yields $\|(I+E)^{-1}\| \le 1/(1 - \|E\|)$, as required. ∎

Corollary 3.1. Let $A \in \mathbb{R}^{n \times n}$ be a non-singular matrix, $E \in \mathbb{R}^{n \times n}$, and $r = \|E\|/\|A\|$. If $r\,\kappa(A) < 1$, then $A+E$ is non-singular and

$$\|(A+E)^{-1}\| \le \frac{\|A^{-1}\|}{1 - r\,\kappa(A)}.$$

Proof. Since $\|A^{-1}E\| \le \|A^{-1}\|\,\|E\| = r\,\kappa(A) < 1$, Theorem 3.6 applies to $A^{-1}E$, and by (3.48) we have

$$\|(A+E)^{-1}\| = \|(I + A^{-1}E)^{-1}A^{-1}\| \le \|(I + A^{-1}E)^{-1}\|\,\|A^{-1}\| \le \frac{\|A^{-1}\|}{1 - \|A^{-1}E\|} \le \frac{\|A^{-1}\|}{1 - \|A^{-1}\|\,\|E\|}.$$

Substituting $\|E\| = r\,\|A\|$ gives the required result. ∎

Note that the hypothesis implies that $A+E$ will be nonsingular provided that the perturbation $E$ is small enough, in the relative sense, with respect to the condition number $\kappa(A)$. In other words, if $\kappa(A)$ is small, i.e., if $A$ is well conditioned, then a fairly large perturbation is needed to reach singularity. On the other hand, there always exists a perturbation $E$ with $\|E\|_2 = \|A\|_2/\kappa_2(A)$ such that $A+E$ is singular, cf. Problem 3.19.
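This minimal singular perturbation can be made concrete on a diagonal matrix, where the singular values are just the diagonal entries (a Python sketch; the general construction is the subject of Problem 3.19):

```python
# Diagonal 2x2 matrix with sigma_max = 3 and sigma_min = 1e-3
sigma_max, sigma_min = 3.0, 1e-3
A = [[sigma_max, 0.0],
     [0.0, sigma_min]]
kappa2 = sigma_max / sigma_min            # ||A||_2 ||A^{-1}||_2

# Zeroing out the smallest singular value makes A + E singular,
# and this perturbation has ||E||_2 = sigma_min = ||A||_2 / kappa_2(A).
E = [[0.0, 0.0],
     [0.0, -sigma_min]]
A_plus_E = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
det = A_plus_E[0][0] * A_plus_E[1][1] - A_plus_E[0][1] * A_plus_E[1][0]
print(det, sigma_min, sigma_max / kappa2)   # det = 0; ||E||_2 = ||A||_2/kappa_2
```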
