Iterative solvers for large sparse linear systems. Jocelyne Erhel - ALADIN group - INRIA-RENNES


(1)

Iterative solvers for large sparse linear systems

Jocelyne Erhel - ALADIN group - INRIA-RENNES

3ème Cycle Romand de Mathématiques, Université de Neuchâtel

August 2001

(2)

Krylov iterative methods

Solve Ax = b, with A sparse matrix of order n

A symmetric positive definite

A symmetric indefinite

A non symmetric

(3)

A symmetric positive definite

One method of choice : Conjugate Gradient (CG)

algorithm

properties

convergence

preconditioning

(4)

Conjugate Gradient

Algorithm

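Initialisation
    choose $x_0$
    $p_0 = r_0 = b - A x_0$
For $k = 0, 1, \ldots$
    $\alpha_k = \|r_k\|_2^2 / (A p_k, p_k)$
    $x_{k+1} = x_k + \alpha_k p_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    $\beta_{k+1} = \|r_{k+1}\|_2^2 / \|r_k\|_2^2$
    $p_{k+1} = r_{k+1} + \beta_{k+1} p_k$
End For

Properties

$(r_{k+1}, p_k) = 0$, $(r_{k+1}, r_k) = 0$, $(p_{k+1}, A p_k) = 0$

$\|r_{k+1}\|_{A^{-1}} = \min_{\alpha} \|r_k - \alpha A p_k\|_{A^{-1}}$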
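
The recurrences above translate almost line for line into code. The following is a minimal sketch in Python with NumPy, not the author's implementation: the function name `conjugate_gradient`, the relative-residual stopping test and the parameters `tol` and `max_iter` are assumptions added for illustration, and `A` is assumed to be symmetric positive definite.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Unpreconditioned CG for a symmetric positive definite A (illustrative sketch)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                      # r_0 = b - A x_0
    p = r.copy()                       # p_0 = r_0
    rr = r @ r                         # ||r_k||_2^2
    b_norm = np.linalg.norm(b) or 1.0  # assumed stopping test: ||r_k|| / ||b|| <= tol
    history = [np.sqrt(rr) / b_norm]
    for _ in range(max_iter or n):
        if history[-1] <= tol:
            break
        Ap = A @ p
        alpha = rr / (Ap @ p)          # alpha_k = ||r_k||^2 / (A p_k, p_k)
        x += alpha * p                 # x_{k+1} = x_k + alpha_k p_k
        r -= alpha * Ap                # r_{k+1} = r_k - alpha_k A p_k
        rr_new = r @ r
        p = r + (rr_new / rr) * p      # p_{k+1} = r_{k+1} + beta_{k+1} p_k
        rr = rr_new
        history.append(np.sqrt(rr) / b_norm)
    return x, history
```

Only one product with `A` and two scalar products are needed per iteration, which is what makes the method attractive for large sparse matrices.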

(5)

Conjugate Gradient - properties

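Orthogonality and minimisation

$(r_k, p_i) = (r_k, r_i) = 0$, $i \le k - 1$

$(p_k, A p_i) = 0$, $i \le k - 1$

$\|r_{k+1}\|_{A^{-1}} \le \|r_k\|_{A^{-1}}$

$\|r_k\|_{A^{-1}} = \min_{x \in x_0 + \mathrm{Span}(p_0, \ldots, p_{k-1})} \|b - A x\|_{A^{-1}}$

Krylov method

$K_k(A, r_0) = \mathrm{Span}(r_0, A r_0, \ldots, A^{k-1} r_0)$ : Krylov space

$K_k(A, r_0) = \mathrm{Span}(r_0, r_1, \ldots, r_{k-1}) = \mathrm{Span}(p_0, p_1, \ldots, p_{k-1})$

Projection method

$x_k \in x_0 + K_k(A, r_0)$ : Space condition

$r_k \perp K_k(A, r_0)$ : Galerkin condition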

(6)

Conjugate Gradient - convergence

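Polynomial method

$x_k = x_0 + Q_{k-1}(A) r_0$

$r_k = (I - A Q_{k-1}(A)) r_0 = P_k(A) r_0$ with $P_k(0) = 1$

Minmax property - asymptotic convergence

$A = V \Delta V^{-1}$ with $\Delta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, $0 < \lambda_1 \le \ldots \le \lambda_n$ and $\kappa(A) = \lambda_n / \lambda_1$

$\|r_k\|_{A^{-1}} \le \|r_0\|_{A^{-1}} \max_{\lambda_j} |P_k(\lambda_j)| \le \|r_0\|_{A^{-1}} \min_{\{P : \deg(P) = k,\ P(0) = 1\}} \max_{\lambda_1 \le t \le \lambda_n} |P(t)|$

$\|r_k\|_{A^{-1}} \le 2 \|r_0\|_{A^{-1}} \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^k$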
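
To get a feel for the last bound, one can compute the smallest $k$ for which $2 \rho^k$ falls below a target reduction, with $\rho = (\sqrt{\kappa(A)} - 1)/(\sqrt{\kappa(A)} + 1)$. The helper below is a small illustrative sketch; its name `cg_iteration_bound` and its arguments are assumptions, and the value it returns is only an upper estimate, since CG often converges faster than the bound suggests.

```python
import math

def cg_iteration_bound(kappa, reduction):
    """Smallest k such that 2 * rho**k <= reduction, with rho from the CG bound."""
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    if rho == 0.0:
        # kappa = 1: the bound is zero as soon as k >= 1
        return 1
    # 2 * rho**k <= reduction  <=>  k >= log(reduction / 2) / log(rho)
    return max(0, math.ceil(math.log(reduction / 2.0) / math.log(rho)))
```

Because $\rho$ depends on $\sqrt{\kappa(A)}$, multiplying the condition number by 100 multiplies the estimate by roughly 10 for large $\kappa(A)$.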

(7)

Conjugate Gradient - asymptotic convergence

[Figure: CG relative residual versus iterations, A = diag(1 : 10^(k-3) : 10^k), n = 1000, cond(A) = 10^k; curves for k = 1, 2, 3, 5, 10]

(8)

Conjugate Gradient - superlinear convergence

[Figure: CG relative residual, A = diag([0.01:0.01:0.0n] [1:1000-n]); curves for n = 0, 3, 5, 8]

(9)

Preconditioned Conjugate Gradient (PCG)

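Symmetric positive definite preconditioning matrix $M$

Algorithm

Initialisation
    choose $x_0$
    $r_0 = b - A x_0$
    $z_0 = M^{-1} r_0$
    $p_0 = z_0$
For $k = 0, 1, \ldots$
    $\alpha_k = (r_k, z_k) / (A p_k, p_k)$
    $x_{k+1} = x_k + \alpha_k p_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    $z_{k+1} = M^{-1} r_{k+1}$
    $\beta_{k+1} = (r_{k+1}, z_{k+1}) / (r_k, z_k)$
    $p_{k+1} = z_{k+1} + \beta_{k+1} p_k$
End For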
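
A minimal sketch of the preconditioned iteration follows, assuming the preconditioner is supplied as a function that returns $M^{-1} r$. The names `pcg` and `apply_Minv`, the stopping test and the default parameters are assumptions made for this example.

```python
import numpy as np

def pcg(A, b, apply_Minv, x0=None, tol=1e-10, max_iter=1000):
    """Preconditioned CG; apply_Minv(r) must return M^{-1} r with M SPD (sketch)."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    z = apply_Minv(r)                  # z_0 = M^{-1} r_0
    p = z.copy()                       # p_0 = z_0
    rz = r @ z                         # (r_k, z_k)
    b_norm = np.linalg.norm(b) or 1.0
    for k in range(max_iter):
        if np.linalg.norm(r) / b_norm <= tol:
            return x, k
        Ap = A @ p
        alpha = rz / (Ap @ p)          # alpha_k = (r_k, z_k) / (A p_k, p_k)
        x += alpha * p
        r -= alpha * Ap
        z = apply_Minv(r)              # z_{k+1} = M^{-1} r_{k+1}
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # p_{k+1} = z_{k+1} + beta_{k+1} p_k
        rz = rz_new
    return x, max_iter
```

With a diagonal (Jacobi) preconditioner, a call could look like `d = np.diag(A); x, its = pcg(A, b, lambda r: r / d)`.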

(10)

Symmetric positive definite preconditionings

diagonal or Jacobi

SSOR

m-step SSOR

Schwarz

polynomial

Incomplete Cholesky IC(k) or ICT

Approximate inverse

multigrid

multilevel

etc

(11)

Jacobi and SSOR preconditionings

Decomposition $A = D + L + U$, with $D$ diagonal, $L$ lower triangular, $U$ upper triangular

Jacobi

$M = D$ : parallel but slow convergence

SSOR

$M = (D + L) D^{-1} (D + U)$ : faster convergence but sequential

If $A$ is symmetric positive definite, so is $M$

Block versions

$D$, $L$, $U$ block-diagonal and block-triangular matrices : faster convergence
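
Applying either preconditioner means solving $M z = r$. For SSOR this splits into two triangular solves, $(D + L) y = r$ followed by $(D + U) z = D y$. The sketch below works on a dense NumPy array and uses `np.linalg.solve` for brevity, which is an assumption made for illustration; a real sparse code would use forward and backward substitution. The function names are hypothetical.

```python
import numpy as np

def jacobi_apply(A):
    """Return a function r -> D^{-1} r (Jacobi preconditioner M = D)."""
    d = np.diag(A).copy()
    return lambda r: r / d

def ssor_apply(A):
    """Return a function r -> M^{-1} r with M = (D + L) D^{-1} (D + U)."""
    d = np.diag(A).copy()
    lower = np.diag(d) + np.tril(A, k=-1)   # D + L
    upper = np.diag(d) + np.triu(A, k=1)    # D + U

    def apply(r):
        y = np.linalg.solve(lower, r)        # forward sweep: (D + L) y = r
        return np.linalg.solve(upper, d * y) # backward sweep: (D + U) z = D y

    return apply
```

The forward sweep must finish before the backward sweep starts, and each sweep is itself a recurrence, which is the sequential behaviour noted on the slide.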

(12)

Incomplete Cholesky factorisations

$A = L D L^T + R$

Various strategies to choose $R$

IC(0) : no fill-in

IC(k) : fill-in up to level k

ICT(α) : fill-in with threshold value α
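
As an illustration of the no-fill strategy, here is a dense sketch of IC(0). It is written in the $L L^T$ form rather than the $L D L^T$ form used above, a substitution made only to keep the code short, and it assumes that all pivots stay positive (true for M-matrices such as the discrete Laplacian). The function name `ic0` is hypothetical.

```python
import numpy as np

def ic0(A):
    """Incomplete Cholesky with zero fill-in: A ~ L L^T, L restricted to the pattern of tril(A)."""
    n = A.shape[0]
    L = np.tril(A).astype(float)
    pattern = L != 0                   # entries outside this pattern are never created
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])     # assumes a positive pivot
        for i in range(k + 1, n):
            if pattern[i, k]:
                L[i, k] /= L[k, k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if pattern[i, j]:      # update only existing entries: no fill-in
                    L[i, j] -= L[i, k] * L[j, k]
    return L
```

With this construction the remainder $R = A - L L^T$ vanishes on the sparsity pattern of $A$; for a tridiagonal matrix there is no fill-in at all and the factorisation is exact.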

(13)

PCG - comparisons

Laplacian on a Finite differences grid

[Figure: grid (C,50), nz = 1857]

[Figure: PCG relative residual versus iterations, Laplacian on the grid (C,50), n = 1857, nnz = 9093; curves for diagonal, SSOR, IC(0), IC(0.01), IC(0.0001)]

(14)

Symmetric Lanczos

Symmetric Lanczos method

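$\mathrm{Span}(V_k) = K_k(A, v_1)$ : Basis of Krylov subspace

$V_k^T V_k = I$ : orthonormal system

$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T$

$$T_k = \begin{pmatrix} \gamma_1 & \delta_2 & & \\ \delta_2 & \gamma_2 & \ddots & \\ & \ddots & \ddots & \delta_k \\ & & \delta_k & \gamma_k \end{pmatrix}$$

Relation with CG

$r_0 = \|r_0\|_2 v_1 = \beta v_1$ : Krylov subspace

$x_k = x_0 + V_k y$, $y \in \mathbb{R}^k$ : Space condition

$r_k = r_0 - A V_k y = V_k (\beta e_1 - T_k y) - \delta_{k+1} (e_k^T y) v_{k+1}$

$V_k^T r_k = 0 \Leftrightarrow T_k y = \beta e_1$ : Galerkin condition

CG : Cholesky factorisation of $T_k$ and minimisation of $\|r_k\|_{A^{-1}}$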
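
The three-term recurrence behind the Lanczos relation can be sketched as follows. This is an illustrative version without reorthogonalisation and without a breakdown test, so it is only suitable for a small number of steps; the function name `lanczos` and its return values are assumptions.

```python
import numpy as np

def lanczos(A, v1, k):
    """k steps of symmetric Lanczos; returns V (n x (k+1)), T_k (k x k) and delta_{k+1}."""
    n = v1.shape[0]
    V = np.zeros((n, k + 1))
    gamma = np.zeros(k)                # diagonal of T_k
    delta = np.zeros(k + 1)            # delta[j] couples columns j-1 and j of V
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        if j > 0:
            w -= delta[j] * V[:, j - 1]
        gamma[j] = V[:, j] @ w
        w -= gamma[j] * V[:, j]
        delta[j + 1] = np.linalg.norm(w)   # assumed nonzero (no breakdown handling)
        V[:, j + 1] = w / delta[j + 1]
    off = delta[1:k]                   # delta_2, ..., delta_k in the notation above
    T = np.diag(gamma) + np.diag(off, 1) + np.diag(off, -1)
    return V, T, delta[k]
```

With `Vk = V[:, :k]`, the returned quantities satisfy $A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T$ up to rounding errors.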

(15)

A symmetric indefinite

Symmetric Lanczos

$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T = V_{k+1} \bar{T}_k$

$T_k$ can be singular

SYMMLQ : $r_k \perp K_k(A, r_0)$

$x_k = x_0 + V_k y$, $T_k y = \beta e_1$ with LQ factorisation

CR and MINRES : $r_{k+1} \perp A K_k(A, r_0)$ or $\min \|r_k\|_2$

CR : $r_{k+1}^T A r_k = 0$ and $(A p_{k+1})^T A p_k = 0$

MINRES : $\min \|\beta e_1 - \bar{T}_k y\|_2$

Adapted preconditioning : open problem

(16)

A non symmetric

More difficult ...

Not possible to get

a short recurrence together with a minimisation of the residual

Minimisation : GMRES

Short recurrence : Bi-Conjugate Gradient, QMR, etc

Preconditioning

(17)

Arnoldi and GMRES

Arnoldi process

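$\mathrm{Span}(V_k) = K_k(A, v_1)$ : Basis of Krylov space

$V_k^T V_k = I$ : Orthonormal system

$A V_k = V_k H_k + h_{k+1,k} v_{k+1} e_k^T$, $H_k$ Hessenberg matrix

GMRES algorithm

$r_0 = \|r_0\|_2 v_1 = \beta v_1$ : Krylov space

$x_k = x_0 + V_k y$, $y \in \mathbb{R}^k$ : Space condition

$r_k \perp A V_k$ : Galerkin condition

$r_k = r_0 - A V_k y = V_{k+1} (\beta e_1 - \bar{H}_k y)$, $\bar{H}_k = \begin{pmatrix} H_k \\ h_{k+1,k} e_k^T \end{pmatrix}$

$(A V_k)^T r_k = \bar{H}_k^T (\beta e_1 - \bar{H}_k y)$

Solve $\min_{y \in \mathbb{R}^k} \|\beta e_1 - \bar{H}_k y\|_2$ : Galerkin condition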

(18)

GMRES - Convergence

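Polynomial method

$x_k = x_0 + Q_{k-1}(A) r_0$

$r_k = (I - A Q_{k-1}(A)) r_0 = P_k(A) r_0$ with $P_k(0) = 1$

Minmax property

If $A = U \Delta U^{-1}$ with $\Delta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$

$\|r_k\|_2 \le \|r_0\|_2 \, \kappa(U) \min_{\{P : \deg(P) = k,\ P(0) = 1\}} \max_{1 \le j \le n} |P(\lambda_j)|$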

(19)

GMRES - restarting

Memory space and complexity for k iterations

Arnoldi : $O(k \times nz) + O(k^2 \times n)$ operations

Least-squares : negligible

Storage of k + 3 vectors of length n

Restarted GMRES(m)

Initialisation
    choose $x_0$
Until convergence
    m iterations of GMRES : $x_m = x_0 + V_m y$
    $x_0 = x_m$
End Until

Risk of stagnation

(20)

GMRES - algorithm

Initialisation
    choose $x_0$
    $r_0 = b - A x_0$
Until convergence Do
    Arnoldi process
    $v_1 = r_0 / \|r_0\|_2$
    For $j = 1, m$
        $w = A v_j$
        For $i = 1, j$
            $h_{ij} = v_i^T w$
            $w = w - h_{ij} v_i$
        End For
        $h_{j+1,j} = \|w\|_2$
        $v_{j+1} = w / h_{j+1,j}$
        Givens rotations : $\bar{H}_j = Q_j R_j$
        compute $\|r_j\|_2$, convergence test
    End For
    Least-squares problem
    compute $y_m$ solution of $\min_y \| \|r_0\|_2 e_1 - \bar{H}_m y \|_2$
    $x_m = x_0 + V_m y_m$
    $r_m = b - A x_m$, convergence test
    Restarting
    $x_0 = x_m$, $r_0 = r_m$
End Do
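
The algorithm above can be sketched compactly in Python. This version deliberately departs from the slide in one respect: the small least-squares problem is solved with `np.linalg.lstsq` at the end of each cycle instead of updating a QR factorisation with Givens rotations, so the residual norm is only checked once per restart. The function name `gmres_restarted` and its parameters are assumptions made for this example.

```python
import numpy as np

def gmres_restarted(A, b, m=20, x0=None, tol=1e-10, max_restarts=50):
    """GMRES(m) with modified Gram-Schmidt Arnoldi (illustrative sketch)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    b_norm = np.linalg.norm(b) or 1.0
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta / b_norm <= tol:
            break
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))           # rectangular Hessenberg matrix
        V[:, 0] = r / beta                 # v_1 = r_0 / ||r_0||_2
        k = m
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):         # orthogonalise against v_1, ..., v_j
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14 * beta: # Krylov space is invariant: stop early
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        rhs = np.zeros(k + 1)
        rhs[0] = beta                      # beta e_1
        y = np.linalg.lstsq(H[:k + 1, :k], rhs, rcond=None)[0]
        x = x + V[:, :k] @ y               # x_m = x_0 + V_m y_m, then restart
    return x
```

The storage of `V` grows with `m`, which is the motivation for restarting; a small `m` saves memory but can slow down or stall convergence.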

(21)

GMRES - convergence with varying eigenvalues

[Figure: GMRES relative residual, A = V D inv(V), n = 1000, cond(A) = 10^k, cond(V) = 19; curves for k = 1, 3, 5, 9]

(22)

GMRES - convergence with varying eigenvectors

[Figure: GMRES convergence, A = V D inv(V), D = diag(1:1000), cond(V) variable; curves labelled 10^3, 10^4, 10^6, 10^8]

(23)

Bi-Conjugate Gradient (Bi-CG)

Algorithm

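Initialisation
    choose $x_0$ and $\tilde{x}_0$
    $r_0 = b - A x_0$ and $\tilde{r}_0 = b - A^T \tilde{x}_0$
    $p_0 = r_0$ and $\tilde{p}_0 = \tilde{r}_0$
For $k = 0, 1, \ldots$
    $\alpha_k = (r_k, \tilde{r}_k) / (A p_k, \tilde{p}_k)$
    $x_{k+1} = x_k + \alpha_k p_k$
    $\tilde{x}_{k+1} = \tilde{x}_k + \alpha_k \tilde{p}_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    $\tilde{r}_{k+1} = \tilde{r}_k - \alpha_k A^T \tilde{p}_k$
    $\beta_{k+1} = (r_{k+1}, \tilde{r}_{k+1}) / (r_k, \tilde{r}_k)$
    $p_{k+1} = r_{k+1} + \beta_{k+1} p_k$
    $\tilde{p}_{k+1} = \tilde{r}_{k+1} + \beta_{k+1} \tilde{p}_k$
End For

Properties

$(r_{k+1}, \tilde{r}_k) = 0$

$(A p_{k+1}, \tilde{p}_k) = (p_{k+1}, A^T \tilde{p}_k) = 0$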
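
A minimal sketch of these coupled recurrences is given below. Two simplifications are assumptions of this example and not part of the slide: the shadow iterate $\tilde{x}_k$ is not stored, and the shadow residual is started at $\tilde{r}_0 = r_0$. The function name `bicg` and the exact-zero breakdown test are likewise illustrative.

```python
import numpy as np

def bicg(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Bi-Conjugate Gradient for a nonsymmetric A (illustrative sketch)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    rt = r.copy()                      # assumption: shadow residual starts at r_0
    p, pt = r.copy(), rt.copy()
    rho = r @ rt                       # (r_k, r~_k)
    b_norm = np.linalg.norm(b) or 1.0
    for k in range(max_iter):
        if np.linalg.norm(r) / b_norm <= tol:
            return x, k
        Ap = A @ p
        denom = Ap @ pt                # (A p_k, p~_k)
        if rho == 0.0 or denom == 0.0:
            raise ArithmeticError("Bi-CG breakdown")
        alpha = rho / denom
        x += alpha * p
        r -= alpha * Ap
        rt -= alpha * (A.T @ pt)       # needs a product with A^T
        rho_new = r @ rt
        beta = rho_new / rho
        p = r + beta * p
        pt = rt + beta * pt
        rho = rho_new
    return x, max_iter
```

Each iteration costs one product with `A` and one with `A.T`, and nothing in the recurrences forces the residual norm to decrease monotonically.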

(24)

Bi-CG - properties

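Orthogonality

$(r_k, \tilde{r}_i) = 0$, $i \le k - 1$

$(\tilde{p}_k, A p_i) = 0$, $i \le k - 1$

No minimisation

Krylov method

$K_k(A, r_0) = \mathrm{Span}(r_0, A r_0, \ldots, A^{k-1} r_0)$ : Krylov space with $A$

$K_k(A^T, \tilde{r}_0) = \mathrm{Span}(\tilde{r}_0, A^T \tilde{r}_0, \ldots, (A^T)^{k-1} \tilde{r}_0)$ : Krylov space with $A^T$

Projection method

$x_k \in x_0 + K_k(A, r_0)$ : Space condition

$r_k \perp K_k(A^T, \tilde{r}_0)$ : Galerkin condition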

(25)

Bi-CG - equivalent formulation

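Augmented matrix

$$\begin{pmatrix} A & 0 \\ 0 & A^T \end{pmatrix}$$

Krylov projection method

$$\begin{pmatrix} x_k \\ \tilde{x}_k \end{pmatrix} \in \begin{pmatrix} x_0 \\ \tilde{x}_0 \end{pmatrix} + \begin{pmatrix} K_k(A, r_0) \\ K_k(A^T, \tilde{r}_0) \end{pmatrix}$$ : Space condition

$$\begin{pmatrix} \tilde{r}_k \\ r_k \end{pmatrix} \perp \begin{pmatrix} K_k(A, r_0) \\ K_k(A^T, \tilde{r}_0) \end{pmatrix}$$ : Galerkin condition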

(26)

Non symmetric Lanczos

Method

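$\mathrm{Span}(V_k) = K_k(A, v_1)$ : Basis of Krylov space

$\mathrm{Span}(W_k) = K_k(A^T, w_1)$ : Basis of dual Krylov space

$W_k^T V_k = I$ : Bi-orthogonal system

$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T$

$$T_k = \begin{pmatrix} \gamma_1 & \eta_2 & & \\ \delta_2 & \gamma_2 & \ddots & \\ & \ddots & \ddots & \eta_k \\ & & \delta_k & \gamma_k \end{pmatrix}$$

$A^T W_k = W_k T_k^T + \eta_{k+1} w_{k+1} e_k^T$

Relation with Bi-CG

$r_0 = \|r_0\|_2 v_1 = \beta v_1$ : Krylov space

$x_k = x_0 + V_k y$, $y \in \mathbb{R}^k$ : Space condition

$r_k = r_0 - A V_k y = V_k (\beta e_1 - T_k y) - \delta_{k+1} (e_k^T y) v_{k+1}$

$W_k^T r_k = 0 \Leftrightarrow T_k y = \beta e_1$ : Galerkin condition

Bi-CG : Gauss factorisation of $T_k$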

(27)

Bi-CG - Convergence - Variants

Risk of breakdown in Lanczos, $(r_k, \tilde{r}_k) = 0$ : Lanczos version with look-ahead

Irregular convergence

Product by $A^T$ : transpose-free version BICGSTAB, smoother convergence of BICGSTAB

Risk of breakdown in LU, $(\tilde{p}_k, A p_k) = 0$ : QMR algorithm

(28)

Quasi-minimum Residual QMR

Algorithm

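Non symmetric Lanczos with look-ahead

$r_k = V_{k+1} (\beta e_1 - \bar{T}_k y)$, $\bar{T}_k = \begin{pmatrix} T_k \\ \delta_{k+1} e_k^T \end{pmatrix}$

$\Omega_{k+1} = \mathrm{diag}(\|v_1\|_2, \ldots, \|v_{k+1}\|_2)$

Solve $\min_{y \in \mathbb{R}^k} \|\beta e_1 - \Omega_{k+1} \bar{T}_k y\|_2$

Convergence

No breakdown

If $T_k$ is diagonalisable

$\|r_k\|_2 \le \|r_0\|_2 \sqrt{k+1} \, \kappa(T_k) \min_{\{P : \deg(P) = k,\ P(0) = 1\}} \max_{1 \le j \le n} |P(\lambda_j)|$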

(29)

Some Preconditionings

diagonal or Jacobi

Gauss-Seidel, SOR

m-step SOR

Schwarz

polynomial

Incomplete Gauss ILU(k) or ILUT

Approximate inverse

multigrid

multilevel

etc

(30)

Comparison between Bi-CG, QMR, GMRES

Matrices from the Harwell-Boeing (HB) collection and from Matrix Market

Tests done with Matlab

(31)

Matrix Sherman4 (HB) - n=1104-nz=3786

[Figure: residual versus matrix-vector products, sherman4, no preconditioning; curves for qmr, full-gmres, bicgstab, gmres(20), gmres(50), gmres(80)]

(32)

Matrix Sherman4 (HB) - n=1104 - nz=3786

[Figure: residual, sherman4, preconditioning ILU(0.01); curves for qmr, bicgstab, full-gmres]

(33)

Matrix Sherman5 (HB) - n=3312 - nz=20793

[Figure: residual versus matrix-vector products, sherman5, no preconditioning; curves for bicgstab, qmr, full-gmres, gmres(50), gmres(200)]

(34)

Matrix Sherman5 (HB)- n=3312 - nz=20793

[Figure: residual, sherman5, preconditioning ILU(1.e-5); curves for qmr, bicgstab, full-gmres]

(35)

3D Biharmonic equations

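Joint work with Irfan Altas (Charles Sturt University, Wagga Wagga, Australia) and Murli Gupta (George Washington University, Washington DC, USA)

$\Omega$ is a closed convex domain in three dimensions and $\partial\Omega$ is its boundary.

Biharmonic equation

$$\frac{\partial^4 u}{\partial x^4} + \frac{\partial^4 u}{\partial y^4} + \frac{\partial^4 u}{\partial z^4} + 2 \frac{\partial^4 u}{\partial x^2 \partial y^2} + 2 \frac{\partial^4 u}{\partial x^2 \partial z^2} + 2 \frac{\partial^4 u}{\partial y^2 \partial z^2} = f(x,y,z), \quad (x,y,z) \in \Omega,$$

with Dirichlet boundary conditions

$$u = g_1(x,y,z), \quad \frac{\partial u}{\partial n} = g_2(x,y,z), \quad (x,y,z) \in \partial\Omega.$$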

(36)

Choice of Finite Difference scheme

3D uniform grid

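4 unknowns at each grid point $(x_i, y_j, z_k)$ :

$u(x_i, y_j, z_k)$, $\frac{\partial u}{\partial x}(x_i, y_j, z_k)$, $\frac{\partial u}{\partial y}(x_i, y_j, z_k)$, $\frac{\partial u}{\partial z}(x_i, y_j, z_k)$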

Compact Finite Difference approximation

Exact boundary conditions

no modification near the boundary

values of gradients already available

(37)

Compact 27 points 3D Finite Difference cell

[Figure: compact 27-point finite difference cell, with offsets -1, 0, 1 along each axis]

(38)

Linear solving methods

Large sparse non symmetric matrices

Grid $N \times N \times N$ - Matrix order $n = 4 (N + 1)^3$

Examples : $N = 32$, $n = 143{,}748$ and $N = 64$, $n = 1{,}098{,}500$

Matrix-free iterative solvers

Matrix-free preconditioners

No fast solver

SSOR too slow

Either MULTIGRID or KRYLOV methods

Comparison between BICGSTAB, QMR, MULTIGRID

With None, SSOR, m-step SSOR, Multigrid preconditioners

(39)

Comparison between BICGSTAB and QMR

[Figure: residual versus number of iterations, Problem 1, QMR and BICGSTAB; curves for BICGSTAB and QMR with N = 8, 16, 32]

(40)

BICGSTAB - Comparison of preconditioners

[Figure: residual versus number of matvec, Problem 1, Grid 32, BICGSTAB; curves for NO PREC, SSOR, 5-STEP SSOR, MGRID-V CYCLE(2)]

(41)

BICGSTAB - Comparison of preconditioners

[Figure: residual versus number of matvec, Problem 1, Grid 64, BICGSTAB; curves for NO PREC, SSOR, 5-STEP SSOR, MGRID-V CYCLE(4)]

(42)

Comparison between BICGSTAB and MULTIGRID

[Figure: error, Problem 1, Multigrid and BICGSTAB; curves for Multigrid and BICGSTAB with N = 32, 64]

(43)

Parallel Krylov methods

Parallel PCG

Parallel GMRES

Parallel Schwarz preconditionings

(44)

PCG - dependencies

Algorithm

For $k = 0, 1, \ldots$
    $q_k = A p_k$                                              (sparse matrix-vector product)
    $\alpha_k = (r_k, z_k) / (q_k, p_k)$                       (scalar product)
    $x_{k+1} = x_k + \alpha_k p_k$                             (vector operation)
    $r_{k+1} = r_k - \alpha_k A p_k$                           (vector operation)
    $z_{k+1} = M^{-1} r_{k+1}$                                 (linear system)
    $\beta_{k+1} = (r_{k+1}, z_{k+1}) / (r_k, z_k)$            (scalar product)
    $p_{k+1} = z_{k+1} + \beta_{k+1} p_k$                      (vector operation)
End For

Sequence of vector and matrix operations

(45)

Parallel PCG

Parallel sparse matrix-vector product

Parallel preconditioning

Parallel operations between vectors

Synchronisations after each dot product

(46)

GMRES - dependencies

Sequence in Arnoldi with dot products

Sparse matrix-vector product

Sequential least-squares solving

(47)

Parallel Arnoldi

First compute a basis, then orthogonalise : fewer dependencies

(48)

Schwarz preconditionings

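Domain partition into overlapping subdomains

[Figure: two subdomains SD 1 and SD 2 and their interface]

$p$ subdomains $\Omega_i$

$R_i$ restriction of $\Omega$ in $\Omega_i$

$R_i^T$ extension of $\Omega_i$ in $\Omega$

$A$ in $\Omega_i$ : $A_i = R_i A R_i^T$

$M_i$ preconditioning of $A_i$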

(49)

multiplicative Schwarz Preconditioning

Solving $M z = r$

Algorithm

    $r_1 = R_1 r$
    $M_1 z_1 = r_1$
    $z^{(1)} = R_1^T z_1$
    $t = A z^{(1)}$
    $t_2 = R_2 t$
    $r_2 = R_2 r$
    $M_2 z_2 = r_2 - t_2$
    $z^{(2)} = R_2^T z_2$
    $z = z^{(1)} + z^{(2)}$

Data dependencies

$t_2 \ne 0$ if both subdomains overlap

subdomain 2 after subdomain 1

global addition in $z$

$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2 (I - A R_1^T M_1^{-1} R_1)$
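
The two-subdomain sweep can be sketched as below, under the assumption that each local preconditioner is an exact local solve, $M_i = A_i$, with $A_i$ the block of $A$ on the indices of subdomain $i$. Restriction and extension are expressed through index arrays instead of explicit matrices $R_i$. The function name `multiplicative_schwarz` and the arguments `idx1`, `idx2` are hypothetical.

```python
import numpy as np

def multiplicative_schwarz(A, idx1, idx2):
    """Return r -> z for a two-subdomain multiplicative Schwarz sweep (sketch, M_i = A_i)."""
    A1 = A[np.ix_(idx1, idx1)]         # local matrix on subdomain 1
    A2 = A[np.ix_(idx2, idx2)]         # local matrix on subdomain 2

    def apply(r):
        z = np.zeros_like(r, dtype=float)
        z[idx1] = np.linalg.solve(A1, r[idx1])              # z^(1): solve on subdomain 1
        t = A @ z                                           # t = A z^(1)
        z[idx2] += np.linalg.solve(A2, r[idx2] - t[idx2])   # correction on subdomain 2
        return z

    return apply
```

The second solve needs `t`, which depends on the result of the first one, so the subdomains are processed one after the other. With exact local solves, the residual $r - A z$ vanishes on subdomain 2 after the sweep.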

(50)

Additive Schwarz preconditioning

Solving $M z = r$

Algorithm

    $r_1 = R_1 r$
    $M_1 z_1 = r_1$
    $z^{(1)} = R_1^T z_1$
    $r_2 = R_2 r$
    $M_2 z_2 = r_2$
    $z^{(2)} = R_2^T z_2$
    $z = z^{(1)} + z^{(2)}$

Data dependencies

subdomain 2 in parallel with subdomain 1

global addition in $z$

$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2$
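
A sketch of the additive variant for any number of subdomains follows, again assuming exact local solves ($M_i = A_i$) and index arrays in place of the matrices $R_i$. The function name `additive_schwarz` and the argument `subdomains` are hypothetical.

```python
import numpy as np

def additive_schwarz(A, subdomains):
    """Return r -> sum_i R_i^T A_i^{-1} R_i r; subdomains is a list of index arrays (sketch)."""
    blocks = [A[np.ix_(idx, idx)] for idx in subdomains]   # local matrices A_i

    def apply(r):
        z = np.zeros_like(r, dtype=float)
        for idx, Ai in zip(subdomains, blocks):
            # the local solves are independent and could run in parallel
            z[idx] += np.linalg.solve(Ai, r[idx])
        return z

    return apply
```

The only coupling between subdomains is the final accumulation into `z`, which corresponds to the global addition listed under the data dependencies.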

(51)

Coarse grid correction

$\Omega_0$ set of interfaces between subdomains

$R_0$ restriction and $R_0^T$ extension

$A_0 = R_0 A R_0^T$ and $M_0$ preconditioning

correction $R_0^T M_0^{-1} R_0$

$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2 + R_0^T M_0^{-1} R_0$

sequential correction

Number of iterations independent of number of subdomains

(52)

Some bibliography and software

Bibliography

Software

(53)

Bibliography

Y. Saad, Iterative methods for sparse linear systems, PWS Publishing Company, 1996

http://www-users.cs.umn.edu/~saad/books.html

G. Meurant, Computer solution of large linear systems, North Holland, 1999

http://perso.wanadoo.fr/gerard.meurant/#GACC

R. Barrett et al., Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition, SIAM, 1994

http://www.netlib.org/templates/index.html

J. Erhel, chapter of book in preparation (in French)

etc

(54)

Software

list : http://www.netlib.org/utk/papers/iterative-survey/

PETSc : http://www-fp.mcs.anl.gov/petsc/

Partial, ordinary and algebraic differential equations, nonlinear and linear systems, on distributed memory computers

Parpre : http://www.cs.utk.edu/~eijkhout/parpre.html

Parallel preconditionings for iterative methods; uses PETSc, part of the Scalapack project, V. Eijkhout and T. Chan

Psparslib : http://www.cs.umn.edu/Research/arpa/p_sparslib/psp-abs.html

Parallel iterative methods, Y. Saad et al.

Matlab : http://www.mathworks.com

Scilab and extension Scilin : iterative methods, Aladin group

etc

(55)

Conclusion

Multigrid methods efficient in regular cases

Krylov methods efficient in general cases

Preconditioning : necessary and difficult

Multigrid or multilevel preconditioning : efficient for large problems
