Iterative solvers for large sparse linear systems. Jocelyne Erhel - ALADIN group - INRIA-RENNES

(1)

3eme Cycle Romand de Mathématiques Université de Neuchâtel

August 2001

(2)

Krylov iterative methods

Solve Ax = b, with A sparse matrix of order n

• A symmetric positive definite

• A symmetric indefinite

• A non symmetric

(3)

A symmetric positive definite

One method of choice : Conjugate Gradient (CG)

• algorithm

• properties

• convergence

• preconditioning

(4)

Conjugate Gradient

Algorithm

Initialisation: choose $x_0$, set $p_0 = r_0 = b - Ax_0$

For $k = 0, 1, \ldots$
    $\alpha_k = \dfrac{\|r_k\|^2}{(Ap_k, p_k)}$
    $x_{k+1} = x_k + \alpha_k p_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    $\beta_{k+1} = \dfrac{\|r_{k+1}\|^2}{\|r_k\|^2}$
    $p_{k+1} = r_{k+1} + \beta_{k+1} p_k$
End For

Properties

$(r_{k+1}, p_k) = 0$, $(r_{k+1}, r_k) = 0$, $(p_{k+1}, Ap_k) = 0$

$\|r_{k+1}\|_{A^{-1}} = \min_{\alpha} \|r_k - \alpha A p_k\|_{A^{-1}}$
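The recurrences above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name, the relative stopping test on the residual norm and the random test matrix are not part of the slides.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Plain CG for a symmetric positive definite A (illustrative sketch)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                      # r_0 = b - A x_0
    p = r.copy()                       # p_0 = r_0
    rr = r @ r                         # ||r_k||^2
    b_norm = np.linalg.norm(b) or 1.0  # assumed stopping test: ||r_k|| <= tol ||b||
    max_iter = n if max_iter is None else max_iter
    for k in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            return x, k
        Ap = A @ p
        alpha = rr / (Ap @ p)          # alpha_k = ||r_k||^2 / (A p_k, p_k)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        beta = rr_new / rr             # beta_{k+1} = ||r_{k+1}||^2 / ||r_k||^2
        p = r + beta * p
        rr = rr_new
    return x, max_iter

# Usage on a small, well-conditioned SPD matrix (assumed test problem)
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50.0 * np.eye(50)
b = rng.standard_normal(50)
x, iters = conjugate_gradient(A, b)
```

Only one product by $A$ and two scalar products are needed per iteration, and only four vectors are stored.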

(5)

Conjugate Gradient - properties

Orthogonality and minimisation

$(r_k, p_i) = (r_k, r_i) = 0$, $i \le k-1$

$(p_k, Ap_i) = 0$, $i \le k-1$

$\|r_{k+1}\|_{A^{-1}} \le \|r_k\|_{A^{-1}}$

$\|r_k\|_{A^{-1}} = \min_{x \in x_0 + \mathrm{Span}(p_0, \ldots, p_{k-1})} \|b - Ax\|_{A^{-1}}$

Krylov method

$K_k(A, r_0) = \mathrm{Span}(r_0, Ar_0, \ldots, A^{k-1} r_0)$ (Krylov space)

$K_k(A, r_0) = \mathrm{Span}(r_0, r_1, \ldots, r_{k-1}) = \mathrm{Span}(p_0, p_1, \ldots, p_{k-1})$

Projection method

$x_k \in x_0 + K_k(A, r_0)$ (space condition)

$r_k \perp K_k(A, r_0)$ (Galerkin condition)

(6)

Conjugate Gradient - convergence

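Polynomial method

$x_k = x_0 + Q_{k-1}(A) r_0$

$r_k = (I - AQ_{k-1}(A)) r_0 = P_k(A) r_0$ with $P_k(0) = 1$

Minmax property - asymptotic convergence

$A = V \Delta V^{-1}$ with $\Delta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, $0 < \lambda_1 \le \ldots \le \lambda_n$ and $\kappa(A) = \lambda_n / \lambda_1$

$\|r_k\|_{A^{-1}} \le \|r_0\|_{A^{-1}} \max_{\lambda_j} |P_k(\lambda_j)| \le \|r_0\|_{A^{-1}} \min_{\deg(P) = k,\ P(0) = 1} \ \max_{\lambda_1 \le t \le \lambda_n} |P(t)|$

$\|r_k\|_{A^{-1}} \le 2 \|r_0\|_{A^{-1}} \left( \dfrac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^k$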
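As a rough illustration of the last bound, the small helper below (a hypothetical function, not part of the slides) computes the smallest $k$ for which $2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^k \le \varepsilon$. It is a worst-case estimate only; the number of iterations actually observed can be much smaller, for instance when the eigenvalues are clustered.

```python
import math

def cg_iteration_bound(kappa, eps):
    """Smallest k such that 2 * rho**k <= eps, rho = (sqrt(kappa)-1)/(sqrt(kappa)+1)."""
    if kappa < 1.0 or eps <= 0.0:
        raise ValueError("need kappa >= 1 and eps > 0")
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    if rho == 0.0:                       # kappa = 1: the bound vanishes after one step
        return 0 if eps >= 2.0 else 1
    return max(0, math.ceil(math.log(eps / 2.0) / math.log(rho)))

# Worst-case iteration counts for a reduction of 1e-6 (illustrative values of kappa)
bounds = {kappa: cg_iteration_bound(kappa, 1e-6) for kappa in (10.0, 1e2, 1e4)}
```

The estimate grows like $\sqrt{\kappa(A)}$, which is the motivation for preconditioning.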

(7)

Conjugate Gradient - asymptotic convergence

[Figure: relative residual versus iterations for A=diag(1:10^k−3:10^k), n=1000, cond(A)=10^k; curves for k=1, 2, 3, 5, 10]

(8)

Conjugate Gradient - superlinear convergence

[Figure: relative residual for A=diag([0.01:0.01:0.0n] [1:1000−n]); curves for n=0, 3, 5, 8]

(9)

Preconditioned Conjugate Gradient (PCG)

Symmetric positive definite preconditioning matrix $M$

Algorithm

Initialisation: choose $x_0$, set $r_0 = b - Ax_0$, $z_0 = M^{-1} r_0$, $p_0 = z_0$

For $k = 0, 1, \ldots$
    $\alpha_k = \dfrac{(r_k, z_k)}{(Ap_k, p_k)}$
    $x_{k+1} = x_k + \alpha_k p_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    $z_{k+1} = M^{-1} r_{k+1}$
    $\beta_{k+1} = \dfrac{(r_{k+1}, z_{k+1})}{(r_k, z_k)}$
    $p_{k+1} = z_{k+1} + \beta_{k+1} p_k$
End For
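A minimal NumPy sketch of the PCG loop follows. The preconditioner is passed as a function applying $M^{-1}$; the name `apply_Minv`, the stopping test and the test matrix are assumptions made for the illustration.

```python
import numpy as np

def pcg(A, b, apply_Minv, x0=None, tol=1e-10, max_iter=1000):
    """Preconditioned CG; apply_Minv(r) must return M^{-1} r with M SPD (sketch)."""
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.astype(float).copy()
    r = b - A @ x
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z                          # (r_k, z_k)
    b_norm = np.linalg.norm(b) or 1.0
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        Ap = A @ p
        alpha = rz / (Ap @ p)
        x += alpha * p
        r -= alpha * Ap
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # beta_{k+1} = (r_{k+1}, z_{k+1}) / (r_k, z_k)
        rz = rz_new
    return x, max_iter

# Usage: a badly scaled SPD matrix, with Jacobi (M = D) and without preconditioning
n = 200
d = np.linspace(1.0, 1000.0, n)
A = np.diag(d) - 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))
b = np.ones(n)
x_jac, it_jac = pcg(A, b, lambda r: r / d)
x_id, it_id = pcg(A, b, lambda r: r.copy())
```

On this example the Jacobi preconditioner removes the bad scaling, so `it_jac` is much smaller than `it_id`.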

(10)

Symmetric positive definite preconditionings

• diagonal or Jacobi

• SSOR

• m-step SSOR

• Schwarz

• polynomial

• Incomplete Cholesky IC(k) or ICT

• Approximate inverse

• multigrid

• multilevel

• etc

(11)

Jacobi and SSOR preconditionings

Decomposition $A = D + L + U$, with $D$ diagonal, $L$ strictly lower triangular, $U$ strictly upper triangular

Jacobi

$M = D$ : parallel but slow convergence

SSOR

$M = (D + L) D^{-1} (D + U)$ : faster convergence but sequential

If $A$ is symmetric positive definite, so is $M$

Block versions

$D, L, U$ block-diagonal and block-triangular matrices : faster convergence
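The SSOR preconditioner is never formed explicitly in practice: $z = M^{-1} r$ is obtained by one forward solve, one diagonal scaling and one backward solve. The dense sketch below (hypothetical helper names, small 1D Laplacian as an assumed example) builds $M$ only to check its properties.

```python
import numpy as np

def ssor_matrix(A):
    """Explicit M = (D + L) D^{-1} (D + U), for checking purposes only."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    U = np.triu(A, k=1)
    return (D + L) @ np.linalg.inv(D) @ (D + U)

def apply_ssor_inverse(A, r):
    """z = M^{-1} r = (D + U)^{-1} D (D + L)^{-1} r."""
    D = np.diag(np.diag(A))
    # np.linalg.solve is used for brevity; a triangular solver would be used in practice
    y = np.linalg.solve(D + np.tril(A, k=-1), r)   # forward sweep
    y = np.diag(A) * y                             # diagonal scaling
    return np.linalg.solve(D + np.triu(A, k=1), y) # backward sweep

# Usage on the 1D Laplacian tridiag(-1, 2, -1)
n = 6
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = ssor_matrix(A)
r = np.arange(1.0, n + 1.0)
z = apply_ssor_inverse(A, r)
```

The two triangular sweeps are the sequential part mentioned on the slide.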

(12)

Incomplete Cholesky factorisations

A = LDL^T + R

Various strategies to choose $R$ :

IC(0) : no fill-in

IC(k) : fill-in up to level k

ICT(α) : fill-in with threshold value α
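As an illustration of IC(0), the dense sketch below computes an incomplete factor restricted to the sparsity pattern of $A$. Two assumptions should be noted: it uses the $LL^T$ variant (the diagonal $D$ of the slide is absorbed into $L$), and it stores matrices densely for readability, which a real implementation would not do. The function name and the test problem are hypothetical.

```python
import numpy as np

def ic0(A):
    """Incomplete Cholesky with no fill-in: A = L L^T + R, pattern(L) = pattern(tril(A))."""
    n = A.shape[0]
    L = np.tril(A).astype(float)
    pattern = L != 0
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])        # assumes the pivot stays positive
        for i in range(k + 1, n):
            if pattern[i, k]:
                L[i, k] /= L[k, k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if pattern[i, j]:          # updates outside the pattern are dropped
                    L[i, j] -= L[i, k] * L[j, k]
    return L

# Usage: 5-point Laplacian on a 4 x 4 grid
N = 4
T = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
A = np.kron(np.eye(N), T) + np.kron(T, np.eye(N))
L = ic0(A)
R = A - L @ L.T
```

On this example $R$ vanishes on the pattern of $A$ and is non-zero only at the positions where an exact factorisation would create fill-in.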

(13)

PCG - comparisons

Laplacian on a Finite differences grid

[Figure: grid (C,50), nz = 1857]

[Figure: Laplacian on the grid (C,50), n=1857, nnz=9093, PCG: relative residual versus iterations, for diagonal, SSOR, IC(0), IC(0.01) and IC(0.0001) preconditioning]

(14)

Symmetric Lanczos

Symmetric Lanczos method

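$\mathrm{Span}(V_k) = K_k(A, v_1)$ (basis of Krylov subspace)

$V_k^T V_k = I$ (orthonormal system)

$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T$

$$T_k = \begin{pmatrix} \gamma_1 & \delta_2 & & \\ \delta_2 & \gamma_2 & \ddots & \\ & \ddots & \ddots & \delta_k \\ & & \delta_k & \gamma_k \end{pmatrix}$$

Relation with CG

$r_0 = \|r_0\|_2 v_1 = \beta v_1$ (Krylov subspace)

$x_k = x_0 + V_k y$, $y \in R^k$ (space condition)

$r_k = r_0 - A V_k y = V_k (\beta e_1 - T_k y) - \delta_{k+1} (e_k^T y) v_{k+1}$

$V_k^T r_k = 0 \Leftrightarrow T_k y = \beta e_1$ (Galerkin condition)

CG : Cholesky factorisation of $T_k$ and minimisation of $\|r_k\|_{A^{-1}}$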
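The Lanczos relation and the Galerkin condition can be checked numerically. The sketch below is an illustration only: it uses the three-term recurrence without reorthogonalisation, assumes no breakdown ($\delta_{j+1} \ne 0$), and takes $x_0 = 0$ so that $r_0 = b$. Function and variable names are hypothetical.

```python
import numpy as np

def lanczos(A, v1, k):
    """k steps of symmetric Lanczos: returns V_{k+1}, T_k and delta_{k+1}."""
    n = v1.shape[0]
    V = np.zeros((n, k + 1))
    gamma = np.zeros(k)          # diagonal of T_k
    delta = np.zeros(k + 1)      # delta[j] is the norm computed at step j-1 (0-based)
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        if j > 0:
            w -= delta[j] * V[:, j - 1]
        gamma[j] = V[:, j] @ w
        w -= gamma[j] * V[:, j]
        delta[j + 1] = np.linalg.norm(w)   # assumed non-zero (no breakdown)
        V[:, j + 1] = w / delta[j + 1]
    T = np.diag(gamma) + np.diag(delta[1:k], 1) + np.diag(delta[1:k], -1)
    return V, T, delta[k]

# Usage: Galerkin solution x_k = V_k y with T_k y = beta e_1 (x_0 = 0)
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50.0 * np.eye(50)
b = rng.standard_normal(50)
k = 10
V, T, delta_next = lanczos(A, b, k)
e1 = np.zeros(k)
e1[0] = np.linalg.norm(b)                  # beta e_1
y = np.linalg.solve(T, e1)
x_k = V[:, :k] @ y
r_k = b - A @ x_k
```

The residual `r_k` is then a multiple of $v_{k+1}$, hence orthogonal to the columns of $V_k$.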

(15)

A symmetric indefinite

Symmetric Lanczos

$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T = V_{k+1} \bar{T}_k$

$T_k$ can be singular

SYMMLQ : $r_k \perp K_k(A, r_0)$

$x_k = x_0 + V_k y$, $T_k y = \beta e_1$ with LQ factorisation

CR and MINRES : $r_{k+1} \perp A K_k(A, r_0)$ or $\min \|r_k\|_2$

CR : $r_{k+1}^T A r_k = 0$ and $(Ap_{k+1})^T A p_k = 0$

MINRES : $\min \|\beta e_1 - \bar{T}_k y\|_2$

Adapted preconditioning : open problem

(16)

A non symmetric

More difficult ...

In general, it is not possible to get both a short recurrence and a minimisation of the residual

• Minimisation : GMRES

• Short recurrence : Bi-Conjugate Gradient, QMR, etc

• Preconditioning

(17)

Arnoldi and GMRES

Arnoldi process

$\mathrm{Span}(V_k) = K_k(A, v_1)$ (basis of Krylov space)

$V_k^T V_k = I$ (orthonormal system)

$A V_k = V_k H_k + h_{k+1,k} v_{k+1} e_k^T$, with $H_k$ Hessenberg matrix

GMRES algorithm

$r_0 = \|r_0\|_2 v_1 = \beta v_1$ (Krylov space)

$x_k = x_0 + V_k y$, $y \in R^k$ (space condition)

$r_k \perp A V_k$ (Galerkin condition)

$r_k = r_0 - A V_k y = V_{k+1} (\beta e_1 - \bar{H}_k y)$, with $\bar{H}_k = \begin{pmatrix} H_k \\ h_{k+1,k} e_k^T \end{pmatrix}$

$(A V_k)^T r_k = \bar{H}_k^T (\beta e_1 - \bar{H}_k y)$

Solve $\min_{y \in R^k} \|\beta e_1 - \bar{H}_k y\|_2$ (Galerkin condition)

(18)

GMRES - Convergence

Polynomial method

$x_k = x_0 + Q_{k-1}(A) r_0$

$r_k = (I - AQ_{k-1}(A)) r_0 = P_k(A) r_0$ with $P_k(0) = 1$

Minmax property

If $A = U \Delta U^{-1}$ with $\Delta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$

$\|r_k\|_2 \le \|r_0\|_2 \, \kappa(U) \min_{\deg(P) = k,\ P(0) = 1} \ \max_{1 \le j \le n} |P(\lambda_j)|$

(19)

GMRES - restarting

Memory space and complexity for $k$ iterations

Arnoldi : $O(k \times nz) + O(k^2 \times n)$ operations

Least-squares : negligible

Storage of $k + 3$ vectors of length $n$

Restarted GMRES(m)

Initialisation: choose $x_0$
Until convergence
    $m$ iterations of GMRES
    $x_m = x_0 + V_m y$
    $x_0 = x_m$
End Until

Risk of stagnation

(20)

GMRES - algorithm

Initialisation: choose $x_0$, set $r_0 = b - Ax_0$

Until convergence Do
    Arnoldi process
    $v_1 = r_0 / \|r_0\|_2$
    For $j = 1, m$
        $w = A v_j$
        For $i = 1, j$
            $h_{ij} = v_i^T w$
            $w = w - h_{ij} v_i$
        End For
        $h_{j+1,j} = \|w\|_2$
        $v_{j+1} = w / h_{j+1,j}$
        Givens rotations: $\bar{H}_j = Q_j R_j$, compute $\|r_j\|_2$
        convergence test
    End For
    Least-squares problem
    compute $y_m$ solution of $\min_y \big\| \, \|r_0\|_2 e_1 - \bar{H}_m y \big\|_2$
    $x_m = x_0 + V_m y_m$
    $r_m = b - A x_m$
    convergence test
    Restarting
    $x_0 = x_m$, $r_0 = r_m$
End Do
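The restarted algorithm can be sketched as follows in NumPy. Two simplifications are assumed and differ from the slide: the small least-squares problem is solved with `np.linalg.lstsq` instead of Givens rotations, and convergence is tested only at each restart rather than at each inner iteration. The function name, the breakdown threshold and the test matrix are hypothetical.

```python
import numpy as np

def gmres_restarted(A, b, m=20, tol=1e-10, max_restarts=50, x0=None):
    """GMRES(m) with modified Gram-Schmidt Arnoldi (illustrative sketch)."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    b_norm = np.linalg.norm(b) or 1.0
    for cycle in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * b_norm:
            return x, cycle
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))               # the (m+1) x m matrix \bar{H}_m
        V[:, 0] = r / beta
        k = m
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14 * beta:     # Krylov space is invariant: stop early
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        rhs = np.zeros(k + 1)
        rhs[0] = beta                          # ||r_0||_2 e_1
        y, *_ = np.linalg.lstsq(H[:k + 1, :k], rhs, rcond=None)
        x = x + V[:, :k] @ y                   # x_m = x_0 + V_m y_m
    return x, max_restarts

# Usage on a non symmetric matrix with eigenvalues clustered around 4 (assumed example)
rng = np.random.default_rng(1)
n = 100
A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x, cycles = gmres_restarted(A, b, m=20)
```

The storage of `V` grows with `m`, which is the reason for restarting.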

(21)

GMRES - convergence with varying eigenvalues

[Figure: A=V D inv(V), n=1000, cond(A)=10^k, cond(V)=19: relative residual, curves for k=1, 3, 5, 9]

(22)

GMRES - convergence with varying eigenvectors

[Figure: A = V D inv(V), D=diag(1:1000), cond(V) variable: residual curves labelled 10³, 10⁴, 10⁶, 10⁸]

(23)

Bi-Conjugate Gradient (Bi-CG)

Algorithm

Initialisation

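choose $x_0$ and $\tilde{x}_0$

$r_0 = b - Ax_0$ and $\tilde{r}_0 = b - A^T \tilde{x}_0$

$p_0 = r_0$ and $\tilde{p}_0 = \tilde{r}_0$

For $k = 0, 1, \ldots$
    $\alpha_k = \dfrac{(r_k, \tilde{r}_k)}{(Ap_k, \tilde{p}_k)}$
    $x_{k+1} = x_k + \alpha_k p_k$
    $\tilde{x}_{k+1} = \tilde{x}_k + \alpha_k \tilde{p}_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    $\tilde{r}_{k+1} = \tilde{r}_k - \alpha_k A^T \tilde{p}_k$
    $\beta_{k+1} = \dfrac{(r_{k+1}, \tilde{r}_{k+1})}{(r_k, \tilde{r}_k)}$
    $p_{k+1} = r_{k+1} + \beta_{k+1} p_k$
    $\tilde{p}_{k+1} = \tilde{r}_{k+1} + \beta_{k+1} \tilde{p}_k$
End For

Properties

$(r_{k+1}, \tilde{r}_k) = 0$

$(Ap_{k+1}, \tilde{p}_k) = (p_{k+1}, A^T \tilde{p}_k) = 0$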
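A direct transcription of these recurrences in NumPy is given below as a sketch. It assumes $x_0 = \tilde{x}_0 = 0$, stops on the norm of the recursively updated residual, and simply raises an error when one of the two scalar products vanishes; none of these choices come from the slides.

```python
import numpy as np

def bicg(A, b, tol=1e-10, max_iter=500):
    """Bi-CG with zero initial guesses (illustrative sketch, no look-ahead)."""
    n = b.shape[0]
    x = np.zeros(n)
    xt = np.zeros(n)                 # dual iterate x~
    r = b - A @ x
    rt = b - A.T @ xt                # dual residual r~
    p, pt = r.copy(), rt.copy()
    rho = r @ rt                     # (r_k, r~_k)
    b_norm = np.linalg.norm(b) or 1.0
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        Ap = A @ p
        sigma = Ap @ pt              # (A p_k, p~_k)
        if rho == 0.0 or sigma == 0.0:
            raise ArithmeticError("Bi-CG breakdown")
        alpha = rho / sigma
        x += alpha * p
        xt += alpha * pt
        r -= alpha * Ap
        rt -= alpha * (A.T @ pt)     # the product by A^T
        rho_new = r @ rt
        beta = rho_new / rho
        p = r + beta * p
        pt = rt + beta * pt
        rho = rho_new
    return x, max_iter

# Usage on a non symmetric, well-conditioned matrix (assumed example)
rng = np.random.default_rng(1)
n = 100
A = 4.0 * np.eye(n) + 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
x, iters = bicg(A, b)
```

Each iteration costs one product by $A$ and one by $A^T$, with a fixed number of stored vectors.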

(24)

Bi-CG - properties

Orthogonality

$(r_k, \tilde{r}_i) = 0$, $i \le k-1$

$(\tilde{p}_k, Ap_i) = 0$, $i \le k-1$

No minimisation

Krylov method

$K_k(A, r_0) = \mathrm{Span}(r_0, Ar_0, \ldots, A^{k-1} r_0)$ (Krylov space with $A$)

$K_k(A^T, \tilde{r}_0) = \mathrm{Span}(\tilde{r}_0, A^T \tilde{r}_0, \ldots, (A^T)^{k-1} \tilde{r}_0)$ (Krylov space with $A^T$)

Projection method

$x_k \in x_0 + K_k(A, r_0)$ (space condition)

$r_k \perp K_k(A^T, \tilde{r}_0)$ (Galerkin condition)

(25)

Bi-CG - equivalent formulation

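Augmented matrix

$$\begin{pmatrix} A & 0 \\ 0 & A^T \end{pmatrix}$$

Krylov projection method

$$\begin{pmatrix} x_k \\ \tilde{x}_k \end{pmatrix} \in \begin{pmatrix} x_0 \\ \tilde{x}_0 \end{pmatrix} + \begin{pmatrix} K_k(A, r_0) \\ K_k(A^T, \tilde{r}_0) \end{pmatrix} \quad \text{(space condition)}$$

$$\begin{pmatrix} \tilde{r}_k \\ r_k \end{pmatrix} \perp \begin{pmatrix} K_k(A, r_0) \\ K_k(A^T, \tilde{r}_0) \end{pmatrix} \quad \text{(Galerkin condition)}$$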

(26)

Non symmetric Lanczos

Method

$\mathrm{Span}(V_k) = K_k(A, v_1)$ (basis of Krylov space)

$\mathrm{Span}(W_k) = K_k(A^T, w_1)$ (basis of dual Krylov space)

$W_k^T V_k = I$ (bi-orthogonal system)

$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T$

$$T_k = \begin{pmatrix} \gamma_1 & \eta_2 & & \\ \delta_2 & \gamma_2 & \ddots & \\ & \ddots & \ddots & \eta_k \\ & & \delta_k & \gamma_k \end{pmatrix}$$

$A^T W_k = W_k T_k^T + \eta_{k+1} w_{k+1} e_k^T$

Relation with Bi-CG

$r_0 = \|r_0\|_2 v_1 = \beta v_1$ (Krylov space)

$x_k = x_0 + V_k y$, $y \in R^k$ (space condition)

$r_k = r_0 - A V_k y = V_k (\beta e_1 - T_k y) - \delta_{k+1} (e_k^T y) v_{k+1}$

$W_k^T r_k = 0 \Leftrightarrow T_k y = \beta e_1$ (Galerkin condition)

Bi-CG : Gauss factorisation of $T_k$

(27)

Bi-CG - Convergence - Variants

• Risk of breakdown in Lanczos, $(r_k, \tilde{r}_k) = 0$ : Lanczos version with look-ahead

• Irregular convergence

• Product by $A^T$ : transpose-free version BICGSTAB, smoother convergence of BICGSTAB

• Risk of breakdown in LU, $(\tilde{p}_k, Ap_k) = 0$ : QMR algorithm

(28)

Quasi-minimum Residual QMR

Algorithm

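Non symmetric Lanczos with look-ahead

$r_k = V_{k+1} (\beta e_1 - \bar{T}_k y)$, with $\bar{T}_k = \begin{pmatrix} T_k \\ \delta_{k+1} e_k^T \end{pmatrix}$

$\Omega_{k+1} = \mathrm{diag}(\|v_1\|_2, \ldots, \|v_{k+1}\|_2)$

Solve $\min_{y \in R^k} \|\beta e_1 - \Omega_{k+1} \bar{T}_k y\|_2$

Convergence

No breakdown

If $T_k$ is diagonalisable

$\|r_k\|_2 \le \|r_0\|_2 \sqrt{k+1} \, \kappa(T_k) \min_{\deg(P) = k,\ P(0) = 1} \ \max_{1 \le j \le n} |P(\lambda_j)|$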

(29)

Some Preconditionings

• diagonal or Jacobi

• Gauss-Seidel, SOR

• m-step SOR

• Schwarz

• polynomial

• Incomplete Gauss ILU(k) or ILUT

• Approximate inverse

• multigrid

• multilevel

• etc

(30)

Comparison between Bi-CG, QMR, GMRES

Matrices from the Harwell-Boeing (HB) collection and from Matrix Market

Tests done with Matlab

(31)

Matrix Sherman4 (HB) - n=1104-nz=3786

[Figure: sherman4, no preconditioning: residual versus matrix-vector products, for qmr, bicgstab, full-gmres, gmres(20), gmres(50), gmres(80)]

(32)

Matrix Sherman4 (HB) - n=1104 - nz=3786

[Figure: sherman4, preconditioning ILU(0.01): residual, for qmr, bicgstab, full-gmres]

(33)

Matrix Sherman5 (HB) - n=3312 - nz=20793

[Figure: sherman5, no preconditioning: residual versus matrix-vector products, for bicgstab, qmr, full-gmres, gmres(50), gmres(200)]

(34)

Matrix Sherman5 (HB)- n=3312 - nz=20793

[Figure: sherman5, preconditioning ILU(1.e−5): residual, for qmr, bicgstab, full-gmres]

(35)

3D Biharmonic equations

Joint work with Irfan Altas (Charles Sturt University, Wagga Wagga, Australia) and Murli Gupta (George Washington University, Washington DC, USA)

Ω is a closed convex domain in three dimensions and ∂Ω is its boundary.

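$(x, y, z) \in \Omega$

Biharmonic equation

$$\frac{\partial^4 u}{\partial x^4} + \frac{\partial^4 u}{\partial y^4} + \frac{\partial^4 u}{\partial z^4} + 2 \frac{\partial^4 u}{\partial x^2 \partial y^2} + 2 \frac{\partial^4 u}{\partial x^2 \partial z^2} + 2 \frac{\partial^4 u}{\partial y^2 \partial z^2} = f(x, y, z),$$

with Dirichlet boundary conditions

$$u = g_1(x, y, z), \quad \frac{\partial u}{\partial n} = g_2(x, y, z), \quad (x, y, z) \in \partial\Omega.$$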

(36)

Choice of Finite Difference scheme

3D uniform grid

4 unknowns at each grid point (x_i,y_j,z_k) :

$u(x_i, y_j, z_k)$, $\dfrac{\partial u}{\partial x}(x_i, y_j, z_k)$, $\dfrac{\partial u}{\partial y}(x_i, y_j, z_k)$, $\dfrac{\partial u}{\partial z}(x_i, y_j, z_k)$

Compact Finite Difference approximation

• Exact boundary conditions

• no modification near the boundary

• values of gradients already available

(37)

Compact 27 points 3D Finite Difference cell

[Figure: compact 27-point finite difference cell, with offsets −1, 0, 1 along each axis]

(38)

Linear solving methods

Large sparse non symmetric matrices

Grid $N \times N \times N$ - Matrix order $n = 4 (N + 1)^3$

examples : N = 32 ⇒ n = 143,748 and N = 64 ⇒ n = 1,098,500

Matrix-free iterative solvers

Matrix-free preconditioners

No fast solver

SSOR too slow

Either MULTIGRID or KRYLOV methods

Comparison between BICGSTAB, QMR, MULTIGRID

With None, SSOR, m-step SSOR, Multigrid preconditioners

(39)

Comparison between BICGSTAB and QMR

[Figure: Problem 1, QMR and BICGSTAB: residual versus number of iterations, for N=8, N=16 and N=32]

(40)

BICGSTAB - Comparison of preconditioners

[Figure: Problem 1, grid 32, BICGSTAB: residual versus number of matvec, for no preconditioning, SSOR, 5-step SSOR and multigrid V-cycle(2)]

(41)

BICGSTAB - Comparison of preconditioners

[Figure: Problem 1, grid 64, BICGSTAB: residual versus number of matvec, for no preconditioning, SSOR, 5-step SSOR and multigrid V-cycle(4)]

(42)

Comparison between BICGSTAB and MULTIGRID

[Figure: Problem 1, multigrid and BICGSTAB: error, for N=32 and N=64]

(43)

Parallel Krylov methods

• Parallel PCG

• Parallel GMRES

• Parallel Schwarz preconditionings

(44)

PCG - dependencies

Algorithm

Each step is listed with its type of operation.

For $k = 0, 1, \ldots$
    $q_k = A p_k$ (sparse matrix-vector product)
    $\alpha_k = \dfrac{(r_k, z_k)}{(q_k, p_k)}$ (scalar product)
    $x_{k+1} = x_k + \alpha_k p_k$ (vector operation)
    $r_{k+1} = r_k - \alpha_k A p_k$ (vector operation)
    $z_{k+1} = M^{-1} r_{k+1}$ (linear system)
    $\beta_{k+1} = \dfrac{(r_{k+1}, z_{k+1})}{(r_k, z_k)}$ (scalar product)
    $p_{k+1} = z_{k+1} + \beta_{k+1} p_k$ (vector operation)
End For

Sequence of vector and matrix operations

(45)

Parallel PCG

• Parallel sparse matrix-vector product

• Parallel preconditioning

• Parallel operations between vectors

• Synchronisations after each dot product

(46)

GMRES - dependencies

Sequence in Arnoldi with dot products

Sparse matrix-vector product

Sequential least-squares solving

(47)

Parallel Arnoldi

First compute a basis, then orthogonalise : fewer dependencies

(48)

Schwarz preconditionings

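Partition of the domain into overlapping subdomains

[Figure: two overlapping subdomains SD 1 and SD 2 with their interface]

$p$ subdomains $\Omega_i$

$R_i$ : restriction from $\Omega$ to $\Omega_i$

$R_i^T$ : extension from $\Omega_i$ to $\Omega$

$A$ in $\Omega_i$ : $A_i = R_i A R_i^T$

$M_i$ : preconditioning of $A_i$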

(49)

multiplicative Schwarz Preconditioning

Solving $M z = r$

Algorithm

$r_1 = R_1 r$
$M_1 z_1 = r_1$
$z^{(1)} = R_1^T z_1$
$t = A z^{(1)}$
$t_2 = R_2 t$
$r_2 = R_2 r$
$M_2 z_2 = r_2 - t_2$
$z^{(2)} = R_2^T z_2$
$z = z^{(1)} + z^{(2)}$

Data dependencies

$t_2 \ne 0$ if both subdomains overlap

subdomain 2 after subdomain 1

global addition in $z$

$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2 (I - A R_1^T M_1^{-1} R_1)$
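The two-subdomain sweep can be sketched with dense NumPy arrays. The assumptions are: exact local solves ($M_i = A_i = R_i A R_i^T$), Boolean restriction matrices built from index sets, and a 1D Laplacian split into two overlapping intervals. Helper names are hypothetical.

```python
import numpy as np

def restriction(indices, n):
    """Boolean restriction matrix R_i selecting the unknowns of one subdomain."""
    R = np.zeros((len(indices), n))
    R[np.arange(len(indices)), indices] = 1.0
    return R

def multiplicative_schwarz(A, R1, R2, r):
    """Apply z = M^{-1} r: subdomain 1 first, then subdomain 2 on the updated residual."""
    A1 = R1 @ A @ R1.T                        # local matrices, used as exact M_1 and M_2
    A2 = R2 @ A @ R2.T
    z1 = R1.T @ np.linalg.solve(A1, R1 @ r)   # z^(1)
    t2 = R2 @ (A @ z1)                        # coupling term, non-zero with overlap
    z2 = R2.T @ np.linalg.solve(A2, R2 @ r - t2)
    return z1 + z2

# Usage: 1D Laplacian of order 10, subdomains {0..5} and {4..9}
n = 10
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
R1 = restriction(np.arange(0, 6), n)
R2 = restriction(np.arange(4, 10), n)
r = np.ones(n)
z = multiplicative_schwarz(A, R1, R2, r)
```

With exact local solves, the residual $r - Az$ vanishes on the second subdomain after the sweep, which gives a simple check of the implementation.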

(50)

Additive Schwarz preconditioning

Solving $M z = r$

Algorithm

$r_1 = R_1 r$
$M_1 z_1 = r_1$
$z^{(1)} = R_1^T z_1$
$r_2 = R_2 r$
$M_2 z_2 = r_2$
$z^{(2)} = R_2^T z_2$
$z = z^{(1)} + z^{(2)}$

Data dependencies

subdomain 2 in parallel with subdomain 1

global addition in $z$

$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2$
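A short NumPy sketch of the additive variant follows, again assuming exact local solves ($M_i = R_i A R_i^T$) and Boolean restriction matrices; the helper names and the test problem are hypothetical. The loop body has no dependency between subdomains, so each pass could run on a different processor.

```python
import numpy as np

def restriction(indices, n):
    """Boolean restriction matrix R_i selecting the unknowns of one subdomain."""
    R = np.zeros((len(indices), n))
    R[np.arange(len(indices)), indices] = 1.0
    return R

def additive_schwarz(A, restrictions, r):
    """Apply z = M^{-1} r = sum_i R_i^T M_i^{-1} R_i r with independent local solves."""
    z = np.zeros_like(r, dtype=float)
    for R in restrictions:                    # independent: could be done in parallel
        A_i = R @ A @ R.T                     # local matrix, used as exact M_i
        z += R.T @ np.linalg.solve(A_i, R @ r)  # global addition in z
    return z

# Usage: 1D Laplacian of order 10, subdomains {0..5} and {4..9}
n = 10
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
R1 = restriction(np.arange(0, 6), n)
R2 = restriction(np.arange(4, 10), n)
r = np.ones(n)
z = additive_schwarz(A, [R1, R2], r)
```

When $A$ is symmetric positive definite and the subdomains cover all unknowns, the resulting $M^{-1}$ is symmetric positive definite as well, so it can be used within PCG.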

(51)

Coarse grid correction

$\Omega_0$ : set of interfaces between subdomains

$R_0$ restriction and $R_0^T$ extension

$A_0 = R_0 A R_0^T$ and $M_0$ preconditioning of $A_0$

correction $R_0^T M_0^{-1} R_0$

$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2 + R_0^T M_0^{-1} R_0$

sequential correction

Number of iterations independent of number of subdomains

(52)

Some bibliography and software

• Bibliography

• Software

(53)

Bibliography

• Y. Saad, Iterative methods for sparse linear systems, PWS Publishing Company, 1996

http://www-users.cs.umn.edu/~saad/books.html

• G. Meurant, Computer solution of large linear systems, North Holland, 1999

http://perso.wanadoo.fr/gerard.meurant/#GACC

• Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition, R. Barrett et al., SIAM, 1994

http://www.netlib.org/templates/index.html

• J. Erhel, book chapter in preparation (in French)

• etc

(54)

Software

• list : http://www.netlib.org/utk/papers/iterative-survey/

• PETSc : http://www-fp.mcs.anl.gov/petsc/

Partial, Ordinary and Algebraic Differential Equations, nonlinear and linear systems, on distributed memory computers

• Parpre : http://www.cs.utk.edu/~eijkhout/parpre.html

Parallel preconditionings for iterative methods

uses PETSc, part of Scalapack project, V. Eijkhout and T. Chan

• Psparslib : http://www.cs.umn.edu/Research/arpa/p_sparslib/psp-abs.html

Parallel iterative methods, Y. Saad et al.

• Matlab : http://www.mathworks.com

• Scilab and extension Scilin

iterative methods, Aladin group

• etc

(55)

Conclusion

• Multigrid methods efficient in regular cases

• Krylov methods efficient in general cases

• Preconditioning : necessary and difficult

• Multigrid or multilevel preconditioning : efficient for large problems