Iterative solvers for large sparse linear systems
Jocelyne Erhel - ALADIN group - INRIA-RENNES
3ème Cycle Romand de Mathématiques, Université de Neuchâtel
August 2001
Krylov iterative methods
Solve Ax = b, with A sparse matrix of order n
• A symmetric positive definite
• A symmetric indefinite
• A non symmetric
A symmetric positive definite
One method of choice : Conjugate Gradient (CG)
• algorithm
• properties
• convergence
• preconditioning
Conjugate Gradient
Algorithm
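Initialisation: choose $x_0$, set $p_0 = r_0 = b - A x_0$
For $k = 0, 1, \ldots$
  $\alpha_k = \|r_k\|_2^2 / (A p_k, p_k)$
  $x_{k+1} = x_k + \alpha_k p_k$
  $r_{k+1} = r_k - \alpha_k A p_k$
  $\beta_{k+1} = \|r_{k+1}\|_2^2 / \|r_k\|_2^2$
  $p_{k+1} = r_{k+1} + \beta_{k+1} p_k$
End For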
Properties
$(r_{k+1}, p_k) = 0$
$(r_{k+1}, r_k) = 0$
$(p_{k+1}, A p_k) = 0$
$\|r_{k+1}\|_{A^{-1}} = \min_\alpha \|r_k - \alpha A p_k\|_{A^{-1}}$
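The CG recurrences above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production solver: the dense list-of-rows storage, the helper names `matvec` and `dot`, the relative stopping rule and the 1D Laplacian test matrix are all assumptions made for the example.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Unpreconditioned CG for a symmetric positive definite A.

    Returns the last iterate and the history of residual 2-norms.
    Stops when ||r_k||_2 <= tol * ||b||_2 (an assumed stopping rule).
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]  # r0 = b - A x0
    p = list(r)                                       # p0 = r0
    rr = dot(r, r)
    stop = tol * dot(b, b) ** 0.5
    history = [rr ** 0.5]
    for _ in range(max_iter or 10 * n):
        if rr ** 0.5 <= stop:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(Ap, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        history.append(rr ** 0.5)
    return x, history


# Example: 1D Laplacian tridiag(-1, 2, -1) of order 8, right-hand side of ones.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, history = conjugate_gradient(A, b)
```

Only one product with $A$ and two scalar products are needed per iteration, which is the short recurrence the slides refer to.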
Conjugate Gradient - properties
Orthogonality and minimisation
$(r_k, p_i) = (r_k, r_i) = 0$, $i \le k - 1$
$(p_k, A p_i) = 0$, $i \le k - 1$
$\|r_{k+1}\|_{A^{-1}} \le \|r_k\|_{A^{-1}}$
$\|r_k\|_{A^{-1}} = \min_{x \in x_0 + \mathrm{Span}(p_0, \ldots, p_{k-1})} \|b - A x\|_{A^{-1}}$
Krylov method
$K_k(A, r_0) = \mathrm{Span}(r_0, A r_0, \ldots, A^{k-1} r_0)$ : Krylov space
$K_k(A, r_0) = \mathrm{Span}(r_0, r_1, \ldots, r_{k-1}) = \mathrm{Span}(p_0, p_1, \ldots, p_{k-1})$
Projection method
$x_k \in x_0 + K_k(A, r_0)$ : space condition
$r_k \perp K_k(A, r_0)$ : Galerkin condition
Conjugate Gradient - convergence
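Polynomial method
$x_k = x_0 + Q_{k-1}(A) r_0$
$r_k = (I - A Q_{k-1}(A)) r_0 = P_k(A) r_0$ with $P_k(0) = 1$
Minmax property - asymptotic convergence
$A = V \Delta V^{-1}$ with $\Delta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, $0 < \lambda_1 \le \ldots \le \lambda_n$ and $\kappa(A) = \lambda_n / \lambda_1$
$\|r_k\|_{A^{-1}} \le \|r_0\|_{A^{-1}} \max_j |P_k(\lambda_j)|$
$\|r_k\|_{A^{-1}} \le \|r_0\|_{A^{-1}} \min_{\deg(P) = k,\ P(0) = 1} \max_{\lambda_1 \le t \le \lambda_n} |P(t)|$
$\|r_k\|_{A^{-1}} \le 2 \|r_0\|_{A^{-1}} \left( \frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1} \right)^k$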
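The bound in $\sqrt{\kappa(A)}$ gives a rough a priori iteration count. The sketch below, with hypothetical function names, computes the smallest $k$ for which the bound guarantees a reduction of the $A^{-1}$-norm of the residual by a factor `eps`. It is only a worst-case estimate; actual CG convergence is often faster (superlinear).

```python
import math


def cg_rate(kappa):
    """Contraction factor (sqrt(kappa) - 1) / (sqrt(kappa) + 1) of the bound."""
    s = math.sqrt(kappa)
    return (s - 1.0) / (s + 1.0)


def iterations_needed(kappa, eps):
    """Smallest k such that 2 * rate**k <= eps, according to the bound."""
    if eps >= 2.0:
        return 0
    rho = cg_rate(kappa)
    if rho == 0.0:          # kappa = 1: one step is enough
        return 1
    return math.ceil(math.log(eps / 2.0) / math.log(rho))


k100 = iterations_needed(100.0, 1e-6)      # cond(A) = 10^2
k10000 = iterations_needed(1.0e4, 1e-6)    # cond(A) = 10^4
```

The count grows like $\sqrt{\kappa(A)}$, which is the motivation for preconditioning.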
Conjugate Gradient - asymptotic convergence
[Figure: relative residual versus iterations for A = diag(1 : $10^{k-3}$ : $10^k$), n = 1000, cond(A) = $10^k$; curves for k = 1, 2, 3, 5, 10.]
Conjugate Gradient - superlinear convergence
[Figure: relative residual for A = diag([0.01:0.01:0.0n] [1:1000−n]); curves for n = 0, 3, 5, 8.]
Preconditioned Conjugate Gradient (PCG)
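Symmetric positive definite preconditioning matrix $M$
Algorithm
Initialisation: choose $x_0$, set $r_0 = b - A x_0$, $z_0 = M^{-1} r_0$, $p_0 = z_0$
For $k = 0, 1, \ldots$
  $\alpha_k = (r_k, z_k) / (A p_k, p_k)$
  $x_{k+1} = x_k + \alpha_k p_k$
  $r_{k+1} = r_k - \alpha_k A p_k$
  $z_{k+1} = M^{-1} r_{k+1}$
  $\beta_{k+1} = (r_{k+1}, z_{k+1}) / (r_k, z_k)$
  $p_{k+1} = z_{k+1} + \beta_{k+1} p_k$
End For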
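A minimal sketch of PCG in plain Python follows. The preconditioner is passed as a function `apply_Minv` that returns $M^{-1} r$; this interface, the helper names, the diagonal (Jacobi) choice of $M$ and the badly scaled test matrix are assumptions made for the illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, apply_Minv, x0=None, tol=1e-10, max_iter=None):
    """Preconditioned CG; apply_Minv(r) must return M^{-1} r with M SPD."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    z = apply_Minv(r)
    p = list(z)
    rz = dot(r, z)
    stop = tol * dot(b, b) ** 0.5   # assumed relative stopping rule
    iters = 0
    for _ in range(max_iter or 10 * n):
        if dot(r, r) ** 0.5 <= stop:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(Ap, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        z = apply_Minv(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
        iters += 1
    return x, iters


def jacobi(A):
    """Diagonal preconditioner M = D, returned as a function r -> D^{-1} r."""
    d = [A[i][i] for i in range(len(A))]
    return lambda r: [ri / di for ri, di in zip(r, d)]


# Example: 1D Laplacian with a bad diagonal scaling, A = S T S.
n = 6
s = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
T = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
A = [[s[i] * T[i][j] * s[j] for j in range(n)] for i in range(n)]
b = matvec(A, [1.0] * n)
x, iters = pcg(A, b, jacobi(A))
```

Passing `lambda r: list(r)` as `apply_Minv` recovers unpreconditioned CG, which makes the two easy to compare on the same matrix.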
Symmetric positive definite preconditionings
• diagonal or Jacobi
• SSOR
• m-step SSOR
• Schwarz
• polynomial
• Incomplete Cholesky IC(k) or ICT
• Approximate inverse
• multigrid
• multilevel
• etc
Jacobi and SSOR preconditionings
Decomposition $A = D + L + U$,
$D$ diagonal, $L$ strictly lower triangular, $U$ strictly upper triangular
Jacobi
$M = D$ : parallel but slow convergence
SSOR
$M = (D + L) D^{-1} (D + U)$ : faster convergence but sequential
If $A$ is symmetric positive definite, so is $M$
Block versions
$D$, $L$, $U$ block-diagonal and block-triangular matrices : faster convergence
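Applying the SSOR preconditioner means solving $M z = r$ with $M = (D + L) D^{-1} (D + U)$, which takes one forward substitution, one diagonal scaling and one backward substitution. The sketch below assumes a dense list-of-rows matrix and a hypothetical function name `ssor_solve`; the two sweeps are inherently sequential, as noted above.

```python
def ssor_solve(A, r):
    """Return z such that (D + L) D^{-1} (D + U) z = r.

    D, L, U are the diagonal, strictly lower and strictly upper parts of A.
    """
    n = len(r)
    # Forward substitution: (D + L) y = r
    y = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = (r[i] - s) / A[i][i]
    # Diagonal scaling: w = D y
    w = [A[i][i] * y[i] for i in range(n)]
    # Backward substitution: (D + U) z = w
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (w[i] - s) / A[i][i]
    return z


# Example: 1D Laplacian of order 5.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
r = [1.0, 2.0, 3.0, 4.0, 5.0]
z = ssor_solve(A, r)
```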
Incomplete Cholesky factorisations
$A = L D L^T + R$
various strategies to choose $R$
IC(0) : no fill-in
IC(k) : fill-in up to level k
ICT(α) : fill-in with threshold value α
PCG - comparisons
Laplacian on a Finite differences grid
[Figure: grid (C,50), nz = 1857.]
[Figure: Laplacian on the grid (C,50), n = 1857, nnz = 9093, PCG; relative residual versus iterations for the diagonal, SSOR, IC(0), IC(0.01) and IC(0.0001) preconditioners.]
Symmetric Lanczos
Symmetric Lanczos method
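$\mathrm{Span}(V_k) = K_k(A, v_1)$ : basis of the Krylov subspace
$V_k^T V_k = I$ : orthonormal system
$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T$
$T_k = \begin{pmatrix} \gamma_1 & \delta_2 & & \\ \delta_2 & \gamma_2 & \ddots & \\ & \ddots & \ddots & \delta_k \\ & & \delta_k & \gamma_k \end{pmatrix}$
Relation with CG
$r_0 = \|r_0\|_2 v_1 = \beta v_1$ : Krylov subspace
$x_k = x_0 + V_k y$, $y \in \mathbb{R}^k$ : space condition
$r_k = r_0 - A V_k y = V_k (\beta e_1 - T_k y) - \delta_{k+1} (e_k^T y) v_{k+1}$
$V_k^T r_k = 0 \Leftrightarrow T_k y = \beta e_1$ : Galerkin condition
CG : Cholesky factorisation of $T_k$ and minimisation of $\|r_k\|_{A^{-1}}$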
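The three-term recurrence behind the relation $A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T$ can be sketched as follows. The function name, the dense storage and the 0-based indexing are assumptions of the example: `gamma[j]` holds $\gamma_{j+1}$ and `delta[j]` holds $\delta_{j+2}$. No reorthogonalisation is done, so orthogonality degrades for large $k$ in floating point.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lanczos(A, v1, k):
    """k steps of symmetric Lanczos started from v1 (A symmetric).

    Returns the Lanczos vectors V and the diagonal (gamma) and
    off-diagonal (delta) entries of the tridiagonal matrix T_k.
    """
    nrm = math.sqrt(dot(v1, v1))
    V = [[vi / nrm for vi in v1]]
    gamma, delta = [], []
    for j in range(k):
        w = matvec(A, V[j])
        if j > 0:
            w = [wi - delta[j - 1] * vi for wi, vi in zip(w, V[j - 1])]
        g = dot(w, V[j])
        w = [wi - g * vi for wi, vi in zip(w, V[j])]
        d = math.sqrt(dot(w, w))
        gamma.append(g)
        delta.append(d)
        if d == 0.0:      # invariant subspace found, stop early
            break
        V.append([wi / d for wi in w])
    return V, gamma, delta


# Example: 1D Laplacian of order 6, four Lanczos steps.
n = 6
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
V, gamma, delta = lanczos(A, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 4)
```

Only the two most recent vectors are needed to compute the next one, which is why CG, built on this process, has a short recurrence.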
A symmetric indefinite
Symmetric Lanczos
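$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T = V_{k+1} \bar{T}_k$
$T_k$ can be singular
SYMMLQ : $r_k \perp K_k(A, r_0)$
$x_k = x_0 + V_k y$, $T_k y = \beta e_1$ with LQ factorisation
CR and MINRES : $r_{k+1} \perp A K_k(A, r_0)$ or $\min \|r_k\|_2$
CR : $r_{k+1}^T A r_k = 0$ and $(A p_{k+1})^T A p_k = 0$
MINRES : $\min \|\beta e_1 - \bar{T}_k y\|_2$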
Adapted preconditioning : open problem
A non symmetric
More difficult ...
Not possible to get a short recurrence together with a minimisation of the residual
• Minimisation : GMRES
• Short recurrence : Bi-Conjugate Gradient, QMR, etc
• Preconditioning
Arnoldi and GMRES
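Arnoldi process
$\mathrm{Span}(V_k) = K_k(A, v_1)$ : basis of the Krylov space
$V_k^T V_k = I$ : orthonormal system
$A V_k = V_k H_k + h_{k+1,k} v_{k+1} e_k^T$, $H_k$ Hessenberg matrix
GMRES algorithm
$r_0 = \|r_0\|_2 v_1 = \beta v_1$ : Krylov space
$x_k = x_0 + V_k y$, $y \in \mathbb{R}^k$ : space condition
$r_k \perp A V_k$ : Galerkin condition
$r_k = r_0 - A V_k y = V_{k+1} (\beta e_1 - \bar{H}_k y)$, $\bar{H}_k = \begin{pmatrix} H_k \\ h_{k+1,k} e_k^T \end{pmatrix}$
$(A V_k)^T r_k = \bar{H}_k^T (\beta e_1 - \bar{H}_k y)$
Solve $\min_{y \in \mathbb{R}^k} \|\beta e_1 - \bar{H}_k y\|_2$ : Galerkin condition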
GMRES - Convergence
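Polynomial method
$x_k = x_0 + Q_{k-1}(A) r_0$
$r_k = (I - A Q_{k-1}(A)) r_0 = P_k(A) r_0$ with $P_k(0) = 1$
Minmax property
If $A = U \Delta U^{-1}$ with $\Delta = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, then
$\|r_k\|_2 \le \|r_0\|_2 \, \kappa(U) \min_{\deg(P) = k,\ P(0) = 1} \max_{1 \le j \le n} |P(\lambda_j)|$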
GMRES - restarting
Memory space and complexity for $k$ iterations
Arnoldi : $O(k \times nz) + O(k^2 \times n)$ operations
Least-squares : negligible
Storage of $k + 3$ vectors of length $n$
Restarted GMRES(m)
Initialisation: choose $x_0$
Until convergence
  $m$ iterations of GMRES : $x_m = x_0 + V_m y$
  $x_0 = x_m$
End Until
Risk of stagnation
GMRES - algorithm
Initialisation: choose $x_0$, set $r_0 = b - A x_0$
Until convergence Do
  Arnoldi process
  $v_1 = r_0 / \|r_0\|_2$
  For $j = 1, \ldots, m$
    $w = A v_j$
    For $i = 1, \ldots, j$
      $h_{ij} = v_i^T w$
      $w = w - h_{ij} v_i$
    End For
    $h_{j+1,j} = \|w\|_2$
    $v_{j+1} = w / h_{j+1,j}$
    Givens rotations : $\bar{H}_j = Q_j R_j$, compute $\|r_j\|_2$, convergence test
  End For
  Least-squares problem
  compute $y_m$, solution of $\min_y \| \|r_0\|_2 e_1 - \bar{H}_m y \|_2$
  $x_m = x_0 + V_m y_m$
  $r_m = b - A x_m$, convergence test
  Restarting
  $x_0 = x_m$, $r_0 = r_m$
End Do
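The algorithm above (Arnoldi with modified Gram-Schmidt, Givens rotations, restart after $m$ steps) can be sketched in plain Python as follows. The function name, the dense storage, the relative stopping rule and the test matrix are assumptions of the example; the upper triangular factor is stored column by column in `R`.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def gmres_restarted(A, b, x0=None, m=20, tol=1e-10, max_restarts=50):
    """GMRES(m): restart every m Arnoldi steps until ||r|| <= tol * ||b||."""
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    stop = tol * (norm(b) or 1.0)
    for _ in range(max_restarts):
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
        beta = norm(r)
        if beta <= stop:
            break
        V = [[ri / beta for ri in r]]   # v1 = r0 / ||r0||
        R, cs, sn = [], [], []          # triangular factor and rotations
        g = [beta]                      # rotated right-hand side beta * e1
        k = 0
        for j in range(m):
            w = matvec(A, V[j])
            h = []
            for i in range(j + 1):      # modified Gram-Schmidt
                hij = dot(V[i], w)
                h.append(hij)
                w = [wi - hij * vi for wi, vi in zip(w, V[i])]
            hnext = norm(w)
            for i in range(j):          # apply previous rotations
                t = cs[i] * h[i] + sn[i] * h[i + 1]
                h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
                h[i] = t
            d = math.hypot(h[j], hnext)
            if d == 0.0:                # breakdown: discard this column
                break
            c, s = h[j] / d, hnext / d  # new rotation annihilating hnext
            cs.append(c)
            sn.append(s)
            h[j] = d
            g.append(-s * g[j])
            g[j] = c * g[j]
            R.append(h)
            k = j + 1
            if abs(g[k]) <= stop or hnext == 0.0:   # |g[k]| = ||r_k||_2
                break
            V.append([wi / hnext for wi in w])
        y = [0.0] * k                   # back substitution R y = g[:k]
        for i in range(k - 1, -1, -1):
            acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, k))
            y[i] = acc / R[i][i]
        for i in range(k):              # x = x0 + V_k y
            x = [xi + y[i] * vi for xi, vi in zip(x, V[i])]
    return x


# Example: nonsymmetric tridiag(-1.5, 4, -0.5) of order 10, GMRES(4).
n = 10
A = [[4.0 if i == j else -1.5 if i == j + 1 else -0.5 if j == i + 1 else 0.0
      for j in range(n)] for i in range(n)]
b = [1.0] * n
x = gmres_restarted(A, b, m=4)
```

With `m` equal to the matrix order the function behaves as full GMRES. Small values of `m` bound the storage to about `m` basis vectors, at the risk of stagnation.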
GMRES - convergence with varying eigenvalues
[Figure: relative residual for A = V D inv(V), n = 1000, cond(A) = $10^k$, cond(V) = 19; curves for k = 1, 3, 5, 9.]
GMRES - convergence with varying eigenvectors
[Figure: convergence curves for A = V D inv(V), D = diag(1:1000), cond(V) variable; curves labelled $10^3$, $10^4$, $10^6$, $10^8$.]
Bi-Conjugate Gradient (Bi-CG)
Algorithm
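Initialisation
choose $x_0$ and $\tilde{x}_0$
$r_0 = b - A x_0$ and $\tilde{r}_0 = b - A^T \tilde{x}_0$
$p_0 = r_0$ and $\tilde{p}_0 = \tilde{r}_0$
For $k = 0, 1, \ldots$
  $\alpha_k = (r_k, \tilde{r}_k) / (A p_k, \tilde{p}_k)$
  $x_{k+1} = x_k + \alpha_k p_k$
  $\tilde{x}_{k+1} = \tilde{x}_k + \alpha_k \tilde{p}_k$
  $r_{k+1} = r_k - \alpha_k A p_k$
  $\tilde{r}_{k+1} = \tilde{r}_k - \alpha_k A^T \tilde{p}_k$
  $\beta_{k+1} = (r_{k+1}, \tilde{r}_{k+1}) / (r_k, \tilde{r}_k)$
  $p_{k+1} = r_{k+1} + \beta_{k+1} p_k$
  $\tilde{p}_{k+1} = \tilde{r}_{k+1} + \beta_{k+1} \tilde{p}_k$
End For
Properties
$(r_{k+1}, \tilde{r}_k) = 0$
$(A p_{k+1}, \tilde{p}_k) = (p_{k+1}, A^T \tilde{p}_k) = 0$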
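A minimal sketch of the Bi-CG recurrences in plain Python is given below. The function names, the dense storage, the zero initial guesses and the explicit breakdown check are assumptions of the example. The shadow iterate $\tilde{x}_k$ is not stored since it is not needed to update $x_k$.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_T(A, x):
    """Product with the transpose of A."""
    return [sum(A[i][j] * x[i] for i in range(len(A)))
            for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicg(A, b, tol=1e-10, max_iter=None):
    """Bi-CG with x0 = 0 and shadow residual rt0 = b (assumed choices)."""
    n = len(b)
    x = [0.0] * n
    r, rt = list(b), list(b)
    p, pt = list(r), list(rt)
    rho = dot(r, rt)
    stop = tol * math.sqrt(dot(b, b))
    limit = max_iter or 10 * n
    for k in range(limit):
        if math.sqrt(dot(r, r)) <= stop:
            return x, k
        Ap = matvec(A, p)
        ATpt = matvec_T(A, pt)
        denom = dot(Ap, pt)
        if rho == 0.0 or denom == 0.0:
            raise ZeroDivisionError("Bi-CG breakdown")
        alpha = rho / denom
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rt = [ri - alpha * qi for ri, qi in zip(rt, ATpt)]
        rho_new = dot(r, rt)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x, limit


# Example: nonsymmetric tridiag(-1.5, 4, -0.5) of order 6 with known solution.
n = 6
A = [[4.0 if i == j else -1.5 if i == j + 1 else -0.5 if j == i + 1 else 0.0
      for j in range(n)] for i in range(n)]
x_true = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
b = matvec(A, x_true)
x, iters = bicg(A, b)
```

Each iteration costs one product with $A$ and one with $A^T$, and nothing guarantees that the residual norm decreases from one step to the next.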
Bi-CG - properties
Orthogonality
$(r_k, \tilde{r}_i) = 0$, $i \le k - 1$
$(\tilde{p}_k, A p_i) = 0$, $i \le k - 1$
No minimisation
Krylov method
$K_k(A, r_0) = \mathrm{Span}(r_0, A r_0, \ldots, A^{k-1} r_0)$ : Krylov space with $A$
$K_k(A^T, \tilde{r}_0) = \mathrm{Span}(\tilde{r}_0, A^T \tilde{r}_0, \ldots, (A^T)^{k-1} \tilde{r}_0)$ : Krylov space with $A^T$
Projection method
$x_k \in x_0 + K_k(A, r_0)$ : space condition
$r_k \perp K_k(A^T, \tilde{r}_0)$ : Galerkin condition
Bi-CG - equivalent formulation
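Augmented matrix
$\begin{pmatrix} A & 0 \\ 0 & A^T \end{pmatrix}$
Krylov projection method
$\begin{pmatrix} x_k \\ \tilde{x}_k \end{pmatrix} \in \begin{pmatrix} x_0 \\ \tilde{x}_0 \end{pmatrix} + \begin{pmatrix} K_k(A, r_0) \\ K_k(A^T, \tilde{r}_0) \end{pmatrix}$ : space condition
$\begin{pmatrix} \tilde{r}_k \\ r_k \end{pmatrix} \perp \begin{pmatrix} K_k(A, r_0) \\ K_k(A^T, \tilde{r}_0) \end{pmatrix}$ : Galerkin condition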
Non symmetric Lanczos
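Method
$\mathrm{Span}(V_k) = K_k(A, v_1)$ : basis of the Krylov space
$\mathrm{Span}(W_k) = K_k(A^T, w_1)$ : basis of the dual Krylov space
$W_k^T V_k = I$ : bi-orthogonal system
$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T$
$T_k = \begin{pmatrix} \gamma_1 & \eta_2 & & \\ \delta_2 & \gamma_2 & \ddots & \\ & \ddots & \ddots & \eta_k \\ & & \delta_k & \gamma_k \end{pmatrix}$
$A^T W_k = W_k T_k^T + \eta_{k+1} w_{k+1} e_k^T$
Relation with Bi-CG
$r_0 = \|r_0\|_2 v_1 = \beta v_1$ : Krylov space
$x_k = x_0 + V_k y$, $y \in \mathbb{R}^k$ : space condition
$r_k = r_0 - A V_k y = V_k (\beta e_1 - T_k y) - \delta_{k+1} (e_k^T y) v_{k+1}$
$W_k^T r_k = 0 \Leftrightarrow T_k y = \beta e_1$ : Galerkin condition
Bi-CG : Gauss factorisation of $T_k$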
Bi-CG - Convergence - Variants
Risk of breakdown in Lanczos, $(r_k, \tilde{r}_k) = 0$ : Lanczos version with look-ahead
Irregular convergence
Product by $A^T$ : transpose-free version BICGSTAB, smoother convergence of BICGSTAB
Risk of breakdown in LU, $(\tilde{p}_k, A p_k) = 0$ : QMR algorithm
Quasi-minimum Residual QMR
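Algorithm
Non symmetric Lanczos with look-ahead
$r_k = V_{k+1} (\beta e_1 - \bar{T}_k y)$, $\bar{T}_k = \begin{pmatrix} T_k \\ \delta_{k+1} e_k^T \end{pmatrix}$
$\Omega_{k+1} = \mathrm{diag}(\|v_1\|_2, \ldots, \|v_{k+1}\|_2)$
Solve $\min_{y \in \mathbb{R}^k} \|\beta e_1 - \Omega_{k+1} \bar{T}_k y\|_2$
Convergence
No breakdown
If $T_k$ is diagonalisable, $\|r_k\|_2 \le \|r_0\|_2 \sqrt{k + 1} \, \kappa(T_k) \min_{\deg(P) = k,\ P(0) = 1} \max_{1 \le j \le n} |P(\lambda_j)|$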
Some Preconditionings
• diagonal or Jacobi
• Gauss-Seidel, SOR
• m-step SOR
• Schwarz
• polynomial
• Incomplete Gauss ILU(k) or ILUT
• Approximate inverse
• multigrid
• multilevel
• etc
Comparison between Bi-CG, QMR, GMRES
Matrices from the Harwell-Boeing (HB) collection and from Matrix Market
Tests done with Matlab
Matrix Sherman4 (HB) - n=1104-nz=3786
[Figure: sherman4, no preconditioning; residual versus matrix-vector products for qmr, bicgstab, full-gmres, gmres(20), gmres(50), gmres(80).]
Matrix Sherman4 (HB) - n=1104 - nz=3786
[Figure: sherman4, preconditioning ILU(0.01); residual for qmr, bicgstab, full-gmres.]
Matrix Sherman5 (HB) - n=3312 - nz=20793
[Figure: sherman5, no preconditioning; residual versus matrix-vector products for bicgstab, qmr, full-gmres, gmres(50), gmres(200).]
Matrix Sherman5 (HB)- n=3312 - nz=20793
[Figure: sherman5, preconditioning ILU(1.e−5); residual for qmr, bicgstab, full-gmres.]
3D Biharmonic equations
Joint work with Irfan Altas (Charles Sturt University, Wagga Wagga, Australia) and Murli Gupta (George Washington University, Washington DC, USA)
Ω is a closed convex domain in three dimensions and ∂Ω is its boundary.
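Biharmonic equation, for $(x, y, z) \in \Omega$ :
$\frac{\partial^4 u}{\partial x^4} + \frac{\partial^4 u}{\partial y^4} + \frac{\partial^4 u}{\partial z^4} + 2 \frac{\partial^4 u}{\partial x^2 \partial y^2} + 2 \frac{\partial^4 u}{\partial x^2 \partial z^2} + 2 \frac{\partial^4 u}{\partial y^2 \partial z^2} = f(x, y, z)$,
with Dirichlet boundary conditions
$u = g_1(x, y, z)$, $\frac{\partial u}{\partial n} = g_2(x, y, z)$, $(x, y, z) \in \partial\Omega$.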
Choice of Finite Difference scheme
3D uniform grid
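4 unknowns at each grid point $(x_i, y_j, z_k)$ :
$u(x_i, y_j, z_k)$, $\frac{\partial u}{\partial x}(x_i, y_j, z_k)$, $\frac{\partial u}{\partial y}(x_i, y_j, z_k)$, $\frac{\partial u}{\partial z}(x_i, y_j, z_k)$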
Compact Finite Difference approximation
• Exact boundary conditions
• no modification near the boundary
• values of gradients already available
Compact 27 points 3D Finite Difference cell
[Figure: 27-point compact cell, each axis running from −1 to 1.]
Linear solving methods
Large sparse non symmetric matrices
Grid $N \times N \times N$ - matrix order $n = 4 (N + 1)^3$
Examples : $N = 32 \Rightarrow n = 143{,}748$ and $N = 64 \Rightarrow n = 1{,}098{,}500$
Matrix-free iterative solvers
Matrix-free preconditioners
No fast solver
SSOR too slow
Either MULTIGRID or KRYLOV methods
Comparison between BICGSTAB, QMR, MULTIGRID
With None, SSOR, m-step SSOR, Multigrid preconditioners
Comparison between BICGSTAB and QMR
[Figure: Problem 1, QMR and BICGSTAB; residual versus number of iterations for N = 8, 16, 32.]
BICGSTAB - Comparison of preconditioners
[Figure: Problem 1, grid 32, BICGSTAB; residual versus number of matvec for no preconditioning, SSOR, 5-step SSOR and multigrid V-cycle(2).]
BICGSTAB - Comparison of preconditioners
[Figure: Problem 1, grid 64, BICGSTAB; residual versus number of matvec for no preconditioning, SSOR, 5-step SSOR and multigrid V-cycle(4).]
Comparison between BICGSTAB and MULTIGRID
[Figure: Problem 1, Multigrid and BICGSTAB; error for N = 32 and N = 64.]
Parallel Krylov methods
• Parallel PCG
• Parallel GMRES
• Parallel Schwarz preconditionings
PCG - dependencies
Algorithm
For $k = 0, 1, \ldots$
  $q_k = A p_k$ : sparse matrix-vector product
  $\alpha_k = (r_k, z_k) / (q_k, p_k)$ : scalar product
  $x_{k+1} = x_k + \alpha_k p_k$ : vector operation
  $r_{k+1} = r_k - \alpha_k q_k$ : vector operation
  $z_{k+1} = M^{-1} r_{k+1}$ : linear system
  $\beta_{k+1} = (r_{k+1}, z_{k+1}) / (r_k, z_k)$ : scalar product
  $p_{k+1} = z_{k+1} + \beta_{k+1} p_k$ : vector operation
End For
Sequence of vector and matrix operations
Parallel PCG
• Parallel sparse matrix-vector product
• Parallel preconditioning
• Parallel operations between vectors
• Synchronisations after each dot product
GMRES - dependencies
Sequence in Arnoldi with dot products
Sparse matrix-vector product
Sequential least-squares solving
Parallel Arnoldi
First compute a basis, then orthogonalise : fewer dependencies
Schwarz preconditionings
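Partition of the domain into overlapping subdomains
[Figure: two subdomains, SD 1 and SD 2, and their interface.]
$p$ subdomains $\Omega_i$
$R_i$ restriction of $\Omega$ to $\Omega_i$
$R_i^T$ extension of $\Omega_i$ to $\Omega$
$A$ in $\Omega_i$ : $A_i = R_i A R_i^T$
$M_i$ preconditioning of $A_i$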
multiplicative Schwarz Preconditioning
Solving $M z = r$
Algorithm
$r_1 = R_1 r$
$M_1 z_1 = r_1$
$z^{(1)} = R_1^T z_1$
$t = A z^{(1)}$
$t_2 = R_2 t$
$r_2 = R_2 r$
$M_2 z_2 = r_2 - t_2$
$z^{(2)} = R_2^T z_2$
$z = z^{(1)} + z^{(2)}$
Data dependencies
• $t_2 \ne 0$ if both subdomains overlap
• subdomain 2 after subdomain 1
• global addition in $z$
$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2 (I - A R_1^T M_1^{-1} R_1)$
Additive Schwarz preconditioning
Solving $M z = r$
Algorithm
$r_1 = R_1 r$
$M_1 z_1 = r_1$
$z^{(1)} = R_1^T z_1$
$r_2 = R_2 r$
$M_2 z_2 = r_2$
$z^{(2)} = R_2^T z_2$
$z = z^{(1)} + z^{(2)}$
Data dependencies
• subdomain 2 in parallel with subdomain 1
• global addition in $z$
$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2$
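Both Schwarz variants can be sketched with subdomains given as lists of indices. The sketch below assumes exact local solves ($M_i = A_i$, the block of $A$ restricted to subdomain $i$), dense storage and hypothetical function names. The additive loop has independent iterations, while the multiplicative loop updates the residual between subdomains.

```python
def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def solve_dense(A, b):
    """Gaussian elimination with partial pivoting, for small local problems."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[piv] = M[piv], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def local_solve(A, idx, r):
    """Restrict r to the subdomain idx, solve with A_i, extend by zero."""
    Ai = [[A[i][j] for j in idx] for i in idx]
    zi = solve_dense(Ai, [r[i] for i in idx])
    z = [0.0] * len(r)
    for i, v in zip(idx, zi):
        z[i] = v
    return z


def additive_schwarz(A, subdomains, r):
    """z = sum_i R_i^T A_i^{-1} R_i r; the local solves are independent."""
    z = [0.0] * len(r)
    for idx in subdomains:
        zi = local_solve(A, idx, r)
        z = [a + c for a, c in zip(z, zi)]
    return z


def multiplicative_schwarz(A, subdomains, r):
    """Subdomains are treated in sequence on the updated residual r - A z."""
    z = [0.0] * len(r)
    for idx in subdomains:
        res = [ri - ti for ri, ti in zip(r, matvec(A, z))]
        zi = local_solve(A, idx, res)
        z = [a + c for a, c in zip(z, zi)]
    return z


# Example: 1D Laplacian of order 8, two subdomains overlapping on indices 3, 4.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
r = [float(i + 1) for i in range(n)]
subdomains = [[0, 1, 2, 3, 4], [3, 4, 5, 6, 7]]
z_add = additive_schwarz(A, subdomains, r)
z_mul = multiplicative_schwarz(A, subdomains, r)
```

Either function can serve as the preconditioning step $z = M^{-1} r$ of a Krylov method. For PCG only the additive form is symmetric as written.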
Coarse grid correction
$\Omega_0$ set of interfaces between subdomains
$R_0$ restriction and $R_0^T$ extension
$A_0 = R_0 A R_0^T$ and $M_0$ preconditioning
correction $R_0^T M_0^{-1} R_0$
$M^{-1} = R_1^T M_1^{-1} R_1 + R_2^T M_2^{-1} R_2 + R_0^T M_0^{-1} R_0$
sequential correction
Number of iterations independent of number of subdomains
Some bibliography and software
• Bibliography
• Software
Bibliography
• Y. Saad, Iterative methods for sparse linear systems, PWS Publishing Company, 1996
http://www-users.cs.umn.edu/~saad/books.html
• G. Meurant, Computer solution of large linear systems, North Holland, 1999
http://perso.wanadoo.fr/gerard.meurant/#GACC
• Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition, R. Barrett et al., SIAM, 1994
http://www.netlib.org/templates/index.html
• J. Erhel, chapter of book in preparation (in French)
• etc
Software
• list : http://www.netlib.org/utk/papers/iterative-survey/
• PETSc : http://www-fp.mcs.anl.gov/petsc/
Partial, ordinary and algebraic differential equations, nonlinear and linear systems, on distributed memory computers
• Parpre : http://www.cs.utk.edu/~eijkhout/parpre.html
Parallel preconditionings for iterative methods; uses PETSc, part of the ScaLAPACK project, V. Eijkhout and T. Chan
• Psparslib : http://www.cs.umn.edu/Research/arpa/p_sparslib/psp-abs.html
Parallel iterative methods, Y. Saad et al.
• Matlab : http://www.mathworks.com
• Scilab and extension Scilin
Iterative methods, Aladin group
• etc
Conclusion
• Multigrid methods efficient in regular cases
• Krylov methods efficient in general cases
• Preconditioning : necessary and difficult
• Multigrid or multilevel preconditioning : efficient for large problems