• Aucun résultat trouvé

Solving large sparse linear systems with a variable s-step GMRES preconditioned by DD

N/A
N/A
Protected

Academic year: 2021

Partager "Solving large sparse linear systems with a variable s-step GMRES preconditioned by DD"

Copied!
52
0
0

Texte intégral

(1)

HAL Id: hal-01528636

https://hal.inria.fr/hal-01528636

Submitted on 23 Nov 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Solving large sparse linear systems with a variable s-step GMRES preconditioned by DD

David Imberti, Jocelyne Erhel

To cite this version:

David Imberti, Jocelyne Erhel. Solving large sparse linear systems with a variable s-step GMRES

preconditioned by DD. DD24 - International Conference on Domain Decomposition Methods, Feb

2017, Longyearbyen, Norway. �hal-01528636�

(2)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Solving large sparse linear systems with a variable s-step GMRES preconditioned by DD

David Imberti and Jocelyne Erhel

Joint work with D´ esir´ e Nuentsa Wakam (first part)

FLUMINANCE team, Inria Rennes, France

DD24

Longyearbyen - Svalbard, Norway, February 2017

1 / 27

(3)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Numerical simulations

Solving large sparse linear systems is at the heart of many numerical simulations

Simulation of the velocity field on the Greenland inlandsis, with Elmer/Ice model Image : Fabien Gillet-Chaulet, CNRS and LGGE, Grenoble.

More information on the websiteIntersticesfor the general public [”modeliser-et-simuler-la-fonte-des-calottes-polaires”, Nodet and JE, 2015]

2 / 27

(4)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Outline

1

Krylov subspace algorithm GMRES

2

m-step preconditioned and deflated GMRES

3

Variable s-step algorithm VGMRES(m,s)

3 / 27

(5)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Preconditioned GMRES

Ax=b, A∈Rn×n x,b∈Rn B=AM−1 x0∈Rn r0=b−Ax0

GMRES(m): a Krylov subspace method

[Saad and Schultz 1986, Meurant’s book 1999, Saad’s book 2003, Simoncini and Szyld 2007, JE 2011, ...]

Km(B,r0) =span{r0,Br0, . . .Bm−1r0}

Findxm∈x0+M−1Km(B,r0) such thatkrmk2=kb−Bxmk2= minx∈x

0 +M−1Km(B,r0 )kb−Bxk2

Building blocks of GMRES(m)

Build an orthonormal basisVk+1of the Krylov subspaceKk+1 get the Arnoldi-like relationBVk=Vk+1Hkfork= 1, . . . ,m Minimize the residual in the Krylov subspace

x=x0+M−1Vkyimpliesr=r0−BVky=Vk+1(βe1−Hky) Solve the least-squares problem: miny∈Rkkβe1−Hkyk Restart if not converged

x0=x0+M−1Vmym

4 / 27

(6)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Preconditioned GMRES

Ax=b, A∈Rn×n x,b∈Rn B=AM−1 x0∈Rn r0=b−Ax0

GMRES(m): a Krylov subspace method

[Saad and Schultz 1986, Meurant’s book 1999, Saad’s book 2003, Simoncini and Szyld 2007, JE 2011, ...]

Km(B,r0) =span{r0,Br0, . . .Bm−1r0}

Findxm∈x0+M−1Km(B,r0) such thatkrmk2=kb−Bxmk2= minx∈x

0 +M−1Km(B,r0 )kb−Bxk2

Building blocks of GMRES(m)

Build an orthonormal basisVk+1of the Krylov subspaceKk+1 get the Arnoldi-like relationBVk=Vk+1Hkfork= 1, . . . ,m Minimize the residual in the Krylov subspace

x=x0+M−1Vkyimpliesr=r0−BVky=Vk+1(βe1−Hky) Solve the least-squares problem: miny∈Rkkβe1−Hkyk Restart if not converged

x0=x0+M−1Vmym

4 / 27

(7)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

GMRES ... practical issues

Arnoldi process

1:

v1=r0/kr0k2

2:

fork= 1,mdo

3:

p=Bvk

4:

fori= 1 :kdo

5:

hik=viTp

6:

p=p−hikvi

7:

end for

8:

hk+1,k=||p||2

9:

vk+1=p/hk+1,k

10:

end for

BVm=Vm+1Hm

Granularity issues in parallel algorithms

⇒Communication-avoiding strategies

Generate the basis vectors[Reichel 1990, Bai et al 1994]

Orthogonalize the basis[JE 1995, Sidje 1997, Demmel et al 2011]

Compute the basis block by block[De Sturler 1994, Hoemmen 2010]

Preconditioning issues

⇒multilevel methods to deal with large systems Schwarz preconditioning[Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009,..]

Filtering and Schur complement[Li et al 2003, Grigori et al 2011]

Multilevel parallelism[Nuentsa Wakam et al 2011, Giraud et al 2010, ...]

Complexity and stagnation issues with restarted GMRES(m)

⇒deflation to recover possible loss of information

Deflation by preconditioning[JE et al 1996, Burrage et al 1998, Baglama et al 1998, ...]

Deflation by augmented basis[Morgan 1995, Morgan 2002,...]

Strategy

Combine ’communication-avoiding’ GMRES ... and Deflation ... and Domain Decomposition preconditioners [Nuentsa Wakam et al 2013, Nuentsa Wakam and Pacull 2013, Nuentsa Wakam and JE 2013]

5 / 27

(8)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

GMRES ... practical issues

Arnoldi process

1:

v1=r0/kr0k2

2:

fork= 1,mdo

3:

p=Bvk

4:

fori= 1 :kdo

5:

hik=viTp

6:

p=p−hikvi

7:

end for

8:

hk+1,k=||p||2

9:

vk+1=p/hk+1,k

10:

end for

BVm=Vm+1Hm

Granularity issues in parallel algorithms

⇒Communication-avoiding strategies

Generate the basis vectors[Reichel 1990, Bai et al 1994]

Orthogonalize the basis[JE 1995, Sidje 1997, Demmel et al 2011]

Compute the basis block by block[De Sturler 1994, Hoemmen 2010]

Preconditioning issues

⇒multilevel methods to deal with large systems Schwarz preconditioning[Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009,..]

Filtering and Schur complement[Li et al 2003, Grigori et al 2011]

Multilevel parallelism[Nuentsa Wakam et al 2011, Giraud et al 2010, ...]

Complexity and stagnation issues with restarted GMRES(m)

⇒deflation to recover possible loss of information

Deflation by preconditioning[JE et al 1996, Burrage et al 1998, Baglama et al 1998, ...]

Deflation by augmented basis[Morgan 1995, Morgan 2002,...]

Strategy

Combine ’communication-avoiding’ GMRES ... and Deflation ... and Domain Decomposition preconditioners [Nuentsa Wakam et al 2013, Nuentsa Wakam and Pacull 2013, Nuentsa Wakam and JE 2013]

5 / 27

(9)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

GMRES ... practical issues

Arnoldi process

1:

v1=r0/kr0k2

2:

fork= 1,mdo

3:

p=Bvk

4:

fori= 1 :kdo

5:

hik=viTp

6:

p=p−hikvi

7:

end for

8:

hk+1,k=||p||2

9:

vk+1=p/hk+1,k

10:

end for

BVm=Vm+1Hm

Granularity issues in parallel algorithms

⇒Communication-avoiding strategies

Generate the basis vectors[Reichel 1990, Bai et al 1994]

Orthogonalize the basis[JE 1995, Sidje 1997, Demmel et al 2011]

Compute the basis block by block[De Sturler 1994, Hoemmen 2010]

Preconditioning issues

⇒multilevel methods to deal with large systems Schwarz preconditioning[Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009,..]

Filtering and Schur complement[Li et al 2003, Grigori et al 2011]

Multilevel parallelism[Nuentsa Wakam et al 2011, Giraud et al 2010, ...]

Complexity and stagnation issues with restarted GMRES(m)

⇒deflation to recover possible loss of information

Deflation by preconditioning[JE et al 1996, Burrage et al 1998, Baglama et al 1998, ...]

Deflation by augmented basis[Morgan 1995, Morgan 2002,...]

Strategy

Combine ’communication-avoiding’ GMRES ... and Deflation ... and Domain Decomposition preconditioners [Nuentsa Wakam et al 2013, Nuentsa Wakam and Pacull 2013, Nuentsa Wakam and JE 2013]

5 / 27

(10)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

GMRES ... practical issues

Arnoldi process

1:

v1=r0/kr0k2

2:

fork= 1,mdo

3:

p=Bvk

4:

fori= 1 :kdo

5:

hik=viTp

6:

p=p−hikvi

7:

end for

8:

hk+1,k=||p||2

9:

vk+1=p/hk+1,k

10:

end for

BVm=Vm+1Hm

Granularity issues in parallel algorithms

⇒Communication-avoiding strategies

Generate the basis vectors[Reichel 1990, Bai et al 1994]

Orthogonalize the basis[JE 1995, Sidje 1997, Demmel et al 2011]

Compute the basis block by block[De Sturler 1994, Hoemmen 2010]

Preconditioning issues

⇒multilevel methods to deal with large systems Schwarz preconditioning[Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009,..]

Filtering and Schur complement[Li et al 2003, Grigori et al 2011]

Multilevel parallelism[Nuentsa Wakam et al 2011, Giraud et al 2010, ...]

Complexity and stagnation issues with restarted GMRES(m)

⇒deflation to recover possible loss of information

Deflation by preconditioning[JE et al 1996, Burrage et al 1998, Baglama et al 1998, ...]

Deflation by augmented basis[Morgan 1995, Morgan 2002,...]

Strategy

Combine ’communication-avoiding’ GMRES ... and Deflation ... and Domain Decomposition preconditioners [Nuentsa Wakam et al 2013, Nuentsa Wakam and Pacull 2013, Nuentsa Wakam and JE 2013]

5 / 27

(11)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

GMRES ... practical issues

Arnoldi process

1:

v1=r0/kr0k2

2:

fork= 1,mdo

3:

p=Bvk

4:

fori= 1 :kdo

5:

hik=viTp

6:

p=p−hikvi

7:

end for

8:

hk+1,k=||p||2

9:

vk+1=p/hk+1,k

10:

end for

BVm=Vm+1Hm

Granularity issues in parallel algorithms

⇒Communication-avoiding strategies

Generate the basis vectors[Reichel 1990, Bai et al 1994]

Orthogonalize the basis[JE 1995, Sidje 1997, Demmel et al 2011]

Compute the basis block by block[De Sturler 1994, Hoemmen 2010]

Preconditioning issues

⇒multilevel methods to deal with large systems Schwarz preconditioning[Atenekeng Kahou et al 2007, Dufaud+Tromeur-Dervout 2010, Giraud+Haidar 2009,..]

Filtering and Schur complement[Li et al 2003, Grigori et al 2011]

Multilevel parallelism[Nuentsa Wakam et al 2011, Giraud et al 2010, ...]

Complexity and stagnation issues with restarted GMRES(m)

⇒deflation to recover possible loss of information

Deflation by preconditioning[JE et al 1996, Burrage et al 1998, Baglama et al 1998, ...]

Deflation by augmented basis[Morgan 1995, Morgan 2002,...]

Strategy

Combine ’communication-avoiding’ GMRES ... and Deflation ... and Domain Decomposition preconditioners [Nuentsa Wakam et al 2013, Nuentsa Wakam and Pacull 2013, Nuentsa Wakam and JE 2013]

5 / 27

(12)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Outline

1

Krylov subspace algorithm GMRES

2

m-step preconditioned and deflated GMRES

3

Variable s-step algorithm VGMRES(m,s)

6 / 27

(13)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

m-step GMRES(m)

Building blocks of m-step GMRES(m)

Build a basisWm+1of the Krylov subspaceKm+1such thatBWm=Wm+1Tm+1 Build an orthonormal basisVm+1ofKm+1such thatWm+1=Vm+1Rm+1 get the Arnoldi-like relationsBWm=Vm+1HmandBVm=Vm+1HmRm−1 [Nuentsa Wakam and JE 2013, Hoemmen 2010]

Minimize the residual in the Krylov subspace

x=x0+M−1Vmyimpliesr=r0−BVmy=Vm+1(βe1−HmR−1m y) Solve the least-squares problem:

min

y∈Rmkβe1−HmRm−1yk Alternative to minimize the residual in the Krylov subspace [Imberti and JE 2016]

x=x0+M−1Wmyimpliesr=r0−BWmy=Vm+1(βe1−Hmy) Solve the least-squares problem:

min

y∈Rmkβe1−Hmyk

Restart if not converged

7 / 27

(14)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

m-step GMRES(m)

Building blocks of m-step GMRES(m)

Build a basisWm+1of the Krylov subspaceKm+1such thatBWm=Wm+1Tm+1 Build an orthonormal basisVm+1ofKm+1such thatWm+1=Vm+1Rm+1 get the Arnoldi-like relationsBWm=Vm+1HmandBVm=Vm+1HmRm−1 [Nuentsa Wakam and JE 2013, Hoemmen 2010]

Minimize the residual in the Krylov subspace

x=x0+M−1Vmyimpliesr=r0−BVmy=Vm+1(βe1−HmR−1m y) Solve the least-squares problem:

min

y∈Rmkβe1−HmRm−1yk Alternative to minimize the residual in the Krylov subspace [Imberti and JE 2016]

x=x0+M−1Wmyimpliesr=r0−BWmy=Vm+1(βe1−Hmy) Solve the least-squares problem:

min

y∈Rmkβe1−Hmyk

Restart if not converged

7 / 27

(15)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

m-step GMRES(m)

Building blocks of m-step GMRES(m)

Build a basisWm+1of the Krylov subspaceKm+1such thatBWm=Wm+1Tm+1 Build an orthonormal basisVm+1ofKm+1such thatWm+1=Vm+1Rm+1 get the Arnoldi-like relationsBWm=Vm+1HmandBVm=Vm+1HmRm−1 [Nuentsa Wakam and JE 2013, Hoemmen 2010]

Minimize the residual in the Krylov subspace

x=x0+M−1Vmyimpliesr=r0−BVmy=Vm+1(βe1−HmR−1m y) Solve the least-squares problem:

min

y∈Rmkβe1−HmRm−1yk Alternative to minimize the residual in the Krylov subspace [Imberti and JE 2016]

x=x0+M−1Wmyimpliesr=r0−BWmy=Vm+1(βe1−Hmy) Solve the least-squares problem:

min

y∈Rmkβe1−Hmyk

Restart if not converged

7 / 27

(16)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

m-step GMRES(m)

Building blocks of m-step GMRES(m)

Build a basisWm+1of the Krylov subspaceKm+1such thatBWm=Wm+1Tm+1 Build an orthonormal basisVm+1ofKm+1such thatWm+1=Vm+1Rm+1 get the Arnoldi-like relationsBWm=Vm+1HmandBVm=Vm+1HmRm−1 [Nuentsa Wakam and JE 2013, Hoemmen 2010]

Minimize the residual in the Krylov subspace

x=x0+M−1Vmyimpliesr=r0−BVmy=Vm+1(βe1−HmR−1m y) Solve the least-squares problem:

min

y∈Rmkβe1−HmRm−1yk Alternative to minimize the residual in the Krylov subspace [Imberti and JE 2016]

x=x0+M−1Wmyimpliesr=r0−BWmy=Vm+1(βe1−Hmy) Solve the least-squares problem:

min

y∈Rmkβe1−Hmyk

Restart if not converged

7 / 27

(17)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

m-step GMRES(m)

Building blocks of m-step GMRES(m)

Build a basisWm+1of the Krylov subspaceKm+1such thatBWm=Wm+1Tm+1 Build an orthonormal basisVm+1ofKm+1such thatWm+1=Vm+1Rm+1 get the Arnoldi-like relationsBWm=Vm+1HmandBVm=Vm+1HmRm−1 [Nuentsa Wakam and JE 2013, Hoemmen 2010]

Minimize the residual in the Krylov subspace

x=x0+M−1Vmyimpliesr=r0−BVmy=Vm+1(βe1−HmR−1m y) Solve the least-squares problem:

min

y∈Rmkβe1−HmRm−1yk Alternative to minimize the residual in the Krylov subspace [Imberti and JE 2016]

x=x0+M−1Wmyimpliesr=r0−BWmy=Vm+1(βe1−Hmy) Solve the least-squares problem:

min

y∈Rmkβe1−Hmyk

Restart if not converged

7 / 27

(18)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

m-step GMRES(m)

Building blocks of m-step GMRES(m)

Build a basisWm+1of the Krylov subspaceKm+1such thatBWm=Wm+1Tm+1 Build an orthonormal basisVm+1ofKm+1such thatWm+1=Vm+1Rm+1 get the Arnoldi-like relationsBWm=Vm+1HmandBVm=Vm+1HmRm−1 [Nuentsa Wakam and JE 2013, Hoemmen 2010]

Minimize the residual in the Krylov subspace

x=x0+M−1Vmyimpliesr=r0−BVmy=Vm+1(βe1−HmR−1m y) Solve the least-squares problem:

min

y∈Rmkβe1−HmRm−1yk Alternative to minimize the residual in the Krylov subspace [Imberti and JE 2016]

x=x0+M−1Wmyimpliesr=r0−BWmy=Vm+1(βe1−Hmy) Solve the least-squares problem:

min

y∈Rmkβe1−Hmyk

Restart if not converged

7 / 27

(19)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Choice of the basis W m

Computation of vectorswk Monomial basis:wk+1=Bwk

Newton basis:σk+1wk+1= (B−αk+1I)wk

where the shiftsαkare computed during a preliminary cycle of GMRES(m) and scaling is done withσk

[Reichel 1990, Bai et al 1994, JE 1995, Nuentsa Wakam and JE 2013]

Chebyshev basis[Joubert+Carey 1992, Philippe+Reichel 2012]

Stability and parallelism issues

the condition number ofWmincreases withm Parallel performances increases withm

8 / 27

(20)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

GMRES combined with Domain Decomposition

Main steps with Domain Decomposition preconditioning

Partition the weighted graph of the matrix in parallel with PARMETIS.

Redistribute the matrix and right-hand-side according to the PARMETIS partitioning.

Perform a parallel iterative row and column scaling on the matrix and the right-hand side vector [Amestoy et al 2008].

Define the overlap between the submatrices for the additive Schwarz preconditioner [Cai and Sarkis 1999, Efstathiou and Gander 2003]

MRAS−1 = D X

k=1

(Rk0)T(Aδk)−1Rδk

Setup the submatrices (ILU or LU factorization).

Solve iteratively the preconditioned system using GMRES.

9 / 27

(21)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Deflation strategies

Restarted GMRES(m)

The convergence rate depends on the spectral distribution inB Smallest eigenvalues slow down the convergence

Deflation occurs when the Krylov subspace is large enough With restarting : loss of spectral information, risk of stalling

Accelerating the restarted GMRES[Simoncini and Szyld, 2007]

Approximate the smallest eigenvalues and the associated invariant subspaceUr Explicit deflation technique

[JE et al 1996; Burrage et al 1998; Moriya et al 2000, Nuentsa Wakam et al 2013 ]:

BM¯−1˜x=b

withM¯−1= (In+Ur(|λn|T−1−Ir)UrTandT=UrTBUr Augmented techniques

[Morgan 2000, 2002, Giraud et al 2010, Nuentsa Wakam and JE 2013]

xm∈x0+Km(B,r0)+span{Ur}

10 / 27

(22)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Deflation strategies

Restarted GMRES(m)

The convergence rate depends on the spectral distribution inB Smallest eigenvalues slow down the convergence

Deflation occurs when the Krylov subspace is large enough With restarting : loss of spectral information, risk of stalling

Accelerating the restarted GMRES[Simoncini and Szyld, 2007]

Approximate the smallest eigenvalues and the associated invariant subspaceUr Explicit deflation technique

[JE et al 1996; Burrage et al 1998; Moriya et al 2000, Nuentsa Wakam et al 2013 ]:

BM¯−1˜x=b withM¯−1= (In+Ur(|λn|T−1−Ir)UrTandT=UrTBUr Augmented techniques

[Morgan 2000, 2002, Giraud et al 2010, Nuentsa Wakam and JE 2013]

xm∈x0+Km(B,r0)+span{Ur}

10 / 27

(23)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Experiments with CFD matrices

FLUOREM matrices

in MatrixMarket collection large, sparse, nonsymmetric matrices

linearization of Navier-Stokes: symmetric profile with structured blocks [Nuentsa Wakam+Pacull 2013]

RM07R n= 381,689; nnz=37,464,962

Software

Krylov solvers integrated in Petsc [Nuentsa Wakam 2011]

Schwarz preconditioning combined with GMRES, Newton basis and deflation DGMRES: preconditioning deflation AGMRES: augmented subspace

11 / 27

(24)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Experiments with CFD matrices

FLUOREM matrices

in MatrixMarket collection large, sparse, nonsymmetric matrices

linearization of Navier-Stokes: symmetric profile with structured blocks [Nuentsa Wakam+Pacull 2013]

RM07R n= 381,689; nnz=37,464,962

Software

Krylov solvers integrated in Petsc [Nuentsa Wakam 2011]

Schwarz preconditioning combined with GMRES, Newton basis and deflation DGMRES: preconditioning deflation AGMRES: augmented subspace

11 / 27

(25)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Experiments with CFD matrices

FLUOREM matrices

in MatrixMarket collection large, sparse, nonsymmetric matrices

linearization of Navier-Stokes: symmetric profile with structured blocks [Nuentsa Wakam+Pacull 2013]

RM07R n= 381,689; nnz=37,464,962

Software

Krylov solvers integrated in Petsc [Nuentsa Wakam 2011]

Schwarz preconditioning combined with GMRES, Newton basis and deflation DGMRES: preconditioning deflation AGMRES: augmented subspace

11 / 27

(26)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Convergence with Augmented GMRES (AGMRES)

RM07R: effect of the restarting

1e-10 1e-08 1e-06 0.0001 0.01 1

0 100 200 300 400 500 600 700 800 900

Rel. Res. Norm

Number of Iterations RM07R-AGMRES(m,2)-GMRES(m)

GMRES(32) AGMRES(32,2) GMRES(48) AGMRES(48,2)

RM07R: effect of the number of subdomains

1e-10 1e-08 1e-06 0.0001 0.01 1

0 200 400 600 800 1000

Rel. Res. Norm

Number of Iterations RM07R-GMRES(m)-AGMRES(m,2)-D32-64

GMRES(32)-D32 AGMRES(32,2)-D32 GMRES(32)-D64 AGMRES(32,2)-D64

CPU time on parallel computers

RM07R, n = 381,689, nz = 37,464,962

D GMRES(32) AGMRES(32,r)

ITS Time (s) ITS Time (s)

16 254 379.3 169 224.1

32 886 573.4 212 91.41

64 - - 287 62.39

12 / 27

(27)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Parallel CPU Time with AGMRES

Convection-Diffusion test cases

3DCONSKY 121 : size = 1,771,561; nonzeros = 50,178,241 3DCOSKY 161 : size= 4,173,281; nonzeros = 118,645,121 [Nuentsa Wakam and JE 2013]

3DCONSKY 121

16 32 64 128 256

0 5 10 15 20 25 30 35 40

Subdomains

CPU Time

3DCONSKY_121 GMRES(16) AGMRES(16,1)

3DCONSKY 161

16 32 64 96 128 256

0 20 40 60 80 100 120 140 160 180 200

Subdomains

CPU Time

3DCONSKY_161 GMRES(16) AGMRES(16,1)

13 / 27

(28)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Outline

1

Krylov subspace algorithm GMRES

2

m-step preconditioned and deflated GMRES

3

Variable s-step algorithm VGMRES(m,s)

14 / 27

(29)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Variable s-step VGMRES(m,s)

Variablesstep GMRES: VGMRES(m,s)

Variable block sizesjand Krylov sizel1=s1,lj=lj−1+sj,j≥2 [Imberti and JE 2016]

Build a basisWlj of the Krylov subspaceKljfor 1≤j≤J Build an orthonormal basisVlj+1of the Krylov subspaceKlj+1 get the Arnoldi-like relationBWlj=Vlj+1Hlj

Minimize the residual in the Krylov subspace

x=x0+M−1Wljyimpliesr=r0−BWljy=Vlj+1(βe1−Hljy) Solve the least-squares problem:

min

y∈Rljkβe1−Hljyk

Test convergence at each stepj

Restart if not converged at stepJwithlJ=m

x0=x0+M−1Wmym

15 / 27

(30)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Variable s-step VGMRES(m,s)

Variablesstep GMRES: VGMRES(m,s)

Variable block sizesjand Krylov sizel1=s1,lj=lj−1+sj,j≥2 [Imberti and JE 2016]

Build a basisWlj of the Krylov subspaceKljfor 1≤j≤J Build an orthonormal basisVlj+1of the Krylov subspaceKlj+1 get the Arnoldi-like relationBWlj=Vlj+1Hlj

Minimize the residual in the Krylov subspace

x=x0+M−1Wljyimpliesr=r0−BWljy=Vlj+1(βe1−Hljy) Solve the least-squares problem:

min

y∈Rljkβe1−Hljyk

Test convergence at each stepj

Restart if not converged at stepJwithlJ=m

x0=x0+M−1Wmym

15 / 27

(31)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Variable s-step VGMRES(m,s)

Variablesstep GMRES: VGMRES(m,s)

Variable block sizesjand Krylov sizel1=s1,lj=lj−1+sj,j≥2 [Imberti and JE 2016]

Build a basisWlj of the Krylov subspaceKljfor 1≤j≤J Build an orthonormal basisVlj+1of the Krylov subspaceKlj+1 get the Arnoldi-like relationBWlj=Vlj+1Hlj

Minimize the residual in the Krylov subspace

x=x0+M−1Wljyimpliesr=r0−BWljy=Vlj+1(βe1−Hljy) Solve the least-squares problem:

min

y∈Rljkβe1−Hljyk

Test convergence at each stepj

Restart if not converged at stepJwithlJ=m

x0=x0+M−1Wmym

15 / 27

(32)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Variable s-step VGMRES(m,s)

Variablesstep GMRES: VGMRES(m,s)

Variable block sizesjand Krylov sizel1=s1,lj=lj−1+sj,j≥2 [Imberti and JE 2016]

Build a basisWlj of the Krylov subspaceKljfor 1≤j≤J Build an orthonormal basisVlj+1of the Krylov subspaceKlj+1 get the Arnoldi-like relationBWlj=Vlj+1Hlj

Minimize the residual in the Krylov subspace

x=x0+M−1Wljyimpliesr=r0−BWljy=Vlj+1(βe1−Hljy) Solve the least-squares problem:

min

y∈Rljkβe1−Hljyk

Test convergence at each stepj

Restart if not converged at stepJwithlJ=m

x0=x0+M−1Wmym

15 / 27

(33)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Variable s-step VGMRES(m,s)

Variablesstep GMRES: VGMRES(m,s)

Variable block sizesjand Krylov sizel1=s1,lj=lj−1+sj,j≥2 [Imberti and JE 2016]

Build a basisWlj of the Krylov subspaceKljfor 1≤j≤J Build an orthonormal basisVlj+1of the Krylov subspaceKlj+1 get the Arnoldi-like relationBWlj=Vlj+1Hlj

Minimize the residual in the Krylov subspace

x=x0+M−1Wljyimpliesr=r0−BWljy=Vlj+1(βe1−Hljy) Solve the least-squares problem:

min

y∈Rljkβe1−Hljyk

Test convergence at each stepj

Restart if not converged at stepJwithlJ=m

x0=x0+M−1Wmym

15 / 27

(34)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Block computation and orthogonalization with VGMRES(m,s)

Stepjof VGMRES(m,s)

Initialization:r0=b−Ax0, β=kr0k,v1=r0/β,W0=∅,V1= [v1] Blockjof sizesj, with 1≤j≤J

with an adaptive choice ofsjsuch thatsj≤s First step:sjmatrix vector products

Compute the blockCjas a basis ofKsj(B,u) withu=vlj−1 +1 Compute the blockBCj

Parallel preconditioningt=M−1uthen parallel matrix-vector productAt Define the Krylov basis by

Wlj=h Wlj−1,Cji Second step: orthogonalization

BCj=Vlj+1Sj RODDEC[Sidje 1997, JE 1995]or TSQR[Demmel et al 2011]

By induction, get the Arnoldi-like relation

h v1,BWlji

=Vlj+1Rlj+1,BWlj=Vlj+1Hlj

16 / 27

(35)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Block computation and orthogonalization with VGMRES(m,s)

Stepjof VGMRES(m,s)

Initialization:r0=b−Ax0, β=kr0k,v1=r0/β,W0=∅,V1= [v1] Blockjof sizesj, with 1≤j≤J

with an adaptive choice ofsjsuch thatsj≤s First step:sjmatrix vector products

Compute the blockCjas a basis ofKsj(B,u) withu=vlj−1 +1 Compute the blockBCj

Parallel preconditioningt=M−1uthen parallel matrix-vector productAt Define the Krylov basis by

Wlj=h Wlj−1,Cji Second step: orthogonalization

BCj=Vlj+1Sj RODDEC[Sidje 1997, JE 1995]or TSQR[Demmel et al 2011]

By induction, get the Arnoldi-like relation

h v1,BWlji

=Vlj+1Rlj+1,BWlj=Vlj+1Hlj

16 / 27

(36)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Block computation and orthogonalization with VGMRES(m,s)

Stepjof VGMRES(m,s)

Initialization:r0=b−Ax0, β=kr0k,v1=r0/β,W0=∅,V1= [v1] Blockjof sizesj, with 1≤j≤J

with an adaptive choice ofsjsuch thatsj≤s First step:sjmatrix vector products

Compute the blockCjas a basis ofKsj(B,u) withu=vlj−1 +1 Compute the blockBCj

Parallel preconditioningt=M−1uthen parallel matrix-vector productAt Define the Krylov basis by

Wlj=h Wlj−1,Cji Second step: orthogonalization

BCj=Vlj+1Sj RODDEC[Sidje 1997, JE 1995]or TSQR[Demmel et al 2011]

By induction, get the Arnoldi-like relation

h v1,BWlji

=Vlj+1Rlj+1,BWlj=Vlj+1Hlj

16 / 27

(37)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Block computation and orthogonalization with VGMRES(m,s)

Stepjof VGMRES(m,s)

Initialization:r0=b−Ax0, β=kr0k,v1=r0/β,W0=∅,V1= [v1] Blockjof sizesj, with 1≤j≤J

with an adaptive choice ofsjsuch thatsj≤s First step:sjmatrix vector products

Compute the blockCjas a basis ofKsj(B,u) withu=vlj−1 +1 Compute the blockBCj

Parallel preconditioningt=M−1uthen parallel matrix-vector productAt Define the Krylov basis by

Wlj=h Wlj−1,Cji Second step: orthogonalization

BCj=Vlj+1Sj RODDEC[Sidje 1997, JE 1995]or TSQR[Demmel et al 2011]

By induction, get the Arnoldi-like relation

h v1,BWlji

=Vlj+1Rlj+1,BWlj=Vlj+1Hlj

16 / 27

(38)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Stability issues

Condition number ofBCjandBWlj block basis

Wlj=

C1,C2, . . . ,Cj κ(BWlj)≥ max

1≤k≤jκ(BCk) symmetric case: exponential growth of the condition number of a Krylov basis [Beckermann 2000]

nonsymmetric case with|λ1|>|λ2| monomial basisCj={u,Bu, . . . ,Bsj−1u}

[Imberti and JE 2016]

κ(BCj)≥cste|λ12|sj−1

Choice of the block sizesj

Objective: small condition numbers of the first blocks fixed sequencesj=s: SGMRES(m,s) [Hoemmen 2010, ...]

adaptive increasing sequencesjwithsj≤s: VGMRES(m,s) [Imberti and JE 2016]

17 / 27

(39)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Stability issues

Condition number ofBCjandBWlj block basis

Wlj=

C1,C2, . . . ,Cj κ(BWlj)≥ max

1≤k≤jκ(BCk) symmetric case: exponential growth of the condition number of a Krylov basis [Beckermann 2000]

nonsymmetric case with|λ1|>|λ2| monomial basisCj={u,Bu, . . . ,Bsj−1u}

[Imberti and JE 2016]

κ(BCj)≥cste|λ12|sj−1

Choice of the block sizesj

Objective: small condition numbers of the first blocks fixed sequencesj=s: SGMRES(m,s) [Hoemmen 2010, ...]

adaptive increasing sequencesjwithsj≤s: VGMRES(m,s) [Imberti and JE 2016]

17 / 27

(40)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Communication issues

The number of messages is related to the number of stepsJ Objective: reduce the number of steps

Number of steps

SGMRES(m,s):J=m/ssteps

VGMRES(m,s):Jsteps withlJ=mandJ=J1+J2 FibGMRES(m,s):Fibonacci sequence capped ats

J2=O(m/s),J1=O(logφ(s))

18 / 27

(41)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Communication issues

The number of messages is related to the number of stepsJ Objective: reduce the number of steps

Number of steps

SGMRES(m,s):J=m/ssteps

VGMRES(m,s):Jsteps withlJ=mandJ=J1+J2 FibGMRES(m,s):Fibonacci sequence capped ats

J2=O(m/s),J1=O(logφ(s))

18 / 27

(42)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

FibGMRES(m,s) and Reverse FibGMRES(m,s)

Sequences with m = 48 and s = 16

j 1 2 3 4 5 6 7

s

j

1 2 3 5 8 13 16 l

j

1 3 6 11 19 32 48

Variable increasing block size for m = 48 and s = 16.

j 1 2 3 4 5 6 7

s

j

16 13 8 5 3 2 1

l

j

16 29 37 42 45 47 48

Variable decreasing block size for m = 48 and s = 16.

Numerical experiments done with a monomial basis

19 / 27

(43)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Numerical experiment with a small nonsymmetric matrix

Nonsymmetric matrix CAGE10 of size n = 11397 and nonzeros nz = 150645 Convergence curves with m = 48 and s = 16

20 / 27

(44)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Numerical experiment with a small nonsymmetric matrix

Nonsymmetric matrix CAGE10 of size n = 11397 and nonzeros nz = 150645 Condition numbers of AW with m = 48 and s = 16

21 / 27

(45)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Numerical experiment with a large nonsymmetric matrix

Sequences of FibGMRES(m,s) with m = 96

j 1 2 3 4 5 6 7 8 9 10

s

j

1 2 3 5 8 13 16 16 16 16 l

j

1 3 6 11 19 32 48 64 80 96

Variable increasing block size for m = 96 and s = 16

j 1 2 3 4 5 6 7 8 9

s

j

1 2 3 5 8 13 14 18 32 l

j

1 3 6 11 19 32 46 64 96

Variable increasing block size for m = 96 and s = 32

22 / 27

(46)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Numerical experiment with a large nonsymmetric matrix

Nonsymmetric matrix (PR02R + 1000 I) with n = 161070 and nz = 8185136 Convergence curves with m = 96 and s = 16 or s = 32

23 / 27

(47)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Numerical experiment with a large nonsymmetric matrix

Nonsymmetric matrix (PR02R + 1000 I) with n = 161070 and nz = 8185136 Condition numbers of AW with m = 96 and s = 16 or s = 32

24 / 27

(48)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Numerical experiment with a large restarting parameter

Nonsymmetric matrix (PR02R + 1000 I) with n = 161070 and nz = 8185136 Convergence curves with m = 192 and s = 16 or s = 32

25 / 27

(49)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Numerical experiment with a large restarting parameter

Nonsymmetric matrix (PR02R + 1000 I) with n = 161070 and nz = 8185136 Condition numbers of AW with m = 192 and s = 16 or s = 32

26 / 27

(50)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Conclusion

DGMRES(m,r) and AGMRES(m,r)

Combined DD preconditioning with m-step GMRES and deflation

Parallel efficiency and fast convergence with a large number of subdomains Difficult choice of the restarting parameter m

VGMRES(m,s) and FibGMRES(m,s)

s -step algorithm combined with a variable block size s

j

Relationship between convergence rate and condition numbers of the blocks

Numerical experiments with a Fibonacci sequence

Future work

Adaptive variation of s

Blocks computed via a Newton basis Schwarz preconditioning with deflation Parallel computations

27 / 27

(51)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Conclusion

DGMRES(m,r) and AGMRES(m,r)

Combined DD preconditioning with m-step GMRES and deflation

Parallel efficiency and fast convergence with a large number of subdomains Difficult choice of the restarting parameter m

VGMRES(m,s) and FibGMRES(m,s)

s -step algorithm combined with a variable block size s

j

Relationship between convergence rate and condition numbers of the blocks

Numerical experiments with a Fibonacci sequence

Future work

Adaptive variation of s

Blocks computed via a Newton basis Schwarz preconditioning with deflation Parallel computations

27 / 27

(52)

FibGMRES

DI & JE

&DNW

GMRES DGMRES and AGMRES VGMRES

Conclusion

DGMRES(m,r) and AGMRES(m,r)

Combined DD preconditioning with m-step GMRES and deflation

Parallel efficiency and fast convergence with a large number of subdomains Difficult choice of the restarting parameter m

VGMRES(m,s) and FibGMRES(m,s)

s -step algorithm combined with a variable block size s

j

Relationship between convergence rate and condition numbers of the blocks

Numerical experiments with a Fibonacci sequence

Future work

Adaptive variation of s

Blocks computed via a Newton basis Schwarz preconditioning with deflation Parallel computations

27 / 27

Références

Documents relatifs

In polymer electrolyte membrane (PEM) fuel cells, the most practical catalysts, which are used to catalyze the slow oxygen reduction reaction (ORR) at the cathode and the

Today, hot embossing and injection molding belong to the established plastic molding processes in microengineering. Based on experimental findings, a variety of microstructures

Tasks to parallelize inside the subdomain are mainly the solution of linear systems induced by the application of the preconditioner M − 1. As those systems should be solved

The new Cooperation Strategy 2013-2016 is rooted in the spirit of the ag- reements of cooperation between the governments of Switzerland and Macedonia and was developed by the

As noted earlier, the two main sticky fault models used in this study are: first an adapted version of the model presented in [11], referred to as “Numerical Soft Fault Model”

Unit´e de recherche INRIA Rocquencourt, Domaine de Voluceau, Rocquencourt, BP 105, 78153 LE CHESNAY Cedex Unit´e de recherche INRIA Sophia-Antipolis, 2004 route des Lucioles, BP

Unit´e de recherche INRIA Rennes, Irisa, Campus universitaire de Beaulieu, 35042 RENNES Cedex Unit´e de recherche INRIA Rhˆone-Alpes, 655, avenue de l’Europe, 38330 MONTBONNOT ST

Using substructuring techniques from domain decomposition, we then show that the methods of reflections are classical block Jacobi and block Gauss-Seidel methods for the