HAL Id: hal-01400663
https://hal.inria.fr/hal-01400663
Submitted on 22 Nov 2016
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Comparison of solvers performance when solving the 3D
Helmholtz elastic wave equations over the Hybridizable
Discontinuous Galerkin method
Marie Bonnasse-Gahot, Henri Calandra, Julien Diaz, Stephane Lanteri
To cite this version:
Marie Bonnasse-Gahot, Henri Calandra, Julien Diaz, Stephane Lanteri. Comparison of solvers performance when solving the 3D Helmholtz elastic wave equations over the Hybridizable Discontinuous Galerkin method. MATHIAS – TOTAL Symposium on Mathematics, Oct 2016, Paris, France. hal-01400663
Comparison of solvers performance when solving
the 3D Helmholtz elastic wave equations over
the Hybridizable Discontinuous Galerkin method
M. Bonnasse-Gahot1,2, H. Calandra3, J. Diaz1 and S. Lanteri2
1 INRIA Bordeaux Sud-Ouest, project-team Magique 3D
2 INRIA Sophia Antipolis-Méditerranée, project-team Nachos
3 TOTAL Exploration-Production
Motivations
Imaging methods
I Full Waveform Inversion (FWI): an inversion process requiring the solution of many forward problems
Seismic imaging: time domain or harmonic domain?
I Time domain: complicated imaging condition, but quite low computational cost
I Harmonic domain: simple imaging condition, but huge computational cost
[Figure: memory usage]
Motivations
Resolution of the forward problem of the inversion process
I Elastic wave propagation in the frequency domain: Helmholtz equation
First-order formulation of the Helmholtz wave equations, for x = (x, y, z) ∈ Ω ⊂ R³:
  iω ρ(x) v(x) = ∇·σ(x) + f_s(x)
  iω σ(x) = C(x) ε(v(x))
I v : velocity vector
I σ : stress tensor
I ε : strain tensor
I ρ : density, C : elasticity tensor, f_s : source term, ω : angular frequency
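For reference, eliminating the stress unknown σ from this first-order system recovers the second-order (Helmholtz) form of the elastic wave equation; ε(v) denotes the symmetrized velocity gradient. This elimination is a standard identity, written out here only for completeness:

\[
\omega^{2}\rho(\mathbf{x})\,\mathbf{v}(\mathbf{x})
+ \nabla\cdot\bigl(C(\mathbf{x})\,\varepsilon(\mathbf{v}(\mathbf{x}))\bigr)
= -\,i\omega\,\mathbf{f}_s(\mathbf{x}),
\qquad
\varepsilon(\mathbf{v}) = \tfrac{1}{2}\bigl(\nabla\mathbf{v} + \nabla\mathbf{v}^{T}\bigr).
\]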
Approximation methods
Discontinuous Galerkin (DG) methods
✓ unstructured tetrahedral meshes
✓ combination of the finite element method (FEM) and the finite volume method (FVM)
✓ hp-adaptivity
✓ easily parallelizable method
✗ large number of degrees of freedom (DOF) compared to classical FEM
[Figure: degrees of freedom of a classical DG discretization]
Approximation methods
Hybridizable Discontinuous Galerkin (HDG) methods
✓ same advantages as DG methods: unstructured tetrahedral meshes, hp-adaptivity, easily parallelizable method, discontinuous basis functions
✓ introduction of a new variable defined only on the element interfaces
✓ lower number of coupled DOF than classical DG methods (a rough count is sketched below)
[Figure: degrees of freedom of an HDG discretization]
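To make the coupled-DOF argument concrete, the snippet below gives a rough count of globally coupled unknowns for the 3D elastic problem (9 volume fields: 3 velocity and 6 stress components) on a tetrahedral mesh. It assumes, as an illustration only, that the HDG hybrid unknown carries 3 components per face and that a large tetrahedral mesh has roughly 2 faces per element; the function names and mesh size are not taken from the slides.

def dg_coupled_dof(n_elem, order, n_fields=9):
    # P_k tetrahedron: (k+1)(k+2)(k+3)/6 nodes, all volume unknowns globally coupled
    nodes_per_tet = (order + 1) * (order + 2) * (order + 3) // 6
    return n_fields * nodes_per_tet * n_elem

def hdg_coupled_dof(n_elem, order, n_trace_fields=3, faces_per_elem=2.0):
    # P_k triangular face: (k+1)(k+2)/2 nodes, only trace unknowns globally coupled
    nodes_per_face = (order + 1) * (order + 2) // 2
    return int(n_trace_fields * nodes_per_face * faces_per_elem * n_elem)

print(dg_coupled_dof(21_000, 3))    # ~3.8 million coupled DOF for classical DG, P3
print(hdg_coupled_dof(21_000, 3))   # ~1.3 million trace DOF, close to the matrix
                                    # order quoted later for the 3D plane-wave test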
Principles of the HDG method
1. Introduction of a Lagrange multiplier λ defined on the faces of the mesh
2. Expression of the initial unknowns W^K = (v^K, σ^K)^T of each element K as a function of λ = (λ^{F_1}, λ^{F_2}, ..., λ^{F_nf})^T : W^K = K^K λ
3. Transmission condition : Σ_K B^K W^K + L^K λ = 0
4. Global linear system on the trace unknowns : M λ = S
5. Computation of the solutions of the initial problem, element by element (a structural sketch of these steps is given below)
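The following is a minimal, purely structural sketch (not the authors' code) of steps 2–5 above: local elimination of W^K, assembly of the global trace system M λ = S, a sparse solve, and element-by-element reconstruction. The local matrices (A_K, B_K, C_K, L_K), their sizes and the toy face connectivity are synthetic placeholders.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
n_elem, dof_w, dof_face = 8, 6, 2                   # toy sizes (placeholders)
n_faces = n_elem + 1
elem_faces = [(k, k + 1) for k in range(n_elem)]    # toy face connectivity
n_trace = dof_face * n_faces

rows, cols, vals = [], [], []
S = np.zeros(n_trace)
local = []                                          # data kept for step 5
for k in range(n_elem):
    idx = np.concatenate([np.arange(f * dof_face, (f + 1) * dof_face)
                          for f in elem_faces[k]])
    # Synthetic local matrices of the element problem  A_K W^K + C_K lambda_K = s_K
    A_K = rng.standard_normal((dof_w, dof_w)) + dof_w * np.eye(dof_w)
    C_K = rng.standard_normal((dof_w, idx.size))
    B_K = rng.standard_normal((idx.size, dof_w))
    L_K = rng.standard_normal((idx.size, idx.size))
    s_K = rng.standard_normal(dof_w)
    # Step 2: express W^K as a function of lambda (local elimination)
    Ainv_C = np.linalg.solve(A_K, C_K)
    Ainv_s = np.linalg.solve(A_K, s_K)
    # Step 3: transmission condition -> element contribution to M and S
    M_K = L_K - B_K @ Ainv_C
    S_K = -B_K @ Ainv_s
    for i, gi in enumerate(idx):
        S[gi] += S_K[i]
        for j, gj in enumerate(idx):
            rows.append(gi); cols.append(gj); vals.append(M_K[i, j])
    local.append((idx, Ainv_C, Ainv_s))

# Step 4: global trace system M lambda = S (duplicate entries are summed on assembly)
M = sp.csc_matrix((vals, (rows, cols)), shape=(n_trace, n_trace))
lam = spla.spsolve(M, S)

# Step 5: element-by-element reconstruction of W^K = (v^K, sigma^K)
W = [Ainv_s - Ainv_C @ lam[idx] for idx, Ainv_C, Ainv_s in local]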
2D numerical results: performance comparison of the HDG method
Anisotropic test case
I Three meshes :
I 600 elements
I 3000 elements
I 28000 elements
Anisotropic case: memory consumption
[Figure: memory (MB) for meshes M1, M2, M3, P3 interpolation order; HDGm vs. IPDGm]
Anisotropic case: CPU time (s)
[Figure: CPU time (s) for meshes M1, M2, M3, P3 interpolation order; HDGm and IPDGm construction and resolution times]
IPDG CPU time ≈ 9 × HDG CPU time
3D numerical results: focus on the linear solver
I 3D plane wave in a homogeneous medium
I 3D geophysical test case: Epati test case
Resolution of the linear system M λ = S
I direct solver: MUMPS (MUltifrontal Massively Parallel sparse direct Solver)
  I direct factorization A = LU or A = LDL^T
  I multiple right-hand sides (RHS) with a single factorization (usage pattern sketched below)
I hybrid solver: MaPhys (Massively Parallel Hybrid Solver)
  I combination of direct and iterative methods
  I non-overlapping algebraic domain decomposition method (Schur complement method)
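MUMPS and MaPhys are driven through their own MPI-based interfaces; as a stand-in, the sketch below only mirrors the usage pattern that matters here, namely factorizing M once and reusing the factorization for several right-hand sides (one per seismic source), with SciPy's sparse LU. The matrix and right-hand sides are synthetic.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Synthetic sparse matrix standing in for the HDG trace matrix M
n = 2000
A = sp.random(n, n, density=1e-3, random_state=0)
M = (A + A.T + 10.0 * sp.identity(n)).tocsc()

lu = spla.splu(M)                                        # factorize once (LU)
rhs = np.random.default_rng(1).standard_normal((n, 4))   # 4 sources = 4 right-hand sides
lam = lu.solve(rhs)                                      # reuse the factorization for all RHS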
Cluster configuration
Features of the nodes:
I 2 dodeca-core Haswell Intel Xeon E5-2680 processors
I Frequency: 2.5 GHz
I RAM: 128 GB
I Storage: 500 GB
I InfiniBand QDR TrueScale: 40 Gb/s
3D plane wave in a homogeneous medium
[Figure: computational domain Ω, a 1000 m × 1000 m × 1000 m cube]
I Physical parameters: ρ = 1 kg·m⁻³, λ = 16 GPa, µ = 8 GPa
I Plane wave: u = ∇ e^{i(k_x x + k_y y + k_z z)}
I ω = 2πf, f = 8 Hz
I Mesh: 21,000 elements
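As a quick sanity check of the setup, the standard isotropic formulas give the P- and S-wave speeds and the wavenumbers from ρ, λ, µ and f. The parameter values below follow the slide and are presumably in normalized units; the propagation direction is an arbitrary choice for illustration.

import numpy as np

rho, lam, mu, f = 1.0, 16.0, 8.0, 8.0
omega = 2.0 * np.pi * f
v_p = np.sqrt((lam + 2.0 * mu) / rho)        # P-wave speed
v_s = np.sqrt(mu / rho)                      # S-wave speed
k_p, k_s = omega / v_p, omega / v_s          # wavenumbers

d = np.array([1.0, 0.0, 0.0])                # propagation direction (arbitrary)
k = k_p * d                                  # P-wave case (u = grad of a scalar is irrotational)
u = lambda x: 1j * k * np.exp(1j * (k @ x))  # u = grad(exp(i k.x)) = i k exp(i k.x)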
3D plane wave: memory consumption
[Figure: average memory per node (GB) vs. number of nodes (2, 4, 8, 16); 8 MPI processes per node, 3 threads per MPI process]
I MaPhys: slope = 2.3
I MUMPS: slope = 1.8
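The "slope" values in these legends are presumably obtained by a least-squares fit of the measurements against the number of nodes in log-log scale; that assumption about how they were computed is mine, not the slides'. The sketch below shows such a fit on placeholder values, not the measured data.

import numpy as np

nodes = np.array([2, 4, 8, 16])
memory_gb = np.array([60.0, 34.0, 20.0, 12.0])    # placeholder values, not measured data
slope, intercept = np.polyfit(np.log(nodes), np.log(memory_gb), 1)
print(f"fitted log-log slope: {slope:.2f}")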
3D plane wave: execution time
[Figure: execution time (s) for the resolution of the HDG-P3 system vs. number of cores (48 to 384)]
I MaPhys, 8 MPI processes and 3 threads per process: slope = 0.8
I MUMPS, 8 MPI processes and 3 threads per process
(matrix order = 1,300,000; number of non-zeros = 300,000,000)
Epati test case
[Figure: Vp velocity model (m·s⁻¹), vertical section at y = 700 m]
Mesh composed of 25,000 tetrahedra
Epati test case: memory consumption
[Figure: average memory per node (GB) vs. number of nodes (4, 8, 16); 8 MPI processes per node, 3 threads per MPI process]
I MaPhys: slope = 2.3
I MUMPS: slope = 1.8
Epati test case: execution time
[Figure: execution time (s) for the resolution of the HDG-P3 system vs. number of cores (96 to 384)]
I MaPhys, 8 MPI processes and 3 threads per process: slope = 1.3
I MUMPS, 8 MPI processes and 3 threads per process: slope = 0.5
(matrix order = 1,300,000; number of non-zeros = 365,000,000)
Conclusions and perspectives
I more detailed analysis of the comparison between solvers
I larger meshes
I more powerful clusters
I memory crash test