
Hybridization of FETI Methods


HAL Id: tel-01820609

https://tel.archives-ouvertes.fr/tel-01820609

Submitted on 22 Jun 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Roberto Molina-Sepulveda

To cite this version:

Roberto Molina-Sepulveda. Hybridization of FETI Methods. General Mathematics [math.GM]. Université Pierre et Marie Curie - Paris VI, 2017. English. NNT: 2017PA066455. tel-01820609

(2)

Doctoral School: ED Sciences Mathématiques Paris Centre
University Department: LJLL Laboratoire Jacques-Louis Lions

Thesis defended by Roberto Molina

Defended on 1st September, 2017

In order to become Doctor from Laboratoire Jacques-Louis Lions

Academic Field Applied Mathematics

Speciality Numerical Analysis

Thesis Title

Hybridization of FETI Methods

Thesis supervised by François-Xavier Roux

Committee members




Opinions expressed in the theses must be considered to be those of their authors.



LJLL Laboratoire Jacques-Louis Lions
Université Pierre et Marie Curie
4 place Jussieu, Boîte courrier 187
75252 Paris Cedex 05, France
Tel.: (33)(0)1 44 27 42 98
Fax: (33)(0)1 44 27 72 00


Hybridization of FETI Methods

Abstract

In this work, new domain decomposition methods and new implementations for existing methods are developed. A new method based on previous domain decomposition methods is formulated: the classic FETI [30] and FETI-2LM [35] methods are used to build the new Hybrid-FETI. The basic idea is to develop a new algorithm that can use both methods at the same time, choosing on each interface the most suitable condition depending on the characteristics of the problem. By doing this, we seek a faster and more robust code that can handle configurations the base methods alone would not handle optimally. The performance is tested on a contact problem. The following part involves the development of a new implementation for the S-FETI method [39]; the idea is to reduce the memory usage of this method, to make it able to work on larger problems. Different variations of this method are also proposed, all aiming to reduce the number of search directions stored at each iteration of the iterative method. Finally, an extension of the FETI-2LM method to its block version, as in S-FETI, is developed. Numerical results for the different algorithms are presented.

Keywords: numerical analysis, domain decomposition methods, algebra, scientific computation



Thanks to my family, friends and people from the University.


Abstract xv

Acknowledgements xvii

Contents xix

List of Tables xxiii

List of Figures xxv

Introduction 1

Contributions of this thesis . . . 2

Iterative methods . . . 3

Krylov methods . . . 3

Conjugate gradient . . . 10

ORTHODIR . . . 13

Parallelization of Krylov methods . . . 15

1 Hybrid FETI method 17
1.1 Basic FETI method . . . 18

1.1.1 Model problem and discretization . . . 18

1.1.2 FETI one Lagrange multiplier . . . 25

1.1.3 Local preconditioner . . . 30

1.1.4 FETI Algorithms . . . 33

1.2 FETI with two Lagrange multipliers . . . 35

1.2.1 FETI-2LM . . . 35

1.2.2 Arbitrary mesh partition . . . 38

1.2.3 Optimal Interface Boundary Conditions . . . 41

1.3 New FETI as a hybrid between one and two Lagrange methods . . . 48
1.3.1 Development . . . 48

1.3.2 Extension to a general problem . . . 56


1.3.3 Preconditioner . . . 58

1.3.4 Implementation . . . 61

1.4 Numerical results . . . 65

1.4.1 Two material bar . . . 66

1.4.2 Contact Problem . . . 71

1.5 Conclusion . . . 80

2 Block FETI methods 81
2.1 Introduction and preliminaries . . . 82

2.1.1 Dirichlet preconditioner for two subdomains . . . 82

2.1.2 Consistent preconditioners . . . 84

2.1.3 Simultaneous FETI . . . 98

2.1.4 The Algorithm . . . 101

2.1.5 Cost and implementation of S-FETI . . . 110

2.2 Sorting search directions in S-FETI . . . 119

2.2.1 Linear dependence in block FETI directions . . . 120

2.2.2 Cholesky factorization with complete pivoting . . . 121

2.2.3 Diagonalization of search directions block . . . 123

2.3 Memory usage in S-FETI . . . 125

2.3.1 New sparse storage . . . 126

2.3.2 Reconstruction of search directions . . . 126

2.3.3 Implementation details and exploitable parallelism . . 136

2.4 Numerical results . . . 142

2.4.1 FETI and S-FETI . . . 143

2.4.2 Decomposition of WtFW . . . 146

2.4.3 S-FETI with sparse storage . . . 149

2.4.4 General comparison . . . 152

2.5 Conclusion . . . 155

3 Block strategies as a preconditioner 157
3.1 Introduction and preliminaries . . . 158

3.1.1 Method of Conjugate Directions . . . 158

3.1.2 Flexible Conjugate Gradient . . . 162

3.2 FETI with recursive preconditioner . . . 164

3.2.1 One direction from S-FETI block . . . 164

3.2.2 Linear combination of directions from block . . . 168

3.3 Numerical results . . . 171

3.3.1 Storage of single direction . . . 171

3.3.2 Storage of reduced directions . . . 172


4 FETI-2LM with enlarged search space 175
4.1 Introduction . . . 176
4.1.1 The FETI-2LM method . . . 176
4.2 The Block-2LM Algorithm . . . 179
4.3 Implementation and cost of the method . . . 182
4.4 Numerical results . . . 186
4.4.1 Block-2LM vs 2LM . . . 186
4.5 Conclusion . . . 188

Conclusion and perspectives 191

Bibliography 195


1.1 Convergence of Hybrid-FETI preconditioner (Number of iterations) . . . 67

1.2 Convergence of different marking Hybrid-FETI (Number of iterations) . . . 70

1.3 Convergence of the different methods (Number of iterations) . 70

1.4 Convergence of the three contact examples for the different FETI methods (Number of iterations) . . . 79
2.1 Time for forward-backward substitution on multi-core processor . . . 115
2.2 Direction stored in a 5x5x5 cube configuration . . . 125
2.3 Difference in subdomain division versus local interface division . . . 145
2.4 Quadruple vs Double comparison . . . 147
2.5 Iterations/Search Directions results for different values of ε . . . 149
2.6 SPARSE vs SPARSE-OPT . . . 150
2.7 FULL vs SPARSE-OPT . . . 152
2.8 Comparative between variations of S-FETI with corner interfaces. 125 subdomains and 150 thousand elements per subdomain . . . 154
2.9 Comparative between variations of S-FETI without corner interfaces. 125 subdomains and 150 thousand elements per subdomain . . . 154
3.1 Convergence comparative . . . 172
4.1 64 subdomains . . . 187
4.2 125 subdomains . . . 187


1.1 Two subdomain splitting . . . 18

1.2 Two subdomain divisions with duplicated nodes. . . 22

1.3 Multiple interface node . . . 26
1.4 Γj division example and crosspoint detail . . . 39
1.5 One way splitting . . . 41
1.6 Nodes numbering in subdomain Ω(s) . . . 47
1.7 Interface patch of size p = 1 (red) with one and two layers d = 1, 2 (blue) . . . 48
1.8 A contact problem . . . 49
1.9 Three subdomain divisions . . . 50
1.10 Boundary conditions for both preconditioners . . . 60
1.11 Two material bar . . . 66
1.12 Hybrid-FETI Iterations versus Elements number . . . 68
1.13 a) Regular interface marking. b) Extra covering marking . . . 69
1.14 a) Contact problem. b) Initial gap. c) Contact pressure . . . 73
1.15 First example, initial configuration and subdomain division . . . 77
1.16 First example, final configuration (solution) . . . 78
1.17 Second example, initial configuration . . . 79
2.1 Preconditioner construction . . . 88
2.2 4 Subdomain problem . . . 91
2.3 Local interfaces with completely different subdomains (Young's modulus 0.1 ≤ E ≤ 1000) . . . 106

2.4 Three subdomain subdivision and computed corrections . . . . 108

2.5 Subdomain point of view (Left): In red the coarse modes describing Z or G owned by subdomain Ω(s), in black the interfaces where F times the red modes is also non null. Interface point of view (Right): In dotted lines the coarse modes describing FZ or FAG owned by local interface Γ(sq) between subdomains Ω(s) and Ω(q), in black the modes where, due to Γ(sq), Z^t(FZ) is non null . . . 118
2.6 One-way split of six subdomains . . . 137


2.7 Cube with 125 subdomains . . . 144
2.8 Time of Pardiso vs Dissection for different element number in each subdomain . . . 145
2.9 Checkerboard cube with 125 subdomains . . . 146
2.10 Max local memory usage in cube problem for 75 (left) and 300 (right) thousand elements per subdomain . . . 151
3.1 Iterations versus percentage of computed directions. Example 1 . . . 173

4.1 Two subdomains with Robin Interface Condition . . . 176

4.2 Left: In red the coarse modes describing Z owned by subdomain Ω(s). In dotted lines, the modes shared between Ω(s) and the subdomains involved in multiplying by the FETI-2LM operator. Right: In dotted lines the coarse modes describing FZ owned by the local interface of Ω(s), and conversely they are the non null modes in the interface in red . . . 185


In the last decades, thanks to increasing computational power, faster, more robust and more accurate algorithms have been developed to solve numerically a large variety of problems modeled by Partial Differential Equations (PDE). The use of multiple processors to speed up calculations has led to the search for parallel strategies that profit from these new computer architectures. Different parallel iterative and direct methods for solving linear systems have been developed [70],[25],[3], each with strengths and weaknesses in terms of speed, memory and accuracy. The convergence of iterative methods based on Krylov spaces usually depends on the condition number of the system matrix, while their memory requirements are usually not an issue; direct methods, on the other hand, are more robust, but their memory usage can be a problem for large systems.

Problems coming from discretizations by the Finite Element Method have properties that allow a different approach: Domain Decomposition Methods (DDM), which can be considered as a hybrid between iterative and direct methods [60],[19]. They are based on the partition of the domain of the problem into subdomains, on which smaller systems of equations are defined. From this division, these methods fall into two large groups: overlapping and non-overlapping methods.

In this work we will focus on DDM with non-overlapping interfaces, mainly the Finite Element Tearing and Interconnecting (FETI) method and related methods [30],[33],[35].

The objective of our work is to develop new FETI methods that apply to particular cases, improving on the results of the existing methods. We will also extend the results of one of the existing FETI methods to applications that its current formulation does not allow.

Contributions of this thesis

The following work is based on one of the best known non-overlapping domain decomposition methods, the Finite Element Tearing and Interconnecting method (FETI) [30],[29].

Work on this method since its first formulation has produced several improvements to it [34],[61], but has also permitted the formulation of new FETI methods [33],[35]. Within this context, and using the similarities in the construction of two of the most used FETI methods, namely the original FETI-1LM and the later developed FETI-2LM, we will formulate a new algorithm based on these two methods that tries to take advantage of the good properties of each. After the development of this new method, and with a better understanding of it, we can exhibit its advantages in contact cases, where it outperforms both FETI-1LM and FETI-2LM.

In a different line of work, this time following the development of the new S-FETI method [39], we will continue its analysis in order to find faster or more robust variants of it. Different variations will be formulated and tested, all of them trying to improve on the existing results of the method. We will also extend the application of S-FETI to a larger class of problems with a new implementation, based on sparse storage, that reduces the memory limitations of this method.

Next, we will try to use the ideas presented in the formulation of S-FETI to develop new FETI algorithms with certain improvements; however, since they are at an early stage, they present several issues that lead to new lines of research.

Finally, we use the same idea that led to the S-FETI formulation, but this time applied to the FETI-2LM method, in order to develop a new block version of the latter.

Before giving more details about FETI and the other FETI-like methods, we want to recall some of the basic linear algebra tools needed in this thesis to understand this type of algorithm; we refer to the iterative solvers for linear systems, which are one of the main elements in the different FETI methods.

Iterative methods

We start by showing the basic properties of the iterative Krylov methods used in the solution of FETI problems.

Krylov methods

In this section we are interested in solving the following general problem

Ax = b (1)

with A ∈ M_{n×n}(R) a square real matrix, x the unknown and b the known right-hand side (rhs), both vectors of R^n.

A big part of this work is based on the resolution of linear systems of this type via an iterative method. First we will consider the more general case where A is invertible and our system has a unique solution. The most used methods to solve this kind of problem are the ones based on projections onto a particular type of space, the Krylov space.

From this we can build the Krylov methods, which consist in building an adequate subspace and projecting our solution onto this space, all by just using simple operations such as matrix-vector products, dot products and linear combinations of vectors.

This way the Krylov Space can be defined by

Definition 0.1. Let x_0 be an initial guess for (1). A Krylov space, denoted by K_p, is the space generated by the residual g_0 := A x_0 − b and its p − 1 successive iterative products:

K_p = Span{g_0, A g_0, A² g_0, ..., A^{p−1} g_0}   (2)

We note that this family of subspaces is increasing and bounded, so it has a maximal dimension that we will call p_max. This definition also gives us the following results.

Lemma 0.2. If A^p g_0 ∈ K_p, then A^{p+q} g_0 ∈ K_p for every q > 0.

Proof. By induction. For q ≥ 0, if A^{p+q} g_0 ∈ K_p, then A^{p+q} g_0 = Σ_{k=0}^{p−1} α_k A^k g_0 and therefore

A^{p+q+1} g_0 = Σ_{k=0}^{p−2} α_k A^{k+1} g_0 + α_{p−1} A^p g_0
= Σ_{k=0}^{p−2} α_k A^{k+1} g_0 + α_{p−1} Σ_{k=0}^{p−1} β_k A^k g_0
= Σ_{k=0}^{p−1} γ_k A^k g_0   (3)

where we used the hypothesis A^p g_0 = Σ_{k=0}^{p−1} β_k A^k g_0 ∈ K_p.

Lemma 0.3. The sequence of Krylov spaces is strictly increasing from 1 to p_max, then it stagnates from p = p_max.

Proof. If p is the smallest integer that makes A^p g_0 dependent on the previous vectors, then the vectors (g_0, A g_0, A² g_0, ..., A^{p−1} g_0) are linearly independent and K_q has dimension q for every q ≤ p. In particular K_p has dimension p.

Furthermore, A^p g_0 ∈ K_p and, from Lemma 0.2, every vector A^{p+q} g_0 is in K_p for every q > 0, which implies that K_{p+q} = K_p for every q > 0.

We then have K_1 ⊂ · · · ⊂ K_p = K_{p+q} for every q > 0, and by definition of p_max we have p = p_max.

Theorem 0.4. The solution of the linear system Ax = b is in the affine space x_0 + K_{p_max}.

Proof. From Lemma 0.2 and Lemma 0.3, the vectors (g_0, A g_0, A² g_0, ..., A^{p_max−1} g_0) are linearly independent and

A^{p_max} g_0 = Σ_{k=0}^{p_max−1} α_k A^k g_0   (4)

In this equation the coefficient α_0 is non null: otherwise, multiplying both sides by A^{−1} would give

A^{p_max−1} g_0 = Σ_{k=1}^{p_max−1} α_k A^{k−1} g_0   (5)

which is contradictory with the linear independence of the vectors.

If we divide both terms of Equation (4) by α_0 and pass all terms to one side, we have

g_0 + Σ_{k=1}^{p_max−1} (α_k/α_0) A^k g_0 − (1/α_0) A^{p_max} g_0 = 0
⇔ A x_0 − b + Σ_{k=1}^{p_max−1} (α_k/α_0) A^k g_0 − (1/α_0) A^{p_max} g_0 = 0
⇔ A ( x_0 + Σ_{k=1}^{p_max−1} (α_k/α_0) A^{k−1} g_0 − (1/α_0) A^{p_max−1} g_0 ) = b   (6)

so the solution x lies in x_0 + K_{p_max}.

In practice, to build these spaces all we have to do is compute a basis of the space, but we will never use the "natural" basis because it degenerates numerically as it grows. With regular double precision on a standard machine, after about 16 iterations the new vectors of the sequence A^p g_0 start to be numerically linearly dependent and, depending on the matrix A, some values become far too small or too big to be represented.

With this in consideration, we need another way to reconstruct this space. To do so, we build a different basis, for example the so-called Arnoldi basis, which has much better numerical properties in terms of representation and stability. Basically, the basis is constructed by applying the modified Gram-Schmidt orthonormalization procedure to the successive matrix products. Algorithm 1 illustrates this procedure.
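As an illustration of this degeneracy (not from the thesis; the test matrix, its spectrum and the sizes are arbitrary choices), a small numpy sketch comparing the conditioning of the natural Krylov basis for few and for many powers:

```python
import numpy as np

def krylov_matrix(A, g0, p):
    # Columns g0, A g0, ..., A^{p-1} g0: the "natural" Krylov basis.
    cols = [g0]
    for _ in range(p - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n = 50
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T   # SPD, spectrum in [1, 100]
g0 = rng.standard_normal(n)

cond_small = np.linalg.cond(krylov_matrix(A, g0, 3))
cond_large = np.linalg.cond(krylov_matrix(A, g0, 15))
# cond_large explodes: the powers A^p g0 all align with the dominant
# eigenvector, so the natural basis is numerically unusable.
```

The condition number of the 15-column basis is many orders of magnitude larger than that of the 3-column one, which is exactly why an orthonormalized basis is needed.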

With the construction of the basis V_p of the Krylov space of dimension p, we can now attack the problem of finding an approximate solution x_p. We know, as shown previously, that the real solution is in x_0 + K_{p_max}, but since we are working


Algorithm 1 – Arnoldi iteration algorithm

1: Initialization
2: g_0 = A x_0 − b
3: v_1 = g_0 / ‖g_0‖
4: loop Construction of the (j + 1)-th vector of the basis
5:   w = A v_j
6:   for i = 1 to j do
7:     α_i = (w, v_i)
8:     w = w − α_i v_i
9:   end for
10:  v_{j+1} = w / ‖w‖
11: end loop
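Algorithm 1 can be sketched in a few lines of numpy; this is an illustration (the nonsymmetric test matrix and the dimensions are arbitrary choices), returning the basis as the columns of a matrix:

```python
import numpy as np

def arnoldi(A, x0, b, m):
    # Algorithm 1: orthonormal Krylov basis via modified Gram-Schmidt.
    g0 = A @ x0 - b
    V = [g0 / np.linalg.norm(g0)]
    for _ in range(m - 1):
        w = A @ V[-1]
        for v in V:                     # orthogonalize against v_1, ..., v_j
            w = w - (w @ v) * v
        V.append(w / np.linalg.norm(w))
    return np.column_stack(V)

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 30))       # works for any (nonsymmetric) matrix
b = rng.standard_normal(30)
V = arnoldi(A, np.zeros(30), b, 8)
orth_err = np.linalg.norm(V.T @ V - np.eye(8))   # columns are orthonormal
```

Unlike the natural basis, the columns of V stay orthonormal to machine precision.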

with a subspace of dimension p, the approximate solution can be written as

x_p = x_0 + V_p z_p   (7)

where z_p is a vector of dimension p. This approximation allows writing the error and residual vectors as

e_p := x_p − x = x_0 − x + V_p z_p = e_0 + V_p z_p
g_p := A x_p − b = A e_p = A e_0 + A V_p z_p = g_0 + A V_p z_p   (8)

A Krylov method consists, on the one hand, of an algorithm to compute the basis of the Krylov space and, on the other hand, of an optimality criterion to determine the approximate solution x_p. This is done, a priori, by minimizing the error or the residual in an adapted norm.

Lanczos method

We will now explain how to build the solution in the special case where A is a symmetric matrix. Let h_{ij} be the orthogonalization coefficient of A v_j against v_i, h_{j+1,j} the norm of the vector w obtained after the orthogonalization, and V_p the matrix of the first p vectors of the Arnoldi basis. Then

V_p^t V_p = I_p,  A V_p = V_{p+1} H_{p+1,p}   (9)

where

H_{p+1,p} :=
[ h_11  h_12  ...   ...   h_1p      ]
[ h_21  h_22  ...   ...   h_2p      ]
[ 0     h_32  h_33  ...   h_3p      ]
[ ...   ...   ...   ...   ...       ]
[ 0     ...   ...   0     h_{p+1,p} ]   (10)

and so

H_p := H_{pp} = V_p^t A V_p   (11)

For a symmetric matrix, the matrix H_p is also symmetric and thus tridiagonal, so the algorithm for the construction of the Arnoldi basis is simplified.

Algorithm 2 – Algorithm of Lanczos

1: Initialization
2: g_0 = A x_0 − b
3: v_1 = g_0 / ‖g_0‖
4: loop Construction of the (j + 1)-th vector of the Lanczos basis
5:   w = A v_j
6:   h_{j,j−1} = h_{j−1,j}
7:   w = w − h_{j−1,j} v_{j−1}
8:   h_{jj} = (w · v_j)
9:   w = w − h_{jj} v_j
10:  h_{j+1,j} = ‖w‖
11:  v_{j+1} = w / h_{j+1,j}
12: end loop
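A minimal numpy sketch of Algorithm 2 (the SPD test matrix is an arbitrary choice), which also verifies the tridiagonal structure of V_p^t A V_p claimed above:

```python
import numpy as np

def lanczos(A, x0, b, m):
    # Algorithm 2: short-recurrence Arnoldi for a symmetric matrix A.
    g0 = A @ x0 - b
    v_prev = np.zeros_like(g0)
    v = g0 / np.linalg.norm(g0)
    V = [v]
    beta = 0.0                          # h_{j,j-1}
    for _ in range(m - 1):
        w = A @ v - beta * v_prev       # orthogonalize against v_{j-1}
        alpha = w @ v                   # h_{jj}
        w = w - alpha * v               # orthogonalize against v_j
        beta = np.linalg.norm(w)        # h_{j+1,j}
        v_prev, v = v, w / beta
        V.append(v)
    return np.column_stack(V)

rng = np.random.default_rng(2)
M = rng.standard_normal((40, 40))
A = M @ M.T + 40.0 * np.eye(40)         # symmetric positive definite
b = rng.standard_normal(40)
V = lanczos(A, np.zeros(40), b, 10)
H = V.T @ A @ V
# H must be (numerically) tridiagonal: zero outside the band [-1, 1].
tridiag_err = np.linalg.norm(H - np.triu(np.tril(H, 1), -1))
```

Only three vectors are kept at a time, which is the point of the short recurrence.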


This simplified basis for symmetric matrices is called the Lanczos basis. The algorithm has the property of using only short recurrences in the computation, so its cost is constant at every iteration.

If the matrix is also positive definite, the Lanczos method consists in minimizing the error in the norm defined by A:

E(x_p) = ‖x_p − x‖²_A = (A(x_p − x) · (x_p − x)) = (g_p · e_p)   (12)

This approximate solution has the following properties.

Theorem 0.5. The approximate solution x_p of the Lanczos method is the projection of x onto x_0 + K_p for the inner product derived from A.

Proof. From (12), x_p is the element of x_0 + K_p whose distance to x is minimal in the A-norm.

Corollary 0.6. The residual vector g_p = A x_p − b of the Lanczos method is orthogonal to K_p.

Proof. Direct from the properties of projections in affine spaces.

All that is missing is the practical computation of x_p. To do so, from (12) and (8) we have

E(x_p) = (A(e_0 + V_p z_p) · (e_0 + V_p z_p)) = (A V_p z_p · V_p z_p) + 2(g_0 · V_p z_p) + (g_0 · e_0)   (13)

To minimize this we only need the part that depends on z_p, so the problem is reduced to the minimization of the functional

J_p(z_p) = ½ (V_p^t A V_p z_p · z_p) + (V_p^t g_0 · z_p) = ½ (T_p z_p · z_p) + (y_p · z_p)   (14)

with y_p := V_p^t g_0.


where T_p = H_p is the matrix of the orthonormalization coefficients. This is a classical finite-dimensional minimization problem, for which we have the following results.

Lemma 0.7. If A is a symmetric positive definite matrix, then J(x) = ½(Ax · x) − (b · x) is strictly convex.

Proof. For α ∈ ]0, 1[,

J(α x + (1 − α) y) = ½ α² (Ax · x) + α(1 − α)(Ax · y) + ½ (1 − α)² (Ay · y) − α (b · x) − (1 − α)(b · y)
= α J(x) + (1 − α) J(y) + ½ [(α² − α)(Ax · x) + 2α(1 − α)(Ax · y) + ((1 − α)² − (1 − α))(Ay · y)]
= α J(x) + (1 − α) J(y) + ½ α(α − 1) [(Ax · x) − 2(Ax · y) + (Ay · y)]   (15)

Since A is positive definite, we have

(Ax · x) − 2(Ax · y) + (Ay · y) = (A(x − y) · (x − y)) > 0   (16)

whenever x ≠ y. Now, if α ∈ ]0, 1[, then α(α − 1) < 0, hence

½ α(α − 1) [(Ax · x) − 2(Ax · y) + (Ay · y)] < 0   (17)

and therefore J(α x + (1 − α) y) < α J(x) + (1 − α) J(y).

Theorem 0.8. The functional J(x) = ½(Ax · x) − (b · x) admits an absolute minimum x, which also satisfies Ax = b.

Proof. J is strictly convex and bounded below, because J(x) → +∞ when ‖x‖ → +∞. It is obviously differentiable, with gradient

∇J(x) = Ax − b   (18)

This functional has an absolute minimum at the unique point where its differential is zero, which is the point where Ax − b = 0.

Corollary 0.9. The minimum of E(x_p) defined in (13) is the point x_p = x_0 + V_p z_p, z_p being the solution of the system

T_p z_p = −y_p   (19)

Proof. From the previous theorem and lemma, all we have to prove is that T_p is positive definite, which comes directly from the fact that A is positive definite and the columns of V_p are linearly independent.

The Lanczos method consists in building T_p and y_p, then finding z_p and replacing it in (7) to find the approximate solution x_p which minimizes the error in the A-norm.

One of the good features of this method is that the vectors of the basis are calculated using a short recurrence, but the main drawback is that the computation of x_p requires solving a bigger system at every step, so the cost grows with the number of iterations.

Conjugate gradient

The Lanczos method would be a much better algorithm if it could also use a short recurrence for the calculation of the approximate solution x_p. This can be done if the first components of z_p are the ones of z_{p−1}, which gives a formula of the type

x_p = x_{p−1} + α_p v_p   (20)

In order to do so, the basis of the Krylov space must be one in which the projection matrix W_p^t A W_p is diagonal. But we will not be able to compute the error E(x_p), because e_0 is unknown; the only way to test the convergence of the method is the dimensionless residual

‖A x_p − b‖ / ‖b‖ < ε   (21)

So we are forced to compute the successive gradients in order to control the method. Since g_p ∈ K_{p+1} ∩ K_p^⊥, instead of the orthonormal basis of vectors v_p we can use the orthogonal basis of the gradients, even if it goes to zero, because we will stop before having any representation problem.

Let G_p = (g_0, g_1, ..., g_{p−1}); as we just saw, G_p = V_p Δ_p where Δ_p is a diagonal matrix. The projection matrix is also tridiagonal symmetric positive definite:

G_p^t A G_p = Δ_p^t V_p^t A V_p Δ_p = Δ_p T_p Δ_p = T̃_p   (22)

It admits the factorization T̃_p = L̃_p D̃_p L̃_p^t, and so

G_p^t A G_p = L̃_p D̃_p L̃_p^t ⇔ L̃_p^{−1} G_p^t A G_p L̃_p^{−t} = D̃_p   (23)

This shows that the matrix W_p = G_p L̃_p^{−t}, made of linear combinations of the columns of G_p, is an A-orthogonal basis of K_p.

Since the projection matrix W_p^t A W_p is diagonal, this basis is ideal for using a short recurrence in the computation of the solution of the optimization problem (12), as it can be built using the relation

W_p L̃_p^t = G_p   (24)

Let (w_0, w_1, ..., w_{p−1}) be the column vectors of W_p and (γ_0, γ_1, ...) the off-diagonal elements of L̃_p^t; equation (24) implies

w_0 = g_0 and γ_{j−1} w_{j−1} + w_j = g_j, j > 0   (25)

With the different relations between x_p, g_p and w_p we can formulate a new method using only short recurrences in the construction of every new vector. Actually, from the previous equation we have

g_0 = A x_0 − b and w_0 = g_0   (26)

From the properties of the basis W_p we have

x_p = x_{p−1} + ρ_{p−1} w_{p−1},  g_p = g_{p−1} + ρ_{p−1} A w_{p−1}   (27)


The coefficient ρ_{p−1} is determined by the orthogonality of g_p to w_{p−1}:

(g_p · w_{p−1}) = (g_{p−1} · w_{p−1}) + ρ_{p−1} (A w_{p−1} · w_{p−1}) = 0 ⇔ ρ_{p−1} = −(g_{p−1} · w_{p−1}) / (A w_{p−1} · w_{p−1})   (28)

From equation (25) we can build the new w_p with the previous w_{p−1} and g_p:

w_p = g_p − γ_{p−1} w_{p−1}   (29)

The coefficient γ_{p−1} is also computed using the A-orthogonality relation between w_p and w_{p−1}:

(w_p · A w_{p−1}) = (g_p · A w_{p−1}) − γ_{p−1} (w_{p−1} · A w_{p−1}) = 0 ⇔ γ_{p−1} = (g_p · A w_{p−1}) / (A w_{p−1} · w_{p−1})   (30)

The method defined this way is called the Conjugate gradient method, and it is summarized in Algorithm 3.

Algorithm 3 – Conjugate gradient method

1: Initialization
2: g_0 = A x_0 − b
3: w_0 = g_0
4: loop Iteration of the CG method
5:   ρ_{p−1} = −(g_{p−1} · w_{p−1}) / (A w_{p−1} · w_{p−1})
6:   x_p = x_{p−1} + ρ_{p−1} w_{p−1}
7:   g_p = g_{p−1} + ρ_{p−1} A w_{p−1}
8:   if (g_p · g_p)/(b · b) < ε² then
9:     End
10:  end if
11:  γ_{p−1} = (g_p · A w_{p−1}) / (A w_{p−1} · w_{p−1})
12:  w_p = g_p − γ_{p−1} w_{p−1}
13: end loop
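Algorithm 3 translates almost line by line into numpy. The following sketch is an illustration with an arbitrary SPD test matrix and tolerances, with no reorthogonalization:

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, maxit=200):
    # Algorithm 3: plain CG for a symmetric positive definite A.
    x = x0.copy()
    g = A @ x - b                        # gradient / residual
    w = g.copy()                         # first descent direction
    bb = b @ b
    for _ in range(maxit):
        Aw = A @ w
        rho = -(g @ w) / (Aw @ w)
        x = x + rho * w
        g = g + rho * Aw
        if (g @ g) / bb < tol ** 2:      # dimensionless residual test (21)
            break
        gamma = (g @ Aw) / (Aw @ w)
        w = g - gamma * w
    return x

rng = np.random.default_rng(3)
M = rng.standard_normal((60, 60))
A = M @ M.T + 60.0 * np.eye(60)          # well-conditioned SPD test matrix
b = rng.standard_normal(60)
x = conjugate_gradient(A, b, np.zeros(60))
res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

Note that each iteration costs a single matrix-vector product, since A w_{p−1} is reused in lines 5, 7 and 11.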


In exact arithmetic, only the previous direction is needed to build the new descent direction w_p, but in finite arithmetic errors are transmitted at each update. In practice, to apply this method in the domain decomposition framework, we will need to reconjugate against the vectors of the basis, so the storage of these vectors is mandatory for a robust method [66]. With this in mind, lines 11 and 12 of Algorithm 3 are replaced by the loop necessary to build

γ_i = (g_p · A w_i) / (A w_i · w_i),  i = 0, ..., p − 1   (31)

and

w_p = g_p − Σ_{i=0}^{p−1} γ_i w_i   (32)
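This reconjugated variant can be sketched as follows (an illustration, not the thesis code; the ill-conditioned test matrix is an arbitrary choice): all directions and their A-products are stored, and every new direction is orthogonalized against all of them.

```python
import numpy as np

def cg_reconjugated(A, b, x0, tol=1e-8, maxit=200):
    # CG where lines 11-12 of Algorithm 3 are replaced by a loop that
    # re-conjugates the new direction against ALL stored directions.
    x = x0.copy()
    g = A @ x - b
    w = g.copy()
    W, AW = [], []                        # stored directions w_i and A w_i
    bb = b @ b
    for _ in range(maxit):
        Aw = A @ w
        rho = -(g @ w) / (Aw @ w)
        x = x + rho * w
        g = g + rho * Aw
        W.append(w); AW.append(Aw)
        if (g @ g) / bb < tol ** 2:
            break
        w = g.copy()
        for wi, Awi in zip(W, AW):        # full reconjugation loop
            w = w - ((g @ Awi) / (Awi @ wi)) * wi
    return x

rng = np.random.default_rng(4)
M = rng.standard_normal((50, 50))
A = M @ M.T + 1e-2 * np.eye(50)           # rather ill-conditioned SPD matrix
b = rng.standard_normal(50)
x = cg_reconjugated(A, b, np.zeros(50))
res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

The price is the storage of all directions and an orthogonalization cost that grows with the iteration count, which is the trade-off accepted in the FETI framework for robustness.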

ORTHODIR

In the case of a nonsymmetric matrix, the matrix H_p is no longer tridiagonal. We cannot expect a short recurrence to build an orthogonal basis of the Krylov space K_p. Also, A does not define an inner product, so the optimality criterion E(x_p) may not be used. The logical choice for a stopping criterion is the computation of the square norm of the residual

the computation of the square norm of the residual

R(xp) = (A(xpx) · A(xpx)) = (gp·gp) = kxpxk2

AtA= (AtA(xpx) · (xpx)) (33)

We have properties similar to the symmetric case for the approximate solution that minimizes R(x_p).

Theorem 0.10. The approximate solution x_p that minimizes R(x_p) in x_0 + K_p is the projection of x for the inner product associated with A^t A.

Proof. Direct from equation (33).


Corollary 0.11. The residual vector g_p is orthogonal to A K_p.

Proof. From the properties of projection in an affine space,

(A^t A (x_p − x) · w_p) = (A (x_p − x) · A w_p) = (g_p · A w_p) = 0, ∀ w_p ∈ K_p   (34)

We have naturally introduced the scalar product defined by A^t A, which is symmetric and positive definite if A is invertible. One could think it appropriate to apply the Conjugate Gradient method to the system

A^t A x = A^t b   (35)

This equation is called the "normal equation", and it is not well conditioned, since its condition number can be as large as the square of the original condition number of A.

It is best to compute the solution using a short recurrence; from Theorem 0.10 we know that for any A^tA-orthogonal basis W_p of K_p, the projection matrix W_p^t A^t A W_p should be diagonal.

If we look at the structure of H_{p+1,p}, the matrix H_{p+1,p}^t H_{p+1,p} is full, so the previous bases are of no use here. To build an A^tA-orthogonal basis, we only need to apply the modified Gram-Schmidt procedure, using the A^tA-norm, to the vectors obtained by successive multiplication by the matrix. With these considerations we have the short recurrences

x_p = x_{p−1} + ρ_p w_p
g_p = g_{p−1} + ρ_p A w_p   (36)

and from the A^tA-orthogonality properties we have

(g_p · A w_p) = 0 ⇔ (g_{p−1} · A w_p) + ρ_p (A w_p · A w_p) = 0 ⇔ ρ_p = −(g_{p−1} · A w_p) / (A w_p · A w_p)   (37)

The resulting method is described in Algorithm 4.


Algorithm 4 – ORTHODIR method

1: Initialization
2: g_0 = A x_0 − b
3: w_0 = g_0
4: loop Iterate p = 1, ..., until convergence
5:   ρ = −(g_{p−1} · A w_{p−1}) / (A w_{p−1} · A w_{p−1})
6:   x_p = x_{p−1} + ρ w_{p−1}
7:   g_p = g_{p−1} + ρ A w_{p−1}
8:   w_p = A w_{p−1}
9:   for i = 0 to p − 1 do
10:    γ = −(A w_p · A w_i) / (A w_i · A w_i)
11:    w_p = w_p + γ w_i
12:  end for
13: end loop
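Algorithm 4 in numpy form (again a sketch, with an arbitrary nonsymmetric test matrix); note that the product A w_p is updated incrementally inside the orthogonalization loop, so each iteration still costs a single extra matrix-vector product:

```python
import numpy as np

def orthodir(A, b, x0, tol=1e-9, maxit=300):
    # Algorithm 4: directions kept A^tA-orthogonal by modified Gram-Schmidt.
    x = x0.copy()
    g = A @ x - b
    W, AW = [g.copy()], [A @ g]           # w_0 = g_0 and its product A w_0
    bb = b @ b
    for _ in range(maxit):
        w, Aw = W[-1], AW[-1]
        rho = -(g @ Aw) / (Aw @ Aw)
        x = x + rho * w
        g = g + rho * Aw
        if (g @ g) / bb < tol ** 2:
            break
        wp, Awp = Aw.copy(), A @ Aw       # next direction seeded with A w_{p-1}
        for wi, Awi in zip(W, AW):        # A^tA-orthogonalization loop
            gamma = -(Awp @ Awi) / (Awi @ Awi)
            wp = wp + gamma * wi
            Awp = Awp + gamma * Awi       # A(wp + gamma wi), updated for free
        W.append(wp); AW.append(Awp)
    return x

rng = np.random.default_rng(5)
A = rng.standard_normal((40, 40)) + 8.0 * np.eye(40)   # nonsymmetric, invertible
b = rng.standard_normal(40)
x = orthodir(A, b, np.zeros(40))
res = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

As with the reconjugated CG, all previous directions must be stored.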

The orthogonalization cost of this method grows with the number of stored directions; in any case we will deal with this issue in the next chapters.

A small variation of this method, equivalent to the regular one and with the same properties, is obtained by changing line 8 to build the next direction from the gradient instead of the previous orthonormalized direction, i.e.

w_p = g_p   (38)

Parallelization of Krylov methods

The algorithms presented previously are sequential; however, they can easily be transformed into their respective parallel versions.

The implementation of the parallel version of this type of method is based on a message-passing standard; in practical terms this implies the use of libraries adapted to the most common parallel computing architectures, the most common communication protocol being the Message Passing Interface (MPI) [40].

Using this protocol, we only add two changes to the usual sequential algorithm:

1. Exchange of data between processes (or subdomains in the domain decomposition context) to perform the matrix-vector products; details of how to compute these products will be given when presenting the first domain decomposition method used in this work.

2. Global reduction operations to add the contribution of each process when computing the different scalar products.

Using MPI, or in general any communication protocol, to perform the exchanges in a parallel algorithm produces synchronization points in the code; these must be kept to a minimum to avoid a major impact on the total computation time.
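The reduction pattern of point 2 can be illustrated without an actual MPI runtime. In this sketch the three "processes" are simulated sequentially, and the final sum plays the role of MPI_Allreduce with MPI_SUM (with the mpi4py library, `comm.allreduce(local, op=MPI.SUM)` would replace it):

```python
import numpy as np

def local_dot(x_local, y_local):
    # Contribution of one process: dot product over the entries it owns.
    return float(x_local @ y_local)

rng = np.random.default_rng(6)
x = rng.standard_normal(12)
y = rng.standard_normal(12)

# Distribute the vector entries among 3 "processes" (simulated sequentially;
# with MPI each process would hold only its own slice).
parts = [(x[i::3], y[i::3]) for i in range(3)]
partials = [local_dot(xl, yl) for xl, yl in parts]

# Global reduction: summing the local contributions gives the global dot
# product -- this is a synchronization point of the parallel algorithm.
global_dot = sum(partials)
```

Every scalar product in Algorithms 3 and 4 becomes one such reduction, which is why their number per iteration matters for parallel performance.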

With the previous basic linear algebra and parallel computing considerations, we are ready to present the main work of this thesis.


Chapter 1

Hybrid FETI method

In the domain decomposition framework, the Finite Element Tearing and Interconnecting (FETI) methods have proven to be very effective in the solution of real engineering applications over the last years. This is one of the reasons why their development continues to this day, always searching for faster, more precise and more robust versions. In this context we have developed a new method built from two existing FETI methods, namely FETI-1LM and FETI-2LM. This new method tries to recover the good properties of each method in configurations where the choice between one and the other is not so clear. For the development of this new method, we first need to understand the basics of both base FETI methods from a theoretical point of view, but also their implementation, which will be crucial to show how the new method works.

This chapter first introduces the FETI method in its classic version, including the preconditioners used to achieve good performance and some implementation considerations. Then the FETI method with two Lagrange multipliers is explained, to finally show the new hybrid method that arises from mixing the two previous FETI methods. The chapter ends with numerical results for all the methods.


Figure 1.1 – Two subdomain splitting (subdomains Ω(1), Ω(2) and interface Γ(3)).

1.1 Basic FETI method

1.1.1 Model problem and discretization

Model Problem

To begin, we will show the development of the method for a simple model, with the most basic domain decomposition configuration. All the ideas will later be extended to different elliptic problems and configurations. Let us first consider the Poisson problem with Dirichlet boundary condition

−∆u = f  in Ω
u = 0  on ∂Ω    (1.1)

where Ω ⊂ Rᵈ, d = 2, 3 is a bounded domain. To find its variational form, the Stokes formula is applied to 1.1, so the problem is now:

Find u ∈ H₀¹(Ω) such that

∫_Ω ∇u · ∇v = ∫_Ω f v   ∀v ∈ H₀¹(Ω)    (1.2)

The domain is now divided into two smaller subdomains Ω(1) and Ω(2). Let Γ(3)= ∂Ω(1)∩∂Ω(2) be the interface between both subdomains as in Figure 1.1.


On each subdomain a local problem is defined that inherits the Dirichlet condition, or any other condition, on a part of the boundary. These problems are written as

−∆u(s) = f(s)  in Ω(s)
u(s) = 0  on ∂Ω(s) \ Γ(3)    (1.3)

with s = 1, 2. It is clear that the solution of 1.1 will satisfy these equations, but the converse is not always true, due to the differences that may occur on Γ(3).

We continue by using again the Stokes formula to find the variational form of 1.3

∫_Ω(s) ∇u(s) · ∇v(s) = ∫_Ω(s) f(s) v(s) + ∫_Γ(3) (∂u(s)/∂n(s)) v(s),   ∀v(s) ∈ H¹_{0,∂Ω(s)\Γ(3)}(Ω(s))    (1.4)

For a function v ∈ H₀¹(Ω), in particular the solution of the problem, its restrictions to the subdomains Ω(s) will be continuous on the interface Γ(3). On the other hand, two functions u(s) that satisfy the local Laplace equations 1.3 will not necessarily share the same values on Γ(3). Instead they can be used to build a more general global solution, which is not as smooth as required.

To do this construction, but at the same time recover the unique solution of 1.1, the two variational equations 1.4 are added, giving the following variational equality

∫_Ω ∇u · ∇v = ∫_Ω f v + ∫_Γ(3) (∂u(1)/∂n(1) + ∂u(2)/∂n(2)) v(3)   ∀v ∈ H₀¹(Ω)    (1.5)

where v(3) = v(1)|Γ(3) = v(2)|Γ(3).

This formulation shows that a new condition is necessary to have an equivalence between the solution of the global problem and the solution of the local ones. For u to be in H₀¹(Ω), this admissibility condition imposes the continuity on the interface

u(1) = u(2) on Γ(3) (1.6)


A second admissibility condition imposes the equilibrium of the flux

∂u(1)/∂n(1) + ∂u(2)/∂n(2) = 0  on Γ(3)    (1.7)

In general, a non overlapping domain decomposition method consists in introducing boundary conditions on Γ(3) to complement the local equations 1.3 and in iteratively finding the values of these boundary conditions for which both the continuity 1.6 and equilibrium 1.7 interface conditions are satisfied, meaning that this solution will be exactly the searched global one.

Depending on the condition imposed, two basic methods can be derived: the Schur complement method and the FETI method. Later we will show a third method, also of the FETI type, that comes from using a different condition on the interface.

The Schur complement method consists in enforcing consistent Dirichlet boundary conditions on Γ(3) so continuity condition 1.6 is automatically satisfied

u(1) = u(2) = u3 on Γ(3) (1.8)

The local Dirichlet problem to be solved in parallel for a given u3 on each subdomain is

−∆u(s) = f(s)  in Ω(s)
u(s) = 0  on ∂Ω(s) \ Γ(3)
u(s) = u3  on Γ(3)    (1.9)

The computation reduces to finding the value of u3 for which the equilibrium interface condition 1.7 is satisfied. From equations 1.9, the fluxes ∂u(1)/∂n(1) and ∂u(2)/∂n(2) are continuous functions of u3. The Schur complement method consists in solving iteratively a condensed interface problem to find u3, whose residual is equal to ∂u(1)/∂n(1) + ∂u(2)/∂n(2).

The FETI method is based on enforcing consistent Neumann boundary conditions on Γ(3), so now the equilibrium interface condition 1.7 is automatically satisfied:

∂u(1)/∂n(1) = −∂u(2)/∂n(2) = λ  on Γ(3)    (1.10)


The local Neumann problem to be solved in parallel for a given λ on each subdomain is

−∆u(s) = f(s)  in Ω(s)
u(s) = 0  on ∂Ω(s) \ Γ(3)
∂u(s)/∂n(s) = ±λ  on Γ(3)    (1.11)

Now we compute the value λ on the interface for which the continuity condition 1.6 is satisfied. From equations 1.11, u(1)|Γ(3) and u(2)|Γ(3) are continuous functions of λ. The FETI method consists in solving iteratively a condensed interface problem to find λ, whose residual is equal to u(1)|Γ(3) − u(2)|Γ(3).

A different interpretation of the FETI method can be considered if we see the unknown λ as the Lagrange multiplier of the continuity condition 1.6. The solution of the global variational problem 1.2 is the field u of H₀¹(Ω) that minimizes the energy functional

J(v) = (1/2) ∫_Ω ∇v · ∇v − ∫_Ω f v    (1.12)

This minimization problem is equivalent to finding the couple of fields (u(1), u(2)) of H¹_{0,∂Ω(1)\Γ(3)}(Ω(1)) × H¹_{0,∂Ω(2)\Γ(3)}(Ω(2)) that minimizes the sum of the local energy functionals

J1(v(1)) + J2(v(2)) = (1/2) ∫_Ω(1) ∇v(1) · ∇v(1) − ∫_Ω(1) f(1) v(1)
                    + (1/2) ∫_Ω(2) ∇v(2) · ∇v(2) − ∫_Ω(2) f(2) v(2)    (1.13)

under the continuity constraint u(1)|Γ(3) = u(2)|Γ(3). This condition can be written under the weak form

∫_Γ(3) (u(1) − u(2)) µ = 0   ∀µ ∈ H^{−1/2}(Γ(3))    (1.14)

Now, consider the Lagrangian

L(v(1), v(2), µ) = (1/2) ∫_Ω(1) ∇v(1) · ∇v(1) − ∫_Ω(1) f(1) v(1)
                 + (1/2) ∫_Ω(2) ∇v(2) · ∇v(2) − ∫_Ω(2) f(2) v(2)
                 − ∫_Γ(3) (v(1) − v(2)) µ    (1.15)


Figure 1.2 – Two subdomain divisions with duplicated nodes.

We note that the saddle point (u(1), u(2), λ) of L in H¹_{0,∂Ω(1)\Γ(3)}(Ω(1)) × H¹_{0,∂Ω(2)\Γ(3)}(Ω(2)) × H^{−1/2}(Γ(3)) is precisely the point where the variational equations 1.11 and 1.14 are satisfied.

Discretization

Let us consider a discretization of the variational equation 1.2 using a finite element method. The following process works for different elliptic partial differential equations and different finite element discretizations, so from now on the development can be considered general, as long as the discretization leads to a system of the form

Kx = f (1.16)

The global stiffness matrix of the discrete problem can be arranged to have the block structure shown in equation 1.17, where subscript i denotes the inner degrees of freedom of subdomains Ω(1) and Ω(2) and subscript b is used for the nodes on the interface Γ(3) = ∂Ω(1) ∩ ∂Ω(2)

[ Kii(1)   0        Kib(1) ] [ xi(1) ]   [ fi(1) ]
[ 0        Kii(2)   Kib(2) ] [ xi(2) ] = [ fi(2) ]
[ Kbi(1)   Kbi(2)   Kbb    ] [ xb    ]   [ fb    ]    (1.17)

The formulation of each local discretization matrix is made considering that each subdomain has its own mesh and also that the nodes of the interface Γ(3) are shared by both meshes as in Figure 1.2. So there are two interface blocks,


one in each local matrix, noted with superscripts (1) and (2). The local stiffness matrices of the two subdomains are

K(1) = [ Kii(1)  Kib(1) ]      K(2) = [ Kii(2)  Kib(2) ]
       [ Kbi(1)  Kbb(1) ]             [ Kbi(2)  Kbb(2) ]    (1.18)

where Kbb(1) + Kbb(2) = Kbb.

The discretization of the variational formulation 1.4 in subdomain Ω(s) leads to the following system of equations

[ Kii(s)  Kib(s) ] [ xi(s) ]   [ fi(s)         ]
[ Kbi(s)  Kbb(s) ] [ xb(s) ] = [ fb(s) + hb(s) ]    (1.19)

where fb(1) + fb(2) = fb and hb(s) is the vector representing the discretization of the flux ∂x(s)/∂n(s) on Γ(3).

From this we have an explicit relation between the inner and the interface nodes

xi(s) = Kii(s)⁻¹ fi(s) − Kii(s)⁻¹ Kib(s) xb(s)    (1.20)

From 1.20 and 1.19 the relation linking the trace and the flux of a vector satisfying the inner equations is derived

hb(s) = Kbi(s) xi(s) + Kbb(s) xb(s) − fb(s)
     = Kbi(s) (Kii(s)⁻¹ fi(s) − Kii(s)⁻¹ Kib(s) xb(s)) + Kbb(s) xb(s) − fb(s)
     = (Kbb(s) − Kbi(s) Kii(s)⁻¹ Kib(s)) xb(s) − (fb(s) − Kbi(s) Kii(s)⁻¹ fi(s))
     = Sbb(s) xb(s) − cb(s)    (1.21)

Sbb(s) = Kbb(s) − Kbi(s) Kii(s)⁻¹ Kib(s) is the Schur complement matrix. It is the discretization of the Dirichlet to Neumann mapping that defines the bi-continuous one to one correspondence between the trace and the flux on the boundary (or interface in our case) of a field that satisfies the Laplace equation inside the subdomain. It is symmetric positive definite if the matrix K(s) is symmetric positive definite.


The discretizations of the continuity condition 1.6 and of the flux condition 1.7 are

xb(1) = xb(2)    (1.22)

hb(1) + hb(2) = 0    (1.23)

The last condition, also called equilibrium, combined with 1.19, gives the following interface equation

Kbi(1) xi(1) + Kbb(1) xb − fb(1) + Kbi(2) xi(2) + Kbb(2) xb − fb(2) = 0
⇔ Kbi(1) xi(1) + Kbi(2) xi(2) + (Kbb(1) + Kbb(2)) xb = fb(1) + fb(2)    (1.24)

Finally, for two vectors defined on subdomains Ω(1) and Ω(2) to be considered as the restrictions of the solution of the global discrete problem 1.17, they must satisfy

• the inner equations in each subdomain

  Kii(1) xi(1) + Kib(1) xb(1) = fi(1)
  Kii(2) xi(2) + Kib(2) xb(2) = fi(2)    (1.25)

• the interface equation

  Kbi(1) xi(1) + Kbi(2) xi(2) + Kbb(1) xb(1) + Kbb(2) xb(2) = fb(1) + fb(2)    (1.26)

• the continuity across the interface Γ(3)

  xb(1) = xb(2)    (1.27)

Given the continuity relation 1.27 and the fact that xb(1) and xb(2) are both equal to the restriction of the global solution on Γ(3), the inner equations 1.25 are the first two rows of 1.17 and the interface equation 1.26 is the third row. This means that the methodology, derived only from linear algebra, is valid for any finite element discretization of PDEs.


The inner equations 1.25 are satisfied by the solution vectors of the local problems for any kind of boundary conditions on Γ(3). Equations 1.27 and 1.26 form the actual condensed interface problem, since the inner equations 1.25 establish that xi(1) and xi(2) can be derived from xb(1) and xb(2).
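To make this assembly concrete, here is a small NumPy sketch (with hypothetical sizes and random SPD local matrices) that builds the block system 1.17 from two local matrices and checks that the restrictions of the global solution satisfy the inner equations 1.25 and the interface equation 1.26.

```python
import numpy as np

rng = np.random.default_rng(0)
ni1, ni2, nb = 3, 4, 2  # toy sizes: inner dofs of each subdomain, interface dofs

def spd(m):
    A = rng.standard_normal((m, m))
    return A @ A.T + m * np.eye(m)  # symmetric positive definite

K1 = spd(ni1 + nb)  # local matrix of subdomain 1: [Kii(1) Kib(1); Kbi(1) Kbb(1)]
K2 = spd(ni2 + nb)  # local matrix of subdomain 2
# assemble the global block matrix of equation 1.17 (Kbb = Kbb(1) + Kbb(2))
n = ni1 + ni2 + nb
K = np.zeros((n, n))
K[:ni1, :ni1] = K1[:ni1, :ni1]
K[:ni1, ni1 + ni2:] = K1[:ni1, ni1:]
K[ni1 + ni2:, :ni1] = K1[ni1:, :ni1]
K[ni1:ni1 + ni2, ni1:ni1 + ni2] = K2[:ni2, :ni2]
K[ni1:ni1 + ni2, ni1 + ni2:] = K2[:ni2, ni2:]
K[ni1 + ni2:, ni1:ni1 + ni2] = K2[ni2:, :ni2]
K[ni1 + ni2:, ni1 + ni2:] = K1[ni1:, ni1:] + K2[ni2:, ni2:]
f = rng.standard_normal(n)
x = np.linalg.solve(K, f)
xi1, xi2, xb = x[:ni1], x[ni1:ni1 + ni2], x[ni1 + ni2:]
# inner equations (1.25) hold for the restrictions of the global solution
assert np.allclose(K1[:ni1, :ni1] @ xi1 + K1[:ni1, ni1:] @ xb, f[:ni1])
assert np.allclose(K2[:ni2, :ni2] @ xi2 + K2[:ni2, ni2:] @ xb, f[ni1:ni1 + ni2])
# interface equation (1.26): the sum of the local contributions is the third row
lhs = (K1[ni1:, :ni1] @ xi1 + K2[ni2:, :ni2] @ xi2
       + (K1[ni1:, ni1:] + K2[ni2:, ni2:]) @ xb)
assert np.allclose(lhs, f[ni1 + ni2:])
```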

1.1.2 FETI with one Lagrange multiplier

The previous ideas are now formalized for the extended case where we have Ns > 2 subdomains and an interface Γ, defined as

Γ = ∪_{1≤s,q≤Ns} (∂Ω(s) ∩ ∂Ω(q))    (1.28)

In the FETI method the discrete flux, noted λ, is the unknown defined on the nodes along the interface, and the jump of the solutions of the local Neumann problems is the gradient of the condensed interface problem. The discretization of the local Neumann problem in some subdomain s can be written as

K(s) x(s) = f(s) + t(s)ᵀ B(s)ᵀ λ    (1.29)

where K(s) is the local stiffness matrix, f(s) the right-hand side vector, t(s) ∈ M_{#(∂Ω(s))×#(Ω(s))} is a trace operator which extracts the boundary degrees of freedom from subdomain Ω(s), and B(s) ∈ M_{#(Γ)×#(∂Ω(s))} is a discrete assembling matrix which connects pairwise degrees of freedom on the interface. In a general case, the stiffness matrix K(s), the local solution vector x(s) and the local right-hand side vector f(s) are defined as

K(s) = [ Kii(s)  Kib(s) ]
       [ Kbi(s)  Kbb(s) ]    (1.30)

x(s) = [ xi(s) ]      f(s) = [ fi(s) ]
       [ xb(s) ]             [ fb(s) ]    (1.31)


Figure 1.3 – Multiple interface node.

The i and b subscripts again denote the interior and interface nodes of the subdomain, respectively. So the trace operator applied to the solution x(s) is such that

xb(s) = t(s) x(s)    (1.32)

The discrete operator B(s) is defined as the mapping of a vector on the local interface ∂Ω(s) onto the complete interface, so applied to the solution on the local interface xb(s) it lets us write the continuity condition across the total interface, degree of freedom per degree of freedom

∑_s B(s) t(s) x(s) = 0    (1.33)

The restriction of B(s) to interface Γij, noted B(ij), is defined as a signed Boolean operator such that B(ij) and B(ji) have opposite signs, providing the needed continuity. Any node belonging to more than two subdomains, as in Figure 1.3, will generate as many continuity conditions and fluxes as the number of interfaces sharing it.

With the definitions of B(s) and t(s), the solutions x(s) of 1.29 and 1.33 are the searched restrictions to every subdomain of the global discrete solution of 1.17. The vectors x(s) are actually continuous and, by the definition of B(s), t(s)ᵀ B(s)ᵀ λ is zero for the inner nodes of Ω(s), while on the interface we have B(ij)ᵀ λ + B(ji)ᵀ λ = 0 thanks to the opposite signs. So, again, the assembly of the local discrete Neumann equations 1.29 gives exactly the global discrete equation 1.16.
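A toy sketch of the signed Boolean operators for two subdomains sharing one interface (names and sizes are illustrative, not thesis code): B(12) and B(21) have opposite signs, so the assembled sum 1.33 vanishes exactly when the two traces coincide.

```python
import numpy as np

ni, nb = 3, 2                            # toy sizes: inner and interface dofs
# trace operator t(s): extracts the boundary block of a local vector
t = np.hstack([np.zeros((nb, ni)), np.eye(nb)])
# signed Boolean assembling operators: opposite signs on the two sides
B1, B2 = +np.eye(nb), -np.eye(nb)

x1 = np.array([1., 2., 3., 10., 20.])    # subdomain 1, trace = [10, 20]
x2 = np.array([4., 5., 6., 10., 20.])    # subdomain 2, same trace: continuous
jump = B1 @ t @ x1 + B2 @ t @ x2         # left-hand side of equation 1.33
assert np.allclose(jump, 0.0)

x2_bad = x2.copy(); x2_bad[3] += 1.0     # break continuity on one interface dof
assert not np.allclose(B1 @ t @ x1 + B2 @ t @ x2_bad, 0.0)
```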


The gradient of the condensed interface problem is defined as

g = ∑_s B(s) t(s) x(s)    (1.34)

where x(s) is the solution of the local discrete Neumann problem 1.29. The continuity relation 1.33 defines the condensed interface problem for FETI.

“Floating” subdomains

For most subdomains, we face the common case where ∂Ω(s) ∩ ∂Ω is empty; this means that the Dirichlet condition of the problem does not reach Ω(s), so the local discrete Neumann equations 1.29 are ill posed. If K(s) comes from the Laplace equation, its kernel consists of the constant fields in the subdomain; if it comes from three-dimensional linear elasticity, the kernel is the subspace of rigid body motions, of dimension 6 in the case of simply connected subdomains.

The pseudo-inverse K(s)+ is now needed, and the Cholesky factorization with partial pivoting of the matrix K(s) is used to compute it, also because it allows one to compute a generator R(s) of the kernel and a factorization of a maximal full rank sub-block. Given the pseudo-inverse K(s)+ and the kernel generator R(s), the solution x(s) of the discrete system of equations 1.29 can be written as a particular solution plus an undefined element of the kernel of K(s)

x(s) = K(s)+ (f(s) + t(s)ᵀ B(s)ᵀ λ) + R(s) α(s)    (1.35)

From equation 1.29 we see that the right-hand side must belong to the range space of matrix K(s), and so must be orthogonal to the kernel. This orthogonality constraint can be written

R(s)ᵀ (f(s) + t(s)ᵀ B(s)ᵀ λ) = 0    (1.36)

This last equation is the admissibility condition for the forces of a floating subdomain. Its interpretation is that fields belonging to the kernel must have zero energy.
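A minimal sketch of a floating subdomain: a 1D Laplacian with pure Neumann conditions is singular, its kernel is the constant field, and a particular solution K⁺f exists only for an admissible right-hand side satisfying Rᵀf = 0 (equation 1.36). NumPy's `pinv` stands in here for the pivoted factorization used in practice.

```python
import numpy as np

# 1D Laplacian with pure Neumann conditions: a "floating" singular matrix
n = 5
K = 2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
K[0, 0] = K[-1, -1] = 1.0
R = np.ones((n, 1))                       # kernel generator: constant fields
assert np.allclose(K @ R, 0.0)

f = np.array([1., 0., 0., 0., -1.])
assert np.isclose((R.T @ f).item(), 0.0)  # admissibility condition (1.36)
x_part = np.linalg.pinv(K) @ f            # particular solution K+ f
assert np.allclose(K @ x_part, f)
# the full solution set is x_part + R alpha, as in equation 1.35
assert np.allclose(K @ (x_part + 3.7 * R[:, 0]), f)
```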


Condensed interface problem

Replacing x(s) from equation 1.35 in the continuity condition 1.33 we have

∑_s B(s) t(s) K(s)+ t(s)ᵀ B(s)ᵀ λ + ∑_s B(s) t(s) R(s) α(s) = − ∑_s B(s) t(s) K(s)+ f(s)    (1.37)

To build the condensed interface problem, this previous equation is used together with equation 1.36, leading to the problem to be satisfied by λ and the vector α of coefficients of the kernel components

[ F   G ] [ λ ]   [ d ]
[ Gᵀ  0 ] [ α ] = [ c ]    (1.38)

where:

• F = ∑_s B(s) t(s) K(s)+ t(s)ᵀ B(s)ᵀ = ∑_s B(s) Sbb(s)+ B(s)ᵀ is the dual Schur complement matrix
• Gα = ∑_s B(s) t(s) R(s) α(s) is the jump of the zero energy fields defined by the α(s) in Ω(s)
• Gᵀλ = (. . . , (B(s) t(s) R(s))ᵀ λ, . . .)ᵀ
• d = −∑_s B(s) t(s) K(s)+ f(s)
• c = (. . . , −f(s)ᵀ R(s), . . .)ᵀ

The condensed interface system 1.38 is a hybrid system. Its solution λ satisfies the following orthogonality condition

µᵀ F λ = µᵀ d,   ∀µ such that Gᵀ µ = 0    (1.39)

Now consider λ0, for example

λ0 = A G (Gᵀ A G)⁻¹ c    (1.40)

where A is a symmetric positive definite matrix that is usually taken to be the identity, but can also be defined as the preconditioner, to be introduced later, or some scaling matrix. For details, see [62].


Then this λ0 satisfies the admissibility constraint of equation 1.36 and Gᵀ(λ − λ0) = 0. So, if P is any projector onto the kernel of Gᵀ, then from equation 1.39 we have that λ is the solution of the following projected problem

Pᵀ F P (λ − λ0) = Pᵀ (d − F λ0)    (1.41)

The FETI method solves the projected condensed interface problem 1.41 iteratively via a conjugate gradient algorithm, using the orthogonal projector onto the kernel of Gᵀ.

Interpretation of projector P

The orthogonal projection onto the kernel of Gᵀ can be written algebraically

P = I − A G (Gᵀ A G)⁻¹ Gᵀ    (1.42)

To compute the projection of a given vector g we mainly solve the problem

(Gᵀ A G) α = −Gᵀ g    (1.43)

which is a global coarse grid problem whose unknowns are the coefficients of zero energy components of the solutions in the floating subdomains.
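The projection can be sketched in a few lines of NumPy (with a random full-rank G standing in for the traces of the kernel fields, and A taken as the identity): applying P amounts to solving the coarse problem 1.43 and adding the correction Gα.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 8, 3                      # interface size, number of kernel modes (toy)
G = rng.standard_normal((m, k))  # stands in for the traces B(s) t(s) R(s)
A = np.eye(m)                    # usual choice A = I

def project(g):
    """P g = g + A G alpha with (G^T A G) alpha = -G^T g (equations 1.42-1.43)."""
    alpha = np.linalg.solve(G.T @ A @ G, -G.T @ g)  # global coarse grid problem
    return g + A @ (G @ alpha)

g = rng.standard_normal(m)
Pg = project(g)
assert np.allclose(G.T @ Pg, 0.0)    # the projected gradient satisfies (1.48)
assert np.allclose(project(Pg), Pg)  # P is a projector
```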

Now, for a given approximation λp of the flux on the interface, the residual of the condensed interface problem is

gp = ∑_s B(s) Sbb(s)+ B(s)ᵀ λp + ∑_s B(s) t(s) K(s)+ f(s) = ∑_s B(s) t(s) x(s)p+    (1.44)

where x(s)p+ is the solution of the local Neumann problems, computed using the pseudo-inverse matrices

x(s)p+ = K(s)+ (f(s) + t(s)ᵀ B(s)ᵀ λp)    (1.45)

so the gradient is equal to the jump of these particular solutions. From equations


1.42 and 1.43, the projected gradient P gp is

P gp = gp + A G αp = ∑_s B(s) t(s) x(s)p+ + ∑_s B(s) t(s) R(s) α(s)p    (1.46)

So the projected gradient P gp is equal to the jump of the local particular solutions of the Neumann problems x(s)p+ plus the term of zero energy fields with coefficients α(s)p

x(s)p = x(s)p+ + R(s) α(s)p    (1.47)

The definition of the constraint 1.36 associated with the orthogonal projector 1.43 entails that the zero energy components of the jump of the local solution fields x(s)p are minimal, in the sense that this jump is orthogonal to all the traces of zero energy fields

Gᵀ P gp = 0  ⇔  (B(s) t(s) R(s))ᵀ P gp = 0   ∀s    (1.48)

Computing the projected gradient P gp consists in fact in computing the coefficients αp of optimal local zero energy fields. For the linear elasticity problem, the zero energy fields are the rigid body motions. The underlying process is a kind of coarse grid smoothing of the approximate solution that ensures a convergence rate for the overall FETI process asymptotically independent of the number of subdomains. Hence, the FETI method with floating subdomains is a kind of two-level solver that is numerically scalable.

1.1.3 Local preconditioner

The coarse grid smoothing performed by the zero energy fields projector P gives a convergence rate independent of the number of subdomains, but this is not enough to have a convergence rate that is also independent of the mesh size. The use of a preconditioner is mandatory, which for the FETI method is one of the "Dirichlet" type.

Consider t(s) the trace or restriction operator on the local interface of subdomain Ω(s). Then the contribution of subdomain s to the condensed interface operator, defined in 1.37, is

B(s) t(s) K(s)+ t(s)ᵀ B(s)ᵀ    (1.49)

and it just depends on the restriction to the interface Γ(s) = ∂Ω(s) of the pseudo-inverse of K(s), meaning

t(s) K(s)+ t(s)ᵀ = (Kbb(s) − Kbi(s) Kii(s)⁻¹ Kib(s))⁺ =: Sbb(s)+    (1.50)

As this matrix is the pseudo-inverse of the Schur complement matrix on the interface Γ(s), a preconditioner based on local contributions for FETI is

D⁻¹ = ∑_s B(s) Sbb(s) B(s)ᵀ    (1.51)

where again Sbb(s) is the Schur complement. This preconditioner approximates the pseudo-inverse of the sum of local contributions by the sum of the local inverses, meaning

( ∑_s B(s) Sbb(s)+ B(s)ᵀ )⁺ ≈ ∑_s B(s) Sbb(s) B(s)ᵀ    (1.52)

The computation of this preconditioner applied to an interface vector w is done by solving the following local problem, with Dirichlet boundary conditions on the interface Γ(s) defined by the assembled local vector t(s)ᵀ B(s)ᵀ w

[ Kii(s)  Kib(s) ] [ w̃i(s) ]                   [ 0     ]
[ 0       I      ] [ w̃b(s) ] = t(s)ᵀ B(s)ᵀ w = [ wb(s) ]    (1.53)

whose solution is

w̃i(s) = −Kii(s)⁻¹ Kib(s) wb(s)    (1.54)

and then multiplying by the stiffness matrix to obtain

[ Kii(s)  Kib(s) ] [ w̃i(s) ]   [ 0                                      ]   [ 0            ]
[ Kbi(s)  Kbb(s) ] [ w̃b(s) ] = [ (Kbb(s) − Kbi(s) Kii(s)⁻¹ Kib(s)) w̃b(s) ] = [ Sbb(s) wb(s) ]    (1.55)
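A sketch of how this is applied in practice: Sbb(s) wb is never formed explicitly; instead the local Dirichlet problem 1.53-1.54 is solved and the result multiplied by the stiffness matrix as in 1.55. The toy local matrix and the helper `apply_S` are illustrative, not thesis code.

```python
import numpy as np

# toy local matrix: 1D Laplacian, one interface node (the last one)
n, nb = 5, 1
K = 2.0 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
Kii, Kib = K[:-nb, :-nb], K[:-nb, -nb:]
Kbi, Kbb = K[-nb:, :-nb], K[-nb:, -nb:]

def apply_S(wb):
    """Sbb wb without forming Sbb: Dirichlet solve (1.54), then multiply (1.55)."""
    wi = np.linalg.solve(Kii, -Kib @ wb)   # discrete harmonic extension
    return Kbi @ wi + Kbb @ wb

wb = np.array([1.0])
S_explicit = Kbb - Kbi @ np.linalg.solve(Kii, Kib)
assert np.allclose(apply_S(wb), S_explicit @ wb)
```

The cost per application is one local Dirichlet solve, which is why the lumped variant below trades accuracy for a cheaper matrix-vector product.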


With this preconditioner it has been proved [53], [29] that the FETI method is asymptotically independent of the mesh size, with a condition number for the projected condensed operator bounded by

C (1 + log(H/h))²    (1.56)

where h is the mesh size and H is the characteristic subdomain size. This means that decreasing H increases the number of subdomains, and decreasing h refines the finite element mesh (more elements).

So, with both the local and the global preconditioners, the number of iterations no longer depends on either the number of subdomains or the mesh size. A second preconditioner, not mathematically optimal, can be introduced as a cheaper alternative to the implementation cost of the Dirichlet one: the so-called "lumped" preconditioner, defined as

L⁻¹ = ∑_s B(s) Kbb(s) B(s)ᵀ    (1.57)

where Kbb(s) is the finite element discretization matrix on the interface nodes. This preconditioner works as an approximation of the local Schur complements, with a much more economical implementation because it does not require any additional storage and involves only matrix-vector products of sizes equal to the subdomain interfaces.

Both preconditioners were generalized to treat heterogeneous problems [61] by just redefining the Boolean operator B(s) to a more general one

B̃(s) := β(s) B(s)    (1.58)

such that ∑_s B̃(s) B̃(s)ᵀ = I, with β(s) a diagonal scaling matrix, usually based on the diagonal coefficients of the local stiffness matrix on the interface, the so-called super-lumped scaling. This scaling is a mechanically consistent combination of the interface reaction forces from the Dirichlet problem in each subdomain. With these new scaled assembling operators, the preconditioners are written as

D⁻¹ = ∑_s B̃(s) Sbb(s) B̃(s)ᵀ    (1.59)

L⁻¹ = ∑_s B̃(s) Kbb(s) B̃(s)ᵀ    (1.60)

The behaviour of both preconditioners can be seen in [29], [61] and will be discussed in later sections.

1.1.4 FETI Algorithms

Using the definitions in 1.38 and the projector in 1.42, the CG algorithm to solve the FETI problem 1.41 can be summarized in Algorithm 5.

Algorithm 5 FETI preconditioned conjugate projected gradient with full reconjugation
1: Initialization
2: λ0 = A G [Gᵀ A G]⁻¹ c
3: g0 = F λ0 − d
4: w0 = P D⁻¹ Pᵀ g0
5: loop p = 0, 1, 2, ... until convergence
6:   ρp = −(wp, gp) / (wp, F wp)
7:   λp+1 = λp + ρp wp
8:   gp+1 = gp + ρp F wp
9:   wp+1 = P D⁻¹ Pᵀ gp+1
10:  for i = 0 to p do
11:    γi = −(wi, F wp+1) / (wi, F wi)
12:    wp+1 = wp+1 + γi wi
13:    F wp+1 = F wp+1 + γi F wi
14:  end for
15: end loop

As we can see, a full reconjugation is used instead of the classical CG update (see line 10 of the algorithm), because in finite arithmetic the orthogonality properties of the CG method are lost, especially in the context of the FETI methods, where the multiplication by the operator is not totally accurate, making this part crucial for a good convergence rate [66].
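A minimal sketch of the full reconjugation idea of Algorithm 5, with the projector P and preconditioner D⁻¹ taken as the identity for brevity: every new direction is explicitly F-orthogonalized against all stored ones before the line search. The function name and stopping criterion are illustrative.

```python
import numpy as np

def pcg_full_reconjugation(F, d, tol=1e-10, maxit=100):
    """CG on F lam = d with full reconjugation (lines 10-14 of Algorithm 5)."""
    lam = np.zeros_like(d)
    g = F @ lam - d                      # gradient/residual, g = F lam - d
    W, FW = [], []                       # stored directions and F * directions
    while np.linalg.norm(g) > tol * np.linalg.norm(d) and len(W) < maxit:
        w, Fw = g.copy(), F @ g
        for wi, Fwi in zip(W, FW):       # full reconjugation against all wi
            gamma = -(wi @ Fw) / (wi @ Fwi)
            w += gamma * wi
            Fw += gamma * Fwi
        rho = -(w @ g) / (w @ Fw)        # line search
        lam += rho * w
        g += rho * Fw
        W.append(w); FW.append(Fw)
    return lam

rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
F = M @ M.T + 6 * np.eye(6)              # small SPD stand-in for the FETI operator
d = rng.standard_normal(6)
lam = pcg_full_reconjugation(F, d)
assert np.allclose(F @ lam, d)
```

The price is the storage of all previous directions and their products by F, which is usually acceptable given the low iteration counts of the preconditioned method.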

Algorithm 6 FETI-1LM unsymmetric
1: Initialization
2: λ0 = A G [Gᵀ A G]⁻¹ (−Rᵀ c)
3: g0 = P D⁻¹ P (F λ0 − d)
4: w0 = g0
5: F w0 = P D⁻¹ P F w0
6: loop ORTHODIR iteration from p = 0, 1, ... until convergence
7:   ρp = −(F wp)ᵀ gp / ((F wp)ᵀ (F wp))
8:   λp+1 = λp + ρp wp
9:   gp+1 = gp + ρp F wp
10:  loop construction of the (p+1)-th vector of the FᵀF-orthonormal basis
11:    wp+1 = gp+1
12:    F wp+1 = P D⁻¹ P F wp+1
13:    for i = 0 to p do
14:      γi = −(F wi)ᵀ (F wp+1) / ((F wi)ᵀ (F wi))
15:      wp+1 = wp+1 + γi wi
16:      F wp+1 = F wp+1 + γi F wi
17:    end for
18:  end loop
19: end loop
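Similarly, the ORTHODIR iteration of Algorithm 6 can be sketched for a generic unsymmetric system (again with P and D⁻¹ taken as the identity): the directions are kept FᵀF-orthogonal, which is why both wi and F wi are stored. The helper name and the toy operator are illustrative assumptions.

```python
import numpy as np

def orthodir(F, d, tol=1e-10, maxit=100):
    """ORTHODIR on F lam = d: minimal residual with F^T F-orthogonal directions."""
    lam = np.zeros_like(d)
    g = F @ lam - d
    W, FW = [], []
    while np.linalg.norm(g) > tol * np.linalg.norm(d) and len(W) < maxit:
        w, Fw = g.copy(), F @ g
        for wi, Fwi in zip(W, FW):       # F^T F-orthogonalization (lines 13-17)
            gamma = -(Fwi @ Fw) / (Fwi @ Fwi)
            w += gamma * wi
            Fw += gamma * Fwi
        rho = -(Fw @ g) / (Fw @ Fw)      # minimizes the residual norm
        lam += rho * w
        g += rho * Fw
        W.append(w); FW.append(Fw)
    return lam

rng = np.random.default_rng(3)
F = rng.standard_normal((6, 6)) + 6 * np.eye(6)   # unsymmetric, well conditioned
d = rng.standard_normal(6)
lam = orthodir(F, d)
assert np.allclose(F @ lam, d)
```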

Unsymmetric FETI algorithm

The previous method and algorithm only work when the matrix F is symmetric positive definite. In the more general case, when the FETI operator F is no longer symmetric or positive definite, as we will see in later sections, the CG algorithm is no longer appropriate; we use instead the ORTHODIR method with left preconditioner for unsymmetric matrices, mainly for its simplicity of implementation and good properties, as it is equivalent to the GMRES algorithm [50, Chapter 12]. One of the main theoretical differences between the two is the storage of previously computed directions which, in any case, is done to perform

