
HAL Id: hal-00764318

https://hal.inria.fr/hal-00764318

Submitted on 12 Dec 2012


MULTIPLE-GRADIENT DESCENT ALGORITHM FOR MULTIOBJECTIVE OPTIMIZATION

Jean-Antoine Désidéri

To cite this version:

Jean-Antoine Désidéri. MULTIPLE-GRADIENT DESCENT ALGORITHM FOR MULTIOBJECTIVE OPTIMIZATION. European Congress on Computational Methods in Applied Sciences and Engineering (ECCOMAS 2012), Institute for Mechanics of Materials and Structures, Sep 2012, Vienna, Austria. hal-00764318


Vienna, Austria, September 10-14, 2012

MULTIPLE-GRADIENT DESCENT ALGORITHM FOR MULTIOBJECTIVE OPTIMIZATION

Jean-Antoine Désidéri¹

¹ INRIA Centre de Sophia Antipolis Méditerranée
2004 Route des Lucioles, BP 93, F-06902 Sophia Antipolis Cedex (France)
e-mail: Jean-Antoine.Desideri@inria.fr

Keywords: multiobjective optimization, multidisciplinary, cooperative algorithms, Pareto optimality

Abstract. The steepest-descent method is a well-known and effective single-objective descent algorithm when the gradient of the objective function is known. Here, we propose a particular generalization of this method to multi-objective optimization by considering the concurrent minimization of $n$ smooth criteria $\{J_i\}$ $(i = 1, \dots, n)$. The novel algorithm is based on the following observation: given a finite set of vectors $\{u_i\}$ ($u_i \in \mathbb{R}^N$, $n \le N$), there exists in the convex hull of this family a unique element of minimal norm, say $\omega \in \mathbb{R}^N$; the scalar product of $\omega$ with any vector in the convex hull, and in particular with any $u_i$, is then at least equal to $\|\omega\|^2 \ge 0$. Applying this to the objective-function gradients ($u_i = \nabla J_i$), we conclude that either $\omega = 0$, and the current design point belongs to the Pareto set, or $-\omega$ is a descent direction common to all objective functions. We propose to construct a fixed-point iteration in which updates of the element $\omega$ are used as successive directions of search. This method converges to a point on the Pareto set. The result applies to both finite-dimensional and functional design spaces; numerical illustrations have been provided in both cases, using either analytical objective functions or (discretized) functionals [9] [5]. Here, following [6], a domain-decomposition framework is used to illustrate the necessity, in a (discretized) functional setting, of scaling the gradients appropriately.


1 INTRODUCTION

Classically, in multi-objective optimization, several fundamental concepts are introduced: dominance in efficiency between design points; the Pareto set, made of the non-dominated solutions in design space; and the Pareto front, its image in function space [8]. The Pareto front provides the designer with the system's maximum attainable performance. For complex systems, in particular those governed by partial differential equations, a computational challenge is to devise algorithms permitting the numerical identification of the Pareto set, or of its most useful portions. In this respect, certain evolutionary strategies have been adapted to achieve this goal, and appear to provide the most robust algorithms; NSGA-II [2] is certainly one of the most widely used methods for this purpose.

In the context of differentiable optimization, one would expect adequate strategies based on gradient evaluations to also be capable of capturing Pareto fronts, with less generality or robustness, but often at far greater cost efficiency. However, classical techniques, such as minimizing an agglomerated criterion, or one criterion at a time under constraints on the others, are limited by hypotheses on the pattern of the Pareto front w.r.t. convexity and/or continuity. The Multiple-Gradient Descent Algorithm (MGDA), originally introduced in [3] and formalized again in [5], is based on a very general principle permitting the definition, at each iteration, of a descent direction common to all criteria. This direction is the support of the minimum-norm element in the convex hull of the local gradients. The efficacy of the algorithm in identifying the Pareto front has been demonstrated in [9] [5] on a test-case in which the Pareto front was non-convex. The method was compared in cost efficiency with an evolutionary strategy, and was found to offer very promising performance.

More recently, a variant, MGDA-II, has been proposed in which the descent direction is calculated by a direct procedure [4], which provides a valuable simplification of implementation.

Here, MGDA is tested in the fairly different context of a simulation by domain partitioning, as a technique to match the different interface components concurrently. For this, the very simple test-case of the finite-difference discretization of the Dirichlet problem over a square is considered. Full details are provided in [6]. The study aims at assessing the performance of MGDA in a discretized functional setting. One of its main teachings is the necessity, here found imperative, of scaling the gradients appropriately.

2 DIRICHLET PROBLEM, DOMAIN PARTITIONING AND MATCHING DEFECTS

We consider the model problem consisting in solving Laplace's equation,

$$-\Delta u = f \qquad (\Omega) \tag{1}$$

over the square

$$\Omega = [-1, 1] \times [-1, 1] \tag{2}$$

subject to homogeneous boundary conditions:

$$u = 0 \qquad (\Gamma = \partial\Omega) \tag{3}$$

For this, the domain $\Omega$ is partitioned into four disjoint sub-domains:

$$\Omega_1 = [0, 1] \times [0, 1], \quad \Omega_2 = [-1, 0] \times [0, 1], \quad \Omega_3 = [-1, 0] \times [-1, 0], \quad \Omega_4 = [0, 1] \times [-1, 0] \tag{4}$$


with the following interfaces supported by the coordinate axes:

$$\gamma_1 = \{\, 0 \le x \le 1;\ y = 0 \,\}, \quad \gamma_2 = \{\, x = 0;\ 0 \le y \le 1 \,\}, \quad \gamma_3 = \{\, -1 \le x \le 0;\ y = 0 \,\}, \quad \gamma_4 = \{\, x = 0;\ -1 \le y \le 0 \,\} \tag{5}$$

along which the following Dirichlet controls are applied:

$$\gamma_1:\ u = v_1(x); \quad \gamma_2:\ u = v_2(y); \quad \gamma_3:\ u = v_3(x); \quad \gamma_4:\ u = v_4(y) \tag{6}$$

(see Fig. 1).

Figure 1: Partition of a square in sub-domains $\{\Omega_i\}_{(i=1,\dots,4)}$ to solve the Dirichlet problem

A first calculation of the compound solution

$$u = (u_1, u_2, u_3, u_4)^t \tag{7}$$

(where the superscript $t$ stands for transposition) is made based on a certain setting of the interface controls:

$$v = (v_1, v_2, v_3, v_4)^t \tag{8}$$

by solving, possibly in parallel, the partial problems for $i = 1, \dots, 4$:

$$\begin{cases} -\Delta u_i = f & (\Omega_i) \\ u_i = 0 & (\Gamma \cap \partial\Omega_i) \\ u_i = v_i & (\gamma_i) \\ u_i = v_{i+1} & (\gamma_{i+1}) \end{cases} \tag{9}$$

In the above, and throughout thereafter, the index $i$ is understood modulo 4 by periodicity, i.e. $u_5 = u_1$, $\gamma_5 = \gamma_1$, etc.

Since the interface controls are of Dirichlet type, the resulting compound solution $u$ is continuous, and its derivative along each interface is also continuous. However, in general, unless the specified controls $v_i$ are equal to the restrictions of the global solution, the normal derivatives exhibit jump discontinuities, $s_i$. Here, each interface is supported by a coordinate axis, and we adopt the following sign convention: on the interface $\gamma_i$, which is supported by the $x$ (resp. $y$) axis for $i = 1$ and 3 (resp. 2 and 4), the jump $s_i(x)$ (resp. $s_i(y)$) is defined as the increment of the partial derivative $\partial u/\partial y$ (resp. $\partial u/\partial x$) as $y$ (resp. $x$) goes from $0^-$ to $0^+$. Thus:

- over $\gamma_1$ $(0 \le x \le 1;\ y = 0)$: $s_1(x) = \dfrac{\partial u}{\partial y}(x, 0^+) - \dfrac{\partial u}{\partial y}(x, 0^-) = \left( \dfrac{\partial u_1}{\partial y} - \dfrac{\partial u_4}{\partial y} \right)(x, 0)$;
- over $\gamma_2$ $(x = 0;\ 0 \le y \le 1)$: $s_2(y) = \dfrac{\partial u}{\partial x}(0^+, y) - \dfrac{\partial u}{\partial x}(0^-, y) = \left( \dfrac{\partial u_1}{\partial x} - \dfrac{\partial u_2}{\partial x} \right)(0, y)$;
- over $\gamma_3$ $(-1 \le x \le 0;\ y = 0)$: $s_3(x) = \dfrac{\partial u}{\partial y}(x, 0^+) - \dfrac{\partial u}{\partial y}(x, 0^-) = \left( \dfrac{\partial u_2}{\partial y} - \dfrac{\partial u_3}{\partial y} \right)(x, 0)$;
- over $\gamma_4$ $(x = 0;\ -1 \le y \le 0)$: $s_4(y) = \dfrac{\partial u}{\partial x}(0^+, y) - \dfrac{\partial u}{\partial x}(0^-, y) = \left( \dfrac{\partial u_4}{\partial x} - \dfrac{\partial u_3}{\partial x} \right)(0, y)$.

The above local measures of the defect in the matching conditions can be associated with global measures, defined as separate functionals over the interfaces:

$$J_i = \int_{\gamma_i} \tfrac{1}{2}\, s_i^2\, w \, d\gamma_i \tag{10}$$

that is, explicitly:

$$J_1 = \int_0^1 \tfrac{1}{2}\, s_1(x)^2\, w(x)\, dx, \qquad J_2 = \int_0^1 \tfrac{1}{2}\, s_2(y)^2\, w(y)\, dy \tag{11}$$

$$J_3 = \int_{-1}^0 \tfrac{1}{2}\, s_3(x)^2\, w(x)\, dx, \qquad J_4 = \int_{-1}^0 \tfrac{1}{2}\, s_4(y)^2\, w(y)\, dy \tag{12}$$

Here, $w(t)$ $(t \in [0, 1])$ is an optional weighting function, with $w(-t) = w(t)$.

The jump $s_i$ depends on the partial solutions $u_{i-1}$ and $u_i$, which themselves depend on $(v_{i-1}, v_i)$ and $(v_i, v_{i+1})$ respectively. Hence the integral $J_i$ depends on all four sub-controls except $v_{i+2}$. Nevertheless, these four integrals are thereafter considered as functionals of $v$.


The coordination problem is to devise a convergent iteration on the control $v$ satisfying, in the limit, the matching conditions

$$J_1 = J_2 = J_3 = J_4 = 0 \tag{13}$$

To achieve this, the functional gradients are first established using the classical adjoint-equation approach; then several strategies are proposed and tested numerically.

3 ADJOINT PROBLEMS AND FUNCTIONAL GRADIENTS

A first calculation is made based on the four-component control $v = (v_1, v_2, v_3, v_4)^t$, resulting in the compound solution $u = (u_1, u_2, u_3, u_4)^t$ and the multi-component criterion $J = (J_1, J_2, J_3, J_4)^t$.

Then, one perturbs the control $v$ by

$$v' = (v_1', v_2', v_3', v_4')^t = \big( \delta v_1(x),\, \delta v_2(y),\, \delta v_3(x),\, \delta v_4(y) \big)^t \tag{14}$$

Consequently, the compound solution $u$ is perturbed by

$$u' = (u_1', u_2', u_3', u_4')^t = \big( \delta u_1(x, y),\, \delta u_2(x, y),\, \delta u_3(x, y),\, \delta u_4(x, y) \big)^t \tag{15}$$

in which $u_i'$ is the solution of the following linearized system posed on the sub-domain $\Omega_i$:

$$\begin{cases} \Delta u_i' = 0 & (\Omega_i) \\ u_i' = 0 & (\Gamma \cap \partial\Omega_i) \\ u_i' = v_i' & (\gamma_i) \\ u_i' = v_{i+1}' & (\gamma_{i+1}) \end{cases} \tag{16}$$

These state perturbations induce the following functional perturbations:

$$J_i' = \delta J_i = \int_{\gamma_i} s_i\, s_i'\, w \, d\gamma_i \tag{17}$$

in which $s_i' = \delta s_i$. But:

$$\begin{cases} s_1 s_1' = \left( \dfrac{\partial u_1}{\partial y} - \dfrac{\partial u_4}{\partial y} \right) \left( \dfrac{\partial u_1'}{\partial y} - \dfrac{\partial u_4'}{\partial y} \right)(x, 0) \\[1.5ex] s_2 s_2' = \left( \dfrac{\partial u_1}{\partial x} - \dfrac{\partial u_2}{\partial x} \right) \left( \dfrac{\partial u_1'}{\partial x} - \dfrac{\partial u_2'}{\partial x} \right)(0, y) \\[1.5ex] s_3 s_3' = \left( \dfrac{\partial u_2}{\partial y} - \dfrac{\partial u_3}{\partial y} \right) \left( \dfrac{\partial u_2'}{\partial y} - \dfrac{\partial u_3'}{\partial y} \right)(x, 0) \\[1.5ex] s_4 s_4' = \left( \dfrac{\partial u_4}{\partial x} - \dfrac{\partial u_3}{\partial x} \right) \left( \dfrac{\partial u_4'}{\partial x} - \dfrac{\partial u_3'}{\partial x} \right)(0, y) \end{cases} \tag{18}$$

We now recall Green's formula for two functions $\varphi, \psi \in H^2(\varpi)$, where $\varpi$ is a simply-connected planar domain with smooth enough boundary. Since

$$\iint_\varpi \varphi\, \Delta\psi = \iint_\varpi \varphi\, \nabla \cdot (\nabla\psi) = \int_{\partial\varpi} \varphi\, \psi_n - \iint_\varpi \nabla\varphi \cdot \nabla\psi \tag{19}$$


where $\vec{n}$ is the outward unit vector normal to the boundary $\partial\varpi$ and $\psi_n = \partial\psi/\partial n$ is the normal derivative, and symmetrically,

$$\iint_\varpi \psi\, \Delta\varphi = \iint_\varpi \psi\, \nabla \cdot (\nabla\varphi) = \int_{\partial\varpi} \psi\, \varphi_n - \iint_\varpi \nabla\psi \cdot \nabla\varphi \tag{20}$$

where $\varphi_n = \partial\varphi/\partial n$ is the normal derivative of $\varphi$, one has:

$$\iint_\varpi (\varphi\, \Delta\psi - \psi\, \Delta\varphi) = \int_{\partial\varpi} (\varphi\, \psi_n - \psi\, \varphi_n) \tag{21}$$

Consider the following eight adjoint systems (two per sub-domain):

$$\begin{cases} \Delta p_i = 0 & (\Omega_i) \\ p_i = 0 & (\partial\Omega_i \setminus \gamma_i) \\ p_i = s_i w & (\gamma_i) \end{cases} \qquad\qquad \begin{cases} \Delta q_i = 0 & (\Omega_i) \\ q_i = 0 & (\partial\Omega_i \setminus \gamma_{i+1}) \\ q_i = s_{i+1} w & (\gamma_{i+1}) \end{cases} \tag{22}$$

Then apply Green's formula (21) to the eight cases corresponding to

$$\varpi = \Omega_i \ (i = 1, 2, 3, 4), \qquad \varphi = p_i \text{ or } q_i, \qquad \psi = u_i' \tag{23}$$

so that $\Delta\varphi = \Delta\psi = 0$ in $\varpi$, giving:

$$\int_{\partial\Omega_i} \varphi\, \psi_n = \int_{\partial\Omega_i} \psi\, \varphi_n \tag{24}$$

On the boundary $\partial\varpi = \partial\Omega_i$:

- $\varphi = 0$, except that $\varphi = p_i = s_i w$ along $\gamma_i$ and $\varphi = q_i = s_{i+1} w$ along $\gamma_{i+1}$;
- $\psi = u_i' = v_i'$ along $\gamma_i$, $v_{i+1}'$ along $\gamma_{i+1}$, and 0 along $\Gamma \cap \partial\Omega_i$.

Hence (24) reduces to:

$$\int_{\gamma_i} s_i\, u_{i,n}'\, w = \int_{\gamma_i} p_{i,n}\, v_i' + \int_{\gamma_{i+1}} p_{i,n}\, v_{i+1}' \tag{25}$$

for $\varphi = p_i$, and to:

$$\int_{\gamma_{i+1}} s_{i+1}\, u_{i,n}'\, w = \int_{\gamma_i} q_{i,n}\, v_i' + \int_{\gamma_{i+1}} q_{i,n}\, v_{i+1}' \tag{26}$$

for $\varphi = q_i$.

These two equations are particularized to the sub-domains $\Omega_i$ as follows.

Sub-domain $\Omega_1$: on $\gamma_1$, $\vec{n} = \vec{n}_{14} = -\vec{\jmath}$; on $\gamma_2$, $\vec{n} = \vec{n}_{12} = -\vec{\imath}$. Thus (25)-(26) write:

$$\begin{cases} \displaystyle \int_0^1 s_1(x) \left( -\frac{\partial u_1'}{\partial y}(x, 0) \right) w(x)\, dx = \int_0^1 \left( -\frac{\partial p_1}{\partial y}(x, 0) \right) v_1'(x)\, dx + \int_0^1 \left( -\frac{\partial p_1}{\partial x}(0, y) \right) v_2'(y)\, dy \\[2ex] \displaystyle \int_0^1 s_2(y) \left( -\frac{\partial u_1'}{\partial x}(0, y) \right) w(y)\, dy = \int_0^1 \left( -\frac{\partial q_1}{\partial y}(x, 0) \right) v_1'(x)\, dx + \int_0^1 \left( -\frac{\partial q_1}{\partial x}(0, y) \right) v_2'(y)\, dy \end{cases} \tag{27}$$


Sub-domain $\Omega_2$: on $\gamma_2$, $\vec{n} = \vec{n}_{21} = +\vec{\imath}$; on $\gamma_3$, $\vec{n} = \vec{n}_{23} = -\vec{\jmath}$. Thus (25)-(26) write:

$$\begin{cases} \displaystyle \int_0^1 s_2(y) \left( \frac{\partial u_2'}{\partial x}(0, y) \right) w(y)\, dy = \int_0^1 \left( \frac{\partial p_2}{\partial x}(0, y) \right) v_2'(y)\, dy + \int_{-1}^0 \left( -\frac{\partial p_2}{\partial y}(x, 0) \right) v_3'(x)\, dx \\[2ex] \displaystyle \int_{-1}^0 s_3(x) \left( -\frac{\partial u_2'}{\partial y}(x, 0) \right) w(x)\, dx = \int_0^1 \left( \frac{\partial q_2}{\partial x}(0, y) \right) v_2'(y)\, dy + \int_{-1}^0 \left( -\frac{\partial q_2}{\partial y}(x, 0) \right) v_3'(x)\, dx \end{cases} \tag{28}$$

Sub-domain $\Omega_3$: on $\gamma_3$, $\vec{n} = \vec{n}_{32} = +\vec{\jmath}$; on $\gamma_4$, $\vec{n} = \vec{n}_{34} = +\vec{\imath}$. Thus (25)-(26) write:

$$\begin{cases} \displaystyle \int_{-1}^0 s_3(x) \left( \frac{\partial u_3'}{\partial y}(x, 0) \right) w(x)\, dx = \int_{-1}^0 \left( \frac{\partial p_3}{\partial y}(x, 0) \right) v_3'(x)\, dx + \int_{-1}^0 \left( \frac{\partial p_3}{\partial x}(0, y) \right) v_4'(y)\, dy \\[2ex] \displaystyle \int_{-1}^0 s_4(y) \left( \frac{\partial u_3'}{\partial x}(0, y) \right) w(y)\, dy = \int_{-1}^0 \left( \frac{\partial q_3}{\partial y}(x, 0) \right) v_3'(x)\, dx + \int_{-1}^0 \left( \frac{\partial q_3}{\partial x}(0, y) \right) v_4'(y)\, dy \end{cases} \tag{29}$$

Sub-domain $\Omega_4$: on $\gamma_4$, $\vec{n} = \vec{n}_{43} = -\vec{\imath}$; on $\gamma_1$, $\vec{n} = \vec{n}_{41} = +\vec{\jmath}$. Thus (25)-(26) write:

$$\begin{cases} \displaystyle \int_{-1}^0 s_4(y) \left( -\frac{\partial u_4'}{\partial x}(0, y) \right) w(y)\, dy = \int_{-1}^0 \left( -\frac{\partial p_4}{\partial x}(0, y) \right) v_4'(y)\, dy + \int_0^1 \left( \frac{\partial p_4}{\partial y}(x, 0) \right) v_1'(x)\, dx \\[2ex] \displaystyle \int_0^1 s_1(x) \left( \frac{\partial u_4'}{\partial y}(x, 0) \right) w(x)\, dx = \int_{-1}^0 \left( -\frac{\partial q_4}{\partial x}(0, y) \right) v_4'(y)\, dy + \int_0^1 \left( \frac{\partial q_4}{\partial y}(x, 0) \right) v_1'(x)\, dx \end{cases} \tag{30}$$

Then, (27)-(30) are injected into (18) and (17) to get:

$$\begin{cases} J_1' = \displaystyle\int_0^1 s_1(x)\, s_1'(x)\, w(x)\, dx = \int_0^1 s_1(x) \left( \frac{\partial u_1'}{\partial y} - \frac{\partial u_4'}{\partial y} \right)\!(x, 0)\, w(x)\, dx \\ \phantom{J_1'} = \displaystyle\int_0^1 \frac{\partial (p_1 - q_4)}{\partial y}(x, 0)\, v_1'(x)\, dx + \int_0^1 \frac{\partial p_1}{\partial x}(0, y)\, v_2'(y)\, dy + \int_{-1}^0 \frac{\partial q_4}{\partial x}(0, y)\, v_4'(y)\, dy \\[2ex] J_2' = \displaystyle\int_0^1 s_2(y)\, s_2'(y)\, w(y)\, dy = \int_0^1 s_2(y) \left( \frac{\partial u_1'}{\partial x} - \frac{\partial u_2'}{\partial x} \right)\!(0, y)\, w(y)\, dy \\ \phantom{J_2'} = \displaystyle\int_0^1 \frac{\partial q_1}{\partial y}(x, 0)\, v_1'(x)\, dx + \int_0^1 \frac{\partial (q_1 - p_2)}{\partial x}(0, y)\, v_2'(y)\, dy + \int_{-1}^0 \frac{\partial p_2}{\partial y}(x, 0)\, v_3'(x)\, dx \\[2ex] J_3' = \displaystyle\int_{-1}^0 s_3(x)\, s_3'(x)\, w(x)\, dx = \int_{-1}^0 s_3(x) \left( \frac{\partial u_2'}{\partial y} - \frac{\partial u_3'}{\partial y} \right)\!(x, 0)\, w(x)\, dx \\ \phantom{J_3'} = \displaystyle -\int_0^1 \frac{\partial q_2}{\partial x}(0, y)\, v_2'(y)\, dy + \int_{-1}^0 \frac{\partial (q_2 - p_3)}{\partial y}(x, 0)\, v_3'(x)\, dx - \int_{-1}^0 \frac{\partial p_3}{\partial x}(0, y)\, v_4'(y)\, dy \\[2ex] J_4' = \displaystyle\int_{-1}^0 s_4(y)\, s_4'(y)\, w(y)\, dy = \int_{-1}^0 s_4(y) \left( \frac{\partial u_4'}{\partial x} - \frac{\partial u_3'}{\partial x} \right)\!(0, y)\, w(y)\, dy \\ \phantom{J_4'} = \displaystyle -\int_0^1 \frac{\partial p_4}{\partial y}(x, 0)\, v_1'(x)\, dx - \int_{-1}^0 \frac{\partial q_3}{\partial y}(x, 0)\, v_3'(x)\, dx + \int_{-1}^0 \frac{\partial (p_4 - q_3)}{\partial x}(0, y)\, v_4'(y)\, dy \end{cases} \tag{31}$$


These formulas are of the form:

$$J_i' = \sum_{j=1}^4 \int_{\gamma_j} G_{i,j}\, v_j' \, d\gamma_j \qquad (i = 1, \dots, 4) \tag{32}$$

in which the kernels $\{G_{i,j}\}$ are partial gradients given in terms of the partial derivatives of the eight adjoint states $\{p_i, q_i\}_{(i=1,\dots,4)}$.

4 DISCRETIZATION

For the purpose of numerical treatment, we assume that each sub-problem is discretized by standard centered finite differences over a uniform (sub-)mesh of $N_X \times N_Y$ rectangular cells. This permits a fast direct inversion by discrete separation of variables:

$$u_h = (\Omega_X \otimes \Omega_Y)\, (\Lambda_X \oplus \Lambda_Y)^{-1}\, (\Omega_X \otimes \Omega_Y)\, f_h \tag{33}$$

$\Omega_X$ and $\Omega_Y$ are respectively the $N_X \times N_X$ and $N_Y \times N_Y$ orthogonal matrices associated with the discrete sine transform; for Dirichlet boundary conditions, these matrices are also symmetric. The matrices $\Lambda_X$ and $\Lambda_Y$ are diagonal matrices of the known eigenvalues of the second-order difference operators in $x$ and $y$ respectively (see [6] for details).
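As an illustration, the following minimal Python sketch performs this kind of direct inversion with SciPy's discrete sine transform (the helper name `poisson_dirichlet` and the grid conventions are ours, not the paper's):

```python
import numpy as np
from scipy.fft import dstn, idstn

def poisson_dirichlet(f, hx, hy):
    """Solve -Laplacian(u) = f on a rectangle with homogeneous Dirichlet
    boundary conditions, given f at the (nx, ny) interior grid nodes.
    The sine transform diagonalizes the standard 5-point Laplacian,
    which is the direct inversion expressed by Eq. (33)."""
    nx, ny = f.shape
    # Eigenvalues of the 1-D second-difference operator with Dirichlet BCs
    lx = (2.0 - 2.0 * np.cos(np.pi * np.arange(1, nx + 1) / (nx + 1))) / hx**2
    ly = (2.0 - 2.0 * np.cos(np.pi * np.arange(1, ny + 1) / (ny + 1))) / hy**2
    fhat = dstn(f, type=1)                      # forward sine transform
    uhat = fhat / (lx[:, None] + ly[None, :])   # divide by eigenvalues of -Laplacian_h
    return idstn(uhat, type=1)                  # back-transform
```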

For each sub-domain $\Omega_i$, derivatives normal to a given interface are calculated by one-sided second-order finite differences, and tangential derivatives by central differencing.
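Concretely, the one-sided second-order formula reads as follows (a sketch; our convention is that `u0` holds the solution values on the interface grid line, and `u1`, `u2` those on the next two parallel lines, at normal spacing `h`):

```python
def one_sided_normal_derivative(u0, u1, u2, h):
    """Second-order one-sided difference for the derivative normal to an
    interface. The jump s_i is then the difference between the two one-sided
    derivatives computed from the sub-domains on either side."""
    return (-3.0 * u0 + 4.0 * u1 - u2) / (2.0 * h)
```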

The matching-defect integrals are then approximated by the trapezoidal rule:

$$J_1 \doteq \frac{h_X}{2} \sum_{j=1}^{N_X-1} s_{1,j}^2\, w_j, \quad J_2 \doteq \frac{h_Y}{2} \sum_{k=1}^{N_Y-1} s_{2,k}^2\, w_k, \quad J_3 \doteq \frac{h_X}{2} \sum_{j=1}^{N_X-1} s_{3,j}^2\, w_j, \quad J_4 \doteq \frac{h_Y}{2} \sum_{k=1}^{N_Y-1} s_{4,k}^2\, w_k. \tag{34}$$

Two adjoint problems are solved on each sub-domain $\Omega_i$, again by direct inversions, to get the functions $p_i$ and $q_i$, whose derivatives are then approximated along the interfaces and injected into (31). Then [6]:

$$J_i' \doteq \sum_{j=1}^4 \sum_{k=1}^{N_Z} G_{i,j,k}\, v_{j,k}'\, h_Z \qquad (i = 1, \dots, 4) \tag{35}$$

(where $Z$ stands for $X$ or $Y$ according to the orientation of the interface $\gamma_j$). The discrete gradient of the criterion $J_i$ w.r.t. the nodal values of the control $v_j$ is given by:

$$\frac{\partial J_i}{\partial v_{j,k}} \doteq G_{i,j,k}\, h_Z \tag{36}$$

For each criterion $J_i$, four such discrete gradients are calculated (one per control $v_j$), except that one of them is equal to 0. These four vectors are assembled into a single one, thereafter denoted $\nabla J_i$, of dimension $2(N_X + N_Y - 2)$.

Now, knowing (second-order approximations of) the criteria $\{J_i\}_{(i=1,\dots,4)}$ and their gradients $\{\nabla J_i\}_{(i=1,\dots,4)}$ w.r.t. the $2(N_X + N_Y - 2)$ nodal controls, we need to set up a strategy to iterate on these controls so that the matching conditions are satisfied at convergence.

5 GRADIENT-BASED COORDINATION ITERATIONS

Our main objective is to compare the standard steepest-descent method with the Multiple-Gradient Descent Algorithm (MGDA) as potential iterative methods to satisfy the matching conditions (13).


5.1 Conventional steepest-descent method

In the conventional approach, one considers a single matching-defect measure, treating all interfaces as one:

$$J = \sum_{i=1}^4 J_i \tag{37}$$

The discrete gradient is then simply the sum of the individual contributions of the interfaces:

$$\nabla J = \sum_{i=1}^4 \nabla J_i \tag{38}$$

This global criterion can then be driven to 0 by the classical steepest-descent method [1] [7]: at iteration $\ell$, the control $v$ is updated proportionally to (the opposite of) the discrete gradient:

$$v^{(\ell+1)} = v^{(\ell)} - \rho_\ell\, \nabla J^{(\ell)} \tag{39}$$

for some appropriate positive step-size $\rho_\ell$ (see below); then a new compound solution $u^{(\ell+1)}$ is calculated, the defect functional and its gradient are reevaluated, and so on, until satisfactory convergence is achieved.

Strictly speaking, in the standard steepest-descent method, once the direction of search has been identified by the calculation of the gradient $\nabla J^{(\ell)}$, the step-size $\rho_\ell$ is often defined via a one-dimensional minimization:

$$\rho_\ell = \operatorname{Argmin}_\rho\, j(\rho); \qquad j(\rho) := J\big( v^{(\ell)} - \rho\, \nabla J^{(\ell)} \big) \tag{40}$$

This minimization is usually carried out by a numerical procedure. Here, however, an additional piece of information is known: the targeted value of $J$, namely $J = 0$. An estimate of the variation of $J$ is given by the differential:

$$\delta J = \nabla J^{(\ell)} \cdot \delta v^{(\ell)} = -\rho_\ell\, \big\| \nabla J^{(\ell)} \big\|^2 \tag{41}$$

Hence, the step-size expected to diminish $J^{(\ell)}$ by the amount $\delta J = -\varepsilon J^{(\ell)}$ is estimated to be:

$$\rho_\ell = \frac{\varepsilon\, J^{(\ell)}}{\big\| \nabla J^{(\ell)} \big\|^2} \tag{42}$$

In particular, for $\varepsilon = 1$, we get the quasi-Newton method ("quasi" since the employed discrete gradient is only approximately equal to the gradient of the discrete $J$).
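The resulting iteration, (39) with the step-size rule (42), is summarized by the following minimal Python sketch (the user-supplied routine `eval_J_and_grad`, which returns $J$ and $\nabla J$ at a given control vector, is a hypothetical name of ours):

```python
import numpy as np

def steepest_descent(v, eval_J_and_grad, eps=1.0, tol=1e-8, max_iter=200):
    """Steepest descent with the step-size rule of Eq. (42), which exploits
    the known target value J = 0; eps = 1 gives the quasi-Newton variant."""
    for _ in range(max_iter):
        J, g = eval_J_and_grad(v)
        if J < tol:                     # matching conditions satisfied
            break
        rho = eps * J / np.dot(g, g)    # Eq. (42)
        v = v - rho * g                 # Eq. (39)
    return v
```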

5.2 Multiple-gradient descent algorithm (MGDA)

In this subsection, we propose an alternative coordination algorithm in which the matching of the sub-solutions is treated as a multi-objective optimization problem, considering that all the defect functionals $J_i$ should be driven to 0 concurrently.

In the Multiple-Gradient Descent Algorithm, MGDA (see [3] for a detailed definition and convergence proof), once the individual discrete gradients

$$\{\nabla J_i\}_{(i=1,\dots,4)} \tag{43}$$

are known, one considers the convex hull $U$ of these vectors and identifies its minimum-norm element $\omega$:

$$\omega = \operatorname{Argmin}_{u \in U}\, \|u\|^2, \qquad U = \left\{ u \in \mathbb{R}^N \,\middle/\, u = \sum_{i=1}^4 \alpha_i u_i;\ \alpha_i \ge 0\ (\forall i);\ \sum_{i=1}^4 \alpha_i = 1 \right\} \tag{44}$$

In our problem, the dimension $N$ is the number of nodal controls:

$$N = 2(N_X + N_Y - 2) \tag{45}$$

A special parameterization of the convex hull is proposed to facilitate the determination of the element $\omega$ by a numerical optimization procedure (see Appendix A).

Then, once the element $\omega$ has been determined: if $\omega = 0$ (in the numerical implementation, this condition is relaxed to $\|\omega\| < TOL$, for a given tolerance $TOL$), the current iterate is, or is treated as, Pareto stationary. Here, the Pareto front is made of the single point corresponding to $J_i = 0$ for all $i$, so this situation corresponds to full convergence of the coordination algorithm. Otherwise ($\omega \ne 0$), $-\omega$ is a descent direction for all criteria simultaneously. Thus, (39) is replaced by:

$$v^{(\ell+1)} = v^{(\ell)} - \rho_\ell\, \omega^{(\ell)} \tag{46}$$

Here again, we propose to adjust the step-size $\rho_\ell$ according to (42). However, it is not clear a priori that the proper scaling still corresponds to $\varepsilon \sim 1$.
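To make the construction concrete, here is a small Python sketch that computes $\omega$ by solving the quadratic program (44) directly over the simplex with SciPy (an alternative to the parameterization of Appendix A; the helper name `mgda_direction` is ours):

```python
import numpy as np
from scipy.optimize import minimize

def mgda_direction(grads):
    """Minimum-norm element omega of the convex hull of the rows of `grads`
    (Eq. (44)). grads: (n, N) array whose rows are the (possibly scaled)
    gradients."""
    n = grads.shape[0]
    G = grads @ grads.T                          # Gram matrix of scalar products
    cons = ({'type': 'eq', 'fun': lambda a: a.sum() - 1.0},)
    res = minimize(lambda a: a @ G @ a,          # ||sum_i alpha_i u_i||^2
                   np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n, constraints=cons)
    return res.x @ grads                         # omega = sum_i alpha_i u_i
```

By construction, $(u_i, \omega) \ge \|\omega\|^2$ for every gradient, so $-\omega$ is a common descent direction whenever $\omega \ne 0$.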

6 NUMERICAL EXPERIMENTATION

6.1 Test-case

Let $a$ and $b$ be two adjustable constants, and:

$$\begin{cases} \psi = (x + a)^2 + (y + b)^2 \qquad (-1 \le x \le 1)\ (-1 \le y \le 1) \\ \tau = \dfrac{1}{\ln(a^2 + b^2)} \\ \phi = \tau \ln \psi \\ f(x) = 1 - x^2, \quad f(y) = 1 - y^2 \\ u_e = f(x)\, f(y)\, \phi \end{cases} \tag{47}$$

As a result, $\phi$ is a harmonic function:

$$\Delta\phi = 0 \tag{48}$$

and this permits us to simplify somewhat the expression of the Laplacian of $u_e$:

$$\Delta u_e = \Delta\big(f(x)f(y)\big)\, \phi + 2\, \nabla\big(f(x)f(y)\big) \cdot \nabla\phi + f(x)f(y)\, \Delta\phi$$
$$= -2\big(f(x) + f(y)\big)\, \phi + 2\, \nabla\big(f(x)f(y)\big) \cdot \nabla\phi \tag{49}$$
$$= -f_e \tag{50}$$

where

$$f_e = 2\big(f(x) + f(y)\big)\, \phi + \frac{8\tau}{\psi} \Big( x(x + a)\, f(y) + y(y + b)\, f(x) \Big) \tag{51}$$
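The algebra in (49)-(51) is easily checked numerically; the following sketch compares $-\Delta u_e$ (by centered differences) with $f_e$ at one point (the evaluation point, step and tolerance are arbitrary choices of ours):

```python
import numpy as np

a, b = 1.25, 0.75                      # the settings a = 5/4, b = 3/4 of Section 6.1
tau = 1.0 / np.log(a**2 + b**2)

def u_e(x, y):
    psi = (x + a)**2 + (y + b)**2
    return (1 - x**2) * (1 - y**2) * tau * np.log(psi)

def f_e(x, y):
    psi = (x + a)**2 + (y + b)**2
    phi = tau * np.log(psi)
    fx, fy = 1 - x**2, 1 - y**2
    return 2 * (fx + fy) * phi + (8 * tau / psi) * (x * (x + a) * fy
                                                    + y * (y + b) * fx)

h, x, y = 1e-3, 0.3, -0.2              # arbitrary interior point
lap = (u_e(x + h, y) + u_e(x - h, y) + u_e(x, y + h) + u_e(x, y - h)
       - 4 * u_e(x, y)) / h**2
assert abs(-lap - f_e(x, y)) < 1e-4    # second-order agreement
```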


Hence, for $f = f_e$, the exact solution of the continuous problem is $u = u_e$.

The constants $a$ and $b$ have been introduced to destroy the symmetry of the solution. More specifically, the following settings were made: $a = \frac{5}{4}$ and $b = \frac{3}{4}$. The corresponding problem has been discretized and solved using either one domain, to establish a reference, or four, to experiment with the multi-criterion optimization algorithms.

The single-domain discrete solution $u_h$ is depicted in Fig. 2 as a surface in 3D, and the corresponding contour map is given in Fig. 3.

Figure 2: Single-domain discrete solution $u_h$

Figure 3: Single-domain discrete solution $u_h$; contour map

6.2 Quasi-Newton steepest descent

In a first series of numerical experiments, the steepest-descent method was applied to drive the global criterion $J$ to 0. After a few trials, it appeared that the best convergence was achieved by setting $\varepsilon = 1$ in (42), which corresponds to the quasi-Newton method.

Two experiments are reported presently. They differ in the setting of the initial interface conditions. In the first, the $v_i$ are initially assigned the restriction to the corresponding interface of the exact solution $u_e$, which differs from the discrete solution by truncation errors; in this case, iterative errors are initially very small, which permits the asymptotic convergence to be assessed. In the second experiment, the controls are initially set to 0 in order to assess the global convergence.

Asymptotic convergence. The convergence history of the global criterion $J$, as well as of its individual parts $\{J_i\}_{(i=1,\dots,4)}$, is represented in Fig. 4. In 20 iterations, the criterion $J$ goes from $2.9 \times 10^{-2}$ to a level below $10^{-4}$. Note that $J_4$ is somewhat smaller in magnitude and more subject to oscillations.

Global convergence. The convergence history of the global criterion $J$, as well as of its individual parts $\{J_i\}_{(i=1,\dots,4)}$, is represented in Fig. 5. In 200 iterations, the criterion $J$ is reduced by 8 orders of magnitude. The different criteria, apart from small oscillations, converge at essentially the same rate. In a linear convergence process, this rate is imposed by the most persistent mode, present in all criteria when the initial condition is arbitrary.

Convergence of the gradients. The evolution of the four gradients $\{\partial J/\partial v_i\}_{(i=1,\dots,4)}$ over 200 iterations is given in Figs. 6-9. They appear as high-frequency modes. Each one vanishes asymptotically over the corresponding interface.


Figure 4: Quasi-Newton steepest descent - convergence history of criteria (discretized continuous solution imposed initially at interfaces)

Figure 5: Quasi-Newton steepest descent - convergence history of criteria (zero initial interface conditions)

Figure 6: Quasi-Newton steepest descent - 200 iterations of $\partial J/\partial v_1$

Figure 7: Quasi-Newton steepest descent - 200 iterations of $\partial J/\partial v_2$

Figure 8: Quasi-Newton steepest descent - 200 iterations of $\partial J/\partial v_3$

Figure 9: Quasi-Newton steepest descent - 200 iterations of $\partial J/\partial v_4$

Discrete solution. The four-domain discrete solution is found to be perfectly smooth, in fact even smoother than the above single-domain discrete solution (Figs. 10 and 11). This is due to a higher degree of iterative convergence.

6.3 Basic MGDA

Practical determination of the minimum-norm element $\omega$. In the experiments of this section, at each iteration, the three parameters $c_1$, $c_2$ and $c_3$ of (76) were discretized uniformly with a step of 0.01, and $\omega$ was set equal to the vector of minimum norm among the roughly $10^6$ corresponding candidates. The resulting vector was therefore a very coarse approximation of the actual vector $\omega$.


Figure 10: Four-domain discrete solution $u_h$

Figure 11: Four-domain discrete solution $u_h$; contour map

Asymptotic convergence. For this experiment, the discretized continuous solution is again imposed initially at the interfaces. The convergence history of the above basic algorithm is indicated in Fig. 12 for 20 iterations. After some initial adjustment, the trend is towards decay, but at a disappointingly slow rate.

Figure 12: Basic MGDA - asymptotic convergence history of criteria

Figure 13: MGDA based on logarithmic gradients - asymptotic convergence history of criteria

In an attempt to explain this poor convergence, the following observation was made. Suppose the gradients of the individual criteria, $\{\partial J_i/\partial v\}_{(i=1,\dots,4)}$, are very different in magnitude. Remember that in a linear iterative process, unless the initial conditions are very special, all quantities converge at the same rate, say $C\rho^{\,\mathrm{iter}}$, where $\rho$ is the spectral radius and $C$ a constant depending on the quantity considered. For example, in the previous experiment, $J_4$ itself was observed to be somewhat smaller than the other criteria, and so was its gradient. Then the convex-hull minimum-norm element $\omega$ is paradoxically dominated by the gradient of smallest magnitude, since, in the convex combination of the gradients, putting the largest weight on the smallest one has the effect of reducing the norm of the combination. But this is not efficient, since this gradient corresponds to the already-small criterion, for which minimization is the least necessary. This observation has led us to calculate the direction $\omega$ as the minimum-norm element in the convex hull of normalized gradients. Here, the normalization was made by scaling each gradient by the corresponding value of the individual criterion; in other words, logarithmic gradients were considered. In fact, that is exactly what Newton's method does with the global criterion $J$.
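In terms of the hypothetical `mgda_direction` helper sketched in Section 5.2, this variant is a one-line change:

```python
# grads: (4, N) array of gradients; J: (4,) array of current criterion values.
# Scaling each gradient by its criterion ("logarithmic gradients") assumes
# J_i > 0, which holds here away from full convergence.
omega = mgda_direction(grads / J[:, None])
```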

The above experiment was then repeated, using logarithmic gradients to determine the vector $\omega$. The corresponding convergence history is indicated in Fig. 13. This new result is very similar to the analogous result previously achieved by the quasi-Newton method (Fig. 4). The importance of scaling is therefore confirmed.

Global convergence. All interface controls $v_i$ are initially set to 0. The resulting convergence history over 200 iterations is indicated in Fig. 14 for the basic algorithm, and in Fig. 15 for the scaled algorithm based on logarithmic gradients. The first algorithm seems to converge, but at a very slow rate. The second seems subject to accidents and experiences a difficulty in entering the asymptotic convergence phase: the criteria stagnate. This may be caused by several factors, on which current investigation is focused:

- the insufficiently accurate determination of $\omega$;
- the non-optimality of the scaling of the gradients;
- the non-optimality of the step-size, the parameter $\varepsilon$ in (42) being maintained equal to 1 throughout;
- the large dimension of the design space, here 76 (4 interfaces associated with 19 d.o.f.'s each).

Figure 14: Basic MGDA - global convergence

Figure 15: Basic MGDA based on logarithmic gradients - global convergence

6.4 MGDA-II

Recently, a variant, MGDA-II, has been proposed in which the descent direction is calculated by a direct procedure, which provides a valuable simplification of implementation as well as a more accurate definition of the direction of search [4]. This new algorithm is now presented again, along with a new variant (MGDA-II b), and tested on the DDM problem.

Basic definition, scaling. In MGDA-II, the possibility of prescribing scales for the gradients, $\{S_i\}_{(i=1,\dots,n)}$ $(S_i > 0\ \forall i)$, is offered. In the following experiments, at a given iteration, these scales are either all set equal to 1 ("no scaling prescribed") or set equal to the current values of the criteria ("prescribed scaling"):

$$S_i = J_i \qquad (i = 1, \dots, 4); \tag{52}$$

the latter implies that the descent direction is based on logarithmic gradients. These scales being supplied by the user, we define the following "scaled gradients":

$$J_i' = \frac{\nabla J_i}{S_i} \tag{53}$$

Assuming that the gradients form a linearly independent family, an assumption never contradicted in the numerical experiments, a family of orthogonal, but usually not orthonormal, vectors $\{u_i\}_{(i=1,\dots,n)}$ is formed according to the following:

$$u_1 = J_1' \tag{54}$$

and, for $i = 2, 3, \dots, n$:

$$u_i = \frac{J_i' - \sum_{k<i} c_{i,k}\, u_k}{A_i} \tag{55}$$

where

$$\forall k < i: \quad c_{i,k} = \frac{\big( J_i', u_k \big)}{\big( u_k, u_k \big)} \tag{56}$$

and

$$A_i = \begin{cases} 1 - \sum_{k<i} c_{i,k} & \text{if nonzero} \\ \varepsilon_i & \text{otherwise} \end{cases} \tag{57}$$

for some arbitrary, but small, $\varepsilon_i$ $(0 < |\varepsilon_i| \ll 1)$.

The minimum-norm element in the convex hull of the family $\{u_i\}_{(i=1,\dots,n)}$ is given by:

$$\omega = \sum_{i=1}^n \alpha_i u_i \tag{58}$$

and one finds:

$$\alpha_i = \frac{1}{\|u_i\|^2 \sum_{j=1}^n \dfrac{1}{\|u_j\|^2}} = \frac{1}{1 + \sum_{j \ne i} \dfrac{\|u_i\|^2}{\|u_j\|^2}} < 1 \tag{59}$$

which confirms that $\omega$ does belong to the interior of the convex hull, so that:

$$\forall i: \quad \alpha_i\, \|u_i\|^2 = \frac{\lambda}{2} \quad \text{(a given Lagrange multiplier)}, \tag{60}$$

and:

$$\forall k: \quad \big( u_k, \omega \big) = \alpha_k\, \|u_k\|^2 = \frac{\lambda}{2} = \|\omega\|^2. \tag{61}$$

Convening that $\varepsilon_i = 0$ in the regular case ($\sum_{k<i} c_{i,k} \ne 1$), and otherwise modifying slightly the definition of the scaling factor to be

$$S_i' = (1 + \varepsilon_i)\, S_i, \tag{62}$$

and redefining the "scaled gradient" accordingly ($J_i' = \nabla J_i / S_i'$), the following holds:

$$\big( J_i', \omega \big) = \|\omega\|^2 \quad (\forall i), \tag{63}$$

that is, the same positive constant [4].

As a result of this direct, fast and accurate construction, the vector $\omega$ is usually different from that of the former definition, except in special cases, as for example when $n = 2$ and the angle between the two gradients is obtuse. Nevertheless, the new $\omega$ also provides a descent direction common to all criteria, scaled essentially as initially prescribed.
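The direct construction (54)-(59) is summarized by the following Python sketch (the function name and array conventions are ours; the degenerate branch uses a fixed $\varepsilon_i$):

```python
import numpy as np

def mgda2_direction(grads, scales, eps=0.01):
    """MGDA-II descent direction: Gram-Schmidt-like construction of
    Eqs. (54)-(57), then the minimum-norm element of Eqs. (58)-(59).
    grads: (n, N) array of gradients; scales: prescribed S_i > 0."""
    Jp = grads / np.asarray(scales, dtype=float)[:, None]  # J'_i, Eq. (53)
    n = Jp.shape[0]
    U = np.zeros_like(Jp)
    U[0] = Jp[0]                                           # Eq. (54)
    for i in range(1, n):
        c = np.array([Jp[i] @ U[k] / (U[k] @ U[k]) for k in range(i)])  # Eq. (56)
        A = 1.0 - c.sum()                                  # Eq. (57)
        if A == 0.0:                                       # degenerate case
            A = eps
        U[i] = (Jp[i] - c @ U[:i]) / A                     # Eq. (55)
    inv_sq = 1.0 / np.einsum('ij,ij->i', U, U)             # 1 / ||u_i||^2
    alpha = inv_sq / inv_sq.sum()                          # Eq. (59)
    return alpha @ U                                       # omega, Eq. (58)
```

In the regular case, one can check on the output that $(J_i', \omega) = \|\omega\|^2$ for all $i$, as stated in (63).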


Automatic rescaling: MGDA-II b. The examination of the case $n = 2$ has led us to propose a slightly different handling of the scales. Here, one lets

$$A_i = S_i - \sum_{k=1}^{i-1} c_{i,k} \tag{64}$$

only when this number is strictly positive. Otherwise, the scale $S_i$ is redefined ("automatic rescaling") as follows:

$$S_i = \sum_{k=1}^{i-1} c_{i,k} \tag{65}$$

and one sets $A_i = \varepsilon_i S_i$, for some small $\varepsilon_i$.

The result (63) is still valid, and it now provides information on gradients that have been weighted as prescribed whenever $S_i > \sum_{k=1}^{i-1} c_{i,k}$, and otherwise weighted by the procedure itself. This rescaling procedure is certainly perfectible.

Convergence experiments and discussion. MGDA-II has been tested on the partitioning problem in the four possible options corresponding to "no scaling prescribed" vs. "prescribed scaling", and "automatic rescale off" vs. "on". In MGDA-II b, whenever $\sum_{k<i} c_{i,k}$ was found greater than or equal to the prescribed $S_i$ ($= 1$ or $J_i$), $\varepsilon_i$ was set to 0.01 (and maintained at 0 otherwise).

A first observation was made: the new procedure for determining $\omega$ is much faster, and MGDA-II seems to be less sensitive to round-off errors.

In Fig. 16 and Fig. 17, the automatic rescale is off, and the effect of scaling alone is evaluated. Over the first 200 iterations, the results are about the same; however, the scaled version indicates a trend towards convergence acceleration, which remains to be confirmed.

In Fig. 18 and Fig. 19, the automatic rescale is on, and the option of prescribed scaling is off/on respectively. Again, a better convergence is achieved when the scales are prescribed.

In order to confirm these results, the best option, "prescribed scales and automatic rescale", is compared with the basic method in Fig. 20 and Fig. 21 over 500 iterations. The trends indicate a linear convergence for the first method, and a seemingly quadratic convergence for the second. Compared to the quasi-Newton method of Fig. 5, MGDA-II b is, roughly speaking, twice as slow, but it indicates a more definite trend towards asymptotic convergence acceleration.

One last remark: in these experiments, we observe that scaling has the effect of bringing the convergence curves associated with the different criteria closer to one another.

Figure 16: MGDA-II, no scaling prescribed

Figure 17: MGDA-II, prescribed scaling

Figure 18: MGDA-II, no scaling prescribed, automatic rescale

Figure 19: MGDA-II b, prescribed scaling, automatic rescale

Figure 20: MGDA-II, no scaling prescribed, automatic rescale off, 500 iterations

Figure 21: MGDA-II b, prescribed scaling, automatic rescale, 500 iterations

7 CONCLUSION

In this study, various versions of the Multiple-Gradient Descent Algorithm (MGDA) have been tested numerically on a domain-partitioning problem treated as a multi-objective problem in which the matching-defect integrals at the different interfaces are to be minimized concurrently, and in fact all driven to 0.

The major objective of this experimentation was to assess the potential of MGDA to handle multi-objective problems in which the finite-dimensional setting results from a discretization, thus approaching a more general functional setting. In this respect, the demonstration was made: convergence was indeed achieved by MGDA. The quasi-Newton method applied to the agglomerated criterion was globally found more efficient; however, in its most sophisticated version (MGDA-II b), the algorithm seems to demonstrate a promising, asymptotically quadratic convergence.

Thus, if the convergence was not always found satisfactory, several observations should temper this somewhat disappointing conclusion, and many promising directions of improvement can be envisaged:

- in the problem under study, the Pareto set was reduced to the single point corresponding to all criteria equal to 0, associated with the unique solution of the discretized Poisson problem; this situation is really atypical of standard multi-objective problems; additionally, the criteria to be minimized were not really antagonistic, since they all converged at almost the same rate with the quasi-Newton method, leaving little possibility of improvement from the start; for these two reasons, MGDA has been tested in a very straining situation, for which it was not originally devised;


- the large dimension of the design space, here 76 (4 interfaces associated with 19 d.o.f.'s each), was probably a handicap;

- a robust procedure to define the step-size should be devised; in our experiments, the parameter $\varepsilon$ was not optimized but maintained equal to 1 throughout;

- the determination of $\omega$ in the basic method should be made more accurate by iterative refinement;

- the scaling of the gradients was found important; alternatives to the logarithmic gradients should be analyzed and rationalized; more generally, preconditioning remains an open question;

- the MGDA-II variant was found faster and more robust;

- at present, our most sophisticated algorithm, MGDA-II b, also involves an automatic rescaling procedure; it indicates a definite trend towards asymptotic convergence acceleration (quadratic convergence).

REFERENCES

[1] Gradient descent. Wikipedia: The Free Encyclopedia. http://en.wikipedia.org/wiki/Gradient_descent.

[2] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, 2002.

[3] J.-A. Désidéri. Multiple-Gradient Descent Algorithm (MGDA). Research Report 6953, INRIA, 2009. http://hal.inria.fr/inria-00389811.

[4] J.-A. Désidéri. MGDA II: A direct method for calculating a descent direction common to several criteria. Research Report 7422, INRIA, April 2012. http://hal.inria.fr/hal-00685762.

[5] J.-A. Désidéri. Multiple-gradient descent algorithm (MGDA) for multiobjective optimization. Comptes Rendus Mathématique, 350(5-6):313-318, March 2012. http://dx.doi.org/10.1016/j.crma.2012.03.014.

[6] J.-A. Désidéri. Application of MGDA to domain partitioning. Research Report 7968, INRIA, May 2012. http://hal.inria.fr/hal-00694039.

[7] P. E. Gill, W. Murray, and M. H. Wright. Practical Optimization. Academic Press, New York/London, 1986.

[8] K. Miettinen. Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Boston/London/Dordrecht, 1999.

[9] A. Zerbinati, J.-A. Désidéri, and R. Duvigneau. Comparison between MGDA and PAES for Multi-Objective Optimization. Research Report 7667, INRIA, June 2011.


A GENERALITIES ABOUT CONVEX HULLS AND PARAMETERIZATION

Notations:

$$u = \sum_{i=1}^n \alpha_i u_i; \qquad \{u_i\}_{(i=1,\dots,n)}: \text{ given family of vectors in } \mathbb{R}^N$$

$$\text{constraints:} \quad \forall i:\ \alpha_i \ge 0; \qquad \sum_{i=1}^n \alpha_i = 1; \qquad n \le N \tag{66}$$

Parameterization. The convex hull may be parameterized by identifying the set of allowable coefficients $\{\alpha_i\}_{(i=1,\dots,n)}$. To satisfy the positivity condition automatically, one lets:

$$\alpha_i = \sigma_i^2 \quad (i = 1, \dots, n) \tag{67}$$

Then, the equality constraint

$$\sum_{i=1}^n \alpha_i = \sum_{i=1}^n \sigma_i^2 = 1 \tag{68}$$

states that

$$\sigma = (\sigma_1, \sigma_2, \dots, \sigma_n) \in S_n \tag{69}$$

where $S_n$ is the unit sphere of $\mathbb{R}^n$ (and precisely not of $\mathbb{R}^N$). This sphere is easily parameterized using trigonometric functions of $n - 1$ independent arcs $\phi_1, \phi_2, \dots, \phi_{n-1}$:

$$\begin{cases} \sigma_1 = \cos\phi_1 \cos\phi_2 \cos\phi_3 \cdots \cos\phi_{n-1} \\ \sigma_2 = \sin\phi_1 \cos\phi_2 \cos\phi_3 \cdots \cos\phi_{n-1} \\ \sigma_3 = \sin\phi_2 \cos\phi_3 \cdots \cos\phi_{n-1} \\ \quad\vdots \\ \sigma_{n-1} = \sin\phi_{n-2} \cos\phi_{n-1} \\ \sigma_n = \sin\phi_{n-1} \end{cases} \tag{70}$$

that is:

$$\sigma_i = \sin\phi_{i-1} \prod_{j=i}^{n-1} \cos\phi_j \tag{71}$$

with $\phi_0 = \frac{\pi}{2}$. It is sufficient to consider the portion of the sphere corresponding to $\phi_i \in [0, \frac{\pi}{2}]$ for all $i \ge 1$, since the sign of the $\sigma_i$ makes no difference.

The usage of trigonometric functions is not really necessary, since one can let:

$$c_i = \cos^2\phi_i \quad (i = 1, \dots, n - 1) \tag{72}$$

and get:

$$\begin{cases} \alpha_1 = c_1\, c_2\, c_3 \cdots c_{n-1} \\ \alpha_2 = (1 - c_1)\, c_2\, c_3 \cdots c_{n-1} \\ \alpha_3 = (1 - c_2)\, c_3 \cdots c_{n-1} \\ \quad\vdots \\ \alpha_{n-1} = (1 - c_{n-2})\, c_{n-1} \\ \alpha_n = 1 - c_{n-1} \end{cases} \tag{73}$$


that is:

$$\alpha_i = (1 - c_{i-1}) \prod_{j=i}^{n-1} c_j \tag{74}$$

with $c_0 = 0$, and $c_i \in [0, 1]$ for all $i \ge 1$.

In this way, the constraints on the coefficients $\{\alpha_i\}$ have been replaced by the bounds 0 and 1 on the new parameters $\{c_i\}$, independently of one another. However, the criterion to be minimized,

$$\|u\|^2 = \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j\, (u_i, u_j) \tag{75}$$

is now a polynomial of possibly large degree, namely $2(n - 1)$, in the new parameters $\{c_i\}$. In the particular case of the coordination of 4 sub-domains by MGDA, and independently of the degree of refinement of the spatial discretization controlled by the integers $N_X$ and $N_Y$, $n = 4$; once the 10 scalar products $(u_i, u_j)$ $(i, j = 1, \dots, 4)$ are calculated, the determination of the minimum-norm element $\omega$ is equivalent to minimizing a 6th-degree polynomial of $(c_1, c_2, c_3)$ over the unit cube of $\mathbb{R}^3$ (limits included):

$$\begin{cases} \alpha_1 = c_1 c_2 c_3 \\ \alpha_2 = (1 - c_1)\, c_2 c_3 \\ \alpha_3 = (1 - c_2)\, c_3 \\ \alpha_4 = 1 - c_3 \end{cases} \tag{76}$$
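For $n = 4$, the coarse search used in Section 6.3 can be sketched as follows (deliberately naive; `steps=101` reproduces the 0.01 discretization step, at the cost of about $10^6$ candidate evaluations):

```python
import numpy as np
from itertools import product

def omega_bruteforce(grads, steps=101):
    """Coarse determination of the minimum-norm element for n = 4 via the
    parameterization of Eq. (76): each c_i runs over a uniform grid in [0, 1],
    and Eq. (75) is evaluated through the Gram matrix of scalar products."""
    G = grads @ grads.T
    grid = np.linspace(0.0, 1.0, steps)
    best, best_alpha = np.inf, None
    for c1, c2, c3 in product(grid, repeat=3):
        a = np.array([c1 * c2 * c3,
                      (1 - c1) * c2 * c3,
                      (1 - c2) * c3,
                      1 - c3])           # Eq. (76)
        val = a @ G @ a                  # the 6th-degree polynomial, Eq. (75)
        if val < best:
            best, best_alpha = val, a
    return best_alpha @ grads            # omega
```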

