Hypervolume in Biobjective Optimization Cannot Converge Faster Than Ω$(1/p)$

(1)

HAL Id: hal-03205870

https://hal.archives-ouvertes.fr/hal-03205870

Submitted on 22 Apr 2021

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Hypervolume in Biobjective Optimization Cannot Converge Faster Than Ω(1/p)

Eugénie Marescaux, Nikolaus Hansen

To cite this version:

Eugénie Marescaux, Nikolaus Hansen. Hypervolume in Biobjective Optimization Cannot Converge Faster Than Ω(1/p). GECCO 2021 - The Genetic and Evolutionary Computation Conference, Jul 2021, Lille / Virtual, France. �hal-03205870�

(2)

Faster Than Ω(1/p)

Eugénie Marescaux

Ecole Polytechnique, CMAP 91128 Palaiseau, France forename.lastname@inria.fr

Nikolaus Hansen

Inria

CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris

91128 Palaiseau, France forename.lastname@inria.fr

ABSTRACT

The hypervolume indicator is widely used by multi-objective optimization algorithms and for assessing their performance. We investigate a set ofpvectors in the biobjective space that maxi- mizes the hypervolume indicator with respect to some reference point, referred to asp-optimal distribution. We prove explicit lower and upper bounds on the gap between the hypervolumes of the p-optimal distribution and the∞-optimal distribution (the Pareto front) as a function ofp, of the reference point, and of some Lips- chitz constants. On a wide class of functions, this optimality gap can not be smaller thanΩ(1/p), thereby establishing a bound on the optimal convergence speed of any algorithm. For functions with either bilipschitz or convex Pareto fronts, we also establish an upper bound and the gap is henceΘ(1/p). The presented bounds are not only asymptotic. In particular, functions with a linear Pareto front have the normalized exact gap of 1/(p+1)for any reference point dominating the nadir point.

We empirically investigate on a small set of Pareto fronts the exact optimality gap for values ofpup to 1000 and find in all cases a dependency resembling 1/(p+CONST).

CCS CONCEPTS

•Theory of computation→ Mathematical optimization; • Mathematics of computing→Continuous optimization;

KEYWORDS

p-optimal distribution, biobjective optimization, convergence rate analysis

ACM Reference Format:

Eugénie Marescaux and Nikolaus Hansen. 2021. Hypervolume in Biobjective Optimization Cannot Converge Faster ThanΩ(1/p). In2021 Genetic and Evolutionary Computation Conference (GECCO ’21), July 10–14, 2021, Lille, France.ACM, New York, NY, USA, 9 pages. https://doi.org/10.1145/3449639.

3459371

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and /or a fee. Request permissions from permissions@acm.org.

GECCO ’21, July 10–14, 2021, Lille, France

ACM ISBN 978-1-4503-8350-9/21/07. . . $15.00 https://doi.org/10.1145/3449639.3459371

1 INTRODUCTION

Multi-objective optimization aims at minimizing a vector-valued functionF. Nowadays, a main goal of multi-objective optimization is to find a good approximation of the Pareto set, the set of all non-dominated feasible vectors of the search space. When measur- ing the performance in the objective space, there are at least three different ways to define the convergence of a multi-objective optimization algorithm towards the Pareto front. First, theF-values of a subsequence of vectors explored by the algorithm may converge to a vector of the Pareto front. Proof of such convergence already exists for many multi-objective algorithms, such as MultiGLODS [5], Newton’s method [8], the projected gradient method [9] and (1+1) evolutionary multiobjective algorithms [3], often with a guarantee on the convergence rate. Second, for evolutionary algorithms, the set ofF-values of the population can converge to a good approximation of the Pareto front. In the case of evolutionary algorithms with a hypervolume based selection, ideally, the image of the population would converge to a set ofpvectors of the objective space maximizing the hypervolume, withpbeing the population size. We call such a setap-optimal distribution. Finally, a dynamic subset of the archive can converge to the entire Pareto front, in some sense which is to be defined. For example, it has been known for a long time that under the hypothesis of an uncountably infinite population, the set of all non-dominated vectors explored by an evolutionary algorithm converges almost surely to the Pareto front [12].

Set-quality indicators are widely used for assessing the performance of multi-objective optimization algorithms. They create a total order where there only was a partial one. Many indicators have been invented and are thoroughly used, like the multiplicative and additive epsilon indicators [16] or the hypervolume indicator.

The hypervolume and its variants such as the weighted hypervolume [13] are the only known strictly Pareto compliant indicators [15]. The multiplicative epsilon indictor is also called multiplicative approximation ratio when the Pareto front is used as reference set.

In the biobjective case, it has been proven that the multiplicative approximation ratio of the set of all vectors explored by the algorithm cannot converge to 1 more rapidly thanΩ(1/p), withpbeing the number of function evaluations [4]. It is a direct consequence of Corollary 3.2 in [4], which gives a lower bound of the form 1+Θ(1/p)of the minimum multiplicative approximation ratio of a set ofpvectors.

In this paper, we derive lower and upper bounds of the form Θ(1/p)of the difference in hypervolume between ap-optimal distribution and the Pareto front. We call this differenceoptimality gap.

(3)

GECCO ’21, July 10–14, 2021, Lille, France E. Marescaux and N. Hansen

For bilipschitz Pareto fronts, we have a tight lower bound on the optimality gap of the form CONST/(p+1). The constant depends on the bilipschitz constants and on the position of the reference point of the hypervolume indicator with respect to the nadir point.

The lower bound we found is exact in the case of a linear Pareto front where the reference point dominates the nadir point. In this case, the constant is simply the hypervolume of the Pareto front. We generalize this result to Pareto fronts with a bilipschitz subsection which dominates the reference point. For this wide class of Pareto fronts (see Figure 6), the rate of convergence of multi-objective algorithms in terms of hypervolume cannot be better thanΩ(1/p). For bilipschitz or convex Pareto fronts, we prove an upper bound of the form CONST×(p+1)/p². The constant depends on the extreme values of the Pareto front, on the reference point, and additionally on the bilipschitz constants in the bilipschitz case. Since any convex Pareto front has a bilipschitz subsection, both convex and bilipschitz Pareto fronts abide by the above lower bound. As a consequence, for either bilipschitz or convex Pareto fronts, the optimality gap evolves asΘ(1/p).

We empirically evaluated the optimality gap forpup to 1000 on six different Pareto fronts, among which three are convex and three are concave. We observe convergence ofptimes the optimality gap to a constant, even for non-bilipschitz and non-convex Pareto fronts.

The rest of the paper is organized as follows. In Section 2, we define formally the Pareto front, the hypervolume and thep-optimal distribution. Additionally, we introduce the concept of gap region.

In Section 3, we derive first lower and upper bounds on the optimality gap, respectively for bilipschitz and for bilipschitz or convex Pareto fronts. In Section 4, we derive sufficient conditions on the objective-functions and the search space for the Pareto front to be convex and bilipschitz. Additionally, we generalize the lower bound for Pareto fronts with only a bilipschitz subsection. In Section 5, we examine the empirical optimality gap on six different Pareto fronts.

Notations.We denote the search space byΩ. We denote elements ofΩbyX, which should not be confused withxdenoting the first coordinate of a vector of the Pareto front. A vector of the objective space,v, is called feasible ifv∈F(Ω). In order to avoid confusion, we always use the termareato refer to the Lebesgue-measure in dimension 2 and never to refer to a part of the objective space.

Further notations are defined in the next section.

2 PRELIMINARIES

We focus on biobjective optimization, which aims at minimizing two objective functions,F1andF2over the search spaceΩ⊂Rⁿ. We denoteF:X ∈Ω7→(F1(X),F2(X)).

2.1 Domination and Pareto front

A vectoru∈R²of the objective space is said to weakly dominate a vectorv∈R²whenu1≤v1andu2≤v2. We denote itu⪯v. A vectoruis said to dominate a vectorvwhenu⪯vandu,v. We denote itu≺v. If a vectorudoes not dominate a vectorv, we denote itu ⊀v. ThePareto frontis the set of all non-dominated feasible vectors:{u∈F(Ω):∀v ∈F(Ω),v ⊀u}. We will assume here that the Pareto front has an explicit representation viaf, namely that it can be written as{(x,f(x)):x ∈[xmin;xmax]}withxmin,xmax.

By definition of the Pareto front,f must be strictly decreasing. We say thatf is (Lmin,Lmax)-bilipschitz when|f(x)−f(y)|is between Lmin× |x−y|andL^max× |x−y|for allx,y∈R. This is one of the main assumptions of interest here. We will also consider Pareto fronts for whichf is convex and the (much) wider class of Pareto fronts with a bilipschitz subsection.

Given a reference pointr, we denote the extremes of the part of the Pareto front dominatingrwith ˜umin,r :=(x˜min,r,f(x˜min,r)) and ˜umax,r :=(x˜max,r,f(x˜max,r))where ˜xmin,r :=xminwhenr2≥ f(xmin)andf⁻¹(r2)otherwise, andx˜max,r :=min(xmax,r1), see Figure 1. In this paper, we will also assume that the reference point r is valid, in the sense that there exists a feasible vector of the objective space dominatingr. Thenadir pointis the vector which

F1

F²

r

u˜max,r

u˜min,r

x˜min,r x˜^max,r

Figure 1: Depiction ofu˜min,r andu˜max,r withr=(0.7,0.8)for the Pareto front associated with the functionf :x7→1−√

x forx ∈[0,1].

coordinate in each objective is the worst value achieved on the Pareto front for this objective. It equals(xmax,f(xmin)).

2.2 Hypervolume andp-optimal distribution

The hypervolume is a set-quality indicator which depends on a reference pointr. The hypervolume of a setS is the Lebesgue measure of the region weakly dominated bySand dominatingr :λ({u ∈ F(Ω) :∃s ∈S,s ⪯u ≺r}). We denote it HV_r(S). The hypervolume improvement and hypervolume contribution of a vectoruto a setSquantifies how much addinguto or removingu from the setS, respectively, affects its hypervolume: HVI_r(u,S):= HV_r(S∪ {u})−HV_r(S)and HVC_r(u,S):=HVI_r(u,S\ {u}).

Here, we will study sets ofpfeasible vectors of the objective space maximizing the hypervolume. We call themp-optimal distri- butionsand denote themS^p_r. We denotev_p,1, . . . ,v_p,pthe vectors ofS_r^pordered by increasingF1values andx_p,1, . . . ,x_p,ptheir first coordinates. We also denotexp,0:=x˜min,r andx_p,p+¹:=x˜^max,r.

2.3 Gap regions

In this paper, we will examine the dependency of theoptimality gapHV_r(PF)−HV_r(S^p_r)inp. The optimality gap is the area of the region of the objective space dominated by the Pareto front PF but not by thep-optimal distributionS^p_r. We call this region of the objective spacetotal gap regionand denote itG_r^p. We can now write the optimality gap asλ(G_r^p). Since thep-optimal distributionS_r^pis a subset of the Pareto front, the total gap region can be decomposed intop+1 disjoints regions, that we callgap regions, see Figure 2. The

(4)

i-th gap region ofS_r^pis the region of the objective space dominated by the Pareto front and dominating the reference pointr_i^pwith r^p₁:=(xp,1,r2),r_p+^p ₁:=(r¹,f(x_p,p))andr_i^p:=(xp,i,f(x_p,i−1))for all i∈J2,pK. We denote itG_r,i^p .

G_r,³₁

G_r,³

2

G_r,³₃

G_r,³₄ v3,1

v3,2

v3,3

r³1

r₂³ r₃³

r₄³ F1

F2 r

G_r³

Figure 2: The four gap regions of the 3-optimal distribution S³_rfor the concave and bilipschitz functionf :x 7→1−0.5x− 0.5x²forx∈[0,1].

3 THEORETICAL BOUNDS ON THE OPTIMALITY GAP HV_r(PF)−HV_r(S^p_r)

In this section, we will prove both upper and lower theoretical bounds on the optimality gap HV_r(PF)−HV_r(S^p_r)whenf is bilipschitz, and a theoretical upper bound whenf is convex. The lower bound is generalized to a wider class of functions in Section 4.2. All bounds are equivalent to a constant times 1/p.

3.1 Lower bound on the optimality gap

We will prove that iff is (Lmin,Lmax)-bilipschitz, then the optimality gapλ(G_r^p)=HV_r(PF)−HV_r(S^p_r)is greater than ¹

p+1 times a constant depending only on the hypervolume of the Pareto front, on the reference point and on the bilipschitz constantsLminand Lmax.

Theorem 1. If f restricted to[ ˜xmin,r,x˜max,r]is (Lmin,Lmax)- bilipschitz, then the normalized optimality gap is bounded from below as

λ(G_r^p) HVr(PF) ≥ 1

p+1

×Lmin

Lmax

× 1

1+2×(q1+q2+q1×q2) (1) withq1:=x˜max^r¹⁻,r^x^˜−^max,rx˜min,r andq2:=_f₍x˜min^r²^−f,r)−f⁽^x^˜^min,(x˜^rmax⁾,r).

In particular, whenrdominates the nadir point(x^max,f(xmin)), bothq¹andq²equal 0, and thus the lower bound is simply_p+¹1×_L^L^min

max. Proof. We notec_p,i :=x_p,i−x_p,i−1fori ∈ J1,p+1

K,∆1 := r¹−x˜^max,r and∆2:=r²−f(x˜min,r), see Figure 3.

The optimality gap HV_r(PF) − HV_r(S^p_r) is greater than Pp+1

i=1

Rxp,i

x=xp,i−1(f(x)−f(x_p,i))dx. Since the functionf is decreasing and its restriction to [ ˜xmin,r,x˜max,r] is (Lmin,Lmax)-bilipschitz, f(x)−f(x_p,i)is greater thanLmintimesx_p,i−x. Therefore, the optimality gapλ(S^p_r)is greater than¹

2×Lmin×Pp+1 i=1c_p,i² .

v3,1

v³,2

v3,3

c3,1

c3,2

c³,3

c3,4

x3,1 x3,2 x3,3

x3,0 x3,4

∆1

∆2

F1

F2

r

Figure 3: Illustration of the notations of the proof of The- orem 1 for the convex and bilipschitz function f : x 7→

e−e1×e^−x+1−_e−^e₁forx∈[0,1]andr=(1.2,1.15).

The hypervolume of the Pareto front is equal to

∆1×(f(x˜min,r)−f(x˜^max,r))+∆2×(x˜^max,r−x˜min,r) +∆1×∆2+

Z x˜max,r x˜min,r

(f(x)−f(x˜^max,r))dx.

The integral and∆1 ×(f(x˜min,r)−f(x˜max,r)) are respectively smaller than¹

2×Lmax×(x˜max,r−x˜min,r)²and∆1×Lmax×(x˜max,r− x˜min,r). By rewriting ˜xmax,r−x˜min,ras the sum overi∈J1,p+1

Kof cp,i, we obtain that the optimality gap divided by the hypervolume of the Pareto front is greater than

1

2Lmin×Pp+1 i=1c²_p,i

1

2Lmax×(Pp+1

i=1c_p,i)²+(∆1Lmax+∆2×(Pp+1

i=1c_p,i))+∆1∆2

. The termsPp+1

i=1c_p,i² and Pp

i=0c_p,i²

being respectively∥c_p∥₂²and

∥cp∥²

1withcp:=(cp,i)_i_∈

J1,p+1 K

∈R^p+¹, it is well-known that their ratio is superior to ¹

p+1. Therefore, the normalized optimality gap is greater than

1 p+1

× Lmin

Lmax+2×_P^∆¹_p+^×L1^max

i=1cp,i +2×_P_p+^∆1²

i=1cp,i +2× ^∆¹^×∆²

(Pp+1 i=1cp,i)²

.

We can rewrite backPp+1

i=1cp,i asx˜max,r −x˜min,r. Additionally, sincef is (Lmin,Lmax)-bilipschitz, ^∆²^×L^max

f(x˜max,r)−f(x˜min,r)is greater than

∆2

x˜max,r−x˜min,r. □

The largerLmax/Lminis, the less information we have on the shape of (Lmin,Lmax)-bilipschitz Pareto fronts, and the looser is the bound. The quantitiesq1andq2reflect the distance between rand the nadir point normalized by the scale of the Pareto front, respectively in the first and in the second objective. When the nadir point dominates the reference pointr, the bound gets looser asr moves away from the nadir point. In case of a linear front, we get the following tight result.

(5)

GECCO ’21, July 10–14, 2021, Lille, France E. Marescaux and N. Hansen

Corollary 1. Whenr dominates the nadir point andLmin = Lmax, the normalized optimality gap ^λ(G

pr)

HVr(PF) equals_p+¹1, which cor- responds to the lower bound(1)in Theorem 1.

Proof. - In this case, the Pareto front is linear, and thusS^p_r is evenly distributed [1, Theorem 6]. As a consequence, the normalized optimality gap of thep-optimal distribution is exactly 1/(p+1), which is the lower bound given by Theorem 1. □

3.2 Upper bound on the optimality gap

We will prove that iff is either (Lmin,L^max)-bilipschitz or convex, then the optimality gap is smaller than^p+

1

p² times a constant. The proof idea is largely inspired by the proof of Theorem 4.4 in [4]. This theorem gives an upper bound on the multiplicative approximation ratio of thep-optimal distribution of the form 1+_p−^C4withCbeing a constant depending on the Pareto front.

The following lemma is a natural extension of [4, Lemma 4.3], where we consider not only thep−2 inner vectors of thep-optimal distributionS^p_r but also its two extreme vectors. It states that the smallest hypervolume contribution of any element inS^p_r is below 1/p²times a constant depending only on the extreme values of the Pareto front and on the reference pointr.

Lemma 1. For anyf such that there exists ap-optimal distribution S^p_r, we have

min

v∈S_r^pHVCr(v,S^p_r)≤ (r1−x˜min,r)(r2−f(x˜max,r))

p² . (2)

Proof. We denotea_p,i :=x_p,i−x_p,i−1andb_p,i :=f(x_p,i)− f(xp,i−1)fori ∈ J1,p+1

Kexcept fora_p,p+1andbp,0which are respectivelyr1−x_p,p andr2−f(x_p,1), see Figure 4. For alli ∈

v3,1

v3,2

v3,3

a3,1

a3,2

a3,3

a3,4

b3,1

b3,2

b3,3

b3,4

x3,1 x3,2 x3,3 F1

F2

r

HVC_r(v_i,3,S_r³)

Figure 4: Illustration of the notations of the proof of Lemma 1 forf :x 7→ _e−^e₁×e^−x+1−_e−^e₁ forx ∈[0,1]with r=(1.1,1.1)andp=3.

J1,pK, the hypervolume contribution ofv_p,itoS^p_r isa_p,i+1×b_p,i. In particular, this implies that for anyi the quantityap,i+1 is greater than

minv∈Spr HVC_r(v,S^p_r)

bp,i . Summing overi, we obtain that

(Pp i=1

1 b_p,i)×min

v∈S^p_r HVC_r(v,S_r^p)is lower thanPp

i=1a_p,i+1, and thus lower thanr1−x˜min,r. Additionally, the harmonic mean of the vector(bp,i)_i_∈

J1,pK

is lower than its arithmetic mean:

p Pp

i=1 1 bp,i

≤ Pp

i=1b_p,i

p .

As a consequence, min

v∈S^pr HVC_r(v,S^p_r)is lower than(r1−x˜min,r) times

Pp+1 i=1bp,i

p² , that is ^(r¹⁻

x˜min,r)×(r2−f(x˜max,r))

p² . □

We will also use the lower bounds on the maximum hypervolume of a single vector of the Pareto front forf either convex or (Lmin,Lmax)-bilipschitz stated in [10]. We recall these results below, with a slight reformulation, for sake of completeness.

Proposition 1. Iff restricted to[ ˜xmin,r,x˜max,r]is convex, we have

umax∈PFHVr(u)≥1 2

×HVr(PF). (3)

Iff restricted to[ ˜xmin,r,x˜max,r]is(Lmin,Lmax)-bilipschitz, we have

umax∈PFHVr(u)≥ 1 2

×Lmin

Lmax

×HVr(PF). (4)

For allp, for alli ∈ J1,p+1

K, the hypervolume associated with the reference pointr_i^pof a vectoruis equal to HVI_r(u,S^p_r) whenu ∈ G_r,i^p and 0 otherwise. Thus, max_u_∈PFHV

r_i^p(u)equals maxu∈ G_r,i^p HVI_r(u,S_r^p). Additionally, the hypervolume associated withr_i^pof the Pareto front is simply the area of the gap regionG_r,i^p . Using the optimality of thep-optimal distribution, we can deduce from Lemma 1 and Proposition 1 an upper bound on the area of any gap regionG_r,i^p , and thus an upper bound on the optimality gap at iterationp.

By applying Proposition 1 in the convex case, we obtain the following theorem.

Theorem 2. Iff restricted to[ ˜xmin,r,x˜max,r]is convex, then the optimality gap is bounded from above as

λ(G_r^p)≤2×(r1−x˜min,r)×(r2−f(x˜max,r))×p+1

p² (5)

Proof. Thep-optimal distributionS^p_r is a set ofpfeasible vectors of the objective space maximizing the hypervolume. As a consequence, for anyu ∈ F(Ω), for anyv ∈S^p_r, the hypervolume of S^p_r \ {v} ∪ {u}is lower than the hypervolume ofS^p_r. In other words, the hypervolume improvement with respect to the setS^p_r \ {v}of any feasible vectoruis lower than the one ofvitself. Additionally, for any feasibleu, the hypervolume improvement ofutoS_r^pis lower than the hypervolume improvement ofutoS^p_r \ {v}. Indeed, they are equal to the area of the region dominated byubut not by respec- tivelyS_r^pandS^p_r \ {v}. As a consequence, for any feasible vectoru, HVI_r(u,S^p_r)is lower than HVI_r(v,S^p_r \ {v}), that is HVC_r(v,S_r^p).

LetG_r,i^p be thei-th gap region ofS^p_r. By Proposition 1, the area of G_r,i^p is lower than 2×max

u∈ G^p_r,iHVI_r(u,S^p_r). We just proved that maxu∈ G_r,i^p HVI_r(u,S_r^p)is lower than min

v∈S_r^pHVC_r(v,S_r^p), and