HAL Id: tel-01629453
https://tel.archives-ouvertes.fr/tel-01629453
Submitted on 6 Nov 2017
Une approche Bayésienne pour l’optimisation
multi-objectif sous contraintes
Paul Feliot
To cite this version:
Paul Feliot. Une approche Bayésienne pour l'optimisation multi-objectif sous contraintes. Université Paris-Saclay (COmUE), 2017. NNT: 2017SACLC045. tel-01629453.
Une approche Bayésienne pour l'optimisation multi-objectif sous contraintes

Doctoral thesis of Université Paris-Saclay, prepared at CentraleSupélec
École doctorale n° 580: Sciences et technologies de l'information et de la communication (STIC)
Doctoral specialty: Signal processing
Thesis presented and defended at Gif-sur-Yvette, on 12 July 2017, by
Paul FELIOT
Composition of the jury:
M. Patrice Aknin, IRT SystemX, Examiner
Mme Anne Auger, INRIA, École Polytechnique, Examiner
M. Julien Bect, CentraleSupélec, Advisor
M. Sébastien Da Veiga, Safran Tech, Examiner
M. David Ginsbourger, Université de Bern, Reviewer
M. Serge Gratton, ENSEEIHT, CERFACS, President
M. Luc Pronzato, CNRS, UNSA, Reviewer
M. Emmanuel Vazquez, CentraleSupélec, Thesis director
Acknowledgments

I first thank my supervisors, Julien Bect and Emmanuel Vazquez, for their involvement and their ever-judicious advice. I cannot express here everything you have brought me, so I will simply offer you my deepest gratitude; I hope the future will allow us to work together again, as I still have much to learn from you.

I next thank my colleagues and friends at IRT SystemX, too numerous to name. I greatly enjoyed these three years spent together and I wish you all the success you deserve. Our friendship will endure, I am certain of it.

I also thank the various collaborators with whom I had the opportunity to work: Nabil R., Abdelkader O., Sébastien D. V., Christian R., Emmanuel C., Caroline S., Amin E. B., Yves T. and Sophie F. I truly enjoyed working with you and wish you the best for what comes next.

Finally, I wish to warmly thank my family, and in particular my mother and my
Abstract

This thesis work addresses the multi-objective optimization of real-valued functions under inequality constraints. In particular, we are interested in problems for which the objective and constraint functions are evaluated by means of a computer program that may require several hours of computation to return a result. In this setting, it is desirable to solve the optimization problem using as few calls to the simulation code as possible. We are moreover interested in potentially highly constrained optimization problems, that is, problems for which simultaneously satisfying all the constraints is difficult. This type of problem is characteristic of the optimization of complex systems and defeats many optimization algorithms.
In this thesis we propose a Bayesian optimization algorithm called BMOO. This algorithm encodes a new expected improvement criterion developed specifically to be applicable to highly constrained problems and/or problems with many objectives. This criterion relies on a loss function measuring the volume of the space dominated by the current observations, the latter being defined by means of an extended domination rule that compares candidate solutions according to both the objective and the constraint values associated with them. The criterion thus defined generalizes several classical expected improvement criteria from the literature to the case of multi-objective optimization under inequality constraints.
This criterion takes the form of an integral over the joint space of objectives and constraints, which is not analytically tractable in the general case. Moreover, it must be maximized at every iteration of the algorithm in order to select the next evaluation point; this maximization is known to be difficult, because expected improvement criteria tend to be multi-modal. To overcome these difficulties, we propose in this thesis sequential Monte Carlo algorithms in the line of work previously carried out by Benassi (2013) in the case of unconstrained global optimization. In particular, we propose an L2-optimal density for the computation of the new criterion over a set of candidate points, and a density dedicated to the optimization of the criterion for highly constrained problems.
Four extensions of the algorithm are moreover proposed; they can be viewed as independent contributions. First, BMOO is generalized to problems defined on non-hypercubic search spaces, defined for instance by a membership function or by constraints that are inexpensive to evaluate, as well as to problems with hidden constraints. The latter appear, for example, when the simulation code fails to return a result. Next, to take advantage of parallel computation facilities when these are available, a multi-point version of the algorithm is proposed. Finally, an expected improvement criterion that makes it possible to steer the search for optimal solutions toward regions chosen by the user is proposed. This criterion allows the domain expert to influence the optimization process in order to obtain more relevant solutions.
The proposed algorithm obtains better results than state-of-the-art optimization algorithms on both single- and multi-objective optimization problems from the literature. We show that it can be applied with good results and good repeatability on a large set of problems. In particular, the algorithm makes it possible to solve highly constrained problems and/or problems with many objectives, which was the initial goal.
BMOO is also successfully applied to four problems that are representative of the types of optimization problems encountered in industry. It is applied to the sizing of the air regulation system of a commercial aircraft (collaboration with Airbus Group Innovation), to the sizing of the powertrain of an electric vehicle (collaboration with Renault), to the optimal tuning of a line-of-sight controller (collaboration with Safran Electronics & Defense), and to the sizing of a turbomachine fan blade (collaboration with Safran Aircraft Engines and Cénaéro). It is shown in particular that the above-mentioned extensions are relevant for this type of optimization problem.
Some intrinsic limitations nonetheless make BMOO ineffective on certain types of optimization problems, which are illustrated in this thesis. First, BMOO is not suited to solving problems with non-stationary functions. Indeed, the algorithm uses Gaussian process models, and the stationarity of the objectives and constraints is one of the modeling assumptions that are made. When this assumption does not hold, the models do not permit efficient optimization. We show, however, that in some cases the use of simple transformations makes it possible to render stationary certain functions that are not stationary originally, and thus to use BMOO effectively. Moreover, the algorithm uses the hypervolume of the dominated region as a loss function. The hypervolume, however, tends to favor certain regions of the Pareto front over others, depending on its curvature. As a result, BMOO is subject to an intrinsic bias, and on certain problems, typically problems for which the Pareto front has concavities, the distribution of the solutions obtained by the algorithm may fail to represent some regions of the Pareto front.
Acknowledgments iii
Abstract v
1 Introduction 1
1.1 Context . . . 2
1.1.1 Industrial design of complex systems . . . 2
1.1.2 A brief literature review of continuous optimization . . . 3
1.2 Background literature . . . 6
1.2.1 Bayesian optimization . . . 6
1.2.2 Previous work on similar topics . . . 8
1.2.3 Illustration . . . 10
1.3 About this thesis work . . . 12
1.3.1 Main contributions and outline of the manuscript . . . 12
1.3.2 Publications and communications . . . 13
2 A Bayesian approach to constrained single- and multi-objective optimization 15
2.1 Introduction . . . 16
2.2 Background literature . . . 17
2.2.1 Expected Improvement . . . 17
2.2.2 EI-based multi-objective optimization without constraints . . . 19
2.2.3 EI-based optimization with constraints . . . 22
2.3 An EI criterion for constrained multi-objective optimization . . . 24
2.3.1 Extended domination rule . . . 24
2.3.2 A new EI criterion . . . 25
2.3.3 Decomposition of the expected improvement: feasible and unfeasible components . . . 27
2.4 Sequential Monte Carlo techniques to compute and optimize the expected improvement . . . 28
2.4.1 Computation of the expected improvement . . . 28
2.4.2 Maximization of the sampling criterion . . . 31
2.5 Experiments . . . 35
2.5.3 Mono-objective optimization benchmark . . . 36
2.5.4 Multi-objective optimization benchmark . . . 43
2.6 Conclusions and future work . . . 53
2.7 Additional material . . . 54
2.7.1 On the bounded hyper-rectangles B_o and B_c . . . 54
2.7.2 An adaptive procedure to set B_o and B_c . . . 55
2.7.3 Modified g3mod, g10 and PVD4 test problems . . . 55
2.7.4 Mono-objective benchmark result tables . . . 56
3 Improvements and extensions of the BMOO algorithm 61
3.1 Introduction . . . 62
3.2 Efficient optimization of the EI criterion . . . 62
3.2.1 Introduction . . . 62
3.2.2 The YUCCA test problem . . . 63
3.2.3 Failure of the probability of improvement sampling density on the YUCCA test problem . . . 65
3.2.4 Novel sampling densities . . . 66
3.2.5 Sampling procedure . . . 67
3.2.6 Numerical experiments . . . 68
3.2.7 Conclusions . . . 77
3.3 Efficient computation of the EHVI criterion . . . 77
3.3.1 Introduction . . . 77
3.3.2 The L2-optimal density . . . 80
3.3.3 Complexity of the exact and approximate computation methods . . . 86
3.3.4 Toward a better control of the sample size . . . 92
3.3.5 Conclusions . . . 94
3.4 BMOO for Bayesian Many-Objective Optimization . . . 96
3.4.1 Introduction . . . 96
3.4.2 The FICUS test problem . . . 96
3.4.3 Empirical study of the hypervolume . . . 97
3.4.4 Numerical experiments . . . 105
3.4.5 Conclusions . . . 105
3.5 Extensions of the BMOO algorithm . . . 109
3.5.1 Introduction . . . 109
3.5.2 Non-hypercubic design spaces . . . 110
3.5.3 Hidden constraints management . . . 111
3.5.4 Batch sequential multi-objective optimization . . . 114
3.5.5 User preferences in multi-objective optimization . . . 117
3.6 Additional material . . . 125
3.6.2 Correction of the adaptive procedure to set B_o . . . 125
3.6.3 Variance of the EI estimator . . . 130
3.6.4 Experimental results for p = 6 and p = 8 . . . 134
4 Applications 143
4.1 Introduction . . . 144
4.2 Design of a commercial aircraft environment control system . . . 145
4.2.1 Introduction . . . 145
4.2.2 Thermodynamic analysis of the ECS . . . 146
4.2.3 Optimization of the system . . . 151
4.2.4 Conclusions . . . 155
4.2.5 Additional material . . . 157
4.3 Design of an electric vehicle powertrain . . . 159
4.3.1 Introduction . . . 159
4.3.2 Specifications . . . 159
4.3.3 Numerical model . . . 162
4.3.4 Optimization of the system . . . 167
4.3.5 Conclusions . . . 170
4.3.6 Additional material . . . 173
4.4 Tuning of a Line of Sight controller . . . 176
4.4.1 Introduction . . . 176
4.4.2 Stabilization architecture model . . . 176
4.4.3 Image quality criteria . . . 178
4.4.4 Tuning of the controller . . . 180
4.4.5 Conclusions . . . 183
4.4.6 Additional material . . . 183
4.5 Design of a turbomachine fan blade . . . 187
4.5.1 Introduction . . . 187
4.5.2 Simulation chain . . . 187
4.5.3 Blade optimization . . . 188
4.5.4 Conclusions . . . 192
4.6 Conclusions . . . 192
5 Conclusions 193
5.1 Summary of contributions . . . 194
5.2 Main achievements and limitations . . . 194
5.3 Perspectives for future work . . . 195
1.1.1 Industrial design of complex systems

The object of this thesis is the optimal design of complex systems. As an introductory example, consider the design of a commercial aircraft turbomachine. A turbomachine is a complex system made of several interacting subsystems. The main components of a typical turbomachine are represented on Figure 1.1.
Figure 1.1: Global architecture of a turbomachine: fan, low- and high-pressure compressors, combustion chamber, high- and low-pressure turbines, with primary and secondary flows.
When designing such a system, a manufacturer has to make several design choices. What should be the shape of the combustion chamber? How many compressor stages are required to achieve a given level of performance? What materials should the fan blades be composed of? What is the inner blade radius of the first stage of the high-pressure turbine? Etc. Those choices are often made using past experience in designing similar systems and performance assessment studies. An established practice to assess the performances of a given design is to rely on numerical models of the physical system. This is in general less costly and less time-consuming than prototyping. Besides, using numerical models makes it possible to consider far more candidate designs.

A common approach to cast a design problem into a mathematical framework is to formulate the decision-maker's wishes in terms of objectives and constraints. In the turbomachine example, objectives for the design of the combustion chamber could be to minimize fuel consumption or to maximize the mixing of fuel and air inside the chamber. One could also try to do both simultaneously. Constraints could be to keep the temperature and pressure inside the chamber below some threshold values to avoid damaging the casing. Naturally, those threshold values may depend on the design of the casing itself.
Within this framework, a notion of optimal design can be introduced: a design is considered optimal if it respects all the constraints and achieves an optimal trade-off between the objectives. From a mathematical point of view, the problem consists in finding an approximation of the set

Γ = {x ∈ X : c(x) ≤ 0 and ∄ x′ ∈ X s.t. c(x′) ≤ 0 and f(x′) ≺ f(x)},   (1.1)

where X is a design domain, c = (c_i)_{1≤i≤q} is a vector of constraint functions, f = (f_j)_{1≤j≤p} is a vector of objective functions to be minimized, and ≺ is a partial order relation. The elements of Γ correspond to design solutions that both respect the constraints and achieve optimal trade-offs between the objectives, as formulated by the decision-maker¹.
In the setting that we consider, for a given design x ∈ X, the values f(x) and c(x) in (1.1) correspond to the outputs of a numerical model that may involve the resolution of partial differential equations, meshing steps or large matrix inversions. The affordable number of evaluations of f and c is therefore limited by the computational cost. When it is high, finding Γ is a difficult problem.
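As an aside, once a finite batch of designs has been evaluated, a discrete approximation of the set (1.1) is straightforward to extract. The following minimal Python sketch (function names and toy data are illustrative, not part of the thesis) keeps the feasible evaluated designs that are not Pareto-dominated by any other feasible one:

```python
import numpy as np

def pareto_dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb
    (component-wise <=, strictly better in at least one)."""
    return np.all(fa <= fb) and np.any(fa < fb)

def feasible_nondominated(F, C):
    """Discrete approximation of the set Gamma of (1.1).

    F: (n, p) array of objective values f(x_i).
    C: (n, q) array of constraint values c(x_i); feasible iff all <= 0.
    Returns indices of feasible points not dominated by another
    feasible point.
    """
    idx = np.flatnonzero(np.all(C <= 0, axis=1))
    return [i for i in idx
            if not any(pareto_dominates(F[j], F[i]) for j in idx if j != i)]

# Toy usage: two conflicting objectives, one constraint.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))                   # candidate designs
F = np.stack([X[:, 0], 1.0 - X[:, 0]], axis=1)  # objectives to minimize
C = (X[:, 1] - 0.8).reshape(-1, 1)              # feasible iff x2 <= 0.8
print(feasible_nondominated(F, C))
```

The pairwise scan is quadratic in the number of evaluations, which is unproblematic in the few-hundred-evaluations regime considered here.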
1.1.2 A brief literature review of continuous optimization

In the literature, several algorithms have been proposed for solving the optimization problem (1.1). For the sake of clarity, we limit the scope of our review to the continuous optimization of deterministic functions, i.e. we consider problems where X is a subset of R^d, d being the number of design variables, and for which the vectors f(x) and c(x) for some x ∈ X are deterministic (as opposed to stochastic, see e.g. Fu (2002); Tekin and Sabuncuoglu (2004); Kleijnen (2008) and references therein). Moreover, we do not consider optimization methods that require assumptions on the structure of the functions of the problem, such as convexity or linearity for example, and we do not consider optimization problems with equality constraints. These give rise to a specific literature that falls out of the scope of this thesis. See, e.g., the book of Bonnans et al. (2006) for a broader discussion on continuous optimization.
Local and global optimization

Optimization problems with only one objective function fall into the category of single-objective optimization problems. This is probably the most documented category and the first that was addressed in the literature. The solution to a single-objective problem is often a single point called the global optimizer.

Single-objective problems can be solved using local and global optimization algorithms. Given a starting point, local optimization algorithms perform a local search and hopefully converge to a local optimum of the objective function. Algorithms in this class usually have a good convergence rate and require few objective function evaluations. In this category, we find first- and second-order gradient-based optimizers such as the method of steepest descent, the conjugate gradient method, the modified Newton's method or the quasi-Newton method, and derivative-free optimizers such as the Direct Search algorithm of Hooke and Jeeves (1961), the Trust-Region algorithm of Powell (1964), the Simplex algorithm of Nelder and Mead (1965) or the Generalized Pattern Search algorithm of Torczon (1997). For more details on these approaches, the reader is referred to the book of Nocedal and Wright (2006) and references therein.
Global optimization algorithms on the other hand seek a global optimum of the objective function. They are often population-based and/or introduce some randomness in the optimization procedure: for example, the Simulated Annealing algorithm of Kirkpatrick et al. (1983), the Hill Climbing algorithm of Russell et al. (2003), the DIRECT algorithm of Jones et al. (1993), the Multilevel Coordinate Search algorithm of Huyer and Neumaier (1999), several genetic and evolutionary algorithms (see e.g. Bäck (1996)), random search algorithms (see e.g. Zhigljavsky (2012)), Bayesian optimization algorithms, such as the EGO algorithm of Jones et al. (1998) or the IAGO algorithm of Villemonteix et al. (2009), and surrogate-based optimization algorithms such as the COBRA algorithms of Regis (2014). Local optimization algorithms can also be made global by running them several times with different starting points (multi-start approach). For more details on global optimization methods, the reader is referred to the books of Törn and Žilinskas (1989); Weise (2009); Zhigljavsky (2012) and Nocedal and Wright (2006).
Multi-objective optimization

Optimization problems with more than one objective are called multi-objective optimization problems. The term many-objective optimization problems is also used to refer to multi-objective problems with more than two objectives. Unlike single-objective problems, the solution to a multi-objective optimization problem is often a set of optimal solutions called a Pareto front.

In the literature, a distinction is made between algorithms that look for a single solution on the Pareto front and algorithms that build an approximation of the Pareto front. For both categories, a survey of approaches is provided by Marler and Arora (2004). See also the books of Miettinen (2012) and Collette and Siarry (2013) for more in-depth discussions on multi-objective optimization.

The most popular algorithms for approximating Pareto fronts are probably genetic and evolutionary algorithms. Since they are population-based, they are well-suited to approximating a set of solutions. A comprehensive review of genetic and evolutionary multi-objective optimization algorithms is provided by Coello (2000) and Coello et al. (2002).

In the Bayesian optimization literature, algorithms for approximating Pareto fronts have been proposed by Knowles (2006); Svenson (2011); Keane (2006); Hernández-Lobato et al. (2015) and Emmerich et al. (2006), among others. Compared to genetic and evolutionary approaches, these approaches usually require fewer objective function evaluations, which makes them particularly interesting in our context.
Constraint handling

Most of the above-cited algorithms can be extended to handle constrained optimization problems, i.e. problems with at least one constraint. The most popular approach is to penalize the objective function(s) by a quantity related to the constraint violations. Lagrangian formulations, for example, fall in this category.

Among the class of local optimization algorithms that can handle constraints, let us cite, for example, the Sequential Quadratic Programming algorithm of Han (1977) and the COBYLA algorithm of Powell. As regards multi-objective optimization algorithms, a comprehensive review of constraint-handling techniques is provided by Mezura-Montes and Coello (2011). More generally, see the book of Nocedal and Wright (2006) for a review of constraint-handling approaches in single-objective optimization.

As regards constrained multi-objective optimization, most of the recent literature comes from the genetic and evolutionary communities. Popular algorithms for solving constrained multi-objective problems are the NSGA2 algorithm of Deb et al. (2002) or the SPEA2 algorithm of Zitzler et al. (2002). For more details about this class of approaches, the reader is referred to the book of Deb (2001). In the Bayesian optimization literature, constrained multi-objective optimization algorithms have been proposed by Emmerich et al. (2006); Garrido-Merchán and Hernández-Lobato (2016).
Gradient-based optimization

When gradient information is available, which happens for example when adjoint solvers are used (see e.g. Giles and Pierce (2000)), it is often advantageous to use it to guide the search for optimal solutions. In particular when the number d of variables is large, gradient information can prove invaluable to focus the search in the right direction and solve the problem using few function evaluations.

Note that in the case where gradients are not given, they can still be estimated (see e.g. Nocedal and Wright (2006)). However, gradient approximation methods usually scale unfavourably with the dimension of the problem (d evaluations are required to estimate a gradient using finite differences), which often renders them impractical when d is large (say d > 10) and the functions of the problem are expensive to evaluate.
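For illustration, here is a minimal sketch of the forward finite-difference estimator alluded to above, which indeed spends d extra evaluations of f per gradient (the test function and step size are illustrative assumptions):

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient estimate: d extra evaluations of f
    on top of the one at x, which is costly when f is expensive."""
    fx = f(x)
    g = np.empty_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

# d = 3 here, so each gradient costs 3 extra evaluations of f.
print(fd_gradient(lambda x: float(np.sum(x**2)), np.array([1.0, 2.0, 3.0])))
```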
Population-based optimization

Population-based algorithms such as random search algorithms (see e.g. Zhigljavsky (2012)), genetic and evolutionary algorithms (see e.g. Coello (2000); Coello et al. (2002)) or Estimation of Distribution algorithms (see e.g. Hauschild and Pelikan (2011)) also form an important subclass of the derivative-free optimization algorithms which has gained in popularity over the last decades.

One advantage of population-based algorithms is that they are often robust to difficult landscapes (involving for example discontinuities, irregularities or multiple modes) and high-dimensional input spaces (see, e.g., Hansen and Kern (2004)). In a sense, it can be said that they compensate for the lack of information about the gradients and structure of the functions of the problem by using statistics (see, e.g., Bäck (1996)). This usually comes at the expense of many function evaluations though.
Model-based optimization

Many derivative-free optimization algorithms are model-based, in the sense that they rely on a mathematical model to guide the search for optimal solutions. For example, it can be a statistical global approximation model, as in surrogate-based optimization algorithms (see e.g. Wang and Shan (2007); Koziel et al. (2011); Queipo et al. (2005); Booker et al. (1999); Regis (2016)).

Algorithms of this class usually require few function evaluations. However, some degree of smoothness of the functions of the problem is often necessary and they do not usually scale favourably with the dimension. As such, their use remains limited to a certain class of problems.
1.2 Background literature

1.2.1 Bayesian optimization

For this thesis, the choice was made to take a Bayesian approach to the optimization problem (1.1). Historically, this approach was introduced by Kushner (1964) and developed by Mockus (1975), Mockus et al. (1978), Archetti and Betrò (1979) and Mockus (2012). It was later made popular by Jones et al. (1998) who proposed the EGO algorithm, which is one of the most efficient existing algorithms for solving global optimization problems with a small number of function evaluations.
To present the Bayesian approach to optimization, it is useful to recall Bayes' rule, which states that given a statistical model where ξ is a quantity of interest and I represents available information about ξ, the posterior probability of ξ knowing the information I is proportional to the likelihood of the information I assuming ξ, times the prior probability that is placed on ξ:

p(ξ | I) ∝ p(I | ξ) p(ξ).   (1.2)

In a Bayesian optimization setting, it is assumed that the functions of the problem are sample paths of a vector-valued random process ξ. Then, p(ξ) represents a priori knowledge about these functions, such as regularity for example. Usually, stationary Gaussian process priors are used because of their flexibility and because they yield good results in practice (see e.g. Williams and Rasmussen (2006)). The information I is made of the past observations of ξ. In a sequential optimization procedure, assuming that a set X_n = (X_1, . . . , X_n) ∈ X^n of n observations have been made at time n, then I = I_n is the information Y_n = ξ(X_n), where Y_n = (Y_1, . . . , Y_n), with Y_i = ξ(X_i) ∈ R^{p+q}, is the vector of the observed values. Under this framework, p(ξ | I_n) is the posterior distribution of ξ, conditional on the past observations. An illustration of this approach is proposed in Figure 1.2.

In the Bayesian optimization literature, various criteria have been proposed to select the evaluation points (X_1, X_2, . . .). In this thesis, the choice was made to focus on the expected improvement (EI) sampling criterion (see, e.g., Jones et al. (1998)), but other approaches based on stepwise uncertainty reduction (see e.g. Villemonteix et al. (2009); Bect et al. (2012); Chevalier et al. (2014a); Picheny (2014b); Hennig and Schuler (2012); Hernández-Lobato et al. (2015)) constitute alternative directions that may have been taken.

Consider the global optimization setting where the objective is to find the minimum m of a function f. The performance of an optimization strategy X after n evaluations can be measured using the loss function

ε_n(X, f) = m_n − m,   (1.3)

where m_n = f(X_1) ∧ · · · ∧ f(X_n) is the best solution that has been observed after n evaluations.

Figure 1.2: Realizations of ξ under a Gaussian process prior distribution (top). Conditional realizations of ξ after n evaluations (bottom).

Using the Bayesian formalism, the improvement brought by the observation of a new point x ∈ X at time n can be measured by the reduction of the loss:

I_n(x) = ε_n(X, f) − ε_{n+1}(X, f) = m_n − m_n ∧ ξ(x) = (m_n − ξ(x))_+.   (1.4)

Note that since ξ(x) is a random variable, the improvement (1.4) is a random quantity. Then, a one-step lookahead optimal choice for the next evaluation point X_{n+1} is to take the point that maximizes the conditional expectation of the improvement I_n:

X_{n+1} = argmax_{x ∈ X} E_n(I_n(x)),   (1.5)

where E_n stands for the conditional expectation with respect to Y_n = f(X_n). In the following, we shall denote ρ_n(x) = E_n(I_n(x)), x ∈ X. See Figure 1.3 for an illustration of the operation of this optimization procedure.
The sampling criterion (1.5) is called the expected improvement. In the Bayesian optimization literature, it has been extended to constrained single-objective problems by Schonlau et al. (1998) and to multi-objective problems by Emmerich et al. (2006), among others². The state-of-the-art approach to handle constraints in Bayesian optimization consists in multiplying the expected improvement by the posterior probability of jointly satisfying the constraints, as will be discussed in more details in Section 2.2.3 of this manuscript. This approach, however, is not suitable for highly constrained problems, where finding a feasible solution is a challenge in itself.

Moreover, note that choosing X_{n+1} using (1.5) requires to solve an auxiliary optimization problem. The EI is cheap to evaluate but it is known to be highly multi-modal (see Figure 1.3), which makes solving this problem difficult in some cases. In the global optimization context, a review of approaches that have been proposed to solve this problem can be found in the PhD thesis of Benassi (2013).

In this thesis we address both difficulties. To handle highly-constrained problems, we propose an extension of the expected improvement criterion. For solving the optimization problem (1.5) we propose dedicated sequential Monte-Carlo techniques, following in this respect Benassi et al. (2012).
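To fix ideas before turning to those contributions, the following self-contained Python sketch runs a few iterations of the myopic EI strategy (1.5) on a one-dimensional toy problem. The zero-mean Gaussian process with a squared-exponential kernel and fixed hyperparameters is an illustrative modeling choice, not the implementation used in this thesis:

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(Xn, Yn, Xs, ell=0.15, s2=1.0, noise=1e-8):
    """Posterior mean/sd of a zero-mean GP with squared-exponential
    kernel at test points Xs, given observations (Xn, Yn)."""
    k = lambda a, b: s2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(Xn, Xn) + noise * np.eye(len(Xn))
    Ks = k(Xs, Xn)
    mu = Ks @ np.linalg.solve(K, Yn)
    var = s2 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sd, mn):
    """EI for minimization: E[(mn - xi(x))_+] under a Gaussian posterior."""
    z = (mn - mu) / sd
    return sd * norm.pdf(z) + (mn - mu) * norm.cdf(z)

f = lambda x: np.sin(10 * x) + x             # toy function to minimize
Xn = np.array([0.1, 0.5, 0.9]); Yn = f(Xn)   # initial design
grid = np.linspace(0.0, 1.0, 500)            # discrete search for argmax EI
for _ in range(10):                          # myopic strategy (1.5)
    mu, sd = gp_posterior(Xn, Yn, grid)
    xnew = grid[np.argmax(expected_improvement(mu, sd, Yn.min()))]
    Xn, Yn = np.append(Xn, xnew), np.append(Yn, f(xnew))
print(Xn[np.argmin(Yn)], Yn.min())
```

The discrete grid search used here for the inner maximization is precisely the step that the SMC techniques discussed next aim to replace.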
1.2.2 Previous work on similar topics

This thesis work is a continuation of previous work initiated by Julien Bect and Emmanuel Vazquez, supervisors of this thesis, on the coupling between Gaussian process models and sequential Monte-Carlo (SMC) techniques³.

Figure 1.3: Bayesian optimization using the EI sampling criterion, at three successive iterations. On the left column, the function to be minimized is represented as a dashed blue line, the posterior mean of ξ is shown in red and the shaded region corresponds to a 95% confidence interval of the posterior distribution. The observations are shown as black disks and the current best observed value is shown as a black dashed line. On the right column, the values of the EI function are shown as a black curve. On both columns, the location of the next evaluation point is indicated.
In the PhD thesis of Li (2012), a Bayesian approach to the estimation of small probabilities of failure is developed. The proposed approach is an adaptation of the Subset Simulation algorithm of Au and Beck (2001), an SMC algorithm for computing small probabilities of failure, to the case where the functions of the problem are expensive to evaluate, and are modeled using Gaussian processes.

In the PhD thesis of Benassi (2013), a fully Bayesian approach to global optimization is proposed and sequential Monte-Carlo techniques inspired from the Subset Simulation algorithm are used for optimizing the EI criterion⁴. In this thesis, we go a step farther and propose an extension of the approach to the case of constrained multi-objective optimization.
1.2.3 Illustration

As the name suggests, sequential Monte-Carlo techniques are sequential sampling techniques (see, e.g., Del Moral et al. (2006)). Given a sequence of distributions (π_n)_{n≥1} defined on X, they can be used to iteratively draw weighted samples (X_n)_{n≥1}, where X_n = (x_{n,i}, w_{n,i})_{1≤i≤m} ∈ X^m × [0, 1]^m is approximately distributed from π_n, i.e. the empirical distribution π̃_n = Σ_{1≤i≤m} w_{n,i} δ_{x_{n,i}} is an approximation of π_n.

In the Bayesian global optimization setting where the objective is to minimize a function f : X → R modeled by a Gaussian process ξ, Benassi et al. (2012) define the density π_n, n ≥ 1, as

π_n(x) ∝ P_n(ξ(x) ≤ m_n),   (1.6)

where P_n denotes the conditional probability knowing the past observations and m_n is the current best solution as in Section 1.2.1. In other words, π_n is chosen proportional to the posterior probability of improving upon the current best solution. Then, using SMC, a weighted sample distributed from π_n can be obtained and the resulting particles (x_{n,i})_{1≤i≤m} can be used as candidates for the optimization of the EI criterion:

X_{n+1} = argmax_{1≤i≤m} ρ_n(x_{n,i}).   (1.7)

The operation of this procedure is illustrated in Figure 1.4. Note in particular how the density of particles follows the concentration of the EI from one iteration to the other.
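A crude stand-in for this procedure can be sketched in a few lines of Python: a single importance-resampling-plus-jitter step targets a density proportional to the probability of improvement (1.6), and the next evaluation point is the EI maximizer among the particles, as in (1.7). The analytic stand-ins for the posterior mean and standard deviation are illustrative assumptions; a full SMC sampler would iterate reweighting, resampling and move steps as the posterior evolves:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Illustrative stand-ins for the GP posterior mean/sd and best value;
# in the actual algorithm these come from the fitted Gaussian process.
mu = lambda x: np.sin(10 * x) + x
sd = lambda x: 0.3 * np.ones_like(x)
m_n = -0.5

def prob_improvement(x):
    # pi_n(x) of (1.6), up to normalization: P_n(xi(x) <= m_n).
    return norm.cdf((m_n - mu(x)) / sd(x))

# One importance-resampling + jitter step over [0, 1], a crude stand-in
# for the reweight/resample/move cycle of a full SMC sampler.
m = 1000
particles = rng.uniform(0.0, 1.0, size=m)
w = prob_improvement(particles)
particles = rng.choice(particles, size=m, p=w / w.sum())              # resample
particles = np.clip(particles + 0.02 * rng.standard_normal(m), 0, 1)  # move

def expected_improvement(x):
    z = (m_n - mu(x)) / sd(x)
    return sd(x) * norm.pdf(z) + (m_n - mu(x)) * norm.cdf(z)

# (1.7): the next evaluation point is the EI maximizer among the particles.
print(particles[np.argmax(expected_improvement(particles))])
```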
³ This coupling has also been studied by Dubourg et al. (2011) in the context of reliability-based design optimization.

⁴ In the thesis work of Benassi (2013), the EI criterion that is considered is not exactly the one that is introduced in Section 1.2.1, but the ideas that are used for optimizing the criterion can be generalized to other definitions of the criterion.
Figure 1.4: Illustration of the SMC procedure for optimizing the EI criterion, at three successive iterations. The particles are shown as red dots. They are distributed from a density proportional to the probability of improvement, which is shown as a black curve in the figures of the left column. The maximizer of the EI among the particles is selected as the next evaluation point.
1.3.1 Main contributions and outline of the manuscript

The main contribution of this thesis is the proposal of an algorithm for solving constrained multi-objective optimization problems in the case where the functions of the problem are expensive to evaluate. In particular, our focus is on heavily constrained problems, i.e. problems for which finding a feasible solution is difficult in itself, and on many-objective problems.

The proposed algorithm, which we call BMOO, implements a Bayesian approach and is detailed in Chapter 2. This chapter is a reproduction of Feliot et al. (2017) with a few modifications. It is structured as follows. In Section 2.2, we recall the framework of Bayesian optimization based on the expected improvement criterion and discuss some of its extensions to constrained optimization and to multi-objective optimization. Then, we introduce a new EI formulation in Section 2.3. This new formulation is a generalization of the expected hypervolume improvement (EHVI) criterion of Emmerich et al. (2006) and is adapted to both the search of feasible solutions and to the constrained optimization of multiple objectives. For the computation and optimization of the criterion, we propose dedicated sequential Monte-Carlo algorithms. These are detailed respectively in Sections 2.4.1 and 2.4.2. They have applications outside of the framework of the BMOO algorithm and can be viewed as contributions of independent interest. Then, we present experimental results in Section 2.5. The BMOO algorithm is shown to compare favourably with state-of-the-art algorithms for solving constrained single- and multi-objective optimization problems under a limited budget of function evaluations. Conclusions and perspectives for future work are discussed in Section 2.6.
In Chapter 3, we propose improvements and extensions of the algorithm. In Sections 3.2 and 3.3, the computation and optimization of the criterion are revisited and novel sampling densities to be used in the sequential Monte-Carlo samplers are proposed. These new densities make it possible to improve the performances of the BMOO algorithm. Then, in Section 3.4, the algorithm is tested on many-objective problems with up to eight objective functions. Finally, in Section 3.5, we propose extensions of the algorithm. BMOO is extended to handle problems defined on non-hypercubic design spaces (i.e. design spaces defined by bound constraints, a cheap-to-evaluate indicator function and/or cheap-to-evaluate constraints) and to problems having hidden constraints (due to numerical simulation failures for example). Also, to take advantage of parallel computation facilities when available, a batch version of the algorithm is proposed. Finally, the new EI criterion is extended to include user preferences into the search for Pareto-optimal solutions.

In Chapter 4, we present applications of the algorithm to real-life design optimization problems. BMOO is applied to the design of a commercial aircraft environment control system (Feliot et al., 2016), to the design of an electric vehicle power-train, to the tuning of a line of sight controller and to the design of a turbo-machine fan blade.

To conclude, in Chapter 5, we make a summary of the manuscript and discuss perspectives for future work.
1.3.2 Publications and communications

2017

Journal paper

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained single- and multi-objective optimization. Journal of Global Optimization, 67(1-2):97-133, 2017.

2016

Conference paper

P. Feliot, Y. Le Guennec, J. Bect, and E. Vazquez. Design of a commercial aircraft environment control system using Bayesian optimization techniques. In Engineering Optimization (ENGOPT 2016), Iguassu, Brazil, 2016.

Communications

P. Feliot, J. Bect, and E. Vazquez. Bayesian multi-objective optimization: Application to the design of an electric vehicle powertrain. Forum incertitudes CEA/DAM. Méthodes de quantification des incertitudes, Bruyères-le-Châtel, October 2016.

P. Feliot, J. Bect, and E. Vazquez. Bayesian multi-objective optimization with constraints: Application to the design of a commercial aircraft environment control system. GdR MASCOT-NUM working meeting: Dealing with stochastics in optimization problems, Institut Henri Poincaré (IHP), Paris, May 2016.

P. Feliot, J. Bect, and E. Vazquez. BMOO: a Bayesian multi-objective optimization algorithm. MascotNum Annual Conference, Toulouse, France, March 2016.

2015

Conference paper

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained multi-objective optimization. In International Conference on Learning and Intelligent Optimization, pages 256-261.

Communications

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained multi-objective optimization of expensive-to-evaluate functions. In World Congress on Global Optimization (WCGO 2015), Gainesville (Florida), United States, 2015b.

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained multi-objective optimization. Journées annuelles du GdR MASCOT NUM, Saint-Etienne, France, April 2015. (Poster)

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained multi-objective optimization. In Sequential Monte Carlo workshop (SMC 2015), Paris, France, August 2015. (Poster)

2014

Communication

P. Feliot, J. Bect, and E. Vazquez. A Bayesian subset simulation approach to constrained global optimization of expensive-to-evaluate black-box functions. In PGMO-COPI 14, Palaiseau, France, 2014.
2 A Bayesian approach to constrained single- and multi-objective optimization
In this thesis, we address the problem of derivative-free multi-objective optimization of real-valued functions subject to multiple inequality constraints. The problem consists in finding an approximation of the set

Γ = {x ∈ X : c(x) ≤ 0 and ∄ x′ ∈ X s.t. c(x′) ≤ 0 and f(x′) ≺ f(x)},   (2.1)

where X ⊂ R^d is the search domain, c = (c_i)_{1≤i≤q} is a vector of constraint functions (c_i : X → R), c(x) ≤ 0 means that c_i(x) ≤ 0 for all 1 ≤ i ≤ q, f = (f_j)_{1≤j≤p} is a vector of objective functions to be minimized (f_j : X → R), and ≺ denotes the Pareto domination rule (see, e.g., Fonseca and Fleming, 1998). Both the objective functions f_j and the constraint functions c_i are assumed to be continuous. The search domain X is assumed to be compact; typically, X is a hyper-rectangle defined by bound constraints. Moreover, the objective and constraint functions are regarded as black boxes and, in particular, we assume that no gradient information is available. Finally, the objective and the constraint functions are assumed to be expensive to evaluate, which arises for instance when the values f(x) and c(x), for a given x ∈ X, correspond to the outputs of a computationally expensive computer program. In this setting, the emphasis is on building optimization algorithms that perform well under a very limited budget of evaluations (e.g., a few hundred evaluations).
We adopt a Bayesian approach to this optimization problem. The essence of Bayesian optimization is to choose a prior model for the expensive-to-evaluate function(s) involved in the optimization problem, usually a Gaussian process model (Santner et al., 2003; Williams and Rasmussen, 2006) for tractability, and then to select the evaluation points sequentially in order to obtain a small average error between the approximation obtained by the optimization algorithm and the optimal solution, under the selected prior. See, e.g., Kushner (1964), Mockus (1975), Mockus et al. (1978), Archetti and Betrò (1979) and Mockus (2012) for some of the earliest references in the field. Bayesian optimization research was first focused on the case of single-objective bound-constrained optimization: the Expected Improvement (EI) criterion (Mockus et al., 1978; Jones et al., 1998) has emerged in this case as one of the most popular criteria for selecting evaluation points. Later, the EI criterion has been extended to handle constraints (Schonlau et al., 1998; Sasena et al., 2002; Gramacy and Lee, 2011; Gelbart et al., 2014; Gramacy et al., 2016) and to address bound-constrained multi-objective problems (Emmerich et al., 2006; Jeong et al., 2006; Wagner et al., 2010; Svenson and Santner, 2010).
With this chapter, our contribution is twofold. The first part of the contribution is the proposition of a new sampling criterion that handles multiple objectives and non-linear constraints simultaneously. This criterion corresponds to a one-step look-ahead Bayesian strategy, using the dominated hyper-volume as a utility function (following in this respect Emmerich et al., 2006). More specifically, the dominated hyper-volume is defined using an extended domination rule, which handles objectives and constraints in a unified way (in the spirit of Fonseca and Fleming, 1998; Ray et al., 2001; Oyama et al., 2007). This new criterion is naturally adapted to the search of feasible points when none is available, and to the optimization of the objectives when at least one feasible point is known. The second part of the contribution lies in the numerical methods employed to compute and optimize the sampling criterion. Indeed, this criterion takes the form of an integral over the space of constraints and objectives, for which no analytical expression is available in the general case. Besides, it must be optimized at each iteration of the algorithm to determine the next evaluation point. In order to compute the integral, we use an algorithm similar to the subset simulation method (Au and Beck, 2001; Cérou et al., 2012), which is a well-known Sequential Monte Carlo (SMC) technique (see Del Moral et al., 2006; Liu, 2001, and references therein) from the field of structural reliability and rare event estimation. For the optimization of the criterion, we resort to an SMC method as well, following earlier work by Benassi et al. (2012) for single-objective bound-constrained problems. The resulting algorithm is called BMOO (for Bayesian multi-objective optimization).
This chapter is based on Feliot et al. (2017). Its structure is as follows. In Section 2.2, we recall the framework of Bayesian optimization based on the expected improvement sampling criterion, starting with the unconstrained single-objective setting. Section 2.3 presents our new sampling criterion for constrained multi-objective optimization. The calculation and the optimization of the criterion are discussed in Section 2.4. Section 2.5 presents experimental results. An illustration on a two-dimensional toy problem is proposed for visualization purpose. Then, the performances of the method are compared to those of reference methods on both single- and multi-objective constrained optimization problems from the literature. Finally, future work is discussed in Section 2.6.
2.2 Background literature

2.2.1 Expected Improvement

Consider the single-objective unconstrained optimization problem

x⋆ = argmin_{x ∈ X} f(x),

where f is a continuous real-valued function defined over X ⊂ R^d. Our objective is to find an approximation of x⋆ using a sequence of evaluation points X_1, X_2, . . . ∈ X. Because the choice of a new evaluation point X_{n+1} at iteration n depends on the evaluation results of f at X_1, . . . , X_n, the construction of an optimization strategy X : f ↦ (X_1, X_2, X_3, . . .) is a sequential decision problem.

The Bayesian approach to this decision problem originates from the early work of Kushner (1964) and Mockus et al. (1978). Assume that a loss function ε_n(X, f) has been chosen to measure the performance of the strategy X on f after n evaluations, for instance the classical loss function

ε_n(X, f) = m_n − m,   (2.2)

with m_n = f(X_1) ∧ · · · ∧ f(X_n) and m = min_{x ∈ X} f(x). Then, a good strategy in the Bayesian sense is a strategy that achieves a small loss on average, where the average is taken with respect to a stochastic process model ξ (defined on a probability space (Ω, A, P_0), with parameter in X) for the function f. In other words, the Bayesian approach assumes that f = ξ(ω, ·) for some ω ∈ Ω. The probability distribution of ξ represents prior knowledge about the function f before actual evaluations are performed. The reader is referred to Vazquez and Bect (2014) for a discussion of other possible loss functions in the context of Bayesian optimization.

Observing that the Bayes-optimal strategy for a budget of N evaluations is intractable for N greater than a few units, Mockus et al. (1978) proposed to use a one-step look-ahead strategy (also known as a myopic strategy). Given n < N evaluation results, the next evaluation point X_{n+1} is chosen in order to minimize the conditional expectation of the future loss ε_{n+1}(X, ξ) given available evaluation results:

X_{n+1} = argmin_{x ∈ X} E_n( ε_{n+1}(X, ξ) | X_{n+1} = x ),   (2.3)

where E_n stands for the conditional expectation with respect to X_1, ξ(X_1), . . . , X_n, ξ(X_n). Most of the work produced in the field of Bayesian optimization since then has been focusing, as the present paper will, on one-step look-ahead (or similar) strategies¹; the reader is referred to Ginsbourger and Le Riche (2010) and Benassi (2013) for discussions about two-step look-ahead strategies.

When (2.2) is used as a loss function, the right-hand side of (2.3) can be rewritten as

argmin_x E_n( ε_{n+1}(X, ξ) | X_{n+1} = x ) = argmin_x E_n( m_{n+1} | X_{n+1} = x ) = argmax_x E_n( (m_n − ξ(x))_+ ),   (2.4)

with z_+ = max(z, 0). The function

ρ_n : x ↦ E_n( (m_n − ξ(x))_+ )   (2.5)

is called the Expected Improvement (EI) criterion (Schonlau et al., 1998; Jones et al., 1998). When ξ is a Gaussian process with known mean and covariance functions, ρ_n(x) has a closed-form expression:

ρ_n(x) = γ( m_n − ξ̂_n(x), σ²_n(x) ),   (2.6)

where

γ(z, s) = √s φ(z/√s) + z Φ(z/√s) if s > 0, and γ(z, s) = max(z, 0) if s = 0,

with Φ standing for the normal cumulative distribution function, φ = Φ′ for the normal probability density function, ξ̂_n(x) = E_n(ξ(x)) for the kriging predictor at x (the posterior mean of ξ(x) after n evaluations) and σ²_n(x) for the kriging variance at x (the posterior variance of ξ(x) after n evaluations). See, e.g., the books of Stein (1999), Santner et al. (2003), and Williams and Rasmussen (2006) for more information on Gaussian process models and kriging (also known as Gaussian process interpolation).

¹ Mockus (2012, Section 2.5) heuristically introduces a modification of (2.3) to compensate for the fact that subsequent evaluation results are not taken into account in the myopic strategy, and thus enforce a more global exploration.
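For reference, the closed form (2.6) translates directly into a few lines of code. This sketch assumes that the kriging mean and variance at x are already available; the function names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def gamma(z, s):
    """gamma(z, s) of (2.6): sqrt(s)*phi(z/sqrt(s)) + z*Phi(z/sqrt(s))
    if s > 0, and max(z, 0) if s = 0."""
    if s <= 0.0:
        return max(z, 0.0)
    r = np.sqrt(s)
    return r * norm.pdf(z / r) + z * norm.cdf(z / r)

def expected_improvement(m_n, mean, var):
    """rho_n(x) = gamma(m_n - kriging_mean(x), kriging_var(x))."""
    return gamma(m_n - mean, var)

# Example: best value 0.0, predicted mean 0.2, predicted variance 0.25.
print(expected_improvement(m_n=0.0, mean=0.2, var=0.25))
```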
Finally, observe that the one-step look-ahead strategy (2.3) requires to solve an auxiliary global optimization problem on X for each new evaluation point to be selected. The objective function ρ_n is rather inexpensive to evaluate when ξ is a Gaussian process, using (2.6), but it is typically severely multi-modal. A simple method to optimize ρ_n consists in choosing a fixed finite set of points that covers X reasonably well and then performing a discrete search. Recently, sequential Monte Carlo techniques (see Del Moral et al., 2006; Liu, 2001, and references therein) have been shown to be a valuable tool for this task (Benassi et al., 2012). A review of other approaches is provided in the PhD thesis of Benassi (2013, Section 4.2).
2.2.2 EI-based multi-objective optimization without constraints

We now turn to the case of unconstrained multi-objective optimization. Under this framework, we consider a set of objective functions f_j : X → R, j = 1, . . . , p, to be minimized, and the objective is to build an approximation of the Pareto front and of the set of corresponding solutions

Γ = {x ∈ X : ∄ x′ ∈ X such that f(x′) ≺ f(x)},   (2.7)

where ≺ stands for the Pareto domination rule defined by

y = (y_1, . . . , y_p) ≺ z = (z_1, . . . , z_p)  ⟺  ∀i ≤ p, y_i ≤ z_i, and ∃j ≤ p, y_j < z_j.   (2.8)

Given evaluation results f(X_1) = (f_1(X_1), . . . , f_p(X_1)), . . . , f(X_n) = (f_1(X_n), . . . , f_p(X_n)), define

H_n = {y ∈ B; ∃i ≤ n, f(X_i) ≺ y},   (2.9)

where B ⊂ R^p is a set of the form B = {y ∈ R^p; y ≤ y_upp} for some y_upp ∈ R^p, which is introduced to ensure that the volume of H_n is finite. H_n is the subset of B whose points are dominated by the evaluations.

A natural idea, to extend the EI sampling criterion (2.5) to the multi-objective case, is to use the volume of the non-dominated region as loss function:

ε_n(X, f) = |H \ H_n|,

where H = {y ∈ B; ∃x ∈ X, f(x) ≺ y} and |·| denotes the usual (Lebesgue) volume in R^p. The improvement yielded by a new evaluation result f(X_{n+1}) = (f_1(X_{n+1}), . . . , f_p(X_{n+1})) is then the increase of the volume of the dominated region (see Figure 2.1), I_n(X_{n+1}) = |H_{n+1}| − |H_n|, since H_n ⊂ H_{n+1} ⊂ H. Given a vector-valued Gaussian random process model ξ = (ξ_1, . . . , ξ_p) of f = (f_1, . . . , f_p), defined on a probability space (Ω, A, P_0), a multi-objective EI criterion can then be derived as

ρ_n(x) = E_n(I_n(x)) = E_n( ∫_{B\H_n} 1_{ξ(x)≺y} dy ) = ∫_{B\H_n} E_n( 1_{ξ(x)≺y} ) dy = ∫_{B\H_n} P_n(ξ(x) ≺ y) dy,   (2.11)

where P_n stands for the probability P_0 conditioned on X_1, ξ(X_1), . . . , X_n, ξ(X_n). The multi-objective sampling criterion (2.11), also called Expected Hyper-Volume Improvement (EHVI), has been proposed by Emmerich and coworkers (Emmerich, 2005; Emmerich et al., 2006; Emmerich and Klinkenberg, 2008).

Remark 1 A variety of alternative approaches have been proposed to extend the EI criterion to the multi-objective case, which can be roughly classified into aggregation-based techniques (Knowles, 2006; Knowles and Hughes, 2005; Zhang et al., 2010) and domination-based techniques (see e.g. Jeong and Obayashi, 2005; Keane, 2006; Ponweiser et al., 2008; Bautista, 2009; Svenson and Santner, 2010; Wagner et al., 2010). We consider these approaches as heuristic extensions of the EI criterion, in the sense that none of them emerges from a proper Bayesian formulation (i.e., a myopic strategy associated to some well-identified loss function). A detailed description of these approaches is out of the scope of this thesis. The reader is referred to Wagner et al. (2010), Couckuyt et al. (2014) and Horn et al. (2015) for some comparisons and discussions. See also Picheny (2014b) and Hernández-Lobato et al. (2015) for other approaches not directly related to the concept of expected improvement.

Remark 2 The multi-objective sampling criterion (2.11) reduces to the usual EI criterion (2.5) in the single-objective case (assuming that f(X_i) ≤ y_upp for at least one i ≤ n).

Under the assumption that the components ξ_i of ξ are mutually independent², P_n(ξ(x) ≺ y) can be expressed in closed form: for all x ∈ X and y ∈ B \ H_n,

P_n(ξ(x) ≺ y) = ∏_{i=1}^{p} Φ( (y_i − ξ̂_{i,n}(x)) / σ_{i,n}(x) ),   (2.12)

where ξ̂_{i,n}(x) and σ²_{i,n}(x) denote respectively the kriging predictor and the kriging variance at x for the i-th component of ξ.

² This is the most common modeling assumption in the Bayesian optimization literature, when several objective functions, and possibly also several constraint functions, have to be dealt with. See the VIPER algorithm

The integration of (2.12) over B \ H_n, in the expression (2.11) of the multi-objective EI criterion, is a non-trivial problem. Several authors (Emmerich and Klinkenberg, 2008; Bader and Zitzler, 2011; Hupkens et al., 2014; Couckuyt et al., 2014) have proposed decomposition methods to carry out this computation, where the integration domain B \ H_n is partitioned into hyper-rectangles, over which the integral can be computed analytically. The computational complexity of these methods, however, increases exponentially with the number of objectives³, which makes the approach impractical in problems with more than a few objective functions. The method proposed in this work also encounters this type of integration problem, but takes a different route to solve it (using SMC techniques; see Section 2.4). Our approach will make it possible to deal with more objective functions.

Remark 3 Exact and approximate implementations of the EHVI criterion are available, together with other Gaussian-process-based criteria for bound-constrained multi-objective optimization, in the Matlab/Octave toolbox STK (Bect et al., 2016b) and in the R packages GPareto (Binois and Picheny, 2015) and mlrMBO (Horn et al., 2015). Note that several approaches discussed in Remark 1 maintain an affordable computational cost when the number of objectives grows, and therefore constitute possible alternatives to the SMC technique proposed in this paper for many-objective box-constrained problems.
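To make the quantities at play concrete, the integral (2.11) can also be approximated by plain Monte Carlo when p is small, using the product form (2.12). The sketch below is illustrative only: it bounds B from below by an arbitrary y_low so that uniform sampling is possible, which the definition of B does not require, and the SMC techniques of Section 2.4 are the efficient alternative for larger p:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def ehvi_mc(mu, sigma, front, y_low, y_upp, n_mc=100_000):
    """Monte Carlo estimate of (2.11): integrate P_n(xi(x) prec y)
    over B \\ H_n, with B truncated to the box [y_low, y_upp].

    mu, sigma: posterior means/sds of the p objectives at x (cf. (2.12)).
    front: (k, p) array of observed objective vectors f(X_i).
    """
    p = len(mu)
    y = rng.uniform(y_low, y_upp, size=(n_mc, p))
    # Discard points of B dominated by an observation (i.e. keep B \ H_n).
    dominated = np.zeros(n_mc, dtype=bool)
    for fi in front:
        dominated |= np.all(fi <= y, axis=1)
    y = y[~dominated]
    # P_n(xi(x) prec y) = prod_i Phi((y_i - mu_i) / sigma_i), eq. (2.12).
    probs = np.prod(norm.cdf((y - mu) / sigma), axis=1)
    vol_B = np.prod(np.asarray(y_upp, float) - np.asarray(y_low, float))
    return vol_B * probs.sum() / n_mc

# Toy usage: p = 2 objectives, three observed points on a front.
front = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])
print(ehvi_mc(mu=np.array([0.3, 0.3]), sigma=np.array([0.1, 0.1]),
              front=front, y_low=[0.0, 0.0], y_upp=[1.0, 1.0]))
```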
2.2.3 EI-based optimization with constraints

In this section, we discuss extensions of the expected improvement criterion for single- and multi-objective constrained optimization.

Consider first the case of problems with a single objective and several constraints:

min_{x ∈ X} f(x) subject to c(x) ≤ 0,   (2.13)

where c = (c_1, . . . , c_q) is a vector of continuous constraints. The set C = {x ∈ X; c(x) ≤ 0} is called the feasible domain. If it is assumed that at least one evaluation has been made in C, it is natural to define a notion of improvement with respect to the best observed objective value m_n = min{f(x); x ∈ {X_1, . . . , X_n} ∩ C}:

I_n(X_{n+1}) = m_n − m_{n+1} = 1_{c(X_{n+1}) ≤ 0} · (m_n − f(X_{n+1}))_+,
that is, I_n(X_{n+1}) = m_n − f(X_{n+1}) if X_{n+1} ∈ C and f(X_{n+1}) < m_n, and 0 otherwise.   (2.14)

In other words, a new observation makes an improvement if it is feasible and improves upon the best past value (Schonlau et al., 1998). The corresponding expected improvement criterion is

³ See, e.g., Beume (2009), Hupkens et al. (2014), Couckuyt et al. (2014) and references therein for decomposition methods.
ρ_n(x) = E_n( 1_{ξ_c(x) ≤ 0} · (m_n − ξ_o(x))_+ ).   (2.15)

If f is modeled by a random process ξ_o and c is modeled by a vector-valued random process ξ_c = (ξ_{c,1}, . . . , ξ_{c,q}) independent of ξ_o, then the sampling criterion (2.15) simplifies to Schonlau et al.'s criterion:

ρ_n(x) = P_n(ξ_c(x) ≤ 0) · E_n( (m_n − ξ_o(x))_+ ).   (2.16)

In other words, the expected improvement is equal in this case to the product of the unconstrained expected improvement, with respect to m_n, with the probability of feasibility. The sampling criterion (2.16) is extensively discussed, and compared with other Gaussian-process-based constraint handling methods, in the PhD thesis of Sasena (2002). More generally, sampling criteria for constrained optimization problems have been reviewed by Parr et al. (2012) and Gelbart (2015).
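For concreteness, Schonlau et al.'s criterion (2.16) combines two closed-form factors: the unconstrained EI and, under independent Gaussian constraint models, the probability of feasibility (cf. (2.20) below). A minimal sketch, assuming the kriging means and standard deviations at x are given (names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def ei(m_n, mu, sigma):
    """Unconstrained expected improvement E_n (m_n - xi_o(x))_+ ."""
    z = (m_n - mu) / sigma
    return sigma * norm.pdf(z) + (m_n - mu) * norm.cdf(z)

def prob_feasible(mu_c, sigma_c):
    """P_n(xi_c(x) <= 0) for independent Gaussian constraint models."""
    return np.prod(norm.cdf(-np.asarray(mu_c) / np.asarray(sigma_c)))

def constrained_ei(m_n, mu_o, sigma_o, mu_c, sigma_c):
    """Criterion (2.16): probability of feasibility times the
    unconstrained EI w.r.t. the best feasible value m_n."""
    return prob_feasible(mu_c, sigma_c) * ei(m_n, mu_o, sigma_o)

# Toy usage: one objective model, two constraint models.
print(constrained_ei(m_n=1.0, mu_o=0.8, sigma_o=0.2,
                     mu_c=[-0.1, 0.3], sigma_c=[0.2, 0.2]))
```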
In the general case of constrained multi-objective problems, the aim is to build an approximation of Γ defined by (2.1). If it is assumed that an observation has been made in the feasible set C, a reasoning similar to that used in the single-objective case can be made to formulate an extension of the EI (2.11):

ρ_n(x) = E_n( |H_{n+1}| − |H_n| ),   (2.17)

where

H_n = {y ∈ B; ∃i ≤ n, X_i ∈ C and f(X_i) ≺ y}   (2.18)

is the subset of B, defined as in Section 2.2.2, whose points are dominated by feasible evaluations. When ξ_o and ξ_c are assumed independent, (2.17) boils down to the product of a modified EHVI criterion, where only feasible points are considered⁴, and the probability of feasibility, as suggested by Emmerich et al. (2006) and Shimoyama et al. (2013b):

ρ_n(x) = P_n(ξ_c(x) ≤ 0) ∫_{B\H_n} P_n(ξ_o(x) ≺ y) dy.   (2.19)
Observe that the sampling criterion (2.17) is the one-step look-ahead criterion associated to the loss function ε_n(X, f) = −|H_n|, where H_n is defined by (2.18). This loss function remains constant as long as no feasible point has been found and, therefore, is not an appropriate measure of loss for heavily constrained problems where finding feasible points is sometimes the main difficulty⁵. From a practical point of view, not all unfeasible points should be considered equivalent: a point that does not satisfy a constraint by a small amount has probably more value than one that does not satisfy the constraint by a large amount, and should therefore be preferred. A new criterion is proposed in Section 2.3 for constrained problems, relying on a new loss function that encodes this preference among unfeasible solutions.

⁴ Note that this modified EHVI criterion remains well defined even when H_n = ∅, owing to the introduction of an upper bound y_upp in the definition of B. Its single-objective counterpart introduced earlier (see Equation (2.15)), however, was only well defined under the assumption that at least one feasible point is known. Introducing an upper bound y_upp is of course also possible in the single-objective case.

⁵ The same remark holds for the variant (see, e.g., Gelbart et al., 2014) which consists in using the probability of feasibility as a sampling criterion when no feasible point is available. This is indeed equivalent to using the loss function ε_n(X, f) = −1_{∃i≤n, X_i ∈ C}.
Remark 4 Other Gaussian-process-based approaches that can be used to handle constraints include the method by Gramacy et al. (2016), based on the augmented Lagrangian approach of Conn et al. (1991), and several recent methods (Picheny, 2014a; Gelbart, 2015; Hernández-Lobato et al., 2015, 2016a) based on stepwise uncertainty reduction strategies (see, e.g., Villemonteix et al., 2009; Bect et al., 2012; Chevalier et al., 2014a, for more information on this topic).
Remark 5 The term E_n((m_n − ξ_o(x))_+) in (2.16) can be computed analytically as in Section 2.2.1, and the computation of the integral in (2.19) has been discussed in Section 2.2.2. If it is further assumed that the components of ξ_c are Gaussian and independent, then the probability of feasibility can be written as

P_n(ξ_c(x) ≤ 0) = ∏_{j=1}^{q} Φ( −ξ̂_{c,j,n}(x) / σ_{c,j,n}(x) ),   (2.20)

where ξ̂_{c,j,n}(x) and σ²_{c,j,n}(x) stand respectively for the kriging predictor and the kriging variance of ξ_{c,j} at x.

2.3 An EI criterion for constrained multi-objective optimization

2.3.1 Extended domination rule

In a constrained multi-objective optimization setting, we propose to handle the constraints using an extended Pareto domination rule that takes both objectives and constraints into account, in the spirit of Fonseca and Fleming (1998), Ray et al. (2001) and Oyama et al. (2007). For ease of presentation, denote by Y_o = R^p and Y_c = R^q the objective and constraint spaces respectively, and let Y = Y_o × Y_c.

We shall say that y_1 ∈ Y dominates y_2 ∈ Y, which will be written as y_1 ⊳ y_2, if ψ(y_1) ≺ ψ(y_2), where ≺ is the usual Pareto domination rule recalled in Section 2.2.2 and, denoting by R̄ the extended real line,

ψ : Y_o × Y_c → R̄^p × R̄^q,  (y_o, y_c) ↦ (y_o, 0) if y_c ≤ 0, and (+∞, max(y_c, 0)) otherwise.   (2.21)

The extended domination rule (2.21) has the following properties:

(i) For unconstrained problems (
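To make the extended domination rule (2.21) concrete, here is a minimal Python sketch of the map ψ and of the induced relation ⊳; the test values are illustrative assumptions:

```python
import numpy as np

def psi(yo, yc):
    """The map psi of (2.21): feasible points keep their objective values
    and get a zero constraint block; unfeasible points get +inf objectives
    and their positive constraint violations max(yc, 0)."""
    yo, yc = np.asarray(yo, float), np.asarray(yc, float)
    if np.all(yc <= 0):
        return np.concatenate([yo, np.zeros_like(yc)])
    return np.concatenate([np.full_like(yo, np.inf), np.maximum(yc, 0.0)])

def pareto_dominates(a, b):
    """Usual Pareto rule (2.8): component-wise <=, strict in one."""
    return np.all(a <= b) and np.any(a < b)

def extended_dominates(y1, y2):
    """y1 (objectives, constraints) dominates y2 iff psi(y1) prec psi(y2)."""
    return pareto_dominates(psi(*y1), psi(*y2))

# A mildly unfeasible point dominates a strongly unfeasible one ...
print(extended_dominates(([0.0], [0.1]), ([0.0], [0.5])))   # True
# ... and any feasible point dominates any unfeasible one.
print(extended_dominates(([3.0], [-1.0]), ([0.0], [0.1])))  # True
```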