HAL Id: tel-01629453
https://tel.archives-ouvertes.fr/tel-01629453
Submitted on 6 Nov 2017
Une approche Bayésienne pour l’optimisation
multi-objectif sous contraintes
Paul Feliot
To cite this version:
Paul Feliot. Une approche Bayésienne pour l'optimisation multi-objectif sous contraintes. Université Paris-Saclay (COmUE), 2017. NNT: 2017SACLC045. tel-01629453.
Une approche Bayésienne pour l'optimisation multi-objectif sous contraintes

Doctoral thesis of Université Paris-Saclay, prepared at CentraleSupélec
École doctorale n° 580: Sciences et technologies de l'information et de la communication (STIC)
Doctoral specialty: Signal processing
Thesis presented and defended at Gif-sur-Yvette, on 12 July 2017, by
Paul FELIOT
Composition of the jury:
M. Patrice Aknin, IRT SystemX, Examiner
Mme Anne Auger, INRIA, École Polytechnique, Examiner
M. Julien Bect, CentraleSupélec, Advisor
M. Sébastien Da Veiga, Safran Tech, Examiner
M. David Ginsbourger, Université de Bern, Reviewer
M. Serge Gratton, ENSEEIHT, CERFACS, President
M. Luc Pronzato, CNRS, UNSA, Reviewer
M. Emmanuel Vazquez, CentraleSupélec, Thesis director
Acknowledgments

I first thank my supervisors, Julien Bect and Emmanuel Vazquez, for their involvement and their ever-judicious advice. I cannot express here everything you have brought me, so I will simply offer you my deepest gratitude; I hope the future will allow us to work together again, as I still have much to learn from you.

I next thank my colleagues and friends at IRT SystemX, too numerous to name. I greatly enjoyed these three years spent together and I wish you all the success you deserve. Our friendship will endure, I am certain of it.

I also thank the various collaborators with whom I had the opportunity to work: Nabil R., Abdelkader O., Sébastien D. V., Christian R., Emmanuel C., Caroline S., Amin E. B., Yves T. and Sophie F. I truly enjoyed working with you and wish you the best for what comes next.

Finally, I wish to warmly thank my family, and in particular my mother and my
Abstract

This thesis work addresses the multi-objective optimization of real-valued functions under inequality constraints. In particular, we are interested in problems for which the objective and constraint functions are evaluated by means of a computer program that may require several hours of computation to return a result. In this setting, it is desirable to solve the optimization problem using as few calls to the simulation code as possible. We are moreover interested in potentially highly constrained optimization problems, that is, problems for which simultaneously satisfying all the constraints is difficult. This type of problem is characteristic of the optimization of complex systems and defeats many optimization algorithms.
In this thesis we propose a Bayesian optimization algorithm called BMOO. This algorithm encodes a new expected improvement criterion developed specifically to be applicable to highly constrained problems and/or problems with many objectives. This criterion relies on a loss function measuring the volume of the space dominated by the current observations, the latter being defined by means of an extended domination rule that compares candidate solutions according to both the objective and the constraint values associated with them. The criterion thus defined generalizes several classical expected improvement criteria from the literature to the case of multi-objective optimization under inequality constraints.
This criterion takes the form of an integral over the joint space of objectives and constraints, which is not analytically tractable in the general case. Moreover, it must be maximized at every iteration of the algorithm in order to select the next evaluation point; this maximization is known to be difficult, because expected improvement criteria tend to be multi-modal. To overcome these difficulties, we propose in this thesis sequential Monte Carlo algorithms in the line of work previously carried out by Benassi (2013) in the case of unconstrained global optimization. In particular, we propose an L2-optimal density for the computation of the new criterion over a set of candidate points, and a density dedicated to the optimization of the criterion for highly constrained problems.
Four extensions of the algorithm are moreover proposed; they can be viewed as independent contributions. First, BMOO is generalized to problems defined on non-hypercubic search spaces, defined for instance by a membership function or by constraints that are inexpensive to evaluate, as well as to problems with hidden constraints. The latter appear, for example, when the simulation code fails to return a result. Next, to take advantage of parallel computation facilities when these are available, a multi-point version of the algorithm is proposed. Finally, an expected improvement criterion that makes it possible to steer the search for optimal solutions toward regions chosen by the user is proposed. This criterion allows the domain expert to influence the optimization process in order to obtain more relevant solutions.
The proposed algorithm obtains better results than state-of-the-art optimization algorithms on both single- and multi-objective optimization problems from the literature. We show that it can be applied with good results and good repeatability on a large set of problems. In particular, the algorithm makes it possible to solve highly constrained problems and/or problems with many objectives, which was the initial goal.
BMOO is also successfully applied to four problems that are representative of the types of optimization problems encountered in industry. It is applied to the sizing of the air regulation system of a commercial aircraft (collaboration with Airbus Group Innovation), to the sizing of the powertrain of an electric vehicle (collaboration with Renault), to the optimal tuning of a line-of-sight controller (collaboration with Safran Electronics & Defense), and to the sizing of a turbomachine fan blade (collaboration with Safran Aircraft Engines and Cénaéro). It is shown in particular that the above-mentioned extensions are relevant for this type of optimization problem.
Some intrinsic limitations nonetheless make BMOO ineffective on certain types of optimization problems, which are illustrated in this thesis. First, BMOO is not suited to solving problems with non-stationary functions. Indeed, the algorithm uses Gaussian process models, and the stationarity of the objectives and constraints is one of the modeling assumptions that are made. When this assumption does not hold, the models do not permit efficient optimization. We show, however, that in some cases the use of simple transformations makes it possible to render stationary certain functions that are not stationary originally, and thus to use BMOO effectively. Moreover, the algorithm uses the hypervolume of the dominated region as a loss function. The hypervolume, however, tends to favor certain regions of the Pareto front over others, depending on its curvature. As a result, BMOO is subject to an intrinsic bias, and on certain problems, typically problems for which the Pareto front has concavities, the distribution of the solutions obtained by the algorithm may fail to represent some regions of the Pareto front.
Acknowledgments iii
Abstract v
1 Introduction 1
1.1 Context . . . 2
1.1.1 Industrial design of complex systems . . . 2
1.1.2 A brief literature review of continuous optimization . . . 3
1.2 Background literature . . . 6
1.2.1 Bayesian optimization . . . 6
1.2.2 Previous work on similar topics . . . 8
1.2.3 Illustration . . . 10
1.3 About this thesis work . . . 12
1.3.1 Main contributions and outline of the manuscript . . . 12
1.3.2 Publications and communications . . . 13
2 A Bayesian approach to constrained single- and multi-objective optimization 15
2.1 Introduction . . . 16
2.2 Background literature . . . 17
2.2.1 Expected Improvement . . . 17
2.2.2 EI-based multi-objective optimization without constraints . . . 19
2.2.3 EI-based optimization with constraints . . . 22
2.3 An EI criterion for constrained multi-objective optimization . . . 24
2.3.1 Extended domination rule . . . 24
2.3.2 A new EI criterion . . . 25
2.3.3 Decomposition of the expected improvement: feasible and unfeasible components . . . 27
2.4 Sequential Monte Carlo techniques to compute and optimize the expected improvement . . . 28
2.4.1 Computation of the expected improvement . . . 28
2.4.2 Maximization of the sampling criterion . . . 31
2.5 Experiments . . . 35
2.5.3 Mono-objective optimization benchmark . . . 36
2.5.4 Multi-objective optimization benchmark . . . 43
2.6 Conclusions and future work . . . 53
2.7 Additional material . . . 54
2.7.1 On the bounded hyper-rectangles B_o and B_c . . . 54
2.7.2 An adaptive procedure to set B_o and B_c . . . 55
2.7.3 Modified g3mod, g10 and PVD4 test problems . . . 55
2.7.4 Mono-objective benchmark result tables . . . 56
3 Improvements and extensions of the BMOO algorithm 61
3.1 Introduction . . . 62
3.2 Efficient optimization of the EI criterion . . . 62
3.2.1 Introduction . . . 62
3.2.2 The YUCCA test problem . . . 63
3.2.3 Failure of the probability of improvement sampling density on the YUCCA test problem . . . 65
3.2.4 Novel sampling densities . . . 66
3.2.5 Sampling procedure . . . 67
3.2.6 Numerical experiments . . . 68
3.2.7 Conclusions . . . 77
3.3 Efficient computation of the EHVI criterion . . . 77
3.3.1 Introduction . . . 77
3.3.2 The L2-optimal density . . . 80
3.3.3 Complexity of the exact and approximate computation methods . . . 86
3.3.4 Toward a better control of the sample size . . . 92
3.3.5 Conclusions . . . 94
3.4 BMOO for Bayesian Many-Objective Optimization . . . 96
3.4.1 Introduction . . . 96
3.4.2 The FICUS test problem . . . 96
3.4.3 Empirical study of the hypervolume . . . 97
3.4.4 Numerical experiments . . . 105
3.4.5 Conclusions . . . 105
3.5 Extensions of the BMOO algorithm . . . 109
3.5.1 Introduction . . . 109
3.5.2 Non-hypercubic design spaces . . . 110
3.5.3 Hidden constraints management . . . 111
3.5.4 Batch sequential multi-objective optimization . . . 114
3.5.5 User preferences in multi-objective optimization . . . 117
3.6 Additional material . . . 125
3.6.2 Correction of the adaptive procedure to set B_o . . . 125
3.6.3 Variance of the EI estimator . . . 130
3.6.4 Experimental results for p = 6 and p = 8 . . . 134
4 Applications 143
4.1 Introduction . . . 144
4.2 Design of a commercial aircraft environment control system . . . 145
4.2.1 Introduction . . . 145
4.2.2 Thermodynamic analysis of the ECS . . . 146
4.2.3 Optimization of the system . . . 151
4.2.4 Conclusions . . . 155
4.2.5 Additional material . . . 157
4.3 Design of an electric vehicle powertrain . . . 159
4.3.1 Introduction . . . 159
4.3.2 Specifications . . . 159
4.3.3 Numerical model . . . 162
4.3.4 Optimization of the system . . . 167
4.3.5 Conclusions . . . 170
4.3.6 Additional material . . . 173
4.4 Tuning of a Line of Sight controller . . . 176
4.4.1 Introduction . . . 176
4.4.2 Stabilization architecture model . . . 176
4.4.3 Image quality criteria . . . 178
4.4.4 Tuning of the controller . . . 180
4.4.5 Conclusions . . . 183
4.4.6 Additional material . . . 183
4.5 Design of a turbomachine fan blade . . . 187
4.5.1 Introduction . . . 187
4.5.2 Simulation chain . . . 187
4.5.3 Blade optimization . . . 188
4.5.4 Conclusions . . . 192
4.6 Conclusions . . . 192
5 Conclusions 193
5.1 Summary of contributions . . . 194
5.2 Main achievements and limitations . . . 194
5.3 Perspectives for future work . . . 195
1.1.1 Industrial design of complex systems

The object of this thesis is the optimal design of complex systems. As an introductory example, consider the design of a commercial aircraft turbomachine. A turbomachine is a complex system made of several interacting subsystems. The main components of a typical turbomachine are represented on Figure 1.1.
Figure 1.1: Global architecture of a turbomachine: fan, low- and high-pressure compressors, combustion chamber, high- and low-pressure turbines, with primary and secondary flows.
When designing such a system, a manufacturer has to make several design choices. What should be the shape of the combustion chamber? How many compressor stages are required to achieve a given level of performance? What materials should the fan blades be composed of? What is the inner blade radius of the first stage of the high-pressure turbine? Etc. Those choices are often made using past experience in designing similar systems and performance assessment studies. An established practice to assess the performances of a given design is to rely on numerical models of the physical system. This is in general less costly and less time-consuming than prototyping. Besides, using numerical models makes it possible to consider far more candidate designs.

A common approach to cast a design problem into a mathematical framework is to formulate the decision-maker's wishes in terms of objectives and constraints. In the turbomachine example, objectives for the design of the combustion chamber could be to minimize fuel consumption or to maximize the mixing of fuel and air inside the chamber. One could also try to do both simultaneously. Constraints could be to keep the temperature and pressure inside the chamber below some threshold values to avoid damaging the casing. Naturally, those threshold values may depend on the design of the casing itself.
Within this framework, a notion of optimal design can be introduced: a design is considered optimal if it respects all the constraints and achieves an optimal trade-off between the objectives. From a mathematical point of view, the problem consists in finding an approximation of the set

Γ = {x ∈ X : c(x) ≤ 0 and ∄ x′ ∈ X s.t. c(x′) ≤ 0 and f(x′) ≺ f(x)},   (1.1)

where X is a design domain, c = (c_i)_{1≤i≤q} is a vector of constraint functions, f = (f_j)_{1≤j≤p} is a vector of objective functions to be minimized, and ≺ is a partial order relation. The elements of Γ correspond to design solutions that both respect the constraints and achieve optimal trade-offs between the objectives, as formulated by the decision-maker¹.
In the setting that we consider, for a given design x ∈ X, the values f(x) and c(x) in (1.1) correspond to the outputs of a numerical model that may involve the resolution of partial differential equations, meshing steps or large matrix inversions. The affordable number of evaluations of f and c is therefore limited by the computational cost. When it is high, finding Γ is a difficult problem.
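As an aside, once a finite batch of designs has been evaluated, a discrete approximation of the set (1.1) is straightforward to extract. The following minimal Python sketch (function names and toy data are illustrative, not part of the thesis) keeps the feasible evaluated designs that are not Pareto-dominated by any other feasible one:

```python
import numpy as np

def pareto_dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb
    (component-wise <=, strictly better in at least one)."""
    return np.all(fa <= fb) and np.any(fa < fb)

def feasible_nondominated(F, C):
    """Discrete approximation of the set Gamma of (1.1).

    F: (n, p) array of objective values f(x_i).
    C: (n, q) array of constraint values c(x_i); feasible iff all <= 0.
    Returns indices of feasible points not dominated by another
    feasible point.
    """
    idx = np.flatnonzero(np.all(C <= 0, axis=1))
    return [i for i in idx
            if not any(pareto_dominates(F[j], F[i]) for j in idx if j != i)]

# Toy usage: two conflicting objectives, one constraint.
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))                   # candidate designs
F = np.stack([X[:, 0], 1.0 - X[:, 0]], axis=1)  # objectives to minimize
C = (X[:, 1] - 0.8).reshape(-1, 1)              # feasible iff x2 <= 0.8
print(feasible_nondominated(F, C))
```

The pairwise scan is quadratic in the number of evaluations, which is unproblematic in the few-hundred-evaluations regime considered here.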
1.1.2 A brief literature review of continuous optimization

In the literature, several algorithms have been proposed for solving the optimization problem (1.1). For the sake of clarity, we limit the scope of our review to the continuous optimization of deterministic functions, i.e. we consider problems where X is a subset of R^d, d being the number of design variables, and for which the vectors f(x) and c(x) for some x ∈ X are deterministic (as opposed to stochastic, see e.g. Fu (2002); Tekin and Sabuncuoglu (2004); Kleijnen (2008) and references therein). Moreover, we do not consider optimization methods that require assumptions on the structure of the functions of the problem, such as convexity or linearity for example, and we do not consider optimization problems with equality constraints. These give rise to a specific literature that falls out of the scope of this thesis. See, e.g., the book of Bonnans et al. (2006) for a broader discussion on continuous optimization.
Local and global optimization

Optimization problems with only one objective function fall into the category of single-objective optimization problems. This is probably the most documented category and the first that was addressed in the literature. The solution to a single-objective problem is often a single point called the global optimizer.

Single-objective problems can be solved using local and global optimization algorithms. Given a starting point, local optimization algorithms perform a local search and hopefully converge to a local optimum of the objective function. Algorithms in this class usually have a good convergence rate and require few objective function evaluations. In this category, we find first- and second-order gradient-based optimizers such as the method of steepest descent, the conjugate gradient method, the modified Newton's method or the quasi-Newton method, and derivative-free optimizers such as the Direct Search algorithm of Hooke and Jeeves (1961), the Trust-Region algorithm of Powell (1964), the Simplex algorithm of Nelder and Mead (1965) or the Generalized Pattern Search algorithm of Torczon (1997). For more details on these approaches, the reader is referred to the book of Nocedal and Wright (2006) and references therein.
Global optimization algorithms on the other hand seek a global optimum of the objective function. They are often population-based and/or introduce some randomness in the optimization procedure: for example, the Simulated Annealing algorithm of Kirkpatrick et al. (1983), the Hill Climbing algorithm of Russell et al. (2003), the DIRECT algorithm of Jones et al. (1993), the Multilevel Coordinate Search algorithm of Huyer and Neumaier (1999), several genetic and evolutionary algorithms (see e.g. Bäck (1996)), random search algorithms (see e.g. Zhigljavsky (2012)), Bayesian optimization algorithms, such as the EGO algorithm of Jones et al. (1998) or the IAGO algorithm of Villemonteix et al. (2009), and surrogate-based optimization algorithms such as the COBRA algorithms of Regis (2014). Local optimization algorithms can also be made global by running them several times with different starting points (multi-start approach). For more details on global optimization methods, the reader is referred to the books of Törn and Žilinskas (1989); Weise (2009); Zhigljavsky (2012) and Nocedal and Wright (2006).
Multi-objective optimization

Optimization problems with more than one objective are called multi-objective optimization problems. The term many-objective optimization problems is also used to refer to multi-objective problems with more than two objectives. Unlike single-objective problems, the solution to a multi-objective optimization problem is often a set of optimal solutions called a Pareto front.

In the literature, a distinction is made between algorithms that look for a single solution on the Pareto front and algorithms that build an approximation of the Pareto front. For both categories, a survey of approaches is provided by Marler and Arora (2004). See also the books of Miettinen (2012) and Collette and Siarry (2013) for more in-depth discussions on multi-objective optimization.

The most popular algorithms for approximating Pareto fronts are probably genetic and evolutionary algorithms. Since they are population-based, they are well-suited to approximating a set of solutions. A comprehensive review of genetic and evolutionary multi-objective optimization algorithms is provided by Coello (2000) and Coello et al. (2002).

In the Bayesian optimization literature, algorithms for approximating Pareto fronts have been proposed by Knowles (2006); Svenson (2011); Keane (2006); Hernández-Lobato et al. (2015) and Emmerich et al. (2006), among others. Compared to genetic and evolutionary approaches, these approaches usually require fewer objective function evaluations, which makes them particularly interesting in our context.
Constraint handling

Most of the above-cited algorithms can be extended to handle constrained optimization problems, i.e. problems with at least one constraint. The most popular approach is to penalize the objective function(s) by a quantity related to the constraint violations. Lagrangian formulations, for example, fall in this category.

Among the class of local optimization algorithms that can handle constraints, let us cite, for example, the Sequential Quadratic Programming algorithm of Han (1977) and the COBYLA algorithm of Powell. As regards multi-objective optimization algorithms, a comprehensive review of constraint-handling techniques is provided by Mezura-Montes and Coello (2011). More generally, see the book of Nocedal and Wright (2006) for a review of constraint-handling approaches in single-objective optimization.

As regards constrained multi-objective optimization, most of the recent literature comes from the genetic and evolutionary communities. Popular algorithms for solving constrained multi-objective problems are the NSGA2 algorithm of Deb et al. (2002) or the SPEA2 algorithm of Zitzler et al. (2002). For more details about this class of approaches, the reader is referred to the book of Deb (2001). In the Bayesian optimization literature, constrained multi-objective optimization algorithms have been proposed by Emmerich et al. (2006); Garrido-Merchán and Hernández-Lobato (2016).
Gradient-based optimization

When gradient information is available, which happens for example when adjoint solvers are used (see e.g. Giles and Pierce (2000)), it is often advantageous to use it to guide the search for optimal solutions. In particular when the number d of variables is large, gradient information can prove invaluable to focus the search in the right direction and solve the problem using few function evaluations.

Note that in the case where gradients are not given, they can still be estimated (see e.g. Nocedal and Wright (2006)). However, gradient approximation methods usually scale unfavourably with the dimension of the problem (d evaluations are required to estimate a gradient using finite differences), which often renders them impractical when d is large (say d > 10) and the functions of the problem are expensive to evaluate.
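For illustration, here is a minimal sketch of the forward finite-difference estimator alluded to above, which indeed spends d extra evaluations of f per gradient (the test function and step size are illustrative assumptions):

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient estimate: d extra evaluations of f
    on top of the one at x, which is costly when f is expensive."""
    fx = f(x)
    g = np.empty_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

# d = 3 here, so each gradient costs 3 extra evaluations of f.
print(fd_gradient(lambda x: float(np.sum(x**2)), np.array([1.0, 2.0, 3.0])))
```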
Population-based optimization

Population-based algorithms such as random search algorithms (see e.g. Zhigljavsky (2012)), genetic and evolutionary algorithms (see e.g. Coello (2000); Coello et al. (2002)) or Estimation of Distribution algorithms (see e.g. Hauschild and Pelikan (2011)) also form an important subclass of the derivative-free optimization algorithms which has gained in popularity over the last decades.

One advantage of population-based algorithms is that they are often robust to difficult landscapes (involving for example discontinuities, irregularities or multiple modes) and high-dimensional input spaces (see, e.g., Hansen and Kern (2004)). In a sense, it can be said that they compensate for the lack of information about the gradients and structure of the functions of the problem by using statistics (see, e.g., Bäck (1996)). This usually comes at the expense of many function evaluations though.
Model-based optimization

Many derivative-free optimization algorithms are model-based, in the sense that they rely on a mathematical model to guide the search for optimal solutions. For example, it can be a statistical global approximation model, as in surrogate-based optimization algorithms (see e.g. Wang and Shan (2007); Koziel et al. (2011); Queipo et al. (2005); Booker et al. (1999); Regis (2016)).

Algorithms of this class usually require few function evaluations. However, some degree of smoothness of the functions of the problem is often necessary and they do not usually scale favourably with the dimension. As such, their use remains limited to a certain class of problems.
1.2 Background literature

1.2.1 Bayesian optimization

For this thesis, the choice was made to take a Bayesian approach to the optimization problem (1.1). Historically, this approach was introduced by Kushner (1964) and developed by Mockus (1975), Mockus et al. (1978), Archetti and Betrò (1979) and Mockus (2012). It was later made popular by Jones et al. (1998) who proposed the EGO algorithm, which is one of the most efficient existing algorithms for solving global optimization problems with a small number of function evaluations.
To present the Bayesian approach to optimization, it is useful to recall Bayes' rule, which states that given a statistical model where ξ is a quantity of interest and I represents available information about ξ, the posterior probability of ξ knowing the information I is proportional to the likelihood of the information I assuming ξ, times the prior probability that is placed on ξ:

p(ξ | I) ∝ p(I | ξ) p(ξ).   (1.2)

In a Bayesian optimization setting, it is assumed that the functions of the problem are sample paths of a vector-valued random process ξ. Then, p(ξ) represents a priori knowledge about these functions, such as regularity for example. Usually, stationary Gaussian process priors are used because of their flexibility and because they yield good results in practice (see e.g. Williams and Rasmussen (2006)). The information I is made of the past observations of ξ. In a sequential optimization procedure, assuming that a set X_n = (X_1, . . . , X_n) ∈ X^n of n observations have been made at time n, then I = I_n is the information Y_n = ξ(X_n), where Y_n = (Y_1, . . . , Y_n), with Y_i = ξ(X_i) ∈ R^{p+q}, is the vector of the observed values. Under this framework, p(ξ | I_n) is the posterior distribution of ξ, conditional on the past observations. An illustration of this approach is proposed in Figure 1.2.

In the Bayesian optimization literature, various criteria have been proposed to select the evaluation points (X_1, X_2, . . .). In this thesis, the choice was made to focus on the expected improvement (EI) sampling criterion (see, e.g., Jones et al. (1998)), but other approaches based on stepwise uncertainty reduction (see e.g. Villemonteix et al. (2009); Bect et al. (2012); Chevalier et al. (2014a); Picheny (2014b); Hennig and Schuler (2012); Hernández-Lobato et al. (2015)) constitute alternative directions that may have been taken.

Consider the global optimization setting where the objective is to find the minimum m of a function f. The performance of an optimization strategy X after n evaluations can be measured using the loss function

ε_n(X, f) = m_n − m,   (1.3)

where m_n = f(X_1) ∧ · · · ∧ f(X_n) is the best solution that has been observed after n evaluations.

Figure 1.2: Realizations of ξ under a Gaussian process prior distribution (top). Conditional realizations of ξ after n evaluations (bottom).

Using the Bayesian formalism, the improvement brought by the observation of a new point x ∈ X at time n can be measured by the reduction of the loss:

I_n(x) = ε_n(X, f) − ε_{n+1}(X, f) = m_n − m_n ∧ ξ(x) = (m_n − ξ(x))_+.   (1.4)

Note that since ξ(x) is a random variable, the improvement (1.4) is a random quantity. Then, a one-step lookahead optimal choice for the next evaluation point X_{n+1} is to take the point that maximizes the conditional expectation of the improvement I_n:

X_{n+1} = argmax_{x ∈ X} E_n(I_n(x)),   (1.5)

where E_n stands for the conditional expectation with respect to Y_n = f(X_n). In the following, we shall denote ρ_n(x) = E_n(I_n(x)), x ∈ X. See Figure 1.3 for an illustration of the operation of this optimization procedure.
The sampling criterion (1.5) is called the expected improvement. In the Bayesian optimization literature, it has been extended to constrained single-objective problems by Schonlau et al. (1998) and to multi-objective problems by Emmerich et al. (2006), among others². The state-of-the-art approach to handle constraints in Bayesian optimization consists in multiplying the expected improvement by the posterior probability of jointly satisfying the constraints, as will be discussed in more details in Section 2.2.3 of this manuscript. This approach, however, is not suitable for highly constrained problems, where finding a feasible solution is a challenge in itself.

Moreover, note that choosing X_{n+1} using (1.5) requires to solve an auxiliary optimization problem. The EI is cheap to evaluate but it is known to be highly multi-modal (see Figure 1.3), which makes solving this problem difficult in some cases. In the global optimization context, a review of approaches that have been proposed to solve this problem can be found in the PhD thesis of Benassi (2013).

In this thesis we address both difficulties. To handle highly-constrained problems, we propose an extension of the expected improvement criterion. For solving the optimization problem (1.5) we propose dedicated sequential Monte-Carlo techniques, following in this respect Benassi et al. (2012).
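To fix ideas before turning to those contributions, the following self-contained Python sketch runs a few iterations of the myopic EI strategy (1.5) on a one-dimensional toy problem. The zero-mean Gaussian process with a squared-exponential kernel and fixed hyperparameters is an illustrative modeling choice, not the implementation used in this thesis:

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(Xn, Yn, Xs, ell=0.15, s2=1.0, noise=1e-8):
    """Posterior mean/sd of a zero-mean GP with squared-exponential
    kernel at test points Xs, given observations (Xn, Yn)."""
    k = lambda a, b: s2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(Xn, Xn) + noise * np.eye(len(Xn))
    Ks = k(Xs, Xn)
    mu = Ks @ np.linalg.solve(K, Yn)
    var = s2 - np.einsum('ij,ij->i', Ks, np.linalg.solve(K, Ks.T).T)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sd, mn):
    """EI for minimization: E[(mn - xi(x))_+] under a Gaussian posterior."""
    z = (mn - mu) / sd
    return sd * norm.pdf(z) + (mn - mu) * norm.cdf(z)

f = lambda x: np.sin(10 * x) + x             # toy function to minimize
Xn = np.array([0.1, 0.5, 0.9]); Yn = f(Xn)   # initial design
grid = np.linspace(0.0, 1.0, 500)            # discrete search for argmax EI
for _ in range(10):                          # myopic strategy (1.5)
    mu, sd = gp_posterior(Xn, Yn, grid)
    xnew = grid[np.argmax(expected_improvement(mu, sd, Yn.min()))]
    Xn, Yn = np.append(Xn, xnew), np.append(Yn, f(xnew))
print(Xn[np.argmin(Yn)], Yn.min())
```

The discrete grid search used here for the inner maximization is precisely the step that the SMC techniques discussed next aim to replace.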
1.2.2 Previous work on similar topics

This thesis work is a continuation of previous work initiated by Julien Bect and Emmanuel Vazquez, supervisors of this thesis, on the coupling between Gaussian process models and sequential Monte-Carlo (SMC) techniques³.

Figure 1.3: Bayesian optimization using the EI sampling criterion, at three successive iterations. On the left column, the function to be minimized is represented as a dashed blue line, the posterior mean of ξ is shown in red and the shaded region corresponds to a 95% confidence interval of the posterior distribution. The observations are shown as black disks and the current best observed value is shown as a black dashed line. On the right column, the values of the EI function are shown as a black curve. On both columns, the location of the next evaluation point is indicated.
In the PhD thesis of Li (2012), a Bayesian approach to the estimation of small probabilities of failure is developed. The proposed approach is an adaptation of the Subset Simulation algorithm of Au and Beck (2001), an SMC algorithm for computing small probabilities of failure, to the case where the functions of the problem are expensive to evaluate, and are modeled using Gaussian processes.

In the PhD thesis of Benassi (2013), a fully Bayesian approach to global optimization is proposed and sequential Monte-Carlo techniques inspired from the Subset Simulation algorithm are used for optimizing the EI criterion⁴. In this thesis, we go a step farther and propose an extension of the approach to the case of constrained multi-objective optimization.
1.2.3 Illustration

As the name suggests, sequential Monte-Carlo techniques are sequential sampling techniques (see, e.g., Del Moral et al. (2006)). Given a sequence of distributions (π_n)_{n≥1} defined on X, they can be used to iteratively draw weighted samples (X_n)_{n≥1}, where X_n = (x_{n,i}, w_{n,i})_{1≤i≤m} ∈ X^m × [0, 1]^m is approximately distributed from π_n, i.e. the empirical distribution π̃_n = Σ_{1≤i≤m} w_{n,i} δ_{x_{n,i}} is an approximation of π_n.

In the Bayesian global optimization setting where the objective is to minimize a function f : X → R modeled by a Gaussian process ξ, Benassi et al. (2012) define the density π_n, n ≥ 1, as

π_n(x) ∝ P_n(ξ(x) ≤ m_n),   (1.6)

where P_n denotes the conditional probability knowing the past observations and m_n is the current best solution as in Section 1.2.1. In other words, π_n is chosen proportional to the posterior probability of improving upon the current best solution. Then, using SMC, a weighted sample distributed from π_n can be obtained and the resulting particles (x_{n,i})_{1≤i≤m} can be used as candidates for the optimization of the EI criterion:

X_{n+1} = argmax_{1≤i≤m} ρ_n(x_{n,i}).   (1.7)

The operation of this procedure is illustrated in Figure 1.4. Note in particular how the density of particles follows the concentration of the EI from one iteration to the other.
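A crude stand-in for this procedure can be sketched in a few lines of Python: a single importance-resampling-plus-jitter step targets a density proportional to the probability of improvement (1.6), and the next evaluation point is the EI maximizer among the particles, as in (1.7). The analytic stand-ins for the posterior mean and standard deviation are illustrative assumptions; a full SMC sampler would iterate reweighting, resampling and move steps as the posterior evolves:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Illustrative stand-ins for the GP posterior mean/sd and best value;
# in the actual algorithm these come from the fitted Gaussian process.
mu = lambda x: np.sin(10 * x) + x
sd = lambda x: 0.3 * np.ones_like(x)
m_n = -0.5

def prob_improvement(x):
    # pi_n(x) of (1.6), up to normalization: P_n(xi(x) <= m_n).
    return norm.cdf((m_n - mu(x)) / sd(x))

# One importance-resampling + jitter step over [0, 1], a crude stand-in
# for the reweight/resample/move cycle of a full SMC sampler.
m = 1000
particles = rng.uniform(0.0, 1.0, size=m)
w = prob_improvement(particles)
particles = rng.choice(particles, size=m, p=w / w.sum())              # resample
particles = np.clip(particles + 0.02 * rng.standard_normal(m), 0, 1)  # move

def expected_improvement(x):
    z = (m_n - mu(x)) / sd(x)
    return sd(x) * norm.pdf(z) + (m_n - mu(x)) * norm.cdf(z)

# (1.7): the next evaluation point is the EI maximizer among the particles.
print(particles[np.argmax(expected_improvement(particles))])
```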
³ This coupling has also been studied by Dubourg et al. (2011) in the context of reliability-based design optimization.

⁴ In the thesis work of Benassi (2013), the EI criterion that is considered is not exactly the one that is introduced in Section 1.2.1, but the ideas that are used for optimizing the criterion can be generalized to other definitions of the criterion.
Figure 1.4: Illustration of the SMC procedure for optimizing the EI criterion, at three successive iterations. The particles are shown as red dots. They are distributed from a density proportional to the probability of improvement, which is shown as a black curve in the figures of the left column. The maximizer of the EI among the particles is selected as the next evaluation point.
1.3.1 Main contributions and outline of the manuscript

The main contribution of this thesis is the proposal of an algorithm for solving constrained multi-objective optimization problems in the case where the functions of the problem are expensive to evaluate. In particular, our focus is on heavily constrained problems, i.e. problems for which finding a feasible solution is difficult in itself, and on many-objective problems.

The proposed algorithm, which we call BMOO, implements a Bayesian approach and is detailed in Chapter 2. This chapter is a reproduction of Feliot et al. (2017) with a few modifications. It is structured as follows. In Section 2.2, we recall the framework of Bayesian optimization based on the expected improvement criterion and discuss some of its extensions to constrained optimization and to multi-objective optimization. Then, we introduce a new EI formulation in Section 2.3. This new formulation is a generalization of the expected hypervolume improvement (EHVI) criterion of Emmerich et al. (2006) and is adapted to both the search of feasible solutions and to the constrained optimization of multiple objectives. For the computation and optimization of the criterion, we propose dedicated sequential Monte-Carlo algorithms. These are detailed respectively in Sections 2.4.1 and 2.4.2. They have applications outside of the framework of the BMOO algorithm and can be viewed as contributions of independent interest. Then, we present experimental results in Section 2.5. The BMOO algorithm is shown to compare favourably with state-of-the-art algorithms for solving constrained single- and multi-objective optimization problems under a limited budget of function evaluations. Conclusions and perspectives for future work are discussed in Section 2.6.
In Chapter 3, we propose improvements and extensions of the algorithm. In Sections 3.2 and 3.3, the computation and optimization of the criterion are revisited and novel sampling densities to be used in the sequential Monte-Carlo samplers are proposed. These new densities make it possible to improve the performances of the BMOO algorithm. Then, in Section 3.4, the algorithm is tested on many-objective problems with up to eight objective functions. Finally, in Section 3.5, we propose extensions of the algorithm. BMOO is extended to handle problems defined on non-hypercubic design spaces (i.e. design spaces defined by bound constraints, a cheap-to-evaluate indicator function and/or cheap-to-evaluate constraints) and to problems having hidden constraints (due to numerical simulation failures for example). Also, to take advantage of parallel computation facilities when available, a batch version of the algorithm is proposed. Finally, the new EI criterion is extended to include user preferences into the search for Pareto-optimal solutions.

In Chapter 4, we present applications of the algorithm to real-life design optimization problems. BMOO is applied to the design of a commercial aircraft environment control system (Feliot et al., 2016), to the design of an electric vehicle power-train, to the tuning of a line of sight controller and to the design of a turbo-machine fan blade.

To conclude, in Chapter 5, we make a summary of the manuscript and discuss perspectives for future work.
1.3.2 Publications and communications

2017

Journal paper

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained single- and multi-objective optimization. Journal of Global Optimization, 67(1-2):97-133, 2017.

2016

Conference paper

P. Feliot, Y. Le Guennec, J. Bect, and E. Vazquez. Design of a commercial aircraft environment control system using Bayesian optimization techniques. In Engineering Optimization (ENGOPT 2016), Iguassu, Brazil, 2016.

Communications

P. Feliot, J. Bect, and E. Vazquez. Bayesian multi-objective optimization: Application to the design of an electric vehicle powertrain. Forum incertitudes CEA/DAM. Méthodes de quantification des incertitudes, Bruyères-le-Châtel, October 2016.

P. Feliot, J. Bect, and E. Vazquez. Bayesian multi-objective optimization with constraints: Application to the design of a commercial aircraft environment control system. GdR MASCOT-NUM working meeting: Dealing with stochastics in optimization problems, Institut Henri Poincaré (IHP), Paris, May 2016.

P. Feliot, J. Bect, and E. Vazquez. BMOO: a Bayesian multi-objective optimization algorithm. MascotNum Annual Conference, Toulouse, France, March 2016.

2015

Conference paper

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained multi-objective optimization. In International Conference on Learning and Intelligent Optimization, pages 256-261.

Communications

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained multi-objective optimization of expensive-to-evaluate functions. In World Congress on Global Optimization (WCGO 2015), Gainesville (Florida), United States, 2015b.

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained multi-objective optimization. Journées annuelles du GdR MASCOT NUM, Saint-Etienne, France, April 2015. (Poster)

P. Feliot, J. Bect, and E. Vazquez. A Bayesian approach to constrained multi-objective optimization. In Sequential Monte Carlo workshop (SMC 2015), Paris, France, August 2015. (Poster)

2014

Communication

P. Feliot, J. Bect, and E. Vazquez. A Bayesian subset simulation approach to constrained global optimization of expensive-to-evaluate black-box functions. In PGMO-COPI 14, Palaiseau, France, 2014.
2 A Bayesian approach to constrained single- and multi-objective optimization
In this thesis, we address the problem of derivative-free multi-objective optimization of real-valued functions subject to multiple inequality constraints. The problem consists in finding an approximation of the set

Γ = {x ∈ X : c(x) ≤ 0 and ∄ x′ ∈ X s.t. c(x′) ≤ 0 and f(x′) ≺ f(x)},   (2.1)

where X ⊂ R^d is the search domain, c = (c_i)_{1≤i≤q} is a vector of constraint functions (c_i : X → R), c(x) ≤ 0 means that c_i(x) ≤ 0 for all 1 ≤ i ≤ q, f = (f_j)_{1≤j≤p} is a vector of objective functions to be minimized (f_j : X → R), and ≺ denotes the Pareto domination rule (see, e.g., Fonseca and Fleming, 1998). Both the objective functions f_j and the constraint functions c_i are assumed to be continuous. The search domain X is assumed to be compact; typically, X is a hyper-rectangle defined by bound constraints. Moreover, the objective and constraint functions are regarded as black boxes and, in particular, we assume that no gradient information is available. Finally, the objective and the constraint functions are assumed to be expensive to evaluate, which arises for instance when the values f(x) and c(x), for a given x ∈ X, correspond to the outputs of a computationally expensive computer program. In this setting, the emphasis is on building optimization algorithms that perform well under a very limited budget of evaluations (e.g., a few hundred evaluations).
We adopt a Bayesian approach to this optimization problem. The essence of Bayesian optimization is to choose a prior model for the expensive-to-evaluate function(s) involved in the optimization problem, usually a Gaussian process model (Santner et al., 2003; Williams and Rasmussen, 2006) for tractability, and then to select the evaluation points sequentially in order to obtain a small average error between the approximation obtained by the optimization algorithm and the optimal solution, under the selected prior. See, e.g., Kushner (1964), Mockus (1975), Mockus et al. (1978), Archetti and Betrò (1979) and Mockus (2012) for some of the earliest references in the field. Bayesian optimization research was first focused on the case of single-objective bound-constrained optimization: the Expected Improvement (EI) criterion (Mockus et al., 1978; Jones et al., 1998) has emerged in this case as one of the most popular criteria for selecting evaluation points. Later, the EI criterion has been extended to handle constraints (Schonlau et al., 1998; Sasena et al., 2002; Gramacy and Lee, 2011; Gelbart et al., 2014; Gramacy et al., 2016) and to address bound-constrained multi-objective problems (Emmerich et al., 2006; Jeong et al., 2006; Wagner et al., 2010; Svenson and Santner, 2010).
With this chapter, our contribution is twofold. The first part of the contribution is the proposition of a new sampling criterion that handles multiple objectives and non-linear constraints simultaneously. This criterion corresponds to a one-step look-ahead Bayesian strategy, using the dominated hyper-volume as a utility function (following in this respect Emmerich et al., 2006). More specifically, the dominated hyper-volume is defined using an extended domination rule, which handles objectives and constraints in a unified way (in the spirit of Fonseca and Fleming, 1998; Ray et al., 2001; Oyama et al., 2007). This new criterion is naturally adapted to the search of feasible points when none is available, and to the optimization of the objectives when at least one feasible point is known. The second part of the contribution lies in the numerical methods employed to compute and optimize the sampling criterion. Indeed, this criterion takes the form of an integral over the space of constraints and objectives, for which no analytical expression is available in the general case. Besides, it must be optimized at each iteration of the algorithm to determine the next evaluation point. In order to compute the integral, we use an algorithm similar to the subset simulation method (Au and Beck, 2001; Cérou et al., 2012), which is a well-known Sequential Monte Carlo (SMC) technique (see Del Moral et al., 2006; Liu, 2001, and references therein) from the field of structural reliability and rare event estimation. For the optimization of the criterion, we resort to an SMC method as well, following earlier work by Benassi et al. (2012) for single-objective bound-constrained problems. The resulting algorithm is called BMOO (for Bayesian multi-objective optimization).
This chapter is based on Feliot et al. (2017). Its structure is as follows. In Section 2.2, we recall the framework of Bayesian optimization based on the expected improvement sampling criterion, starting with the unconstrained single-objective setting. Section 2.3 presents our new sampling criterion for constrained multi-objective optimization. The calculation and the optimization of the criterion are discussed in Section 2.4. Section 2.5 presents experimental results. An illustration on a two-dimensional toy problem is proposed for visualization purpose. Then, the performances of the method are compared to those of reference methods on both single- and multi-objective constrained optimization problems from the literature. Finally, future work is discussed in Section 2.6.
2.2 Background literature

2.2.1 Expected Improvement

Consider the single-objective unconstrained optimization problem

x⋆ = argmin_{x ∈ X} f(x),

where f is a continuous real-valued function defined over X ⊂ R^d. Our objective is to find an approximation of x⋆ using a sequence of evaluation points X_1, X_2, . . . ∈ X. Because the choice of a new evaluation point X_{n+1} at iteration n depends on the evaluation results of f at X_1, . . . , X_n, the construction of an optimization strategy X : f ↦ (X_1, X_2, X_3, . . .) is a sequential decision problem.

The Bayesian approach to this decision problem originates from the early work of Kushner (1964) and Mockus et al. (1978). Assume that a loss function ε_n(X, f) has been chosen to measure the performance of the strategy X on f after n evaluations, for instance the classical loss function

ε_n(X, f) = m_n − m,   (2.2)

with m_n = f(X_1) ∧ · · · ∧ f(X_n) and m = min_{x ∈ X} f(x). Then, a good strategy in the Bayesian sense is a strategy that achieves a small loss on average, where the average is taken with respect to a stochastic process model ξ (defined on a probability space (Ω, A, P_0), with parameter in X) for the function f. In other words, the Bayesian approach assumes that f = ξ(ω, ·) for some ω ∈ Ω. The probability distribution of ξ represents prior knowledge about the function f before actual evaluations are performed. The reader is referred to Vazquez and Bect (2014) for a discussion of other possible loss functions in the context of Bayesian optimization.

Observing that the Bayes-optimal strategy for a budget of N evaluations is intractable for N greater than a few units, Mockus et al. (1978) proposed to use a one-step look-ahead strategy (also known as a myopic strategy). Given n < N evaluation results, the next evaluation point X_{n+1} is chosen in order to minimize the conditional expectation of the future loss ε_{n+1}(X, ξ) given available evaluation results:

X_{n+1} = argmin_{x ∈ X} E_n( ε_{n+1}(X, ξ) | X_{n+1} = x ),   (2.3)

where E_n stands for the conditional expectation with respect to X_1, ξ(X_1), . . . , X_n, ξ(X_n). Most of the work produced in the field of Bayesian optimization since then has been focusing, as the present paper will, on one-step look-ahead (or similar) strategies¹; the reader is referred to Ginsbourger and Le Riche (2010) and Benassi (2013) for discussions about two-step look-ahead strategies.

When (2.2) is used as a loss function, the right-hand side of (2.3) can be rewritten as

argmin_x E_n( ε_{n+1}(X, ξ) | X_{n+1} = x ) = argmin_x E_n( m_{n+1} | X_{n+1} = x ) = argmax_x E_n( (m_n − ξ(x))_+ ),   (2.4)

with z_+ = max(z, 0). The function

ρ_n : x ↦ E_n( (m_n − ξ(x))_+ )   (2.5)

is called the Expected Improvement (EI) criterion (Schonlau et al., 1998; Jones et al., 1998). When ξ is a Gaussian process with known mean and covariance functions, ρ_n(x) has a closed-form expression:

ρ_n(x) = γ( m_n − ξ̂_n(x), σ²_n(x) ),   (2.6)

where

γ(z, s) = √s φ(z/√s) + z Φ(z/√s) if s > 0, and γ(z, s) = max(z, 0) if s = 0,

with Φ standing for the normal cumulative distribution function, φ = Φ′ for the normal probability density function, ξ̂_n(x) = E_n(ξ(x)) for the kriging predictor at x (the posterior mean of ξ(x) after n evaluations) and σ²_n(x) for the kriging variance at x (the posterior variance of ξ(x) after n evaluations). See, e.g., the books of Stein (1999), Santner et al. (2003), and Williams and Rasmussen (2006) for more information on Gaussian process models and kriging (also known as Gaussian process interpolation).

¹ Mockus (2012, Section 2.5) heuristically introduces a modification of (2.3) to compensate for the fact that subsequent evaluation results are not taken into account in the myopic strategy, and thus enforce a more global exploration.
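For reference, the closed form (2.6) translates directly into a few lines of code. This sketch assumes that the kriging mean and variance at x are already available; the function names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def gamma(z, s):
    """gamma(z, s) of (2.6): sqrt(s)*phi(z/sqrt(s)) + z*Phi(z/sqrt(s))
    if s > 0, and max(z, 0) if s = 0."""
    if s <= 0.0:
        return max(z, 0.0)
    r = np.sqrt(s)
    return r * norm.pdf(z / r) + z * norm.cdf(z / r)

def expected_improvement(m_n, mean, var):
    """rho_n(x) = gamma(m_n - kriging_mean(x), kriging_var(x))."""
    return gamma(m_n - mean, var)

# Example: best value 0.0, predicted mean 0.2, predicted variance 0.25.
print(expected_improvement(m_n=0.0, mean=0.2, var=0.25))
```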
Finally, observe that the one-step look-ahead strategy (2.3) requires to solve an auxiliary global optimization problem on X for each new evaluation point to be selected. The objective function ρ_n is rather inexpensive to evaluate when ξ is a Gaussian process, using (2.6), but it is typically severely multi-modal. A simple method to optimize ρ_n consists in choosing a fixed finite set of points that covers X reasonably well and then performing a discrete search. Recently, sequential Monte Carlo techniques (see Del Moral et al., 2006; Liu, 2001, and references therein) have been shown to be a valuable tool for this task (Benassi et al., 2012). A review of other approaches is provided in the PhD thesis of Benassi (2013, Section 4.2).
2.2.2 EI-based multi-objective optimization without constraints

We now turn to the case of unconstrained multi-objective optimization. Under this framework, we consider a set of objective functions f_j : X → R, j = 1, . . . , p, to be minimized, and the objective is to build an approximation of the Pareto front and of the set of corresponding solutions

Γ = {x ∈ X : ∄ x′ ∈ X such that f(x′) ≺ f(x)},   (2.7)

where ≺ stands for the Pareto domination rule defined by

y = (y_1, . . . , y_p) ≺ z = (z_1, . . . , z_p)  ⟺  ∀i ≤ p, y_i ≤ z_i, and ∃j ≤ p, y_j < z_j.   (2.8)

Given evaluation results f(X_1) = (f_1(X_1), . . . , f_p(X_1)), . . . , f(X_n) = (f_1(X_n), . . . , f_p(X_n)), define

H_n = {y ∈ B; ∃i ≤ n, f(X_i) ≺ y},   (2.9)

where B ⊂ R^p is a set of the form B = {y ∈ R^p; y ≤ y_upp} for some y_upp ∈ R^p, which is introduced to ensure that the volume of H_n is finite. H_n is the subset of B whose points are dominated by the evaluations.

A natural idea, to extend the EI sampling criterion (2.5) to the multi-objective case, is to use the volume of the non-dominated region as loss function:

ε_n(X, f) = |H \ H_n|,

where H = {y ∈ B; ∃x ∈ X, f(x) ≺ y} and |·| denotes the usual (Lebesgue) volume in R^p. The improvement yielded by a new evaluation result f(X_{n+1}) = (f_1(X_{n+1}), . . . , f_p(X_{n+1})) is then the increase of the volume of the dominated region (see Figure 2.1), I_n(X_{n+1}) = |H_{n+1}| − |H_n|, since H_n ⊂ H_{n+1} ⊂ H. Given a vector-valued Gaussian random process model ξ = (ξ_1, . . . , ξ_p) of f = (f_1, . . . , f_p), defined on a probability space (Ω, A, P_0), a multi-objective EI criterion can then be derived as

ρ_n(x) = E_n(I_n(x)) = E_n( ∫_{B\H_n} 1_{ξ(x)≺y} dy ) = ∫_{B\H_n} E_n( 1_{ξ(x)≺y} ) dy = ∫_{B\H_n} P_n(ξ(x) ≺ y) dy,   (2.11)

where P_n stands for the probability P_0 conditioned on X_1, ξ(X_1), . . . , X_n, ξ(X_n). The multi-objective sampling criterion (2.11), also called Expected Hyper-Volume Improvement (EHVI), has been proposed by Emmerich and coworkers (Emmerich, 2005; Emmerich et al., 2006; Emmerich and Klinkenberg, 2008).

Remark 1 A variety of alternative approaches have been proposed to extend the EI criterion to the multi-objective case, which can be roughly classified into aggregation-based techniques (Knowles, 2006; Knowles and Hughes, 2005; Zhang et al., 2010) and domination-based techniques (see e.g. Jeong and Obayashi, 2005; Keane, 2006; Ponweiser et al., 2008; Bautista, 2009; Svenson and Santner, 2010; Wagner et al., 2010). We consider these approaches as heuristic extensions of the EI criterion, in the sense that none of them emerges from a proper Bayesian formulation (i.e., a myopic strategy associated to some well-identified loss function). A detailed description of these approaches is out of the scope of this thesis. The reader is referred to Wagner et al. (2010), Couckuyt et al. (2014) and Horn et al. (2015) for some comparisons and discussions. See also Picheny (2014b) and Hernández-Lobato et al. (2015) for other approaches not directly related to the concept of expected improvement.

Remark 2 The multi-objective sampling criterion (2.11) reduces to the usual EI criterion (2.5) in the single-objective case (assuming that f(X_i) ≤ y_upp for at least one i ≤ n).

Under the assumption that the components ξ_i of ξ are mutually independent², P_n(ξ(x) ≺ y) can be expressed in closed form: for all x ∈ X and y ∈ B \ H_n,

P_n(ξ(x) ≺ y) = ∏_{i=1}^{p} Φ( (y_i − ξ̂_{i,n}(x)) / σ_{i,n}(x) ),   (2.12)

where ξ̂_{i,n}(x) and σ²_{i,n}(x) denote respectively the kriging predictor and the kriging variance at x for the i-th component of ξ.

² This is the most common modeling assumption in the Bayesian optimization literature, when several objective functions, and possibly also several constraint functions, have to be dealt with. See the VIPER algorithm

The integration of (2.12) over B \ H_n, in the expression (2.11) of the multi-objective EI criterion, is a non-trivial problem. Several authors (Emmerich and Klinkenberg, 2008; Bader and Zitzler, 2011; Hupkens et al., 2014; Couckuyt et al., 2014) have proposed decomposition methods to carry out this computation, where the integration domain B \ H_n is partitioned into hyper-rectangles, over which the integral can be computed analytically. The computational complexity of these methods, however, increases exponentially with the number of objectives³, which makes the approach impractical in problems with more than a few objective functions. The method proposed in this work also encounters this type of integration problem, but takes a different route to solve it (using SMC techniques; see Section 2.4). Our approach will make it possible to deal with more objective functions.

Remark 3 Exact and approximate implementations of the EHVI criterion are available, together with other Gaussian-process-based criteria for bound-constrained multi-objective optimization, in the Matlab/Octave toolbox STK (Bect et al., 2016b) and in the R packages GPareto (Binois and Picheny, 2015) and mlrMBO (Horn et al., 2015). Note that several approaches discussed in Remark 1 maintain an affordable computational cost when the number of objectives grows, and therefore constitute possible alternatives to the SMC technique proposed in this paper for many-objective box-constrained problems.
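To make the quantities at play concrete, the integral (2.11) can also be approximated by plain Monte Carlo when p is small, using the product form (2.12). The sketch below is illustrative only: it bounds B from below by an arbitrary y_low so that uniform sampling is possible, which the definition of B does not require, and the SMC techniques of Section 2.4 are the efficient alternative for larger p:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def ehvi_mc(mu, sigma, front, y_low, y_upp, n_mc=100_000):
    """Monte Carlo estimate of (2.11): integrate P_n(xi(x) prec y)
    over B \\ H_n, with B truncated to the box [y_low, y_upp].

    mu, sigma: posterior means/sds of the p objectives at x (cf. (2.12)).
    front: (k, p) array of observed objective vectors f(X_i).
    """
    p = len(mu)
    y = rng.uniform(y_low, y_upp, size=(n_mc, p))
    # Discard points of B dominated by an observation (i.e. keep B \ H_n).
    dominated = np.zeros(n_mc, dtype=bool)
    for fi in front:
        dominated |= np.all(fi <= y, axis=1)
    y = y[~dominated]
    # P_n(xi(x) prec y) = prod_i Phi((y_i - mu_i) / sigma_i), eq. (2.12).
    probs = np.prod(norm.cdf((y - mu) / sigma), axis=1)
    vol_B = np.prod(np.asarray(y_upp, float) - np.asarray(y_low, float))
    return vol_B * probs.sum() / n_mc

# Toy usage: p = 2 objectives, three observed points on a front.
front = np.array([[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]])
print(ehvi_mc(mu=np.array([0.3, 0.3]), sigma=np.array([0.1, 0.1]),
              front=front, y_low=[0.0, 0.0], y_upp=[1.0, 1.0]))
```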
2.2.3 EI-based optimization with constraints

In this section, we discuss extensions of the expected improvement criterion for single- and multi-objective constrained optimization.

Consider first the case of problems with a single objective and several constraints:

min_{x ∈ X} f(x) subject to c(x) ≤ 0,   (2.13)

where c = (c_1, . . . , c_q) is a vector of continuous constraints. The set C = {x ∈ X; c(x) ≤ 0} is called the feasible domain. If it is assumed that at least one evaluation has been made in C, it is natural to define a notion of improvement with respect to the best observed objective value m_n = min{f(x); x ∈ {X_1, . . . , X_n} ∩ C}:

I_n(X_{n+1}) = m_n − m_{n+1} = 1_{c(X_{n+1}) ≤ 0} · (m_n − f(X_{n+1}))_+,
that is, I_n(X_{n+1}) = m_n − f(X_{n+1}) if X_{n+1} ∈ C and f(X_{n+1}) < m_n, and 0 otherwise.   (2.14)

In other words, a new observation makes an improvement if it is feasible and improves upon the best past value (Schonlau et al., 1998). The corresponding expected improvement criterion is

³ See, e.g., Beume (2009), Hupkens et al. (2014), Couckuyt et al. (2014) and references therein for decomposition methods.
ρ_n(x) = E_n( 1_{ξ_c(x) ≤ 0} · (m_n − ξ_o(x))_+ ).   (2.15)

If f is modeled by a random process ξ_o and c is modeled by a vector-valued random process ξ_c = (ξ_{c,1}, . . . , ξ_{c,q}) independent of ξ_o, then the sampling criterion (2.15) simplifies to Schonlau et al.'s criterion:

ρ_n(x) = P_n(ξ_c(x) ≤ 0) · E_n( (m_n − ξ_o(x))_+ ).   (2.16)

In other words, the expected improvement is equal in this case to the product of the unconstrained expected improvement, with respect to m_n, with the probability of feasibility. The sampling criterion (2.16) is extensively discussed, and compared with other Gaussian-process-based constraint handling methods, in the PhD thesis of Sasena (2002). More generally, sampling criteria for constrained optimization problems have been reviewed by Parr et al. (2012) and Gelbart (2015).
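For concreteness, Schonlau et al.'s criterion (2.16) combines two closed-form factors: the unconstrained EI and, under independent Gaussian constraint models, the probability of feasibility (cf. (2.20) below). A minimal sketch, assuming the kriging means and standard deviations at x are given (names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def ei(m_n, mu, sigma):
    """Unconstrained expected improvement E_n (m_n - xi_o(x))_+ ."""
    z = (m_n - mu) / sigma
    return sigma * norm.pdf(z) + (m_n - mu) * norm.cdf(z)

def prob_feasible(mu_c, sigma_c):
    """P_n(xi_c(x) <= 0) for independent Gaussian constraint models."""
    return np.prod(norm.cdf(-np.asarray(mu_c) / np.asarray(sigma_c)))

def constrained_ei(m_n, mu_o, sigma_o, mu_c, sigma_c):
    """Criterion (2.16): probability of feasibility times the
    unconstrained EI w.r.t. the best feasible value m_n."""
    return prob_feasible(mu_c, sigma_c) * ei(m_n, mu_o, sigma_o)

# Toy usage: one objective model, two constraint models.
print(constrained_ei(m_n=1.0, mu_o=0.8, sigma_o=0.2,
                     mu_c=[-0.1, 0.3], sigma_c=[0.2, 0.2]))
```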
In the general case of constrained multi-objective problems, the aim is to build an approximation of Γ defined by (2.1). If it is assumed that an observation has been made in the feasible set C, a reasoning similar to that used in the single-objective case can be made to formulate an extension of the EI (2.11):

ρ_n(x) = E_n( |H_{n+1}| − |H_n| ),   (2.17)

where

H_n = {y ∈ B; ∃i ≤ n, X_i ∈ C and f(X_i) ≺ y}   (2.18)

is the subset of B, defined as in Section 2.2.2, whose points are dominated by feasible evaluations. When ξ_o and ξ_c are assumed independent, (2.17) boils down to the product of a modified EHVI criterion, where only feasible points are considered⁴, and the probability of feasibility, as suggested by Emmerich et al. (2006) and Shimoyama et al. (2013b):

ρ_n(x) = P_n(ξ_c(x) ≤ 0) ∫_{B\H_n} P_n(ξ_o(x) ≺ y) dy.   (2.19)
Observe that the sampling criterion (2.17) is the one-step look-ahead criterion associated to the loss function ε_n(X, f) = −|H_n|, where H_n is defined by (2.18). This loss function remains constant as long as no feasible point has been found and, therefore, is not an appropriate measure of loss for heavily constrained problems where finding feasible points is sometimes the main difficulty⁵. From a practical point of view, not all unfeasible points should be considered equivalent: a point that does not satisfy a constraint by a small amount has probably more value than one that does not satisfy the constraint by a large amount, and should therefore be preferred. A new criterion is proposed in Section 2.3 for constrained problems, relying on a new loss function that encodes this preference among unfeasible solutions.

⁴ Note that this modified EHVI criterion remains well defined even when H_n = ∅, owing to the introduction of an upper bound y_upp in the definition of B. Its single-objective counterpart introduced earlier (see Equation (2.15)), however, was only well defined under the assumption that at least one feasible point is known. Introducing an upper bound y_upp is of course also possible in the single-objective case.

⁵ The same remark holds for the variant (see, e.g., Gelbart et al., 2014) which consists in using the probability of feasibility as a sampling criterion when no feasible point is available. This is indeed equivalent to using the loss function ε_n(X, f) = −1_{∃i≤n, X_i ∈ C}.
Remark 4 Other Gaussian-process-based approaches that can be used to handle constraints include the method by Gramacy et al. (2016), based on the augmented Lagrangian approach of Conn et al. (1991), and several recent methods (Picheny, 2014a; Gelbart, 2015; Hernández-Lobato et al., 2015, 2016a) based on stepwise uncertainty reduction strategies (see, e.g., Villemonteix et al., 2009; Bect et al., 2012; Chevalier et al., 2014a, for more information on this topic).
Remark 5 The term E_n((m_n − ξ_o(x))_+) in (2.16) can be computed analytically as in Section 2.2.1, and the computation of the integral in (2.19) has been discussed in Section 2.2.2. If it is further assumed that the components of ξ_c are Gaussian and independent, then the probability of feasibility can be written as

P_n(ξ_c(x) ≤ 0) = ∏_{j=1}^{q} Φ( −ξ̂_{c,j,n}(x) / σ_{c,j,n}(x) ),   (2.20)

where ξ̂_{c,j,n}(x) and σ²_{c,j,n}(x) stand respectively for the kriging predictor and the kriging variance of ξ_{c,j} at x.

2.3 An EI criterion for constrained multi-objective optimization

2.3.1 Extended domination rule

In a constrained multi-objective optimization setting, we propose to handle the constraints using an extended Pareto domination rule that takes both objectives and constraints into account, in the spirit of Fonseca and Fleming (1998), Ray et al. (2001) and Oyama et al. (2007). For ease of presentation, denote by Y_o = R^p and Y_c = R^q the objective and constraint spaces respectively, and let Y = Y_o × Y_c.

We shall say that y_1 ∈ Y dominates y_2 ∈ Y, which will be written as y_1 ⊳ y_2, if ψ(y_1) ≺ ψ(y_2), where ≺ is the usual Pareto domination rule recalled in Section 2.2.2 and, denoting by R̄ the extended real line,

ψ : Y_o × Y_c → R̄^p × R̄^q,  (y_o, y_c) ↦ (y_o, 0) if y_c ≤ 0, and (+∞, max(y_c, 0)) otherwise.   (2.21)

The extended domination rule (2.21) has the following properties:

(i) For unconstrained problems (
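To make the extended domination rule (2.21) concrete, here is a minimal Python sketch of the map ψ and of the induced relation ⊳; the test values are illustrative assumptions:

```python
import numpy as np

def psi(yo, yc):
    """The map psi of (2.21): feasible points keep their objective values
    and get a zero constraint block; unfeasible points get +inf objectives
    and their positive constraint violations max(yc, 0)."""
    yo, yc = np.asarray(yo, float), np.asarray(yc, float)
    if np.all(yc <= 0):
        return np.concatenate([yo, np.zeros_like(yc)])
    return np.concatenate([np.full_like(yo, np.inf), np.maximum(yc, 0.0)])

def pareto_dominates(a, b):
    """Usual Pareto rule (2.8): component-wise <=, strict in one."""
    return np.all(a <= b) and np.any(a < b)

def extended_dominates(y1, y2):
    """y1 (objectives, constraints) dominates y2 iff psi(y1) prec psi(y2)."""
    return pareto_dominates(psi(*y1), psi(*y2))

# A mildly unfeasible point dominates a strongly unfeasible one ...
print(extended_dominates(([0.0], [0.1]), ([0.0], [0.5])))   # True
# ... and any feasible point dominates any unfeasible one.
print(extended_dominates(([3.0], [-1.0]), ([0.0], [0.1])))  # True
```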