HAL Id: hal-00723427
https://hal.archives-ouvertes.fr/hal-00723427v2
Submitted on 17 Sep 2012
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Parallel Budgeted Optimization Applied to the Design
of an Air Duct
Ramunas Girdziusas, Rodolphe Le Riche, Fabien Viale, David Ginsbourger
To cite this version:
Ramunas Girdziusas, Rodolphe Le Riche, Fabien Viale, David Ginsbourger. Parallel Budgeted Optimization Applied to the Design of an Air Duct. 2012. hal-00723427v2
Parallel Budgeted Optimization Applied to the Design of an Air Duct

ANR OMD2 Project Deliverable No. WP3.2.1

Ramunas Girdziusas (1), Rodolphe Le Riche (1,2), Fabien Viale (3), David Ginsbourger (4)

(1) DEMO, Centre Fayol, EMSE, 158 cours Fauriel, Saint-Étienne, France
(2) CNRS UMR 6158
(3) INRIA, 2004 rue des Lucioles, BP 93, 06902 Sophia Antipolis, France
(4) IMSV, University of Bern, Alpeneggstrasse 22, CH-3012 Bern, Switzerland

girdziusas[at]emse.fr, leriche[at]emse.fr, fabien.viale[at]inria.fr, david.ginsbourger[at]stat.unibe.ch

Abstract
This work explores the benefits of cloud computing in the development of kriging-based parallel optimization algorithms dedicated to expensive-to-evaluate functions. We first show how the application of a multi-point expected improvement criterion allows one to gain insights into the problem of shape optimization in a turbulent fluid flow, which arises in the automobile industry. Our work then proceeds with a variety of experiments conducted on the ProActive PACA Grid cloud. Due to a multiplicative increase in search space dimensionality, the multi-point criterion cannot exploit a large number of computing nodes. Therefore, we employ the criterion with an asynchronous access to the simulation resources, whereby the available nodes are immediately updated while accounting for the remaining running simulations. Comparisons are made with domain decomposition, which is applied here as an alternative parallelization technique. Our experiments indicate weaknesses in the use of the multi-point criterion with a synchronous node access, and benefits when working in the asynchronous mode. Finally, a relatively fast and accurate
Contents

1 Introduction
1.1 Expected Improvement
1.2 Early Ideas of Parallelization
1.3 Dynamic Parallelization
1.4 Our Preferences
1.5 Structure of the Report
2 Shape Optimization
2.1 Expensive-to-Evaluate Function
2.2 Algorithm
2.3 Results
2.4 Analysis of Results
2.5 Conclusions
3 Experiments with Synchronous Node Access
3.1 Introduction
3.2 Algorithms
3.2.1 Multi-Point Improvements
3.2.2 Domain Decomposition
3.3 Performance Criteria
3.4 Results
3.5 Conclusions
4 Experiments with Asynchronous Node Access
4.1 Asynchronous Model
4.2 Computational Analysis of Wall Clock Time
4.3 Testing Asynchronous Algorithms
4.4 Conclusions
5 Integral of the Expected Improvement at Multiple Points
5.1 Introduction
5.2 Integrand
5.3 New Methods for Adaptive Integration
5.4 Results
6 Conclusions
A Estimation of Wall Clock Time
B Moments of the Censored Normal Variable
1 Introduction

We shall study the optimization of expensive-to-evaluate functions (budgeted optimization) with a particular application to the design of the shape of an air duct. The latter demands time-consuming numerical simulations of a turbulent fluid flow. Our aim is to implement and parallelize the algorithms known as Bayesian optimization [11], and in particular, the expected improvement (EI) algorithm [28, 29]. More specifically, our work relies on a multi-point EI criterion studied in [35, 21, 18], and the goal is to test the algorithms with synchronous and asynchronous node access.
1.1 Expected Improvement

The sequential algorithms that we aim to parallelize were first developed independently by J. Mockus and H. Kushner in the early 1960s [28, 23]. Both authors considered Gaussian process models for an expensive-to-evaluate function and suggested the maximization of auxiliary quantities for the generation of new candidate locations. H. Kushner advocated maximization of the probability of an improvement (PI); J. Mockus studied both the probability and the expectation of an improvement.
The third prominent direction of budgeted optimization utilizes the upper confidence bound (UCB) of an improvement [14]. Recent publications have abbreviated the algorithm as GP-UCB and supplied it with a wealth of analyses about Gaussian processes in the setting of the so-called multi-armed bandit problem [5, 38, 6]. The key difference from the previous algorithms is that the balance between exploitation (sampling at the regions with a low predictive mean) and exploration (sampling where the predictive variance is high) changes as the optimization proceeds in time. In addition, the focus here is on sharper bounds of the so-called cumulative regret function, which can be a temporal integral of the absolute differences between the ideal sought cost function value and the value obtained at a particular time. The aim is to minimize or bound the cumulative regret by temporally changing the deviation weight in the UCB expressions.
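To illustrate the time-varying deviation weight, the following is a minimal sketch of a UCB-type acquisition for minimization; the particular logarithmic schedule for the weight is one common choice and is an assumption made here for illustration, not the exact expression of [5, 38, 6]:

```python
import math

def gp_ucb_weight(t, delta=0.1):
    """One common beta_t schedule: grows like O(log t), so the balance
    shifts from exploitation toward exploration as time t increases."""
    return 2.0 * math.log((t ** 2) * (math.pi ** 2) / (6.0 * delta))

def ucb_acquisition(mu, sigma, t, delta=0.1):
    """Index of the grid point optimizing the confidence bound.

    For minimization the relevant bound is the *lower* one,
    mu - sqrt(beta_t) * sigma: low predictive mean (exploitation)
    and high predictive deviation (exploration) both help.
    """
    beta = gp_ucb_weight(t, delta)
    scores = [m - math.sqrt(beta) * s for m, s in zip(mu, sigma)]
    return min(range(len(scores)), key=scores.__getitem__)
```

With a fixed predictive mean and variance, a larger `t` makes the high-variance points relatively more attractive, which is the temporal change of the deviation weight described above.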
A recent survey of the use of the three criteria can be found in [11], where they are also referred to as acquisition functions. Preferences over them remain rather subjective, and we shall focus on the expectation-based algorithms because they have fewer parameters to adjust.
The authors of [33] emphasize the lack of convergence proofs related to the EI algorithms. This problem has been investigated more thoroughly only recently [40, 12, 42]. There exists a proof for two continuity classes of objective functions, albeit for algorithms that use Gaussian processes with fixed covariance function parameters [40]. In this regard, it could be worth citing the following text [12]:

"...For practitioners, however, these results are somewhat misleading. In typical applications, the prior is not held fixed, but depends on parameters estimated sequentially from the data. This process ensures the choice of observations is invariant under translation and scaling of f, and is believed to be more efficient (Jones et al., 1998, 2). It has a profound effect on convergence, however: Locatelli (1997, 3.2) shows that, for a Brownian motion prior with estimated parameters, expected improvement may not converge at all."
It is possible to develop better parameter estimators [12], but our algorithms in general do not re-estimate the covariance function, as its parameters can easily be fixed before each optimization, cf. Eq. (9). Another example of a simple rule of thumb for setting up the covariance function parameters before the optimization can be found in [6]. The ability to use fixed parameters is hardly a practical limitation of the EI algorithms.
A more relevant problem is that the so-called NEB assumption stated in [40] does not provide convergence results for the Gaussian processes whose covariance function is the Gaussian kernel. Recently, it has been established in [42] that there exists a class of univariate analytic (infinitely differentiable) objective functions which cannot be optimized with the EI algorithm that relies on the Gaussian kernel. One should bear in mind, however, that "realistic optimization budgets may be too low in many problems for the indicated asymptotic behavior to be relevant" [42].

The mismatch between the theory and practice is also evident as often the smoothness class of an objective function is neither known nor even relevant. In addition, hardly any existing algorithm can be implemented so that the global maximum of an acquisition function is always reached. This displaces the actual programs
1.2 Early Ideas of Parallelization

Once cloud computing became widespread, it was realized that most of the algorithms are sequential, and their parallelization demands separate research. A parallel EI algorithm [37] may utilize a gradient-based maximization of the single-point EI criterion, applied with multiple starting points. Parallelization can thus be achieved by enriching the standard EI algorithm with local maxima of an acquisition function.
Another early practical attempt to parallelize relevant algorithms is reported in [33]. Instead of the improvement-based criteria, the authors utilize a variety of other "acquisition functions" and compare their algorithms with the one developed in [37]. Notably, parallelization is achieved by using multiple reference cost function values f_min in the EI-related criteria. The generation is created by adding one point at a time, and each point is obtained by maximizing the EI criterion with different reference values. Uniqueness of candidate points is achieved by imposing distance constraints.
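For concreteness, the one-point EI evaluated against several reference values admits a short sketch; the closed-form EI for a Gaussian prediction is standard, while the particular predictive mean, deviation, and reference levels below are made up for illustration:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_improvement(mean, std, f_ref):
    """One-point EI for minimization, E[max(0, f_ref - Y)], where the
    kriging prediction is Y ~ N(mean, std^2) and f_ref is a reference
    cost value (usually the current minimum f_min)."""
    if std <= 0.0:
        return max(0.0, f_ref - mean)
    z = (f_ref - mean) / std
    return (f_ref - mean) * norm_cdf(z) + std * norm_pdf(z)

# In the spirit of [33]: querying one predictive model with several
# reference levels at or below f_min yields different exploitation /
# exploration trade-offs, one candidate point per level.
f_min = 1.0
scores = [expected_improvement(mean=0.8, std=0.3, f_ref=r)
          for r in (f_min, 0.9 * f_min, 0.5 * f_min)]
```

Lower reference values demand a larger improvement before the criterion rewards a point, which pushes the corresponding candidates toward more speculative regions.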
Considering the parallelization performed in [33, 34], one can draw a useful warning that the speed-ups over sequential algorithms can be quite small. For example, with four computing nodes, the speed-ups are generally less than four, and for the modified Rosenbrock and Ackley functions, each with five variables, the reported speed-ups are 1.83 and 1.44, respectively. Our results will indicate a problem where speed-ups can be lower. This difficulty could be avoided by designing algorithms which can leverage a larger number of computing nodes. However, one should note that various stochastic sampling methods have already been studied with large generation sizes, and the speed-up values have often turned out to be bounded by O(1) [39].

1.3 Dynamic Parallelization
Many existing parallelization ideas somewhat blindly generate multiple candidate points at a time by capitalizing on the fact that budgeted optimization algorithms have a lot of free parameters. Dynamic parallelization tries to predict the outcome of a sequential algorithm without the use of expensive function evaluations. It may also switch off parallelization at the times when the prediction is not possible, and thus adaptively request additional evaluations of an expensive function.
Most of the presently known dynamic parallelization algorithms, see e.g. [10, 15], rely on a heuristic sequential technique first introduced by M. Schonlau [35]. The core insight utilizes the fact that the variance of any Gaussian process conditioned on the observations does not depend on the actual observation values, but only on their spatial locations. This property can be exploited to create a batch (generation) of distinct candidate locations sequentially while bypassing their expensive evaluation, thus, allegedly, speeding up the optimization. The candidate points are generated one at a time by maximizing an acquisition function and updating the predictive variance (and possibly, but not necessarily, the predictive mean).
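This variance-only batch construction can be sketched with a tiny one-dimensional Gaussian-process posterior; the Gaussian kernel, its length-scale, and the candidate grid are arbitrary choices made here for illustration:

```python
import math

def k_gauss(a, b, length=0.3):
    # Gaussian (squared-exponential) kernel with unit prior variance.
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def posterior_var(x, design, noise=1e-9):
    """Conditional variance k(x,x) - k*^T K^{-1} k*: note that no observed
    value appears anywhere, only the point locations in `design`."""
    K = [[k_gauss(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(design)] for i, a in enumerate(design)]
    ks = [k_gauss(x, a) for a in design]
    w = solve(K, ks)
    return max(0.0, 1.0 - sum(wi * ki for wi, ki in zip(w, ks)))

def build_batch(design, grid, lam):
    """Grow a batch of lam distinct points without any new evaluations:
    each accepted point is 'believed' into the design, which collapses the
    variance around it and pushes the next maximizer elsewhere."""
    batch, pseudo = [], list(design)
    for _ in range(lam):
        x = max(grid, key=lambda g: posterior_var(g, pseudo))
        batch.append(x)
        pseudo.append(x)
    return batch
```

Here pure variance maximization stands in for the acquisition function; with an EI-type acquisition the predictive mean would also be carried along, which is the "possibly, but not necessarily" in the text above.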
This technique is applied in [6, 8], where the generation of new locations is built in the sequential manner described above, by maximizing the one-point EI criterion at a time, and simply replacing the corresponding expensive function values with ones sampled from its posterior density conditioned on the current design of experiments (DOE). After obtaining a sample of candidate points, clustering is then performed to decrease the redundancy and size of the generation. The clustering criterion is simply the sum of weighted Euclidean distances between the generation points and the cluster centers. The weights are probabilities that a certain cluster point provides a better cost function value than the rest of the cluster centers. There are no known explicit expressions for such probabilities even in the case of normal variables, and thus the assumption of independence is made and the standard formulae of the Gaussian order statistics are employed. The experiments have been performed with generation sizes fixed to 5 and 10.
In their more recent research [10], the authors drop the clustering-based method entirely, and they build the generation directly (without any postprocessing) by maximizing the one-point EI criterion in the spirit of their previous method. However, the generation size is made adaptive, and it increases only if the bound on the deviation of the predictive mean from its true value (that would, in theory, be obtained with a sequential one-point EI algorithm) does not exceed a specified threshold. A newly added location in the generation must be associated not with an arbitrary cost function value (mean, random sample from the posterior), but with its globally optimal value, which is assumed to be known. Often, this is indeed the case when only the globally
A parallelization very similar in spirit, albeit of the GP-UCB algorithm, called GP-BUCB, is presented in [15]. One difference is that the process mean function is employed to model the expensive-to-evaluate values during the construction of the generation, but the mean values of the generation points may not even be updated. Instead, the UCB deviation weight is adjusted when building the generation, whose size also changes dynamically. The latter is controlled by an available cumulative regret bound. The authors of [15] also suggest replacing the exact variance updates with certain bounds in order to speed up the creation of a new generation of candidate locations. This trick is also employed in [10], but the latter work uses different bounds. An interesting byproduct of both of these methods is that they provide indicators of when an expensive function evaluation should be performed, and when it is good enough to use the regression model to generate a new candidate location.
However, in addition to the difficulties of setting up the newly introduced threshold parameters, the problem with these methods is that they cannot effectively exploit all the available computational nodes, as the size of the generation is determined algorithmically and changes with time, while parallel resources are often fixed and limited. Another drawback is that the sizes can be nonuniform, which may yield suboptimal total optimization times.
The latter aspect is addressed in [7, 9]. The authors assume that there exists a specific distribution for the duration of an expensive evaluation, and that the total optimization time is limited by a fixed known value. The total number of function evaluations is also fixed, and so is the maximal size of the generation of candidate locations. Assuming this information exists, the authors develop a general model which aims to distribute generation sizes and determine the corresponding durations for their parallel evaluations. They introduce the so-called CPE criterion, which is a cumulative temporal sum of the number of jobs completed at a time. Its maximization is shown to prefer uniform schedules (distributions of the generation sizes) and can thus be used to limit the parallelization so that the algorithm utilizes more expensive function values and is still able to meet a specified time horizon.
One difficulty with this general setting is that parallel execution times are stochastic (and often the exact distributions are unknown or changing), but the model imposes an upper limit on the duration of the evaluation of the generation. Thus, the evaluation may actually fail to complete, and the authors further address this difficulty by introducing the notion of a probabilistic safety of an algorithm. Therefore, the aim is to maximize the probability of a safe completion, which is not guaranteed to be unity.
1.4 Our Preferen es
Instead of applying sequential heuristi te hniques dis ussed above, we shall dire tly maximize the
multi-pointEI riterion,whi hseemstohavebeenintrodu edbyM.S honlau,see5.3in[35℄,andwhosepra ti al
relevan e hasbeenjustied onlyre ently,see e.g.[18,21℄. Ithasbeendemonstratedthat amulti-point EI
will be large where, simultaneously, the orresponding one-point EI values are large, and the generation
pointsarenot orrelated. Thus,themulti-pointEI riteriongivespreferen etodistin tmultiple andidate
pointsautomati ally,withoutanyadditionalparameters,heuristi distan e onstraints,oradditivepenalty
fun tions.
The criterion demands fewer adjustable parameters, but its maximization is only possible when the generation sizes λ are small, typically O(1). It should be understood that a small value of λ does not limit the parallelization. In particular, we shall advocate an asynchronous node access where one first submits a large number of expensive function evaluations to the cloud, and then updates only λ nodes at a time (the algorithm remains parallel even when λ = 1).

The multi-point EI criterion has already been applied to select the parameters of various statistical models in order to further increase their performance on some known machine learning benchmarks [36]. We shall report deviations in the optimization evolutions w.r.t. the initial DOE, which turn out to be higher than the error bars that can be seen in [36]. This indicates that certain parameters, such as an initial DOE, can affect the outcome of the optimization results more than a better regression model. High performance variability
how to compute the gradient of this acquisition function analytically. This is a research direction which could be very important for the asynchronous node access, where the time it takes to generate and communicate new points (blocking time) should be minimal. Maximization of the multi-point EI criterion is also a computational bottleneck during the testing of any of the relevant algorithms, and a faster maximization would provide an appreciable aid here. However, one should bear in mind that the multi-point EI criterion is multimodal, and there is no easy way to reach its global maximum with local optimization techniques.
One could emphasize that the framework introduced in [7, 9] is a very general formalization of a budgeted optimization problem. Our asynchronous optimization study, presented in Section 4, corresponds to a particular case which the authors call the Online Fastest Completion Policy (OFCP). This policy is just a strategy to calculate and evaluate λ new candidate locations immediately as λ computational nodes become available. Their main critique, and quite a profound insight, is that "it does not use the full time horizon, even when doing so would allow for much less concurrency" [7]. The works in [7, 9] introduce a new perspective to Bayesian optimization because they explicitly quantify and minimize the actual optimization time instead of relying on a prevalent statement that Bayesian optimization is "known to be efficient".
We do not necessarily advocate the use of this policy over others, and our results, provided in Section 4, could be seen as further analysis and numerical evidence that better characterizes this policy. However, the OFCP policy is a natural choice when the overall time horizon is not given, or when the exact timing characteristics of the expensive evaluations are not known (but we shall provide analysis for the case when such information is available). The OFCP policy simply works with an assumption of a fixed number of total function evaluations, it maximizes the node occupation time, it does not need any sophisticated scheduling, and there is obviously no need to consider a probabilistic safety in this case. For the sake of simplicity, we shall bypass the decision-theoretic vocabulary and, instead of the OFCP policy, shall frequently employ the less formal description of an asynchronous node access.
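The asynchronous node access described above can be sketched with a thread pool standing in for the cloud nodes; the cost function, its random delays, and the random proposal step are all placeholders (a real implementation would maximize the multi-point EI in `propose` and submit jobs through the cloud middleware):

```python
import concurrent.futures as cf
import random
import time

def expensive(x):
    """Stand-in for a time-consuming simulation (hypothetical cost)."""
    time.sleep(random.uniform(0.005, 0.02))
    return x, (x - 0.3) ** 2

def propose(history):
    # Placeholder for maximizing the multi-point EI conditioned on both
    # the completed evaluations and the still-running points.
    return random.random()

def asynchronous_optimize(n_nodes=4, budget=12):
    history = []   # completed (x, f) pairs
    with cf.ThreadPoolExecutor(max_workers=n_nodes) as pool:
        running = {pool.submit(expensive, propose(history))
                   for _ in range(n_nodes)}
        submitted = n_nodes
        while running:
            # lambda = 1: refill each node the moment it frees up,
            # without waiting for the rest of the generation.
            done, running = cf.wait(running, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                history.append(fut.result())
                if submitted < budget:
                    running.add(pool.submit(expensive, propose(history)))
                    submitted += 1
    return min(history, key=lambda xf: xf[1])
```

The loop keeps every node busy until the evaluation budget is exhausted, which is exactly the node-occupation property of the OFCP policy mentioned above.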
1.5 Structure of the Report

The report first provides the results of the application of a synchronous four-point EI algorithm to the industrial problem of shape design, which are summarized in Section 2. The optimization operates in such a way that one first submits four points for evaluation, and then waits until all of them are completed. The regression model is then updated, the multi-point EI criterion is maximized, and the process is continued until the budget of expensive evaluations is exceeded. The evaluation of a cost function takes about twenty minutes. We improve a recently reported result in [31], and provide insights into the physical and statistical aspects of the problem.
Section 3 states performance results of various parallelization techniques dedicated to a synchronous node access. Our results indicate that a simple strategy such as domain decomposition is competitive with more advanced methods, but there are problems where none of the methods is suitable for parallel optimization and a single-point EI algorithm performs equally well. One should note that the tests are structured in such a way that parallel algorithms are executed on a single machine, and independent simulations pertaining to different initial DOEs are then sent to the cloud to assess how an initial DOE affects the results. The reason for this particular way of utilizing the cloud is that the timing characteristics of the parallel algorithms can be rather obvious, and in the testing phase the cost functions are not expensive to evaluate.
Optimization with an asynchronous node access is discussed in Section 4. We state a particular model for the execution time of an expensive-to-evaluate function, simulate the asynchronous point generation scenarios based on the proposed timing model, and test the performance of the multi-point algorithms by submitting independent optimizations, each with a different initial DOE, to the cloud. Here the focus is on the average time for a new generation to actually be sent to the cloud, which will be referred to as a wall clock time. A wall clock time depends not only on the time it takes to maximize the improvement and to communicate the results to the remote nodes, but also on when and where a particular node becomes available while other nodes are active with the evaluation. This is one difference with the previous work on the EI criterion.

We have observed in our numerical experiments that the integral has a peculiar property: its upper bound lies extremely close to the actual value, especially (but not necessarily) when the examined expected improvements are further away from the locations where they are maximal. In essence, we choose to work within the framework of systematic sampling [16] (as opposed to importance sampling) and show that one can considerably improve symmetric monomial rules (unscented transforms) by replacing monomials with one-point improvements. However, one must also mention that standard Monte Carlo sampling proves to be a very reliable integration technique, especially at the locations where the expected improvements are maximal.
As will be seen in the results provided in this report, a significant benefit of using a computing cloud is that it allows large-scale testing of the algorithms with different parameter settings. For example, parameters such as an initial DOE greatly affect the optimization results and are very hard to "integrate out". The ability to utilize cloud resources allows one to actually send replicas of the original simulation with parameter changes and then see the effects. This is very hard to achieve when running things locally on a single computer (in a serial manner) because a budgeted optimization even of inexpensive-to-evaluate functions is itself a very time-consuming process. In our work, a single cost function evaluation in the rank-one matrix approximation problem may take microseconds to evaluate, but a single complete optimization may easily reach ten hours (when the CPU rate is 2.5 GHz). Our ability to run the codes on the ProActive PACA Grid cloud [3] allows
2 Shape Optimization

2.1 Expensive-to-Evaluate Function

Our goal is to optimize the geometry of a cooling duct, which has already been studied in [31]. The criterion is the normalized pressure difference of the flow at the inlet and outlet of a duct, which is indicated in Fig. 1a. The optimization parameters are shown in Figs. 1b-d.

It will suffice to emphasize that the criterion is a positive quantity whose computation is a demanding numerical solution of the k-ε model of a fluid flow. The flow is linear, viscous (ν = 1.6 · 10⁻⁴ m²/s), incompressible, and turbulent (Re = 4000). The k-ε model is a mixed system of nonlinear partial differential and algebraic equations [17]:

\[
\frac{\partial \bar{u}_i}{\partial t} + \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j}
= \frac{\partial}{\partial x_j}\left[(\nu + \nu_T)\left(\frac{\partial \bar{u}_j}{\partial x_i} + \frac{\partial \bar{u}_i}{\partial x_j}\right)\right]
- \frac{1}{\rho}\frac{\partial}{\partial x_i}\left(\bar{p} + \tfrac{2}{3}\rho k\right), \tag{1}
\]
\[
\frac{\partial \bar{u}_j}{\partial x_j} = 0, \tag{2}
\]
\[
\frac{\partial k}{\partial t} + \bar{u}_j \frac{\partial k}{\partial x_j}
= \frac{\partial}{\partial x_j}\left[\left(\nu + \frac{\nu_T}{\sigma_k}\right)\frac{\partial k}{\partial x_j}\right]
+ \nu_T \left(\frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i}\right)\frac{\partial \bar{u}_i}{\partial x_j} - \epsilon, \tag{3}
\]
\[
\frac{\partial \epsilon}{\partial t} + \bar{u}_j \frac{\partial \epsilon}{\partial x_j}
= \frac{\partial}{\partial x_j}\left[\frac{\nu_T}{\sigma_\epsilon}\frac{\partial \epsilon}{\partial x_j}\right]
+ C_{\epsilon 1}\frac{\epsilon}{k}\,\nu_T \left(\frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i}\right)\frac{\partial \bar{u}_i}{\partial x_j}
- C_{\epsilon 2}\frac{\epsilon^2}{k}, \tag{4}
\]
\[
\nu_T = C_\mu \frac{k^2}{\epsilon}. \tag{5}
\]

It describes the time averages of the pressure field p and the flow velocity field u:

\[
\bar{p} \equiv \lim_{T \to \infty} \frac{1}{T}\int_0^T p(x, t)\,dt, \qquad
\bar{u}_i \equiv \lim_{T \to \infty} \frac{1}{T}\int_0^T u_i(x, t)\,dt. \tag{6}
\]

The auxiliary fields k, ε, and ν_T are the turbulent kinetic energy k, the spatial dissipation rate of k, called ε, and the turbulent viscosity ν_T, respectively. One should notice that the kinematic viscosity ν is a constant, while ν_T is a field.

The initial and boundary conditions are indicated in Table 1. The implementation uses the open source library called OpenFOAM [2]. The wall functions "k_w", "ε_w", and "ν_T w" are the OpenFOAM functions "kqRWallFunction", "epsilonWallFunction", and "nutWallFunction", respectively. The latter two override their default parameter values with C_μ = 0.09, κ = 0.41, E = 9.8. The initial values of the quantities computed by the wall "functions" correspond to the initial values of the fields shown in the last column of Table 1.

In addition to OpenFOAM, the complete software stack of this fluid dynamics simulation includes CATIA [1] (a 3D model of a duct), STAR-CCM+ [4] (computational mesh generation), and ParaView [20] (visualization).
2.2 Algorithm

It is not transparent how the pressure difference depends on the parameters which specify the geometry of a duct. Various admissible changes of the geometry are not visually discernible, and the model is a massive nonlinear dynamical system. This motivates the application of a budgeted optimization. This type of optimization estimates the kriging model of an expensive-to-evaluate function, and generates new candidate locations by maximizing the multi-point expected improvement. In particular, given μ active points x_{1:μ} and λ free nodes, the algorithm finds λ new points by solving the following problem:

\[
\max_{x \in \mathbb{R}^{d\lambda}} \; \mathbb{E}\left[\max\left(0, \min\left(f_{\min}, Y(x_{1:\mu})\right) - \min Y(x)\right) \,\middle|\, A\right] \tag{7}
\]
Table 1: Initial and boundary conditions for key quantities of the k-ε model.

Name       Field                  Units   Inlet   Outlet         Wall       Initial conditions
p̃ = p̄/ρ   Normalized pressure    m²/s²   ∇p̃ = 0  p̃ = 0          ∇p̃ = 0     0
u          Flow velocity          m/s     −n      0 if u·n ≤ 0   0          0
k          Turb. kin. energy      m²/s²   10⁻³    ∇k = 0         "k_w"      10⁻³
ε          Dissipation rate of k  m²/s³   10⁻¹    ∇ε = 0         "ε_w"      10⁻¹
ν_T        Turbulent viscosity    m²/s    0       ∇ν_T = 0       "ν_T w"    0

where f_min is the current minimum, Y(x_{1:μ}) = (Y(x_1), ..., Y(x_μ)) and Y(x) = (Y(x_{μ+1}), ..., Y(x_{μ+λ})) are random surrogates (kriging model), and A denotes the event where the Y values equal all the known expensive-to-evaluate function values at all the known locations. Methods to compute the expectation in Eq. (7) are discussed in Section 5.
Considering the use of kriging in the optimization, one may refer to [18, 21] for more details. In addition, we have applied a few changes to what seems to be the standard practice. They are not conceptually interesting, but are worth mentioning:

1. The expected improvement is maximized by using the CMA-ES algorithm [19, 39]. Box constraints are handled by projecting the coordinates onto the bounds and adding a penalty term to the expected improvement. The penalty is proportional to the Euclidean distance from the optimization point to the boundary if the point is out of bounds, and is zero otherwise.

2. Conditional expectations are calculated by using the pseudoinverse of the DOE covariance matrix. This method overestimates the conditional variances, but it does not demand any additional parameters, and it also reduces to the standard inverse in the absence of singularities.

3. When the conditional covariance matrix of the kriging responses is singular, the value of the expected improvement is simply set to zero. Here by "singularity" is meant anything that breaks the Cholesky decomposition. The latter is placed inside the "try block" of the "try and catch" exception handling.

4. Multi-point expected improvements are calculated by using Monte Carlo sampling with one thousand points. This standard method is simple, computationally inexpensive, and reliable w.r.t. increasing dimensions of an integration domain. The seed of the random generator is set to the current generation number, so that the integration routine uses the same random points when evaluating the expected improvement at different spatial locations.

5. Kriging is applied with Gaussian kernels whose variances vary with each coordinate. The variances are determined by squaring the median of the absolute deviations from the median of a particular coordinate. This is simpler and faster than any iterative estimation and, more importantly, it guarantees that the appearance of close points in the DOE does not change the kernel variances unexpectedly.
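The Monte Carlo estimator in item 4 can be sketched as follows; the kriging conditional mean vector and covariance matrix are assumed given, and the small Cholesky routine is included only to keep the sketch self-contained:

```python
import math
import random

def cholesky(C):
    """Lower-triangular Cholesky factor of a symmetric PD matrix."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def multipoint_ei(f_min, mean, cov, n_samples=1000, seed=0):
    """Monte Carlo estimate of E[max(0, f_min - min_i Y_i)] for
    Y ~ N(mean, cov).  Fixing the seed, as in item 4, reuses the same
    normal draws at every candidate location, so the estimate varies
    smoothly when the candidate points move."""
    rng = random.Random(seed)
    L = cholesky(cov)
    q = len(mean)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        y = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
             for i in range(q)]
        total += max(0.0, f_min - min(y))
    return total / n_samples
```

As a sanity check, two strongly correlated candidate points yield a smaller multi-point EI than two independent ones with the same marginals, which is the preference for distinct points discussed in Section 1.4.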
We shall apply what is known as the synchronous multi-point algorithm [18] with λ = 4 points, which is briefly abbreviated as EI0,4. The choice of generating four points at a time demands an optimization with 8 × 4 = 32 variables. Asking for more points at a time, or using DOEs with more than O(10³) points, would introduce severe numerical difficulties.
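The box-constraint handling of item 1 above admits a compact sketch; the acquisition callable `ei` and the penalty weight are placeholders introduced here for illustration:

```python
import math

def project(x, lower, upper):
    """Componentwise projection of x onto the feasible box."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def penalized_ei(x, lower, upper, ei, weight=1.0):
    """Acquisition value used inside CMA-ES: evaluate `ei` at the
    projected point and subtract a penalty proportional to the Euclidean
    distance from x to the box (zero when x is already feasible)."""
    xp = project(x, lower, upper)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xp)))
    return ei(xp) - weight * dist
```

Projection keeps the acquisition well defined for any point CMA-ES samples, while the penalty still steers the search distribution back toward the feasible box.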
2.3 Results
The minimization of the pressure difference is shown in Fig. 2. The first 320 observations are generated via the LHS algorithm, up to observation number 320. The optimization then proceeds via a synchronous generation of four candidate points. They are obtained by maximizing the expected improvement with the CMA-ES algorithm [19], which uses its default parameters, except that the initial coordinate deviation is chosen to be 0.1 and the number of iterations is set to 3000.

Optimization results are presented in Table 2. One can see the bounds of the variables, the worst observed point which gives the maximal pressure difference value 1.28, the previously available best result [31], along with our improvement. The presence of "nan" values indicates that the pressure values are not available at the points whose coordinates are all simultaneously equal to either the lower or the upper bound. The geometry cannot be meshed in these two extreme cases.

Table 2: Lower and upper bounds (LOBS, UPBS) of the variables, the worst observed point, the previous best result [31], and our result. The last row is the pressure difference pd.

        LOBS     UPBS     Worst     [31]          Our result
x_1     0.0036   0.0166   0.0036    0.0149        0.0132
x_2     0.3      0.8      0.3760    0.4202        0.4756
x_3     0.0027   0.0207   0.0207    0.0102        0.0207
x_4     0.0405   0.0595   0.0595    0.0479        0.0450
x_5     1.25     1.5707   1.2525    1.5582        1.5707
x_6     0.21     0.42     0.2254    0.3849        0.3914
x_7     0.047    0.055    0.055     0.0541        0.0547
x_8     0.0008   0.0088   0.00081   0.0014        0.0016
pd      nan      nan      1.28      0.59 ± 0.01   0.56 ± 0.01
2.4 Analysis of Results

The optimization results can also be highlighted by comparing the optimal fields with the worst observed cases. The worst observed geometry is shown in Fig. 3. It only serves the purpose of displaying the slicing plane on which the field values will be displayed, and of setting up the range for the pressure values, which is [0, 4]. The surface of the duct is colored with the ParaView [20] scheme "hsv-blue-to-red", whose range of colors is also displayed in the color bar.

The values of the pressure field on the surface and its slice are shown in Fig. 4. Both shapes are hard to discern visually, but the differences can still be noticed without any special tools. In the optimized case, the pressure values are smaller on the walls at the inflection of the duct.

The components of the velocity field are shown in Fig. 5. For comparison purposes, the ranges of the field values are kept the same in both the worst and optimal cases, and the color space is the one used with the pressure fields, cf. Fig. 3. The ranges for the x-, y-, and z-components are [−0.3, 1], [−0.4, 1.4], and [−1.6, 0.2], respectively. One can see that the velocity field of a flow in the optimized case is generally smoother, and effectively uses a larger volume of a duct.

The optimized geometries are very hard to discern visually, and the pressure fields are nearly optically identical, which is also accompanied by rather small differences in the numerical values of the pressure fields. However, the results are not identical, and the differences become most pronounced when looking at the velocity fields shown in Fig. 6. One can see that our result is slightly smoother, which can be seen in the upper left areas of the slices in the x- and z-components (a, d, c, f), and at the inflection point of a duct in the case of the y-component (b, e).

In order to see if our result differs from the one in [31] statistically, we have performed a principal component analysis on the data correlation (not covariance) matrix. The data is the matrix of size 8 × 788 whose columns are the candidate locations generated during the optimization (the data correlation matrix is of size 8 × 8). The results are shown in Fig. 7. They indicate the projections of the data vectors on the chosen eigenvectors of the correlation matrix. In addition to the data, several important locations are indicated with different markers: the present optimal solution (Opt), the lower and upper bounds (LB, UP, resp.), the previous result [31] (PrevBest), the average value of the bounds (Midpoint), and the worst observed point during the optimization (Worst).

Table 3: Coordinates of the eigenvectors v_1, ..., v_8 of the 8 × 8 data correlation matrix.

Coord.   v_1        v_2        v_3        v_4        v_5        v_6        v_7        v_8
1        0.47061    −0.01957   0.00218    −0.34455   −0.09434   −0.65004   −0.26550   −0.39683
2        −0.24119   −0.39696   0.20201    −0.27114   −0.80995   0.01527    0.04344    0.10855
3        −0.50275   −0.03497   0.19678    0.10206    0.19948    −0.48401   −0.50937   0.40419
4        0.04089    0.33272    0.87867    0.14932    −0.04204   0.13262    −0.04856   −0.26749
5        −0.49912   0.22259    −0.30076   −0.06057   −0.11561   0.22181    −0.39944   −0.62056
6        −0.33944   0.20969    0.11621    −0.74648   0.29979    −0.07762   0.41716    0.01265
7        −0.16022   0.56533    −0.18227   0.33491    −0.36915   −0.44528   0.41711    0.02842
8        −0.27554   −0.56302   0.10589    0.31937    0.23239    −0.26807   0.39781    −0.45799

As the concentration of variance in the first principal components is not very pronounced, one finds that the data does not live in a subspace of R^8 and all the coordinates are valuable. Therefore, the parameterization of the problem is not redundant. However, the dimension of the problem could have been reduced down to R^5 because the second, fifth, and eighth principal components do not discriminate the optimal location from the middle point or the worst point.
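The correlation-based PCA used here can be sketched in a few lines; power iteration recovers only the leading eigenvector, which suffices to illustrate the projections of Fig. 7, and the strongly correlated toy data below is made up for illustration:

```python
import math
import random

def correlation_matrix(data):
    """data: list of samples (rows) of d coordinates.  Standardizing each
    coordinate first is what makes this the correlation, not covariance,
    matrix, so coordinates with very different scales are comparable."""
    n, d = len(data), len(data[0])
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n)
            for c, m in zip(cols, means)]
    z = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in data]
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / n
             for b in range(d)] for a in range(d)]

def leading_eigenvector(C, iters=200):
    """Power iteration: repeated multiplication converges to the
    eigenvector of the largest eigenvalue (the first principal direction)."""
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy usage: two strongly correlated coordinates give a leading
# direction close to (1, 1) / sqrt(2).
rng = random.Random(1)
data = []
for _ in range(400):
    t = rng.gauss(0.0, 1.0)
    data.append([t, t + 0.1 * rng.gauss(0.0, 1.0)])
v1 = leading_eigenvector(correlation_matrix(data))
```

Projecting each sample on the leading eigenvectors obtained this way is exactly the kind of plot shown in Fig. 7.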
When compared to the previously available result [31], our solution is situated further away from the worst-case scenario when looking at things along the principal directions 1, 6, and 7, but is closer to it in the direction 8. Interestingly, in the four-dimensional subspace spanned by the eigenvectors 2, 3, 4, and 5, the result in [31] is almost identical to ours.

The first principal component allows one to separate the optimized points from the initial DOE. It turns out that the third coordinate of the first eigenvector has the largest magnitude, which, incidentally, is the only coordinate which makes our solution significantly different from the previous result (in our case x_3 is roughly doubled). For the sake of completeness, the coordinates of all of the eigenvectors are shown in Table 3.

2.5 Conclusions
When a vast majority of admissible fluid domains are optically indistinguishable, the optimization of a geometry can be hard to perform manually. Kriging-based optimization proves to be handy when making progress with a small budget of cost function evaluations, which is typically less than O(10^3). We have made an improvement to the previous solution obtained in [31] and have identified its relation to our result. Interestingly, the previous optimization is almost identical to ours in the subspace of R^8 spanned by four eigenvectors of the correlation matrix of all the points gathered during the search. The principal component analysis suggests that x_3 is an important parameter, and the intrinsic dimension of the problem, i.e. the number of independent parameters which could differentiate the optimal geometry from the suboptimal one, is arguably five.
Figure 1: The optimization criterion is the difference between the average (normalized) pressure field at the inlet

Figure 2: Normalized pressure difference [m²/s²] w.r.t. the increasing number of observations during the optimization (horizontal axis: observations 0–787; the plot also shows the moving minimum). The first 320 observations are generated via the LHS algorithm.
Figure 5: Velocity fields: x-components (a,d), y-components (b,e), and z-components (c,f). The first column corresponds to the worst observed scenario; the second column shows the optimized fields.

Figure 6: The components of the velocity field of a flow: x-direction (a,d), y-direction (b,e), and z-direction (c,f). The first column corresponds to the result in [31]; the second column is our result, which is the replica
[Figure 7: projections of the data onto pairs of principal components: pc1 (26%) vs. pc2 (18%), pc3 (12%) vs. pc4 (11%), pc5 (10%) vs. pc6 (9%), and pc7 (7%) vs. pc8 (6%). Markers: DOE, New, Opt, LB, UB, PrevBest, Midpoint, Worst.]
3.1 Introduction

Section 2 has focused on the application of a particular kriging-based optimization algorithm to the industrial problem. On average, it takes twenty minutes to evaluate a cost function in such a problem. A single optimization then demands days to complete. Considering the down-times of the cloud, a single optimization may demand weeks to complete.

Thus, one may ask whether our results rightfully reflect what can be achieved with the whole family of multi-point improvement-based algorithms described in [21]. One should note that so far we have applied only one such algorithm, which generates λ = 4 points at a time, synchronously. It was applied once, and only with a single cost function.

We shall report our tests with artificial functions, which will further indicate some limitations and unexplored possibilities of the kriging-based algorithms. In this section, we will focus on the asynchronous node access and will try to measure whether multi-point improvements help. The algorithms will be tested along with the strategy of domain decomposition.
3.2 Algorithms

3.2.1 Multi-Point Improvements

The use of multi-point improvements [21, 18] is a theoretically appealing direct extension of the kriging algorithms with one-point improvements. The problem with this approach is that it does not scale well, as the maximization of the expected λ-point improvement demands an optimization in λ × d dimensions. In addition, λ cannot be very large in principle, because the minimum over an increasing number of random variables is pushed down independently of the demands of a problem, and thus the expected improvements become severely overestimated. They are typically overestimated anyway, but one suspects that when the generation sizes are not big, such as λ = 4, the algorithm can be implemented correctly and one may achieve a faster optimization.

How fast can an optimization be? Let us introduce the quantity called the wall clock time (WCT), which is the average time between two consecutive updates of the nodes in the cloud. It determines the rate at which the points are sent to (received from) a remote cloud. Figure 8 presents a timing analysis of the synchronous optimization with multi-point improvements.
[Figure 8 timing diagrams: synchronous mode with λ = 1 (Node 1, t_u = 3) and synchronous mode with λ = 4 (Nodes 1–4, t_u = 6).]

Figure 8: In the synchronous case, it is the slowest node that determines the node update time. The time costs of updating λ > 1 points will typically be greater than in the single-point case, unless every one out of the λ computational nodes is faster than the one applied in the optimization with one-point improvements. This example shows the case when t_b is one time unit, independently of the algorithm. The wall clock time increases twice when λ changes from 1 to 4.
hangesfrom1to4.to-evaluate fun tion,and thelowlevelspans thetime when thenode is idle. One ansee that theuse of
multiplepointsin reasesthewall lo ktime(WCT),andthelatterwillbesolelydeterminedbytheslowest
omputationalnode.
Let us introduce the blocking time t_b, which is the time it takes to: (i) receive λ function evaluations, (ii) generate λ candidate points, and (iii) send them to the λ free nodes. One can then perform a more precise analysis by assuming that the time it takes to evaluate an expensive function is uniformly distributed. The node update time will then be a random variable defined as

    T_u = t_b + max{T_1, T_2, ..., T_λ},   T_i ∼ U(t_min, t_max).   (8)
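As a quick sanity check on (8), the expectation E(T_u) can be estimated by Monte Carlo. The sketch below is ours; the parameter values (t_min = 10, t_max = 30, t_b = 2) are the ones used throughout the report:

```python
import random

def mean_update_time(lam, t_min=10.0, t_max=30.0, t_b=2.0, n=200_000, seed=0):
    """Monte Carlo estimate of E(T_u) = E(t_b + max of lam draws from U(t_min, t_max))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += t_b + max(rng.uniform(t_min, t_max) for _ in range(lam))
    return total / n

# Closed form for comparison: E(T_u) = t_b + t_min + (t_max - t_min) * lam / (lam + 1)
```

For λ = 1 this gives approximately 22, and for λ = 4 approximately 28, matching the worked example below.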
For example, let t_min = 10, t_max = 30, and t_b = 2 time units. Then WCT ≡ E(T_u) = 22 when λ = 1, but it increases up to WCT = 28 when λ = 4. Therefore, the synchronous multi-point optimization algorithm needs to approach the optimum at least 28/22 ≈ 1.27 times as fast in order to save time.

3.2.2 Domain Decomposition
This is one of the simplest strategies to employ when making any optimization algorithm parallel. The domain is divided into s parts (subdomains) and the optimization is performed in each subdomain independently, preferably in parallel. In what follows, we shall perform the optimization in d = 2, 6, and 9 dimensions, and the number of subdomains will be 32. In the case of two dimensions, we divide the first coordinate into eight equal parts, and the second into four. In the case of a larger number of dimensions, we simply halve the first five coordinates and obtain in this way 2^5 = 32 subdomains.
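The splitting rule for d ≥ 5 can be sketched as follows (a hypothetical helper of ours, not the report's code; the d = 2 case with its 8 × 4 split is not covered):

```python
from itertools import product

def halve_first_five(lower, upper):
    """Split the box [lower, upper] into 2**5 subdomains by halving
    the first five coordinates; the remaining coordinates are untouched."""
    k = min(5, len(lower))
    subdomains = []
    for bits in product((0, 1), repeat=k):
        lo, up = list(lower), list(upper)
        for i, b in enumerate(bits):
            mid = 0.5 * (lower[i] + upper[i])
            if b == 0:
                up[i] = mid          # keep the lower half along coordinate i
            else:
                lo[i] = mid          # keep the upper half along coordinate i
        subdomains.append((lo, up))
    return subdomains

subs = halve_first_five([0.0] * 6, [5.0] * 6)   # e.g. the "rosenbrock6d" box [0, 5]^6
```

Each of the 32 boxes can then be handed to an independent optimization run.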
It is important to emphasize that domain decomposition is a strategy: it can be applied to make any algorithm parallel. We shall use it with both one-point improvements and multi-point improvements. The interesting question is whether the use of domain decomposition with one-point improvements can be as good as the use of multi-point improvements alone. In that case, one would definitely prefer the former, as it achieves a perfect isolation between the parallel flows of the program, whereas the multi-point algorithm is more demanding regarding its implementation.
3.3 Performance Criteria

All of the optimization algorithms are made deterministic in order to remove unnecessary degrees of variance. Firstly, we switch off the maximum likelihood estimation of the Gaussian kernel variances in the kriging. Instead of estimating them, the following simple rule is applied:

    kernelvariance_i = ( |upb_i − lob_i| / (1 + 2^(8/d)) )^2,   i = 1, ..., d.   (9)

Here d is the number of dimensions of the optimization space. The main idea behind this formula is that we shall typically generate 500 points during the entire optimization (including the points of the initial DOE). This is a realistic budget for an expensive-to-evaluate function on the one hand, and the limit after which working with dense matrices becomes very inefficient (at best). Thus, in all of the simulations, on average, the number of observations used in the kriging is 250. We then "round" this number up to 2^8 = 256, and then 2^(8/d) becomes the number of "ticks" that can be placed on each coordinate axis when assuming that the points are distributed uniformly in space. The addition of unity is somewhat arbitrary and not really crucial, but it serves one purpose: when d = 8, the variance becomes equal to a squared "median of the median of absolute differences coordinate-wise".

In addition, the Monte Carlo (MC) integration of the expected improvement is always initialized to the current generation number. Thus, the only "degree of freedom" is the initial DOE, and each family of the algorithms can now be tested with a number of optimizations. Each optimization will then correspond to a different initial DOE. This number will be set to one hundred, but it may actually become smaller for some criteria. The details are given in Table 4.
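The variance rule (9) is straightforward to transcribe; a sketch (variable names `lower`/`upper` are ours, standing for the bounds lob_i and upb_i):

```python
def kernel_variances(lower, upper):
    """Fixed Gaussian kernel variances per Eq. (9): one value per coordinate,
    derived from the box bounds and the dimension d."""
    d = len(lower)
    ticks = 1.0 + 2.0 ** (8.0 / d)    # 1 + number of "ticks" per axis for ~256 points
    return [(abs(up - lo) / ticks) ** 2 for lo, up in zip(lower, upper)]
```

For d = 8 on a unit box the denominator is 1 + 2 = 3, so every variance equals (1/3)^2.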
Table 4: Optimization Criteria

Label             Cost function                                       Domain      Minimal value   Modality
"michalewicz2d"   Σ_{i=1}^{2} sin(x_i) sin^2(i x_i^2 / π)             [0, 5]^2    −1.841          multimodal
"rosenbrock6d"    Σ_{i=1}^{5} 100 (x_{i+1} − x_i^2)^2 + (1 − x_i)^2   [0, 5]^6    0               unimodal
"rank1approx9d"   ‖A_{4×5} − x_{4×1} y_{1×5}‖^2,  a_ij ∼ U(0, 1)¹     [−1, 1]^9   0.7119          bimodal
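For reference, the first two cost functions can be transcribed directly from the table (our transcription; the rank-one approximation criterion additionally needs the randomly generated matrix A described in footnote 1 and is omitted):

```python
import math

def michalewicz2d(x):
    """Sum_{i=1..2} sin(x_i) * sin(i * x_i**2 / pi)**2 on [0, 5]^2."""
    return sum(math.sin(xi) * math.sin((i + 1) * xi * xi / math.pi) ** 2
               for i, xi in enumerate(x))

def rosenbrock6d(x):
    """Sum_{i=1..5} 100 * (x_{i+1} - x_i**2)**2 + (1 - x_i)**2 on [0, 5]^6."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))
```

The Rosenbrock minimum from Table 4 is attained at x = (1, ..., 1), which lies inside the box [0, 5]^6.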
These functions are simple to state and to implement. They are also fast to evaluate. The latter feature still does not make testing on a single machine easy, as a kriging-based optimization may take hours even when applied to create only one hundred generations. However, the use of the ProActive PACA Grid cloud [3] provides the possibility to test the algorithms with different initial conditions at once.
The optimization quality will be assessed by using the normalized real improvement (NRI), defined as

    NRI(generation) = (f_0 − f_min(generation)) / (f_0 − f_true).   (10)

Here f_0 is the smallest value of the cost function achieved on the initial DOE, which is created by using Latin Hypercube Sampling (LHS), f_min denotes the value achieved after a particular generation of points is evaluated, and f_true is the true ideal minimal value, which is given in Table 4.
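Eq. (10) applied along a whole run can be sketched as follows (a trivial transcription of ours; `n_doe` marks how many of the recorded cost values belong to the initial DOE):

```python
def nri_path(observations, n_doe, f_true):
    """Normalized real improvement, Eq. (10), after each post-DOE evaluation.
    `observations`: cost values in evaluation order; the first `n_doe` of them
    form the initial DOE, whose best value defines f_0."""
    f0 = min(observations[:n_doe])
    path, f_min = [], f0
    for f in observations[n_doe:]:
        f_min = min(f_min, f)                 # running best value so far
        path.append((f0 - f_min) / (f0 - f_true))
    return path
```

By construction the path starts at 0 (no improvement over the DOE) and reaches 1 exactly when f_true is attained.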
Also, it is useful to summarize the performance of the various algorithms by defining their speed-up, such as

    S_0(NRI) ≡ (time to reach NRI by EI_{0,1}) / (time to reach NRI by EI_{0,λ}).   (11)

Here the reference algorithm is kriging with one-point improvements, and the speed-up is defined for the kriging-based optimization with λ ≥ 1 points.
points.Inorder to takeinto a ount the blo king time, onedenes thereal-time speed-up of the multi-point
algorithmoveritssinglepoint ounterparta ordingto
S
1
(
NRI) =
S
0
(
NRI)
RTF= S
0
×
WCTforthealgorithmEI
0,1
WCTforthealgorithmEI
0,λ
.
(12)HereRTFisareal timefa tor whi histheratioofthe orrespondingwall lo ktimes. The orresponding
riteriaforthedomain de ompositionaredened similarly. TheWCTvaluesofallthealgorithms thatare
testedwiththesyn hronousnodea essaregiveninTable5.
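Under this reading of (11)–(12), the real-time speed-up is the generation-wise speed-up divided by the real time factor. A sketch, using values from Table 5 as an illustration:

```python
def real_time_speedup(s0, wct_ref, wct_alg):
    """S_1 = S_0 / RTF, with RTF = wct_alg / wct_ref as in Eq. (12)."""
    rtf = wct_alg / wct_ref
    return s0 / rtf

# e.g. EI_{0,4} on "michalewicz2d": S_0 = 3.7, WCTs 22 vs. 28, hence S_1 close to 2.9
```

This reproduces, up to rounding, the S_1 columns of Table 5.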
3.4 Results

The optimization results are shown in Figs. 9 and 10. One can see that the optimization paths vary a lot w.r.t. the initial DOE, but this effect is less pronounced in the problem "rank1approx9d". In a space with a large number of dimensions it is harder to generate an initial DOE which contains points close to the global optimum. The problem "rosenbrock6d" seems to be easy, and its solution is closer to the "michalewicz2d" problem than to the "rank1approx9d" case. In the former two cases the approach to the optimum is much faster.
The values of the speed-up S_0 are compared in Table 5. One can see that parallelization brings notable improvements when solving the problems "michalewicz2d" and "rosenbrock6d", but the gain is very small for the problem "rank1approx9d". The latter point becomes especially strong if we consider the speed-up S_1, which takes into account the real time factor.

¹ The actual matrix is generated with the Scilab 5.3.3 "grand" function. The Mersenne Twister is applied with an initial seed set to the number 29.
Table 5: Wall clock times, real time factors, and speed-ups of synchronous optimization, NRI = 0.8. Parameters: t_min = 10, t_max = 30, t_b = 2.

                      WCT   RTF    "michalewicz2d"   "rosenbrock6d"   "rank1approx9d"
                                    S_0     S_1       S_0     S_1      S_0     S_1
EI_{0,1}               22    1      1       1         1       1        1       1
EI_{0,4}               28    1.3    3.7     2.9       2.7     2.1      1.3     1.0
EI_{0,1} + decomp.     30    1.4    2.4     1.8       2.1     1.5      0.70    0.50
EI_{0,4} + decomp.     30    1.4    4.6     3.4       4.5     3.2      1.2     0.86
One finds out that domain decomposition is about as good as the use of multi-point improvements. Both parallel optimization methods can also be combined to yield an even greater performance. However, none of the parallelization methods are worth the effort on the "rank1approx9d" problem. Considering the domain decomposition, perhaps this is not very hard to explain: in a high-dimensional space, i.e., d = 9, halving the first five coordinates can make the kriging algorithms less explorative (global).

A good use of domain decomposition seems to be a quick assessment of the multimodality of the cost function. Figure 11 shows one out of the one hundred optimizations in full detail. One can see that some of the optimizations reach very high NRI values, indicating that the corresponding subdomain may contain the global optimum.
3.5 Conclusions

The use of multi-point improvements (λ = 4) brings notable speed-ups to the problems "michalewicz2d" and "rosenbrock6d". However, the algorithm is not as efficient as the baseline EI_{0,1} method in the case of "rank1approx9d". This is most likely due to the increased dimensionality of the problem, although additional tests would be necessary to study whether this could also be an effect of the functional landscape. The same applies to the domain decomposition. Considering the "michalewicz2d" and "rosenbrock6d" problems, running the EI_{0,1} algorithm with 32 subdomains is better than using EI_{0,4} without any domain decomposition. Both methods have been combined to gain an additive effect on the overall speed-up. However, neither domain decomposition nor multi-point improvements provide an advantage over a single-point EI in the "rank1approx9d" problem.

The algorithms with multi-point improvements fundamentally cannot scale well because they internally involve a maximization of the joint improvement in d × λ dimensions. In order to make further progress, it seems that one could: (i) either increase the λ value dramatically (to tens and hundreds of points) by making substantial sacrifices in the quality of the improvement maximization, or (ii) resort to the asynchronous node access, which may reduce the wall clock times. The second way seems to be more viable and will further be explored in the next section.
[Figures 9 and 10: NRI vs. generation (log scale, 10^0 to 10^2) for the problems "michalewicz2d", "rosenbrock6d", and "rank1approx9d". The panels compare mu=0 with lambda=1 and lambda=4 (sync), with and without domain decomposition ("decom"), showing the best and the averaged ("av") paths.]

[Figure 11 plots: NRI vs. generation (0 to 400) for mu=0, lambda=1, sync, decom; one path per subdomain for each of the test problems, including "rank1approx9d".]
Figure 11: Optimization with domain decomposition allows one to detect the presence of multimodality. Here each single optimization path corresponds to the optimization in a different subdomain. There are 32 subdomains which completely cover the original optimization domain. One infers that the "michalewicz2d" criterion is multimodal, while the "rosenbrock6d" and "rank1approx9d" problems are clearly unimodal and bimodal, respectively.
4.1 Asynchronous Model

Let m be the number of nodes, i.e. the number of virtual machines (computers) available on a remote cloud to evaluate an expensive function. Let the average time of the function evaluation be distributed uniformly in the interval (t_min, t_max), and suppose that the access to the cloud is possible every time λ nodes provide a result. Typically, λ ≪ m, such as λ = 1, 2, 3, 4 while m = 32. Let t_b be the blocking time, which is the time it takes to calculate and send λ new arguments to update the free nodes.

We will show that the wall clock time can be reduced to the blocking time by simply increasing the number of nodes m. Moreover, it turns out that the decrease of the WCT value w.r.t. m is hyperbolic, and its variance becomes negligible with an increasing value of m.

In order to show that this is possible, let us introduce an asynchronous access model. Let T be the set of m elements t_i, which are the real numbers indicating the time it takes to evaluate an expensive function. The node update time can then be computed by using these steps:

1. Find the λ smallest elements of T (not necessarily distinct), and create the set S out of them:

       S = {t_{i_1}, t_{i_2}, ..., t_{i_λ}}.   (13)

2. Find the largest element in S, and call it the computation time t_c:

       t_c = max S.   (14)

3. Compute the update time

       t_u = t_b + t_c.   (15)

4. Form the set M = T \ S, and map every element t of M according to

       t ↦ max(0, t − t_u).   (16)

5. Update the set T:

       T = M ∪ S.   (17)

The process of the node update with the asynchronous buffer model is shown in Fig. 12. The following three rules are enforced here:
1. The falling front indicates that the node becomes available.
2. It takes one time unit to update the node.
3. In case more than one node is available at the access time, the faster node is preferred.
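Steps (13)–(17) can be simulated directly. The sketch below is our Python rendition (the report's own implementation is the Scilab listing in the Appendix); we read step 5 as refilling the freed nodes with fresh evaluation times drawn from U(t_min, t_max):

```python
import random

def simulate_wct(m, lam, t_min=10.0, t_max=30.0, t_b=2.0,
                 n_updates=2000, seed=0):
    """Average node update time t_u for the asynchronous model:
    take the lam earliest-finishing nodes (step 1), charge t_b plus the
    largest of their remaining times (steps 2-3), age the busy nodes
    (step 4), and restart the freed nodes (step 5, fresh draws assumed)."""
    rng = random.Random(seed)
    T = [rng.uniform(t_min, t_max) for _ in range(m)]   # remaining times
    total = 0.0
    for _ in range(n_updates):
        T.sort()
        S, M = T[:lam], T[lam:]                  # step 1: lam smallest elements
        t_u = t_b + S[-1]                        # steps 2-3: t_u = t_b + max S
        M = [max(0.0, t - t_u) for t in M]       # step 4: age the busy nodes
        T = M + [rng.uniform(t_min, t_max) for _ in S]   # step 5 (our reading)
        total += t_u
    return total / n_updates
```

With m = 32 the average t_u sits just above t_b, which is the effect the model is meant to exhibit.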
The initial set T models the actual computational times of the expensive evaluations. The simplest adequate model so far seems to be the uniform distribution with a finite support given by t_min and t_max. The motivation behind this choice is the analysis of the data which we have gathered during the simulation of the expensive-to-evaluate functions; the latter have been chosen to be the kriging-based optimization processes themselves.

Figure 13 indicates the distributions of the times that nodes demand to evaluate an expensive function on the ProActive PACA Grid cloud [3]. Here the expensive-to-evaluate functions are complete budgeted optimizations of inexpensive functions whose evaluation takes only microseconds to complete. One can see that the heterogeneous nature of the cloud is such that t_max = O(t_min).
[Figure 12 timing diagrams: synchronous mode with λ = 1 (t_u = 3), synchronous mode with λ = 4 (t_u = 6), and asynchronous mode with λ = 4 (t_u mostly one to two time units).]

Figure 12: Advantages of the asynchronous node access. In the synchronous case with λ = 1, WCT = 3. Adding three slower Nodes 2–4 allows four simultaneous evaluations, but the wall clock time will be determined by the worst node. However, the asynchronous access reduces the t_u values to t_b for the majority of the expensive function evaluations.

4.2 Computational Analysis of Wall Clock Time
The wall clock time can be computed by performing the five steps indicated above. They need to be repeated as many times as the number of λ-generations demands, also repeating the runs with different initial sets T. The Scilab code of a single run is provided in the Appendix (sect:listingwctasync), where "busz" stands for m, and "lamb" for λ.

Fig. 14 indicates how the WCT value decreases w.r.t. an increasing value of m. One can see that when m is large enough, the WCT values become sharply concentrated at the t_b value.

The WCT values decrease roughly as O(m^(−1)). A more precise rule that fits the data presented in Fig. 14 is O(m^(−1−α)), where

    α ≈ (3 t_b / t_min) (λ − 1).   (18)

Notice that t_max is not present in the equation.
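For the parameters used throughout (t_b = 2, t_min = 10), the empirical decay exponent of Eq. (18) is easy to tabulate (a plain transcription of the formula, assuming our reconstruction of (18) is correct):

```python
def wct_decay_exponent(lam, t_b=2.0, t_min=10.0):
    """Exponent 1 + alpha of the empirical decay WCT = O(m ** -(1 + alpha)),
    with alpha = (3 * t_b / t_min) * (lam - 1) from Eq. (18)."""
    alpha = 3.0 * t_b / t_min * (lam - 1)
    return 1.0 + alpha

# lam = 1 gives exponent 1.0 (pure hyperbolic decay); lam = 4 gives 2.8
```

Consistently with the text, λ = 1 recovers the plain O(m^(−1)) behavior.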
The setting that matches the ProActive PACA Grid cloud best is the one with t_min = 10 and t_max = 30. When m = 32, this allows one to update λ = 4 nodes with a wall clock time approaching t_b. The relevant WCT values are shown in Table 6. For comparison, we have also presented there the corresponding statistics of the synchronous simulation. As one can see in Table 6, the reduction of the WCT value due to the asynchronous simulation seems to be impressive. So what exactly is optimization of an expensive-to-evaluate function? The practical function
Table 6: Mean and deviation of the node update time t_u for different algorithms. Parameters: t_min = 10, t_max = 30, t_b = 2. Averaging is performed with 25·10^4 points.

Asynchronous    m    λ    Mean(WCT)   Deviation
True           32    1    2.04        0.0024
True           32    4    2.77        0.13
False           0    1    22.0        5.77
False           0    4    28.0        3.27
Table 7: Wall clock times, real time factors, and speed-ups of asynchronous optimization compared to the synchronous case, NRI = 0.8. Parameters: t_min = 10, t_max = 30, t_b = 2.

                    WCT    RTF     "rank1approx9d"
                                    S_0     S_1
EI_{0,4} sync       28     1        1       1
EI_{0,4} async      2.77   0.099    0.42    4.2
EI_{28,4} async     2.77   0.099    0.56    5.7
4.3 Testing Asynchronous Algorithms

Asynchronous algorithms are expected to reduce the speed of the evolution of the optimization path towards the optimum w.r.t. the number of generations. The reason is that a direct use of the multi-point improvement criterion does not exclude the possibility of generating duplicate points. One example of the appearance of duplicate points is illustrated in Fig. 15.

As a consequence, the evolution paths of the optimization might tend to have more jump discontinuities when the criterion EI_{0,λ} is employed in the asynchronous setting. A direct remedy is to utilize the full criterion EI_{µ,λ}, where the µ points correspond to the candidate locations whose expensive function values are being actively evaluated, but are not known at the time when a request comes to send a new candidate for evaluation. Eq. (7) states that including the active points x_{1:µ} in the target part of the EI criterion prevents the algorithm from resampling there [21]. It can be seen that if the new λ points form a subset of the µ active points, then EI_{µ,λ} will be zero. More generally, EI_{µ,λ} decreases as some of the new λ search points get closer to the active points [21].
sear h pointsget loserto a tive points[21℄.Theappli ationofthesyn hrononousalgorithmwiththeEI
0,4
riterion,aswellasthetwo orresponding
asyn hronousalgorithms,to the"rank1approx9d"problemissummarizedin Fig.16.
One an see that the asyn hronous algorithm with the EI
0,4
riterion is inferior to its syn hronous
ounterpart, but the in lusion of
µ = 28
a tivepoints improves thealgorithm. Still, theEI28,4
algorithm
makesaslowerprogressw.r.t. thenumberofgenerations. Whiledupli atesarenotthemajorissueanymore,
one an noti e that a syn hronous algorithm always uses a omplete information, i.e. both, the lo ation,
and theexpensivefun tion value,whiletheasyn hronous aseonly ex ludestheappearan e ofdupli ates,
butitwill oftendoit"blindly"withoutanavailablefun tion value.
Theexamplesofthespeed-upvaluesare providedinTable7.
The
S
0
values indi ate that asyn hronous algorithms anmake theprogress w.r.t. generations slower (2x) than the orresponding syn hronous ases, but the real time fa tor is ru ial and may result in anasyn hronousalgorithmwhi hruns vetimesfasterin arealtime.
Optimization paths of the asynchronous algorithms are compared with the synchronous cases in Fig. 17. The corresponding means and deviations are shown in Fig. 18. The results indicate that the optimization paths increase more slowly w.r.t. the number of generations when the algorithms are asynchronous. However, one must