HAL Id: hal-00723427
https://hal.archives-ouvertes.fr/hal-00723427v2
Submitted on 17 Sep 2012
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Parallel Budgeted Optimization Applied to the Design
of an Air Duct
Ramunas Girdziusas, Rodolphe Le Riche, Fabien Viale, David Ginsbourger
To cite this version:
Ramunas Girdziusas, Rodolphe Le Riche, Fabien Viale, David Ginsbourger. Parallel Budgeted Optimization Applied to the Design of an Air Duct. 2012. hal-00723427v2
Parallel Budgeted Optimization Applied to the Design of an Air Duct

ANR OMD2 Project Deliverable No. WP3.2.1

Ramunas Girdziusas (1), Rodolphe Le Riche (1,2), Fabien Viale (3), David Ginsbourger (4)

(1) DEMO, Centre Fayol, EMSE, 158 cours Fauriel, Saint-Étienne, France
(2) CNRS UMR 6158
(3) INRIA, 2004 rue des Lucioles, BP 93, 06902 Sophia Antipolis, France
(4) IMSV, University of Bern, Alpeneggstrasse 22, CH-3012 Bern, Switzerland

girdziusas[at]emse.fr, leriche[at]emse.fr, fabien.viale[at]inria.fr, david.ginsbourger[at]stat.unibe.ch

Abstract
This work explores the benefits of cloud computing in the development of kriging-based parallel optimization algorithms dedicated to expensive-to-evaluate functions. We first show how the application of a multi-point expected improvement criterion allows one to gain insights into the problem of shape optimization in a turbulent fluid flow, which arises in the automobile industry. Our work then proceeds with a variety of experiments conducted on the ProActive PACA Grid cloud. Due to a multiplicative increase in search space dimensionality, the multi-point criterion cannot exploit a large number of computing nodes. Therefore, we employ the criterion with an asynchronous access to the simulation resources, whereby the available nodes are immediately updated while accounting for the remaining running simulations. Comparisons are made with domain decomposition, which is applied here as an alternative parallelization technique. Our experiments indicate weaknesses in the use of the multi-point criterion with a synchronous node access, and benefits when working in the asynchronous mode. Finally, a relatively fast and accurate
Contents

1 Introduction
1.1 Expected Improvement
1.2 Early Ideas of Parallelization
1.3 Dynamic Parallelization
1.4 Our Preferences
1.5 Structure of the Report
2 Shape Optimization
2.1 Expensive-to-Evaluate Function
2.2 Algorithm
2.3 Results
2.4 Analysis of Results
2.5 Conclusions
3 Experiments with Synchronous Node Access
3.1 Introduction
3.2 Algorithms
3.2.1 Multi-Point Improvements
3.2.2 Domain Decomposition
3.3 Performance Criteria
3.4 Results
3.5 Conclusions
4 Experiments with Asynchronous Node Access
4.1 Asynchronous Model
4.2 Computational Analysis of Wall Clock Time
4.3 Testing Asynchronous Algorithms
4.4 Conclusions
5 Integral of the Expected Improvement at Multiple Points
5.1 Introduction
5.2 Integrand
5.3 New Methods for Adaptive Integration
5.4 Results
6 Conclusions
A Estimation of Wall Clock Time
B Moments of the Censored Normal Variable
1 Introduction

We shall study the optimization of expensive-to-evaluate functions (budgeted optimization) with a particular application to the design of the shape of an air duct. The latter demands time-consuming numerical simulations of a turbulent fluid flow. Our aim is to implement and parallelize the algorithms known as Bayesian optimization [11], and in particular, the expected improvement (EI) algorithm [28, 29]. More specifically, our work relies on a multi-point EI criterion studied in [35, 21, 18], and the goal is to test the algorithms with synchronous and asynchronous node access.
1.1 Expected Improvement

The sequential algorithms that we aim to parallelize were first developed independently by J. Mockus and H. Kushner in the early 1960s [28, 23]. Both authors considered Gaussian process models for an expensive-to-evaluate function and suggested the maximization of auxiliary quantities for the generation of new candidate locations. H. Kushner advocated maximization of the probability of an improvement (PI); J. Mockus studied both the probability and the expectation of an improvement.
The third prominent direction of budgeted optimization utilizes the upper confidence bound (UCB) of an improvement [14]. Recent publications have abbreviated the algorithm as GP-UCB and supplied it with a wealth of analyses about Gaussian processes in the setting of the so-called multi-armed bandit problem [5, 38, 6]. The key difference from the previous algorithms is that the balance between exploitation (sampling at the regions with a low predictive mean) and exploration (sampling where the predictive variance is high) changes as the optimization proceeds in time. In addition, the focus here is on sharper bounds of the so-called cumulative regret function, which can be a temporal integral of the absolute differences between the ideal sought cost function value and the value obtained at a particular time. The aim is to minimize or bound the cumulative regret by temporally changing the deviation weight in the UCB expressions.
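To illustrate the time-varying deviation weight, the following is a minimal sketch of a UCB-type acquisition for minimization; the particular logarithmic schedule for the weight is one common choice and is an assumption made here for illustration, not the exact expression of [5, 38, 6]:

```python
import math

def gp_ucb_weight(t, delta=0.1):
    """One common beta_t schedule: grows like O(log t), so the balance
    shifts from exploitation toward exploration as time t increases."""
    return 2.0 * math.log((t ** 2) * (math.pi ** 2) / (6.0 * delta))

def ucb_acquisition(mu, sigma, t, delta=0.1):
    """Index of the grid point optimizing the confidence bound.

    For minimization the relevant bound is the *lower* one,
    mu - sqrt(beta_t) * sigma: low predictive mean (exploitation)
    and high predictive deviation (exploration) both help.
    """
    beta = gp_ucb_weight(t, delta)
    scores = [m - math.sqrt(beta) * s for m, s in zip(mu, sigma)]
    return min(range(len(scores)), key=scores.__getitem__)
```

With a fixed predictive mean and variance, a larger `t` makes the high-variance points relatively more attractive, which is the temporal change of the deviation weight described above.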
A recent survey of the use of the three criteria can be found in [11], where they are also referred to as acquisition functions. Preferences over them remain rather subjective, and we shall focus on the expectation-based algorithms because they have fewer parameters to adjust.
The authors of [33] emphasize the lack of convergence proofs related to the EI algorithms. This problem has been investigated more thoroughly only recently [40, 12, 42]. There exists a proof for two continuity classes of objective functions, albeit for algorithms that use Gaussian processes with fixed covariance function parameters [40]. In this regard, it could be worth citing the following text [12]:

"...For practitioners, however, these results are somewhat misleading. In typical applications, the prior is not held fixed, but depends on parameters estimated sequentially from the data. This process ensures the choice of observations is invariant under translation and scaling of f, and is believed to be more efficient (Jones et al., 1998, 2). It has a profound effect on convergence, however: Locatelli (1997, 3.2) shows that, for a Brownian motion prior with estimated parameters, expected improvement may not converge at all."
It is possible to develop better parameter estimators [12], but our algorithms in general do not re-estimate the covariance function, as its parameters can easily be fixed before each optimization, cf. Eq. (9). Another example of a simple rule of thumb for setting up the covariance function parameters before the optimization can be found in [6]. The ability to use fixed parameters is hardly a practical limitation of the EI algorithms.
A more relevant problem is that the so-called NEB assumption stated in [40] does not provide convergence results for the Gaussian processes whose covariance function is the Gaussian kernel. Recently, it has been established in [42] that there exists a class of univariate analytic (infinitely differentiable) objective functions which cannot be optimized with the EI algorithm that relies on the Gaussian kernel. One should bear in mind, however, that "realistic optimization budgets may be too low in many problems for the indicated asymptotic behavior to be relevant" [42].

The mismatch between the theory and practice is also evident as often the smoothness class of an objective function is neither known nor even relevant. In addition, hardly any existing algorithm can be implemented so that the global maximum of an acquisition function is always reached. This displaces the actual programs
1.2 Early Ideas of Parallelization

Once cloud computing became widespread, it was realized that most of the algorithms are sequential, and their parallelization demands separate research. A parallel EI algorithm [37] may utilize a gradient-based maximization of the single-point EI criterion, applied with multiple starting points. Parallelization can thus be achieved by enriching the standard EI algorithm with local maxima of an acquisition function.
Another early practical attempt to parallelize relevant algorithms is reported in [33]. Instead of the improvement-based criteria, the authors utilize a variety of other "acquisition functions" and compare their algorithms with the one developed in [37]. Notably, parallelization is achieved by using multiple reference cost function values f_min in the EI-related criteria. The generation is created by adding one point at a time, and each point is obtained by maximizing the EI criterion with different reference values. Uniqueness of candidate points is achieved by imposing distance constraints.
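For concreteness, the one-point EI evaluated against several reference values admits a short sketch; the closed-form EI for a Gaussian prediction is standard, while the particular predictive mean, deviation, and reference levels below are made up for illustration:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_improvement(mean, std, f_ref):
    """One-point EI for minimization, E[max(0, f_ref - Y)], where the
    kriging prediction is Y ~ N(mean, std^2) and f_ref is a reference
    cost value (usually the current minimum f_min)."""
    if std <= 0.0:
        return max(0.0, f_ref - mean)
    z = (f_ref - mean) / std
    return (f_ref - mean) * norm_cdf(z) + std * norm_pdf(z)

# In the spirit of [33]: querying one predictive model with several
# reference levels at or below f_min yields different exploitation /
# exploration trade-offs, one candidate point per level.
f_min = 1.0
scores = [expected_improvement(mean=0.8, std=0.3, f_ref=r)
          for r in (f_min, 0.9 * f_min, 0.5 * f_min)]
```

Lower reference values demand a larger improvement before the criterion rewards a point, which pushes the corresponding candidates toward more speculative regions.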
Considering the parallelization performed in [33, 34], one can draw a useful warning that the speed-ups over sequential algorithms can be quite small. For example, with four computing nodes, the speed-ups are generally less than four, and for the modified Rosenbrock and Ackley functions, each with five variables, the reported speed-ups are 1.83 and 1.44, respectively. Our results will indicate a problem where speed-ups can be lower. This difficulty could be avoided by designing algorithms which can leverage a larger number of computing nodes. However, one should note that various stochastic sampling methods have already been studied with large generation sizes, and the speed-up values have often turned out to be bounded by O(1) [39].

1.3 Dynamic Parallelization
Many existing parallelization ideas somewhat blindly generate multiple candidate points at a time by capitalizing on the fact that budgeted optimization algorithms have a lot of free parameters. Dynamic parallelization tries to predict the outcome of a sequential algorithm without the use of expensive function evaluations. It may also switch off parallelization at the times when the prediction is not possible, and thus adaptively request additional evaluations of an expensive function.
Most of the presently known dynamic parallelization algorithms, see e.g. [10, 15], rely on a heuristic sequential technique first introduced by M. Schonlau [35]. The core insight utilizes the fact that the variance of any Gaussian process conditioned on the observations does not depend on the actual observation values, but only on their spatial locations. This property can be exploited to create a batch (generation) of distinct candidate locations sequentially while bypassing their expensive evaluation, thus, allegedly, speeding up the optimization. The candidate points are generated one at a time by maximizing an acquisition function and updating the predictive variance (and possibly, but not necessarily, the predictive mean).
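This variance-only batch construction can be sketched with a tiny one-dimensional Gaussian-process posterior; the Gaussian kernel, its length-scale, and the candidate grid are arbitrary choices made here for illustration:

```python
import math

def k_gauss(a, b, length=0.3):
    # Gaussian (squared-exponential) kernel with unit prior variance.
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def posterior_var(x, design, noise=1e-9):
    """Conditional variance k(x,x) - k*^T K^{-1} k*: note that no observed
    value appears anywhere, only the point locations in `design`."""
    K = [[k_gauss(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(design)] for i, a in enumerate(design)]
    ks = [k_gauss(x, a) for a in design]
    w = solve(K, ks)
    return max(0.0, 1.0 - sum(wi * ki for wi, ki in zip(w, ks)))

def build_batch(design, grid, lam):
    """Grow a batch of lam distinct points without any new evaluations:
    each accepted point is 'believed' into the design, which collapses the
    variance around it and pushes the next maximizer elsewhere."""
    batch, pseudo = [], list(design)
    for _ in range(lam):
        x = max(grid, key=lambda g: posterior_var(g, pseudo))
        batch.append(x)
        pseudo.append(x)
    return batch
```

Here pure variance maximization stands in for the acquisition function; with an EI-type acquisition the predictive mean would also be carried along, which is the "possibly, but not necessarily" in the text above.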
This technique is applied in [6, 8], where the generation of new locations is built in the sequential manner described above, by maximizing the one-point EI criterion at a time, and simply replacing the corresponding expensive function values with ones sampled from its posterior density conditioned on the current design of experiments (DOE). After obtaining a sample of candidate points, clustering is then performed to decrease the redundancy and size of the generation. The clustering criterion is simply the sum of weighted Euclidean distances between the generation points and the cluster centers. The weights are probabilities that a certain cluster point provides a better cost function value than the rest of the cluster centers. There are no known explicit expressions for such probabilities even in the case of normal variables, and thus the assumption of independence is made and the standard formulae of the Gaussian order statistics are employed. The experiments have been performed with generation sizes fixed to 5 and 10.
In their more recent research [10], the authors drop the clustering-based method entirely, and they build the generation directly (without any postprocessing) by maximizing the one-point EI criterion in the spirit of their previous method. However, the generation size is made adaptive, and it increases only if the bound on the deviation of the predictive mean from its true value (that would, in theory, be obtained with a sequential one-point EI algorithm) does not exceed a specified threshold. A newly added location in the generation must be associated not with an arbitrary cost function value (mean, random sample from the posterior), but with its globally optimal value, which is assumed to be known. Often, this is indeed the case when only the globally
A parallelization very similar in spirit, albeit of the GP-UCB algorithm, called GP-BUCB, is presented in [15]. One difference is that the process mean function is employed to model the expensive-to-evaluate values during the construction of the generation, but the mean values of the generation points may not even be updated. Instead, the UCB deviation weight is adjusted when building the generation, whose size also changes dynamically. The latter is controlled by an available cumulative regret bound. The authors of [15] also suggest replacing the exact variance updates with certain bounds in order to speed up the creation of a new generation of candidate locations. This trick is also employed in [10], but the latter work uses different bounds. An interesting byproduct of both of these methods is that they provide indicators of when an expensive function evaluation should be performed, and when it is good enough to use the regression model to generate a new candidate location.
However, in addition to the difficulties of setting up the newly introduced threshold parameters, the problem with these methods is that they cannot effectively exploit all the available computational nodes, as the size of the generation is determined algorithmically and changes with time, while parallel resources are often fixed and limited. Another drawback is that the sizes can be nonuniform, which may yield suboptimal total optimization times.
The latter aspect is addressed in [7, 9]. The authors assume that there exists a specific distribution for the duration of an expensive evaluation, and that the total optimization time is limited by a fixed known value. The total number of function evaluations is also fixed, and so is the maximal size of the generation of candidate locations. Assuming this information exists, the authors develop a general model which aims to distribute generation sizes and determine the corresponding durations for their parallel evaluations. They introduce the so-called CPE criterion, which is a cumulative temporal sum of the number of jobs completed at a time. Its maximization is shown to prefer uniform schedules (distributions of the generation sizes) and can thus be used to limit the parallelization so that the algorithm utilizes more expensive function values and is still able to meet a specified time horizon.
One difficulty with this general setting is that parallel execution times are stochastic (and often the exact distributions are unknown or changing), but the model imposes an upper limit on the duration of the evaluation of the generation. Thus, the evaluation may actually fail to complete, and the authors further address this difficulty by introducing the notion of a probabilistic safety of an algorithm. Therefore, the aim is to maximize the probability of a safe completion, which is not guaranteed to be unity.
1.4 Our Preferen es
Instead of applying sequential heuristi te hniques dis ussed above, we shall dire tly maximize the
multi-pointEI riterion,whi hseemstohavebeenintrodu edbyM.S honlau,see5.3in[35℄,andwhosepra ti al
relevan e hasbeenjustied onlyre ently,see e.g.[18,21℄. Ithasbeendemonstratedthat amulti-point EI
will be large where, simultaneously, the orresponding one-point EI values are large, and the generation
pointsarenot orrelated. Thus,themulti-pointEI riteriongivespreferen etodistin tmultiple andidate
pointsautomati ally,withoutanyadditionalparameters,heuristi distan e onstraints,oradditivepenalty
fun tions.
The criterion demands fewer adjustable parameters, but its maximization is only possible when the generation sizes λ are small, typically O(1). It should be understood that a small value of λ does not limit the parallelization. In particular, we shall advocate an asynchronous node access where one first submits a large number of expensive function evaluations to the cloud, and then updates only λ nodes at a time (the algorithm remains parallel even when λ = 1).

The multi-point EI criterion has already been applied to select the parameters of various statistical models in order to further increase their performance on some known machine learning benchmarks [36]. We shall report deviations in the optimization evolutions w.r.t. the initial DOE, which turn out to be higher than the error bars that can be seen in [36]. This indicates that certain parameters, such as an initial DOE, can affect the outcome of the optimization results more than a better regression model. High performance variability
how to compute the gradient of this acquisition function analytically. This is a research direction which could be very important for the asynchronous node access, where the time it takes to generate and communicate new points (blocking time) should be minimal. Maximization of the multi-point EI criterion is also a computational bottleneck during the testing of any of the relevant algorithms, and a faster maximization would provide an appreciable aid here. However, one should bear in mind that the multi-point EI criterion is multimodal, and there is no easy way to reach its global maximum with local optimization techniques.
One could emphasize that the framework introduced in [7, 9] is a very general formalization of a budgeted optimization problem. Our asynchronous optimization study, presented in Section 4, corresponds to a particular case which the authors call the Online Fastest Completion Policy (OFCP). This policy is just a strategy to calculate and evaluate λ new candidate locations immediately as λ computational nodes become available. Their main critique, and quite a profound insight, is that "it does not use the full time horizon, even when doing so would allow for much less concurrency" [7]. The works in [7, 9] introduce a new perspective to Bayesian optimization because they explicitly quantify and minimize the actual optimization time instead of relying on a prevalent statement that Bayesian optimization is "known to be efficient".
We do not necessarily advocate the use of this policy over others, and our results, provided in Section 4, could be seen as further analysis and numerical evidence that better characterizes this policy. However, the OFCP policy is a natural choice when the overall time horizon is not given, or when the exact timing characteristics of the expensive evaluations are not known (but we shall provide analysis for the case when such information is available). The OFCP policy simply works with an assumption of a fixed number of total function evaluations, it maximizes the node occupation time, it does not need any sophisticated scheduling, and there is obviously no need to consider a probabilistic safety in this case. For the sake of simplicity, we shall bypass the decision-theoretic vocabulary and, instead of the OFCP policy, shall frequently employ the less formal description of an asynchronous node access.
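The asynchronous node access described above can be sketched with a thread pool standing in for the cloud nodes; the cost function, its random delays, and the random proposal step are all placeholders (a real implementation would maximize the multi-point EI in `propose` and submit jobs through the cloud middleware):

```python
import concurrent.futures as cf
import random
import time

def expensive(x):
    """Stand-in for a time-consuming simulation (hypothetical cost)."""
    time.sleep(random.uniform(0.005, 0.02))
    return x, (x - 0.3) ** 2

def propose(history):
    # Placeholder for maximizing the multi-point EI conditioned on both
    # the completed evaluations and the still-running points.
    return random.random()

def asynchronous_optimize(n_nodes=4, budget=12):
    history = []   # completed (x, f) pairs
    with cf.ThreadPoolExecutor(max_workers=n_nodes) as pool:
        running = {pool.submit(expensive, propose(history))
                   for _ in range(n_nodes)}
        submitted = n_nodes
        while running:
            # lambda = 1: refill each node the moment it frees up,
            # without waiting for the rest of the generation.
            done, running = cf.wait(running, return_when=cf.FIRST_COMPLETED)
            for fut in done:
                history.append(fut.result())
                if submitted < budget:
                    running.add(pool.submit(expensive, propose(history)))
                    submitted += 1
    return min(history, key=lambda xf: xf[1])
```

The loop keeps every node busy until the evaluation budget is exhausted, which is exactly the node-occupation property of the OFCP policy mentioned above.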
1.5 Structure of the Report

The report first provides the results of the application of a synchronous four-point EI algorithm to the industrial problem of shape design, which are summarized in Section 2. The optimization operates in such a way that one first submits four points for evaluation, and then waits until all of them are completed. The regression model is then updated, the multi-point EI criterion is maximized, and the process is continued until the budget of expensive evaluations is exceeded. The evaluation of a cost function takes about twenty minutes. We improve a recently reported result in [31], and provide insights into the physical and statistical aspects of the problem.
Section 3 states performance results of various parallelization techniques dedicated to a synchronous node access. Our results indicate that a simple strategy such as domain decomposition is competitive with more advanced methods, but there are problems where none of the methods is suitable for parallel optimization and a single-point EI algorithm performs equally well. One should note that the tests are structured in such a way that parallel algorithms are executed on a single machine, and independent simulations pertaining to different initial DOEs are then sent to the cloud to assess how an initial DOE affects the results. The reason for this particular way of utilizing the cloud is that the timing characteristics of the parallel algorithms can be rather obvious, and in the testing phase the cost functions are not expensive to evaluate.
Optimization with an asynchronous node access is discussed in Section 4. We state a particular model for the execution time of an expensive-to-evaluate function, simulate the asynchronous point generation scenarios based on the proposed timing model, and test the performance of the multi-point algorithms by submitting independent optimizations, each with a different initial DOE, to the cloud. Here the focus is on the average time for a new generation to actually be sent to the cloud, which will be referred to as a wall clock time. A wall clock time depends not only on the time it takes to maximize the improvement and to communicate the results to the remote nodes, but also on when and where a particular node becomes available while other nodes are active with the evaluation. This is one difference with the previous work on the EI criterion.

We have observed in our numerical experiments that the integral has a peculiar property: its upper bound lies extremely close to the actual value, especially (but not necessarily) when the examined expected improvements are further away from the locations where they are maximal. In essence, we choose to work within the framework of systematic sampling [16] (as opposed to importance sampling) and show that one can considerably improve symmetric monomial rules (unscented transforms) by replacing monomials with one-point improvements. However, one must also mention that standard Monte Carlo sampling proves to be a very reliable integration technique, especially at the locations where the expected improvements are maximal.
As will be seen in the results provided in this report, a significant benefit of using a computing cloud is that it allows large-scale testing of the algorithms with different parameter settings. For example, parameters such as an initial DOE greatly affect the optimization results and are very hard to "integrate out". The ability to utilize cloud resources allows one to actually send replicas of the original simulation with parameter changes and then see the effects. This is very hard to achieve when running things locally on a single computer (in a serial manner) because a budgeted optimization even of inexpensive-to-evaluate functions is itself a very time-consuming process. In our work, a single cost function evaluation in the rank-one matrix approximation problem may take microseconds to evaluate, but a single complete optimization may easily reach ten hours (when the CPU rate is 2.5 GHz). Our ability to run the codes on the ProActive PACA Grid cloud [3] allows
2 Shape Optimization

2.1 Expensive-to-Evaluate Function

Our goal is to optimize the geometry of a cooling duct, which has already been studied in [31]. The criterion is the normalized pressure difference of the flow at the inlet and outlet of a duct, which is indicated in Fig. 1a. The optimization parameters are shown in Figs. 1b-d.

It will suffice to emphasize that the criterion is a positive quantity whose computation is a demanding numerical solution of the k-ε model of a fluid flow. The flow is linear, viscous (ν = 1.6 · 10⁻⁴ m²/s), incompressible, and turbulent (Re = 4000). The k-ε model is a mixed system of nonlinear partial differential and algebraic equations [17]:

\[
\frac{\partial \bar{u}_i}{\partial t} + \bar{u}_j \frac{\partial \bar{u}_i}{\partial x_j}
= \frac{\partial}{\partial x_j}\left[(\nu + \nu_T)\left(\frac{\partial \bar{u}_j}{\partial x_i} + \frac{\partial \bar{u}_i}{\partial x_j}\right)\right]
- \frac{1}{\rho}\frac{\partial}{\partial x_i}\left(\bar{p} + \tfrac{2}{3}\rho k\right), \tag{1}
\]
\[
\frac{\partial \bar{u}_j}{\partial x_j} = 0, \tag{2}
\]
\[
\frac{\partial k}{\partial t} + \bar{u}_j \frac{\partial k}{\partial x_j}
= \frac{\partial}{\partial x_j}\left[\left(\nu + \frac{\nu_T}{\sigma_k}\right)\frac{\partial k}{\partial x_j}\right]
+ \nu_T \left(\frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i}\right)\frac{\partial \bar{u}_i}{\partial x_j} - \epsilon, \tag{3}
\]
\[
\frac{\partial \epsilon}{\partial t} + \bar{u}_j \frac{\partial \epsilon}{\partial x_j}
= \frac{\partial}{\partial x_j}\left[\frac{\nu_T}{\sigma_\epsilon}\frac{\partial \epsilon}{\partial x_j}\right]
+ C_{\epsilon 1}\frac{\epsilon}{k}\,\nu_T \left(\frac{\partial \bar{u}_i}{\partial x_j} + \frac{\partial \bar{u}_j}{\partial x_i}\right)\frac{\partial \bar{u}_i}{\partial x_j}
- C_{\epsilon 2}\frac{\epsilon^2}{k}, \tag{4}
\]
\[
\nu_T = C_\mu \frac{k^2}{\epsilon}. \tag{5}
\]

It describes the time averages of the pressure field p and the flow velocity field u:

\[
\bar{p} \equiv \lim_{T \to \infty} \frac{1}{T}\int_0^T p(x, t)\,dt, \qquad
\bar{u}_i \equiv \lim_{T \to \infty} \frac{1}{T}\int_0^T u_i(x, t)\,dt. \tag{6}
\]

The auxiliary fields k, ε, and ν_T are the turbulent kinetic energy k, the spatial dissipation rate of k, called ε, and the turbulent viscosity ν_T, respectively. One should notice that the kinematic viscosity ν is a constant, while ν_T is a field.

The initial and boundary conditions are indicated in Table 1. The implementation uses the open source library called OpenFOAM [2]. The wall functions "k_w", "ε_w", and "ν_T w" are the OpenFOAM functions "kqRWallFunction", "epsilonWallFunction", and "nutWallFunction", respectively. The latter two override their default parameter values with C_μ = 0.09, κ = 0.41, E = 9.8. The initial values of the quantities computed by the wall "functions" correspond to the initial values of the fields shown in the last column of Table 1.

In addition to OpenFOAM, the complete software stack of this fluid dynamics simulation includes CATIA [1] (a 3D model of a duct), STAR-CCM+ [4] (computational mesh generation), and ParaView [20] (visualization).
2.2 Algorithm

It is not transparent how the pressure difference depends on the parameters which specify the geometry of a duct. Various admissible changes of the geometry are not visually discernible, and the model is a massive nonlinear dynamical system. This motivates the application of a budgeted optimization. This type of optimization estimates the kriging model of an expensive-to-evaluate function, and generates new candidate locations by maximizing the multi-point expected improvement. In particular, given μ active points x_{1:μ} and λ free nodes, the algorithm finds λ new points by solving the following problem:

\[
\max_{x \in \mathbb{R}^{d\lambda}} \; \mathbb{E}\left[\max\left(0, \min\left(f_{\min}, Y(x_{1:\mu})\right) - \min Y(x)\right) \,\middle|\, A\right] \tag{7}
\]
Table 1: Initial and boundary conditions for key quantities of the k-ε model.

Name       Field                  Units   Inlet   Outlet         Wall       Initial conditions
p̃ = p̄/ρ   Normalized pressure    m²/s²   ∇p̃ = 0  p̃ = 0          ∇p̃ = 0     0
u          Flow velocity          m/s     −n      0 if u·n ≤ 0   0          0
k          Turb. kin. energy      m²/s²   10⁻³    ∇k = 0         "k_w"      10⁻³
ε          Dissipation rate of k  m²/s³   10⁻¹    ∇ε = 0         "ε_w"      10⁻¹
ν_T        Turbulent viscosity    m²/s    0       ∇ν_T = 0       "ν_T w"    0

where f_min is the current minimum, Y(x_{1:μ}) = (Y(x_1), ..., Y(x_μ)) and Y(x) = (Y(x_{μ+1}), ..., Y(x_{μ+λ})) are random surrogates (kriging model), and A denotes the event where the Y values equal all the known expensive-to-evaluate function values at all the known locations. Methods to compute the expectation in Eq. (7) are discussed in Section 5.
Considering the use of kriging in the optimization, one may refer to [18, 21] for more details. In addition, we have applied a few changes to what seems to be the standard practice. They are not conceptually interesting, but are worth mentioning:

1. The expected improvement is maximized by using the CMA-ES algorithm [19, 39]. Box constraints are handled by projecting the coordinates onto the bounds and adding a penalty term to the expected improvement. The penalty is proportional to the Euclidean distance from the optimization point to the boundary if the point is out of bounds, and is zero otherwise.

2. Conditional expectations are calculated by using the pseudoinverse of the DOE covariance matrix. This method overestimates the conditional variances, but it does not demand any additional parameters, and it also reduces to the standard inverse in the absence of singularities.

3. When the conditional covariance matrix of the kriging responses is singular, the value of the expected improvement is simply set to zero. Here by "singularity" is meant anything that breaks the Cholesky decomposition. The latter is placed inside the "try block" of the "try and catch" exception handling.

4. Multi-point expected improvements are calculated by using Monte Carlo sampling with one thousand points. This standard method is simple, computationally inexpensive, and reliable w.r.t. increasing dimensions of an integration domain. The seed of the random generator is set to the current generation number, so that the integration routine uses the same random points when evaluating the expected improvement at different spatial locations.

5. Kriging is applied with Gaussian kernels whose variances vary with each coordinate. The variances are determined by squaring the median of the absolute deviations from the median of a particular coordinate. This is simpler and faster than any iterative estimation and, more importantly, it guarantees that the appearance of close points in the DOE does not change the kernel variances unexpectedly.
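The Monte Carlo estimator in item 4 can be sketched as follows; the kriging conditional mean vector and covariance matrix are assumed given, and the small Cholesky routine is included only to keep the sketch self-contained:

```python
import math
import random

def cholesky(C):
    """Lower-triangular Cholesky factor of a symmetric PD matrix."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def multipoint_ei(f_min, mean, cov, n_samples=1000, seed=0):
    """Monte Carlo estimate of E[max(0, f_min - min_i Y_i)] for
    Y ~ N(mean, cov).  Fixing the seed, as in item 4, reuses the same
    normal draws at every candidate location, so the estimate varies
    smoothly when the candidate points move."""
    rng = random.Random(seed)
    L = cholesky(cov)
    q = len(mean)
    total = 0.0
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        y = [mean[i] + sum(L[i][k] * z[k] for k in range(i + 1))
             for i in range(q)]
        total += max(0.0, f_min - min(y))
    return total / n_samples
```

As a sanity check, two strongly correlated candidate points yield a smaller multi-point EI than two independent ones with the same marginals, which is the preference for distinct points discussed in Section 1.4.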
We shall apply what is known as the synchronous multi-point algorithm [18] with λ = 4 points, which is briefly abbreviated as EI0,4. The choice of generating four points at a time demands an optimization with 8 × 4 = 32 variables. Asking for more points at a time, or using DOEs with more than O(10³) points, would introduce severe numerical difficulties.
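The box-constraint handling of item 1 above admits a compact sketch; the acquisition callable `ei` and the penalty weight are placeholders introduced here for illustration:

```python
import math

def project(x, lower, upper):
    """Componentwise projection of x onto the feasible box."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]

def penalized_ei(x, lower, upper, ei, weight=1.0):
    """Acquisition value used inside CMA-ES: evaluate `ei` at the
    projected point and subtract a penalty proportional to the Euclidean
    distance from x to the box (zero when x is already feasible)."""
    xp = project(x, lower, upper)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xp)))
    return ei(xp) - weight * dist
```

Projection keeps the acquisition well defined for any point CMA-ES samples, while the penalty still steers the search distribution back toward the feasible box.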
2.3 Results
The minimization of the pressure difference is shown in Fig. 2. The first 320 observations are generated via the LHS algorithm, up to observation number 320. The optimization then proceeds via a synchronous generation of four candidate points. They are obtained by maximizing the expected improvement with the CMA-ES algorithm [19], which uses its default parameters, except that the initial coordinate deviation is chosen to be 0.1 and the number of iterations is set to 3000.

Optimization results are presented in Table 2. One can see the bounds of the variables, the worst observed point which gives the maximal pressure difference value 1.28, the previously available best result [31], along with our improvement. The presence of "nan" values indicates that the pressure values are not available at the points whose coordinates are all simultaneously equal to either the lower or the upper bound. The geometry cannot be meshed in these two extreme cases.

Table 2: Lower and upper bounds (LOBS, UPBS) of the variables, the worst observed point, the previous best result [31], and our result. The last row is the pressure difference pd.

        LOBS     UPBS     Worst     [31]          Our result
x_1     0.0036   0.0166   0.0036    0.0149        0.0132
x_2     0.3      0.8      0.3760    0.4202        0.4756
x_3     0.0027   0.0207   0.0207    0.0102        0.0207
x_4     0.0405   0.0595   0.0595    0.0479        0.0450
x_5     1.25     1.5707   1.2525    1.5582        1.5707
x_6     0.21     0.42     0.2254    0.3849        0.3914
x_7     0.047    0.055    0.055     0.0541        0.0547
x_8     0.0008   0.0088   0.00081   0.0014        0.0016
pd      nan      nan      1.28      0.59 ± 0.01   0.56 ± 0.01
2.4 Analysis of Results

The optimization results can also be highlighted by comparing the optimal fields with the worst observed cases. The worst observed geometry is shown in Fig. 3. It only serves the purpose of displaying the slicing plane on which the field values will be displayed, and of setting up the range for the pressure values, which is [0, 4]. The surface of the duct is colored with the ParaView [20] scheme "hsv-blue-to-red", whose range of colors is also displayed in the color bar.

The values of the pressure field on the surface and its slice are shown in Fig. 4. Both shapes are hard to discern visually, but the differences can still be noticed without any special tools. In the optimized case, the pressure values are smaller on the walls at the inflection of the duct.

The components of the velocity field are shown in Fig. 5. For comparison purposes, the ranges of the field values are kept the same in both the worst and optimal cases, and the color space is the one used with the pressure fields, cf. Fig. 3. The ranges for the x-, y-, and z-components are [−0.3, 1], [−0.4, 1.4], and [−1.6, 0.2], respectively. One can see that the velocity field of a flow in the optimized case is generally smoother, and effectively uses a larger volume of a duct.

The optimized geometries are very hard to discern visually, and the pressure fields are nearly optically identical, which is also accompanied by rather small differences in the numerical values of the pressure fields. However, the results are not identical, and the differences become most pronounced when looking at the velocity fields shown in Fig. 6. One can see that our result is slightly smoother, which can be seen in the upper left areas of the slices in the x- and z-components (a, d, c, f), and at the inflection point of a duct in the case of the y-component (b, e).

In order to see if our result differs from the one in [31] statistically, we have performed a principal component analysis on the data correlation (not covariance) matrix. The data is the matrix of size 8 × 788 whose columns are the candidate locations generated during the optimization (the data correlation matrix is of size 8 × 8). The results are shown in Fig. 7. They indicate the projections of the data vectors on the chosen eigenvectors of the correlation matrix. In addition to the data, several important locations are indicated with different markers: the present optimal solution (Opt), the lower and upper bounds (LB, UP, resp.), the previous result [31] (PrevBest), the average value of the bounds (Midpoint), and the worst observed point during the optimization (Worst).

Table 3: Coordinates of the eigenvectors v_1, ..., v_8 of the 8 × 8 data correlation matrix.

Coord.   v_1        v_2        v_3        v_4        v_5        v_6        v_7        v_8
1        0.47061    −0.01957   0.00218    −0.34455   −0.09434   −0.65004   −0.26550   −0.39683
2        −0.24119   −0.39696   0.20201    −0.27114   −0.80995   0.01527    0.04344    0.10855
3        −0.50275   −0.03497   0.19678    0.10206    0.19948    −0.48401   −0.50937   0.40419
4        0.04089    0.33272    0.87867    0.14932    −0.04204   0.13262    −0.04856   −0.26749
5        −0.49912   0.22259    −0.30076   −0.06057   −0.11561   0.22181    −0.39944   −0.62056
6        −0.33944   0.20969    0.11621    −0.74648   0.29979    −0.07762   0.41716    0.01265
7        −0.16022   0.56533    −0.18227   0.33491    −0.36915   −0.44528   0.41711    0.02842
8        −0.27554   −0.56302   0.10589    0.31937    0.23239    −0.26807   0.39781    −0.45799

As the concentration of variance in the first principal components is not very pronounced, one finds that the data does not live in a subspace of R^8 and all the coordinates are valuable. Therefore, the parameterization of the problem is not redundant. However, the dimension of the problem could have been reduced down to R^5 because the second, fifth, and eighth principal components do not discriminate the optimal location from the middle point or the worst point.
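The correlation-based PCA used here can be sketched in a few lines; power iteration recovers only the leading eigenvector, which suffices to illustrate the projections of Fig. 7, and the strongly correlated toy data below is made up for illustration:

```python
import math
import random

def correlation_matrix(data):
    """data: list of samples (rows) of d coordinates.  Standardizing each
    coordinate first is what makes this the correlation, not covariance,
    matrix, so coordinates with very different scales are comparable."""
    n, d = len(data), len(data[0])
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n)
            for c, m in zip(cols, means)]
    z = [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in data]
    return [[sum(z[i][a] * z[i][b] for i in range(n)) / n
             for b in range(d)] for a in range(d)]

def leading_eigenvector(C, iters=200):
    """Power iteration: repeated multiplication converges to the
    eigenvector of the largest eigenvalue (the first principal direction)."""
    d = len(C)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy usage: two strongly correlated coordinates give a leading
# direction close to (1, 1) / sqrt(2).
rng = random.Random(1)
data = []
for _ in range(400):
    t = rng.gauss(0.0, 1.0)
    data.append([t, t + 0.1 * rng.gauss(0.0, 1.0)])
v1 = leading_eigenvector(correlation_matrix(data))
```

Projecting each sample on the leading eigenvectors obtained this way is exactly the kind of plot shown in Fig. 7.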
When compared to the previously available result [31], our solution is situated further away from the worst-case scenario when looking at things along the principal directions 1, 6, and 7, but is closer to it in the direction 8. Interestingly, in the four-dimensional subspace spanned by the eigenvectors 2, 3, 4, and 5, the result in [31] is almost identical to ours.

The first principal component allows one to separate the optimized points from the initial DOE. It turns out that the third coordinate of the first eigenvector has the largest magnitude, which, incidentally, is the only coordinate which makes our solution significantly different from the previous result (in our case x_3 is roughly doubled). For the sake of completeness, the coordinates of all of the eigenvectors are shown in Table 3.

2.5 Conclusions
When a vast majority of admissible fluid domains are optically indistinguishable, the optimization of a geometry can be hard to perform manually. Kriging-based optimization proves to be handy when making progress with a small budget of cost function evaluations, which is typically less than O(10^3). We have made an improvement to the previous solution obtained in [31] and have identified its relation to our result. Interestingly, the previous optimization is almost identical to ours in the subspace of R^8 spanned by four eigenvectors of the correlation matrix of all the points gathered during the search. The principal component analysis suggests that x_3 is an important parameter, and the intrinsic dimension of the problem, i.e. the number of independent parameters which could differentiate the optimal geometry from the suboptimal one, is arguably five.
Figure 1: The optimization criterion is the difference between the average (normalized) pressure field at the inlet

Figure 2: Normalized pressure difference [m²/s²] w.r.t. the increasing number of observations during the optimization (horizontal axis: observations 0–787; the plot also shows the moving minimum). The first 320 observations are generated via the LHS algorithm.
Figure 5: Velocity fields: x-components (a,d), y-components (b,e), and z-components (c,f). The first column corresponds to the worst observed scenario; the second column shows the optimized fields.

Figure 6: The components of the velocity field of a flow: x-direction (a,d), y-direction (b,e), and z-direction (c,f). The first column corresponds to the result in [31]; the second column is our result, which is the replica
[Figure 7: projections of the data onto pairs of principal components: pc1 (26%) vs. pc2 (18%), pc3 (12%) vs. pc4 (11%), pc5 (10%) vs. pc6 (9%), and pc7 (7%) vs. pc8 (6%). Markers: DOE, New, Opt, LB, UB, PrevBest, Midpoint, Worst.]
3.1 Introduction

Section 2 has focused on the application of a particular kriging-based optimization algorithm to the industrial problem. On average, it takes twenty minutes to evaluate a cost function in such a problem. A single optimization then demands days to complete. Considering the down-times of the cloud, a single optimization may demand weeks to complete.

Thus, one may ask whether our results rightfully reflect what can be achieved with the whole family of multi-point improvement-based algorithms described in [21]. One should note that so far we have applied only one such algorithm, which generates λ = 4 points at a time, synchronously. It was applied once, and only with a single cost function.

We shall report our tests with artificial functions, which will further indicate some limitations and unexplored possibilities of the kriging-based algorithms. In this section, we will focus on the asynchronous node access and will try to measure whether multi-point improvements help. The algorithms will be tested along with the strategy of domain decomposition.
3.2 Algorithms

3.2.1 Multi-Point Improvements

The use of multi-point improvements [21, 18] is a theoretically appealing direct extension of the kriging algorithms with one-point improvements. The problem with this approach is that it does not scale well, as the maximization of the expected λ-point improvement demands an optimization in λ × d dimensions. In addition, λ cannot be very large in principle, because the minimum over an increasing number of random variables is pushed down independently of the demands of a problem, and thus the expected improvements become severely overestimated. They are typically overestimated anyway, but one suspects that when the generation sizes are not big, such as λ = 4, the algorithm can be implemented correctly and one may achieve a faster optimization.

How fast can an optimization be? Let us introduce the quantity called the wall clock time (WCT), which is the average time between two consecutive updates of the nodes in the cloud. It determines the rate at which the points are sent to (received from) a remote cloud. Figure 8 presents a timing analysis of the synchronous optimization with multi-point improvements.
[Figure 8 timing diagrams: synchronous mode with λ = 1 (Node 1, t_u = 3) and synchronous mode with λ = 4 (Nodes 1–4, t_u = 6).]

Figure 8: In the synchronous case, it is the slowest node that determines the node update time. The time costs of updating λ > 1 points will typically be greater than in the single-point case, unless every one out of the λ computational nodes is faster than the one applied in the optimization with one-point improvements. This example shows the case when t_b is one time unit, independently of the algorithm. The wall clock time increases twice when λ changes from 1 to 4.
hangesfrom1to4.to-evaluate fun tion,and thelowlevelspans thetime when thenode is idle. One ansee that theuse of
multiplepointsin reasesthewall lo ktime(WCT),andthelatterwillbesolelydeterminedbytheslowest
omputationalnode.
Let us introduce the blocking time t_b, which is the time it takes to: (i) receive λ function evaluations, (ii) generate λ candidate points, and (iii) send them to the λ free nodes. One can then perform a more precise analysis by assuming that the time it takes to evaluate an expensive function is uniformly distributed. The node update time will then be a random variable defined as

    T_u = t_b + max{T_1, T_2, ..., T_λ},   T_i ∼ U(t_min, t_max).   (8)
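As a quick sanity check on (8), the expectation E(T_u) can be estimated by Monte Carlo. The sketch below is ours; the parameter values (t_min = 10, t_max = 30, t_b = 2) are the ones used throughout the report:

```python
import random

def mean_update_time(lam, t_min=10.0, t_max=30.0, t_b=2.0, n=200_000, seed=0):
    """Monte Carlo estimate of E(T_u) = E(t_b + max of lam draws from U(t_min, t_max))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += t_b + max(rng.uniform(t_min, t_max) for _ in range(lam))
    return total / n

# Closed form for comparison: E(T_u) = t_b + t_min + (t_max - t_min) * lam / (lam + 1)
```

For λ = 1 this gives approximately 22, and for λ = 4 approximately 28, matching the worked example below.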
For example, let t_min = 10, t_max = 30, and t_b = 2 time units. Then WCT ≡ E(T_u) = 22 when λ = 1, but it increases up to WCT = 28 when λ = 4. Therefore, the synchronous multi-point optimization algorithm needs to approach the optimum at least 28/22 ≈ 1.27 times as fast in order to save time.

3.2.2 Domain Decomposition
This is one of the simplest strategies to employ when making any optimization algorithm parallel. The domain is divided into s parts (subdomains) and the optimization is performed in each subdomain independently, preferably in parallel. In what follows, we shall perform the optimization in d = 2, 6, and 9 dimensions, and the number of subdomains will be 32. In the case of two dimensions, we divide the first coordinate into eight equal parts, and the second into four. In the case of a larger number of dimensions, we simply halve the first five coordinates and obtain in this way 2^5 = 32 subdomains.
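The splitting rule for d ≥ 5 can be sketched as follows (a hypothetical helper of ours, not the report's code; the d = 2 case with its 8 × 4 split is not covered):

```python
from itertools import product

def halve_first_five(lower, upper):
    """Split the box [lower, upper] into 2**5 subdomains by halving
    the first five coordinates; the remaining coordinates are untouched."""
    k = min(5, len(lower))
    subdomains = []
    for bits in product((0, 1), repeat=k):
        lo, up = list(lower), list(upper)
        for i, b in enumerate(bits):
            mid = 0.5 * (lower[i] + upper[i])
            if b == 0:
                up[i] = mid          # keep the lower half along coordinate i
            else:
                lo[i] = mid          # keep the upper half along coordinate i
        subdomains.append((lo, up))
    return subdomains

subs = halve_first_five([0.0] * 6, [5.0] * 6)   # e.g. the "rosenbrock6d" box [0, 5]^6
```

Each of the 32 boxes can then be handed to an independent optimization run.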
It is important to emphasize that domain decomposition is a strategy: it can be applied to make any algorithm parallel. We shall use it with both one-point improvements and multi-point improvements. The interesting question is whether the use of domain decomposition with one-point improvements can be as good as the use of multi-point improvements alone. In that case, one would definitely prefer the former, as it achieves a perfect isolation between the parallel flows of the program, whereas the multi-point algorithm is more demanding regarding its implementation.
3.3 Performance Criteria

All of the optimization algorithms are made deterministic in order to remove unnecessary degrees of variance. Firstly, we switch off the maximum likelihood estimation of the Gaussian kernel variances in the kriging. Instead of estimating them, the following simple rule is applied:

    kernelvariance_i = ( |upb_i − lob_i| / (1 + 2^(8/d)) )^2,   i = 1, ..., d.   (9)

Here d is the number of dimensions of the optimization space. The main idea behind this formula is that we shall typically generate 500 points during the entire optimization (including the points of the initial DOE). This is a realistic budget for an expensive-to-evaluate function on the one hand, and the limit after which working with dense matrices becomes very inefficient (at best). Thus, in all of the simulations, on average, the number of observations used in the kriging is 250. We then "round" this number up to 2^8 = 256, and then 2^(8/d) becomes the number of "ticks" that can be placed on each coordinate axis when assuming that the points are distributed uniformly in space. The addition of unity is somewhat arbitrary and not really crucial, but it serves one purpose: when d = 8, the variance becomes equal to a squared "median of the median of absolute differences coordinate-wise".

In addition, the Monte Carlo (MC) integration of the expected improvement is always initialized to the current generation number. Thus, the only "degree of freedom" is the initial DOE, and each family of the algorithms can now be tested with a number of optimizations. Each optimization will then correspond to a different initial DOE. This number will be set to one hundred, but it may actually become smaller for some criteria. The details are given in Table 4.
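The variance rule (9) is straightforward to transcribe; a sketch (variable names `lower`/`upper` are ours, standing for the bounds lob_i and upb_i):

```python
def kernel_variances(lower, upper):
    """Fixed Gaussian kernel variances per Eq. (9): one value per coordinate,
    derived from the box bounds and the dimension d."""
    d = len(lower)
    ticks = 1.0 + 2.0 ** (8.0 / d)    # 1 + number of "ticks" per axis for ~256 points
    return [(abs(up - lo) / ticks) ** 2 for lo, up in zip(lower, upper)]
```

For d = 8 on a unit box the denominator is 1 + 2 = 3, so every variance equals (1/3)^2.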
Table 4: Optimization Criteria

Label             Cost function                                       Domain      Minimal value   Modality
"michalewicz2d"   Σ_{i=1}^{2} sin(x_i) sin^2(i x_i^2 / π)             [0, 5]^2    −1.841          multimodal
"rosenbrock6d"    Σ_{i=1}^{5} 100 (x_{i+1} − x_i^2)^2 + (1 − x_i)^2   [0, 5]^6    0               unimodal
"rank1approx9d"   ‖A_{4×5} − x_{4×1} y_{1×5}‖^2,  a_ij ∼ U(0, 1)¹     [−1, 1]^9   0.7119          bimodal
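For reference, the first two cost functions can be transcribed directly from the table (our transcription; the rank-one approximation criterion additionally needs the randomly generated matrix A described in footnote 1 and is omitted):

```python
import math

def michalewicz2d(x):
    """Sum_{i=1..2} sin(x_i) * sin(i * x_i**2 / pi)**2 on [0, 5]^2."""
    return sum(math.sin(xi) * math.sin((i + 1) * xi * xi / math.pi) ** 2
               for i, xi in enumerate(x))

def rosenbrock6d(x):
    """Sum_{i=1..5} 100 * (x_{i+1} - x_i**2)**2 + (1 - x_i)**2 on [0, 5]^6."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))
```

The Rosenbrock minimum from Table 4 is attained at x = (1, ..., 1), which lies inside the box [0, 5]^6.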
These functions are simple to state and to implement. They are also fast to evaluate. The latter feature still does not make testing on a single machine easy, as a kriging-based optimization may take hours even when applied to create only one hundred generations. However, the use of the ProActive PACA Grid cloud [3] provides the possibility to test the algorithms with different initial conditions at once.
The optimization quality will be assessed by using the normalized real improvement (NRI), defined as

    NRI(generation) = (f_0 − f_min(generation)) / (f_0 − f_true).   (10)

Here f_0 is the smallest value of the cost function achieved on the initial DOE, which is created by using Latin Hypercube Sampling (LHS), f_min denotes the value achieved after a particular generation of points is evaluated, and f_true is the true ideal minimal value, which is given in Table 4.
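Eq. (10) applied along a whole run can be sketched as follows (a trivial transcription of ours; `n_doe` marks how many of the recorded cost values belong to the initial DOE):

```python
def nri_path(observations, n_doe, f_true):
    """Normalized real improvement, Eq. (10), after each post-DOE evaluation.
    `observations`: cost values in evaluation order; the first `n_doe` of them
    form the initial DOE, whose best value defines f_0."""
    f0 = min(observations[:n_doe])
    path, f_min = [], f0
    for f in observations[n_doe:]:
        f_min = min(f_min, f)                 # running best value so far
        path.append((f0 - f_min) / (f0 - f_true))
    return path
```

By construction the path starts at 0 (no improvement over the DOE) and reaches 1 exactly when f_true is attained.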
Also, it is useful to summarize the performance of the various algorithms by defining their speed-up, such as

    S_0(NRI) ≡ (time to reach NRI by EI_{0,1}) / (time to reach NRI by EI_{0,λ}).   (11)

Here the reference algorithm is kriging with one-point improvements, and the speed-up is defined for the kriging-based optimization with λ ≥ 1 points.
points.Inorder to takeinto a ount the blo king time, onedenes thereal-time speed-up of the multi-point
algorithmoveritssinglepoint ounterparta ordingto
S
1
(
NRI) =
S
0
(
NRI)
RTF= S
0
×
WCTforthealgorithmEI
0,1
WCTforthealgorithmEI
0,λ
.
(12)HereRTFisareal timefa tor whi histheratioofthe orrespondingwall lo ktimes. The orresponding
riteriaforthedomain de ompositionaredened similarly. TheWCTvaluesofallthealgorithms thatare
testedwiththesyn hronousnodea essaregiveninTable5.
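Under this reading of (11)–(12), the real-time speed-up is the generation-wise speed-up divided by the real time factor. A sketch, using values from Table 5 as an illustration:

```python
def real_time_speedup(s0, wct_ref, wct_alg):
    """S_1 = S_0 / RTF, with RTF = wct_alg / wct_ref as in Eq. (12)."""
    rtf = wct_alg / wct_ref
    return s0 / rtf

# e.g. EI_{0,4} on "michalewicz2d": S_0 = 3.7, WCTs 22 vs. 28, hence S_1 close to 2.9
```

This reproduces, up to rounding, the S_1 columns of Table 5.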
3.4 Results

The optimization results are shown in Figs. 9 and 10. One can see that the optimization paths vary a lot w.r.t. the initial DOE, but this effect is less pronounced in the problem "rank1approx9d". In a space with a large number of dimensions it is harder to generate an initial DOE which contains points close to the global optimum. The problem "rosenbrock6d" seems to be easy, and its solution is closer to the "michalewicz2d" problem than to the "rank1approx9d" case. In the former two cases the approach to the optimum is much faster.
The values of the speed-up S_0 are compared in Table 5. One can see that parallelization brings notable improvements when solving the problems "michalewicz2d" and "rosenbrock6d", but the gain is very small for the problem "rank1approx9d". The latter point becomes especially strong if we consider the speed-up S_1, which takes into account the real time factor.

¹ The actual matrix is generated with the Scilab 5.3.3 "grand" function. The Mersenne Twister is applied with an initial seed set to the number 29.
Table 5: Wall clock times, real time factors, and speed-ups of synchronous optimization, NRI = 0.8. Parameters: t_min = 10, t_max = 30, t_b = 2.

                      WCT   RTF    "michalewicz2d"   "rosenbrock6d"   "rank1approx9d"
                                    S_0     S_1       S_0     S_1      S_0     S_1
EI_{0,1}               22    1      1       1         1       1        1       1
EI_{0,4}               28    1.3    3.7     2.9       2.7     2.1      1.3     1.0
EI_{0,1} + decomp.     30    1.4    2.4     1.8       2.1     1.5      0.70    0.50
EI_{0,4} + decomp.     30    1.4    4.6     3.4       4.5     3.2      1.2     0.86
One finds out that domain decomposition is about as good as the use of multi-point improvements. Both parallel optimization methods can also be combined to yield an even greater performance. However, none of the parallelization methods are worth the effort on the "rank1approx9d" problem. Considering the domain decomposition, perhaps this is not very hard to explain: in a high-dimensional space, i.e., d = 9, halving the first five coordinates can make the kriging algorithms less explorative (global).

A good use of domain decomposition seems to be a quick assessment of the multimodality of the cost function. Figure 11 shows one out of the one hundred optimizations in full detail. One can see that some of the optimizations reach very high NRI values, indicating that the corresponding subdomain may contain the global optimum.
3.5 Conclusions

The use of multi-point improvements (λ = 4) brings notable speed-ups to the problems "michalewicz2d" and "rosenbrock6d". However, the algorithm is not as efficient as the baseline EI_{0,1} method in the case of "rank1approx9d". This is most likely due to the increased dimensionality of the problem, although additional tests would be necessary to study whether this could also be an effect of the functional landscape. The same applies to the domain decomposition. Considering the "michalewicz2d" and "rosenbrock6d" problems, running the EI_{0,1} algorithm with 32 subdomains is better than using EI_{0,4} without any domain decomposition. Both methods have been combined to gain an additive effect on the overall speed-up. However, neither domain decomposition nor multi-point improvements provide an advantage over a single-point EI in the "rank1approx9d" problem.

The algorithms with multi-point improvements fundamentally cannot scale well because they internally involve a maximization of the joint improvement in d × λ dimensions. In order to make further progress, it seems that one could: (i) either increase the λ value dramatically (to tens and hundreds of points) by making substantial sacrifices in the quality of the improvement maximization, or (ii) resort to the asynchronous node access, which may reduce the wall clock times. The second way seems to be more viable and will further be explored in the next section.
[Figures 9 and 10: NRI vs. generation (log scale, 10^0 to 10^2) for the problems "michalewicz2d", "rosenbrock6d", and "rank1approx9d". The panels compare mu=0 with lambda=1 and lambda=4 (sync), with and without domain decomposition ("decom"), showing the best and the averaged ("av") paths.]

[Figure 11 plots: NRI vs. generation (0 to 400) for mu=0, lambda=1, sync, decom; one path per subdomain for each of the test problems, including "rank1approx9d".]
Figure 11: Optimization with domain decomposition allows one to detect the presence of multimodality. Here each single optimization path corresponds to the optimization in a different subdomain. There are 32 subdomains which completely cover the original optimization domain. One infers that the "michalewicz2d" criterion is multimodal, while the "rosenbrock6d" and "rank1approx9d" problems are clearly unimodal and bimodal, respectively.
4.1 Asynchronous Model

Let m be the number of nodes, i.e. the number of virtual machines (computers) available on a remote cloud to evaluate an expensive function. Let the average time of the function evaluation be distributed uniformly in the interval (t_min, t_max), and suppose that the access to the cloud is possible every time λ nodes provide a result. Typically, λ ≪ m, such as λ = 1, 2, 3, 4 while m = 32. Let t_b be the blocking time, which is the time it takes to calculate and send λ new arguments to update the free nodes.

We will show that the wall clock time can be reduced to the blocking time by simply increasing the number of nodes m. Moreover, it turns out that the decrease of the WCT value w.r.t. m is hyperbolic, and its variance becomes negligible with an increasing value of m.

In order to show that this is possible, let us introduce an asynchronous access model. Let T be the set of m elements t_i, which are the real numbers indicating the time it takes to evaluate an expensive function. The node update time can then be computed by using these steps:

1. Find the λ smallest elements of T (not necessarily distinct), and create the set S out of them:

       S = {t_{i_1}, t_{i_2}, ..., t_{i_λ}}.   (13)

2. Find the largest element in S, and call it the computation time t_c:

       t_c = max S.   (14)

3. Compute the update time

       t_u = t_b + t_c.   (15)

4. Form the set M = T \ S, and map every element t of M according to

       t ↦ max(0, t − t_u).   (16)

5. Update the set T:

       T = M ∪ S.   (17)

The process of the node update with the asynchronous buffer model is shown in Fig. 12. The following three rules are enforced here:
1. The falling front indicates that the node becomes available.
2. It takes one time unit to update the node.
3. In case more than one node is available at the access time, the faster node is preferred.
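Steps (13)–(17) can be simulated directly. The sketch below is our Python rendition (the report's own implementation is the Scilab listing in the Appendix); we read step 5 as refilling the freed nodes with fresh evaluation times drawn from U(t_min, t_max):

```python
import random

def simulate_wct(m, lam, t_min=10.0, t_max=30.0, t_b=2.0,
                 n_updates=2000, seed=0):
    """Average node update time t_u for the asynchronous model:
    take the lam earliest-finishing nodes (step 1), charge t_b plus the
    largest of their remaining times (steps 2-3), age the busy nodes
    (step 4), and restart the freed nodes (step 5, fresh draws assumed)."""
    rng = random.Random(seed)
    T = [rng.uniform(t_min, t_max) for _ in range(m)]   # remaining times
    total = 0.0
    for _ in range(n_updates):
        T.sort()
        S, M = T[:lam], T[lam:]                  # step 1: lam smallest elements
        t_u = t_b + S[-1]                        # steps 2-3: t_u = t_b + max S
        M = [max(0.0, t - t_u) for t in M]       # step 4: age the busy nodes
        T = M + [rng.uniform(t_min, t_max) for _ in S]   # step 5 (our reading)
        total += t_u
    return total / n_updates
```

With m = 32 the average t_u sits just above t_b, which is the effect the model is meant to exhibit.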
The initial set T models the actual computational times of the expensive evaluations. The simplest adequate model so far seems to be the uniform distribution with a finite support given by t_min and t_max. The motivation behind this choice is the analysis of the data which we have gathered during the simulation of the expensive-to-evaluate functions; the latter have been chosen to be the kriging-based optimization processes themselves.

Figure 13 indicates the distributions of the times that nodes demand to evaluate an expensive function on the ProActive PACA Grid cloud [3]. Here the expensive-to-evaluate functions are complete budgeted optimizations of inexpensive functions whose evaluation takes only microseconds to complete. One can see that the heterogeneous nature of the cloud is such that t_max = O(t_min).
[Figure 12 timing diagrams: synchronous mode with λ = 1 (t_u = 3), synchronous mode with λ = 4 (t_u = 6), and asynchronous mode with λ = 4 (t_u mostly one to two time units).]

Figure 12: Advantages of the asynchronous node access. In the synchronous case with λ = 1, WCT = 3. Adding three slower Nodes 2–4 allows four simultaneous evaluations, but the wall clock time will be determined by the worst node. However, the asynchronous access reduces the t_u values to t_b for the majority of the expensive function evaluations.

4.2 Computational Analysis of Wall Clock Time
The wall clock time can be computed by performing the five steps indicated above. They need to be repeated as many times as the number of λ-generations demands, also repeating the runs with different initial sets T. The Scilab code of a single run is provided in the Appendix (sect:listingwctasync), where "busz" stands for m, and "lamb" for λ.

Fig. 14 indicates how the WCT value decreases w.r.t. an increasing value of m. One can see that when m is large enough, the WCT values become sharply concentrated at the t_b value.

The WCT values decrease roughly as O(m^(−1)). A more precise rule that fits the data presented in Fig. 14 is O(m^(−1−α)), where

    α ≈ (3 t_b / t_min) (λ − 1).   (18)

Notice that t_max is not present in the equation.
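For the parameters used throughout (t_b = 2, t_min = 10), the empirical decay exponent of Eq. (18) is easy to tabulate (a plain transcription of the formula, assuming our reconstruction of (18) is correct):

```python
def wct_decay_exponent(lam, t_b=2.0, t_min=10.0):
    """Exponent 1 + alpha of the empirical decay WCT = O(m ** -(1 + alpha)),
    with alpha = (3 * t_b / t_min) * (lam - 1) from Eq. (18)."""
    alpha = 3.0 * t_b / t_min * (lam - 1)
    return 1.0 + alpha

# lam = 1 gives exponent 1.0 (pure hyperbolic decay); lam = 4 gives 2.8
```

Consistently with the text, λ = 1 recovers the plain O(m^(−1)) behavior.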
The setting that matches the ProActive PACA Grid cloud best is the one with t_min = 10 and t_max = 30. When m = 32, this allows one to update λ = 4 nodes with a wall clock time approaching t_b. The relevant WCT values are shown in Table 6. For comparison, we have also presented there the corresponding statistics of the synchronous simulation. As one can see in Table 6, the reduction of the WCT value due to the asynchronous simulation seems to be impressive. So what exactly is optimization of an expensive-to-evaluate function? The practical function
Table 6: Mean and deviation of the node update time t_u for different algorithms. Parameters: t_min = 10, t_max = 30, t_b = 2. Averaging is performed with 25·10^4 points.

Asynchronous    m    λ    Mean(WCT)   Deviation
True           32    1    2.04        0.0024
True           32    4    2.77        0.13
False           0    1    22.0        5.77
False           0    4    28.0        3.27
Table 7: Wall clock times, real time factors, and speed-ups of asynchronous optimization compared to the synchronous case, NRI = 0.8. Parameters: t_min = 10, t_max = 30, t_b = 2.

                    WCT    RTF     "rank1approx9d"
                                    S_0     S_1
EI_{0,4} sync       28     1        1       1
EI_{0,4} async      2.77   0.099    0.42    4.2
EI_{28,4} async     2.77   0.099    0.56    5.7
4.3 Testing Asynchronous Algorithms

Asynchronous algorithms are expected to reduce the speed of the evolution of the optimization path towards the optimum w.r.t. the number of generations. The reason is that a direct use of the multi-point improvement criterion does not exclude the possibility of generating duplicate points. One example of the appearance of duplicate points is illustrated in Fig. 15.

As a consequence, the evolution paths of the optimization might tend to have more jump discontinuities when the criterion EI_{0,λ} is employed in the asynchronous setting. A direct remedy is to utilize the full criterion EI_{µ,λ}, where the µ points correspond to the candidate locations whose expensive function values are being actively evaluated, but are not known at the time when a request comes to send a new candidate for evaluation. Eq. (7) states that including the active points x_{1:µ} in the target part of the EI criterion prevents the algorithm from resampling there [21]. It can be seen that if the new λ points form a subset of the µ active points, then EI_{µ,λ} will be zero. More generally, EI_{µ,λ} decreases as some of the new λ search points get closer to the active points [21].
sear h pointsget loserto a tive points[21℄.Theappli ationofthesyn hrononousalgorithmwiththeEI
0,4
riterion,aswellasthetwo orresponding
asyn hronousalgorithms,to the"rank1approx9d"problemissummarizedin Fig.16.
One an see that the asyn hronous algorithm with the EI
0,4
riterion is inferior to its syn hronous
ounterpart, but the in lusion of
µ = 28
a tivepoints improves thealgorithm. Still, theEI28,4
algorithm
makesaslowerprogressw.r.t. thenumberofgenerations. Whiledupli atesarenotthemajorissueanymore,
one an noti e that a syn hronous algorithm always uses a omplete information, i.e. both, the lo ation,
and theexpensivefun tion value,whiletheasyn hronous aseonly ex ludestheappearan e ofdupli ates,
butitwill oftendoit"blindly"withoutanavailablefun tion value.
Theexamplesofthespeed-upvaluesare providedinTable7.
The
S
0
values indi ate that asyn hronous algorithms anmake theprogress w.r.t. generations slower (2x) than the orresponding syn hronous ases, but the real time fa tor is ru ial and may result in anasyn hronousalgorithmwhi hruns vetimesfasterin arealtime.
Optimization paths of the asynchronous algorithms are compared with the synchronous cases in Fig. 17. The corresponding means and deviations are shown in Fig. 18. The results indicate that the optimization paths increase more slowly w.r.t. the number of generations when the algorithms are asynchronous. However, one must