
A survey of rare event simulation methods for static input–output models


HAL Id: hal-01081888

https://hal.archives-ouvertes.fr/hal-01081888

Submitted on 12 Nov 2014



Jérôme Morio, Mathieu Balesdent, Damien Jacquemart, Christelle Vergé

To cite this version:

Jérôme Morio, Mathieu Balesdent, Damien Jacquemart, Christelle Vergé. A survey of rare event simulation methods for static input–output models. Simulation Modelling Practice and Theory, Elsevier, 2014, 49, pp.287-304. ⟨10.1016/j.simpat.2014.10.007⟩. ⟨hal-01081888⟩


Jérôme Morio (1,*), Mathieu Balesdent (2), Damien Jacquemart (2,3), Christelle Vergé (2,4,5)

Abstract

Crude Monte-Carlo or quasi Monte-Carlo methods are well suited to characterize events whose associated probabilities are not too low with respect to the simulation budget. For very seldom observed events, such as the collision probability between two aircraft in airspace, these approaches do not lead to accurate results. Indeed, the number of available samples is often insufficient to estimate such low probabilities (at least $10^6$ samples are needed to estimate a probability of order $10^{-4}$ with 10% relative error with Monte-Carlo simulations). In this article, we review different appropriate techniques to estimate rare event probabilities that require fewer samples. These methods can be divided into four main categories: parameterization techniques of probability density function tails, simulation techniques such as importance sampling or importance splitting, geometric methods to approximate the input failure space and, finally, surrogate modelling. Each technique is detailed, its advantages and drawbacks are described, and a synthesis that aims at giving some clues to the following question is given: "which technique to use for which problem?".

Keywords: Monte-Carlo methods, Rare event, Input-output model, Simulation

* Corresponding author

Email addresses: jerome.morio@onera.fr (Jérôme Morio), mathieu.balesdent@onera.fr (Mathieu Balesdent), damien.jacquemart@onera.fr (Damien Jacquemart), christelle.verge@onera.fr (Christelle Vergé).

1 Onera - The French Aerospace Lab, BP 74025, 31055 Toulouse Cedex, France. Tel.: +33 5 62 25 26 63

2 Onera - The French Aerospace Lab, BP 80100, 91123 Palaiseau Cedex, France

3 INRIA Rennes, ASPI Applications of interacting particle systems to statistics, campus de Beaulieu, 35042 Rennes, France

4 INRIA Bordeaux, 351 cours de la Libération, 33405 Talence Cedex, France

5 CNES, 18 avenue Edouard Belin, 31401 Toulouse Cedex 9, France

1. Introduction

Rare event estimation has become a large area of research in the reliability engineering and system safety domains. A significant number of methods has been proposed to reduce the computation burden for the estimation of rare events, from sampling to extreme value theory. However, it is often difficult to determine which algorithm is the most adapted to a given problem. Moreover, the existing survey articles on rare events are often focused on specific algorithms [1–3]. The novelties of this article are thus to provide a broad view of the currently available techniques to estimate rare event probabilities, described with a unified notation, and to provide some clues to answer this question: which rare event technique is the most adapted to a given situation?

The general problem considered in this article is analysed in a first section and then all the different methods are described separately. Their advantages and drawbacks are also given. Finally, a synthesis helps the reader to determine the most appropriate method for a given rare event estimation problem.

Let us consider a $d$-dimensional random vector $X$ with a probability density function (PDF) $h_0$, $\phi$ a continuous positive scalar function $\phi : \mathbb{R}^d \to \mathbb{R}$ and $S$ a threshold. The different components of $X$ will be denoted $X = (X_1, X_2, ..., X_d)$ in the following. The function $\phi$ is static, i.e., does not depend on time, and represents for instance an input-output model. This kind of model is notably used in numerous engineering applications [4–9]. We assume that the output $Y = \phi(X)$ is a scalar random variable. In this article, we propose to review different algorithms that can be efficient to estimate the probability $P = P(\phi(X) > S)$ when this quantity is rare relative to the available simulation budget $N$, that is when $P < \frac{1}{N}$. For the sake of conciseness, the issue of extreme quantile estimation is not addressed even if the vast majority of the methods that are presented in the paper can be adapted to this specific case. The case of dynamic systems modeled with Markov chains is also not considered in this paper. Specific algorithm extensions for large complex systems modelled by a network or a coherent fault tree are completely detailed in [10] and will not be much developed here. It corresponds to the case where the inputs $X_i$, $i = 1, ..., d$ follow a Bernoulli distribution and the output is equivalent to an indicator function.

2. Monte-Carlo methods

A simple way to estimate a probability is to consider crude Monte-Carlo (CMC) [11–16]. For that purpose, one generates $N$ independent and identically distributed (i.i.d.) samples $X_1, ..., X_N$ from the PDF $h_0$ and computes their outputs with the function $\phi$: $\phi(X_1), ..., \phi(X_N)$. The probability $P(\phi(X) > S)$, also called failure probability, is then estimated with

$$\hat{P}_{CMC} = \frac{1}{N} \sum_{i=1}^{N} 1_{\phi(X_i) > S}, \tag{1}$$

where $1_{\phi(X_i) > S}$ is equal to 1 if $\phi(X_i) > S$ and 0 otherwise. This estimate converges to the true probability, as shown by the law of large numbers [13]. The positive and negative aspects of CMC are described in Table 1.

| Advantages | Drawbacks |
|---|---|
| Simple implementation | Slow convergence |
| Information on $\phi$ not needed | Significant simulation budget for rare events |
| No bias | |

Table 1. Advantages and drawbacks of CMC methods.

A possible indicator of the estimation efficiency is notably its relative deviation. The relative deviation or relative error RE of an estimator $\hat{P}$ of $P$ is given by the following ratio:

$$RE(\hat{P}) = \frac{\sigma_{\hat{P}}}{E(\hat{P})}, \tag{2}$$

with $\sigma_{\hat{P}}$ the standard deviation of $\hat{P}$ and $E$ the mathematical expectation. The relative error is said to be bounded when $RE(\hat{P})$ remains bounded when $P \to 0$ [17,18]. In that case, the number of samples needed to get a specified relative error is bounded whatever the rarity of $\phi(X) > S$. The logarithmic efficiency LE can also be defined: an unbiased estimator $\hat{P}$ is logarithmically efficient when [17,18]

$$LE(\hat{P}) = \lim_{P \to 0} \frac{\log(E(\hat{P}^2))}{\log(P)} = 2. \tag{3}$$

Logarithmic efficiency is a necessary but not sufficient condition for bounded relative error. Characterizing the rare event probability estimate with these concepts is very important even if they are often difficult to verify in practice.
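As a worked check of these definitions, the short derivation below applies the criterion of Eq. 3 to the crude Monte-Carlo estimator of Eq. 1. It is a sketch added for illustration, not taken from [17,18]; it assumes a fixed budget $N$ and uses only the fact that $N \hat{P}_{CMC}$ follows a binomial distribution with parameters $N$ and $P$.

```latex
\begin{align*}
E(\hat{P}_{CMC}^2) &= \mathrm{Var}(\hat{P}_{CMC}) + E(\hat{P}_{CMC})^2
                    = \frac{P(1-P)}{N} + P^2, \\
\lim_{P \to 0} \frac{\log E(\hat{P}_{CMC}^2)}{\log P}
  &= \lim_{P \to 0} \frac{\log P + \log\left(\frac{1-P}{N} + P\right)}{\log P} = 1 .
\end{align*}
```

Under these assumptions the limit is 1 rather than 2, so CMC is not logarithmically efficient, which is consistent with the unbounded relative error of CMC obtained in Eq. 5.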

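Since $\hat{P}_{CMC}$ is unbiased, the relative error of the estimator $\hat{P}_{CMC}$ is given by the ratio $\sigma_{\hat{P}_{CMC}} / P$, with $\sigma_{\hat{P}_{CMC}}$ the standard deviation of $\hat{P}_{CMC}$. Knowing the true probability $P$ of the event $(\phi(X) > S)$, one has [11,19]

$$\frac{\sigma_{\hat{P}_{CMC}}}{P} = \frac{1}{\sqrt{N}} \frac{\sqrt{P - P^2}}{P}. \tag{4}$$

Considering rare event probability estimation, that is when $P$ takes low values, one obtains

$$\lim_{P \to 0} \frac{\sigma_{\hat{P}_{CMC}}}{P} = \lim_{P \to 0} \frac{1}{\sqrt{N P}} = +\infty. \tag{5}$$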

The relative deviation is consequently unbounded. For instance, to estimate a probability $P$ of order $10^{-4}$ with a 10% relative deviation, at least $10^6$ samples are required. The simulation budget is thus an issue when the computation time required to obtain a sample $\phi(X_i)$ is not negligible. CMC is thus not adapted to rare event estimation and a wide collection of statistical and simulation methods has been developed. The following sections describe the different available alternatives to CMC to improve probability estimations, i.e., to reduce the number of required samples, increase the estimation accuracy, and thus decrease $RE(\hat{P})$.
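The CMC estimator of Eq. 1 and the sample-size argument drawn from Eq. 4 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the toy model `toy_phi`, the input sampler `toy_input` and all function names are hypothetical, and the input PDF $h_0$ is assumed to be a pair of independent standard normal variables.

```python
import math
import random


def cmc_probability(phi, sample_input, threshold, n_samples, rng):
    """Crude Monte-Carlo estimate of P(phi(X) > threshold), as in Eq. 1."""
    hits = sum(1 for _ in range(n_samples) if phi(sample_input(rng)) > threshold)
    return hits / n_samples


def cmc_relative_error(p, n_samples):
    """Relative error of the CMC estimator for a known probability p (Eq. 4)."""
    return math.sqrt((1.0 - p) / (n_samples * p))


def cmc_required_samples(p, target_re):
    """Smallest N for which the relative error of Eq. 4 is at most target_re."""
    return math.ceil((1.0 - p) / (p * target_re ** 2))


# Hypothetical toy model (an assumption, not from the article):
# phi(x) = x1 + x2 with independent standard normal inputs.
def toy_phi(x):
    return x[0] + x[1]


def toy_input(rng):
    return (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


rng = random.Random(0)
p_hat = cmc_probability(toy_phi, toy_input, threshold=2.0, n_samples=20_000, rng=rng)
print("CMC estimate of P(phi(X) > 2):", p_hat)
print("samples needed for P = 1e-4 and RE = 10%:", cmc_required_samples(1e-4, 0.10))
```

Solving Eq. 4 for $N$ gives $N = (1 - P) / (P \, RE^2)$, which is close to $10^6$ for $P = 10^{-4}$ and a 10% relative error, in line with the figure quoted in the text.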

3. Statistical techniques

Statistical techniques make it possible to derive a probability estimate and associated confidence intervals with a fixed set of samples $\phi(X_1), ..., \phi(X_N)$. The main statistical approaches, extreme value theory and large deviation theory, model the behaviour of the PDF tails. Let us review their theoretical foundations.

3.1. Extreme value theory

Extreme value theory (EVT) [20,21] characterizes the distribution tails of a random variable, based on a reasonable number of observations. Thanks to its general applicative conditions, this theory has been widely used for describing extreme meteorological phenomena with applications such as hydrology [22] and snowfall [23], but also in finance and insurance [20,24], and engineering [25].

3.1.1. Law of sample maxima

EVT is notably very useful when one has to work with only a fixed set of data. One consequently assumes in the following that a finite set of i.i.d. samples $\phi(X_1), ..., \phi(X_N)$ of the output is available, but also that one cannot generate new samples of $\phi(X)$. The associated ordered sample set is defined with $\phi(X_{(1)}) \le \phi(X_{(2)}) \le ... \le \phi(X_{(N)})$. EVT makes it possible to estimate, for some threshold $S$, the probability $P(\phi(X) > S)$.

The founding theorem of EVT [20,26,27] is that, under some conditions, the maxima of an i.i.d. sequence converge to a generalized extreme value (GEV) distribution $G_\xi$, which admits the following cumulative distribution function (CDF)

$$G_\xi(x) = \begin{cases} \exp(-\exp(-x)), & \text{for } \xi = 0, \\ \exp\left(-(1 + \xi x)^{-\frac{1}{\xi}}\right), & \text{for } \xi \neq 0. \end{cases} \tag{6}$$

The set of GEV distributions is composed of three distinct types, characterized by $\xi = 0$, $\xi > 0$ and $\xi < 0$, that correspond to the Gumbel, Fréchet and Weibull distributions respectively. Let us define $G$, the CDF of the i.i.d. samples $\phi(X_1), ..., \phi(X_N)$.

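Theorem 3.1. Suppose there exist $a_N$ and $b_N$, with $a_N > 0$, such that, for all $y \in \mathbb{R}$,

$$P\left(\frac{\phi(X_{(N)}) - b_N}{a_N} \le y\right) = G^N(a_N y + b_N) \xrightarrow[N \to \infty]{} G_\infty(y),$$

where $G_\infty$ is a non-degenerate CDF; then $G_\infty$ is a GEV distribution $G_\xi$. In this case, one denotes $G \in MDA(\xi)$ (MDA = maximum domain of attraction).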

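The sequences $a_N$ and $b_N$ are computed in [20] for most well-known PDFs. An approximation of $P(\phi(X) > S)$ [20] for large values of $S$ and $N$ can also be obtained:

$$\hat{P}_{EVT}(\phi(X) > S) \approx \frac{1}{N} \left(1 + \xi \, \frac{S - b_N}{a_N}\right)^{-\frac{1}{\xi}}. \tag{7}$$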

The GEV approach is notably used when only samples of maxima are available. In that case, the different parameters of the GEV distribution are obtained by determining maximum likelihood or probability weighted moment estimators. When samples of maxima are not available, it is required to group the samples $\phi(X_1), ..., \phi(X_N)$ into blocks and fit the GEV using the maximum of each block (block maxima method). The main difficulty is to determine an efficient sample size for the different blocks.
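The convergence stated in Theorem 3.1 and the tail approximation of Eq. 7 can be checked numerically on a case where everything is known in closed form. The sketch below assumes, purely for illustration, that the output $\phi(X)$ follows a unit exponential distribution, for which the normalizing sequences $a_N = 1$ and $b_N = \ln N$ lead to the Gumbel case $\xi = 0$; the function names are hypothetical.

```python
import math


def gev_cdf(x, xi):
    """Standardized GEV CDF of Eq. 6."""
    if xi == 0.0:
        return math.exp(-math.exp(-x))
    t = 1.0 + xi * x
    if t <= 0.0:
        # Outside the support: 0 below the lower endpoint (xi > 0),
        # 1 above the upper endpoint (xi < 0).
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def evt_tail_probability(s, n, xi, a_n, b_n):
    """Tail approximation of Eq. 7 for P(phi(X) > s), meant for large s."""
    z = (s - b_n) / a_n
    if xi == 0.0:
        return math.exp(-z) / n
    t = 1.0 + xi * z
    if t <= 0.0:
        if xi < 0.0:
            return 0.0  # s lies beyond the finite upper endpoint
        raise ValueError("s is below the range where Eq. 7 applies")
    return t ** (-1.0 / xi) / n


def exp_max_cdf(y, n):
    """Exact CDF G^N(a_N y + b_N) for n i.i.d. Exp(1) samples (assumed output law)."""
    x = y + math.log(n)  # a_N = 1, b_N = ln(N)
    return (1.0 - math.exp(-x)) ** n if x > 0.0 else 0.0


for n in (10, 1_000, 100_000):
    gap = abs(exp_max_cdf(1.0, n) - gev_cdf(1.0, 0.0))
    print(f"N = {n:>6}: |G^N(a_N y + b_N) - G_0(y)| at y = 1 is {gap:.2e}")

n = 1_000
print("Eq. 7 at S = 12:", evt_tail_probability(12.0, n, 0.0, 1.0, math.log(n)))
print("exact tail     :", math.exp(-12.0))
```

For this particular output law Eq. 7 reduces to $\exp(-S)$, the exact tail probability, which makes it a convenient sanity check before fitting $(\xi, a_N, b_N)$ on real block maxima.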

3.1.2. Peak over threshold approach

Instead of grouping the samples into block maxima, the peak over threshold (POT) approach considers the largest samples $\phi(X_i)$ to estimate the probability $P(\phi(X) > S)$. [...] to characterize the distribution of samples above a threshold $u$, which is given by the generalized Pareto CDF. An alternative is to use a Poisson point process which counts the number of threshold exceedances. This approach is not developed in this article, but one can refer to [27] for more details. The first paper linking the EVT with the distribution of a threshold exceedance is [28]. Later, De Haan obtained a result of the same type, with a slightly simplified conclusion, using slowly varying functions [29]. The following theorem [20] can then be obtained:

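Theorem 3.2. Let us assume that the distribution function $G$ of i.i.d. samples $\phi(X_1), ..., \phi(X_N)$ is continuous. Set $y^* = \sup\{y, G(y) < 1\} = \inf\{y, G(y) = 1\}$. Then, the two following assertions are equivalent:

(i) $G \in MDA(\xi)$,

(ii) there exists a positive and measurable function $u \mapsto \beta(u)$ such that

$$\lim_{u \to y^*} \; \sup_{0 < y < y^* - u} \left| G_u(y) - H_{\xi,\beta(u)}(y) \right| = 0,$$

where $G_u(y) = P(\phi(X) - u \le y \mid \phi(X) > u)$, and $H_{\xi,\beta(u)}$ is the CDF of a generalized Pareto distribution (GPD) with shape parameter $\xi$ and scale parameter $\beta(u)$.

The expression of the GPD distribution function is the following

$$H_{\xi,\beta}(x) = \begin{cases} 1 - \exp\left(-\frac{x}{\beta}\right), & \text{for } \xi = 0, \\ 1 - \left(1 + \frac{\xi x}{\beta}\right)^{-1/\xi}, & \text{for } \xi \neq 0. \end{cases} \tag{8}$$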

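This theorem is in fact useful to estimate a probability of exceedance. Indeed, the probability $P(\phi(X) > S)$ can be rewritten as

$$P(\phi(X) > S) = P(\phi(X) > S \mid \phi(X) > u) \, P(\phi(X) > u), \tag{9}$$

for $S > u$. A natural estimate of $P(\phi(X) > u)$ is given by

$$\hat{P}_{CMC}(\phi(X) > u) = \frac{1}{N} \sum_{i=1}^{N} 1_{\phi(X_i) > u}. \tag{10}$$

With Theorem 3.2 and for a sufficiently high value of $u$, one obtains

$$\hat{P}(\phi(X) > S \mid \phi(X) > u) = 1 - H_{\xi,\beta(u)}(S - u). \tag{11}$$

The estimate of $P(\phi(X) > S)$ is then built with

$$\hat{P}_{POT}(\phi(X) > S) = \left(\frac{1}{N} \sum_{i=1}^{N} 1_{\phi(X_i) > u}\right) \times \left(1 - H_{\xi,\beta(u)}(S - u)\right). \tag{12}$$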

The mathematical justification of Eq. 11 and Eq. 12 is notably discussed in [21], [30], [31], or [32] for a given set of samples to determine if this set is suitable for the application of POT. Three parameters have to be determined in the POT probability estimate of Eq. 12: the threshold $u$ and the couple $(\xi, \beta(u))$. The choice of $u$ is very influential since it determines the samples that are used in the estimation of $(\xi, \beta(u))$. Indeed, a high threshold leaves only a small number of samples for the estimation of $(\xi, \beta(u))$, so that their estimates can be spoiled by a large variance, whereas a low threshold introduces a bias in the probability estimate [33]. There are several methods to determine a suitable threshold $u$ knowing the samples. The most well-known ones are the Hill plot and the mean excess plot [20]. These methods are nevertheless very empirical since they are based on graphical interpretation. It is often necessary in practice to compare the estimates of $u$ given by the different methods. Once the value of $u$ is set, the parameters $(\xi, \beta(u))$ are often estimated by maximum likelihood [34] or, more occasionally, by the method of moments [35]. The estimate $\hat{P}_{POT}(\phi(X) > S)$ given in Eq. 12 for $S > u$ is then completely defined. A review of these different methods can be found in [36]. It is not possible, to our knowledge, to control the probability error estimate in EVT. Nevertheless, the use of bootstrap on samples $\phi(X_1), ..., \phi(X_N)$ [37] can give some information on the efficiency of EVT.

| Advantages | Drawbacks |
|---|---|
| No need to resample | Complex estimation of the adequate parameters $(u, \xi, \beta(u))$ or of the block maxima size |
| Can be applied with a relatively low value of $N$ | Less efficient than simulation methods when resampling is possible |

Table 2. Advantages and drawbacks of EVT.
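The POT estimate of Eq. 12 can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure of any cited reference: the threshold $u$ is simply given, $(\xi, \beta(u))$ are fitted with the standard method-of-moments formulas for the GPD (which require $\xi < 1/2$) rather than maximum likelihood, and the output samples are drawn from a unit exponential distribution so that the exact answer is known. All function names are hypothetical.

```python
import math
import random


def gpd_survival(x, xi, beta):
    """1 - H_{xi,beta}(x) for the GPD of Eq. 8, with x >= 0."""
    if abs(xi) < 1e-12:
        return math.exp(-x / beta)
    t = 1.0 + xi * x / beta
    # t <= 0 only happens for xi < 0, beyond the finite upper endpoint.
    return t ** (-1.0 / xi) if t > 0.0 else 0.0


def fit_gpd_moments(excesses):
    """Method-of-moments estimates (xi, beta); only meaningful when xi < 1/2."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((e - mean) ** 2 for e in excesses) / (n - 1)
    ratio = mean * mean / var
    return 0.5 * (1.0 - ratio), 0.5 * mean * (1.0 + ratio)


def pot_probability(samples, u, s):
    """POT estimate of P(phi(X) > s) following Eq. 12, for s > u."""
    excesses = [y - u for y in samples if y > u]
    xi, beta = fit_gpd_moments(excesses)
    p_u = len(excesses) / len(samples)          # Eq. 10
    return p_u * gpd_survival(s - u, xi, beta)  # Eq. 11 combined as in Eq. 12


# Assumed stand-in for the outputs phi(X_1), ..., phi(X_N): Exp(1) samples.
rng = random.Random(1)
outputs = [rng.expovariate(1.0) for _ in range(20_000)]
print("POT estimate of P(phi(X) > 8):", pot_probability(outputs, u=3.0, s=8.0))
print("exact value for Exp(1)       :", math.exp(-8.0))
```

Re-running the last lines with several values of `u` gives a quick feel for the bias and variance trade-off discussed above: a higher threshold leaves fewer excesses for the fit, while a lower one stretches the GPD approximation.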

3.1.3. Block maxima versus POT

The POT method takes into account all relevant high samples among $\phi(X_1), ..., \phi(X_N)$, whereas the block maxima method can miss some of these high samples and, at the same time, consider some lower samples in its probability estimation. Thus, POT seems to be more appropriate for modelling the tail of the sample PDF. Nevertheless, the block maxima method is preferable when the available samples are not exactly i.i.d. or when only samples of maxima are available. For instance, the samples of a monthly river maximum height correspond to this situation. Finally, the tuning of the block maxima size turns out to be easier than the tuning of the POT threshold $u$ in many situations [38]. The advantages and drawbacks of EVT are presented in Table 2.
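The difference between the two sample selections can be seen on a small hand-made data set. The values below are hypothetical and chosen so that the two largest outputs fall in the same block; the helper names are not from the article.

```python
def block_maxima(samples, block_size):
    """Maximum of each consecutive full block of block_size samples."""
    n_blocks = len(samples) // block_size
    return [max(samples[k * block_size:(k + 1) * block_size]) for k in range(n_blocks)]


def exceedances(samples, u):
    """Samples strictly above the threshold u, as retained by POT."""
    return [y for y in samples if y > u]


# Hypothetical outputs phi(X_i); 5.0 and 4.8 share the second block.
outputs = [0.2, 1.1, 0.4, 5.0, 4.8, 0.3, 0.9, 1.5, 0.1]
print(block_maxima(outputs, 3))   # [1.1, 5.0, 1.5]
print(exceedances(outputs, 1.4))  # [5.0, 4.8, 1.5]
```

With blocks of size 3, the block maxima method drops the high value 4.8 and keeps the comparatively low value 1.1, whereas the threshold $u = 1.4$ retains the three largest samples.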

3.2. Large deviation theory

The large deviation theory (LDT) characterizes the asymptotic behaviour of PDF sequence tails [39–41] and, more precisely, it analyses how a PDF sequence tail deviates from its typical behaviour described by the law of large numbers. LDT can be used to evaluate the convergence of rare event algorithms [42–46]. Let us define $H_N = J(\phi(X_1), ..., \phi(X_N))$ a random variable indexed by $N$ with $J$ a continuous scalar function, $H$ its mathematical expectation and $V_N = H_N - H$. One says that $V_N$ satisfies the principle of large deviations with a continuous rate function $I$ if the following limit exists:

$$\lim_{N \to \infty} \frac{1}{N} \ln\left[P(|V_N| > \gamma)\right] = -I(\gamma). \tag{13}$$

The existence of this limit implies for a large value of $N$ that

$$P(|V_N| > \gamma) \approx \exp(-N I(\gamma)). \tag{14}$$

The probability decays exponentially as $N$ grows to infinity, at a rate depending on $\gamma$. This approximation is a well-known result of LDT. If the limit does not exist, then $P(|V_N| > \gamma)$ has a too singular behaviour or decreases faster than exponential decay. If the limit is equal to 0, then the tail $P(|V_N| > \gamma)$ decreases with $N$ slower than $\exp(-N a)$ with $a > 0$. The computation of the rate function $I$ is not obvious but can be obtained through the Gärtner-Ellis theorem [47]. Let us define the function $\lambda(\theta)$ of $V_N$ with

$$\lambda(\theta) = \lim_{N \to \infty} \frac{1}{N} \ln\left[E(\exp(N \theta V_N))\right], \tag{15}$$

with $\theta \in \mathbb{R}$.

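Theorem 3.3 (Gärtner-Ellis theorem). If the function $\lambda(\theta)$ of the variable $V_N$ exists and is differentiable for all $\theta \in \mathbb{R}$, then $V_N$ satisfies the principle of large deviations and $I(\gamma)$ is given by

$$I(\gamma) = \sup_{\theta \in \mathbb{R}} \left[\theta \gamma - \lambda(\theta)\right].$$

In the specific case of a scalar function $J$, one can derive the Cramér theorem from the Gärtner-Ellis theorem [47].

Theorem 3.4 (Cramér theorem). If $V_N = \frac{1}{N} \sum_{i=1}^{N} J(\phi(X_i))$ where the random variables $J(\phi(X_i))$ are i.i.d., the rate function is given by

$$I(\gamma) = \sup_{\theta \in \mathbb{R}} \left[\theta \gamma - \lambda(\theta)\right],$$

with

$$\lambda(\theta) = \ln\left[E(\exp(\theta J(\phi(X))))\right].$$

This theorem only holds for light tail distributions.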

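Let us consider the Monte-Carlo probability estimate given in Eq. 1. In that case, one has $J(\phi(\cdot)) = 1_{\phi(\cdot) > S}$. The random variable $J(\phi(X_i))$ follows a Bernoulli distribution of mean $P$. The sequence $V_N$ is defined with

$$V_N = \left(\frac{1}{N} \sum_{i=1}^{N} 1_{\phi(X_i) > S}\right) - P. \tag{16}$$

The functions $\lambda(\theta)$ and $I(\gamma)$ can be derived for some well-known PDFs. In the case of Bernoulli distributions of mean $P$, the definition of $\lambda(\theta)$ in Theorem 3.4 gives

$$\lambda(\theta) = \ln\left(P \exp(\theta) + 1 - P\right), \tag{17}$$

and

$$I(\gamma) = \gamma \ln\left(\frac{\gamma}{P}\right) + (1 - \gamma) \ln\left(\frac{1 - \gamma}{1 - P}\right). \tag{18}$$

One can then obtain the convergence speed of the Monte-Carlo probability estimate as a function of the number of samples with the following equation

$$\lim_{N \to \infty} \frac{1}{N} \ln\left[P(|V_N| > \gamma)\right] = -I(\gamma) = -\gamma \ln\left(\frac{\gamma}{P}\right) - (1 - \gamma) \ln\left(\frac{1 - \gamma}{1 - P}\right). \tag{19}$$

The quantity $I(\gamma)$ corresponds to the relative entropy (Kullback-Leibler divergence) of a coin toss with bias $\gamma$ with respect to the true value $P$. In a lot of situations, the large deviation rate function is the Kullback-Leibler divergence [47].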
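The Bernoulli rate function can be checked numerically. The sketch below evaluates the Legendre transform of Theorem 3.4 at its analytical maximizer, compares it with the Kullback-Leibler form of Eq. 18, and compares $\exp(-N I(\gamma))$ with an exact binomial tail. It relies on an interpretation that is an assumption of this example: $\gamma$ is taken as a level reached by the uncentered sample mean, with $\gamma > P$, and only the upper tail is considered. The function names and the values $P = 0.1$, $\gamma = 0.2$ are illustrative.

```python
import math


def log_mgf_bernoulli(theta, p):
    """lambda(theta) = ln E[exp(theta * J)] for J ~ Bernoulli(p)."""
    return math.log(p * math.exp(theta) + 1.0 - p)


def rate_closed_form(gamma, p):
    """Kullback-Leibler form of I(gamma) for 0 < gamma < 1 (Eq. 18)."""
    return (gamma * math.log(gamma / p)
            + (1.0 - gamma) * math.log((1.0 - gamma) / (1.0 - p)))


def rate_legendre(gamma, p):
    """sup over theta of [theta * gamma - lambda(theta)], at the analytical maximizer."""
    theta_star = math.log(gamma * (1.0 - p) / (p * (1.0 - gamma)))
    return theta_star * gamma - log_mgf_bernoulli(theta_star, p)


def binomial_upper_tail(n, p, gamma):
    """Exact P(sample mean >= gamma) for n i.i.d. Bernoulli(p) draws."""
    k_min = math.ceil(n * gamma - 1e-9)  # guard against floating-point round-up
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(k_min, n + 1))


p, gamma = 0.1, 0.2  # assumed illustrative values with gamma > p
print("I(gamma), closed form:", rate_closed_form(gamma, p))
print("I(gamma), Legendre   :", rate_legendre(gamma, p))
for n in (100, 400):
    tail = binomial_upper_tail(n, p, gamma)
    print(f"N = {n}: exact tail = {tail:.3e}, exp(-N I) = "
          f"{math.exp(-n * rate_closed_form(gamma, p)):.3e}, "
          f"-ln(tail)/N = {-math.log(tail) / n:.4f}")
```

The exact tail stays below $\exp(-N I(\gamma))$ (the Chernoff bound), and $-\ln(\text{tail})/N$ approaches $I(\gamma)$ as $N$ grows, which is the behaviour described by Eq. 13 and Eq. 14.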
