HAL Id: hal-01689366
https://hal.archives-ouvertes.fr/hal-01689366
Submitted on 22 Jan 2018
Efficient evaluation of reliability-oriented sensitivity indices
Gilles Defaux, Guillaume Perrin
To cite this version:
Gilles Defaux, Guillaume Perrin. Efficient evaluation of reliability-oriented sensitivity indices. Journal
of Scientific Computing, Springer Verlag, 2018. ⟨hal-01689366⟩
G. Perrin (a), G. Defaux (a)
(a) CEA/DAM/DIF, F-91297, Arpajon, France
Abstract

The role of simulation keeps increasing for the reliability analysis of complex systems. Most of the time, these analyses can be reduced to estimating the probability of occurrence of an undesirable event, also called failure probability, using a stochastic model of the system. If the considered event is rare, sophisticated sample-based procedures are generally introduced to get a relevant estimate of the failure probability. Based on the samples constructed for the evaluation of this estimate, this work defines two types of reliability-oriented sensitivity indices. The first ones are introduced to identify the model inputs whose variability has to be reduced in priority to decrease this probability. The second ones are used to find the model inputs whose distribution has to be particularly well-characterized for the available estimate to be realistic. It is also shown how these sensitivity indices can be derived when the true model is approximated by a surrogate model. In particular, an innovative procedure is proposed to take into account the surrogate model uncertainty in the estimation of these sensitivity indices. The proposed approach is then applied to the reliability analysis of a series of numerical and industrial examples.

Keywords: Sobol indices, Gaussian process, sensitivity analysis, risk analysis.
Email address: guillaume.perrin2@cea.fr (G. Perrin)

1. Introduction

The reliability analysis of complex systems is increasingly quantified using numerical simulations. Hence, computer codes of the form $y(\boldsymbol{x}) = g(\boldsymbol{x}; \boldsymbol{d})$ are generally involved. Here, $\boldsymbol{x} = (x_1, \ldots, x_D) \in \mathbb{X}$ is the vector of stochastic inputs and $\boldsymbol{d}$ is the vector of deterministic inputs. Formally, given an adapted threshold $q$, the problem of estimating the probability of occurrence of an undesirable event, seen as $y(\boldsymbol{x})$ exceeding $q$, can be reduced to computing the probability

$$p := \mathbb{P}_{\boldsymbol{x}}(y(\boldsymbol{x}) > q) = \mathbb{E}_{\boldsymbol{x}}\left[ 1_{y(\boldsymbol{x}) > q} \right], \tag{1}$$

$$1_{y(\boldsymbol{x}) > q} = \begin{cases} 1 & \text{if } y(\boldsymbol{x}) > q, \\ 0 & \text{otherwise.} \end{cases} \tag{2}$$

When $D$ increases and low values of $p$ are considered, the evaluation of $p$ cannot be handled with usual quadrature rules. Sampling techniques are preferred, such as the Monte Carlo Simulation (MCS) [34]. In MCS, the code is evaluated at a large number of inputs, and the probability $p$ is estimated by counting the number of responses that are above the threshold $q$. However, the squared coefficient of variation of the estimator provided by MCS is proportional to $1/p$. Hence, the number of code evaluations required for MCS to estimate small values of $p$ (say $p < 10^{-3}$ for the considered applications) quickly becomes burdensome. To circumvent this problem, various approaches have been proposed. On the one hand, several nonstatistical approaches, such as the first-order or second-order reliability methods (FORM/SORM) [14, 31, 18, 5], propose to approximate the limit state function as a parametric function. Then, these approximations are used to evaluate $p$ at a low computational cost, but at the expense of a reduced precision. On the other hand, the splitting methods [16] propose to rewrite $p$ using a finite sequence of increasing thresholds. Depending on the choice of the thresholds, the variance of the aggregated estimator can be much smaller than the one given by MCS, as will be explained in the next section.
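As a hedged illustration of the Monte Carlo estimator discussed above, the following Python sketch estimates $p$ for a toy model and evaluates the number of code evaluations needed to reach a target coefficient of variation. The function `toy_model`, the threshold value and all helper names are assumptions introduced for illustration only; the relation $N = (1 - p)/(p\,\delta^2)$ is the usual binomial result for crude Monte Carlo.

```python
import math
import random


def toy_model(x):
    """Hypothetical stand-in for the expensive code y(x): a scaled sum of the inputs."""
    return sum(x) / math.sqrt(len(x))


def mcs_failure_probability(model, q, dim, n_samples, rng):
    """Crude Monte Carlo estimate of p = P(y(x) > q) with i.i.d. N(0, 1) inputs."""
    n_fail = 0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if model(x) > q:
            n_fail += 1
    return n_fail / n_samples


def required_samples(p, target_cv):
    """Number of evaluations N such that (1 - p) / (N p) = target_cv ** 2."""
    return (1.0 - p) / (p * target_cv ** 2)


rng = random.Random(0)
p_hat = mcs_failure_probability(toy_model, q=1.0, dim=2, n_samples=20000, rng=rng)
print(f"MCS estimate of p: {p_hat:.4f}")
for p in (1e-1, 1e-3, 1e-6):
    print(f"p = {p:.0e}: about {required_samples(p, 0.1):.3g} evaluations for a 10% c.o.v.")
```

The required budget grows like $1/p$, which is why crude Monte Carlo becomes impractical for rare events when each evaluation of the code is expensive.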
In addition to this estimation of $p$, it is useful to quantify the importance of each model input on the failure probability. This is the purpose of reliability-oriented global sensitivity analysis (ro-GSA). Here, the word global refers to the definition given by [36], as the whole variation domain of the inputs is considered. Indeed, if one model input happens to be strongly influential, it could be worth trying to decrease its variability. On the contrary, if one model input seems to have no influence on $p$, its variability can be neglected, resulting in a simpler model. Several methods have been proposed to assess how the probability of exceeding $q$ is affected by fixing one model input to a given value. It can be shown that this is equivalent to computing the Sobol indices associated with the indicator function $1_{y(\boldsymbol{x}) > q}$ (see Section 3). However, dedicated evaluations of the code are generally required to assess such indices [40], which can be much more numerous than the ones required to get a good estimate of $p$. Hence, the first objective of this work is to propose a method using nonparametric statistics to evaluate such Sobol indices without additional code evaluations.
The sensitivity of probability $p$ to each model input can also be evaluated by comparing the partial derivatives of $p$ with respect to the statistical moments of each model input. Using adapted strategies, such derivatives can be computed as a simple post-processing of the code evaluations that were carried out for the estimation of $p$ [33]. Whereas Sobol indices are associated with a fixed distribution for the inputs, such indicators assess the sensitivity of $p$ to small changes of the input distribution. Hence, the meanings of these two sets of indicators are different but complementary. On the one hand, the Sobol indices indicate the model inputs whose variability has to be reduced in priority if we want to decrease $p$. On the other hand, the derivative-based indices show the model inputs whose statistical moments have to be particularly well-controlled to get a relevant estimation of $p$. In that prospect, generalizing the works achieved in [19], the second objective of this work is to propose reliability-oriented indices that can consider more general modifications of the model input distributions, and which can be used to evaluate cross-effects between model inputs. Such indices will also be computed as a simple post-processing of the simulations used to estimate $p$.

When the numerical cost associated with one evaluation of the code is high (between several minutes and several days of CPU time), surrogate models are commonly introduced to emulate the time-demanding computer code for the estimation of $p$. Among these methods, the Gaussian process regression (GPR) method, or kriging, plays a major role. This is mostly due to its ability to provide an uncertainty on the evaluation of $p$ that is due to the substitution of the true code by its emulator [35, 37]. Finally, this paper shows how to derive each reliability-oriented sensitivity index in the case when the code is replaced by a Gaussian emulator. In particular, the impact of the emulator uncertainty on the estimation of these indices is quantified.
The outline of this work is as follows. First, Section 2 briefly reviews existing sample-based methods for estimating $p$. Section 3 introduces reliability-oriented sensitivity indices to compare the influence of the variability of the inputs on $p$. Another type of sensitivity indices is defined in Section 4, in order to quantify the robustness of the evaluation of $p$ to small perturbations of the input distribution. Then, Section 5 introduces the estimation of these indices when the computer code is replaced by a Gaussian emulator. At last, a series of examples is shown in Section 6 to illustrate the interest of the proposed methods.
Notations

The following notations are adopted:

• $x, y$ correspond to scalars.
• $X, Y$ correspond to integers.
• $\boldsymbol{x}, \boldsymbol{y}$ correspond to vectors.
• Let $x_i$ be the components of a vector $\boldsymbol{x}$.
• For all $D$-dimensional vector $\boldsymbol{x} = (x_1, \ldots, x_D)$, we denote by $\boldsymbol{x}_{-i} := (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_D)$ the vector that gathers all the components of $\boldsymbol{x}$ but the $i$th.
• For all random vector $\boldsymbol{x}$, $\mathbb{E}_{\boldsymbol{x}}[\,\cdot\,]$ and $\mathbb{V}_{\boldsymbol{x}}[\,\cdot\,]$ denote the mathematical expectation and the variance operator associated with the distribution of $\boldsymbol{x}$.

2. Background: sample-based methods to estimate probabilities of exceeding thresholds
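Let $\mathcal{S}$ be the system we are interested in, whose properties (dimensions, boundary conditions, material properties...) can be characterized by a vector of $D \geq 1$ parameters $\boldsymbol{x} \in \mathbb{X}$, where $\mathbb{X}$ is a subset of $\mathbb{R}^D$. Vector $\boldsymbol{x}$ is modelled by a random vector to take into account the fact that these parameters are not perfectly known. The components of $\boldsymbol{x}$ are assumed to be statistically independent. For all $1 \leq i \leq D$, let $\mathbb{X}_i$, $f_{x_i}$ and $F_{x_i}$ be the definition domain, the probability density function (PDF) and the cumulative distribution function (CDF) of component $x_i$ respectively. It follows that

$$\mathbb{X} = \mathbb{X}_1 \times \cdots \times \mathbb{X}_D, \quad f_{\boldsymbol{x}}(\boldsymbol{x}) = \prod_{i=1}^{D} f_{x_i}(x_i), \quad F_{\boldsymbol{x}}(\boldsymbol{x}) = \prod_{i=1}^{D} F_{x_i}(x_i), \tag{3}$$

where $f_{\boldsymbol{x}}$ and $F_{\boldsymbol{x}}$ are the PDF and the CDF of $\boldsymbol{x}$ respectively. In addition, let

$$y : \begin{cases} \mathbb{X} \to \mathbb{R} \\ \boldsymbol{x} \mapsto y(\boldsymbol{x}) \end{cases} \tag{4}$$

be the real-valued deterministic mapping describing the behaviour of $\mathcal{S}$. In this work, we are interested in the evaluation of the probability $p$ for $y(\boldsymbol{x})$ to exceed a given threshold $q \in \mathbb{R}$,

$$p := \mathbb{P}_{\boldsymbol{x}}(y(\boldsymbol{x}) > q) = \int_{\mathbb{X}} 1_{y(\boldsymbol{x}) > q} \, f_{\boldsymbol{x}}(\boldsymbol{x}) \, d\boldsymbol{x} = \mathbb{E}_{\boldsymbol{x}}\left[ 1_{y(\boldsymbol{x}) > q} \right], \tag{5}$$

but also in the identification of the components of $\boldsymbol{x}$ that play the most important roles on this probability. We moreover assume that the computational cost associated with one evaluation of $y$ is high (between several minutes and several hours of CPU time), so that the number of code evaluations is supposed to be bounded (less than $10^3$ for instance). In that context, we are particularly interested in methods that could allow all the computational budget to be used at the same time for the estimation of $p$ and for the sensitivity analysis.

As the model inputs are assumed independent, they can all be considered as normally distributed, centred and of variance equal to 1, without loss of generality. Indeed, an isoprobabilistic transform can be applied to each model input [32, 24, 17], impacting neither the definition of $p$ nor the results of the sensitivity analysis. Therefore, in the following,

$$f_{x_i}(x_i) = \varphi(x_i; 0, 1), \quad x_i \in \mathbb{X}_i = \mathbb{R}, \tag{6}$$

where for all $(\mu, \sigma)$ in $\mathbb{R} \times \mathbb{R}^{+*}$,

$$\varphi(x_i; \mu, \sigma) := \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right). \tag{7}$$

The most famous sample-based method to estimate $p$ is the MCS. If $\boldsymbol{x}_n$, $1 \leq n \leq N$, denote $N$ independent copies of $\boldsymbol{x}$, it is well known [34] that

$$p_{\mathrm{MC}} := \frac{1}{N} \sum_{n=1}^{N} 1_{y(\boldsymbol{x}_n) > q} \tag{8}$$

defines an unbiased estimator of $p$. The associated coefficient of variation verifies

$$\delta_{\mathrm{MC}}^2 = \frac{1 - p}{N p}. \tag{9}$$

This approach is particularly easy to implement, but requires a lot of code evaluations to get acceptable values for $\delta_{\mathrm{MC}}$. Alternatively, the splitting methods rewrite $p$ using a finite sequence of increasing thresholds $(q_k)_{k=0}^{K}$,

$$p = \mathbb{P}_{\boldsymbol{x}}(y(\boldsymbol{x}) > q_K \mid y(\boldsymbol{x}) > q_{K-1}) \times \cdots \times \mathbb{P}_{\boldsymbol{x}}(y(\boldsymbol{x}) > q_1 \mid y(\boldsymbol{x}) > q_0) \times \mathbb{P}_{\boldsymbol{x}}(y(\boldsymbol{x}) > q_0), \tag{10}$$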
with $q_0 = -\infty$ and $q_K = q$. Then, classical Monte Carlo estimators can be proposed for each conditional probability. All these estimators being unbiased and independent, the mean of their product is still equal to $p$. However, the variance of the aggregated estimator strongly depends on the choice of the thresholds. In practice, the sequence of thresholds is defined on the fly, which is generally referred to as adaptive splitting [7]. In particular, the Markov chain

$$y_k := (y(\boldsymbol{x}) \mid y(\boldsymbol{x}) > y_{k-1}), \quad y_0 = -\infty, \quad k \geq 1, \tag{11}$$

is called an increasing random walk. It can be shown that the counting random variable of the number of events before $q$, which is denoted by $M := \mathrm{Card}\{ k \geq 1 \mid y_k \leq q \}$, follows a Poisson law with parameter $-\log(p)$ [41]. Hence, given $Q \geq 1$ independent random counting variables $(M_q)_{1 \leq q \leq Q}$,

$$p_{\mathrm{MP}} := \left( 1 - \frac{1}{Q} \right)^{\sum_{q=1}^{Q} M_q} \tag{12}$$

also defines an unbiased estimator of $p$, whose coefficient of variation is approximately equal to $\sqrt{-\log(p)/Q}$. This approach is referred to as the Moving Particle (MP) method in the following. Another strategy to choose the different thresholds is given by the Subset Simulation (SS) method. The interested reader may refer to [1, 8] for further details about this approach.
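To make the Moving Particle estimator of Eq. (12) concrete, here is a minimal Python sketch under strong simplifying assumptions: the output is taken as $y(x) = x$ with $x \sim \mathcal{N}(0, 1)$, so that the conditional draws of Eq. (11) can be generated exactly by inverting the CDF. In a realistic setting this exact sampling is not available and would be replaced by a Markov chain sampler. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def sample_above(level, rng):
    """Draw y ~ N(0, 1) conditioned on y > level by inverting the CDF (toy case only)."""
    lower = 0.0 if level == -math.inf else STD_NORMAL.cdf(level)
    u = lower + (1.0 - lower) * rng.random()
    # Clip so that inv_cdf always receives a value strictly inside (0, 1).
    u = min(max(u, 1e-15), 1.0 - 1e-15)
    return STD_NORMAL.inv_cdf(u)


def count_events_before(q, rng):
    """Run one increasing random walk y_1, y_2, ... and count the states with y_k <= q."""
    count = 0
    y = sample_above(-math.inf, rng)
    while y <= q:
        count += 1
        y = sample_above(y, rng)
    return count


def moving_particle_estimate(q, n_walks, rng):
    """Estimate p = P(y > q) as (1 - 1/Q) ** (M_1 + ... + M_Q), with Q = n_walks."""
    total = sum(count_events_before(q, rng) for _ in range(n_walks))
    return (1.0 - 1.0 / n_walks) ** total


rng = random.Random(1)
p_true = 1e-3
q = STD_NORMAL.inv_cdf(1.0 - p_true)
p_mp = moving_particle_estimate(q, n_walks=2000, rng=rng)
print(f"true p = {p_true:.1e}, moving-particle estimate = {p_mp:.2e}")
```

Each walk needs on average about $1 - \log(p)$ draws in this idealized case, so the total cost grows only logarithmically as $p$ decreases.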
Indeed, if we focus on the Markov chain defined by Eq. (11), $y_k$ has to be randomly generated conditionally greater than $y_{k-1}$. This can be done using the Metropolis-Hastings algorithm [22, 15]. If $T$ is the number of steps that is used to control the convergence of the Markov chain to its stationary distribution, it follows that, on average, $1 - T\log(p)$ samples have to be generated to get one realization of the counting variable $M_q$. Thus, the mean total number of code evaluations to get the $Q$ samples $M_1, \ldots, M_Q$ is equal to $N = Q(1 - T\log(p))$. It follows that the coefficient of variation of $p_{\mathrm{MP}}$, denoted by $\delta_{\mathrm{MP}}$, can be approximated as

$$\delta_{\mathrm{MP}}^2 \approx \frac{\log(p)\,(T\log(p) - 1)}{N} \approx \frac{T\log(p)^2}{N}. \tag{13}$$

This has to be compared to $\delta_{\mathrm{MC}}^2 \approx 1/(pN)$ for the crude Monte Carlo.

In MCS, SS and MP methods, let $\widehat{p}$ denote the best estimate of $p$ we get once the maximal computational budget is attained. For each of these methods, it can be noticed that the points $\boldsymbol{x}^{(k)}$, where it was observed that $y(\boldsymbol{x}^{(k)})$ is greater than $q$, are independent realizations of the conditioned random vector $(\boldsymbol{x} \mid y(\boldsymbol{x}) > q)$. Let us gather all these realizations of $(\boldsymbol{x} \mid y(\boldsymbol{x}) > q)$ in the set $\mathcal{D}_f := \{ \boldsymbol{x}^{(1)}, \ldots, \boldsymbol{x}^{(N^*)} \}$. Hence, in the following, $N^*$ denotes the number of points that have been sampled in the failure domain.
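3. Compared influence of the inputs variability on $p$

Based on the estimated value of $p$ and the elements of $\mathcal{D}_f$ only, the purpose of this section is to identify the components of $\boldsymbol{x}$ whose variability has to be reduced in priority if we want to decrease the value of $p$. To this end, it can be interesting to quantify the effect on $p$ due to the fact that $x_i$ is fixed to the particular value $x_i^\star$. Indeed, the higher $\left( \mathbb{P}_{\boldsymbol{x}}(y(\boldsymbol{x}) > q) - \mathbb{P}_{\boldsymbol{x}_{-i}}(y(\boldsymbol{x}) > q \mid x_i = x_i^\star) \right)^2$, the more influential $x_i$. Thus, averaging over $x_i$, the quantity

$$\mathbb{E}_{x_i}\left[ \left( \mathbb{P}_{\boldsymbol{x}}(y(\boldsymbol{x}) > q) - \mathbb{P}_{\boldsymbol{x}_{-i}}(y(\boldsymbol{x}) > q \mid x_i) \right)^2 \right] = \mathbb{V}_{x_i}\left[ \mathbb{E}_{\boldsymbol{x}_{-i}}\left[ 1_{y(\boldsymbol{x}) > q} \mid x_i \right] \right] \tag{14}$$

can be used to analyse the sensitivity of $p$ to model input $x_i$. Normalizing these quantities by $\mathbb{V}_{\boldsymbol{x}}\left[ 1_{y(\boldsymbol{x}) > q} \right]$, we find back the well-known first-order Sobol indices [38] associated with function $1_{y(\boldsymbol{x}) > q}$:

$$s_i := \frac{\mathbb{V}_{x_i}\left[ \mathbb{E}_{\boldsymbol{x}_{-i}}\left[ 1_{y(\boldsymbol{x}) > q} \mid x_i \right] \right]}{\mathbb{V}_{\boldsymbol{x}}\left[ 1_{y(\boldsymbol{x}) > q} \right]}. \tag{15}$$

By construction, index $s_i$ indicates the variance of $1_{y(\boldsymbol{x}) > q}$ caused by $x_i$ individually. The variance of $1_{y(\boldsymbol{x}) > q}$ caused by $x_i$ including interactions with the components of $\boldsymbol{x}_{-i}$ is given by the $i$th total Sobol index, denoted by $t_i$, which verifies:

$$t_i := 1 - \frac{\mathbb{V}_{\boldsymbol{x}_{-i}}\left[ \mathbb{E}_{x_i}\left[ 1_{y(\boldsymbol{x}) > q} \mid \boldsymbol{x}_{-i} \right] \right]}{\mathbb{V}_{\boldsymbol{x}}\left[ 1_{y(\boldsymbol{x}) > q} \right]}. \tag{16}$$

Based on Eqs. (15) and (16), the computation of $s_i$ and $t_i$ is nontrivial, since $\mathbb{E}_{\boldsymbol{x}_{-i}}[\,\cdot\,]$ and $\mathbb{V}_{\boldsymbol{x}_{-i}}[\,\cdot\,]$ refer to multidimensional integrals. This motivated the introduction of various algorithms to reduce the computational cost of evaluating the Sobol indices. In particular, efficient sample-based methods can be found in [40, 39, 25, 20] to replace the naive and very expensive double-loop MCS. However, in spite of these developments, the number of dedicated code evaluations that are needed for these methods is still very high. To circumvent this problem, another approach is proposed in this paper, which is based on Proposition 1, whose proof has been moved to the Appendix.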
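Proposition 1. For all $1 \leq i \leq D$, we have:

$$s_i = \frac{p}{1 - p} \, \mathbb{V}_{x_i}\left[ \frac{f_{x_i \mid y(\boldsymbol{x}) > q}(x_i)}{f_{x_i}(x_i)} \right], \tag{17}$$

$$t_i = 1 - \frac{p}{1 - p} \, \mathbb{V}_{\boldsymbol{x}_{-i}}\left[ \frac{f_{\boldsymbol{x}_{-i} \mid y(\boldsymbol{x}) > q}(\boldsymbol{x}_{-i})}{f_{\boldsymbol{x}_{-i}}(\boldsymbol{x}_{-i})} \right], \tag{18}$$

where, for all $\boldsymbol{x}$ in $\mathbb{X}$,

$$\begin{aligned}
f_{\boldsymbol{x} \mid y(\boldsymbol{x}) > q}(\boldsymbol{x}) &:= \frac{1}{p} \, 1_{y(\boldsymbol{x}) > q} \, f_{\boldsymbol{x}}(\boldsymbol{x}), \\
f_{x_i \mid y(\boldsymbol{x}) > q}(x_i) &:= \int_{\prod_{1 \leq j \leq D,\, j \neq i} \mathbb{X}_j} f_{\boldsymbol{x} \mid y(\boldsymbol{x}) > q}(\boldsymbol{x}) \prod_{1 \leq j \leq D,\, j \neq i} dx_j, \\
f_{\boldsymbol{x}_{-i} \mid y(\boldsymbol{x}) > q}(\boldsymbol{x}_{-i}) &:= \int_{\mathbb{X}_i} f_{\boldsymbol{x} \mid y(\boldsymbol{x}) > q}(\boldsymbol{x}) \, dx_i.
\end{aligned} \tag{19}$$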
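Hence, the Sobol indices associated with the indicator function are proportional to the variances of the ratios between the a priori PDFs of $x_i$ and $\boldsymbol{x}_{-i}$ and their PDFs conditioned by the fact that $y(\boldsymbol{x})$ is greater than $q$. In this work, we propose to approximate these PDFs using one of the nonparametric approaches described in [30, 29]. These methods are particularly suited for this kind of approximation, as the construction they propose only requires the presence of independent realizations of the random vector to be modelled. For each $1 \leq i \leq D$, let $\widehat{f}_{x_i \mid y(\boldsymbol{x}) > q}$ and $\widehat{f}_{\boldsymbol{x}_{-i} \mid y(\boldsymbol{x}) > q}$ be these approximations of functions $f_{x_i \mid y(\boldsymbol{x}) > q}$ and $f_{\boldsymbol{x}_{-i} \mid y(\boldsymbol{x}) > q}$ based on the elements of the set $\mathcal{D}_f$ only. Sobol indices $s_i$ and $t_i$ can then be approximated as:

$$s_i \approx \widehat{s}_i := \frac{\widehat{p}}{1 - \widehat{p}} \, \mathbb{V}_{x_i}\left[ \frac{\widehat{f}_{x_i \mid y(\boldsymbol{x}) > q}(x_i)}{f_{x_i}(x_i)} \right], \tag{20}$$

$$t_i \approx \widehat{t}_i := 1 - \frac{\widehat{p}}{1 - \widehat{p}} \, \mathbb{V}_{\boldsymbol{x}_{-i}}\left[ \frac{\widehat{f}_{\boldsymbol{x}_{-i} \mid y(\boldsymbol{x}) > q}(\boldsymbol{x}_{-i})}{f_{\boldsymbol{x}_{-i}}(\boldsymbol{x}_{-i})} \right], \tag{21}$$

where it is reminded that $\widehat{p}$ is the estimated value of $p$ based on one of the sample-based methods presented in Section 2. Finally, generating independent realizations under $\widehat{f}_{x_i}$ and $\widehat{f}_{\boldsymbol{x}_{-i}}$ being quick and easy, Monte Carlo estimations of $\widehat{s}_i$ and $\widehat{t}_i$ can be calculated numerically with a controlled precision.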
4. Robustness analysis of the estimation of $p$

Indices $s_i$ and $t_i$, which are defined by Eqs. (17) and (18), are associated with a fixed distribution of the model inputs. In this section, another type of sensitivity indices is defined, which can be used to quantify the robustness of the evaluation of $p$ to small changes of the input distribution. As explained in the Introduction, the information provided by these new indices is complementary to the information provided by $s_i$ and $t_i$. Whereas $s_i$ and $t_i$ allow the identification of the components of $\boldsymbol{x}$ whose variability has to be reduced in priority to decrease the value of $p$, these new indices can be used to identify the components of $\boldsymbol{x}$ whose distributions have to be particularly well-characterized for a relevant estimation of $p$.

To quantify the robustness of the estimation of $p$ to small changes of the input distribution, we generally compute the gradient of the failure probability with respect to the parameters that characterize the PDF of the inputs.