A TRADEOFF BETWEEN EXPLORATIONS AND REPETITIONS FOR ESTIMATORS OF TWO GLOBAL SENSITIVITY INDICES IN STOCHASTIC MODELS INDUCED BY PROBABILITY MEASURES∗

GILDAS MAZO†

∗Submitted to the editors 2019/03/07.
†MaIAGE, INRA, Université Paris-Saclay, 78350, Jouy-en-Josas, France (gildas.mazo@inra.fr).
Abstract. Sobol sensitivity indices assess how the output of a given mathematical model is sensitive to its inputs. If the model is stochastic then it cannot be represented as a function of the inputs, which raises the question of how to do a sensitivity analysis in such models. Practitioners have been using an approach that exploits the availability of methods for deterministic models. For each input, the stochastic model is repeated and the outputs are averaged. These averages are treated as if they came from a deterministic model and hence Sobol's method can be used. We show that the estimator so obtained is asymptotically biased if the number of repetitions goes to infinity too slowly. With limited computational resources, the number of repetitions of the stochastic model and the number of explorations of the input space cannot both be large and hence some balance must be found. We find the pair of numbers that minimizes a bound on a rank-based error criterion which penalizes bad rankings of the inputs' sensitivities. Also, under minimal distributional assumptions, we derive a functional relationship between the output, the input and some random noise; the Sobol-Hoeffding decomposition can be applied to it to define a new sensitivity index, which asymptotically is estimated without bias even though the number of repetitions remains fixed. The theory is illustrated on numerical experiments.
Key words. asymptotic normality, Sobol indices, tradeoff, sensitivity analysis, stochastic model.

AMS subject classifications. 62G20, 60H99, 65C05
1. Introduction. The goal of sensitivity analysis is to assess how the output of a given physical or mathematical model is sensitive to its inputs [26, 27]. Classically, the model of interest is deterministic. To each input there corresponds an output given by the model. Thus, in this case, the model is in fact a function, say f. To assess the sensitivity of the model to its inputs, the probabilistic/statistical framework is often employed. One draws at random a large number of inputs and observes how the corresponding outputs vary. From a statistical perspective, at each draw, one observes a random pair (X, Y) such that Y = f(X), where X = (X_1, ..., X_p) is the input vector and Y is the output.

Sobol's idea [29, 30] was to notice that, if X_1, ..., X_p are drawn independently, then f(X) can be decomposed into a sum of lower-dimensional functions and that this decomposition can be used to allocate the variance of the output to the individual components of the decomposition. More precisely, we have

(1.1)   f(X) − f_0 = f_1(X_1) + \cdots + f_p(X_p)
                      + f_{1,2}(X_1, X_2) + \cdots + f_{p−1,p}(X_{p−1}, X_p)
                      + \cdots + f_{1,\dots,p}(X_1, \dots, X_p),
where f_0 = E f(X), f_j(X_j) = E(f(X) − f_0 | X_j), j = 1, ..., p, and f_{1,2}, ..., f_{1,\dots,p} are some functions defined iteratively; see [29] and [33, p. 157] for more details. In the field of uncertainty quantification the above decomposition is known as the Sobol-Hoeffding decomposition in reference to [11, 29]. The expectations and the covariances of the individual components in the right-hand side of (1.1) are zero and hence we have the variance decomposition

Var f(X) = Var f_1(X_1) + \cdots + Var f_p(X_p) + \cdots,

which leads to the so-called Sobol indices

S_j = \frac{Var f_j(X_j)}{Var f(X)} = \frac{Var E(f(X) | X_j)}{Var f(X)} = \frac{E f(X) f(\tilde{X}_{-j}) − (E f(X))^2}{E f(X)^2 − (E f(X))^2},
j = 1, ..., p; here \tilde{X}_{-j} stands for an independent copy of X where the jth component has been replaced by that of X. Thus the Sobol index associated with the jth argument of f is defined as the proportion of the total variance associated with the lower-dimensional function that depends on the jth argument only. Sobol indices are interpreted as sensitivity measures and used to achieve various goals in uncertainty quantification [27].
If the model is nonadditive (it is said that the inputs "interact" with each other) then the Sobol indices may be inadequate. To account for interactions, the so-called total sensitivity indices [12] are often computed along with Sobol indices. The total sensitivity index associated with the jth argument of f is given by

S^T_j = 1 − \frac{Var E(f(X) | X_1, \dots, X_{j−1}, X_{j+1}, \dots, X_p)}{Var f(X)}.
The total sensitivity index quantifies the sensitivity of the output of f to its jth argument through the interactions it may have with the other inputs.

There are numerous methods to estimate the sensitivity indices. For simplicity, we describe below Sobol's original method to estimate S_j through Monte Carlo sampling [29]. For a review of the many other methods, see [23] or the package sensitivity [16] of the R software for an up-to-date list of many methods, with references. Thus, draw two independent sets of inputs {X^{(i)}, i = 1, ..., n}, {\tilde{X}^{(i)} := (\tilde{X}_1^{(i)}, ..., \tilde{X}_p^{(i)}), i = 1, ..., n} and make p more sets by combining the first two: {\tilde{X}_{-j}^{(i)}, i = 1, ..., n}, j = 1, ..., p, where

(1.2)   \tilde{X}_{-j}^{(i)} := (\tilde{X}_1^{(i)}, \dots, \tilde{X}_{j−1}^{(i)}, X_j^{(i)}, \tilde{X}_{j+1}^{(i)}, \dots, \tilde{X}_p^{(i)}).

The first and the p last sets are passed on to the function f, which produces the outputs {Y^{(i)}, i = 1, ..., n} (for the first set) and {Y_j^{(i)}, i = 1, ..., n}, j = 1, ..., p (for the p last sets), which in turn make up the so-called pick-freeze estimator

(1.3)   \hat{S}_j = \frac{\frac{1}{n}\sum_{i=1}^n Y^{(i)} Y_j^{(i)} − \left(\frac{1}{n}\sum_{i=1}^n Y^{(i)}\right)^2}{\frac{1}{n}\sum_{i=1}^n (Y^{(i)})^2 − \left(\frac{1}{n}\sum_{i=1}^n Y^{(i)}\right)^2}.

This gives a simple procedure to estimate all the Sobol indices S_1, ..., S_p with (p + 1)n runs of the model. The pick-freeze estimator is asymptotically normal [7, 17]. The above formula can be improved in many ways [12, 17, 21]. Many versions of this estimator exist, the goal being always to get the most efficient estimator with the least computation. Sobol indices for multivariate or functional outputs [6, 18] or functional inputs [15] have been proposed as well.
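To fix ideas, here is a minimal sketch of the pick-freeze procedure (1.2)-(1.3) for a deterministic model. The function names, the toy model and the input sampler below are illustrative assumptions, not part of any existing package.

import numpy as np

def pick_freeze(f, sample_inputs, n, rng):
    # Estimate S_1, ..., S_p of a deterministic model f by formula (1.3).
    X = sample_inputs(n, rng)          # first set of inputs, shape (n, p)
    X_tilde = sample_inputs(n, rng)    # second, independent set
    p = X.shape[1]
    Y = f(X)                           # outputs for the first set
    denom = np.mean(Y**2) - np.mean(Y)**2
    S_hat = np.empty(p)
    for j in range(p):
        X_mix = X_tilde.copy()
        X_mix[:, j] = X[:, j]          # combine the two sets as in (1.2)
        Y_j = f(X_mix)
        S_hat[j] = (np.mean(Y * Y_j) - np.mean(Y)**2) / denom
    return S_hat

# Toy check: f(x) = x_1 + 2 x_2 with independent standard normal inputs
rng = np.random.default_rng(0)
sample = lambda n, rng: rng.normal(size=(n, 2))
print(pick_freeze(lambda X: X[:, 0] + 2 * X[:, 1], sample, 10_000, rng))
# close to the exact values S_1 = 0.2, S_2 = 0.8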
The big difference between a deterministic model and a stochastic model is that the stochastic model is not a function anymore. To a particular value of the input there does not correspond any particular value of the output. Instead, there corresponds a range of possible values, assumed to come from a probability distribution depending on the input. Examples can be found in epidemiology [2, 3, 24, 28] or ecology [31], to name a few.
To do the sensitivity analysis of a stochastic model, several approaches have been investigated. In [19], to the best of my understanding, the authors carry out the sensitivity analysis of a stochastic model based on a joint metamodel. In [10], a stochastic model is seen as a functional relation of the form Y(ϑ, ω) = f(X(ϑ), ω), where X is a random vector on some probability space, ω is a point in some probability space distinct from that on which X is defined, f is some function and Y(ϑ, ω) is a random variable on the induced product probability space. The quantity f(X(ϑ), ω) represents the output of the stochastic model run with input X(ϑ); the point ω represents the intrinsic randomness. The idea is then to decompose the function ϑ ↦ f(X(ϑ), ω) for each ω and estimate the associated sensitivity indices, which depend on ω. The estimates are then averaged over ω to make the final sensitivity estimates. In [1], to the best of my understanding, the stochastic model is represented as a deterministic mapping which with an input associates a probability density function. The Sobol-Hoeffding decomposition is applied to the mapping which with an input associates the entropy of the output evaluated at that input. Here the entropy is the Kullback-Leibler divergence of the output density. In [34], the output of the stochastic model is seen as a semiparametric statistical model (the generalized lambda distribution) with parameters depending on the inputs. These parameters have a polynomial chaos expansion which is estimated by maximum likelihood. Once the law of the output conditionally on the input has been estimated, its inverse cumulative distribution function is used to turn the stochastic model into a deterministic model to which standard methods are applied. In [5], the stochastic model is seen as a mapping that goes from the input space to a space of probability measures equipped with the Wasserstein distance. Following [8, 9], the Wasserstein space is mapped to the real line R with some family of test functions, thus allowing for a standard Sobol-Hoeffding decomposition which is then averaged over all possible test functions. In more specific contexts, global sensitivity analysis methods have also been proposed. For instance, there are methods for stochastic differential equations [4] and chemical reaction networks [22].
In practice, although it has not been formally defined in the literature, another method has been used for some time [2, 24, 28, 31]. The idea is simple: at each draw of the input X^{(i)}, one produces as many outputs Y^{(i,1)}, ..., Y^{(i,m)} as possible, forms the average m^{-1}\sum_{k=1}^m Y^{(i,k)} and treats it as if it were the output of some deterministic model. The same is done with the inputs \tilde{X}_{-j}^{(i)} in (1.2) to produce the averages m^{-1}\sum_{k=1}^m Y_j^{(i,k)}. The resulting estimator is then the same as that in (1.3) but with Y^{(i)} replaced by m^{-1}\sum_{k=1}^m Y^{(i,k)} and Y_j^{(i)} replaced by m^{-1}\sum_{k=1}^m Y_j^{(i,k)}, yielding

(1.4)   \hat{S}_j = \frac{\frac{1}{n}\sum_{i=1}^n \left(m^{-1}\sum_{k=1}^m Y^{(i,k)}\right)\left(m^{-1}\sum_{k=1}^m Y_j^{(i,k)}\right) − \left(\frac{1}{n}\sum_{i=1}^n m^{-1}\sum_{k=1}^m Y^{(i,k)}\right)^2}{\frac{1}{n}\sum_{i=1}^n \left(m^{-1}\sum_{k=1}^m Y^{(i,k)}\right)^2 − \left(\frac{1}{n}\sum_{i=1}^n m^{-1}\sum_{k=1}^m Y^{(i,k)}\right)^2}.

The big advantage for practitioners is that they can use the numerous available and ready-to-use software packages for deterministic models.
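Continuing the sketch above, here is a hedged illustration of the averaging strategy behind (1.4). The stochastic model is assumed to be a callable model(x, rng) returning one random output per call; each output of the pick-freeze scheme is replaced by the average of m repeated runs.

import numpy as np

def repeated_pick_freeze(model, sample_inputs, n, m, rng):
    # Estimator (1.4): pick-freeze formula applied to outputs averaged over m repetitions.
    X = sample_inputs(n, rng)
    X_tilde = sample_inputs(n, rng)
    p = X.shape[1]
    avg = lambda x: np.mean([model(x, rng) for _ in range(m)])   # m repetitions, then average
    Ybar = np.array([avg(x) for x in X])
    denom = np.mean(Ybar**2) - np.mean(Ybar)**2
    S_hat = np.empty(p)
    for j in range(p):
        X_mix = X_tilde.copy()
        X_mix[:, j] = X[:, j]
        Ybar_j = np.array([avg(x) for x in X_mix])
        S_hat[j] = (np.mean(Ybar * Ybar_j) - np.mean(Ybar)**2) / denom
    return S_hat   # uses mn(p + 1) calls to the stochastic model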
To build the estimator (1.4), the stochastic model must be run mn(p + 1) times. The number m is called the number of repetitions and the number n is called the number of explorations. If the stochastic model is computationally intensive, that is, if each model run is time-consuming, then the estimator is built with limited resources. In this context, an increase of m must go along with a decrease of n, and conversely. What, then, is a good balance between m and n? How should m and n be chosen so that the estimator (1.4) is the most efficient? This question was asked in [31].
We address this problem by finding a pair (m, n) that minimizes a bound on the so-called miss-ranking error. The miss-ranking error penalizes bad rankings of the Sobol indices associated with the inputs. The estimated minimizer is used to build well-balanced estimators, leading to a two-step procedure to estimate the sensitivity indices. This two-step procedure is shown to have good asymptotic properties. We show that the estimator (1.4) is asymptotically normal but may be biased if m goes to infinity too slowly. If m/\sqrt{n} → ∞ then it is asymptotically unbiased and converges to the so-called sensitivity index of the second kind. Under the minimal assumption that a stochastic model is a set of probability measures that capture how the outputs are drawn, we show that the output, the inputs and some random "noise" are linked through a function, to which the Sobol-Hoeffding decomposition can be applied. This yields a new sensitivity index, called the sensitivity index of the first kind, with the advantage that asymptotically unbiased estimators can be built even though m remains fixed. The indices of the first and of the second kind are complementary as they offer distinct pieces of information. Interestingly, these indices can be estimated jointly with no additional cost, the joint estimator is asymptotically normal and the two kinds of sensitivity indices lead to the same solution of the tradeoff problem.
This paper is organized as follows. Section 2 gives a definition of stochastic models in terms of probability measures and shows how one can construct a functional representation linking the output, the input and some random noise. Section 3 defines the indices of both kinds and their estimators. The asymptotic properties are deferred to Section 5. Section 4 introduces the tradeoff problem, gives a procedure to attack it and gives some theoretical guarantees. Section 6 illustrates the theory on numerical simulations. A conclusion closes the paper.
2. Representations of stochastic models. The concept of stochastic models is intuitive and shared by many people, but there are different mathematical routes to describe them. One is given in Section 2.1. It makes minimal distributional assumptions to get to a representation in terms of random variables and establishes the existence of a function to which the Sobol-Hoeffding decomposition can be applied. Section 2.2 makes connections with the stochastic models of [10].
2.1. Representing stochastic models from minimal distributional assumptions. A stochastic model is some mechanism that produces outputs at random given some inputs. Thus, a stochastic model can be seen as a family of probability measures {Q_x, x ∈ \mathcal{X}} indexed by some input space \mathcal{X} ⊂ R^p. We assume that each probability measure Q_x is defined on the measurable space (R, B), where B is the Borel σ-field induced by R. The law Q_x governs how the outputs are produced given the input x ∈ \mathcal{X} ⊂ R^p. Let us endow R^p with its product Borel σ-field B^p and let P^∗ be a product probability measure on the measurable space (R^p, B^p) such that P^∗(\mathcal{X}) = 1. (Thus we assume that \mathcal{X} ∈ B^p.) The probability measure P^∗ represents how the inputs are drawn by the practitioner. In particular, since P^∗ is a product probability measure, the inputs are drawn independently.

The stochastic experiment that consists of drawing inputs at random according to P^∗ and observing the corresponding outputs is fully characterized by the family {Q_x} and the probability measure P^∗. This leads us to Definition 2.1.

Definition 2.1. If P^∗ and {Q_x} are probability measures as described above then the pair (P^∗, {Q_x}) is called the complete stochastic model.
Now we look for a representation in terms of random variables that will allow us to use the Sobol-Hoeffding decomposition later on.

Lemma 2.2. If (P^∗, {Q_x}) is a complete stochastic model then there exist a probability space (Ω, F, P), a random vector (X, Z) : Ω → R^{p+1} and a function f : R^{p+1} → R such that
(i) f(x, Z) is measurable for every x ∈ \mathcal{X},
(ii) P(f(x, Z) ∈ B) = Q_x(B) for every x ∈ \mathcal{X} and every B ∈ B,
(iii) P(X ∈ A, Z ∈ B) = P^∗(A) P(Z ∈ B) for every A ∈ B^p and B ∈ B.
Moreover, if (X, f(X, Z)) and (X', f'(X', Z')) are two joint vectors that satisfy the conditions (i), (ii) and (iii) then (X, f(X, Z)) \overset{d}{=} (X', f'(X', Z')), where \overset{d}{=} means equality in distribution.

Note that the conditions in Lemma 2.2 do not determine the law of Z; see the example below.
Example 1 (The law of Z is not determined). Let p = 1. Let P^∗ be the standard uniform distribution and Q_x be the Gaussian distribution with mean x ∈ R and variance 1. Let Ω = (0, 1)^2 be endowed with its Borel σ-field and set P to be the product Lebesgue measure. Let X_1(ω) = ω_1 for ω = (ω_1, ω_2) ∈ Ω. Let Φ denote the distribution function of the standard Gaussian distribution and denote by Φ^{−1} the inverse of Φ. If Z(ω) = ω_2 and f(x, z) = Φ^{−1}(z) + x, x ∈ R, z ∈ (0, 1), then it is easy to see that (X, Z) and f satisfy the conditions of Lemma 2.2 and the law of Z is the standard uniform distribution. But the conditions of Lemma 2.2 are also satisfied with Z(ω) = \sqrt{ω_2} and f(x, z) = Φ^{−1}(z^2) + x, in which case P(Z ≤ t) = t^2, t ∈ (0, 1), that is, the law of Z is the beta distribution with parameter (2, 1).
The indeterminacy of the law of Z is symptomatic of the lack of control of the intrinsic randomness assumed in our definition of stochastic models. But this is not an issue because our interest lies in the joint vector (X, f(X, Z)), the law of which is fully characterized by the conditions in Lemma 2.2. To each complete stochastic model there corresponds a unique law that all vectors (X, f(X, Z)) must have, regardless of the chosen representation. Therefore, the pair (X, f(X, Z)) can be used to define the pair (input, output) of a complete stochastic model, as done in Definition 2.3.

Definition 2.3. If (X, Z) and f satisfy the conditions in Lemma 2.2 then the pair (X, f(X, Z)) is called an observation of the complete stochastic model (P^∗, {Q_x}); the random variable X is called the input and f(X, Z) is called the output.
In sum, we have established the existence of random variables on a common probability space and a function f that characterize the statistical experiment that consists of drawing inputs and observing the outputs of a stochastic model. The set of assumptions used to represent outputs and inputs of a stochastic model is minimal: all we need is a family {Q_x} and a probability measure P^∗. We remark that the above formalism of stochastic models can be used to represent physical models [32] as well.
2.2. Links with the stochastic models and the sensitivity indices in [10]. In [10], the authors consider the model (X'(ω'), ϕ(X'(ω'), ω'')), ω' ∈ Ω', ω'' ∈ Ω'', where (Ω', F', P') and (Ω'', F'', P'') are probability spaces, X' = (X'_1, ..., X'_p) is a random vector on Ω' and ϕ is some function. They consider the sensitivity indices

S^{HAG}_j = \int_{Ω''} S_j(ω'') \, P''(dω''),   where   S_j(ω'') = \frac{Var E(ϕ(X', ω'') | X'_j)}{Var ϕ(X', ω'')};

above, the variances and the expectation are to be understood as integrals on Ω' with respect to P'.

One can choose a representation in Lemma 2.2 that corresponds to the models in [10]. In particular, one can recover the sensitivity indices S^{HAG}_j, j = 1, ..., p. Let us illustrate this with an example. Let (P^∗, {Q_x}) be a complete stochastic model and let X = (X_1, ..., X_p), Z and f be as in Lemma 2.2. Define

\tilde{S}^{HAG}_j = E\left[ \frac{Var(E[f(X, Z) | X_j, Z] \mid Z)}{Var(f(X, Z) \mid Z)} \right],   j = 1, ..., p.

Consider the model in Example 1.1 of [10], given by

(2.1)   ϕ(X'(ω'), ω'') = X'_1(ω') + X'_2(ω') ω'',

where the law of X'_1 is the uniform distribution on (0, 1), the law of X'_2 is the uniform distribution on (1, L + 1), L > 0, and P'' is the standard normal distribution on Ω'' = R. The indices in Example 1.1 of [10] are given by

S^{HAG}_1 = \int_{Ω''} \frac{1}{1 + L^2 (ω'')^2} \, P''(dω'') = \int_R \frac{1}{1 + L^2 w^2} \exp\left(-\frac{w^2}{2}\right) \frac{1}{\sqrt{2π}} \, dw

and S^{HAG}_2 = 1 − S^{HAG}_1.
We can build a probability space (Ω, F, P), a random vector (X, Z) and a function f such that \tilde{S}^{HAG}_1 = S^{HAG}_1, as shown in Example 2 below.

Example 2. Let us first extract the induced complete stochastic model. Set P^∗((0, t_1] × (1, t_2]) = t_1(t_2 − 1)/L for all 0 < t_1 < 1, 1 < t_2 < L + 1, L > 0, and Q_x(−∞, t] = Φ((t − x_1)/x_2) for all t ∈ R, where Φ(t) = \int_{−∞}^t (2π)^{−1/2} e^{−s^2/2} \, ds and x = (x_1, x_2) ∈ R × (0, ∞). Now it remains to choose a representation that fulfills the conditions in Lemma 2.2 and ensures that S^{HAG}_1 = \tilde{S}^{HAG}_1. Such a representation can easily be found. For instance, take Ω = (0, 1)^3 endowed with the product Lebesgue measure and put Z(ω) = ω_3, X_1(ω) = F_1^{−1}(ω_1) and X_2(ω) = F_2^{−1}(ω_2) for ω = (ω_1, ω_2, ω_3) ∈ Ω, where F_1(t_1) = t_1 for 0 < t_1 < 1 and F_2(t_2) = (t_2 − 1)/L for 1 < t_2 < L + 1. Finally take f(x, z) = Φ^{−1}(z) x_2 + x_1 for x_1 ∈ R, x_2 > 0 and z ∈ (0, 1). Then the conditions of Lemma 2.2 are fulfilled by construction and the detailed calculations in Appendix A show that S^{HAG}_1 = \tilde{S}^{HAG}_1.
In sum, the stochastic models in [10] can be expressed within the framework of Section 2.1. There is, however, a difference between [10] and Section 2.1. In [10], the function ϕ is fixed: it is given as part of the stochastic model. On our side, the function f is constructed from the probability measures that we are given in the first place. It is not unique. Consequently, it is unclear whether or not the indices \tilde{S}^{HAG}_j are uniquely determined.
3. The sensitivity indices and their estimators. In view of Section 2, we can assume that there are a random vector (X, Z) ∈ R^p × R with p + 1 mutually independent components on some probability space (Ω, F, P) and a real function f such that the pair (X, f(X, Z)) ∈ R^p × R represents a random observation (input, output) of the stochastic model of interest. To ensure the existence of the sensitivity indices, and later to derive theoretical results for the estimators, we need to assume the following: there exists some function F with E F(X)^8 < ∞ such that

(3.1)   |f(X, Z)| ≤ F(X)

almost surely. This assumption appears to be mild. In particular, every stochastic model with bounded outputs fulfills the condition.
3.1. Definition of the sensitivity indices. We define two kinds of sensitivity indices. The sensitivity indices of the first kind exploit the existence of the function f by applying the Sobol-Hoeffding decomposition to it directly. The sensitivity indices of the second kind result from an application of the Sobol-Hoeffding decomposition to the conditional expectation of f(X, Z) given X, which is a function of X alone. The indices of the second kind are those to which the estimators (1.4) mentioned in the Introduction converge.
3.1.1. Indices of the first kind. Applying the Sobol-Hoeffding decomposition to f yields

(3.2)   f(X, Z) − f_0 = f_1(X_1) + \cdots + f_p(X_p) + f_{p+1}(Z) + \cdots,

where f_0 = E f(X, Z), f_j(X_j) = E(f(X, Z) − f_0 | X_j), j = 1, ..., p, f_{p+1}(Z) = E(f(X, Z) − f_0 | Z) and "+ \cdots" stands for the interaction terms. Since X and Z are independent, we have Var f(X, Z) = Var f_1(X_1) + \cdots + Var f_p(X_p) + Var f_{p+1}(Z) + \cdots, which leads us to the indices in Definition 3.1.

Definition 3.1 (Sobol indices of the first kind). The Sobol indices of the first kind are defined as

S'_j = \frac{Var E(f(X, Z) | X_j)}{Var f(X, Z)},   j = 1, ..., p.

It is important to notice that the indices of the first kind depend on the law of (X, f(X, Z)) only and hence are uniquely determined. Note that total sensitivity indices could be defined as well, but it is unclear whether or not they depend on the chosen representation.
3.1.2. Indices of the second kind. Let g(X) := E(f(X, Z) | X) be the conditional expectation of the output of the stochastic model given the input. The object g is a function and the Sobol-Hoeffding decomposition can be applied to it, yielding

g(X) − g_0 = g_1(X_1) + \cdots + g_p(X_p) + \cdots,

where g_0 = E g(X), g_j(X_j) = E(g(X) − g_0 | X_j), j = 1, ..., p, and "+ \cdots" stands for the interaction terms. Since the components of X are independent, we have Var g(X) = Var g_1(X_1) + \cdots + Var g_p(X_p) + \cdots, leading to the indices in Definition 3.2.

Definition 3.2 (Sobol indices of the second kind). The Sobol indices of the second kind are defined as

S''_j = \frac{Var E(g(X) | X_j)}{Var g(X)} = \frac{Var E[E(f(X, Z) | X) | X_j]}{Var E(f(X, Z) | X)},   j = 1, ..., p.

The total sensitivity indices, defined by

(3.3)   S''_{T,j} = 1 − \frac{Var E(g(X) | X_1, \dots, X_{j−1}, X_{j+1}, \dots, X_p)}{Var g(X)},

j = 1, ..., p, are uniquely determined.
3.1.3. Comparison of the definitions. The sensitivity indices of the first kind provide more refined "first-order" information than the indices of the second kind. Examples 3 and 4 illustrate this.

Example 3. Let f(X, Z) = aX_1 + cX_2 φ(Z), where X_1, X_2, Z are independent standard normal variables, a, c are real coefficients and φ is a function such that E φ(Z) = 0. Then

S'_1 = \frac{a^2}{a^2 + c^2 E φ(Z)^2},   S'_2 = 0,   S''_1 = 1   and   S''_2 = 0.

According to the sensitivity indices of the second kind, X_1 has the same importance regardless of the value of its coefficient a, while the sensitivity indices of the first kind acknowledge that the importance of X_1 should depend on its coefficient. However, the sensitivity indices of the first kind cannot provide insight into the interactions between the inputs. For instance, if a is small then the sum S'_1 + S'_2 will be small and hence the contribution to the variance of the output must come from elsewhere. Perhaps it comes from the intrinsic stochasticity of the model or from the interactions.
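A one-line derivation of the values in Example 3 (a sketch, assuming a ≠ 0 so that the ratios are well defined):

\begin{align*}
E(f(X,Z)\mid X_1) &= aX_1 + c\,E(X_2)\,E\,\varphi(Z) = aX_1, &
E(f(X,Z)\mid X_2) &= c\,X_2\,E\,\varphi(Z) = 0,\\
\operatorname{Var} f(X,Z) &= a^2 + c^2\,E\,\varphi(Z)^2, &
g(X) = E(f(X,Z)\mid X) &= aX_1,
\end{align*}

so that S'_1 = a^2/(a^2 + c^2 E φ(Z)^2) and S'_2 = 0, while g depends on X_1 only, whence S''_1 = 1 and S''_2 = 0.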
Example 4 returns to the model (2.1).

Example 4. Let f(X, Z) = Φ^{−1}(Z) X_2 + X_1, where the law of X_1 and that of Z are the uniform distribution on (0, 1), the law of X_2 is the uniform distribution on (1, L + 1), L > 0, and Φ^{−1} denotes the inverse distribution function of the standard normal distribution. It is easy to see that S''_1 = 1 and S''_2 = 0. The detailed calculations in Appendix A show that S'_2 = 0 and

S'_1 = \frac{1}{4(L^2 + 3(L + 1)) + 1}.

As in Example 3, the sensitivity indices of the second kind do not depend on the coefficient L. The sensitivity indices of the first kind do depend on L, but note that S'_1 + S'_2 ≤ 1/13, indicating that most of the contribution to the output comes from the intrinsic randomness or the interactions.

In sum, both kinds of sensitivity indices provide useful insights although neither kind is perfect. The sensitivity indices of the second kind are good indices for doing a sensitivity analysis of the model averaged over the intrinsic randomness, but by doing so information may be lost. The sensitivity indices of the first kind provide more refined information about the individual contributions of the inputs, but the information is only partial because knowledge of the interactions and of the intrinsic randomness is lacking.
3.2. Construction of the estimators. We construct estimators for the indices in Definitions 3.1 and 3.2 by Monte-Carlo simulation. The input space is "explored" n times; at each exploration two independent input vectors are drawn, combined and passed to the stochastic model, which is run m times. The integer n is called the number of explorations and the integer m is called the number of repetitions. The couple (n, m) is called the design of the Monte-Carlo sampling scheme. The total number of calls to the stochastic model is mn(p + 1). The details are given in Algorithm 3.1.

Algorithm 3.1 Generate a Monte-Carlo sample
for i = 1 to n do
  draw two independent copies X^{(i)} = (X_1^{(i)}, ..., X_p^{(i)}), \tilde{X}^{(i)} = (\tilde{X}_1^{(i)}, ..., \tilde{X}_p^{(i)})
  for j = 0, 1, ..., p do
    for k = 1 to m do
      run the stochastic model at \tilde{X}_{-j}^{(i)} := (\tilde{X}_1^{(i)}, ..., \tilde{X}_{j−1}^{(i)}, X_j^{(i)}, \tilde{X}_{j+1}^{(i)}, ..., \tilde{X}_p^{(i)}) to get an output Y_j^{(i,k)}
    end for
  end for
end for
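A minimal Python sketch of Algorithm 3.1; the callables model(x, rng), returning one random output per call, and sample_inputs(n, rng) are illustrative assumptions.

import numpy as np

def monte_carlo_sample(model, sample_inputs, n, m, rng):
    # Algorithm 3.1: generate the outputs Y_j^{(i,k)} with mn(p+1) calls to the model.
    X = sample_inputs(n, rng)            # X^{(i)}, shape (n, p)
    X_tilde = sample_inputs(n, rng)      # independent copies
    p = X.shape[1]
    Y = np.empty((n, p + 1, m))          # Y[i, j, k] = Y_j^{(i,k)}, with j = 0 meaning X^{(i)} itself
    for i in range(n):
        for j in range(p + 1):
            # j = 0: run at X^{(i)}; j >= 1: run at X_tilde^{(i)} with the jth coordinate frozen to X^{(i)}
            x = X[i] if j == 0 else np.where(np.arange(p) == j - 1, X[i], X_tilde[i])
            for k in range(m):
                Y[i, j, k] = model(x, rng)
    return X, X_tilde, Y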
In the algorithm above, \tilde{X}_{-0}^{(i)} = X^{(i)} by convention. By assumption, the objects \tilde{X}^{(i)}, \tilde{X}_{-j}^{(i)} and Y_j^{(i,k)}, j = 0, ..., p, k = 1, ..., m, i = 1, ..., n, are random vectors such that the sets {\tilde{X}^{(i)}, \tilde{X}_{-j}^{(i)}, Y_j^{(i,k)} : j = 0, ..., p; k = 1, ..., m}, i = 1, ..., n, are i.i.d., X^{(i)} and \tilde{X}^{(i)} are independent, and P(\cap_{j=0}^p \cap_{k=1}^m \{Y_j^{(i,k)} ∈ B_j^{(k)}\} | X^{(i)}, \tilde{X}^{(i)}) = \prod_{j=0}^p \prod_{k=1}^m P(Y_j^{(i,k)} ∈ B_j^{(k)} | X^{(i)}, \tilde{X}^{(i)}) for all Borel sets B_j^{(k)} ∈ B. It is easy to see that these conditions characterize the joint law of the set {\tilde{X}_{-j}^{(i)}, Y_j^{(i,k)} : j = 0, ..., p; k = 1, ..., m; i = 1, ..., n}, that is, the inputs and the outputs of Algorithm 3.1.

In view of Section 2, assume without loss of generality that there exist a function f and random variables Z_j^{(i,k)}, j = 0, ..., p, k = 1, ..., m, i = 1, ..., n, such that Y_j^{(i,k)} = f(\tilde{X}_{-j}^{(i)}, Z_j^{(i,k)}), where all of the random vectors in the sets {\tilde{X}^{(i)}, X^{(i)}, Z_j^{(i,k)} : j = 0, ..., p; k = 1, ..., m}, i = 1, ..., n, are mutually independent and all of these sets are i.i.d. We shall use both the notations Y and f(X, Z) to denote the outputs.
369
With the above notation, the estimators (1.4) of the indices of the second kind
370 are rewritten 371 (3.4) bSj;n,m00 = 1 n Pn i=1 1 m Pm k=1Y (i,k) 0 1 m Pm k0=1Y (i,k0) j − 1 n Pn i=1 1 m Pm k=1Y (i,k) 0 2 1 n Pn i=1 1 m Pm k=1Y (i,k) 0 2 −1 n Pn i=1 1 m Pm k=1Y (i,k) 0 2 , 372
j = 1, . . . , p, which are indeed the empirical versions of the indices Sj00, since
373 Sj00=E g(X (1))g( eX(1) −j) − (E g(X(1)))2 E g(X(1))2− (E g(X(1)))2 374 =E E[f (X (1), Z(1,1) 0 )|X (1)] E[f ( eX(1) −j, Z (1,1) j )| eX (1) −j] − (E E[f (X(1), Z (1,1) 0 )|X (1)])2 E E[f (X(1), Z(1,1) 0 )|X(1)]2− (E E[f (X(1), Z (1,1) 0 )|X(1)])2 . (3.5) 375 376
As said in the Introduction, this estimator is used implicitly by practitioners but has
377
not been formally studied in the literature. A simplified version with m = n appears
378
in [13,14].
To estimate the sensitivity indices of the first kind, we exploit a formula similar to (3.5). Indeed, we have

(3.6)   S'_j = \frac{E f(X^{(1)}, Z_0^{(1,1)}) f(\tilde{X}_{-j}^{(1)}, Z_j^{(1,1)}) − (E f(X^{(1)}, Z_0^{(1,1)}))^2}{E f(X^{(1)}, Z_0^{(1,1)})^2 − (E f(X^{(1)}, Z_0^{(1,1)}))^2}
            = \frac{E\left( E[f(X^{(1)}, Z_0^{(1,1)}) | X^{(1)}] \, E[f(\tilde{X}_{-j}^{(1)}, Z_j^{(1,1)}) | \tilde{X}_{-j}^{(1)}] \right) − \left( E\, E[f(X^{(1)}, Z_0^{(1,1)}) | X^{(1)}] \right)^2}{E\, E[f(X^{(1)}, Z_0^{(1,1)})^2 | X^{(1)}] − \left( E\, E[f(X^{(1)}, Z_0^{(1,1)}) | X^{(1)}] \right)^2}.

Notice that the upper-left, upper-right and lower-right terms are identical to the upper-left, upper-right and lower-right terms in (3.5), respectively. The upper-left term is the only term that depends on j and, therefore, it is the only term that permits one to discriminate between any two indices of the same kind. For this reason, it is called the discriminator. It is denoted by D_j. Formula (3.6) yields the estimator

(3.7)   \hat{S}'_{j;n,m} = \frac{\frac{1}{n}\sum_{i=1}^n \left(\frac{1}{m}\sum_{k=1}^m Y_0^{(i,k)}\right)\left(\frac{1}{m}\sum_{k'=1}^m Y_j^{(i,k')}\right) − \left(\frac{1}{n}\sum_{i=1}^n \frac{1}{m}\sum_{k=1}^m Y_0^{(i,k)}\right)^2}{\frac{1}{n}\sum_{i=1}^n \frac{1}{m}\sum_{k=1}^m (Y_0^{(i,k)})^2 − \left(\frac{1}{n}\sum_{i=1}^n \frac{1}{m}\sum_{k=1}^m Y_0^{(i,k)}\right)^2}.
Since the estimators of the discriminator are identical, both kinds of sensitivity indices lead to the same estimated ranking of the inputs. All of the 2p estimators can be computed with mn(p + 1) runs of the stochastic model. In (3.7) and (3.4), if m = 1 and if the function f does in fact not depend on Z, then the estimators reduce to the Sobol estimators [29, 30] for deterministic models.
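Given the array Y[i, j, k] = Y_j^{(i,k)} produced by the sketch of Algorithm 3.1 above, both families of estimators share the same numerators (the estimated discriminators minus the squared mean) and differ only in the lower-left term; a sketch under the same illustrative naming assumptions:

import numpy as np

def sensitivity_estimators(Y):
    # Compute (3.7) and (3.4) from Y[i, j, k] = Y_j^{(i,k)} produced by Algorithm 3.1.
    Ybar = Y.mean(axis=2)                                  # (1/m) sum_k Y_j^{(i,k)}, shape (n, p+1)
    mean0 = Ybar[:, 0].mean()                              # overall mean of the Y_0 averages
    numer = (Ybar[:, 0:1] * Ybar[:, 1:]).mean(axis=0) - mean0**2   # shared numerators, one per j
    denom_first = (Y[:, 0, :]**2).mean() - mean0**2        # lower-left term of (3.7)
    denom_second = (Ybar[:, 0]**2).mean() - mean0**2       # lower-left term of (3.4)
    return numer / denom_first, numer / denom_second       # (estimates of S'_j, estimates of S''_j)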
4. Choosing between Monte-Carlo designs. The estimators in Section 3 depend on the design (n, m) of the Monte-Carlo sampling scheme. To estimate the sensitivity indices in Definition 3.1 and Definition 3.2, the stochastic model has to be called (p + 1)mn times.

It is reasonable to think of a sensitivity analysis as being done the following way. The total number of calls is set to a limit, say T. Then n and m are chosen so that T = (p + 1)mn. For instance, suppose that one cannot afford more than 150 calls to a model with two inputs. Then T = 150, p = 2 and one can choose any one of the columns in the following table:

n   50   25   10    5    2    1
m    1    2    5   10   25   50
Denote by div_p(T) the set of all divisors of T/(p + 1) between 1 and T/(p + 1). In the example above, div_2(150) = {1, 2, 5, 10, 25, 50}. There are as many designs as there are elements in the set div_p(T). Each one of those elements corresponds to a possible combination of n and m with which Algorithm 3.1 can be run. The resulting estimators require the same number of calls but do not perform equally well. The goal of this section is to find the "best" way to estimate the sensitivity indices.
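For concreteness, a small helper enumerating div_p(T); the function name is an illustrative choice.

def div_p(T, p):
    # All divisors m of T/(p+1); each m gives a design (n, m) with n = T/(p+1)/m.
    q = T // (p + 1)
    return [m for m in range(1, q + 1) if q % m == 0]

# Example from the text: T = 150 calls, p = 2 inputs
print(div_p(150, 2))    # [1, 2, 5, 10, 25, 50]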
4.1. Introducing the miss-ranking error and its bound. To compare the estimators, a measure of performance has to be defined. We shall consider the miss-ranking error (MRE), defined by

MRE = E \sum_{j=1}^p | \hat{R}_{j;n,m} − R_j |,

where R_j is the rank of D_j among D_1, ..., D_p, that is, R_j = \sum_{i=1}^p 1(D_i ≤ D_j), and \hat{R}_{j;n,m} is an estimator of R_j. Recall that D_1, ..., D_p are the upper-left terms in (3.6) and (3.5). They determine the ranks of the sensitivity indices. The MRE is small when one succeeds in ranking the inputs from the most to the least important, a task which is called "factors prioritization" in [27, p. 52]. For simplicity, we assume that the sensitivity indices, and hence the quantities D_1, ..., D_p, are distinct, so that they can be ranked unambiguously. This assumption is mild: even if D_j and D_{j'} differ only by a small amount, it is still satisfied. Recall that the ranks of the sensitivity indices of the first kind coincide with the ranks of the sensitivity indices of the second kind. Thus, the MRE permits us to find a unique solution for both kinds of sensitivity indices.

The MRE has a bound with interesting mathematical properties. Denote by MRE(T, m) the MRE based on T calls to the model with m repetitions, so that the number of explorations is T/(p + 1)/m. To shorten the notation, let \mathbf{X} = (X^{(1)}, \tilde{X}^{(1)}), Y_0 = f(X^{(1)}, Z_0^{(1,1)}) and Y_j = f(\tilde{X}_{-j}^{(1)}, Z_j^{(1,1)}).
Proposition 4.1. Let \hat{D}_{j;n,m}, j = 1, ..., p, be the upper-left term in (3.7) or (3.4) and put \hat{R}_{j;n,m} = \sum_{i=1}^p 1(\hat{D}_{i;n,m} ≤ \hat{D}_{j;n,m}). If D_1, ..., D_p are all distinct then

MRE(T, m) ≤ \frac{L}{nm} \left( m \sum_{j=1}^p Var(E[Y_0 Y_j | \mathbf{X}]) + \sum_{j=1}^p E[Var(Y_0 Y_j | \mathbf{X}) − Var(Y_0 | \mathbf{X}) Var(Y_j | \mathbf{X})] + \frac{1}{m} \sum_{j=1}^p E[Var(Y_0 | \mathbf{X}) Var(Y_j | \mathbf{X})] \right),

where

L = \frac{4(p − 1)}{\min_{j<j'} |D_j − D_{j'}|^2}.

The constant L tells us that the bound is smaller when the indices are well separated. The bound goes to zero when the number of explorations goes to infinity. This is true even if the number of repetitions is fixed. Most interestingly, the bound separates T and m: substituting T/(p + 1) for nm in the denominator of the bound, we get

(4.1)   MRE(T, m) ≤ \frac{1}{T} v(m),   m ∈ div_p(T),

where

v(m) = L(p + 1) \left( m \sum_{j=1}^p Var(E[Y_0 Y_j | \mathbf{X}]) + \sum_{j=1}^p E[Var(Y_0 Y_j | \mathbf{X}) − Var(Y_0 | \mathbf{X}) Var(Y_j | \mathbf{X})] + \frac{1}{m} \sum_{j=1}^p E[Var(Y_0 | \mathbf{X}) Var(Y_j | \mathbf{X})] \right).

Denote by m^†_T the element m ∈ div_p(T) that minimizes v(m). Taking m = m^†_T in (4.1), we get the bound

MRE(T, m^†_T) ≤ \frac{v(m^†_T)}{T} ≤ \frac{v(m)}{T},   for all m ∈ div_p(T).
Thus choosing m = m^†_T and n = T/(p + 1)/m^†_T in Algorithm 3.1 ensures that the MRE cannot exceed the least possible bound. The least possible bound v(m^†_T)/T is also called the best obtainable guarantee. However, m^†_T is unknown and must be estimated.
Remark 4.2. The choice of T, through the specification of div_p(T), will influence the quality of the bound. It is clear that choosing T/(p + 1) to be a prime number may not be a good idea because v(m^†_T) will be either v(1) or v(T/(p + 1)). On the contrary, choosing T/(p + 1) to be a factorial number, say k!, ensures many more choices (every m between 1 and k is then available).
4.2. A two-stage procedure to estimate the sensitivity indices. The results in Section 4.1 suggest a two-stage procedure to estimate the sensitivity indices. The procedure is given in Algorithm 4.1. The computational budget is split into two parts, K and T − K. Denote by m^†_{T−K} the element m ∈ div_p(T − K) that minimizes the function v(m). The first K calls to the model are used to estimate m^†_{T−K}. The last T − K calls to the model are used to estimate the sensitivity indices.

Algorithm 4.1 Estimate the sensitivity indices by a two-stage procedure
Stage 1. Choose an integer K such that K/(p + 1) and (T − K)/(p + 1) are integers also. Choose integers m_0 and n_0 such that K = m_0 n_0 (p + 1). Run Algorithm 3.1 with m = m_0 and n = n_0. Estimate m^†_{T−K} by an estimator \hat{m}^†_{T−K} ∈ div_p(T − K).
Stage 2. Run Algorithm 3.1 with m = \hat{m}^†_{T−K} and

n = \frac{T − K}{(p + 1)\, \hat{m}^†_{T−K}}.

Compute the sensitivity indices estimators (3.7) and (3.4).
In Algorithm 4.1 we need \hat{m}^†_{T−K}, an estimator of m^†_{T−K}. Let us build one. Let m^∗ be the minimizer of v seen as a function on the positive reals. Since v is convex, the minimizer is unique. It follows from (4.1) and Proposition 4.1 that

(4.2)   m^∗ := \sqrt{\frac{\sum_{j=1}^p E[Var(Y_0 | \mathbf{X}) Var(Y_j | \mathbf{X})]}{\sum_{j=1}^p Var(E[Y_0 Y_j | \mathbf{X}])}} = \sqrt{\frac{\sum_{j=1}^p ζ_{3,j}}{\sum_{j=1}^p ζ_{1,j}}},

where ζ_{3,j} = E[Var(Y_0 | \mathbf{X}) Var(Y_j | \mathbf{X})] and ζ_{1,j} = Var(E[Y_0 Y_j | \mathbf{X}]), j = 1, ..., p.

Let ϕ_T : (0, ∞) → div_p(T) be the function defined by ϕ_T(x) = 1 if 0 < x < 1, ϕ_T(x) = T/(p + 1) if x > T/(p + 1), and

ϕ_T(x) = \begin{cases} \lfloor x \rfloor_T & \text{if } \sqrt{\lfloor x \rfloor_T \lceil x \rceil_T} > x ≥ 1, \\ \lceil x \rceil_T & \text{if } \sqrt{\lfloor x \rfloor_T \lceil x \rceil_T} ≤ x ≤ \frac{T}{p+1}, \end{cases}

where

\lfloor x \rfloor_T = \max\{m ∈ div_p(T) : m ≤ x\},   \lceil x \rceil_T = \min\{m ∈ div_p(T) : m ≥ x\}.

The function ϕ_T is piecewise constant with discontinuity points at \sqrt{ij}, where i and j are two consecutive elements of div_p(T).
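A sketch of the rounding map ϕ_T, reusing the div_p helper sketched above (names are illustrative):

import math

def phi_T(x, T, p):
    # Round x > 0 to an element of div_p(T), using the geometric-mean rule defining phi_T.
    divisors = div_p(T, p)                     # increasing list of divisors of T/(p+1)
    if x < 1:
        return 1
    if x > T // (p + 1):
        return T // (p + 1)
    lo = max(m for m in divisors if m <= x)    # floor of x in div_p(T)
    hi = min(m for m in divisors if m >= x)    # ceiling of x in div_p(T)
    return lo if x < math.sqrt(lo * hi) else hi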
Proposition 4.3. If m^∗ > 0 then m^†_{T−K} = ϕ_{T−K}(m^∗). If, moreover, \lfloor m^∗ \rfloor_{T−K} \lceil m^∗ \rceil_{T−K} is not equal to (m^∗)^2, then the minimizer of v(m), m ∈ div_p(T − K), is unique.
Proposition 4.3 suggests that m^†_{T−K} can be estimated by applying the function ϕ_{T−K} to an estimate of m^∗. Thus, our problem of estimating m^†_{T−K} boils down to the problem of estimating m^∗. Let us find an estimator of m^∗. Remember that it has to be based on the first K = m_0 n_0 (p + 1) calls to the model. In view of (4.2), put

(4.3)   \hat{m}^∗_K := \sqrt{\frac{\sum_{j=1}^p \hat{ζ}_{3,j}}{\sum_{j=1}^p \hat{ζ}_{1,j}}},

where

(4.4)-(4.7)   \hat{ζ}_{3,j} = \frac{1}{n_0} \sum_{i=1}^{n_0} \left[ \frac{1}{m_0} \sum_{k_1=1}^{m_0} f(X^{(i)}, Z_0^{(i,k_1)})^2 \right] \left[ \frac{1}{m_0} \sum_{k_2=1}^{m_0} f(\tilde{X}_{-j}^{(i)}, Z_j^{(i,k_2)})^2 \right]
              + \frac{1}{n_0} \sum_{i=1}^{n_0} \left[ \frac{1}{m_0} \sum_{k_1=1}^{m_0} f(X^{(i)}, Z_0^{(i,k_1)}) \right]^2 \left[ \frac{1}{m_0} \sum_{k_2=1}^{m_0} f(\tilde{X}_{-j}^{(i)}, Z_j^{(i,k_2)}) \right]^2
              − \frac{1}{n_0} \sum_{i=1}^{n_0} \left[ \frac{1}{m_0} \sum_{k_1=1}^{m_0} f(X^{(i)}, Z_0^{(i,k_1)}) \right]^2 \left[ \frac{1}{m_0} \sum_{k_2=1}^{m_0} f(\tilde{X}_{-j}^{(i)}, Z_j^{(i,k_2)})^2 \right]
              − \frac{1}{n_0} \sum_{i=1}^{n_0} \left[ \frac{1}{m_0} \sum_{k_1=1}^{m_0} f(X^{(i)}, Z_0^{(i,k_1)})^2 \right] \left[ \frac{1}{m_0} \sum_{k_2=1}^{m_0} f(\tilde{X}_{-j}^{(i)}, Z_j^{(i,k_2)}) \right]^2,

and

(4.8)-(4.9)   \hat{ζ}_{1,j} = \frac{1}{n_0} \sum_{i=1}^{n_0} \left[ \frac{1}{m_0} \sum_{k=1}^{m_0} f(X^{(i)}, Z_0^{(i,k)}) f(\tilde{X}_{-j}^{(i)}, Z_j^{(i,k)}) \right]^2 − \left[ \frac{1}{n_0} \sum_{i=1}^{n_0} \frac{1}{m_0} \sum_{k=1}^{m_0} f(X^{(i)}, Z_0^{(i,k)}) f(\tilde{X}_{-j}^{(i)}, Z_j^{(i,k)}) \right]^2.

Notice that \hat{ζ}_{1,j} ≥ 0 and \hat{ζ}_{3,j} ≥ 0, so that \hat{m}^∗_K ≥ 0. If m_0 = 1 then \hat{ζ}_{3,j} = 0 and hence \hat{m}^∗_K = 0.
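A sketch of the estimator (4.3), computed from a Stage-1 output array Y of shape (n_0, p + 1, m_0) as returned by the Algorithm 3.1 sketch above. It uses the fact that the four terms (4.4)-(4.7) simplify algebraically to a product of within-exploration empirical variances.

import numpy as np

def m_star_hat(Y):
    # Estimate m* by (4.3): zeta3_j is the mean product of within-exploration variances,
    # zeta1_j the variance over explorations of the within-exploration mean of Y_0 * Y_j.
    n0, p_plus_1, m0 = Y.shape
    Y0 = Y[:, 0, :]                                   # outputs at X^{(i)}
    var0 = Y0.var(axis=1)                             # (1/m0)-normalized within-exploration variances
    zeta3 = zeta1 = 0.0
    for j in range(1, p_plus_1):
        Yj = Y[:, j, :]
        zeta3 += np.mean(var0 * Yj.var(axis=1))       # (4.4)-(4.7) in collapsed form
        cross = (Y0 * Yj).mean(axis=1)                # (1/m0) sum_k Y_0^{(i,k)} Y_j^{(i,k)}
        zeta1 += cross.var()                          # (4.8)-(4.9)
    return np.sqrt(zeta3 / zeta1)                     # equals 0 when m0 = 1, as noted in the text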
The estimator \hat{m}^∗_K is consistent and asymptotically normal under some conditions on the rates of n_0 and m_0.

Theorem 4.4. Assume (3.1) holds. Let n_0 → ∞. If m_0 is fixed then

\sqrt{n_0} \left( \hat{m}^∗_K − m^∗ + \frac{C}{m_0} + ε_{m_0} \right) \overset{d}{\to} N(0, σ^2_{m_0}),

where C is some constant, ε_{m_0} = C_1/m_0^2 + \cdots + C_N/m_0^{N+1} for some constants C_1, ..., C_N and σ^2_{m_0} is some variance depending on m_0. If m_0 → ∞ then the above display with ε_{m_0} = o(1/m_0) and σ_{m_0} replaced by \lim_{m_0→∞} σ_{m_0} is true.

Theorem 4.4 shows that \hat{m}^∗_K is asymptotically biased. The bias is polynomial in 1/m_0. Corollary 4.5 shows that letting m_0 → ∞ suffices to get the consistency of \hat{m}^∗_K, but to get a central limit theorem centered around m^∗ it is furthermore needed that \sqrt{n_0}/m_0 → 0.

Corollary 4.5. Assume (3.1) holds. Let n_0 → ∞ and m_0 → ∞. Then \hat{m}^∗_K \overset{P}{\to} m^∗. If, moreover, \sqrt{n_0}/m_0 → 0, then

\sqrt{n_0} (\hat{m}^∗_K − m^∗) \overset{d}{\to} N(0, \lim_{m_0→∞} σ^2_{m_0}).
Now we have everything that is needed to estimate m^†_{T−K}. Put \hat{m}^†_{T−K} = ϕ_{T−K}(\hat{m}^∗_K). Proposition 4.6 states that \hat{m}^†_{T−K} and m^†_{T−K} are equal with probability going to one.

Proposition 4.6. Assume (3.1) holds. Let n_0 → ∞ and m_0 → ∞. Then

P(\hat{m}^†_{T−K} = m^†_{T−K}) → 1.

All the details of Algorithm 4.1 have been given.
4.3. Performance. To get some insight into the performance of Algorithm 4.1, we look at the estimators produced in Stage 2, which are built with T − K calls to the model with \hat{m}^†_{T−K} repetitions and (T − K)/(p + 1)/\hat{m}^†_{T−K} explorations. Let \hat{D}_{j;\hat{n},\hat{m}}, j = 1, ..., p, be the discriminators of those estimators, so that \hat{n} = (T − K)/(p + 1)/\hat{m}^†_{T−K} and \hat{m} = \hat{m}^†_{T−K}. Let \hat{R}_{j;\hat{n},\hat{m}} be the rank of \hat{D}_{j;\hat{n},\hat{m}} among \hat{D}_{1;\hat{n},\hat{m}}, ..., \hat{D}_{p;\hat{n},\hat{m}}. The conditional expectation of \sum_{j=1}^p |\hat{R}_{j;\hat{n},\hat{m}} − R_j| given the outputs produced in Stage 1 is equal to MRE(T − K, \hat{m}^†_{T−K}), the MRE based on T − K calls to the model with \hat{m}^†_{T−K} repetitions and (T − K)/(p + 1)/\hat{m}^†_{T−K} explorations. Recall that the estimator \hat{m}^†_{T−K} is computed with K calls to the model and that, by construction, \hat{m}^†_{T−K} ∈ div_p(T − K). Equation (4.1) yields

(4.10)   MRE(T − K, \hat{m}^†_{T−K}) ≤ \frac{1}{T − K} v(\hat{m}^†_{T−K}),

where the inequality above is to be understood almost surely, since both sides of the equation are random variables.
We shall show that, under conditions on T and K, the bound in (4.10) is in fact close to the best obtainable guarantee v(m^†_T)/T. Suppose for simplicity that div_p(T − K) ⊂ div_p(T). Since \hat{m}^†_{T−K} ∈ div_p(T − K) by construction, we have \hat{m}^†_{T−K} ∈ div_p(T) and therefore v(m^†_T)/T ≤ v(\hat{m}^†_{T−K})/(T − K) by definition of m^†_T. The left-hand side is the best guarantee we can hope for and corresponds to the case where K = 0 in Algorithm 4.1. It can be interpreted as the guarantee obtained if we knew the value of m^∗ in (4.2), for in that case we would not need to spend budget in Stage 1 of Algorithm 4.1. We proceed in two steps.
Theorem 4.7. Assume that the conditions of Proposition 4.6 are fulfilled. Suppose furthermore that K → ∞ such that K/T → 0. Then

\frac{1}{T − K} v(\hat{m}^†_{T−K}) = \frac{1}{T} v(m^†_{T−K}) (1 + o_P(1)).

Theorem 4.7 holds without the condition div_p(T − K) ⊂ div_p(T). Imposing this condition, we get Corollary 4.8 below.

Corollary 4.8. If, in addition to the conditions of Theorem 4.7, div_p(T − K) ⊂ div_p(T), then

MRE(T − K, \hat{m}^†_{T−K}) ≤ \frac{v(m^†_T)}{T} (1 + o_P(1)).

The result of Corollary 4.8 easily follows from (4.10) and Theorem 4.7 because m^†_{T−K} = m^†_T as soon as (T − K)/(p + 1) > m^†_{T−K}, which happens eventually as T and K go to infinity because the function v is convex.
5. Asymptotic normality of the sensitivity indices estimators. The sensitivity indices estimators of Section 3.2 depend on both m and n. It is clear that n should go to infinity to get central limit theorems. It may be less clear, however, whether or not m should go to infinity as well. The answer depends on the kind of sensitivity index we are looking at.

Two frameworks are considered:
• n → ∞ and m is fixed;
• n → ∞ and m → ∞.

In the second framework m = m_n is a sequence indexed by n that goes to infinity as n goes to infinity. Denote by S' (resp. S'') the column vector with coordinates S'_j (resp. S''_j), j = 1, ..., p, and denote by \hat{S}'_{n,m} (resp. \hat{S}''_{n,m}) the column vector with coordinates \hat{S}'_{j;n,m} given in (3.7) (resp. \hat{S}''_{j;n,m} given in (3.4)). Theorem 5.1 below states that the joint vector (\hat{S}'^{\top}_{n,m}, \hat{S}''^{\top}_{n,m})^{\top} is asymptotically normal.
Theorem 5.1. Assume (3.1) holds. Let n → ∞. If m is fixed then

\sqrt{n} \begin{pmatrix} \hat{S}'_{n,m} − S' \\[4pt] \hat{S}''_{n,m} − S'' \left[ 1 − \dfrac{E\, Var(f(X,Z)|X)}{E\, Var(f(X,Z)|X) + m\, Var\, E(f(X,Z)|X)} \right] \end{pmatrix} \overset{d}{\to} N(0, Ξ_m),

for some nonnegative definite matrix Ξ_m of size 2p × 2p. If m → ∞ then, elementwise, \lim_{m→∞} Ξ_m exists and the above display with Ξ_m replaced by \lim_{m→∞} Ξ_m is true.
A blockwise reading of Theorem 5.1 shows that the behaviors of \hat{S}'_{n,m} and \hat{S}''_{n,m} differ. While \hat{S}'_{n,m} is asymptotically unbiased even if m is kept fixed, \hat{S}''_{n,m} is asymptotically biased in general, even if m goes to infinity. The estimator \hat{S}''_{n,m} under-estimates S''. The bias, given by

S'' \, \frac{E\, Var(f(X, Z) | X)}{E\, Var(f(X, Z) | X) + m\, Var\, E(f(X, Z) | X)},

is null whenever f actually does not depend on Z, and large whenever the stochastic model is highly stochastic.
586
estimator to be concentrated around the wrong target.
587
Corollary 5.2. Assume (3.1) holds. Let n → ∞. If m → ∞ such that √ n/m → 588 0 then 589 √ nSb00n,m− S00 d → N (0, Ξ22), 590
where Ξ22 is the lower-right block of the matrix limm→∞Ξm given in Theorem5.1. 591
The difference between bS0n,mand bS00n,mis due to the difference between the
lower-592
left terms in (3.7) and (3.4). While the lower-left term in (3.7) is unbiased for all n
593
and m, the lower-left term in (3.4) has a bias depending on m which propagates to the
594
estimator of the sensitivity indices. (The calculations are carried out in AppendixD.)
595
From a statistical perspective, it is more difficult to estimate the sensitivity indices of
596
the second kind than to estimate the sensitivity indices of the first kind. To estimate
597
the former, one needs to repeat the model many times. To estimate the later, this is
598
not necessary.
599
6. Numerical tests. Section 6.1 illustrates how the MRE responds to a change in the Monte-Carlo design. In Section 6.1 the total budget T is kept fixed. Section 6.2 illustrates how the sensitivity indices estimators behave asymptotically. In Section 6.2 the total budget T increases.
6.1. Comparison of Monte-Carlo designs. The effect of the number of repetitions on the sensitivity indices estimators and the effect of the calibration in the two-stage procedure are examined in two kinds of experiments: the "direct" experiments and the "calibration" experiments.

In the direct experiments, the sensitivity indices are estimated directly with the given number of repetitions. Increasing numbers of repetitions m are tested. (Since the budget is fixed, this goes with decreasing numbers of explorations.) For each m, the mean squared errors (MSEs), given by E \sum_{j=1}^p (\hat{S}'_{j;n,m} − S'_j)^2 and E \sum_{j=1}^p (\hat{S}''_{j;n,m} − S''_j)^2, are estimated with replications. They are also split into the sum of the squared biases and the sum of the variances to get further insight into the behavior of the estimators. The MREs are estimated as well. A normalized version is considered: it is the MRE divided by the number of variables. For models with two inputs, the normalized MRE is interpreted directly as the probability that the two inputs are ranked incorrectly.

In the calibration experiments, the sensitivity indices are estimated with the two-stage procedure, the results of which depend on the calibration parameters K and m_0. Various calibration parameters are tested to see their effect on the MRE. The budgets for the direct experiments and the calibration experiments are the same so that the numbers can be compared. In particular, the direct experiments correspond to the case K = 0 in the calibration experiments.
A linear model of the form Y = X_1 + βX_2 + σZ, where X_1, X_2, Z are standard normal random variables and β, σ are real coefficients, has been considered because the sensitivity indices are explicit and hence the performance of the estimators can be evaluated easily. The quantity m^∗ is explicit: the formula is given in Appendix E.
6.1.1. High noise context. The coefficients are β = 1.2 and σ = 4. The sensitivity indices are S'_1 = 0.05, S'_2 = 0.08, S''_1 = 0.41 and S''_2 = 0.59. The real m^∗ is about 5.8. The total budget is T = 3 × 500 = 1500 and hence div_2(1500) = {1, 2, 4, 5, 10, 20, 25, 50, 100, 125, 250, 500}. The integer m^†_{1500} is equal to ϕ_{1500}(m^∗) = 5. Since the budget is kept fixed, the corresponding numbers of explorations are, respectively, 500, 250, 125, 100, 50, 25, 20, 10, 5, 4, 2, 1. The number of replications is 1500.

Table 1: Normalized MRE in the linear model with high noise for various calibrations: K/(p + 1) = 50, 100, 200, 400 and m_0 = 2, 5, 10, 20, ... For instance, for K/(p + 1) = 200 = m_0 n_0, the normalized MRE is available for m_0 = 2, 5, 10, 20, 40, 100.

K/3     m_0=2   m_0=5   m_0=10  m_0=20  n_0=20  n_0=10  n_0=5   n_0=2
400     0.43    0.42    0.42    -       -       0.42    0.39    0.40
200     0.38    0.39    0.37    -       -       0.35    0.35    0.34
100     0.36    0.37    -       -       -       -       0.32    0.30
 50     0.39    0.33    -       -       -       -       0.33    0.31
The results of the direct experiment are given in Figure 1 for m = 1, 2, 4, 5, 10, 20, 25. The MSE of the first kind does not vary with the number of repetitions and is much lower than the MSE of the second kind, see (c). The estimators of the second kind are highly biased for small numbers of repetitions (a) and they have a higher variance for larger numbers of repetitions (b). The fact that the bias is high for small numbers of repetitions agrees with the theory, according to which the bias should vanish as m goes to infinity. Overall, the sensitivity indices of the second kind seem to be much harder to estimate than the indices of the first kind, the estimators of which have a negligible bias and a very small variance whatever the number of repetitions.

According to Figure 1 (c), the normalized MRE curve has a banana shape with a minimum of slightly less than 30% reached around m ∈ {5, 10} and endpoints with a value of about 35%. A value of 30% means that the probability of ranking the inputs correctly is about 70%. The region of observed optimal performance, m ∈ {5, 10}, coincides with m^†_{1500} = 5, the point at which the bound is minimal.
The results of the calibration experiment are given in Table 1 for the normalized MRE. The lowest MREs are reached at the bottom right of the table, with values corresponding to 2 ≤ m ≤ 10 in Figure 1 (c). Optimal performance is reached with very few explorations in the first stage of the two-stage procedure. In this case, the estimator \hat{m}^∗_K has a small bias but a high variance. It seems to be better than an estimator with a small variance but a large bias. This might be explained by the low curvature of the MRE curve.
6.1.2. Low noise context. The coefficients are β = 1.2 and σ = 0.9. The sensitivity indices are S'_1 = 0.31, S'_2 = 0.44, S''_1 = 0.41 and S''_2 = 0.59. The real m^∗ is about 0.30 and hence the integer m^†_{1500} is equal to 1. As expected, these numbers are smaller than the ones found in the high noise context. The total budget is T = 3 × 500 = 1500. The number of replications is 500.

The results for the direct experiment are given in Figure 2. The MSE of the first kind increases with the number of repetitions, see (c): this is due to the increase of the variance (b), while the bias is negligible (a). As in the high noise context, the estimators of the second kind have a decreasing bias and an increasing variance, although the decrease of the bias is of much smaller magnitude. This agrees with the theory, where we have seen that, for the sensitivity indices of the second kind, the biases of the estimators are small when the noise of the model is low.
Fig. 1: Sum of squared biases (a), sum of variances (b) and errors (c) of the sensitivity indices estimators for the linear model in the high noise setting, plotted against the number of repetitions m. Confidence intervals of level 95% are added in (c).
In Figure 2 (c), the normalized MRE varies a lot. It increases from about 2% at m = 1 to 30% at m = 25. Thus, unlike in the high noise setting, choosing a good number of repetitions is important. The best performance is achieved at m = 1, which coincides with the minimizer m^†_{1500} = 1 of the bound.

The results of the calibration experiment for the normalized MRE are given in Table 2. The best performance is reached at the bottom left of the table, with numbers that correspond to the optimal performance in Figure 2 (c). Moreover, notice that a large spectrum of calibration parameters (K, m_0) yields low errors.
Fig. 2: Sum of squared biases (a), sum of variances (b) and errors (c) of the sensitivity indices estimators for the linear model in the low noise context, plotted against the number of repetitions m. Confidence intervals of level 95% are added in (c).
K/3     m_0=2   m_0=5   m_0=10  m_0=20  n_0=20  n_0=10  n_0=5   n_0=2
400     0.18    0.15    0.17    -       -       0.16    0.18    0.20
200     0.05    0.04    0.04    -       -       0.06    0.05    0.07
100     0.02    0.04    -       -       -       -       0.04    0.04
 50     0.03    0.02    -       -       -       -       0.02    0.04

Table 2: Normalized MRE in the linear model with low noise for various calibrations: K/(p + 1) = 50, 100, 200, 400 and m_0 = 2, 5, 10, 20, ... For instance, for K/(p + 1) = 200 = m_0 n_0, the normalized MRE is available for m_0 = 2, 5, 10, 20, 40, 100.
6.2. Asymptotic behavior of the sensitivity indices estimators. To illustrate the asymptotic behavior of the sensitivity indices estimators, Sobol's g-function, a benchmark in sensitivity analysis [25, 20], is considered. Sobol's g-function is given by

g(U_1, \dots, U_{p+1}) = \prod_{j=1}^{p+1} \frac{|4U_j − 2| + a_j}{1 + a_j},

where the a_j are nonnegative and the U_j are independent standard uniform random variables. The smaller a_j, the more important U_j is. Elementary calculations show that the first-order Sobol index associated with U_j is given by

S_j^{(a_1,\dots,a_{p+1})} = \frac{1}{3(1 + a_j)^2} \left[ −1 + \prod_{j'=1}^{p+1} \frac{4/3 + a_{j'}^2 + 2a_{j'}}{(1 + a_{j'})^2} \right]^{-1}.
To build a stochastic model out of Sobol's g-function, we let one of the U_j play the role of Z. For instance, if U_i, 1 ≤ i ≤ p + 1, were to play this role, then the stochastic model would be

(6.1)   Y = f(X_1, \dots, X_p, Z) = g(X_1, \dots, X_{i−1}, Z, X_i, \dots, X_p).

Of course Y and f above depend on i. In the rest of this section we choose arbitrarily i = 2 and p = 4.

The Sobol indices of the first and of the second kind (in the sense of Definitions 3.1 and 3.2) are then easily seen to be

S'_j = \begin{cases} S_j^{(a_1,\dots,a_{p+1})} & \text{if } 1 ≤ j ≤ i − 1, \\ S_{j+1}^{(a_1,\dots,a_{p+1})} & \text{if } i ≤ j ≤ p, \end{cases}

and S''_j = S_j^{(b_{i,1},\dots,b_{i,p})}, where

b_{i,j} = \begin{cases} a_j & \text{if } 1 ≤ j ≤ i − 1, \\ a_{j+1} & \text{if } i ≤ j ≤ p. \end{cases}
For each kind of Sobol index, we produced 500 estimates of the p Sobol indices and computed the values of the mean squared error (MSE) by averaging over the 500 replications and summing over the p indices. We tested n = 100, 500, 2500 and m = 1, 10, 100.

The MSEs are shown in Figure 3. Let us look at 3a. As n increases, the decrease is linear for each m. This indicates that the MSEs go to zero at a polynomial rate, even if m is fixed (look at the line m = 1). This agrees with the theoretical results of Section 5. The picture is different for the estimator of the Sobol indices of the second kind. In 3b, the curve for m = 1 is not a straight line, indicating that the MSE may not go to zero. Indeed, the MSE for fixed m is not expected to go to zero because of the bias depending on m. To make the MSE go to zero, one has to force m to go to infinity.

Figure 4, which shows the distribution of the estimates of the index associated with X_1, better explains this phenomenon. Here the bias is apparent for m = 1 and vanishes as m goes to infinity. The bias for the indices associated with the other inputs is not as large (not shown here).
Fig. 3: MSEs for the Sobol index estimators of the first kind (a) and of the second kind (b), for n = 100, 500, 2500 and m = 1, 10, 100 (logarithmic scale).

Fig. 4: Boxplots of the estimates of the Sobol index of the second kind associated with X_1, for n = 100, 500, 2500 and m = 1, 10, 100. The red horizontal line is the truth.
7. Conclusion. The practical method that consists of repeating the stochastic model at each exploration of the input space was analysed in the context of global sensitivity analysis. To find a tradeoff between the number of explorations n and the number of repetitions m, a bound on the miss-ranking error (MRE) was found and minimized, leading to a solution in closed form. A two-step procedure was implemented to estimate the sensitivity indices. It was shown to have good asymptotic properties. Two sensitivity indices were considered. The sensitivity index of the first kind results from the existence of a function that links the output, the inputs and some random noise in stochastic models defined through probability measures. The sensitivity index of the second kind is the population version of the estimator (1.4). An asymptotic analysis of the estimators was conducted. It was found that the estimators for the indices of the second kind may be asymptotically biased if m goes to infinity too slowly, while the estimators for the indices of the first kind are asymptotically unbiased even if m remains fixed. To test the theory, simulation experiments were conducted and the bias of the sensitivity estimator of the second kind was confirmed. Optimal compromises between repetitions and explorations were identified and compared with the output of the two-stage procedure.

This work opens many research directions. First, the sensitivity estimators of the two stages could be aggregated to build estimators with a lower variance. Second, other methods might be developed to optimize the Monte-Carlo sampling scheme. For instance the MSE might be approximated or asymptotic variance-covariance matrices might be minimized. Third, multilevel Monte-Carlo sampling schemes might be considered to alleviate the bias issue. Fourth, a finite-sample analysis could be conducted to get insight into the tradeoff K is subjected to. Fifth, since the bias is known, it could be estimated to build bias-corrected sensitivity indices estimators. Sixth, the problem of choosing a number of calls with many divisors must be addressed. It may be worth calling the model a bit less often if this permits a better set div_p(T). Seventh, the connection between our representation of stochastic models and that of [10] could be investigated further.
Appendix A. Calculations of some sensitivity indices.

A.1. Calculations for \tilde{S}^{HAG}_1 in Example 2. We have

\tilde{S}^{HAG}_1 = E\left[ \frac{Var(E[f(X, Z) | X_1, Z] \mid Z)}{Var(f(X, Z) \mid Z)} \right] = \int_Ω \frac{Var(E[f(X, Z) | X_1, Z] \mid Z)}{Var(f(X, Z) \mid Z)} \, dP.

Since the term inside the integral is a function of Z and the law of Z is the standard uniform distribution, a change of measures yields

\tilde{S}^{HAG}_1 = \int_{(0,1)} \frac{Var(E[f(X, z) | X_1, Z = z] \mid Z = z)}{Var(f(X, z) \mid Z = z)} \, dz = \int_{(0,1)} \frac{Var(E[f(X, z) | X_1])}{Var(f(X, z))} \, dz.

It remains to compute the ratio inside the integral. We have

Var(f(X, z)) = Var(Φ^{−1}(z) X_2 + X_1) = Φ^{−1}(z)^2 Var(X_2) + Var(X_1) = Φ^{−1}(z)^2 \frac{L^2}{12} + \frac{1}{12},

and

Var(E[f(X, z) | X_1]) = Var(E[Φ^{−1}(z) X_2 + X_1 | X_1]) = Var(Φ^{−1}(z) E[X_2 | X_1] + E[X_1 | X_1]) = Var(Φ^{−1}(z) E[X_2] + X_1) = Var(X_1) = \frac{1}{12},

and hence

\tilde{S}^{HAG}_1 = \int_{(0,1)} \frac{1}{Φ^{−1}(z)^2 L^2 + 1} \, dz = \int_{−∞}^{∞} \frac{1}{w^2 L^2 + 1} \frac{1}{\sqrt{2π}} e^{−w^2/2} \, dw.
A.2. Calculations for S'_1 in Example 4. The sensitivity index of the first kind associated with the first input is given by

S'_1 = \frac{Var[E(X_1 + X_2 Φ^{−1}(Z) | X_1)]}{Var[X_1 + X_2 Φ^{−1}(Z)]}.

The numerator is given by Var[E(X_1 + X_2 Φ^{−1}(Z) | X_1)] = Var[X_1 + E(X_2 Φ^{−1}(Z))] = Var[X_1] = 1/12. The denominator is given by Var[X_1 + X_2 Φ^{−1}(Z)] = Var[X_1] + Var[X_2 Φ^{−1}(Z)] = 1/12 + Var[X_2 Φ^{−1}(Z)], where

Var[X_2 Φ^{−1}(Z)] = Var[E(X_2 Φ^{−1}(Z) | Z)] + E(Var[X_2 Φ^{−1}(Z) | Z])
                   = Var\left[ Φ^{−1}(Z) \left( \frac{L}{2} + 1 \right) \right] + \int_0^1 Φ^{−1}(z)^2 Var[X_2] \, dz
                   = \left( \frac{L}{2} + 1 \right)^2 + \frac{L^2}{12},

so that

S'_1 = \frac{1/12}{1/12 + (L/2 + 1)^2 + L^2/12} = \frac{1}{4(L^2 + 3(L + 1)) + 1}.

Appendix B. Proofs.
B.1. Proof of Lemma 2.2. Since P^∗ is a product probability measure, we can write P^∗ = ⊗_{j=1}^p P^∗_j. Let Ω = (0, 1)^{p+1} be endowed with its Borel σ-field and let P be the product Lebesgue measure λ^{⊗(p+1)}. If F_j denotes the distribution function corresponding to P^∗_j then, for ω = (ω_1, ..., ω_{p+1}) ∈ Ω, put X_j(ω) = F_j^{←}(ω_j) := \inf\{x_j ∈ R : F_j(x_j) ≥ ω_j\} for all j = 1, ..., p and Z(ω) = ω_{p+1}. Take f(x, z) = F_x^{←}(z) := \inf\{t ∈ R : F_x(t) ≥ z\}, z ∈ (0, 1), where F_x is the cumulative distribution function associated with Q_x. Standard probability techniques show that f(x, Z) is measurable for every x. Moreover, for every t ∈ R,

P(f(x, Z) ≤ t) = P(Z ≤ F_x(t)) = λ^{⊗(p+1)}\{ω ∈ Ω : ω_{p+1} ≤ F_x(t)\} = λ(0, F_x(t)] = F_x(t).

Finally, by the same token,

P(X_1 ≤ t_1, \dots, X_p ≤ t_p, Z ≤ t_{p+1}) = P\{ω : ω_1 ≤ F_1(t_1), \dots, ω_p ≤ F_p(t_p), ω_{p+1} ≤ t_{p+1}\} = t_{p+1} \prod_{j=1}^p F_j(t_j).

The proof is complete.

Proof of Proposition 4.1. Assume without loss of generality that D_1 < \cdots < D_p. We first prove the following lemma. For convenience, the subscripts n and m are left out.