Adaptive surrogate modeling by ANOVA and sparse polynomial dimensional decomposition for global sensitivity analysis in fluids simulation


HAL Id: hal-01178398

https://hal.inria.fr/hal-01178398

Submitted on 20 Jul 2015


Kunkun Tang, Pietro Marco Congedo, Remi Abgrall

To cite this version:

Kunkun Tang, Pietro Marco Congedo, Remi Abgrall. Adaptive surrogate modeling by ANOVA and sparse polynomial dimensional decomposition for global sensitivity analysis in fluids simulation. [Research Report] RR-8758, Inria Bordeaux Sud-Ouest. 2015. ⟨hal-01178398⟩

ISSN 0249-6399   ISRN INRIA/RR--8758--FR+ENG

RESEARCH REPORT N° 8758

June 2015

Project-Team Cardamom


RESEARCH CENTRE BORDEAUX – SUD-OUEST

200 avenue de la Vieille Tour

Kunkun Tang∗, Pietro M. Congedo∗, Rémi Abgrall†

Research Report n° 8758, June 2015, 32 pages

∗ INRIA Bordeaux Sud-Ouest, Team Cardamom, 200 avenue de la Vieille Tour, 33405 Talence, France

† Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland

moderate to large number of input random variables. Owing to the close structural link between the PDD and the Analysis of Variance (ANOVA) approach, PDD provides a simpler and more direct evaluation of the Sobol’ sensitivity indices than the Polynomial Chaos (PC) expansion. Unfortunately, the number of PDD terms grows exponentially with the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. To address this curse of dimensionality, this work proposes variance-based adaptive strategies that build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique, especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation always contains few terms, so that the cost of repeatedly solving the linear systems of the least-squares regression problem is negligible. The final sparse PDD representation is much smaller than the full expansion, since only significant terms are eventually retained. Consequently, far fewer calls to the deterministic model are required to compute the final PDD coefficients.

Key-words: Uncertainty quantification; Global sensitivity analysis; Analysis of Variance (ANOVA); Polynomial dimensional decomposition (PDD); Regression approach; Adaptive sparse polynomial surrogate model; Atmospheric reentry


1 Introduction

Compared to local sensitivity analysis methods, global sensitivity analysis (GSA) has the advantage of taking into account the overall influence of input parameters and their interactions on the output quantity of interest, by considering the entire input space rather than a specific nominal point (see for example [8]).

However, the main difficulty encountered when employing global methods is the high computational cost, since Monte Carlo simulation (MC) or a quasi-Monte Carlo method (QMC) is usually applied to estimate the sensitivity indices. For complex real-life problems, relying upon MC or QMC can be very expensive from a computational point of view.

Traditionally, GSA is performed using methods based on the decomposition of the output variance [36], i.e. Analysis of Variance (ANOVA), which is nowadays one of the most commonly used GSA techniques in the literature. Indeed, ANOVA relies on a functional decomposition that incorporates component functions involving a single uncertain parameter, or a group of parameters, and the computation of the sensitivity measures of each component function usually requires MC or QMC simulations.

In 2001, Sobol’ used this formulation to define global sensitivity indices [36], which quantify the relative variance contributions of the different ANOVA terms. In [27,28], two High-Dimensional Model Representation (HDMR) techniques were introduced to capture input-output relationships of physical systems with many input variables. These techniques are based on ANOVA-type decompositions. Since this analysis usually requires a large number of function evaluations, several techniques have been developed to compute the different sensitivity indices at low cost [33]. In [37,14,4], generalized Polynomial Chaos (gPC) expansions are used to build surrogate models from which the Sobol’ indices are computed analytically as a post-processing of the PC coefficients. In [15], the multi-element polynomial chaos is combined with the ANOVA functional decomposition to enhance the convergence rate of polynomial chaos in high dimensions and in problems with low stochastic regularity. The use of adaptive ANOVA decomposition is investigated in [49] as an effective dimension-reduction technique for modeling incompressible and compressible flows with a high-dimensional random space. In Sudret [4], sparse Polynomial Chaos (PC) expansions are introduced in order to compute the global sensitivity indices: an adaptive algorithm builds a PC-based meta-model that contains only the significant terms, while the PC coefficients are computed by least-squares regression. Other approaches have been developed for the case where the input parameters are not independent. New indices have been proposed to address the dependence [46,47], but these attempts are limited to linear correlation. In [7], a global sensitivity indicator is introduced which looks at the influence of input uncertainty on the entire output distribution without reference to a specific moment of the output (moment independence), and which can also be defined in the presence of correlations among the parameters. In [11], a gPC methodology to address GSA for this kind of problem is introduced, and a moment-independent sensitivity index that suits problems with dependent parameters is reviewed. Recently, in [9], a numerical procedure was set up for moment-independent sensitivity methods.

Recently, the anchored ANOVA method (i.e. cut-HDMR) has been widely used in the literature (see for instance [22,42,23,16,50,51,49,38]). In particular, the method proposed in [38] is based on the covariance decomposition of the output variance to obtain accurate results which are less sensitive to the choice of the reference point, while preserving the main advantage of the anchored method, i.e. far fewer deterministic simulations are needed compared to the standard ANOVA method combined with MC or QMC. A disadvantage of this anchored global method is that a surrogate model cannot be built in a straightforward way.

The objective of this work is to build an efficient UQ and GSA method featuring a surrogate model representation that is affordable for complex numerical simulation problems. An accurate surrogate representation is useful for accelerating the model evaluations when using, for instance, MC-type techniques. We emphasize again that in [4], the generalized Polynomial Chaos (gPC) is combined with the ANOVA approach to perform the global analysis and to build a sparse surrogate representation. We recall that a traditional gPC decomposition of an $N$-dimensional function $f(\xi_1, \xi_2, \cdots, \xi_N)$ of degree not exceeding $p$ can be written as follows

$$
\begin{aligned}
f(\xi_1, \xi_2, \cdots, \xi_N) = {} & a_0 \zeta_0 + \sum_{i_1=1}^{N} a_{i_1} \zeta_1(\xi_{i_1}) + \sum_{i_1=1}^{N} \sum_{i_2=1}^{i_1} a_{i_1 i_2} \zeta_2(\xi_{i_1}, \xi_{i_2}) + \cdots \\
& + \sum_{i_1=1}^{N} \sum_{i_2=1}^{i_1} \cdots \sum_{i_p=1}^{i_{p-1}} a_{i_1 i_2 \cdots i_p} \zeta_p(\xi_{i_1}, \xi_{i_2}, \cdots, \xi_{i_p}),
\end{aligned}
\tag{1}
$$

where $\zeta_n(\xi_{i_1}, \xi_{i_2}, \cdots, \xi_{i_n})$ denote the orthogonal polynomials of order $n$ from the Askey scheme (see [44,45]) in terms of the multivariate random variables $(\xi_{i_1}, \xi_{i_2}, \cdots, \xi_{i_n})$. On the one hand, note that the gPC formulation (1) is organized by increasing degree of the multivariate polynomials, and not by increasing order of parameter interactions. For instance, the group of polynomials of order $n$ can contain polynomials that depend on $n$ or fewer random variables. For this reason, in order to compute the Sobol’ indices, one needs to additionally reorder the PC terms according to the random variables they depend on. This task can be cumbersome in practice for complex multivariate problems. On the other hand, the gPC expansions are known to succumb to the curse of dimensionality for high-dimensional problems.

To address these issues in the present work, we employ the polynomial dimensional decomposition (PDD) (introduced and developed by Rahman and coworkers in e.g. [29,30,32,48,31]) instead of PC, and combine it with the ANOVA decomposition because of the direct link between the two. Indeed, the PDD expansion of the model output relies on the ANOVA functional decomposition in a very direct way: a $T$-dimensional ANOVA component function is represented by a summation of $T$-dimensional multivariate polynomials from lowest to highest degree, using a specific polynomial basis determined by the input parameter distributions. Thus, the PDD gives priority to the low-order parameter interactions, which matches the principle of the ANOVA decomposition, where we suppose that the low-order component functions are dominant for most engineering cases. Consequently, even with a PDD expansion truncated at a low interaction order, one can take advantage of relatively high-order single- or multi-variate polynomials.

Similar to the gPC approach, an important and unavoidable task is to determine the polynomial coefficients of the PDD. The least-squares regression approach is an efficient tool for this purpose, minimizing the error of the surrogate model representation in the mean-square sense (see e.g. [12,37,4]). Compared to the projection approach (see e.g. [19,24,3]), where each polynomial coefficient is obtained by computing a multi-dimensional integral, we find the regression approach more flexible for problems involving a moderate number of uncertain parameters. It is known that the number of ANOVA component functions increases exponentially with the number of uncertain parameters, while the imposed polynomial order for the PDD expansion involves a polynomial increase of the number of PDD terms for each component function. This causes one main limitation of the regression approach, even for a truncated low-order ANOVA expansion: the high number of deterministic model evaluations required for problems with a moderate to large number of uncertainties. Indeed, for the regression problem to be well posed, the number of deterministic model evaluations must be larger than the total polynomial expansion size [4,5].

In this respect, this paper proposes to combine the active dimension strategy in the framework of the adaptive ANOVA method [49,38] and the stepwise regression technique [4,5] to obtain an efficient sparse surrogate model representation. This method has similarities with the adaptive sparse PC approach presented in [4,5], but differs by the use of the PDD expansion, the truncation order / active dimension technique used in ANOVA-type methods, and the selection criteria used to retain the most important polynomial terms. Once the sparse surrogate model is built, the Sobol’ sensitivity indices can be easily obtained by manipulating the polynomial coefficients, without the reordering task required for the PC expansion.

The paper is organized as follows: the classical ANOVA functional decomposition and the related variance-based global sensitivity indices are summarized in Section 2. The polynomial dimensional decomposition (PDD) expansion, the determination of its coefficients, and its link with the sensitivity analysis are presented in Section 3. The proposed adaptive sparse meta-modeling approach is then presented in Section 4, using a combination of the active dimension technique of the ANOVA expansion and the stepwise regression approach. As far as the strategy of selecting the most significant polynomial terms is concerned, the variance-based selection criterion is detailed (Criterion 1). Two academic benchmark problems as well as an atmospheric reentry spacecraft problem are treated in Section 5. Conclusions follow. Appendix A is devoted to the presentation of the accuracy estimator used in this work and of Criterion 2, which retains the most important terms by comparing the surrogate representation error.

2 ANOVA dimensional decomposition of the model response

Let us introduce some notation. The upper-case letters $X = (X_1, \cdots, X_N)$ and $Y$ denote a list of independent input random variables (random vector) and a scalar random output, respectively; the lower-case letters $x$ and $y$ represent their realizations. $\{X\}$ is used to represent the set whose members are the random variables contained in $X$, i.e. $\{X\} = \{X_1, \cdots, X_N\}$. We assume the $N$ input random variables $X$ admit a joint probability density function (pdf)

$$
p_X(x). \tag{2}
$$

The assumption of independence of the random variables in the set with respect to each other implies:

$$
p_X(x) = \prod_{i=1}^{N} p_{X_i}(x_i),
$$

where $p_{X_i}(x_i)$ is the marginal pdf of $X_i$.

The expectation and variance of an integrable function $g(X)$ of the random vector $X$, denoted by $E(g(X))$ and $\mathrm{Var}(g(X))$ respectively, are given by

$$
E(g(X)) = \int_{\mathbb{R}^N} g(x)\, p_X(x)\, dx, \qquad
\mathrm{Var}(g(X)) = \int_{\mathbb{R}^N} \big(g(x) - E(g(X))\big)^2 p_X(x)\, dx,
$$

where $dx = \prod_{i=1}^{N} dx_i$.

Let us suppose that the response of a given system of interest can be represented by an $N$-dimensional function

$$
y = f(x). \tag{3}
$$

We consider (3) in its functional expansion form as follows

$$
y = f_0 + \sum_{1 \le i \le N} f_i(x_i) + \sum_{1 \le i < j \le N} f_{ij}(x_i, x_j) + \cdots + f_{1,2,\cdots,N}(x_1, x_2, \cdots, x_N), \tag{4}
$$

or in compact form using a multi-index system:

$$
y = f_{s_0} + \sum_{j=1}^{2^N - 1} f_{s_j}(x_{s_j}). \tag{5}
$$

The multi-indices $s_j$ are defined such that

$$
\begin{aligned}
s_0 &= (0, 0, 0, \cdots, 0) \\
s_1 &= (1, 0, 0, \cdots, 0) \\
s_2 &= (0, 1, 0, \cdots, 0) \\
&\;\;\vdots \\
s_N &= (0, 0, 0, \cdots, 1) \\
s_{N+1} &= (1, 1, 0, \cdots, 0) \\
s_{N+2} &= (1, 0, 1, \cdots, 0) \\
&\;\;\vdots \\
s_{\mathcal{N}} &= (1, 1, 1, \cdots, 1),
\end{aligned}
\tag{6}
$$

where

$$
\mathcal{N} = 2^N - 1. \tag{7}
$$
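The ordering of the multi-indices in (6) can be illustrated with a few lines of Python. This is only a sketch: the function name `anova_multi_indices` is hypothetical, and the order of the indices within one interaction order is assumed to be lexicographic in the active variables, which matches the entries listed in (6).

```python
from itertools import combinations

def anova_multi_indices(n):
    """Return s_0, s_1, ..., s_{2^n - 1} as 0/1 tuples, grouped by interaction order."""
    indices = []
    for order in range(n + 1):                      # 0 = constant term, 1 = first order, ...
        for active in combinations(range(n), order):
            indices.append(tuple(1 if i in active else 0 for i in range(n)))
    return indices

s = anova_multi_indices(3)
for j, s_j in enumerate(s):
    print(f"s_{j} = {s_j}")
```

For three inputs this lists one constant term followed by the 2^3 - 1 = 7 component functions counted in (7).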

The representation (5) is called the ANOVA (Analysis Of Variance) decomposition [1] of $f(x)$ if, for any $j \in \{1, \cdots, \mathcal{N}\}$,

$$
\int_{\mathbb{R}} f_{s_j}(x_{s_j})\, p_{X_i}(x_i)\, dx_i = 0 \quad \text{for } x_i \in \{x_{s_j}\}. \tag{8}
$$

The orthogonality of the ANOVA component terms follows from (8), namely

$$
E(f_{s_j} f_{s_k}) = 0 \quad \text{for } j \neq k. \tag{9}
$$

Meanwhile, we obviously have

$$
E(f_{s_j}) = 0 \quad \text{for } j = 1, \cdots, \mathcal{N}.
$$

Note that the terms in the ANOVA decomposition can be expressed as integrals of $f(x)$. Indeed, we have

$$
\begin{aligned}
E(Y) &= f_0, \\
E(Y | x_i) &= f_0 + f_i(x_i), \\
E(Y | x_i, x_j) &= f_0 + f_i(x_i) + f_j(x_j) + f_{ij}(x_i, x_j),
\end{aligned}
\tag{10}
$$

and so on, where $E(Y | \cdot)$ denotes the conditional expectation with respect to the conditional pdf $p_{X|X_{s_j}}(x | x_{s_j})$ defined in the standard way. For instance,

$$
E(Y | x_i) = \int_{\mathbb{R}^{N-1}} f(x)\, p_{X|X_i}(x | x_i)\, dx_{-i}
= \int_{\mathbb{R}^{N-1}} f(x)\, \frac{p_X(x)}{p_{X_i}(x_i)}\, dx_{-i}
= \int_{\mathbb{R}^{N-1}} f(x) \prod_{j=1, j \neq i}^{N} p_{X_j}(x_j)\, dx_{-i}, \tag{11}
$$

where $dx_{-i} = \prod_{j=1, j \neq i}^{N} dx_j$. We thus observe from (10) that the ANOVA terms can be computed as follows:

$$
f_{s_j}(x_{s_j}) = E(Y | x_{s_j}) - \sum_{s_k \subset s_j} f_{s_k}, \tag{12}
$$

with $s_k$ a subset multi-index of $s_j$ (here we define $s_k \subset s_j \Leftrightarrow \{x_{s_k}\} \subset \{x_{s_j}\}$).

By applying the variance operator to (12), and keeping in mind the orthogonality between component functions, we obtain the component variance of the term $f_{s_j}(x_{s_j})$:

$$
\mathrm{Var}(f_{s_j}) = \mathrm{Var}_{X_{s_j}}\big(E(Y | X_{s_j})\big) - \sum_{s_k \subset s_j} \mathrm{Var}(f_{s_k}). \tag{13}
$$

The notation $\mathrm{Var}_{X_{s_j}}(E(Y | X_{s_j}))$ in (13) is used to emphasize that it is the variance of the conditional expectation of $Y$ given $X_{s_j}$.

On the other hand, by integrating $f^2$ and exploiting the orthogonality property of component functions, the output variance of $f$ can be written as follows:

$$
V(Y) = \sum_{j=1}^{\mathcal{N}} E\big(f_{s_j}^2(X_{s_j})\big) = \sum_{j=1}^{\mathcal{N}} \mathrm{Var}(f_{s_j}), \tag{14}
$$

which is in fact the sum of the variances of all the decomposition terms. Note that (14) is a special case of (13) when $s_j = s_{\mathcal{N}}$.

2.1 Variance-based global sensitivity analysis

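The ANOVA decomposition is closely related to the global sensitivity indices [35,36], which are defined by the ratios

$$
S_{s_j} = \frac{V_{s_j}}{V(Y)} = \frac{\mathrm{Var}_{X_{s_j}}\big(E(Y | X_{s_j})\big) - \sum_{s_k \subset s_j} V_{s_k}}{V(Y)} \quad \text{for } j = 1, \cdots, \mathcal{N}. \tag{15}
$$

For simplicity, we have denoted here $V_{s_j} = \mathrm{Var}(f_{s_j})$. In particular, we note the first-order sensitivity indices are defined by

$$
S_{X_i} = \frac{V_i}{V(Y)} = \frac{\mathrm{Var}_{X_i}\big(E(Y | X_i)\big)}{V(Y)} \quad \text{for } i = 1, \cdots, N,
$$

where $V_i = \mathrm{Var}(f_i)$. From (14), all the $S_{s_j}$ are non-negative and their sum equals unity:

$$
\sum_{j=1}^{\mathcal{N}} S_{s_j} = 1. \tag{16}
$$

Furthermore, the total effect of the variable $X_i$ is estimated by

$$
S_{T_i} = \frac{V(Y) - \Big(V_{X_{-i}} + \sum_{\{X_{s_k}\} \subset \{X_{-i}\}} V_{s_k}\Big)}{V(Y)}
= 1 - \frac{\mathrm{Var}_{X_{-i}}\big(E(Y | X_{-i})\big)}{V(Y)} \quad \text{for } i = 1, \cdots, N. \tag{17}
$$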
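As a small worked illustration of (10) to (17), consider a hypothetical model that is not taken from the report: three independent inputs uniformly distributed on $[-1, 1]$ and $f(x) = x_1 + 2 x_2 + x_1 x_3$. Under these assumptions every quantity can be computed by hand:

```latex
% Assumed toy model: X_1, X_2, X_3 i.i.d. uniform on [-1,1], so E(X_i) = 0 and E(X_i^2) = 1/3.
\begin{aligned}
f_0 &= E(Y) = 0, \qquad f_1(x_1) = x_1, \qquad f_2(x_2) = 2 x_2, \qquad f_3(x_3) = 0, \\
f_{13}(x_1, x_3) &= E(Y | x_1, x_3) - f_0 - f_1(x_1) - f_3(x_3) = x_1 x_3, \qquad f_{12} = f_{23} = f_{123} = 0, \\
V_1 &= \tfrac{1}{3}, \qquad V_2 = \tfrac{4}{3}, \qquad V_{13} = E(X_1^2)\, E(X_3^2) = \tfrac{1}{9}, \qquad
V(Y) = \tfrac{1}{3} + \tfrac{4}{3} + \tfrac{1}{9} = \tfrac{16}{9}, \\
S_1 &= \tfrac{3}{16}, \qquad S_2 = \tfrac{12}{16}, \qquad S_{13} = \tfrac{1}{16}, \qquad S_1 + S_2 + S_{13} = 1, \\
S_{T_3} &= 1 - \frac{\mathrm{Var}\big(E(Y | X_1, X_2)\big)}{V(Y)} = 1 - \frac{\mathrm{Var}(X_1 + 2 X_2)}{V(Y)} = 1 - \frac{5/3}{16/9} = \tfrac{1}{16}.
\end{aligned}
```

In this example $X_3$ has no first-order effect ($S_3 = 0$) yet a non-zero total effect, which it owes entirely to its interaction with $X_1$.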

3 Polynomial dimensional decomposition (PDD) of the model response

The previous section deals with the functional decomposition of a model response with the aim of computing the moments and sensitivity indices. However, this approach does not provide any strategy to build a meta-model. For time-consuming numerical simulations, meta-modeling is important for approximating, and thus accelerating, the model evaluations. To this end, polynomials can be used to represent the component functions in the ANOVA expansion. Roughly speaking, two techniques are widely used in the literature: polynomial chaos (PC) [44] and polynomial dimensional decomposition (PDD) [29]. We prefer to employ the PDD representation in this work to take advantage of the close dimensional structure shared by the PDD and ANOVA. Note, however, that PC can also be used in a similar way without additional difficulty.

3.1 PDD representation

Let us consider an orthogonal system of polynomials in the Hilbert space $L^2$, denoted by $\{\psi_j(x_i);\ j = 0, 1, \cdots\}$, which is characterized by the following relation

$$
\int_{\mathbb{R}} \psi_j(x_i)\, \psi_k(x_i)\, p_{X_i}(x_i)\, dx_i = \gamma_{j,X_i}\, \delta_{jk}, \tag{18}
$$

where $\delta_{jk} = 1$ if $j = k$, otherwise $\delta_{jk} = 0$. The normalization constant $\gamma_{j,X_i}$ is determined as follows

$$
\gamma_{j,X_i} = \int_{\mathbb{R}} \psi_j^2(x_i)\, p_{X_i}(x_i)\, dx_i. \tag{19}
$$

As is well known in the literature, common distributions can be associated with specific families of polynomials [44]. For instance, a uniform distribution can be associated with Legendre polynomials, and a Gaussian distribution with Hermite polynomials.
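Relations (18) and (19) can be checked numerically for the Legendre case. The sketch below assumes an input uniformly distributed on [-1, 1] (density 1/2) and unnormalized Legendre polynomials generated by Bonnet's recurrence; the helper names `legendre_coeffs` and `inner` are illustrative and do not come from the report. Exact rational arithmetic is used so that the Gram matrix is exactly diagonal.

```python
from fractions import Fraction

def legendre_coeffs(n_max):
    """Coefficient lists (lowest degree first) of P_0..P_n_max via Bonnet's recurrence."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        x_pn = [Fraction(0)] + polys[n]              # coefficients of x * P_n
        pn_1 = polys[n - 1] + [Fraction(0)] * 2      # P_{n-1}, padded to the same length
        polys.append([((2 * n + 1) * a - n * b) / (n + 1) for a, b in zip(x_pn, pn_1)])
    return polys[: n_max + 1]

def inner(p, q):
    """E[p(X) q(X)] for X uniform on [-1, 1], i.e. the left-hand side of (18)."""
    prod = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += a * b
    # The integral of x^k / 2 over [-1, 1] is 1/(k+1) for even k and 0 for odd k.
    return sum(c / (k + 1) for k, c in enumerate(prod) if k % 2 == 0)

polys = legendre_coeffs(4)
gram = [[inner(p, q) for q in polys] for p in polys]
for row in gram:
    print([str(v) for v in row])
```

The diagonal entries are the normalization constants of (19), here 1/(2j + 1); dividing each polynomial by the square root of its constant gives the normalized basis with unit constants.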

Let us consider a general $T$-dimensional component function ($1 \le T \le N$) of the ANOVA decomposition

$$
f_{i_1, i_2, \cdots, i_T}(x_{i_1}, x_{i_2}, \cdots, x_{i_T}). \tag{20}
$$

Due to the assumption of independence between member variables of the random vector $X$, it can be proved that

$$
\Psi_{J_T}(x_{i_1}, x_{i_2}, \cdots, x_{i_T}) = \prod_{k=1}^{T} \psi_{j_k}(x_{i_k}) \tag{21}
$$

is a multivariate basis in the $T$-dimensional space. Here $j_k$ is the order of the polynomial for the variable $x_{i_k}$, and $J_T = \{j_1, j_2, \cdots, j_T\}$.

Keeping in mind its zero-mean property, the component function (20) can be expanded as done in [29]:

$$
f_{i_1, i_2, \cdots, i_T}(x_{i_1}, x_{i_2}, \cdots, x_{i_T}) = \sum_{j_T=1}^{\infty} \cdots \sum_{j_1=1}^{\infty} C_{i_1, i_2, \cdots, i_T}^{j_1, j_2, \cdots, j_T} \prod_{k=1}^{T} \psi_{j_k}(x_{i_k}). \tag{22}
$$

Here $C_{i_1, i_2, \cdots, i_T}^{j_1, j_2, \cdots, j_T}$ is the coefficient. As will be discussed later, this coefficient will be determined by regression (see Section 3.2).

Since

$$
E\big(\Psi_{J_S}(X_{i_1}, X_{i_2}, \cdots, X_{i_S})\, \Psi_{J_T}(X_{i_1}, X_{i_2}, \cdots, X_{i_T})\big) = 0, \quad \text{with } S \neq T,
$$

the orthogonality property of ANOVA component functions (9) is guaranteed.

In practice, the expansion with an infinite number of terms in (22) must be truncated. For the sake of simplicity, as done in [29], we truncate (22) to $m$ terms for each dimension, i.e.

$$
f_{i_1, i_2, \cdots, i_T}(x_{i_1}, x_{i_2}, \cdots, x_{i_T}) = \sum_{j_T=1}^{m} \cdots \sum_{j_1=1}^{m} C_{i_1, i_2, \cdots, i_T}^{j_1, j_2, \cdots, j_T} \prod_{k=1}^{T} \psi_{j_k}(x_{i_k}). \tag{23}
$$

In particular, the first-order, second-order, and third-order component functions can be expressed respectively by

$$
\begin{aligned}
f_i(x_i) &= \sum_{j=1}^{m} C_i^j\, \psi_j(x_i), \\
f_{i_1, i_2}(x_{i_1}, x_{i_2}) &= \sum_{j_2=1}^{m} \sum_{j_1=1}^{m} C_{i_1, i_2}^{j_1, j_2}\, \psi_{j_1}(x_{i_1})\, \psi_{j_2}(x_{i_2}), \\
f_{i_1, i_2, i_3}(x_{i_1}, x_{i_2}, x_{i_3}) &= \sum_{j_3=1}^{m} \sum_{j_2=1}^{m} \sum_{j_1=1}^{m} C_{i_1, i_2, i_3}^{j_1, j_2, j_3}\, \psi_{j_1}(x_{i_1})\, \psi_{j_2}(x_{i_2})\, \psi_{j_3}(x_{i_3}).
\end{aligned}
\tag{24}
$$

In conclusion, the polynomial dimensional decomposition of order $m$ of the model output $f(x)$ can be written as

$$
\begin{aligned}
f(x) \approx f_m(x) = {} & f_0 + \sum_{i=1}^{N} \sum_{j=1}^{m} C_i^j\, \psi_j(x_i)
+ \sum_{i_1 < i_2}^{N} \sum_{j_2=1}^{m} \sum_{j_1=1}^{m} C_{i_1, i_2}^{j_1, j_2}\, \psi_{j_1}(x_{i_1})\, \psi_{j_2}(x_{i_2}) \\
& + \sum_{i_1 < i_2 < i_3}^{N} \sum_{j_3=1}^{m} \sum_{j_2=1}^{m} \sum_{j_1=1}^{m} C_{i_1, i_2, i_3}^{j_1, j_2, j_3}\, \psi_{j_1}(x_{i_1})\, \psi_{j_2}(x_{i_2})\, \psi_{j_3}(x_{i_3}) \\
& + \cdots + \sum_{i_1 < \cdots < i_N}^{N} \sum_{j_N=1}^{m} \cdots \sum_{j_1=1}^{m} C_{i_1, i_2, \cdots, i_N}^{j_1, j_2, \cdots, j_N} \prod_{k=1}^{N} \psi_{j_k}(x_{i_k}).
\end{aligned}
\tag{25}
$$

Hence, the total size $P$ of the $m$-th order full PDD expansion of an $N$-dimensional function is

$$
P = 1 + N m + \binom{N}{2} m^2 + \cdots + \binom{N}{N} m^N = (1 + m)^N. \tag{26}
$$

Note that in the particular case $m = 1$, the PDD representation has the same expansion size as ANOVA.
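The growth described by (26) is easy to tabulate. The following sketch sums the binomial terms of (26) directly; the function name `pdd_size` and the optional argument `max_order`, which keeps only interactions up to a given order, are illustrative additions and not part of (26).

```python
from math import comb

def pdd_size(n_vars, m, max_order=None):
    """Number of PDD terms: sum over interaction orders T of C(N, T) * m**T."""
    nu = n_vars if max_order is None else max_order
    return sum(comb(n_vars, t) * m ** t for t in range(nu + 1))

for n_vars in (3, 8, 20):
    full = pdd_size(n_vars, m=3)
    low_order = pdd_size(n_vars, m=3, max_order=2)
    print(n_vars, full, (1 + 3) ** n_vars, low_order)
```

The first two printed sizes agree, as stated by (26), while the last column shows how much smaller the expansion becomes when only first- and second-order interactions are kept.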

The following section is devoted to the discussion of the expansion coefficient computation.

3.2 PDD coefficient computation by regression

The coefficients involved in (25) can be computed by projection [19,18]. Indeed, we have

$$
C_{i_1, i_2, \cdots, i_T}^{j_1, j_2, \cdots, j_T} = \frac{E\big(f(X)\, \Psi_{J_T}(X_{i_1}, X_{i_2}, \cdots, X_{i_T})\big)}{E\big(\Psi_{J_T}^2(X_{i_1}, X_{i_2}, \cdots, X_{i_T})\big)}. \tag{27}
$$

As is well known, (27) can be expensive to evaluate for a large number of input variables, since high-dimensional integrals must be computed. Typically, one can employ random sampling methods (e.g. Monte Carlo (MC), Latin Hypercube), which are costly in general, or Gauss quadrature methods, which are prohibitive in the high-dimensional case even when using a sparse grid.

As done in [5,4], we instead use the regression approach in this work to determine the expansion coefficients, in order to take advantage of its flexibility.

The regression approach in this work can be regarded as a response surface aiming to provide an optimized PDD expansion for the considered model problem.

Let us write the finite PDD expansion (25) in vector form as follows

$$
f_m(x) = C_\alpha^T\, \psi_\alpha(x), \tag{28}
$$

where $C_\alpha = (C_{\alpha_0}, \cdots, C_{\alpha_{P-1}})^T$ is a vector containing all the coefficients, and $\psi_\alpha(x) = (\psi_{\alpha_0}, \cdots, \psi_{\alpha_{P-1}})^T$ gathers all the multivariate basis polynomials, including the unity basis $\psi_{\alpha_0} = 1$. Here $P$ is the total size of the PDD expansion, given by (26).

When using a regression method to determine the expansion coefficients, it is necessary to choose a set of realizations of the input random vector (i.e. an experimental design [5,4], for instance from a Sobol’ quasi-random sequence [34]), denoted by

$$
\mathcal{X} = \{x^1, x^2, \cdots, x^Q\},
$$

whose size must be larger than the PDD expansion size. As indicated in [6], in order for the regression problem to be stable enough, the experimental design size $Q$ is in practice usually taken as

$$
Q = kP \quad \text{with } 2 \le k \le 3. \tag{29}
$$

We denote the corresponding model outputs by $\mathcal{Y} = \{y^1, y^2, \cdots, y^Q\}$.

The idea is to determine the coefficients $C_\alpha$ by minimizing the projection error in the $L^2$ norm. That is,

$$
\widetilde{C}_\alpha = \arg\min_{C_\alpha \in \mathbb{R}^P} \sum_{i=1}^{Q} \Big(y^i - C_\alpha^T\, \psi_\alpha(x^i)\Big)^2. \tag{30}
$$

The solution of the least-squares regression problem (30) can be found by using a variational approach,

$$
\frac{\partial}{\partial C_{\alpha_j}} \left[ \sum_{i=1}^{Q} \Big(y^i - C_\alpha^T\, \psi_\alpha(x^i)\Big)^2 \right] = 0, \quad \text{for } j = 0, \cdots, P - 1,
$$

and can be obtained by solving the following linear system

$$
(\Psi^T \Psi)\, \widetilde{C}_\alpha = \Psi^T \mathcal{Y}. \tag{31}
$$

$\Psi$ represents the following matrix involving basis polynomials evaluated at the realizations in the experimental design:

$$
\Psi_{ij} = \psi_{\alpha_j}(x^i), \quad i = 1, \cdots, Q, \quad j = 0, \cdots, P - 1. \tag{32}
$$

It is well known that the real matrix $\Psi^T \Psi$ is symmetric and, provided $\Psi$ has full column rank, positive-definite. If this matrix is well-conditioned, the linear system (31) can in general be solved efficiently by Cholesky factorization.
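The regression step (30) to (32) can be sketched in a few lines. The example below is a minimal, dependency-free illustration under stated assumptions: a one-dimensional toy model that lies exactly in the span of three unnormalized Legendre polynomials, an equispaced design instead of a quasi-random sequence, and hypothetical helper names (`cholesky_solve`, `fit_coefficients`). In practice a linear-algebra library would replace the hand-written factorization.

```python
import math

def cholesky_solve(A, b):
    """Solve A c = b for a symmetric positive-definite A via A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    z = [0.0] * n                                   # forward substitution: L z = b
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    c = [0.0] * n                                   # back substitution: L^T c = z
    for i in reversed(range(n)):
        c[i] = (z[i] - sum(L[k][i] * c[k] for k in range(i + 1, n))) / L[i][i]
    return c

def fit_coefficients(basis, design, outputs):
    """Least-squares coefficients from the normal equations (Psi^T Psi) C = Psi^T Y."""
    Psi = [[psi(x) for psi in basis] for x in design]          # matrix (32)
    P = len(basis)
    gram = [[sum(row[j] * row[k] for row in Psi) for k in range(P)] for j in range(P)]
    rhs = [sum(row[j] * y for row, y in zip(Psi, outputs)) for j in range(P)]
    return cholesky_solve(gram, rhs)

# Toy model (assumption): 0.5 * P_0 + 2 * P_1 - 1.5 * P_2 with Legendre polynomials P_j.
basis = [lambda x: 1.0, lambda x: x, lambda x: 0.5 * (3.0 * x * x - 1.0)]
model = lambda x: 0.5 + 2.0 * x - 1.5 * 0.5 * (3.0 * x * x - 1.0)
P = len(basis)
Q = 3 * P                                           # design size Q = k P with k = 3, as in (29)
design = [-1.0 + 2.0 * i / (Q - 1) for i in range(Q)]
outputs = [model(x) for x in design]
coeffs = fit_coefficients(basis, design, outputs)
print(coeffs)
```

Because the toy model is an exact combination of the basis functions, the fitted coefficients match the ones used to build it up to rounding error; with a real simulation code the residual of (30) is non-zero and the design size matters.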

We emphasize that the design size $Q$ is directly linked to the global computational cost of uncertainty quantification for numerical simulations. As a consequence of the choice (29) for the experimental design size, the main objective of this work is to minimize the PDD expansion size $P$ by including only the most influential polynomial terms.

3.3 PDD based global sensitivity indices

Once the coefficients $C_\alpha$ are determined by the regression approach for the PDD expansion (28), the second-order moment and the global sensitivity indices can be obtained in a straightforward way.

Indeed, keeping in mind

$$
E(f_m(X)) = C_{\alpha_0},
$$

the approximated variance of the model output is then

$$
\mathrm{Var}(f_m(X)) = \sum_{j=1}^{P-1} C_{\alpha_j}^2\, \gamma_{\alpha_j},
$$

with the multivariate normalization constant determined by

$$
\gamma_{\alpha_j} = E\big[\psi_{\alpha_j}^2(X)\big].
$$

If one employs normalized basis polynomials, i.e. $\gamma_{\alpha_j} = 1$, the output variance formulation can be further simplified to

$$
\mathrm{Var}(f_m(X)) = \sum_{j=1}^{P-1} C_{\alpha_j}^2.
$$

3.3.1 ANOVA and PDD-based sensitivity estimates

It is now straightforward to write the variance-based global sensitivity indices by using the PDD expansion. Indeed,

$$
S_{s_j} = \frac{V_{s_j}}{\mathrm{Var}(f_m(X))} = \frac{1}{\mathrm{Var}(f_m(X))} \sum_{\alpha_j \in s_j} C_{\alpha_j}^2\, \gamma_{\alpha_j}, \tag{33}
$$

where the sum runs over the basis polynomials that depend exactly on the variables selected by the multi-index $s_j$.

The total effects of an input variable can be written as done in Section 2.
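The post-processing described in this subsection amounts to grouping squared coefficients by the set of variables each polynomial depends on. The sketch below assumes uniform inputs on [-1, 1] with unnormalized Legendre polynomials, so that the multivariate constant is the product of 1/(2j + 1) over the univariate degrees; the coefficient values, the function names and the representation of a polynomial by its tuple of degrees are all hypothetical choices made for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

def legendre_gamma(alpha):
    """Normalization constant of a tensor-product Legendre polynomial, X_k uniform on [-1, 1]."""
    gamma = Fraction(1)
    for degree in alpha:
        gamma /= 2 * degree + 1
    return gamma

def pdd_sobol(coeffs):
    """Return (variance, S, ST): S maps active-variable tuples to indices as in (33)."""
    contrib = {a: Fraction(c) ** 2 * legendre_gamma(a)
               for a, c in coeffs.items() if any(a)}          # skip the constant term
    variance = sum(contrib.values())
    S = defaultdict(Fraction)
    for alpha, v in contrib.items():
        active = tuple(i + 1 for i, degree in enumerate(alpha) if degree > 0)
        S[active] += v / variance
    n_vars = len(next(iter(coeffs)))
    ST = {i: sum(s for active, s in S.items() if i in active)
          for i in range(1, n_vars + 1)}
    return variance, dict(S), ST

# Hypothetical coefficients, keyed by the polynomial degree used for (x1, x2, x3).
coeffs = {(0, 0, 0): 3, (1, 0, 0): 2, (2, 0, 0): 1, (0, 1, 0): 1, (1, 0, 1): 3}
variance, S, ST = pdd_sobol(coeffs)
print("mean:", coeffs[(0, 0, 0)], "variance:", variance)
print({k: str(v) for k, v in S.items()})
print({k: str(v) for k, v in ST.items()})
```

No reordering of the expansion is needed: the tuple of non-zero degrees already identifies the ANOVA component to which each term belongs.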

4 Variance-based dimension reduction for the model representation

For practical problems, in particular those with a large number of stochastic parameters, the size of the PDD representation given in Section 3 must be reduced to make the uncertainty analysis feasible.

The purpose of this section is to present our adaptive technique, which belongs to the family of stepwise regression methods. The method used in this work is a variant of the one used in [5,4]. Apart from applying a truncated dimensionality, widely used for the ANOVA component functions, we additionally consider two levels of dimension reduction in this section, both of which rely on the relative importance of expansion functions in terms of their variance contribution compared to the total variance. This criterion is different from [5,4], where the adaptive strategy is based on the numerical error of the model representation. We think a variance-based criterion is more reliable if one cares more about the statistical properties of the model output.

4.1 Adaptive ANOVA – retaining active dimensions

As already done in our previous work [38], we use the adaptive strategy presented in [49] in order to retain only the most important dimensions for interaction terms in ANOVA.

4.1.1 Truncation dimension

The low-order interactions of input variables often have the main impact upon the output [36]. Thus, the full ANOVA expansion (4) can be approximated by

$$
f(x) = f_0 + \sum_{T=1}^{\nu} \sum_{i_1 < \cdots < i_T}^{N} f_{i_1, i_2, \cdots, i_T}(x_{i_1}, x_{i_2}, \cdots, x_{i_T}), \quad \text{with } \nu \ll N. \tag{34}
$$

Here, $\nu$ is called the truncation (or effective) dimension, representing the highest dimension of the ANOVA component functions.

4.1.2 Adaptive ANOVA

For problems featuring a high dimensionality $N$, the ANOVA decomposition method is still very expensive, even when we only choose a truncation dimension $\nu = 2$. An efficient way to solve this problem is to use the adaptive ANOVA decomposition. To this end, we replace the approximation (34) by

$$
f(x) = f_0 + \sum_{T=1}^{\nu} \sum_{i_1 < \cdots < i_T}^{D_T} f_{i_1, i_2, \cdots, i_T}(x_{i_1}, x_{i_2}, \cdots, x_{i_T}), \quad \text{with } \nu \ll N,\ D_T \le N. \tag{35}
$$

Here $D_T$ is the active dimension for ANOVA component functions of order $T$. For the problems considered in this work, we take $D_1 = N$. The active dimension for higher-order component functions will be determined by the criterion presented below (see [49]).

In this work, we use the variance-based criterion [49] for choosing the active dimension $D_2$ and further selecting the most important second- and higher-order terms. It is assumed that the variances $\mathrm{Var}(f_i)$ ($i \in [1, N]$) of the first-order terms are monotonically decreasing with respect to $i$.

$D_2$ is evaluated using the sum of variances of first-order terms:

$$
\sum_{i=1}^{D_2} \mathrm{Var}(f_i) \ge p \sum_{i=1}^{N} \mathrm{Var}(f_i), \tag{36}
$$

where $p$ is a proportionality constant in $(0, 1)$ that is very close to 1. For simplicity in our applications, we set

$$
D_T = D_2, \quad \text{for } T \ge 3
$$

in this work. Note, however, that $D_T$ can certainly be further reduced depending on problems and objectives.
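Criterion (36) can be sketched as follows. The function name `active_dimension`, the sample variances and the choice of returning the ranking together with the active dimension are illustrative assumptions; the only ingredients taken from the text are the sorting of the first-order variances in decreasing order and the threshold p.

```python
def active_dimension(first_order_variances, p=0.999):
    """Return (D2, ranking): the smallest D2 satisfying (36) and the variable
    indices sorted by decreasing first-order variance."""
    ranking = sorted(range(len(first_order_variances)),
                     key=lambda i: first_order_variances[i], reverse=True)
    total = sum(first_order_variances)
    partial = 0.0
    for count, i in enumerate(ranking, start=1):
        partial += first_order_variances[i]
        if partial >= p * total:
            return count, ranking
    return len(ranking), ranking

# Hypothetical first-order variances Var(f_i) for five inputs.
variances = [0.02, 5.0, 0.0005, 3.0, 0.9]
d2, ranking = active_dimension(variances, p=0.99)
print(d2, ranking[:d2])     # inputs allowed to appear in interaction terms
```

Only the first D2 entries of the ranking are then combined into second- and higher-order component functions, which reduces the number of candidate interaction terms from C(N, 2) to C(D2, 2) at second order.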

4.2 Adaptive PDD algorithm – eliminating non-important polynomials

Section 4.1 described how to reduce the size of the ANOVA expansion. However, even with a sparse ANOVA expansion, applying the classical PDD expansion to the component functions (see the formulations (23) and (24)) still leaves a very high computational cost. Indeed, for a large number of engineering problems, the contribution of many polynomial terms turns out to be negligible with regard to the accuracy of the constructed meta-model [5,4]. In this work, on the other hand, as will be shown in our numerical results, eliminating those polynomials whose variance is negligible also yields a very sparse PDD representation without compromising the accuracy of the meta-model.

Considering the adaptive ANOVA technique presented in Section 4.1, let us describe our adaptive algorithm for stepwise regression from a practical point of view as follows.

1. First of all, we construct a full set of PDD representation (given $m$) for all first-order ANOVA component functions, namely

$$
f(x) \approx f_m(x) = f_0 + \sum_{i=1}^{N} f_i(x_i) = f_0 + \sum_{i=1}^{N} \left( \sum_{j=1}^{m} C_i^j\, \psi_j(x_i) \right). \tag{37}
$$

We then compute the total first-order variance by

$$
\mathrm{Var}(f_m(X)) = \sum_{i=1}^{N} \sum_{j=1}^{m} (C_i^j)^2\, \gamma_i^j.
$$

The first-order global sensitivity indices can be obtained by a simple rearrangement:

$$
S_i = \sum_{j=1}^{m} (C_i^j)^2\, \gamma_i^j \Big/ \mathrm{Var}(f_m(X)).
$$

Let us assume that the sensitivity indices $\{S_i\}$ are monotonically decreasing with respect to $i$ (thus, a re-ordering task is generally required). We then choose the active dimension $D_2$ for second-order ANOVA functions in such a way that

$$
\sum_{i=1}^{D_2} S_i \ge p,
$$

with $p$ a constant close to unity (e.g. $p = 0.999$). We emphasize again that, for the sake of simplicity in this work, the same active dimension is employed for second- and also higher-order component functions if applicable. However, one can further reduce this dimension if necessary.

2. The objective of this step is to reduce the size of the first-order PDD expansion as expressed in (37). The principle remains similar to the previous step: we eliminate the non-important polynomial terms by measuring their variance contribution. In more detail, if

$$
(C_i^j)^2\, \gamma_i^j \Big/ \mathrm{Var}(f_m(X)) < \theta, \tag{38}
$$

the corresponding polynomial is removed from the expansion. Here $\theta$ is a pre-defined threshold (e.g. $\theta = 10^{-5}$). The resulting first-order model representation contains only significant components, and thus is more concise. Let us denote this first-order PDD base by $\{\psi_{\alpha^1}\}$.

3. Starting from the concise first-order PDD representation, the task of this step is to enrich the model representation by adding significant second- and higher-order PDD polynomials. After choosing a truncation dimension $\nu$ and an active dimension for each order ($D_2, \cdots, D_\nu$), the full set of second- and higher-order PDD polynomial bases $\{\psi_{\alpha^{2+}}\}$ can be constructed easily from the tensor product rule (see the formulation (24) or (25)).

The selection of important polynomials from $\{\psi_{\alpha^{2+}}\}$ is explained by the simple stepwise Algorithm 1.

Algorithm 1: Adaptive PDD by stepwise regression (Criterion 1)

1: initialize the multivariate PDD base: $\{\psi^w\} = \{\psi_{\alpha^{1}}\}$

2: for $\psi_{\alpha_i} \in \{\psi_{\alpha^{2+}}\}$ do

3: add $\psi_{\alpha_i}$ into $\{\psi^w\}$, i.e. $\{\psi^w\} = \{\psi^w, \psi_{\alpha_i}\}$

4: depending on the size $P^w$ of $\{\psi^w\}$, adjust, if necessary, the size $Q^w$ of the experimental design (see formulation (29))

5: solve the regression system (31) to determine the PDD expansion coefficients $\widetilde{C}^w$

6: compute the total variance: $\mathrm{Var}(f^w(X)) = \sum_k (C_{\alpha_k})^2 \gamma_{\alpha_k}$

7: for $\psi_{\alpha_j} \in \{\psi^w\}$ do

8: if $(C_{\alpha_j})^2 \gamma_{\alpha_j} / \mathrm{Var}(f^w(X)) < \theta$ then

9: eliminate this polynomial: $\{\psi^w\} = \{\psi^w\} \setminus \psi_{\alpha_j}$

10: end if

11: end for

12: end for

13: solve the final regression system based on the constructed base $\{\psi^F\}$

14: compute the final total variance using the obtained PDD coefficients

15: compute the global (total) sensitivity indices

Note that, in this algorithm, the cost of repeatedly solving the regression linear system is negligible compared to that of the deterministic model evaluations.

Denoting the size of the finally obtained sparse PDD representation by $P_{\text{sparse}}$, we define the sparsity of our adaptive PDD approach as

$$\text{sparsity} = \frac{P_{\text{sparse}}}{(1 + m)^N}. \tag{39}$$

The formulation (39) is employed in our numerical applications to assess the efficiency of the proposed approach.

4. Several polynomial chaos error estimators are presented in [5, 4]. These estimators can be used directly for the sparse PDD expansions in this work. The estimator used in this work is presented in Appendix A.
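The stepwise selection of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions that are not in the original: the basis is orthonormal (so every $\gamma = 1$), the experimental design is kept fixed (step 4 of the algorithm is skipped), and the helper names `fit` and `adaptive_pdd` as well as the toy model are hypothetical.

```python
import numpy as np

def fit(columns, y):
    """Least-squares fit of y on [1, columns...]; returns (mean term, coefficients)."""
    A = np.column_stack([np.ones_like(y)] + list(columns))
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return c[0], c[1:]

def adaptive_pdd(first_order, candidates, y, theta=1e-5):
    """first_order, candidates: dicts mapping a term name to its basis column
    evaluated at the design points (orthonormal basis assumed, gamma = 1)."""
    work = dict(first_order)                       # line 1: start from the first-order base
    for name, col in candidates.items():           # line 2: loop over higher-order candidates
        work[name] = col                           # line 3: enrich the working base
        _, c = fit(work.values(), y)               # line 5: solve the regression system
        var = float(np.sum(c**2))                  # line 6: total variance (gamma = 1)
        keep = [n for n, cj in zip(work, c)        # lines 7-11: Criterion 1 pruning
                if var > 0.0 and cj**2 / var >= theta]
        work = {n: work[n] for n in keep}
    c0, c = fit(work.values(), y)                  # line 13: final regression
    return c0, dict(zip(work, c))

# Toy two-variable model (hypothetical) on [-1, 1]^2 with orthonormal Legendre terms.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1.0, 1.0, size=(2, 60))
p1 = lambda x: np.sqrt(3.0) * x
p2 = lambda x: np.sqrt(5.0) * 0.5 * (3.0 * x**2 - 1.0)
y = 1.0 + 2.0 * p1(x1) + 0.5 * p1(x1) * p1(x2)

first = {"p1(x1)": p1(x1), "p1(x2)": p1(x2)}
cands = {"p1(x1)p1(x2)": p1(x1) * p1(x2), "p2(x1)p1(x2)": p2(x1) * p1(x2)}
c0, coeffs = adaptive_pdd(first, cands, y)
print(c0, coeffs)
```

In this toy case the terms that do not contribute to the variance are pruned, and only the two terms actually present in the model survive the selection.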

4.2.1 Criterion 1: by comparing the component variance of the concerned polynomial term

We refer to operations 8–9 of Algorithm 1 (see also (38)) as Criterion 1, whose objective is to eliminate unimportant polynomial terms. This is achieved by discarding any component polynomial whose variance is negligible compared to the total variance. The threshold $\theta$ is predefined by the user.

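| | $S_i$ | $S_i^T$ |
|---|---|---|
| $X_1$ | 0.3138 | 0.5574 |
| $X_2$ | 0.4424 | 0.4424 |
| $X_3$ | 0 | 0.2436 |
| $X_1, X_2$ | 0 | |
| $X_1, X_3$ | 0.2436 | |
| $X_2, X_3$ | 0 | |
| $X_1, X_2, X_3$ | 0 | |
| $V(Y)$ | 13.845 | |

Table 1: Analytical variance and variance-based sensitivity indices for the Ishigami function.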

4.2.2 Criterion 2: by comparing the model accuracy of the concerned polynomial term

The formulation (38) and operations 8–9 of Algorithm 1 can be replaced by a second criterion (already used in [4]), which relies on the estimator used to evaluate the accuracy of the model representation. In short, one eliminates those polynomials whose contribution to the model accuracy is negligible. The technical details of the accuracy estimator and of the implementation of this criterion can be found in Appendix A.
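To make the notion of a cross-validated model accuracy $Q^2$ concrete, here is a minimal sketch of a leave-one-out estimate for a least-squares fit. The estimator actually used in this work is the one described in Appendix A; the formula below (prediction residuals rescaled by the leverages) is a commonly used variant and is shown only as an assumption-laden illustration. The function name `loo_q2` is hypothetical.

```python
import numpy as np

def loo_q2(A, y):
    """Leave-one-out Q2 of the least-squares fit y ~ A c (A must have full column rank)."""
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ c
    # Leverages h_i: diagonal of the hat matrix, obtained from the thin QR factor of A.
    q, _ = np.linalg.qr(A)
    h = np.sum(q**2, axis=1)
    # Mean squared leave-one-out prediction error, without refitting Q times.
    press = np.mean((resid / (1.0 - h))**2)
    return 1.0 - press / np.var(y)

x = np.linspace(-1.0, 1.0, 21)
A = np.column_stack([np.ones_like(x), x])   # basis {1, x}
q2_good = loo_q2(A, 2.0 + 3.0 * x)          # data lies in the span of the basis
q2_poor = loo_q2(A, x**2)                   # data is not captured by the basis
print(q2_good, q2_poor)
```

With such an estimator, Criterion 2 would compare the value of $Q^2$ with and without a given polynomial and drop the term when the difference is below a tolerance.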

5 Numerical results

This section is devoted to the presentation of our numerical results. Two academic functions will be studied, and we will also investigate one CFD application example.

5.1 Ishigami function

Let us consider the Ishigami function [17] which has been thoroughly studied in our previous work [38] by making use of the Covariance-based Sensitivity Analysis:

$$Y = \sin X_1 + a \sin^2 X_2 + b X_3^4 \sin X_1, \tag{40}$$

where the input random variables X = (X1, X2, X3) are uniformly distributed over [−π, π]. The constants are set to a = 7, b = 0.1, as done in [37,21].

As presented in [17, 37, 38], the total output variance and the component variances based on the standard ANOVA expansion can be obtained analytically:

$$V(Y) = \frac{a^2}{8} + \frac{b\pi^4}{5} + \frac{b^2\pi^8}{18} + \frac{1}{2}, \quad V_1 = \frac{b\pi^4}{5} + \frac{b^2\pi^8}{50} + \frac{1}{2}, \quad V_2 = \frac{a^2}{8}, \quad V_3 = 0,$$

$$V_{12} = V_{23} = 0, \quad V_{13} = \frac{8 b^2 \pi^8}{225}, \quad V_{123} = 0. \tag{41}$$

The resulting variance-based sensitivity indices are gathered in Table 1.
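The closed-form expressions (40) and (41) are easy to evaluate numerically. The sketch below (function names are hypothetical) computes the analytical variance and sensitivity indices for $a = 7$, $b = 0.1$, and adds a crude Monte Carlo check of the mean and variance; the analytical values agree with Table 1 to about three decimal places.

```python
import numpy as np

def ishigami(x1, x2, x3, a=7.0, b=0.1):
    """Ishigami function (40)."""
    return np.sin(x1) + a * np.sin(x2)**2 + b * x3**4 * np.sin(x1)

def ishigami_indices(a=7.0, b=0.1):
    """Analytical variance and Sobol' indices from (41)."""
    pi4, pi8 = np.pi**4, np.pi**8
    v = a**2 / 8 + b * pi4 / 5 + b**2 * pi8 / 18 + 0.5
    v1 = b * pi4 / 5 + b**2 * pi8 / 50 + 0.5
    v2 = a**2 / 8
    v13 = 8 * b**2 * pi8 / 225
    first = {"S1": v1 / v, "S2": v2 / v, "S3": 0.0, "S13": v13 / v}
    # Total index = sum of all contributions involving the variable.
    total = {"ST1": (v1 + v13) / v, "ST2": v2 / v, "ST3": v13 / v}
    return v, first, total

v, first, total = ishigami_indices()

# Crude Monte Carlo sanity check with inputs uniform on [-pi, pi].
rng = np.random.default_rng(0)
x = rng.uniform(-np.pi, np.pi, size=(3, 200_000))
y = ishigami(*x)
mc_mean, mc_var = y.mean(), y.var()
print(v, first, total, mc_mean, mc_var)
```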

We set the truncation dimension $\nu = 3$, i.e. all interactions are considered. We use a quasi-random sampling design based on Sobol' sequences for solving the regression system. For the sake of comparison with [4], we choose the experimental design size $Q = 200$ and set the PDD order $m = 10$. The active dimension selection technique of Section 4.1 is not adopted for this relatively low-dimensional test case. The numerical results obtained with the proposed approach are reported in Table 2.

| | $S_i$ | $S_i^T$ |
|---|---|---|
| $X_1$ | 0.3139 | 0.5573 |
| $X_2$ | 0.4427 | 0.4427 |
| $X_3$ | 0 | 0.2434 |
| $X_1, X_2$ | 0 | |
| $X_1, X_3$ | 0.2434 | |
| $X_2, X_3$ | 0 | |
| $X_1, X_2, X_3$ | 0 | |
| $V(Y)$ | 13.8304 | |
| model evaluations | 200 | |
| sparse polynomial base | 15 | |
| PDD order $m$ | 10 | |
| sparsity | 15/1331 ≈ 0.01127 | |
| model accuracy $Q^2$ (Appendix A) | 0.999965 | |

Table 2: Numerical results for the Ishigami test case.

Comparing our results with the ones reported in [4, Table 2, last column, pp. 1223], sensitivity indices are obtained with a similar accuracy. With our approach, only 15 polynomial terms are necessary to obtain the model accuracy of 0.999965, while, in [4], 77 terms are needed to have an accuracy of 0.9999. We write the constructed surrogate polynomial model as follows to approximate the Ishigami function:

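$$\begin{aligned}
Y = {} & 3.50078 + 2.81152\, \widetilde{Le}_1(X_1) - 3.41737\, \widetilde{Le}_3(X_1) + 0.649461\, \widetilde{Le}_5(X_1) \\
& - 1.33092\, \widetilde{Le}_2(X_2) - 5.85130\, \widetilde{Le}_4(X_2) + 4.90081\, \widetilde{Le}_6(X_2) \\
& - 1.39774\, \widetilde{Le}_8(X_2) + 0.212954\, \widetilde{Le}_{10}(X_2) \\
& + 5.29778\, \widetilde{Le}_1(X_1) \widetilde{Le}_2(X_3) + 2.09215\, \widetilde{Le}_1(X_1) \widetilde{Le}_4(X_3) \\
& - 6.46581\, \widetilde{Le}_3(X_1) \widetilde{Le}_2(X_3) - 2.60564\, \widetilde{Le}_3(X_1) \widetilde{Le}_4(X_3) \\
& + 1.23633\, \widetilde{Le}_5(X_1) \widetilde{Le}_2(X_3) + 0.495326\, \widetilde{Le}_5(X_1) \widetilde{Le}_4(X_3),
\end{aligned} \tag{42}$$

where $\widetilde{Le}_j(x_i)$ represents the $j$-th order shifted Legendre polynomial for the variable $x_i$ with respect to the weight function $w(x_i) = 1/(2\pi)$ for $x_i \in [-\pi, \pi]$.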

Because the underlying function (40) is even with respect to the variables $X_2$ and $X_3$, the odd polynomials related to these variables are found to have zero coefficients in (42), as expected. Similarly, since the function is odd with respect to $X_1$, the even polynomials in $X_1$ have zero coefficients.

As is known from the analytical analysis using the standard ANOVA (see e.g. [38]), the maximum interaction order of the Ishigami function is 2, which is also what the surrogate model (42) shows. Consequently, setting the truncation dimension $\nu = 2$ before constructing the surrogate model yields the same results as in (42).
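The variance decomposition can be read directly off the coefficients of (42). The sketch below assumes that the shifted Legendre polynomials carry the classical normalization, i.e. $\gamma_j = 1/(2j+1)$ under the uniform weight (an assumption on our part; with it, the sums reproduce the variance and indices of Table 2 to the reported digits). The variable names are hypothetical.

```python
import numpy as np

# Coefficients of (42); keys are the Legendre degrees (j1, j2, j3) in (X1, X2, X3).
coeffs = {
    (1, 0, 0): 2.81152, (3, 0, 0): -3.41737, (5, 0, 0): 0.649461,
    (0, 2, 0): -1.33092, (0, 4, 0): -5.85130, (0, 6, 0): 4.90081,
    (0, 8, 0): -1.39774, (0, 10, 0): 0.212954,
    (1, 0, 2): 5.29778, (1, 0, 4): 2.09215,
    (3, 0, 2): -6.46581, (3, 0, 4): -2.60564,
    (5, 0, 2): 1.23633, (5, 0, 4): 0.495326,
}

def term_variance(degrees, c):
    """C^2 * gamma, with gamma the product of 1 / (2 j + 1) over the variables (assumed)."""
    gamma = np.prod([1.0 / (2 * j + 1) for j in degrees])
    return c**2 * gamma

# Accumulate the variance contribution of each group of interacting variables.
contrib = {}
for deg, c in coeffs.items():
    group = tuple(i + 1 for i, j in enumerate(deg) if j > 0)
    contrib[group] = contrib.get(group, 0.0) + term_variance(deg, c)

variance = sum(contrib.values())
S = {group: v / variance for group, v in contrib.items()}
print(variance, S)
```

The keys of `S` are `(1,)`, `(2,)` and `(1, 3)`, i.e. the only non-zero ANOVA components of the surrogate.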

5.1.1 Sensitivity to the PDD order m with the variance criterion 1

In this section, we study the sensitivity of the proposed approach with respect to the single-variable polynomial order $m$, by reporting the sparsity of the surrogate representation and showing the convergence of the method in terms of model accuracy.

We fix the size of the experimental design to $Q = 200$. The dimension reduction technique presented in Section 4.1 is not adopted; only the adaptive PDD algorithm of Section 4.2 is employed here. The variance selection threshold is set to $\theta = 10^{-5}$ for Criterion 1. The polynomial order $m$ varies from 5 to 9. We report the results in Table 3.

| SI | Exact | $m = 5$ | $m = 6$ | $m = 7$ | $m = 8$ | $m = 9$ |
|---|---|---|---|---|---|---|
| $S_1$ | 0.3138 | 0.2104 | 0.3141 | 0.3113 | 0.3136 | 0.3164 |
| $S_2$ | 0.4424 | 0.4720 | 0.4099 | 0.4040 | 0.4427 | 0.4379 |
| $S_3$ | 0 | 0 | 0 | 0 | 0 | 0 |
| $S_{12}$ | 0 | 0 | 0 | 0 | 0 | 0 |
| $S_{13}$ | 0.2436 | 0.1482 | 0.2532 | 0.2583 | 0.2437 | 0.2407 |
| $S_{23}$ | 0 | 0 | 0 | 0 | 0 | 0 |
| $S_{123}$ | 0 | 0.1527 | 0.02 | 0.024 | 0 | 0 |
| $S_1^T$ | 0.5574 | 0.5181 | 0.5898 | 0.5953 | 0.5573 | 0.5616 |
| $S_2^T$ | 0.4424 | 0.6414 | 0.4326 | 0.4304 | 0.4428 | 0.4430 |
| $S_3^T$ | 0.2436 | 0.3108 | 0.2736 | 0.2834 | 0.2437 | 0.2452 |
| $V(Y)$ | 13.845 | 22.386 | 13.392 | 14.241 | 13.887 | 14.075 |
| sparse base | | 88 | 83 | 104 | 15 | 67 |
| sparsity | | 88/216 ≈ 0.407 | 83/343 ≈ 0.242 | 104/512 ≈ 0.203 | 15/729 ≈ 0.021 | 67/1000 ≈ 0.067 |
| model accuracy $Q^2$ | | 0.618 | 0.988 | 0.996 | 0.99987 | 0.99928 |

Table 3: Ishigami test. Numerical results using Criterion 1 by varying the polynomial order $m$. The experimental design size is $Q = 200$. The model accuracy is estimated by the cross validation method.

Looking at Table 3 together with Table 2, we notice that, for this specific case, the adaptive PDD approach generally provides better results with even values of $m$ than with odd ones. Nevertheless, the convergence trend as $m$ increases can be clearly observed.

5.1.2 Sensitivity to the target accuracy $Q^2_{tgt}$ with the error criterion 2

In order to evaluate the model accuracy when using the error Criterion 2, we vary the target accuracy $Q^2_{tgt}$ and compute the sensitivity indices. The results are reported in Table 4.

| SI | Exact | $Q^2_{tgt} = 0.9$ | $Q^2_{tgt} = 0.99$ | $Q^2_{tgt} = 0.999$ |
|---|---|---|---|---|
| $S_1$ | 0.3138 | 0.2796 | 0.3171 | 0.3139 |
| $S_2$ | 0.4424 | 0.4607 | 0.4381 | 0.4424 |
| $S_3$ | 0 | 0 | 0 | 0 |
| $S_{12}$ | 0 | 0 | 0 | 0 |
| $S_{13}$ | 0.2436 | 0.2597 | 0.2448 | 0.2437 |
| $S_{23}$ | 0 | 0 | 0 | 0 |
| $S_{123}$ | 0 | 0 | 0 | 0 |
| $S_1^T$ | 0.5574 | 0.5393 | 0.5619 | 0.5576 |
| $S_2^T$ | 0.4424 | 0.4607 | 0.4381 | 0.4424 |
| $S_3^T$ | 0.2436 | 0.2597 | 0.2448 | 0.2437 |
| $V(Y)$ | 13.845 | 13.026 | 13.671 | 13.851 |
| model evaluations | | 75 | 390 | 430 |
| PDD degree $m$ | | 6 | 7 | 8 |
| sparse base | | 10 | 14 | 15 |
| sparsity | | 10/343 ≈ 0.029 | 14/512 ≈ 0.027 | 15/729 ≈ 0.021 |
| model accuracy $Q^2$ | | 0.9428 | 0.9908 | 0.9996 |

Table 4: Ishigami test. Numerical results using Criterion 2 by varying the target accuracy $Q^2_{tgt}$.

The convergence of the sensitivity indices to the reference results is clearly verified as the target model accuracy increases.

In our experience, it is relatively difficult to obtain an accuracy above 0.9999 for this case when using Criterion 2. However, the numerical results are found to be sufficiently accurate under the conditions of the last column of Table 4.

Comparing these results with those of Table 3, we notice that Criterion 2 usually requires a larger number of model evaluations to obtain sensitivity indices of comparable accuracy. On the other hand, Criterion 2 yields sparser polynomial bases than Criterion 1.

5.2 8-dimensional Sobol’ function

The second test case is devoted to an eight-dimensional Sobol’ function (see [4, Section 6.2]):

$$Y = f(X) = \prod_{k=1}^{N} f^{(k)}(X_k), \quad X_k \sim U(0, 1), \tag{43}$$

where $N = 8$,

$$f^{(k)}(X_k) = \frac{|4X_k - 2| + a_k}{1 + a_k}, \quad a = \{1, 2, 5, 10, 20, 50, 100, 500\}^T. \tag{44}$$

The members of the input random vector $X$ are uniformly distributed over $[0, 1]$, and $\{a_k, k = 1, \cdots, N\}$ are positive coefficients whose values are gathered in the vector $a$.
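For reference, the Sobol' function (43)-(44) admits closed-form sensitivity indices, since each factor has unit mean and variance $1/(3(1+a_k)^2)$. The sketch below (function names are hypothetical) evaluates the function and these standard closed-form indices, which can be compared with the "Exact" values quoted in the tables of this section.

```python
import numpy as np

A = np.array([1, 2, 5, 10, 20, 50, 100, 500], dtype=float)

def sobol_g(x, a=A):
    """Sobol' function (43)-(44); x has shape (n_samples, 8) with entries in [0, 1]."""
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)

def sobol_g_indices(a=A):
    """Closed-form variance, first-order and total Sobol' indices."""
    vk = 1.0 / (3.0 * (1.0 + a)**2)        # variance of each univariate factor
    prod_all = np.prod(1.0 + vk)
    v = prod_all - 1.0                     # total variance of the product
    first = vk / v
    # Total effect of X_k: V_k times the product of (1 + V_j) over j != k.
    total = vk * (prod_all / (1.0 + vk)) / v
    return v, first, total

v, first, total = sobol_g_indices()
print(np.round(first, 2), np.round(total, 2))
```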

In the computation with the proposed approach, we set the ANOVA interaction order equal to 2 for simplicity. We first set the active dimension $D_1 = D_2 = 8$ (by imposing a large enough $p$). The experimental design size is set to 150 using a quasi-random Sobol' sequence, for the sake of comparison with the results obtained with the adaptive sparse polynomial chaos method and the Monte Carlo method reported in [4]. Our numerical results with the variance Criterion 1 (predefined threshold $\theta = 2 \times 10^{-4}$) are shown in Table 5 and compared to those of reference [4].

We note that, using half as many polynomial terms (38 versus 76) with the highest degree equal to 10, the model representation of the present approach provides more accurate sensitivity indices than the method of [4] using the sparse PC of degree 6. Moreover, the advantage of our method over the classical MC method is clear from the corresponding numbers of model evaluations.

5.2.1 Sensitivity to the p constant for the selection of active dimension

In this section, we test the sensitivity of the method to the constant $p$ (see Section 4.1.2), which allows an efficient selection of the active dimension and thus reduces the modeling difficulty. The results obtained by varying $p$ from 0.9 to 0.999 are reported in Table 6. We conclude from this analysis that using only the 3 most important variables (by setting $p = 0.9$) for the second-order interactions is sufficient to provide very accurate results regarding both the sensitivity indices and the meta-model representation, whose error is measured by $Q^2$.
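The role of $p$ can be illustrated with a short sketch of the active-dimension selection: the variables are ranked by decreasing first-order index and the smallest leading group whose cumulative index exceeds $p$ is retained. The function name and the index values below are hypothetical and chosen only for illustration (they are normalized to sum to one, as the first-order indices of the first-order model do).

```python
import numpy as np

def select_active_dimensions(first_order_indices, p=0.999):
    """Return the (0-based) indices of the active variables, most important first."""
    s = np.asarray(first_order_indices, dtype=float)
    order = np.argsort(-s, kind="stable")          # re-ordering by decreasing S_i
    cumulative = np.cumsum(s[order])
    exceeds = cumulative > p
    # Smallest D with sum_{i <= D} S_i > p; keep everything if p is never exceeded.
    d = int(np.argmax(exceeds)) + 1 if exceeds.any() else len(s)
    return order[:d]

s_demo = [0.05, 0.60, 0.30, 0.04, 0.01]            # illustrative values only
active_loose = select_active_dimensions(s_demo, p=0.92)
active_tight = select_active_dimensions(s_demo, p=0.999)
print(active_loose, active_tight)
```

A looser $p$ keeps fewer variables for the higher-order terms, which directly shrinks the candidate set built by the tensor product rule.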

5.3 Application example

The method proposed in this paper is applied to the uncertainty quantification of an atmospheric entry spacecraft case. The numerical simulation solves a set of governing equations including the modeling of rarefied gas effects, aerothermochemistry, radiation, and the response of thermal protection materials to extreme conditions. A global overview of this problem has been given in [40].

| SI | Exact | Sparse PDD (present work) | Sparse PC [4] | Crude MC [4] |
|---|---|---|---|---|
| $S_1$ | 0.60 | 0.60 | 0.56 | 0.57 |
| $S_2$ | 0.27 | 0.25 | 0.22 | 0.29 |
| $S_3$ | 0.07 | 0.06 | 0.05 | 0.06 |
| $S_4$ | 0.02 | 0.02 | 0.02 | 0.03 |
| $S_5$ | 0.01 | 0.01 | 0.01 | 0.01 |
| $S_6$ | 0.00 | 0.00 | 0.00 | 0.01 |
| $S_7$ | 0.00 | 0.00 | 0.00 | 0.00 |
| $S_8$ | 0.00 | 0.00 | 0.00 | 0.00 |
| $S_1^T$ | 0.63 | 0.62 | 0.59 | 0.66 |
| $S_2^T$ | 0.29 | 0.27 | 0.26 | 0.27 |
| $S_3^T$ | 0.08 | 0.08 | 0.10 | 0.08 |
| $S_4^T$ | 0.02 | 0.03 | 0.05 | 0.01 |
| $S_5^T$ | 0.01 | 0.01 | 0.03 | 0.01 |
| $S_6^T$ | 0.00 | 0.00 | 0.04 | 0.00 |
| $S_7^T$ | 0.00 | 0.00 | 0.03 | 0.00 |
| $S_8^T$ | 0.00 | 0.00 | 0.03 | 0.00 |
| model evaluations | | 150 | 150 | 100,000 |
| sparse polynomial base | | 38 | 76 | |
| ANOVA order | | 2 | | |
| max polynomial degree | | 10 ($m = 5$) | 6 | |
| sparsity | | 38/1679616 ≈ 2.3e-5 | 76/3003 ≈ 0.02 | |
| model accuracy $Q^2$ | | 0.993 | 0.99 | |

Table 5: 8-dimensional Sobol' test case. $D_2 = 8$. $\theta = 2 \times 10^{-4}$ is the predefined threshold for the variance Criterion 1. Numerical results with the proposed sparse PDD approach are compared to the ones obtained in [4] (sparse PC and crude MC).

Here, the focus is on predicting the stagnation-point pressure and heat flux from the freestream conditions; the problem is described by a physico-chemical model and solved by suitable numerical methods proposed by Barbante [2, 43].

We use a set of physico-chemical models to simulate high-temperature reacting flows, including the 2D axisymmetric Navier-Stokes equations and gas/surface interaction equations (see Ref. [2]). Indeed, the wall of the spacecraft acts as a catalyst and promotes the recombination of atoms. This phenomenon is modeled by a catalytic wall at radiative equilibrium, where the so-called effective catalytic recombination coefficient $\gamma$ represents the proportion of gas impinging on the body that will recombine. A mixture of 5 species of air is used, namely N, O, N$_2$, O$_2$, and NO, with the chemical mechanism due to Park [26]. Input data for the forward model are the freestream pressure $p_\infty$ and Mach number $M_\infty$, the effective catalytic recombination coefficient $\gamma$, and the gas reaction rate coefficients $k_r$ of the chemical reactions $r$.

The code COSMIC developed by Barbante [2] is used; it was designed to approximate hypersonic flow models where chemical non-equilibrium effects need to be accounted for. It includes a Hybrid Upwind Splitting (HUS) scheme [13], which is an interesting attempt at combining, in a mathematically rigorous way, Flux Vector Splitting (FVS) and Flux Difference Splitting (FDS) schemes. The design principle combines the robustness of FVS schemes in the capture of nonlinear waves and the accuracy of some FDS schemes in the resolution of linear waves. In particular, COSMIC uses the hybridization of the Van Leer scheme [20] and the Osher scheme [25] and includes a carbuncle fix.

| SI | Exact | $p = 0.9$ | $p = 0.99$ | $p = 0.999$ |
|---|---|---|---|---|
| $S_1$ | 0.60 | 0.61 | 0.61 | 0.60 |
| $S_2$ | 0.27 | 0.27 | 0.25 | 0.24 |
| $S_3$ | 0.07 | 0.06 | 0.06 | 0.07 |
| $S_4$ | 0.02 | 0.03 | 0.02 | 0.02 |
| $S_5$ | 0.01 | 0.01 | 0.01 | 0.01 |
| $S_6$ | 0.00 | 0.00 | 0.00 | 0.00 |
| $S_7$ | 0.00 | 0.00 | 0.00 | 0.00 |
| $S_8$ | 0.00 | 0.00 | 0.00 | 0.00 |
| $S_1^T$ | 0.63 | 0.63 | 0.64 | 0.62 |
| $S_2^T$ | 0.29 | 0.29 | 0.28 | 0.28 |
| $S_3^T$ | 0.08 | 0.07 | 0.07 | 0.08 |
| $S_4^T$ | 0.02 | 0.03 | 0.03 | 0.03 |
| $S_5^T$ | 0.01 | 0.01 | 0.01 | 0.01 |
| $S_6^T$ | 0.00 | 0.00 | 0.01 | 0.01 |
| $S_7^T$ | 0.00 | 0.00 | 0.00 | 0.00 |
| $S_8^T$ | 0.00 | 0.00 | 0.00 | 0.01 |
| active dimension $D_2$ | | 3 | 6 | 7 |
| sparse polynomial base | | 25 | 40 | 43 |
| ANOVA order | | 2 | 2 | 2 |
| max polynomial degree | | 9 ($m = 5$) | 9 ($m = 5$) | 10 ($m = 5$) |
| sparsity | | 25/1679616 ≈ 1.5e-5 | 40/1679616 ≈ 2.4e-5 | 43/1679616 ≈ 2.6e-5 |
| model accuracy $Q^2$ | | 0.982 | 0.987 | 0.992 |

Table 6: 8-dimensional Sobol' test case. Experimental design size $Q = 150$. $\theta = 2 \times 10^{-4}$ is the predefined threshold for the variance Criterion 1. Numerical results with the proposed sparse PDD approach by varying $p$.

The boundary conditions are illustrated in the left panel of Fig. 1: an axisymmetric condition is imposed on the y axis (horizontal axis in Fig. 1), while the wall of the body is modeled by a partially catalytic wall at radiative equilibrium. The mesh used for the computations is given in the right panel of Fig. 1. Pressure and temperature iso-contours of the flow around the European EXPerimental Reentry Test-bed (EXPERT) vehicle, obtained with COSMIC for the mean values of the input data, are shown in Fig. 2. Note that a specific point of the trajectory of EXPERT is considered [39]. The trajectory point corresponds roughly to the chemical non-equilibrium flow conditions of Table 7.

| Flow conditions | Altitude [km] | $T_\infty$ [K] | $p_\infty$ [Pa] | $M_\infty$ [-] |
|---|---|---|---|---|
| Chemical non-equilibrium | 60 | 245.5 | 20.3 | 15.5 |

Table 7: Freestream conditions for one trajectory point of the EXPERT vehicle.

Figure 1: Boundary conditions (left) and mesh (right)

Uncertainties are considered on $p_\infty$, $M_\infty$, and $\gamma$, with the uniform distributions detailed in Table 8. Concerning $p_\infty$ and $M_\infty$, only a priori ranges of plausible values are known. Concerning $\gamma$, the mean value corresponds roughly to the EXPERT material, while the 33% error has been previously determined [41].

| Variable | Distribution | Min | Max |
|---|---|---|---|
| $X_1$ ($p_\infty$) [Pa] | Uniform | 16.3 | 24.3 |
| $X_2$ ($M_\infty$) [-] | Uniform | 13.7 | 17.3 |
| $X_3$ ($\gamma$) [-] | Uniform | 0.001 | 0.002 |

Table 8: Distributions of $M_\infty$, $p_\infty$, and $\gamma$

Uncertainty is also considered on the gas reaction rate coefficients $k_r$ of four dissociation reactions. For the trajectory point investigated, the dissociation reactions of molecular oxygen and nitric oxide were found to be important. Following the suggestion of Bose et al. [10], the uncertainty concerns only the pre-exponential factor $A_r$ of the Arrhenius rate equation: $k_r = A_r T^{b_r} \exp(-E_r / RT)$. Since the uncertainties on $k_r$ can be quite large, it is appropriate to consider them on a logarithmic scale; in particular, $\log_{10}(k_r / k_{r,0})$, with $k_{r,0}$ the recommended rate constant, is commonly assumed to follow a normal distribution,

$$P(k_r) \propto \exp\left[ -\frac{1}{2} \left( \frac{\log_{10}(k_r / k_{r,0})}{\sigma_r} \right)^2 \right], \tag{45}$$

where $\pm 2\sigma_r$ (reported in Table 9) defines the 95% confidence limits symmetrically bounding $k_{r,0}$.
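As an illustration of this input model, the sketch below draws random samples of the seven uncertain inputs: the three uniform variables with the bounds of Table 8, and the four rate-coefficient ratios $k_r / k_{r,0}$ whose base-10 logarithm is normal with the standard deviations $\sigma_r$ of Table 9. The dictionary keys, the sample size and the use of plain pseudo-random sampling are choices made here for illustration only; the recommended rate constants $k_{r,0}$ are not given in this section, so only the ratios are sampled.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # illustrative sample size

# Uniform inputs X1-X3 (bounds from Table 8).
uniform_bounds = {"p_inf": (16.3, 24.3), "M_inf": (13.7, 17.3), "gamma": (0.001, 0.002)}
# Standard deviations of log10(k_r / k_r0) for X4-X7 (Table 9).
sigma = {"NO+O": 0.12, "NO+N": 0.12, "O2+N2": 0.10, "O2+O": 0.10}

samples = {k: rng.uniform(lo, hi, n) for k, (lo, hi) in uniform_bounds.items()}
log_ratio = {k: rng.normal(0.0, s, n) for k, s in sigma.items()}
ratio = {k: 10.0**v for k, v in log_ratio.items()}   # k_r / k_r0, log-normally distributed

# Fraction of samples inside the +/- 2 sigma band (about 95% for a normal law).
inside = np.mean(np.abs(log_ratio["NO+O"]) <= 2 * sigma["NO+O"])
print(inside, ratio["NO+O"].min(), ratio["NO+O"].max())
```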

Note that the quantities of interest are the pressure $p_{st}$ and heat flux $q_{st}$ at the stagnation point.

Figure 2: Pressure and temperature iso-contours for input data mean values

| Gas reaction | Distribution of $\log_{10} k_r$ | $\sigma_r$ |
|---|---|---|
| $X_4$ (NO + O → N + O + O) | Normal | 0.12 |
| $X_5$ (NO + N → N + O + N) | Normal | 0.12 |
| $X_6$ (O$_2$ + N$_2$ → 2O + N$_2$) | Normal | 0.10 |
| $X_7$ (O$_2$ + O → 2O + O) | Normal | 0.10 |

Table 9: Distributions of $\log_{10} k_r$

5.4 Uncertainty quantification results

For this application test, 1,000 runs of the deterministic code are used to recursively solve the linear regression systems. Let us first consider the pressure $p_{st}$ as the quantity of interest and set the highest ANOVA interaction order (the truncation dimension) to $\nu = 2$. We further set $p = 0.99999$ for the selection of the active dimensions for the second-order ANOVA interaction terms. For the sake of comparison with the UQ results presented in reference [40], we vary the PDD polynomial order from $m = 2$ to $m = 4$. Both criteria for the selection of the most important polynomial terms are tested. The computed mean, variance and sensitivity indices are reported in Table 10. All first-order and total indices are presented; concerning the second-order sensitivity indices, for conciseness, we only present the most important one, which measures the interaction between $p_\infty$ and $M_\infty$.

The moment results in Table 10 show that the method converges when the PDD order $m$ increases, for both selection criteria. Meanwhile, for all cases considered here, the obtained model accuracy is very high (always above 0.9995). $p_\infty$ and $M_\infty$ are found to be the two most important parameters, with first-order sensitivity measures far larger than those of the other parameters. The second-order interaction between these two parameters is found to be non-negligible compared to the other first-order sensitivity estimates. $\gamma$ has a negligible influence on $p_{st}$. We further observe that the two criteria provide similar sensitivity estimates and moments. Under the numerical conditions used for this case, Criterion 2 generally produces a sparser polynomial basis for a similar model accuracy. Note nevertheless that Criterion 2, which is based essentially upon model error estimates for the selection of polynomial terms, requires more resolutions of regression problems than Criterion 1.
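As a quick consistency check on the reported values (Criterion 1, $m = 2$, in Table 10), the total index of $p_\infty$ should be close to its first-order index plus the interaction term with $M_\infty$, since the remaining interactions involving $p_\infty$ are several orders of magnitude smaller:

$$S_1^T \approx S_1 + S_{12} = 0.411332 + 0.007393 = 0.418725,$$

which matches the tabulated value $S_1^T = 0.418726$ to within the last digit.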

| | Criterion 1, $m = 2$ | Criterion 1, $m = 3$ | Criterion 1, $m = 4$ | Criterion 2, $m = 2$ | Criterion 2, $m = 3$ | Criterion 2, $m = 4$ |
|---|---|---|---|---|---|---|
| $E(p_{st})$ | 6499.31 | 6499.24 | 6498.83 | 6499.31 | 6499.27 | 6499.30 |
| $V(p_{st})$ | 0.133743E+07 | 0.133791E+07 | 0.133790E+07 | 0.133761E+07 | 0.133791E+07 | 0.133787E+07 |
| $S_1$ | 0.411332E+00 | 0.411383E+00 | 0.411377E+00 | 0.411411E+00 | 0.411422E+00 | 0.411390E+00 |
| $S_2$ | 0.581267E+00 | 0.581192E+00 | 0.580948E+00 | 0.581190E+00 | 0.581178E+00 | 0.581124E+00 |
| $S_3$ | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 |
| $S_4$ | 0.937678E-06 | 0.141060E-05 | 0.508373E-05 | 0.936357E-06 | 0.763570E-06 | 0.000000E+00 |
| $S_5$ | 0.239850E-05 | 0.238962E-05 | 0.399189E-05 | 0.239324E-05 | 0.230749E-05 | 0.153589E-05 |
| $S_6$ | 0.746241E-06 | 0.826894E-06 | 0.795075E-06 | 0.741934E-06 | 0.855743E-06 | 0.780410E-06 |
| $S_7$ | 0.110647E-06 | 0.164247E-05 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 |
| $S_{12}$ | 0.739341E-02 | 0.738204E-02 | 0.741574E-02 | 0.739252E-02 | 0.738666E-02 | 0.741212E-02 |
| $S_1^T$ | 0.418726E+00 | 0.418771E+00 | 0.418842E+00 | 0.418804E+00 | 0.418809E+00 | 0.418803E+00 |
| $S_2^T$ | 0.588663E+00 | 0.588578E+00 | 0.588393E+00 | 0.588585E+00 | 0.588569E+00 | 0.588547E+00 |
| $S_3^T$ | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 |
| $S_4^T$ | 0.937678E-06 | 0.383797E-04 | 0.121003E-03 | 0.936357E-06 | 0.957319E-05 | 0.575515E-04 |
| $S_5^T$ | 0.518330E-05 | 0.238962E-05 | 0.126563E-03 | 0.474173E-05 | 0.230749E-05 | 0.148786E-04 |
| $S_6^T$ | 0.746241E-06 | 0.826894E-06 | 0.795075E-06 | 0.741934E-06 | 0.855743E-06 | 0.780410E-06 |
| $S_7^T$ | 0.110647E-06 | 0.290228E-04 | 0.182279E-03 | 0.000000E+00 | 0.506326E-05 | 0.607349E-04 |
| $D_2$ | 3 | 4 | 5 | 3 | 4 | 5 |
| sparse base | 14 | 34 | 79 | 11 | 14 | 21 |
| sparsity | 14/2187 ≈ 6.4e-3 | 34/16384 ≈ 2.1e-3 | 79/78125 ≈ 1.0e-3 | 11/2187 ≈ 5.0e-3 | 14/16384 ≈ 8.5e-4 | 21/78125 ≈ 2.7e-4 |
| accuracy $Q^2$ | 0.999678 | 0.999718 | 0.999728 | 0.999680 | 0.999722 | 0.999743 |

Table 10: Application test. Uncertainty quantification results for $p_{st}$ using Criterion 1 and Criterion 2 by varying the PDD polynomial order $m$. The truncation dimension is $\nu = 2$. The experimental design size is $Q = 1000$. The model accuracy is estimated by the cross validation method (see Appendix A). For Criterion 1, the threshold $\theta = 10^{-7}$; for Criterion 2, the threshold $\epsilon_{Q^2} = 10^{-7}$.

When the heat flux $q_{st}$ is considered, the same type of UQ analysis is carried out and the results are reported in Table 11. We set the maximum interaction order $\nu = 3$. When the constant $p$ is set to $p = 0.995$ for both criteria, 3 active dimensions are retained for second- and third-order interactions when the PDD order is $m = 2$ or $m = 3$, whilst 4 active dimensions are retained for $m = 4$. Concerning the computed expectation and variance, the convergence is not as good as for $p_{st}$, for both criteria. For Criterion 1, $p_\infty$ and $M_\infty$ are still found to be the most significant parameters, while the gas reaction O$_2$ + O → 2O + O ($X_7$) is negligible. Indeed, this reaction uncertainty is not taken into consideration in the final meta-model polynomial representation, since its total sensitivity effect is also zero. Moreover, Table 11 shows that all second- and third-order interactions are non-negligible: the orders of magnitude of these sensitivity indices are comparable to those of the first-order contributions ($X_3$, $X_4$, $X_5$, $X_6$). In particular, in the case of $m = 4$, the third-order interactions are found to be more important than the second-order ones. It is also shown that the sensitivity measure of a parameter or a group of parameters can vary significantly when a different number of active dimensions is employed. For instance, the total sensitivity index of $X_4$ is about 500 times larger with $D_2, D_3 = 4$ than with $D_2, D_3 = 3$. Finally, the accuracy of the PDD model representation for $q_{st}$ is found to be lower than for $p_{st}$, as also mentioned in reference [40]. As far as Criterion 2 is concerned, fewer polynomial terms are required to obtain a similar model accuracy. However, the model representation includes fewer uncertain parameters. For instance, $X_6$ and $X_7$ are excluded when using $m = 4$, and $X_4$ is additionally neglected if we set $m = 2$ or $m = 3$.

| | Criterion 1, $m = 2$ | Criterion 1, $m = 3$ | Criterion 1, $m = 4$ | Criterion 2, $m = 2$ | Criterion 2, $m = 3$ | Criterion 2, $m = 4$ |
|---|---|---|---|---|---|---|
| $E(q_{st})$ | 286927.0 | 286945.0 | 285907.0 | 286898.0 | 286947.0 | 287233.0 |
| $V(q_{st})$ | 0.296806E+10 | 0.303006E+10 | 0.387263E+10 | 0.296506E+10 | 0.301310E+10 | 0.311091E+10 |
| $S_1$ | 0.104034E+00 | 0.102941E+00 | 0.838386E-01 | 0.104603E+00 | 0.103322E+00 | 0.101710E+00 |
| $S_2$ | 0.874650E+00 | 0.872884E+00 | 0.712564E+00 | 0.876280E+00 | 0.877072E+00 | 0.872107E+00 |
| $S_3$ | 0.946742E-02 | 0.961702E-02 | 0.656856E-02 | 0.924117E-02 | 0.908686E-02 | 0.876759E-02 |
| $S_4$ | 0.366610E-03 | 0.367870E-03 | 0.611833E-02 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 |
| $S_5$ | 0.219524E-02 | 0.192399E-02 | 0.189310E-02 | 0.225662E-02 | 0.195543E-02 | 0.923366E-03 |
| $S_6$ | 0.200434E-03 | 0.139166E-02 | 0.151648E-02 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 |
| $S_7$ | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 |
| $S_{12}$ | 0.588863E-02 | 0.616165E-02 | 0.624935E-02 | 0.588868E-02 | 0.597284E-02 | 0.642160E-02 |
| $S_{23}$ | 0.140176E-02 | 0.151489E-02 | 0.805465E-03 | 0.133857E-02 | 0.135844E-02 | 0.104616E-02 |
| $S_{24}$ | / | / | 0.121030E-02 | / | / | 0.117491E-02 |
| $S_{13}$ | 0.176827E-03 | 0.945794E-03 | 0.337197E-03 | 0.000000E+00 | 0.532325E-03 | 0.000000E+00 |
| $S_{14}$ | / | / | 0.486300E-02 | / | / | 0.000000E+00 |
| $S_{34}$ | / | / | 0.218688E-01 | / | / | 0.000000E+00 |
| $S_{123}$ | 0.161953E-02 | 0.225210E-02 | 0.734882E-02 | 0.392357E-03 | 0.699826E-03 | 0.178231E-02 |
| $S_{124}$ | / | / | 0.293194E-01 | / | / | 0.144664E-02 |
| $S_{234}$ | / | / | 0.597520E-01 | / | / | 0.000000E+00 |
| $S_{134}$ | / | / | 0.557466E-01 | / | / | 0.462055E-02 |
| $S_1^T$ | 0.111719E+00 | 0.112301E+00 | 0.187703E+00 | 0.110884E+00 | 0.110527E+00 | 0.115981E+00 |
| $S_2^T$ | 0.883560E+00 | 0.882812E+00 | 0.817249E+00 | 0.883899E+00 | 0.885103E+00 | 0.883978E+00 |
| $S_3^T$ | 0.126655E-01 | 0.143298E-01 | 0.152428E+00 | 0.109721E-01 | 0.116774E-01 | 0.162166E-01 |
| $S_4^T$ | 0.366610E-03 | 0.367870E-03 | 0.178878E+00 | 0.000000E+00 | 0.000000E+00 | 0.724210E-02 |
| $S_5^T$ | 0.219524E-02 | 0.192399E-02 | 0.189310E-02 | 0.225662E-02 | 0.195543E-02 | 0.923366E-03 |
| $S_6^T$ | 0.200434E-03 | 0.139166E-02 | 0.151648E-02 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 |
| $S_7^T$ | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 | 0.000000E+00 |
| $D_2, D_3$ | 3 | 3 | 4 | 3 | 3 | 4 |
| sparse base | 19 | 31 | 122 | 11 | 14 | 18 |
| sparsity | 19/2187 ≈ 8.7e-3 | 31/16384 ≈ 1.9e-3 | 122/78125 ≈ 1.6e-3 | 11/2187 ≈ 5.0e-3 | 14/16384 ≈ 8.5e-4 | 18/78125 ≈ 2.3e-4 |
| accuracy $Q^2$ | 0.783770 | 0.793848 | 0.805251 | 0.785050 | 0.796990 | 0.813363 |

Table 11: Application test. Uncertainty quantification results for $q_{st}$ using Criterion 1 and Criterion 2 by varying the PDD polynomial order $m$. The truncation dimension is $\nu = 3$. The experimental design size is $Q = 1000$. The model accuracy is estimated by the cross validation method (see Appendix A). For Criterion 1, the threshold $\theta = 10^{-4}$; for Criterion 2, the threshold $\epsilon$ …

6 Conclusions

This paper deals with engineering and physical problems featuring a moderate to large number of uncertain input parameters. The purpose is to identify the relative importance of these uncertainties for a given quantity of interest. This is achieved by performing a global sensitivity analysis, in particular by combining the Analysis of Variance technique (ANOVA) and the polynomial dimensional decomposition approach (PDD). Complexities present in practical problems usually make global methods prohibitive due to the large uncertainty in the model output. In this paper, we have employed three levels of adaptivity to reduce the meta-modeling difficulty:

1. We set a truncation dimension (the maximum interaction order) in the ANOVA expansion.

2. From the solution of the regression system including only the PDD terms of the first-order ANOVA component functions, a quantitative ranking of importance can be established for all the input parameters. Hence, we can retain the so-called active dimensions (the most influential parameters) for the PDD terms of the second- and higher-order ANOVA components.

3. Starting from the PDD polynomials of the first-order components, we enrich our surrogate model representation by adding only significant polynomials of second- and higher-order functions. Two selection criteria have been used in this work for this purpose. We emphasize that recursive resolutions of regression problems are required for this task.

The resulting surrogate polynomial approximation is a very sparse representation of the deterministic model. Since the surrogate model size is updated recursively with respect to the resolutions of regression problems subject to the enrichment of the polynomial basis, the number of required deterministic model evaluations is well controlled, and its final value is significantly smaller than when employing a standard Monte Carlo or quasi Monte Carlo method.

