Model Assisted Probability of Detection curves: New statistical tools and progressive methodology


HAL Id: hal-01260335

https://hal.archives-ouvertes.fr/hal-01260335

Submitted on 22 Jan 2016



Loïc Le Gratiet, Bertrand Iooss, Géraud Blatman, Thomas Browne, Sara Cordeiro, Benjamin Goursaud

To cite this version:

Loïc Le Gratiet, Bertrand Iooss, Géraud Blatman, Thomas Browne, Sara Cordeiro, et al. Model Assisted Probability of Detection curves: New statistical tools and progressive methodology. Journal of Nondestructive Evaluation, Springer Verlag, 2017, 36 (1). ⟨hal-01260335⟩


(EDF, France) January 22, 2016

Abstract

The Probability Of Detection (POD) curve is a standard tool in several industries to evaluate the performance of Non Destructive Testing (NDT) procedures for the detection of defects that are harmful to the inspected structure. Thanks to new capabilities in the numerical simulation of NDT processes, Model Assisted Probability of Detection (MAPOD) approaches have also recently been developed. In this paper, a generic and progressive MAPOD methodology is proposed. The limits and assumptions of the classical methods are highlighted, and new metamodel-based methods are proposed. These methods give access to relevant information based on sensitivity analysis of the MAPOD inputs. Applications are performed on Eddy Current Non Destructive Examination numerical data.

1 Introduction

In several industries, the Probability Of Detection (POD) curve is a standard tool to evaluate the performance of Non Destructive Testing (NDT) procedures (Gandosi and Annis, 2010; DoD, 2009; Meyer et al, 2014). The goal is to quantify the capability of an inspection to detect flaws that are harmful to the inspected structure. For instance, at the French electricity company (EDF), the potential of this tool is studied in the context of Eddy Current Non Destructive Examination, in order to ensure the integrity of steam generator tubes in nuclear power plants (Maurice et al, 2013a).

However, the high cost of experimental POD campaigns, combined with the continuous increase in the complexity of configurations, sometimes makes them unaffordable. To overcome this problem, it is possible to resort to numerical simulation of the NDT process (see for example Rupin et al (2014) for ultrasonics and Rosell and Persson (2013) for eddy-current). This approach has been called MAPOD for “Model Assisted Probability of Detection” (Thompson, 2008) (see also Meyer et al (2014) for a survey and Calmon (2012) for a synthetic overview).

The determination of this “numerical POD” is based on a four-step approach:

1. Identify the set of parameters that significantly affect the NDT signal;

2. Attribute a specific probability distribution to each of these parameters (for instance from expert judgment);

3. Propagate the input parameters uncertainties through the NDT numerical model;

4. Build the POD curve from standard approaches like the so-called Berens method (Berens, 1988).

In POD studies, two main models have been proposed: POD model for binary detection representation (using hit/miss data) and POD model for continuous response (using the values of the NDT signal). We focus in this work on POD model for continuous response, arguing that model-based data contain quantitative and precise information on the signal values that will be better exploited with this approach.

As it relies entirely on a probabilistic modeling of uncertain physical variables and their propagation through a model, the MAPOD approach can be directly related to the uncertainty management methodology in numerical simulation (see de Rocquigny et al (2008) and Baudin et al (2016) for a general point of view, and Dominguez et al (2012) for an illustration in the NDT domain). This methodology proposes a generic framework for modeling, calibrating, propagating and prioritizing uncertainty sources through a numerical model (or computer code). Indeed, the investigation of complex computer code experiments remains an important challenge in all domains of science and technology, whether for simulation, prediction, uncertainty analysis or sensitivity studies. In this framework, the numerical model G is simply written

$$Y = G(X) = G(X_1, \ldots, X_d), \qquad (1)$$

with $X \in \mathbb{R}^d$ the random input vector of dimension d and $Y \in \mathbb{R}$ a scalar model output.

However, standard uncertainty treatment techniques require many model evaluations, and a major algorithmic difficulty arises when the computer code under study is too time-consuming to be used directly. This is the case, for instance, for NDT models based on complex geometry modeling and finite-element solvers. This problem has been identified by Calmon (2012), who distinguishes “semi-analytical” codes (fast to evaluate but based on simplified physics) from “full numerical” ones (physically realistic but cpu-time expensive), the latter being the models of interest in our work. For cpu-time expensive models, one solution consists in replacing the numerical model by a mathematical approximation, called a response surface or a metamodel. Several statistical tools based on numerical design of experiments, efficient uncertainty propagation algorithms and metamodeling concepts are then useful (Fang et al, 2006). They are applied, in this paper, to the particular NDT case where the quantity of interest is a POD curve.

The physical system of interest, the numerical model parameterization and the design of numerical experiments are explained in the following section. The third section introduces four POD curve determination methods: the classical Berens method, a binomial-Berens method and two methods (polynomial chaos and kriging) based on the metamodeling of model outputs. In the fourth section, sensitivity analysis tools are developed by using the metamodel-based approaches. A conclusion synthesizes the work with a progressive strategy for the MAPOD process, in addition to some prospects.

2 The NDT system

Our application case, shown in Figure 1, deals with the inspection by the SAX probe (an axial probe) of steam generator tubes to detect the wears, which are defects due to the rubbing of anti-vibration bars (BAV). This configuration has been studied with the software Code Carmel3D (C3D) for several years. This tool has demonstrated its ability to accurately simulate the signature of a wear with its influential parameters (mainly the BAV) (Maurice et al, 2013b).

2.1 The computer code and model parameterization

The numerical simulations are performed by C3D, a computer code derived from code Carmel, developed by EDF R&D and the L2EP laboratory of Lille (France). This code uses the finite element method to solve the problem. Hence, there is a large flexibility in the parameters that can be taken into account (cf. Figure 2). The accuracy of the calculations can be ensured with a sufficiently refined mesh (Thomas et al, 2015), using HPC capabilities if necessary.

Figure 1: Representation of the system under study (Tube, BAV and SAX).

Figure 2: Illustration of the mesh in the numerical model of NDT simulation.

Eddy-current non-destructive examinations are based on the change of the induction flux in the coils of the probe as it approaches a defect. When the tube is perfectly cylindrical, both coils of the probe receive the same induction flux. If there is a defect, the two fluxes differ and hence the differential flux, which is the difference between the fluxes in the two coils, is non-zero: it is a complex quantity whose real part is the channel X and whose imaginary part is the channel Y. Hence, when plotting the differential flux for each position of the probe, one gets a curve in the impedance plane, called a Lissajous curve. The output parameters of a non-destructive examination are (as illustrated in Figure 3):

• the amplitude (amp), which is the largest distance between two points of the Lissajous curve,

• the phase, which is the angle between the abscissa axis and the line linking the two points giving the amplitude,

• the Y-projection (ProjY), which is the largest imaginary part of the difference between two points of the Lissajous curve.

Figure 3: Lissajous curve: output parameters of an NDT simulation for the SAX probe in differential mode (amp, phase and ProjY).

2.2 Input parameters and associated random distributions definition

By relying on both expert reports and simulation data, the set of input parameters that can have an impact on the code outputs has been defined. Probabilistic models have also been proposed following in-depth discussions between NDT experts and statisticians. $\mathcal{N}(\cdot, \cdot)$ (resp. $\mathcal{U}[\cdot, \cdot]$) stands for the Gaussian (resp. uniform) law. These parameters are the following (see Fig. 4):

• $E \sim \mathcal{N}(a_E, b_E)$: pipe thickness (mm), based on data obtained from 5000 pipes,

• $h_1 \sim \mathcal{U}[a_{h_1}, b_{h_1}]$: first flaw height (mm),

• $h_2 \sim \mathcal{U}[a_{h_2}, b_{h_2}]$: second flaw height (mm),

• $P_1 \sim \mathcal{U}[a_{P_1}, b_{P_1}]$: first flaw depth (mm),

• $P_2 \sim \mathcal{U}[a_{P_2}, b_{P_2}]$: second flaw depth (mm),

• $ebav_1 \sim \mathcal{U}[-P_1 + a_{ebav_1}, b_{ebav_1}]$: length of the gap between the BAV and the first flaw (mm),

• $ebav_2 \sim \mathcal{U}[-P_2 + a_{ebav_2}, b_{ebav_2}]$: length of the gap between the BAV and the second flaw (mm).

All these input parameters are gathered in a single input random vector $(E, h_1, h_2, P_1, P_2, ebav_1, ebav_2)$.

As displayed in Figure 4, we consider the occurrence of one flaw on each side of the pipe due to BAV. To take this eventuality into account in the computations, 50% of the experiments are modeled with one flaw, and 50% with two flaws.

Figure 4: Illustration of the considered inputs.

2.3 Definition of the design of numerical experiments

In order to compute the output of interest with C3D, it is necessary to choose the points in the variation domain of the inputs (called the input set). This dataset, called the “design of experiments”, has to be defined at the very beginning of the study, that is to say before any numerical simulation. A classical method consists in building the design of experiments by randomly picking different points of the input set, obtaining a so-called Monte Carlo sample. However, a random sample can lead to a design which does not properly fill the input set (Fang et al, 2006). A better idea is to spread the numerical simulations all over the input set, in order to avoid large empty regions.

To this end, it is more relevant to choose the values according to a deterministic rule, such as a quasi-Monte Carlo method, for instance a Sobol’ sequence. Indeed, for a given design size N, such a design generally provides more accurate estimates than a standard Monte Carlo sample (Fang et al, 2006). Given the available computing time (several hours per model run), a Sobol’ sequence of size 100 is created, and 100 model outputs are obtained after the runs of the computer code (G).
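As an illustration of this design step, the following sketch builds a 100-point Sobol’ design and maps it onto the input laws of Section 2.2. It is only a sketch under stated assumptions: it uses SciPy's `scipy.stats.qmc` module rather than the software used by the authors, all numerical bounds are placeholders (the paper does not give the values of $a_E$, $b_E$, $a_{h_1}$, etc.), $b_E$ is assumed to be a standard deviation, and the sequence is scrambled so that no point falls exactly on the boundary of the unit cube before the Gaussian inverse transform.

```python
import warnings

import numpy as np
from scipy.stats import norm, qmc

# Placeholder values: the paper does not disclose the actual bounds.
E_MEAN, E_STD = 1.27, 0.04          # assumed mean and standard deviation of E (mm)
H_BOUNDS = (5.0, 20.0)              # assumed bounds for h1 and h2 (mm)
P_BOUNDS = (0.1, 0.5)               # assumed bounds for P1 and P2 (mm)
EBAV_OFFSET, EBAV_MAX = 0.0, 0.3    # assumed a_ebav and b_ebav (mm)


def uniform(u, low, high):
    """Map points of [0, 1] onto U[low, high] (low may be an array)."""
    return low + u * (high - low)


def sobol_design(n_points=100, seed=0):
    """Return an (n_points, 7) design: E, h1, h2, P1, P2, ebav1, ebav2."""
    sampler = qmc.Sobol(d=7, scramble=True, seed=seed)
    with warnings.catch_warnings():
        # SciPy warns when n_points is not a power of two; 100 is the budget
        # quoted in the text, so the warning is silenced here.
        warnings.simplefilter("ignore", UserWarning)
        u = sampler.random(n_points)
    u = np.clip(u, 1e-12, 1.0 - 1e-12)   # keep the Gaussian quantile finite
    e = norm.ppf(u[:, 0], loc=E_MEAN, scale=E_STD)
    h1 = uniform(u[:, 1], *H_BOUNDS)
    h2 = uniform(u[:, 2], *H_BOUNDS)
    p1 = uniform(u[:, 3], *P_BOUNDS)
    p2 = uniform(u[:, 4], *P_BOUNDS)
    # The lower bound of each gap depends on the depth of the matching flaw.
    ebav1 = uniform(u[:, 5], -p1 + EBAV_OFFSET, EBAV_MAX)
    ebav2 = uniform(u[:, 6], -p2 + EBAV_OFFSET, EBAV_MAX)
    return np.column_stack([e, h1, h2, p1, p2, ebav1, ebav2])


design = sobol_design()
```

Each row of `design` would then be one input configuration submitted to the simulation code.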

3 Methods of POD curves estimation

In this section several methods (from the simplest, relying on strong assumptions, to the most complex) are presented and applied. The objective is to build the POD curve as a function of the main parameter of interest, related to the defect size. As there are two defects in the system, $a := \max(P_1, P_2)$ is chosen as the parameter of interest.

Using the computer code C3D, we focus on the output ProjY, which is a projection of the simulated signal that would be obtained after the NDT process. The other inputs are treated as random variables, which makes ProjY itself a random variable. The model (1) now writes

$$\mathrm{ProjY} = G(a, X) \qquad (2)$$

with the random vector $X = (E, h_1, h_2, ebav_1, ebav_2)$. The effects of all the input parameters $(a, X)$ are displayed in Figure 5. The bold values are the correlation coefficients between the output ProjY and the corresponding input parameter. Strong influences of $P_1$ and $P_2$ on ProjY are detected. $i_{P_2}$ is the binary variable governing the presence of one flaw ($i_{P_2} = 1$) or two flaws ($i_{P_2} = 2$).

Given a threshold $s > 0$, a flaw is considered to be detected if ProjY > s. Therefore the one-dimensional POD curve is defined by:

$$\forall a > 0, \quad \mathrm{POD}(a) = P(G(a, X) > s \mid a). \qquad (3)$$

Four different regression models of ProjY are proposed in the following, in order to build an estimation of the POD curve. Numerical simulations are computed for the N = 100 points of the design of experiments.

3.1 Data linearization step

Figure 5: ProjY with respect to the input parameters. On each plot, the solid curve is a local polynomial smoother and the upper number is the corresponding correlation coefficient between the input (in abscissa) and ProjY (in ordinate).

All the POD methods consist of a (linear or non-linear) regression of the output ProjY. A data linearization is therefore useful to improve the adequacy of the models. This can be done by a Box-Cox transformation (Box and Cox, 1964) of the output, which means that we now focus on:

$$y = \frac{\mathrm{ProjY}^{\lambda} - 1}{\lambda}. \qquad (4)$$

$\lambda$ is determined by maximum likelihood as the real number that offers the best linear regression of y with respect to the parameter a (see Figure 6). The same transformation has to be applied to the detection threshold s. In the following, we keep the notation s for this transformed threshold. It is important to note that this transformation is useful for all the different POD methods (Dominguez et al, 2012).
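The choice of $\lambda$ can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that “maximum likelihood” means maximizing the Gaussian profile log-likelihood of the linear regression of y on a (including the Jacobian term of the transformation), and it runs on synthetic data whose regression coefficients, noise level and true $\lambda = 0.3$ are chosen arbitrarily. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def boxcox(values, lam):
    """Box-Cox transform (values**lam - 1) / lam, with the log limit at lam = 0."""
    values = np.asarray(values, dtype=float)
    if abs(lam) < 1e-12:
        return np.log(values)
    return (values**lam - 1.0) / lam


def profile_loglik(lam, a, proj_y):
    """Profile log-likelihood of lam (up to a constant) for y = b0 + b1 * a + eps."""
    y = boxcox(proj_y, lam)
    design = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    n = len(y)
    # The second term is the log-Jacobian of the Box-Cox transformation.
    return -0.5 * n * np.log(rss / n) + (lam - 1.0) * np.sum(np.log(proj_y))


def fit_lambda(a, proj_y, bounds=(-2.0, 2.0)):
    """Maximize the profile log-likelihood over lam within the given bounds."""
    result = minimize_scalar(
        lambda lam: -profile_loglik(lam, a, proj_y), bounds=bounds, method="bounded"
    )
    return result.x


# Synthetic data: linear in a after a Box-Cox transform with lambda = 0.3.
rng = np.random.default_rng(0)
a = rng.uniform(0.1, 0.5, size=100)
z = 2.5 + 43.5 * a + rng.normal(0.0, 1.95, size=100)
proj_y = (1.0 + 0.3 * z) ** (1.0 / 0.3)       # inverse Box-Cox transform

lam_hat = fit_lambda(a, proj_y)
y = boxcox(proj_y, lam_hat)                   # linearized response
s_transformed = boxcox(100.0, lam_hat)        # same transform for a hypothetical threshold
```

The threshold value 100.0 is purely illustrative; the point is that the threshold must go through the same transformation as the response.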

3.2 Berens method (Berens, 1988)

The Berens model, based on y, is defined as

$$y(a) = \beta_0 + \beta_1 a + \varepsilon, \qquad (5)$$

with $\varepsilon$ the model error such that $\varepsilon \sim \mathcal{N}(0, \sigma^2)$. The maximum likelihood method provides the estimators $\hat\beta_0$, $\hat\beta_1$ and $\hat\sigma$. Hence the model implies the following result: $\forall a > 0$, $y(a) \sim \mathcal{N}(\hat\beta_0 + \hat\beta_1 a, \hat\sigma^2)$. On our data, we obtain $\hat\beta_0 = 2.52$, $\hat\beta_1 = 43.48$ and $\hat\sigma = 1.95$, which leads to the linear model represented in Figure 7.

Figure 6: Model response with respect to a. Left: initial data (ProjY as response); Right: linearized data (y as response) by Box-Cox transformation with parameter $\lambda = 0.3$ of the response ProjY.

Under the normality hypothesis, as displayed in Figure 7, the values of the POD curve can be easily estimated, giving the POD curve of Figure 8. The error associated with the maximum likelihood estimator in linear regression provides the uncertainty on both $\beta_0$ and $\beta_1$, which can be used to build non-asymptotic confidence intervals. Indeed, the Gaussian hypothesis on $\varepsilon$ makes it possible to obtain the distribution of $\beta_0$ and $\beta_1$ conditionally on $\sigma^2$:

$$\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} \sim \mathcal{N}\left( \hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix},\ \sigma^2 \left(X^T X\right)^{-1} \right), \qquad (6)$$

with X the data input matrix:

$$X = \begin{pmatrix} 1 & a_1 \\ 1 & a_2 \\ \vdots & \vdots \\ 1 & a_N \end{pmatrix}.$$

Figure 7: Linear model illustration. The Gaussian predictive distributions for $a = \max(P_1, P_2) = 0.2$, 0.3 and 0.4 are given. The horizontal line represents the detection threshold s.

Classical results of linear regression theory state that the variance estimator $\hat\sigma^2$, scaled by the true variance $\sigma^2$, follows a chi-squared distribution with N − 2 degrees of freedom:

$$\frac{(N - 2)\,\hat\sigma^2}{\sigma^2} \sim \chi^2_{N-2}, \qquad (7)$$

where

$$\hat\sigma^2 = \frac{\left(y^N - X\hat\beta\right)^T \left(y^N - X\hat\beta\right)}{N - 2}, \qquad (8)$$

with $y^N = (y(a_1), \ldots, y(a_N))$ the data output sample. Then, we can obtain a sample of $(\beta_0, \beta_1, \sigma^2)$ by simulating $\sigma^2$ and then $(\beta_0, \beta_1)$ conditionally on $\sigma^2$. From this sample, we get a sample of POD(a) via the formula:

$$1 - \Phi\left( \frac{s - \beta_0 - \beta_1 a}{\sigma} \right), \qquad (9)$$

where $\Phi$ is the standard Gaussian cumulative distribution function. By simulating a large number of POD samples, we can deduce confidence intervals. The 95%-confidence lower bound of the POD curve is illustrated in Figure 8.

From the estimated POD of Figure 8, we obtain $a_{90} \simeq 0.30$ mm for the defect size detectable with a 90%-probability. Taking into account the confidence interval, we obtain $a_{90/95} \simeq 0.31$ mm for the minimal defect size detectable with a 90%-probability with a 95%-confidence. This means that the defect size that we detect in 90% of cases has a 95%-probability of being smaller than 0.31 mm.
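The whole Berens procedure, from the regression fit to $a_{90}$ and $a_{90/95}$, can be sketched as follows. This is an illustrative sketch on synthetic data, not the study's computation: the data-generating coefficients, the transformed threshold `s = 13.0` and all function names are assumptions. Note that setting formula (9) equal to 0.90 gives the closed form $a_{90} = (s - \beta_0 + \sigma\,\Phi^{-1}(0.90)) / \beta_1$, which the sketch uses as a cross-check.

```python
import numpy as np
from scipy.stats import norm


def fit_berens(a, y):
    """Fit y = beta0 + beta1 * a + eps; return beta_hat, sigma2_hat, (X^T X)^-1."""
    design = np.column_stack([np.ones_like(a), a])
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ y
    resid = y - design @ beta_hat
    sigma2_hat = resid @ resid / (len(y) - 2)          # equation (8)
    return beta_hat, sigma2_hat, xtx_inv


def pod_curve(a_grid, beta, sigma, s):
    """Equation (9): 1 - Phi((s - beta0 - beta1 * a) / sigma)."""
    return norm.sf((s - beta[0] - beta[1] * a_grid) / sigma)


def pod_lower_curve(a_grid, beta_hat, sigma2_hat, xtx_inv, n, s,
                    n_draws=2000, level=0.95, seed=0):
    """Pointwise lower bound obtained by sampling (beta0, beta1, sigma^2)."""
    rng = np.random.default_rng(seed)
    # Equation (7): sigma^2 = (N - 2) * sigma2_hat / chi2_{N-2}.
    sigma2 = (n - 2) * sigma2_hat / rng.chisquare(n - 2, size=n_draws)
    curves = np.empty((n_draws, len(a_grid)))
    for k, s2 in enumerate(sigma2):
        beta = rng.multivariate_normal(beta_hat, s2 * xtx_inv)   # equation (6)
        curves[k] = pod_curve(a_grid, beta, np.sqrt(s2), s)
    return np.quantile(curves, 1.0 - level, axis=0)


def first_size_reaching(a_grid, curve, p=0.90):
    """Smallest grid value whose POD reaches p (NaN if never reached)."""
    hits = np.flatnonzero(curve >= p)
    return a_grid[hits[0]] if hits.size else np.nan


# Synthetic, already linearized data (all numbers are illustrative).
rng = np.random.default_rng(1)
a = rng.uniform(0.1, 0.5, size=100)
y = 2.5 + 43.5 * a + rng.normal(0.0, 1.95, size=100)
s = 13.0                                    # hypothetical transformed threshold
a_grid = np.linspace(0.05, 0.60, 551)       # grid step of 0.001 mm

beta_hat, sigma2_hat, xtx_inv = fit_berens(a, y)
curve = pod_curve(a_grid, beta_hat, np.sqrt(sigma2_hat), s)
lower = pod_lower_curve(a_grid, beta_hat, sigma2_hat, xtx_inv, len(y), s)
a90 = first_size_reaching(a_grid, curve)
a90_95 = first_size_reaching(a_grid, lower)
a90_closed = (s - beta_hat[0] + np.sqrt(sigma2_hat) * norm.ppf(0.90)) / beta_hat[1]
```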


Figure 8: Results of Berens method: POD curve estimation (solid curve) and POD lower curve (dashed curve) of the POD 95%-confidence interval.

In conclusion, we recall that the Berens method relies on two hypotheses that have to be validated:

• the linear relation between y and a (after the Box-Cox transformation), which can be studied via classical analysis of linear regression residuals (Christensen, 1990). On our data, we have for instance $R^2 = 88\%$ for the regression coefficient of determination, an indicator which gives the share of variance explained by the linear regression;

• the Gaussian distribution, homoscedasticity and independence of the residuals, which can be studied via many statistical tests (see for instance Walter and Pronzato (1997)). On our data, we have the following p-values: 0.62 for the Kolmogorov-Smirnov test (Gaussian distribution), 0.10 for the Anderson-Darling test (Gaussian distribution), 0.82 for the Breusch-Pagan test (homoscedasticity) and 0.12 for the Durbin-Watson test (non-correlation). We conclude that, with a 90%-confidence, the homoscedasticity and non-correlation hypotheses on $\varepsilon$ cannot be rejected, but the normality hypothesis on $\varepsilon$ can be rejected.

3.3 Binomial-Berens method

Here we keep the linear regression on y, which is: $\forall a > 0$, $y = \hat\beta_0 + \hat\beta_1 a + \varepsilon$, but the error $\varepsilon$ is no longer assumed to be Gaussian: it is only assumed to be independent and identically distributed. We then consider that we have N of its realizations, which we gather in the following vector:

$$\varepsilon^N = y^N - \hat\beta_0 - \hat\beta_1 a^N. \qquad (10)$$

We then build its histogram and add it to the prediction of the linear model, as shown in Figure 9. By using the i.i.d. property of $\varepsilon$, we consider that we have N realizations of the random value y(a) for any a > 0. We propose to use them to estimate the probability for y(a) to exceed the threshold s (see Figure 9).

Figure 9: Binomial-Berens method: Berens method without normal hypothesis. The Gaussian densities are replaced by the sample histogram. The horizontal line represents the detection threshold s.

For each a > 0, let $N_s(a)$ be the number of realizations of the random variable y(a) that are higher than s. That is to say:

$$N_s(a) = \mathrm{Card}\left\{ \varepsilon_i,\ i \in \{1, \ldots, N\} \;\middle|\; \hat\beta_0 + \hat\beta_1 a + \varepsilon_i > s \right\}. \qquad (11)$$

Therefore an estimation of POD(a) is given by $N_s(a)/N$, with $N_s(a) \sim \mathcal{B}(N, \mathrm{POD}(a))$, where $\mathcal{B}$ is the binomial probability law. The assumption on the distribution of $N_s(a)$ can then be used to build confidence intervals on the value of POD(a), for a > 0.

Let us note that the Binomial-Berens method only requires validating the linear relation between y and a. For the 90%-level defect, we obtain $a_{90} \simeq 0.30$ mm and $a_{90/95} \simeq 0.305$ mm. A slight difference with the classical Berens method is present for $a_{90/95}$.
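The counting estimator of equation (11) can be sketched in a few lines. The sketch below runs on synthetic data with deliberately non-Gaussian residuals. The text does not say which binomial confidence interval is used, so the one-sided Clopper-Pearson bound shown here is an assumption, as are the threshold value and the function names.

```python
import numpy as np
from scipy.stats import beta as beta_dist


def binomial_berens_pod(a_grid, a, y, s):
    """Return N_s(a) and N_s(a) / N on a grid of defect sizes."""
    design = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef                      # equation (10)
    pred = coef[0] + coef[1] * a_grid              # linear prediction on the grid
    # Equation (11): count the residuals that lift the prediction above s.
    counts = np.sum(pred[:, None] + resid[None, :] > s, axis=1)
    return counts, counts / len(y)


def clopper_pearson_lower(counts, n, level=0.95):
    """One-sided exact lower confidence bound for a binomial proportion."""
    counts = np.asarray(counts)
    lower = np.zeros(len(counts))
    positive = counts > 0                          # the bound is 0 when no success
    lower[positive] = beta_dist.ppf(
        1.0 - level, counts[positive], n - counts[positive] + 1
    )
    return lower


# Synthetic linearized data with heavy-tailed (Student) residuals.
rng = np.random.default_rng(2)
a = rng.uniform(0.1, 0.5, size=100)
y = 2.5 + 43.5 * a + 1.4 * rng.standard_t(4, size=100)
s = 13.0                                           # hypothetical transformed threshold
a_grid = np.linspace(0.05, 0.60, 111)

counts, pod_hat = binomial_berens_pod(a_grid, a, y, s)
pod_low = clopper_pearson_lower(counts, len(y))
```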

3.4 Polynomial chaos method

As the linear model of equation (5) may be criticized as too simplistic, let us build a metamodel (Fang et al, 2006) of the transformed output y. The influence of the other inputs (described in Section 2.2) is now explicitly represented in the model, whereas it was previously entirely included in $\varepsilon$. The model response of interest, i.e. the Y-projection, is represented as a “pure” function of X (i.e. without additional noise):

$$Y = G(a, X). \qquad (12)$$

The so-called polynomial chaos (PC) method (Soize and Ghanem, 2004; Blatman and Sudret, 2011) consists in approximating the response on a specific basis made of orthonormal polynomials:

$$Y \approx \hat Y = \sum_{j=0}^{P-1} a_j \psi_j(a, X), \qquad (13)$$

where the $\psi_j$’s are the basis polynomials and the $a_j$’s are deterministic coefficients which fully characterize the model response and which have to be estimated. The orthonormality property reads:

$$E\left[\psi_i(a, X)\,\psi_j(a, X)\right] = 1 \text{ if } i = j, \text{ else } 0. \qquad (14)$$

The sensitivity indices (see Section 4.1) of the response are directly obtained by simple algebraic operations on the coefficients $a_j$. The latter are computed by least squares from the experimental design and the associated model evaluations.
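The orthonormality property (14) and the least-squares estimation of the coefficients can be illustrated on a one-dimensional toy case. The sketch below is an assumption-laden example, not the basis used in the study: it takes a single standard Gaussian input, for which the normalized probabilists' Hermite polynomials form an orthonormal basis (uniform inputs would call for Legendre polynomials instead), and the toy response and its coefficients are invented.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def psi(j, x):
    """Hermite polynomial He_j scaled so that E[psi_j(X)^2] = 1 for X ~ N(0, 1)."""
    coeffs = np.zeros(j + 1)
    coeffs[j] = 1.0
    return hermeval(x, coeffs) / math.sqrt(math.factorial(j))


# Check equation (14) with Gauss-Hermite quadrature (exact for these degrees).
nodes, weights = hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)     # quadrature -> expectation under N(0, 1)
basis = np.array([psi(j, nodes) for j in range(4)])
gram = (basis * weights) @ basis.T           # gram[i, j] = E[psi_i(X) psi_j(X)]

# Least-squares estimation of the coefficients of a toy response.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
response = 1.0 + 2.0 * psi(1, x) + 0.5 * psi(2, x)
design = np.column_stack([psi(j, x) for j in range(4)])
coef, *_ = np.linalg.lstsq(design, response, rcond=None)
```

With an orthonormal basis, the variance of the expansion is the sum of the squared non-constant coefficients, which is why sensitivity indices follow from simple algebra on the coefficients.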

PC approximations are computed with several values of the total degree, and their accuracies are compared in terms of the predictivity coefficient $Q^2$, itself based on the leave-one-out error. The greatest accuracy is obtained with a linear approximation (i.e. with degree equal to one), with $Q^2 = 88\%$. This PC representation reads:

$$\hat Y \simeq 27.9 - 0.5\,\psi_1(E) + 11.4\,\psi_2(a) + 0.7\,\psi_3(ebav_1) + 0.3\,\psi_4(ebav_2) + 0.4\,\psi_5(h_1) + 1.0\,\psi_6(h_2) \quad \text{(mV)} \qquad (15)$$

As in the Berens model in Section 3.2, it is assumed that the approximation error is a normal random variable with zero mean and standard deviation equal to $\sigma$, that is:

$$Y \approx \hat Y + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2). \qquad (16)$$

Thus the POD associated with a given defect size a can be approximated by:

$$\mathrm{POD}(a) = P(Y > s \mid a) = P\left(\hat Y(a, X) + \varepsilon > s\right). \qquad (17)$$

For any value of a, this probability is estimated by Monte Carlo simulation of the random quantities X and $\varepsilon$ ($10^4$ random values are drawn).

Note that this estimate relies upon the assumption that the chaos coefficients are perfectly calculated. However, their estimation is affected by uncertainty due to the approximation error ($1 - Q^2 = 12\%$ of unexplained variance of the Y-projection) and to the limited number of available evaluations of C3D. As for the Berens model, standard theorems related to linear regression hold for the PC expansions and can be used to define the probability distribution of the chaos coefficients and of the residual standard deviation $\sigma$. Based on these results, 150 sets of both quantities are randomly generated and each realization is used to compute the POD (Eq. (17)). Hence, for any a, a sample of 150 values of POD(a) is obtained. We compute its 5%-empirical quantile in order to construct the 95%-POD curve. The average and the 95%-POD curves are plotted in Figure 10. The characteristic defect sizes (defined in the previous sections) are given by $a_{90} \simeq 0.30$ mm and $a_{90/95} \simeq 0.32$ mm.

Figure 10: Average and 95%-POD curves based on a PC approximation (horizontal axis: defect size in mm; vertical axis: probability of detection; curves: mean curve and 95% curve).

It should be noted that the chaos results are close to the ones obtained by the Berens approach. Indeed, the PC representation (15) is similar to the Berens model (5), as all the coefficients except the mean value and the factor related to a are relatively insignificant in our application case. Furthermore, it is also supposed that the residuals are independent realizations of a normal random variable. As discussed previously, this assumption can be rejected by statistical tests. Another kind of metamodel, namely kriging, is based on the weaker and more realistic assumption of correlated normal residuals (the correlation between two model evaluations increases as the related inputs get closer). This is the subject of the next section.

3.5 Kriging method

We turn now to a probabilistic metamodel technique, which is the Gaussian process regression (Sacks et al, 1989), first proposed by Demeyer et al (2012) for POD estimation. Since the linear trend used in the Berens method was rather relevant, we keep it as the mean of the Gaussian process that we are about to use. The kriging model is defined as follows:

Y (a, X) = β0+ β1a + Z(a, X), (18)

where Z is a centered Gaussian process. We make the assumption that Z is second-order stationary with variance $\sigma^2$ and a Matérn 5/2 covariance parameterized by its lengthscale $\theta$ ($\theta \in \mathbb{R}^6$ in our application case). Using the maximum likelihood method, we can estimate the values of the unknown parameters $\beta_0$, $\beta_1$, $\sigma^2$ and $\theta$ (see for instance Marrel et al (2008) for more details).

Kriging provides an estimator of Y(a, X) which is called the kriging predictor and written $\hat Y_P(a, X)$. On our data, we compute the predictivity coefficient $Q^2$ in order to quantify the prediction capabilities of this metamodel (Marrel et al, 2008). We obtain $Q^2 = 90\%$.

In addition to the kriging predictor, the kriging variance $\sigma_Y^2(a, X)$ quantifies the uncertainty induced by estimating Y(a, X) with $\hat Y_P(a, X)$. Thus, we have the following predictive distribution:

$$\forall X, \quad \left(Y(a, X) \mid y^N\right) \sim \mathcal{N}\left(\hat Y_P(a, X),\ \sigma_Y^2(a, X)\right) \qquad (19)$$

where $\hat Y_P(a, X)$ (the kriging mean) and $\sigma_Y^2(a, X)$ (the kriging variance) can both be explicitly estimated.

Obtaining the POD curve consists in replacing Y = G(a, X) by its kriging metamodel (19) in (3). Hence we can estimate the value of POD(a), for a > 0, from:

$$\mathrm{POD}(a) = P\left( \left(Y(a, X) \mid y^N\right) > s \mid a \right). \qquad (20)$$

Two sources of uncertainty have to be taken into account in (20): the first coming from the parameter X and the second coming from the Gaussian distribution in (19). From (20), the following estimate for POD(a) can be deduced:

$$\mathrm{POD}(a) = E_X\left[ 1 - \Phi\left( \frac{s - \hat Y_P(a, X)}{\sigma_Y(a, X)} \right) \right]. \qquad (21)$$

This expectation is estimated using a classical Monte Carlo integration procedure.
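The Monte Carlo integration of equation (21) can be sketched as follows. To stay self-contained, the sketch does not fit a kriging model: `predict_mean` and `predict_std` are hypothetical stand-ins for the kriging mean $\hat Y_P$ and standard deviation $\sigma_Y$, with a single Gaussian nuisance input and invented coefficients. With these stand-ins the expectation has a closed form, which serves as a check.

```python
import numpy as np
from scipy.stats import norm


def kriging_pod(a, x_sample, predict_mean, predict_std, s):
    """Monte Carlo estimate of equation (21) for one defect size a."""
    a_col = np.full(len(x_sample), a)
    mean = predict_mean(a_col, x_sample)
    std = predict_std(a_col, x_sample)
    return np.mean(norm.sf((s - mean) / std))


# Hypothetical stand-ins for a fitted kriging predictor (not a real model).
def predict_mean(a, x):
    return 2.5 + 43.5 * a + 1.0 * x[:, 0]


def predict_std(a, x):
    return np.full(len(a), 0.5)


rng = np.random.default_rng(3)
x_sample = rng.standard_normal((10_000, 1))   # draws of the nuisance input X
s = 13.0                                      # hypothetical transformed threshold
a_grid = np.linspace(0.1, 0.5, 41)

pod = np.array(
    [kriging_pod(a, x_sample, predict_mean, predict_std, s) for a in a_grid]
)

# Closed form for the stand-ins: Y ~ N(2.5 + 43.5 a, 1.0**2 + 0.5**2).
pod_exact = norm.sf((s - 2.5 - 43.5 * a_grid) / np.sqrt(1.0 + 0.25))
```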

By using the uncertainty implied by the Gaussian process regression, one can build new confidence intervals, as illustrated in Figure 11. We visualize the confidence interval induced by the Monte Carlo (MC) estimation, the one induced by the kriging (PG) approximation and the total confidence interval (including both approximations: PG+MC). For the 90%-level defect, we obtain $a_{90} \simeq 0.305$ mm and $a_{90/95} \simeq 0.315$ mm.

Figure 11: Example of POD curves estimated with a kriging model.

The four methods discussed in this section have given somewhat similar results. This will be discussed in the conclusion of this paper, which also introduces a general and methodological point of view for the numerical POD determination.


4 Sensitivity analysis on POD curve

Sensitivity analysis makes it possible to determine the parameters that most influence the model response. In particular, global sensitivity analysis methods (see Iooss and Lemaître (2015) for a recent review) take into account the overall uncertainty ranges of the model input parameters. In this section, we propose new global sensitivity indices attached to the whole POD curve. We focus on the variance-based sensitivity indices, also called Sobol’ indices, which are the most popular tools and have proved robust, interpretable and efficient.

4.1 Sobol’ indices on scalar model output

If all its inputs are independent and $E(Y^2) < \infty$, the variance of the numerical model $Y = G(X_1, \ldots, X_d)$ can be decomposed in the following sum:

$$\mathrm{Var}(Y) = V = \sum_{i=1}^{d} V_i + \sum_{i<j} V_{ij} + \ldots + V_{1 \ldots d} \qquad (22)$$

with $V_i = \mathrm{Var}[E(Y \mid X_i)]$, $V_{ij} = \mathrm{Var}[E(Y \mid X_i X_j)] - V_i - V_j$, etc. Then, $\forall i, j = 1 \ldots d$, $i < j$, the Sobol’ indices of $X_i$ write (Sobol, 1993):

$$S_i = \frac{V_i}{V}, \quad S_{ij} = \frac{V_{ij}}{V}, \quad \ldots, \quad \text{and} \quad T_i = S_i + \sum_{j \neq i} S_{ij} + \ldots \qquad (23)$$

The first-order Sobol’ index $S_i$ measures the individual effect of the input $X_i$ on the variance of the output Y, while the total Sobol’ index $T_i$ measures the $X_i$ effect and all the interaction effects between $X_i$ and the other inputs (such as the second-order effect $S_{ij}$). $T_i$ can be rewritten as $T_i = 1 - \frac{V_{-i}}{V}$ with $V_{-i} = \mathrm{Var}[E(Y \mid X_{-i})]$ and $X_{-i}$ the vector of all inputs except $X_i$.

These indices are interpreted in terms of percentage of influence of the different inputs on the model output uncertainty (measured by its variance). They have been proven to be useful in many engineering studies involving numerical simulation models (de Rocquigny et al, 2008).
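For readers who want to experiment with these indices, here is a minimal Monte Carlo sketch based on “pick-freeze” sampling. The estimators used (often attributed to Saltelli and co-authors for the first-order index and to Jansen for the total index) are a common choice and are not taken from the paper; the toy model $Y = X_1 + 2X_2$ with a third inactive input is an assumption chosen because its indices are known exactly ($S = T = (0.2, 0.8, 0)$).

```python
import numpy as np


def sobol_indices(model, sampler, n=50_000, seed=0):
    """Pick-freeze estimates of first-order and total Sobol' indices."""
    rng = np.random.default_rng(seed)
    x_a = sampler(rng, n)                  # two independent input samples
    x_b = sampler(rng, n)
    y_a, y_b = model(x_a), model(x_b)
    variance = np.var(np.concatenate([y_a, y_b]))
    d = x_a.shape[1]
    first, total = np.empty(d), np.empty(d)
    for i in range(d):
        x_ab = x_a.copy()
        x_ab[:, i] = x_b[:, i]             # sample A with column i taken from B
        y_ab = model(x_ab)
        first[i] = np.mean(y_b * (y_ab - y_a)) / variance
        total[i] = 0.5 * np.mean((y_a - y_ab) ** 2) / variance
    return first, total


def toy_model(x):
    """Additive toy model: Y = X1 + 2 * X2, X3 has no effect."""
    return x[:, 0] + 2.0 * x[:, 1]


def gaussian_sampler(rng, n):
    """Three independent standard Gaussian inputs."""
    return rng.standard_normal((n, 3))


first, total = sobol_indices(toy_model, gaussian_sampler)
```

Because the toy model is additive, first-order and total indices coincide; a gap between them signals interactions, which is the diagnostic used later for the POD curve.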


4.2 Sobol’ indices on POD

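In order to define similar sensitivity indices for the whole POD curve, we first define the following quantities:

$$\begin{aligned}
\mathrm{POD}_X(a) &= P(Y > s \mid a, X), \\
\mathrm{POD}_{X_i}(a) &= P(Y > s \mid a, X_i), \\
\mathrm{POD}_{X_{-i}}(a) &= P(Y > s \mid a, X_{-i}), \\
D &= E\left\| \mathrm{POD}(a) - \mathrm{POD}_X(a) \right\|^2
\end{aligned} \qquad (24)$$

with $\|\cdot\|$ the Euclidean norm. The POD Sobol’ indices are then defined by:

$$S_i^{\mathrm{POD}} = \frac{E\left\| \mathrm{POD}(a) - \mathrm{POD}_{X_i}(a) \right\|^2}{D}, \qquad T_i^{\mathrm{POD}} = \frac{E\left\| \mathrm{POD}_X(a) - \mathrm{POD}_{X_{-i}}(a) \right\|^2}{D}. \qquad (25)$$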
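Definitions (24) and (25) can be made concrete on a toy case where every conditional POD has a closed form. The sketch below is not the kriging-based computation of the study: it assumes a linear response $Y = \beta_0 + \beta_1 a + c_1 X_1 + c_2 X_2 + \varepsilon$ with independent standard Gaussian inputs and Gaussian noise, so that conditioning on a subset of inputs simply moves the remaining ones into the noise term. All coefficients and the threshold are invented, the norm is taken over a grid of defect sizes, and the outer expectation is a Monte Carlo average over X.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical toy model: Y = B0 + B1 * a + C . X + eps, eps ~ N(0, SIGMA^2).
B0, B1, SIGMA, S = 2.5, 43.5, 1.0, 13.0
C = np.array([2.0, 0.5])                  # assumed effects of X1 and X2
A_GRID = np.linspace(0.1, 0.5, 41)        # grid on which the norm is computed


def pod_given(known, x):
    """P(Y > s | a, known inputs) on A_GRID, one row per sample of x."""
    known = np.asarray(known, dtype=bool)
    shift = x[:, known] @ C[known]                         # known inputs
    spread = np.sqrt(SIGMA**2 + np.sum(C[~known] ** 2))    # noise + unknown inputs
    return norm.sf((S - B0 - B1 * A_GRID[None, :] - shift[:, None]) / spread)


rng = np.random.default_rng(0)
x = rng.standard_normal((20_000, 2))

pod_mean = pod_given([False, False], x)   # POD(a), identical on every row
pod_full = pod_given([True, True], x)     # POD_X(a)
d_norm = np.mean(np.sum((pod_mean - pod_full) ** 2, axis=1))   # D in (24)

first, total = np.empty(2), np.empty(2)
for i in range(2):
    only_i = np.arange(2) == i
    # Equation (25), first-order index: distance between POD and POD_{X_i}.
    first[i] = np.mean(np.sum((pod_mean - pod_given(only_i, x)) ** 2, axis=1)) / d_norm
    # Equation (25), total index: distance between POD_X and POD_{X_-i}.
    total[i] = np.mean(np.sum((pod_full - pod_given(~only_i, x)) ** 2, axis=1)) / d_norm
```

In this toy setting the input with the larger coefficient dominates both indices, and the gap between `total` and `first` reflects the interaction created by the nonlinearity of $\Phi$.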

These POD Sobol’ indices are easily computed with the metamodels. In particular, the kriging metamodel allows one to replace $P(Y > s \mid a)$ by the expectation

$$E_X\left[ 1 - \Phi\left( \frac{s - \hat Y_P(a, X)}{\sigma_Y(a, X)} \right) \right]$$

in the POD expressions of (24). Figure 12 gives the sensitivity analysis results on our data. We find that the POD curve is mainly influenced by the ebav1 parameter, with smaller effects of the ebav2 and h12 parameters. As the first-order and total Sobol’ indices strongly differ, we know that the main contributions come from interactions between these three influential parameters. From an engineering point of view, reducing the uncertainty on ebav1 is a priority in order to reduce the POD uncertainty.

4.3 Sobol’ indices for a specific defect size or probability

The POD Sobol’ indices quantify the sensitivity of each input on the overall POD curve. However, we could be interested in the sensitivities on the detection probability at a specific defect size a. As it is a scalar value, this can be directly done by replacing Y by POD_X(a) in all the equations of Section 4.1.

If we are now interested in the sensitivities of the defect size at a specific detection probability, we have to study the inverse function of the POD: $\mathrm{POD}_X^{-1}(p)$ with p a given probability. Similarly to the previous case, the defect size Sobol’ indices can be obtained by replacing Y by $\mathrm{POD}_X^{-1}(p)$ in all the equations of Section 4.1. Figure 13 displays these sensitivity indices on our data for p = 0.90. We conclude that $a_{90}$ is mainly influenced by the ebav1 parameter, with smaller effects of the ebav2 and h12 parameters. These influences are similar to those observed for the POD curve.

Figure 12: POD first order (left) and POD total (right) Sobol’ indices. Panel titles: “Main effect sensitivity indices” and “Total effect sensitivity indices”; inputs shown: E, ebav1, ebav2, h11, h12; index axis from 0 to 1.

Figure 13: $a_{90}$ main effect and total effect sensitivity indices (inputs shown: E, ebav1, ebav2, h11, h12; index axis from 0 to 1).

5 Conclusions

This paper has presented four different techniques for the determination of POD curves (flaw detection probability), applicable to a wide range of NDT procedures. As part of this study, we focus on the examination of wear under the anti-vibration bars of steam generator tubes, with simulations performed by the finite-element computer code C3D. The model parameterization and the design of numerical experiments have been explained first.

Based on these methods for determining POD curves (and the associated confidence intervals), a general methodology is proposed in Figure 14. It consists of a progressive application of the following methods:

1. the Berens method, based on a linear regression model, and requiring normality assumption on regression residuals;

2. the Binomial-Berens method which relaxes the normality hypothesis;

3. the polynomial chaos metamodel which does not require the linearity assumption but requires normal metamodel residuals;

4. the kriging metamodel.

Other techniques, not discussed here, could be introduced in this scheme, such as the quantile regression used in Dominguez et al (2012) to relax Berens’ hypothesis on the residuals distribution, or bootstrap-based alternatives.

The results of these four techniques in terms of the estimation of the defect size detectable with a 90%-probability ($a_{90}$) and its 95%-lower bound ($a_{90/95}$) are synthesized in Table 1. While $a_{90}$ is nearly unchanged, we observe slight variations in $a_{90/95}$ between the different methods.

           Berens   Binomial-Berens   Polynomial chaos   Kriging
a90        0.30     0.30              0.30               0.305
a90/95     0.31     0.305             0.32               0.315

Table 1: Synthesis of results for detectable defect sizes (in mm) with the four methods of the POD methodology.

From the metamodel-based techniques, a variance-based sensitivity analysis can also be performed in order to quantify the effect of each input on the POD curve. Other sensitivity analysis methods devoted to POD curves make it possible to quantify the effect of modifying each input distribution.


Figure 14: General and progressive MAPOD methodology.

For example, the Perturbation-Law based sensitivity Indices (Lemaître et al, 2015) will be the subject of future work. Finally, an iterative process can be applied to choose new simulation points in order to improve the predictivity of the metamodels or to reduce the POD confidence interval (see Fig. 14). These metamodel-based sequential procedures have not been discussed in the present paper.
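To make the variance-based sensitivity analysis mentioned above more concrete, the sketch below estimates main effect and total effect Sobol indices by Monte Carlo "pick-freeze" sampling. It is an illustration under stated assumptions, not the procedure used in this study: the function `sobol_indices`, the toy model (standing in for a metamodel prediction) and the independent uniform inputs are all hypothetical.

```python
import numpy as np


def sobol_indices(model, dim, n, rng):
    """Pick-freeze estimates of main and total effect Sobol indices.

    Assumes independent inputs, uniform on [0, 1].
    """
    A = rng.uniform(size=(n, dim))
    B = rng.uniform(size=(n, dim))
    yA, yB = model(A), model(B)
    var = np.var(np.concatenate([yA, yB]))
    main = np.empty(dim)
    total = np.empty(dim)
    for i in range(dim):
        # Copy of A whose i-th column is taken from B
        ABi = A.copy()
        ABi[:, i] = B[:, i]
        yABi = model(ABi)
        # Main effect: share of variance explained by input i alone
        main[i] = np.mean(yB * (yABi - yA)) / var
        # Total effect: input i together with all its interactions
        total[i] = 0.5 * np.mean((yA - yABi) ** 2) / var
    return main, total


def toy_model(X):
    # Hypothetical stand-in for a metamodel; the third input is inert
    return X[:, 0] + 2.0 * X[:, 1]


rng = np.random.default_rng(1)
main, total = sobol_indices(toy_model, dim=3, n=200_000, rng=rng)
print("main effects: ", np.round(main, 3))
print("total effects:", np.round(total, 3))
```

For this additive toy model the main and total effects coincide up to sampling noise; a gap between the two for a given input would indicate interactions with other inputs.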

It is important to note that the obtained POD curves are based on a probabilistic modeling of the system input parameters, which has to be validated. Moreover, the initial simple model (1) does not fully represent reality, and taking into account the numerical model uncertainty is an important task (Aldrin et al, 2013). Additional noise sources, such as reproducibility noise and measurement errors, also have to be added. Solutions to this problem, based on random POD models, are currently under study (Browne and Fort, 2015).


6 Acknowledgements

Part of this work has been backed by the French National Research Agency (ANR) through project ByPASS ANR-13-MONU-0011. All the calculations were performed using the OpenTURNS software (Baudin et al, 2016). We are grateful to Léa Maurice for initial work on this subject, and to Pierre-Emile Lhuillier, Pierre Thomas, François Billy, Pierre Calmon, Vincent Feuillard and Nabil Rachdi for helpful discussions. Thanks to Dominique Thai-Van, who provided a first version of Figure 14.

References

Aldrin J, Knopp J, Sabbagh H (2013) Bayesian methods in probability of detection estimation and model-assisted probability of detection evaluation. AIP Conference Proceedings 1511:1733–1744

Baudin M, Dutfoy A, Iooss B, Popelin A (2016) Open TURNS: An industrial software for uncertainty quantification in simulation. In: Ghanem R, Higdon D, Owhadi H (eds) Springer Handbook on Uncertainty Quantification, Springer, In press

Berens A (1988) NDE reliability data analysis, vol 17, Metals Handbook, 9th edition, pp 689–701

Blatman G, Sudret B (2011) Adaptive sparse polynomial chaos expansion based on Least Angle Regression. J Comp Phys 230(6):2345–2367

Box G, Cox D (1964) An analysis of transformations. Journal of the Royal Statistical Society 26:211–252

Browne T, Fort JC (2015) Redéfinition de la POD comme fonction de répartition aléatoire. Actes des 47èmes Journées de Statistiques de la SFdS, Lille, France

Calmon P (2012) Trends and stakes of NDT simulation. Journal of Non Destructive Evaluation 31:339–341

Christensen R (1990) Linear models for multivariate, time series and spatial data. Springer-Verlag

de Rocquigny E, Devictor N, Tarantola S (eds) (2008) Uncertainty in industrial practice. Wiley


Demeyer S, Jenson F, Dominguez N (2012) Modélisation d’un code numérique par un processus gaussien - Application au calcul d’une courbe de probabilité de dépasser un seuil. Actes des 44èmes Journées de Statistiques de la SFdS, Bruxelles, Belgique

DoD (2009) Department of Defense Handbook - Nondestructive evaluation system reliability assessment. Tech. Rep. MIL-HDBK-1823A, US Department of Defense (DoD), Washington, D.C.

Dominguez N, Jenson F, Feuillard V, Willaume P (2012) Simulation assisted POD of a phased array ultrasonic inspection in manufacturing. AIP Conference Proceedings 1430:1765–1772

Fang KT, Li R, Sudjianto A (2006) Design and modeling for computer experiments. Chapman & Hall/CRC

Gandosi L, Annis C (2010) Probability of detection curves: Statistical best-practice. Tech. Rep. 41, European Network for Inspection and Qualification (ENIQ), Luxembourg, Germany

Iooss B, Lemaître P (2015) A review on global sensitivity analysis methods. In: Meloni C, Dellino G (eds) Uncertainty management in Simulation-Optimization of Complex Systems: Algorithms and Applications, Springer

Lemaître P, Sergienko E, Arnaud A, Bousquet N, Gamboa F, Iooss B (2015) Density modification based reliability sensitivity analysis. Journal of Statistical Computation and Simulation 85:1200–1223

Marrel A, Iooss B, Van Dorpe F, Volkova E (2008) An efficient methodology for modeling complex computer codes with Gaussian processes. Computational Statistics and Data Analysis 52:4731–4744

Maurice L, Costan V, Guillot E, Thomas P (2013a) Eddy current NDE performance demonstrations using simulation tools. AIP Conference Proceedings 1511:464–471

Maurice L, Costan V, Thomas P (2013b) Axial probe eddy current inspection of steam generator tubes near anti-vibration bars: performance evaluation using finite element modeling. Proceedings of JRC-NDE, Cannes, France pp 638–644


Meyer R, Crawford S, Lareau J, Anderson M (2014) Review of literature for Model Assisted Probability of Detection. Tech. Rep. PNNL-23714, Pacific Northwest National Laboratory, USA

Rosell A, Persson G (2013) Model based capability assessment of an automated eddy current inspection procedure on flat surfaces. Research in Nondestructive Evaluation 24:154–176

Rupin F, Blatman G, Lacaze S, Fouquet T, Chassignole B (2014) Probabilistic approaches to compute uncertainty intervals and sensitivity factors of ultrasonic simulations of a weld inspection. Ultrasonics 54:1037–1046

Sacks J, Welch W, Mitchell T, Wynn H (1989) Design and analysis of computer experiments. Statistical Science 4:409–435

Sobol I (1993) Sensitivity estimates for non linear mathematical models. Mathematical Modelling and Computational Experiments 1:407–414

Soize C, Ghanem R (2004) Physical systems with random uncertainties: chaos representations with arbitrary probability measure. SIAM J Sci Comput 26(2):395–410

Thomas P, Goursaud B, Maurice L, Cordeiro S (2015) Eddy-current non destructive testing with the finite element tool Code Carmel3D. 11th International Conference on Non destructive Evaluation, Jeju, Korea

Thompson R (2008) A unified approach to the Model-Assisted determination of Probability of Detection. In: 34th Annual Review of Progress in Quantitative Nondestructive evaluation, July 2007, Golden, Colorado, American Institute of Physics, Melville, New-York, pp 1685–1692

Walter E, Pronzato L (1997) Identification of parametric models from ex-perimental data. Springer-Verlag