
Inversion and uncertainty of highly parameterized models in a Bayesian framework by sampling the maximal conditional posterior distribution of parameters

Thierry A. Mara (a,*), Noura Fajraoui (b), Anis Younes (b), Frederick Delay (b)

(a) PIMENT, EA 4518, Université de La Réunion, FST, 15 Avenue René Cassin, 97715 Saint-Denis, Réunion
(b) LHyGeS, UMR-CNRS 7517, Université de Strasbourg/EOST, 1 rue Blessig, 67084 Strasbourg, France

Abstract

We introduce the concept of Maximal Conditional Posterior Distribution (MCPD) to assess the uncertainty of model parameters in a Bayesian framework. Although Markov chain Monte Carlo (MCMC) methods are particularly suited to this task, they become challenging with highly parameterized nonlinear models. The MCPD represents the conditional probability distribution function of a given parameter knowing that the other parameters maximize the conditional posterior density function. Unlike MCMC, which accepts or rejects solutions sampled in the parameter space, the MCPD is calculated through several optimization processes. Model inversion using the MCPD algorithm is particularly useful for highly parameterized problems because the calculations are independent and can consequently be evaluated simultaneously on a multi-core computer. In the present work, the MCPD approach is applied to invert a 2D stochastic groundwater flow problem where the log-transmissivity field of the medium is inferred from scarce and noisy data. For this purpose, the stochastic field is expanded onto a set of orthogonal functions using a Karhunen-Loève (KL) transformation. Though the prior guess on the stochastic structure (covariance) of the transmissivity field is erroneous, the MCPD inference of the KL coefficients is able to extract relevant inverse solutions.

(*) Corresponding author: mara@univ-reunion.fr

Published in Advances in Water Resources, Vol. 76, pp. 1-10, 2014, doi:10.1016/j.advwatres.2014.11.013

Keywords: Inverse modeling, Bayesian parameter estimation, Model parameter identification, Highly parameterized model, Heterogeneous transmissivity field, Karhunen-Loève expansion.

Contents

1 Introduction
2 Bayesian inference
3 Maximal conditional posterior distribution
 3.1 The concept
 3.2 MCPD assessment
 3.3 Predictive density of an MCPD sample
4 Inference of a multimodal distribution with MCPDs
5 Inverting a stochastic field with MCPDs
 5.1 Karhunen-Loève expansion
 5.2 Problem statement
 5.3 Posterior distribution
 5.4 Results and discussion
6 Conclusions

1. Introduction

Models are tools on which environmental risk assessment and decision-making strategies can rely, provided it is proved that the models are relevant to the problem under investigation. This relevance can be addressed by confronting model predictions with observation data, knowing that the whole procedure also requires assigning model parameter values. Some parameters can be directly measured, while others ought to be indirectly estimated by comparing model predictions with observations. The present work addresses the issue of parameter identification for highly parameterized models. The notion of identification encompasses both seeking the parameter values and assessing the uncertainty on the parameters and on the model predictions.

During the past two decades, the increasing power of computers has been conducive to emphasizing and promoting the so-called Bayesian parameter estimation techniques. In essence, the Bayesian framework leads to the definition of the parameter joint posterior probability density function (pdf), for instance inferred by means of Markov chain Monte Carlo (MCMC) samplings ([1–4]). The notion of posterior pdf is associated with the fact that the parameters' pdf is conditioned both on plausible (prior) parameter values and on observation data. MCMC provides draws directly sampled from the posterior pdf, which leads to an exploration of the plausible areas of the parameter space. Bayesian estimation using MCMC has been subject to many developments and improvements during the last decade (e.g. [5–8] among others). However, MCMC samplers remain computationally expensive because many draws are rejected by the statistical test embedded in the sampler. Furthermore, with MCMC, the marginal posterior distributions of the parameters cannot be investigated independently. Recently, several strategies have been proposed to increase MCMC efficiency (see [9–13]).

In the present work we propose a new method, partly grounded in optimization techniques, to cope with the identification of model parameters. The first step of this approach is to seek all the probable local optima of the joint posterior pdf of the whole set of parameters (including the maximum a posteriori estimate). Next, several maximizations of the conditional pdf are performed for different prescribed values of one selected parameter. The values assigned to this parameter are picked from a range around its optimal value(s), and the values of the other parameters are obtained by maximizing the conditional pdf. This provides what we call the Maximal Conditional Posterior Distribution (MCPD) of the selected parameter. It actually corresponds to a discrete approximation of the pdf of a single parameter conditioned on data, such that the conditional pdf is maximized.

The MCPD returns information about the model parameter values supported by the data and about any correlations between parameters. The MCPD sample also allows uncertainty bounds to be assigned to the model predictions. The main advantage of the approach is that the MCPD inferences for different parameters are independent and can be evaluated simultaneously by easily distributing the calculations over a multi-core computer (or several computers). This feature drastically decreases the computation time and makes the inversion of highly parameterized problems feasible.

The main topics addressed in the present paper are organized as follows. A short outline of inverse modeling within a Bayesian framework is proposed in Section 2, followed by the details of the MCPD sampling in Section 3. The first exercise testing the MCPD approach is proposed in Section 4 and addresses the ability of the sampler to retrieve a multimodal probability density function. The second test, in Section 5, applies the MCPD approach to identify the Karhunen-Loève expansion ([14]) of a stochastic transmissivity field for a two-dimensional steady-state groundwater flow problem.

2. Bayesian inference

In inverse modeling, the parameter set (of size s) θ = {θ_1, …, θ_s} of a given model is estimated from a set of observation data d. In the following, we assume that the model does not suffer from misconceptions; it is therefore supposed to be exact regarding the processes and the system that it mimics. However, observation data remain uncertain (random variables), making the model parameters also random and characterized by a joint probability density function p(θ). We denote by Ω_i the probable prior uncertainty range of θ_i. In a Bayesian framework, the parameter joint posterior pdf is defined by

\[ p(\theta \mid d) = \frac{p(d \mid \theta)\, p(\theta)}{p(d)} \tag{1} \]

where p(d) is a scaling factor called the evidence, p(θ) is the prior density corresponding to a first guess on the parameters before collecting the observations, and p(d|θ) is the likelihood function measuring how well the model describes the data.

The parameter set that maximizes Eq. (1),

\[ \theta^{MAP} = \arg\max_{\theta}\, p(\theta \mid d) \tag{2} \]

is called the maximum a posteriori estimate. It is the most probable parameter set given our knowledge about the system (i.e. the data d and the prior pdf of the parameters p(θ)), and it is sought by appropriate optimization algorithms (e.g., descent methods, evolutionary algorithms, etc.). Unfortunately, finding θ^MAP does not allow one to (fully) characterize the posterior uncertainty of the parameters (except for linear models, see [15]). This uncertainty should be assessed by calculating the marginal posterior density of each parameter, defined as

\[ p(\theta_i \mid d) = \int p(\theta_i, \theta_{-i} \mid d)\, d\theta_{-i}, \quad \forall i = 1, \ldots, s \tag{3} \]

where θ_{-i} represents the vector of parameters θ without θ_i. The integral in (3) can be approximated by a multidimensional quadrature method or by a sampling-based method such as Markov chain Monte Carlo. Nevertheless, the computational effort can be prohibitive and sometimes unaffordable for problems with a large number of parameters.

In the present work, we propose an optimization-based method to assess the parameter uncertainty for models post-conditioned on available observation data. For this purpose, we introduce the concept of maximal conditional posterior distribution. One could object that relying on an optimization-based method requires solving many optimization problems, as is classical with standard inversion techniques when obtaining a large set of solutions is contemplated. As shown hereafter, the maximal conditional posterior distribution has some specific features diminishing the calculation load.

3. Maximal conditional posterior distribution

3.1. The concept

We define the maximal conditional posterior distribution (MCPD) of θ_i as

\[ P_i(\theta_i) = \max_{\theta_{-i}} \big( p(\theta_{-i} \mid d, \theta_i) \big) \times p(\theta_i \mid d) \tag{4} \]

P_i(θ_i) is interpreted as the posterior probability function that maximizes the conditional posterior distribution p(θ_{-i} | d, θ_i), and it encompasses the MAP probability (i.e. P_i(θ_i^MAP) = p(θ^MAP | d)). By using Bayes' theorem, one can write max(p(θ_{-i} | d, θ_i)) × p(θ_i | d) = max(p(θ_i | θ_{-i}, d) × p(θ_{-i} | d)).

The MCPD can thus be seen as the distribution of the parameter θ_i, knowing that the other parameters θ_{-i} are at their optimal values. The MCPD of θ_i is assessed in a discrete form by sampling Eq. (4): the parameter θ_i is frozen at a prescribed value and the other parameters θ_{-i} are optimized to find (according to the Bayesian definition) the maximal probability of these parameters. Changing the prescribed value of θ_i allows scanning the distribution of θ_i. In practice, the sampled values of θ_i (denoted below θ_i^*) are picked around the MAP estimate θ_i^MAP (estimated beforehand) within its prior uncertainty range Ω_i (see Fig. 1). This gives

\[ \theta_{-i}^{*} = \arg\max_{\theta_{-i}}\, p(\theta_{-i} \mid d, \theta_i = \theta_i^{*}) \tag{5} \]

\[ P_i(\theta_i^{*}) = p(\theta_{-i}^{*} \mid d, \theta_i^{*}) \times p(\theta_i^{*} \mid d) = p(\theta^{*} \mid d) \tag{6} \]

On the one hand, if θ_i is globally identifiable, one expects that the further the parameter value is from θ_i^MAP, the more P_i(θ_i^*) < P_i(θ_i^MAP). On the other hand, if θ_i is not identifiable, varying its sampled values θ_i^* will not change the joint posterior distribution in (1). Then max(p(θ_{-i} | d, θ_i)) = p(θ_{-i}^MAP | d), and the MCPD of θ_i is equal to its marginal prior distribution (i.e. P_i(θ_i) ∝ p(θ_i), see Eq. (4)). This is the case, for instance, of the unimodal multi-Gaussian probability density function.
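To make Eqs. (5) and (6) concrete, the following minimal sketch computes a single MCPD point for a generic log-posterior. It is an illustration only, not the authors' implementation (the study used MATLAB's fminunc, see Section 4); the toy log_post function and the choice of optimizer are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def mcpd_point(log_post, theta_map, i, theta_i_star):
    """One MCPD point (Eqs. 5-6): freeze theta_i at theta_i_star and
    maximize the log-posterior over the remaining parameters theta_{-i}."""
    free = [j for j in range(len(theta_map)) if j != i]

    def neg_log_post(x_free):
        theta = np.empty(len(theta_map))
        theta[i] = theta_i_star          # frozen parameter
        theta[free] = x_free             # parameters being optimized
        return -log_post(theta)

    # the MAP values of theta_{-i} provide a good starting point
    res = minimize(neg_log_post, np.asarray(theta_map, float)[free])
    return -res.fun                      # log P_i(theta_i_star), Eq. (6)

# toy example: 3-parameter Gaussian posterior centered on (1, 2, 3)
log_post = lambda t: -0.5 * np.sum((t - np.array([1.0, 2.0, 3.0])) ** 2)
print(mcpd_point(log_post, [1.0, 2.0, 3.0], i=0, theta_i_star=1.5))
```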

In the event of multimodality of p(θ|d), θ^MAP is not unique or its search is hampered by the existence of local maxima. It is advisable to first acquire all the possible maxima of p(θ|d) by starting optimization processes from different locations in the parameter space (a multi-start procedure). Note that these prior searches do not ensure that all maxima will be identified. When the local maxima are known, the MCPD calculation principle evoked above is generalized by sampling θ_i^* in all the subareas of the parameter space enclosing the local maxima. Finally, note that, by definition, the MCPD P_i(θ_i) and the marginal posterior density p(θ_i|d) are different. However, these densities would be exactly the same whenever max(p(θ_{-i} | d, θ_i)) is constant ∀ θ_i ∈ Ω_i.

One may question the relevance of the MCPD given that it is not the marginal posterior density. We recall that the aim of the MCPD is to evaluate uncertainties on both the parameters and the model predictions given some observation data. The key point is less to approach the marginal posterior density than to provide uncertainties for valuable solutions to the inverse problem. As stated above, by construction an MCPD samples the distribution of a parameter θ_i knowing that the others are optimal. This way of envisioning the uncertainty associated with a parameter appears to be an interesting alternative to more classical definitions. One can mention, however, that a Markov chain Monte Carlo sampler tries to draw all the probable solutions to the inverse problem, while only some of these solutions are inferred by the MCPD sampling. In some situations, this can represent a drawback of the MCPD sampling.

3.2. MCPD assessment

For the sake of clarity, the algorithm inferring the MCPD assumes that the posterior density p(θ|d) has only one mode; the occurrence of multimodal densities merely requires an adaptation of the procedure depicted below. The procedure starts by seeking the maximum a posteriori θ^MAP (Eq. (2)). Then, the algorithm sketched by the flowchart in Fig. 2 is launched. The calculation of the MCPD of each parameter takes place in two stages. The first stage identifies the relevant range within which the parameter θ_i will be made to vary. For this purpose, we define a large sampling step for θ_i, e.g. ∆ × θ_i^MAP with ∆ = 0.5. Then, θ_i is successively set to θ_i^{±k} = θ_i^MAP(1 ± k∆), k = 1, 2, 3, …, and the associated optimization in Eq. (5) is solved. Increments of k are stopped when:

1. the two current probability values p(θ^{+k} | d) and p(θ^{−k} | d) decrease below a prescribed threshold, which may be expressed as p(θ^{±k} | d) < p(θ^MAP | d)/100; or

2. the sampled values θ_i^{±k} are beyond Ω_i, i.e. the prescribed prior uncertainty range of θ_i.

Usually, the first stopping criterion is reached before the second one, thus leading to a narrower sampling range of θ_i compared to its prior uncertainty range Ω_i. The second stage of the algorithm resamples the values of θ_i to refine the discrete MCPD. Before proceeding, the results from the first sampling stage are re-ranked by increasing values of θ_i. We denote by {θ_i^{k_i}, P_i(θ_i^{k_i}) = p(θ^{k_i} | d), k_i = 1, …, n_i} the re-ranked first sample, with θ_i^{k_i} < θ_i^{k_i+1}. We seek in this sample the interval where the difference between successive MCPD values is maximal, i.e., we seek the index k_m verifying

\[ k_m = \arg\max_{k_i} \left| P_i(\theta_i^{k_i+1}) - P_i(\theta_i^{k_i}) \right| \tag{7} \]

Then, the optimization in Eq. (5) is solved for θ_i^* = (θ_i^{k_m} + θ_i^{k_m+1})/2, and the pair (θ_i^*, P_i(θ_i^*)) joins the set {θ_i^{k_i}, P_i(θ_i^{k_i})}. The latter is then sorted again by increasing values of θ_i and the search for the index k_m is resumed. This procedure resamples θ_i and amends the set {θ_i^{k_i}, P_i(θ_i^{k_i})} progressively (iteratively) at the locations where the MCPD is the most coarsely discretized. Usually, a few iterations on k_m (e.g. N_it = 10) are enough to obtain a good discrete depiction of the whole MCPD of θ_i.
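The refinement stage can be sketched as follows. This is a schematic rendering of Eq. (7) and of the bisection step described above, reusing the hypothetical mcpd_point helper from the previous sketch; probabilities are handled on the natural scale for clarity.

```python
import numpy as np

def mcpd_refine(log_post, theta_map, i, stage1_sample, n_iter=10):
    """Stage 2: iteratively bisect the interval where the discrete MCPD
    is most coarsely resolved. `stage1_sample` is a list of (theta_i, P_i)
    pairs produced by the first (range-scanning) stage."""
    pts = sorted(stage1_sample)            # rank by increasing theta_i
    for _ in range(n_iter):
        # index k_m of the largest jump between consecutive P_i values, Eq. (7)
        k_m = max(range(len(pts) - 1),
                  key=lambda k: abs(pts[k + 1][1] - pts[k][1]))
        t_new = 0.5 * (pts[k_m][0] + pts[k_m + 1][0])              # midpoint
        p_new = np.exp(mcpd_point(log_post, theta_map, i, t_new))  # Eq. (5)
        pts.append((t_new, p_new))
        pts.sort()                         # re-rank and resume the search
    return pts
```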

In the case of multimodal densities p(θ|d), the algorithm is adapted so that it repeats the first and second sampling stages around each "optimal" value θ^opt (see the example in Section 4).

At this stage, it is interesting to note that the optimization procedures identifying the optimal vectors θ_{-i} for prescribed values of θ_i are completely independent of the optimizations identifying θ_{-j} for prescribed values of θ_j. Therefore, the MCPD samplings for the different parameters of the investigated problem are independent and can be handled easily with parallel computing streams. There can be as many sessions as the number of parameters, which strongly reduces the total computation time. Obviously, other inverse techniques can be parallelized, for example Markov chain Monte Carlo (see [16], [17], [18]). In this case, several chains are launched in parallel to explore the parameter space. The chains generate independent subpopulations of solutions and exchange good individuals between the subpopulations to accelerate convergence (the benefit of some emigration between subpopulations). The parallel calculations are then no longer independent, and a master computer is needed to analyze the subpopulations and generate new ones. In the end, the ease of parallelization brought by the MCPD inference is suited to the inversion of highly parameterized problems (see the example in Section 5).
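Because the per-parameter samplings share nothing, distributing them is trivial; a minimal sketch with Python's standard multiprocessing module (mcpd_for_parameter is a hypothetical wrapper around the two sampling stages described above):

```python
from multiprocessing import Pool

def mcpd_for_parameter(i):
    """Hypothetical wrapper: stage 1 (range scan) + stage 2 (refinement)
    for parameter i, returning its discrete MCPD as (theta_i, P_i) pairs."""
    return []   # placeholder for the per-parameter sampling

if __name__ == "__main__":
    n_params = 25                     # one independent task per parameter
    with Pool() as pool:              # one worker per available core
        mcpds = pool.map(mcpd_for_parameter, range(n_params))
```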

3.3. Predictive density of an MCPD sample

Let us denote by n = Σ_{i=1}^{s} n_i the total number of MCPD draws. A specific MCPD sample (θ^k, P_i(θ_i^k)), k = 1, …, n, corresponds to a point location on the hypersurface described by the parameter joint probability density p(θ|d). The density function of the model predictions conditioned on the observed data d allows assigning uncertainty bounds to the prediction of a new observation d*. If we assume that d* is independent of d conditional on θ (i.e. p(d* | d, θ) = p(d* | θ), see [19]), we can write

\[ p(d^{*} \mid d) = \int p(d^{*} \mid \theta)\, p(\theta \mid d)\, d\theta \tag{8} \]

Given that P_i(θ_i^k) = p(θ^k | d), Eq. (8) is simply approximated by

\[ \hat{p}(d^{*} \mid d) = \frac{\sum_{k=1}^{n} P_i(\theta_i^{k})\, p(d^{*} \mid \theta^{k})}{\sum_{k=1}^{n} P_i(\theta_i^{k})} \tag{9} \]

where p(d* | θ^k) is the likelihood function defined in Eq. (1), evaluated at d* and conditioned on θ = θ^k.
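Eq. (9) translates directly into a weighted average of likelihoods over the MCPD draws; a short sketch, assuming the draws and their posterior values are stored in arrays:

```python
import numpy as np

def predictive_density(d_star, draws, p_draws, likelihood):
    """Eq. (9): MCPD-weighted average of the likelihood p(d*|theta_k).
    draws  : (n, s) array of MCPD parameter vectors theta^k
    p_draws: (n,) array of the attached posterior values P_i(theta_i^k)
    likelihood(d_star, theta): the likelihood of Eq. (1), user-supplied."""
    lik = np.array([likelihood(d_star, theta) for theta in draws])
    return np.sum(p_draws * lik) / np.sum(p_draws)
```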

4. Inference of a multimodal distribution with MCPDs

The first case study deals with the ability of MCPD samplings to retrieve a three-modal probability density function with connected and disconnected modes. The test function to be retrieved is inspired by a case study proposed by Laloy and Vrugt, who state that inferring a density function with disconnected modes by means of MCMC is very challenging ([20]). In the present work, the density function also encloses modes that differ for each parameter. The density function to retrieve has 25 parameters and is the sum of three multi-Gaussian density functions:

\[ p(\theta) = \tfrac{1}{6}\, \mathcal{N}(\mu_1, 5C) + \tfrac{2}{6}\, \mathcal{N}(\mu_2, 5 I_{25}) + \tfrac{3}{6}\, \mathcal{N}(\mu_3, 5 I_{25}) \tag{10} \]

where N(µ_i, ·) denotes the multi-Gaussian distribution of mean vector µ_i. I_25 is the 25-dimensional identity matrix, which indicates that the parameters (θ_1, …, θ_25) are independent in the second and third Gaussian distributions in (10). C is a correlation matrix with null off-diagonal elements except for C_{1,11} = C_{11,1} = −0.5 and C_{1,13} = C_{13,1} = 0.8. These non-null terms impose, for the first Gaussian distribution in Eq. (10), a negative correlation between θ_1 and θ_11 and a strong positive correlation between θ_1 and θ_13. The three modes of each parameter are grouped in the vectors of means µ_1 = [−12, …, 12], µ_2 = [1, …, 25] and µ_3 = [25, …, 1]. Thus, θ_13 has two modes located at θ_13 = 13. To compute the MCPDs, and specifically to seek the optimal vector θ_{-i}^* knowing the value of θ_i^*, we use the MATLAB optimization toolbox, especially the fminunc.m function based on the so-called trust-region method. This function minimizes −log(p(θ)), and the convergence is accelerated by providing the values of the derivatives of −log(p(θ)) with respect to the components of θ.
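For reference, the target density of Eq. (10) is easy to write down explicitly. The sketch below uses SciPy (the study itself worked in MATLAB with fminunc); minimizing -np.log(p(theta)) with scipy.optimize.minimize would play the role of the trust-region search:

```python
import numpy as np
from scipy.stats import multivariate_normal

s = 25
mu1 = np.linspace(-12.0, 12.0, s)        # first vector of means
mu2 = np.arange(1.0, s + 1.0)            # [1, ..., 25]
mu3 = mu2[::-1].copy()                   # [25, ..., 1]

C = np.eye(s)                            # correlation matrix of Eq. (10)
C[0, 10] = C[10, 0] = -0.5               # corr(theta_1, theta_11)
C[0, 12] = C[12, 0] = 0.8                # corr(theta_1, theta_13)

def p(theta):
    """Three-modal 25-parameter target density of Eq. (10)."""
    return (1.0 * multivariate_normal.pdf(theta, mu1, 5 * C)
            + 2.0 * multivariate_normal.pdf(theta, mu2, 5 * np.eye(s))
            + 3.0 * multivariate_normal.pdf(theta, mu3, 5 * np.eye(s))) / 6.0
```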

The first hurdle in this exercise is to identify the three local maxima of the density function (for all parameters, the global maximum is located in the third multi-Gaussian distribution, with µ_3 as the vector of means). For this purpose, we ran the optimization procedure twenty times, using initial solutions uniformly sampled from the parameter space. This calculation required about 6,500 evaluations of the density function (Eq. (10)). Subsequently, the MCPD sampling was carried out in the vicinity of each optimum, requiring an additional computational effort of about 3,000 runs for N_it = 20 iterations of MCPD refinement (see Section 3).

In Fig. 3 we compare the maximal conditional posterior density of some parameters to their marginal posterior density. The MCPDs were computed numerically, while the marginal posterior densities were assessed analytically by computing p(θ_i) = ∫ p(θ) dθ_{-i}. Interestingly, when the modes are disconnected the two densities are equal (e.g. θ_1), whereas they can be very different when the modes are superimposed (e.g. θ_13). This is due to the fact that the marginal posterior pdf is an integral, so that superimposed modes are summed up. Note that, because the MCPDs are calculated by several optimization procedures in the vicinity of each mode, the superimposed modes produce overlapping MCPD curves.

It is noticeable that with only a few point estimates of the MCPD, the densities obtained are accurate enough (e.g. with only N_it = 10 refinement iterations, not reported in Fig. 3). The "off-diagonal" scatterplots in Fig. 3, for a column i and a row j, correspond to the pairwise MCPD draws (θ_i^{k_i}, θ_j^{k_i}) and (θ_i^{k_j}, θ_j^{k_j}); the first corresponds to the optimal values of θ_j for sampled θ_i, and the second to the optimal values of θ_i for sampled θ_j. The first observation is that one can find several distinct values of a parameter θ_i for the same draws of θ_j. This is specific to the targeted pdf with overlapping modes. For example, in Fig. 3, θ_13 has its three modes at (0, 13, 13), i.e. two overlapping modes, whereas the modes of θ_1 at (−12, 1, 25) do not overlap. When seeking the MCPD of θ_13, one identifies two local optima, (θ_1 = 1, θ_13 = 13) and (θ_1 = 25, θ_13 = 13), i.e., several distinct values of θ_1 (around 1 and 25, respectively) are found for the same draws of θ_13 around the value 13.

The scatterplots also show that the MCPD sampler is able to retrieve the correlation structure of the pdf. One can easily check that, in the vicinity of the first mode (for which θ_1 = 1, θ_11 = 11 and θ_13 = 13), a negative correlation between (θ_1, θ_11) and a positive correlation between (θ_1, θ_13) are observed. It can also be noted that no correlation is observed elsewhere, as assumed by the targeted density function (see Eq. (10)). Hence, the plots of θ_13 versus θ_11 draw orthogonal crosses (see the scatterplot at row 3, column 2).

To conclude this first exercise, we note that finding the local optima is the most expensive stage. Identifying the multiple local optima is performed via a multi-start procedure launching searches from different initial locations in the parameter space. This procedure does not guarantee that all the local optima will be found. Our empirical experience shows, however, that the MCPD sampling (second stage) can identify some missed modes or improve the evaluation of some others. In this first stage, it is hard to state whether one optimization technique is a notch above the others. Gradient-based methods or evolutionary algorithms such as genetic algorithms or shuffled complex evolution methods are possible choices ([21]). For the second stage dealing with the MCPD sampling, the starting points in the parameter space are not far from the optimal locations. The calculations are fast, and optimizations relying on gradient-based methods are recommended because they converge rapidly when started close to the solution. In the present example, the computation time was also strongly reduced by launching the independent MCPD samplings on 25 processors. This resulted in a cost of 260 computational time units (CTU) for the step identifying the local optima, whereas the MCPD sampling took only 120 CTU.

5. Inverting a stochastic field with MCPDs

We consider here an inverse problem with a two-dimensional random field as the unknown parameter. The two-dimensional random field of scalar values is denoted by Y(x, ω), with x the location in the bounded Euclidean space D and ω a realization index of the random field, equivalent to a coordinate in the probability space Ω. Let us also consider a model response vector G(Y(x, ω)) and some observation data d. The inverse problem consists of finding an optimal estimate Ŷ^MAP of the random field and its uncertainty range given the data. At first glance, the problem is ill-posed because there are an infinite number of unknowns, or at least as many unknowns as the number of grid cells discretizing the domain D for numerically evaluating G(Y). Hence, the first task is to reduce the dimensionality (regarding parameters) of the problem.

5.1. Karhunen-Loève expansion

Several authors have suggested using a Karhunen-Loève (KL) transformation of the random field ([13, 22–24]) to reduce the dimensionality of the problem. It is assumed that the random field obeys a second-order stationary Gaussian process, so that Y(x, ω) ∼ GP(µ_Y, C_Y(x_1, x_2)), where µ_Y is the mean of the process, (x_1, x_2) is a pair of distinct locations in the domain, and C_Y is the covariance of the process. This covariance is a scalar, continuous and positive-definite function corresponding to C_Y(x_1, x_2) = E[(Y(x_1) − µ_Y)(Y(x_2) − µ_Y)], with E[·] the mathematical expectation. Provided that Y(x, ω) is a real-valued random field with finite second moments, its KL expansion is

\[ Y(x, \omega) = \mu_Y + \sum_{i=1}^{+\infty} \sqrt{\lambda_i}\, \xi_i(\omega)\, \varphi_i(x) \tag{11} \]

where the doublets (λ_i, φ_i(x)), ∀i ∈ N*, are the eigenvalues (λ_1 ≥ λ_2 ≥ ⋯ > 0) and associated eigenfunctions of the covariance kernel C_Y, while {ξ_i}_{i=1}^{∞} are independent Gaussian random variables of zero mean and unit variance (ξ_i ∼ N(0, 1)). The eigenvalue-eigenfunction doublets are obtained by solving the Fredholm equation

\[ \int_{D} C_Y(x_1, x_2)\, \varphi_i(x_2)\, dx_2 = \lambda_i\, \varphi_i(x_1) \tag{12} \]

When ranked by decreasing values, the eigenvalues tend more or less rapidly to zero, thus allowing truncation of the KL development at the K-th order:

\[ \hat{Y}(x, \omega) = \mu_Y + \sum_{i=1}^{K} \sqrt{\lambda_i}\, \xi_i(\omega)\, \varphi_i(x) \tag{13} \]

where K is chosen so that \( \sum_{i=1}^{K} \lambda_i \geq (1 - \epsilon) \sum_{i=1}^{+\infty} \lambda_i \). Given that the eigenfunctions {φ_i(x)}_{i=1}^{∞} are continuous and form a complete orthonormal system in L²(D), i.e. ∫_D φ_i(x) φ_j(x) dx = δ_ij with δ_ij the Kronecker delta, Eq. (13) is an expansion of Y(x, ω) onto an orthogonal basis. Provided the parameters µ_Y and C_Y(·) are fixed (guided by previous investigations), identifying Ŷ(x, ω) essentially involves seeking the plausible Gaussian random variables ξ* that explain the observation data d. The dimensionality of the inverse problem is thereby reduced to K parameters, i.e. the number of eigenvalue-eigenvector pairs chosen to expand Y(x, ω). The problem is also partly regularized since ξ is a vector of independent Gaussian variables, i.e. p(ξ) = N(0, I_K). Note also that one can include the mean µ_Y in the sought parameters, as done in the sequel, where one defines the parameter vector θ = (µ_Y, ξ) of dimension s = K + 1.
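On a numerical grid, the Fredholm problem of Eq. (12) reduces to an eigendecomposition of the covariance matrix assembled at the cell centers (a simple Nyström-type discretization, with cell-area weighting omitted for brevity). The study itself used the analytical eigenmodes available for the exponential kernel ([26]); the sketch below is a generic numerical alternative, illustrated with an exponential kernel like the one adopted later in Section 5.3:

```python
import numpy as np

def discrete_kl(coords, cov_fn, var_frac=0.84):
    """Truncated KL basis (Eqs. 12-13) on a grid.
    coords : (m, 2) cell-center coordinates
    cov_fn : covariance as a function of separation distance r
    Returns the K leading eigenvalues and discrete eigenfunctions."""
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    lam, phi = np.linalg.eigh(cov_fn(r))
    lam, phi = lam[::-1], phi[:, ::-1]         # decreasing eigenvalues
    K = int(np.argmax(np.cumsum(lam) >= var_frac * np.sum(lam))) + 1
    return lam[:K], phi[:, :K]

def kl_field(mu_Y, lam, phi, xi):
    """Recompose Y on the grid from the KL coefficients xi (Eq. 13)."""
    return mu_Y + phi @ (np.sqrt(lam) * xi)

# exponential kernel C_Y(r) = 10 exp(-8 r) on a coarse 20x20 grid
grid = np.stack(np.meshgrid(np.linspace(0, 1, 20),
                            np.linspace(0, 1, 20)), -1).reshape(-1, 2)
lam, phi = discrete_kl(grid, lambda r: 10.0 * np.exp(-8.0 * r))
Y = kl_field(0.0, lam, phi, np.random.randn(len(lam)))  # one realization
```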

If we compare the above parameterization of a multi-Gaussian field with, for example, a pilot point method, the reduction of dimensionality when solving the inverse problem is of the same order. Obtaining several plausible solutions in a Bayesian framework will finally require several optimization procedures in both cases. On the one hand, a classical pilot point technique will duplicate the optimization of the pilot point values to modify different prior guesses on the parameter field. On the other hand, calculating MCPDs for the parameters of the Karhunen-Loève decomposition will also require multiple optimizations (see Section 3). Further studies handling carefully designed comparisons would be needed to identify the advantages and drawbacks of both approaches.

5.2. Problem statement

Let us now illustrate the approach mixing a Karhunen-Loève decomposition and MCPD calculations. It must first be emphasized that the Karhunen-Loève decomposition is well known to depict accurately any second-order stationary random field ([23, 24]); along this line, our study does not provide new insights on the KL decomposition. The main aim here is to assess the capability of MCPD samplings to correctly identify a large number of parameters that do not directly enter the forward problem. As is the case with many parameterization techniques, the optimized parameters serve as seeds to recompose the parameter field that enters the calculation of the forward problem. In addition, the parameters of a Karhunen-Loève decomposition are independent before inversion because the eigenvectors of the decomposition are orthogonal; after inversion and post-conditioning, these parameters may become correlated. It is therefore interesting to see how MCPD samplings cope with this modification and which kind of uncertainty can be derived for parameters that are the seeds of a parameterization technique. Let us consider the following steady-state two-dimensional flow problem in a heterogeneous medium:

\[
\begin{cases}
\nabla \cdot \left( T(\mathbf{x})\, \nabla h(\mathbf{x}) \right) = 0, & \mathbf{x} = (x, y) \in D = [0, 1] \times [0, 1] \\[4pt]
\left. \dfrac{\partial h(\mathbf{x})}{\partial y} \right|_{y=0} = \left. \dfrac{\partial h(\mathbf{x})}{\partial y} \right|_{y=1} = 0 \\[4pt]
\left. T(\mathbf{x})\, \dfrac{\partial h(\mathbf{x})}{\partial x} \right|_{x=0} = 0.02; \quad h(1, y) = 0
\end{cases}
\tag{14}
\]

where h(x) [L] is the hydraulic head and T(x) [L²T⁻¹] is the transmissivity of the aquifer. The forward problem (14) is numerically solved by means of the mixed-hybrid finite element method ([25]) for a domain discretized into 10⁴ triangular meshes.

The inverse problem consists of finding the log-transmissivity field Y(x) = log(T(x)) conditioned on a vector d of data enclosing 25 measurements of hydraulic head spread all over the domain and 25 log-transmissivity values located at the same points as the observed heads. These data were obtained by running the flow scenario of Eq. (14) over a multi-Gaussian field Y(x) with a mean of zero and an isotropic Gaussian covariance C_Y(x_1, x_2) = 2 e^{−6|x_1 − x_2|²}, yielding an effective correlation length of the random field Y(x) of 0.7. The data were then corrupted by a Gaussian white noise of standard deviation σ_h^m = 1 for the heads and σ_Y^m = 0.1 for the log-transmissivities. The reference field Y(x) and the measurement locations are reported in Fig. 4(a) and Fig. 4(d), respectively.

5.3. Posterior distribution

First, let us assume, as a prior guess, that the log-transmissivity field Y(x) has a mean of zero and an isotropic exponential covariance C_Y(x_1, x_2) = 10 e^{−8|x_1 − x_2|}, yielding an effective correlation length of 0.375. The reason for choosing a covariance function that is erroneous compared with that of the reference problem is twofold. First, we want to show that it is still possible to obtain a good estimation of the random field even though the prior guess on its spatial structure is flawed: the correlation length of the guess is here half the correlation length of the reference field, and the prior variance is five times the variance of the reference field. One can also mention, to justify this discrepancy, that in practical applications the covariance model is often more conjectured than really known. Second, the Fredholm equation in (12) has well-known analytical solutions for an exponential covariance kernel (see [26]), which simplifies and accelerates the calculation of the eigenvalue-eigenfunction pairs. It is of course possible to choose another covariance kernel; in that case, one can make use of the calculation method proposed in [27] to estimate the eigenmodes. In the present study, the KL expansion is performed by keeping the first 103 modes, representing 84% of the variance associated with the field Y(x). In the end, 104 parameters are sought (including the mean µ_Y), which is still a highly parameterized problem but far fewer parameters than the initial problem, whose number of unknowns equaled the number of meshes (10⁴) discretizing the domain.

Given the way the 50 local data were obtained, the likelihood function is written

\[ p(d \mid \theta, \sigma_h^m, \sigma_Y^m) \propto \exp\left( -\frac{SS_h(\theta)}{2 (\sigma_h^m)^2} - \frac{SS_Y(\theta)}{2 (\sigma_Y^m)^2} \right) \tag{15} \]

where SS_h and SS_Y are the sums of squared errors on the hydraulic heads and on the log-transmissivities, respectively. In the present inversion exercise, we cannot include (σ_h, σ_Y) in the parameters to be estimated: the optimization algorithm would overfit the data because the number of unknowns (s = 104) is much greater than the number of data (i.e. 50). Hence, they are fixed to their presumed exact values.

Keeping in mind that, in the KL expansion, ξ ∼ N(0, I_K), and assuming a uniform prior density for µ_Y, the parameter joint posterior density function can be written (by using Eq. (15)) as

\[ p(\theta \mid d, \sigma_h^m, \sigma_Y^m) \propto \exp\left( -\frac{SS_h(\theta)}{2 (\sigma_h^m)^2} - \frac{SS_Y(\theta)}{2 (\sigma_Y^m)^2} \right) \prod_{i=1}^{K} \exp\left( -\frac{\xi_i^2}{2} \right) \tag{16} \]

Maximizing the above density function amounts to minimizing the following weighted sum of squares:

\[ J(\theta) = \frac{SS_h(\theta)}{(\sigma_h^m)^2} + \frac{SS_Y(\theta)}{(\sigma_Y^m)^2} + \sum_{i=1}^{K} \xi_i^2 \tag{17} \]
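Assembling the objective of Eq. (17) is then straightforward. In the sketch below, predict is a hypothetical stub standing for the chain "recompose the log-transmissivity field from (µ_Y, ξ) via Eq. (13), solve the flow problem (14), extract the modeled values at the 25 + 25 observation points":

```python
import numpy as np

def objective(theta, d_h, d_Y, sigma_h, sigma_Y, predict):
    """Weighted sum of squares of Eq. (17); theta = (mu_Y, xi_1, ..., xi_K)."""
    mu_Y, xi = theta[0], theta[1:]
    h_mod, Y_mod = predict(mu_Y, xi)      # heads and log-T at the data points
    ss_h = np.sum((h_mod - d_h) ** 2)     # SS_h, head misfit
    ss_Y = np.sum((Y_mod - d_Y) ** 2)     # SS_Y, log-transmissivity misfit
    return ss_h / sigma_h**2 + ss_Y / sigma_Y**2 + np.sum(xi**2)
```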

The parameters ξ being normally distributed a priori, we did not prescribe any restrictive variation ranges for them. Regarding the parameter µ_Y, we prescribed a very large variation range, [−100, 100], making µ_Y in practice free of any constraint. The type of objective function in Eq. (17) is rapidly minimized with gradient-based methods if the Jacobian matrix ∂h(x, θ)/∂θ is calculated accurately ([28, 29]). A close look at the forward problem in (14) shows that its differentiation with respect to any parameter θ yields

\[
\begin{cases}
\nabla \cdot \left( T(\mathbf{x}, \theta)\, \nabla h_\theta(\mathbf{x}, \theta) \right) = -\nabla \cdot \left( T_\theta(\mathbf{x}, \theta)\, \nabla h(\mathbf{x}, \theta) \right), & \mathbf{x} \in D \\[4pt]
\left. \dfrac{\partial h_\theta(\mathbf{x}, \theta)}{\partial y} \right|_{y=0} = \left. \dfrac{\partial h_\theta(\mathbf{x}, \theta)}{\partial y} \right|_{y=1} = 0 \\[4pt]
\left. T(\mathbf{x}, \theta)\, \dfrac{\partial h_\theta(\mathbf{x}, \theta)}{\partial x} \right|_{x=0} = \left. -T_\theta(\mathbf{x}, \theta)\, \dfrac{\partial h(\mathbf{x}, \theta)}{\partial x} \right|_{x=0}; \quad h_\theta(1, y) = 0
\end{cases}
\tag{18}
\]

where the notations h_θ(x, θ) and T_θ(x, θ) stand for the derivatives ∂h(x, θ)/∂θ and ∂T(x, θ)/∂θ, respectively. Provided the forward problem has been solved and one can analytically differentiate the terms T_θ(x, θ), the calculation of (18) is very similar to that of the forward problem (14). It can be handled with the same code and the same discretization, resulting in comparable accuracy for evaluating both h(x, θ) and h_θ(x, θ).


5.4. Results and discussion

With 104 coefficients kept in the KL expansion, the MCPDs were calculated on an eight-core computer, each core computing 13 MCPDs. On average, each core ran the model (forward flow problem + calculation of the Jacobian matrix) about 6,500 times (CTU), for a total over the eight cores of 54,560 runs. Each model run solved 105 spatially distributed problems: one calculating the head variable (Eq. (14)), and 104 evaluating the sensitivities to the parameters θ_{-i} for a prescribed value of θ_i (Eq. (18)). As a comparison, in [13] the authors handled a similar problem by means of MCMC, following the approach proposed in [12]. To lower the computational load, a surrogate model was used in a two-stage approach in which a new candidate must first pass the surrogate likelihood successfully before undergoing the statistical test of the original model. Three dependent chains were launched in parallel, and the chains started to converge after 4,000 calls of the forward (i.e. original) model, for a total of 10,000 calls (CTU).

The reference log-transmissivity field is reported in Fig. 4(a). The log-transmissivity field stemming from the KL expansion using the maximum a posteriori estimate of the parameter vector θ^MAP = (µ_Y, ξ)^MAP is reported in Fig. 4(c). The fields in 4(a) and 4(c) closely resemble each other, even though the covariance kernel of the KL expansion was deliberately chosen to differ from that of the reference field (see above). The sampled covariances of the reference field and of the MAP estimate are very similar, implying that initial errors on the type of covariance and on the correlation length can be amended by inverting the KL coefficients and mixing eigenfunctions to recompose a relevant random field.

Fig. 4(b) reports the field kriged from the measured log-transmissivity values. Though the patches of high and low values are correctly located, the kriged map differs from the reference field (Fig. 4(a)) and the MAP solution (Fig. 4(c)). For example, the effective correlation length seems overestimated, and the overall map is smoother than the reference field. As shown later, the sampled variogram of the kriged map differs from that of the MCPD solutions and confirms the preceding visual appraisal. Stated differently, the local transmissivity values do not carry all the information, and it makes sense to invert a flow problem conditioned on head data to retrieve the log-transmissivity field. It can also be noted that the inverse problems solved to calculate the MCPDs overfit the data, the number of parameters being larger than the number of conditioning data. The consequence is that the terms enclosed in the objective function of Eq. (17) become lower than the prior measurement errors on head and transmissivity data.

The uncertainty on the predicted log-transmissivity (or hydraulic head) can be estimated with Eq. (9). The uncertainty (the size of the 95% confidence interval) associated with the log-transmissivity field is shown in Fig. 4(d). The crosses mark the (co)locations of the head and transmissivity data used for the inversion. We can note that the uncertainty is not constant over the domain and is lower at the measurement locations, where the size of the 95% confidence interval is about 0.4 (i.e. ∼ ±1.96 σ_Y^m). This makes sense because the model prediction was anticipated to be more accurate at the measurement locations. Moreover, most of the domain is assigned an uncertainty range of less than 0.8 (∆log(T) < 0.8), which is quite narrow except in the areas that contain no data (see the north and south-west boundaries of the domain).

It is interesting to check whether the parameter uncertainty inferred via MCPDs yields a set of log-transmissivity fields including the reference (true) field. Fig. 4(e) maps the areas (in black) where the local value Y(x) of the reference field is outside the 95% uncertainty range calculated by MCPDs (the widths of these local uncertainty ranges are given in Fig. 4(d)). As expected, the areas where the uncertainty ranges do not include the local values of the reference field are those where no prior information was collected to solve the inverse problem. In the present case, however, the uncertainty ranges are relatively narrow (Fig. 4(e)), and the non-matching areas represent 30% of the total domain; they drop to 13% for the 99% predicted uncertainty range. Interestingly, the uncertainty bounds encompass the true field in the vicinity of the observed data, where the uncertainty range is smaller (∆log(T) < 0.8). This is the consequence of a well-conditioned problem: observation data are rather evenly spread over the whole domain, and very few sub-areas are not documented by at least one transmissivity value giving the order of magnitude of the parameters. Finally, it is worth noting that increasing the variance of the postulated exponential covariance kernel (e.g. σ_Y² = 20) provides slightly wider uncertainty bounds that finally encompass the true field (not shown). This artificial increase of σ_Y² to obtain larger uncertainties on the transmissivity field is not a consequence of flawed MCPD samplings: the problem handled here is highly regularized by the Karhunen-Loève decomposition, which restricts the freedom of action on the tuned parameters (i.e. the eigenvalues of the decomposition).

The MCPD estimates of the first KL coefficients are reported in Fig. 5 ("diagonal" plots). We notice that the density of µ_Y has a bell-shaped curve centered on zero, though a uniform prior density was postulated. Fig. 5 also reports, in its "off-diagonal" plots, the various draws of the parameters θ used to build the MCPDs. A plot in column i and row j corresponds to the optimal values of the parameter θ_j for sampled (prescribed) values of θ_i, together with the optimal values of θ_i for sampled values of θ_j. The draws of µ_Y and ξ_1 align along a line of non-null slope, indicating a strong negative correlation between these two parameters. Notably, µ_Y and ξ_1 are also closely correlated with ξ_6 (row #7, columns #1-2 of Fig. 5), meaning that these parameters have dependent effects on the state variable calculated by the flow model. We recall that the prior guess on the parameters ξ assumed that they were independent and Gaussian, with zero mean and unit variance. The posterior estimates show that some parameters can be highly correlated and that their distributions may have changed. For example, the parameters ξ_2 to ξ_4 show bell-shaped MCPDs but are clearly non-centered on zero and have a variance of less than one (the width of the bell-shaped curve is far less than 6).

The overall quality of the inverse solutions is now discussed. First, the MAP solution reported in Fig. 4(c) has a spatial structure close to that of the reference solution. Note that the reference solution has an effective correlation length of about 0.5, which is smaller than expected from the covariance model used to generate the random field; this occurs when the size of the domain is close to the correlation length. The sampled covariance (variogram) of the MAP solution is Gaussian with a correlation length of about 0.45, close to that of the reference field (see Fig. 6(a)). The MAP solution, however, slightly underestimates the variance, probably because the KL expansion keeps modes representing only 84% of the variance of Y (see above). Nevertheless, the MAP solution is much closer to the reference field than a kriged map of the transmissivity data. This kriged map has a variogram that largely overestimates the correlation length and underestimates the variance of the transmissivity field; as noted above, the kriged field is smoother than the reference and MAP solutions. We note that some inverse solutions calculated by taking parameters from the MCPD samplings can perfectly fit the covariance of the reference field (Fig. 6(a)). These solutions are good but not the best with respect to the posterior pdf.

Note that the MCPD samples give optimal inverse solutions for a prescribed value of one parameter. Using these samples, we rebuilt the associated Y fields and calculated their sampled (experimental) covariances. These covariances were then fitted with a Gaussian model of the form C_Y(x_1, x_2) = σ_Y² e^{−η(x_1 − x_2)²}, the variance of C_Y obviously being σ_Y² and the effective correlation length √(3/η). Fig. 6(b), (c) and (d) show these variances and correlation lengths. In general, the MCPD solutions slightly underestimate the variance of Y, with mean values on the order of 1.9 for a reference at 2.1. The mean correlation length of the MCPD solutions establishes at 0.43 for a reference at 0.47. Though the prior guess on the spatial structure of the inverse solutions greatly differed from the reference, the eigenvectors of the KL expansion clearly capture the effective spatial structure (covariance) from which the data are extracted. The recomposition of these eigenvectors by means of MCPDs provides a large set of probable solutions that are only very slightly biased, at least for the forward problem handled here.

6. Conclusions

In this work, we introduced the concept of the maximal conditional posterior distribution for the identification of model parameters. The MCPD of a parameter θ_i can be viewed as the distribution of θ_i knowing that the other parameters are at their optimal values (in the sense that the conditional posterior pdf is maximized). It makes sense to calculate an MCPD because the uncertainty drawn from the distribution of θ_i refers to valuable solutions to the inverse problem: all the parameters θ_{-i} are optimal except θ_i, which is allowed to wander in the parameter space. In essence, the MCPD calculations for the different parameters are independent. They can be performed very easily in parallel computational sessions, and one can extend the number of parallel sessions up to the total number of parameters involved in the problem under investigation (thus drastically reducing the computational effort). This feature, associated with the fact that the MCPD draws are good inverse solutions post-conditioned on the observation data, renders inversion in a Bayesian context affordable, even for highly parameterized problems.

As stated above, an MCPD allows a single parameter θ_i to vary while the others are optimal. A key feature to avoid the single parameter (and the others) being far from valuable solutions is to pre-identify the so-called maximum a posteriori (MAP) estimate. The MAP is a solution where all the parameters are at their optimal values, but it may correspond to several local maxima when the underlying distribution of parameters is multimodal. Hence, the success of MCPD sampling depends on the ability of the preliminary optimization procedure to retrieve all the probable local optima. This can be achieved by multi-start optimizations. The synthetic test case performed in this study showed that the MCPD calculations retrieve multimodal distributions fairly well.

The MCPD technique was also confronted with the problem of retrieving a parameter random field on the basis of information gathering both measurements of the parameters and of the state variables of a spatially distributed problem (here, steady-state groundwater flow). The parameterization allowing a strong decrease in the dimensionality (in parameters) of the problem was grounded in a Karhunen-Loève decomposition of the parameter field. This technique is well known to be accurate regarding the depiction of a stationary random field. Hence, the MCPD calculations were really tested on their ability to provide valuable inverse solutions and the associated uncertainties for parameters that do not directly enter the calculation of the forward problem (the parameters allow the recomposition of a transmissivity field, which is then used in the forward groundwater flow problem). The whole procedure allowed us to approximate the random field with only s = 104 parameters: the mean of the random field and the first 103 eigenmodes of the Karhunen-Loève decomposition. The MCPDs of these parameters were then assessed quickly with the proposed algorithm, taking advantage of their computation in parallel.

It was shown that the MCPD-KL association was able to accurately retrieve a reference field even when starting from a flawed initial estimate of the spatial structure (covariance) of the field. MCPDs can also render parameter distributions that strongly differ from their prior guess, especially regarding their mean and variance.

Acknowledgement

The authors are grateful to the French National Research Agency who


References

[1] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, E. Teller, Equation of state calculations by fast computing machines, Journal of Chemical Physics 21 (1953) 1087–1091. doi:10.1063/1.1699114.

[2] W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57 (1970) 97–109. doi:10.1093/biomet/57.1.97.

[3] M. J. Bayarri, J. O. Berger, R. Paulo, J. Sacks, J. A. Cafeo, J. Cavendish, C. H. Lin, J. Tu, A framework for validation of computer models, Technometrics 49 (2007) 138–154. doi:10.1198/004017007000000092.

[4] D. Higdon, J. Gattiker, B. Williams, M. Rightley, Computer model calibration using high-dimensional output, Journal of the American Statistical Association 103 (2008) 570–583. doi:10.1198/016214507000000888.

[5] H. Haario, E. Saksman, J. Tamminen, An adaptive Metropolis algorithm, Bernoulli 7 (2001) 223–242. doi:10.2307/3318737.

[6] P. J. Green, A. Mira, Delayed rejection in reversible jump Metropolis-Hastings, Biometrika 88 (2001) 1035–1053. doi:10.1093/biomet/88.4.1035.

[7] H. Haario, M. Laine, A. Mira, E. Saksman, DRAM: Efficient adaptive MCMC, Statistics and Computing 16 (2006) 339–354. doi:10.1007/s11222-006-9438-0.

[8] C. J. F. ter Braak, J. A. Vrugt, Differential Evolution Markov Chain with snooker updater and fewer chains, Statistics and Computing 18 (4) (2008) 435–446. doi:10.1007/s11222-008-9104-9.

[9] M. C. Kennedy, A. O'Hagan, Bayesian calibration of computer models, Journal of the Royal Statistical Society 63 (B) (2001) 425–464. doi:10.1111/1467-9868.00294.

[10] J. A. Christen, C. Fox, MCMC using an approximation, Journal of Computational and Graphical Statistics 14 (4) (2005) 795–810. doi:10.1198/106186005X76983.

[11] X. Ma, N. Zabaras, An efficient Bayesian inference approach to inverse problems based on an adaptive sparse grid collocation method, Inverse Problems 25 (2009) 035013. doi:10.1088/0266-5611/25/3/035013.

[12] T. Cui, C. Fox, M. A. O'Sullivan, Bayesian calibration of a large-scale geothermal reservoir model by a new adaptive delayed acceptance Metropolis-Hastings algorithm, Water Resources Research 47 (2011) W10521. doi:10.1029/2010WR010352.

[13] E. Laloy, B. Rogiers, J. A. Vrugt, D. Mallants, D. Jacques, Efficient posterior exploration of a high-dimensional groundwater model from two-stage Markov chain Monte Carlo simulation and polynomial chaos expansion, Water Resources Research 49 (2013) 2664–2682. doi:10.1002/wrcr.20226.

[14] M. Loève, Probability Theory, Springer-Verlag, Berlin, 1977.

[15] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM, Philadelphia, 2005.

[16] C. J. F. ter Braak, A Markov chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces, Statistics and Computing 16 (2006) 239–249. doi:10.1007/s11222-006-8769-1.

[17] J. A. Vrugt, C. J. F. ter Braak, C. G. H. Diks, D. Higdon, B. A. Robinson, J. M. Hyman, Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling, International Journal of Nonlinear Sciences and Numerical Simulation 10 (2009) 273–290. doi:10.1515/IJNSNS.2009.10.3.273.

[18] J. A. Vrugt, C. J. F. ter Braak, H. V. Gupta, B. A. Robinson, Equifinality of formal (DREAM) and informal (GLUE) Bayesian approaches in hydrologic modeling?, Stochastic Environmental Research and Risk Assessment 23 (7) (2008) 1011–1026. doi:10.1007/s00477-008-0274-y.

[19] M. Laine, Adaptive MCMC methods with applications in environmental and geophysical models, Ph.D. thesis, Finnish Meteorological Institute, Finland (2008).

[20] E. Laloy, J. A. Vrugt, High-dimensional posterior exploration of hydrologic models using multiple-try DREAM(ZS) and high-performance computing, Water Resources Research 48 (2012) W01526. doi:10.1029/2011WR010608.

[21] Q. Duan, S. Sorooshian, V. K. Gupta, Optimal use of the SCE-UA global optimization method for calibrating watershed models, Journal of Hydrology 158 (1994) 265–284. doi:10.1016/0022-1694(94)90057-4.

[22] … and unstructured grids, Water Resources Research 42 (2006) W06402. doi:10.1029/2005WR004668.

[23] Y. Efendiev, T. Y. Hou, W. Luo, Preconditioning Markov chain Monte Carlo simulations using coarse-scale models, SIAM Journal on Scientific Computing 28 (2006) 776–803. doi:10.1137/050628568.

[24] Y. Marzouk, H. N. Najm, Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems, Journal of Computational Physics 228 (6) (2009) 1862–1902. doi:10.1016/j.jcp.2008.11.024.

[25] A. Younes, P. Ackerer, R. Mose, G. Chavent, A new formulation of the mixed finite element method for solving elliptic and parabolic PDE with triangular elements, Journal of Computational Physics 149 (1999) 148–167. doi:10.1006/jcph.1998.6150.

[26] D. Zhang, Z. Lu, An efficient, high-order perturbation approach for flow in random porous media via Karhunen-Loève and polynomial expansions, Journal of Computational Physics 194 (2004) 773–794. doi:10.1016/j.jcp.2003.09.015.

[27] K. K. Phoon, S. P. Huang, S. T. Quek, Implementation of Karhunen-Loeve expansion for simulation using a wavelet-Galerkin scheme, Probabilistic Engineering Mechanics 17 (2002) 293–303. doi:10.1016/S0266-8920(02)00013-9.

[28] A. Kaczmaryk, F. Delay, Improving dual-porosity-medium approaches to account for karstic flow in a fractured limestone: application to the automatic inversion of hydraulic interference tests, Journal of Hydrology …

[29] P. Ackerer, F. Delay, Inversion of a set of well-test interferences in a fractured limestone aquifer by using an automatic multi-scale parameterization technique, Journal of Hydrology 389 (2010) 42–56.

Figure captions

Figure 1: A two-dimensional illustration of the MCPD assessment. The MCPD of θ_1 is built by maximizing the conditional joint posterior distribution. During this maximization, the most probable value of θ_2 is derived while the value of θ_1 is fixed. This operation is repeated by fixing θ_1 farther and farther from its maximum a posteriori estimate.

Figure 2: General algorithm used to infer MCPDs in the vicinity of the MAP, assuming that no parallelization of the calculations is performed (see Section 3.2 for explanations). Specifically, N_it corresponds to the maximal number of iterations used to refine the sampling of an MCPD (part below the green lozenge in the middle of the chart); s is the number of parameters. Notably, when the algorithm seeks MCPDs for multimodal densities, the sequence in the chart is simply resumed for each local θ^opt (just replace θ^MAP by θ^opt).

Figure 3: The posterior parameter densities of a multivariate Gaussian distribution are represented on the diagonal. The blue solid lines are the MCPDs, while the marginal posterior densities are in red broken lines. The MCPD of θ_13 is very different from its marginal density because two modes overlap. The off-diagonal plots represent some of the MCPD draws. For a row i and a column j, the horizontal direction corresponds to the values of parameter θ_i and the vertical direction to the values of θ_j. Plots of optimal θ_j values for MCPD draws of θ_i and plots of optimal θ_i for draws of θ_j.

Figure 4: The true log-transmissivity field is represented in (a). The kriged estimated field is represented in (b). In (c), the MAP estimate of the log-transmissivity field is depicted. The size of the 95% uncertainty interval of the estimated field is shown in (d); the crosses indicate the measurement locations. In (e), the black stains indicate the regions of the domain where the estimated uncertainty range of the log-transmissivity field does not encompass the true field.

Figure 5: The estimated MCPDs of the first seven coefficients in the KL approximation of the log-transmissivity field (on-diagonal plots). The off-diagonal plots represent some of the MCPD draws. For a row i and a column j, the horizontal direction corresponds to the values of parameter θ_i and the vertical direction to the values of θ_j. Plots of optimal θ_j values for MCPD draws of θ_i and plots of optimal θ_i for draws of θ_j.

Figure 6: Comparison of Y = log K variograms between the reference problem and the MCPD inversions. (a) Variograms of the reference field, the MAP field and one of the MCPD fields. (b), (c) Variances and correlation lengths of variograms from all MCPD fields with respect to MCPD probability values; the red line is the location of the reference field. (d) Correlation length versus variance of variograms from MCPD fields; the red dot corresponds to the reference field.

[Figure 1]

[Figure 2: flowchart of the MCPD inference algorithm]

[Figure 3: MCPDs and marginal posterior densities of θ_1, θ_11 and θ_13, with pairwise draws]

[Figure 4]

[Figure 5: MCPDs of µ_Y and of the KL coefficients ξ_1 to ξ_6, with pairwise draws]

[Figure 6: variograms of the reference, MAP, kriged and MCPD fields; variances and correlation lengths of the MCPD fields]
