Two-dimensional probabilistic inversion of plane-wave electromagnetic data: methodology, model constraints and joint inversion with electrical resistivity data

Geophysical Journal International
Geophys. J. Int. (2014) 196, 1508–1524
Advance Access publication 2013 December 22
doi: 10.1093/gji/ggt482
GJI Marine geosciences and applied geophysics

Marina Rosas-Carbajal,1 Niklas Linde,1 Thomas Kalscheuer2 and Jasper A. Vrugt3,4

1 Applied and Environmental Geophysics Group, Faculty of Geosciences and Environment, University of Lausanne, 1015 Lausanne, Switzerland. E-mail: Marina.Rosas@Unil.ch
2 Institute of Geophysics, ETH Zurich, Zurich, Switzerland
3 Department of Civil and Environmental Engineering, University of California Irvine, 4130 Engineering Gateway, Irvine, CA 92697–2175, USA
4 Institute for Biodiversity and Ecosystems Dynamics, University of Amsterdam, Amsterdam, the Netherlands

Accepted 2013 November 25. Received 2013 September 4; in original form 2013 April 24.

SUMMARY

Probabilistic inversion methods based on Markov chain Monte Carlo (MCMC) simulation are well suited to quantify parameter and model uncertainty of nonlinear inverse problems. Yet, application of such methods to CPU-intensive forward models can be a daunting task, particularly if the parameter space is high dimensional. Here, we present a 2-D pixel-based MCMC inversion of plane-wave electromagnetic (EM) data. Using synthetic data, we investigate how model parameter uncertainty depends on model structure constraints using different norms of the likelihood function and the model constraints, and study the added benefits of joint inversion of EM and electrical resistivity tomography (ERT) data. Our results demonstrate that model structure constraints are necessary to stabilize the MCMC inversion results of a highly discretized model. These constraints decrease model parameter uncertainty and facilitate model interpretation.
A drawback is that these constraints may lead to posterior distributions that do not fully include the true underlying model, because some of its features exhibit a low sensitivity to the EM data, and hence are difficult to resolve. This problem can be partly mitigated if the plane-wave EM data are augmented with ERT observations. The hierarchical Bayesian inverse formulation introduced and used herein is able to successfully recover the probabilistic properties of the measurement data errors and a model regularization weight. Application of the proposed inversion methodology to field data from an aquifer demonstrates that the posterior mean model realization is very similar to that derived from a deterministic inversion with similar model constraints.

Key words: Inverse theory; Probability distributions; Non-linear electromagnetics.

1 INTRODUCTION

Geophysical measurement methods make it possible to noninvasively sense the physical properties of the subsurface at different spatial and temporal resolutions. Inversion methods are required to interpret these indirect observations and derive a physical description of the subsurface, yet multiple descriptions (also referred to as models) can be found that fit the observed geophysical data equally well. This is in large part due to measurement errors, incomplete data coverage, the underlying physics and/or overparameterization of the subsurface models. Whereas the probabilistic properties of observation errors are relatively easy to describe, model structural errors are difficult to formulate in probabilistic terms. Arbitrary and subjective regularizations and parameterizations may significantly decrease model parameter uncertainty, but they may also introduce a ‘bias’, meaning that some features of the true model may not be resolved.
Bayesian inference can help to explicitly treat input data, parameter and model uncertainty, but successful implementation requires efficient sampling methods that explore the posterior target distribution. In this probabilistic approach, the inverse problem is stated as an inference problem where the solution is given by the posterior probability density function (pdf) of the model parameters. This distribution quantifies joint and marginal parameter uncertainty. Unfortunately, in most practical applications, this posterior distribution cannot be derived analytically, and methods are required that use trial-and-error sampling to approximate the target distribution. Markov chain Monte Carlo (MCMC) simulation methods are well suited for this task, but suffer from poor efficiency, particularly when confronted with significant model nonlinearity, nonuniqueness and high-dimensional parameter spaces (Mosegaard & Tarantola 1995).

© The Authors 2013. Published by Oxford University Press on behalf of The Royal Astronomical Society.

The basic building block of MCMC sampling is Monte Carlo (MC) simulation. This approach randomly samples the prior parameter space, and evaluates the distance of the response of each candidate model to the respective data. If the parameter space is low dimensional, MC simulation can provide a reasonable approximation of the posterior distribution provided that the ensemble of samples is sufficiently large. Yet, for higher dimensional spaces, exhaustive random sampling is inefficient, and more intelligent search methods such as MCMC simulation are required to speed up the exploration of the target distribution.

Monte Carlo methods have been applied to magnetotelluric (MT) data and other types of frequency-domain electromagnetic (FDEM) data in a number of studies for 1-D modelling problems (Tarits et al. 1994; Grandis et al. 1999; Grandis et al. 2002; Hou et al. 2006; Khan et al. 2006; Chen et al. 2007; Guo et al. 2011; Minsley 2011; Buland & Kolbjornsen 2012). We briefly summarize a few of these studies. Tarits et al. (1994) used Monte Carlo sampling to estimate the posterior distribution of the thicknesses and electrical resistivities of different subsurface layers, assuming that the number of layers is known a priori. Grandis et al. (1999) extended this 1-D approach by employing MCMC simulation with sampling from a prior distribution that favours smooth variations in the 1-D electrical resistivity model. Hou et al. (2006) used a quasi-Monte Carlo method (Ueberhuber 1997, p. 125) for 1-D models of reservoir-fluid saturation and porosity to jointly invert controlled source electromagnetic (CSEM) and seismic data. The same types of data were jointly inverted by Chen et al. (2007) using MCMC simulation to derive 1-D models of gas saturation. In a more recent contribution, Guo et al.
(2011) compared deterministic and Bayesian MT data inversion using 1-D synthetic and field data. Data errors and regularization weight were treated as hyperparameters and determined by MCMC simulation (cf. Malinverno & Briggs 2004). Results showed that the MT data contained sufficient information to accurately determine these latent variables. Minsley (2011) presented a 1-D trans-dimensional MCMC inversion (Malinverno 2000) algorithm for FDEM data, in which the number of layers was assumed unknown. This approach favours model parsimony between models that fit the data equally well. This favouring of simple models is naturally accounted for in the so-called ‘Ockham factor’, which measures how much of the prior information is contained in the posterior pdf. With an increasing number of parameters, the probability mass of the prior in the vicinity of the posterior will typically decrease (and so will the Ockham factor), while the data fit will typically improve (Malinverno 2002). Ray & Key (2012) used the same type of method to determine 1-D anisotropic resistivity profiles from marine CSEM data. Most recently, Buland & Kolbjornsen (2012) jointly inverted synthetic CSEM and MT data and presented a real-world application for CSEM data. Khan et al. (2006) used EM data within an MCMC framework to constrain the composition and thermal state of the mantle beneath Europe.

The published contributions summarized thus far have demonstrated the ability of MCMC methods to (1) successfully converge to the global optimum of the parameter space, (2) treat nonlinear relationships between model and data and (3) adequately characterize parameter and model uncertainty. Yet, all these studies used relatively simple 1-D models to minimize the computational costs of the forward solution, and considered relatively low-dimensional parameter spaces to facilitate convergence of the MCMC sampler to the appropriate limiting distribution.

Grandis et al.
(2002) presented the first published multidimensional MCMC inversion of MT data using a thin-sheet modelling code that is CPU-efficient, but only accurate for relatively thin anomalous bodies. Inversions were presented for a horizontal 2-D anomaly embedded in a known horizontally layered 1-D model. Chen et al. (2012) presented an MCMC algorithm to invert 2-D MT data. They fixed the number of layers in the model, yet allowed the depths to vary at given offsets. A 2-D resistivity structure was estimated at a geothermal site using 436 model parameters. This particular algorithm enables the inversion of 2-D data, but imposes strict constraints on the model parameterization in that only layered models with sharp boundaries are allowed.

Other global search methods of stochastic nature, such as simulated annealing (Kirkpatrick et al. 1984) and genetic algorithms (Holland 1975), have been used to produce 1-D and 2-D electrical resistivity models from MT data (Dosso & Oldenburg 1991; Everett & Schultz 1993; Pérez-Flores & Schultz 2002). These methods fully account for the nonlinear relation between model and data, but are only concerned with finding the optimal model, without recourse to estimating the underlying posterior parameter distribution. Postprocessing of the sampled trajectories can provide some insights into the remaining parameter uncertainty, but this type of analysis lacks statistical rigour.

More complex and highly parameterized 2-D or 3-D resistivity models are generally obtained through deterministic inversion (e.g. deGroot-Hedlin & Constable 1990; Siripunvaraporn & Egbert 2000; Rodi & Mackie 2001; Siripunvaraporn et al. 2005). These algorithms are much more efficient but provide only a single ‘best’ solution to the inverse problem (e.g. Menke 1989). Approximate uncertainty estimates can be obtained through linearization in the vicinity of the final solution (Alumbaugh & Newman 2000).
As an alternative to such approaches, Oldenburg & Li (1999) derived a set of different deterministic models from the same data set by running repeated deterministic inversions with different regularization constraints. Features that appear in all models are interpreted as being well resolved by the data. Jackson (1976) and Meju & Hutton (1992) constructed extremal models that fit the data up to a given data misfit threshold with a most-squares inversion. This approach derives the extremal deviations of each model parameter from a best-fitting model. Kalscheuer & Pedersen (2007) used truncated singular value decomposition (TSVD) to estimate the model parameter errors and resolution of models from radio magnetotelluric (RMT) data. Finally, Kalscheuer et al. (2010) used the same approach to compare the errors and resolution properties of the RMT data against those of a joint inversion with electrical resistivity tomography (ERT) data and of ERT data alone.

The aforementioned methods partly account for model nonlinearity but violate formal Bayesian principles, first, because the ‘best’ model is found by minimizing an objective function rather than by analysing the variables’ marginal pdfs, and secondly, because the estimated uncertainties depend on this best model, which in turn depends on the initial model used to find it (e.g. Chen et al. 2008). This raises questions regarding the statistical validity of the estimated model and parameter uncertainty.

The purpose of the present paper is to investigate MCMC-derived parameter uncertainty and bias of a finely parameterized 2-D subsurface system for an increasing level of model constraints. In particular, we study how the posterior uncertainty changes when RMT data are inverted using (1) no constraints on the model structure, (2) smoothness constraints with different model norms and (3) joint inversion with ERT data.
We also investigate the ability of the MCMC algorithm to retrieve the ‘true’ measurement data errors and the regularization weight that provides appropriate weights to the model constraints.

The remainder of the paper is organized as follows: Section 2 presents the theoretical background of the proposed inversion approach. This is followed in Section 3 by the results for a synthetic model using different levels of model constraints and in Section 4 by a real-world application using experimental data from an aquifer in Sweden. Section 5 discusses the implications of our results and highlights potential further developments. Finally, Section 6 concludes this paper with a summary of the presented work.

2 METHOD

2.1 Bayesian inversion

Let the physical system under investigation be described by a vector of $M$ model parameters, $\mathbf{m} = (m_1, m_2, \ldots, m_M)$, and a set of $N$ observations, $\mathbf{d} = (d_1, d_2, \ldots, d_N)$, which are theoretically related to the model via a set of equations,

$$\mathbf{d} = g(\mathbf{m}) + \mathbf{e}, \tag{1}$$

where $\mathbf{e}$ is a vector of dimension $N$, which contains measurement data errors and any discrepancies caused by the model parameterization, deficiencies in the forward function $g(\mathbf{m})$, etc. The posterior pdf $p(\mathbf{m}|\mathbf{d})$ of the model parameters, conditional on the data, can be obtained by applying Bayes' theorem (Tarantola & Valette 1982):

$$p(\mathbf{m}|\mathbf{d}) = \frac{p(\mathbf{m})\, p(\mathbf{d}|\mathbf{m})}{p(\mathbf{d})}, \tag{2}$$

where $p(\mathbf{d}|\mathbf{m})$ is the pdf of $\mathbf{d}$ conditional on $\mathbf{m}$, also called the likelihood function $L(\mathbf{m}|\mathbf{d})$, $p(\mathbf{m})$ is the prior pdf and $p(\mathbf{d})$ signifies the evidence. The evidence is a normalizing constant that is required for Bayesian model selection and averaging (e.g. Malinverno 2002), but because our interests concern a fixed model parameterization, $p(\mathbf{d})$ can be removed without harm from eq. (2), leaving us with the following proportionality:

$$p(\mathbf{m}|\mathbf{d}) \propto p(\mathbf{m})\, L(\mathbf{m}|\mathbf{d}). \tag{3}$$

The prior probability of the model vector, $p(\mathbf{m})$, represents the information known about the subsurface before collecting the actual data.
It can be based on other types of geophysical measurements, geological information about the model structure, expected type of rocks and values of model parameters, etc. In the absence of detailed prior information about the subsurface properties, we assume a Jeffreys prior, that is, that the logarithm of each respective property is uniformly distributed (Jeffreys 1939; Tarantola 2005).

2.2 The likelihood function

The likelihood function summarizes the distance (typically a norm of a vector of residuals) between the model simulation and observed data. The larger the value of the likelihood, the closer the model response typically is to the experimental data. Under the assumption that the measurement data errors follow a normal distribution with zero mean, the likelihood function is given by (Tarantola 2005)

$$L(\mathbf{m}|\mathbf{d}) = \frac{1}{(2\pi)^{N/2} \det(\boldsymbol{\Sigma})^{1/2}} \exp\left[-\frac{1}{2}\,(g(\mathbf{m}) - \mathbf{d})^T \boldsymbol{\Sigma}^{-1} (g(\mathbf{m}) - \mathbf{d})\right], \tag{4}$$

where $\boldsymbol{\Sigma}$ is the data covariance matrix and $\det(\boldsymbol{\Sigma})$ denotes the determinant of $\boldsymbol{\Sigma}$. If the errors are uncorrelated, then $\boldsymbol{\Sigma}$ is a diagonal matrix and $\det(\boldsymbol{\Sigma}) = \prod_{i=1}^{N} \sigma_i^2$. The log-likelihood can then be expressed as

$$l(\mathbf{m}|\mathbf{d}) = -\frac{N}{2}\log(2\pi) - \frac{1}{2}\log\left(\prod_{i=1}^{N}\sigma_i^2\right) - \frac{1}{2}\,\phi_{d,2}, \tag{5}$$

where $\phi_{d,2} = \sum_{i=1}^{N}\left(\frac{g_i(\mathbf{m}) - d_i}{\sigma_i}\right)^2$ represents the data misfit and $\sigma_i$ denotes the standard deviation of the $i$-th measurement error. This misfit function is a measure of the distance between the forward response of the proposed model and the measured data, where the subscript 2 defines the $l_2$ norm. The first term in eq. (5) is a constant, and the measurement data errors can be assumed unknown and estimated jointly with the model parameters. This approach is also referred to as hierarchical Bayes (e.g. Malinverno & Briggs 2004; Guo et al. 2011). As the data misfit becomes smaller, the log-likelihood increases and the proposed model is more likely to be a realization from the posterior distribution. Given the assumptions of the data errors made thus far, the sum of squared errors should follow a chi-square distribution with expected value of $N$. To avoid data over- or underfitting, it is therefore necessary to have a posterior misfit pdf with the same expected value.

When the data errors deviate from normality, it is common to use an exponential distribution, which is consistent with an $l_1$ norm instead of an $l_2$ norm (Menke 1989). Different publications have demonstrated that the $l_1$ norm is more robust against outliers, and often more realistic (e.g. Shearer 1997; Farquharson & Oldenburg 1998). When the measurement errors are independent, the corresponding exponential likelihood function is given by (Tarantola 2005):

$$L(\mathbf{m}|\mathbf{d}) = \frac{1}{2^N \prod_{i=1}^{N}\sigma_i} \exp\left[-\sum_{i=1}^{N}\frac{|g_i(\mathbf{m}) - d_i|}{\sigma_i}\right], \tag{6}$$

which corresponds to the following formulation of the log-likelihood function:

$$l(\mathbf{m}|\mathbf{d}) = -N\log(2) - \log\left(\prod_{i=1}^{N}\sigma_i\right) - \phi_{d,1}, \tag{7}$$

where the data misfit is now defined as $\phi_{d,1} = \sum_{i=1}^{N}\left|\frac{g_i(\mathbf{m}) - d_i}{\sigma_i}\right|$. This distribution has much longer tails (e.g. Menke 1989), thereby reducing the importance of outliers during parameter estimation.

2.3 Constraining the model structure

When strong a priori knowledge of a suitable model structure is lacking, one may invert for the model pdf by only providing each model parameter’s likely range of variation as a priori information. An alternative is to also constrain the model structure to favour smooth spatial transitions. This is a common strategy in deterministic inversion (e.g. Constable et al. 1987; deGroot-Hedlin & Constable 1990), where these constraints serve as a regularization term that decreases the ill-posedness of the inverse problem. In the Bayesian framework, the constraints can be included in the prior pdf (e.g. Besag et al. 1995; Chen et al. 2012). To favour models with smoothly varying resistivity structures, we impose independent normal distributions on the horizontal and vertical model gradients. This results in the following constraint
prior pdf (see Appendix A):

$$c_{m,2}(\mathbf{m}) = \frac{1}{(2\pi\alpha_y^2)^{M_y}\,(2\pi\alpha_z^2)^{M_z}} \exp\left[-\frac{1}{2}\left(\frac{1}{\alpha_y^2}\,\mathbf{m}^T\mathbf{D}_y^T\mathbf{D}_y\mathbf{m} + \frac{1}{\alpha_z^2}\,\mathbf{m}^T\mathbf{D}_z^T\mathbf{D}_z\mathbf{m}\right)\right], \tag{8}$$

where $\mathbf{D}_y$ and $\mathbf{D}_z$ signify the difference operators in the horizontal and vertical directions with rank $M_y$ and $M_z$, respectively, $(M_y + 1)$ and $(M_z + 1)$ denote the number of horizontal and vertical grid cells, respectively, and $\alpha_y$ and $\alpha_z$ are the standard deviations of the model gradients in each spatial direction. If their expected values are similar for both directions, the constraint function becomes

$$\log(c_{m,2}(\mathbf{m})) = -(M_y + M_z)\log(2\pi\lambda^2) - \frac{1}{2}\,\phi_{m,2}, \tag{9}$$

where $\phi_{m,2} = \frac{1}{\lambda^2}\left(\mathbf{m}^T\mathbf{D}_y^T\mathbf{D}_y\mathbf{m} + \mathbf{m}^T\mathbf{D}_z^T\mathbf{D}_z\mathbf{m}\right)$ and $\lambda = \alpha_z = \alpha_y$ is a hyperparameter to be determined using MCMC simulation. This latter variable bears much resemblance to the model regularization weights used in deterministic inversions, and hence will be referred to as such hereafter. Note also that the right-hand side term in eq. (9) is essentially the model regularization term proposed by deGroot-Hedlin & Constable (1990). The smaller the value of $\lambda$, the higher the weight given to the regularization term.

Sharper spatial model transitions than those obtained by the least-squares smoothness constraints may be sought. In classical deterministic inversions, sharp transitions are usually imposed by applying alternative model norms (e.g. Farquharson 2008; Rosas Carbajal et al. 2012). Similar to how an exponential pdf was used to obtain more robust data misfit measures, here we apply it to increase the likelihood of models whose properties change abruptly from one cell to the next:

$$c_{m,1}(\mathbf{m}) = \frac{1}{(2\alpha_y)^{M_y}\,(2\alpha_z)^{M_z}} \exp\left[-\left(\frac{\|\mathbf{D}_y\mathbf{m}\|_1}{\alpha_y} + \frac{\|\mathbf{D}_z\mathbf{m}\|_1}{\alpha_z}\right)\right], \tag{10}$$

where the subscript 1 indicates that an $l_1$ norm is used for the smoothness constraints. In the case that $\alpha_z = \alpha_y = \lambda$, the log-distribution of eq. (10) becomes

$$\log(c_{m,1}(\mathbf{m})) = -(M_y + M_z)\log(2\lambda) - \frac{1}{\lambda}\left(\|\mathbf{D}_y\mathbf{m}\|_1 + \|\mathbf{D}_z\mathbf{m}\|_1\right). \tag{11}$$

The $l_1$ norm linearly weights the differences of the properties of adjacent cells. This is different from an $l_2$ norm that squares these differences, and hence an $l_1$ norm is less sensitive to sharp transitions between neighbouring cells.

2.4 Forward computations

To compute the likelihood functions described in the previous section, a numerical solver is needed to simulate the geophysical response of each proposed model. For both geophysical methods considered herein, the RMT and ERT responses are described by Maxwell’s equations. In the general case, the model parameters and electromagnetic field vary dynamically in a 3-D space. The higher the resolution of the resolved spatial dimension and the larger the number of model parameters, the more demanding the forward problem. Despite significant advances in computational power, 3-D MCMC inversion remains a daunting computational task. We therefore focus our attention on a 2-D model of the subsurface and compute the 2.5-D ERT and RMT forward responses using finite-difference approximations. A detailed description of the forward solvers can be found in Kalscheuer et al. (2010), and interested readers are referred to this publication for additional details about the numerical setup and solution.

2.5 MCMC strategy for high-dimensional problems

For high-dimensional and non-linear inverse problems, it is practically impossible to analytically derive the posterior distribution. We therefore resort to MCMC sampling methods that iteratively search the space of feasible solutions. In short, MCMC simulation proceeds as follows. An initial starting point, $\mathbf{m}_{old}$, is drawn randomly by sampling from the prior distribution. The posterior density of this point is calculated by evaluating the product of the likelihood of the corresponding simulation and prior density. A new (candidate) point, $\mathbf{m}_{new}$, is subsequently created from a proposal distribution that is centred around the current point. This proposal is accepted with probability (Mosegaard & Tarantola 1995):

$$P_{accept} = \min\left\{1, \exp\left[l(\mathbf{m}_{new}|\mathbf{d}) - l(\mathbf{m}_{old}|\mathbf{d})\right]\right\}. \tag{12}$$

If the proposal is accepted, the Markov chain moves to $\mathbf{m}_{new}$; otherwise the chain remains at its old location. After many iterations, the samples that are generated with this approach are distributed according to the underlying posterior distribution. The efficiency of sampling is strongly determined by the scale and orientation of the proposal distribution. If this distribution is incorrectly chosen, then the acceptance rate of candidate points might be unacceptably low, resulting in a very poor efficiency. On the contrary, if the proposal distribution is chosen accurately, the MCMC sampler will rapidly explore the posterior target distribution.

In this work, we use the MT-DREAM(ZS) algorithm (Laloy & Vrugt 2012), which was especially designed to efficiently explore high-dimensional posterior distributions. This is an adaptive MCMC algorithm (e.g. Roberts & Rosenthal 2007), which runs multiple chains in parallel and combines multitry sampling (Liu et al. 2000) with sampling from an archive of past states (Vrugt et al. 2009a, see also Vrugt et al. 2008) to accelerate convergence to a limiting distribution. Furthermore, it is fully parallelized and especially designed to run on a computer cluster. The MT-DREAM(ZS) algorithm satisfies detailed balance and ergodicity, and is generally superior to existing MCMC algorithms (Laloy & Vrugt 2012). To assess convergence, the Gelman–Rubin statistic (Gelman & Rubin 1992) is periodically computed using the last 50 per cent of the samples in each of the chains. Convergence to a limiting distribution is declared if the Gelman–Rubin statistic is less than 1.2 for all parameters. After convergence, we use the last 25 per cent of the samples in each chain to summarize the posterior distribution.
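The acceptance rule of eq. (12) and the Gelman–Rubin convergence check can be illustrated with a minimal sketch. It uses a plain random-walk Metropolis sampler rather than MT-DREAM(ZS), and the target density, step size, chain length and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def accept_probability(log_p_new, log_p_old):
    """Acceptance probability of eq. (12) for a symmetric proposal."""
    # min(0, diff) in log space equals min(1, exp(diff)) and avoids overflow.
    return float(np.exp(min(0.0, log_p_new - log_p_old)))

def run_chain(log_density, start, n_steps, step_size, rng):
    """Random-walk Metropolis chain (a stand-in, not MT-DREAM(ZS))."""
    current = np.asarray(start, dtype=float)
    log_p = log_density(current)
    chain = np.empty((n_steps, current.size))
    for k in range(n_steps):
        candidate = current + step_size * rng.standard_normal(current.size)
        log_p_cand = log_density(candidate)
        if rng.random() < accept_probability(log_p_cand, log_p):
            current, log_p = candidate, log_p_cand
        chain[k] = current
    return chain

def gelman_rubin(chains):
    """Gelman-Rubin statistic per parameter.

    chains has shape (n_chains, n_samples, n_params).
    """
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean(axis=0)       # W
    between = n * chains.mean(axis=1).var(axis=0, ddof=1)  # B
    pooled = (n - 1) / n * within + between / n
    return np.sqrt(pooled / within)

def toy_log_density(m):
    """Hypothetical target: a standard normal in two dimensions."""
    return -0.5 * np.sum(m ** 2)

rng = np.random.default_rng(0)
n_steps = 4000
chains = np.stack([
    run_chain(toy_log_density, rng.uniform(-3.0, 3.0, size=2), n_steps, 1.0, rng)
    for _ in range(3)
])

# Convergence is assessed on the last 50 per cent of each chain.
r_hat = gelman_rubin(chains[:, n_steps // 2:, :])
converged = bool(np.all(r_hat < 1.2))

# The posterior is summarized from the last 25 per cent of each chain.
posterior_samples = chains[:, -(n_steps // 4):, :].reshape(-1, 2)
```

In an actual inversion, `toy_log_density` would be replaced by the sum of the log-prior and the log-likelihood evaluated through the forward solver, which is by far the most expensive step.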
2.6 Uncertainty estimation with most-squares inversion

Most-squares inversion (Jackson 1976; Meju & Hutton 1992) is a deterministic inversion approach where extremal models are sought that fit the data up to a given threshold. First, a best-fitting model $\mathbf{m}_0$ is calculated. Next, a particular cell of the model is chosen and the most-squares inversion is used to find the extremal values of this cell that satisfy a data misfit threshold $\phi_{d,2}^{t} = \phi_{d,2}[\mathbf{m}_0] + \Delta\phi$. All model cells are allowed to vary and two different searches are initiated to derive the smallest and largest acceptable resistivities. If we choose $\Delta\phi = 1$, it can be shown that this results in extremal values that deviate one standard deviation from the best-fitting model (e.g. Kalscheuer et al. 2010). Most-squares inversion has been used to test the validity of other non-linear yet deterministic variance estimates, such as inversion schemes based on singular value decomposition (Kalscheuer & Pedersen 2007). Furthermore, it can also be applied with regularization constraints using the same model regularization weight used to derive the best-fitting model and modifying the threshold misfit to $\phi_{d,2}^{t} = \phi_{d,2}[\mathbf{m}_0] + (1/\lambda^2)\,\phi_{m,2} + \Delta\phi$. The mean and uncertainty of the different cells derived from the most-squares inversion results are compared against their estimates from MCMC simulation.

Figure 1. (a) Synthetic test model with the MCMC model discretization highlighted. Letters A, B, C and D indicate cells for which the inversion results are evaluated against those of deterministic most-squares inversions. Numbered letters V1, V2 and V3 indicate the offsets at which the resistivity marginal posterior pdfs are presented. (b) Model obtained by inverting RMT data (3 per cent error on the impedance elements) with a smoothness constrained deterministic inversion. The mesh in (b) corresponds to the model discretization of the deterministic inversions and the forward modelling mesh. The triangles at the top of the figures indicate the locations of the RMT stations and the ERT electrodes.

3 SYNTHETIC EXAMPLES

To evaluate the impact of the model constraints and data on the posterior pdf, we consider a synthetic 2-D resistivity model. This study is similar to the one presented by Kalscheuer et al. (2010). Two resistors and two conductors with thicknesses of 10 m (Fig. 1a) are immersed in a homogeneous medium of 100 Ω m. A conductor of 10 Ω m and 50 m length overlies a 1000 Ω m, 30-m-long resistor at symmetric positions, and a resistor of 1000 Ω m and 50 m length overlies a 10 Ω m, 30-m-long conductor.
The transverse electric (TE) and transverse magnetic (TM) mode responses of this configuration were computed for the 17 different stations shown in Fig. 1(a). A total of 8 frequencies, regularly spaced on a logarithmic scale in the frequency range of 22–226 kHz, were used, which resulted in a total of 544 data points (17 stations × 8 frequencies × 2 modes × 2 real-valued quantities per impedance element). These synthetic observations were subsequently corrupted with a Gaussian measurement data error with standard deviation equal to 3 per cent of the simulated impedances. To explicitly investigate the effect of the probabilistic properties of the measurement data errors, we also created a second data set by perturbing the error-free simulated forward responses with a zero-mean exponential distribution and a similar mean deviation of 3 per cent of the modelled impedances. Unless stated differently, we refer to the RMT data as the data set contaminated with Gaussian noise in the remainder of this paper.

To generate the synthetic ERT data, forward and reverse pole–dipole configurations were considered with electrodes placed at the positions of the 17 different RMT stations. Similarly to Kalscheuer et al. (2010), four expansion factors (1, 2, 4 and 6) and a basic potential electrode distance of 10 m, and level values of n = 1, ..., 7 for a fixed potential electrode distance were used. This resulted in a data set consisting of 306 different artificial observations. To mimic the effect of measurement data errors, the simulated data were again perturbed with a Gaussian error using a standard deviation of 3 per cent of the simulated apparent resistivities.

The model discretization used in the MCMC inversions is shown in Fig. 1(a). Each cell has dimensions of 5 × 10 m, but the cells located at the left, right and bottom edges of the domain extend until ‘infinity’ (i.e. to accommodate the imposed boundary conditions). This results in a total of 228 different resistivity values that need to be estimated from the experimental data. Fig. 1(b) plots the final model derived from the RMT data using a classical deterministic inversion with smoothness constraints (cf. deGroot-Hedlin & Constable 1990). This model was obtained after three iterations and has a misfit of $\phi_{d,2} = 533$, assuming a 3 per cent error of the impedance values. A homogeneous half-space of 100 Ω m was used as the starting model. The inversion successfully retrieves the two shallow blocks, and indicates the presence of the deep conductor. However, it shows no evidence of the deep resistor. The resistivity value of the shallow conductor is well defined, but the magnitude of the resistor is underdetermined.

We now summarize the results of MCMC simulation using the different penalties of the model structure described previously in Section 2. Following recommendations made by Laloy & Vrugt (2012), we use three different chains and simultaneously create and evaluate five candidate points in each individual chain. To maximize computational efficiency, we run MT-DREAM(ZS) in parallel using 16 different processors. Fifteen processors are used to simultaneously evaluate the different proposals and achieve a linear speed-up, whereas the remaining processor serves to execute the main algorithmic tasks of MT-DREAM(ZS). We invert for the log-resistivity values, and use a Jeffreys prior in the range of $10^{0.5}$ to $10^{3.5}$ Ω m. We also invert for the hyperparameter $r$, which represents the standard deviation of the measurement data errors as a percentage of the measured impedances. We use a Jeffreys prior for $r$ as well, and define its lower and upper bounds as half and double its true value (i.e. 1.5–6 per cent). Appendix B details the log-likelihood that is used to estimate $r$ from the RMT data.

In the first MCMC trial, no constraints on the model structure (see eq. 5) were specified. Convergence of the chains was reached after about 100 000 computational time units (CTUs, cf. Laloy & Vrugt 2012).
Note that a single update of each of the parallel chains requires two CTUs: one for the evaluation of the candidate points, and one for the calculation of the posterior density of the reference set. To provide insights into the properties of the posterior resistivity distribution, Fig. 2 displays four randomly chosen posterior models.
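The unconstrained inversion rests on the Gaussian log-likelihood of eq. (5) with the relative error $r$ treated as a hyperparameter, together with Jeffreys (log-uniform) priors. The sketch below illustrates both ingredients under simplifying assumptions: the data are real-valued, the error model is $\sigma_i = r\,|d_i|$ (the exact form used in Appendix B is not reproduced here), and the noise-free responses are random placeholders rather than output of the RMT forward solver.

```python
import numpy as np

def gaussian_log_likelihood(simulated, observed, r):
    """Eq. (5) with sigma_i = r * |observed_i| (assumed relative-error model)."""
    sigma = r * np.abs(observed)
    n = observed.size
    phi_d2 = np.sum(((simulated - observed) / sigma) ** 2)  # l2 data misfit
    log_like = (-0.5 * n * np.log(2.0 * np.pi)
                - 0.5 * np.sum(np.log(sigma ** 2))
                - 0.5 * phi_d2)
    return log_like, phi_d2

def sample_jeffreys(lower, upper, size, rng):
    """Draw from a Jeffreys prior: uniform in the logarithm of the quantity."""
    return np.exp(rng.uniform(np.log(lower), np.log(upper), size))

rng = np.random.default_rng(1)
n_data = 544                                      # number of RMT data points
true_response = rng.uniform(1.0, 10.0, n_data)    # hypothetical noise-free data
r_true = 0.03                                     # 3 per cent relative error
observed = true_response * (1.0 + r_true * rng.standard_normal(n_data))

# At the true model and true r, the misfit should scatter around N.
log_like_true, misfit_true = gaussian_log_likelihood(true_response, observed, r_true)

# The prior bounds on r (half and double the true value) give lower likelihoods.
log_like_low, _ = gaussian_log_likelihood(true_response, observed, 0.015)
log_like_high, _ = gaussian_log_likelihood(true_response, observed, 0.06)

# One prior draw of the 228 cell resistivities (ohm m).
resistivity_draw = sample_jeffreys(10 ** 0.5, 10 ** 3.5, 228, rng)
```

Because the term $-\frac{1}{2}\log\prod\sigma_i^2$ penalizes large $r$ while the misfit term penalizes small $r$, the data themselves can constrain the error level, which is the idea behind the hierarchical Bayes formulation.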

Figure 2. (a)–(d) Posterior MCMC realizations from the inversion of RMT data with no model constraints other than minimal and maximal parameter bounds of $\rho = 10^{0.5}$ and $10^{3.5}$ Ω m, respectively. It is very difficult to identify a clear correlation between these realizations and the true underlying model in Fig. 1(a).

The data misfit of each realization in Fig. 2 is also listed. The models exhibit an extreme variability and the only structure that is clearly persistent in all four realizations is the shallow conductor. Figs 3(a)–(c) depict ranges of the marginal posterior pdf of the resistivity of three vertical profiles. As expected, these results illustrate that model variability increases with depth. The first 20 m appear rather well constrained by the data, but the uncertainty of the resistivity significantly increases beyond this depth. The data misfit and marginal posterior pdfs of the impedance error are represented with histograms in Figs 3(d) and (e), respectively. The marginal distribution of the data misfit is centred on its a priori expected value of $N$, a finding that inspires confidence in the ability of MT-DREAM(ZS) to converge to the adequate parameter values. In other words, the proposed models do not systematically over- or underfit the calibration data. Note also that the standard deviation of the relative data error is well resolved with a mean value of $r = 0.03$ and a standard deviation of 0.001 (see Fig. 3e).

To determine whether model constraints about the considered subsurface influence the efficiency and robustness of MCMC simulation, a second inversion was performed in which smoothly varying resistivity structures were favoured by including eq. (9) in the prior pdf. The prior distribution in this case is then the same Jeffreys distribution as before with the same parameter ranges, but multiplied by the exponential of eq. (9).
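A minimal sketch of such a smoothness-constrained log-prior is given below. It assumes that the model is stored as a 2-D array of log10-resistivities, uses `np.diff` in place of explicit difference matrices, takes the number of gradients as a stand-in for $M_y + M_z$, and follows the normalization of eq. (9) as printed. The grid size and the value of $\lambda$ are arbitrary illustration values.

```python
import numpy as np

def log_smoothness_prior_l2(log_rho, lam):
    """Eq. (9) evaluated on a 2-D grid of log-resistivities (rows = depth)."""
    grad_y = np.diff(log_rho, axis=1)    # D_y m: horizontal differences
    grad_z = np.diff(log_rho, axis=0)    # D_z m: vertical differences
    n_grad = grad_y.size + grad_z.size   # stands in for M_y + M_z
    phi_m2 = (np.sum(grad_y ** 2) + np.sum(grad_z ** 2)) / lam ** 2
    # Normalization term written as in eq. (9).
    return -n_grad * np.log(2.0 * np.pi * lam ** 2) - 0.5 * phi_m2

def log_prior(log_rho, lam, bounds=(0.5, 3.5)):
    """Bounded Jeffreys prior on resistivity times the smoothness term.

    A Jeffreys prior on resistivity is uniform in log-resistivity, so inside
    the bounds it only adds a constant, which is dropped here.
    """
    if np.any(log_rho < bounds[0]) or np.any(log_rho > bounds[1]):
        return -np.inf
    return log_smoothness_prior_l2(log_rho, lam)

rng = np.random.default_rng(2)
lam = 0.2                                              # assumed weight
smooth = np.full((6, 10), 2.0)                         # homogeneous 100 ohm m
rough = smooth + 0.3 * rng.standard_normal((6, 10))    # spatially erratic model

log_prior_smooth = log_prior(smooth, lam)
log_prior_rough = log_prior(rough, lam)
log_prior_outside = log_prior(np.full((6, 10), 4.0), lam)
```

For a fixed $\lambda$ the homogeneous model receives the highest prior density, and decreasing $\lambda$ increases the penalty on any given roughness, consistent with the role of $\lambda$ as a regularization weight.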
The regularization weight, λ, was assumed to follow a Jeffreys prior with a range of half to two times the optimal value derived by fitting a normal distribution (eq. A2) to the true log-resistivity model. For convenience, we further assumed a similar value of λ in both the vertical and horizontal directions. Numerical results show that convergence was achieved after approximately 75 000 CTUs. Fig. 4 illustrates that the posterior realizations exhibit far less spatial variability than those previously derived for the unconstrained case, although the models are visually quite different. This is further confirmed by the vertical resistivity profiles depicted in Figs 5(a)–(c). Model parameter uncertainty has been significantly reduced, but with the side effect that some features of the true model are no longer accurately represented in the posterior pdf. Indeed, the two conductors and the shallow resistor are clearly detected, but the deep resistor is not adequately resolved. Yet, the MCMC inferred resistivity increases with depth, which is consistent with the observations. The marginal distribution of the data misfit presented in Fig. 5(d) again centres on the true value, and is quite similar to that of the unconstrained inversion trial. The same is true for the data error estimation (Fig. 5f): the true value is obtained and the variability is similar to that previously observed in Fig. 3(e). The estimated value of λ is slightly larger than its previous counterpart derived from the true log-resistivity model. This finding is to be expected and is a direct consequence of the influence of the data misfit term in the estimation (i.e. less weight is put on the model constraints). We now summarize the MCMC results with an l1 measure (see eq. 11) for the model constraints. For this inversion, we use a data set contaminated with exponentially distributed errors and the log-likelihood function given by eq. (7).
For consistency, we again use a Jeffreys prior for all regular model parameters (resistivities) and hyperparameters (regularization weight and impedance error). The resistivity and impedance error prior bounds remain the same as in the previous examples, but the prior of the regularization weight ranges from half (0.055) to four (0.44) times the value found by fitting eq. (11) to the true resistivity model. We purposely increased the upper bound of λ so that the posterior pdf was unaffected by the a priori bounds. About 67 000 CTUs were needed to declare convergence to a limiting distribution. The posterior realizations presented in Fig. 6 are rather homogeneous, and display even less variability than their counterparts previously depicted in Fig. 4 using the least-squares model constraints. The two shallow features are clearly identified, and a deep conductor can be seen in three of the four figures. The deep resistor, however, is not evident in any of the models. This becomes clearer if we plot the three depth profiles (Figs 7a–c). The 95 per cent posterior uncertainty ranges are comparable to those obtained with the inversion using the l2 model constraints. The data misfit and the impedance errors are very well recovered. However, the posterior mean of λ is substantially larger than its

value derived from fitting the true model structure to an exponential model (0.11).

Figure 3. MCMC inversion of RMT data without model constraints. (a)–(c) Marginal posterior pdf of the vertical profiles V1, V2 and V3 corresponding to the offsets (a) 55 m, (b) 95 m and (c) 135 m. The red line represents the true values, while the solid and dashed blue lines represent the mean and P2.5 and P97.5 percentiles, respectively. It is seen that below ∼30 m the posterior models span the full prior range of resistivity. Grey colour-coding indicates the full posterior pdf range. Histograms of the (d) data misfit and (e) the inferred impedance error marginal posterior pdf. The red crosses at the top of the histograms depict the values corresponding to (d) the data misfit of the true model and (e) the true error standard deviation.

Finally, we jointly invert the RMT and ERT data using least-squares smoothness constraints. In this particular case, the log-likelihood function is given by the sum of those corresponding to each data set. A derivation of the ERT likelihood is presented in Appendix C. This inversion includes the ERT data error, which constitutes a new hyperparameter to be estimated. We use a Jeffreys prior for this parameter, with bounds given by half and twice its true value. Convergence of the chains was achieved after about 60 000 CTUs. The posterior realizations shown in Fig. 8 clearly resolve the two conductors and the two resistors. The vertical resistivity profiles presented in Figs 9(a)–(c) confirm that joint inversion improves parameter convergence. Yet, the resistor below the conductor (Fig. 9a) is not particularly well resolved. However, its magnitude is much better estimated than in the previous inversions. The model constraints enforce smooth transitions from the conductor to the resistor and vice versa, which complicates estimation of the actual magnitudes in the vicinity of these transitions (e.g. Fig.
9(c) below the conductor). The posterior histograms of the RMT (Fig. 9d) and ERT data (Fig. 9e) misfits are closely centred on their true values, a desirable finding that indicates that both data types are equally important in the fitting of the parameters. The marginal posterior distribution of the regularization weight (Fig. 9f) demonstrates a tendency towards somewhat larger values than obtained from the RMT data. This is not surprising, as new data have been added to the likelihood function. For completeness, Figs 9(g) and (h) plot histograms of the impedance and apparent resistivity error. The posterior ranges encompass the synthetic true values, although the most likely (expected) values are somewhat smaller. This demonstrates that the measurement errors of both data types can be successfully retrieved from the joint inversion presented herein. To provide more insights into the behaviour of the MT-DREAM(ZS) algorithm, Fig. 10 presents the evolution of the sampled model structure in one randomly chosen chain as a function of the number of MCMC realizations. The true value and those inferred from the different MCMC trials are given by the l2 norm of the difference operator applied to the model vector in the horizontal and vertical directions (i.e. the term enclosed in parentheses in eq. 9). We restrict our attention to the posterior samples, that is, those obtained after burn-in (cf. Laloy & Vrugt 2012). The MCMC inversion without model constraints (Fig. 10a) converges to a model structure that overestimates the actual variability observed in the true model. The true model is not contained in the sampled posterior pdf. When smoothness constraints are explicitly

Figure 4. (a)–(d) Posterior MCMC realizations obtained by inverting the RMT data with least-squares smoothness constraints. All four anomalous bodies are somewhat indicated, even if it is only the upper left conductive body that is well resolved.

Figure 5. MCMC inversion of RMT data with least-squares smoothness constraints. (a)–(c) Marginal posterior pdfs of the vertical profiles V1, V2 and V3 corresponding to the offsets (a) 55 m, (b) 95 m and (c) 135 m. The red line represents the true values, while the solid and dashed blue lines represent the mean and P2.5 and P97.5 percentiles, respectively. Grey colour-coding indicates the full posterior pdf range. It is clear that the smoothness constraints have largely decreased model variability. Histograms of the (d) data misfit, (e) regularization weight and (f) impedance error marginal posterior pdf. The red crosses at the top of the histograms depict (d) and (f) the true values and (e) the value given by fitting eq. (9) to the true log-resistivity model.

Figure 6. (a)–(d) Posterior MCMC realizations obtained by inverting the RMT data with l1 smoothness constraints. The upper anomalous bodies are resolved, but not the lower ones.

Figure 7. MCMC inversion of RMT data with l1 smoothness constraints. (a)–(c) Resistivity marginal posterior pdf of the vertical profiles V1, V2 and V3 corresponding to the offsets (a) 55 m, (b) 95 m and (c) 135 m. The red line represents the true values, while the solid and dashed blue lines represent the mean and P2.5 and P97.5 percentiles, respectively. Grey colour-coding indicates the full posterior pdf range. The parameters' uncertainties are comparable to those of the l2 smoothness constraints. Histograms of the (d) data misfit, (e) regularization weight and (f) impedance error marginal posterior pdf. The red crosses at the top of the histograms of (d) and (f) depict the true values. (e) The value given by fitting eq. (11) to the true log-resistivity model (0.11) is not contained in the marginal posterior pdf.

included in the formulation of the log-likelihood function, the posterior models converge much closer to the true model, but with insufficient structure. This is particularly true if the l1 norm is used. The average model structure in this case is 24, which is about half the true value. The correspondence between the true model and posterior realizations improves somewhat if an l2 norm is used. Indeed, the sampled chain trajectory moves closer to the dashed black line, but the actual model variability is nevertheless still underestimated.
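The model structure measure tracked in Fig. 10 can be sketched as follows. This minimal Python example assumes that the difference operator is a first-order difference between neighbouring cells, and that the l2 measure is reported as a sum of squared differences (the exact scaling in eq. (9) is not reproduced in this section). The l1 variant simply sums absolute differences. The function name and the grid layout are hypothetical.

```python
def model_structure(grid, norm="l2"):
    """Sum horizontal and vertical first differences of a 2-D model.

    grid : list of rows, each a list of log10-resistivities (rows = depth)
    norm : "l2" sums squared differences, "l1" sums absolute differences
    """
    if norm == "l2":
        measure = lambda d: d * d
    elif norm == "l1":
        measure = abs
    else:
        raise ValueError("norm must be 'l1' or 'l2'")

    total = 0.0
    n_rows, n_cols = len(grid), len(grid[0])
    for i in range(n_rows):
        for j in range(n_cols):
            if j + 1 < n_cols:  # difference to the horizontal neighbour
                total += measure(grid[i][j + 1] - grid[i][j])
            if i + 1 < n_rows:  # difference to the vertical neighbour
                total += measure(grid[i + 1][j] - grid[i][j])
    return total


# A homogeneous model has no structure; one anomalous cell adds some.
flat = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
blocky = [[2.0, 0.0, 2.0], [2.0, 2.0, 2.0]]
```

Evaluating such a measure for every sample of a chain gives a trace comparable in spirit to the curves in Fig. 10, against which the value computed for the true model can be compared.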

Figure 8. (a)–(d) Posterior MCMC realizations obtained by joint inversion of RMT and ERT data with least-squares smoothness constraints. The anomalous bodies are better defined compared with the inversions of RMT data alone (see Fig. 4).

Fortunately, a joint inversion of RMT and ERT data provides posterior realizations with properties similar to that of the true model, especially if an l2 norm is used for the model constraints. Table 1 lists the centre values and standard deviations estimated with the MCMC and most-squares inversions for the cells shown in Fig. 1(a). To enable a comparison between both methods, we calculate two different standard deviations from the posterior mean MCMC model: one for resistivity decrease and one for resistivity increase. We performed three most-squares inversions: one for the RMT data with smoothness constraints, one for the ERT data with smoothness constraints, and one for joint inversion with smoothness constraints. To find the best-fitting models, we locate the sample of the MCMC chains with the largest value of the sum of eqs (5) and (9). This model was then used to initiate a deterministic inversion with additional Marquardt–Levenberg damping (cf. Kalscheuer et al. 2010) to attempt to find a model with an even larger summed log-likelihood. The resulting model was then used by the most-squares inversion to find the extremal values of each cell. In both inversion steps, we used the mean model regularization weight determined by the MCMC inversions. As seen in Fig. 1(b), the model discretization is finer in the horizontal direction for the most-squares inversion. At each iteration we therefore averaged the two resistivities involved in each particular cell to force a single resistivity value and make it comparable to the MCMC inversion cell. The standard deviations summarized in Table 1 show that the two types of inversions provide similar uncertainty estimates.
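The two-sided standard deviations mentioned above can be illustrated with a short sketch. The text does not spell out the formula, so the example below assumes one plausible definition: the root-mean-square deviation from the posterior mean, computed separately over samples below and above that mean. The function name and sample values are hypothetical.

```python
import math


def two_sided_sd(samples):
    """Return (mean, sd_minus, sd_plus) for posterior samples of one cell.

    Assumed definition: RMS deviation from the mean, evaluated separately
    for samples below the mean (decrease) and above it (increase).
    """
    mean = sum(samples) / len(samples)
    below = [(x - mean) ** 2 for x in samples if x < mean]
    above = [(x - mean) ** 2 for x in samples if x > mean]
    sd_minus = math.sqrt(sum(below) / len(below)) if below else 0.0
    sd_plus = math.sqrt(sum(above) / len(above)) if above else 0.0
    return mean, sd_minus, sd_plus


# Hypothetical log10-resistivity samples with a long tail towards high values.
mean, sd_minus, sd_plus = two_sided_sd([0.0, 0.0, 0.0, 4.0])
```

For a symmetric posterior the two values coincide, whereas a skewed posterior yields different values on each side, which is what the (–/+) notation in Table 1 reports.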
However, the standard deviations derived with the most-squares inversion are consistently larger than those derived with MCMC simulation. For example, in the single inversions of the RMT data, cell B has standard deviations of 0.18/0.19 for the MCMC inversion, and 0.24/0.24 for the most-squares inversion. These differences appear larger for the joint inversion. For instance, cell A has standard deviations of 0.08/0.08 with the MCMC inversion, but with the most-squares inversion these values are doubled. Furthermore, we see that the mean value estimates are quite different for the two types of inversion. For example, the mean value of cell A for the ERT data and MCMC inversion is 1.0, whereas its counterpart derived from the most-squares inversion is 1.16. Thus, although the width of the uncertainty ranges can be quite similar, the mean value might induce shifts in the posterior distribution.

4 FIELD DATA EXAMPLE: SKEDIGA AREA (SWEDEN)

We now apply our methodology to real-world RMT data. A tensor RMT survey was conducted in Skediga (Sweden) to determine the geometry of a glaciofluvial aquifer system composed of a sand/gravel formation overlying crystalline basement. The aquifer system is overlain by a formation dominated by clay lenses. We use the same RMT data as Kalscheuer & Pedersen (2007), that is, 528 data points consisting of apparent resistivities and phases of the determinant mode (Pedersen & Engels 2005), acquired at 22 different stations using 12 frequencies in the range of 4–181 kHz. An estimate of the data error was provided by the impedance estimation from the electric and magnetic field measurements, and an error floor of 1.5 per cent was used as in the previous studies (Pedersen et al. 2005; Kalscheuer & Pedersen 2007). The error floor constitutes a lower bound to the estimated data errors such that no single datum has an error estimate smaller than this value. Fig.
11(a) shows the model obtained by Kalscheuer & Pedersen (2007) derived from a deterministic inversion with smoothness constraints using a half-space of 1000 Ω m as the initial model. The model was obtained after four iterations and has a data misfit of φd,2 = 1141. Pedersen et al. (2005) interpret the 30 Ω m isoline (i.e. the transition between the two greenish colours) as the lower bound of the clay lenses. According to boreholes in the vicinity of the profiles, the transition from the aquifer to the underlying crystalline basement occurs at about 30 m depth (Kalscheuer & Pedersen 2007). We ran the MT-DREAM(ZS) algorithm on a 2-D domain consisting of 288 model parameters using the l2 smoothness constraints.
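The data count and the error floor described for the Skediga data set can be illustrated with a few lines of Python. The sketch assumes that the 1.5 per cent floor applies to relative errors and that the 528 data arise as two quantities (apparent resistivity and phase) per station and frequency; the error values below are hypothetical.

```python
N_STATIONS = 22      # from the text
N_FREQUENCIES = 12   # from the text
N_QUANTITIES = 2     # apparent resistivity and phase (assumed split)
ERROR_FLOOR = 0.015  # 1.5 per cent, assumed to be a relative error


def apply_error_floor(relative_errors, floor=ERROR_FLOOR):
    """Raise every relative error estimate to at least `floor`."""
    return [max(err, floor) for err in relative_errors]


n_data = N_STATIONS * N_FREQUENCIES * N_QUANTITIES

# Hypothetical relative error estimates for three data points.
floored = apply_error_floor([0.004, 0.015, 0.032])
```

Only the first hypothetical estimate lies below the floor and is raised to it; the other two are left unchanged.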

Figure 9. MCMC joint inversion of RMT and ERT data with least-squares smoothness constraints. (a)–(c) Resistivity marginal posterior pdfs of the vertical profiles V1, V2 and V3 corresponding to the offsets (a) 55 m, (b) 95 m and (c) 135 m. The red line represents the true values, while the solid and dashed blue lines represent the mean and P2.5 and P97.5 percentiles, respectively. Grey colour-coding indicates the full posterior pdf range. The range of the posterior pdf is rather small, but essentially covers the true model. Histograms of the (d) RMT data misfit, (e) ERT data misfit, (f) regularization weight, (g) RMT impedance error and (h) ERT apparent resistivity error marginal posterior pdfs. The red crosses at the top of the histograms depict (d), (e), (g) and (h) the true values and (f) the value given by fitting eq. (9) to the true log-resistivity model.

Each resistivity cell is of size 5 × 10 m, except for the edges that extend to the end of the forward mesh (1300 m in each direction). We used Jeffreys priors for ρ in the range of 10^0.5 to 10^3.5 Ω m. In addition, we estimated two hyperparameters: the regularization weight λ and a data error correction factor. The latter represents a scaling factor of the errors and error floor. We assume a Jeffreys prior for this scaling factor, with ranges between the logarithms of 0.5 and 4. Convergence was reached after approximately 150 000 CTUs. Figs 11(b) and (c) show two realizations from the MCMC derived posterior pdf. The two models clearly indicate two shallow conductors at profile offsets of 40 m and between 170 and 220 m. A deep resistor is also found that is deeper on the left side of the profile than in the middle and that disappears on the right side. A mean posterior model was constructed by taking the mean value of the different realizations of the posterior pdf (Fig. 11d).
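The ensemble statistics used in this section (the posterior mean model and the P2.5 and P97.5 percentiles) can be computed cell by cell from the posterior realizations. The following Python sketch assumes that averaging is done on log10 resistivity and uses linear interpolation between order statistics for the percentiles; neither choice is stated in the text, and the helper names are hypothetical.

```python
def percentile(values, p):
    """Percentile p (0-100) by linear interpolation between sorted values."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100.0
    lower = int(k)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (k - lower)


def cell_statistics(realizations):
    """Cell-wise mean, P2.5 and P97.5 over a list of flattened models."""
    stats = []
    for cell_values in zip(*realizations):  # iterate over cells
        mean = sum(cell_values) / len(cell_values)
        stats.append((mean, percentile(cell_values, 2.5),
                      percentile(cell_values, 97.5)))
    return stats


# Three hypothetical realizations of a two-cell model (log10 resistivity).
stats = cell_statistics([[1.0, 3.0], [2.0, 3.0], [3.0, 3.0]])
```

In this toy case the first cell varies between realizations and so has a non-zero percentile range, while the second cell is identical in all realizations and its range collapses to a single value.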
This model is largely comparable to the model obtained by the deterministic inversion; the clay–sand/gravel transitions are located at similar depths nearly everywhere along the profile and the overall basement geometry of the two different models corresponds well (this was also noted with the ensemble mean of the synthetic example using least-squares smoothness constraints compared to Fig. 1(b), not shown here). Some deviations are possibly due to differences in model discretization, but are more probably due to differences in data fitting, as discussed below. We present four vertical profiles of the posterior pdf in Figs 12(a)–(d), at offsets (a) y = 50 m, (b) y = 100 m, (c) y = 150 m and (d) y = 200 m. As expected, the profiles show an increase in model variability below the conductive clay lenses. Furthermore, we see that the clay–sand/gravel transitions are much better determined at places where the aquifer stretches up to the surface (Figs 12b and c). In these regions there is no overlap between the two resistivity intervals, whereas in the other two profiles the transition happens more smoothly, probably due to the model constraints. The transition to a fixed basement resistivity is also smooth because of the model regularization. Magnitudes are expected to be above ρ = c000 m for the crystalline basement (Pedersen et al. 2005). These values are reached at all profiles except in Fig. 12(d), probably

due to the important clay thickness in the shallow part of the model. Figs 12(e) and (f) show marginal distributions of the posterior data misfit and the data error correction factor. These two variables are related. The mean data misfit is 542 and the number of data lies within the estimated data misfit uncertainty range. The mean data error correction factor is 1.84, hence data errors are estimated to be almost twice those initially assumed for the impedances. The data misfits presented in Fig. 11 are calculated using data errors corrected with this value, and they show that the model given by the deterministic inversion appears to be overfitting the data. This, in turn, could explain the differences in magnitude observed between the two models. An inversion of the Skediga data set with the same priors for the error scaling factor and resistivity values but with no model constraints converged to a similar marginal posterior pdf of the impedance errors (not shown). In accordance with the synthetic example, the posterior pdf of the unconstrained inversion contains models with unrealistically high spatial variability.

Figure 10. Posterior least-squares model structure metric as a function of realization number for the different types of MCMC inversions considered. (a) MCMC inversion of RMT data without model constraints. This inversion needs many more realizations to converge than all other cases and has a much larger average model structure. (b) MCMC inversions with model constraints. The dashed black line represents the true value. The joint inversion of RMT and ERT is the only case that proposes models with the same amount of model structure as the true model.

Table 1. Mean values and standard deviations of the cells highlighted in Fig. 1(a) for individual and joint MCMC and most-squares (MS) inversions with different types of model constraints. The centre values are the mean values for the MCMC inversions and the parameter derived from the best-fitting MCMC model for the most-squares inversions (cf. Section 3 for details). The standard deviations (SD) are given in logarithmic units that are calculated individually for each side of the centre value (–/+). All centre values and SDs are in log10 ρ (Ω m).

Type of inversion     Model constraint   Cell A             Cell B             Cell C             Cell D
                                         Centre  SD (–/+)   Centre  SD (–/+)   Centre  SD (–/+)   Centre  SD (–/+)
Individual RMT MCMC   l2 - difference    0.97    0.12/0.11  2.04    0.18/0.19  2.36    0.11/0.15  1.36    0.21/0.21
Individual RMT MS     l2 - difference    0.98    0.15/0.12  1.90    0.24/0.24  2.36    0.19/0.17  1.09    0.22/0.26
Individual ERT MCMC   l2 - difference    1.00    0.10/0.09  2.00    0.12/0.10  2.65    0.11/0.11  2.05    0.14/0.14
Individual ERT MS     l2 - difference    1.16    0.17/0.17  1.63    0.23/0.24  2.64    0.18/0.18  2.12    0.23/0.23
Joint MCMC            l2 - difference    0.94    0.08/0.08  2.35    0.18/0.18  2.78    0.17/0.15  1.13    0.23/0.25
Joint MS              l2 - difference    0.99    0.15/0.16  2.18    0.25/0.25  3.11    0.20/0.18  1.05    0.22/0.26
True values           –                  1.0     N/A        3.0     N/A        3.0     N/A        1.0     N/A

5 DISCUSSION

We have presented the first fully 2-D pixel-based MCMC inversion of plane-wave EM data. While the presented results indicate that the inversion can be successfully addressed within a probabilistic framework, notable features and issues arise that are discussed in more detail below. A comparison between the most-squares and MCMC inversions showed that while the former tends to provide slightly larger uncertainty estimates, the results of the two approaches are comparable. A more substantial difference between the methods relates to

the centre values from which the uncertainty estimates are derived. This difference is mainly caused by the fact that the most-squares inversion starts from a model that minimizes the combined data and model misfit function, while the MCMC analysis is based on an ensemble mean model obtained from a combination of the marginal estimates of individual variables. The minimization approach used in the most-squares inversion is not rigorously formal, as the best model should be the one that best represents the statistics of the posterior pdf rather than the one that minimizes the combined data and model misfit function. Calculating maximal and minimal perturbations of specific parameters from this 'optimal' model could be the reason for the 'shifted' and slightly larger uncertainty ranges compared to the MCMC estimates that describe the ensemble statistics of the posterior pdf.

Figure 11. (a) Deterministic inversion model obtained from RMT data acquired at Skediga, Sweden (modified after Kalscheuer & Pedersen 2007). Numbered letters V1, V2, V3 and V4 indicate the offsets at which the resistivity marginal posterior pdfs are presented in Fig. 12. (b)–(c) Posterior MCMC realizations obtained by inversion of the same data with least-squares smoothness constraints. (d) Ensemble posterior mean model from MCMC inversion. The data misfits are calculated with errors inferred from the mean value of Fig. 12(e). Note the strong similarity between the models in (a) and (d).

The type of model parameterization and the number of parameters have an important impact on the posterior pdfs. Laloy et al. (2012) and Linde & Vrugt (2013) used model parameterizations based on Legendre polynomials and the discrete cosine transform, respectively, to show how improper model truncations may lead to biased model estimates. To alleviate this problem, we considered

a finely discretized model. However, the unconstrained inversions converge to models that exhibit much more structure than the true model (see Fig. 10a), which is in agreement with Linde & Vrugt (2013). When running inversions with coarser grids (i.e. 10 × 10 m cells, not shown herein), the proposed models and the true model are in much better agreement and the uncertainty ranges of the parameters are strongly reduced. This highlights the fundamental trade-off between model resolution and variability: allowing a higher spatial resolution by using smaller model cells implies larger resistivity ranges for each pixel. To obtain meaningful results for fine model discretizations, it appears essential to add constraints regarding the model structure. As noted by Grandis et al. (1999) for the 1-D MT problem, the use of least-squares smoothness constraints reduced the presence of unrealistic oscillations in the models and led to smaller and more realistic estimates of parameter uncertainty. Unfortunately, the models provided by the constrained inversions did not contain all the features of the true model.

Figure 12. MCMC inversion of the Skediga data set with least-squares model constraints. (a)–(d) Resistivity marginal posterior pdf of the vertical profiles V1, V2, V3 and V4 corresponding to the offsets (a) 50 m, (b) 100 m, (c) 150 m and (d) 200 m of the model shown in Fig. 11(d). The solid and dashed blue lines represent the mean and P2.5 and P97.5 percentiles, respectively. The red line represents the values obtained with the deterministic inversion (see Fig. 11a). Grey colour-coding indicates the full posterior pdf range. (e)–(f) Histograms of the (e) data misfit and (f) impedance error scaling factor marginal posterior pdfs. The red cross at the top of (e) depicts the number of data.
In regions where the data are not sensitive enough, the model constraints strongly affect the resulting parameter values and result in biased estimates. The problem of biased estimates was partly mitigated through joint inversion of the plane-wave EM data with ERT. The inversion of the ERT data alone with l2 smoothness constraints (not shown) did recover the deep resistor, albeit with a smaller magnitude than the true value, but not the deep conductor that was resolved by the RMT data. As seen in Fig. 10, when inverting the ERT data and plane-wave EM data separately, constraining the model structure led to oversimplified models, whereas the joint inversion led to the correct amount of model structure for this specific application. The models obtained from the plane-wave EM data could clearly be improved by adding lower frequencies, while a larger electrode spread would improve the ERT models. However, our intention was not to determine an optimal experimental design, but to evaluate the implications of the different constraints applied to the inferred subsurface models. In this sense, we see how the combination of two complementary methods helps to better estimate the resistivity models in terms of structure and magnitude, and effectively reduces the weight given to the model constraints. Other strategies can also be applied to tackle the aforementioned issues. A pre-supposed geostatistical model or summary statistics derived from training images can easily be incorporated in the Bayesian framework (e.g. Cordua et al. 2012). Clearly, the resulting models would be much closer to the true model if the true model structure was known and we penalized deviations from this value in eqs (9) and (11), rather than penalizing deviations from zero variability. Reliable information of this kind is often not

available and strong assumptions about the model structure will to a certain degree produce biased model estimates. Nevertheless, it might be favourable to test the resulting models under such restrictive assumptions, rather than to obtain models that are too variable to be meaningful. Alternatively, one may consider a set of possible model parameterizations, model discretizations and/or model constraints that may seem equally suitable for a specific problem. In the spirit of Oldenburg & Li (1999), one may test the different hypotheses of the model structure and compare the results. More quantitatively, a 2-D trans-dimensional inversion algorithm could be implemented. The trans-dimensional algorithm would, for a chosen parameterization, estimate the appropriate degree of discretization, while inherently favouring models with fewer parameters (see Bodin & Sambridge 2009 for a 2-D application to seismic tomography). The implementation of such a method is beyond the scope of the present work. Possibly more interesting than determining appropriate model discretizations would be to determine preferred model parameterizations. In fact, a formal theory based on Bayes factors (e.g. Kass & Raftery 1995) could be used to evaluate evidence in favour of a null hypothesis (see Khan & Mosegaard 2002 and Khan et al. 2004 for applications of Bayes factors to study the physical properties of the Moon). Bayes factors could be used within a model selection strategy to evaluate the a posteriori probability of different model parameterizations and discretizations. We leave such a study of Bayesian hypothesis testing for future work.

6 CONCLUSIONS

We presented the first pixel-based and fully 2-D MCMC inversion of plane-wave EM and ERT data. The results of the inversion include the posterior mean and uncertainty of the model parameter estimates.
Numerical findings demonstrated a necessity to add explicit constraints on the model structure to obtain meaningful results. These constraints were designed such that they favour model parsimony, and consequently the posterior ensemble mean was shifted closer to its true value. However, model interpretation should be done with some care, acknowledging that models may be biased in regions with insufficient data sensitivity, and that uncertainty estimates are determined by the imposed model constraints. The MCMC inversion not only converged appropriately to the posterior mean model, but the posterior realizations also adequately estimated the actual data errors, including a regularization weight that favours the appropriate model structure. Joint inversion of the ERT and plane-wave EM data provided the best model estimates. The inversion methodology was applied to real RMT aquifer data from Sweden. The MCMC derived posterior mean model was very similar to the model geometry obtained from a deterministic inversion. In addition, the MT-DREAM(ZS) algorithm retrieved a correction of the impedance errors, which suggested that the deterministic inversion might have overfitted the experimental data. The differences among the resistivity magnitudes of the two different models may hence be explained by a difference in data fitting. Future work should involve diagnostic criteria and methodologies that help favour model selection. In this regard, Bayes factors may be of particular interest.

ACKNOWLEDGEMENTS

We thank Jinsong Chen, Amir Khan, one anonymous reviewer, and the editor Mark Everett for their very helpful comments that improved the quality of the paper. Laust B. Pedersen, from Uppsala University, kindly provided the RMT data from Skediga, Sweden. The source code of the MT-DREAM(ZS) algorithm can be obtained from the last author upon request. This research was supported by the Swiss National Science Foundation under grant 200021–130200.
REFERENCES

Alumbaugh, D.L. & Newman, G.A., 2000. Image appraisal for 2-D and 3-D electromagnetic inversion, Geophysics, 65(5), 1455–1467.
Besag, J., Green, P., Higdon, D. & Mengersen, K., 1995. Bayesian computation and stochastic systems, Stat. Sci., 10, 3–41.
Bodin, T. & Sambridge, M., 2009. Seismic tomography with the reversible jump algorithm, Geophys. J. Int., 178, 1411–1436.
Buland, A. & Kolbjornsen, O., 2012. Bayesian inversion of CSEM and magnetotelluric data, Geophysics, 77(1), E33–E42.
Chen, J., Kemna, A. & Hubbard, S.S., 2008. A comparison between Gauss-Newton and Markov-chain Monte Carlo-based methods for inverting spectral induced-polarization data for Cole-Cole parameters, Geophysics, 73(6), F247–F259.
Chen, J., Vasco, D., Rubin, Y. & Hou, Z., 2007. A Bayesian model for gas saturation estimation using marine seismic AVA and CSEM data, Geophysics, 72(2), WA85–WA95.
Chen, J., Hoversten, G.M., Key, K., Nordquist, G. & Cumming, W., 2012. Stochastic inversion of magnetotelluric data using a sharp boundary parameterization and application to a geothermal site, Geophysics, 77(4), E265–E279.
Constable, S.C., Parker, R.L. & Constable, C.G., 1987. Occam’s inversion: a practical algorithm for generating smooth models from electromagnetic sounding data, Geophysics, 52, 289–300.
Cordua, K.S., Hansen, T.M. & Mosegaard, K., 2012. Monte Carlo full-waveform inversion of cross hole GPR data using multiple-point geostatistics a priori information, Geophysics, 77, H19–H31.
deGroot-Hedlin, C.D. & Constable, S.C., 1990. Occam’s inversion to generate smooth, two-dimensional models from magnetotelluric data, Geophysics, 55, 1613–1624.
Dosso, S.E. & Oldenburg, D.W., 1991. Magnetotelluric appraisal using simulated annealing, Geophys. J. Int., 106, 379–395.
Everett, M.E. & Schultz, A., 1993. Two-dimensional magnetotelluric inversion using a genetic algorithm, J. Geomagn. Geoelectr., 45, 1013–1026.
Farquharson, C.G., 2008. Constructing piecewise-constant models in multidimensional minimum-structure inversions, Geophysics, 73, K1–K9.
Farquharson, C.G. & Oldenburg, D.W., 1998. Nonlinear inversion using general measures of data misfit and model structure, Geophys. J. Int., 134, 213–227.
Fischer, G. & LeQuang, B.V., 1981. Topography and minimization of the standard deviation in one-dimensional magnetotelluric modeling, Geophys. J. R. astr. Soc., 67, 279–292.
Gelman, A.G. & Rubin, D.B., 1992. Inference from iterative simulation using multiple sequences, Stat. Sci., 7, 457–472.
Grandis, H., Menvielle, M. & Roussignol, M., 2002. Thin-sheet electromagnetic modeling using Monte Carlo Markov chain (MCMC) algorithm, Earth Planets Space, 54, 511–521.
Grandis, H., Menvielle, M. & Roussignol, M., 1999. Bayesian inversion with Markov chains—I. The magnetotelluric one-dimensional case, Geophys. J. Int., 138, 757–768.
Guo, R., Dosso, S.E., Liu, J., Dettmer, J. & Tong, X., 2011. Non-linearity in Bayesian 1-D magnetotelluric inversion, Geophys. J. Int., 185, 663–675.
Hou, Z.S., Rubin, Y., Hoversten, G.M., Vasco, D. & Chen, J.S., 2006. Reservoir-parameter identification using minimum relative entropy-based Bayesian inversion of seismic AVA and marine CSEM data, Geophysics, 71, O77–O88.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems, Univ. of Mich. Press.
Jackson, D., 1976. Most squares inversion, J. geophys. Res., 81, 1027–1030.
Jeffreys, H., 1939. Theory of probability, Oxford Univ. Press, Inc.
Kalscheuer, T. & Pedersen, L.B., 2007. A non-linear truncated SVD variance and resolution analysis of two-dimensional magnetotelluric models, Geophys. J. Int., 169, 435–447.
Kalscheuer, T., García Juanatey, M., Meqbel, N. & Pedersen, L.B., 2010. Non-linear model error and resolution properties from two-dimensional single and joint inversions of direct current resistivity and radiomagnetotelluric data, Geophys. J. Int., 182, 1174–1188.
Kass, R.E. & Raftery, A.E., 1995. Bayes factors, J. Am. Stat. Assoc., 90(430), 773–795.
Khan, A. & Mosegaard, K., 2002. An inquiry into the lunar interior: a nonlinear inversion of the Apollo lunar seismic data, J. geophys. Res., 107, E6.
Khan, A., Connolly, J.A.D. & Olsen, N., 2006. Constraining the composition and thermal state of the mantle beneath Europe from inversion of long-period electromagnetic sounding data, J. geophys. Res., 111, 1978–2012.
Khan, A., Mosegaard, K., Williams, J.G. & Lognonné, P., 2004. Does the Moon possess a molten core? Probing the deep lunar interior using results from LLR and lunar Prospector, J. geophys. Res., 109, E09007.
Kirkpatrick, S., 1984. Optimization by simulated annealing: quantitative studies, J. Stat. Phys., 34, 975–986.
Laloy, E. & Vrugt, J.A., 2012. High-dimensional posterior exploration of hydrological models using multiple-try DREAM(ZS) and high-performance computing, Water Resour. Res., 48, W01526, doi:10.1029/2011WR010608.
Laloy, E., Linde, N. & Vrugt, J.A., 2012. Mass conservative three-dimensional water tracer distribution from MCMC inversion of time-lapse GPR data, Water Resour. Res., 48, W07510, doi:10.1029/2011WR011238.
Linde, N. & Vrugt, J., 2013. Distributed soil moisture from crosshole ground-penetrating radar using Markov chain Monte Carlo simulation, Vadose Zone J., 11.
Liu, J.S., Liang, F. & Wong, W.H., 2000. The multiple-try method and local optimization in Metropolis sampling, J. Am. Stat. Assoc., 95(449), 121–134.
Malinverno, A., 2000. A Bayesian criterion for simplicity in inverse problem parametrization, Geophys. J. Int., 140, 267–285.
Malinverno, A., 2002. Parsimonious Bayesian Markov chain Monte Carlo inversion in a nonlinear geophysical problem, Geophys. J. Int., 151, 675–688.
Malinverno, A. & Briggs, V.A., 2004. Expanded uncertainty quantification in inverse problems: hierarchical Bayes and empirical Bayes, Geophysics, 69, 1005–1016.
Meju, M.A. & Hutton, V.R.S., 1992. Iterative most-squares inversion: application to magnetotelluric data, Geophys. J. Int., 108, 758–766.
Menke, W., 1989. Geophysical Data Analysis: Discrete Inverse Theory, Vol. 45, International Geophysics Series, Academic Press.
Minsley, B.J., 2011. A trans-dimensional Bayesian Markov chain Monte Carlo algorithm for model assessment using frequency-domain electromagnetic data, Geophys. J. Int., 187, 252–272.
Mosegaard, K. & Tarantola, A., 1995. Monte Carlo sampling of solutions to inverse problems, J. geophys. Res., 100(B7), 12–431.
Oldenburg, D.W. & Li, Y.G., 1999. Estimating depth of investigation in dc resistivity and IP surveys, Geophysics, 64, 403–416.
Pedersen, L.B. & Engels, M., 2005. Routine 2D inversion of magnetotelluric data using the determinant of the impedance tensor, Geophysics, 70(2), G33–G41.
Pedersen, L., Bastani, M. & Dynesius, L., 2005. Groundwater exploration using combined controlled-source and radiomagnetotelluric techniques, Geophysics, 70, G8–G15.
Pérez-Flores, M.A. & Schultz, A., 2002. Application of 2-D inversion with genetic algorithms to magnetotelluric data from geothermal areas, Earth Planets Space, 54, 607–616.
Ray, A. & Key, K., 2012. Bayesian inversion of marine CSEM data with a trans-dimensional self parametrizing algorithm, Geophys. J. Int., 191, 1135–1151.
Roberts, G.O. & Rosenthal, J.S., 2007. Coupling and ergodicity of adaptive Markov chain Monte Carlo algorithms, J. appl. Probab., 44, 458–475.
Rodi, W. & Mackie, R., 2001. Nonlinear conjugate gradients algorithm for 2-D magnetotelluric inversion, Geophysics, 66, 174–187.
Rosas Carbajal, M., Linde, N. & Kalscheuer, T., 2012. Focused time-lapse inversion of radio and audio magnetotelluric data, J. appl. Geophys., 84, 29–38.
Shearer, P., 1997. Improving local earthquake locations using the L1 norm and waveform cross correlation: Application to the Whittier Narrows, California, aftershock sequence, J. geophys. Res., 102(B4), 8269–8283.
Siripunvaraporn, W. & Egbert, G., 2000. An efficient data-subspace inversion method for 2-D magnetotelluric data, Geophysics, 65, 791–803.
Siripunvaraporn, W., Egbert, G., Lenbury, Y. & Uyeshima, M., 2005. Three-dimensional magnetotelluric inversion: data-space method, Phys. Earth planet. Int., 150(1–3), 3–14.
Tarantola, A., 2005. Inverse Problem Theory and Methods for Model Parameter Estimation, Society for Industrial Mathematics.
Tarantola, A. & Valette, B., 1982. Inverse problems = quest for information, Geophysics, 50(3), 150–170.
Tarits, P., Jouanne, V., Menvielle, M. & Roussignol, M., 1994. Bayesian statistics of nonlinear inverse problems—example of the magnetotelluric 1-D inverse problem, Geophys. J. Int., 119, 353–368.
Ueberhuber, C.W., 1997. Numerical Computation 2: Methods, Software, and Analysis, Springer-Verlag.
Vrugt, J.A., ter Braak, C.J.F., Clark, M.P., Hyman, J.M. & Robinson, B.A., 2008. Treatment of input uncertainty in hydrologic modeling: Doing hydrology backward with Markov chain Monte Carlo simulation, Water Resour. Res., 44, W00B09, doi:10.1029/2007WR006720.
Vrugt, J.A., ter Braak, C.J.F., Diks, C.G.H., Higdon, D., Robinson, B.A. & Hyman, J.M., 2009. Accelerating Markov chain Monte Carlo simulation with self-adaptive randomized subspace sampling, Int. J. Nonlin. Sci. Num., 10, 273–290.
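APPENDIX A: 2-D SMOOTHNESS CONSTRAINTS

To obtain smoothly varying model property variations in the 2-D models, we impose zero-mean normal prior distributions with respect to the vertical and horizontal log-resistivity gradients:

$$
\begin{cases}
c^{y}_{m,2}(\mathbf{m}) = \dfrac{1}{\left(2\pi\alpha_y^{2}\right)^{M_y}} \exp\left[-\dfrac{1}{2\alpha_y^{2}}\,\mathbf{m}^{T}\mathbf{D}_y^{T}\mathbf{D}_y\mathbf{m}\right], \\[2ex]
c^{z}_{m,2}(\mathbf{m}) = \dfrac{1}{\left(2\pi\alpha_z^{2}\right)^{M_z}} \exp\left[-\dfrac{1}{2\alpha_z^{2}}\,\mathbf{m}^{T}\mathbf{D}_z^{T}\mathbf{D}_z\mathbf{m}\right],
\end{cases}
\tag{A1}
$$

where $\mathbf{D}_y$ and $\mathbf{D}_z$ are the difference operators in the horizontal and vertical directions with rank $M_y$ and $M_z$, respectively, and $\alpha_y$ and $\alpha_z$ are the standard deviations of the log-resistivity gradients in each direction. Assuming that the two pdfs are uncorrelated, the joint pdf of the horizontal and vertical resistivity gradients is given by multiplication of each pdf (eq. 8). When the standard deviations are the same, eq. (8) can be expressed as

$$
c_{m,2}(\mathbf{m}) = \frac{1}{\left(2\pi\lambda^{2}\right)^{M_y}} \times \frac{1}{\left(2\pi\lambda^{2}\right)^{M_z}} \times \exp\left[-\frac{1}{2\lambda^{2}}\left(\mathbf{m}^{T}\mathbf{D}_y^{T}\mathbf{D}_y\mathbf{m} + \mathbf{m}^{T}\mathbf{D}_z^{T}\mathbf{D}_z\mathbf{m}\right)\right],
\tag{A2}
$$

where $\lambda = \alpha_z = \alpha_y$. Taking the logarithm of eq. (A2) results in

$$
\log\left(c_{m,2}(\mathbf{m})\right) = -M_y \log\left(2\pi\lambda^{2}\right) - M_z \log\left(2\pi\lambda^{2}\right) - \frac{1}{2\lambda^{2}}\left(\mathbf{m}^{T}\mathbf{D}_y^{T}\mathbf{D}_y\mathbf{m} + \mathbf{m}^{T}\mathbf{D}_z^{T}\mathbf{D}_z\mathbf{m}\right),
\tag{A3}
$$

or, equivalently,

$$
\log\left(c_{m,2}(\mathbf{m})\right) = -\left(M_y + M_z\right) \log\left(2\pi\lambda^{2}\right) - \frac{1}{2\lambda^{2}}\left(\mathbf{m}^{T}\mathbf{D}_y^{T}\mathbf{D}_y\mathbf{m} + \mathbf{m}^{T}\mathbf{D}_z^{T}\mathbf{D}_z\mathbf{m}\right).
\tag{A4}
$$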
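As a minimal sketch of how the log-prior of eq. (A4) could be evaluated numerically, the Python snippet below builds first-order difference operators for a small regular grid and returns the log-prior value. The function names (`difference_operators`, `log_smoothness_prior`), the row-by-row flattening of the grid and the use of NumPy are assumptions made for illustration; they are not taken from the MT-DREAM(ZS) source code. The normalization term is taken exactly as it appears in eq. (A4).

```python
import numpy as np


def difference_operators(nz, ny):
    """Hypothetical helper: first-order difference operators D_y and D_z
    for an nz-by-ny grid whose cells are flattened row by row."""
    def diff(n):
        # (n-1) x n matrix; row i holds -1 at column i and +1 at column i+1
        return np.eye(n - 1, n, k=1) - np.eye(n - 1, n)

    D_y = np.kron(np.eye(nz), diff(ny))  # differences between horizontal neighbours
    D_z = np.kron(diff(nz), np.eye(ny))  # differences between vertical neighbours
    return D_y, D_z


def log_smoothness_prior(m, D_y, D_z, lam):
    """Log-prior following eq. (A4), with lam = alpha_y = alpha_z."""
    # Both operators have full row rank, so the rank equals the row count
    M_y, M_z = D_y.shape[0], D_z.shape[0]
    roughness = np.sum((D_y @ m) ** 2) + np.sum((D_z @ m) ** 2)
    return -(M_y + M_z) * np.log(2.0 * np.pi * lam**2) - roughness / (2.0 * lam**2)


# Usage on a 3 x 4 grid with an arbitrary homogeneous log-resistivity value
nz, ny = 3, 4
D_y, D_z = difference_operators(nz, ny)
m_smooth = np.full(nz * ny, 2.0)
print(log_smoothness_prior(m_smooth, D_y, D_z, lam=0.5))
```

Dense matrices keep the sketch short; for a finely discretized model a sparse representation of the operators would be the more practical choice.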

APPENDIX B: LOG-LIKELIHOOD FUNCTION FOR PLANE-WAVE EM DATA

Equation (5) represents the log-likelihood function of a set of normally distributed errors that have zero mean and are uncorrelated. These errors may, however, have different standard deviations. Indeed, RMT data often comprise apparent resistivities and phases. Let the first $N/2$ data points be the apparent resistivities $d_i = \rho_i^{\mathrm{app}}$, $i = 1, \ldots, N/2$, and the last $N/2$ data points the phases $d_i = \phi_i$, $i = N/2 + 1, \ldots, N$. The data standard deviations can then be expressed as (Fischer & LeQuang 1981)

$$
\sigma_i =
\begin{cases}
r\,d_i, & \text{if } i = 1, \ldots, N/2, \\[1ex]
\dfrac{r}{2}, & \text{if } i = N/2 + 1, \ldots, N,
\end{cases}
\tag{B1}
$$

where $r$ is the standard deviation of the relative error of the apparent resistivities, which is assumed to be the same for all measurements. Using eq. (B1), the middle term in eq. (5) can be expressed as

$$
\frac{1}{2}\sum_{i=1}^{N} \log \sigma_i^{2} = \frac{1}{2} \log\left(\prod_{i=1}^{N/2} \left(r\rho_i^{\mathrm{app}}\right)^{2} \prod_{i=N/2+1}^{N} \left(\frac{r}{2}\right)^{2}\right),
\tag{B2}
$$

which leads to

$$
\frac{1}{2}\sum_{i=1}^{N} \log \sigma_i^{2} = \log\left(\frac{r^{N}}{2^{N/2}} \prod_{i=1}^{N/2} \rho_i^{\mathrm{app}}\right).
\tag{B3}
$$

Expanding the logarithm and replacing this expression in eq. (5) gives

$$
l(\mathbf{m}\,|\,\mathbf{d}) = -\frac{N}{2}\log(2\pi) + \frac{N}{2}\log(2) - N\log(r) - \sum_{i=1}^{N/2} \log \rho_i^{\mathrm{app}} - \frac{1}{2}\sum_{i=1}^{N} \left(\frac{g_i(\mathbf{m}) - d_i}{\sigma_i}\right)^{2},
\tag{B4}
$$

which is equivalent to

$$
l(\mathbf{m}\,|\,\mathbf{d}) = -\frac{N}{2}\log(\pi) - N\log(r) - \sum_{i=1}^{N/2} \log \rho_i^{\mathrm{app}} - \frac{1}{2}\sum_{i=1}^{N} \left(\frac{g_i(\mathbf{m}) - d_i}{\sigma_i}\right)^{2}.
\tag{B5}
$$

APPENDIX C: LOG-LIKELIHOOD FUNCTIONS FOR ERT DATA

In the case of ERT, we consider a single type of data. The apparent resistivities are assumed to comprise relative errors. Therefore, we follow the same derivation as in Appendix B, but with standard deviations given by $\sigma_i = r\,d_i$, $i = 1, \ldots, N$. Then, the middle term of eq. (5) can be expressed as

$$
\frac{1}{2}\sum_{i=1}^{N} \log \sigma_i^{2} = \log\left(r^{N} \prod_{i=1}^{N} \rho_i^{\mathrm{app}}\right),
\tag{C1}
$$

which leads to a log-likelihood of the form

$$
l(\mathbf{m}\,|\,\mathbf{d}) = -\frac{N}{2}\log(2\pi) - N\log(r) - \sum_{i=1}^{N} \log \rho_i^{\mathrm{app}} - \frac{1}{2}\sum_{i=1}^{N} \left(\frac{g_i(\mathbf{m}) - d_i}{\sigma_i}\right)^{2}.
\tag{C2}
$$
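The closed-form log-likelihoods of eqs (B5) and (C2) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`loglik_gauss`, `loglik_rmt`, `loglik_ert`) are hypothetical, `loglik_gauss` is the standard log-likelihood of independent zero-mean Gaussian errors that eq. (5) is assumed to correspond to, and the phases are assumed to be given in radians so that their standard deviation is r/2.

```python
import numpy as np


def loglik_gauss(d, g, sigma):
    """Standard Gaussian log-likelihood with per-datum standard deviations
    (assumed here to be the form of eq. 5)."""
    return (-0.5 * d.size * np.log(2.0 * np.pi)
            - 0.5 * np.sum(np.log(sigma**2))
            - 0.5 * np.sum(((g - d) / sigma) ** 2))


def loglik_rmt(rho_app, phase, rho_sim, phase_sim, r):
    """Eq. (B5): N/2 apparent resistivities and N/2 phases (radians assumed)."""
    N = 2 * rho_app.size
    res_rho = (rho_sim - rho_app) / (r * rho_app)  # sigma_i = r * d_i
    res_phi = (phase_sim - phase) / (r / 2.0)      # sigma_i = r / 2
    misfit = 0.5 * (np.sum(res_rho**2) + np.sum(res_phi**2))
    return (-0.5 * N * np.log(np.pi) - N * np.log(r)
            - np.sum(np.log(rho_app)) - misfit)


def loglik_ert(rho_app, rho_sim, r):
    """Eq. (C2): N apparent resistivities with relative error r."""
    N = rho_app.size
    res = (rho_sim - rho_app) / (r * rho_app)      # sigma_i = r * d_i
    return (-0.5 * N * np.log(2.0 * np.pi) - N * np.log(r)
            - np.sum(np.log(rho_app)) - 0.5 * np.sum(res**2))


# Usage with arbitrary illustrative values (not data from the paper)
rho_obs = np.array([10.0, 50.0, 200.0])   # observed apparent resistivities
phi_obs = np.array([0.70, 0.80, 0.90])    # observed phases, radians assumed
rho_mod = 1.02 * rho_obs                  # simulated responses g_i(m)
phi_mod = phi_obs + 0.01
print(loglik_rmt(rho_obs, phi_obs, rho_mod, phi_mod, r=0.05))
print(loglik_ert(rho_obs, rho_mod, r=0.05))
```

Writing the likelihood in terms of r rather than a fixed vector of standard deviations makes it straightforward to treat r as an additional unknown during sampling.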