classified into two finer classes, exact and approximate subsampling methods, depending on their resulting outputs. Exact subsampling approaches typically require subsets of the data of random size at each iteration. One solution is to take advantage of pseudo-marginal MCMC by constructing unbiased estimators of the target density evaluated on subsets of the data (Andrieu and Roberts, 2009). Quiroz et al. (2016) follow this direction by combining the debiasing technique of Rhee and Glynn (2015) with the correlated pseudo-marginal MCMC approach of Deligiannidis et al. (2015). Another direction is to use piecewise deterministic Markov processes (PDMPs) (Davis, 1984, 1993), which admit the target distribution as the marginal of their invariant distribution. This PDMP approach requires unbiased estimators of the gradient of the log-likelihood function, rather than of the likelihood itself. By using a tight enough bound on the event-rate function of the associated Poisson processes, PDMPs can produce super-efficient scalable MCMC algorithms. The bouncy particle sampler (Bouchard-Côté et al., 2017) and the zig-zag sampler (Bierkens et al., 2016) are two competing PDMP algorithms, while Bierkens et al. (2017) unify and extend these two methods. Note also that PDMPs produce non-reversible Markov chains, which suggests improved efficiency in terms of mixing rate and asymptotic variance compared with reversible MCMC algorithms such as MH, HMC and MALA, as observed in a number of theoretical and experimental works (Hwang et al., 1993; Sun et al., 2010; Chen and Hwang, 2013; Bierkens, 2016).
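The event times of these PDMP samplers are simulated by thinning a dominating homogeneous Poisson process, which is where the bound on the event-rate function enters. A minimal sketch, assuming a known constant bound on a toy rate function (names and the rate are illustrative, not from any cited implementation):

```python
import numpy as np

def first_event_by_thinning(rate, rate_bound, rng, t_max=100.0):
    """First event time of an inhomogeneous Poisson process with intensity
    rate(t) <= rate_bound, by thinning: propose candidate times from a
    homogeneous process at the bound, accept with probability rate(t)/bound."""
    t = 0.0
    while t < t_max:
        t += rng.exponential(1.0 / rate_bound)    # candidate from dominating process
        if rng.random() < rate(t) / rate_bound:
            return t
    return np.inf                                 # no event before t_max

rng = np.random.default_rng(0)
# toy event rate lambda(t) = min(t, 2), with constant bound 2
samples = [first_event_by_thinning(lambda t: min(t, 2.0), 2.0, rng)
           for _ in range(5_000)]
```

The tighter the bound, the fewer rejected candidates, which is why tight rate bounds are what make these samplers efficient.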

WITH SUBGEOMETRIC KERNELS
YVES ATCHADÉ AND GERSENDE FORT
Abstract. This paper deals with the ergodicity (convergence of the marginals) and the law of large numbers for adaptive MCMC algorithms built from transition kernels that are not necessarily geometrically ergodic. We develop a number of results that broaden significantly the class of adaptive MCMC algorithms for which rigorous analysis is now possible. As an example, we give a detailed analysis of the Adaptive Metropolis Algorithm of Haario et al. (2001) when the target distribution is sub-exponential in the tails.

We propose a new online relabelling procedure based on an adaptive MCMC algorithm [3][7] that tunes its design parameters on the fly to improve its efficiency. We prove the convergence of our algorithm and identify the link between the new target measure and the original distribution of interest π. We also study different mechanisms for the selection of the relabelling at each time step that are inspired by usual clustering techniques, and their influence on the convergence of the global MCMC algorithm. Finally, we demonstrate our algorithm on a problem inspired by a real counting issue encountered in experimental particle physics.

The remainder of the paper is organized as follows. In Section 2, we present our methodology to compute the minimizer θ∗(f) of (9) and the construction of control variates for some MCMC algorithms. In Section 3, we state our main result, which guarantees that the asymptotic variance σ²∞,d(f) defined in (2) and associated with a given MCMC method is close (up to a scaling factor) to the asymptotic variance σ²∞(f) of the Langevin diffusion defined in (7). We provide a CLT and we show that under appropri-
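The control-variate idea behind the minimizer above can be illustrated in its simplest Monte Carlo form: subtract a zero-mean quantity correlated with f and fit the coefficient that minimizes the variance. A toy sketch with i.i.d. Gaussian draws standing in for MCMC output (the paper's Langevin-based construction is more elaborate; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)

f = np.exp(x)     # want E[exp(X)] = exp(1/2) for X ~ N(0, 1)
g = x             # control variate with known mean E[X] = 0

# coefficient minimising the variance of f - theta * g (a simple regression fit)
theta = np.cov(f, g)[0, 1] / np.var(g)
plain = f.mean()
controlled = (f - theta * g).mean()   # same expectation, smaller variance
```

Both estimators target E[exp(X)] = e^{1/2}, but the controlled one has noticeably smaller variance because g soaks up most of the fluctuation of f.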

Fig. 1. Acquisition system.
Fig. 2. Synthetic trace.
[15] enable us to circumvent this difficulty by solving the integration and optimization problems by simulating random variables. This approach leads to stochastic versions of the EM algorithm [stochastic expectation maximization (SEM), SAEM] [12]. Moreover, we investigate the problem in a totally Bayesian framework in which prior information is introduced on the parameters [10]. Then, the estimation of the Bernoulli–Gaussian model and of the wavelet is solved by the simulation of random variables via MCMC algorithms such as the Gibbs sampler [10]. Once the model has been estimated, the next step is the deconvolution itself. This problem is said to be “ill-posed” because, in the presence of noise, different reflectivity sequences can lead to similar seismic data. Therefore, it is necessary to use as much prior information as possible on the reflectivity to limit the set of acceptable solutions. Thus, it is natural to account for the Bernoulli–Gaussian assumption introduced above. Then, we need an adequate procedure to achieve the detection of reflectors and the estimation of their amplitudes. In practice, the problem can be solved using either the maximum a posteriori (MAP) criterion, which can be optimized using the simulated annealing technique, or the suboptimal maximum posterior mode (MPM) method, which involves optimization by means of an MCMC technique.

3.3. Estimation Algorithm
The robustness of the estimation algorithm is a crucial point. Indeed, we need to process thousands of galaxies, so the algorithm must tolerate large approximations in the initial parameters in order to be usable in an unsupervised mode. Several authors have pointed out the difficulty of providing a fully automatic algorithm for estimating the parameters of the much simpler two-component (bulge and disc) model in mono-band images [16, 29, 30]. Obviously, the estimation of a more complex model generates more difficulties. To overcome this problem, we propose to use MCMC methods. MCMC algorithms allow the parameter space to be sampled according to the target distribution, and theoretical results prove the convergence of the distribution of the samples to the target distribution in infinite time.

Abstract
The Importance Sampling method is used in combination with MCMC in a Bayesian simulation study. In the particular context of numerous simulated data sets, MCMC algorithms have to be called several times, which may become computationally expensive. Since Importance Sampling requires the choice of an importance function, we propose to run MCMC on a preselected set of the simulated data and thereby obtain Markovian realisations of each corresponding posterior distribution. The estimates for the other simulated data are computed via IS after choosing one of the preselected posterior distributions; this chosen posterior distribution then serves as the importance function. The IS procedure is improved by choosing, for each data set, a different importance function among the preselected set of posterior distributions. For each Importance Sampling estimation, we propose two criteria to select the suitable posterior distribution. The first criterion is based on the L1 norm of the difference between two posterior distributions
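The reweighting step can be sketched as follows, with exact Gaussian "posteriors" standing in for the Markovian realisations (the toy model and all names are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)

def log_post(theta, mu):
    # toy log-posterior N(mu, 1), standing in for a dataset-specific posterior
    return -0.5 * (theta - mu) ** 2

mu_ref, mu_new = 0.0, 0.3
# draws from the preselected (importance) posterior; exact draws for brevity
theta = rng.normal(mu_ref, 1.0, size=200_000)

# self-normalised IS: reweight the reference draws toward the new posterior
logw = log_post(theta, mu_new) - log_post(theta, mu_ref)
w = np.exp(logw - logw.max())
w /= w.sum()
post_mean = np.sum(w * theta)     # posterior mean under the new posterior
ess = 1.0 / np.sum(w ** 2)        # effective sample size, a closeness diagnostic
```

The effective sample size collapses when the chosen importance function is far from the target posterior, which is exactly why a selection criterion between preselected posteriors is needed.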

Related work. Since piecewise deterministic Markov processes for sampling from distributions were introduced by Peters et al. (2012), PDMP-based, continuous-time, non-reversible MCMC algorithms have become relevant tools, from applied probability (Bierkens et al., 2017; Fontbona et al., 2016) to physics (Peters et al., 2012; Harland et al., 2017; Michel et al., 2014) and statistics (Bierkens et al., 2016; Fearnhead et al., 2018; Bierkens et al., 2018; Bouchard-Côté et al., 2018; Michel and Sénécal, 2017; Vanetti et al., 2017; Pakman et al., 2016). However, almost all existing PDMP-based MCMC samplers derive from two original versions: the Bouncy Particle Sampler (BPS) of Bouchard-Côté et al. (2018) and the Zigzag Sampler of Bierkens et al. (2016). Bouchard-Côté et al. (2018) show that BPS can provide state-of-the-art performance compared with the reference HMC for high-dimensional distributions, Bierkens et al. (2016) show that PDMP-based samplers are easier to scale in big-data settings without introducing bias, and Bierkens et al. (2018) consider the application of PDMPs to distributions on restricted domains. Fearnhead et al. (2018) unify BPS and the Zigzag sampler in the framework of PDMPs and choose the process velocity at event times, over the unit sphere, based on the inner product between this velocity and the gradient of the potential function. (This perspective relates to the transition dynamics used in our paper.) To overcome the main difficulty in PDMP-based samplers, namely the simulation of time-inhomogeneous Poisson processes, Sherlock and Thiery (2017) and Vanetti et al. (2017) resort to a discretization of such continuous-time samplers. Furthermore, preconditioning the velocity set is shown to accelerate the algorithms, as shown by Pakman et al. (2016).

run MCMC over repeated batches, recenter all subposteriors thus obtained and take their average as an approximation of the true posterior.
Our article extends traditional parallel MCMC algorithms in three directions. First, we scale each subposterior likelihood by a factor such that it can be regarded as an approximation of the true likelihood, by which we mean bringing each subposterior covariance matrix to the same scale as that of the true posterior. Second, our combination method is simple, has solid mathematical justification and is efficient. Third, even though our method is justified in a parametric framework, it can be extended to the non-parametric Bayesian setting without modification.
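The recenter-and-average combination step can be sketched in a Gaussian toy model where each subposterior is known in closed form (this omits the likelihood-scaling step; all names and the toy model are illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(3)

# data from N(theta_true, 1), split into S shards
theta_true, n, S = 1.5, 4_000, 4
data = rng.normal(theta_true, 1.0, size=n)
shards = np.array_split(data, S)

# with a flat prior each subposterior for the mean is N(shard mean, 1/len(shard));
# exact draws below stand in for per-shard MCMC output
sub_draws = [rng.normal(s.mean(), 1.0 / np.sqrt(len(s)), size=20_000)
             for s in shards]

# recenter every subposterior at the common center, then average across shards
pooled_center = np.mean([d.mean() for d in sub_draws])
recentered = [d - d.mean() + pooled_center for d in sub_draws]
combined = np.mean(recentered, axis=0)  # approximate draws from the full posterior
```

Averaging S independent recentered draws shrinks the spread by a factor of √S, which in this balanced Gaussian case recovers the full-posterior standard deviation 1/√n.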

the Metropolis Adjusted Langevin Algorithm (MALA), and it is well established that it has improved complexity scaling and better convergence behaviour than the RWM algorithm in general. This method directs the proposed moves towards areas of high probability for the distribution π thanks to the presence of the Σ∇log π(x) term. There is now a growing literature on gradient-based MCMC algorithms, as exemplified by the two papers [7, 4] and the references therein. A natural question is whether one can improve on the behaviour of MALA by incorporating more information about the properties of π in the proposal. A first attempt would be to use as proposal a one-step integrator of high weak order for (1), as suggested in the discussion of [7]. Although this alone appears insufficient, we shall show that, by slightly modifying this approach and not focusing on the weak order itself, we are able to construct a new proposal with better convergence and complexity-scaling properties than MALA. We mention that an analogous proposal is presented independently in [6], in a different context, to improve the strong order of convergence of MALA.
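For reference, a minimal MALA implementation with Σ = I (a generic sketch of the baseline algorithm, not the higher-order proposal constructed in the paper; target and step size are illustrative):

```python
import numpy as np

def mala_step(x, log_pi, grad_log_pi, step, rng):
    """One Metropolis-adjusted Langevin step targeting pi (here Sigma = I)."""
    drift = x + 0.5 * step * grad_log_pi(x)
    prop = drift + np.sqrt(step) * rng.standard_normal(x.shape)
    back = prop + 0.5 * step * grad_log_pi(prop)
    # log q(x | prop) - log q(prop | x) for the asymmetric Gaussian proposal
    log_q_diff = (np.sum((prop - drift) ** 2) - np.sum((x - back) ** 2)) / (2.0 * step)
    if np.log(rng.random()) < log_pi(prop) - log_pi(x) + log_q_diff:
        return prop          # accept
    return x                 # reject, stay put

rng = np.random.default_rng(4)
log_pi = lambda x: -0.5 * np.sum(x ** 2)      # standard Gaussian target in 2D
grad_log_pi = lambda x: -x
x, chain = np.zeros(2), []
for _ in range(20_000):
    x = mala_step(x, log_pi, grad_log_pi, 0.5, rng)
    chain.append(x)
chain = np.array(chain)
```

The drift term is exactly the (1/2)Σ∇log π(x) ingredient discussed above; the Metropolis correction removes the discretization bias of the underlying Langevin diffusion.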

In this work we focus on the estimation of β within a Markov chain Monte Carlo (MCMC) algorithm that handles 2D or 3D data sets [14]–[18]. MCMC methods are powerful tools to handle Bayesian inference problems for which the minimum mean square error (MMSE) or the maximum a posteriori (MAP) estimators are difficult to derive analytically. MCMC methods generate samples that are asymptotically distributed according to the joint posterior of the unknown model parameters. These samples are then used to approximate the Bayesian estimators. However, standard MCMC methods cannot be applied directly to Bayesian problems based on the Potts model. Indeed, inference on β requires computing the normalizing constant of the Potts model C(β), which is generally intractable. Specific MCMC algorithms have been designed to estimate Markov field parameters in [19], [20] and more recently in [9], [10]. A variational Bayes algorithm based on an approximation of C(β) has also been recently proposed in [11]. Maximum likelihood estimation of β within expectation-maximization (EM) algorithms has been studied in [12], [13], [21]. The strategies used in these works for avoiding the computation of C(β) are summarized below.

Stochastic Gradient Markov Chain Monte Carlo: Along with the recent advances in MCMC techniques, diffusion-based algorithms have become increasingly popular due to their applicability in large-scale machine learning. These techniques, the so-called Stochastic Gradient MCMC (SG-MCMC) algorithms, aim at generating samples from the posterior distribution p(θ|Y) as opposed to finding the MAP estimate, and have strong connections with stochastic optimization techniques (Dalalyan, 2017). In this line of work, Stochastic Gradient Langevin Dynamics (SGLD) (Welling & Teh, 2011) is one of the pioneering algorithms; it generates an approximate sample θ_n from p(θ|Y) by iteratively applying the following
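The SGLD recursion of Welling & Teh combines a half-step of a minibatch (stochastic) gradient of the log-posterior with injected Gaussian noise. A toy sketch, assuming a flat prior and Gaussian data (step size, batch size and model are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# toy model: y_i ~ N(theta, 1) with a flat prior, so the posterior is N(mean(y), 1/N)
theta_true, N, batch = 2.0, 10_000, 100
y = rng.normal(theta_true, 1.0, size=N)

def stoch_grad(theta):
    """Unbiased minibatch estimate of grad log p(theta | y) (flat prior)."""
    idx = rng.integers(0, N, size=batch)
    return (N / batch) * np.sum(y[idx] - theta)

eps = 1e-5                               # small constant step; the bias is O(eps)
theta, draws = 0.0, []
for _ in range(30_000):
    theta += 0.5 * eps * stoch_grad(theta) + np.sqrt(eps) * rng.standard_normal()
    draws.append(theta)
draws = np.array(draws[10_000:])         # discard burn-in
```

With a constant step size the chain has a small residual bias (here visible as a slightly inflated variance), which is the usual trade-off accepted in exchange for never touching the full dataset.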

The rest of the paper is organized as follows: Section 2 introduces the global framework of RJ-MCMC and presents a general scheme to sample Gaussian vectors. Section 3 considers a specific application of the previous results, which finally boils down to the proposed RJPO sampler. Section 4 analyses the performance of RJPO compared to T-PO on simple toy problems and presents the adaptive RJPO, which incorporates an automatic control of the truncation level. Finally, in Section 5, an example of a linear inverse problem, unsupervised image resolution enhancement, is presented to illustrate the applicability of the method. These results show the superiority of the RJPO algorithm over the usual Cholesky-factorization-based approaches in terms of computational cost and memory usage.

Table 1: Nomenclature for parameter space description
from a desired probability distribution: their principle consists in constructing a Markov chain that has the desired distribution as its stationary distribution. Given an ergodic Markov chain with transition kernel p, the probabilities of transition from state to state in the search space S, the reversibility property between states s and s′ reads: π(s) p(s′ | s) = π(s′) p(s | s′) (detailed balance equation). Though reversibility is not necessary to guarantee convergence of the chain to π, it is sufficient. The key to MCMC then consists in expressing the transition kernel p(s′ | s) as the product of an arbitrary proposal distribution, q, and an associated acceptance distribution, a: p(s′ | s) = q(s′ | s) a(s, s′). To explain the intuition behind these concepts, suppose, without loss of generality, that for states s and s′, some given transition kernel p verifies π(s) p(s′ | s) > π(s′) p(s | s′). Artificial coercion of the previous formula towards reversibility is straightforward, introducing two terms a(s, s′), strictly lower than 1, and a(s′, s), equal to 1: π(s) q(s′ | s) a(s, s′) = π(s′) q(s | s′) a(s′, s). If the inequality is reversed, then a(s, s′), equal to 1, and a(s′, s), strictly lower than 1, are used instead. Finally, the acceptance probability is calculated as: a(s, s′) = min(1, [π(s′) q(s | s′)] / [π(s) q(s′ | s)]). The arbitrary proposal distribution q and the acceptance probability a are the two ingredients of the Metropolis-Hastings (MH) algorithm (see Algorithm 1).
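The two ingredients above, the proposal q and the acceptance probability a, translate directly into code. A minimal random-walk MH sketch (the symmetric Gaussian proposal makes the q-ratio cancel in a; the bimodal target is purely illustrative):

```python
import numpy as np

def metropolis_hastings(log_pi, x0, n_steps, scale, rng):
    """Random-walk MH: the Gaussian proposal is symmetric, q(s'|s) = q(s|s'),
    so the acceptance probability reduces to a(s, s') = min(1, pi(s')/pi(s))."""
    x, out = x0, []
    for _ in range(n_steps):
        prop = x + scale * rng.standard_normal()
        if np.log(rng.random()) < log_pi(prop) - log_pi(x):
            x = prop                     # accept the move
        out.append(x)                    # on rejection the chain repeats x
    return np.array(out)

rng = np.random.default_rng(6)
# illustrative bimodal target: 0.5 N(-2, 1) + 0.5 N(2, 1) (unnormalised)
log_pi = lambda x: np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)
chain = metropolis_hastings(log_pi, 0.0, 50_000, 2.5, rng)
```

Working with log π avoids numerical underflow and means the (unknown) normalizing constant of π cancels, exactly as in the acceptance ratio above.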

X(k) in the preselected data set (containing then M + 1 elements); this can be done without any difficulty.

3. Applications
In this section we use both MCMC (Gibbs sampling in our case) and IS methods to estimate the parameters of three Poisson models. The first is a Poisson model with one parameter (the mean), the second is a Poisson regression on one covariate with two parameters (intercept and covariate association), and the third is a Poisson regression on one covariate with extra-Poisson variability introduced by a Gaussian residual error term, with three parameters (intercept, covariate association and residual variance). The first model can be seen as a toy example with explicit posterior distributions; the second corresponds to a widely used GLM; and the third introduces over-dispersion, which is essential, for example, in medical applications, since association estimates would be biased if extra-Poisson variability were not modelled (see Breslow (1984) for motivation). For each model, K = 101 data sets are simulated for different values of the parameters. All data sets contain n = 20 observations. Vague priors are assigned to the parameters and the posterior values are estimated via MCMC and IS as discussed above. Note that it is essential that MCMC convergence is achieved; therefore several diagnostics of convergence (and not only one) have to be checked, as suggested by Brooks and Roberts (1998)
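For the first model the explicit posterior can be checked directly: with a Gamma(a, b) prior on the Poisson mean, conjugacy gives a Gamma(a + Σy_i, b + n) posterior, so a Gibbs sweep reduces to a single closed-form draw. A sketch (the vague-prior values a = b = 0.01 and the true mean are assumptions for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(7)

# toy Poisson model: y_i ~ Poisson(lam) with a vague Gamma(a, b) prior on lam
a, b = 0.01, 0.01                  # assumed "vague" prior values (illustrative)
y = rng.poisson(3.0, size=20)      # n = 20 observations, as in the text

# conjugacy: the posterior is Gamma(a + sum(y), b + n), available in closed form
post_a, post_b = a + y.sum(), b + len(y)
exact_mean = post_a / post_b

# MCMC stand-in: for this one-parameter model a Gibbs sweep reduces to a
# single exact draw from the known posterior
draws = rng.gamma(post_a, 1.0 / post_b, size=100_000)
```

Comparing the sample moments of the draws against the closed-form Gamma moments is exactly the kind of sanity check the explicit first model makes possible.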

An important strategy for mitigating this cost is to recognize that the forward model may exhibit regularity in its dependence on the parameters of interest, such that the model outputs may be approximated with fewer samples than are needed to characterize the posterior via MCMC. Replacing the forward model with an approximation or “surrogate” decouples the required number of forward model evaluations from the length of the MCMC chain, and thus can vastly reduce the overall cost of inference (Sacks et al., 1989; Kennedy and O’Hagan, 2001). Existing approaches typically create high-order global approximations of either the forward model outputs or the log-likelihood function using, for example, global polynomials (Marzouk et al., 2007; Marzouk and Xiu, 2009), radial basis functions (Bliznyuk et al., 2012; Joseph, 2012), or Gaussian processes (Sacks et al., 1989; Kennedy and O’Hagan, 2001; Rasmussen, 2003; Santner et al., 2003). As in most of these efforts, we will assume that the forward model is deterministic and available only as a black box, thus limiting ourselves to “non-intrusive” approximation methods that are based on evaluations of the forward model at selected input points. Since we assume that the exact forward model is available
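A minimal non-intrusive sketch of the idea, with a global polynomial fitted to a handful of black-box evaluations (the forward model, design points and polynomial degree are all illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(8)

def forward(theta):
    """Black-box forward model (cheap here, standing in for an expensive solver)."""
    return np.sin(theta) + 0.1 * theta ** 2

# non-intrusive surrogate: fit a global polynomial to a few model evaluations
design = np.linspace(-2.0, 2.0, 9)              # selected input points
coeffs = np.polyfit(design, forward(design), deg=6)
surrogate = lambda theta: np.polyval(coeffs, theta)

# the surrogate replaces the forward model inside the (long) MCMC chain, so the
# number of true model runs stays fixed at len(design)
test_pts = rng.uniform(-2.0, 2.0, size=1_000)
max_err = float(np.max(np.abs(surrogate(test_pts) - forward(test_pts))))
```

A chain of 10^5 MCMC steps then costs 10^5 polynomial evaluations but only 9 true model runs, which is the decoupling described above.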

We conclude that the Gibbs sampler is well adapted to pile-up-affected data and, as it attains the Cramér–Rao bound, it might lead to a significant reduction of the acquisition time. Therefore, we compare the MCMC method to the following estimation practice. Data from the pile-up model are obtained at a low laser intensity (λ = 0.05), such that the probability of 2 or more photons per laser pulse is negligible. Then the observed arrival times are considered as independent observations from the exponential mixture distribution given by (31) and the classical EM algorithm is applied. Repeated simulations provide estimates of the bias and the variance of the estimators for a two-component model and various numbers of observations. For the same two-component model we simulated data using the laser intensity λ_opt =

∑_{n=1}^{N} log p(x_n | θ^{(t,k)})    (8)

Here, θ^{(t,k)} denotes samples drawn from p(θ|t).

3. STOCHASTIC THERMODYNAMIC INTEGRATION

Even though MCMC inference has been made much more efficient by the incorporation of stochastic gradients, marginal likelihood estimation methods based on MCMC still suffer from high computational complexity, since they typically require the likelihood to be computed on the whole dataset for each sample (see Eq. (8)).
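Thermodynamic integration itself estimates the log marginal likelihood as log Z = ∫₀¹ E_t[log p(x|θ)] dt, where the expectation is taken under the power posterior p_t(θ) ∝ p(x|θ)^t p(θ). A sketch on a conjugate Gaussian toy model, where both the power posteriors and the exact answer are available in closed form (the model and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(9)

# toy conjugate model: x_i ~ N(theta, 1), theta ~ N(0, 1)
n = 10
x = rng.normal(1.0, 1.0, size=n)
s, ss = x.sum(), (x ** 2).sum()

def mean_loglik(t, draws=50_000):
    """E_t[log p(x|theta)] under the power posterior p_t ~ p(x|theta)^t p(theta),
    which here is Gaussian: N(t*s/(t*n+1), 1/(t*n+1))."""
    m, v = t * s / (t * n + 1.0), 1.0 / (t * n + 1.0)
    th = rng.normal(m, np.sqrt(v), size=draws)
    return np.mean(-0.5 * n * np.log(2 * np.pi)
                   - 0.5 * (ss - 2 * th * s + n * th ** 2))

# thermodynamic integration: integrate E_t[log p(x|theta)] over t in [0, 1]
ts = np.linspace(0.0, 1.0, 21)
vals = np.array([mean_loglik(t) for t in ts])
log_z_ti = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(ts)))  # trapezoid

# exact log marginal likelihood for this conjugate model, for comparison
log_z_exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1.0)
               - 0.5 * (ss - s ** 2 / (n + 1.0)))
```

Each temperature node requires an expectation of the full-data log-likelihood under p_t, which is precisely the per-sample whole-dataset cost that the stochastic variant discussed in this paper aims to reduce.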

A. The Uniform Case
We first consider the case where all the atoms have the same probability of being active, i.e., p_i = p for all i.
For each experiment, the data vector y is generated according to model (3) with σ_n² = 10⁻⁴ and σ_x² = 1. In Fig. 1 we show the MSE, the probability of a wrong decision on the elements of the support, and the average running time achieved by different sparse-representation algorithms. Each simulation point corresponds to a fixed number of non-zero coefficients, say K, and, given this number, the positions of the non-zero coefficients are drawn uniformly at random for each observation. We set N = 154, M = 256. In addition to the proposed procedures, we consider several state-of-the-art algorithms: MP [12], OMP [13], StOMP [14], SP [20], IHT [17], HTP [18], Basis Pursuit Denoising (BPD) [10], SBR [7], SOBAP [33] and FBMP [28]. The stopping criterion used for MP and OMP is based on the norm of the residual: the recursions are stopped as soon as the norm of the residual drops below √(Nσ²)
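The residual-norm stopping rule for OMP can be sketched as follows, on synthetic data matching the stated dimensions (a generic OMP implementation for illustration, not the authors' code; the dictionary and seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(10)
N, M, K, sigma2 = 154, 256, 10, 1e-4

A = rng.standard_normal((N, M)) / np.sqrt(N)      # dictionary with ~unit-norm atoms
support = rng.choice(M, size=K, replace=False)
x_true = np.zeros(M)
x_true[support] = rng.normal(0.0, 1.0, size=K)    # sigma_x^2 = 1
y = A @ x_true + rng.normal(0.0, np.sqrt(sigma2), size=N)

# OMP with the residual-norm stopping rule: stop once ||r|| <= sqrt(N * sigma^2)
idx, coef, r = [], np.zeros(0), y.copy()
while np.linalg.norm(r) > np.sqrt(N * sigma2) and len(idx) < N:
    idx.append(int(np.argmax(np.abs(A.T @ r))))   # atom most correlated with residual
    coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
    r = y - A[:, idx] @ coef                      # residual after least-squares refit

x_hat = np.zeros(M)
x_hat[idx] = coef
```

The threshold √(Nσ²) is the expected norm of the noise vector, so the recursion stops once the residual is statistically indistinguishable from noise.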
