MCMC Algorithms


Accelerating MCMC algorithms


classified into two finer classes: exact subsampling methods and approximate subsampling methods, depending on their resulting outputs. Exact subsampling approaches typically require subsets of data of random size at each iteration. One solution is to take advantage of pseudo-marginal MCMC by constructing unbiased estimators of the target density evaluated on subsets of the data (Andrieu and Roberts, 2009). Quiroz et al. (2016) follow this direction by combining the powerful debiasing technique of Rhee and Glynn (2015) with the correlated pseudo-marginal MCMC approach of Deligiannidis et al. (2015). Another direction is to use piecewise deterministic Markov processes (PDMP) (Davis, 1984, 1993), which enjoy the target distribution as the marginal of their invariant distribution. This PDMP version requires unbiased estimators of the gradients of the log-likelihood function, instead of the likelihood itself. By using a tight enough bound on the event rate function of the associated Poisson processes, PDMP can produce super-efficient scalable MCMC algorithms. The bouncy particle sampler (Bouchard-Côté et al., 2017) and the zig-zag sampler (Bierkens et al., 2016) are two competing PDMP algorithms, while Bierkens et al. (2017) unify and extend these two methods. Note also that PDMP produces a non-reversible Markov chain, which should be more efficient in terms of mixing rate and asymptotic variance than reversible MCMC algorithms such as MH, HMC and MALA, as observed in several theoretical and experimental works (Hwang et al., 1993; Sun et al., 2010; Chen and Hwang, 2013; Bierkens, 2016).
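The pseudo-marginal idea mentioned above can be illustrated with a toy sketch (Python; the target and noise model are invented for illustration, not taken from the papers cited): a Metropolis–Hastings chain in which the target density is only available through a nonnegative unbiased estimator, here the standard normal density multiplied by lognormal noise with unit mean. The key trick is that the noisy estimate for the current state is recycled rather than refreshed.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(theta):
    # toy target: standard normal, up to a constant
    return -0.5 * theta**2

def noisy_log_target(theta):
    # log of a nonnegative unbiased estimate pi(theta) * W,
    # with W ~ LogNormal(-s^2/2, s) so that E[W] = 1
    s = 0.3
    return log_target(theta) + rng.normal(-0.5 * s**2, s)

def pseudo_marginal_mh(n_iter=20000, step=1.0):
    theta = 0.0
    log_pi_hat = noisy_log_target(theta)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        log_pi_hat_prop = noisy_log_target(prop)
        # accept on the ratio of *estimated* densities; the current
        # state's estimate is reused, which keeps pi exactly invariant
        if np.log(rng.uniform()) < log_pi_hat_prop - log_pi_hat:
            theta, log_pi_hat = prop, log_pi_hat_prop
        chain[i] = theta
    return chain

chain = pseudo_marginal_mh()
```

Despite the noise in the density evaluations, the chain still targets the exact N(0, 1) distribution; a larger estimator variance only slows mixing, since the chain tends to stick at states with lucky overestimates.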

Limit theorems for some adaptive MCMC algorithms with subgeometric kernels


LIMIT THEOREMS FOR SOME ADAPTIVE MCMC ALGORITHMS WITH SUBGEOMETRIC KERNELS. Yves Atchadé and Gersende Fort. Abstract: This paper deals with the ergodicity (convergence of the marginals) and the law of large numbers for adaptive MCMC algorithms built from transition kernels that are not necessarily geometrically ergodic. We develop a number of results that significantly broaden the class of adaptive MCMC algorithms for which rigorous analysis is now possible. As an example, we give a detailed analysis of the Adaptive Metropolis Algorithm of Haario et al. (2001) when the target distribution is sub-exponential in the tails.

Relabelling MCMC Algorithms in Bayesian Mixture Learning


We propose a new online relabelling procedure based on an adaptive MCMC algorithm [3][7] that tunes its design parameters on the fly to improve its efficiency. We prove the convergence of our algorithm and identify the link between the new target measure and the original distribution of interest π. We also study different mechanisms, inspired by usual clustering techniques, for selecting the relabelling at each time step, and their influence on the convergence of the global MCMC algorithm. Finally, we demonstrate our algorithm on a problem inspired by a real counting issue encountered in experimental particle physics.

Diffusion approximations and control variates for MCMC


The remainder of the paper is organized as follows. In Section 2, we present our methodology to compute the minimizer θ*(f) of (9) and the construction of control variates for some MCMC algorithms. In Section 3, we state our main result, which guarantees that the asymptotic variance σ²_{∞,d}(f) defined in (2) and associated with a given MCMC method is close (up to a scaling factor) to the asymptotic variance σ²_∞(f) of the Langevin diffusion defined in (7). We provide a CLT and we show that, under appropriate…
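The control-variate idea behind this abstract can be sketched on a toy example (Python; the target, test function, and coefficient construction here are a simplified illustration, not the paper's methodology): under mild conditions the score ∇log π has zero mean under π, so it can be subtracted with a coefficient θ* fitted to minimize the variance of the ordinary Monte Carlo average.

```python
import numpy as np

rng = np.random.default_rng(1)

# samples from a toy target pi = N(0, 1); grad log pi(x) = -x has
# zero mean under pi, so it is a valid control variate
x = rng.normal(size=100000)
f = x + x**2            # function whose expectation we want (true value: 1)
h = -x                  # zero-mean control variate built from the score

# variance-minimizing coefficient: theta* = Cov(f, h) / Var(h)
theta_star = np.cov(f, h)[0, 1] / np.var(h)

plain = f.mean()                     # plain Monte Carlo estimate
cv = (f - theta_star * h).mean()     # control-variate estimate

var_plain = f.var()
var_cv = (f - theta_star * h).var()
```

Here the corrected estimator removes the linear part of f, cutting the variance from about 3 to about 2 while leaving the estimate unbiased.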


Blind marine seismic deconvolution using statistical MCMC methods


Fig. 1. Acquisition system. Fig. 2. Synthetic trace. [15] enable us to circumvent this difficulty by solving the integration and optimization problems by simulating random variables. This approach leads to stochastic versions of the EM algorithm [stochastic expectation maximization (SEM), SAEM] [12]. Moreover, we investigate the problem in a fully Bayesian framework in which prior information is introduced on the parameters [10]. Then, the estimation of the Bernoulli–Gaussian model and of the wavelet is solved by the simulation of random variables via MCMC algorithms such as the Gibbs sampler [10]. Once the model has been estimated, the next step is the deconvolution itself. This problem is said to be “ill-posed” because, in the presence of noise, different reflectivity sequences can lead to similar seismic data. Therefore, it is necessary to use as much prior information as possible on the reflectivity to limit the set of acceptable solutions. Thus, it is natural to account for the Bernoulli–Gaussian assumption introduced above. Then, we need an adequate procedure to achieve the detection of reflectors and the estimation of their amplitudes. The problem can be solved using either the maximum a posteriori (MAP) criterion, which can be optimized using the simulated annealing technique, or the suboptimal maximum posterior mode (MPM) method, which involves optimization by means of an MCMC technique.

Hierarchical multispectral galaxy decomposition using an MCMC algorithm with multiple temperature simulated annealing


3.3. Estimation Algorithm. The robustness of the estimation algorithm is a crucial point. Indeed, we need to process thousands of galaxies, so the algorithm must tolerate large approximations in the initial parameters so that it can be used in an unsupervised mode. Several authors have pointed out the difficulty of providing a fully automatic algorithm for the estimation of the parameters of the much simpler two-component (bulge and disc) model in mono-band images [16, 29, 30]. Obviously, the estimation of a more complex model generates more difficulties. To overcome this problem we propose to use MCMC methods. MCMC algorithms allow sampling of the parameter space according to the target distribution, and theoretical results prove the convergence of the distribution of the samples to the target distribution in infinite time.


Importance Sampling combined with MCMC algorithms for repeated estimations


Abstract: The Importance Sampling method is used in combination with MCMC in a Bayesian simulation study. In the particular context of numerous simulated data sets, MCMC algorithms have to be called several times, which may become computationally expensive. Since Importance Sampling requires the choice of an importance function, we propose to use MCMC on a preselected set of the simulated data and then to obtain Markovian realisations of each corresponding posterior distribution. The estimates for the other simulated data are computed via IS after choosing one of the preselected posterior distributions; this chosen posterior distribution is then the importance function. The IS procedure is improved by choosing, for each data set, a different importance function among the preselected set of posterior distributions. For each Importance Sampling estimation, we propose two criteria to select the suitable posterior distribution. The first criterion is based on the L1 norm of the difference between two posterior distributions…
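The reuse idea can be sketched on a conjugate toy problem (Python; the Poisson–Gamma model, rates and sample sizes are invented for illustration, not the paper's setup): posterior draws obtained for a preselected data set serve as the importance function for a new data set, reweighted by the likelihood ratio (the common prior cancels).

```python
import numpy as np

rng = np.random.default_rng(2)

# Poisson model with a Gamma(a, b) prior on the mean lambda; the
# posterior after data y is Gamma(a + sum(y), b + n), so "MCMC-like"
# draws for the preselected data set can be simulated exactly
a, b = 1.0, 1.0
y_ref = rng.poisson(3.0, size=20)    # preselected data set
y_new = rng.poisson(3.2, size=20)    # new data set, estimated via IS

lam = rng.gamma(a + y_ref.sum(), 1.0 / (b + len(y_ref)), size=50000)

def log_lik(y, lam):
    # Poisson log-likelihood up to a lambda-free constant
    return y.sum() * np.log(lam) - len(y) * lam

# importance weights = posterior_new / posterior_ref; the prior cancels
log_w = log_lik(y_new, lam) - log_lik(y_ref, lam)
w = np.exp(log_w - log_w.max())
w /= w.sum()

post_mean_is = np.sum(w * lam)
post_mean_exact = (a + y_new.sum()) / (b + len(y_new))
```

Because the two data sets are similar, the weights are well behaved; the IS estimate matches the exact posterior mean for the new data set up to Monte Carlo error, without re-running MCMC.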

Coordinate sampler: a non-reversible Gibbs-like MCMC sampler


Related work. Since piecewise deterministic Markov processes for sampling from distributions were introduced by Peters et al. (2012), PDMP-based, continuous-time, non-reversible MCMC algorithms have become relevant tools, from applied probability (Bierkens et al., 2017; Fontbona et al., 2016) to physics (Peters et al., 2012; Harland et al., 2017; Michel et al., 2014) and statistics (Bierkens et al., 2016; Fearnhead et al., 2018; Bierkens et al., 2018; Bouchard-Côté et al., 2018; Michel and Sénécal, 2017; Vanetti et al., 2017; Pakman et al., 2016). However, almost all existing PDMP-based MCMC samplers are based on two original versions: the Bouncy Particle Sampler (BPS) of Bouchard-Côté et al. (2018) and the Zigzag Sampler of Bierkens et al. (2016). Bouchard-Côté et al. (2018) show that BPS can provide state-of-the-art performance compared with the reference HMC for high-dimensional distributions, Bierkens et al. (2016) show that PDMP-based samplers are easier to scale in big-data settings without introducing bias, and Bierkens et al. (2018) consider the application of PDMP to distributions on restricted domains. Fearnhead et al. (2018) unify BPS and the Zigzag sampler in the framework of PDMP, choosing the process velocity at event times over the unit sphere, based on the inner product between this velocity and the gradient of the potential function. (This perspective relates to the transition dynamics used in our paper.) To overcome the main difficulty in PDMP-based samplers, which is the simulation of time-inhomogeneous Poisson processes, Sherlock and Thiery (2017) and Vanetti et al. (2017) resort to a discretization of such continuous-time samplers. Furthermore, pre-conditioning the velocity set is shown to accelerate the algorithms, as shown by Pakman et al. (2016).

Average of Recentered Parallel MCMC for Big Data


run MCMC over repeated batches, recenter all subposteriors thus obtained and take their average as an approximation of the true posterior. Our article extends the traditional parallel MCMC algorithms in three directions. First, we scale each subposterior likelihood with a factor such that it can be regarded as an approximation of the true likelihood, which brings each subposterior covariance matrix to the same scale as that of the true posterior. Second, our combination method is simple, has solid mathematical justification, and is efficient. Third, even though our method is justified in a parametric framework, it can be extended to non-parametric Bayesian settings without modification.
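The scale-recenter-combine steps can be sketched on a conjugate toy problem (Python; this is a simplified illustration under a normal model with flat prior, not the authors' exact procedure): raising each batch likelihood to the power B puts its subposterior on the full-posterior scale, after which the subposterior draws are recentered at the average of the subposterior means and pooled.

```python
import numpy as np

rng = np.random.default_rng(3)

# N(mu, 1) likelihood, flat prior, data split into B batches
data = rng.normal(2.0, 1.0, size=1000)
B = 4
batches = np.split(data, B)
n = len(data)

# subposterior for batch j with its likelihood raised to the power B:
# N(batch mean, 1/n), i.e. already on the full-posterior scale
subs = [rng.normal(b.mean(), np.sqrt(1.0 / n), size=20000) for b in batches]

# recenter each subposterior at the average of the subposterior means,
# then pool the recentered draws as the posterior approximation
center = np.mean([s.mean() for s in subs])
recentered = [s - s.mean() + center for s in subs]
combined = np.concatenate(recentered)
```

For this conjugate model the full posterior is N(data.mean(), 1/n), and the pooled recentered draws match both its location and its spread closely.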

Fast Langevin based algorithm for MCMC in high dimensions


the Metropolis Adjusted Langevin Algorithm (MALA), and it is well established that it has better complexity scaling and convergence behaviour than the RWM algorithm in general. This method directs the proposed moves towards areas of high probability for the distribution π thanks to the presence of the Σ∇log π(x) term. There is now a growing literature on gradient-based MCMC algorithms, as exemplified through the two papers [7, 4] and the references therein. A natural question is whether one can improve on the behaviour of MALA by incorporating more information about the properties of π in the proposal. A first attempt would be to use as proposal a one-step integrator with high weak order for (1), as suggested in the discussion of [7]. Although this alone appears insufficient, we shall show that, by slightly modifying this approach and not focusing on the weak order itself, we are able to construct a new proposal with better convergence and complexity scaling properties than MALA. We mention that an analogous proposal is presented independently in [6], in a different context, to improve the strong order of convergence of MALA.
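For reference, the baseline MALA scheme being improved upon can be sketched in one dimension (Python; toy standard normal target, not the paper's new proposal): the proposal is an Euler step of the Langevin diffusion, corrected by a Metropolis–Hastings accept/reject step that accounts for the asymmetric proposal density.

```python
import numpy as np

rng = np.random.default_rng(4)

def log_pi(x):
    return -0.5 * x**2          # toy target: standard normal

def grad_log_pi(x):
    return -x

def mala(n_iter=20000, h=0.5):
    def log_q(to, frm):
        # Gaussian density (up to constants) of the Langevin move frm -> to
        mean = frm + 0.5 * h * grad_log_pi(frm)
        return -((to - mean) ** 2) / (2.0 * h)

    x = 0.0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        # Euler step of the Langevin diffusion as the proposal
        prop = x + 0.5 * h * grad_log_pi(x) + np.sqrt(h) * rng.normal()
        # MH correction with the asymmetric proposal densities
        log_alpha = log_pi(prop) - log_pi(x) + log_q(x, prop) - log_q(prop, x)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
        chain[i] = x
    return chain

chain = mala()
```

Dropping the log_q terms would give the unadjusted Langevin algorithm, which is biased at any fixed step size; the MH correction restores exactness at the cost of occasional rejections.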

Estimating the granularity coefficient of a Potts-Markov random field within an MCMC algorithm


In this work we focus on the estimation of β within a Markov chain Monte Carlo (MCMC) algorithm that handles 2D or 3D data sets [14]–[18]. MCMC methods are powerful tools to handle Bayesian inference problems for which the minimum mean square error (MMSE) or the maximum a posteriori (MAP) estimators are difficult to derive analytically. MCMC methods generate samples that are asymptotically distributed according to the joint posterior of the unknown model parameters. These samples are then used to approximate the Bayesian estimators. However, standard MCMC methods cannot be applied directly to Bayesian problems based on the Potts model. Indeed, inference on β requires computing the normalizing constant of the Potts model C(β), which is generally intractable. Specific MCMC algorithms have been designed to estimate Markov field parameters in [19], [20] and more recently in [9], [10]. A variational Bayes algorithm based on an approximation of C(β) has also been recently proposed in [11]. Maximum likelihood estimation of β within expectation-maximization (EM) algorithms has been studied in [12], [13], [21]. The strategies used in these works for avoiding the computation of C(β) are summarized below.

Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization


Stochastic Gradient Markov Chain Monte Carlo: Along with the recent advances in MCMC techniques, diffusion-based algorithms have become increasingly popular due to their applicability in large-scale machine learning applications. These techniques, the so-called Stochastic Gradient MCMC (SG-MCMC) algorithms, aim at generating samples from the posterior distribution p(θ|Y) as opposed to finding the MAP estimate, and have strong connections with stochastic optimization techniques (Dalalyan, 2017). In this line of work, Stochastic Gradient Langevin Dynamics (SGLD) (Welling & Teh, 2011) is one of the pioneering algorithms and generates an approximate sample θ_n from p(θ|Y) by iteratively applying the following…
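The SGLD update referred to above can be sketched as follows (Python; the normal model, flat prior, and step size are invented for illustration): each iteration uses a minibatch gradient of the log-posterior, rescaled by N/batch so it is unbiased, plus injected Gaussian noise of variance ε.

```python
import numpy as np

rng = np.random.default_rng(5)

# posterior of the mean theta of N(theta, 1) data under a flat prior
data = rng.normal(1.0, 1.0, size=1000)
N, batch = len(data), 50

def sgld(n_iter=20000, eps=1e-4):
    theta = 0.0
    chain = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.choice(N, size=batch, replace=False)
        # unbiased minibatch estimate of the log-posterior gradient
        grad = (N / batch) * np.sum(data[idx] - theta)
        # Langevin step: half-gradient drift plus injected noise
        theta += 0.5 * eps * grad + np.sqrt(eps) * rng.normal()
        chain[i] = theta
    return chain

chain = sgld()
burn = chain[5000:]
```

With a fixed step size the samples are only approximate: the minibatch gradient noise slightly inflates the stationary variance, which is the usual SGLD trade-off between exactness and per-iteration cost.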

Efficient Gaussian Sampling for Solving Large-Scale Inverse Problems using MCMC


The rest of the paper is organized as follows: Section 2 introduces the global framework of RJ-MCMC and presents a general scheme to sample Gaussian vectors. Section 3 considers a specific application of the previous results, which finally boils down to the proposed RJPO sampler. Section 4 analyses the performance of RJPO compared to T-PO on simple toy problems and presents the adaptive RJPO, which incorporates an automatic control of the truncation level. Finally, in Section 5, an example of a linear inverse problem, unsupervised image resolution enhancement, is presented to illustrate the applicability of the method. These results show the superiority of the RJPO algorithm over the usual Cholesky-factorization-based approaches in terms of computational cost and memory usage.

Bayesian multi-locus pattern selection and computation through reversible jump MCMC


Table 1: Nomenclature for parameter space description. …from a desired probability distribution: the principle of MCMC methods consists in constructing a Markov chain that has the desired distribution as its stationary distribution. Given an ergodic Markov chain with transition kernel p (the probabilities of transition from state to state in search space S), reversibility between states s and s′ holds when π(s) p(s′|s) = π(s′) p(s|s′) (the detailed balance equation). Though reversibility is not necessary to guarantee convergence to π, it is sufficient. The key to MCMC then consists in expressing the transition kernel p(s′|s) as the product of an arbitrary proposal distribution, q, and an associated acceptance distribution, a: p(s′|s) = q(s′|s) a(s, s′). To explain the intuition behind these concepts, suppose, without loss of generality, that for states s and s′, some given transition kernel p verifies π(s) p(s′|s) > π(s′) p(s|s′). Coercing the previous formula towards reversibility is straightforward, introducing two terms a(s, s′), strictly lower than 1, and a(s′, s), equal to 1: π(s) q(s′|s) a(s, s′) = π(s′) q(s|s′) a(s′, s). If the inequality is reversed, then a(s′, s), strictly lower than 1, and a(s, s′), equal to 1, are used instead. Finally, the acceptance probability is calculated as: a(s, s′) = min{1, [π(s′) q(s|s′)] / [π(s) q(s′|s)]}. The arbitrary proposal distribution q and the acceptance probability a are the two ingredients of the Metropolis–Hastings (MH) algorithm (see Algorithm 1).
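This construction can be checked numerically on a tiny discrete state space (Python sketch; the three-state target and uniform proposal are invented for illustration): building p(s′|s) = q(s′|s) a(s, s′) with the MH acceptance probability makes detailed balance hold exactly, and π is then stationary.

```python
import numpy as np

# three states with target pi and a uniform proposal q over the others
pi = np.array([0.2, 0.3, 0.5])
q = np.full((3, 3), 0.5)
np.fill_diagonal(q, 0.0)

# MH transition kernel p(s'|s) = q(s'|s) * a(s, s')
p = np.zeros((3, 3))
for s in range(3):
    for sp in range(3):
        if s != sp:
            a = min(1.0, pi[sp] * q[sp, s] / (pi[s] * q[s, sp]))
            p[s, sp] = q[s, sp] * a
    p[s, s] = 1.0 - p[s].sum()   # rejected mass stays at s

db = pi[:, None] * p             # db[s, s'] = pi(s) p(s'|s)
print(np.allclose(db, db.T))     # True: detailed balance holds
print(np.allclose(pi @ p, pi))   # True: pi is stationary
```

Note how the off-diagonal flows balance pairwise, e.g. π(0) p(1|0) = 0.2 · 0.5 = 0.1 and π(1) p(0|1) = 0.3 · (0.5 · 2/3) = 0.1, exactly the coercion described in the text.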

Use in practice of importance sampling for repeated MCMC for Poisson models


X(k) in the preselected data set (which then contains M + 1 elements); this can be done without any difficulty.

3. Applications. In this section we use both the MCMC (Gibbs sampling in our case) and IS methods to estimate the parameters of three Poisson models. The first is a Poisson model with one parameter (the mean); the second is a Poisson regression on one covariate with two parameters (intercept and covariate association); and the third is a Poisson regression on one covariate with extra-Poisson variability introduced by a Gaussian residual error term, with three parameters (intercept, covariate association and residual variance). The first model can be seen as a toy example with explicit posterior distributions; the second corresponds to a widely used GLM; and the third introduces over-dispersion, which is essential, for example, in medical applications, since association estimates would be biased if extra-Poisson variability were not modelled (see Breslow (1984) for motivation). For each model, K = 101 data sets are simulated for different values of the parameters. All data sets contain n = 20 observations. Vague priors are assigned to the parameters and the posterior values are estimated via MCMC and IS as discussed above. Note that it is essential that MCMC convergence is achieved; therefore several (and not only one) convergence diagnostics have to be checked, as suggested by Brooks and Roberts (1998).

Accelerating Asymptotically Exact MCMC for Computationally Intensive Models via Local Approximations


An important strategy for mitigating this cost is to recognize that the forward model may exhibit regularity in its dependence on the parameters of interest, such that the model outputs may be approximated with fewer samples than are needed to characterize the posterior via MCMC. Replacing the forward model with an approximation or “surrogate” decouples the required number of forward model evaluations from the length of the MCMC chain, and thus can vastly reduce the overall cost of inference (Sacks et al., 1989; Kennedy and O'Hagan, 2001). Existing approaches typically create high-order global approximations for either the forward model outputs or the log-likelihood function using, for example, global polynomials (Marzouk et al., 2007; Marzouk and Xiu, 2009), radial basis functions (Bliznyuk et al., 2012; Joseph, 2012), or Gaussian processes (Sacks et al., 1989; Kennedy and O'Hagan, 2001; Rasmussen, 2003; Santner et al., 2003). As in most of these efforts, we will assume that the forward model is deterministic and available only as a black box, thus limiting ourselves to “non-intrusive” approximation methods that are based on evaluations of the forward model at selected input points. Since we assume that the exact forward model is available…

Information bounds and MCMC parameter estimation for the pile-up model


We conclude that the Gibbs sampler is well adapted to pile-up-affected data, and as it attains the Cramér–Rao bound it might lead to a significant reduction of the acquisition time. Therefore, we compare the MCMC method to the following estimation practice. Data from the pile-up model are obtained at a low laser intensity (λ = 0.05) such that the probability of 2 or more photons per laser pulse is negligible. Then the observed arrival times are considered as independent observations from the exponential mixture distribution given by (31) and the classical EM algorithm is applied. Repeated simulations provide estimates of the bias and the variance of the estimators for a two-component model and various numbers of observations. For the same two-component model we simulated data using the laser intensity λ_opt =

Stochastic thermodynamic integration: efficient Bayesian model selection via stochastic gradient MCMC


…Σ_{n=1}^{N} log p(x_n | θ^{(t,k)})   (8)

Here, θ^{(t,k)} denotes samples drawn from p(θ|t).

3. STOCHASTIC THERMODYNAMIC INTEGRATION. Even though MCMC inference has been made much more efficient with the incorporation of stochastic gradients, marginal likelihood estimation methods that are based on MCMC still suffer from high computational complexity, since they typically require the likelihood to be computed on the whole dataset for each sample (see Eq. 8).


Bayesian Pursuit Algorithms


A. The Uniform Case. We first consider the case where all the atoms have the same probability of being active, i.e., p_i = p ∀i. For each experiment, the data vector y is generated according to model (3) with σ_n² = 10⁻⁴ and σ_x² = 1. In Fig. 1 we represent the MSE, the probability of a wrong decision on the elements of the support, and the average running time achieved by different sparse-representation algorithms. Each point of simulation corresponds to a fixed number of non-zero coefficients, say K, and, given this number, the positions of the non-zero coefficients are drawn uniformly at random for each observation. We set N = 154, M = 256. In addition to the proposed procedures, we consider several state-of-the-art algorithms: MP [12], OMP [13], StOMP [14], SP [20], IHT [17], HTP [18], Basis Pursuit Denoising (BPD) [10], SBR [7], SOBAP [33] and FBMP [28]. The stopping criterion used for MP and OMP is based on the norm of the residual: the recursions are stopped as soon as the norm of the residual drops below √(Nσ_n²).
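The residual-based stopping rule for OMP can be sketched as follows (Python; the smaller dimensions and random dictionary are an invented toy, not the paper's N = 154, M = 256 setup): the greedy recursion stops once the residual norm drops below √(Nσ_n²), the expected norm of the noise.

```python
import numpy as np

rng = np.random.default_rng(6)

# toy sparse model y = D x + noise with a K-sparse x
N, M, K = 64, 128, 4
sigma2 = 1e-4
D = rng.normal(size=(N, M)) / np.sqrt(N)       # ~unit-norm columns
x_true = np.zeros(M)
support = rng.choice(M, size=K, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], K) * (0.5 + rng.uniform(size=K))
y = D @ x_true + rng.normal(scale=np.sqrt(sigma2), size=N)

def omp(D, y, tol):
    """Orthogonal Matching Pursuit stopped on the residual norm."""
    residual, S = y.copy(), []
    while np.linalg.norm(residual) > tol and len(S) < len(y):
        # pick the atom most correlated with the current residual
        S.append(int(np.argmax(np.abs(D.T @ residual))))
        # re-fit the coefficients by least squares on the support
        coef, *_ = np.linalg.lstsq(D[:, S], y, rcond=None)
        residual = y - D[:, S] @ coef
    x = np.zeros(D.shape[1])
    x[S] = coef
    return x

x_hat = omp(D, y, tol=np.sqrt(N * sigma2))
```

Because the least-squares re-fit makes the residual orthogonal to every selected atom, no atom is chosen twice and the loop terminates; with the noise-matched threshold the recovered support and coefficients are close to the ground truth.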
