# Markov chain Monte Carlo (MCMC) algorithms

### Parallelized Stochastic Gradient Markov Chain Monte Carlo Algorithms for Non-Negative Matrix Factorization

(W*, H*) = arg max_{W,H} log p(W, H | V). (2)

The point estimates are obtained via optimization methods applied to the posterior density Eq. 1. The optimization procedures used to solve Eq. 2 have various theoretical guarantees which hold under appropriate conditions on the prior and likelihood functions [1, 5–8]. On the other hand, we might be interested in other quantities based on the posterior distribution, such as the moments or the normalizing constants. These quantities are useful in various applications such as model selection (i.e. estimating the 'rank' K of the model) or estimating the Bayesian predictive densities that would be useful for active learning. Markov Chain Monte Carlo (MCMC) algorithms, which aim to generate samples from the posterior distribution of interest, are one of the most popular approaches for estimating these quantities. However, these methods have received less attention, mainly due to their computational complexity and rather slow convergence of the standard methods, e.g. the Gibbs sampler.
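As a concrete illustration of the sampling approach the passage contrasts with point estimation, here is a minimal random-walk Metropolis-Hastings sampler on a stand-in one-dimensional target (the N(2, 0.5²) "posterior", step size and iteration counts are all illustrative assumptions, not the NMF model of the paper); posterior moments are then read off the samples:

```python
import numpy as np

rng = np.random.default_rng(10)

def log_post(x):
    # unnormalized log-posterior; here a stand-in N(2, 0.5^2) density
    return -0.5 * ((x - 2.0) / 0.5) ** 2

def metropolis_hastings(n_iter=50_000, step=0.5, rng=rng):
    """Random-walk Metropolis-Hastings: generates samples whose empirical
    distribution approximates the posterior, from which moments and other
    posterior quantities can be estimated."""
    x, chain = 0.0, np.empty(n_iter)
    for k in range(n_iter):
        prop = x + step * rng.normal()
        # accept with probability min(1, pi(prop)/pi(x))
        if np.log(rng.random()) < log_post(prop) - log_post(x):
            x = prop
        chain[k] = x
    return chain

chain = metropolis_hastings()
# posterior moments estimated from the samples (after burn-in):
post_mean, post_var = chain[5000:].mean(), chain[5000:].var()
```

The same chain of samples can also feed model-selection or predictive-density estimates, which is the motivation given above.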
### On variable splitting for Markov chain Monte Carlo

{maxime.vono,nicolas.dobigeon}@irit.fr, pierre.chainais@centralelille.fr Abstract—Variable splitting is an old but widely used technique which aims at dividing an initial complicated optimization problem into simpler sub-problems. In this work, we take inspiration from this variable splitting idea in order to build efficient Markov chain Monte Carlo (MCMC) algorithms. Starting from an initial complex target distribution, auxiliary variables are introduced such that the marginal distribution of interest matches the initial one asymptotically. In addition to having theoretical guarantees, the benefits of such an asymptotically exact data augmentation (AXDA) are fourfold: (i) easier-to-sample full conditional distributions, (ii) the possibility to embed and accelerate state-of-the-art MCMC approaches, (iii) the possibility to distribute the inference and (iv) to respect data privacy. The proposed approach is illustrated on classical image processing and statistical learning problems.
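The splitting idea above can be sketched on a toy one-dimensional target (this is an illustrative assumption, not the image-processing setting of the paper): for π(x) ∝ exp(−x²/2 − x²/2), introduce an auxiliary z coupled to x through a Gaussian term of width ρ, so that both full conditionals become easy-to-sample Gaussians, and the x-marginal approaches π as ρ → 0:

```python
import numpy as np

rng = np.random.default_rng(11)

def axda_gibbs(rho=0.3, n_iter=50_000, rng=rng):
    """Gibbs sampler on the AXDA-style augmented target
        pi_rho(x, z) ∝ exp(-x^2/2 - z^2/2 - (x - z)^2 / (2*rho^2)),
    whose x-marginal approaches pi(x) ∝ exp(-x^2) as rho → 0.
    Both full conditionals are Gaussian, hence easy to sample."""
    a = 1.0 / rho**2
    x, z = 0.0, 0.0
    xs = np.empty(n_iter)
    for k in range(n_iter):
        # x | z ~ N(a*z / (1+a), 1/(1+a)), and symmetrically for z | x
        x = rng.normal(a * z / (1 + a), np.sqrt(1.0 / (1 + a)))
        z = rng.normal(a * x / (1 + a), np.sqrt(1.0 / (1 + a)))
        xs[k] = x
    return xs

xs = axda_gibbs()
```

The trade-off is visible even in this toy: smaller ρ means a more exact marginal but stronger x–z coupling and slower Gibbs mixing, which is where the acceleration schemes mentioned in the abstract come in.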
### Estimating the granularity coefficient of a Potts-Markov random field within a Markov chain Monte Carlo algorithm

B. Approximation of C(β)

Another possibility is to approximate the normalizing constant C(β). Existing approximations can be classified into three categories: based on analytical developments, on sampling strategies, or on a combination of both. A survey of the state-of-the-art approximation methods up to 2004 has been presented in . The methods considered in  are the mean field, the tree-structured mean field and the Bethe energy (loopy Metropolis) approximations, as well as two sampling strategies based on Langevin MCMC algorithms. It is reported in  that mean field type approximations, which have been successfully used within EM ,  and stochastic EM algorithms , generally perform poorly in MCMC algorithms. More recently, exact recursive expressions have been proposed to compute C(β) analytically . However, to our knowledge, these recursive methods have only been successfully applied to small problems (i.e., for MRFs of size smaller than 40×40) with reduced spatial correlation β < 0.5. Another sampling-based approximation consists in estimating C(β) by Monte Carlo integration [27, Ch. 3], at the expense of very substantial computation and possibly biased estimations (bias arises from the estimation error of C(β)). Better results can be obtained by using importance sampling or path sampling methods . These methods have been applied to the estimation of β within an MCMC image processing algorithm in . Although more precise than Monte Carlo integration, approximating C(β) by importance sampling or path sampling still requires substantial computation and is generally unfeasible for large fields. This has motivated recent works that reduce computation by combining importance sampling with analytical approximations. More precisely, approximation methods that combine importance sampling with extrapolation schemes have been proposed for the Ising model (i.e., a 2-state Potts model) in  and for the 3-state Potts model in .
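The importance-sampling estimation of a normalizing constant discussed above can be sketched in one dimension (the Gaussian target and proposal are illustrative stand-ins; for the Potts constant C(β) the same identity holds but the proposal design is exactly what makes the problem hard):

```python
import numpy as np

rng = np.random.default_rng(9)

def p_tilde(x):
    # unnormalized target density; its true normalizer is Z = sqrt(2*pi)
    return np.exp(-0.5 * x**2)

def importance_Z(n=100_000, scale=1.5, rng=rng):
    """Importance-sampling estimate of a normalizing constant:
    Z = E_q[ p_tilde(X) / q(X) ] for X ~ q, here q = N(0, scale^2)."""
    x = rng.normal(0.0, scale, size=n)
    q = np.exp(-0.5 * (x / scale) ** 2) / (scale * np.sqrt(2 * np.pi))
    return np.mean(p_tilde(x) / q)

Z_hat = importance_Z()
```

The estimator is unbiased, but its variance blows up when q has thinner tails than the target, which is why large correlated fields defeat this approach in practice.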
However, we have found that this extrapolation technique introduces significant bias .

C. Auxiliary Variables and Perfect Sampling
### A History of Markov Chain Monte Carlo--Subjective Recollections from Incomplete Data--

As mentioned above, the realization that Markov chains could be used in a wide variety of situations only came (to mainstream statisticians) with Gelfand and Smith (1990), despite earlier publications in the statistical literature like Hastings (1970), Geman and Geman (1984) and Tanner and Wong (1987). Several reasons can be advanced: lack of computing machinery (think of the computers of 1970!), lack of background on Markov chains, lack of trust in the practicality of the method... It thus required visionary researchers like Alan Gelfand and Adrian Smith to spread the good news, backed up with a collection of papers that demonstrated, through a series of applications, that the method was easy to understand, easy to implement and practical (Gelfand et al. 1990, 1992, Smith and Gelfand 1992, Wakefield et al. 1994). The rapid emergence of the dedicated BUGS (Bayesian inference Using Gibbs Sampling) software as early as 1991 (when a paper on BUGS was presented at the Valencia meeting) was another compelling argument for adopting (at large) MCMC algorithms.
### On Markov chain Monte Carlo methods for tall data

After briefly reviewing the limitations of MCMC for tall data, introducing our notation and two running examples in Section 2, we first review the divide-and-conquer literature in Section 3. The rest of the paper is devoted to subsampling approaches. In Section 4, we discuss pseudo-marginal MH algorithms. These approaches are exact in the sense that they target the right posterior distribution. In Section 5, we review other exact approaches, before relaxing exactness in Section 6. Throughout, we focus on the assumptions and guarantees of each method. We also illustrate key methods on two running examples. Finally, in Section 7, we improve over our so-called confidence sampler in (Bardenet et al., 2014), which samples from a controlled approximation of the target. We demonstrate in Section 8 that these improvements yield significant reductions in computational complexity at each iteration. In particular, our improved confidence sampler can break the O(n) barrier on the number of individual data-point likelihood evaluations per iteration in favourable cases. Its main limitation is the requirement for cheap-to-evaluate proxies for the log-likelihood, with a known error. We provide examples of such proxies relying on Taylor expansions.
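To make the subsampling idea concrete, here is the naive (biased) variant that Section 6-style approximate approaches start from, not the authors' confidence sampler: the full log-likelihood ratio in a Metropolis-Hastings step is replaced by a rescaled estimate from a random minibatch (the Gaussian model, data sizes and step size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic "tall" dataset (hypothetical): n observations from N(0.5, 1)
n = 10_000
data = rng.normal(0.5, 1.0, size=n)

def subsampled_loglik_ratio(theta_new, theta_old, m, rng):
    """Estimate the full log-likelihood ratio from a random subsample of
    size m, rescaled by n/m. Unbiased for the ratio, but plugging it into
    the MH acceptance step makes the chain only approximately correct."""
    x = data[rng.choice(n, size=m, replace=False)]
    ll = -0.5 * ((x - theta_new) ** 2 - (x - theta_old) ** 2)
    return (n / m) * ll.sum()

def approx_mh(n_iter=2000, m=500, step=0.01, rng=rng):
    theta, chain = 0.0, np.empty(n_iter)
    for k in range(n_iter):
        prop = theta + step * rng.normal()
        # flat prior: accept using the estimated likelihood ratio alone
        if np.log(rng.random()) < subsampled_loglik_ratio(prop, theta, m, rng):
            theta = prop
        chain[k] = theta
    return chain

chain = approx_mh()
```

The n/m rescaling makes the noise in the ratio large, which inflates the invariant distribution; controlling that error with concentration bounds is precisely the contribution of the confidence sampler discussed above.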
### Towards the parallelization of Reversible Jump Markov Chain Monte Carlo algorithms for vision problems

### Acceleration Strategies of Markov Chain Monte Carlo for Bayesian Computation

MCMC samplers are based on two original versions: the Bouncy Particle Sampler (BPS) of Bouchard-Côté et al. (2018) and the Zigzag Sampler of Bierkens et al. (2016). Bouchard-Côté et al. (2018) exhibits that BPS can provide state-of-the-art performance compared with the reference HMC for high-dimensional distributions, while Bierkens et al. (2016) shows that the PDMP-based sampler is easier to scale in big-data settings without introducing bias, and Bierkens et al. (2018) considers the application of PDMP for distributions on restricted domains. Fearnhead et al. (2016) unifies BPS and the Zigzag sampler in the framework of PDMP and chooses the process velocity, at event times, over the unit sphere, based on the inner product between this velocity and the gradient of the potential function. (This perspective relates to the transition dynamics used in our paper.) To overcome the main difficulty in PDMP-based samplers, which is the simulation of time-inhomogeneous Poisson processes, Sherlock and Thiery (2017) and Vanetti et al. (2017) resort to a discretisation of such continuous-time samplers. Furthermore, pre-conditioning the velocity set is shown to accelerate the algorithms, as shown by Pakman et al. (2016). The outline of this chapter is as follows. Section 2 introduces the necessary background of PDMP-based MCMC samplers, the techniques used in its implementation, and two specific samplers, BPS and the Zigzag sampler. In Section 3, we describe the methodology behind the coordinate sampler, provide some theoretical validation along with a proof of geometric ergodicity, obtained under quite mild conditions, and we compare this proposal with the Zigzag sampler in an informal analysis. Section 4 further compares the efficiency of both approaches on banana-shaped distributions, multivariate Gaussian distributions and a Bayesian logistic model, with efficiency measured by effective sample size.
Section 5 concludes by pointing out further research directions about this special MCMC sampler.
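A one-dimensional Zigzag sampler makes the PDMP machinery above tangible. For the standard Gaussian potential U(x) = x²/2 the event rate along the trajectory is max(0, θx + s), so the time-inhomogeneous Poisson event times can be sampled exactly by inverting the integrated rate; this is a sketch of the mechanism, not the multivariate samplers studied in the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

def next_event_time(x, theta, rng):
    """First event of the Poisson process with rate max(0, a + s), a = theta*x,
    sampled by inverting the integrated rate (standard Gaussian potential)."""
    a = theta * x
    e = rng.exponential()
    if a >= 0.0:
        return -a + np.sqrt(a * a + 2.0 * e)
    return -a + np.sqrt(2.0 * e)

def zigzag_gaussian(T=20_000.0, dt=0.5, rng=rng):
    """1D Zigzag sampler targeting N(0, 1): the particle moves with velocity
    theta in {-1, +1} and reverses it at the random event times."""
    x, theta, t = 0.0, 1.0, 0.0
    tau = next_event_time(x, theta, rng)
    samples, t_out = [], dt
    while t_out < T:
        if t_out < t + tau:
            # record the position along the current deterministic segment
            samples.append(x + theta * (t_out - t))
            t_out += dt
        else:
            # jump to the event: advance the position, reverse the velocity
            x += theta * tau
            t += tau
            theta = -theta
            tau = next_event_time(x, theta, rng)
    return np.array(samples)

samples = zigzag_gaussian()
```

The motion is ballistic between events, which is the source of the non-diffusive behaviour these samplers are prized for.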
### Clock Monte Carlo methods

PACS numbers: 02.70.Tt, 05.10.Ln, 05.10.-a, 64.60.De, 75.10.Hk, 75.10.Nr

Keywords: Monte Carlo methods; Metropolis algorithm; factorized Metropolis filter; long-range interactions; spin glasses

Markov-chain Monte Carlo methods (MCMC) are powerful tools in many branches of science and engineering [1–8]. For instance, MCMC plays a crucial role in the recent success of AlphaGo , and appears as a keystone of the potential next deep learning revolution [10, 11]. To estimate high-dimensional integrals, MCMC generates a chain of random configurations, called samples. The stationary distribution is typically a Boltzmann distribution and the successive moves depend on the induced energy changes. Despite a now long history, the most successful and influential MCMC algorithm remains the founding Metropolis algorithm , for its generality and ease of use, ranked as one of the top 10 algorithms of the 20th century .
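The Boltzmann-targeting moves described above are easiest to see in the single-spin-flip Metropolis algorithm for a small Ising chain (the 1D chain, coupling and temperature are illustrative choices, not the long-range models the paper addresses):

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_ising_1d(n=100, beta=0.5, sweeps=1000, rng=rng):
    """Single-spin-flip Metropolis for a 1D Ising chain (periodic boundary).

    Stationary distribution is the Boltzmann distribution
    pi(s) ∝ exp(-beta * E(s)),  E(s) = -sum_i s_i s_{i+1}.
    Returns the magnetization after each sweep."""
    s = rng.choice([-1, 1], size=n)
    mags = []
    for _ in range(sweeps):
        for _ in range(n):
            i = rng.integers(n)
            # energy change induced by flipping spin i
            dE = 2.0 * s[i] * (s[(i - 1) % n] + s[(i + 1) % n])
            # Metropolis filter: accept with probability min(1, exp(-beta*dE))
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                s[i] = -s[i]
        mags.append(s.mean())
    return np.array(mags)

m = metropolis_ising_1d()
```

Each proposed move costs one local energy-change evaluation; for long-range interactions that cost grows with system size, which is the bottleneck clock Monte Carlo methods attack.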
### Optimized Population Monte Carlo

The literature of AIS is vast, including methods based on sequential moment matching such as AMIS , that comprises a Rao-Blackwellization of the temporal estimators, and APIS that incorporates multiple proposals . Other recent methods have introduced Markov chain Monte Carlo (MCMC) mechanisms for the adaptation of the IS proposals , , . The family of population Monte Carlo (PMC) methods also falls within AIS. Its key feature is arguably the use of resampling steps in the adaptation of the location parameters of the proposals , . The seminal paper  introduced the PMC framework. Since then, other PMC algorithms have been proposed, increasing the resulting performance by the incorporation of stochastic expectation-maximization mechanisms , non-linear transformation of the importance weights , or better weighting and resampling schemes . The method we propose in this paper falls within the PMC framework.
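The resampling-based adaptation of proposal locations that characterizes PMC can be sketched as follows (the one-dimensional N(3, 1) target and all tuning values are illustrative assumptions, not the optimized scheme of the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def log_target(x):
    # unnormalized log-density of the (hypothetical) target, here N(3, 1)
    return -0.5 * (x - 3.0) ** 2

def pmc(n_prop=500, n_iter=30, sigma=1.0, rng=rng):
    """Minimal population Monte Carlo: propose, weight, resample locations."""
    mu = rng.normal(0.0, 5.0, size=n_prop)   # initial proposal locations
    for _ in range(n_iter):
        x = rng.normal(mu, sigma)            # one draw per proposal
        # log of pi(x)/q(x) up to constants, q_i = N(mu_i, sigma^2)
        logw = log_target(x) + 0.5 * ((x - mu) / sigma) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # multinomial resampling adapts the location parameters
        mu = x[rng.choice(n_prop, size=n_prop, p=w)]
    return x, w

x, w = pmc()
est = np.sum(w * x)  # self-normalized IS estimate of E[X]
```

Resampling pulls the proposal locations toward high-weight regions, so later iterations spend their samples where the target lives; the refinements cited above mostly change how the weights and the resampling are computed.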
### Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo

Abstract Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) algorithms have become increasingly popular for Bayesian inference in large-scale applications. Even though these methods have proved useful in several scenarios, their performance is often limited by their bias. In this study, we propose a novel sampling algorithm that aims to reduce the bias of SG-MCMC while keeping the variance at a reasonable level. Our approach is based on a numerical sequence acceleration method, namely the Richardson-Romberg extrapolation, which simply boils down to running almost the same SG-MCMC algorithm twice in parallel with different step sizes. We illustrate our framework on the popular Stochastic Gradient Langevin Dynamics (SGLD) algorithm and propose a novel SG-MCMC algorithm referred to as Stochastic Gradient Richardson-Romberg Langevin Dynamics (SGRRLD). We provide formal theoretical analysis and show that SGRRLD is asymptotically consistent, satisfies a central limit theorem, and that its non-asymptotic bias and mean squared error can be bounded. Our results show that SGRRLD attains higher rates of convergence than SGLD both in finite time and asymptotically, and it achieves the theoretical accuracy of the methods that are based on higher-order integrators. We support our findings using both synthetic and real data experiments.
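The two-step-size idea can be sketched on a toy target (a minimal sketch, assuming a standard Gaussian target with artificially noisy gradients; the paper's SGRRLD additionally correlates the two chains to control the variance). SGLD's stationary variance is biased by O(h) in the step size, and combining a step-h run with a step-h/2 run cancels the first-order term:

```python
import numpy as np

rng = np.random.default_rng(3)

def sgld_chain(h, n_steps, grad_noise=1.0, rng=rng):
    """SGLD for the standard Gaussian target U(x) = x^2/2; the exact
    gradient x is perturbed by Gaussian noise to mimic minibatching."""
    g_eps = grad_noise * rng.normal(size=n_steps)   # gradient-noise draws
    xi = rng.normal(size=n_steps)                   # injected Langevin noise
    x, xs = 0.0, np.empty(n_steps)
    for k in range(n_steps):
        g = x + g_eps[k]                            # stochastic gradient
        x = x - 0.5 * h * g + np.sqrt(h) * xi[k]
        xs[k] = x
    return xs

h = 0.5
v_coarse = sgld_chain(h, 200_000).var()        # O(h)-biased variance estimate
v_fine = sgld_chain(h / 2, 400_000).var()      # half the step, smaller bias
v_rr = 2.0 * v_fine - v_coarse                 # Richardson-Romberg combination
```

Here the extrapolated estimate `v_rr` sits much closer to the true variance 1 than either raw chain, at roughly triple the gradient budget.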
### High dimensional Markov chain Monte Carlo methods: theory, methods and applications

As mentioned above, the rate of convergence obtained for the total variation distance is practically useless to analyze MCMC algorithms when the dimension of the state space becomes large. Despite the fact that the bounds are explicit, these results say in practice little more than 'the chain converges for large n'; see [JH01] and [RR04]. On the contrary, as observed in several recent works in this direction (see [Cot+13] and the references therein), the Wasserstein bounds are much more informative, at least for appropriately designed MCMC algorithms. One of the keys to the success of our approach for the Wasserstein distance (under, of course, appropriate assumptions on the transition kernel and a particular choice of the distance) is the ability to couple MCMC algorithms "naturally" by simply running two versions of the algorithm with the same random numbers. This is in contrast with the "general" coupling construction used for total variation convergence, where an attempt to make the two components equal is made only when the two versions of the chain meet in a coupling set (the probability of meeting in a coupling set is typically not large, and the probability of successfully coupling the chains is on top of this vanishingly small in large dimension). We provide an illustration of this fact on a version of the Metropolis Adjusted Langevin Algorithm using an exponential integrator (EI-MALA) originally proposed by [Ebe14].
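The common-random-numbers coupling described above can be seen in miniature with two unadjusted Langevin (ULA) chains on the quadratic potential U(x) = x²/2 (a toy stand-in, not the EI-MALA of the text): driving both chains with the same Gaussian draws makes the noise cancel in their difference, which then contracts geometrically:

```python
import numpy as np

rng = np.random.default_rng(4)

def coupled_ula(x0, y0, h=0.1, n_steps=200, rng=rng):
    """Two ULA chains for U(x) = x^2/2 driven by the same Gaussian noise.

    With shared noise, |x_k - y_k| = (1 - h)^k |x_0 - y_0|: the coupling
    distance contracts deterministically at a geometric rate."""
    x, y, dist = x0, y0, [abs(x0 - y0)]
    for _ in range(n_steps):
        xi = rng.normal()                      # shared random number
        x = x - h * x + np.sqrt(2 * h) * xi
        y = y - h * y + np.sqrt(2 * h) * xi
        dist.append(abs(x - y))
    return np.array(dist)

d = coupled_ula(10.0, -10.0)
```

No meeting event is required: the Wasserstein distance between the two laws is bounded by this pathwise distance at every step, which is why the bound stays informative in high dimension.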
### Forward Event-Chain Monte Carlo: Fast sampling by randomness control in irreversible Markov chains

Irreversible or non-reversible MCMC samplers have attracted a lot of attention over the last two decades. They break the detailed-balance condition while still obeying the global-balance one and leaving π invariant and, by doing so, have often been shown to converge faster than their reversible counterparts. It is however still challenging to develop a construction methodology for irreversible kernels that displays the generality of reversible schemes such as the Metropolis-Hastings algorithm while improving the convergence. Indeed, non-reversible MCMC algorithms can be directly built from the composition of reversible MCMC kernels (e.g. deterministic-scan Gibbs samplers, Gelfand and Smith (1990a)), but it is well known that such a strategy can be relatively inefficient, in particular since it does not prevent diffusive behavior and backtracking in the resulting process. To circumvent this issue, popular solutions consist in extending the state space by introducing an additional variable and targeting the extended probability distribution
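A minimal illustration of this lifting idea (a hypothetical toy, not the event-chain construction of the paper) is the non-reversible walk on a ring: the state space is extended with a direction variable, and the chain keeps moving in that direction instead of backtracking, while the uniform law on the extended space stays invariant:

```python
import numpy as np

rng = np.random.default_rng(5)

def lifted_walk(n_states=50, n_steps=5000, flip_p=0.02, rng=rng):
    """Non-reversible 'lifted' walk on a ring, targeting the uniform law.

    The state space is extended with a direction variable v in {-1, +1};
    the chain keeps moving in direction v and only reverses it with small
    probability, suppressing the diffusive backtracking of a plain
    reversible random walk. Returns the empirical visit frequencies."""
    x, v = 0, 1
    visits = np.zeros(n_states)
    for _ in range(n_steps):
        if rng.random() < flip_p:
            v = -v                      # rare reversal of the direction
        x = (x + v) % n_states          # persistent, ballistic move
        visits[x] += 1
    return visits / n_steps

freq = lifted_walk()
```

The persistent motion covers the ring in O(N) steps rather than the O(N²) of a diffusive reversible walk, which is the payoff the passage alludes to.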
### Efficient Bayesian Computation by Proximal Markov Chain Monte Carlo: When Langevin Meets Moreau

π(x | y) ∝ exp(−‖y − Hx‖²/(2σ²) − α‖∇_d x‖₁). (19)

Image processing methods using (19) are almost exclusively based on MAP estimates of x that can be efficiently computed using proximal optimisation algorithms [Green et al., 2015]. Here we consider the problem of computing credibility regions for x, which we use to assess the confidence in the restored image. Precisely, we use MYULA to compute approximately the marginal 90% credibility interval for each image pixel, where we note that (19) is log-concave and admits the decomposition U(x) = f(x) + g(x), with f(x) = ‖y − Hx‖²/(2σ²) convex and Lipschitz
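The MYULA mechanism can be sketched in one dimension (a scalar toy with U(x) = x²/2 + α|x|, not the image model (19)): the non-smooth part g is replaced by its Moreau-Yosida envelope, whose gradient is expressed through the proximal operator of g, here a soft-threshold:

```python
import numpy as np

rng = np.random.default_rng(6)

def soft_threshold(x, t):
    # proximal operator of t * |.|
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def myula(alpha=1.0, gamma=0.01, lam=0.1, n_steps=100_000, rng=rng):
    """MYULA for pi(x) ∝ exp(-x^2/2 - alpha*|x|).

    Smooth part f(x) = x^2/2; non-smooth part g(x) = alpha*|x| handled via
    its Moreau-Yosida envelope, whose gradient is
    (x - prox_{lam*g}(x)) / lam."""
    x, xs = 0.0, np.empty(n_steps)
    noise = rng.normal(size=n_steps)
    for k in range(n_steps):
        grad = x + (x - soft_threshold(x, lam * alpha)) / lam
        x = x - gamma * grad + np.sqrt(2 * gamma) * noise[k]
        xs[k] = x
    return xs

xs = myula()
```

Pixel-wise credibility intervals of the kind described above would be read off the empirical quantiles of such chains, e.g. `np.quantile(xs, [0.05, 0.95])`.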
### Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers

unordered but they can be sorted again. Nonetheless, we did not impose ordering when simulating the parameters with fixed k's and, more importantly, did not restrict ourselves to implement combine moves only on adjacent components as in Richardson and Green (1997). Hence, the above item that we find most important is B.; whether the missing data z is kept track of in all moves or not. It would indeed be interesting to compare the performance of two algorithms, in discrete or continuous time, that are identical except for this aspect. (We recall that Robert et al. (2000) did resort to completion in their implementation of RJMCMC.)
### Population Monte Carlo

that a Metropolis-Hastings scheme based on the same proposal does not work well, while a PMC algorithm produces correct answers. The PMC framework thus allows for a much easier construction of adaptive schemes, i.e. of proposals that correct themselves against past performances, than in MCMC setups. Indeed, while adaptive importance sampling strategies have already been considered in the pre-MCMC era, as in, e.g., Oh and Berger (1992, 1993), the MCMC environment is much harsher for adaptive algorithms, because the adaptivity cancels the Markovian nature of the sequence and thus calls for more elaborate convergence studies to establish ergodicity. See, e.g., Andrieu and Robert (2001) and Haario et al. (1999, 2001) for recent developments in this area. For PMC methods, ergodicity is not an issue since the validity is obtained via importance sampling justifications.
### Nested Monte-Carlo Search

6 Conclusion

Nested Monte-Carlo Search can be used for problems that do not have good heuristics to guide the search. The use of random games implies that nested calls are not guaranteed to improve the search. On simple abstract problems, a theoretical analysis of nested Monte-Carlo search was presented. It was also shown that memorizing the best sequence greatly improves the mean result of the search. Experiments on three different games gave very good results, finding a new world record of 80 moves at Morpion Solitaire, improving on previous algorithms at SameGame, and being more than 200,000 times faster than depth-first search at 16x16 Sudoku modeled as a Constraint Satisfaction Problem.
### Monte Carlo Beam Search

Tristan Cazenave. Abstract—Monte-Carlo Tree Search is state of the art for multiple games and for solving puzzles such as Morpion Solitaire. Nested Monte-Carlo Search is a Monte-Carlo Tree Search algorithm that works well for solving puzzles. We propose to enhance Nested Monte-Carlo Search with Beam Search. We test the algorithm on Morpion Solitaire. Thanks to beam search, our program has been able to match the record score of 82 moves. Monte-Carlo Beam Search achieves better scores in less time than Nested Monte-Carlo Search alone.
### Monte-Carlo Kakuro

nested calls [7, 1]. These algorithms use a base heuristic which is improved with nested calls, whereas Nested Monte-Carlo Search uses random moves at the base level instead. Nested Monte-Carlo Search is an algorithm that uses no domain-specific knowledge and which is widely applicable. However, adding domain-specific knowledge will probably improve it; for example, at Kakuro, pruning more values using stronger consistency checks would certainly improve both the Forward Checking algorithm and the Nested Monte-Carlo search algorithm.

### Joint Bayesian model selection and parameter estimation of the generalized extreme value model with covariates using birth-death Markov chain Monte Carlo

4. Reversible Jump MCMC Algorithm

4.1. General MCMC Method

[35] The Markov chain Monte Carlo (MCMC) method is a powerful tool for Bayesian estimation. MCMC sampling was first introduced by Metropolis et al.  to integrate over high-dimensional probability distributions to make inference about model parameters. In Bayesian inference we are interested in finding the joint posterior distribution of the parameters. The difficulty is that the posterior distribution is typically found by multidimensional integration, which is only feasible for small-scale problems, and hence many problems become intractable. When the full-conditional