

The empirical average of the importance weights,

$$\frac{1}{N}\sum_{n=1}^{N} w(X_n), \qquad w(x) = \frac{f(x)}{g(x)},$$

is a convergent estimator of the unknown normalising constant (for non-asymptotic behaviour see Cappé, Moulines et Rydén, 2005) and hence the self-normalised estimator

$$\frac{\sum_{n=1}^{N} w(X_n)\, h(X_n)}{\sum_{n=1}^{N} w(X_n)}$$

converges to the desired quantity. Note that there are cases where this self-normalised estimator is to be preferred in either case under bias-variance trade-off arguments (Casella et Robert, 1998).
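To fix ideas, here is a minimal Python sketch of these two estimators; the particular target (an unnormalised Gaussian), proposal and test function are purely illustrative assumptions, not choices made in this work.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (an assumption): unnormalised target f proportional to
# a standard normal density, and a wider Gaussian proposal g.
def f_unnorm(x):
    return np.exp(-0.5 * x**2)             # normalising constant is sqrt(2*pi)

def g_pdf(x, scale=2.0):
    return np.exp(-0.5 * (x / scale)**2) / (scale * np.sqrt(2.0 * np.pi))

def h(x):
    return x**2                            # test function: E[h(X)] = 1 here

N = 100_000
x = rng.normal(0.0, 2.0, size=N)           # independent draws from g
w = f_unnorm(x) / g_pdf(x)                 # importance weights w(x) = f(x)/g(x)

z_hat = w.mean()                           # estimates the normalising constant
i_hat = np.sum(w * h(x)) / np.sum(w)       # self-normalised estimator

print(z_hat, i_hat)                        # roughly 2.507 (= sqrt(2*pi)) and 1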

All of the above nonetheless require the ability to sample independently (for now) from either f or g; the next Section details some of the available methods.

We conclude by noting that even though the focus of this work is on Bayesian statistics and on computing integrals with respect to probability measures, simulation-based inference is not limited to this domain: optimisation methods that rely on simulated samples are also available and can be used to compute, for example, maximum likelihood estimators (Robert et Casella, 2004).

1.2.2 Independent Random Sampling

In order to make use of the results introduced in the previous Section we need to produce a sequence of random variables distributed according to some distribution F. In all the following we will refer to the distribution we want to draw samples from as the target distribution; this definition will acquire a more formal and deeper meaning in Section 1.2.3.

The first and only, but omnipresent, generator that we can find on a computer is the uniform generator. Even though the random variables produced by a machine are only pseudo-random, in that they are generated by a deterministic algorithm, we will assume that the available uniform number generator is an implementation of a modern algorithm like the Mersenne Twister (Matsumoto et Nishimura, 1998) or one of its derivatives, and hence that it passes a collection of tests like the DieHarder battery (Brown, Eddelbuettel et Bauer, 2007). For a more in-depth discussion on the matter the reader is referred to Robert et Casella, 2004; Devroye, 1986; Matsumoto et Nishimura, 1998 and references therein.

Sampling from the uniform distribution U(0,1) should thus be quite straightforward. Since our target density f is generally not uniform, though, we have a wide variety of techniques that can help us produce independent samples from almost any distribution.

The first basic method is called Generic Inversion and relies on inverting the cumulative distribution function F of the target distribution. If in fact we define the generalised inverse $F^{-}(u) = \inf_x\{F(x) \geq u\}$, it is easily proven that

$$\text{if } U \sim \mathcal{U}[0,1], \text{ then } F^{-}(U) \sim F. \tag{1.10}$$

Even if (1.10) is very general and in theory requires just one uniform random sample per sample from F, it is rarely used in practice, as it is less efficient than other methods and requires explicit knowledge of the cdf.
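As a concrete illustration (the Exponential target is our own choice), the cdf F(x) = 1 − exp(−λx) of the Exponential(λ) distribution can be inverted in closed form, so (1.10) yields a one-line sampler:

import numpy as np

rng = np.random.default_rng(0)

def exponential_via_inversion(lam, size):
    # Invert F(x) = 1 - exp(-lam * x): F^-(u) = -log(1 - u) / lam.
    u = rng.uniform(size=size)             # U ~ U[0, 1)
    return -np.log(1.0 - u) / lam          # F^-(U) ~ F by (1.10)

samples = exponential_via_inversion(2.0, 100_000)
print(samples.mean())                      # close to 1 / lam = 0.5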

Still relying on the structure of the target distribution are the transformation methods, based on transformations of random variables. If the target distribution can be linked to another distribution that is easier to sample from, we can often take advantage of this connection to draw samples from f. One notable example is the Box–Muller algorithm (Algorithm 1.1), which samples from the normal distribution using, again, uniform variates:

Algorithm 1.1 Box–Muller Algorithm – Sample X1, X2 iid ∼ N(0,1)

1: Sample u1 ∼ U[0,1] and u2 ∼ U[0,1];
2: Set x1 = √(−2 log u1) cos(2π u2);
3: Set x2 = √(−2 log u1) sin(2π u2).
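A direct (vectorised) Python transcription of Algorithm 1.1, as a sketch:

import numpy as np

rng = np.random.default_rng(0)

def box_muller(size):
    # Two independent N(0, 1) arrays from two independent uniform arrays.
    u1 = 1.0 - rng.uniform(size=size)      # in (0, 1], avoids log(0)
    u2 = rng.uniform(size=size)
    r = np.sqrt(-2.0 * np.log(u1))         # radius term
    x1 = r * np.cos(2.0 * np.pi * u2)
    x2 = r * np.sin(2.0 * np.pi * u2)
    return x1, x2

x1, x2 = box_muller(100_000)
print(x1.mean(), x1.std())                 # approximately 0 and 1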

Table 1.1: Some examples of the General Transformation Method (columns: Target Distribution, Transformation, Auxiliary Samples).

A few examples of transformation methods are provided in Table 1.1, but unfortunately not every random variable can be expressed as a transformation of others.
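One classical transformation of the kind Table 1.1 refers to (our own example, chosen for illustration) builds a Gamma(k, 1) variable with integer shape k as the sum of k independent Exponential(1) auxiliary samples:

import numpy as np

rng = np.random.default_rng(0)

def gamma_via_exponentials(k, size):
    # Sum of k independent Exp(1) variables is Gamma(k, 1) for integer k.
    u = rng.uniform(size=(k, size))
    exp_samples = -np.log(1.0 - u)         # Exp(1) via inversion
    return exp_samples.sum(axis=0)

samples = gamma_via_exponentials(3, 100_000)
print(samples.mean(), samples.var())       # both close to k = 3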

As a consequence, we now need to introduce some methods where only a rough knowledge of f's functional form is needed, without exploiting any direct probabilistic feature of our target distribution.

The first of these methods is the Accept-Reject algorithm. Its basic idea, somewhat similarly to the Importance Sampling defined above, is to rely on a simpler (to sample from) distribution G in order to obtain, this time, unweighted independent samples from f. Accept-Reject in its most simple form is based on the Fundamental Theorem of Simulation, a very simple idea that highlights the fact that f(x) can be written as

$$f(x) = \int_{0}^{f(x)} \mathrm{d}u,$$

where our target now appears as the marginal density of a joint uniform distribution (X, U) ∼ U{(x, u) : 0 < u < f(x)}.

The fundamental theorem then reads as follows:

Theorem 1.2.1. Sampling from X ∼ f is equivalent to sampling from

$$(X, U) \sim \mathcal{U}\{(x, u) : 0 < u < f(x)\}$$

and marginalising (discarding the samples) with respect to U.

The idea is that if we can sample from the joint distribution of (X, U), which means sampling uniformly in the set {(x, u) : 0 < u < f(x)}, then the marginal X samples are already distributed according to our target, even though we used f only through pointwise evaluations f(x).

Sampling exact pairs (X, U) is not always easy, nor sometimes feasible at all, so we will instead sample from a bigger set and keep the sampled pairs only if they satisfy the constraint. The target distribution is indeed preserved by this procedure (Robert et Casella, 2004). To exemplify the method, think of a continuous one-dimensional target distribution defined on a compact interval [a, b] and suppose that it is bounded by m, i.e. f(x) ≤ m for all x ∈ [a, b]. We can thus sample from (X, U) ∼ U{(x, u) : a < x < b, 0 < u < m} and keep only the pairs that satisfy the constraint 0 < u < f(x); this is guaranteed to give us marginal samples X ∼ f.
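A minimal Python sketch of this 'box' version, with an illustrative target of our own choosing (the Beta(2, 2) density on [0, 1], bounded by m = 1.5):

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 6.0 * x * (1.0 - x)             # Beta(2, 2) density on [0, 1]

a, b, m = 0.0, 1.0, 1.5                    # f(x) <= m = f(0.5) on [a, b]

def box_reject(n_proposals):
    # Sample uniformly in the box [a, b] x [0, m], keep pairs with u < f(x).
    x = rng.uniform(a, b, size=n_proposals)
    u = rng.uniform(0.0, m, size=n_proposals)
    return x[u < f(x)]                     # accepted x's are marginally ~ f

samples = box_reject(100_000)
print(samples.mean())                      # close to 0.5, the mean of Beta(2, 2)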

With the help of the aforementioned instrumental distribution g, chosen so that f is dominated by a multiple of g, this argument easily generalises to settings different from a 'box' and to targets whose support or maximum is unbounded; consider the set

$$S = \{(y, u) : 0 < u < m(y)\},$$

where the function m(x) = M g(x) is such that m(x) ≥ f(x) for all x in the support of f (notice that m(x) cannot be a probability density, as it integrates to M). We can now sample uniformly on S by drawing a value y ∼ g, then u | y ∼ U(0, m(y)), and finally accepting the point if and only if u < f(y).

The pseudo-code for this general Accept-Reject algorithm is presented in Algorithm 1.2:

Algorithm 1.2 Accept-Reject Algorithm – Sample from f

1: Sample x ∼ g and u ∼ U[0,1];
2: if u ≤ f(x)/(M g(x)) then
3:   Accept x;
4: else
5:   Reject and return to 1.
6: end if
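A Python sketch of Algorithm 1.2; the choice of target (standard normal) and proposal (standard Cauchy, for which f ≤ M g with M = √(2π/e)) is an illustrative assumption, not part of the original presentation:

import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)   # target: N(0, 1)

def g(x):
    return 1.0 / (np.pi * (1.0 + x**2))                 # proposal: Cauchy

M = np.sqrt(2.0 * np.pi / np.e)            # sup_x f(x) / g(x)

def accept_reject(n_samples):
    # Repeat steps 1-6 of Algorithm 1.2 until n_samples are accepted.
    out = []
    while len(out) < n_samples:
        x = rng.standard_cauchy()          # 1: sample x ~ g
        u = rng.uniform()                  #    and u ~ U[0, 1]
        if u <= f(x) / (M * g(x)):         # 2: accept with prob. f(x)/(M g(x))
            out.append(x)                  # 3: accept x; else return to 1
    return np.array(out)

samples = accept_reject(10_000)
print(samples.mean(), samples.std())       # approximately 0 and 1
# On average a fraction 1/M ≈ 0.66 of the proposals is accepted.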

Of course we would like m(x) to be as close as possible to f in order to waste as few simulations as possible, even if some rejected samples can be recycled via Importance Sampling; for more details on Accept-Reject and its derivatives see Robert et Casella, 2004 and references therein, such as Casella et Robert, 1998.

There are cases, though, where independent sampling is altogether impossible or computationally too expensive; in these situations we would still like to sample dependent sequences $(X_n)_{n=1,\dots,N}$ such that their empirical average converges at a decent rate to the integral of interest, similarly to (1.4). It is in fact possible to prove that such sequences exist, and particularly popular is the class of Markov chains, which we will briefly introduce in the following Section.
