• Aucun résultat trouvé

Sequential Quasi Monte Carlo for Dirichlet Process Mixture Models

N/A
N/A
Protected

Academic year: 2021

Partager "Sequential Quasi Monte Carlo for Dirichlet Process Mixture Models"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-01667781

https://hal.archives-ouvertes.fr/hal-01667781

Submitted on 19 Dec 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Sequential Quasi Monte Carlo for Dirichlet Process Mixture Models

Julyan Arbel, Jean-Bernard Salomond

To cite this version:

Julyan Arbel, Jean-Bernard Salomond. Sequential Quasi Monte Carlo for Dirichlet Process Mixture Models. BNP 2017 - 11th Conference on Bayesian NonParametrics, Jun 2017, Paris, France. pp.1.

�hal-01667781�

(2)

Sequential Quasi Monte Carlo for Dirichlet Process Mixture Models

Julyan Arbel, www.julyanarbel.com, Inria, Mistis, Grenoble, France Jean-Bernard Salomond, Université Paris-Est, France

Sampling: SMC, QMC, SQMC

Sequential Monte Carlo (SMC), or Particle filtering, is a principled technique which sequentially approximates the full posterior using particles (Doucet et al., 2001). It focuses on sequential state-space models: the density of the observations yt conditionally on Markov states xt in X ⊆ Rd is given by yt|xt ∼ f Y(yt|xt), with kernel

x0 ∼ f0X(x0), xt|xt1 ∼ f X(xt|xt1). (1)

• The initial motivation of quasi Monte Carlo (QMC) is to use low discrepancy vectors instead of unconstrained random vectors in order to improve the calculation of integrals via Monte Carlo.

• Gerber and Chopin (2015) introduce a se- quential quasi Monte Carlo (SQMC) method- ology. This assumes the existence of transforms Γt mapping uniform random variables to the state variables. Requires that (1) can be rewritten as

x(n)0 = Γ0(u(n)0 ) ↔ x(n)0 ∼ f0(dx(n)0 )

x(n)1:t = Γt(x(n)1:t1, u(n)t ) ↔ x(n)1:t |x(n)1:t1 ∼ ft(dx(n)1:t |x(n)1:t1)

where u(n)t ∼ U([0, 1)d) is to be a quasi random vector of uniforms.

Dirichlet process & SQMC

• Nonparametric mixtures for density esti- mation: extension of finite mixture models when the number of clusters is unknown.

Observations y1:T follow a DPM model with kernel ψ parameterized by θ ∈ Θ,

yt|G i.i.d.∼ Z

ψ(y; θ)dG(θ), t ∈ (1 : T ),

where G ∼ DP(α, G0).

• DPM cast as SMC samplers by Liu (1996);

Fearnhead (2004); Griffin (2015) : observations are spread out into unobserved clusters whose labels, or allocation variables, are latent vari- ables acting as observations states in the context of SMC. Transition is given by the (posterior) generalized Pólya urn scheme

pt,j = P(xt = j|x1:t1, y1:t).

• Complies with Gerber and Chopin (2015) need for a deterministic transform

Γt(x(n)1:t1, u(n)t ) = min n

j ∈ {1, . . . , kt(n)1 + 1} :

j

X

i=1

p(n)t,i > u(n)t o

for any particle n, with u(n)t ∼ U([0, 1)).

Goal

• Peculiarity to the DPM setting:

◦ state-space ≈ (1 : T )T is discrete and varies

◦ transition is not Markovian

Goal: investigate how SQMC fares

◦ compare allocation trajectories x(n)1:T , n = 1, . . . , N in SMC & SQMC

◦ measure their dispersion with a principal component analysis (PCA) → proportion of variance explained by number of compo- nents in the PCA

References

Doucet, A., de Freitas, N., and Gordon, N. J. (2001).

Sequential Monte Carlo methods in Practice. Springer- Verlag, New York.

Fearnhead, P. (2004). Particle filters for mixture models with an unknown number of components. Statistics and Computing, 14(1):11–21.

Gerber, M. and Chopin, N. (2015). Sequential quasi Monte Carlo. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(3):509–579.

Griffin, J. E. (2015). Sequential Monte Carlo methods for mixtures with normalized random measures with independent increments priors. Statistics and Computing, in press.

Liu, J. S. (1996). Nonparametric hierarchical Bayes via sequential imputation. The Annals of Statistics, 24:910–

930.

Results II

y

Density

−10 −5 0 5 10

0.00.10.20.30.4

true MC SMC SQMC

0 50 100 150 200

0.20.40.60.81.0

number of component

Proportion of variance

MC SMC SQMC

0 50 100 150 200

12345678

Number of different particules for each data point

Index SMC

SQMC

y

Density

−10 −8 −6 −4 −2 0 2

0.00.10.20.30.4 true

MC SMC SQMC

0 50 100 150 200

0.20.40.60.81.0

number of component

Proportion of variance

MC SMC SQMC

0 50 100 150 200

12345678

Number of different particules for each data point

Index SMC

SQMC

y

Density

−3 −2 −1 0 1 2 3

0.00.10.20.30.40.5 true

MC SMC SQMC

0 50 100 150 200

0.30.40.50.60.70.80.91.0

number of component

Proportion of variance

MC SMC SQMC

0 50 100 150 200

2468

Number of different particules for each data point

Index SMC

SQMC

Left: Density fit; Middle: Particles diversity (PCA); Right: Number of different particles for each data point.

Three samplers, non sequential Monte Carlo, MC, sequential Monte Carlo, SMC and sequential quasi Monte Carlo SQMC.

Sample size T = 200, number of particles N = 1 000.

Top row: Heavy tailed distr. (student 2); Middle row: Skewed distr. (log-Gamma); Bottom row: Multimodal distr. (mixture of normals).

Références

Documents relatifs

We compare two approaches for quantile estimation via randomized quasi-Monte Carlo (RQMC) in an asymptotic setting where the number of randomizations for RQMC grows large but the

The latter, also called particle filtering, focuses on sampling in state space models based on sequential data, while the former’s initial goal is to enhance Monte Carlo efficiency

To assess the distribution of NPV at the fixed horizon time by the AQMC method, we simulate N copies of the Markov chain in parallel for each strategy, defined in (2) and we reorder

This approach can be generalized to obtain a continuous CDF estimator and then an unbiased density estimator, via the likelihood ratio (LR) simulation-based derivative estimation

Randomized quasi-Monte Carlo (RQMC) methods replace the independent random numbers by dependent vectors of uniform random numbers that cover the space more evenly.. When estimating

In contrast to this direct simulation scheme, where the particles represent the number density, a mass flow scheme was introduced in [2]: the particles represent the mass density

Keywords: random number generator, pseudorandom numbers, linear generator, multi- ple recursive generator, tests of uniformity, random variate generation, inversion,

When & 2 − & 1 becomes small, the parameters of the beta distribution decrease, and its (bimodal) density concentrates near the extreme values 0 and 1.. 82) say in