HAL Id: inria-00103871
https://hal.inria.fr/inria-00103871v2
Submitted on 2 Nov 2006
To cite this version: Fabien Campillo, Vivien Rossi. Parallel and interacting Markov chains Monte Carlo method. [Research Report] RR-6008, INRIA. 2006. inria-00103871v2
Thème NUM
Parallel and interacting
Markov chains Monte Carlo method
Fabien Campillo and Vivien Rossi
N° 6008
October 2006
Fabien Campillo∗ and Vivien Rossi†‡
Thème NUM, Systèmes numériques
Projet Aspi
Rapport de recherche n° 6008, October 2006, 27 pages
Abstract: In many situations it is important to be able to propose N independent realizations of a given distribution law. We propose a strategy for making N parallel Monte Carlo Markov chains (MCMC) interact in order to get an approximation of an independent N-sample of a given target law. In this method each individual chain proposes candidates for all other chains. We prove that the set of interacting chains is itself an MCMC method for the product of N target measures. Compared to independent parallel chains this method is more time consuming, but we show through concrete examples that it possesses many advantages: it can speed up convergence toward the target law as well as handle the multi-modal case.

Key-words: Markov chain Monte Carlo method, Metropolis-Hastings, interacting chains, particle approximation
∗ INRIA/IRISA, Rennes, Fabien.Campillo@inria.fr
† IURC, University of Montpellier I, Vivien.Rossi@iurc.montp.inserm.fr
‡ The research of the second author was done during a postdoctoral stay at the INRIA/IRISA, Rennes.
Résumé: In many situations it is important to have at one's disposal N independent realizations of a given law. Our goal is to develop a strategy for making N Markov chain Monte Carlo (MCMC) methods interact in order to propose an approximation of an independent sample of size N from a given target law. The idea is that each chain proposes a candidate for itself but also for all the other chains. We show that the set of these N interacting chains is itself an MCMC method for the product of the N target measures. This approach is naturally more costly than N independent chains; we show however, through concrete examples, that it possesses several advantages: it can noticeably speed up convergence toward the target law, and it also makes it possible to handle the multimodal case.

Mots-clés: Markov chain Monte Carlo method, Metropolis-Hastings, interacting chains, particle approximation
Contents
1 Introduction
2 Parallel/interacting MH algorithm
2.1 The algorithm
2.2 Description of the MH kernel
2.3 Invariance property
3 Parallel/interacting MwG algorithm
3.1 The algorithm
3.2 Description of the MH kernel
3.3 Invariance property
4 Numerical tests
4.1 A multi-modal example
4.2 A hidden Markov model
5 Conclusion
1 Introduction

Markov chain Monte Carlo (MCMC) algorithms [19, 12, 18] allow us to draw samples from a probability distribution π(x)dx known up to a multiplicative constant. They consist in sequentially simulating a single Markov chain whose limit distribution is π(x)dx. There exist many techniques to speed up the convergence toward the target distribution by improving the mixing properties of the chain [13]. Moreover, special attention should be given to the convergence diagnosis of this method [1, 6, 15].

An alternative is to run many Markov chains in parallel. The simplest multiple chain algorithm is to make use of parallel independent chains [9]. The recommendations concerning this idea seem contradictory in the literature (cf. the "many short runs" vs "one long run" debate described in [10]). We can note with [11] and [18, 6.5] that independent parallel chains could be a poor idea: among these chains some may not converge, so one long chain could be preferable to many short ones. Moreover, many parallel independent chains can artificially exhibit a more robust behavior which does not correspond to a real convergence of the algorithm.

In practice one however makes use of several chains in parallel. It is then tempting to exchange information between these chains to improve the mixing properties of the MCMC samplers [4, 5, 16, 3, 7, 8]. A general framework of Population Monte Carlo has been proposed in this context [14, 17, 2]. In this paper we propose an interacting method between parallel chains which provides an independent sample from the target distribution. Contrary to the papers previously cited, the proposal law in our work is given and does not adapt itself to the previous simulations. Hence, the problem of the choice of this law still remains.

The Metropolis-Hastings (MH) algorithm and its theoretical properties are presented in Section 2. The corresponding Metropolis within Gibbs (MwG) algorithm and its theoretical properties are presented in Section 3. In Section 4, two simple numerical examples illustrate how the introduction of interactions can speed up the convergence and handle multi-modal cases.
2 Parallel/interacting Metropolis-Hastings (MH) algorithm

Consider a target density law π(x) defined on (R^n, B(R^n)) and a proposal kernel density π^{prop}(y|x). We propose a method for sampling N independent values X^1, ..., X^N ∈ R^n of the law π(x) dx.
Notations: Let

  X = X^{1:N} = X_{1:n} ∈ R^{n×N},

so that X_ℓ ∈ R^N and X^i ∈ R^n (the same for Y and Z); x ∈ R^n so that x_ℓ ∈ R (the same for y and z); ξ, ξ′ ∈ R. Here X^{1:N} = (X^1, ..., X^N) and X_{1:n} = (X_1, ..., X_n). We also define ¬ℓ = {1, ..., n} \ {ℓ}. Note that the structure of the matrix X is:

        ⎡ X^1_1 ⋯ X^i_1 ⋯ X^N_1 ⎤
        ⎢   ⋮      ⋮      ⋮    ⎥
  X  =  ⎢ X^1_ℓ ⋯ X^i_ℓ ⋯ X^N_ℓ ⎥ → X_ℓ (row ℓ)
        ⎢   ⋮      ⋮      ⋮    ⎥
        ⎣ X^1_n ⋯ X^i_n ⋯ X^N_n ⎦

where the i-th column of X is the chain state X^i.
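As a quick illustration (not part of the report), this indexing convention maps directly onto a NumPy array: column i holds the state X^i of chain i and row ℓ holds the ℓ-th coordinate across all chains. The dimensions n = 3, N = 5 below are arbitrary.

```python
import numpy as np

n, N = 3, 5                                # state dimension n, number of chains N
X = np.arange(float(n * N)).reshape(n, N)  # X in R^{n x N}

X_i = X[:, 1]   # column X^i (here i = 2 in the report's 1-based notation), in R^n
X_l = X[1, :]   # row X_l (here l = 2): coordinate l across the N chains, in R^N

assert X_i.shape == (n,) and X_l.shape == (N,)
```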
2.1 The algorithm

We describe the Markov chain {X(k)}_{k≥0} over R^{n×N} corresponding to the MH algorithm. It consists in N mutually dependent realizations X^{i,(k)} (i = 1, ..., N) of the state variable, and its limit distribution will be

  Π(dX) def= π(X^1) dX^1 ⋯ π(X^N) dX^N.

We detail an iteration X(k) = X → X(k+1) = Z of the MH algorithm. The N vectors are updated sequentially:

  [X^{1:N}] → [Z^1 X^{2:N}] → [Z^{1:2} X^{3:N}] ⋯ [Z^{1:N−1} X^N] → [Z^{1:N}].
At sub-iteration i, that is [Z^{1:i−1} X^{i:N}] → [Z^{1:i} X^{i+1:N}], we simulate Z^i in two steps:

Proposal step: Independently one from the other, each chain j = 1, ..., N proposes a candidate Y^j ∈ R^n according to the proposal kernel starting from its current position, i.e.

  Y^j ∼ π^{prop}_{i,j}(y | Z^{1:i−1}, X^i, X^{i+1:N}) dy.

Note that the candidates Y^j also depend on i. We will use a lighter notation:

  π^{prop}_{i,j}(y | X^i) = π^{prop}_{i,j}(y | Z^{1:i−1}, X^i, X^{i+1:N}).    (1)
Selection step: We can choose among these N candidates Y^{1:N} or stay at X^i according to the multinomial law:

  Z^i ← Y^1  with probability (1/N) α_{i,1}(X^i, Y^1),
        ⋮
        Y^N  with probability (1/N) α_{i,N}(X^i, Y^N),
        X^i  with probability ρ̃_i(X^i, Y),

where the acceptance probabilities are

  α_{i,j}(x, y) def= [π(y) π^{prop}_{i,j}(x|y)] / [π(x) π^{prop}_{i,j}(y|x)] ∧ 1,

  ρ̃_i(X^i, Y) def= 1 − (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, Y^j).
The final algorithm is depicted in Algorithm 1.

  choose X ∈ R^{n×N}
  for k = 1, 2, ... do
    for i = 1 : N do
      for j = 1 : N do
        Y^j ∼ π^{prop}_{i,j}(y | X^i) dy
        α_j ← [π(Y^j) π^{prop}_{i,j}(X^i | Y^j)] / [π(X^i) π^{prop}_{i,j}(Y^j | X^i)] ∧ 1
      end for
      ρ̃ ← 1 − (1/N) ∑_{j=1}^{N} α_j
      X^i ← Y^1 with probability α_1/N
            ⋮
            Y^N with probability α_N/N
            X^i with probability ρ̃
    end for
  end for

Algorithm 1: Parallel/interacting MH algorithm.
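As a concrete sketch, Algorithm 1 can be written out for a one-dimensional state (n = 1). The bimodal target `target`, the Gaussian random-walk proposal, and all parameter values below are illustrative assumptions, not prescriptions of the report.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """Unnormalized bimodal target density pi (an illustrative choice)."""
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

def interacting_mh(n_chains=10, n_iter=300, sigma=1.0):
    """Sketch of Algorithm 1 for a one-dimensional state (n = 1).

    At sub-iteration i, each chain j proposes a candidate Y^j drawn from a
    Gaussian centered at its own current position X^j.  For j != i this
    proposal density q_j(y) does not depend on X^i, so
    alpha_j = [pi(Y^j) q_j(X^i)] / [pi(X^i) q_j(Y^j)] ^ 1; for j = i the
    random walk is symmetric and the proposal densities cancel."""
    X = rng.normal(size=n_chains)                 # initial state, one per chain
    history = []
    for _ in range(n_iter):
        for i in range(n_chains):
            Y = X + sigma * rng.normal(size=n_chains)   # one candidate per chain
            q_at_Xi = np.exp(-0.5 * ((X[i] - X) / sigma) ** 2)
            q_at_Y = np.exp(-0.5 * ((Y - X) / sigma) ** 2)
            alpha = np.minimum(target(Y) * q_at_Xi / (target(X[i]) * q_at_Y), 1.0)
            alpha[i] = min(target(Y[i]) / target(X[i]), 1.0)  # j = i: densities cancel
            # multinomial selection: Y^j with prob alpha_j/N, stay with prob rho
            probs = np.append(alpha / n_chains, 1.0 - alpha.mean())
            choice = rng.choice(n_chains + 1, p=probs)
            if choice < n_chains:
                X[i] = Y[choice]
        history.append(X.copy())
    return np.asarray(history)

samples = interacting_mh()
```

A chain trapped near one mode can accept a candidate proposed by a chain sitting near the other mode; this interaction is the speed-up mechanism illustrated in the numerical tests of Section 4.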
2.2 Description of the MH kernel

Lemma 2.1 The Markov kernel associated with the MH procedure described in Section 2.1 is

  P(X; dZ) def= P_1(X^{1:N}; dZ^1) P_2(Z^1, X^{2:N}; dZ^2) ⋯ P_N(Z^{1:N−1}, X^N; dZ^N)    (2)

where

  P_i(Z^{1:i−1}, X^{i:N}; dz) def= (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, z) π^{prop}_{i,j}(z|X^i) dz + ρ_i(X^i) δ_{X^i}(dz).    (3)

The acceptance probability is

  α_{i,j}(x, z) def= r_{i,j}(x, z) ∧ 1  if (x, z) ∈ R_{i,j},  and 0 otherwise,    (4)

  r_{i,j}(x, z) def= [π(z) π^{prop}_{i,j}(x|z)] / [π(x) π^{prop}_{i,j}(z|x)],    (5)

  ρ_i(x) def= 1 − (1/N) ∑_{j=1}^{N} ∫_{R^n} α_{i,j}(x, z) π^{prop}_{i,j}(z|x) dz.    (6)

The set R_{i,j} is defined by:

  R_{i,j} def= {(x, z) ∈ R^n × R^n ; π(z) π^{prop}_{i,j}(x|z) > 0 and π(x) π^{prop}_{i,j}(z|x) > 0}.

Note that the functions α_{i,j}(x, z), ρ_i(x), r_{i,j}(x, z) and the set R_{i,j} depend on Z^{1:i−1} and X^{i:N}.

The measures

  ν(dx × dz) = π(z) π^{prop}_{i,j}(x|z) dz dx,    ν^T(dx × dz) = π(x) π^{prop}_{i,j}(z|x) dz dx

are mutually absolutely continuous over R_{i,j} and mutually singular on the complementary set [R_{i,j}]^c. The set R_{i,j} is unique, up to the ν and ν^T negligible sets, and symmetric, i.e. (x, z) ∈ R_{i,j} ⇒ (z, x) ∈ R_{i,j}.
Proof This construction follows the general setup proposed by Luke Tierney in [20]. We now derive the probability kernel associated with the iteration described in the previous Subsection 2.1. The kernel P_i(Z^{1:i−1}, X^{i:N}; dz) is the composition of a proposition kernel and of a selection kernel:

  P_i(Z^{1:i−1}, X^{i:N}; dz) = ∫_{Y^{1:N}} S_i(Z^{1:i−1}, X^{i:N}, Y^{1:N}; dz) Q_i(Z^{1:i−1}, X^{i:N}; dY^{1:N})

which consists in proposing independently N candidates Y^{1:N} sampled from the density proposition, i.e.

  Q_i(Z^{1:i−1}, X^{i:N}; dY^{1:N}) def= ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k,

then to select among these candidates or to stay at X^i with the MH acceptance probability, i.e.

  S_i(Z^{1:i−1}, X^{i:N}, Y^{1:N}; dz) def= (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, Y^j) δ_{Y^j}(dz) + ρ̃_i(X^i, Y) δ_{X^i}(dz).

Hence:

  P_i(Z^{1:i−1}, X^{i:N}; dz)
    = (1/N) ∑_{j=1}^{N} ∫_{Y^{1:N}} α_{i,j}(X^i, Y^j) δ_{Y^j}(dz) { ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
      + ∫_{Y^{1:N}} ρ̃_i(X^i, Y) δ_{X^i}(dz) { ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
    = A_1 + A_2

and

  A_1 = (1/N) ∑_{j=1}^{N} ∫_{Y^j} α_{i,j}(X^i, Y^j) δ_{Y^j}(dz) π^{prop}_{i,j}(Y^j|X^i) { ∫_{Y^{¬j}} ∏_{k≠j} π^{prop}_{i,k}(Y^k|X^i) dY^k } dY^j
      = (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, z) π^{prop}_{i,j}(z|X^i) dz,

where the inner integral over Y^{¬j} equals 1 and ∫_{Y^j} δ_{Y^j}(dz) dY^j = dz. The second term A_2 reads:

  A_2 = ∫_{Y^{1:N}} ρ̃_i(X^i, Y) δ_{X^i}(dz) { ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
      = δ_{X^i}(dz) ∫_{Y^{1:N}} { 1 − (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, Y^j) } { ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
      = δ_{X^i}(dz) { 1 − (1/N) ∑_{j=1}^{N} ∫_{Y^{1:N}} α_{i,j}(X^i, Y^j) ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
      = δ_{X^i}(dz) { 1 − (1/N) ∑_{j=1}^{N} ∫_{Y^j} α_{i,j}(X^i, Y^j) π^{prop}_{i,j}(Y^j|X^i) dY^j }.

Summing up A_1 and A_2 proves the Lemma. □
2.3 Invariance property

Lemma 2.2 For all (x, z) ∈ R^n × R^n a.e. we have:

  α_{i,j}(x, z) π(x) π^{prop}_{i,j}(z|x) = α_{i,j}(z, x) π(z) π^{prop}_{i,j}(x|z).

Proof For (x, z) ∉ R_{i,j} the result is obvious. For (x, z) ∈ R_{i,j} we have:

  (r_{i,j}(x, z) ∧ 1) π(x) π^{prop}_{i,j}(z|x)
    = min{ π(z) π^{prop}_{i,j}(x|z), π(x) π^{prop}_{i,j}(z|x) }
    = (r_{i,j}(z, x) ∧ 1) π(z) π^{prop}_{i,j}(x|z). □
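Lemma 2.2 can be checked numerically on a simple example. The Gaussian-mixture target and Gaussian proposal below are illustrative assumptions; the identity itself holds for any choice of densities.

```python
import numpy as np

def pi(x):
    """Unnormalized target: two-component Gaussian mixture (illustrative)."""
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

def q(y, x, sigma=1.5):
    """Proposal density pi^prop(y|x): Gaussian random walk (illustrative)."""
    return np.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def alpha(x, z):
    """MH acceptance probability r(x,z) ^ 1, r = pi(z)q(x|z) / (pi(x)q(z|x))."""
    return min(pi(z) * q(x, z) / (pi(x) * q(z, x)), 1.0)

rng = np.random.default_rng(1)
for _ in range(100):
    x, z = rng.normal(scale=3.0, size=2)
    lhs = alpha(x, z) * pi(x) * q(z, x)   # alpha(x,z) pi(x) pi^prop(z|x)
    rhs = alpha(z, x) * pi(z) * q(x, z)   # alpha(z,x) pi(z) pi^prop(x|z)
    assert np.isclose(lhs, rhs)
```

Both sides reduce to min{π(z)π^prop(x|z), π(x)π^prop(z|x)}, which is symmetric in (x, z), exactly as in the proof above.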
Lemma 2.3 (conditional detailed balance) The following equality of measures defined on R^n × R^n

  P_i(Z^{1:i−1}, X^{i:N}; dZ^i) π(X^i) dX^i = P_i(Z^{1:i}, X^{i+1:N}; dX^i) π(Z^i) dZ^i    (7)

holds true for any i = 1, ..., N, Z^{1:i−1} ∈ R^{n×(i−1)}, and X^{i+1:N} ∈ R^{n×(N−i)}.
Proof The left hand side of (7) is a measure, say ν(dZ^i × dX^i), on (R^n × R^n, B(R^n × R^n)). For all A_1, A_2 ∈ B(R^n), we want to prove that ν(A_1 × A_2) = ν(A_2 × A_1). We have:

  ν(A_1 × A_2) = ∫ P_i(Z^{1:i−1}, X^{i:N}; A_1) 1_{A_2}(X^i) π(X^i) dX^i

and

  P_i(Z^{1:i−1}, X^{i:N}; A_1) = (1/N) ∑_{j=1}^{N} ∫ 1_{A_1}(Z^i) α_{i,j}(X^i, Z^i) π^{prop}_{i,j}(Z^i|X^i) dZ^i + ρ_i(X^i) 1_{A_1}(X^i)

so that

  ν(A_1 × A_2) = (1/N) ∑_{j=1}^{N} ∬ 1_{A_1}(Z^i) 1_{A_2}(X^i) α_{i,j}(X^i, Z^i) π(X^i) π^{prop}_{i,j}(Z^i|X^i) dX^i dZ^i
    + ∫ ρ_i(X^i) 1_{A_1}(X^i) 1_{A_2}(X^i) π(X^i) dX^i.    (8)

And from Lemma 2.2, we get:

  ν(A_1 × A_2) = (1/N) ∑_{j=1}^{N} ∬ 1_{A_1}(Z^i) 1_{A_2}(X^i) α_{i,j}(Z^i, X^i) π(Z^i) π^{prop}_{i,j}(X^i|Z^i) dZ^i dX^i
    + ∫ ρ_i(X^i) 1_{A_1}(X^i) 1_{A_2}(X^i) π(X^i) dX^i.

Exchanging the names of the variables X^i ↔ Z^i in the first term of the right hand side of the previous equality leads to the same expression as (8) where A_1 and A_2 are interchanged, in other words ν(A_1 × A_2) = ν(A_2 × A_1). □
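The detailed balance mechanics can also be sanity-checked on a finite state space, where the kernel of Lemma 2.1 becomes a stochastic matrix and the invariance of the product measure (Proposition 2.4 below) can be verified by direct matrix computation. The 3-point target, the proposal matrix Q (each chain proposing from its own current position), and the choice of N = 2 chains are illustrative assumptions, not part of the report.

```python
import itertools
import numpy as np

m, N = 3, 2                                   # |state space| and number of chains
pi = np.array([0.2, 0.5, 0.3])                # target law on {0, 1, 2} (illustrative)
Q = np.array([[0.5, 0.3, 0.2],                # proposal q(y|x), row-stochastic
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

def sub_kernel(config, i):
    """Law of the new value of component i given the full configuration."""
    xi = config[i]
    out = np.zeros(m)
    accepted = 0.0
    for j in range(N):
        p_j = Q[config[j]]                    # candidate law proposed by chain j
        for z in range(m):
            if j == i:                        # proposal of chain i depends on x_i
                r = pi[z] * Q[z, xi] / (pi[xi] * Q[xi, z])
            else:                             # proposal of chain j != i does not
                r = pi[z] * p_j[xi] / (pi[xi] * p_j[z])
            a = min(r, 1.0) * p_j[z] / N      # alpha_{i,j} weighted by candidate law
            out[z] += a
            accepted += a
    out[xi] += 1.0 - accepted                 # rejection mass: stay at x_i
    return out

configs = list(itertools.product(range(m), repeat=N))
idx = {c: k for k, c in enumerate(configs)}
P = np.zeros((m ** N, m ** N))
for c in configs:
    dist = {c: 1.0}                           # sequential update of the N components
    for i in range(N):
        new = {}
        for cfg, w in dist.items():
            kz = sub_kernel(cfg, i)
            for z in range(m):
                nc = cfg[:i] + (z,) + cfg[i + 1:]
                new[nc] = new.get(nc, 0.0) + w * kz[z]
        dist = new
    for cfg, w in dist.items():
        P[idx[c], idx[cfg]] = w

Pi = np.array([pi[c[0]] * pi[c[1]] for c in configs])  # product measure pi x pi
assert np.allclose(P.sum(axis=1), 1.0)        # P is a Markov kernel
assert np.allclose(Pi @ P, Pi)                # Pi is invariant for P
```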
Proposition 2.4 (invariance) The probability measure

  Π(dX) = π(X^1) dX^1 ⋯ π(X^N) dX^N

is an invariant distribution of the Markov kernel P, i.e. ΠP = Π, that is:

  ∫_X P(X, dZ) { ∏_{i=1}^{N} π(X^i) dX^i } = ∏_{i=1}^{N} π(Z^i) dZ^i.    (9)

Proof

  ∫_X P(X, dZ) { ∏_{i=1}^{N} π(X^i) dX^i }
    = ∫_X P_1(X^{1:N}; dZ^1) P_2(Z^1, X^{2:N}; dZ^2) ⋯ P_N(Z^{1:N−1}, X^N; dZ^N) { ∏_{i=1}^{N} π(X^i) dX^i }
    = ∫_X P_1(X^{1:N}; dZ^1) π(X^1) dX^1 P_2(Z^1, X^{2:N}; dZ^2) ⋯ P_N(Z^{1:N−1}, X^N; dZ^N) { ∏_{i=2}^{N} π(X^i) dX^i }.

Using (7) with i = 1 gives:

  ∫_X P(X, dZ) { ∏_{i=1}^{N} π(X^i) dX^i }
    = ∫_X P_1(Z^1, X^{2:N}; dX^1) π(Z^1) dZ^1 P_2(Z^1, X^{2:N}; dZ^2) ⋯ P_N(Z^{1:N−1}, X^N; dZ^N) { ∏_{i=2}^{N} π(X^i) dX^i }.