HAL Id: inria-00103871
https://hal.inria.fr/inria-00103871v2
Submitted on 2 Nov 2006
To cite this version: Fabien Campillo, Vivien Rossi. Parallel and interacting Markov chains Monte Carlo method. [Research Report] RR-6008, INRIA. 2006. inria-00103871v2
Thème NUM
Parallel and interacting
Markov chains Monte Carlo method
Fabien Campillo and Vivien Rossi
N° 6008
October 2006
Fabien Campillo∗ and Vivien Rossi†‡
Thème NUM, Systèmes numériques
Projet Aspi
Rapport de recherche n° 6008, October 2006, 27 pages
Abstract: In many situations it is important to be able to propose N independent realizations of a given distribution law. We propose a strategy for making N parallel Monte Carlo Markov chains (MCMC) interact in order to get an approximation of an independent N-sample of a given target law. In this method each individual chain proposes candidates for all other chains. We prove that the set of interacting chains is itself an MCMC method for the product of N target measures. Compared to independent parallel chains this method is more time consuming, but we show through concrete examples that it possesses many advantages: it can speed up convergence toward the target law as well as handle the multi-modal case.

Key-words: Markov chain Monte Carlo method, Metropolis-Hastings, interacting chains, particle approximation
∗ INRIA/IRISA, Rennes, Fabien.Campillo@inria.fr
† IURC, University of Montpellier I, Vivien.Rossi@iurc.montp.inserm.fr
‡ The research of the second author was done during a postdoctoral stay at the INRIA/IRISA, Rennes.
Résumé: In many situations it is important to have at one's disposal N independent realizations of a given law. Our goal is to develop a strategy for making N Markov chain Monte Carlo (MCMC) methods interact in order to propose an approximation of an independent sample of size N from a given target law. The idea is that each chain proposes a candidate for itself but also for all the other chains. We show that the set of these N interacting chains is itself an MCMC method for the product of the N target measures. This approach is naturally more costly than N independent chains; we show however, through concrete examples, that it possesses several advantages: it can noticeably speed up convergence toward the target law, and it also makes it possible to handle the multimodal case.

Mots-clés: Markov chain Monte Carlo method, Metropolis-Hastings, interacting chains, particle approximation
Contents
1 Introduction
2 Parallel/interacting MH algorithm
2.1 The algorithm
2.2 Description of the MH kernel
2.3 Invariance property
3 Parallel/interacting MwG algorithm
3.1 The algorithm
3.2 Description of the MH kernel
3.3 Invariance property
4 Numerical tests
4.1 A multi-modal example
4.2 A hidden Markov model
5 Conclusion
1 Introduction

Markov chain Monte Carlo (MCMC) algorithms [19, 12, 18] allow us to draw samples from a probability distribution π(x)dx known up to a multiplicative constant. They consist in sequentially simulating a single Markov chain whose limit distribution is π(x)dx. There exist many techniques to speed up the convergence toward the target distribution by improving the mixing properties of the chain [13]. Moreover, special attention should be given to the convergence diagnosis of this method [1, 6, 15].

An alternative is to run many Markov chains in parallel. The simplest multiple chain algorithm is to make use of parallel independent chains [9]. The recommendations concerning this idea seem contradictory in the literature (cf. the "many short runs" vs "one long run" debate described in [10]). We can note with [11] and [18, 6.5] that independent parallel chains could be a poor idea: among these chains some may not converge, so one long chain could be preferable to many short ones. Moreover, many parallel independent chains can artificially exhibit a more robust behavior which does not correspond to a real convergence of the algorithm.

In practice one however makes use of several chains in parallel. It is then tempting to exchange information between these chains to improve the mixing properties of the MCMC samplers [4, 5, 16, 3, 7, 8]. A general framework of Population Monte Carlo has been proposed in this context [14, 17, 2]. In this paper we propose an interacting method between parallel chains which provides an independent sample from the target distribution. Contrary to the papers previously cited, the proposal law in our work is given and does not adapt itself to the previous simulations. Hence, the problem of the choice of this law still remains.

The Metropolis-Hastings (MH) algorithm and its theoretical properties are presented in Section 2. The corresponding Metropolis within Gibbs (MwG) algorithm and its theoretical properties are presented in Section 3. In Section 4, two simple numerical examples illustrate how the introduction of interactions can speed up the convergence and handle multi-modal cases.
2 Parallel/interacting Metropolis-Hastings (MH) algorithm

Consider a target density law π(x) defined on (R^n, B(R^n)) and a proposal kernel density π^{prop}(y|x). We propose a method for sampling N independent values X^1, ..., X^N ∈ R^n of the law π(x) dx.
Notations: Let

  X = X^{1:N} = X_{1:n} ∈ R^{n×N},

so that X_ℓ ∈ R^N and X^i ∈ R^n (the same for Y and Z); x ∈ R^n so that x_ℓ ∈ R (the same for y and z); ξ, ξ′ ∈ R. Here X^{1:N} = (X^1, ..., X^N) and X_{1:n} = (X_1, ..., X_n). We also define ¬ℓ = {1, ..., n} \ {ℓ}. Note that the structure of the matrix X is:

        ⎡ X^1_1 ⋯ X^i_1 ⋯ X^N_1 ⎤
        ⎢   ⋮      ⋮      ⋮    ⎥
  X  =  ⎢ X^1_ℓ ⋯ X^i_ℓ ⋯ X^N_ℓ ⎥ → X_ℓ (row ℓ)
        ⎢   ⋮      ⋮      ⋮    ⎥
        ⎣ X^1_n ⋯ X^i_n ⋯ X^N_n ⎦

where the i-th column of X is the chain state X^i.
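As a quick illustration (not part of the report), this indexing convention maps directly onto a NumPy array: column i holds the state X^i of chain i and row ℓ holds the ℓ-th coordinate across all chains. The dimensions n = 3, N = 5 below are arbitrary.

```python
import numpy as np

n, N = 3, 5                                # state dimension n, number of chains N
X = np.arange(float(n * N)).reshape(n, N)  # X in R^{n x N}

X_i = X[:, 1]   # column X^i (here i = 2 in the report's 1-based notation), in R^n
X_l = X[1, :]   # row X_l (here l = 2): coordinate l across the N chains, in R^N

assert X_i.shape == (n,) and X_l.shape == (N,)
```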
2.1 The algorithm

We describe the Markov chain {X(k)}_{k≥0} over R^{n×N} corresponding to the MH algorithm. It consists in N mutually dependent realizations X^{i,(k)} (i = 1, ..., N) of the state variable, and its limit distribution will be

  Π(dX) def= π(X^1) dX^1 ⋯ π(X^N) dX^N.

We detail an iteration X(k) = X → X(k+1) = Z of the MH algorithm. The N vectors are updated sequentially:

  [X^{1:N}] → [Z^1 X^{2:N}] → [Z^{1:2} X^{3:N}] ⋯ [Z^{1:N−1} X^N] → [Z^{1:N}].
At sub-iteration i, that is [Z^{1:i−1} X^{i:N}] → [Z^{1:i} X^{i+1:N}], we simulate Z^i in two steps:

Proposal step: Independently one from the other, each chain j = 1, ..., N proposes a candidate Y^j ∈ R^n according to the proposal kernel starting from its current position, i.e.

  Y^j ∼ π^{prop}_{i,j}(y | Z^{1:i−1}, X^i, X^{i+1:N}) dy.

Note that the candidates Y^j also depend on i. We will use a lighter notation:

  π^{prop}_{i,j}(y | X^i) = π^{prop}_{i,j}(y | Z^{1:i−1}, X^i, X^{i+1:N}).    (1)
Selection step: We can choose among these N candidates Y^{1:N} or stay at X^i according to the multinomial law:

  Z^i ← Y^1  with probability (1/N) α_{i,1}(X^i, Y^1),
        ⋮
        Y^N  with probability (1/N) α_{i,N}(X^i, Y^N),
        X^i  with probability ρ̃_i(X^i, Y),

where the acceptance probabilities are

  α_{i,j}(x, y) def= [π(y) π^{prop}_{i,j}(x|y)] / [π(x) π^{prop}_{i,j}(y|x)] ∧ 1,

  ρ̃_i(X^i, Y) def= 1 − (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, Y^j).
The final algorithm is depicted in Algorithm 1.

  choose X ∈ R^{n×N}
  for k = 1, 2, ... do
    for i = 1 : N do
      for j = 1 : N do
        Y^j ∼ π^{prop}_{i,j}(y | X^i) dy
        α_j ← [π(Y^j) π^{prop}_{i,j}(X^i | Y^j)] / [π(X^i) π^{prop}_{i,j}(Y^j | X^i)] ∧ 1
      end for
      ρ̃ ← 1 − (1/N) ∑_{j=1}^{N} α_j
      X^i ← Y^1 with probability α_1/N
            ⋮
            Y^N with probability α_N/N
            X^i with probability ρ̃
    end for
  end for

Algorithm 1: Parallel/interacting MH algorithm.
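As a concrete sketch, Algorithm 1 can be written out for a one-dimensional state (n = 1). The bimodal target `target`, the Gaussian random-walk proposal, and all parameter values below are illustrative assumptions, not prescriptions of the report.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    """Unnormalized bimodal target density pi (an illustrative choice)."""
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

def interacting_mh(n_chains=10, n_iter=300, sigma=1.0):
    """Sketch of Algorithm 1 for a one-dimensional state (n = 1).

    At sub-iteration i, each chain j proposes a candidate Y^j drawn from a
    Gaussian centered at its own current position X^j.  For j != i this
    proposal density q_j(y) does not depend on X^i, so
    alpha_j = [pi(Y^j) q_j(X^i)] / [pi(X^i) q_j(Y^j)] ^ 1; for j = i the
    random walk is symmetric and the proposal densities cancel."""
    X = rng.normal(size=n_chains)                 # initial state, one per chain
    history = []
    for _ in range(n_iter):
        for i in range(n_chains):
            Y = X + sigma * rng.normal(size=n_chains)   # one candidate per chain
            q_at_Xi = np.exp(-0.5 * ((X[i] - X) / sigma) ** 2)
            q_at_Y = np.exp(-0.5 * ((Y - X) / sigma) ** 2)
            alpha = np.minimum(target(Y) * q_at_Xi / (target(X[i]) * q_at_Y), 1.0)
            alpha[i] = min(target(Y[i]) / target(X[i]), 1.0)  # j = i: densities cancel
            # multinomial selection: Y^j with prob alpha_j/N, stay with prob rho
            probs = np.append(alpha / n_chains, 1.0 - alpha.mean())
            choice = rng.choice(n_chains + 1, p=probs)
            if choice < n_chains:
                X[i] = Y[choice]
        history.append(X.copy())
    return np.asarray(history)

samples = interacting_mh()
```

A chain trapped near one mode can accept a candidate proposed by a chain sitting near the other mode; this interaction is the speed-up mechanism illustrated in the numerical tests of Section 4.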
2.2 Description of the MH kernel

Lemma 2.1 The Markov kernel associated with the MH procedure described in Section 2.1 is

  P(X; dZ) def= P_1(X^{1:N}; dZ^1) P_2(Z^1, X^{2:N}; dZ^2) ⋯ P_N(Z^{1:N−1}, X^N; dZ^N)    (2)

where

  P_i(Z^{1:i−1}, X^{i:N}; dz) def= (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, z) π^{prop}_{i,j}(z|X^i) dz + ρ_i(X^i) δ_{X^i}(dz).    (3)

The acceptance probability is

  α_{i,j}(x, z) def= r_{i,j}(x, z) ∧ 1  if (x, z) ∈ R_{i,j},  and 0 otherwise,    (4)

  r_{i,j}(x, z) def= [π(z) π^{prop}_{i,j}(x|z)] / [π(x) π^{prop}_{i,j}(z|x)],    (5)

  ρ_i(x) def= 1 − (1/N) ∑_{j=1}^{N} ∫_{R^n} α_{i,j}(x, z) π^{prop}_{i,j}(z|x) dz.    (6)

The set R_{i,j} is defined by:

  R_{i,j} def= {(x, z) ∈ R^n × R^n ; π(z) π^{prop}_{i,j}(x|z) > 0 and π(x) π^{prop}_{i,j}(z|x) > 0}.

Note that the functions α_{i,j}(x, z), ρ_i(x), r_{i,j}(x, z) and the set R_{i,j} depend on Z^{1:i−1} and X^{i:N}.

The measures

  ν(dx × dz) = π(z) π^{prop}_{i,j}(x|z) dz dx,    ν^T(dx × dz) = π(x) π^{prop}_{i,j}(z|x) dz dx

are mutually absolutely continuous over R_{i,j} and mutually singular on the complementary set [R_{i,j}]^c. The set R_{i,j} is unique, up to the ν and ν^T negligible sets, and symmetric, i.e. (x, z) ∈ R_{i,j} ⇒ (z, x) ∈ R_{i,j}.
Proof This construction follows the general setup proposed by Luke Tierney in [20]. We now derive the probability kernel associated with the iteration described in the previous Subsection 2.1. The kernel P_i(Z^{1:i−1}, X^{i:N}; dz) is the composition of a proposition kernel and of a selection kernel:

  P_i(Z^{1:i−1}, X^{i:N}; dz) = ∫_{Y^{1:N}} S_i(Z^{1:i−1}, X^{i:N}, Y^{1:N}; dz) Q_i(Z^{1:i−1}, X^{i:N}; dY^{1:N})

which consists in proposing independently N candidates Y^{1:N} sampled from the density proposition, i.e.

  Q_i(Z^{1:i−1}, X^{i:N}; dY^{1:N}) def= ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k,

then to select among these candidates or to stay at X^i with the MH acceptance probability, i.e.

  S_i(Z^{1:i−1}, X^{i:N}, Y^{1:N}; dz) def= (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, Y^j) δ_{Y^j}(dz) + ρ̃_i(X^i, Y) δ_{X^i}(dz).

Hence:

  P_i(Z^{1:i−1}, X^{i:N}; dz)
    = (1/N) ∑_{j=1}^{N} ∫_{Y^{1:N}} α_{i,j}(X^i, Y^j) δ_{Y^j}(dz) { ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
      + ∫_{Y^{1:N}} ρ̃_i(X^i, Y) δ_{X^i}(dz) { ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
    = A_1 + A_2

and

  A_1 = (1/N) ∑_{j=1}^{N} ∫_{Y^j} α_{i,j}(X^i, Y^j) δ_{Y^j}(dz) π^{prop}_{i,j}(Y^j|X^i) { ∫_{Y^{¬j}} ∏_{k≠j} π^{prop}_{i,k}(Y^k|X^i) dY^k } dY^j
      = (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, z) π^{prop}_{i,j}(z|X^i) dz,

where the inner integral over Y^{¬j} equals 1 and ∫_{Y^j} δ_{Y^j}(dz) dY^j = dz. The second term A_2 reads:

  A_2 = ∫_{Y^{1:N}} ρ̃_i(X^i, Y) δ_{X^i}(dz) { ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
      = δ_{X^i}(dz) ∫_{Y^{1:N}} { 1 − (1/N) ∑_{j=1}^{N} α_{i,j}(X^i, Y^j) } { ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
      = δ_{X^i}(dz) { 1 − (1/N) ∑_{j=1}^{N} ∫_{Y^{1:N}} α_{i,j}(X^i, Y^j) ∏_{k=1}^{N} π^{prop}_{i,k}(Y^k|X^i) dY^k }
      = δ_{X^i}(dz) { 1 − (1/N) ∑_{j=1}^{N} ∫_{Y^j} α_{i,j}(X^i, Y^j) π^{prop}_{i,j}(Y^j|X^i) dY^j }.

Summing up A_1 and A_2 proves the Lemma. □
2.3 Invariance property

Lemma 2.2 For all (x, z) ∈ R^n × R^n a.e. we have:

  α_{i,j}(x, z) π(x) π^{prop}_{i,j}(z|x) = α_{i,j}(z, x) π(z) π^{prop}_{i,j}(x|z).

Proof For (x, z) ∉ R_{i,j} the result is obvious. For (x, z) ∈ R_{i,j} we have:

  (r_{i,j}(x, z) ∧ 1) π(x) π^{prop}_{i,j}(z|x)
    = min{ π(z) π^{prop}_{i,j}(x|z), π(x) π^{prop}_{i,j}(z|x) }
    = (r_{i,j}(z, x) ∧ 1) π(z) π^{prop}_{i,j}(x|z). □
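Lemma 2.2 can be checked numerically on a simple example. The Gaussian-mixture target and Gaussian proposal below are illustrative assumptions; the identity itself holds for any choice of densities.

```python
import numpy as np

def pi(x):
    """Unnormalized target: two-component Gaussian mixture (illustrative)."""
    return np.exp(-0.5 * (x - 2.0) ** 2) + np.exp(-0.5 * (x + 2.0) ** 2)

def q(y, x, sigma=1.5):
    """Proposal density pi^prop(y|x): Gaussian random walk (illustrative)."""
    return np.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def alpha(x, z):
    """MH acceptance probability r(x,z) ^ 1, r = pi(z)q(x|z) / (pi(x)q(z|x))."""
    return min(pi(z) * q(x, z) / (pi(x) * q(z, x)), 1.0)

rng = np.random.default_rng(1)
for _ in range(100):
    x, z = rng.normal(scale=3.0, size=2)
    lhs = alpha(x, z) * pi(x) * q(z, x)   # alpha(x,z) pi(x) pi^prop(z|x)
    rhs = alpha(z, x) * pi(z) * q(x, z)   # alpha(z,x) pi(z) pi^prop(x|z)
    assert np.isclose(lhs, rhs)
```

Both sides reduce to min{π(z)π^prop(x|z), π(x)π^prop(z|x)}, which is symmetric in (x, z), exactly as in the proof above.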
Lemma 2.3 (conditional detailed balance) The following equality of measures defined on R^n × R^n

  P_i(Z^{1:i−1}, X^{i:N}; dZ^i) π(X^i) dX^i = P_i(Z^{1:i}, X^{i+1:N}; dX^i) π(Z^i) dZ^i    (7)

holds true for any i = 1, ..., N, Z^{1:i−1} ∈ R^{n×(i−1)}, and X^{i+1:N} ∈ R^{n×(N−i)}.
Proof The left hand side of (7) is a measure, say ν(dZ^i × dX^i), on (R^n × R^n, B(R^n × R^n)). For all A_1, A_2 ∈ B(R^n), we want to prove that ν(A_1 × A_2) = ν(A_2 × A_1). We have:

  ν(A_1 × A_2) = ∫ P_i(Z^{1:i−1}, X^{i:N}; A_1) 1_{A_2}(X^i) π(X^i) dX^i

and

  P_i(Z^{1:i−1}, X^{i:N}; A_1) = (1/N) ∑_{j=1}^{N} ∫ 1_{A_1}(Z^i) α_{i,j}(X^i, Z^i) π^{prop}_{i,j}(Z^i|X^i) dZ^i + ρ_i(X^i) 1_{A_1}(X^i)

so that

  ν(A_1 × A_2) = (1/N) ∑_{j=1}^{N} ∬ 1_{A_1}(Z^i) 1_{A_2}(X^i) α_{i,j}(X^i, Z^i) π(X^i) π^{prop}_{i,j}(Z^i|X^i) dX^i dZ^i
    + ∫ ρ_i(X^i) 1_{A_1}(X^i) 1_{A_2}(X^i) π(X^i) dX^i.    (8)

And from Lemma 2.2, we get:

  ν(A_1 × A_2) = (1/N) ∑_{j=1}^{N} ∬ 1_{A_1}(Z^i) 1_{A_2}(X^i) α_{i,j}(Z^i, X^i) π(Z^i) π^{prop}_{i,j}(X^i|Z^i) dZ^i dX^i
    + ∫ ρ_i(X^i) 1_{A_1}(X^i) 1_{A_2}(X^i) π(X^i) dX^i.

Exchanging the names of the variables X^i ↔ Z^i in the first term of the right hand side of the previous equality leads to the same expression as (8) where A_1 and A_2 are interchanged, in other words ν(A_1 × A_2) = ν(A_2 × A_1). □
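The detailed balance mechanics can also be sanity-checked on a finite state space, where the kernel of Lemma 2.1 becomes a stochastic matrix and the invariance of the product measure (Proposition 2.4 below) can be verified by direct matrix computation. The 3-point target, the proposal matrix Q (each chain proposing from its own current position), and the choice of N = 2 chains are illustrative assumptions, not part of the report.

```python
import itertools
import numpy as np

m, N = 3, 2                                   # |state space| and number of chains
pi = np.array([0.2, 0.5, 0.3])                # target law on {0, 1, 2} (illustrative)
Q = np.array([[0.5, 0.3, 0.2],                # proposal q(y|x), row-stochastic
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])

def sub_kernel(config, i):
    """Law of the new value of component i given the full configuration."""
    xi = config[i]
    out = np.zeros(m)
    accepted = 0.0
    for j in range(N):
        p_j = Q[config[j]]                    # candidate law proposed by chain j
        for z in range(m):
            if j == i:                        # proposal of chain i depends on x_i
                r = pi[z] * Q[z, xi] / (pi[xi] * Q[xi, z])
            else:                             # proposal of chain j != i does not
                r = pi[z] * p_j[xi] / (pi[xi] * p_j[z])
            a = min(r, 1.0) * p_j[z] / N      # alpha_{i,j} weighted by candidate law
            out[z] += a
            accepted += a
    out[xi] += 1.0 - accepted                 # rejection mass: stay at x_i
    return out

configs = list(itertools.product(range(m), repeat=N))
idx = {c: k for k, c in enumerate(configs)}
P = np.zeros((m ** N, m ** N))
for c in configs:
    dist = {c: 1.0}                           # sequential update of the N components
    for i in range(N):
        new = {}
        for cfg, w in dist.items():
            kz = sub_kernel(cfg, i)
            for z in range(m):
                nc = cfg[:i] + (z,) + cfg[i + 1:]
                new[nc] = new.get(nc, 0.0) + w * kz[z]
        dist = new
    for cfg, w in dist.items():
        P[idx[c], idx[cfg]] = w

Pi = np.array([pi[c[0]] * pi[c[1]] for c in configs])  # product measure pi x pi
assert np.allclose(P.sum(axis=1), 1.0)        # P is a Markov kernel
assert np.allclose(Pi @ P, Pi)                # Pi is invariant for P
```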
Proposition 2.4 (invariance) The probability measure

  Π(dX) = π(X^1) dX^1 ⋯ π(X^N) dX^N

is an invariant distribution of the Markov kernel P, i.e. ΠP = Π, that is:

  ∫_X P(X, dZ) { ∏_{i=1}^{N} π(X^i) dX^i } = ∏_{i=1}^{N} π(Z^i) dZ^i.    (9)

Proof

  ∫_X P(X, dZ) { ∏_{i=1}^{N} π(X^i) dX^i }
    = ∫_X P_1(X^{1:N}; dZ^1) P_2(Z^1, X^{2:N}; dZ^2) ⋯ P_N(Z^{1:N−1}, X^N; dZ^N) { ∏_{i=1}^{N} π(X^i) dX^i }
    = ∫_X P_1(X^{1:N}; dZ^1) π(X^1) dX^1 P_2(Z^1, X^{2:N}; dZ^2) ⋯ P_N(Z^{1:N−1}, X^N; dZ^N) { ∏_{i=2}^{N} π(X^i) dX^i }.

Using (7) with i = 1 gives:

  ∫_X P(X, dZ) { ∏_{i=1}^{N} π(X^i) dX^i }
    = ∫_X P_1(Z^1, X^{2:N}; dX^1) π(Z^1) dZ^1 P_2(Z^1, X^{2:N}; dZ^2) ⋯ P_N(Z^{1:N−1}, X^N; dZ^N) { ∏_{i=2}^{N} π(X^i) dX^i }.