Population Monte Carlo
Olivier Cappé
CNRS / ENST Département TSI, Paris
Arnaud Guillin
Jean-Michel Marin
CEREMADE, Université Paris Dauphine, Paris
Christian P. Robert†
†CEREMADE, Université Paris Dauphine, and CREST, INSEE, Paris
Summary. Importance sampling methods can be iterated like MCMC algorithms, while being
more robust against dependence and starting values, as shown in this paper. The population
Monte Carlo principle we describe here consists of iterated generations of importance samples,
with importance functions depending on the previously generated importance samples. The
advantage over MCMC algorithms is that the scheme is unbiased at any iteration and can thus
be stopped at any time, while iterations improve the performances of the importance function,
thus leading to an adaptive importance sampling. We first illustrate this method on a toy mixture
example with multiscale importance functions. A second example reanalyses the ion channel
model of Hodgson (1999), using an importance sampling scheme based on a hidden Markov
representation, and compares population Monte Carlo with a corresponding MCMC algorithm.
Keywords: Adaptive algorithm, degeneracy, hidden semi-Markov model, importance sampling,
ion channel model, MCMC algorithms, mixture model, multiple scales, particle system, random
walk, unbiasedness.
1. Introduction
When reviewing the literature on MCMC methodology, a striking feature is that it has predominantly focussed on producing single outputs from a given target distribution, π. This may sound a paradoxical statement when considering that one of the major applications of MCMC algorithms is the approximation of integrals

$$I = \int h(x)\,\pi(x)\,dx$$

with empirical sums

$$\hat{I} = \frac{1}{T} \sum_{t=1}^{T} h(x^{(t)}),$$

where $(x^{(t)})$ is a Markov chain with stationary distribution π. But the main issue is that π is considered as the limiting distribution of $x^{(t)}$ per se and that the Markov correlation between the $x^{(t)}$'s is evacuated through the ergodic theorem (Meyn and Tweedie, 1993). There only exist a few references to the use of MCMC algorithms for the production of samples of size n from π, including Warnes (2001) and Mengersen and Robert (2003), although the concept of simulation from a product distribution $\pi^{\otimes n}$ is not fundamentally different from the production of a single output from the target distribution.

†Address for correspondence: CEREMADE, Université Paris Dauphine, 16 place du Maréchal de Lattre de Tassigny, 75775 Paris cedex 16
Another striking feature in the MCMC literature is the early attempt to dissociate itself from pre-existing techniques such as importance sampling, although the latter shared with MCMC algorithms the property of simulating from the wrong distribution to produce approximate generation from the correct distribution (see Robert and Casella, 1999, Chap. 3). It is only lately that the realisation that both approaches can be successfully coupled came upon the MCMC community, as shown for instance by MacEachern and Peruggia (2000), Liu (2001), or Liu et al. (2001). One clear example of this fruitful symbiosis is given by iterated particle systems (Doucet et al., 2001). Originally, iterated particle systems were introduced to deal with dynamic target distributions, as for instance in radar tracking, where the imperatives of on-line processing of rapidly changing target distributions prohibited resorting to repeated MCMC sampling. Fundamentally, the basic idea, from a Monte Carlo point of view, consists in recycling previous weighted samples primarily through a modification of the weights (Gordon et al., 1993), possibly enhanced by additional sampling steps (Berzuini et al., 1997; Pitt and Shephard, 1999; Gilks and Berzuini, 2001). As pointed out in Chopin (2002), a particle system simplifies into a particular type of importance sampling scheme in a static (as opposed to dynamic) setup, where the target distribution is fixed, which is the setting we consider here.
We thus study in this paper a method, called population Monte Carlo, that aims at simulating from the target distribution $\pi^{\otimes n}$ and that tries to link these different "loose ends" into a coherent simulation principle: population Monte Carlo borrows from MCMC algorithms for the construction of the proposal, from importance sampling for the construction of appropriate estimators, from SIR (Rubin, 1987) for sample equalisation, and from iterated particle systems for sample improvement. The population Monte Carlo (PMC) algorithm is in essence an iterated importance sampling scheme that simultaneously produces, at each iteration, a sample approximately simulated from a target distribution π and (approximately) unbiased estimates $\hat{I}$ of integrals I under that distribution. The sample is constructed using sample-dependent proposals for generation and importance sampling weights for pruning the proposed sample.
We describe in Section 2 the population Monte Carlo technique, and apply these developments, first to a simple mixture example in Section 3, and second to the more ambitious ion channel model that we assess in Section 4. While reasonable in complexity, the mixture example still offers an interesting medium to assess the adaptivity of the population Monte Carlo sampler and the resistance to degeneracy. The ion channel model is more challenging in that there is no closed form representation of the observed likelihood, while the completion step is more delicate than in mixture settings. In particular, Section 4.6 explains why a Metropolis–Hastings algorithm based on the same proposal as population Monte Carlo does not work. Section 5 contains the overall conclusions of the paper.
2. The population Monte Carlo approach
As noted in Mengersen and Robert (2003), it is possible to construct an MCMC algorithm associated with the target distribution

$$\pi^{\otimes n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} \pi(x_i), \qquad (1)$$

on the space $\mathcal{X}^n$, rather than with the distribution $\pi(x_1)$, on the space $\mathcal{X}$. All standard results and schemes for MCMC algorithms apply in this particular case, and irreducible Markov chains associated with such MCMC algorithms converge to the target $\pi^{\otimes n}$ in distribution, that is, get approximately distributed as an iid sample from π after a "sufficient" number of iterations. Mengersen and Robert (2003) point out that additional sampling devices can be used to construct the proposal distributions, like Gibbs-type component-wise repulsive proposals that exclude immediate neighbourhoods of the other points in the sample.
When considering, at MCMC iteration t, a sample $x^{(t)} = (x^{(t)}_1,\ldots,x^{(t)}_n)$, we can think of producing the next iteration of the sample $x^{(t+1)}$ such that the components $x^{(t+1)}_i$ are generated from a proposal $q(x \mid x^{(t)}_i)$. However, rather than accepting each proposed $x^{(t+1)}_i$ individually (which would be a standard form of parallel MCMC sampling) or the whole sample $x^{(t+1)}$ globally (which would suffer from the curse of dimensionality), we can altogether remove the issue of assessing the convergence to the stationary distribution by correcting at each iteration for the use of the wrong distribution by importance weighting.
Thus, instead of using an asymptotic justification to an MCMC iterated simulation scheme, we can instead resort to importance sampling arguments: if the sample $x^{(t)}$ is produced by simulating the $x^{(t)}_i$'s from distributions $q_{it}$, independently of one another, and if we associate to each point $x^{(t)}_i$ of this sample the importance weight

$$\varrho^{(t)}_i = \frac{\pi(x^{(t)}_i)}{q_{it}(x^{(t)}_i)}, \qquad i = 1,\ldots,n,$$

estimators of the form

$$\hat{I}_t = \frac{1}{n} \sum_{i=1}^{n} \varrho^{(t)}_i\, h(x^{(t)}_i)$$

are unbiased for every integrable function h and at every iteration t.
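As a concrete illustration of this weighting step, the following sketch checks the unbiasedness of the unnormalised estimator numerically. The setting is our own toy example, not the paper's: target $\pi = \mathcal{N}(0,1)$ with $h(x) = x^2$ (so $I = 1$), and point-specific normal proposals with heavier tails than π.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target pi = N(0, 1); log density up to full constants.
def log_pi(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

n = 10_000
centers = rng.uniform(-1.0, 1.0, size=n)   # one proposal q_it per index i
scale = 2.0                                # heavier-tailed than the target
x = rng.normal(centers, scale)             # x_i^(t) ~ q_it

# importance weights rho_i = pi(x_i) / q_it(x_i)
log_q = -0.5 * ((x - centers) / scale) ** 2 - np.log(scale) - 0.5 * np.log(2 * np.pi)
rho = np.exp(log_pi(x) - log_q)

# unbiased estimator of I = E_pi[h(X)] with h(x) = x^2 (true value 1)
I_hat = np.mean(rho * x**2)
```

Because each proposal has heavier tails than π, the weights are bounded and the estimator concentrates quickly around the true value.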
This is the starting point for population Monte Carlo methods, namely that extending regular importance sampling techniques to cases where the importance distribution for $x^{(t)}_i$ may depend on both the sample index i and the iteration index t does not modify their validity. As already indicated in Robert and Casella (1999, Lemma 8.3.1) in a more restrictive setting, importance sampling estimators have the interesting property that the terms $\varrho^{(t)}_i h(x^{(t)}_i)$ are uncorrelated, even when the proposal $q_{it}$ depends on the whole past of the experiment: assuming that the variances $\mathrm{var}\left(\varrho^{(t)}_i h(x^{(t)}_i)\right)$ exist for every $1 \le i \le n$, which means that the proposals $q_{it}$ should have heavier tails than π, we have

$$\mathrm{var}(\hat{I}_t) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{var}\left(\varrho^{(t)}_i h(x^{(t)}_i)\right), \qquad (2)$$

due to the canceling effect of the weights $\varrho^{(t)}_i$.
Obviously, in most settings, the distribution of interest π is unscaled and we have to use instead

$$\varrho^{(t)}_i \propto \frac{\pi(x^{(t)}_i)}{q_{it}(x^{(t)}_i)}, \qquad i = 1,\ldots,n,$$

scaled so that the weights $\varrho^{(t)}_i$ sum up to 1. In this case, the above unbiasedness property and the variance decomposition are lost, although they approximately hold. In fact, the estimation of the normalising constant of π improves with each iteration t, since the overall average

$$\varpi_t = \frac{1}{tn} \sum_{\tau=1}^{t} \sum_{i=1}^{n} \frac{\pi(x^{(\tau)}_i)}{q_{i\tau}(x^{(\tau)}_i)} \qquad (3)$$

is a convergent estimator of the inverse of the normalising constant. Therefore, as t increases, $\varpi_t$ contributes less and less to the variability of $\hat{I}_t$ and the above properties can be considered as holding for t large enough. In addition, if the sum (3) is only based on the $(t-1)$ first iterations, that is, if

$$\varrho^{(t)}_i = \frac{\pi(x^{(t)}_i)}{\varpi_{t-1}\, q_{it}(x^{(t)}_i)},$$

the variance decomposition (2) can be recovered, via the same conditioning argument. A related point is that attention must be paid to the selection of the proposals $q_{it}$ so that the normalising constants in these densities (or at least the part that depends on i) are available in closed form.
As pointed out by Rubin (1987) for regular importance sampling, it is preferable, rather than to update the weights $\varrho^{(t)}_i$ at each iteration t, to resample (with replacement) n values $y^{(t)}_i$ from $(x^{(t)}_1,\ldots,x^{(t)}_n)$ using the weights $\varrho^{(t)}_i$ (and possibly the variance reduction device of systematic sampling, as in Carpenter et al., 1998). This partially avoids the degeneracy phenomenon, that is, the preservation of negligible weights and corresponding irrelevant points in the sample. The sample $(y^{(t)}_1,\ldots,y^{(t)}_n)$ resulting from this sampling importance resampling (SIR) step is thus akin to an iid sample extracted from the weighted empirical distribution associated with $\pi^{\otimes n}(x_1,\ldots,x_n)$.
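The SIR step can be sketched as follows; both basic multinomial resampling and the lower-variance systematic variant of Carpenter et al. (1998) are shown. Function names and the toy weighted sample are our own.

```python
import numpy as np

rng = np.random.default_rng(1)

# Multinomial resampling: n independent draws from the weighted sample.
def multinomial_resample(points, weights, rng):
    w = weights / weights.sum()
    idx = rng.choice(len(points), size=len(points), p=w)
    return points[idx]

# Systematic resampling: a single uniform shifted over n strata, which
# guarantees each point is copied floor(n w_i) or ceil(n w_i) times.
def systematic_resample(points, weights, rng):
    n = len(points)
    w = weights / weights.sum()
    positions = (rng.uniform() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(w), positions)
    return points[idx]

points = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
weights = np.array([0.05, 0.1, 0.5, 0.3, 0.05])
y_mult = multinomial_resample(points, weights, rng)
y_syst = systematic_resample(points, weights, rng)
```

The systematic variant keeps the expected copy counts of multinomial resampling while removing most of its Monte Carlo noise, which is why it is the usual default in particle filtering.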
A pseudo-code rendering of the PMC algorithm is as follows:

PMCA: Population Monte Carlo Algorithm

For t = 1,...,T
  For i = 1,...,n,
    (i) Select the generating distribution $q_{it}(\cdot)$
    (ii) Generate $x^{(t)}_i \sim q_{it}(x)$
    (iii) Compute $\varrho^{(t)}_i = \pi(x^{(t)}_i)/q_{it}(x^{(t)}_i)$
  Normalise the $\varrho^{(t)}_i$'s to sum up to 1
  Resample n values from the $x^{(t)}_i$'s with replacement, using the weights $\varrho^{(t)}_i$, to create the sample $(x^{(t)}_1,\ldots,x^{(t)}_n)$

Step (i) in this representation is stressed because this is an essential feature of the PMC algorithm: the proposal distributions can be individualized at each step of the algorithm without jeopardising the validity of the method. The proposals $q_{it}$ can therefore be picked according to the performances of the previous $q_{i(t-1)}$'s and, in particular, they can depend on the previous sample $(x^{(t-1)}_1,\ldots,x^{(t-1)}_n)$ or even on all the previously simulated samples, if storage allows. For instance, in the mixture setting of Section 3, the $q_{it}$'s are random walk proposals centered at the $x^{(t-1)}_i$'s, with various possible scales chosen from earlier performances, and they could also include large-tail proposals as in the defensive sampling strategy of Hesterberg (1998), to ensure finite variance. Similarly, Warnes (2001) uses the previous sample to build a kernel non-parametric approximation to π.
The fact that the proposal distribution $q_{it}$ can depend on the past iterations in any possible way without modifying the weight $\varrho^{(t)}_i$ is due to the feature that the unbiasedness equation

$$\mathbb{E}\left[\varrho^{(t)}_i h(x^{(t)}_i)\right] = \int\!\!\int \frac{\pi(x)}{q_{it}(x)}\, h(x)\, q_{it}(x)\, dx\, g(\zeta)\, d\zeta = \int\!\!\int h(x)\,\pi(x)\, dx\, g(\zeta)\, d\zeta = \mathbb{E}_\pi[h(X)],$$

where ζ denotes the vector of past random variates that contribute to $q_{it}$, does not depend on the distribution g(ζ) of this random constituent.
There are similarities between PMC and earlier proposals in the particle system literature, in particular with Berzuini et al.'s (1997) and Gilks and Berzuini (2001), since these authors also consider iterated samples with (SIR) resampling steps based on importance weights. A major difference though (besides their dynamic setting of moving target distributions) is that they remain within the MCMC realm by using the resample step before the proposal move. These authors are thus forced to use Markov transition kernels with given stationary distributions. There is also a similarity with Chopin (2002) who considers iterated importance sampling with changing proposals. His setting is a special case of PMC in a Bayesian framework, where the proposals $q_{it}$ are the posterior distributions associated with a portion $k_t$ of the observed dataset (and are thus independent of i and of the previous samples).
As noted earlier, a most noticeable property of the PMC method is that the generality in the choice of the proposal distributions $q_{it}$ is due to the relinquishment of the MCMC framework. Indeed, without the importance resampling correction, a regular Metropolis–Hastings acceptance step for each point of the n-dimensional sample produces a parallel MCMC sampler which simply converges to the target $\pi^{\otimes n}$ in distribution. Similarly, a regular Metropolis–Hastings acceptance step for the whole vector $x^{(t)}$ converges to $\pi^{\otimes n}$; the advantage in producing an asymptotic approximation to an iid sample is balanced by the drawback that the acceptance probability decreases approximately as a power of n. Since, in PMC, we pick at each iteration the points in the sample according to their importance weight $\varrho^{(t)}_i$, we both remove the convergence issue and construct a selection mechanism over both the points of the previous sample and the proposal distributions. This is not solely a theoretical advantage: Section 4.6 shows that a Metropolis–Hastings scheme based on the same proposal does not work well, while a PMC algorithm produces correct answers.
The PMC framework thus allows for a much easier construction of adaptive schemes, i.e. of proposals that correct themselves against past performances, than in MCMC setups. Indeed, while adaptive importance sampling strategies had already been considered in the pre-MCMC era, as in, e.g., Oh and Berger (1992, 1993), the MCMC environment is much harsher for adaptive algorithms, because the adaptivity cancels the Markovian nature of the sequence and thus calls for more elaborate convergence studies to establish ergodicity. See, e.g., Andrieu and Robert (2001) and Haario et al. (1999, 2001) for recent developments in this area. For PMC methods, ergodicity is not an issue since the validity is obtained via importance sampling justifications.
The samples produced by the PMC method can be exploited as regular importance sampling outputs at any iteration T, and thus do not require the construction of stopping rules as for MCMC samples (Robert and Casella, 1999, Chap. 8). Quite interestingly though, the whole sequence of samples can be exploited, both for adaptation of the proposals and for estimation purposes, as illustrated with the constant approximation (3). This does not require a static storage of all samples produced though, since approximations like (3) can be updated dynamically. In addition, this possibility to exploit the whole set of simulations implies that the sample size n is not necessarily very large, since the effective simulation size is nT. A last remark is that the number of points in the sample is not necessarily constant over iterations. As in Chopin (2002), one may increase the number of points in the sample once the algorithm seems to stabilise in a stationary regime.
3. Mixture model
Our first example is a Bayesian modelling of a mixture model, which is a problem simple enough to introduce but complex enough to lead to poor performances for badly designed algorithms (Robert and Casella, 1999, Chap. 9; Cappé et al., 2003). The mixture problem we consider is based on an iid sample $x = (x_1,\ldots,x_n)$ from the distribution

$$p\,\mathcal{N}(\mu_1,\sigma^2) + (1-p)\,\mathcal{N}(\mu_2,\sigma^2),$$

where both $p \neq 1/2$ and $\sigma > 0$ are known. The prior associated with this model, π, is a normal $\mathcal{N}(\theta, \sigma^2/\lambda)$ prior on both $\mu_1$ and $\mu_2$. We thus aim at simulating from the posterior distribution

$$\pi(\mu_1,\mu_2 \mid x) \propto f(x \mid \mu_1,\mu_2)\,\pi(\mu_1,\mu_2).$$
Although the "standard" MCMC resolution of the mixture problem is to use a Gibbs sampler based on a data augmentation step via indicator variables, recent developments (Celeux et al., 2000; Chopin, 2002; Cappé et al., 2003) have shown that the data augmentation step is not necessary to run an MCMC sampler. We will now demonstrate that a PMC sampler can be efficiently implemented without this augmentation step either.
Our PMC algorithm is adaptive in the following sense: The initialization step consists first in choosing a set of initial values for $\mu_1$ and $\mu_2$ (e.g., a grid of points around the empirical mean of the $x_i$'s). The proposals are then random walks, that is, random isotropic perturbations of the points of the current sample. As noted above, a very appealing feature of the PMC method is that the proposal may vary from one point of the sample to another without jeopardizing the validity of the method. At a first level, the proposals are all normal distributions centred at the points of the current sample. At a second level, we can also choose different variances for these normal distributions, for instance within a predetermined set of p scales $v_i$ ($1 \le i \le p$) ranging from $10^3$ down to $10^{-3}$, and select these variances at each step of the PMC algorithm according to the performances of the scales on the previous iterations. In our implementation, we decided to select a scale proportionally to its non-degeneracy rate on the previous iterations. (Note the formal similarity of this scheme with Stavropoulos and Titterington's (1999) smooth bootstrap, or adaptive importance sampling, and Warnes' (2001) kernel coupler, when the kernel used in their mixture approximation of π is normal. The main difference is that we do not aim at a good approximation of π using standard kernel results like bandwidth selection, but rather keep the different scales $v_i$ over the iterations.) Our PMC algorithm thus looks as follows:
Mixture PMC

Step 0: Initialisation
  For j = 1,...,n = pm, choose $(\mu_1)^{(0)}_j, (\mu_2)^{(0)}_j$
  For k = 1,...,p, set $r_k = m$

Step i: Update (i = 1,...,I)
  For k = 1,...,p,
    (a) generate a sample of size $r_k$ as
        $(\mu_1)^{(i)}_j \sim \mathcal{N}\left((\mu_1)^{(i-1)}_j, v_k\right)$ and $(\mu_2)^{(i)}_j \sim \mathcal{N}\left((\mu_2)^{(i-1)}_j, v_k\right)$
    (b) compute the weights
        $$\varrho_j \propto \frac{f\left(x \mid (\mu_1)^{(i)}_j, (\mu_2)^{(i)}_j\right)\,\pi\left((\mu_1)^{(i)}_j, (\mu_2)^{(i)}_j\right)}{\varphi\left((\mu_1)^{(i)}_j \mid (\mu_1)^{(i-1)}_j, v_k\right)\,\varphi\left((\mu_2)^{(i)}_j \mid (\mu_2)^{(i-1)}_j, v_k\right)}$$
  Resample the $\left((\mu_1)^{(i)}_j, (\mu_2)^{(i)}_j\right)_j$ using the weights $\varrho_j$
  For k = 1,...,p, update $r_k$ as the number of elements generated with variance $v_k$ which have been resampled.

where $\varphi(q \mid s, v)$ is the density of the normal distribution with mean s and variance v at the point q.
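The Mixture PMC scheme above can be sketched in Python as follows, under the simulation settings used in this section. All variable names, the guaranteed minimum of 20 points per scale, and the rounding used to keep the total allocation fixed are our own implementation choices, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data: 1000 draws from 0.2 N(0,1) + 0.8 N(2,1); p_mix and sigma known.
p_mix, sigma, theta, lam = 0.2, 1.0, 1.0, 0.1
data = np.where(rng.uniform(size=1000) < p_mix,
                rng.normal(0.0, sigma, 1000), rng.normal(2.0, sigma, 1000))

def log_post(m1, m2):
    # log f(x | m1, m2) + log prior, vectorised over the sample of means
    like = (p_mix * np.exp(-0.5 * ((data[None, :] - m1[:, None]) / sigma) ** 2)
            + (1 - p_mix) * np.exp(-0.5 * ((data[None, :] - m2[:, None]) / sigma) ** 2))
    prior = -0.5 * lam * ((m1 - theta) ** 2 + (m2 - theta) ** 2) / sigma ** 2
    return np.log(like + 1e-300).sum(axis=1) + prior

v = np.array([5.0, 2.0, 0.1, 0.05, 0.01])     # the p = 5 scales
p, m = len(v), 210                            # n = p * m = 1050 points
r = np.full(p, m)                             # allocation r_k per scale
mu1 = rng.normal(data.mean(), 1.0, p * m)
mu2 = rng.normal(data.mean(), 1.0, p * m)

for it in range(25):
    vk = np.repeat(v, r)                      # scale used by each point
    prop1 = rng.normal(mu1, np.sqrt(vk))
    prop2 = rng.normal(mu2, np.sqrt(vk))
    # weights: target over the product of normal transition densities
    log_q = -0.5 * ((prop1 - mu1) ** 2 + (prop2 - mu2) ** 2) / vk - np.log(vk)
    log_w = log_post(prop1, prop2) - log_q
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    idx = rng.choice(p * m, size=p * m, p=w)  # SIR step
    mu1, mu2 = prop1[idx], prop2[idx]
    # adapt r_k from the survival rate of each scale, with a minimum floor
    counts = np.array([(vk[idx] == s).sum() for s in v])
    r = np.maximum(counts, 20)
    r = np.round(r * (p * m) / r.sum()).astype(int)
    r[-1] += p * m - r.sum()                  # keep the total fixed at n
```

The floor on the allocation mirrors the modification discussed below, which prevents the complete removal of any variance level.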
As mentioned above, the weight associated with each variance $v_k$ is thus proportional to the regeneration (or survival) rate of the corresponding sample. If most $\mu_j$'s associated with a given $v_k$ are not resampled, the next step will see fewer generations using this variance $v_k$. However, to avoid the complete removal of a given variance $v_k$, we modified the algorithm to ensure that a minimum number of points is simulated from each variance level.
The performances of the above algorithm are illustrated on a simulated dataset of 1000 observations from the distribution $0.2\,\mathcal{N}(0,1) + 0.8\,\mathcal{N}(2,1)$. We also took $\theta = 1$ and $\lambda = 0.1$ as hyperparameters of the prior. Applying the above PMC algorithm to this sample, we see that it produces a non-degenerate sample, that is, not restricted to n replications of the same point. In addition, the adaptive feature for choosing among the $v_k$'s is definitely helpful to explore the state space of the unknown means. In this case, $p = 5$ and the five variances are equal to 5, 2, 0.1, 0.05 and 0.01. Moreover, at each step i of the PMC algorithm, we generated $n = 1050$ sample points.
The two upper graphs of Figure 1 illustrate the degeneracy phenomenon associated with the PMC algorithm: they represent the sizes of the samples issued from the different proposals, that is, the number of different points resulting from the resampling step. The upper left graph exhibits a nearly cyclic behavior for the largest variances $v_k$, alternating from no point issued from these proposals to a large number of points. This behaviour agrees with intuition: proposals that have too large a variance mostly produce points that are irrelevant for the distribution of interest, but once in a while they happen to generate points that are close to one of the modes of the distribution of interest. In the latter situation, the corresponding points are associated with large weights $\varrho_j$ and are thus heavily resampled. The upper right graph shows that the other proposals are rather evenly considered along iterations. This is not surprising for the smaller variances, since they modify very little the current sample, but the cyclic predominance of the three possible variances is quite reassuring about the mixing abilities of the algorithm and thus about its exploration performances.
We can also study the influence of the variation in the proposals on the estimation of the means $\mu_1$ and $\mu_2$, as illustrated by the middle and lower panels of Figure 1. First, when considering the cumulative means of these estimations over iterations, the quantities quickly stabilise. The corresponding variances are not so stable over iterations, but this is to be expected, given the regular reappearance of subsamples with large variances.
Figure 2 provides an additional insight into the performances of the PMC algorithm, by representing a weighted sample of means with dots proportional to the weights. As should be obvious from this graph, there is no overwhelming point that concentrates most of the weight. On the contrary, the 1050 points are rather evenly weighted, especially for those close to the posterior modes of the means.
Note that a better PMC scheme could be chosen. The approach selected for this section does not take advantage of the latent structure of the model, as in Chopin (2002), and contrary to the following section. Indeed, after an initialization step, one could first simulate the latent indicator variables conditionally on the previous sample of $(\mu_1, \mu_2)$ and then simulate a new sample of means conditionally on the latent indicator variables. Iterating this PMC scheme seems to constitute a sort of parallel Gibbs sampling, but this scheme is valid at any iteration and can thus be stopped at any time. That we have not used this approach is to emphasize that the PMC method has no real need of the latent structure of the model to work satisfactorily.
4. Ion channels

4.1. The stylised model
Fig. 1. Performances of the mixture PMC algorithm: (upper left) Number of resampled points for the variances $v_1 = 5$ (darker) and $v_2 = 2$; (upper right) Number of resampled points for the other variances, $v_3 = 0.1$ is the darkest one; (middle left) Variance of the simulated $\mu_1$'s along iterations; (middle right) Complete average of the simulated $\mu_1$'s over iterations; (lower left) Variance of the simulated $\mu_2$'s along iterations; (lower right) Complete average of the simulated $\mu_2$'s over iterations.
We refer the reader to Hodgson (1999), as well as to Ball et al. (1999) and Carpenter et al. (1999), for a biological motivation of this model, alternative formulations, and additional references. Let us insist at this point on the formalised aspect of our model, which predominantly serves as a realistic support for the comparison of a PMC approach with a more standard MCMC approach in a semi-Markov setting. The finer points of model choice and model comparison for the modelling of ion channel kinetics, while of importance as shown by Ball et al. (1999) and Hodgson and Green (1998), are not addressed by the present paper. Note also that, while a Bayesian analysis of this model provides a complete inferential perspective, the focus of attention is generally set on the restoration of the true channel current, rather than on the estimation of the parameters of the model.
Consider, thus, observables $y = (y_t)_{1\le t\le T}$ directed by a hidden Gamma (indicator) process $x = (x_t)_{1\le t\le T}$ in the following way:

$$y_t \mid x_t \sim \mathcal{N}(\mu_{x_t}, \sigma^2),$$

while $x_t \in \{0, 1\}$, with durations $d_j \sim \mathcal{G}a(s_i, \lambda_i)$ ($i = 0, 1$). More exactly, the hidden process (x) is a (continuous time) Gamma jump process with jump times $t_j$ ($j = 1, 2, \ldots$) such
Fig. 2. Representation of the log-posterior distribution via grey levels (darker stands for lower and lighter for higher) and of a weighted sample of means. (The weights are proportional to the surface of the circles.)
that $d_{j+1} = t_{j+1} - t_j \sim \mathcal{G}a(s_i, \lambda_i)$ if $x_t = i$ for $t_j \le t < t_{j+1}$, that is, $E[d_{j+1}] = s_i/\lambda_i$. Figure 3 provides a simulated sample of size 4000 from this model.
Fig. 3. Simulated sample of size 4000 from the ion channel model.
A first modification of Hodgson's (1999) ion channel model is introduced at this level: we assume that the durations $d_j$, that is, the time intervals in which the process $(x_t)_{1\le t\le T}$ remains in a given state, are integer valued, rather than real valued. The reasons for this change are that

(a) the true durations of the Gamma process are not identifiable;

(b) this model is a straightforward generalisation of the hidden Markov model where the jumps do occur at integer times (see Ball et al., 1999, or Carpenter et al., 1999). A natural generalisation of the geometric duration of the hidden Markov model is a negative binomial distribution, $\mathcal{N}eg(s, \omega)$, which is very close to a Gamma density $\mathcal{G}a(s+1, -\log(1-\omega))$ (up to a constant) for s small. Indeed, the former is approximately

$$\frac{d^s}{s!}\,(1-\omega)^d \left(\frac{\omega}{1-\omega}\right)^s$$

while the latter is

$$\frac{d^s}{s!}\,(1-\omega)^d \left\{-\log(1-\omega)\right\}^{s+1}.$$

(The simulations detailed below were also implemented using a negative binomial modelling, leading to very similar results in the restoration process.)

(c) inference on the $d_j$'s given $(x_t)_{1\le t\le T}$ involves an extra level of simulations, even if it can be easily implemented via a slice sampler, as long as we do not consider the possibility of several jumps between two integer observational times. (This latter possibility is actually negligible for the datasets we consider.); and

(d) the replacement of $d_j$ by its integral part does not strongly modify the likelihood.

In a similar vein, we omit the censoring effect of both the first and the last intervals, given that the influence of this censoring on a long series is bound to be small.
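The proximity claimed in (b) can be checked numerically: as the two displayed expressions share the factor $d^s (1-\omega)^d / s!$, their ratio is free of d, so the negative binomial term and the Gamma density differ only by a multiplicative constant. The values of s and ω below are arbitrary illustrations.

```python
import math

# Ratio of the approximate negative binomial term to the Gamma density,
# evaluated over a range of integer durations d; it should be constant in d.
s, w = 2, 0.3
neg = [d**s / math.factorial(s) * (1 - w)**d * (w / (1 - w))**s
       for d in range(1, 20)]
gam = [d**s / math.factorial(s) * (1 - w)**d * (-math.log(1 - w))**(s + 1)
       for d in range(1, 20)]
ratios = [a / b for a, b in zip(neg, gam)]
```

Since the d-dependent factors cancel exactly, the ratio is the constant $(\omega/(1-\omega))^s / \{-\log(1-\omega)\}^{s+1}$, confirming the "up to a constant" statement.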
A second modification of Hodgson's (1999) model is that we choose a uniform prior for the shape parameters $s_0$ and $s_1$ on the finite set $\{1,\ldots,S\}$, rather than an exponential prior. The reasons for this choice are that

(a) the hidden Markov process has geometric switching times, which correspond to exponential durations. A natural extension is to consider that the durations of the stays within each state (or regime) can be represented as the cumulated duration of $s_i$ exponential stays, with $s_i$ an unknown integer, which exactly corresponds to gamma durations. This representation thus removes the need to call for a level of variable dimension modelling. Carpenter et al. (1999) and Hodgson and Green (1999) use a different approach, based on the replication of the "open" and "closed" sets into several states, to approximate the semi-Markov model.

(b) the following simulations show that the parameters $s_0$ and $s_1$ are strongly identified by the observables $(y_t)_{1\le t\le T}$;

(c) the prior information on the parameters $s_0$ and $s_1$ is most likely to be sparse and thus a uniform prior is less informative than a Gamma prior when S is large; and

(d) the use of a finite support prior allows for the computation of the normalising constant in the posterior conditional distribution of the parameters $s_0$ and $s_1$, a feature that is paramount for the implementation of PMC.
A third modification, when compared with both Hodgson (1999) and Carpenter et al. (1999), is that the observables are assumed to be independent, given the $x_t$'s, rather than distributed from either an AR(15) (Hodgson, 1999) or an ARMA(1,1) (Carpenter et al., 1999) model. This modification somehow weakens the identifiability of both regimes as the data becomes potentially more volatile.
The other parameters of the model are distributed as in Hodgson (1999), using conjugate priors: normal priors on $\mu_0$ and $\mu_1$ (with hyperparameters ζ and $\tau^2$), and Gamma priors on $\sigma^{-2}$, $\lambda_0$ and $\lambda_1$ (with hyperparameters η, α and β). Figure 4 illustrates the dependences induced by this modelling on a DAG.
Fig. 4. DAG representation of the probabilistic dependences in the Bayesian ion channel model.
This formalised ion channel model is thus a special case of discrete time hidden semi-Markov model for which there exists no explicit polynomial time formula for the posterior distribution of the hidden process $(x_t)_{1\le t\le T}$, as opposed to the hidden Markov model with the forward-backward formula of Baum and Petrie (1966). From a computational (MCMC) point of view, there is therefore no way of integrating this hidden process out to simulate directly the parameters conditional on the observables $(y_t)_{1\le t\le T}$, as was done in the hidden Markov model by, e.g., Cappé et al. (2003). Note also that, as opposed to Hodgson (1999), we use the saturated missing data representation of the model via x to avoid the complication of using reversible jump techniques, for which PMC algorithms are more difficult to implement.

4.2. Population Monte Carlo for ion channel models
Our choice of proposal function is based on the availability of closed form formulas for the hidden Markov model. We thus create a pseudo hidden Markov model based on the current values of the parameters for the ion channel model, simply by building the Markov transition matrix from the average durations in each state,

$$\mathbb{P} = \begin{pmatrix} 1 - \lambda_0/s_0 & \lambda_0/s_0 \\ \lambda_1/s_1 & 1 - \lambda_1/s_1 \end{pmatrix},$$

since, for a hidden Markov model, the average sojourn within one state is exactly the inverse of the transition probability to the other state. We denote by $\pi^H(x \mid y, \omega)$ the full conditional distribution of the hidden Markov chain x given the observables y and the parameters

$$\omega = (\mu_0, \mu_1, \sigma, \lambda_0, \lambda_1, s_0, s_1),$$

constructed via the forward-backward formula: see, e.g., Cappé et al. (2003) for details. The simulation of the parameters ω proceeds in a natural way by using the full conditional distribution $\pi(\omega \mid y, x)$ since it is available. In order not to confuse the issues, we do not consider the possible adaptation of the approximation matrix $\mathbb{P}$ over the iterations, that is, a modification of the switch probabilities from $\lambda_i/s_i$ ($i = 0, 1$).
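The construction of $\mathbb{P}$ above can be sketched in a few lines; the function name and the numeric parameter values are our own illustration.

```python
import numpy as np

# Build the pseudo hidden Markov transition matrix from the mean sojourn
# times s_i / lambda_i of the Gamma durations: the exit probability of
# state i is the inverse of its mean sojourn, i.e. lambda_i / s_i.
def pseudo_hmm_matrix(s0, lam0, s1, lam1):
    p0, p1 = lam0 / s0, lam1 / s1
    return np.array([[1.0 - p0, p0],
                     [p1, 1.0 - p1]])

P = pseudo_hmm_matrix(s0=3, lam0=0.5, s1=2, lam1=1.0)
# under P the sojourn in state 0 is geometric with mean 1/p0 = s0/lam0
```

This matches the matching-of-means argument in the text: the geometric sojourn of the pseudo HMM has the same expectation as the Gamma duration it replaces.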
Note that Carpenter et al. (1999) also consider the ion channel model in their particle filter paper, with the differences that they replace the semi-Markov structure with an approximative hidden Markov model with more than 2 states, and that they work in a dynamic setting based on this approximation. The observables y are also different in that they come from an ARMA(1,1) model with only the location parameter depending on the unknown state. Hodgson and Green (1998) similarly compared several hidden Markov models with duplicated "open" and "closed" states. Ball et al. (1999) also rely on a hidden Markov modelling with missing observations.
The subsequent use of importance sampling bypasses the exact simulation of the hidden process $(x_t)_{1\le t\le T}$ and thus avoids the recourse to variable dimension models and to more sophisticated tools like reversible jump MCMC. This saturation of the parameter space by the addition of the whole indicator process $(x_t)_{1\le t\le T}$ is obviously more costly in terms of storage, but it provides unrestricted moves between configurations of the process $(x_t)_{1\le t\le T}$. Since we do not need to define the corresponding jump moves, we are thus less likely to encounter the slow convergence problems of Hodgson (1999).
We therefore run PMC as in the following pseudo-code rendering, where I denotes the number of iterations, T being used for the number of observations:

Ion channel PMC

Step 0: Initialisation
  For j = 1,...,n,
    (i) generate $(\omega^{(j)}, x^{(j)}) \sim \pi(\omega)\,\pi^H(x \mid y, \omega^{(j)})$
    (ii) compute the weight $\varrho_j \propto \pi(\omega^{(j)}, x^{(j)} \mid y) \big/ \pi(\omega^{(j)})\,\pi^H(x^{(j)} \mid y, \omega^{(j)})$
  Resample the $(\omega^{(j)}, x^{(j)})_j$ using the weights $\varrho_j$

Step i: Update (i = 1,...,I)
  For j = 1,...,n,
    (i) generate $(\omega^{(j)}, x^{(j)}_+) \sim \pi(\omega \mid y, x^{(j)})\,\pi^H(x \mid y, \omega^{(j)})$
    (ii) compute the weight $\varrho_j \propto \pi(\omega^{(j)}, x^{(j)}_+ \mid y) \big/ \pi(\omega^{(j)} \mid y, x^{(j)})\,\pi^H(x^{(j)}_+ \mid y, \omega^{(j)})$
  Resample the $(\omega^{(j)}, x^{(j)}_+)_j$ using the weights $\varrho_j$, and take $x^{(j)} = x^{(j)}_+$ (j = 1,...,n).
The justification for the weights $\varrho_j$ used in the above algorithm is that, conditional on the $x^{(j)}$'s, $\omega^{(j)}$ is simulated from $\pi(\omega \mid y, x^{(j)})$ and, conditional on $\omega^{(j)}$, $x^{(j)}_+$ is simulated from $\pi^H(x \mid y, \omega^{(j)})$. The normalising factor of the $\varrho_j$'s converges to the correct constant by the law of large numbers.
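Simulating from the conditional $\pi^H(x \mid y, \omega)$ is the forward filtering / backward sampling step of the pseudo HMM proposal. The following is a generic sketch for a two-state Gaussian-emission HMM, written by us under the assumption of a uniform initial distribution; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(4)

# Forward filtering / backward sampling for a K-state Gaussian HMM:
# draws the hidden path x given observables y, transition matrix P,
# state means mu and common emission standard deviation sigma.
def ffbs(y, P, mu, sigma, rng):
    T, K = len(y), P.shape[0]
    emit = np.exp(-0.5 * ((y[:, None] - mu[None, :]) / sigma) ** 2)
    alpha = np.zeros((T, K))
    alpha[0] = emit[0] / K                 # uniform initial distribution
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                  # forward filtering, normalised
        alpha[t] = emit[t] * (alpha[t - 1] @ P)
        alpha[t] /= alpha[t].sum()
    x = np.zeros(T, dtype=int)             # backward sampling
    x[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        probs = alpha[t] * P[:, x[t + 1]]
        x[t] = rng.choice(K, p=probs / probs.sum())
    return x

P = np.array([[0.95, 0.05], [0.1, 0.9]])
mu, sigma = np.array([0.0, 2.0]), 0.5
true_x = (np.arange(200) // 50) % 2        # alternating blocks of 0s and 1s
y = rng.normal(mu[true_x], sigma)
x_draw = ffbs(y, P, mu, sigma, rng)
```

With well-separated emission means, a single draw already restores most of the hidden path, which is what makes $\pi^H$ a sensible proposal for the semi-Markov target.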
4.3. Normalising constants
Let us stress the specificity of the PMC method in terms of normalising constants: $\pi(\omega \mid y, x)$ is available in closed form (see below in Section 4.4), including its normalising constant, due to the conjugacy of the distributions on $\mu_0, \mu_1, \sigma, \lambda_0, \lambda_1$ and the finiteness of the support of $s_0, s_1$. The conditional distribution $\pi^H(x \mid y, \omega)$ is also available with its normalising constant, by virtue of the forward-backward formula. The only difficulty in the ratio

$$\frac{\pi(\omega, x \mid y)}{\pi(\omega \mid y, x)\,\pi^H(x \mid y, \omega)}$$

lies within the numerator $\pi(\omega, x \mid y)$, whose normalised version is unknown. We therefore use instead the proportional term

$$\pi(\omega, x \mid y) \propto \pi(\omega)\, f(y \mid x, \omega)\, f(x \mid \omega) \qquad (4)$$

and normalise the $\varrho_j$'s by their sum. The foremost feature of this reweighting is that the normalising constant missing in (4) only depends on the observables y and is therefore truly a constant, that is, does not depend on the previous value of the point $x^{(j)}$. This scheme crucially relies on (i) the points encompassing both the parameters ω and the latent data
4.4. Simulation details
Figure 5 illustrates the performances of PMC by representing the graph of the dataset against the fitted average

$$\sum_{j=1}^{J} \varrho_j\, x^{(j)}_t$$

for each observation $y_t$. As is obvious from the picture, the fit is quite good.
Fig. 5. (top) Histograms of residuals after fit by averaged
xt
; (middle) Simulated sample of size
4000against fitted averaged
xt
; (bottom) Probability of allocation to first state for each observation
The unobserved Gamma process is distributed as
$$\prod_{m=1}^{M} \frac{(t_{m+1} - t_m)^{s_m - 1}\, \lambda_m^{s_m}\, e^{-\lambda_m (t_{m+1} - t_m)}}{\Gamma(s_m)} = \frac{\lambda_0^{n_0 s_0}\, e^{-\lambda_0 v_0}\, \xi_0^{s_0 - 1}}{\Gamma(s_0)^{n_0}} \cdot \frac{\lambda_1^{n_1 s_1}\, e^{-\lambda_1 v_1}\, \xi_1^{s_1 - 1}}{\Gamma(s_1)^{n_1}},$$
with obvious notations: $M$ is the number of changes, the $t_m$'s are the successive times when the gamma process changes state, the $s_m$'s and $\lambda_m$'s are the corresponding sequences of $s_0, s_1$ and $\lambda_0, \lambda_1$, $n_i$ is the number of visits to state $i$, $\xi_i$ is the product of the sojourn durations in state $i$ [corresponding to the geometric mean], and $v_i$ the total sojourn duration in state $i$ [corresponding to the arithmetic mean]. (This is based on the assumption of no censoring, made in Section 4, namely that $t_1 = 1$ and $t_{M+1} = T + 1$.)

The posterior distributions on the $\mu_i$'s and $\sigma^2$ [conditional on the hidden process] are thus the standard distributions derived from the Normal–Gamma conjugate priors, while
$$\lambda_i \mid s_i, x \sim \mathcal{G}a(\alpha + n_i s_i,\, \beta + v_i), \qquad \pi(s_i \mid x) \propto \xi_i^{s_i - 1}\, (\beta + v_i)^{-n_i s_i}\, \Gamma(n_i s_i + \alpha)\, \Gamma(s_i)^{-n_i}\, \mathbb{I}_{\{1,2,\ldots,S\}}(s_i).$$
Therefore, except for the $s_i$'s, simulation from $\pi(\omega \mid y, x)$ is straightforward. The distribution on the $s_i$'s is highly variable, in that the product
$$\xi_i^{s_i - 1}\, (\beta + v_i)^{-n_i s_i}\, \Gamma(n_i s_i + \alpha)\, \Gamma(s_i)^{-n_i} \qquad (5)$$
often leads to a highly asymmetric distribution, which puts most of the weight on the minimum value of $s$. Indeed, when the geometric and arithmetic means, $\xi_i^{1/n_i}$ and $v_i/n_i$, are similar, a simple Stirling approximation to the Gamma function leads to (5) being equivalent to $\sqrt{n}/\sqrt{s^{\,n}}$.
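Since the support of $s_i$ is finite, the unnormalised mass (5) can simply be evaluated at every support point and normalised directly; working on the log scale keeps the Gamma functions from overflowing. A sketch with illustrative values for $\xi_i$, $v_i$, $n_i$ and the hyperparameters $\alpha$, $\beta$ (these numbers are not taken from the dataset):

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def sample_s(xi, v, n, alpha, beta, S):
    # evaluate the log of the unnormalised mass (5) at every support point s = 1, ..., S
    s = np.arange(1, S + 1)
    lgamma = np.vectorize(math.lgamma)
    logmass = ((s - 1) * math.log(xi)
               - n * s * math.log(beta + v)
               + lgamma(n * s + alpha)
               - n * lgamma(s))
    p = np.exp(logmass - logmass.max())   # subtract the max to avoid overflow
    p /= p.sum()
    return int(rng.choice(s, p=p)), p

# illustrative values only
s_draw, p = sample_s(xi=50.0, v=40.0, n=10, alpha=1.0, beta=1.0, S=10)
```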
Figure 6 gives the histograms of the posterior distributions of the various parameters of $\omega$ without reweighting by the importance sampling weights $\varrho_j$. As seen from this graph, the histograms in $\mu_i$ and $\sigma$ are well concentrated, while the histogram in $\lambda_1$ exhibits two modes which correspond to the two modes of the histogram of $s_1$ and indicate that the parameter $(\lambda_i, s_i)$ is not well identified. This is to be expected, given that we only observe a few realisations of the underlying gamma distribution, and this with added noise since the durations are not directly observed. However, the histograms of the average durations $s_i/\lambda_i$ do not exhibit such multimodality and are well concentrated around the values of interest.
Fig. 6. Histograms of the samples produced by PMC, before resampling (panels: $\mu_0$, $\mu_1$, $\sigma^2$, $(\mu_0 - \mu_1)/\sigma$, $\lambda_0$, $\lambda_1$, $s_0$, $s_1$, $s_0/\lambda_0$, $s_1/\lambda_1$).
4.5. Degeneracy

As noted above, PMC is simply an importance sampling algorithm when implemented once, that is, for a single collection of $n$ points $(\omega^{(j)}, x^{(j)})$. As such, it provides an approximation device for the target distribution, but it is also well known that a poor choice of the importance sampling distribution can jeopardise the interest of the approximation, as for instance when the weights $\varrho_j$ have infinite variance.
An incentive for using PMC in a static setting is thus to overcome a poor choice of the importance function by recycling the best points and discarding the worst ones. This point of view makes PMC appear as a primitive kind of adaptive algorithm, in that the support of the importance function is adapted to the performance of the previous importance sampler. The difficulty with this approach is in determining the long-term behaviour of the algorithm, since the standard importance sampling arguments do not validate the iterated algorithm any longer. For instance, it often happens that only a few points are kept after the resampling step of the algorithm, because only a few weights $\varrho_j$ are different from 0. Figure 7 gives for instance the sequence of the number of points that matter at each iteration, out of $n = 1000$ original points: the percentage of relevant points is thus less than 10% on average and in fact much closer to 5%. In addition, there is no clear-cut stabilisation in either the number of relevant points or the variance of the corresponding weights, the latter being far from exhibiting a stabilisation as the number of iterations increases. Some more rudimentary signals can be considered though, like the stabilisation of the fit in Figure 8. While the averages for 1 and 2 iterations are quite unstable for most observations, the two states are much more clearly identified for 5 and 10 iterations, and hardly change over subsequent iterations.
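The "number of points that matter" can be quantified by the classical effective sample size of the normalised weights, $1/\sum_j \varrho_j^2$; this diagnostic is standard in the particle filtering literature, although the figures here report the raw count of points with descendants instead. A minimal sketch:

```python
import numpy as np

def effective_sample_size(w):
    # ESS = 1 / sum(w_j^2) for normalised weights: equals n for uniform
    # weights, and 1 when a single weight carries all the mass
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

ess_uniform = effective_sample_size(np.ones(1000))   # all points matter
ess_skewed = effective_sample_size(np.r_[np.ones(50), np.full(950, 1e-6)])
# ess_skewed is close to 50: only about 5% of the points matter
```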
Fig. 7. (left) Variance of the weights $\varrho_j$ along 100 iterations, and (right) number of points with descendants along 100 iterations, for a sample of 4000 observations and 1000 points.
A related phenomenon pertains to the degeneracy of ancestors observed in the iterations of our algorithm: as the number of steps increases, the number of points from the first generation used to generate points from the last generation diminishes and, after a few dozen iterations, reduces to a single ancestor. This is for instance what occurs in Figure 9 where, after only two iterations, there is a single ancestor to the whole sample. (Note also the iterations where the whole sample originates from a single point.) This phenomenon appears in every setting and, while it cannot be avoided, since some points are bound to vanish at each iteration even when using the systematic sampling of Carpenter et al. (1999), the surprising factor is the speed with which the number of ancestors decreases.
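The systematic sampling of Carpenter et al. (1999) mentioned above can be sketched as follows (an illustrative implementation, not the authors' code):

```python
import numpy as np

def systematic_resample(w, rng):
    # one U(0,1) draw positions a regular grid of n points on the cumulative
    # weights; each particle then receives between floor(n w_j) and
    # ceil(n w_j) descendants, which reduces the resampling variance
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(w), positions)

rng = np.random.default_rng(3)
w = np.array([0.5, 0.25, 0.125, 0.125])   # normalised weights
idx = systematic_resample(w, rng)          # particle 0 gets exactly 2 copies
```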
4.6. A comparison with Metropolis–Hastings algorithms

As mentioned above, the proposal distribution associated with the pseudo hidden Markov model could alternatively be used as a proposal distribution in a Metropolis–Hastings algorithm of the following form:
MCMC Algorithm

Step i ($i = 1, \ldots, I$):
(a) Generate $\omega^{(i)} \sim \pi(\omega \mid y, x^{(i-1)})$.
Fig. 8. Successive fits of PMC iterated by the weight-resample algorithm for 2000 observations and $n = 2000$ points, for (clockwise starting from top left) 1, 2, 5 and 10 iterations. (See the caption of Fig. 5 for a description of the displayed quantities.)
(b) Generate $x^\star \sim \pi_H(x \mid y, \omega^{(i)})$ and $u \sim \mathcal{U}([0,1])$, and take
$$x^{(i)} = \begin{cases} x^\star & \text{if } u \le \dfrac{\pi(x^\star \mid \omega^{(i)}, y) \big/ \pi_H(x^\star \mid y, \omega^{(i)})}{\pi(x^{(i-1)} \mid \omega^{(i)}, y) \big/ \pi_H(x^{(i-1)} \mid y, \omega^{(i)})}, \\[1ex] x^{(i-1)} & \text{otherwise.} \end{cases}$$
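On the log scale, step (b) is the usual Metropolis–Hastings acceptance rule for an independence-type proposal. A generic sketch with illustrative stand-ins for $\pi(x \mid \omega, y)$ and $\pi_H(x \mid y, \omega)$ (none of these densities come from the ion channel model):

```python
import numpy as np

rng = np.random.default_rng(4)

def target_logpdf(x):
    # stand-in for log pi(x | omega, y); here a N(1, 1) density, up to a constant
    return -0.5 * (x - 1.0) ** 2

def proposal_logpdf(x):
    # stand-in for log pi_H(x | y, omega); here a N(0, 2^2) density, up to a constant
    return -0.5 * (x / 2.0) ** 2

def mh_step(x_curr):
    x_star = rng.normal(0.0, 2.0)     # independence proposal, as in step (b)
    log_ratio = (target_logpdf(x_star) - proposal_logpdf(x_star)
                 - target_logpdf(x_curr) + proposal_logpdf(x_curr))
    if np.log(rng.random()) <= log_ratio:
        return x_star                 # accept
    return x_curr                     # reject: keep the current state

x, chain = 0.0, []
for _ in range(5000):
    x = mh_step(x)
    chain.append(x)
```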
The performances of this alternative algorithm are, however, quite poor. Even with a well-separated dataset like the simulated dataset represented in Figure 3, the algorithm requires a very careful preliminary tuning not to degenerate into a single-state output. More precisely, the following occurs: when started at random, the algorithm converges very quickly to a configuration where both means $\mu_0$ and $\mu_1$ of the ion channel model are very close to one another (and to the overall mean of the sample), with, correlatively, a large variance $\sigma^2$ and very short durations within each state. To overcome this degeneracy of the sample, we had paradoxically to resort to a sequential implementation as follows: noticing that the degeneracy only occurs with large sample sizes, we start the MCMC algorithm on the first 100 observations $y_{1:100}$ and, once a stable configuration has been achieved, we gradually increase the number of observations taken into account [by a factor of $\min(s_0/\lambda_0, s_1/\lambda_1)$] till the whole sample is included. The results provided in Figures 10–12 were obtained following this scheme.
For conciseness' sake, we did not reproduce the history of the allocations $x$ over the iterations. The corresponding graph shows a very stable history with hardly any change, except on a few boundaries. Note the connected strong stability in the number of switches.
Fig. 9. Representation of the sequence of descendants (lighter grey) and ancestors (darker grey) along iterations, through bars linking a given ancestor and all its descendants (light grey) or a given point and its ancestor (darker grey). In the simulation corresponding to this graph, there were 4000 observations and 1000 points.
[Additional iterations of the MCMC sampler would have been necessary, but our purpose here is to illustrate the proper behaviour of this sampler, provided the initialisation is adequate.]
Attempts with very mixed datasets, such as the one used in Figure 8, were much less successful since, even with a careful tuning of the starting values (we even tried starting with the known values of the parameters), we could not avoid the degeneracy to a single state. The problem with the Metropolis–Hastings algorithm in this case is clearly a strong dependence on the starting value, i.e., a poor mixing ability. This is further demonstrated by the following experiment: when starting the above sampler from $n = 1000$ points obtained by running PMC 20 times, the sampler always produced a satisfactory solution with two clear-cut states and no degeneracy. Figure 12 compares the distributions of the PMC points and the MCMC samples via a qq-plot and shows there is very little difference between both. The same behaviour is shown by a comparison of the allocations (not represented here). This indicates that the MCMC algorithm does not lead to a better exploration of the parameter space.
For a fairly mixed dataset of 2000 observations corresponding to Figure 8, while the MCMC algorithm initialised at random could not avoid degeneracy, a preliminary run of PMC produced stable allocations to two states, as shown in Figure 13 by the fit for both the PMC and MCMC samples: they are indistinguishable, even though the qq-plots in Figure 14 indicate different tail behaviours.
This is not to say that an MCMC algorithm cannot work in this setting, since Hodgson (1999) demonstrated the contrary, but this shows that global updating schemes, that is, proposals that update the whole missing data $x$ at once, are difficult to come up with, and that one has instead to rely on more local moves such as those proposed by Hodgson (1999). A similar conclusion was drawn by Billio et al. (1999) in the setup of switching ARMA models.
Fig. 10. Representation of a dataset of 3610 simulated values, along with the average fit (bottom), and average probabilities of allocation to the upper state (top). This fit was obtained using a sequential tuning scheme and 5000 MCMC iterations in the final run.
5. Conclusion

The above developments have confirmed Chopin's (2002) realisation that PMC is a useful tool in static (as opposed to sequential, rather than dynamic) setups. Quite obviously, the specific Monte Carlo scheme we built can be used in a sequential setting in a very similar way. The comparison with the equivalent MCMC algorithm in Section 4.6 is also very instructive in that it shows the superior robustness of PMC to a possibly poor choice of the proposal distribution.
There still are issues to explore about the PMC scheme. In particular, a more detailed assessment of the iterative and adaptive features is in order, to decide to what extent this is a real asset. When the proposal distribution is not modified over iterations, as in Section 4, it is possible that there is an equivalent to the "cutoff phenomenon": after a given $t_0$, the distribution of $x^{(t)}$ may be very similar for all $t \ge t_0$. Further comparisons with full Metropolis–Hastings moves based on similar proposals would also be of interest, to study which scheme brings the most information about the distribution of interest. The most promising avenue seems, however, the development of adaptive proposals as in Section 3, where one can borrow from earlier work on MCMC algorithms to build assessments of the improvement brought by modifying the proposals. Our feeling at this point is that a limited number of iterations is necessary to achieve stability of the proposals.
An extension not studied in this paper is that the PMC algorithm can be started with a few points that explore the parameter space and, once the mixing is well established, the sample size can be increased to improve the precision of the approximation to the integrals of interest. This is a straightforward extension in terms of programming, but the selection
Fig. 11. Details of the MCMC sample for the dataset of Figure 10: (left) histograms of the components of the MCMC sample and (right) cumulative averages for the parameters of the models and evolution of the number of switches (lower right graph).
Acknowledgments

The authors are grateful to the Friday lunch discussion group at Institut Henri Poincaré for their input. Comments from Nicolas Chopin and Gareth Roberts were also particularly helpful. Earlier conversations with Christophe Andrieu and Arnaud Doucet in Cambridge have non-coincidental connections with this work and are most sincerely acknowledged.
References

Andrieu, C. and Robert, C.P. (2001) Controlled Markov chain Monte Carlo methods for optimal sampling. Cahiers du Ceremade 0125, Université Paris Dauphine.

Ball, F.G., Cai, Y., Kadane, J.B. and O'Hagan, A. (1999) Bayesian inference for ion channel gating mechanisms directly from single channel recordings, using Markov chain Monte Carlo. Proc. Royal Society London A 455, 2879–2932.

Baum, L.E. and Petrie, T. (1966) Statistical inference for probabilistic functions of finite state Markov chains. Annals Math. Statist. 37, 1554–1563.

Berzuini, C., Best, N., Gilks, W.R., and Larizza, C. (1997) Dynamic conditional independence models and Markov chain Monte Carlo methods. JASA 92, 590–599.

Billio, M., Monfort, A. and Robert, C.P. (1999) Bayesian estimation of switching ARMA models. J. Econometrics 93, 229–255.
Fig. 12. QQ-plots comparing the distribution of the PMC sample with the distribution of the MCMC sample obtained after 5000 iterations started at the PMC points (panels: $\mu_0$, $\mu_1$, $\sigma^2$, $\lambda_0$, $\lambda_1$).
Fig. 13. Simulated dataset of 2000 points with superimposed fits by the PMC and MCMC samples, the latter being initialised by PMC. Both fits are indistinguishable.
Carpenter, J., Clifford, P., and Fearnhead, P. (1999) Building robust simulation based filters for evolving datasets. Technical report, Department of Statistics, Oxford University.

Celeux, G., Hurn, M. and Robert, C.P. (2000) Computational and inferential difficulties with mixture posterior distributions. J. Amer. Statist. Assoc. 95, 957–979.

Chopin, N. (2002) A sequential particle filter method for static models. Biometrika 89, 539–552.

Doucet, A., de Freitas, N., and Gordon, N. (2001) Sequential MCMC in Practice. Springer-Verlag, New York.

Gilks, W.R., and Berzuini, C. (2001) Following a moving target: Monte Carlo inference for dynamic Bayesian models. JRSS B 63(1), 127–146.

Gordon, N., Salmond, D., and Smith, A.F.M. (1993) Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEEE Proceedings F, 140, 107–113.

Green, P.J. (1995) Reversible jump MCMC computation and Bayesian model determination. Biometrika 82(4), 711–732.

Haario, H., Saksman, E. and Tamminen, J. (1999) Adaptive proposal distribution for random walk Metropolis algorithm. Computational Statistics 14, 375–395.
Fig. 14. QQ-plots comparing the distribution of the PMC sample with the distribution of the MCMC sample obtained after 5000 iterations started at one point (panels: $\mu_1$, $\mu_2$, $\sigma^2$, $\lambda_1$, $\lambda_2$, $s_1/\lambda_1$).
Hesterberg, T. (1998) Weighted average importance sampling and defensive mixture distributions. Technometrics 37, 185–194.

Hodgson, M.E.A. (1999) A Bayesian restoration of an ion channel signal. JRSS B 61(1), 95–114.

Hodgson, M.E.A. and Green, P.J. (1999) Bayesian choice among Markov models of ion channels using Markov chain Monte Carlo. Proc. Royal Society London A 455, 3425–3448.

Kim, S., Shephard, N. and Chib, S. (1998) Stochastic volatility: likelihood inference and comparison with ARCH models. Rev. Econom. Stud. 65, 361–393.

Liu, J.S. (2001) Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York.

Liu, J.S., Liang, F. and Wong, W.H. (2001) A theory for dynamic weighting in Monte Carlo computation. JASA 96, 561–573.

MacEachern, S.N. and Peruggia, M. (2000) Importance link function estimation for Markov chain Monte Carlo methods. J. Comput. Graph. Statist. 9, 99–121.

Mengersen, K.L. and Robert, C.P. (2003) IID sampling with self-avoiding particle filters: the pinball sampler. In Bayesian Statistics 7, J.M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A.F.M. Smith and M. West (Eds.), Oxford University Press (to appear).

Meyn, S.P. and Tweedie, R.L. (1993) Markov Chains and Stochastic Stability. Springer-Verlag, New York.

Oh, M.S. and Berger, J.O. (1992) Adaptive importance sampling in Monte Carlo integration. J. Statist. Comput. Simul. 41, 143–168.

Oh, M.S. and Berger, J.O. (1993) Integration of multimodal functions by Monte Carlo importance sampling. JASA 88, 450–456.

Pitt, M.K. and Shephard, N. (1999) Filtering via simulation: auxiliary particle filters. JASA 94, 1403–1412.

Robert, C.P. and Casella, G. (1999) Monte Carlo Statistical Methods. Springer-Verlag, New York.

Rubin, D.B. (1987) Multiple Imputation for Nonresponse in Surveys. John Wiley, New York.

Stavropoulos, P. and Titterington, D.M. (2001) Improved particle filters and smoothing. In Sequential MCMC in Practice, Doucet, A., de Freitas, N., and Gordon, N. (Eds), Springer-Verlag, New York.

Warnes, G.R. (2001) The Normal Kernel Coupler: an adaptive Markov chain Monte Carlo method for efficiently sampling from multi-model distributions. Tech. report 395, University of Washington.