Population Monte Carlo
Olivier Cappé
CNRS / ENST Département TSI, Paris
Arnaud Guillin
Jean-Michel Marin
CEREMADE, Université Paris Dauphine, Paris
Christian P. Robert†
†CEREMADE, Université Paris Dauphine, and CREST, INSEE, Paris
Summary. Importance sampling methods can be iterated like MCMC algorithms, while being
more robust against dependence and starting values, as shown in this paper. The population
Monte Carlo principle we describe here consists of iterated generations of importance samples,
with importance functions depending on the previously generated importance samples. The
advantage over MCMC algorithms is that the scheme is unbiased at any iteration and can thus
be stopped at any time, while iterations improve the performances of the importance function,
thus leading to an adaptive importance sampling. We first illustrate this method on a toy mixture
example with multiscale importance functions. A second example reanalyses the ion channel
model of Hodgson (1999), using an importance sampling scheme based on a hidden Markov
representation, and compares population Monte Carlo with a corresponding MCMC algorithm.
Keywords: Adaptive algorithm, degeneracy, hidden semi-Markov model, importance sampling,
ion channel model, MCMC algorithms, mixture model, multiple scales, particle system, random
walk, unbiasedness.
1. Introduction
When reviewing the literature on MCMC methodology, a striking feature is that it has predominantly focussed on producing single outputs from a given target distribution, π. This may sound a paradoxical statement when considering that one of the major applications of MCMC algorithms is the approximation of integrals

$$I = \int h(x)\,\pi(x)\,dx$$

with empirical sums

$$\hat{I} = \frac{1}{T} \sum_{t=1}^{T} h(x^{(t)}),$$

where $(x^{(t)})$ is a Markov chain with stationary distribution π. But the main issue is that π is considered as the limiting distribution of $x^{(t)}$ per se and that the Markov correlation between the $x^{(t)}$'s is evacuated through the ergodic theorem (Meyn and Tweedie, 1993). There only exist a few references to the use of MCMC algorithms for the production of samples of size n from π, including Warnes (2001) and Mengersen and Robert (2003), although the concept of simulation from a product distribution $\pi^{\otimes n}$ is not fundamentally different from the production of a single output from the target distribution.

†Address for correspondence: CEREMADE, Université Paris Dauphine, 16 place du Maréchal de Lattre de Tassigny, 75775 Paris cedex 16
Another striking feature in the MCMC literature is the early attempt to dissociate itself from pre-existing techniques such as importance sampling, although the latter shared with MCMC algorithms the property of simulating from the wrong distribution to produce approximate generation from the correct distribution (see Robert and Casella, 1999, Chap. 3). It is only lately that the realisation that both approaches can be successfully coupled came upon the MCMC community, as shown for instance by MacEachern and Peruggia (2000), Liu (2001), or Liu et al. (2001). One clear example of this fruitful symbiosis is given by iterated particle systems (Doucet et al., 2001). Originally, iterated particle systems were introduced to deal with dynamic target distributions, as for instance in radar tracking, where the imperatives of on-line processing of rapidly changing target distributions prohibited resorting to repeated MCMC sampling. Fundamentally, the basic idea, from a Monte Carlo point of view, consists in recycling previous weighted samples primarily through a modification of the weights (Gordon et al., 1993), possibly enhanced by additional sampling steps (Berzuini et al., 1997; Pitt and Shephard, 1999; Gilks and Berzuini, 2001). As pointed out in Chopin (2002), a particle system simplifies into a particular type of importance sampling scheme in a static (as opposed to dynamic) setup, where the target distribution is fixed, which is the setting we consider here.
We thus study in this paper a method, called population Monte Carlo, that aims at simulating from the target distribution $\pi^{\otimes n}$ and that tries to link these different "loose ends" into a coherent simulation principle: population Monte Carlo borrows from MCMC algorithms for the construction of the proposal, from importance sampling for the construction of appropriate estimators, from SIR (Rubin, 1987) for sample equalisation, and from iterated particle systems for sample improvement. The population Monte Carlo (PMC) algorithm is in essence an iterated importance sampling scheme that simultaneously produces, at each iteration, a sample approximately simulated from a target distribution π and (approximately) unbiased estimates $\hat{I}$ of integrals I under that distribution. The sample is constructed using sample-dependent proposals for generation and importance sampling weights for pruning the proposed sample.
We describe in Section 2 the population Monte Carlo technique, and apply these developments, first to a simple mixture example in Section 3, and second to the more ambitious ion channel model that we assess in Section 4. While reasonable in complexity, the mixture example still offers an interesting medium to assess the adaptivity of the population Monte Carlo sampler and the resistance to degeneracy. The ion channel model is more challenging in that there is no closed form representation of the observed likelihood, while the completion step is more delicate than in mixture settings. In particular, Section 4.6 explains why a Metropolis–Hastings algorithm based on the same proposal as population Monte Carlo does not work. Section 5 contains the overall conclusions of the paper.
2. The population Monte Carlo approach
As noted in Mengersen and Robert (2003), it is possible to construct an MCMC algorithm associated with the target distribution

$$\pi^{\otimes n}(x_1,\ldots,x_n) = \prod_{i=1}^{n} \pi(x_i), \qquad (1)$$

on the space $\mathcal{X}^n$, rather than with the distribution $\pi(x_1)$, on the space $\mathcal{X}$. All standard results and schemes for MCMC algorithms apply in this particular case, and irreducible Markov chains associated with such MCMC algorithms converge to the target $\pi^{\otimes n}$ in distribution, that is, get approximately distributed as an iid sample from π after a "sufficient" number of iterations. Mengersen and Robert (2003) point out that additional sampling devices can be used to construct the proposal distributions, like Gibbs-type component-wise repulsive proposals that exclude immediate neighbourhoods of the other points in the sample.
When considering, at MCMC iteration t, a sample $x^{(t)} = (x^{(t)}_1,\ldots,x^{(t)}_n)$, we can think of producing the next iteration of the sample $x^{(t+1)}$ such that the components $x^{(t+1)}_i$ are generated from a proposal $q(x \mid x^{(t)}_i)$. However, rather than accepting each proposed $x^{(t+1)}_i$ individually (which would be a standard form of parallel MCMC sampling) or the whole sample $x^{(t+1)}$ globally (which would suffer from the curse of dimensionality), we can altogether remove the issue of assessing the convergence to the stationary distribution by correcting at each iteration for the use of the wrong distribution by importance weighting.
Thus, instead of using an asymptotic justification to an MCMC iterated simulation scheme, we can instead resort to importance sampling arguments: if the sample $x^{(t)}$ is produced by simulating the $x^{(t)}_i$'s from distributions $q_{it}$, independently of one another, and if we associate to each point $x^{(t)}_i$ of this sample the importance weight

$$\varrho^{(t)}_i = \frac{\pi(x^{(t)}_i)}{q_{it}(x^{(t)}_i)}, \qquad i = 1,\ldots,n,$$

estimators of the form

$$\hat{I}_t = \frac{1}{n} \sum_{i=1}^{n} \varrho^{(t)}_i\, h(x^{(t)}_i)$$

are unbiased for every integrable function h and at every iteration t.
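As a concrete illustration of this weighting step, the following sketch checks the unbiasedness of the unnormalised estimator numerically. The setting is our own toy example, not the paper's: target $\pi = \mathcal{N}(0,1)$ with $h(x) = x^2$ (so $I = 1$), and point-specific normal proposals with heavier tails than π.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target pi = N(0, 1); log density up to full constants.
def log_pi(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

n = 10_000
centers = rng.uniform(-1.0, 1.0, size=n)   # one proposal q_it per index i
scale = 2.0                                # heavier-tailed than the target
x = rng.normal(centers, scale)             # x_i^(t) ~ q_it

# importance weights rho_i = pi(x_i) / q_it(x_i)
log_q = -0.5 * ((x - centers) / scale) ** 2 - np.log(scale) - 0.5 * np.log(2 * np.pi)
rho = np.exp(log_pi(x) - log_q)

# unbiased estimator of I = E_pi[h(X)] with h(x) = x^2 (true value 1)
I_hat = np.mean(rho * x**2)
```

Because each proposal has heavier tails than π, the weights are bounded and the estimator concentrates quickly around the true value.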
This is the starting point for population Monte Carlo methods, namely that extending regular importance sampling techniques to cases where the importance distribution for $x^{(t)}_i$ may depend on both the sample index i and the iteration index t does not modify their validity. As already indicated in Robert and Casella (1999, Lemma 8.3.1) in a more restrictive setting, importance sampling estimators have the interesting property that the terms $\varrho^{(t)}_i h(x^{(t)}_i)$ are uncorrelated, even when the proposal $q_{it}$ depends on the whole past of the experiment: assuming that the variances $\mathrm{var}\left(\varrho^{(t)}_i h(x^{(t)}_i)\right)$ exist for every $1 \le i \le n$, which means that the proposals $q_{it}$ should have heavier tails than π, we have

$$\mathrm{var}(\hat{I}_t) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{var}\left(\varrho^{(t)}_i h(x^{(t)}_i)\right), \qquad (2)$$

due to the canceling effect of the weights $\varrho^{(t)}_i$.
Obviously, in most settings, the distribution of interest π is unscaled and we have to use instead

$$\varrho^{(t)}_i \propto \frac{\pi(x^{(t)}_i)}{q_{it}(x^{(t)}_i)}, \qquad i = 1,\ldots,n,$$

scaled so that the weights $\varrho^{(t)}_i$ sum up to 1. In this case, the above unbiasedness property and the variance decomposition are lost, although they approximately hold. In fact, the estimation of the normalising constant of π improves with each iteration t, since the overall average

$$\varpi_t = \frac{1}{tn} \sum_{\tau=1}^{t} \sum_{i=1}^{n} \frac{\pi(x^{(\tau)}_i)}{q_{i\tau}(x^{(\tau)}_i)} \qquad (3)$$

is a convergent estimator of the inverse of the normalising constant. Therefore, as t increases, $\varpi_t$ contributes less and less to the variability of $\hat{I}_t$ and the above properties can be considered as holding for t large enough. In addition, if the sum (3) is only based on the $(t-1)$ first iterations, that is, if

$$\varrho^{(t)}_i = \frac{\pi(x^{(t)}_i)}{\varpi_{t-1}\, q_{it}(x^{(t)}_i)},$$

the variance decomposition (2) can be recovered, via the same conditioning argument. A related point is that attention must be paid to the selection of the proposals $q_{it}$ so that the normalising constants in these densities (or at least the part that depends on i) are available in closed form.
As pointed out by Rubin (1987) for regular importance sampling, it is preferable, rather than to update the weights $\varrho^{(t)}_i$ at each iteration t, to resample (with replacement) n values $y^{(t)}_i$ from $(x^{(t)}_1,\ldots,x^{(t)}_n)$ using the weights $\varrho^{(t)}_i$ (and possibly the variance reduction device of systematic sampling, as in Carpenter et al., 1998). This partially avoids the degeneracy phenomenon, that is, the preservation of negligible weights and corresponding irrelevant points in the sample. The sample $(y^{(t)}_1,\ldots,y^{(t)}_n)$ resulting from this sampling importance resampling (SIR) step is thus akin to an iid sample extracted from the weighted empirical distribution associated with $\pi^{\otimes n}(x_1,\ldots,x_n)$.
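The SIR step can be sketched as follows; both basic multinomial resampling and the lower-variance systematic variant of Carpenter et al. (1998) are shown. Function names and the toy weighted sample are our own.

```python
import numpy as np

rng = np.random.default_rng(1)

# Multinomial resampling: n independent draws from the weighted sample.
def multinomial_resample(points, weights, rng):
    w = weights / weights.sum()
    idx = rng.choice(len(points), size=len(points), p=w)
    return points[idx]

# Systematic resampling: a single uniform shifted over n strata, which
# guarantees each point is copied floor(n w_i) or ceil(n w_i) times.
def systematic_resample(points, weights, rng):
    n = len(points)
    w = weights / weights.sum()
    positions = (rng.uniform() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(w), positions)
    return points[idx]

points = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
weights = np.array([0.05, 0.1, 0.5, 0.3, 0.05])
y_mult = multinomial_resample(points, weights, rng)
y_syst = systematic_resample(points, weights, rng)
```

The systematic variant keeps the expected copy counts of multinomial resampling while removing most of its Monte Carlo noise, which is why it is the usual default in particle filtering.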
A pseudo-code rendering of the PMC algorithm is as follows:

PMCA: Population Monte Carlo Algorithm

For t = 1,...,T
  For i = 1,...,n,
    (i) Select the generating distribution $q_{it}(\cdot)$
    (ii) Generate $x^{(t)}_i \sim q_{it}(x)$
    (iii) Compute $\varrho^{(t)}_i = \pi(x^{(t)}_i)/q_{it}(x^{(t)}_i)$
  Normalise the $\varrho^{(t)}_i$'s to sum up to 1
  Resample n values from the $x^{(t)}_i$'s with replacement, using the weights $\varrho^{(t)}_i$, to create the sample $(x^{(t)}_1,\ldots,x^{(t)}_n)$

Step (i) in this representation is stressed because this is an essential feature of the PMC algorithm: the proposal distributions can be individualized at each step of the algorithm without jeopardising the validity of the method. The proposals $q_{it}$ can therefore be picked according to the performances of the previous $q_{i(t-1)}$'s and, in particular, they can depend on the previous sample $(x^{(t-1)}_1,\ldots,x^{(t-1)}_n)$ or even on all the previously simulated samples, if storage allows. For instance, in the mixture setting of Section 3, the $q_{it}$'s are random walk proposals centered at the $x^{(t-1)}_i$'s, with various possible scales chosen from earlier performances, and they could also include large-tail proposals as in the defensive sampling strategy of Hesterberg (1998), to ensure finite variance. Similarly, Warnes (2001) uses the previous sample to build a kernel non-parametric approximation to π.
The fact that the proposal distribution $q_{it}$ can depend on the past iterations in any possible way without modifying the weight $\varrho^{(t)}_i$ is due to the feature that the unbiasedness equation

$$\mathbb{E}\left[\varrho^{(t)}_i h(x^{(t)}_i)\right] = \int\!\!\int \frac{\pi(x)}{q_{it}(x)}\, h(x)\, q_{it}(x)\, dx\, g(\zeta)\, d\zeta = \int\!\!\int h(x)\,\pi(x)\, dx\, g(\zeta)\, d\zeta = \mathbb{E}_\pi[h(X)],$$

where ζ denotes the vector of past random variates that contribute to $q_{it}$, does not depend on the distribution g(ζ) of this random constituent.
There are similarities between PMC and earlier proposals in the particle system literature, in particular with Berzuini et al.'s (1997) and Gilks and Berzuini (2001), since these authors also consider iterated samples with (SIR) resampling steps based on importance weights. A major difference though (besides their dynamic setting of moving target distributions) is that they remain within the MCMC realm by using the resample step before the proposal move. These authors are thus forced to use Markov transition kernels with given stationary distributions. There is also a similarity with Chopin (2002) who considers iterated importance sampling with changing proposals. His setting is a special case of PMC in a Bayesian framework, where the proposals $q_{it}$ are the posterior distributions associated with a portion $k_t$ of the observed dataset (and are thus independent of i and of the previous samples).
As noted earlier, a most noticeable property of the PMC method is that the generality in the choice of the proposal distributions $q_{it}$ is due to the relinquishment of the MCMC framework. Indeed, without the importance resampling correction, a regular Metropolis–Hastings acceptance step for each point of the n-dimensional sample produces a parallel MCMC sampler which simply converges to the target $\pi^{\otimes n}$ in distribution. Similarly, a regular Metropolis–Hastings acceptance step for the whole vector $x^{(t)}$ converges to $\pi^{\otimes n}$; the advantage in producing an asymptotic approximation to an iid sample is balanced by the drawback that the acceptance probability decreases approximately as a power of n. Since, in PMC, we pick at each iteration the points in the sample according to their importance weight $\varrho^{(t)}_i$, we both remove the convergence issue and construct a selection mechanism over both the points of the previous sample and the proposal distributions. This is not solely a theoretical advantage: Section 4.6 shows that a Metropolis–Hastings scheme based on the same proposal does not work well, while a PMC algorithm produces correct answers.
The PMC framework thus allows for a much easier construction of adaptive schemes, i.e. of proposals that correct themselves against past performances, than in MCMC setups. Indeed, while adaptive importance sampling strategies had already been considered in the pre-MCMC era, as in, e.g., Oh and Berger (1992, 1993), the MCMC environment is much harsher for adaptive algorithms, because the adaptivity cancels the Markovian nature of the sequence and thus calls for more elaborate convergence studies to establish ergodicity. See, e.g., Andrieu and Robert (2001) and Haario et al. (1999, 2001) for recent developments in this area. For PMC methods, ergodicity is not an issue since the validity is obtained via importance sampling justifications.
The samples produced by the PMC method can be exploited as regular importance sampling outputs at any iteration T, and thus do not require the construction of stopping rules as for MCMC samples (Robert and Casella, 1999, Chap. 8). Quite interestingly though, the whole sequence of samples can be exploited, both for adaptation of the proposals and for estimation purposes, as illustrated with the constant approximation (3). This does not require a static storage of all samples produced though, since approximations like (3) can be updated dynamically. In addition, this possibility to exploit the whole set of simulations implies that the sample size n is not necessarily very large, since the effective simulation size is nT. A last remark is that the number of points in the sample is not necessarily constant over iterations. As in Chopin (2002), one may increase the number of points in the sample once the algorithm seems to stabilise in a stationary regime.
3. Mixture model
Our first example is a Bayesian modelling of a mixture model, which is a problem simple enough to introduce but complex enough to lead to poor performances for badly designed algorithms (Robert and Casella, 1999, Chap. 9; Cappé et al., 2003). The mixture problem we consider is based on an iid sample $x = (x_1,\ldots,x_n)$ from the distribution

$$p\,\mathcal{N}(\mu_1,\sigma^2) + (1-p)\,\mathcal{N}(\mu_2,\sigma^2),$$

where both $p \neq 1/2$ and $\sigma > 0$ are known. The prior associated with this model, π, is a normal $\mathcal{N}(\theta, \sigma^2/\lambda)$ prior on both $\mu_1$ and $\mu_2$. We thus aim at simulating from the posterior distribution

$$\pi(\mu_1,\mu_2 \mid x) \propto f(x \mid \mu_1,\mu_2)\,\pi(\mu_1,\mu_2).$$
Although the "standard" MCMC resolution of the mixture problem is to use a Gibbs sampler based on a data augmentation step via indicator variables, recent developments (Celeux et al., 2000; Chopin, 2002; Cappé et al., 2003) have shown that the data augmentation step is not necessary to run an MCMC sampler. We will now demonstrate that a PMC sampler can be efficiently implemented without this augmentation step either.
Our PMC algorithm is adaptive in the following sense: The initialization step consists first in choosing a set of initial values for $\mu_1$ and $\mu_2$ (e.g., a grid of points around the empirical mean of the $x_i$'s). The proposals are then random walks, that is, random isotropic perturbations of the points of the current sample. As noted above, a very appealing feature of the PMC method is that the proposal may vary from one point of the sample to another without jeopardizing the validity of the method. At a first level, the proposals are all normal distributions centred at the points of the current sample. At a second level, we can also choose different variances for these normal distributions, for instance within a predetermined set of p scales $v_i$ ($1 \le i \le p$) ranging from $10^3$ down to $10^{-3}$, and select these variances at each step of the PMC algorithm according to the performances of the scales on the previous iterations. In our implementation, we decided to select a scale proportionally to its non-degeneracy rate on the previous iterations. (Note the formal similarity of this scheme with Stavropoulos and Titterington's (1999) smooth bootstrap, or adaptive importance sampling, and Warnes' (2001) kernel coupler, when the kernel used in their mixture approximation of π is normal. The main difference is that we do not aim at a good approximation of π using standard kernel results like bandwidth selection, but rather keep the different scales $v_i$ over the iterations.) Our PMC algorithm thus looks as follows:
Mixture PMC

Step 0: Initialisation
  For j = 1,...,n = pm, choose $(\mu_1)^{(0)}_j, (\mu_2)^{(0)}_j$
  For k = 1,...,p, set $r_k = m$

Step i: Update (i = 1,...,I)
  For k = 1,...,p,
    (a) generate a sample of size $r_k$ as
        $(\mu_1)^{(i)}_j \sim \mathcal{N}\left((\mu_1)^{(i-1)}_j, v_k\right)$ and $(\mu_2)^{(i)}_j \sim \mathcal{N}\left((\mu_2)^{(i-1)}_j, v_k\right)$
    (b) compute the weights
        $$\varrho_j \propto \frac{f\left(x \mid (\mu_1)^{(i)}_j, (\mu_2)^{(i)}_j\right)\,\pi\left((\mu_1)^{(i)}_j, (\mu_2)^{(i)}_j\right)}{\varphi\left((\mu_1)^{(i)}_j \mid (\mu_1)^{(i-1)}_j, v_k\right)\,\varphi\left((\mu_2)^{(i)}_j \mid (\mu_2)^{(i-1)}_j, v_k\right)}$$
  Resample the $\left((\mu_1)^{(i)}_j, (\mu_2)^{(i)}_j\right)_j$ using the weights $\varrho_j$
  For k = 1,...,p, update $r_k$ as the number of elements generated with variance $v_k$ which have been resampled.

where $\varphi(q \mid s, v)$ is the density of the normal distribution with mean s and variance v at the point q.
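The Mixture PMC scheme above can be sketched in Python as follows, under the simulation settings used in this section. All variable names, the guaranteed minimum of 20 points per scale, and the rounding used to keep the total allocation fixed are our own implementation choices, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data: 1000 draws from 0.2 N(0,1) + 0.8 N(2,1); p_mix and sigma known.
p_mix, sigma, theta, lam = 0.2, 1.0, 1.0, 0.1
data = np.where(rng.uniform(size=1000) < p_mix,
                rng.normal(0.0, sigma, 1000), rng.normal(2.0, sigma, 1000))

def log_post(m1, m2):
    # log f(x | m1, m2) + log prior, vectorised over the sample of means
    like = (p_mix * np.exp(-0.5 * ((data[None, :] - m1[:, None]) / sigma) ** 2)
            + (1 - p_mix) * np.exp(-0.5 * ((data[None, :] - m2[:, None]) / sigma) ** 2))
    prior = -0.5 * lam * ((m1 - theta) ** 2 + (m2 - theta) ** 2) / sigma ** 2
    return np.log(like + 1e-300).sum(axis=1) + prior

v = np.array([5.0, 2.0, 0.1, 0.05, 0.01])     # the p = 5 scales
p, m = len(v), 210                            # n = p * m = 1050 points
r = np.full(p, m)                             # allocation r_k per scale
mu1 = rng.normal(data.mean(), 1.0, p * m)
mu2 = rng.normal(data.mean(), 1.0, p * m)

for it in range(25):
    vk = np.repeat(v, r)                      # scale used by each point
    prop1 = rng.normal(mu1, np.sqrt(vk))
    prop2 = rng.normal(mu2, np.sqrt(vk))
    # weights: target over the product of normal transition densities
    log_q = -0.5 * ((prop1 - mu1) ** 2 + (prop2 - mu2) ** 2) / vk - np.log(vk)
    log_w = log_post(prop1, prop2) - log_q
    w = np.exp(log_w - log_w.max()); w /= w.sum()
    idx = rng.choice(p * m, size=p * m, p=w)  # SIR step
    mu1, mu2 = prop1[idx], prop2[idx]
    # adapt r_k from the survival rate of each scale, with a minimum floor
    counts = np.array([(vk[idx] == s).sum() for s in v])
    r = np.maximum(counts, 20)
    r = np.round(r * (p * m) / r.sum()).astype(int)
    r[-1] += p * m - r.sum()                  # keep the total fixed at n
```

The floor on the allocation mirrors the modification discussed below, which prevents the complete removal of any variance level.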
As mentioned above, the weight associated with each variance $v_k$ is thus proportional to the regeneration (or survival) rate of the corresponding sample. If most $\mu_j$'s associated with a given $v_k$ are not resampled, the next step will see fewer generations using this variance $v_k$. However, to avoid the complete removal of a given variance $v_k$, we modified the algorithm to ensure that a minimum number of points is simulated from each variance level.
The performances of the above algorithm are illustrated on a simulated dataset of 1000 observations from the distribution $0.2\,\mathcal{N}(0,1) + 0.8\,\mathcal{N}(2,1)$. We also took $\theta = 1$ and $\lambda = 0.1$ as hyperparameters of the prior. Applying the above PMC algorithm to this sample, we see that it produces a non-degenerate sample, that is, not restricted to n replications of the same point. In addition, the adaptive feature for choosing among the $v_k$'s is definitely helpful to explore the state space of the unknown means. In this case, $p = 5$ and the five variances are equal to 5, 2, 0.1, 0.05 and 0.01. Moreover, at each step i of the PMC algorithm, we generated $n = 1050$ sample points.
The two upper graphs of Figure 1 illustrate the degeneracy phenomenon associated with the PMC algorithm: they represent the sizes of the samples issued from the different proposals, that is, the number of different points resulting from the resampling step. The upper left graph exhibits a nearly cyclic behavior for the largest variances $v_k$, alternating from no point issued from these proposals to a large number of points. This behaviour agrees with intuition: proposals that have too large a variance mostly produce points that are irrelevant for the distribution of interest, but once in a while they happen to generate points that are close to one of the modes of the distribution of interest. In the latter situation, the corresponding points are associated with large weights $\varrho_j$ and are thus heavily resampled. The upper right graph shows that the other proposals are rather evenly considered along iterations. This is not surprising for the smaller variances, since they modify very little the current sample, but the cyclic predominance of the three possible variances is quite reassuring about the mixing abilities of the algorithm and thus about its exploration performances.
We can also study the influence of the variation in the proposals on the estimation of the means $\mu_1$ and $\mu_2$, as illustrated by the middle and lower panels of Figure 1. First, when considering the cumulative means of these estimations over iterations, the quantities quickly stabilise. The corresponding variances are not so stable over iterations, but this is to be expected, given the regular reappearance of subsamples with large variances.
Figure 2 provides an additional insight into the performances of the PMC algorithm, by representing a weighted sample of means with dots proportional to the weights. As should be obvious from this graph, there is no overwhelming point that concentrates most of the weight. On the contrary, the 1050 points are rather evenly weighted, especially for those close to the posterior modes of the means.
Note that a better PMC scheme could be chosen. The approach selected for this section does not take advantage of the latent structure of the model, as in Chopin (2002), and contrary to the following section. Indeed, after an initialization step, one could first simulate the latent indicator variables conditionally on the previous sample of $(\mu_1, \mu_2)$ and then simulate a new sample of means conditionally on the latent indicator variables. Iterating this PMC scheme seems to constitute a sort of parallel Gibbs sampling, but this scheme is valid at any iteration and can thus be stopped at any time. That we have not used this approach is to emphasize that the PMC method has no real need of the latent structure of the model to work satisfactorily.
4. Ion channels

4.1. The stylised model
Fig. 1. Performances of the mixture PMC algorithm: (upper left) Number of resampled points for the variances $v_1 = 5$ (darker) and $v_2 = 2$; (upper right) Number of resampled points for the other variances, $v_3 = 0.1$ is the darkest one; (middle left) Variance of the simulated $\mu_1$'s along iterations; (middle right) Complete average of the simulated $\mu_1$'s over iterations; (lower left) Variance of the simulated $\mu_2$'s along iterations; (lower right) Complete average of the simulated $\mu_2$'s over iterations.
We refer the reader to Hodgson (1999), as well as to Ball et al. (1999) and Carpenter et al. (1999), for a biological motivation of this model, alternative formulations, and additional references. Let us insist at this point on the formalised aspect of our model, which predominantly serves as a realistic support for the comparison of a PMC approach with a more standard MCMC approach in a semi-Markov setting. The finer points of model choice and model comparison for the modelling of ion channel kinetics, while of importance as shown by Ball et al. (1999) and Hodgson and Green (1998), are not addressed by the present paper. Note also that, while a Bayesian analysis of this model provides a complete inferential perspective, the focus of attention is generally set on the restoration of the true channel current, rather than on the estimation of the parameters of the model.
Consider, thus, observables $y = (y_t)_{1\le t\le T}$ directed by a hidden Gamma (indicator) process $x = (x_t)_{1\le t\le T}$ in the following way:

$$y_t \mid x_t \sim \mathcal{N}(\mu_{x_t}, \sigma^2),$$

while $x_t \in \{0, 1\}$, with durations $d_j \sim \mathcal{G}a(s_i, \lambda_i)$ ($i = 0, 1$). More exactly, the hidden process (x) is a (continuous time) Gamma jump process with jump times $t_j$ ($j = 1, 2, \ldots$) such
Fig. 2. Representation of the log-posterior distribution via grey levels (darker stands for lower and lighter for higher) and of a weighted sample of means. (The weights are proportional to the surface of the circles.)
that $d_{j+1} = t_{j+1} - t_j \sim \mathcal{G}a(s_i, \lambda_i)$ if $x_t = i$ for $t_j \le t < t_{j+1}$, that is, $E[d_{j+1}] = s_i/\lambda_i$. Figure 3 provides a simulated sample of size 4000 from this model.
Fig. 3. Simulated sample of size 4000 from the ion channel model.
A first modification of Hodgson's (1999) ion channel model is introduced at this level: we assume that the durations $d_j$, that is, the time intervals in which the process $(x_t)_{1\le t\le T}$ remains in a given state, are integer valued, rather than real valued. The reasons for this change are that

(a) the true durations of the Gamma process are not identifiable;

(b) this model is a straightforward generalisation of the hidden Markov model where the jumps do occur at integer times (see Ball et al., 1999, or Carpenter et al., 1999). A natural generalisation of the geometric duration of the hidden Markov model is a negative binomial distribution, $\mathcal{N}eg(s, \omega)$, which is very close to a Gamma density $\mathcal{G}a(s+1, -\log(1-\omega))$ (up to a constant) for s small. Indeed, the former is approximately

$$\frac{d^s}{s!}\,(1-\omega)^d \left(\frac{\omega}{1-\omega}\right)^s$$

while the latter is

$$\frac{d^s}{s!}\,(1-\omega)^d \left\{-\log(1-\omega)\right\}^{s+1}.$$

(The simulations detailed below were also implemented using a negative binomial modelling, leading to very similar results in the restoration process.)

(c) inference on the $d_j$'s given $(x_t)_{1\le t\le T}$ involves an extra level of simulations, even if it can be easily implemented via a slice sampler, as long as we do not consider the possibility of several jumps between two integer observational times. (This latter possibility is actually negligible for the datasets we consider.); and

(d) the replacement of $d_j$ by its integral part does not strongly modify the likelihood.

In a similar vein, we omit the censoring effect of both the first and the last intervals, given that the influence of this censoring on a long series is bound to be small.
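The proximity claimed in (b) can be checked numerically: as the two displayed expressions share the factor $d^s (1-\omega)^d / s!$, their ratio is free of d, so the negative binomial term and the Gamma density differ only by a multiplicative constant. The values of s and ω below are arbitrary illustrations.

```python
import math

# Ratio of the approximate negative binomial term to the Gamma density,
# evaluated over a range of integer durations d; it should be constant in d.
s, w = 2, 0.3
neg = [d**s / math.factorial(s) * (1 - w)**d * (w / (1 - w))**s
       for d in range(1, 20)]
gam = [d**s / math.factorial(s) * (1 - w)**d * (-math.log(1 - w))**(s + 1)
       for d in range(1, 20)]
ratios = [a / b for a, b in zip(neg, gam)]
```

Since the d-dependent factors cancel exactly, the ratio is the constant $(\omega/(1-\omega))^s / \{-\log(1-\omega)\}^{s+1}$, confirming the "up to a constant" statement.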
A second modification of Hodgson's (1999) model is that we choose a uniform prior for the shape parameters $s_0$ and $s_1$ on the finite set $\{1,\ldots,S\}$, rather than an exponential prior. The reasons for this choice are that

(a) the hidden Markov process has geometric switching times, which correspond to exponential durations. A natural extension is to consider that the durations of the stays within each state (or regime) can be represented as the cumulated duration of $s_i$ exponential stays, with $s_i$ an unknown integer, which exactly corresponds to gamma durations. This representation thus removes the need to call for a level of variable dimension modelling. Carpenter et al. (1999) and Hodgson and Green (1999) use a different approach, based on the replication of the "open" and "closed" sets into several states, to approximate the semi-Markov model.

(b) the following simulations show that the parameters $s_0$ and $s_1$ are strongly identified by the observables $(y_t)_{1\le t\le T}$;

(c) the prior information on the parameters $s_0$ and $s_1$ is most likely to be sparse and thus a uniform prior is less informative than a Gamma prior when S is large; and

(d) the use of a finite support prior allows for the computation of the normalising constant in the posterior conditional distribution of the parameters $s_0$ and $s_1$, a feature that is paramount for the implementation of PMC.
A third modification, when compared with both Hodgson (1999) and Carpenter et al. (1999), is that the observables are assumed to be independent, given the $x_t$'s, rather than distributed from either an AR(15) (Hodgson, 1999) or an ARMA(1,1) (Carpenter et al., 1999) model. This modification somehow weakens the identifiability of both regimes as the data becomes potentially more volatile.
The other parameters of the model are distributed as in Hodgson (1999), using conjugate priors: normal priors on $\mu_0$ and $\mu_1$ (with hyperparameters ζ and $\tau^2$), and Gamma priors on $\sigma^{-2}$, $\lambda_0$ and $\lambda_1$ (with hyperparameters η, α and β). Figure 4 illustrates the dependences induced by this modelling on a DAG.
Fig. 4. DAG representation of the probabilistic dependences in the Bayesian ion channel model.
This formalised ion channel model is thus a special case of discrete time hidden semi-Markov model for which there exists no explicit polynomial time formula for the posterior distribution of the hidden process $(x_t)_{1\le t\le T}$, as opposed to the hidden Markov model with the forward-backward formula of Baum and Petrie (1966). From a computational (MCMC) point of view, there is therefore no way of integrating this hidden process out to simulate directly the parameters conditional on the observables $(y_t)_{1\le t\le T}$, as was done in the hidden Markov model by, e.g., Cappé et al. (2003). Note also that, as opposed to Hodgson (1999), we use the saturated missing data representation of the model via x to avoid the complication of using reversible jump techniques, for which PMC algorithms are more difficult to implement.

4.2. Population Monte Carlo for ion channel models
Our choice of proposal function is based on the availability of closed form formulas for the hidden Markov model. We thus create a pseudo hidden Markov model based on the current values of the parameters for the ion channel model, simply by building the Markov transition matrix from the average durations in each state,

$$\mathbb{P} = \begin{pmatrix} 1 - \lambda_0/s_0 & \lambda_0/s_0 \\ \lambda_1/s_1 & 1 - \lambda_1/s_1 \end{pmatrix},$$

since, for a hidden Markov model, the average sojourn within one state is exactly the inverse of the transition probability to the other state. We denote by $\pi^H(x \mid y, \omega)$ the full conditional distribution of the hidden Markov chain x given the observables y and the parameters

$$\omega = (\mu_0, \mu_1, \sigma, \lambda_0, \lambda_1, s_0, s_1),$$

constructed via the forward-backward formula: see, e.g., Cappé et al. (2003) for details. The simulation of the parameters ω proceeds in a natural way by using the full conditional distribution $\pi(\omega \mid y, x)$ since it is available. In order not to confuse the issues, we do not consider the possible adaptation of the approximation matrix $\mathbb{P}$ over the iterations, that is, a modification of the switch probabilities from $\lambda_i/s_i$ ($i = 0, 1$).
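The construction of $\mathbb{P}$ above can be sketched in a few lines; the function name and the numeric parameter values are our own illustration.

```python
import numpy as np

# Build the pseudo hidden Markov transition matrix from the mean sojourn
# times s_i / lambda_i of the Gamma durations: the exit probability of
# state i is the inverse of its mean sojourn, i.e. lambda_i / s_i.
def pseudo_hmm_matrix(s0, lam0, s1, lam1):
    p0, p1 = lam0 / s0, lam1 / s1
    return np.array([[1.0 - p0, p0],
                     [p1, 1.0 - p1]])

P = pseudo_hmm_matrix(s0=3, lam0=0.5, s1=2, lam1=1.0)
# under P the sojourn in state 0 is geometric with mean 1/p0 = s0/lam0
```

This matches the matching-of-means argument in the text: the geometric sojourn of the pseudo HMM has the same expectation as the Gamma duration it replaces.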
Note that Carpenter et al. (1999) also consider the ion channel model in their particle filter paper, with the differences that they replace the semi-Markov structure with an approximative hidden Markov model with more than 2 states, and that they work in a dynamic setting based on this approximation. The observables y are also different in that they come from an ARMA(1,1) model with only the location parameter depending on the unknown state. Hodgson and Green (1998) similarly compared several hidden Markov models with duplicated "open" and "closed" states. Ball et al. (1999) also rely on a hidden Markov modelling with missing observations.
The subsequent use of importance sampling bypasses the exact simulation of the hidden process $(x_t)_{1\le t\le T}$ and thus avoids the recourse to variable dimension models and to more sophisticated tools like reversible jump MCMC. This saturation of the parameter space by the addition of the whole indicator process $(x_t)_{1\le t\le T}$ is obviously more costly in terms of storage, but it provides unrestricted moves between configurations of the process $(x_t)_{1\le t\le T}$. Since we do not need to define the corresponding jump moves, we are thus less likely to encounter the slow convergence problems of Hodgson (1999).
We therefore run PMC as in the following pseudo-code rendering, where I denotes the number of iterations, T being used for the number of observations:

Ion channel PMC

Step 0: Initialisation
  For j = 1,...,n,
    (i) generate $(\omega^{(j)}, x^{(j)}) \sim \pi(\omega)\,\pi^H(x \mid y, \omega^{(j)})$
    (ii) compute the weight $\varrho_j \propto \pi(\omega^{(j)}, x^{(j)} \mid y) \big/ \pi(\omega^{(j)})\,\pi^H(x^{(j)} \mid y, \omega^{(j)})$
  Resample the $(\omega^{(j)}, x^{(j)})_j$ using the weights $\varrho_j$

Step i: Update (i = 1,...,I)
  For j = 1,...,n,
    (i) generate $(\omega^{(j)}, x^{(j)}_+) \sim \pi(\omega \mid y, x^{(j)})\,\pi^H(x \mid y, \omega^{(j)})$
    (ii) compute the weight $\varrho_j \propto \pi(\omega^{(j)}, x^{(j)}_+ \mid y) \big/ \pi(\omega^{(j)} \mid y, x^{(j)})\,\pi^H(x^{(j)}_+ \mid y, \omega^{(j)})$
  Resample the $(\omega^{(j)}, x^{(j)}_+)_j$ using the weights $\varrho_j$, and take $x^{(j)} = x^{(j)}_+$ (j = 1,...,n).
The justification for the weights $\varrho_j$ used in the above algorithm is that, conditional on the $x^{(j)}$'s, $\omega^{(j)}$ is simulated from $\pi(\omega \mid y, x^{(j)})$ and, conditional on $\omega^{(j)}$, $x^{(j)}_+$ is simulated from $\pi^H(x \mid y, \omega^{(j)})$. The normalising factor of the $\varrho_j$'s converges to the correct constant by the law of large numbers.
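Simulating from the conditional $\pi^H(x \mid y, \omega)$ is the forward filtering / backward sampling step of the pseudo HMM proposal. The following is a generic sketch for a two-state Gaussian-emission HMM, written by us under the assumption of a uniform initial distribution; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(4)

# Forward filtering / backward sampling for a K-state Gaussian HMM:
# draws the hidden path x given observables y, transition matrix P,
# state means mu and common emission standard deviation sigma.
def ffbs(y, P, mu, sigma, rng):
    T, K = len(y), P.shape[0]
    emit = np.exp(-0.5 * ((y[:, None] - mu[None, :]) / sigma) ** 2)
    alpha = np.zeros((T, K))
    alpha[0] = emit[0] / K                 # uniform initial distribution
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                  # forward filtering, normalised
        alpha[t] = emit[t] * (alpha[t - 1] @ P)
        alpha[t] /= alpha[t].sum()
    x = np.zeros(T, dtype=int)             # backward sampling
    x[-1] = rng.choice(K, p=alpha[-1])
    for t in range(T - 2, -1, -1):
        probs = alpha[t] * P[:, x[t + 1]]
        x[t] = rng.choice(K, p=probs / probs.sum())
    return x

P = np.array([[0.95, 0.05], [0.1, 0.9]])
mu, sigma = np.array([0.0, 2.0]), 0.5
true_x = (np.arange(200) // 50) % 2        # alternating blocks of 0s and 1s
y = rng.normal(mu[true_x], sigma)
x_draw = ffbs(y, P, mu, sigma, rng)
```

With well-separated emission means, a single draw already restores most of the hidden path, which is what makes $\pi^H$ a sensible proposal for the semi-Markov target.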
4.3. Normalising constants
Let us stress the specificity of the PMC method in terms of normalising constants: $\pi(\omega \mid y, x)$ is available in closed form (see below in Section 4.4), including its normalising constant, due to the conjugacy of the distributions on $\mu_0, \mu_1, \sigma, \lambda_0, \lambda_1$ and the finiteness of the support of $s_0, s_1$. The conditional distribution $\pi^H(x \mid y, \omega)$ is also available with its normalising constant, by virtue of the forward-backward formula. The only difficulty in the ratio

$$\frac{\pi(\omega, x \mid y)}{\pi(\omega \mid y, x)\,\pi^H(x \mid y, \omega)}$$

lies within the numerator $\pi(\omega, x \mid y)$, whose normalised version is unknown. We therefore use instead the proportional term

$$\pi(\omega, x \mid y) \propto \pi(\omega)\, f(y \mid x, \omega)\, f(x \mid \omega) \qquad (4)$$

and normalise the $\varrho_j$'s by their sum. The foremost feature of this reweighting is that the normalising constant missing in (4) only depends on the observables y and is therefore truly a constant, that is, does not depend on the previous value of the point $x^{(j)}$. This scheme crucially relies on (i) the points encompassing both the parameters ω and the latent data
4.4. Simulation details
Figure 5 illustrates the performances of PMC by representing the graph of the dataset against the fitted average

$$\sum_{j=1}^{J} \varrho_j\, x^{(j)}_t$$

for each observation $y_t$. As is obvious from the picture, the fit is quite good.
Fig. 5. (top) Histograms of residuals after fit by averaged
xt
; (middle) Simulated sample of size
4000against fitted averaged
xt
; (bottom) Probability of allocation to first state for each observation
The unobserved Gamma process is distributed as
$$\prod_{m=1}^{M} \frac{(t_{m+1} - t_m)^{s_m - 1}\, \lambda_m^{s_m}\, e^{-\lambda_m (t_{m+1} - t_m)}}{\Gamma(s_m)} = \frac{\lambda_0^{n_0 s_0}\, e^{-\lambda_0 v_0}\, \xi_0^{s_0 - 1}}{\Gamma(s_0)^{n_0}} \cdot \frac{\lambda_1^{n_1 s_1}\, e^{-\lambda_1 v_1}\, \xi_1^{s_1 - 1}}{\Gamma(s_1)^{n_1}},$$
with obvious notations: $M$ is the number of changes, the $t_m$'s are the successive times when the gamma process changes state, the $s_m$'s and $\lambda_m$'s are the corresponding sequences of $s_0, s_1$ and $\lambda_0, \lambda_1$, $n_i$ is the number of visits to state $i$, $\xi_i$ is the product of the sojourn durations in state $i$ [corresponding to the geometric mean], and $v_i$ the total sojourn duration in state $i$ [corresponding to the arithmetic mean]. (This is based on the assumption of no censoring, made in Section 4, namely that $t_1 = 1$ and $t_{M+1} = T + 1$.)

The posterior distributions on the $\mu_i$'s and $\sigma^2$ [conditional on the hidden process] are thus the standard distributions derived from the Normal–Gamma conjugate priors, while
$$\lambda_i \mid s_i, x \sim \mathcal{G}a(\alpha + n_i s_i,\, \beta + v_i), \qquad \pi(s_i \mid x) \propto \xi_i^{s_i - 1}\, (\beta + v_i)^{-n_i s_i}\, \Gamma(n_i s_i + \alpha)\, \Gamma(s_i)^{-n_i}\, \mathbb{I}_{\{1,2,\ldots,S\}}(s_i).$$
Therefore, except for the $s_i$'s, simulation from $\pi(\omega \mid y, x)$ is straightforward. The distribution on the $s_i$'s is highly variable, in that the product
$$\xi_i^{s_i - 1}\, (\beta + v_i)^{-n_i s_i}\, \Gamma(n_i s_i + \alpha)\, \Gamma(s_i)^{-n_i} \qquad (5)$$
often leads to a highly asymmetric distribution, which puts most of the weight on the minimum value of $s$. Indeed, when the geometric and arithmetic means, $\xi_i^{1/n_i}$ and $v_i/n_i$, are similar, a simple Stirling approximation to the Gamma function leads to (5) being equivalent to $\sqrt{n}/\sqrt{s^{\,n}}$.
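Since the support of $s_i$ is finite, the unnormalised mass (5) can simply be evaluated at every support point and normalised directly; working on the log scale keeps the Gamma functions from overflowing. A sketch with illustrative values for $\xi_i$, $v_i$, $n_i$ and the hyperparameters $\alpha$, $\beta$ (these numbers are not taken from the dataset):

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def sample_s(xi, v, n, alpha, beta, S):
    # evaluate the log of the unnormalised mass (5) at every support point s = 1, ..., S
    s = np.arange(1, S + 1)
    lgamma = np.vectorize(math.lgamma)
    logmass = ((s - 1) * math.log(xi)
               - n * s * math.log(beta + v)
               + lgamma(n * s + alpha)
               - n * lgamma(s))
    p = np.exp(logmass - logmass.max())   # subtract the max to avoid overflow
    p /= p.sum()
    return int(rng.choice(s, p=p)), p

# illustrative values only
s_draw, p = sample_s(xi=50.0, v=40.0, n=10, alpha=1.0, beta=1.0, S=10)
```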
Figure 6 gives the histograms of the posterior distributions of the various parameters of $\omega$ without reweighting by the importance sampling weights $\varrho_j$. As seen from this graph, the histograms in $\mu_i$ and $\sigma$ are well concentrated, while the histogram in $\lambda_1$ exhibits two modes which correspond to the two modes of the histogram of $s_1$ and indicate that the parameter $(\lambda_i, s_i)$ is not well identified. This is to be expected, given that we only observe a few realisations of the underlying gamma distribution, and this with added noise since the durations are not directly observed. However, the histograms of the average durations $s_i/\lambda_i$ do not exhibit such multimodality and are well concentrated around the values of interest.
Fig. 6. Histograms of the samples produced by PMC, before resampling (panels: $\mu_0$, $\mu_1$, $\sigma^2$, $(\mu_0 - \mu_1)/\sigma$, $\lambda_0$, $\lambda_1$, $s_0$, $s_1$, $s_0/\lambda_0$, $s_1/\lambda_1$).
4.5. Degeneracy

As noted above, PMC is simply an importance sampling algorithm when implemented once, that is, for a single collection of $n$ points $(\omega^{(j)}, x^{(j)})$. As such, it provides an approximation device for the target distribution, but it is also well known that a poor choice of the importance sampling distribution can jeopardise the interest of the approximation, as for instance when the weights $\varrho_j$ have infinite variance.
An incentive for using PMC in a static setting is thus to overcome a poor choice of the importance function by recycling the best points and discarding the worst ones. This point of view makes PMC appear as a primitive kind of adaptive algorithm, in that the support of the importance function is adapted to the performance of the previous importance sampler. The difficulty with this approach is in determining the long-term behaviour of the algorithm, since the standard importance sampling arguments do not validate the iterated algorithm any longer. For instance, it often happens that only a few points are kept after the resampling step of the algorithm, because only a few weights $\varrho_j$ are different from 0. Figure 7 gives for instance the sequence of the number of points that matter at each iteration, out of $n = 1000$ original points: the percentage of relevant points is thus less than 10% on average and in fact much closer to 5%. In addition, there is no clear-cut stabilisation in either the number of relevant points or the variance of the corresponding weights, the latter being far from exhibiting a stabilisation as the number of iterations increases. Some more rudimentary signals can be considered though, like the stabilisation of the fit in Figure 8. While the averages for 1 and 2 iterations are quite unstable for most observations, the two states are much more clearly identified for 5 and 10 iterations, and hardly change over subsequent iterations.
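The "number of points that matter" can be quantified by the classical effective sample size of the normalised weights, $1/\sum_j \varrho_j^2$; this diagnostic is standard in the particle filtering literature, although the figures here report the raw count of points with descendants instead. A minimal sketch:

```python
import numpy as np

def effective_sample_size(w):
    # ESS = 1 / sum(w_j^2) for normalised weights: equals n for uniform
    # weights, and 1 when a single weight carries all the mass
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

ess_uniform = effective_sample_size(np.ones(1000))   # all points matter
ess_skewed = effective_sample_size(np.r_[np.ones(50), np.full(950, 1e-6)])
# ess_skewed is close to 50: only about 5% of the points matter
```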
Fig. 7. (left) Variance of the weights $\varrho_j$ along 100 iterations, and (right) number of points with descendants along 100 iterations, for a sample of 4000 observations and 1000 points.
A related phenomenon pertains to the degeneracy of ancestors observed in the iterations of our algorithm: as the number of steps increases, the number of points from the first generation used to generate points from the last generation diminishes and, after a few dozen iterations, reduces to a single ancestor. This is for instance what occurs in Figure 9 where, after only two iterations, there is a single ancestor to the whole sample. (Note also the iterations where the whole sample originates from a single point.) This phenomenon appears in every setting and, while it cannot be avoided, since some points are bound to vanish at each iteration even when using the systematic sampling of Carpenter et al. (1999), the surprising factor is the speed with which the number of ancestors decreases.
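The systematic sampling of Carpenter et al. (1999) mentioned above can be sketched as follows (an illustrative implementation, not the authors' code):

```python
import numpy as np

def systematic_resample(w, rng):
    # one U(0,1) draw positions a regular grid of n points on the cumulative
    # weights; each particle then receives between floor(n w_j) and
    # ceil(n w_j) descendants, which reduces the resampling variance
    n = len(w)
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(w), positions)

rng = np.random.default_rng(3)
w = np.array([0.5, 0.25, 0.125, 0.125])   # normalised weights
idx = systematic_resample(w, rng)          # particle 0 gets exactly 2 copies
```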
4.6. A comparison with Metropolis–Hastings algorithms

As mentioned above, the proposal distribution associated with the pseudo hidden Markov model could alternatively be used as a proposal distribution in a Metropolis–Hastings algorithm of the following form:
MCMC Algorithm

Step i ($i = 1, \ldots, I$):
(a) Generate $\omega^{(i)} \sim \pi(\omega \mid y, x^{(i-1)})$.
Fig. 8. Successive fits of PMC iterated by the weight-resample algorithm for 2000 observations and $n = 2000$ points, for (clockwise starting from top left) 1, 2, 5 and 10 iterations. (See the caption of Fig. 5 for a description of the displayed quantities.)
(b) Generate $x^\star \sim \pi_H(x \mid y, \omega^{(i)})$ and $u \sim \mathcal{U}([0,1])$, and take
$$x^{(i)} = \begin{cases} x^\star & \text{if } u \le \dfrac{\pi(x^\star \mid \omega^{(i)}, y) \big/ \pi_H(x^\star \mid y, \omega^{(i)})}{\pi(x^{(i-1)} \mid \omega^{(i)}, y) \big/ \pi_H(x^{(i-1)} \mid y, \omega^{(i)})}, \\[1ex] x^{(i-1)} & \text{otherwise.} \end{cases}$$
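On the log scale, step (b) is the usual Metropolis–Hastings acceptance rule for an independence-type proposal. A generic sketch with illustrative stand-ins for $\pi(x \mid \omega, y)$ and $\pi_H(x \mid y, \omega)$ (none of these densities come from the ion channel model):

```python
import numpy as np

rng = np.random.default_rng(4)

def target_logpdf(x):
    # stand-in for log pi(x | omega, y); here a N(1, 1) density, up to a constant
    return -0.5 * (x - 1.0) ** 2

def proposal_logpdf(x):
    # stand-in for log pi_H(x | y, omega); here a N(0, 2^2) density, up to a constant
    return -0.5 * (x / 2.0) ** 2

def mh_step(x_curr):
    x_star = rng.normal(0.0, 2.0)     # independence proposal, as in step (b)
    log_ratio = (target_logpdf(x_star) - proposal_logpdf(x_star)
                 - target_logpdf(x_curr) + proposal_logpdf(x_curr))
    if np.log(rng.random()) <= log_ratio:
        return x_star                 # accept
    return x_curr                     # reject: keep the current state

x, chain = 0.0, []
for _ in range(5000):
    x = mh_step(x)
    chain.append(x)
```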
The performances of this alternative algorithm are, however, quite poor. Even with a well-separated dataset like the simulated dataset represented in Figure 3, the algorithm requires a very careful preliminary tuning not to degenerate into a single-state output. More precisely, the following occurs: when started at random, the algorithm converges very quickly to a configuration where both means $\mu_0$ and $\mu_1$ of the ion channel model are very close to one another (and to the overall mean of the sample), with, correlatively, a large variance $\sigma^2$ and very short durations within each state. To overcome this degeneracy of the sample, we had paradoxically to resort to a sequential implementation as follows: noticing that the degeneracy only occurs with large sample sizes, we start the MCMC algorithm on the first 100 observations $y_{1:100}$ and, once a stable configuration has been achieved, we gradually increase the number of observations taken into account [by a factor of $\min(s_0/\lambda_0, s_1/\lambda_1)$] till the whole sample is included. The results provided in Figures 10–12 were obtained following this scheme.
For conciseness' sake, we did not reproduce the history of the allocations $x$ over the iterations. The corresponding graph shows a very stable history with hardly any change, except on a few boundaries. Note the connected strong stability in the number of switches.
Fig. 9. Representation of the sequence of descendants (lighter grey) and ancestors (darker grey) along iterations, through bars linking a given ancestor and all its descendants (light grey) or a given point and its ancestor (darker grey). In the simulation corresponding to this graph, there were 4000 observations and 1000 points.
[Additional iterations of the MCMC sampler would have been necessary, but our purpose here is to illustrate the proper behaviour of this sampler, provided the initialisation is adequate.]
Attempts with very mixed datasets, such as the one used in Figure 8, were much less successful since, even with a careful tuning of the starting values (we even tried starting with the known values of the parameters), we could not avoid the degeneracy to a single state. The problem with the Metropolis–Hastings algorithm in this case is clearly a strong dependence on the starting value, i.e., a poor mixing ability. This is further demonstrated by the following experiment: when starting the above sampler from $n = 1000$ points obtained by running PMC 20 times, the sampler always produced a satisfactory solution with two clear-cut states and no degeneracy. Figure 12 compares the distributions of the PMC points and the MCMC samples via a qq-plot and shows there is very little difference between both. The same behaviour is shown by a comparison of the allocations (not represented here). This indicates that the MCMC algorithm does not lead to a better exploration of the parameter space.
For a fairly mixed dataset of 2000 observations corresponding to Figure 8, while the MCMC algorithm initialised at random could not avoid degeneracy, a preliminary run of PMC produced stable allocations to two states, as shown in Figure 13 by the fit for both the PMC and MCMC samples: they are indistinguishable, even though the qq-plots in Figure 14 indicate different tail behaviours.
This is not to say that an MCMC algorithm cannot work in this setting, since Hodgson (1999) demonstrated the contrary, but this shows that global updating schemes, that is, proposals that update the whole missing data $x$ at once, are difficult to come up with, and that one has instead to rely on more local moves such as those proposed by Hodgson (1999). A similar conclusion was drawn by Billio et al. (1999) in the setup of switching ARMA models.
Fig. 10. Representation of a dataset of 3610 simulated values, along with the average fit (bottom), and average probabilities of allocation to the upper state (top). This fit was obtained using a sequential tuning scheme and 5000 MCMC iterations in the final run.
5. Conclusion

The above developments have confirmed Chopin's (2002) realisation that PMC is a useful tool in static (as opposed to sequential, rather than dynamic) setups. Quite obviously, the specific Monte Carlo scheme we built can be used in a sequential setting in a very similar way. The comparison with the equivalent MCMC algorithm in Section 4.6 is also very instructive in that it shows the superior robustness of PMC to a possibly poor choice of the proposal distribution.
There still are issues to explore about the PMC scheme. In particular, a more detailed assessment of the iterative and adaptive features is in order, to decide to what extent this is a real asset. When the proposal distribution is not modified over iterations, as in Section 4, it is possible that there is an equivalent to the "cutoff phenomenon": after a given $t_0$, the distribution of $x^{(t)}$ may be very similar for all $t \ge t_0$. Further comparisons with full Metropolis–Hastings moves based on similar proposals would also be of interest, to study which scheme brings the most information about the distribution of interest. The most promising avenue seems, however, the development of adaptive proposals as in Section 3, where one can borrow from earlier work on MCMC algorithms to build assessments of the improvement brought by modifying the proposals. Our feeling at this point is that a limited number of iterations is necessary to achieve stability of the proposals.
An extension not studied in this paper is that the PMC algorithm can be started with a few points that explore the parameter space and, once the mixing is well established, the sample size can be increased to improve the precision of the approximation to the integrals of interest. This is a straightforward extension in terms of programming, but the selection
Fig. 11. Details of the MCMC sample for the dataset of Figure 10: (left) histograms of the components of the MCMC sample and (right) cumulative averages for the parameters of the models and evolution of the number of switches (lower right graph).
Acknowledgments

The authors are grateful to the Friday lunch discussion group at Institut Henri Poincaré for their input. Comments from Nicolas Chopin and Gareth Roberts were also particularly helpful. Earlier conversations with Christophe Andrieu and Arnaud Doucet in Cambridge have non-coincidental connections with this work and are most sincerely acknowledged.
References

Andrieu, C. and Robert, C.P. (2001) Controlled Markov chain Monte Carlo methods for optimal sampling. Cahiers du Ceremade 0125, Université Paris Dauphine.

Ball, F.G., Cai, Y., Kadane, J.B. and O'Hagan, A. (1999) Bayesian inference for ion channel gating mechanisms directly from single channel recordings, using Markov chain Monte Carlo. Proc. Royal Society London A 455, 2879–2932.

Baum, L.E. and Petrie, T. (1966) Statistical inference for probabilistic functions of finite state Markov chains. Annals Math. Statist. 37, 1554–1563.

Berzuini, C., Best, N., Gilks, W.R., and Larizza, C. (1997) Dynamic conditional independence models and Markov chain Monte Carlo methods. JASA 92, 590–599.

Billio, M., Monfort, A. and Robert, C.P. (1999) Bayesian estimation of switching ARMA models. J. Econometrics 93, 229–255.
Fig. 12. QQ-plots comparing the distribution of the PMC sample with the distribution of the MCMC sample obtained after 5000 iterations started at the PMC points (panels: $\mu_0$, $\mu_1$, $\sigma^2$, $\lambda_0$, $\lambda_1$).
Fig. 13. Simulated dataset of 2000 points with superimposed fits by the PMC and MCMC samples, the latter being initialised by PMC. Both fits are indistinguishable.
Carpenter, J., Clifford, P., and Fearnhead, P. (1999) Building robust simulation based filters for evolving datasets. Technical report, Department of Statistics, Oxford University.

Celeux, G., Hurn, M. and Robert, C.P. (2000) Computational and inferential difficulties with mixture posterior distributions. J. Amer. Statist. Assoc. 95, 957–979.

Chopin, N. (2002) A sequential particle filter method for static models. Biometrika 89, 539–552.

Doucet, A., de Freitas, N., and Gordon, N. (2001) Sequential MCMC in Practice. Springer-Verlag, New York.

Gilks, W.R., and Berzuini, C. (2001) Following a moving target: Monte Carlo inference for dynamic Bayesian models. JRSS B 63(1), 127–146.

Gordon, N., Salmond, D., and Smith, A.F.M. (1993) Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEEE Proceedings F, 140, 107–113.

Green, P.J. (1995) Reversible jump MCMC computation and Bayesian model determination. Biometrika 82(4), 711–732.

Haario, H., Saksman, E. and Tamminen, J. (1999) Adaptive proposal distribution for random walk Metropolis algorithm. Computational Statistics 14, 375–395.
Fig. 14. QQ-plots comparing the distribution of the PMC sample with the distribution of the MCMC sample obtained after 5000 iterations started at one point (panels: $\mu_1$, $\mu_2$, $\sigma^2$, $\lambda_1$, $\lambda_2$, $s_1/\lambda_1$).
Hesterberg, T. (1998) Weighted average importance sampling and defensive mixture distributions. Technometrics 37, 185–194.

Hodgson, M.E.A. (1999) A Bayesian restoration of an ion channel signal. JRSS B 61(1), 95–114.

Hodgson, M.E.A. and Green, P.J. (1999) Bayesian choice among Markov models of ion channels using Markov chain Monte Carlo. Proc. Royal Society London A 455, 3425–3448.

Kim, S., Shephard, N. and Chib, S. (1998) Stochastic volatility: likelihood inference and comparison with ARCH models. Rev. Econom. Stud. 65, 361–393.

Liu, J.S. (2001) Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York.

Liu, J.S., Liang, F. and Wong, W.H. (2001) A theory for dynamic weighting in Monte Carlo computation. JASA 96, 561–573.

MacEachern, S.N. and Peruggia, M. (2000) Importance link function estimation for Markov chain Monte Carlo methods. J. Comput. Graph. Statist. 9, 99–121.

Mengersen, K.L. and Robert, C.P. (2003) IID sampling with self-avoiding particle filters: the pinball sampler. In Bayesian Statistics 7, J.M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A.F.M. Smith and M. West (Eds.), Oxford University Press (to appear).

Meyn, S.P. and Tweedie, R.L. (1993) Markov Chains and Stochastic Stability. Springer-Verlag, New York.

Oh, M.S. and Berger, J.O. (1992) Adaptive importance sampling in Monte Carlo integration. J. Statist. Comput. Simul. 41, 143–168.

Oh, M.S. and Berger, J.O. (1993) Integration of multimodal functions by Monte Carlo importance sampling. JASA 88, 450–456.

Pitt, M.K. and Shephard, N. (1999) Filtering via simulation: auxiliary particle filters. JASA 94, 1403–1412.

Robert, C.P. and Casella, G. (1999) Monte Carlo Statistical Methods. Springer-Verlag, New York.

Rubin, D.B. (1987) Multiple Imputation for Nonresponse in Surveys. John Wiley, New York.

Stavropoulos, P. and Titterington, D.M. (2001) Improved particle filters and smoothing. In Sequential MCMC in Practice, Doucet, A., de Freitas, N., and Gordon, N. (Eds), Springer-Verlag, New York.

Warnes, G.R. (2001) The Normal Kernel Coupler: an adaptive Markov chain Monte Carlo method for efficiently sampling from multi-model distributions. Tech. report 395, University of Washington.