Reversible jump, birth-and-death and more general continuous time Markov chain Monte Carlo samplers

(1)

and more general continuous time samplers

y

Olivier Capp ´e

CNRS / ENST D ´epartement TSI, 75634 Paris cedex 13, France

Christian P. Robert

CREST, INSEE, and CEREMADE, Universit ´e Paris Dauphine, 75775 Paris cedex 16, France

Tobias Ryd ´en

z

Centre for Mathematical Sciences, Lund University, Box 118, 221 00 Lund, Sweden

Summary. At present, reversible jump methods are the most common tool for exploring variable

di-mension statistical models. Recently however, an alternative approach based on birth-and-death

pro-cesses has been proposed by Stephens (2000)in the case of mixtures of distributions. We address the

comparison of both methods by demonstrating that upon appropriate rescaling of time, the reversible

jump chain converges to a limiting continuous time birth-and-death chain. We show in addition that the

birth-and-death setting can be generalised to include other types of jumps like split/combine jumps in

the spirit of Richardson and Green (1997). We illustrate these extensions in the case of hidden Markov

models.

AMS 1991 classification. Primary 62C05. Secondary 60J05, 62F15, 65D30, 65C60.

Keywords: Bayesian inference, birth-and-death process, completion, hidden Markov model, Jacobian,

label switching, MCMC algorithms, mixture distribution, rescaling

1. Introduction

Markov Chain Monte Carlo [MCMC℄ methods for statisti al inferen e, in parti ular Bayesian inferen e,haveundoubtedlybe omestandardduringthepasttenyears(CappeandRobert,2000). For variable dimension problems, often arising through model sele tion, a popular approa h is Green's(1995)reversiblejumpMCMC[RJMCMC℄methodology. Re entlyhowever,inthe ontext of mixturesofdistributions,Stephens (2000a,b)rekindledinterestin adierentmethod basedon ontinuoustimebirth-and-deathpro essesforestimatingthenumberof omponentsofthemixture, followingearlierproposalsbyGeyerandMller(1994),GrenanderandMiller(1994)andPhillips and Smith(1996). Wewill allthisapproa hbirth-and-deathMCMC[BDMCMC℄.

Amainquestionaddressedinthepresentpaperisasfollows: isthereafundamentaldieren e betweenthereversiblejump andbirth-and-death MCMCmethodologies,oraretheseapproa hes similar? As an answer to this question we show in Se tion 3 that for any BDMCMC pro ess satisfying some weak regularity onditions there exists a sequen e of RJMCMC pro esses that onverges,inasense tobepre isedbelow,to theBDMCMCpro ess.

In their appli ation of reversible jump MCMC to mixtures of distributions, Ri hardson and Green (1997) involved twotypes of movesthat ould hange the number of omponents of the mixture: onewasbirth/death, in whi h anew omponentis reatedoranexisting oneis deleted, and theother wassplit/ ombine,in whi h one omponentissplit in two,ortwo omponentsare ombinedin one. Onthe opposite, Stephens(2000a) onlydealswithbirth/death movesin order to keepthealgorithmwithin thetheoryof(marked)pointanddeathpro essesongeneralspa es. Weshowthat onvergen eofreversiblejumptobirth-and-deathMCMCisnotlimitedtomovesof

yWorkpartiallysupportedbyEUTMRnetworkERB{FMRX{CT96{0095on"Computationaland Sta-tisti alMethodsfortheAnalysisofSpatialData". TheauthorsaregratefultoGarethRobertsforhelpful ommentsontheRao-Bla kwellisation improvement.

(2)

thiskindhowever,butismu hmoregeneral. Forexample,theabovesplit/ ombinemoves ouldbe in orporated. Theapproa hsoobtained ouldbenamed ontinuoustimereversiblejumpMCMC andtheappropriatetheoreti alframeworkisthat ofMarkovjump pro esses.

The paper is organised as follows: in Se tion 2, we provide a review of the main features of reversible jump and birth-and-death MCMC methodologies. The onvergen e of RJMCMC to BDMCMC is established in Se tion 3. In Se tion 4, we dis uss the generalisation of moves for BDMCMC besides birth/death moves, while in Se tion 5, we show how sampling an be mademoreeÆ ientin thisapproa h,introdu inga ontinuous-timeRao-Bla kwellisations heme. Se tion 6 illustrates the general BDMCMC methodology for hidden Markovmodels, in parallel withtheRJMCMCapproa hofRobert,RydenandTitterington(2000). Se tion7 on ludeswith adis ussionoftheprosand ons ofea hmethod.

2. A quick review of reversible jump and birth-and-death MCMC methodologies

Inthisse tionwegiveaqui kreviewofRJMCMCandBDMCMCin themixture ase onsidered by Stephens (2000a). We will onsider the extension of BDMCMC to hidden Markov models in Se tion 6. Further reading is provided by Ri hardsonand Green (1997, 1998) and Stephens (2000a,b).

2.1. Mixture models

Themodelweworkwiththushasaprobabilitydensityfun tion oftheform

p(yjk;w ;)= k X i=1 w i f(yj i );

wherekisthenumberof omponents,w=(w 1

;:::;w k

)arethe omponentweights,=( 1

;:::; k

) arethe omponentparametersand f(;) issomeparametri lassof densitiesindexed bya pa-rameter. Common examples arethe Gaussian family, theGamma family(in whi h ases is typi allytwo-dimensional)andthePoissonfamily(inwhi h aseisone-dimensional). The om-ponentweightsarenon-negativenumberssummingupto unity. Note that wewrite all densities as onditional ones, as our statisti al approa h is Bayesian. Hen e we need to spe ify a prior density for(k;w ;), denoted byr(k;w ;). Wedo notmakeany further assumptionsabout the prior, ex ept that it is proper and that, for ea h k, it is ex hangeable, that is, invariant under permutationsofthepairs(w

i ;

i

). WealsodenotebyL(k;w ;)thelikelihoodwhi hisgivenby

L(k;w ;)= m Y i=1 p(y i jk;w ;); wherey=(y 1 ;:::;y m

)istheobservedsequen e. Theposteriordensity,whi hisourstartingpoint for inferen e, is thus proportional to r(k;w ;)L(k;w ;). A real model typi ally also involves hyperparameters,whi h assu hdonotadd anyfurther diÆ ulty. Wedonotspe i allyaddress thisissuetillSe tion6wherehyperparametersareused. Belowweput=(w ;);inthisnotation kisimpli it.

Afeatureinherenttomixturemodelsisthatwemayasso iatewithea hobservationy i alabel (orallo ation)z i 2f1;:::;kgwithP(z i =jjk;w )=w j

that indi atesfrom whi h omponenty i wasdrawn. Givendata,theselabels anbesampledindependentlywith

P(z i =jjk;w ;;y i )= w j f(y i j j ) P k `=1 w ` f(y ` j ` ) : (1)

We allsu hasimulation ompletingthesample as(z;y )isoftenreferredtoasthe ompletedata. As detailed below in the setup of hidden Markov models and as demonstrated in Celeux et al.

(3)

2.2. Birth-and-death MCMC

We nowstudy thefollowingform of BDMCMC: in state , new omponentsare reated (born) in ontinuous time, at rate (). Whenever a new omponent is born in this state, its weight w and parameter are drawn from ajoint density h(;(w;)). In order to make spa e forthe new omponent, theold omponentweightsare s aleddown proportionallyasto makeallofthe weights, in luding the new one, sum to unity; that is, w

i := w

i

=(1+w). The new omponent weight-parameterpair(w;) isalsoaugmentedto. We denotethese operationsby`[', sothat the new state is [(w;). Furthermore, in a(k +1) omponent onguration [(w;), the omponent(w;)iskilledat rate

Æ(;(w;))= L()r() L([(w;))r([(w;)) 1 k+1 ()h(;(w;)) (1 w) k 1 : (2) Thefa tor (1 w) k 1

in (2)resultsfrom a hangeofvariable Ja obiandeterminantwhen renor-malising the weights. Indeed, when the omponent(w;)is removed, theremaining omponent weightsarealso renormalisedastosum tounity. Wedenotethese twooperationsby`n',sothat =([(w;))n(w;). An importantfeatureof theBDMCMC isthat ( ontinuous time)jump pro essesareasso iatedwiththebirthanddeathrates: wheneverajumpo urs,the orresponding moveisalwaysa epted. What repla esthea eptan e probability of lassi alMCMC methods isthedurationofthestayinea hstate. Inparti ular,implausiblestates,thatis,statessu hthat L()r()=0,dieimmediately.

2.3. Reversible jump MCMC

Wenowturntothe orrespondingreversiblejump MCMCsampler. Inak omponentstate,at ea hiteration,thealgorithmproposeswithprobabilityb()to reateanew omponentandwith probabilityd()itproposestokillone. Obviously,b()+d()=1. Ifanattemptto reateanew omponentismade,itsweightandparameteraredrawnfromh(;(w;))asabove. Ifanattempt tokilla omponentismade,ea h omponentissele tedwithequalprobability. Anew omponent is a eptedwithprobabilitymin(1;A),where A=A(;[(w;)) isgivenby

A(;[(w;)) = L([(w;))r([(w;)) L()r() (k+1) d([(w;)) (k+1)b() (1 w) k 1 h(;(w;)) = L([(w;))r([(w;)) L()r() d([(w;)) b() (1 w) k 1 h(;(w;)) : (3)

Here the rst ratio, ombined with the rst fa tor k+1, is the ratio of posterior densities; b()h(;(w;)) is thedensity ofproposing anew omponent(w;) andd([(w;))=(k+1) is theprobabilityofproposingtokill omponent(w;)wheninstate[(w;). Finally(1 w)

k 1

is thesameJa obiandeterminantasabove.

In(3), therst fa tork+1 omes from theassumption that in the RJMCMCalgorithm we keep

1 ;:::;

k

ordered using a predetermined ordering. For example, in the ase of Gaussian omponents we ould sort a ording to the mean. This ordering will, loosely speaking, redu e the size ofthe spa eof k omponentparametersbya fa tork!, and thefa tork+1isthe ratio (k+1)!=k!. Thisfa torshouldthusbeasso iatedwiththeposteriordensityratio. Wedoremark, however,thattheassumptionofordered omponentsisapurelyte hni alidentiabilitydevi eand doesnotmakeanypra ti al hangeto thealgorithm: whenanew omponent(w;)is proposed wekeepthe omponentsordered bysortingthem. Indeed,ifordering isnotimposed,one rather has to work on a quotient spa e indu ed by the equivalen e relation dened by

0 if and

0

are identi al up to apermutation of indi es. Working with thequotient spa e also gives rise to a fa tor k+1 as in (2). Regarding deaths in the RJMCMC sampler, if in a (k +1) omponentstate[(w;)aproposaltodeletea omponentismade,ea h omponentissele ted with equalprobability. Assumingthat omponent(w;)issele ted,thea eptan eprobabilityis then min(1;1=A),whereA=A(;[(w;)) isasabove.

(4)

RJM-andthe parameters i

as well as,possibly,the hyperparameters for axed k|see, for instan e, Ri hardsonand Green(1997). A ompletesweep of thealgorithm onsistsin the omposition of abirth/death move with these other|xed k|moves. Stephens (2000a) resampled omponent weightsandparametersatregularlyseparatedinstants. Samplingforaxedk anbe arriedout usingaGibbsmoveafter ompleting thesamplea ordingto(1),but ompletionwasnotusedby Stephens(2000a)whoonly onsideredMetropolis-Hastingsupdates. Asnotedabove,Ri hardson andGreen(1997)designed,inaddition,movesforsplittingand ombining omponents(seeSe tion 4forthegeneralisationofBDMCMC.)

3. Convergence of reversible jump to birth-and-death MCMC

Weshall now,starting fromaBDMCMCalgorithm asabove, onstru tasequen eof RJMCMC samplers onverging,ina ertainsensetobedened,totheBDMCMCsampler. Beforepro eeding weintrodu esomeadditionalnotation. LetS

k 1 =f(w 1 ;:::;w k ): w i >0; P w i =1g,denoteby thespa einwhi hea h

i

liesandput (k ) =S k 1 k . Thus (k )

isthespa eofk-dimensional parameters. Finally=[

k 1

(k )

denotestheoverallparameterspa e. ForN =1;2;3;:::,wedeneanRJMCMCsamplerbyletting

b N ()=1 expf ()=Ng; d N ()=1 b N ()=expf ()=Ng;

where()isthebirth rateoftheBDMCMCsampler. ThenAalsodepends onN,andwewrite A=A

N

. WeremarkthatasN !1,b N

()()=N,andif()isboundedwe antakeinstead b

N

()=()=N. Thestateat time n=0;1;2;:::of theN-th RJMCMCsampleris denotedby

N n

,and forea hN we onstru t a ontinuous timepro essf N (t)g t0 as N (t)= N bNt , where b denotestheintegerpart. ThestateoftheBDMCMCsamplerattimet0isdenotedby(t). We now onsider what happensas N !1. Theprobability of proposing abirth in state tendstozeroas()=N. Hen e,thea eptan eratioA

N

tendstoinnity,sothatabirthproposal isalwaysa epted. Iftimeisspeededupats aleN,onthenominaltimes alethelimitingpro ess ofa eptedbirthsin state is aPoisson pro essof rate(). Thes aled probabilityof deleting omponent(w;)in astate[(w;)2 (k +1) is Nd N () 1 k+1 min (1;1=A N (;[(w;))) ! L()r() L([(w;))r([(w;)) 1 k+1 () h(;(w;)) (1 w) k 1 asN !1;

and the right hand side is nothing but Æ(;(w;)). Considering the res aled time axisand the independent attempts to reate or delete omponents, in the limit the waiting time until this omponentis killed hasan exponentialdistribution with rate Æ(;(w;)), whi h agrees with the BDMCMCsampler. Thus,summingup,asN !1abirthisrarelyproposedbutalwaysa epted andadeathisalmost alwaysproposed butrarelya epted. Boththese s hemesresultin waiting timeswhi h are asymptoti allyexponentiallydistributed with rates in a ordan ewith the BD-MCMCsampler. Thus, onemay expe t that asN ! 1, thepro esses f

N

(t)g andf(t)g will be omemoreandmoresimilar.

We will now make this reasoning stri t. We rst note that sin e the standardtopology on theopen unit interval (0;1)is separableand anbemetrised by a omplete metri ,for example d(x;y) = jlog (x=(1 x)) log (y=(1 y))j, S

k 1

an be viewed as a omplete separable metri spa e. Likewiseweassumethat hasaseparabletopologywhi h anbemetrisedbya omplete metri . Then , with the indu ed naturaltopology, is a spa e of the same kind. The pro ess f (t)g is aMarkov pro ess on whi h we assumehas sample paths in D

[0;1), the spa e of -valuedfun tionson[0;1)whi hareright- ontinuousandhavelefthandlimitseverywhere. We makethefollowingassumptions:

(5)

(A3) For ea h (w;)2 (0;1), h(;(w;)) is ontinuouson and forea h 2 there is a neighbourhoodGof su hthat sup

0 2G h( 0 ;)isintegrable.

Theorem 1. Under(A1){(A3)andassumingthat(0)and 0

aredrawnfromthesameinitial distribution, f N (t)g t0 onverges weakly to f(t)g t0

in the Skorohod topology on D

[0;1)as N !1.

TheproofisgiveninAppendix A.

4. Generalisations of birth-and-death MCMC

As noted above, Stephens (2000a) resampled omponent weights and parameters with xed k, aswell ashyperparameters, at equidistanttimes. This obviously makesthe overallpro ess non-Markovian. We an,however,in orporatesu hmovesintothe ontinuoustimesampler. Suppose for examplethat in state oftheRJMCMCsampler,amovethat resamples omponentweights and parametersas well ashyperparameters, while keeping k xed, is proposed with probability 1 exp( ()=N). Res alingtime asaboveandpassingtothelimitprodu esa ontinuoustime pro essinwhi h,instate,su hmoveso uratrate (). Birthanddeathratesstaythesame. Of oursewe an alsohavedierentratesfor resampling omponentweightsand parametersand hyperparameters,respe tively.

Afurthers opeforgeneralisationistointrodu emore omplexmoves,likethesplitand ombine movesof Ri hardsonand Green(1997). We onsider herethe aseofasplit or ombine movein the RJMCMC setting where, following Green (1995), the ombine move is deterministi . For simpli ity, we denote by an element of the k omponent parameterve tor and assume that thereisno onstrainton. (Inthemixtureexample onsideredinSe tion2,=(w;)wasindeed two orthree dimensional and there wasa onstrainton theset of w's. Wewill see in Se tion 6 howthe onstraint anbeee tivelyremoved.)

The RJMCMCsampler proposes to split arandomly hosen omponent of the k omponent ve torwithprobabilitys

N

()soastogiverisetoanewparameterve torwithk+1 omponents, dened as((n)[T(;"))whereT isadierentiableone-to-onemappingonto

2

,where2 , and " is arandomvariablewith p.d.f. p. Wealso assumethat the mappingis symmetri in the sense that P(T(;")2AA 0 )=P(T(;")2A 0 A) (4) forallA;A 0

. Forinstan eifpisasymmetri p.d.f.,T(;")=( ";+")isavalidmapping, and likewise, if p is su h that " and "

1

have the same distribution, T(;") = (";=") is also a validmapping. Conversely,the probability ofproposing to ombine arandomly hosenpairof omponentsof (there are k(k 1)=2 pairs)is denoted by

N

()=1 s N

(). The a eptan e probabilityof the ombinemoveae tingthe k+1 omponentve tor( (n)[T(;"))isgiven by min 1; L()r()k! L((n)[T(;"))r( (n)[T(;"))(k+1)! s N ()k(k+1)=2 N ((n)[T(;"))k :2p(") T(;") (;")

where, aspreviously, the fa torialsin the rst ratiostems from the ordering of the omponents before andafter the ombine move. Thefa tor2is aresultof thesymmetryassumption (4): in asplit move,a omponent(w;) anbe splitinto thepair((w

0 ; 0 );(w 00 ; 00

))aswellasintothe reversely ordered pair((w

00 ; 00 );(w 0 ; 0

)), but upon sortingthe omponentsthese ongurations are equivalent. However, the two ways of getting there are typi ally asso iated with dierent values of " and possibly also with dierent densitiesp("); thesymmetry assumption is pre isely what assures that the densities at these twovalues of " oin ide and hen ewemay repla ethe sum oftwodensitiesthat we would otherwiseberequired to ompute bythefa tor2. We ould pro eedwithoutsu hsymmetrybutwouldthenneedto onsiderthedensitiesof"when ombining

(6)

AsinSe tion 3,welets N

()=1 expf '()=Ng,sothat Ns N

()!'(),anda ordingly s alebyN thetraje toryofthe orrespondingdis retetimesampler. Thelimiting ontinuoustime pro essthushasarateofmovingfrom((n)[T(;"))to bya ombine movewhi h isgiven by L()r() L((n)[T(;"))r((n)[T(;")) '()=k k+1 2p(") T(;") (;") (5)

Note that it is not ne essary to onsider the equivalent dis rete time RJMCMC sampler to obtaintheaboveresultasitispossibleto he kdire tlythatthelo al balan e

r()L() '()

k

=L((n)[T(;"))r( (n)[T(;"))q(((n)[T(;"));)

holdswithq(((n)[T(;"));)denedby(5). Spe ial areisrequiredwithsu h onsiderations howeversin ethetransitionkernelofthejump hain(denedinSe tion5)typi allydoesnothave adensityw.r.t.asingledominatingmeasure. Forexample,afterkillinga omponentthenewstate is ompletelyknowngiventhe urrentone. This problemalsoo urs forRJMCMC samplers,as exemplied by the measure onstru tion in Green (1995), and we do not detail it further here. Further reading on Markovjump pro esses is found in,for example, Ripley (1987)and Resni k (1994).

5. Sampling in continuous time

Whenrunningadis retetimeRJMCMCsampler,itsstateistypi allystored(sampled)afterea h steporsweep,oronregularintervalsinordertode reaseinter-sample orrelation,asinRi hardson andGreen(1997)andRobert,RydenandTitterington(2000),evenif onvergen eassessmentfor RJMCMCsamplersisstillinitsinfan y(BrooksandGiudi i, 1999).

In ontinuoustimesettings,therearemoreoptions. Forexample,thepro essmaybestoredat regulartimes,asin Stephens (2000a),ormaybesampled usingan independent Poissonpro ess. Ineither aseposteriormeansE[g()jy℄areestimatedbysamplemeansN

1 P N 1 g(( i )),where i

arethe sampling instants. Suppose weadopt the former sampling s heme. If wethen let the samplingdistan etendto zero,weee tivelyput aweightonea hstatevisitedbyf (t)gthatis equalto thelengthof theholding time in that state, when omputing thesample mean. Before elaboratingfurtheronthisidea,weintrodu esomeadditionalnotation.

Let T n

be the time of the n-th jump of f(t)g with T 0

= 0. By the jump hain we mean the Markov hain f (T

n

)g of states visited by f(t)g. We denote this hain by f e n g, that is, e n =(T n

). Let()bethetotalrateoff (t)gleavingstate,thatis,thesumofthebirthand alldeath rates, plusthe ratesof all other kinds of movesthere may be. Then theholding time T n T n 1 off(t)gin itsn-thstate e n

hasa( onditional)distributionwhi hisexponentialwith rate(

e n

).

Returning to thesampling s heme, we anthen redu e sampling variabilityby repla ing the weightT n T n 1 byitsexpe tation1=( e n 1

). Inthisway,thevarian esofestimatorsbuiltfrom thesampleroutputarede reasedbyvirtueoftheRao-Bla kwelltheorem,sin e

~ g= 1 N N X i=1 g( e n 1 ) ( e n 1 ) = 1 N N X i=1 E[T n T n 1 j e n 1 ℄g( e n 1 ):

Whensamplingf(t)gthis way,weonlysimulate thejump hain andstoreea hstateitvisitsas well asthe orresponding expe ted holding time. Alternatively, theexpe tedholding times may bere omputedlaterwhenpro essingthesampleroutput. Inordertosimulatethejump hainwe notethatitstransition lawisasfollows: theprobabilityofaneventhappeningisproportionalto itsrate. Hen e,forexample,theprobabilityofabirthis()=();ifabirtho ursthenthenew omponentweightandparameteraredrawnfromh(;(w;))asbefore. Thusweneedto ompute

(7)

6. An illustration for hidden Markov models

6.1. Setting

We onsiderinthisse tionanappli ationofthe ontinuoustimeMCMCmethodologytothe ase ofhiddenMarkovmodels,asinRobert, RydenandTitterington(2000). That is,theobservations y

t

aresu hthat, onditionalonahiddenMarkov hainfz n

g,withnitestatespa ef1;:::;kg,y n is distributedasanormalvariate

N( z n ; 2 zn )

Contrary to previous implementations, we hoose to parametrise the transition matrix for the Markov hainfz n gbyP=(! ij ),asfollows: P(z n+1 =jjz n =i)=! ij = X ` ! i` The! ij

'sarethereforenotidentied,but thisparameterisationisboundto fa ilitatetheMCMC moves(provided avagueproperprioris sele ted). As in Robert et al. (2000),weare interested in estimatingthenumberofhiddenstates,k. ThepriormodellingontheparametersisanExp(1) distributionon the! ij 's,anormal N(0;9 2 i )distribution on the i

'sand anExp(1) distribution onthe

i 's.

InRobertetal.(2000),themodelunder onsideration onsistedof

N(0; 2 z n

)

forthedistributionofy n

onditionalonz n

,i.e. didnotinvolveanunknownmeanparameter. For thismodel,weusethesameprior,namelyauniformU(0;)prioronthe

i

'sandanExp(5maxjx n

j) prioronthehyperparameter1=. (Robertetal.(2000)noti edthatthefa tor5intheexponential distribution was of little in uen e on the results.) Note that we do not impose identiability onstraintsatthesimulationlevelbyorderingthevarian es, ontraryto Robertetal.(2000).

A major dieren e with the above papers is that, as in Stephens (2000a), we will not use ompletiontorunouralgorithm.Thatistosay,thelatentMarkov hainfz

n

gisnottobesimulated bythealgorithm. This anbeavoidedthankstoboththeforwardre ursiverepresentationofthe likelihood for a hidden Markov model (Baum and Petrie, 1966), already used in Robert et al. (1999), and the random walkproposalsas in Hurn et al. (2001). We believe that this hoi e is bound to a elerate onvergen e ofthealgorithm byadrasti redu tionof thedimensionalityof thespa e.

6.2. The moves of the BDMCMC algorithm

Sin ereversiblete hnologywasimplementedforthismodelinRobertetal.(2000),wenowfo uson the ontinuoustimeMCMC ounterpart,extendingStephens(2000a)andHurnetal.(2001)tothis framework. Inaddition to birth-and-deathmoves,whi h were enoughtoprovidegood mixing in theabovepapers,wedoneedtointrodu eadditionalproposals,similartothoseinRi hardsonand Green (1997)and Robertet al. (2000), be auseweobservedthat thebirth-and-deathmovesare not,bythemselves,suÆ ienttoensurefast onvergen eoftheMCMCalgorithm. Theproposalswe add aresplit/ ombinemoves,followingthedenominationofRi hardsonandGreen(1997),where agiven omponentisbrokenintotwoparts,andxedkmoves,wheretheparametersaremodied via aregularMCMC step. (Thelater proposals arequintessentialin ensuringgood onvergen e properties.)

Thebirth-and-death andxed k movesare simpleto implement,and are equivalentto those given in Stephens (2000a) and Hurn et al. (2001), with xed k moves relying on random walk proposals overthe transforms log(!

i ) and log( i )|or log( i = i

) in the onstrained ase of Robertetal. (2000). Thesplit/ ombinemovefollowsthegeneralframeworkexposedin Se tion4 with a ombine intensity given by (5). We use '

S

as an individual split intensity whi h is the sameforall omponents. Thismeansthattheoverallintensityofasplit moveforak omponent ve toris '()=k'

S

. In thepra ti alimplementationof thealgorithm, we hose ' S

=' B

=2

(8)

Therearemanywaysofdevisingasplit/ ombinemovebut, ontrarytoRi hardsonandGreen's (1997)observationthat theirrstattempt wassu essful, wehad totryseveral proposalsbefore obtainingpropermixingbehaviour,asdetailed now.

Inthe aseofanormalhiddenMarkovmodelwithmeans i andvarian es 2 i bothunknown, asplitofstatet 0 intostatest 0

and(k+1)involvesfourdierenttypesofa tions:

(a) asplit movein rowj6=t 0 of! j;t0 as ~ ! j;t0 =! j;t0 " j ; !~ j;k +1 =! j;t0 (1 " j ); with " j

uniform on(0;1); thisproposalis sensible whenthinking that boththenew states k+1and t

0

are issued from thestatet 0

: theprobabilitiesto rea ht 0

arethus distributed betweentheprobabilitiestorea hthenewt

0

andto rea hk+1; (b) asplit movein olumni6=t

0 ! t 0 ;i as ~ ! t 0 ;i =! t 0 ;i j ; !~ k +1;i =! t 0 ;i = j where j

islognormalLN(0;1). Thesymmetry onstraint(4)isthussatised. Notethatwe rsttriedthismovewithahalf-Cau hyC

+

(0;1)proposal,whi halsopreservesthe distribu-tionbyinversion(thatis,

j

and1= j

havethesamedistribution),butthis ledto verypoor mixingpropertiesforthealgorithm;

( ) asplit movefor! t0;t0 as ~ ! t 0 ;t 0 =! t 0 ;t 0 " t 0 t 0 ; !~ t 0 ;k +1 = ! t 0 ;t 0 (1 " t 0 ) k +1 ; ~ ! k +1;t 0 =! t 0 ;t 0 " t 0 = t 0 ; !~ k +1;k +1 = ! t 0 ;t 0 (1 " t 0 )= k +1 where" t 0 isuniformon(0;1)and t 0 ; k +1 areLN(0;1); (d) asplit moveon( t 0 ; 2 t0 )as ~ t 0 = t 0 +3 t 0 " ; ~ k +1 = t 0 3 2 t0 " ; ~ 2 t0 = 2 t0 " ; ~ 2 k +1 = 2 t0 =" ; where" N(0;1) and" LN(0;1).

The ombinemoveis hoseninasymmetri way,sothatstatest 0

andt 1

are ombinedintostate t

0

by taking rst thegeometri average ofrowst 0

and t 1

in theexponential omponentmatrix, by adding olumnst

0 andt

1

, andnally swit hingstatest 1

andk+1. (One an he kthat this sequen eofmovesalsoapplies tothe parti ular ase of!

t0;t0

.) Themean t0

is obtainedasthe arithmeti averageofthemeans~

t0 and~

t1

,whilethevarian e 2 t 0

isthegeometri averageofthe varian es~ 2 t 0 and ~ 2 t 1

. AppendixBdetailsthe omputation ofthe orrespondingJa obian.

6.3. Illustrations

First,we onsiderasimulateddatasetof500observations,representedonFigure1(a);thisdataset wasbuilt by joining stret hes of three dierent normal samples that an bespotted dire tly on thegraph. Themostvisitedvalue(and posterior mode) ofk is3, asshownin Figure 1(b), with regular visits to 2 and 4. Larger values were hardly visited (although we used a at prior on k 2 f1;:::;10g). As shown by Figure 1(d), the orresponden e betweenthe estimated density, obtainedbyaveragingallthedensityestimatesovertheiterations,andthestandard nonparamet-ri kernelestimate, is quite satisfa tory. Note in addition that the parameter hains, separated omponent by omponent, produ ea label swit hing behaviour that is to be expe ted from the theory (see Hurn et al., 2001), as well as good mixing properties. (The graphs represented in Figure1a tually orrespondto50;000iterationsoftheMCMCalgorithm,with anaverageof25 jumpsperobservationunit.)

Our se ond dataset orresponds to atransform ofthe IBM sto koveraperiod of ve years, startingin1992,whi hrepresentsthevolatilityofthesto k[kindlyprovidedtousbyCatalinStari a (UniversiteLibre de Bruxelles)℄. As an be seenfrom therawplotof the datasetin Figure 2(a), thestatesareless learlyidentiedand,moreimportantly,thereseemstobefewermovesbetween

(9)

Simulated dataset

−5

0

5

0.00

0.05

0.10

0.15

0.20 −5

0

5 Number of states

temp[, 1]

Relative Frequency

2

4

6

8

10

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0

200

400

600

800 1000

1200

1

2

3

4

5

6 instants

number of states

Log likelihood value

temp[, 2]

Relative Frequency

−2500

−2000

−1500

−1000

−500

0.0000

0.0010

0.0020

0

200

400

600

800 1000

1200

−2500

−1500

−500

instants

log−likelihood

(a) (b)

0

500 1000

1500

0.0

0.2

0.4

0.6

0.8

1.0 pi

0

500 1000

1500

−10

−5

0

5

10 mu

0

500 1000

1500

0

5

10

20

30 sigma

−10

−5

0

5

0.00

0.05

0.10

0.15

( ) (d)

Fig. 1. Continuous time MCMC algorithm output for a simulated dataset of

500

points: (a) histogram and

rawplot of the dataset; (b) MCMC output on

k

(histogram and rawplot), the simulated number of states and

corresponding likelihood values; (c) MCMC sequence of the parameters of the three components when

conditioning on

k=3

; (d) MCMC evaluation of the marginal density, compared with R nonparametric density

(10)

havesimilarposteriorprobabilitiesand,inoppositiontoFigure1(b),thespreadoftheloglikelihood valuesismu hlarger, suggestingthat theposterior distribution hasseveral modes that anonly belinkedbyvisitingintermediatelowlikelihoodregions. Sin ethe asek=3isvisitedlessoften, thenumberofsimulationsin Figure 2( )islowerthanthenumberof simulationsin Figure1( ), but alsoexhibits the orre tlabel-swit hing behaviour andpropermixing features (even though one anspot longer regions when the hain remainsinvariant). Note also that the t in Figure 2(d)isjustassatisfa toryasthenonparametri estimation.

Fora omparisonwithRobertetal.(2000),wealso onsideronedatasetstudiedinthisprevious paper,namelythewindintensityinAthens[kindlyprovidedtousbyChristianFran q(Universite duLittoral)℄. On e again, the modelling setting slightly diers from the above: the means are nowallset to 0,the priordistributiononthe's is notanexponentialdistribution but rather a uniformU(0;), beingestimatedfromthedatasetin ahierar hi alwayandupdatedthrougha sli esampler(sin e the onditionaldistribution is atrun atedgamma) viaanadditionalpro ess withintensity'

,equalto1.

Figure3summarisestheoutputforthedataset orrespondingtothewind intensityinAthens. Themain pointisthat,asinRobert etal.(2000),weobtainamodeoftheposteriordistribution ofk at k=3, although the posteriordistribution slightly diersin our ase,sin e theposterior probabilitiesfor 1;2;3;4are :0064;:1848;:7584;:0488,to be ompared with Table 1in Robert et al.(2000). NotethatFigure3(b)providesinadditionthedistributionofthenumberofmovesper unitoftime(onthe ontinuoustimeaxis). Theloglikelihoodsarea tually overingawiderranger thanthosefoundin Robertet al.(2000),althoughthehighestvaluesarethesame. Forinstan e, thelargestlikelihood fork =2is 688,while it is 675for k=3and 670fork =4. Thet betweenthenonparametri densityandtheBayesianposterioraverageisquitea urate.

7. Discussion

ConsideringTheorem 1,onemaybe temptedto say`everythingthat may bedone in ontinuous time anbe done in dis rete time'. Whilethat might betrue from atheoreti al pointof view, thingsareless lear utwhenperforman e onsiderationsaretakenintoa ount.

Stephens (2000a)madesome omparisonsof hisalgorithm to Ri hardsonand Green's (1997) reversiblejump MCMCsampler,whi hwe ite:

A. Our algorithm works in ontinuous time, repla ing the a ept-reje t s heme by allowing eventstoo uratdieringrates.

B. Our dimension- hanging birth and death moves do not make use of the missing data z, ee tivelyintegratingoutoverthemwhen al ulatingthelikelihood.

C. Our birth and deathmoves takeadvantageof the natural nestedstru ture of the models, removingtheneedforthe al ulationofa ompli atedJa obian,andmakingimplementation morestraightforward.

D. Ourbirth anddeathmovestreattheparametersasapointpro ess,anddonotmakeuseof any onstraintsu has

1

<< k

[usedbyRi hardsonandGreen(1997)in deningtheir splitand ombinemoves℄.

We disagree with point C. sin e any Ja obian involved does appear in both ontinuous and dis retetime. Aswehaveseen,theJa obiandeterminant(1 w)

k 1

duetorenormalising ompo-nent weightsappearsin boththedeathrates(2) andthea eptan eratio (3). Indeed,Stephens (2000a,p.71)attributesthisdeterminanttoa`simple hangeofvariableformula'. Inourview,the determinantshouldbeasso iatedwiththeproposaldensityh,asthe(k+1) omponentparameter [(w;)isnotdrawndire tlyfromadensityon

(k +1)

butratherindire tlythroughrstdrawing (w;)andthenrenormalising. Inorderto omputetheresultingdensityon

(k +1)

onemustthen al ulateaJa obian. (Infa t,asnotedabove,thereisnodensityw.r.t.axedreferen emeasure on

(k +1)

.) WealsosawinSe tion6thattheJa obiandeterminantofthesplitand ombinemove doesappearin ontinuoustime. The omplexityis thereforeidenti alforbothmethodologies.

(11)

Starica’s IBM stochastic volatilities

−4

−2

0

2

4

0.0

0.1

0.2

0.3

0.4 −2

0

2

4 Number of states

temp[, 1]

Relative Frequency

2

4

6

8

10

0.00

0.10

0.20

0.30

0 1000

2000

3000

4000

5000

1

2

3

4

5

6

7

8 instants

number of states

Log likelihood value

temp[, 2]

Relative Frequency

−7000

−6000

−5000

−4000

−3000

−2000

−1000

0

0.000

0.002

0.004

0 1000

2000

3000

4000

5000

−6000

−4000

−2000

0 instants

log−likelihood

(a) (b)

0

50

100

150

0.0

0.2

0.4

0.6

0.8 pi

0

50

100

150 −10

0

10

20 mu

0

50

100

150

0

5

10

15 sigma

−4

−2

0

2

4

0.0

0.1

0.2

0.3

( ) (d)

Fig. 2. Continuous time MCMC algorithm output for a transform of

507

IBM stockprices: (a) histogram and

rawplot of the dataset; (b) MCMC output on

k

(histogram and rawplot), number of states, and corresponding

likelihood values; (c) MCMC sequence of the parameters of the three components when conditioning on

(12)

Wind intensity in Athens

−5

0

5

0.0

0.1

0.2

0.3

0.4

0.5 −6

−4

−2

0

2

4

6 Number of states

temp[, 1]

Relative Frequency

2

4

6

8

10

0.0

0.2

0.4

0.6

0

500 1000

1500

2000

2500

1

2

3

4

5 instants

number of states

Log likelihood values

temp[, 2]

Relative Frequency

−1400

−1200

−1000

−800

−600

−400

−200

0.000

0.010

0.020

0.030

0

500 1000

1500

2000

2500

−1400

−1000

−600

−200

instants

log−likelihood

Number of moves

temp[, 3]

Relative Frequency

5

10

15

20

25

30

0.00

0.05

0.10

0.15

0

500 1000

1500

2000

2500

5

10

20

30 instants

Number of moves

(a) (b)

0

500 1000

1500

0.0

0.2

0.4

0.6

0.8

1.0 pi

0

500 1000

1500

0

1

2

3

4

5 sigma

−5

0

5

0.0

0.1

0.2

0.3

0.4

0.5

0.6

( ) (d)

Fig. 3. Continuous time MCMC algorithm output for a sequence of

500

wind intensities in Athens; (a)

histogram and rawplot of the dataset; (b) MCMC output on

k

(histogram and rawplot), number of states,

and corresponding likelihood values; (c) MCMC sequence of the parameters of the three components when

conditioning on

k=3

; (d) MCMC evaluation of the marginal density, compared with R nonparametric density

(13)

unorderedbutthey anbesortedagain. Nonetheless,wedidnotimposeorderingwhensimulating the parameters with xed k's and, more importantly, did not restri t ourselves to implement ombinemovesonlyonadja ent omponentsasin Ri hardsonandGreen(1997).

Hen e,theaboveitem thatwendmostimportantis B.;whether themissing dataz iskept tra k of in all movesor not. It would indeed be interesting to ompare the performan e of two algorithms,indis reteor ontinuoustime,thatareidenti alex eptforthisaspe t. (Were allthat Robertetal.(2000)didresortto ompletionin theirimplementationofRJMCMC.)

Wenowpro eedtodis ussing omputationalaspe tsofdis reteand ontinuoustimealgorithms. In ontinuoustime, on e astate is entered, itis ne essaryto ompute theratesof allpossible movesleadingtoanexitfromthatstate,attheexpenseofO(k)forbirth/deathmovesandO(k

2 ) for split/ ombines ones. In dis retetime thisnotne essary,asthea eptan eratio ofamoveis not omputeduntilthemoveisproposed. ThisisanadvantageofreversiblejumpMCMC.Onthe other hand,for movessu has birth andsplit in ontinuous time, ratesare typi allyverysimple and itisonlythedeathor ombine ratesthatareexpensiveto ompute. Thisis anadvantageof ontinuoustimealgorithms.

What an wesayaboutthemixing performan eofthe dierentalgorithms? A typi alset-up of BDMCMCis to let() be onstant, say () =1 (a dierent onstantonly res alestime). Likewise, for RJMCMC b() =d() =1=2 is typi al, ex ept for states with k = 1 for whi h b()=1. UndertheseassumptionsEqs.(2)and(3)relateasA=(k+1)Æ

1

. Sin ebothsamplers have the samestationary distribution, wend that if one of thealgorithms performs poorly, so doesthe other one. ForRJMCMC this ismanifested asasmall A's|birthproposals arerarely a epted|whileforBDMCMCitismanifestedaslargeÆ's|new omponentsareindeedbornbut die againqui kly.

FinallyweagainmentionRao-Bla kwellisationasanadvantageof ontinuoustimealgorithms; this featureis, asnotedabove,obtainedatnoextra ost. Rao-Bla kwellisation ouldin prin iple be arried outin dis rete timeaswellsin ethe holdingtimes havegeometri distributions. But asopposed to ontinuous time,the expe tedholding times annot be omputedeasily{see(6) in theproofofLemma 1below.

References

Baum, L.E. and Petrie, T. (1966) Statisti al inferen e for probabilisti fun tions of nite state Markov hains. Ann. Math. Statist. 37,1554{1563.

Brooks,S.andGiudi i,P.(1999)DiagnosingConvergen eofReversibleJumpMCMCAlgorithms. InBayesianStatisti sVI,Bernardo,J.,Berger,J.,Dawid,A.P.andSmith,A.F.M.(Eds),733{742. OxfordUniversityPress,Oxford.

Cappe,O.andRobert,C.P.(2000)MCMC:Tenyearsandstillrunning! J. Amer. Statist. Asso . 95, 1282{1286.

Celeux, G., Hurn, M. and Robert, C.P. (2000) Computational and inferential diÆ ulties with mixture posteriordistributions. J.Amer. Statist. Asso . 95, 957{979.

Geyer,C.J.andMller,J.(1994)Simulationpro eduresandlikelihood inferen eforspatialpoint pro esses. S andinavian J.Statist. 21, 359-373.

Green, P.J.(1995)Reversiblejump Markov hainMonteCarlo omputation andBayesianmodel determination. Biometrika 82,711{732.

Grenander, U. and Miller, M. (1994) Representations of knowledge in omplex systems (with dis ussion). J. Roy. Statist. So . Ser. B 56,549{603.

Hurn,M.,Justel,A.andRobert,C.P.(2001)EstimatingmixturesofregressionsJ.Computational and Graphi al Statisti s(toappear).

Karr,A.F.(1975)Weak onvergen eofasequen eofMarkov hains. Z.Wahrs heinli hkeitstheorie verw. Gebiete 33,41{48.

Phillips, D.B. and Smith, A.F.M. (1996) Bayesian model omparison via jump diusions. In Markov hainMonteCarloinPra ti e,W.R.Gilks,S.T.Ri hardsonandD.J.Spiegelhalter(Eds.), 215{240. ChapmanandHall,London.

(14)

Ri hardson,S.andGreen,P.J.(1997)OnBayesiananalysisofmixtureswithanunknownnumber of omponents(withdis ussion). J. Roy. Statist. So . Ser. B 59, 731{792.

Ri hardson,S. and Green, P.J.(1998)Corrigendum: \OnBayesiananalysis ofmixtures withan unknownnumberof omponents". J.Roy. Statist. So . Ser. B 60, 661.

Robert, C.P., Ryden, T. and Titterington, D.M. (1999) Convergen e ontrols for MCMC algo-rithms,withappli ationstohiddenMarkov hains. J.Statisti alComputationandSimulation 64, 327{355.

Robert, C.P., Ryden, T. and Titterington, D.M. (2000) Jump Markov hain Monte Carlo algo-rithmsforBayesianinferen ein hiddenMarkovmodelsJ. Roy. Statist. So . Ser. B 62, 57{75. Ripley,B.(1987)Sto hasti Simulation. J.Wiley,NewYork.

Stephens,M.(2000a)Bayesiananalysisofmixturemodelswithanunknownnumberof omponents| analternativetoreversiblejumpmethods. Ann. Statist. 28, 40{74.

Stephens,M.(2000b)Dealingwithlabelswit hingin mixturemodels. J. Roy. Statist. So . Ser. B 62,795{809.

A. Proof of Theorem 1

Letfor2 (k ) , ()=()+ k X i=1 Æ(n(w i ; i );(w i ; i ))

be the overall rate of leaving state in the BDMCMC sampler and let N

() be the overall probabilityofmovingawayfromstate (in onestep) intheRJMCMCsampler.

Beforeprovingthetheorem, westateandprovealemma.

Lemma 1. For ea h k1 and 0 2 (k ) , there isa neighbourhood G (k ) of 0 su hthat sup 2G jN N () ()j!0asN!1.

Proof. Werstnotethat for2 (k ) , N () anbewritten N () = Z b N ()min fA N (;[(w;));1gh(;(w;))d(w;) + k X i=1 d N () 1 k minfA 1 N (n(w i ; i ););1g: (6) Thus sup 2G jN N () ()j Z sup 2G jNb N ()min fA N (;[(w;));1gh(;(w;)) ()h(;(w;))jd(w;) (7) + k X i=1 sup 2G j 1 k Nd N ()minfA 1 N (n(w i ; i ););1g Æ(n(w i ; i );(w i ; i ))j: (8)

Westartby lookingat the`birthpart'(7) ofthis expression. Weshall provethat it tendsto zerobyshowingthatthe integrandtendsto zeroforall(w;)and showingthat theintegrandis dominated,forallsuÆ ientlylargeN,byanintegrablefun tion. Bound theintegrandas

sup 2G jNb N ()minfA N (;[(w;));1gh(;(w;)) ()h(;(w;))j sup 2G jNb N () ()j1sup 2G h(;(w;)) (9)

+ sup()supjmin fA N

(15)

For 0andN >, N 1 2 2 N 2 1 e =N N ; sothat jN(1 e =N ) j 1 2 2 N :

Hen e,forsuÆ ientlylargeN (9)isboundedby

1 2N sup 2G 2 ()sup 2G h(;(w;)); (11)

by(A1)and(A3), foranappropriateGthisexpressiontendstozeroasN!1andisdominated byanintegrablefun tion.

Regarding(10), it is dominated by an integrablefun tion similar to (11)(remove1=2N and thesquare),anditremainstoshowthat ittendsto zeroasN !1. Wehave

jminfA N (;[(w;));1gh(;(w;)) h(;(w;))j =h(;(w;)) min L([(w;))r([(w;)) L()r() d N ([(w;)) b N () (1 w) k 1 ;h(;(w;)) :

By(A2),forea h(w;),L([(w;))r([(w;)) andL()r()areboundedawayfrom innity andzero,respe tively,onasuÆ ientlysmallG. Likewise,by(A1),d

N

([(w;)) andb N

()tend to unityandzero,respe tively,uniformlyoversu haG. Finally,by(A3),h(;(w;))isbounded onanappropriateG,andwe on ludethat(10)tendstozerouniformlyoverGasN!1ifGis small enough.

We now turn to the `deathpart' (8). By argumentssimilar to those above,for large N and suÆ ientlysmallGitholdsthat

1 k Nd N ()minfA 1 N (n(w i ; i ););1g = 1 k Nmin L(n(w i ; i ))r(n(w i ; i )) L()r() b N (n(w i ; i ))h(n(w i ; i );(w i ; i )) (1 w i ) k 2 ;d N () = L(n(w i ; i ))r(n(w i ; i )) L()r() 1 k Nb N ()h(n(w i ; i );(w i ; i )) (1 w i ) k 2

uniformly overG, and, also using argumentsasabove,one anshow therighthand side of this expression onvergestoÆ(n(w i ; i );(w i ; i

))asN !1,uniformlyoverasmallenoughG. 2

Re allthedenitionsofjump timesandthejump hain inSe tion5. Thesequen ef e n ;T n T n 1

)gofvisitedstatesandholdingtimesformaMarkovrenewalpro ess(MRP).Thetransition kernelofthisMRPisdenotedbyK,thatis,K(;AB)=P(

e n 2A;T n T n 1 2Bj e n 1 =). Sin e f (t)g isMarkov,the onditional distribution ofT

n T n 1 given e n 1 = is exponential with rate (). In addition, (T

n

) and T n

T n 1

are onditionally independent. Similarly, f

N

(t)g is a semi-Markov pro ess with jump times fT N n

g in the latti e i=N, and the kernel of the asso iated MRP is denoted by K

N . Sin e f N n g is Markov, N (T N n ) and T N n T N n 1 are

onditionally independentgiven N

(T N n 1

).

Proofof Theorem1. UsingresultsofKarr(1975),itissuÆ ienttoprovethatforea hreal-valued uniformly ontinuousfun tion g on[0;1),

(i) Kg()is ontinuouson;

(16)

Westartby showing (ii). Bythe stru ture of, it itsuÆ ient to showthat for ea h 0 2 (k ) , thereisaneighbourhoodG (k ) of 0 su hthatK N

g()!Kg()uniformly onG,andthisis whatwewilldo. For2

(k ) ,K

N

g()andKg( ) anbewritten

K N g() = 1 X m=1 Z (1 N ()) m 1 b N ()minfA N (;[(w;));1g h(;(w;))g([(w;); m N )d(w;) + 1 X m=1 (1 N ()) m 1 k X i=1 d N () 1 k minfA 1 N (n(w i ; i ););1gg(n(w i ; i ); m N ) = Z 1 0 Z (1 N ()) bNu Nb N ()minfA N (;[(w;));1g h(;(w;))g([(w;); dNue N )dud(w;) + Z 1 0 (1 N ()) bNu k X i=1 Nd N () 1 k minfA 1 N (n(w i ; i ););1gg(n(w i ; i ); dNue N )du; Kg() = Z 1 0 Z ()e ()u () () h(;(w;))g([(w;);u)dud(w;) + Z 1 0 k X i=1 ()e ()u Æ(n(w i ; i );(w i ; i )) () g(n(w i ; i );u)du = Z 1 0 Z e ()u ()h(;(w;))g([(w;);u)dud(w;) + Z 1 0 k X i=1 e ()u Æ(n(w i ; i );(w i ; i ))g(n(w i ; i );u)du;

wheredxeisthesmallestintegernosmallerthanx.

Weagainstartbylooking atthe`birthparts'ofthekernels,bounding the orrespondingpart ofjK N g() Kg( )jas Z 1 0 Z sup 2G (1 N ()) bNu Nb N ()minfA N (;[(w;));1gh(;(w;)) g([(w;); dNue N ) e ()u ()h(;(w;))g([(w;);u) dud(w;):

Wewishto provethat thisexpression tendstozeroasN !1. We andothis byshowingthat theintegrandtendsto zeroforallu0andall(w;)andthat thereexistsadominating(forall suÆ ientlylargeN)integrablefun tion.

Inordertoa omplishthis, weaddandsubtra tanumberofteles opingterms,givingus

sup 2G (1 N ()) bNu Nb N ()minfA N (;[(w;));1gh(;(w;))g([(w;); dNue N ) e ()u ()h(;(w;))g([(w;);u) sup 2G (1 N ()) bNu e ()u sup 2G Nb N ()1h(w;)jjgjj 1 + sup 2G e ()u sup 2G Nb N ()1h(w;)Æ g 1=N + supe ()u supjNb N () ()j1h(w;)jjgjj 1

(17)

+ sup 2G e ()u sup 2G () sup 2G jminfA N (;[(w;));1gh(;(w;)) h(;(w;)jjjgjj 1 ; where h(w;)= sup 2G h(;(w;)) and Æ g 1=N = sup ((;u);( 0 ;u 0 ))1=N jg(;u) g( 0 ;u 0 )j is g's

modulusof ontinuity; isametri making[0;1)separableand omplete. All oftheterms ontherighthandsidebuttherstone anbetreatedasintheproofofthelemma,withtheextra observationthat()()isboundedawayfromzeroon ompa tsubsetsof. Moreover,sin e

(1 N ()) bNu e N()bNu =e NN()(bNu =N) ;

thelemmaimpliesthatthersttermis,forlargeN,dominatedbyanintegrablefun tion. Finally

(1 N ()) bNu e ()u e N()bNu e ()u = e ()u e ()(bNu =N u)+bNu o(1=N) 1 ;

where, bythelemma,theo(1=N)-termisuniformoverasmallGsothattherighthandsidetends to zerouniformlyoversu haG. Theinequalitylog(1 x) x 2x

2

for0x1=2leadstoa reverseinequalitywhi h ishandledsimilarly.

The`deathparts'ofthekernels,thatis,boundingthe orrespondingpartsofjK N

g() Kg()j, anbehandled ombiningargumentsforthe`birthparts'andargumentsusedtoprovethelemma. Finallyrequirement(i)above anbeprovedusingentirelysimilarte hniques. 2

B. The Jacobian for the split-combine move

Theparts oftheJa obiandeterminant orrespondingtothesplitmoveinx6.2are

(a) ! j;t0 (b) 2! t0;i = i ( ) ! 3 t0;t0 " t 0 t 0 " t 0 = t 0 (1 " t 0 ) k +1 (1 " t 0 )= k +1 " t 0 " t 0 = 2 t0 0 0 0 0 (1 " t 0 ) (1 " t 0 )= 2 k +1 t 0 1= t 0 k +1 1= k +1 ; thatis, ! 3 t0;t0 " t 0 t 0 0 k +1 0 " t0 2" t0 = 2 t0 0 0 0 0 (1 " t0 ) 2(1 " t0 )= 2 k +1 (1+ t0 )=2 0 (1+ k +1 )=2 0 =4! 3 t 0 ;t 0 " t0 (1 " t0 ) t 0 k +1 (d) 1 3" =2 t0 3 t0 0 1 3" =2 t 0 3 t 0 0 0 " 0 2 t0 0 1=" 0 2 t0 =" 2 =12 3 t0 ="

giventhatwedierentiatew.r.t. 2 t 0 , not t0 .

TheoverallJa obiandeterminantforthesplitmoveistherefore

J = T(;") (;") =3 Y i ! i;t 0 ! t 0 ;i i ! t 0 ;t 0 " t 0 (1 " t 0 ) k +1 " 3 t0 2 k +3 :