HAL Id: hal-00505406
https://hal-mines-paristech.archives-ouvertes.fr/hal-00505406
Submitted on 23 Jul 2010
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Bayesian inversion by parallel interacting Markov chains
Thomas Romary
To cite this version:
Thomas Romary. Bayesian inversion by parallel interacting Markov chains. Inverse Problems in Science and Engineering, Taylor & Francis, 2010, 18 (1), pp.111-130. �10.1080/17415970903234620�.
�hal-00505406�
ThomasRomary
∗∗
Centre de Géosienes,Equipe Géostatistique, Eole des Mines de Paris,
35rue Saint-Honoré ,
77300,Fontainebleau, Frane
April15, 2009
Abstrat
Markov hains Monte-Carlo (MCMC) methods are known to produe samples of virtually
any distribution. Theyhave alreadybeen widelyused intheresolution ofnon-linear inverse
problemswherenoanalytialexpressionfortheforward relationbetweendataand modelpa-
rametersisavailable, andwherelinearizationisunsuessful. However, inBayesian inversion,
thetotalnumberofsimulationsweanaordishighlyrelatedtotheomputationalostofthe
forward model. Hene, theomplete browsing of thesupport of theposterior distribution is
hardlyperformedat naltime,espeiallywhentheposteriorishighdimensionaland/ormul-
timodal. In the latterase,the hainmay staystukinone ofthemodes. Reently,theidea
ofmakinginterat severalMarkovhainsat dierent temperatures hasbeenexplored. These
methods improve themixing propertiesof lassial singleMCMC. Furthermore,these meth-
ods anmakeeientuseoflargeCPUlusters,withoutinreasingtheglobal omputational
ost withrespetto lassial MCMC.
Keywords
Inverse problem ; Bayesian inversion ; MCMC ; interating Markov hains ; tempering ;
Historymathing
∗ ∗
Correspondingauthor. Email: thomas.romarymines-paristeh.fr
Monte-Carlo methods are beoming inreasingly important for the solution of nonlinear in-
verse problems. Typially,theinverse problemis formulated as a searh for solutions tting
the data within a ertain tolerane, given by data unertainties. In a non-probabilisti set-
ting this means that we searh for solutions with alulated data whose distane from the
observed islessthan a xed, positive number. InaBayesian ontext, thetolerane issoft: a
largenumberof samplesof statistiallynear-independent modelsfrom thea posterior proba-
bilitydistribution aresought. Suh solutions areonsistent withdataand prior information,
as they t the data within error bars, and adhere to soft prior onstraints given by a prior
probabilitydistribution.
Preisely,weonsiderthestudyofasystem
X ∈ X
,onwhihwehaveanindiretmeasure-ment
d
,thatis funtion of the state ofX
, modeled byF(X)
,and some a prioriinformation underthe form of the prior distributionP (X)
. We also onsider thatthe measurementd
isaetedbyanerrorandthatwe knowhowtosimulate
F
upto anapproximationerror,both errors being aounted for byP (d | X)
. We also dene the joint distributionP (d, X)
. Then,assuming thatall these distributions admit a densitywithrespetto theLebesgue measure,
denoted
f ( · )
,the onditional density ofX
withrespettod
takesthefollowing form:f (X | d) = f (d | X)f(X) R
X
f (d, X)dX .
(1)This is the Bayesian formulation of inverse problem and
P (X | d)
, whose densityf (X | d)
, isthe posterior distribution, see [1℄. The formula (1) shows that this problem an be viewed
asa lassialstatistialinfereneproblem, where wewantto sampleindependentrealizations
from theposterior distribution. Note that thenormalization onstant in (1)is generally in-
tratable in high-dimensional problems. Therefore, we onsider that the posterior is known
upto aonstant, beingdened fromthepriorknowledgeonthesystemstudiedand thedata
withits assoiated measurement error.
There exists several methods for solving (1 ) suh as the Kitanidis-Oliver algorithm (see
[2℄ and [3℄), developed for petroleum engineering appliations and the neighbourhood algo-
rithm([4 ℄and[5℄),developedforgeophysial inverseproblems. Inspiteofitsuniversalitythe
speedofonvergene oftherstone isontroversial: itonsistsinperformingalargenumber
of optimizations withan observed datum perturbed aording to its measurement error. It
is partiularly diult to know how many optimizations should be performed. The seond
one seems to be limited for low-dimensional problems: itan be seen asa geometri version
of an iterated importane sampling sheme (see e.g. [6 ℄, hapter 14). This artile fous on
Monte-Carlo Markovhains (MCMC)methods for their universalityand therelative ease of
their implementation.
MCMC methods suit indeed partiularly for this problem, asthey areknownto produe
samples ofvirtually anyposterior distribution. Two problems mayarisethen. Onone hand,
thedimensionof theproblemmaybe solarge thatthehainhasto be run foran intratable
numberofiterationsto onvergeand toahieve aneient sampling oftheposterior, wesay
that they have weak mixing properties. On the other hand, an evaluation of the forward
operator
F
an be very omputer demanding so that the pratitioner wishes to minimize thenumber ofiterations. Moreover, when the posterior hasseveral disonneted modesin ahigh-dimensional spae, whih isoftenthease innonlinear Bayesian inversion, theproblem
of exploring the wholesupport of theposterior is a diult one. It an be shown thateven
for very simple problems most lassial Markov hain algorithms an fail at identifying the
main modesofthe posterior,beauseof their lakof mixing (see[7 ℄).
olletionof hains inparallel at dierent temperatures and allowing them to interat. This
method is not more omputer demanding than lassial MCMC sine it an be easily paral-
lelized.
Thispaperaimsat providing researhers andengineers withsome reipesto applyinter-
ating MCMC methods. Thus, itbegins in setion2 with basisfor MCMC methods, some
examplesoflassialalgorithmandearlierattemptstoimprovemixingpropertieslikeanneal-
ingand tempering tehniques, whihrely onthesame basi priniplesasinterating MCMC
tehniques, exposed in setion 3. In setion 4, we will show an appliation to a reservoir
engineering problem. The paperends withsomeonlusions andperspetiveof futurework.
2 Markov hains Monte-Carlo methods
MCMC,introduedbyMetropolisetal. [8℄,isapopularmethodfor generatingsamplesfrom
virtuallyanydistribution
π
dened on( X , B ( X ))
,whereB ( X )
standsfortheBorelsetsofX
.In partiular there is no need for the normalizing onstant of
π
to be known and the spaeX ⊆ R
d (for some integerd
) on whih itis dened an be high dimensional. We reall here some lassial results on MCMC methods. For a omprehensive review of MCMC, see [6℄,hapters 6to 13. For a more detailedaount on Markovhainstheory,see [9 ℄.
2.1 Priniples
ThemethodonsistsinsimulatinganergodiMarkovhain
{ X
n, n ≥ 0 }
onX
withtransitionprobability
P
suh thatπ
isa stationary density for thishain, i.e.∀ A ∈ B ( X )
:Z
X
P (x, A)π(x)dx = π(A).
(2)Suhsamples an be usede.g. to ompute integrals
π(h) = Z
X
h(x)π(x)dx,
(3)estimating thisquantityby
S
n(h) = 1 n
n
X
i=1
h(X
i),
(4)for some
h : X → R
. A very useful onept in onstruting ergodi Markov hains is re- versibility. A Markovhain isreversibleifit satisesthedetailed balane ondition:P (x, dy)π(dx) = P (y, dx)π(dy).
(5)Thismeansthat, ifstartedinstationarity, theMarkovhainhasthesame haneof starting
at
x
andjumping toy
asstartingaty
andjumping tox
.We illustrate the priniples of MCMC with the Metropolis-Hastings (MH) update. It
requiresthe hoie of a proposal distribution
q
. The role ofq
onsistsin proposing potentialtransitions for the Markov hain. Given that the hain is urrently at
x
, a andidatey
isaepted withprobability
α(x, y)
dened as:α(x, y) =
( min n
1,
π(y)π(x)q(y,x)q(x,y)o
if
π(x)q(x, y) > 0,
1
otherwise.(6)
Otherwise,itisrejeted and theMarkovhainstays at itsurrent loation
x
. Thetransitionkernel
P
ofthis Markov haintakesthe form, for(x, A) ∈ X × B ( X )
:P(x, A) =
Z
A
α(x, y)q(x, y)dy + 1
A(x) Z
X
(1 − α(x, y))q(x, y)dy.
(7)The Markov hain dened by
P
is reversible with respet toπ
and therefore admitsπ
asinvariant distribution. Conditionsontheproposaldistribution
q
thatguaranteeirreduibility and positive reurreneareeasy to meet and manysatisfatoryhoies arepossible.2.2 Some examples of Metropolis-Hastings samplers
The arbitrariness of the hoie of
q(x, · )
allows onsiderable freedom to design a multitude of dierent hains, eah with stationary distributionπ
, although in the Bayesian inversionframework,
q
shouldrelyontheaprioridistribution. Someexamplesinlude(see[6℄,hapter 7,for more examples):1. theindependentsampler(IMH):
q(x, y) = q(y)
,whereq
isgenerallythepriorinBayesianinversion,
2. thesymmetri inrements random-walk sampler (SIMH):
q(x, y) = q( | y − x | )
, whereq
an be azero-mean versionof theprior,
3. theLangevin sampler(LMH): assuming that
π
is dierentiable onX
,itallows to takeadvantage ofthegradient information to givethesampling diretion,
q
takestheform:q(x, y) ∼ N
x + h
22 ∇ log(π(x)), h
2I
d,
(8)where
h
isa parameter to hooseaording to e.g. [10℄ or [11℄. Notethat abad hoieof
h
an indue erratibehaviour ofthehain,4. Theadaptive algorithm of [12 ℄ (ASIMH): Inthis algorithm,
y
isproposedaordingtoq
θn(x, · ) = N (x, Γ
n)
,whereθ = (µ, Γ)
. We also onsider a non-dereasing sequene of positive step sizes{ γ
n}
,suh thatP
∞n=1
γ
n= ∞
andP
∞n=1
γ
n1+δ< ∞
for someδ > 0
.Inpratie,we generallyuse:
γ
n= 1/n
,assuggestedin[12 ℄. Theparameter estimationalgorithm takes the following form:
µ
n+1= µ
n+ γ
n+1(X
n+1− µ
n) , n ≥ 0,
Γ
n+1= Γ
n+ γ
n+1(X
n+1− µ
n) (X
n+1− µ
n)
tr− Γ
n,
(9)5. TheGibbssampler: Here
X = X
1× . . . ×X
d,andq = q
ileavesalloordinatesxedexeptthe
i
thone,whihitproposesaordingtotheonditionaldistribution(x
i|{ x
j}
j6=i)
. Thisimplies that
α(x, y) = 1
for allx
andy
, sothere are no rejetions. If theresultingi
thomponentGibbssamplerisalled
P
i,thentheseomponentsanbeombinedtoyieldthe random-san Gibbs sampler whih is the average
P
RS=
1d(P
1+ . . . + P
d)
, or thedeterministi-san Gibbs sampler whih isthe produt
P
DU= P
1· · · P
d.2.3 Comments
One of the problems with Metropolis-Hastings algorithms is the abundane of hoie avail-
ablefor hoosingthe proposaldistribution
q(x, · )
. For instane even ifthetype ofalgorithmappropriate for
π( · )
. Suh a problemis knownas a saling problem. To make this questionmore onrete, onsider the following problem. Suppose that
q(x, · )
is distributed as thed- dimensional normaldistributionN (x, σ
2I
d)
,for someσ
2> 0
. We reallthat theaeptaneprobabilities forthis algorithm aregiven by(6). For verysmallvalues of
σ
2,smalljumpsareattempted bythe algorithm, and beause of the form of (6 ), these moves arealmost always
aepted. The Markovhain mixes very slowly beause its inrements are sosmall. Onthe
other hand,if
σ
2 is hosento be very large, longdistane jumps areattempted bythe algo-rithm,mostofwhiharerejeted. Thealgorithmthereforespendslongperiods oftimeinthe
same state,and thus the algorithm still onverges slowly. For thisproblem, "very large" and
"very small" have to be interpreted in a way related to the partiular form of
π
. It seemsreasonable that "moderate" values of
σ
2 should be preferred. However, it is diult to seehowtogureoutwhatvaluesare"moderate",espeiallyif
π
isveryompliated. InBayesianinversionontext,therandomwalktypealgorithms, liketheSIMHortheLMH,generallyfail
at identifyingdierent modes. Inlarge dimensional spae, thesalingfator generallyhasto
be "small" so as to get an aeptable aeptane rate (6). Therefore, these two algorithms
perform generallyaloalexplorationandarehardlyabletojumpfromonemode toanother.
We will thenreferto them as"loal" samplers. Note thatthey arealsogenerally really slow
to onvergetowards thestationaryregime inBayesian inversionontext.
Conversely, the IMH does not need any tuning. It will explore largely thesurfae of the
posterior distribution andwe will referto itasa"global" sampler. Nevertheless, inpratial
appliations,unless
q
istheposteriordistribution,thetransitionswillobviouslyalmostalways be rejeted.Finally,weannotieherethatthehaingeneratedbytheadaptivealgorithm isnolonger
homogeneous, butit anbeproved (see[12 ℄, [13℄and [14 ℄inamore general framework) that
it has the orret ergodi properties. The idea of adaptive sampling is to improve the pro-
posal eieny,makingitaslose aspossible to theposterior density. However, itshouldbe
stressed herethat thealgorithm presented above generallyfails inmulti-modal ontext for a
lownumber ofiterations (see e.g. [7 ℄). Regarding theGibbs sampler,itdoesnot seem to be
welladaptedtotheBayesian inversionproblem: theimportantnumberofallsoftheforward
modellimitsits relevany.
Due to the sequential nature of MCMC algorithm and to takle multi-modality problems,
MCMC pratitioners generally use several hains that they run in parallel. By simulating
several hains, variability and dependene on the initial value are redued and it should be
easierto ontrol onvergene to thestationarydistribution byomparing theestimation, us-
ing dierent hains, of quantities of interest. However, good performanes of these parallel
methods require a degree of a priori knowledge on the distribution of interest
π
, in orderto onstrut an initial distribution on
X
whih takesinto aount the featuresofπ
(modes,shape of highdensity regions,et.). Thisis rarely theasein Bayesian inversion. Moreover,
in highly non-linear setups, like in Bayesian inversion, a slow mixing hain will presumably
stayinthe neighborhood ofthe starting point witha highprobability (see [6 ℄hapter 12 for
a morethorough disussion).
Duetothe omplexityoftheposteriordistribution(e.g. multi-modality and/ordisonneted
support) in Bayesian inversion problems and lassial limitations of MH algorithms, other
methods than lassial MHalgorithm shouldbe investigated. Simulated annealing andtem-
pering, whih arepresented inthe nextparagraph, onsists instudying modied versions of
theposterior.
The simulated annealing algorithm has been introdued by [8℄, then generalized by [15℄ for
optimization problems. It an be applied to both optimization and simulation problems (see
[6℄ and referene therein). The simulated tempering has been introdued independently in
[16 ℄ and[17℄.
Thefundamentalideaofthesealgorithmsisthatahangeofsale,namedtemperature,allows
larger moves on the surfae of the distribution to explore, ompared with lassial MCMC
methods. Indeed, this hange of sale allows to avoid thehainto remain trapped ina loal
mode.
The name and inspiration of the rst one ome from annealing in metallurgy, a tehnique
involving heating and ontrolled ooling of a material to inrease the size of its rystals
and redue their defets. The heat auses the atoms to beome unstuk from their initial
positions (a loal minimum of the internal energy) and wander randomly through states of
higher energy;theslowooling gives themmore hanes of ndingongurations withlower
internalenergythantheinitialone. Converselythetemperingisabrutaloolingfollowedbya
aontrolledreheatingoftheworkpieetoatemperaturebelowitslowerritialtemperature.
Preipitationhardening alloys,likemanygrades ofaluminumandsuperalloys,aretempered
to preipitateintermetalli partileswhihstrengthen themetal.
Thesetwo methods aim partiularly at generating samplesfrom Gibbsdistribution.
Denition 2.1 A
X
-valued random eldX
, is a Gibbs eld of energyE
, if its probabilitydensityfuntion (with respet to the Lebesgue measure) is:
f (x) = 1
Z e
−E(x), Z = Z
X
e
−E(x)dx,
(10)named Gibbs density.
For pratial problems, the onstant
Z
is generally intratable due to the dimension ofX
.Notethatina wide variety ofinverse problems,theposteriordistribution (1)takestheform
(10 ); for instane when both prior and measurement error areassumed Gaussian. Note also
that simulated annealing and tempering are not onned to ope with Gibbs distributions.
We present here both algorithms inthis framework for sake of simpliity. For other kind of
targetdistributions
π
,thepratitionerhastoonsiderattenedversionsgiven byπ
T= π
1/T.2.4.1 Simulated annealing
Given a positive temperature
T
, a Markov hainX
is generated from the following Gibbsdensity:
π
T(x) ∝ exp ( − E(x)/T ).
(11)The simulated annealing is performed by gradually lowering the temperature
T
from ahighvaluetonear-zero. Closeto
T = 0
theGibbsdistribution approximatesadelta funtion attheglobal minimumforE(x)
(ifitisunique). Forsimulation purposes,theooling anbestopped at the value
T = 1
.This algorithm an be viewed as a non-homogeneous version of the MH algorithm. In-
deed,sine
T
dereasesalongthealgorithm,thekernelofthehainvarieswithtime. ClassialtheoretialresultsonMarkovhainsdoesnotapplyforthisalgorithm. Heuristirulesaregen-
erallyappliedtoensurethevalidityofsimulated annealing: thestartingtemperature mustbe
highenough andits dereaseslow enough. Aonvergene resultexists, foroptimization pur-
pose, withtheonditionthat
T
dereasesas1/ log(n)
. Inpratie, ageometriallydereasing sequeneis generallyused.Theprinipleofsimulatedtemperingislinkedtothesimulatedannealingoneinthesensethat
we will again onsider Gibbs distributions saled by a temperature parameter
T
. However,this algorithm aims at sampling from a Gibbs distribution
π
rather than minimizing theenergy of the system. Weonsider herea nite sequeneof temperatures and theassoiated
Gibbs distributions. In this algorithm, we authorize the hain to hange temperature level
aordingtoagivenprobability. Thiswillallowthehaintogo baktohigher temperatures,
esapingeventual loalmodesof thetarget distribution, thatresults inbetter mixing.
Werstdene aninreasingsequeneoftemperatures
1 = T
0< . . . < T
K,withitsassoiatedGibbs densities
π
i(x) ∝ e
−E(x)
Ti , an auxiliary
{ 0, . . . , K }
-valued variableM
, and the jointdistribution:
µ(x, m) = ρ
mπ(x),
K
X
i=0
ρ
m= 1.
(12)We also dene the probabilities
p
U andp
D of moving "up", fromm
tom + 1
,and "down", fromm
tom − 1
,withonlyT
hanging, the hainbeing at temperatureT
m,and theproba-bilityof hoosing a xedlevelmove
(1 − p
U− p
D)
.The priniple isto simulate a
X × { 0, . . . , K }
-valuedhain(X
n, M
n)
. Denoting respetivelyq
i→i+1(X
n+1| X
n= x
n)
andq
i+1→i(X
n+1| X
n= x
n)
the probabilities of transition proposi- tion towards the superior and theinferior temperature level, theaeptaneprobability of atransitionfrom
T
i toT
i+1 isproportional to:ρ
i→i+1(x
n, x
n+1) = p
Dp
Uπ
i+1π
iq
i→i+1(X
n+1= x
n+1| X
n= x
n)
q
i+1→i(X
n+1= x
n+1| X
n= x
n) .
(13)To maintain the detailed balane ondition, it is then neessary that
ρ
i+1→i(x
n+1, x
n) = 1/ρ
i→i+1(x
n, x
n+1)
and to hoose the proposition distributionsq
i→i+1(X
n+1| X
n= x
n)
andq
i+1→i(X
n+1| X
n= x
n)
aordingly. Still, it isimportant to notie that (13)dependson thenormalization onstants of
π
i+1 andπ
i. We an bypass this diultybydesigning an equalnumberofmovesfrom
m
tom + 1
andfromm + 1
tom
andbyaeptingtheentire sequeneasa single proposal, thus aneling the normalizing onstants intheaeptane probability,
asdesribed e.g. in[18℄.
2.4.3 Comments
Attempts to improve mixing properties of the hain by simulated annealing fail generally
beause ofthe monotonousdereaseintemperature; ifthehaingetsinaloalmode,itmay
be impossibleto esapeit ifthetemperature is alreadytoolow.
Conerning the simulated tempering, the potential gain in a better exploration of the
supportofthetargetdistribution, soastosay,abettermixing,doesnot seemtoompensate
fortheinreasedamountofforward operatorevaluationsforinverseproblems(
2K
bigger,fortheshemepresentedabove). However,thepresentationofthismethodisagoodintrodution
to theinterating Markovhainsalgorithms exposedinthenext setion.
3 Parallel interating Markov hains
Thepriniple ofmakinginterat MarkovChainsrstappearsin[19 ℄ underthenameparallel
tempering (PT). It has been mostly applied in physio-hemial simulations, see [20 ℄ and
referenes therein. It is known in the literature under dierent names suh as: exhange