Bayesian inversion by parallel interacting Markov chains

HAL Id: hal-00505406

https://hal-mines-paristech.archives-ouvertes.fr/hal-00505406

Submitted on 23 Jul 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Thomas Romary

To cite this version:

Thomas Romary. Bayesian inversion by parallel interacting Markov chains. Inverse Problems in Science and Engineering, Taylor & Francis, 2010, 18 (1), pp.111-130. ⟨10.1080/17415970903234620⟩. ⟨hal-00505406⟩

Thomas Romary*

Centre de Géosciences, Equipe Géostatistique, Ecole des Mines de Paris, 35 rue Saint-Honoré, 77300 Fontainebleau, France

April 15, 2009

Abstract

Markov chains Monte-Carlo (MCMC) methods are known to produce samples of virtually any distribution. They have already been widely used in the resolution of non-linear inverse problems where no analytical expression for the forward relation between data and model parameters is available, and where linearization is unsuccessful. However, in Bayesian inversion, the total number of simulations we can afford is highly related to the computational cost of the forward model. Hence, the complete browsing of the support of the posterior distribution is hardly performed at final time, especially when the posterior is high dimensional and/or multimodal. In the latter case, the chain may stay stuck in one of the modes. Recently, the idea of making several Markov chains at different temperatures interact has been explored. These methods improve the mixing properties of classical single MCMC. Furthermore, these methods can make efficient use of large CPU clusters, without increasing the global computational cost with respect to classical MCMC.

Keywords

Inverse problem; Bayesian inversion; MCMC; interacting Markov chains; tempering; History matching

* Corresponding author. Email: thomas.romary@mines-paristech.fr

1 Introduction

Monte-Carlo methods are becoming increasingly important for the solution of nonlinear inverse problems. Typically, the inverse problem is formulated as a search for solutions fitting the data within a certain tolerance, given by data uncertainties. In a non-probabilistic setting this means that we search for solutions with calculated data whose distance from the observed data is less than a fixed, positive number. In a Bayesian context, the tolerance is soft: a large number of samples of statistically near-independent models from the a posteriori probability distribution are sought. Such solutions are consistent with data and prior information, as they fit the data within error bars, and adhere to soft prior constraints given by a prior probability distribution.

Precisely, we consider the study of a system $X \in \mathcal{X}$, on which we have an indirect measurement $d$ that is a function of the state of $X$, modeled by $F(X)$, and some a priori information in the form of the prior distribution $P(X)$. We also consider that the measurement $d$ is affected by an error and that we know how to simulate $F$ up to an approximation error, both errors being accounted for by $P(d \mid X)$. We also define the joint distribution $P(d, X)$. Then, assuming that all these distributions admit a density with respect to the Lebesgue measure, denoted $f(\cdot)$, the conditional density of $X$ with respect to $d$ takes the following form:

$$f(X \mid d) = \frac{f(d \mid X)\, f(X)}{\int_{\mathcal{X}} f(d, X)\, dX}. \tag{1}$$

This is the Bayesian formulation of the inverse problem, and $P(X \mid d)$, whose density is $f(X \mid d)$, is the posterior distribution, see [1]. Formula (1) shows that this problem can be viewed as a classical statistical inference problem, where we want to sample independent realizations from the posterior distribution. Note that the normalization constant in (1) is generally intractable in high-dimensional problems. Therefore, we consider that the posterior is known up to a constant, being defined from the prior knowledge on the system studied and the data with its associated measurement error.
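As an illustration of a posterior known only up to a constant, the following minimal sketch evaluates an unnormalized log-posterior. The Gaussian prior, the Gaussian error model, the toy forward model `forward` and all parameter names are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def forward(x):
    # Hypothetical nonlinear forward model F (illustration only).
    return np.array([x[0] ** 2 + x[1], np.sin(x[0])])

def log_posterior_unnormalized(x, d, prior_mean, prior_std, noise_std):
    """Return log f(d | x) + log f(x), up to an additive constant."""
    x = np.asarray(x, dtype=float)
    # Gaussian prior (assumed), independent components.
    log_prior = -0.5 * np.sum(((x - prior_mean) / prior_std) ** 2)
    # Gaussian error model (assumed): d = F(x) + noise.
    residual = d - forward(x)
    log_likelihood = -0.5 * np.sum((residual / noise_std) ** 2)
    return log_likelihood + log_prior

x_true = np.array([1.0, 0.5])
d_obs = forward(x_true)  # noise-free synthetic datum
lp_true = log_posterior_unnormalized(x_true, d_obs, 0.0, 2.0, 0.1)
lp_far = log_posterior_unnormalized([3.0, -2.0], d_obs, 0.0, 2.0, 0.1)
```

Only differences of such log values enter the samplers discussed in this paper, which is why the normalization constant of (1) is never needed.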

There exist several methods for solving (1), such as the Kitanidis-Oliver algorithm (see [2] and [3]), developed for petroleum engineering applications, and the neighbourhood algorithm ([4] and [5]), developed for geophysical inverse problems. In spite of its universality, the speed of convergence of the first one is controversial: it consists in performing a large number of optimizations with an observed datum perturbed according to its measurement error. It is particularly difficult to know how many optimizations should be performed. The second one seems to be limited to low-dimensional problems: it can be seen as a geometric version of an iterated importance sampling scheme (see e.g. [6], chapter 14). This article focuses on Markov chains Monte-Carlo (MCMC) methods for their universality and the relative ease of their implementation.

MCMC methods are indeed particularly well suited to this problem, as they are known to produce samples of virtually any posterior distribution. Two problems may then arise. On the one hand, the dimension of the problem may be so large that the chain has to be run for an intractable number of iterations to converge and to achieve an efficient sampling of the posterior; such chains are said to have weak mixing properties. On the other hand, an evaluation of the forward operator $F$ can be very computationally demanding, so that the practitioner wishes to minimize the number of iterations. Moreover, when the posterior has several disconnected modes in a high-dimensional space, which is often the case in nonlinear Bayesian inversion, exploring the whole support of the posterior is difficult. It can be shown that even for very simple problems most classical Markov chain algorithms can fail at identifying the main modes of the posterior, because of their lack of mixing (see [7]).

[...] collection of chains in parallel at different temperatures and allowing them to interact. This method is not more computationally demanding than classical MCMC since it can be easily parallelized.

This paper aims at providing researchers and engineers with some recipes to apply interacting MCMC methods. Thus, it begins in section 2 with the basics of MCMC methods, some examples of classical algorithms and earlier attempts to improve mixing properties, like annealing and tempering techniques, which rely on the same basic principles as the interacting MCMC techniques exposed in section 3. In section 4, we show an application to a reservoir engineering problem. The paper ends with some conclusions and perspectives for future work.

2 Markov chains Monte-Carlo methods

MCMC, introduced by Metropolis et al. [8], is a popular method for generating samples from virtually any distribution $\pi$ defined on $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, where $\mathcal{B}(\mathcal{X})$ stands for the Borel sets of $\mathcal{X}$. In particular there is no need for the normalizing constant of $\pi$ to be known, and the space $\mathcal{X} \subseteq \mathbb{R}^d$ (for some integer $d$) on which it is defined can be high dimensional. We recall here some classical results on MCMC methods. For a comprehensive review of MCMC, see [6], chapters 6 to 13. For a more detailed account of Markov chain theory, see [9].

2.1 Principles

The method consists in simulating an ergodic Markov chain $\{X_n, n \geq 0\}$ on $\mathcal{X}$ with transition probability $P$ such that $\pi$ is a stationary density for this chain, i.e. $\forall A \in \mathcal{B}(\mathcal{X})$:

$$\int_{\mathcal{X}} P(x, A)\, \pi(x)\, dx = \pi(A). \tag{2}$$

Such samples can be used e.g. to compute integrals

$$\pi(h) = \int_{\mathcal{X}} h(x)\, \pi(x)\, dx, \tag{3}$$

estimating this quantity by

$$S_n(h) = \frac{1}{n} \sum_{i=1}^{n} h(X_i), \tag{4}$$

for some $h : \mathcal{X} \to \mathbb{R}$. A very useful concept in constructing ergodic Markov chains is reversibility. A Markov chain is reversible if it satisfies the detailed balance condition:

$$P(x, dy)\, \pi(dx) = P(y, dx)\, \pi(dy). \tag{5}$$

This means that, if started in stationarity, the Markov chain has the same chance of starting at $x$ and jumping to $y$ as starting at $y$ and jumping to $x$.
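On a finite state space, stationarity (2) and detailed balance (5) can be checked numerically. The sketch below builds a small Metropolis transition matrix and verifies both properties; the three-state target and the uniform proposal are arbitrary choices made for this illustration.

```python
import numpy as np

# Target distribution on three states (arbitrary illustrative values).
pi = np.array([0.2, 0.3, 0.5])
n = len(pi)

# Symmetric proposal: pick one of the other states uniformly.
q = (np.ones((n, n)) - np.eye(n)) / (n - 1)

# Metropolis transition matrix: off-diagonal entries are proposal times
# acceptance probability, the diagonal collects the rejected mass.
P = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            P[i, j] = q[i, j] * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()

flow = pi[:, None] * P                  # flow[i, j] = pi_i * P_ij
reversible = np.allclose(flow, flow.T)  # discrete analogue of (5)
stationary = np.allclose(pi @ P, pi)    # discrete analogue of (2)
```

Summing the detailed balance identity over one index gives stationarity, which is why reversibility is a convenient sufficient condition.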

We illustrate the principles of MCMC with the Metropolis-Hastings (MH) update. It requires the choice of a proposal distribution $q$. The role of $q$ consists in proposing potential transitions for the Markov chain. Given that the chain is currently at $x$, a candidate $y$ is accepted with probability $\alpha(x, y)$ defined as:

$$\alpha(x, y) = \begin{cases} \min\left\{1, \dfrac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)}\right\} & \text{if } \pi(x)\, q(x, y) > 0, \\ 1 & \text{otherwise.} \end{cases} \tag{6}$$

Otherwise, it is rejected and the Markov chain stays at its current location $x$. The transition kernel $P$ of this Markov chain takes the form, for $(x, A) \in \mathcal{X} \times \mathcal{B}(\mathcal{X})$:

$$P(x, A) = \int_A \alpha(x, y)\, q(x, y)\, dy + \mathbb{1}_A(x) \int_{\mathcal{X}} (1 - \alpha(x, y))\, q(x, y)\, dy. \tag{7}$$

The Markov chain defined by $P$ is reversible with respect to $\pi$ and therefore admits $\pi$ as invariant distribution. Conditions on the proposal distribution $q$ that guarantee irreducibility and positive recurrence are easy to meet, and many satisfactory choices are possible.
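The MH update described above can be sketched in a few lines. This is a generic illustration, not the implementation used in the paper; the function name `metropolis_hastings` and its arguments are hypothetical, and the densities are handled on the log scale to avoid numerical underflow.

```python
import numpy as np

def metropolis_hastings(log_pi, propose, log_q, x0, n_iter, rng):
    """Generic MH sampler (illustrative sketch).

    log_pi(x)      : log of the target density, up to a constant
    propose(x, rng): draws a candidate y from q(x, .)
    log_q(x, y)    : log q(x, y), up to a constant common to all pairs
    """
    x = np.asarray(x0, dtype=float)
    chain = np.empty((n_iter,) + x.shape)
    n_accepted = 0
    for n in range(n_iter):
        y = propose(x, rng)
        # Log of the ratio appearing in the acceptance probability (6).
        log_ratio = log_pi(y) + log_q(y, x) - log_pi(x) - log_q(x, y)
        if np.log(rng.uniform()) < log_ratio:
            x = y
            n_accepted += 1
        chain[n] = x  # on rejection the current state is repeated
    return chain, n_accepted / n_iter

rng = np.random.default_rng(0)
chain, acc_rate = metropolis_hastings(
    log_pi=lambda x: -0.5 * x ** 2,                       # standard normal
    propose=lambda x, rng: x + 2.0 * rng.standard_normal(),
    log_q=lambda x, y: 0.0,                               # symmetric proposal
    x0=0.0, n_iter=20000, rng=rng)
```

In this toy usage the target is a one-dimensional standard normal, so the empirical mean and variance of `chain` should be close to 0 and 1.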

2.2 Some examples of Metropolis-Hastings samplers

The arbitrariness of the choice of $q(x, \cdot)$ allows considerable freedom to design a multitude of different chains, each with stationary distribution $\pi$, although in the Bayesian inversion framework $q$ should rely on the a priori distribution. Some examples include (see [6], chapter 7, for more examples):

1. the independent sampler (IMH): $q(x, y) = q(y)$, where $q$ is generally the prior in Bayesian inversion,

2. the symmetric increments random-walk sampler (SIMH): $q(x, y) = q(|y - x|)$, where $q$ can be a zero-mean version of the prior,

3. the Langevin sampler (LMH): assuming that $\pi$ is differentiable on $\mathcal{X}$, it takes advantage of the gradient information to give the sampling direction; $q$ takes the form:

$$q(x, \cdot) \sim \mathcal{N}\left(x + \frac{h^2}{2} \nabla \log \pi(x),\; h^2 I_d\right), \tag{8}$$

where $h$ is a parameter to choose according to e.g. [10] or [11]. Note that a bad choice of $h$ can induce erratic behaviour of the chain,

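4. the adaptive algorithm of [12] (ASIMH): in this algorithm, $y$ is proposed according to $q_{\theta_n}(x, \cdot) = \mathcal{N}(x, \Gamma_n)$, where $\theta = (\mu, \Gamma)$. We also consider a non-increasing sequence of positive step sizes $\{\gamma_n\}$, such that $\sum_{n=1}^{\infty} \gamma_n = \infty$ and $\sum_{n=1}^{\infty} \gamma_n^{1+\delta} < \infty$ for some $\delta > 0$. In practice, we generally use $\gamma_n = 1/n$, as suggested in [12]. The parameter estimation algorithm takes the following form:

$$\begin{aligned} \mu_{n+1} &= \mu_n + \gamma_{n+1} (X_{n+1} - \mu_n), \quad n \geq 0, \\ \Gamma_{n+1} &= \Gamma_n + \gamma_{n+1} \left( (X_{n+1} - \mu_n)(X_{n+1} - \mu_n)^{\mathrm{tr}} - \Gamma_n \right), \end{aligned} \tag{9}$$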

5. the Gibbs sampler: here $\mathcal{X} = \mathcal{X}_1 \times \ldots \times \mathcal{X}_d$, and $q = q_i$ leaves all coordinates fixed except the $i$th one, which it proposes according to the conditional distribution $(x_i \mid \{x_j\}_{j \neq i})$. This implies that $\alpha(x, y) = 1$ for all $x$ and $y$, so there are no rejections. If the resulting $i$th component Gibbs sampler is called $P_i$, then these components can be combined to yield the random-scan Gibbs sampler, which is the average $P_{RS} = \frac{1}{d}(P_1 + \ldots + P_d)$, or the deterministic-scan Gibbs sampler, which is the product $P_{DU} = P_1 \cdots P_d$.
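Two of the proposals in the list above lend themselves to a short sketch: the Langevin proposal (8) and the moment recursion (9) of the adaptive algorithm. The function names and the use of independent Gaussian draws as a stand-in for chain states are assumptions made for this illustration.

```python
import numpy as np

def langevin_proposal(x, grad_log_pi, h, rng):
    """Draw y from N(x + h^2/2 * grad log pi(x), h^2 I_d), as in (8)."""
    mean = x + 0.5 * h ** 2 * grad_log_pi(x)
    return mean + h * rng.standard_normal(x.shape)

def adapt_moments(mu, gamma_mat, x_new, step):
    """One step of recursion (9); both updates use the previous mean mu."""
    diff = x_new - mu
    mu_next = mu + step * diff
    gamma_next = gamma_mat + step * (np.outer(diff, diff) - gamma_mat)
    return mu_next, gamma_next

rng = np.random.default_rng(1)
samples = rng.normal(size=(500, 2))  # stand-in for chain states X_1, ..., X_n
mu, gamma_mat = np.zeros(2), np.eye(2)
for n, x_new in enumerate(samples):
    # Step size gamma_{n+1} = 1 / (n + 1), the choice mentioned in the text.
    mu, gamma_mat = adapt_moments(mu, gamma_mat, x_new, 1.0 / (n + 1))

# Langevin candidate for a standard normal target, where grad log pi(x) = -x.
y = langevin_proposal(np.array([1.0, -2.0]), lambda x: -x, 0.5, rng)
```

With $\gamma_n = 1/n$ the recursion for $\mu_n$ reproduces the running sample mean exactly, so the initial value $\mu_0$ is forgotten after the first step.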

2.3 Comments

One of the problems with Metropolis-Hastings algorithms is the abundance of choice available for choosing the proposal distribution $q(x, \cdot)$. For instance, even if the type of algorithm [...] appropriate for $\pi(\cdot)$. Such a problem is known as a scaling problem. To make this question more concrete, consider the following problem. Suppose that $q(x, \cdot)$ is distributed as the $d$-dimensional normal distribution $\mathcal{N}(x, \sigma^2 I_d)$, for some $\sigma^2 > 0$. We recall that the acceptance probabilities for this algorithm are given by (6). For very small values of $\sigma^2$, small jumps are attempted by the algorithm, and because of the form of (6), these moves are almost always accepted. The Markov chain mixes very slowly because its increments are so small. On the other hand, if $\sigma^2$ is chosen to be very large, long distance jumps are attempted by the algorithm, most of which are rejected. The algorithm therefore spends long periods of time in the same state, and thus still converges slowly. For this problem, "very large" and "very small" have to be interpreted in a way related to the particular form of $\pi$. It seems reasonable that "moderate" values of $\sigma^2$ should be preferred. However, it is difficult to see how to figure out what values are "moderate", especially if $\pi$ is very complicated. In the Bayesian inversion context, the random walk type algorithms, like the SIMH or the LMH, generally fail at identifying different modes. In high-dimensional spaces, the scaling factor generally has to be "small" so as to get an acceptable acceptance rate (6). Therefore, these two algorithms generally perform a local exploration and are hardly able to jump from one mode to another. We will then refer to them as "local" samplers. Note that they are also generally very slow to converge towards the stationary regime in the Bayesian inversion context.
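The scaling effect can be observed on a toy problem. The sketch below measures the acceptance rate of a random-walk MH chain for a small, a moderate and a large value of $\sigma$; the standard normal target in dimension 10, the number of iterations and the three values of $\sigma$ are arbitrary choices made for this illustration.

```python
import numpy as np

def random_walk_acceptance(sigma, dim, n_iter, rng):
    """Acceptance rate of a random-walk MH chain targeting N(0, I_dim)."""
    log_pi = lambda x: -0.5 * np.sum(x ** 2)
    x = np.zeros(dim)
    n_accepted = 0
    for _ in range(n_iter):
        y = x + sigma * rng.standard_normal(dim)
        # Symmetric proposal: the ratio in (6) reduces to pi(y) / pi(x).
        if np.log(rng.uniform()) < log_pi(y) - log_pi(x):
            x = y
            n_accepted += 1
    return n_accepted / n_iter

rng = np.random.default_rng(0)
rates = {sigma: random_walk_acceptance(sigma, dim=10, n_iter=5000, rng=rng)
         for sigma in (0.01, 0.5, 5.0)}
```

A high acceptance rate alone is therefore not a sign of good mixing: with the smallest $\sigma$ nearly every move is accepted, yet the chain barely moves.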

Conversely, the IMH does not need any tuning. It will explore largely the surface of the posterior distribution and we will refer to it as a "global" sampler. Nevertheless, in practical applications, unless $q$ is close to the posterior distribution, the transitions will almost always be rejected.
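To see why, it may help to write the ratio in (6) explicitly when the proposal is the prior, $q(x, y) = f(y)$, and the target is the posterior (1), $\pi(x) \propto f(d \mid x) f(x)$. The short derivation below is added for illustration:

```latex
\frac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)}
  = \frac{f(d \mid y)\, f(y)\, f(x)}{f(d \mid x)\, f(x)\, f(y)}
  = \frac{f(d \mid y)}{f(d \mid x)},
\qquad
\alpha(x, y) = \min\left\{1, \frac{f(d \mid y)}{f(d \mid x)}\right\}.
```

The acceptance probability is thus a pure likelihood ratio. When the data are informative, a draw from the prior rarely has a likelihood comparable to that of the current state, hence the frequent rejections.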

Finally, we can notice here that the chain generated by the adaptive algorithm is no longer homogeneous, but it can be proved (see [12], [13] and [14] in a more general framework) that it has the correct ergodic properties. The idea of adaptive sampling is to improve the proposal efficiency, making it as close as possible to the posterior density. However, it should be stressed here that the algorithm presented above generally fails in a multi-modal context for a low number of iterations (see e.g. [7]). Regarding the Gibbs sampler, it does not seem to be well adapted to the Bayesian inversion problem: the large number of calls to the forward model limits its relevance.

Due to the sequential nature of MCMC algorithms and to tackle multi-modality problems, MCMC practitioners generally use several chains that they run in parallel. By simulating several chains, variability and dependence on the initial value are reduced and it should be easier to control convergence to the stationary distribution by comparing the estimation, using different chains, of quantities of interest. However, good performance of these parallel methods requires a degree of a priori knowledge on the distribution of interest $\pi$, in order to construct an initial distribution on $\mathcal{X}$ which takes into account the features of $\pi$ (modes, shape of high density regions, etc.). This is rarely the case in Bayesian inversion. Moreover, in highly non-linear setups, like in Bayesian inversion, a slow mixing chain will presumably stay in the neighborhood of the starting point with a high probability (see [6], chapter 12, for a more thorough discussion).

Due to the complexity of the posterior distribution (e.g. multi-modality and/or disconnected support) in Bayesian inversion problems and the classical limitations of MH algorithms, methods other than the classical MH algorithm should be investigated. Simulated annealing and tempering, which are presented in the next paragraph, consist in studying modified versions of the posterior.

2.4 Simulated annealing and tempering

The simulated annealing algorithm was introduced by [8], then generalized by [15] for optimization problems. It can be applied to both optimization and simulation problems (see [6] and references therein). Simulated tempering was introduced independently in [16] and [17].

The fundamental idea of these algorithms is that a change of scale, named temperature, allows larger moves on the surface of the distribution to explore, compared with classical MCMC methods. Indeed, this change of scale prevents the chain from remaining trapped in a local mode.

The name and inspiration of the first one come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one. Conversely, tempering is a brutal cooling followed by a controlled reheating of the workpiece to a temperature below its lower critical temperature. Precipitation hardening alloys, like many grades of aluminum and superalloys, are tempered to precipitate intermetallic particles which strengthen the metal.

These two methods aim particularly at generating samples from Gibbs distributions.

Definition 2.1 An $\mathcal{X}$-valued random field $X$ is a Gibbs field of energy $E$ if its probability density function (with respect to the Lebesgue measure) is:

$$f(x) = \frac{1}{Z} e^{-E(x)}, \qquad Z = \int_{\mathcal{X}} e^{-E(x)}\, dx, \tag{10}$$

named Gibbs density.

For practical problems, the constant $Z$ is generally intractable due to the dimension of $\mathcal{X}$. Note that in a wide variety of inverse problems, the posterior distribution (1) takes the form (10); for instance when both prior and measurement error are assumed Gaussian. Note also that simulated annealing and tempering are not confined to Gibbs distributions. We present here both algorithms in this framework for the sake of simplicity. For other kinds of target distributions $\pi$, the practitioner has to consider flattened versions given by $\pi_T = \pi^{1/T}$.
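The effect of the flattening $\pi_T = \pi^{1/T}$ can be illustrated on a bimodal density. In the sketch below, the two-mode target and the temperature values are assumptions chosen for the example; the quantity `barrier` is the log-density gap between a mode and the valley separating the two modes.

```python
import numpy as np

def log_pi(x):
    """Unnormalized log density of a two-mode mixture (illustrative target)."""
    return np.logaddexp(-0.5 * ((x + 3.0) / 0.5) ** 2,
                        -0.5 * ((x - 3.0) / 0.5) ** 2)

def log_pi_tempered(x, temperature):
    """log of pi_T = pi^(1/T), up to an additive constant."""
    return log_pi(x) / temperature

# Log-density gap between a mode (x = 3) and the valley between modes (x = 0).
barrier = {T: log_pi_tempered(3.0, T) - log_pi_tempered(0.0, T)
           for T in (1.0, 4.0, 16.0)}
```

The gap is divided by $T$, so a chain targeting $\pi_T$ with a large $T$ crosses the valley far more easily than a chain targeting $\pi$ itself.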

2.4.1 Simulated annealing

Given a positive temperature $T$, a Markov chain $X$ is generated from the following Gibbs density:

$$\pi_T(x) \propto \exp(-E(x)/T). \tag{11}$$

Simulated annealing is performed by gradually lowering the temperature $T$ from a high value to near zero. Close to $T = 0$ the Gibbs distribution approximates a delta function at the global minimum of $E(x)$ (if it is unique). For simulation purposes, the cooling can be stopped at the value $T = 1$.

This algorithm can be viewed as a non-homogeneous version of the MH algorithm. Indeed, since $T$ decreases along the algorithm, the kernel of the chain varies with time. Classical theoretical results on Markov chains do not apply to this algorithm. Heuristic rules are generally applied to ensure the validity of simulated annealing: the starting temperature must be high enough and its decrease slow enough. A convergence result exists, for optimization purposes, under the condition that $T$ decreases as $1/\log(n)$. In practice, a geometrically decreasing sequence is generally used.
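A minimal sketch of simulated annealing with a geometric cooling schedule, used here for optimization, is given below. The double-well energy, the starting temperature, the cooling factor and the proposal step are all illustrative assumptions; keeping track of the best state visited is a common practical addition, not something prescribed by the text.

```python
import numpy as np

def energy(x):
    # Illustrative double-well energy: global minimum near x = -1,
    # local minimum near x = +1.
    return (x ** 2 - 1.0) ** 2 + 0.3 * x

def simulated_annealing(energy, x0, t0, cooling, n_iter, step, rng):
    x, e_x = x0, energy(x0)
    best_x, best_e = x, e_x
    temperature = t0
    for _ in range(n_iter):
        y = x + step * rng.standard_normal()
        e_y = energy(y)
        # Metropolis acceptance for the Gibbs density (11) at temperature T.
        if np.log(rng.uniform()) < -(e_y - e_x) / temperature:
            x, e_x = y, e_y
            if e_x < best_e:
                best_x, best_e = x, e_x
        temperature *= cooling  # geometric cooling schedule
    return x, best_x, temperature

rng = np.random.default_rng(0)
x_final, x_best, t_final = simulated_annealing(
    energy, x0=1.0, t0=2.0, cooling=0.999, n_iter=5000, step=0.5, rng=rng)
```

The chain is started in the local well on purpose: the high initial temperature is what lets it cross the barrier before the system freezes.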

2.4.2 Simulated tempering

The principle of simulated tempering is linked to that of simulated annealing in the sense that we again consider Gibbs distributions scaled by a temperature parameter $T$. However, this algorithm aims at sampling from a Gibbs distribution $\pi$ rather than minimizing the energy of the system. We consider here a finite sequence of temperatures and the associated Gibbs distributions. In this algorithm, we allow the chain to change temperature level according to a given probability. This allows the chain to go back to higher temperatures, escaping possible local modes of the target distribution, which results in better mixing.

We first define an increasing sequence of temperatures $1 = T_0 < \ldots < T_K$, with its associated Gibbs densities $\pi_i(x) \propto e^{-E(x)/T_i}$, an auxiliary $\{0, \ldots, K\}$-valued variable $M$, and the joint distribution:

$$\mu(x, m) = \rho_m\, \pi_m(x), \qquad \sum_{m=0}^{K} \rho_m = 1. \tag{12}$$

We also define the probabilities $p_U$ and $p_D$ of moving "up", from $m$ to $m + 1$, and "down", from $m$ to $m - 1$, with only $T$ changing, the chain being at temperature $T_m$, and the probability of choosing a fixed level move, $(1 - p_U - p_D)$.

The principle is to simulate an $\mathcal{X} \times \{0, \ldots, K\}$-valued chain $(X_n, M_n)$. Denoting respectively by $q_{i \to i+1}(X_{n+1} \mid X_n = x_n)$ and $q_{i+1 \to i}(X_{n+1} \mid X_n = x_n)$ the proposal probabilities of a transition towards the upper and the lower temperature level, the acceptance probability of a transition from $T_i$ to $T_{i+1}$ is proportional to:

$$\rho_{i \to i+1}(x_n, x_{n+1}) = \frac{p_D}{p_U}\, \frac{\pi_{i+1}}{\pi_i}\, \frac{q_{i \to i+1}(X_{n+1} = x_{n+1} \mid X_n = x_n)}{q_{i+1 \to i}(X_{n+1} = x_{n+1} \mid X_n = x_n)}. \tag{13}$$

To maintain the detailed balance condition, it is then necessary that $\rho_{i+1 \to i}(x_{n+1}, x_n) = 1/\rho_{i \to i+1}(x_n, x_{n+1})$ and to choose the proposal distributions $q_{i \to i+1}(X_{n+1} \mid X_n = x_n)$ and $q_{i+1 \to i}(X_{n+1} \mid X_n = x_n)$ accordingly. Still, it is important to notice that (13) depends on the normalization constants of $\pi_{i+1}$ and $\pi_i$. We can bypass this difficulty by designing an equal number of moves from $m$ to $m + 1$ and from $m + 1$ to $m$ and by accepting the entire sequence as a single proposal, thus canceling the normalizing constants in the acceptance probability, as described e.g. in [18].
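For comparison, a simpler and commonly used variant changes only the temperature level while keeping $x$ fixed. The sketch below computes the log MH ratio of such a level move on the joint distribution (12). It is not the scheme of [18] mentioned above: it assumes that the weights `log_weights` (a hypothetical tuning parameter) are applied to the unnormalized Gibbs factors and absorb the unknown normalizing constants.

```python
import numpy as np

def log_accept_level_move(energy_x, m, m_new, temperatures, log_weights,
                          p_up, p_down):
    """Log MH ratio for changing level m -> m_new = m +/- 1 at fixed x.

    The target is the joint density (12) with unnormalized Gibbs factors
    exp(-E(x) / T_m); log_weights are assumed to absorb the unknown
    normalizing constants (illustrative assumption, not from the paper).
    """
    if not 0 <= m_new < len(temperatures):
        return -np.inf  # proposals outside the ladder are always rejected
    log_target = lambda k: log_weights[k] - energy_x / temperatures[k]
    # Forward and reverse proposal probabilities of the level move.
    forward = p_up if m_new > m else p_down
    reverse = p_down if m_new > m else p_up
    return (log_target(m_new) - log_target(m)
            + np.log(reverse) - np.log(forward))

temperatures = [1.0, 2.0, 4.0]
log_weights = [0.0, 0.0, 0.0]
up = log_accept_level_move(4.0, 0, 1, temperatures, log_weights, 0.5, 0.5)
down = log_accept_level_move(4.0, 1, 0, temperatures, log_weights, 0.5, 0.5)
outside = log_accept_level_move(4.0, 2, 3, temperatures, log_weights, 0.5, 0.5)
```

The move is accepted when the log of a uniform draw falls below the returned value. High-energy states are readily promoted to hotter levels, while the reverse move is penalized by the same amount.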

2.4.3 Comments

Attempts to improve mixing properties of the chain by simulated annealing generally fail because of the monotonic decrease in temperature; if the chain gets into a local mode, it may be impossible to escape it if the temperature is already too low.

Concerning simulated tempering, the potential gain in a better exploration of the support of the target distribution, that is to say a better mixing, does not seem to compensate for the increased number of forward operator evaluations for inverse problems ($2K$ times more for the scheme presented above). However, the presentation of this method is a good introduction to the interacting Markov chains algorithms exposed in the next section.

3 Parallel interacting Markov chains

The principle of making Markov chains interact first appeared in [19] under the name parallel tempering (PT). It has been mostly applied in physico-chemical simulations, see [20] and references therein. It is known in the literature under different names such as: exchange
