• Aucun résultat trouvé

The Variational Formulation of the Fokker-Planck Equation

N/A
N/A
Protected

Academic year: 2022

Partager "The Variational Formulation of the Fokker-Planck Equation"

Copied!
26
0
0

Texte intégral

(1)

The Variational Formulation of the Fokker-Planck Equation

Richard Jordan

David Kinderlehrer

y

Center for Nonlinear Analysis

Carnegie Mellon University Felix Otto

z

Center for Nonlinear Analysis Carnegie Mellon University

and

Department of Applied Mathematics University of Bonn

February 4, 1999

Present address: Department of Mathematics, University of Michigan, jordan@math.lsa.umich.edu

ydavidk@andrew.cmu.edu

zPresent address: Courant Institute of Mathematical Sciences, ottof@cims.nyu.edu

1

(2)

Dedication

Dedicated to the memory of Richard Dun.

Abstract

The Fokker{Planck equation, or forward Kolmogorov equation, describes the evolu- tion of the probability density for a stochastic process associated with an Ito stochastic dierential equation. It pertains to a wide variety of time{dependent systems in which randomness plays a role. In this paper, we are concerned with Fokker{Planck equations for which the drift term is given by the gradient of a potential. For a broad class of potentials, we construct a time{discrete, iterative variational scheme whose solutions converge to the solution of the Fokker{Planck equation. The major novelty of this iterative scheme is that the time step is governed by the Wasserstein metric on prob- ability measures. This formulation enables us to reveal an appealing, and previously unexplored, relationship between the Fokker{Planck equation and the associated free energy functional. Namely, we demonstrate that the dynamics may be regarded as a gradient ux, or a steepest descent, for the free energy with respect to the Wasserstein metric.

Keywords: Fokker-Planck equation, steepest descent, free energy, Wasserstein metric.

AMS subject classications: 35A15, 35K15, 35Q99, 60J60.

1 Introduction and overview

The Fokker-Planck equation plays a central role in statistical physics and in the study of uctuations in physical and biological systems [?, ?, ?]. It is intimately connected with the theory of stochastic dierential equations: A (normalized) solution to a given Fokker-Planck equation represents the probability density for the position (or velocity) of a particle whose

2

(3)

motion is described by a corresponding Ito stochastic dierential equation (or Langevin equation). We shall restrict our attention in this paper to the case where the drift coecient is a gradient. The simplest relevant physical setting is that of a particle undergoing diusion in a potential eld [?].

It is known that, under certain conditions on the drift and diusion coecients, the stationary solution of a Fokker{Planck equation of the type that we consider here satises a variational principle. It minimizes a certain convex free energy functional over an appropriate admissible class of probability densities [?]. This free energy functional satises an H{

theorem: It decreases in time for any solution of the Fokker-Planck equation [?]. In this work, we shall establish a deeper, and apparently previously unexplored, connection between the free energy functional and the Fokker{Planck dynamics. Specically, we shall demonstrate that the solution of the Fokker{Planck equation follows, at each instant in time, the direction of steepest descent of the associated free energy functional.

The notion of a steepest descent, or a gradient ux, makes sense only in context with an appropriate metric. We shall show that the required metric in the case of the Fokker{Planck equation is the Wasserstein metric (dened in Section 3) on probability densities. As far as we know, the Wasserstein metric cannot be written as an induced metric for a metric tensor (the space of probability measures with the Wasserstein metric is not a Riemannian manifold). Thus, in order to give meaning to the assertion that the Fokker{Planck equation may be regarded as a steepest descent, or gradient ux, of the free energy functional with respect to this metric, we switch to a discrete time formulation. We develop a discrete, iterative variational scheme whose solutions converge, in a sense to be made precise below, to the solution of the Fokker-Planck equation. The time-step in this iterative scheme is associated with the Wasserstein metric. For a dierent view on the use of implicit schemes for measures, see [?,?].

For the purpose of comparison, let us consider the classical diusion (or heat) equation

@(t;x)

@t

= (t;x); t2(0;1); x2IRn;

which is the Fokker{Planck equation associated with a standard n{dimensional Brownian 3

(4)

motion. It is well{known (see, for example, [?, ?]) that this equation is the gradient ux of the Dirichlet integral 12RIRnjrj2 dx with respect to the L2(IRn) metric. The classical discretization is given by the scheme

Determine(k) that minimizes

1

2 k

(k?1)

?k 2

L 2

(IR n

) + h2 Z

IR n

jrj 2

dx; 9

>

>

>

=

>

>

>

;

over an appropriate class of densities . Here, h is the time step size. On the other hand, we derive as a special case of our results below that the scheme

Determine(k) that minimizes

1

2

d((k?1);)2 + h Z

I R

n

logdx over all2K;

9

>

>

>

>

>

>

>

=

>

>

>

>

>

>

>

;

(1)

whereK is the set of all probability densities onIRn having nite second moments, is also a discretization of the diusion equation when d is the Wasserstein metric. In particular, this allows us to regard the diusion equation as a steepest descent of the functionalRIRnlogdx with respect to the Wasserstein metric. This reveals a novel link between the diusion equation and the Gibbs{Boltzmann entropy (?RIRnlog dx) of the density. Furthermore, this formulation allows us to attach a precise interpretation to the conventional notion that diusion arises from the tendency of the system to maximize entropy.

The connection between the Wasserstein metric and dynamical problems involving dissi- pation or diusion (such as strongly overdamped uid ow or nonlinear diusion equations) seems to have rst been recognized by Otto in [?]. The results in [?] together with our re- cent research on variational principles of entropy and free energy type for measures [?,?, ?], provide the impetus for the present investigation. The work in [?] was motivated by the desire to model and characterize metastability and hysteresis in physical systems. We plan to explore in subsequent research the relevance of the developments in the present paper to the study of such phenomena. Some preliminary results in this direction may be found in [?, ?].

4

(5)

The paper is organized as follows. In Section (2), we rst introduce the Fokker{Planck equation and briey discuss its relationship to stochastic dierential equations. We then give the precise form of the associated stationary solution and of the free energy functional that this density minimizes. In Section (3), the Wasserstein metric is dened, and a brief review of its properties and interpretations is given. The iterative variational scheme is formulated in Section (4), and the existence and uniqueness of its solutions are established. The main result of this paper { namely the convergence of solutions of this scheme (after interpolation) to the solution of the Fokker-Planck equation { is the topic of Section (5). There, we state and prove the relevant convergence theorem.

2 The Fokker{Planck equation, stationary solutions, and the free energy functional

We are concerned with Fokker-Planck equations having the form

@

@t

= div(r (x)) +?1; (x;0) = 0(x); (2) where the potential (x) : IRn ! [0;1) is a smooth function, > 0 is a given constant, and 0(x) is a probability density on IRn. The solution(t;x) of (??) must, therefore, be a probability density onIRnfor almost every xed timet. That is, (t;x)0 for almost every (t;x)2(0;1)IRn, and RIRn(t;x)dx= 1 for almost every t 2(0;1).

It is well known that the Fokker-Planck equation (??) is inherently related to the Ito stochastic dierential equation [?, ?, ?]

dX(t) =?r (X(t))dt+q2?1 dW(t); X(0) =X0: (3) Here,W(t) is a standardn{dimensional Wiener process, andX0is ann{dimensional random vector with probability density 0. Equation (??) is a model for the motion of a particle undergoing diusion in the potential eld . X(t) 2 IRn then represents the position of the particle, and the positive parameter is proportional to the inverse temperature. This

5

(6)

stochastic dierential equation arises, for example, as the Smoluchowski{Kramers approxi- mation to the Langevin equation for the motion of a chemically bound particle [?,?, ?]. In that case, the function describes the chemical bonding forces, and the termp2?1dW(t) represents white noise forces resulting from molecular collisions [?]. The solution (t;x) of the Fokker-Planck equation (??) furnishes the probability density at time t for nding the particle at position x.

If the potential satises appropriate growth conditions, then there is a unique stationary solutions(x) of the Fokker{Planck equation, and it takes the form of the Gibbs distribution [?, ?]

s(x) =Z?1exp(? (x)); (4) where the partition function Z is given by the expression

Z =Z

I R

nexp(? (x))dx:

Note that, in order for equation (??) to make sense, must grow rapidly enough to ensure that Z is nite. The probability measure s(x)dx, when it exists, is the unique invariant measure for the Markov process X(t) dened by the stochastic dierential equation (??).

It is readily veried (see, for example, [?]) that the Gibbs distribution s satises a variational principle { it minimizes over all probability densities on IRn the free energy functional

F() =E() +?1S(); (5) where

E() :=Z

I R

n

dx (6)

plays the role of an energy functional, and

S() :=Z

IR n

logdx (7)

is the negative of Gibbs-Boltzmann entropy functional.

Even when the Gibbs measure is not dened, the free energy (??) of a density (t;x) satisfying the Fokker{Planck equation (??) may be dened, provided that F(0) is nite.

6

(7)

This free energy functional then serves as a Lyapunov function for the Fokker-Planck equa- tion: If (t;x) satises (??), then F((t;x)) can only decrease with time [?, ?]. Thus, the free energy functional is an H{function for the dynamics. The developments that follow will enable us to regard the Fokker-Planck dynamics as a gradient ux, or a steepest descent, of the free energy with respect to a particular metric on an appropriate class of probability measures. The requisite metric is the Wasserstein metric on the set of probability measures having nite second moments. We now proceed to dene this metric.

3 The Wasserstein metric

The Wasserstein distance of order two, d(1;2), between two (Borel) probability measures

1 and 2 onIRn is dened by the formula

d(1;2)2 = inf

p2P(

1

;

2 )

Z

IR n

IR n

jx?yj 2

p(dxdy); (8) whereP(1;2) is the set of all probability measures onIRnIRnwith rst marginal1 and second marginal 2, and the symbol jj denotes the usual Euclidean norm on IRn. More precisely, a probability measure pis inP(1;2) if and only if for each Borel subset AIRn there holds

p(AIRn) =1(A) ; p(IRnA) =2(A):

Wasserstein distances of order q with q dierent from 2 may be analogously dened [?].

Since no confusion should arise in doing so, we shall refer to d in the sequel as simply the Wasserstein distance.

It is well known that d denes a metric on the set of probability measures on Rn having nite second moments: RIRnjxj2(dx)<1[?,?]. In particular,dsatises the triangle inequality on this set. That is, if 1;2, and 3 are probability measures onIRn with nite second moments, then

d(1;3)d(1;2) +d(2;3): (9) We shall make use of this property at several points later on.

7

(8)

We note that the Wasserstein metric may be equivalently dened by [?]

d(1;2)2 = infEjX?Yj2; (10) where E(U) denotes the expectation of the random variable U, and the inmum is taken over all random variablesX and Y such thatX has distribution 1 and Y has distribution

2. In other words, the inmum is over all possible couplings of the random variables

X and Y. Convergence in the metric d is equivalent to the usual weak convergence plus convergence of second moments. This latter assertion may be demonstrated by appealing to the representation (??) and applying the well{known Skorohod theorem from probability theory (see Theorem 29.6 of [?]). We omit the details.

The variational problem (??) is an example of a Monge{Kantorovich mass transference problem with the particular cost functionc(x;y) = jx?yj2 [?]. In that context, an inmizer

p

2 P(1;2) is referred to as an optimal transference plan. When 1 and 2 have nite second moments, the existence of such ap for (??) is readily veried by a simple adaptation of our arguments in Section 4. For a probabilistic proof that the inmum in (??) is attained when1 and2 have nite second moments, see [?]. Brenier [?] has established the existence of a one{to{one optimal transference plan in the case that the measures 1 and 2 have bounded support and are absolutely continuous with respect to Lebesgue measure. Caarelli [?] and Gangbo and McCann [?,?] have recently extended Brenier's results to more general cost functions cand to cases in which the measures do not have bounded support.

If the measures1and2are absolutely continuous with respect to the Lebesgue measure, with densities 1 and 2, respectively, we will write P(1;2) for the set of probability measures having rst marginal1 and second marginal2. Correspondingly, we will denote by d(1;2) the Wasserstein distance between 1 and 2. This is the situation that we will be concerned with in the sequel.

8

(9)

4 The discrete scheme

We shall now construct a time{discrete scheme that is designed to converge in an appropriate sense (to be made precise in the next section) to a solution of the Fokker-Planck equation.

The scheme that we shall describe was motivated by a similar scheme developed by Otto in an investigation of pattern formation in magnetic uids [?]. We shall make the following assumptions concerning the potential introduced in Section (2):

2C

1(IRn);

(x) 0 for allx2IRn; (11)

jr (x)j C( (x) + 1) for allx2IRn ; (12) for some constant C < 1. Notice that our assumptions on allow for cases in which

R

I R

nexp(? ) dx is not dened, so that the stationary density s given by (??) does not exist. These assumptions allow us to treat a wide class of Fokker{Planck equations. In particular, the classical diusion equation @@t =?1, for which const., falls into this category. We also introduce the set K of admissible probability densities:

K := n:IRn![0;1) measurable Z

I R

n

(x)dx= 1;M()<1o; where

M() =Z

IR n

jxj 2

(x)dx:

With these conventions in hand, we now formulate the iterative discrete scheme:

Determine(k) that minimizes

1

2

d((k?1);)2 + hF() over all 2K:

9

>

>

>

>

>

>

>

=

>

>

>

>

>

>

>

;

(13)

Here we use the notation (0) = 0. The scheme (??) is the obvious generalization of the scheme (??) set forth in the Introduction for the diusion equation. We shall now establish existence and uniqueness of the solution to (??).

9

(10)

Proposition. Given 0 2K, there exists a unique solution of the scheme (??).

Proof

Let us rst demonstrate that S is well{dened as a functional on K with values in (?1;+1] and that, in addition, there exist <1 and C < 1 depending only on n such that

S() ?C (M() + 1) for all2K: (14) Actually, we shall show that (??) is valid for any 2(n+2n ;1). For future reference, we prove a somewhat ner estimate. Namely, we demonstrate that there exists a C <1, depending only on n and , such that for allR 0, and for each 2K, there holds

Z

IR n

?B

R

jminf log;0gjdx C 1

R 2+ 1

(2+n)?n

2 (M() + 1); (15) whereBR denotes the ball of radiusR centered at the origin inIRn. Indeed, for <1 there holds

jminfz logz;0gj Cz for allz 0: Hence by Holder's inequality, we obtain

Z

I R

n

?B

R

jminf log;0gjdx

C

Z

IR n

?B

R

dx

C

0

@ Z

IR n

?B

R

1

jxj 2+ 1

!

1?

dx 1

A 1?

(M() + 1) : On the other hand, for 1? > n2, we have

Z

I R

n

?B

R

1

jxj 2+ 1

!

1?

dx C

1

R 2+ 1

1?

? n

2

:

Let us now prove that for given (k?1) 2 K, there exists a minimizer 2 K of the functional

K 3 7!

1

2

d((k?1);)2 + hF(): (16) 10

(11)

Observe that S is not bounded below on K and hence F is not bounded below onK either.

Nevertheless, using the inequality

M(1) 2M(0) + 2d(0;1)2 for all0;1 2K (17) (which immediately follows from the inequalityjyj2 2jxj2+2jx?yj2 and from the denition of d) together with (??) we obtain

1

2

d((k?1);)2 + hF()

(??)

1

4

M() ? 12M((k?1)) + hS() (18)

(??)

1

4

M() ? C (M() + 1) ? 12M((k?1)) for all2K;

which ensures that (??) is bounded below. Now, let fgbe a minimizing sequence for (??).

Obviously, we have that

fS()g is bounded above; (19)

and according to (??),

fM()g is bounded: (20)

The latter result, together with (??) implies that

Z

IR n

jminf log;0gjdx

is bounded; which combined with (??) yields that

Z

IR

nmaxf log;0gdx

is bounded:

As z 7!maxfzlogz;0g;z 2 [0;1), has superlinear growth, this result, in conjunction with (??) guarantees the existence of a(k) 2K such that (at least for a subsequence)

w

*

(k) inL1(IRn): (21) Let us now show that

S((k)) liminf

"1

S(): (22)

11

(12)

As [0;1)3z 7! zlogz is convex and [0;1)3 z 7!maxfzlogz;0g is convex and nonnega- tive, (??) implies that for any R <1,

Z

B

R

(k) log(k) dx liminf

"1 Z

B

R

log dx; (23)

Z

IR n

?B

R

maxf(k)log(k);0gdx liminf

"1 Z

IR n

?B

R

maxflog;0gdx: (24) On the other hand we have according to (??) and (??)

lim

R"1

sup

2IN Z

I R

n

?B

R

jminf log;0gjdx = 0: (25) Now observe that for anyR <1, there holds

S((k)) Z

B

R

(k) log(k) dx + Z

IR n

?B

R

maxf(k) log(k);0gdx: which together with (??),(??) and (??) yields (??).

It remains for us to show that

E((k)) liminf

"1

E(); (26)

d((k?1);(k))2 liminf

"1

d((k?1);)2: (27) Equation (??) follows immediately from (??) and Fatou's Lemma. As for (??), we choose

p

2P((k?1);) satisfying

Z

I R

n

IR n

jx?yj 2

p

(dxdy) d((k?1);)2 + 1

By (??) the sequence of probability measures fdxg"1 is tight, or relatively compact with respect to the usual weak convergence in the space of probability measures on IRn (i.e., convergence tested against bounded continuous functions) [?]. This, together with the fact that the density (k?1) has nite second moment, guarantees that the sequence

fp

g

"1 of probability measures on IRn IRn is tight. Hence, there is a subsequence of

fp

g

"1 that converges weakly to some probability measure p. From (??) we deduce that

p2P((k?1);(k)). We now could invoke the Skorohod Theorem [?] and Fatou's Lemma to 12

(13)

infer (??) from this weak convergence, but we prefer here to give a more analytic proof. For

R<1 let us select a continuous function R:IRn ![0;1] such that

R = 1 inside ofBR and R = 0 outside ofB2R: We then have Z

IR n

IR n

R(x)R(y)jx?yj2p(dxdy)

= lim

"1 Z

I R

n

IR n

R(x)R(y)jx?yj2p(dxdy)

liminf

"1

d((k?1);)2;

9

>

>

>

>

>

>

>

>

=

>

>

>

>

>

>

>

>

;

(28) for each xed R < 1. On the other hand, using the monotone convergence theorem, we deduce that

d((k?1);(k))2 Z

I R

n

IR n

jx?yj 2

p(dxdy)

= lim

R"1 Z

IR n

IR n

R(x)R(y)jx?yj2p(dxdy); which combined with (??) yields (??).

To conclude the proof of the proposition we establish that the functional (??) has at most one minimizer. This follows from the convexity of K and the strict convexity of (??).

The strict convexity of (??) follows from the strict convexity of S, the linearity of E, and the (obvious) convexity over K of the functional 7!d((k?1);)2. 2

Remark. One of the referees has communicated to us the following simple estimate that could be used in place of (14){(15) in the previous and subsequent analysis: For any IRn (in particular, for =IRn?BR) and for all2K there holds

Z

jminflog;0gjdxCZ

e?jxj2 dx+M() + 14Zdx; (29) for any > 0. To obtain the inequality (??), select C > 0 such that for all z 2 [0;1], we have zjlogzj Cpz. Then, dening the sets 0 = \f exp(?jxj)g and 1 = \fexp(?jxj)<1g, we have

Z

jminflog;0gjdx = Z

0

j(log)?jdx+Z

1

j(log)?jdx

C

Z

e?jxj2 dx+Z

jxjdx:

13

(14)

The desired result (??) then follows from the inequality jxjjxj2+ 1=(4);for >0.

5 Convergence to the solution of the Fokker-Planck equation

We come now to our main result. We shall demonstrate that an appropriate interpolation of the solution to the scheme (??) converges to the unique solution of the Fokker-Planck equation. Specically, the convergence result that we will prove here is:

Theorem. Let 0 2 K satisfy F(0) < 1, and for given h > 0, let f(k)h gk2IN be the solution of (??). Dene the interpolation h:(0;1)IRn ![0;1) by

h(t) = (k)h for t2[kh;(k+1)h) and k 2IN [f0g Then as h#0,

h(t) * (t) weakly inL1(IRn) for all t2(0;1); (30) where 2C1((0;1)IRn) is the unique solution of

@

@t

= div(r ) + ?1; (31)

with initial condition

(t) ! 0 strongly in L1(IRn) for t#0 (32) and

M(); E() 2 L1((0;T)) for all T <1: (33) Remark. A ner analysis reveals that

h

! strongly inL1((0;T)IRn) for all T <1: Proof of the theorem

14

(15)

The proof basically follows along the lines of [?, Proposition 2, Theorem 3]. The crucial step is to recognize that the rst variation of (??) with respect to the independent variables indeed yields a time{discrete scheme for (??), as will now be demonstrated. For notational convenience only, we shall set 1 from here on in. As will be evident from the ensuing arguments, our proof works for any positive . In fact, it is not dicult to see that, with appropriate modications to the scheme (??), we can establish an analogous convergence result for time{dependent.

Let a smooth vector eld with bounded support, 2C01(IRn;IRn), be given, and dene the corresponding uxfg2IR, by

@

= for all 2IR and 0 = id:

For any 2 IR, let the measure (y)dy be the push forward of (k)(y)dy under . This means that

Z

IR n

(y)(y)dy = Z

I R

n

(k)(y)((y))dy for all 2C00(IRn): (34) As is invertible, (??) is equivalent to the following relation for the densities:

detr = (k): (35)

By (??), we have for each >0

1

1

2

d((k?1);)2+hF()?12d((k?1);(k))2 +hF((k)) 0; (36) which we now investigate in the limit # 0. Because is nonnegative, equation (??) also holds for = , i.e.,

Z

IR n

(y) (y)dy = Z

IR n

(k)(y) ((y))dy: This yields

1

E()?E((k)) = Z

I R

n 1

( ((y))? (y))(k)(y)dy: 15

(16)

Observe that the dierence quotient under the integral converges uniformly to r (y)(y), hence implying that

d

d [E()]=0 = Z

IR n

r (y)(y)(k)(y)dy: (37) Next, we calculate dd [S()]=0. Invoking an appropriate approximation argument (for instance approximating log by some function that is bounded below), we obtain

Z

I R

n

(y) log((y))dy

(??)= Z

I R

n

(k)(y) log(((y)))dy

(??)= Z

I R

n

(k)(y) log( (k)(y)

detr(y))dy: Therefore, we have

1

S()?S((k)) = ?Z

I R

n

(k)(y)1 log(detr(y))dy: Now using

d

d[detr(y)]=0 = div(y);

together with the fact that 0 = id, we see that the dierence quotient under the integral converges uniformly to div, hence implying that

d

d [S()]=0 = ?Z

I R

n

(k)divdy: (38)

Now, let p be optimal in the denition of d((k?1);(k))2 (see Section 3). The formula

Z

IR n

IR n

(x;y)p(dxdy) =Z

IR n

IR n

(x;(y))p(dxdy); 2C00(IRn IRn) then denes a p 2P((k?1);). Consequently, there holds

1

1

2

d((k?1);)2 ? 12d((k?1);(k))2

Z

I R

n

IR n

1

1

2

j(y)?xj2 ? 12jy?xj2p(dxdy); 16

Références

Documents relatifs

After train- ing a supervised deep learning algorithm to pre- dict attachments on the STAC annotated corpus 1 , we then constructed a weakly supervised learning system in which we

En comparant les efforts de coupe enregistrés lors de l’usinage suivant la direction parallèle aux fibres dans les deux matériaux bio-composites (Figure 55), il est clairement

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des

This statistical estimator is original as it involves the resolution of a fixed point equation, and a classical deconvolution by a Cauchy distribution.. This is due to the fact that,

We prove the existence and uniqueness of a positive nor- malized equilibrium (in the case of a general force) and establish some exponential rate of convergence to the equilibrium

Keywords: general stochastic hybrid systems, Markov models, continuous-time Markov processes, jump processes, Fokker-Planck-Kolmogorov equation, generalized Fokker-Planck

– Le taux horaire de l’allocation d’activité partielle versée à l’employeur correspond, pour chaque salarié autorisé à être placé en activité partielle, à un

We also use the Poincar´ e-type inequality to give an elementary proof of the exponential convergence to equilibrium for solutions of the kinetic Fokker-Planck equation which