Monte Carlo methods in molecular dynamics.
Tony Leli `evre
CERMICS, Ecole Nationale des Ponts et Chauss ´ees, France, MicMac team, INRIA, France.
http://cermics.enpc.fr/∼lelievre
T. Leli `evre, ECODOQUI, November 2008 – p. 1
Outline
1 Free energy and metastability, 2 Constrained dynamics,
2.1 Thermodynamic integration, 2.2 Constrained SDEs,
2.3 Non-equilibrium dynamics, 3 Adaptive methods,
3.2 Adaptive methods: algorithms, 3.2 Adaptive methods: convergence, 3.3 Multiple replicas implementations, 3.4 Application to Bayesian statistics.
1 Free energy and metastability
We consider a molecular system with N particles with position (x1, ...,xN) = x ∈ R3N interacting through the potential V (x1, ..., xN).
In the NVT ensemble, one wants to sample the Boltzmann-Gibbs probability measure:
dµ(x) = Z−1 exp(−βV (x)) dx,
where Z = R
exp(−βV (x)) dx is the partition function and β = (kBT)−1 is proportional to the inverse of the temperature.
Aim: compute “macroscopic quantities” like the likelihood of molecular conformations, reaction rates, ...
T. Leli `evre, ECODOQUI, November 2008 – p. 3
1 Free energy and metastability
Typically, V is a sum of potentials modelling interaction between two particles, three particles and four
particles:
V = X
i<j
V1(xi, xj) + X
i<j<k
V2(xi, xj, xk) + X
i<j<k<l
V3(xi, xj, xk, xl).
For example, V1(xi, xj) = VLJ(|xi − xj|) where
VLJ(r) = 4
σ r
12
− σr 6
is the Lennard-Jones potential.
1 Free energy and metastability
Examples of quantities of interest:
• specific heat
C ∝ hV 2iµ − hV i2µ
• pressure
P ∝ −hq · ∇V (q)iµ
T. Leli `evre, ECODOQUI, November 2008 – p. 5
1 Free energy and metastability
Since this is a high-dimensional problem (N ≫ 1) Monte Carlo methods are used, typically based on Markov chains.
For example, to sample µ, one can use Xt solution to the Stochastic Differential Equation (SDE):
(GD) dXt = −∇V (Xt) dt + p
2β−1dW t.
(gradient or over-damped Langevin dynamics).
Under suitable assumption, we have the ergodic property: for µ-a.e. X0,
Tlim→∞
1 T
Z T
0
φ(Xt)dt = Z
φ(x)dµ(x).
1 Free energy and metastability
Probabilistic insert (1): discretization of SDEs.
The discretization of (GD) by the Euler scheme is (for a fixed timestep ∆t):
Xn+1 = Xn − ∇V (Xn) ∆t + p
2β−1∆tGn
where (Gin)1≤i≤3N,n≥0 are i.i.d. random variables with law N (0, 1). Indeed,
(W(n+1)∆t − Wn∆t)n≥0 =L √
∆t(Gn)n≥0.
In practice, a sequence of i.i.d. random variables with law N (0, 1) may be obtained from a sequence of i.i.d.
random variables with law U((0, 1)) (given by the rand() function on computers).
T. Leli `evre, ECODOQUI, November 2008 – p. 7
1 Free energy and metastability
Proof (invariant measure): One needs to show that if the law of X0 is µ, then the law of Xt is also µ. Let us denote Xxt the solution to (GD) such that X0 = x. Let us consider the function u(t, x) solution to:
( ∂tu(t, x) = −∇V (x) · ∇u(t, x) + β−1∆u(t, x), u(0, x) = φ(x)(+ assumptions on decay at infinity),
then, u(t, x) = E(φ(Xxt )). Thus, the measure µ is invariant:
d dt
Z
E(φ(Xxt ))dµ(x) = Z−1 Z
∂tu(t, x) exp(−βV (x))dx
= Z−1 Z
−∇V · ∇u + β−1∆u
exp(−βV )= 0.
Therefore,
Z
E(φ(Xxt ))dµ(x) = Z
φ(x)dµ(x).
1 Free energy and metastability
Probabilistic insert (2): Feynman-Kac formula.
Why u(t, x) = E(φ(Xxt )) ? For 0 < s < t, we have (characteristic method):
du(t − s, Xxs ) = −∂tu(t − s, Xxs ) ds + ∇u(t − s, Xxs ) · dXxs +β−1∆u(t − s, Xxs ) ds,
=
− ∂tu(t − s, Xxs ) − ∇V (Xxs ) · ∇u(t − s, Xxs ) + β−1∆u(t − s, Xxs )
ds + p
2β−1∇u(t − s, Xxs ) · dW s.
Thus, integrating over s ∈ (0, t) and taking the expectation:
E(u(0, Xxt )) − E(u(t, Xx0 )) = p
2β−1E
Z t
0 ∇u(t − s, Xxs ) · dWs
= 0.
T. Leli `evre, ECODOQUI, November 2008 – p. 9
1 Free energy and metastability
Probabilistic insert (3): It ˆo’s calculus. (in 1d.)
Where does the term ∆u come from ? Starting from the discretization:
Xn+1 = Xn − V ′(Xn) ∆t + p
2β−1∆tGn,
we have (for a time-independent function u):
u(Xn+1) = u
Xn − V ′(Xn) ∆t + p
2β−1∆tGn ,
= u(Xn) − u′(Xn)V ′(Xn) ∆t + p
2β−1∆tu′(Xn)Gn +β−1(Gn)2u′′(Xn)∆t + o(∆t).
Thus, summing over n ∈ [0...t/∆t] and taking the limit ∆t → 0,
u(Xt) = u(X0) −
Z t 0
V ′(Xs)u′(Xs) ds + p
2β−1
Z t 0
u′(Xs)dWs
+β−1
Z t
u′′(Xs) ds. T. Leli `evre, ECODOQUI, November 2008 – p. 10
1 Free energy and metastability
In practice, (GD) is discretized in time, and Cesaro means are computed: limNT→∞ N1
T
PNT
n=1 φ(Xn). Remark: Practitioners do not use over-damped
Langevin dynamics but rather Langevin dynamics:
( dXt = M−1P t dt,
dPt = −∇V (Xt) dt − γM−1P t dt + p
2γβ−1dW t,
where M is the mass tensor and γ is the friction coefficient. In the following, we only consider
over-damped Langevin dynamics.
T. Leli `evre, ECODOQUI, November 2008 – p. 11
1 Free energy and metastability
We therefore have a method to compute (an
approximation of) R φ(x)dµ(x), using Xt. But, in practice, Xt is a metastable process, so that the convergence of the ergodic limit is very slow.
A bi-dimensional example: Xt1 is a slow variable of the system.
x1 x2
V (x1, x2)
Xt1
t
1 Free energy and metastability
A more realistic example : (Dellago, Geissler): Influence of the solvation on a dimer conformation. The
interaction potentials are (J.D. Weeks, D. Chandler et H.C.
Andersen):
VWCA(r) =
( 4ǫ h
σ r
12
− σr 6i
+ ǫ if r ≤ r0, 0 if r > r0,
VS(r) = h
1 − (r − r0 − w)2 w2
2 ,
where ǫ, σ and w are positive constants and r0 = 21/6σ.
VS is a double-well potential.
T. Leli `evre, ECODOQUI, November 2008 – p. 13
1 Free energy and metastability
00000000 00000000 00000000 0000
11111111 11111111 11111111 1111
000000 000000 000000 111111 111111 111111
00000000 00000000 00000000 0000
11111111 11111111 11111111 1111 00000000 00000000 00000000 0000
11111111 11111111 11111111 1111
00000000 00000000 00000000 11111111 11111111 11111111
000000 000000 000000 000
111111 111111 111111 111
00000000 00000000 00000000 11111111 11111111 11111111
000000 000000 000000 111111 111111
111111 000000000000000000000000 11111111 11111111 11111111 000000 000000 000000 111111 111111 111111 000000 000000 000000 111111 111111 111111
000000 000000 000000 111111 111111 111111
00000000 00000000 00000000 0000
11111111 11111111 11111111 1111
000000 000000 000000 111111 111111 111111
000000 000000 000000 000
111111 111111 111111 111
000000 000000 000000 111111 111111 111111
000000 000000 000000 000
111111 111111 111111 111
000000 000000 000000 000
111111 111111 111111 111
00000000 00000000 00000000 11111111 11111111 11111111 000000 000000 000000 000
111111 111111 111111 111
00000000 00000000 00000000 11111111 11111111 11111111
000000 000000 000000 000
111111 111111 111111 111
000000 000000 000000 111111 111111 111111
00000000 00000000 00000000 0000
11111111 11111111 11111111 1111 00000000 00000000 00000000 11111111 11111111 11111111 000000 000000 000000 000
111111 111111 111111 111
Left: compact state (ξ = 0). Right: stretched state (ξ = 1).
A slow variable is ξ(Xt) where ξ(x) = |x1−2wx2|−r0 is a so-called reaction coordinate.
1 Free energy and metastability
A “real” example: ions canal in a cell membrane.
(C. Chipot).
T. Leli `evre, ECODOQUI, November 2008 – p. 15
1 Free energy and metastability
We suppose in the following that the slow variable is of dimension 1 and known: ξ(x), where ξ : Rn → R.
This slow variable contains most of the information needed in practice so that it would be enough to compute the law of ξ(X), for X with law µ.
Lemme 1 The image of the measure µ by ξ is
Z−1 exp(−βA(z)) dz, where
A(z) = −β−1 ln
Z
Σz
e−βV |∇ξ|−1dσΣz
= −β−1 ln ZΣz,
where Σz = {x, ξ(x) = z} is a (smooth) submanifold of
Rn, and σΣz is the Lebesgue measure on Σz.
1 Free energy and metastability
Co-area formula: Let X be a random variable with law
ψ(x) dx in Rn. Then ξ(X) has law RΣ
z ψ |∇ξ|−1 dσΣz dz,
and the law of X conditioned to a fixed value z of ξ(X)
is dµz = ψ |∇ξ|
−1 dσΣz
R
Σz ψ|∇ξ|−1 dσΣz .
Indeed, for any bounded functions f and g,
E(f(ξ(X))g(X)) = Z
Rn
f(ξ(x))g(x)ψ(x) dx,
= Z
Rp
Z
Σz
f ◦ ξ g ψ |∇ξ|−1dσΣz dz,
= Z
Rp
f(z) R
Σz g ψ |∇ξ|−1dσΣz R
Σz ψ |∇ξ|−1dσΣz Z
Σz
ψ |∇ξ|−1dσΣz dz.
T. Leli `evre, ECODOQUI, November 2008 – p. 17
1 Free energy and metastability
Remarks:
- The measure |∇ξ|−1dσΣz is sometimes denoted
δξ(x)−z in the literature.
- A is the free energy associated with the reaction coordinate or collective variable ξ (angle, length, ...).
A is defined up to an additive constant, so that it is enough to compute free energy differences, or the derivative of A (the mean force).
- A(z) = −β−1 ln ZΣz and ZΣz is the partition function associated with the conditioned probability measures:
µΣz = ZΣ−1
z e−βV |∇ξ|−1dσΣz.
1 Free energy and metastability
Example of a free energy profile (solvation of a dimer)
(Profiles computed using TI)
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
−0.6
−0.4
−0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8
Parameter z
Free energy difference ∆ F(z)
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
−0.6
−0.4
−0.2 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8
Parameter z
Free energy difference ∆ F(z)
The density of the solvent molecules is lower on the left than on the right. At high density, the compact state is more likely but (claim of physicists)
spontaneous transitions are less frequent (free energy barrier) ... to be better understood. T. Leli `evre, ECODOQUI, November 2008 – p. 19
1 Free energy and metastability
Some direct numerical simulations...
Remark: Free energy is not energy !
−3 −2 −1 0 1 2 3
0 0.5 1 1.5 2 2.5 3
x coordinate
y coordinate
−3 −2 −1 0 1 2 3
−2.5
−2
−1.5
−1
−0.5 0 0.5 1 1.5
x coordinate
Free energy
Left: The potential is 0 in the region enclosed by the curve, and +∞ outside.
Right: Associated free energy profile when the x
coordinate is the reaction coordinate (β = 1).
1 Free energy and metastability
Examples of methods to compute free energy differences A(z2) − A(z1):
• Thermodynamic integration (Kirkwood)
(homogeneous Markov process),
• Perturbation methods (Zwanzig) (importance sampling),
• Out of equilibrium dynamics (Jarzynski)
(non-homogeneous Markov process),
• Adaptive methods (ABF, metadynamics)
(non-homogeneous and non-linear Markov process).
T. Leli `evre, ECODOQUI, November 2008 – p. 21
1 Free energy and metastability
(a) Thermodynamic integration. (b) Perturbation method.
(c) Out of equilibrium dynamics. (d) Adaptive dynamics.
2 Constrained dynamics
Examples of methods to compute free energy differences A(z2) − A(z1):
• Thermodynamic integration (Kirkwood)
(homogeneous Markov process),
• Perturbation methods (Zwanzig) (importance sampling),
• Out of equilibrium dynamics (Jarzynski)
(non-homogeneous Markov process),
• Adaptive methods (ABF, metadynamics)
(non-homogeneous and non-linear Markov process).
T. Leli `evre, ECODOQUI, November 2008 – p. 23
2.1 Thermodynamic integration
Thermodynamic integration is based on two remarks:
(1) The derivative A′(z) can be obtained by sampling the conditioned probability measure µΣz (Sprik, Ciccotti, Kapral, Vanden-Eijnden, E, den Otter, ...)
A′(z) = ZΣ−1
z
Z ∇V · ∇ξ
|∇ξ|2 − β−1div
∇ξ
|∇ξ|2
exp(−βV )|∇ξ|−1dσΣz,
= ZΣ−1
z
Z ∇ξ
|∇ξ|2 ·
∇V˜ + β−1H
exp(−βV˜ )dσΣz,
= Z
f dµΣz,
where V˜ = V + β−1 ln |∇ξ|, f = ∇V|∇ξ|·∇ξ2 − β−1div
∇ξ
|∇ξ|2
and H = −∇ ·
∇ξ
|∇ξ|
∇ξ
|∇ξ| is the mean curvature vector.
2.1 Thermodynamic integration
Proof: (based on the co-area formula)
Z Z
exp(−βV˜ )dσΣz ′
φ(z) dz = −
Z Z
exp(−βV˜ )dσΣzφ′ dz,
= −
Z Z
exp(−βV˜ )φ′ ◦ ξ dσΣz dz,
= −
Z
exp(−βV˜ )φ′ ◦ ξ|∇ξ|dx,
= −
Z
exp(−βV˜ )∇(φ ◦ ξ) · ∇ξ
|∇ξ|2 |∇ξ|dx,
=
Z
∇ ·
exp(−βV˜ ) ∇ξ
|∇ξ|
φ ◦ ξ dx,
=
Z Z
−β ∇V˜ · ∇ξ
|∇ξ|2 + |∇ξ|−1∇ ·
∇ξ
|∇ξ|
!
exp(−βV˜)dσΣzφ(z) dz.
T. Leli `evre, ECODOQUI, November 2008 – p. 25
2.1 Thermodynamic integration
(2) It is possible to sample the conditioned probability measure µΣz = ZΣ−1
z exp(−βV˜ )dσΣz by considering the following constrained dynamics:
(RCD)
( dXt = −∇V˜(Xt) dt + p
2β−1dW t + ∇ξ(Xt)dΛt, dΛt such that ξ(Xt) = z.
Moreover, we have dΛt = dΛmt + dΛft, with
dΛmt = −p
2β−1 |∇ξ|∇ξ2 (Xt) · dWt and
dΛft =|∇ξ|∇ξ2 ·
∇V˜ + β−1H
(Xt) dt = f(Xt) dt so that
A′(z) = lim
T→∞
1 T
Z T 0
dΛt = lim
T→∞
1 T
Z T 0
dΛft.
2.1 Thermodynamic integration
The free energy profile is then obtained by thermodynamic integration:
A(z) − A(0) =
Z z 0
A′(z) dz ≃
XK i=0
ωiA′(zi).
T. Leli `evre, ECODOQUI, November 2008 – p. 27
2.1 Thermodynamic integration
The rigidly constrained dynamics can also be written:
(RCD) dXt = P (Xt)
−∇V˜ (Xt) dt + p
2β−1dW t
+ β−1H(Xt) dt,
where P(x) is the orthogonal projection operator:
P (x) = Id − n(x) ⊗ n(x),
with n the unit normal vector: n(x) = ∇ξ
|∇ξ|(x).
(RCD) can also be written using the Stratonovitch product: dXt = −P(Xt)∇V˜ (Xt) dt+ p
2β−1P (Xt)◦dW t. It is easy to check that ξ(Xt) = ξ(X0) = z for Xt
solution to (RCD).
2.1 Thermodynamic integration
Assume z = 0.
µΣ0 is the unique invariant measure with support in Σ0
for (RCD).
Proposition 1 Let Xt be the solution to (RCD) such that the law of X0 is µΣ0. Then, for all smooth
function φ and for all time t > 0,
E(φ(Xt)) = Z
φ(x)dµΣ0(x).
Proof: Introduce the infinitesimal generator and apply the divergence theorem on submanifolds :
∀φ ∈ C1(R3N, R3N),
Z
div Σ0(φ) dσΣ0 = − Z
H · φdσΣ0,
where tr .
T. Leli `evre, ECODOQUI, November 2008 – p. 29
2.1 Thermodynamic integration
Discretization: These two schemes are consistent with (RCD):
(S1)
( Xn+1 = Xn − ∇V˜ (Xn)∆t + p
2β−1∆W n + λn∇ξ(Xn+1),
with λn ∈ R such that ξ(Xn+1) = 0,
(S2)
( Xn+1 = Xn − ∇V˜ (Xn)∆t + p
2β−1∆W n + λn∇ξ(Xn),
with λn ∈ R such that ξ(Xn+1) = 0,
where ∆W n = W(n+1)∆t − Wn∆t. The constraint is
exactly satisfied (important for longtime computations).
The discretization of A′(0) = limT→∞ T1 R T
0 dΛt is:
Tlim→∞ lim
∆t→0
1 T
T /∆t
X
n=1
λn = A′(0).
2.1 Thermodynamic integration
In practice, the following variance reduction scheme may be used:
( Xn+1 = Xn − ∇V˜ (Xn)∆t+p
2β−1∆Wn + λ∇ξ(Xn+1),
with λ ∈ R such that ξ(Xn+1) = 0, ( X∗ = Xn − ∇V˜(Xn)∆t−p
2β−1∆Wn + λ∗∇ξ(X∗),
with λ∗ ∈ R such that ξ(X∗) = 0,
and λn = (λ + λ∗)/2.
The martingale part dΛmt (i.e. the most fluctuating part) of the Lagrange multiplier is removed.
T. Leli `evre, ECODOQUI, November 2008 – p. 31
2.1 Thermodynamic integration
An over-simplified illustration: in dimension 2,
V (x) = β2−1 |x|2 and ξ(x) = xa212 + xb222 − 1.
0 0.05 0.1 0.15 0.2 0.25 0.3 0.35
-3 -2 -1 0 1 2 3 4
mes_int mes_non_int dyn_int_proj_1 dyn_int_proj_2 dyn_non_int_proj_1 dyn_non_int_proj_2
Measures samples theoretically and numerically (as a
function of the angle θ), with β = 1, a = 2, b = 1, ∆t = 0.01, and 50 000 000 timesteps.
2.1 Thermodynamic integration
Computation of the mean force: β = 1, a = 2, b = 1. The exact value is: 0.9868348150. The numerical result
(with ∆t = 0.001, M = 50000) is: [0.940613 ; 1.03204].
The variance reduction method reduces the variance by a factor 100. The result (with ∆t = 0.001, M = 50000) is: [0.984019 ; 0.993421].
T. Leli `evre, ECODOQUI, November 2008 – p. 33
2.1 Thermodynamic integration
App. mean force as a function of ∆t and M = T /∆t:
1e-05
1e-04
0.001
0.01
0.1 1000
10000
100000
1e+06
1e+07 0.86
0.88 0.9 0.92 0.94 0.96 0.98 1
dt
M
A balance needs to be find between the discretization error (∆t → 0) and the convergence in the ergodic limit (T → ∞).
2.1 Thermodynamic integration
Using classical technics (Talay-Tubaro like proof), one can check that the ergodic measure µ∆tΣ
0 sampled by the Markov chain (Xn) is an approximation of order one of µΣ0: for all smooth functions g : Σ0 → R,
Z
Σ0
gdµ∆tΣ0 − Z
Σ0
gdµΣ0
≤ C∆t.
T. Leli `evre, ECODOQUI, November 2008 – p. 35
2.1 Thermodynamic integration
Remarks:
- There are many ways to constrain the dynamics
(GD). We chose one which is simple to discretize. We may also have used, for example (for z = 0)
dXηt = −∇V (Xηt ) dt − 1
2η∇(ξ2)(Xηt ) dt + p
2β−1dW t,
where the constraint is penalized. One can show that
limη→0 Xηt = Xt (in L∞t∈[0,T](L2ω)-norm) where Xt satisfies (RCD). Notice that we used V and not V˜ in the penalized dynamics.
2.1 Thermodynamic integration
The statistics associated with the dynamics where the constraints are rigidly imposed and the dynamics
where the constraints are softly imposed through
penalization are different: “a stiff spring 6= a rigid rod”
(van Kampen, Hinch,...).
T. Leli `evre, ECODOQUI, November 2008 – p. 37
2.1 Thermodynamic integration
- TI yields a way to compute R φ(x)dµ(x):
Z
φ(x)dµ(x) = Z−1 Z
φ(x)e−βV (x)dx,
= Z−1 Z
z
Z
Σz
φe−βV |∇ξ|−1dσΣz dz, (co-area formula)
= Z−1 Z
z
R
Σz φe−βV |∇ξ|−1dσΣz R
Σz e−βV |∇ξ|−1dσΣz Z
Σz
e−βV |∇ξ|−1dσΣz dz,
=
Z
z
e−βA(z) dz
−1 Z
z
Z
Σz
φdµΣz
e−βA(z) dz.
with Σz = {x, ξ(x) = z}, A(z) = −β−1 ln R
Σze−βV |∇ξ|−1dσΣz
and
µΣz = e−βV |∇ξ|−1dσΣz/ R
Σze−βV |∇ξ|−1dσΣz.
2.2 Constrained SDEs
- For a general SDE (with a non isotropic diffusion), the following diagram does not commute:
Pcont
Pdisc Projected discretized process
?
Discretized process Projected continuous process
Continuous process
Discretized projected continuous process
∆t
∆t
T. Leli `evre, ECODOQUI, November 2008 – p. 39
2.2 Constrained SDEs
We are interested in simulating a SDE:
dXt = b(Xt) dt + σ(Xt)dWt
subject to the constraint
q(Xt) = 0.
Xt ∈ Rn, b : Rn → Rn and σ : Rn → Rn×m (with σσT > 0),
W t is a m-dimensional standard Brownian process
with filtration Ft. The functions b, σ and q are supposed to be smooth.
In this section, q : Rn → R and we suppose that
∀x ∈ Σ = {x, q(x) = 0}, |∇q|(x) 6= 0.
(In the MD framework, q may be the reaction coordinate or some molecular constraints.)
2.2 Constrained SDEs: continuous level
As such, the problem is ill-posed. We want to find a
Ft-adapted process Y t such that:
( dXt = b(Xt) dt + σ(Xt)dWt + dY t, q(Xt) = 0,
where dY t = dAt + St dWt. Additional assumption:
dAt and St dW t are colinear to D(Xt)∇q(Xt) dt, where D(x) is a n × n symmetric positive matrix.
D(x)∇q(x) is the normal to Σ at point x and can be given by some additional assumptions on the
constraining term Y t (D’alembert’s principle for example).
T. Leli `evre, ECODOQUI, November 2008 – p. 41
2.2 Constrained SDEs: continuous level
By Itô’s calculus on the constraint q(Xt) = 0, one then obtains:
(SDE)
dXt = P (Xt) (b(Xt) dt + σ(Xt)dWt)
−1 2
∇2q : (P σσT PT ) D∇q
||∇q||2D
(Xt) dt,
where the projection operator P(x) is:
P (x) = Id − D(x)∇q(x) ⊗ ∇q(x)
||∇q(x)||2D(x) ,
and, for any Y ∈ Rn and any SDP matrix S,
||Y ||2S = (Y · SY ).