HAL Id: hal-01078958
https://hal.sorbonne-universite.fr/hal-01078958
Submitted on 16 Nov 2018
Computation of pair distribution functions and three-dimensional densities with a reduced variance principle
Daniel Borgis, Roland Assaraf, Benjamin Rotenberg, Rodolphe Vuilleumier
To cite this version:
Daniel Borgis, Roland Assaraf, Benjamin Rotenberg, Rodolphe Vuilleumier. Computation of pair distribution functions and three-dimensional densities with a reduced variance principle. Molecular Physics, Taylor & Francis, 2013, 111, pp.3486. ⟨hal-01078958⟩
Computation of pair distribution functions and three-dimensional densities with a reduced variance principle
Daniel Borgis¹, Roland Assaraf², Benjamin Rotenberg³, and Rodolphe Vuilleumier¹
¹ Pôle de Physico-Chimie Théorique, École Normale Supérieure, UMR 8640 CNRS-ENS-UPMC, 24 rue Lhomond, 75005 Paris, France
² CNRS, UPMC Univ. Paris 06, ESPCI, UMR 7616 LCT, 75005 Paris, France
³ CNRS, UPMC Univ. Paris 06, ESPCI, UMR 7195 PECSA, 75005 Paris, France
Abstract
No fancy statistical objects here: we go back to the computation of one of the most basic and fundamental quantities in the statistical mechanics of fluids, namely the pair distribution functions. Those functions are usually computed in molecular simulations by histogram techniques. We show here that they can be estimated using global information on the instantaneous forces acting on the particles, and that this leads to a reduced variance compared to the standard histogram estimators. The technique is extended successfully to the computation of three-dimensional solvent densities around tagged molecular solutes, quantities that are noisy and very slow to converge when histograms are used.
1 Introduction
What we all learned from our mentor is that, for a given statistical mechanics problem, it is important (and intellectually rewarding) to first derive the proper, irrefutable mathematical formulation, to understand the formulas as much as possible in mechanical terms, and, in the end, to be able to express the relevant formulas in computational forms that can be tackled with efficient and accurate simulation algorithms. Such a goal can hardly be reached without the help of a chalk and a blackboard, and without questioning every single line.¹ We hope that the present contribution is as much as possible in this vein, although it stays away from the complex problems in computational statistical mechanics that have attracted the community recently [1,2], in particular quantum-classical statistical mechanics [3,4], statistical mechanics with constraints [5–7], or phase-space exploration and characterization for complex systems using accelerated sampling methods [8, 9].
We go back here to the computation of the most elementary objects in liquid-state theories [10,11], namely the two-particle pair distribution functions in bulk liquids, or the three-dimensional one-particle liquid density around a tagged molecular object.
The evaluation of such quantities, as prescribed in simulation textbooks [12, 13], is usually done through histograms, or possibly improved histograms, that rely in any case on the representation of the Dirac δ-function on a discrete grid. With such a method, the variance of the results diverges when the grid spacing goes to zero. Elaborating on the work of Assaraf and collaborators [14, 15] on the evaluation of quantum electronic densities with zero-variance/zero-bias principles, we propose a different approach to the computation of pair distribution functions and one-particle densities. It leads to a much reduced variance of the results with respect to straight histogram counting and, furthermore, achieves a finite variance when the grid spacing tends to zero. As will be seen, smooth curves can be obtained even with very limited statistics.
There will be a fair amount of explicit, rigorous mathematical manipulation in this contribution, as befits a rigorous scientist who denies any physical (not to speak of chemical) intuition and thinks that Nature can only speak through well-defined mathematical objects and their relations. We must confess, however, that some points of the derivation will not have the degree of rigor that they deserve; in particular, we will omit some mathematical proofs of convergence and variance properties, in part because our degree of understanding and current formalization is not yet complete.
Nevertheless, all the basic principles will be set.
The paper is organized as follows. Section 2 is devoted to the computation of pair distribution functions, whereas Section 3 focuses on three-dimensional densities. Section 4 concludes.
¹ This contribution is dedicated to Prof. Giovanni Ciccotti, in honour of his 70th birthday.
2 Pair distribution functions
Let us consider a fluid mixture for which microscopic configurations are generated by computer simulations in the canonical ensemble. The pair distribution function (or radial distribution function, RDF) between molecules of type a and b is defined as
$$ g_{ab}(r) = \frac{1}{\mathcal{N}_{ab}\, 4\pi r^2} \left\langle \sum_{i=1}^{N_a} {\sum_{j=1}^{N_b}}' \delta(r - r_{ij}) \right\rangle, \tag{1} $$
where $\mathcal{N}_{ab} = (1 - \tfrac{1}{2}\delta_{ab})\, N_a N_b / V$, $\mathbf{r}_{ij} = \mathbf{r}_j - \mathbf{r}_i$, and $r_{ij} = |\mathbf{r}_{ij}|$. The prime in the second sum indicates that $i = j$ should be excluded in the case $a = b$. This function is generally computed through histograms with bins of finite width $\Delta r$, replacing the δ-function by $\frac{1}{\Delta r} h_{\Delta r}(r)$, where $h_{\Delta r}(r)$ is the characteristic function equal to 1 between $r$ and $r + \Delta r$ and 0 otherwise. The variance of this statistical estimator diverges as $\Delta r \to 0$, since the instantaneous density in each bin oscillates between 0 and $O(1/4\pi r^2 \Delta r)$.
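As a point of reference for what follows, the histogram estimator can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: a one-component fluid in a cubic periodic box of side `box`, the minimum-image convention, a sum over unique pairs $i < j$ normalised by the pair density $N(N-1)/2V$, and the exact shell volume in place of $4\pi r^2 \Delta r$. The function name `rdf_histogram` and its arguments are hypothetical.

```python
import numpy as np

def rdf_histogram(pos, box, dr, rmax):
    """Histogram estimate of g(r) for one configuration (one-component fluid).

    Assumptions of this sketch: cubic periodic box of side `box`,
    minimum-image convention, rmax <= box / 2.
    """
    n = len(pos)
    edges = np.arange(0.0, rmax + 0.5 * dr, dr)
    i, j = np.triu_indices(n, k=1)              # unique pairs i < j
    d = pos[j] - pos[i]
    d -= box * np.round(d / box)                # minimum image
    rij = np.linalg.norm(d, axis=1)
    counts, _ = np.histogram(rij, bins=edges)   # delta -> h_dr / dr
    shell = 4.0 * np.pi / 3.0 * (edges[1:] ** 3 - edges[:-1] ** 3)
    pair_density = 0.5 * n * (n - 1) / box ** 3
    return edges[:-1], counts / (pair_density * shell)
```

Each pair contributes to exactly one bin, so halving `dr` roughly halves the number of counts per bin, which is the origin of the $1/\Delta r$ growth of the variance.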
Accounting for the rotational invariance in the bulk fluid, the pair distribution can be also expressed as
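$$ g_{ab}(r) = \frac{1}{\mathcal{N}_{ab}\, 4\pi} \int d\Omega \left\langle \sum_{i=1}^{N_a} {\sum_{j=1}^{N_b}}' \delta(\mathbf{r} - \mathbf{r}_{ij}) \right\rangle, \tag{2} $$
with $\boldsymbol{\Omega} = \mathbf{r}/r$. Following the ideas of Assaraf et al. for electron densities [14, 15], use can be made of the Poisson identity to replace the three-dimensional delta function,
$$ \delta(\mathbf{r} - \mathbf{r}_{ij}) = -\frac{1}{4\pi}\, \Delta_{\mathbf{r}_i} \frac{1}{|\mathbf{r} - \mathbf{r}_{ij}|}. \tag{3} $$
Insertion of the Laplacian with respect to either $\mathbf{r}_i$ or $\mathbf{r}_j$ in the canonical average and integration by parts yields (after symmetrization)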
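$$ h_{ab}(r) = -\frac{\beta}{\mathcal{N}_{ab}\, 4\pi} \int d\Omega \left\langle \sum_{i=1}^{N_a} {\sum_{j=1}^{N_b}}' \frac{\mathbf{r}_{ij} - \mathbf{r}}{|\mathbf{r}_{ij} - \mathbf{r}|^3} \cdot \frac{1}{2} (\mathbf{F}_j - \mathbf{F}_i) \right\rangle, \tag{4} $$
where $h_{ab} = g_{ab} - 1$.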
Using the Gauss theorem for the electric field created by a uniformly charged sphere of radius $r$ at the location $\mathbf{r}_{ij}$,
$$ \int d\Omega\, \frac{\mathbf{r}_{ij} - r\boldsymbol{\Omega}}{|\mathbf{r}_{ij} - r\boldsymbol{\Omega}|^3} = \frac{\mathbf{r}_{ij}}{r_{ij}^3}\, H(r_{ij} - r), \tag{5} $$
with $H$ the Heaviside function, we get from eqn 4
$$ h_{ab}(r) = -\frac{\beta}{\mathcal{N}_{ab}\, 4\pi} \left\langle \sum_{i=1}^{N_a} {\sum_{j=1}^{N_b}}' \frac{1}{2} (\mathbf{F}_j - \mathbf{F}_i) \cdot \frac{\mathbf{r}_{ij}}{r_{ij}^3}\, H(r_{ij} - r) \right\rangle. \tag{6} $$
This is a key formula of this work. Compared to the standard histogram procedure, it now involves the forces acting on the particles in addition to their positions. It also implies a quite different numerical procedure: here, for each configuration, every particle pair contributes to all distances $r < r_{ij}$ instead of just to $r = r_{ij}$. Furthermore, application of the formula requires a predefined grid but does not require taking the limit of infinitely small grid spacing, $\Delta r \to 0$. The only requirement is that the chosen grid, not necessarily regular, is fine enough to resolve the variations of the pair distribution at all distances.
The new procedure is illustrated in Figure 1 for the radial distribution function, $g(r)$, of a pure Lennard-Jones fluid composed of 864 particles at a reduced density $\rho^* = 0.8$ and reduced temperature $T^* = 1.35$, computed by molecular dynamics simulation.
We display the RDF computed from a single equilibrated configuration using either histograms or eqn 6, and we compare those "instantaneous" curves to the converged result after 10000 time steps. In both approaches we used the same regular grid with $\Delta r = 0.005\sigma$. It can be seen that the curve obtained by eqn 6 is already very smooth and quite close to the final converged result. The histogram curve does contain the converged one within its fluctuations but appears very noisy. This is further illustrated in Fig. 2, where we plot the variances
$$ v(r) = \frac{1}{T} \sum_t g_t(r)^2 - \left( \frac{1}{T} \sum_t g_t(r) \right)^2 \tag{7} $$
obtained after $T = 1000$ simulation steps; $g_t(r)$ is the "instantaneous" pair distribution function measured at step number $t$. It can be verified that the variance measured with the "force" approach is indeed much reduced with respect to the histogram approach and appears independent of the chosen grid size; the histogram method leads to a variance that is inversely proportional to $\Delta r$.
Figure 1: Radial distribution function, $g(r)$ versus $r$ (σ), obtained for a single equilibrated configuration of a Lennard-Jones liquid composed of 864 particles using either the force approach, eqn 6, or the standard histogram technique, with a grid spacing $\Delta r = 0.005\sigma$. The dashed blue line indicates the converged result after 10000 simulation steps.
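The variance of eqn 7 is straightforward to accumulate from the per-step curves. The sketch below assumes that the instantaneous estimates $g_t(r)$ have been stored as the rows of an array of shape (T, number of grid points); the function name `rdf_variance` is hypothetical.

```python
import numpy as np

def rdf_variance(g_t):
    """Per-grid-point variance of instantaneous RDF estimates (eqn 7).

    g_t: array-like of shape (T, n_r), one row per simulation step.
    """
    g_t = np.asarray(g_t, dtype=float)
    mean = g_t.mean(axis=0)              # (1/T) sum_t g_t(r)
    mean_sq = (g_t ** 2).mean(axis=0)    # (1/T) sum_t g_t(r)^2
    return mean_sq - mean ** 2
```

Applying the same function to the histogram and to the force-based curves, computed on the same grid, gives the kind of comparison shown in Fig. 2.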
Fig. 3 displays the oxygen-oxygen radial distribution function obtained from 100 configurations of a DFT-MD simulation of 128 water molecules at ambient thermodynamic conditions with the BLYP functional, after preliminary equilibration. It can be seen that, even for the relatively fine grid chosen and with a very limited number of steps, the "force RDF" is very smooth. It should also be noted that, even if the overall agreement with the converged RDF is already very good after such a short trajectory, one observes slight discrepancies, in particular in the height of the first peak. At this stage, the force method is able to reduce the variance of the RDF, but does not correct for the bias in the statistics. Looking for an improved method with minimum variance and minimum bias is still another matter. Nonetheless, Fig. 3 is meant to show that it is certainly in the field of ab-initio studies, where the generation of the trajectories themselves is computationally very expensive, that the force method described here to compute RDFs reveals its full potential, given that the forces on the nuclei at each time step are readily available from the simulations.
Figure 2: Variance $v(r)$ versus $r$ (σ) of the radial distribution function depicted in Figure 1 after 1000 simulation steps, using either the force approach, eqn 6, or the standard histogram technique, with a grid spacing $\Delta r = 0.01\sigma$ (black and blue lines, respectively) or $\Delta r = 0.005\sigma$ (cyan and red lines, respectively).
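In order to gain some physical insight into the above formula, one can write
$$ H(r_{ij} - r) = \int_r^{\infty} dr'\, \delta(r' - r_{ij}). \tag{8} $$
Substituting into eqn 6 and exchanging the integral and the canonical average, we get
$$ \rho_b\, h_{ab}(r) = -\beta \int_r^{\infty} dr'\, F(r'), \tag{9} $$
with the mean force density defined by
$$ F(r) = \frac{\rho_b}{\mathcal{N}_{ab}\, 4\pi r^2} \left\langle \sum_{i=1}^{N_a} {\sum_{j=1}^{N_b}}' \frac{1}{2} (\mathbf{F}_j - \mathbf{F}_i) \cdot \frac{\mathbf{r}_{ij}}{r_{ij}}\, \delta(r - r_{ij}) \right\rangle. \tag{10} $$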
Figure 3: Oxygen-oxygen radial distribution function, $g(r)$ versus $r$ (Å), averaged over 100 configurations extracted from a DFT-MD trajectory with 128 water molecules at ambient liquid conditions. The dashed blue line indicates the converged result obtained by averaging over 38600 configurations.
We are back to a histogram procedure, but for $F(r)$ instead of $g_{ab}(r)$ directly.
Denoting by $\bar{F}(r)$ the constrained (or conditional) mean force at a given distance, $\bar{F}(r) = F(r) / [\rho_b\, g_{ab}(r)]$, we get by differentiation of eqn 9 and division by $g_{ab}(r)$:
$$ \frac{1}{g_{ab}(r)} \frac{dg_{ab}}{dr} = \beta \bar{F}(r) = -\beta \frac{dw_{ab}(r)}{dr}, \tag{11} $$
which is the definition of the potential of mean force (PMF), $g_{ab}(r) = \exp(-\beta w_{ab}(r))$.
A fundamental difference of the present approach with respect to standard PMF calculations is the use of the force density $F(r)$ instead of the mean force $\bar{F}(r)$. Eqn 9 is thus reminiscent of the usual PMF formula but not equivalent to it in practice: here, the integration of the force density $F(r)$ yields $g_{ab}(r)$ directly rather than its logarithm, and this quantity is computed in its entirety during the simulation, rather than step by step using constraints or restraints on the $a$-$b$ distance as in usual PMF calculations.
Note also that eqs 6 and 9-10 are rigorously equivalent only in the limit of infinitely small grid size.
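The two-step route of eqs 9-10 (histogram the force density during the run, integrate once at the end) might be sketched as follows. As before, this assumes a one-component fluid in a cubic periodic box, minimum-image pairs, unique pairs normalised by $N(N-1)/2V$, and $h \approx 0$ beyond `rmax`; the integral is approximated by a plain sum over bins, with $1/r_{ij}^2$ replaced by its value at the bin centre, which is why the result agrees with eqn 6 only for small bins. All function and variable names are hypothetical.

```python
import numpy as np

def pair_force_histogram(pos, forces, box, edges):
    """Histogram of (1/2)(F_j - F_i) . r_ij / r_ij over pair distances."""
    i, j = np.triu_indices(len(pos), k=1)
    d = pos[j] - pos[i]
    d -= box * np.round(d / box)                # minimum image
    rij = np.linalg.norm(d, axis=1)
    radial = 0.5 * np.einsum("pk,pk->p", forces[j] - forces[i], d) / rij
    hist, _ = np.histogram(rij, bins=edges, weights=radial)
    return hist

def rdf_from_force_density(frames, box, beta, dr, rmax):
    """Accumulate F(r) (eqn 10) over frames, then integrate it (eqn 9)."""
    frames = list(frames)                       # sequence of (pos, forces)
    n = len(frames[0][0])
    edges = np.arange(0.0, rmax + 0.5 * dr, dr)
    centers = 0.5 * (edges[1:] + edges[:-1])
    acc = np.zeros(len(centers))
    for pos, forces in frames:
        acc += pair_force_histogram(pos, forces, box, edges)
    rho_b = n / box ** 3
    pair_density = 0.5 * n * (n - 1) / box ** 3
    # mean force density F(r) at the bin centres
    F = rho_b * acc / (len(frames) * pair_density
                       * 4.0 * np.pi * centers ** 2 * dr)
    # rho_b h(r) = -beta * integral from r to rmax of F(r') dr'
    tail = np.cumsum((F * dr)[::-1])[::-1]
    g = 1.0 - beta * tail / rho_b
    return edges[:-1], g, centers, F
```

The returned `g` is defined on the left bin edges, so that each value includes the contributions of all bins at larger distances.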
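We note in passing that after eqn 4, a second integration by parts can be performed, yielding
$$ \rho_b\, h_{ab}(r) = \frac{\beta}{\mathcal{N}_{ab}\, 4\pi} \int d\Omega \left\langle \sum_{i=1}^{N_a} {\sum_{j=1}^{N_b}}' \frac{1}{2} (\Phi_i + \Phi_j)\, \frac{1}{|\mathbf{r} - \mathbf{r}_{ij}|} \right\rangle \tag{12} $$
with $\Phi_i = \beta \mathbf{F}_i^2 - \Delta_{\mathbf{r}_i} U$.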
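Using now the Gauss theorem for the electrostatic potential created by a uniformly charged sphere of radius $r$ at a location $\mathbf{r}_{ij}$,
$$ \int d\Omega\, \frac{1}{|\mathbf{r}_{ij} - r\boldsymbol{\Omega}|} = F(r, r_{ij}) = \min\!\left( \frac{1}{r}, \frac{1}{r_{ij}} \right), \tag{13} $$
eqn 12 can also be rewritten as
$$ \rho_b\, h_{ab}(r) = \frac{\beta}{\mathcal{N}_{ab}\, 4\pi} \left\langle \sum_{i=1}^{N_a} {\sum_{j=1}^{N_b}}' \frac{1}{2} (\Phi_i + \Phi_j)\, F(r, r_{ij}) \right\rangle. \tag{14} $$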
As for eqn 6, this formula has to be evaluated at all points $r$ of a predefined grid by summation over all particle pairs for each microscopic configuration. The key quantities to be evaluated for each particle are now the $\Phi_i$'s, which are scalar quantities combining β times the squared force on the particle with the Laplacian (second derivatives) of the potential with respect to the particle position.
In analogy to eqs 9-10, one can also define the continuous function $\Phi(r)$,
$$ \Phi(r) = \frac{\rho_b}{\mathcal{N}_{ab}\, 4\pi r^2} \left\langle \sum_{i=1}^{N_a} {\sum_{j=1}^{N_b}}' \frac{1}{2} (\Phi_j + \Phi_i)\, \delta(r - r_{ij}) \right\rangle, \tag{15} $$
that can be evaluated through standard histograms for a given simulation. Once this function has been computed, it is easy to show that the radial distribution can be extracted at the end through the one-dimensional integral relation
$$ \rho_b\, h_{ab}(r) = \beta \int_r^{\infty} dr'\, r' \left( 1 - \frac{r'}{r} \right) \Phi(r'). \tag{16} $$
This second formula appears much less familiar than the mean force formula of eqn 10. We found its application, as well as that of the direct formula 14, to be very unstable and thus quite disappointing. We attribute this to the large fluctuations of the $\Phi_i$'s, which appear as the difference of two large numbers. We do not have a clear and final understanding of this fact yet. For the time being, we limit ourselves to the force formulation, eqs 6 and 9, for which the reduced variance properties were shown to work quite well.
3 Three-dimensional densities
The related problem that we address now is the computation of the three-dimensional density of a molecular solvent around a fixed molecule of arbitrary shape located at the origin using molecular simulations such as Monte-Carlo or Molecular Dynamics.
This density is defined by
$$ \rho(\mathbf{r}) = \left\langle \sum_{i=1}^{N} \delta(\mathbf{r} - \mathbf{r}_i) \right\rangle. \tag{17} $$
This quantity is usually computed by discretizing space into a 3D grid and accumulating 3D histograms with voxels of volume $\Delta V$ in the course of the simulation. This statistical process is indeed of infinite variance as $\Delta V \to 0$, since the instantaneous density in each voxel oscillates between 0 and $O(1/\Delta V)$. The exploration of the three-dimensional volume requires much more statistics than that required for radial functions, and it is indeed known that three-dimensional densities are hard to obtain accurately that way. To bypass that problem, the Dirac function in eqn 17 may again be transformed according to the Poisson identity of eqn 3. Substitution in the average above and integration by parts with respect to the Boltzmann weight gives
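$$ \Delta\rho(\mathbf{r}) = \rho(\mathbf{r}) - \rho_0 = -\frac{\beta}{4\pi} \left\langle \sum_{i=1}^{N} \frac{\mathbf{r}_i - \mathbf{r}}{|\mathbf{r}_i - \mathbf{r}|^3} \cdot \mathbf{F}_i \right\rangle, \tag{18} $$
where $\mathbf{F}_i = -\nabla_{\mathbf{r}_i} U$ is the force on molecule $i$ and $U$ is the total interaction potential.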
A second integration by parts yields
$$ \Delta\rho(\mathbf{r}) = -\frac{\beta}{4\pi} \left\langle \sum_{i=1}^{N} \frac{\Phi_i}{|\mathbf{r} - \mathbf{r}_i|} \right\rangle \tag{19} $$
with $\Phi_i = \beta \mathbf{F}_i^2 - \Delta_{\mathbf{r}_i} U$. Those two formulas have a different content than eqn 17, since every particle now contributes instantaneously to every grid point in space, even those at long distances, and not only to the closest ones.
The straightforward application of those formulas is time consuming, however, with a cost of order $N \times N_g$ for each solvent configuration, where $N_g$ is the number of grid points. This might not be a limitation in ab-initio molecular dynamics, where the cost of a time step is already very high. In molecular mechanics, however, this cost should be compared to $N^2$, or even $N \ln N$ for advanced algorithms. A compromise has therefore to be found.
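To make the $N \times N_g$ cost concrete, the direct evaluation of the force formula for a single configuration can be sketched as below. The sketch uses the form that follows from the Poisson identity of eqn 3, namely $-(\beta/4\pi) \sum_i (\mathbf{r}_i - \mathbf{r}) \cdot \mathbf{F}_i / |\mathbf{r}_i - \mathbf{r}|^3$, and assumes open boundary conditions (no periodic images) and grid points that never coincide with a particle position. The function name `density_correction_direct` is hypothetical; averaging its output over configurations gives the estimate of $\Delta\rho(\mathbf{r})$.

```python
import numpy as np

def density_correction_direct(grid_points, pos, forces, beta):
    """One-configuration contribution to Delta rho(r) on a set of points.

    grid_points: (n_grid, 3); pos, forces: (n_particles, 3).
    The intermediate array has n_grid * n_particles rows, which is the
    N x N_g cost per configuration mentioned in the text.
    """
    sep = grid_points[:, None, :] - pos[None, :, :]      # r - r_i
    dist3 = np.linalg.norm(sep, axis=2) ** 3
    proj = np.einsum("gik,ik->gi", sep, forces)          # (r - r_i) . F_i
    # -(beta/4pi) (r_i - r).F_i / |r_i - r|^3, summed over particles
    return beta / (4.0 * np.pi) * np.sum(proj / dist3, axis=1)
```

For large grids the intermediate array would have to be processed in chunks; the operation count remains proportional to $N \times N_g$ either way.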
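Inspired by the developments of the previous section, we can introduce a delta function in $\mathbf{r}'$ into eqn (18),
$$ \Delta\rho(\mathbf{r}) = -\frac{\beta}{4\pi} \left\langle \sum_{i=1}^{N} \int d\mathbf{r}'\, \delta(\mathbf{r}' - \mathbf{r}_i)\, \frac{\mathbf{r}' - \mathbf{r}}{|\mathbf{r}' - \mathbf{r}|^3} \cdot \mathbf{F}_i \right\rangle, \tag{20} $$
and with the definition of the force density at point $\mathbf{r}'$,
$$ \mathbf{F}(\mathbf{r}') = \left\langle \sum_{i=1}^{N} \delta(\mathbf{r}' - \mathbf{r}_i)\, \mathbf{F}_i \right\rangle, \tag{21} $$
one gets
$$ \Delta\rho(\mathbf{r}) = -\frac{\beta}{4\pi} \int d\mathbf{r}'\, \frac{\mathbf{r}' - \mathbf{r}}{|\mathbf{r}' - \mathbf{r}|^3} \cdot \mathbf{F}(\mathbf{r}'). \tag{22} $$
In practice, $\mathbf{F}(\mathbf{r}')$ can be computed by histograms during a simulation and at the end Fourier-transformed to $\mathbf{F}(\mathbf{k})$ using a discrete three-dimensional fast Fourier transform (FFT). $\Delta\rho(\mathbf{r})$, and hence $\rho(\mathbf{r})$, is then obtained by inverse FFT of
$$ \Delta\rho(\mathbf{k}) = \frac{i\beta}{k^2}\, \mathbf{k} \cdot \mathbf{F}(\mathbf{k}). \tag{23} $$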
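Defining in the same way
$$ \Phi(\mathbf{r}') = \left\langle \sum_{i=1}^{N} \delta(\mathbf{r}' - \mathbf{r}_i)\, (\beta \mathbf{F}_i^2 - \Delta_{\mathbf{r}_i} U) \right\rangle, \tag{24} $$
one has equivalently
$$ \Delta\rho(\mathbf{r}) = -\frac{\beta}{4\pi} \int d\mathbf{r}'\, \frac{\Phi(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \tag{25} $$
or
$$ \Delta\rho(\mathbf{k}) = -\beta\, \frac{\Phi(\mathbf{k})}{k^2}. \tag{26} $$
Noting that in fact $\Phi(\mathbf{r}) = \nabla_{\mathbf{r}} \cdot \mathbf{F}(\mathbf{r})$, or in k-space, $\Phi(\mathbf{k}) = -i\mathbf{k} \cdot \mathbf{F}(\mathbf{k})$, the above equations involving $\mathbf{F}$ or $\Phi$ are indeed equivalent. Eqn (26) involves the direct analytical calculation of the divergence of the mean force whereas eqn (23) involves its numerical estimation by differentiation in k-space.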
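The histogram-plus-FFT route can be sketched with NumPy as follows. The sketch assumes a cubic periodic box with `n` grid points per side and fixes the undetermined $\mathbf{k} = 0$ component by taking `rho0` as the mean density of the box. It implements eqn 26 with $\Phi = \nabla \cdot \mathbf{F}$ evaluated in k-space; because `numpy.fft` uses $e^{-i\mathbf{k}\cdot\mathbf{r}}$ in the forward transform, the divergence becomes $+i\mathbf{k} \cdot \mathbf{F}(\mathbf{k})$, so the sign of the $i\mathbf{k} \cdot \mathbf{F}$ term is opposite to the one written in eqn 23, which corresponds to the other Fourier convention. The function names are hypothetical.

```python
import numpy as np

def bin_force_density(pos, forces, box, n):
    """Histogram the force density F(r) of one configuration on an n^3 grid."""
    dv = (box / n) ** 3
    F = np.empty((3, n, n, n))
    for c in range(3):
        F[c], _ = np.histogramdd(pos % box, bins=(n, n, n),
                                 range=[(0.0, box)] * 3,
                                 weights=forces[:, c])
    return F / dv

def density_from_force_density(F, box, beta, rho0):
    """Solve laplacian(Delta rho) = beta div F on a periodic grid by FFT."""
    n = F.shape[1]
    k1 = 2.0 * np.pi * np.fft.fftfreq(n, d=box / n)
    kx, ky, kz = np.meshgrid(k1, k1, k1, indexing="ij")
    k2 = kx ** 2 + ky ** 2 + kz ** 2
    k2[0, 0, 0] = 1.0                        # avoid 0/0; mode fixed below
    Fk = np.fft.fftn(F, axes=(1, 2, 3))
    k_dot_F = kx * Fk[0] + ky * Fk[1] + kz * Fk[2]
    # numpy convention: div F -> +i k.F, hence the minus sign here
    drho_k = -1j * beta * k_dot_F / k2
    drho_k[0, 0, 0] = 0.0                    # assumed: Delta rho averages to 0
    return rho0 + np.fft.ifftn(drho_k).real
```

In use, `bin_force_density` would be averaged over the stored configurations before a single call to `density_from_force_density`, so the FFT cost is paid only once at the end of the run.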
As an illustration of this formalism, we display in Fig. 4 the three-dimensional
water density around a given water molecule computed from the DFT-MD simulation
with 128 water molecules already described in Fig. 3. Here the whole trajectory was
exploited, i.e. 38600 stored configurations separated by 5 fs. The simulation was
[Figure panels: ρ(z,0,0)/ρ0, ρ(0,z,0)/ρ0 and ρ(0,0,z)/ρ0 versus z (Å)]