Measuring the Irreversibility of Numerical Schemes for Reversible Stochastic Differential Equations


DOI:10.1051/m2an/2013142 www.esaim-m2an.org

MEASURING THE IRREVERSIBILITY OF NUMERICAL SCHEMES FOR REVERSIBLE STOCHASTIC DIFFERENTIAL EQUATIONS∗

Markos Katsoulakis¹, Yannis Pantazis¹ and Luc Rey-Bellet¹

Abstract. For a stationary Markov process the detailed balance condition is equivalent to the time-reversibility of the process. For stochastic differential equations (SDE’s), the time discretization of numerical schemes usually destroys the time-reversibility property. Despite an extensive literature on the numerical analysis for SDE’s, their stability properties, strong and/or weak error estimates, large deviations and infinite-time estimates, no quantitative results are known on the lack of reversibility of discrete-time approximation processes. In this paper we provide such quantitative estimates by using the concept of entropy production rate, inspired by ideas from non-equilibrium statistical mechanics.

The entropy production rate for a stochastic process is defined as the relative entropy (per unit time) of the path measure of the process with respect to the path measure of the time-reversed process. By construction the entropy production rate is nonnegative and it vanishes if and only if the process is reversible. Crucially, from a numerical point of view, the entropy production rate is an a posteriori quantity, hence it can be computed in the course of a simulation as the ergodic average of a certain functional of the process (the so-called Gallavotti−Cohen (GC) action functional). We compute the entropy production for various numerical schemes such as the explicit Euler−Maruyama and explicit Milstein schemes for reversible SDEs with additive or multiplicative noise. In addition we analyze the entropy production for the BBK integrator for the Langevin equation. The order (in the time-discretization step Δt) of the entropy production rate provides a tool to classify numerical schemes in terms of their (discretization-induced) irreversibility. Our results show that the type of the noise critically affects the behavior of the entropy production rate. As a striking example of our results we show that the Euler scheme for multiplicative noise is not an adequate scheme from a reversibility point of view since its entropy production rate does not decrease with Δt.

Mathematics Subject Classification. 65C30, 82C3, 60H10.

Received July 23, 2012. Revised September 1, 2013.

Published online August 13, 2014.

1. Introduction

In molecular dynamics algorithms arising in the simulation of systems in materials science, chemical engineering, evolutionary games, computational statistical mechanics, etc., the steady-state statistics obtained from numerical simulations are of great importance [8,27,33]. For instance, the free energy of the system or free energy differences, as well as dynamical transitions between metastable states, are quantities which are sampled in the stationary regime. At the microscopic level, physical processes are often modeled as interactions between particles described by a system of stochastic differential equations (SDE’s) [8,15]. To perform steady-state simulations for the sampling of desirable observables, the solution of the system of SDE’s must possess a (unique) ergodic invariant measure. The uniqueness of the invariant measure follows from the ellipticity or hypoellipticity of the generator of the process together with irreducibility, which means that the process can reach at some positive time any open set of the state space with positive probability [21,25]. Under such conditions, the point distribution of the process converges to the invariant measure (ergodicity) while the process is stationary, i.e., the distribution of the path of the process is invariant under time-shift, when the initial data are drawn from the invariant measure. Moreover, many processes of physical origin, such as diffusion and adsorption/desorption of interacting particles, satisfy the condition of detailed balance (DB), or equivalently, time-reversibility, i.e., the distribution of the path of the process is invariant under time-reversal. It is easy to see that time-reversibility implies stationarity, but it is a strictly stronger condition in general. The condition of detailed balance often arises from a gradient-like behavior of the dynamics or from Hamiltonian dynamics if the time-reversal includes reversal of the velocities.

Keywords and phrases. Stochastic differential equations, detailed balance, reversibility, relative entropy, entropy production, numerical integration, (overdamped) Langevin process.

∗ We thank Natesh Pillai for useful comments and suggestions. M.A.K. and Y.P. are partially supported by NSF-CMMI 0835673 and L.R.-B. is partially supported by NSF-DMS-1109316.

¹ Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA, USA. luc@math.umass.edu

Article published by EDP Sciences. © EDP Sciences, SMAI 2014

However, the numerical simulation of SDE’s necessitates the use of numerical discretization schemes. Discretization procedures, except in very special cases, result in the destruction of the DB condition. This affects the approximation process in at least two ways. First, the invariant measure of the approximation process, if it exists at all, is not known explicitly and, second, the time reversibility of the process is lost. Several recent results prove the existence of the invariant measure for the discrete-time approximation and provide error estimates [2,3,19,20] but, to the best of our knowledge, there is no quantitative assessment of the irreversibility of the approximation process. Of course there exist Metropolized numerical schemes such as MALA [26] and variations thereof [4,15] which do satisfy the DB condition, but they are numerically more expensive, especially in high-dimensional systems, as they require an accept/reject step. Thus, a quantitative understanding of the lack of reversibility for simpler discretization schemes can provide new insights for selecting which schemes are closer to satisfying the DB condition.

The implications of irreversibility are only partially understood, both from the physical and the mathematical point of view. These issues have emerged as a main theme in non-equilibrium statistical mechanics. It is well-known that irreversibility introduces a stationary current (net flow) to the system [11,18,23,28], but it is still under investigation in the statistical mechanics community how this current affects the long-time properties (i.e., the dynamics and large deviations) of the process such as exit times, correlation times and phase transitions of metastable states. Nevertheless, a straightforward implication of irreversibility is the breaking of the symmetry of fundamental quantities and observables: for example, if a process is reversible the autocorrelation function R(s, t) of a given observable is symmetric, i.e., R(s, t) = R(t, s). Reversibility is a natural and fundamental property of physical systems and thus, if numerical approximation results in the destruction of reversibility, one should carefully quantify the irreversibility of the approximation process. More generally, structure preserving numerical schemes have proved to be vastly superior (in stability) for deterministic ODE’s, e.g., Störmer–Verlet for Hamiltonian systems [10], as well as for PDE’s, e.g., hyperbolic conservation laws and infinite dimensional Hamiltonian systems, and there is no reason to believe it should be different in the numerical analysis of stochastic processes. From a physical point of view the reason for detailed balance is the time-reversibility of an underlying microscopic dynamics over which an (effective) stochastic model is presumably built. Hence, violating such a condition weakens the physical interpretation of the stochastic model. The detailed balance condition has many implications (for example, the Kubo formula for the linear response and the Onsager relations for the response coefficients, which are related to the symmetry of the autocorrelation functions) and as such it should be preserved as much as possible in computational algorithms. In this paper we quantify irreversibility in numerical schemes using the entropy production rate, which allows for a quantitative analysis of the path space statistics, including the long-time, stationary time-series regime. The entropy production rate, defined as the relative entropy (per unit time) of the path measure of the process with respect to the path measure of the reversed process, is widely used in statistical mechanics for the study of non-equilibrium steady states of irreversible systems [7,11,14,18].


A fundamental result on the structure of non-equilibrium steady states is the Gallavotti−Cohen ﬂuctuation theorem that describes the ﬂuctuations (of large deviations type) of the entropy production [7,11,14,18] and this result can be viewed as a generalization of the Kubo-formula and Onsager relations far from equilibrium.

Specifically for diffusion processes, entropy production governs the long time statistics of the empirical measure (also known as occupation statistics) as well as the current statistics through the Donsker–Varadhan rate functional [16,17]. For our purposes, it is important to note that the entropy production rate is zero when the process is reversible and positive otherwise, making the entropy production rate a sensible quantitative measure of irreversibility. Furthermore, if we assume ergodicity of the approximation process, the entropy production rate equals the time-average of the Gallavotti−Cohen (GC) action functional, which is defined as the logarithm of the Radon-Nikodym derivative between the path measure of the process and the path measure of the reversed process. A key observation of this paper is that the GC action functional is an a posteriori quantity; hence, it is easily computable during the simulation, making the numerical computation of the entropy production rate tractable. We show that entropy production is a computable observable that distinguishes between different numerical schemes in terms of their discretization-induced irreversibility and as such could allow us to adjust the discretization in the course of the simulation.

We use entropy production to assess the irreversibility of various numerical schemes for reversible continuous-time processes. A simple class of reversible processes, yet of great interest, is the overdamped Langevin process with gradient-type drift [8,9,15]. The discretization of the process is performed using the explicit Euler−Maruyama (EM) scheme and we distinguish between two different cases depending on the kind of the noise. In the case of additive noise, under the assumption of ergodicity of the approximation process [2,3,19,20], we prove that the entropy production rate is of order O(Δt²), where Δt is the time step of the numerical scheme. In the case of multiplicative noise, the results are strikingly different. Indeed, under the ergodicity assumption, the entropy production rate for the explicit EM scheme is proved to have a positive lower bound which is independent of Δt. Thus irreversibility is not reduced by decreasing Δt, even though the approximation process converges to the continuous-time process. The different behavior of entropy production depending on the kind of noise is one of the prominent findings of this paper. As a further step in our study, we analyze the explicit Milstein scheme with multiplicative noise (the next higher-order numerical scheme). We prove that the entropy production rate of the Milstein scheme decreases with the time step, with order O(Δt).

Finally, we compute both analytically and numerically the entropy production rate for a discretization scheme for Langevin systems, which form another important and widely-used class of reversible models [8,15]. The Langevin equation is time-reversible if, in addition to reversing time, one reverses the sign of the velocity of all particles. The noise is degenerate but the process is hypo-elliptic and under mild conditions the Langevin equation is ergodic [20,24,31]. Our discretization scheme is a quasi-symplectic splitting scheme also known as the BBK integrator [5,15]. We rigorously prove, under the assumption that the approximation process is ergodic, that the entropy production rate of the numerical scheme for the Langevin process with additive noise is of order O(Δt); hence, in terms of irreversibility it is an acceptable integration scheme.

The remainder of the paper is organized in four sections. In Section 2 we recall some basic facts about reversible processes and define the entropy production. Moreover, we give the basic assumptions necessary for our proofs, namely, the ergodicity of both the continuous-time process and the discrete-time approximation process. In Section 3 we compute the entropy production rate for reversible overdamped Langevin processes. The section is split into three subsections covering additive and multiplicative noise for the Euler scheme and multiplicative noise for the Milstein scheme. In Section 4 we compute the entropy production rate for the reversible (up to momenta flip) Langevin process using the BBK integrator. Conclusions and future extensions of the current work are summarized in the final section.

2. Reversibility, Gallavotti−Cohen action functional, and entropy production

Let us consider a d-dimensional system of SDE’s written as

$$
dX_t = a(X_t)\,dt + b(X_t)\,dB_t \tag{2.1}
$$

where X_t ∈ R^d is a diffusion Markov process, a : R^d → R^d is the drift vector, b : R^d → R^{d×m} is the diffusion matrix, and B_t ∈ R^m is a standard m-dimensional Brownian motion. We will always assume that a and b are sufficiently smooth and satisfy suitable growth conditions and/or dissipativity conditions at infinity to ensure the existence of global solutions. The generator of the diffusion process is defined by

$$
Lf = \sum_{i=1}^{d} a_i \frac{\partial f}{\partial x_i} + \frac{1}{2}\sum_{i,j=1}^{d} (bb^T)_{i,j}\,\frac{\partial^2 f}{\partial x_i\,\partial x_j} \tag{2.2}
$$

for smooth test functions f. We assume that the process X_t has a (unique) invariant measure μ(dx), and that it satisfies the Detailed Balance (DB) condition, i.e., its generator is symmetric in the Hilbert space L²(μ):

$$
\langle Lf, g\rangle_{L^2(\mu)} = \langle f, Lg\rangle_{L^2(\mu)} \tag{2.3}
$$

for suitable smooth test functions f, g.

A Markov process X_t is said to be time-reversible if for any n and any sequence of times t₁ < ... < t_n the finite dimensional distributions of (X_{t₁}, ..., X_{t_n}) and of (X_{t_n}, ..., X_{t₁}) are identical. More formally, let P^ρ_[0,t] denote the path measure of the process X_t on the time-interval [0, t] with X₀ ∼ ρ. Let Θ denote the time reversal, i.e., Θ acts on a path {X_s}_{0≤s≤t} as

$$
(\Theta X)_s = X_{t-s}. \tag{2.4}
$$

Then reversibility is equivalent to P^μ_[0,t] = P^μ_[0,t] ∘ Θ, and it is well-known that a stationary² process which satisfies the DB condition is time-reversible.

The condition of reversibility can also be expressed in terms of relative entropy as follows. Recall that for two probability measures π₁, π₂ on some measurable space, the relative entropy of π₁ with respect to π₂ is given by

$$
R(\pi_1|\pi_2) \equiv \int d\pi_1 \log\frac{d\pi_1}{d\pi_2}
$$

if π₁ is absolutely continuous with respect to π₂, and +∞ otherwise. The relative entropy is nonnegative, R(π₁|π₂) ≥ 0, and R(π₁|π₂) = 0 if and only if π₁ = π₂. The entropy production rate of a Markov process X_t is defined by

$$
EP_{cont} := \lim_{t\to\infty}\frac{1}{t}\,R\big(P^\rho_{[0,t]}\,\big|\,P^\rho_{[0,t]}\circ\Theta\big) = \lim_{t\to\infty}\frac{1}{t}\int dP^\rho_{[0,t]}\,\log\frac{dP^\rho_{[0,t]}}{d\big(P^\rho_{[0,t]}\circ\Theta\big)}. \tag{2.5}
$$

If X_t satisfies DB and X₀ ∼ μ then R(P^μ_[0,t] | P^μ_[0,t] ∘ Θ) is identically 0 for all t and the entropy production rate is 0. Note that if X₀ ∼ ρ ≠ μ then R(P^ρ_[0,t] | P^ρ_[0,t] ∘ Θ) is a boundary term, in the sense that it is O(1), and so the entropy production rate vanishes in this case in the large time limit (under suitable ergodicity assumptions). Conversely, when EP_cont ≠ 0 the process is truly irreversible. The entropy production rate for Markov processes and stochastic differential equations is discussed in more detail in [14,18].

Let us consider a numerical integration scheme for the SDE (2.1) which has the general form

$$
x_{i+1} = F(x_i, \Delta t, \Delta W_i), \qquad i = 1, 2, \ldots \tag{2.6}
$$

Here x_i ∈ R^d is a discrete-time continuous state-space Markov process, Δt is the time-step and ΔW_i ∈ R^m, i = 1, 2, ..., are i.i.d. Gaussian random variables with mean 0 and covariance matrix Δt I_m. We will assume that the Markov process x_i has transition probabilities which are absolutely continuous with respect to Lebesgue measure with everywhere positive densities Π(x_i, x_{i+1}) := Π_{F(x,Δt,ΔW)}(x_{i+1} | x_i), and we also assume that x_i has an invariant measure, which we denote μ̄(dx) and which is then unique and has a density with respect to Lebesgue measure.

In general the invariant measures for X_t and x_i differ, μ ≠ μ̄, and x_i does not satisfy a DB condition. Note also that the very existence of μ̄ is not guaranteed in general. Results on the existence of μ̄ do exist, however, and typically require that the SDE is elliptic or hypoelliptic and that the state space of X_t is compact or that a global Lipschitz condition on the drift holds [2,3,19,20].

² Stationarity is equivalent to starting the process X_t from its invariant measure, i.e., X₀ ∼ μ.

Proceeding as in the continuous case we introduce an entropy production rate for the Markov process x_i. Let us assume that the process starts from some distribution ρ(x)dx; then the finite dimensional distribution on the time window [0, t], where t = nΔt, is given by

$$
\bar P_{[0,t]}(dx_0, \ldots, dx_n) = \rho(x_0)\,\Pi(x_0, x_1)\cdots\Pi(x_{n-1}, x_n)\,dx_0\cdots dx_n. \tag{2.7}
$$

For the time reversed path Θ(x₀, ..., x_n) = (x_n, ..., x₀) we then have

$$
\bar P_{[0,t]}\circ\Theta(dx_0, \ldots, dx_n) = \rho(x_n)\,\Pi(x_n, x_{n-1})\cdots\Pi(x_1, x_0)\,dx_0\cdots dx_n \tag{2.8}
$$

and the Radon-Nikodym derivative takes the form

$$
\frac{d\bar P_{[0,t]}}{d\big(\bar P_{[0,t]}\circ\Theta\big)} = \exp(W(t))\,\frac{\rho(x_0)}{\rho(x_n)} \tag{2.9}
$$

where W(t) is the Gallavotti−Cohen (GC) action functional given by

$$
W(t) = W(n;\Delta t) := \sum_{i=0}^{n-1}\log\frac{\Pi(x_i, x_{i+1})}{\Pi(x_{i+1}, x_i)}. \tag{2.10}
$$

Note that W(t) is an additive functional of the paths and thus, if x_i is ergodic, by the ergodic theorem the following limit exists:

$$
EP(\Delta t) = \lim_{t\to\infty}\frac{1}{t}\,W(t) = \lim_{n\to\infty}\frac{1}{n\Delta t}\,W(n;\Delta t) \qquad \bar P\text{-a.s.} \tag{2.11}
$$

We call the quantity EP(Δt) the entropy production rate associated to the numerical scheme. Note that we have, almost surely,

$$
EP(\Delta t) = \frac{1}{\Delta t}\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\log\frac{\Pi(x_i, x_{i+1})}{\Pi(x_{i+1}, x_i)} = \frac{1}{\Delta t}\iint \bar\mu(x)\,\Pi(x, y)\,\log\frac{\Pi(x, y)}{\Pi(y, x)}\,dx\,dy \tag{2.12}
$$

and for concrete numerical schemes we will compute fairly explicitly the entropy production in the next sections. Since we are interested in the ergodic average we will systematically omit boundary terms which do not contribute to ergodic averages and we will use the notation

$$
W_1(t) \doteq W_2(t) \quad\text{if}\quad \lim_{t\to\infty}\frac{1}{t}\big(W_1(t) - W_2(t)\big) = 0. \tag{2.13}
$$

For example we have

$$
W(t) \doteq \log\frac{d\bar P_{[0,t]}}{d\big(\bar P_{[0,t]}\circ\Theta\big)}. \tag{2.14}
$$

Note also that, using (2.11) and (2.10), the entropy production rate is tractable numerically and it can be easily calculated “on-the-fly” once the transition probability density function Π(·,·) is provided.
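To make the double integral in (2.12) concrete, the following is a minimal sketch for a finite-state Markov chain, which is not the continuous state-space setting of the paper: the integral becomes a double sum over states and the factor 1/Δt is dropped. The two 3×3 transition matrices and the helper names are illustrative assumptions.

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue 1, normalized to sum to 1."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    k = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, k])
    return pi / pi.sum()

def entropy_production_per_step(P):
    """Finite-state analog of (2.12) without the 1/dt factor:
    sum over x, y of pi(x) P(x, y) log(P(x, y) / P(y, x)).
    Assumes every entry of P is strictly positive."""
    pi = stationary_distribution(P)
    return float(np.sum(pi[:, None] * P * np.log(P / P.T)))

# Symmetric transition matrix: detailed balance holds with uniform pi.
P_rev = np.array([[0.50, 0.25, 0.25],
                  [0.25, 0.50, 0.25],
                  [0.25, 0.25, 0.50]])

# Cyclically biased chain (0 -> 1 -> 2 -> 0 preferred): a stationary current.
P_irr = np.array([[0.2, 0.6, 0.2],
                  [0.2, 0.2, 0.6],
                  [0.6, 0.2, 0.2]])

ep_rev = entropy_production_per_step(P_rev)
ep_irr = entropy_production_per_step(P_irr)
print("reversible chain:  ", ep_rev)
print("irreversible chain:", ep_irr)
```

For the cyclic chain the stationary distribution is uniform, so the sum can be evaluated by hand as 0.4 log 3, while the symmetric chain gives zero, in line with the statement that the entropy production vanishes exactly when detailed balance holds.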

In the following sections we investigate the behavior of the entropy production rate for diﬀerent discretization schemes of various reversible processes in the stationary regime. However, before proceeding with our analysis, let us state formally the basic assumptions necessary for our results to apply.

Assumption 2.1. We have

• The drift a and the diffusion b in (2.1) as well as the vector F in (2.6) are C^∞ and all their derivatives have at most polynomial growth at infinity.

• The generator L is elliptic or hypo-elliptic; in particular the transition probabilities and the invariant measure (if it exists) are absolutely continuous with respect to Lebesgue measure with smooth densities. We assume that X_t is irreducible, i.e., every open set can be reached with positive probability starting from any point. For the discretized scheme we assume that x_i has smooth everywhere positive transition probabilities.

• Both the continuous-time process X_t and the discrete-time process x_i are ergodic with unique invariant measures μ and μ̄, respectively. Furthermore, for sufficiently small Δt we have

$$
\big|E_\mu[f] - E_{\bar\mu}[f]\big| = O(\Delta t) \tag{2.15}
$$

for functions f which are C^∞ with at most polynomial growth at infinity.

Notice that (2.15) is an error estimate for the invariant measures of the processes X_t and x_i. The rate of convergence in terms of Δt depends on the particular numerical scheme [19,30]. Ergodicity results for (numerical) SDEs can be found in [2,3,12,19,20,26,30–32]. For instance, if both the drift term a(x) and the diffusion term b(x) have bounded derivatives of any order, the covariance matrix (bb^T)(x) is elliptic for all x ∈ R^d, and there is a compact set outside of which x^T a(x) < −C|x|² holds (a Lyapunov-type condition), then it was shown in [30] that the continuous-time process as well as both the Euler and Milstein numerical schemes are ergodic and the error estimate (2.15) holds. Another, less restrictive, example where ergodicity properties were proved is for SDE systems with degenerate noise and particularly for Langevin processes [20,31]. Again, a Lyapunov functional is the key assumption in order to control the stochastic process at infinity. More recently, Mattingly et al. [19] showed ergodicity for SDE-driven processes restricted to a torus, as well as for their discretizations, utilizing only the assumptions of ellipticity or hypoellipticity and the assumption of local Lipschitz continuity for both drift and diffusion terms.
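As a worked instance of the estimate (2.15), consider the one-dimensional Ornstein−Uhlenbeck case V(x) = x²/2 with unit noise, an illustrative choice that is not treated at this point of the paper. The Euler−Maruyama chain x_{i+1} = (1 − Δt/2) x_i + ΔW_i is then Gaussian, and its stationary variance follows from s² = (1 − Δt/2)² s² + Δt, so the error for the observable f(x) = x² is available in closed form. The function names below are assumptions made for this sketch.

```python
def em_stationary_variance(dt):
    """Stationary variance of x_{i+1} = (1 - dt/2) x_i + N(0, dt).
    Solves s2 = (1 - dt/2)**2 * s2 + dt; requires 0 < dt < 4."""
    contraction = 1.0 - 0.5 * dt
    return dt / (1.0 - contraction ** 2)

def invariant_measure_error(dt):
    """|E_mu[x^2] - E_mubar[x^2]|, where mu = N(0, 1) is the exact invariant measure."""
    return abs(1.0 - em_stationary_variance(dt))

for dt in (0.1, 0.05, 0.025):
    err = invariant_measure_error(dt)
    # err / dt approaches 1/4 as dt decreases, i.e. the error is O(dt)
    print(f"dt={dt:<6} error={err:.6f} error/dt={err / dt:.4f}")
```

In this example the error equals (Δt/4)/(1 − Δt/4), so halving Δt roughly halves the error, which is the first-order behavior postulated in Assumption 2.1.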

3. Entropy production for overdamped Langevin processes

The overdamped Langevin process, X_t ∈ R^d, is the solution of the following system of SDE’s

$$
dX_t = -\frac{1}{2}\Sigma(X_t)\nabla V(X_t)\,dt + \frac{1}{2}\nabla\Sigma(X_t)\,dt + \sigma(X_t)\,dB_t \tag{3.1}
$$

where V : R^d → R is a smooth potential function, σ : R^d → R^{d×m} is the diffusion matrix, Σ := σσ^T : R^d → R^{d×d} is the covariance matrix and B_t is a standard m-dimensional Brownian motion. We assume from now on that Σ(x) is invertible for any x so that the process is elliptic. It is straightforward to show that the generator of the process X_t satisfies the DB condition (2.3) with invariant measure

$$
\mu(dx) = \frac{1}{Z}\exp(-V(x))\,dx \tag{3.2}
$$

where Z = ∫_{R^d} exp(−V(x)) dx is the normalization constant, and thus if X₀ ∼ μ then the Markov process X_t is reversible.

The explicit Euler−Maruyama (EM) scheme for numerical integration of (3.1) is given by

$$
x_{i+1} = x_i - \frac{1}{2}\Sigma(x_i)\nabla V(x_i)\Delta t + \frac{1}{2}\nabla\Sigma(x_i)\Delta t + \sigma(x_i)\Delta W_i \tag{3.3}
$$

where ΔW_i ∼ N(0, Δt I_m), i = 1, 2, ..., are m-dimensional i.i.d. Gaussian random variables. The process x_i is a discrete-time Markov process with transition probability density given by

$$
\begin{aligned}
\Pi(x_i, x_{i+1}) = \frac{1}{Z(x_i)}\exp\Big\{-\frac{1}{2\Delta t}
&\Big(\Delta x_i + \frac{1}{2}\Sigma(x_i)\nabla V(x_i)\Delta t - \frac{1}{2}\nabla\Sigma(x_i)\Delta t\Big)^T \\
&\times\Sigma^{-1}(x_i)\Big(\Delta x_i + \frac{1}{2}\Sigma(x_i)\nabla V(x_i)\Delta t - \frac{1}{2}\nabla\Sigma(x_i)\Delta t\Big)\Big\}
\end{aligned}
\tag{3.4}
$$

where Δx_i = x_{i+1} − x_i and Z(x_i) = (2π)^{m/2} |det Σ(x_i)|^{1/2} is the normalization constant for the multidimensional Gaussian distribution. The following lemma provides the GC action functional for the explicit EM time-discretization scheme of the overdamped Langevin process.

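Lemma 3.1. Assume that det Σ(x) ≠ 0 ∀x ∈ R^d. Then the GC action functional of the process x_i solving (3.3) is

$$
\begin{aligned}
W(n;\Delta t) \doteq{}& -\frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\nabla V(x_{i+1}) + \nabla V(x_i)\big] \\
&+ \frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\Sigma^{-1}(x_{i+1})\nabla\Sigma(x_{i+1}) + \Sigma^{-1}(x_i)\nabla\Sigma(x_i)\big] \\
&+ \frac{1}{2\Delta t}\sum_{i=0}^{n-1}\Delta x_i^T\big[\Sigma^{-1}(x_{i+1}) - \Sigma^{-1}(x_i)\big]\Delta x_i
\end{aligned}
\tag{3.5}
$$

where ≐ means equality up to boundary terms, as defined in (2.13).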

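Proof. The assumption of a non-zero determinant is imposed so that the transition probabilities, and hence the GC action functional, are non-singular. The proof is then a straightforward computation using (3.4) and (2.10):

$$
\begin{aligned}
W(n;\Delta t) :={}& \sum_{i=0}^{n-1}\big[\log\Pi(x_i, x_{i+1}) - \log\Pi(x_{i+1}, x_i)\big] \\
={}& \sum_{i=0}^{n-1}\big[\log Z(x_{i+1}) - \log Z(x_i)\big] \\
&- \frac{1}{2\Delta t}\sum_{i=0}^{n-1}\Big[\Big(\Delta x_i + \tfrac{1}{2}\Sigma(x_i)\nabla V(x_i)\Delta t - \tfrac{1}{2}\nabla\Sigma(x_i)\Delta t\Big)^T \Sigma^{-1}(x_i)\Big(\Delta x_i + \tfrac{1}{2}\Sigma(x_i)\nabla V(x_i)\Delta t - \tfrac{1}{2}\nabla\Sigma(x_i)\Delta t\Big) \\
&\qquad - \Big(-\Delta x_i + \tfrac{1}{2}\Sigma(x_{i+1})\nabla V(x_{i+1})\Delta t - \tfrac{1}{2}\nabla\Sigma(x_{i+1})\Delta t\Big)^T \Sigma^{-1}(x_{i+1})\Big(-\Delta x_i + \tfrac{1}{2}\Sigma(x_{i+1})\nabla V(x_{i+1})\Delta t - \tfrac{1}{2}\nabla\Sigma(x_{i+1})\Delta t\Big)\Big] \\
\doteq{}& -\frac{1}{2\Delta t}\sum_{i=0}^{n-1}\Big[\Delta x_i^T\Sigma^{-1}(x_i)\Delta x_i + \tfrac{1}{4}\nabla V(x_i)^T\Sigma(x_i)\nabla V(x_i)\Delta t^2 + \tfrac{1}{4}\nabla\Sigma(x_i)^T\Sigma^{-1}(x_i)\nabla\Sigma(x_i)\Delta t^2 \\
&\qquad + \Delta x_i^T\nabla V(x_i)\Delta t - \Delta x_i^T\Sigma^{-1}(x_i)\nabla\Sigma(x_i)\Delta t - \tfrac{1}{2}\nabla V(x_i)^T\nabla\Sigma(x_i)\Delta t^2 \\
&\qquad - \Delta x_i^T\Sigma^{-1}(x_{i+1})\Delta x_i - \tfrac{1}{4}\nabla V(x_{i+1})^T\Sigma(x_{i+1})\nabla V(x_{i+1})\Delta t^2 - \tfrac{1}{4}\nabla\Sigma(x_{i+1})^T\Sigma^{-1}(x_{i+1})\nabla\Sigma(x_{i+1})\Delta t^2 \\
&\qquad + \Delta x_i^T\nabla V(x_{i+1})\Delta t - \Delta x_i^T\Sigma^{-1}(x_{i+1})\nabla\Sigma(x_{i+1})\Delta t + \tfrac{1}{2}\nabla V(x_{i+1})^T\nabla\Sigma(x_{i+1})\Delta t^2\Big] \\
\doteq{}& -\frac{1}{2\Delta t}\sum_{i=0}^{n-1}\Delta x_i^T\big[\Sigma^{-1}(x_i) - \Sigma^{-1}(x_{i+1})\big]\Delta x_i - \frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\nabla V(x_{i+1}) + \nabla V(x_i)\big] \\
&+ \frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\Sigma^{-1}(x_{i+1})\nabla\Sigma(x_{i+1}) + \Sigma^{-1}(x_i)\nabla\Sigma(x_i)\big]
\end{aligned}
$$

where all the terms of the general form G(x_i) − G(x_{i+1}) in the sums cancel out, since they form telescoping sums which become boundary terms.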
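The bookkeeping in this proof can be checked numerically. The sketch below works in one dimension with an illustrative potential V(x) = x²/2 and covariance Σ(x) = 1 + 0.5 sin x (both assumptions made for this example, as are all function names). It evaluates the GC action functional directly from (2.10) and compares it with the three sums of (3.5) plus the telescoped boundary terms. The code normalizes the Gaussian density with variance Σ(x)Δt; constant factors in the normalization cancel in the ratio of densities.

```python
import numpy as np

def dV(x):        # gradient of the illustrative potential V(x) = x^2 / 2
    return x

def Sigma(x):     # illustrative covariance, bounded below by 0.5
    return 1.0 + 0.5 * np.sin(x)

def dSigma(x):    # derivative of Sigma
    return 0.5 * np.cos(x)

def em_path(x0, dt, n, rng):
    """Explicit Euler-Maruyama path of (3.3) in one dimension."""
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        drift = -0.5 * Sigma(x[i]) * dV(x[i]) + 0.5 * dSigma(x[i])
        x[i + 1] = x[i] + drift * dt + np.sqrt(Sigma(x[i]) * dt) * rng.standard_normal()
    return x

def log_Pi(x, y, dt):
    """Log of the Gaussian transition density (3.4) from x to y in one dimension."""
    mean = x - 0.5 * Sigma(x) * dV(x) * dt + 0.5 * dSigma(x) * dt
    var = Sigma(x) * dt
    return -0.5 * np.log(2.0 * np.pi * var) - (y - mean) ** 2 / (2.0 * var)

def G(x, dt):
    """State-dependent part of the quadratic form that telescopes in the proof."""
    return dt ** 2 * (0.25 * dV(x) ** 2 * Sigma(x)
                      + 0.25 * dSigma(x) ** 2 / Sigma(x)
                      - 0.5 * dV(x) * dSigma(x))

dt, n = 0.01, 1000
x = em_path(0.3, dt, n, np.random.default_rng(0))
a, b = x[:-1], x[1:]
dx = b - a

# Direct evaluation of (2.10)
W_direct = np.sum(log_Pi(a, b, dt) - log_Pi(b, a, dt))

# The three sums of (3.5)
W_lemma = (-0.5 * np.sum(dx * (dV(b) + dV(a)))
           + 0.5 * np.sum(dx * (dSigma(b) / Sigma(b) + dSigma(a) / Sigma(a)))
           + np.sum(dx ** 2 * (1.0 / Sigma(b) - 1.0 / Sigma(a))) / (2.0 * dt))

# Boundary terms dropped by the dotted equality
boundary = (0.5 * np.log(Sigma(x[-1]) / Sigma(x[0]))
            + (G(x[-1], dt) - G(x[0], dt)) / (2.0 * dt))

print(W_direct, W_lemma + boundary)
```

The two printed numbers agree up to floating-point rounding for any path, which confirms that (3.5) differs from (2.10) only by terms that depend on the endpoints x₀ and x_n.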

Three important remarks can readily be made from the above computation.


Remark 3.2. The numerical computation of entropy production rate as the time-average of the GC action functional on the path space (i.e., based on (2.9)) at first sight seems computationally intractable due to the large dimension of the path space. However, due to ergodicity, the numerical computation of the entropy production can be performed as a time-average based on (2.11) and (3.5) for largen. Additionally, this computation can be done for free and “on-the-fly” since the quantities involved are already computed in the simulation of the process. The numerical entropy production rate shown in the following figures is computed using this approach.
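A minimal sketch of such an on-the-fly computation is given below for a one-dimensional example with constant Σ ≡ 1, in which case ∇Σ = 0 and Σ⁻¹(x_{i+1}) − Σ⁻¹(x_i) = 0, so only the first sum of (3.5) survives. The potential V(x) = x²/2 + cos x, the step size, the run length and the function names are illustrative assumptions, not the setup behind the figures of the paper.

```python
import math
import numpy as np

def grad_V(x):
    # Illustrative potential V(x) = x^2/2 + cos(x), with bounded higher derivatives
    return x - math.sin(x)

def estimate_entropy_production(dt, n_steps, seed=0, x0=0.0):
    """Time-average of the GC action functional along one EM trajectory, as in (2.11).
    The running sum W reuses the gradients already needed for the EM step."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_steps) * math.sqrt(dt)
    x, g = x0, grad_V(x0)
    W = 0.0
    for i in range(n_steps):
        x_new = x - 0.5 * g * dt + noise[i]       # EM step (3.3) with Sigma = 1
        g_new = grad_V(x_new)
        W += -0.5 * (x_new - x) * (g_new + g)     # increment of the first sum in (3.5)
        x, g = x_new, g_new
    return W / (n_steps * dt)

ep_estimate = estimate_entropy_production(dt=0.05, n_steps=20000)
print("estimated entropy production rate:", ep_estimate)
```

The only extra work per step is one multiplication and one addition, since the gradient at the new point is needed for the next EM step anyway. A single trajectory of this length gives a noisy estimate; longer runs or averages over several seeds reduce the statistical error.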

Remark 3.3. It was shown in [18] that the GC action functional of the continuous-time process driven by (3.1) equals the Stratonovich integral

$$
W_{cont}(t) = -\int_0^t \nabla V(X_s)\circ dX_s = V(X_0) - V(X_t) \tag{3.6}
$$

which reduces to a boundary term, as expected. This functional has the discretization

$$
W_{cont}(t) \approx -\frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\nabla V(x_{i+1}) + \nabla V(x_i)\big] \tag{3.7}
$$

and this is exactly the first term in the GC action functional W(n;Δt) for the explicit EM approximation process (see (3.5)). However, the discretization scheme introduces two additional terms to the GC action functional which may greatly affect the asymptotic behavior of entropy production as Δt goes to zero, as we demonstrate in Section 3.2. Notice that when the noise is additive, i.e., when the diffusion matrix is constant, these two additional terms vanish and, taking the limit Δt → 0, the GC action functional W(n;Δt), if the limit exists, becomes the Stratonovich integral W_cont(t), which is a boundary term.

Remark 3.4. The GC action functional W(n;Δt) consists of three terms (see (3.5)), each of which stems from a particular term in the SDE. Thus, each term in the SDE contributes to the entropy production functional a component which is totally decoupled from the other terms. The reason for this decomposition lies in the particular form of the transition probabilities for the explicit EM scheme, which are exponentials with quadratic argument. This feature can be exploited for the study of entropy production of numerical schemes for processes with irreversible dynamics. Indeed, if a non-gradient term of the form a(X_t)dt is added to the drift of (3.1), the process is irreversible and its GC action functional is no longer a boundary term and is given by [18]

$$
W_{cont}(t) \doteq -\int_0^t \Sigma^{-1}(X_s)\,a(X_s)\circ dX_s \approx \frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\Sigma^{-1}(x_i)a(x_i) + \Sigma^{-1}(x_{i+1})a(x_{i+1})\big]. \tag{3.8}
$$

On the other hand, due to the separation property of the explicit EM scheme, the GC action functional of the discrete-time approximation process W(n;Δt) has the additional term

$$
\frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\Sigma^{-1}(x_i)a(x_i) + \Sigma^{-1}(x_{i+1})a(x_{i+1})\big]. \tag{3.9}
$$

Evidently, the discretization of W_cont(t) equals the additional term of the GC functional W(n;Δt). Thus, the GC action functional W(n;Δt) is decomposed into two components, one stemming from the irreversibility of the continuous-time process and another one stemming from the irreversibility of the discretization procedure.

3.1. Entropy production for the additive noise case

An important special case of (3.1) is the case of additive noise, i.e., when the covariance matrix does not depend on the process, Σ(x) ≡ Σ. In this case, the SDE system becomes

$$
dX_t = -\frac{1}{2}\Sigma\nabla V(X_t)\,dt + \sigma\,dB_t, \qquad X_0 \sim \mu \tag{3.10}
$$

and the GC action functional is simply given by

$$
W(n;\Delta t) \doteq -\frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\nabla V(x_{i+1}) + \nabla V(x_i)\big]. \tag{3.11}
$$

In this section we prove an upper bound for the entropy production of the explicit EM scheme. The proof uses several lemmas stated and proved in Appendix A.

Theorem 3.5. Let Assumption 2.1 hold. Assume also that the potential function V has bounded fifth-order derivative and that the covariance matrix Σ is invertible. Then, for sufficiently small Δt, there exists C = C(V, Σ) > 0 such that

$$
EP(\Delta t) \le C\,\Delta t^2. \tag{3.12}
$$

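Proof. Utilizing the generalized trapezoidal rule (A.1) for k = 3, the GC action functional is rewritten as

$$
\begin{aligned}
W(n;\Delta t) \doteq{}& -\frac{1}{2}\sum_{i=0}^{n-1}\Delta x_i^T\big[\nabla V(x_{i+1}) + \nabla V(x_i)\big] \\
={}& \sum_{i=0}^{n-1}\Big\{-\big(V(x_{i+1}) - V(x_i)\big) + \sum_{|\alpha|=3} C_\alpha\big[D^\alpha V(x_{i+1}) + D^\alpha V(x_i)\big]\Delta x_i^\alpha \\
&\qquad + \sum_{\substack{|\alpha|=1,3,5\\ |\beta|=5-|\alpha|}} B_\beta\big[R^\beta_\alpha(x_i, x_{i+1}) + R^\beta_\alpha(x_{i+1}, x_i)\big]\Delta x_i^{\alpha+\beta}\Big\} \\
\doteq{}& \sum_{i=0}^{n-1}\sum_{|\alpha|=3} C_\alpha\big[D^\alpha V(x_{i+1}) + D^\alpha V(x_i)\big]\Delta x_i^\alpha \\
&+ \sum_{i=0}^{n-1}\sum_{\substack{|\alpha|=1,3,5\\ |\beta|=5-|\alpha|}} B_\beta\big[R^\beta_\alpha(x_i, x_{i+1}) + R^\beta_\alpha(x_{i+1}, x_i)\big]\Delta x_i^{\alpha+\beta}.
\end{aligned}
\tag{3.13}
$$

Applying, once again, a Taylor series expansion to D^αV(x_{i+1}), the GC action functional becomes

$$
\begin{aligned}
W(n;\Delta t) \doteq{}& \sum_{i=0}^{n-1}\Big\{\sum_{|\alpha|=3} 2C_\alpha D^\alpha V(x_i)\Delta x_i^\alpha + \sum_{|\alpha|=3} C_\alpha\sum_{|\beta|=1} D^{\alpha+\beta}V(x_i)\Delta x_i^{\alpha+\beta}\Big\} \\
&+ \sum_{i=0}^{n-1}\sum_{\substack{|\alpha|=1,3,5\\ |\beta|=5-|\alpha|}} \bar R^\beta_\alpha(x_i, x_{i+1})\Delta x_i^{\alpha+\beta}
\end{aligned}
\tag{3.14}
$$

where

$$
\bar R^\beta_\alpha(x_i, x_{i+1}) = B_\beta\big[R^\beta_\alpha(x_i, x_{i+1}) + R^\beta_\alpha(x_{i+1}, x_i)\big] + \mathbb{1}_{|\alpha|=3}\,R^\alpha_\beta(x_i, x_{i+1}).
$$

Moreover, we expand Δx_i^α using the multi-binomial formula:

$$
\Delta x_i^\alpha = \Big(-\frac{1}{2}\Sigma\nabla V(x_i)\Delta t + \sigma\Delta W_i\Big)^\alpha = \sum_{\nu\le\alpha}\binom{\alpha}{\nu}\Big(-\frac{1}{2}\Sigma\nabla V(x_i)\Delta t\Big)^\nu(\sigma\Delta W_i)^{\alpha-\nu}. \tag{3.15}
$$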

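Then, the GC action functional becomes

$$
\begin{aligned}
W(n;\Delta t) \doteq{}& 2\sum_{i=0}^{n-1}\sum_{|\alpha|=3}\sum_{\nu\le\alpha} C_\alpha\binom{\alpha}{\nu} D^\alpha V(x_i)\Big(-\frac{1}{2}\Sigma\nabla V(x_i)\Delta t\Big)^\nu(\sigma\Delta W_i)^{\alpha-\nu} \\
&+ \sum_{i=0}^{n-1}\sum_{\substack{|\alpha|=3\\ |\beta|=1}}\sum_{\nu\le\alpha+\beta} C_\alpha\binom{\alpha+\beta}{\nu} D^{\alpha+\beta}V(x_i)\Big(-\frac{1}{2}\Sigma\nabla V(x_i)\Delta t\Big)^\nu(\sigma\Delta W_i)^{\alpha+\beta-\nu} \\
&+ \sum_{i=0}^{n-1}\sum_{\substack{|\alpha|=1,3,5\\ |\beta|=5-|\alpha|}}\sum_{\nu\le\alpha+\beta}\binom{\alpha+\beta}{\nu}\bar R^\beta_\alpha(x_i, x_{i+1})\Big(-\frac{1}{2}\Sigma\nabla V(x_i)\Delta t\Big)^\nu(\sigma\Delta W_i)^{\alpha+\beta-\nu}.
\end{aligned}
\tag{3.16}
$$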

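From (2.11), the entropy production rate is the time-averaged GC action functional as n → ∞. Thus,

$$
\begin{aligned}
EP(\Delta t) ={}& \lim_{n\to\infty}\frac{W(n;\Delta t)}{n\Delta t} \\
={}& \frac{2}{\Delta t}\sum_{|\alpha|=3}\sum_{\nu\le\alpha} C_\alpha\binom{\alpha}{\nu}\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} D^\alpha V(x_i)\Big(-\frac{1}{2}\Sigma\nabla V(x_i)\Delta t\Big)^\nu(\sigma\Delta W_i)^{\alpha-\nu} \\
&+ \frac{1}{\Delta t}\sum_{\substack{|\alpha|=3\\ |\beta|=1}}\sum_{\nu\le\alpha+\beta} C_\alpha\binom{\alpha+\beta}{\nu}\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} D^{\alpha+\beta}V(x_i)\Big(-\frac{1}{2}\Sigma\nabla V(x_i)\Delta t\Big)^\nu(\sigma\Delta W_i)^{\alpha+\beta-\nu} \\
&+ \frac{1}{\Delta t}\sum_{\substack{|\alpha|=1,3,5\\ |\beta|=5-|\alpha|}}\sum_{\nu\le\alpha+\beta}\binom{\alpha+\beta}{\nu}\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\bar R^\beta_\alpha(x_i, x_{i+1})\Big(-\frac{1}{2}\Sigma\nabla V(x_i)\Delta t\Big)^\nu(\sigma\Delta W_i)^{\alpha+\beta-\nu}.
\end{aligned}
\tag{3.17}
$$

The ergodicity of x_i as well as the Gaussianity of ΔW_i guarantee that the first two limits in the entropy production formula exist. Additionally, the residual terms, R̄^β_α(x_i, x_{i+1}), are bounded due to the assumption of a bounded fifth-order derivative of V; hence, the third limit also exists. Note here that this assumption could be replaced by assuming boundedness of a higher-order derivative and performing a higher-order Taylor expansion. Appendix A gives rigorous proofs of these ergodicity statements. Hence,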

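$$
\begin{aligned}
EP(\Delta t) ={}& \frac{2}{\Delta t}\sum_{|\alpha|=3}\sum_{\nu\le\alpha} C_\alpha\binom{\alpha}{\nu} E_{\bar\mu}\Big[D^\alpha V(x)\Big(-\frac{1}{2}\Sigma\nabla V(x)\Delta t\Big)^\nu\Big]\,E_\rho\big[(\sigma y)^{\alpha-\nu}\big] \\
&+ \frac{1}{\Delta t}\sum_{\substack{|\alpha|=3\\ |\beta|=1}}\sum_{\nu\le\alpha+\beta} C_\alpha\binom{\alpha+\beta}{\nu} E_{\bar\mu}\Big[D^{\alpha+\beta}V(x)\Big(-\frac{1}{2}\Sigma\nabla V(x)\Delta t\Big)^\nu\Big]\,E_\rho\big[(\sigma y)^{\alpha+\beta-\nu}\big] \\
&+ \frac{1}{\Delta t}\sum_{\substack{|\alpha|=1,3,5\\ |\beta|=5-|\alpha|}}\sum_{\nu\le\alpha+\beta}\binom{\alpha+\beta}{\nu} E_{\bar\mu\times\rho}\Big[\bar R^\beta_\alpha(x, y)\Big(-\frac{1}{2}\Sigma\nabla V(x)\Delta t\Big)^\nu\Big]\,E_\rho\big[(\sigma y)^{\alpha+\beta-\nu}\big]
\end{aligned}
\tag{3.18}
$$

where μ̄ is the equilibrium measure for x_i while ρ is the Gaussian measure of ΔW_i. Using the Isserlis−Wick formula we can compute the higher moments of a multivariate Gaussian random variable from the second-order moments. Indeed, we have

$$
E[y^\nu] = E\big[y_1^{\nu_1}\cdots y_d^{\nu_d}\big] = E\big[z_1 z_2\cdots z_{|\nu|}\big] =
\begin{cases}
0 & \text{if } |\nu| \text{ odd} \\
\sum\prod E[z_i z_j] & \text{if } |\nu| \text{ even}
\end{cases}
\tag{3.19}
$$

where ∑∏ means summing over all distinct ways of partitioning z₁, ..., z_{|ν|} into pairs. Moreover, E[z_i z_j] = Σ_{ij}Δt; hence, applying (3.19) to (3.18) and changing the multi-index notation to the usual notation, the entropy production rate becomes

$$
\begin{aligned}
EP(\Delta t) ={}& \frac{2}{\Delta t}\sum_{k_1=1}^{d}\sum_{k_2=1}^{d}\sum_{k_3=1}^{d} C_{k_1k_2k_3}\Big(E_{\bar\mu}\Big[\frac{\partial^3 V}{\partial x_{k_1}\partial x_{k_2}\partial x_{k_3}}\Big(-\frac{1}{2}\Sigma\nabla V\Big)_{k_1}\Big]\Sigma_{k_2k_3}\Delta t^2 \\
&\qquad + E_{\bar\mu}\Big[\frac{\partial^3 V}{\partial x_{k_1}\partial x_{k_2}\partial x_{k_3}}\Big(-\frac{1}{2}\Sigma\nabla V\Big)_{k_2}\Big]\Sigma_{k_1k_3}\Delta t^2 \\
&\qquad + E_{\bar\mu}\Big[\frac{\partial^3 V}{\partial x_{k_1}\partial x_{k_2}\partial x_{k_3}}\Big(-\frac{1}{2}\Sigma\nabla V\Big)_{k_3}\Big]\Sigma_{k_1k_2}\Delta t^2 + O(\Delta t^3)\Big) \\
&+ \frac{1}{\Delta t}\sum_{k_1=1}^{d}\sum_{k_2=1}^{d}\sum_{k_3=1}^{d}\sum_{k_4=1}^{d} C_{k_1k_2k_3}\Big(E_{\bar\mu}\Big[\frac{\partial^4 V}{\partial x_{k_1}\cdots\partial x_{k_4}}\Big] \\
&\qquad \times\big[\Sigma_{k_1k_2}\Sigma_{k_3k_4} + \Sigma_{k_1k_3}\Sigma_{k_2k_4} + \Sigma_{k_1k_4}\Sigma_{k_2k_3}\big]\Delta t^2 + O(\Delta t^3)\Big) \\
&+ \frac{1}{\Delta t}\,O(\Delta t^3).
\end{aligned}
\tag{3.20}
$$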

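Using that $\big(-\tfrac12\Sigma\nabla V\big)_{k_i} = -\tfrac12\sum_{k_4=1}^{d}\Sigma_{k_ik_4}\frac{\partial V}{\partial x_{k_4}}$, the entropy production rate is rewritten as

\[
\begin{aligned}
EP(\Delta t) ={}& \sum_{k_1=1}^{d}\sum_{k_2=1}^{d}\sum_{k_3=1}^{d}\sum_{k_4=1}^{d} C_{k_1k_2k_3}\bigg[ \Sigma_{k_1k_2}\Sigma_{k_3k_4}\bigg(-E_{\bar\mu}\bigg[\frac{\partial^3 V}{\partial x_{k_1}\partial x_{k_3}\partial x_{k_4}}\frac{\partial V}{\partial x_{k_2}}\bigg] + E_{\bar\mu}\bigg[\frac{\partial^4 V}{\partial x_{k_1}\cdots\partial x_{k_4}}\bigg]\bigg) \\
&\qquad + \Sigma_{k_1k_3}\Sigma_{k_2k_4}\bigg(-E_{\bar\mu}\bigg[\frac{\partial^3 V}{\partial x_{k_1}\partial x_{k_2}\partial x_{k_4}}\frac{\partial V}{\partial x_{k_3}}\bigg] + E_{\bar\mu}\bigg[\frac{\partial^4 V}{\partial x_{k_1}\cdots\partial x_{k_4}}\bigg]\bigg) \\
&\qquad + \Sigma_{k_1k_4}\Sigma_{k_2k_3}\bigg(-E_{\bar\mu}\bigg[\frac{\partial^3 V}{\partial x_{k_1}\partial x_{k_2}\partial x_{k_3}}\frac{\partial V}{\partial x_{k_4}}\bigg] + E_{\bar\mu}\bigg[\frac{\partial^4 V}{\partial x_{k_1}\cdots\partial x_{k_4}}\bigg]\bigg)\bigg]\Delta t + O(\Delta t^2).
\end{aligned}
\tag{3.21}
\]

By a simple integration by parts, we observe that for any combination $k_1,\dots,k_4 = 1,\dots,d$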

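\[
E_{\mu}\bigg[\frac{\partial^3 V}{\partial x_{k_1}\partial x_{k_2}\partial x_{k_3}}\frac{\partial V}{\partial x_{k_4}}\bigg] = E_{\mu}\bigg[\frac{\partial^4 V}{\partial x_{k_1}\cdots\partial x_{k_4}}\bigg]
\tag{3.22}
\]

where the expectation is taken with respect to μ, the invariant measure of the continuous-time process.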
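The identity (3.22) can be checked numerically in a simple case. The sketch below is not part of the proof; it assumes d = 1, an invariant density proportional to exp(−V) (which is what the drift $-\tfrac12\Sigma\nabla V$ suggests), and a double-well potential with a hypothetical value of β chosen only for illustration. Expectations are approximated by a plain Riemann sum on a wide grid.

```python
import numpy as np

beta = 2.0  # hypothetical inverse temperature, for illustration only


def V(x):
    return beta * (x**4 / 4.0 - x**2 / 2.0)


def dV(x):
    return beta * (x**3 - x)


def d3V(x):
    return 6.0 * beta * x


def d4V(x):
    return 6.0 * beta * np.ones_like(x)


def expectation(f, lo=-6.0, hi=6.0, n=200001):
    """Approximate E_mu[f] for the density proportional to exp(-V) on [lo, hi]."""
    x = np.linspace(lo, hi, n)
    w = np.exp(-V(x))          # unnormalized invariant density (assumed form)
    return np.sum(f(x) * w) / np.sum(w)


lhs = expectation(lambda x: d3V(x) * dV(x))  # E_mu[V''' V']
rhs = expectation(d4V)                       # E_mu[V'''']
```

Under these assumptions the two values agree up to quadrature error, because integrating $V'\,e^{-V}$ by parts moves one derivative onto $V'''$ and the boundary terms vanish.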

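However, in (3.21) the expectation is taken with respect to the invariant measure of the discrete-time process (i.e., $\bar\mu$ instead of μ). Nevertheless, Assumption 2.1 guarantees that replacing μ by $\bar\mu$ costs an error of order O(Δt). Hence, for any coefficient in (3.21), combining this with (3.22) we obtain that

\[
\bigg| E_{\bar\mu}\bigg[\frac{\partial^3 V}{\partial x_{k_1}\partial x_{k_2}\partial x_{k_3}}\frac{\partial V}{\partial x_{k_4}}\bigg] - E_{\bar\mu}\bigg[\frac{\partial^4 V}{\partial x_{k_1}\cdots\partial x_{k_4}}\bigg] \bigg| \le 2K\Delta t
\tag{3.23}
\]

since the potential V as well as its derivatives are sufficiently smooth. Each bracketed coefficient of Δt in (3.21) is therefore itself O(Δt), and overall we have shown that

\[
EP(\Delta t) = O(\Delta t^2)
\tag{3.24}
\]

which completes the proof.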

Remark 3.6. Depending on the potential function, the entropy production could be even smaller. For instance, when the potential V is a quadratic function (i.e., the continuous-time process is an Ornstein-Uhlenbeck process), a direct calculation with (3.11) shows that the GC action functional reduces to a boundary term, and thus the entropy production rate of the explicit EM scheme is zero. However, for a generic potential V we expect the entropy production rate to decay quadratically as a function of Δt, but not faster.
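The boundary-term claim of Remark 3.6 can be illustrated in one dimension. The following sketch assumes, since (3.11) is not reproduced here, that the GC action functional is the sum over steps of the log-ratio of the forward and backward EM transition densities; the potential V(x) = a x²/2 and all function names are hypothetical choices for illustration.

```python
import numpy as np


def em_log_kernel(x, y, a, sigma, dt):
    """Log EM transition density x -> y (up to a constant) for V(x) = a*x**2/2."""
    drift = -0.5 * sigma**2 * a * x           # -1/2 * Sigma * V'(x)
    return -(y - x - drift * dt) ** 2 / (2.0 * sigma**2 * dt)


def gc_increment(x, y, a, sigma, dt):
    """One-step contribution: log p(x, y) - log p(y, x) (assumed form of the GC term)."""
    return em_log_kernel(x, y, a, sigma, dt) - em_log_kernel(y, x, a, sigma, dt)


def boundary_potential(x, a, sigma, dt):
    """Function g such that gc_increment(x, y) = g(y) - g(x) for quadratic V."""
    m = 1.0 - 0.5 * sigma**2 * a * dt         # EM contraction factor
    return -(1.0 - m**2) * x**2 / (2.0 * sigma**2 * dt)


rng = np.random.default_rng(1)
a, sigma, dt = 1.5, 0.7, 0.05                 # hypothetical parameters
x, y = rng.normal(size=100), rng.normal(size=100)
increments = gc_increment(x, y, a, sigma, dt)
telescoped = boundary_potential(y, a, sigma, dt) - boundary_potential(x, a, sigma, dt)
```

Because every increment has the form g(y) − g(x), the sum along a path telescopes to g(x_n) − g(x_0), which stays bounded in probability and vanishes after dividing by the elapsed time.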


Figure 1. Upper panel: the GC action functional as a function of time (horizontal axis in units of 10⁴) for fixed Δt = 0.05. Its variance is large, necessitating the use of many samples in order to obtain statistically confident quantities. Lower panel: the entropy production rate as a function of time for the same Δt. It converges to a positive value, as expected.

3.1.1. Fourth-order potential on a torus

Let us now proceed with an important example where the potential is a fourth-order polynomial while the process takes values on a torus. Assume d = 2, while the potential $V = V_\beta$ is given by

\[
V_\beta(x) = \beta\left(\frac{|x|^4}{4} - \frac{|x|^2}{2}\right)
\tag{3.25}
\]

where β is a positive real number which in statistical mechanics has the meaning of the inverse temperature.

The diffusion matrix is set to $\sigma = \sqrt{2\beta^{-1}}\, I_d$. Based on [20], Assumption 2.1 is satisfied because the domain is restricted to a torus, the potential is locally Lipschitz continuous and the covariance matrix is elliptic. Figure 1 presents both the GC action functional (upper panel) and the entropy production rate (lower panel) as a function of time for fixed Δt = 0.05. Both quantities are numerically computed while the inverse temperature is set to β = 10. Even though the variance of the GC action functional is large, the entropy production, which is the cumulative sum of the GC functional, converges by the law of large numbers to a (positive) value after a relatively long time. Additionally, due to the ergodicity assumption, it converges to the correct value. Figure 2 shows the log-log plot of the numerical entropy production rate as a function of Δt for β = 20, 40, 60. The final time was set to t = 2×10⁶, while the initial point was set to one of the attraction points of the deterministic counterpart. For the reader's convenience, the thick black line denotes the O(Δt²) rate of convergence. This plot is in agreement with the theorem's estimate (3.12). Notice also that, for small Δt, the entropy production rate is very close to 0 and an even larger final time is needed in order to obtain a statistically confident numerical estimate of the entropy production. Moreover, as is evident from the figure and the GC action functional in (3.11), the entropy production is inversely proportional to the inverse temperature. Thus, from a statistical mechanics point of view, the higher the temperature, the larger (in a linear manner) the entropy production rate of the numerical scheme.
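A minimal sketch of how such an a posteriori estimate might be computed is given below. It is not the authors' code and rests on several assumptions: the GC action functional is taken to be the sum of log-ratios of forward and backward EM transition densities (the form of (3.11) is not reproduced here), the torus is ignored and the process is simulated on the plane, and the function names, seed, and run length are hypothetical.

```python
import numpy as np


def drift(x):
    """-1/2 * Sigma * grad V_beta(x) with Sigma = (2/beta) I.

    grad V_beta(x) = beta * (|x|^2 - 1) * x, so beta cancels in the drift.
    """
    return -(np.dot(x, x) - 1.0) * x


def entropy_production_rate(beta, dt, n_steps, seed=0):
    """Estimate the entropy production rate of explicit EM along one path."""
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 0.0])                  # a minimum of V_beta (|x| = 1)
    noise_std = np.sqrt(2.0 * dt / beta)      # sigma * sqrt(dt)
    gc_action = 0.0
    for _ in range(n_steps):
        y = x + drift(x) * dt + noise_std * rng.standard_normal(2)
        forward = y - x - drift(x) * dt       # residual of the step x -> y
        backward = x - y - drift(y) * dt      # residual of the reversed step y -> x
        # log p(x, y) - log p(y, x) for Gaussian kernels with covariance (2/beta)*dt*I
        gc_action += -beta * (forward @ forward - backward @ backward) / (4.0 * dt)
        x = y
    return gc_action / (n_steps * dt)         # time average of the GC action


ep_estimate = entropy_production_rate(beta=10.0, dt=0.05, n_steps=2000)
```

With only a few thousand steps the estimate is dominated by statistical noise, consistent with the large variance visible in Figure 1; the much longer runs described in the text are needed before the O(Δt²) behaviour becomes visible.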