
DISCRETE STICKY COUPLING OF FUNCTIONAL AUTOREGRESSIVE PROCESSES


HAL Id: hal-03199226

https://hal.archives-ouvertes.fr/hal-03199226

Preprint submitted on 20 Apr 2021


DISCRETE STICKY COUPLING OF FUNCTIONAL AUTOREGRESSIVE PROCESSES

Alain Durmus, Andreas Eberle, Aurélien Enfroy, Arnaud Guillin, Pierre Monmarché

To cite this version:

Alain Durmus, Andreas Eberle, Aurélien Enfroy, Arnaud Guillin, Pierre Monmarché. DISCRETE STICKY COUPLING OF FUNCTIONAL AUTOREGRESSIVE PROCESSES. 2021. ⟨hal-03199226⟩


DISCRETE STICKY COUPLING OF FUNCTIONAL AUTOREGRESSIVE PROCESSES

By Alain Durmus, Andreas Eberle, Aurélien Enfroy, Arnaud Guillin and Pierre Monmarché

ENS Paris-Saclay, University of Bonn, Université Clermont-Auvergne and Sorbonne Université

Université Paris-Saclay, ENS Paris-Saclay, CNRS, Centre Borelli, F-91190 Gif-sur-Yvette, France.

alain.durmus@cmla.ens-cachan.fr aurelien.enfroy@ens-paris-saclay.fr

Institute for Applied Mathematics University of Bonn Endenicher Allee 60

53115 Bonn, Germany eberle@uni-bonn.de

Laboratoire de Mathématiques Blaise Pascal CNRS - UMR 6620

Université Clermont-Auvergne Avenue des landais, 63177 Aubiere cedex, France

arnaud.guillin@uca.fr

LJLL, Sorbonne Université 4 place Jussieu, F-75005 Paris, France

pierre.monmarche@sorbonne-universite.fr

In this paper, we provide bounds in various metrics between the successive iterates of two functional autoregressive processes with isotropic Gaussian noise of the form $Y_{k+1} = T_\gamma(Y_k) + \sqrt{\gamma\sigma^2}\, Z_{k+1}$ and $\tilde{Y}_{k+1} = \tilde{T}_\gamma(\tilde{Y}_k) + \sqrt{\gamma\sigma^2}\, \tilde{Z}_{k+1}$ in the limit where the parameter $\gamma \to 0$. More precisely, we give non-asymptotic bounds on $\rho(\mathcal{L}(Y_k), \mathcal{L}(\tilde{Y}_k))$, where $\rho$ is an appropriate weighted Wasserstein distance or a $V$-distance, uniformly in the parameter $\gamma$, and on $\rho(\pi_\gamma, \tilde{\pi}_\gamma)$, where $\pi_\gamma$ and $\tilde{\pi}_\gamma$ are the respective stationary measures of the two processes. Of particular interest, this class of processes encompasses the Euler-Maruyama discretization of Langevin diffusions and its variants. To obtain our results, we rely on the construction of a discrete Markov chain $(W_k^{(\gamma)})_{k\in\mathbb{N}}$ for which we are able to bound the moments and show quantitative convergence rates uniform in $\gamma$. In addition, we show that this process converges in distribution to the continuous sticky process studied in [20, 18]. Finally, we illustrate our results on two numerical applications.

1. Introduction. We are interested in this paper in Markov chains $(Y_k)_{k\in\mathbb{N}}$ starting from $y \in \mathbb{R}^d$ and defined by recursions of the form
\[
(1)\qquad Y_{k+1} = T_\gamma(Y_k) + \sigma\sqrt{\gamma}\, Z_{k+1} \;,
\]
where $\sigma > 0$, $\gamma \in (0, \bar{\gamma}]$ for $\bar{\gamma} > 0$, $\{T_\gamma : \gamma \in (0, \bar{\gamma}]\}$ is a family of continuous functions from $\mathbb{R}^d$ to $\mathbb{R}^d$, and $(Z_k)_{k\geq 1}$ is a sequence of i.i.d. $d$-dimensional standard Gaussian random variables. Note that the Euler-Maruyama discretization of overdamped Langevin


diffusions, or of general Kolmogorov processes, and its variants belong to this class of processes and, in that setting, $\gamma$ corresponds to the discretization step size. Indeed, the Euler scheme consists in taking, for any $\gamma \in (0, \bar{\gamma}]$, $T_\gamma(y) = y + \gamma b(y)$ for some $b : \mathbb{R}^d \to \mathbb{R}^d$. When $b = -\nabla U$ for some potential $U$, these methods are now popular Markov Chain Monte Carlo algorithms to sample from the target density $x \mapsto \mathrm{e}^{-U(x)} / \int_{\mathbb{R}^d} \mathrm{e}^{-U(y)}\, \mathrm{d}y$.

However, in some applications, explicitly computing $\nabla U$ is not an option and further numerical methods must be implemented, which come with an additional bias since only approximations of $\nabla U$ can be used in (1). In this paper, we precisely study this additional source of error. In particular, based on a chain defined by (1), we consider a second Markov chain $(\tilde{Y}_k)_{k\in\mathbb{N}}$ defined by the recursion
\[
(2)\qquad \tilde{Y}_{k+1} = \tilde{T}_\gamma(\tilde{Y}_k) + \sigma\sqrt{\gamma}\, \tilde{Z}_{k+1} \;,
\]
where $\{\tilde{T}_\gamma : \gamma \in (0, \bar{\gamma}]\}$ is a family of functions from $\mathbb{R}^d$ to $\mathbb{R}^d$ such that for any $\gamma$, $\tilde{T}_\gamma$ is an approximation of $T_\gamma$ in a sense specified below, and $(\tilde{Z}_k)_{k\geq 1}$ is a sequence of i.i.d. $d$-dimensional standard Gaussian random variables, potentially correlated with $(Z_k)_{k\geq 1}$.
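For concreteness, the following minimal sketch simulates the pair of recursions (1) and (2) in the Euler-scheme setting $T_\gamma(y) = y + \gamma b(y)$, $\tilde{T}_\gamma(y) = y + \gamma\tilde{b}(y)$. The drift $b$, its perturbation $\tilde{b}$ and all numerical values are illustrative assumptions, not taken from the paper; the two chains are driven here by the same noise (a synchronous coupling) purely for illustration, whereas the coupling analyzed in this paper is the sticky reflection coupling of Section 2.2.

```python
import numpy as np

def simulate_pair(b, b_tilde, y0, gamma, sigma, n_steps, rng):
    """Simulate the two chains (1) and (2), here driven by the same Gaussian noise."""
    y = np.array(y0, dtype=float)
    y_tilde = np.array(y0, dtype=float)
    for _ in range(n_steps):
        z = rng.standard_normal(y.shape)                    # shared d-dimensional noise
        y = y + gamma * b(y) + sigma * np.sqrt(gamma) * z   # chain (1)
        y_tilde = y_tilde + gamma * b_tilde(y_tilde) + sigma * np.sqrt(gamma) * z  # chain (2)
    return y, y_tilde

# Illustrative drifts: b = -grad U for U(x) = ||x||^2 / 2, and a biased approximation of it.
rng = np.random.default_rng(0)
b = lambda x: -x
b_tilde = lambda x: -x + 0.05          # constant perturbation of the drift b
y, y_tilde = simulate_pair(b, b_tilde, np.zeros(2), gamma=0.01, sigma=1.0,
                           n_steps=10_000, rng=rng)
print(np.linalg.norm(y - y_tilde))
```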

We will enforce below conditions that ensure that both $(Y_k)_{k\in\mathbb{N}}$ and $(\tilde{Y}_k)_{k\in\mathbb{N}}$ are geometrically ergodic, and denote by $\pi_\gamma$ and $\tilde{\pi}_\gamma$ their respective invariant probability measures. If, for any $\gamma > 0$, $\tilde{T}_\gamma$ is close in some sense to $T_\gamma$, the process $(\tilde{Y}_k)_{k\in\mathbb{N}}$ can be seen as a perturbed version of $(Y_k)_{k\in\mathbb{N}}$, and $\tilde{\pi}_\gamma$ is expected to be close to $\pi_\gamma$. The main goal of this paper is to establish quantitative bounds between the finite-time laws of the two processes and between their equilibria. We pay particular attention to the dependency of these estimates on $\gamma$. Indeed, in the case of the Euler scheme of a continuous-time diffusion, $\pi_\gamma$ and the law of $Y_{\lfloor t/\gamma\rfloor}$ for some $t > 0$ converge to the invariant measure and to the law at time $t$ of the continuous-time process, and similarly for the perturbed chain. Hence, as $\gamma \to 0$, our estimates should not degenerate, but rather yield quantitative estimates for the continuous-time process. More precisely, the present paper is the discrete-time counterpart of the study conducted by [18] in the continuous-time case, and as $\gamma$ vanishes we recover estimates that are consistent with those of [18].

As in [18], our results are based on the construction of a suitable coupling of the processes, i.e. a simultaneous construction of a pair $(Y_k, \tilde{Y}_k)_{k\in\mathbb{N}}$ of non-independent chains that marginally follow (1) and (2) respectively and are designed to get and stay close to each other. We use the maximal reflection coupling of the Gaussian proposals, namely at each step the two chains are coupled so as to merge with maximal probability and, otherwise, a reflection is used (see Section 2.2 below). Estimates on the laws of the chains then follow from the study of $(\|Y_k - \tilde{Y}_k\|)_{k\in\mathbb{N}}$, which is itself based on the analysis of a Markov chain $(W_k)_{k\in\mathbb{N}}$ on $[0, +\infty)$ such that, by design of the coupling, $\|Y_k - \tilde{Y}_k\| \leq W_k$ for all $k \in \mathbb{N}$. Thus, the question of establishing bounds between the laws of two $d$-dimensional Markov chains is reduced to the study of a single one-dimensional chain. Besides the Markov property, the auxiliary chain presents some nice features. First, it is stochastically monotone: if $(W_k')_{k\in\mathbb{N}}$ is a Markov chain associated with the same Markov kernel as $(W_k)_{k\in\mathbb{N}}$ and such that $W_0 \leq W_0'$, then for any $k \in \mathbb{N}$, $W_k$ is stochastically dominated by $W_k'$, i.e. for any $t \geq 0$, $\mathbb{P}(W_k \leq t) \geq \mathbb{P}(W_k' \leq t)$. Secondly, $(W_k)_{k\in\mathbb{N}}$ has an atom at $0$.

The main results and main steps of this study are the following. First, we prove that $(W_k)_{k\in\mathbb{N}}$ admits a unique invariant measure and that, independently of $\gamma$, the moments and the mass on $(0, +\infty)$ of this equilibrium are small when the difference between $T_\gamma$ and $\tilde{T}_\gamma$ is small. Second, we establish the geometric convergence of the chain towards its equilibrium, at an explicit rate (stable as $\gamma \to 0$). Finally, we prove that, as $\gamma \to 0$, the chain $(W_k)_{k\in\mathbb{N}}$ converges in law to the continuous-time sticky diffusion that played the same role in [18]. This last part is not necessary to get estimates on the finite-time and equilibrium laws of (1) and (2) for a given $\gamma > 0$, but it sheds some new light on the limit sticky process which, in [18], is constructed as the limit of continuous-time diffusions with diffusion coefficients that vanish at zero, rather than of discrete-time chains. In some sense, $(W_k)_{k\in\mathbb{N}}$ can be seen as a discretization scheme for the sticky process, see also [2] on this topic.

Besides the obvious continuous/discrete difference between [18] and the present work, let us emphasize a few other distinctions. First, in [18], the one-dimensional sticky process has an explicit invariant measure. This is not the case in our framework, which makes the derivation of the bounds on the moments of the equilibrium a bit more involved. Secondly, in [18], although it is proven that the mass at zero and the first moment of the law of the sticky diffusion converge to their values at equilibrium (which is sufficient to get estimates on the laws of the two initial $d$-dimensional processes), the question of long-time convergence is not addressed for the sticky diffusion, whereas our long-time convergence results for $(W_k)_{k\in\mathbb{N}}$, together with its convergence as $\gamma \to 0$, furnish an explicit convergence rate for the sticky diffusion. The proof of the stability of the mass at zero and of the first moment in [18] relies on a concave modification of the distance (such as used e.g. in [16]), which is contracted by the chain before it hits zero. This method does not apply to, say, the second moment of the process. As a consequence, the results of [18] only concern the total variation and $W_1$ Wasserstein distances, while we consider a broader class of distances.

Finally, our theoretical results are illustrated through numerical experiments. We study the influence of the discretization scheme generally needed to perform Bayesian inference for parameters of Ordinary Differential Equations (ODEs).

Notation and convention. We denote by $\mathcal{B}(\mathbb{R}^d)$ the Borel $\sigma$-field of $\mathbb{R}^d$ endowed with the Euclidean distance, and by $\varphi_{\sigma^2}$ the density of the one-dimensional Gaussian distribution with zero mean and variance $\sigma^2 > 0$. In the case $\sigma = 1$, we simply denote this density by $\varphi$. $\Delta_{\mathbb{R}^d}$ stands for the subset $\{(x, x) \in \mathbb{R}^{2d} : x \in \mathbb{R}^d\}$ of $\mathbb{R}^{2d}$ and, for any $\mathsf{A} \subset \mathbb{R}^d$, $\mathsf{A}^{\mathrm{c}}$ for its complement. Let $\mu$ and $\nu$ be two $\sigma$-finite measures on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$. If $\nu$ is absolutely continuous with respect to $\mu$, we write $\nu \ll \mu$. We say that $\nu$ and $\mu$ are equivalent if and only if $\nu \ll \mu$ and $\mu \ll \nu$. We denote by $\lceil\cdot\rceil$ and $\lfloor\cdot\rfloor$ the ceiling and floor functions respectively. For $d, n \in \mathbb{N}$, $\mathrm{M}_{d,n}(\mathbb{R})$ stands for the set of $d \times n$ real matrices. We denote by $\mathrm{C}^k(\mathsf{U}, \mathsf{A})$ the set of $k$-times continuously differentiable functions from an open set $\mathsf{U} \subset \mathbb{R}^m$ to $\mathsf{A} \subset \mathbb{R}^p$. We use the conventions $\sum_{k=n}^{p} = 0$ and $\prod_{k=n}^{p} = 1$ for $n > p$, $n, p \in \mathbb{N}$, and $a/0 = +\infty$ for $a > 0$.

2. Sticky reflection coupling.

2.1. Main result. The Markov kernels $R_\gamma$ associated with $(Y_k)_{k\in\mathbb{N}}$ defined in (1) are given for any $y \in \mathbb{R}^d$, $\mathsf{A} \in \mathcal{B}(\mathbb{R}^d)$ by
\[
R_\gamma(y, \mathsf{A}) = (2\pi\sigma^2\gamma)^{-d/2} \int_{\mathbb{R}^d} \mathbf{1}_{\mathsf{A}}(y')\, \exp\left\{ -\|y' - T_\gamma(y)\|^2/(2\sigma^2\gamma) \right\} \mathrm{d}y' \;.
\]

Note that $\tilde{R}_\gamma$, associated with $(\tilde{Y}_k)_{k\in\mathbb{N}}$, is given by the same expression upon replacing $T_\gamma$ by $\tilde{T}_\gamma$. We consider the following assumption on the family $\{T_\gamma : \gamma \in (0, \bar{\gamma}]\}$. This condition will ensure that $R_\gamma$ is geometrically ergodic (see Proposition 1) and it will be important to derive our main results regarding the distance between $R_\gamma^k$ and $\tilde{R}_\gamma^k$, for $k \in \mathbb{N}$.


H1. There exist $R_1, L > 0$ and $m > 0$ such that for any $\gamma \in (0, \bar{\gamma}]$, there exists a non-decreasing function $\tau_\gamma : [0, +\infty) \to [0, +\infty)$ satisfying $\tau_\gamma(0) = 0$, $\|T_\gamma(x) - T_\gamma(\tilde{x})\| \leq \tau_\gamma(\|x - \tilde{x}\|)$ for any $x, \tilde{x} \in \mathbb{R}^d$, and
\[
(3)\qquad \sup_{r\in(0,+\infty)} \{\tau_\gamma(r)/r\} \leq 1 + \gamma L \;, \qquad \sup_{r\in(R_1,+\infty)} \{\tau_\gamma(r)/r\} \leq 1 - \gamma m \;.
\]
In addition, $\sup_{\gamma\in(0,\bar{\gamma}]} \|T_\gamma(0)\| < +\infty$.

Note that the condition that for any $\gamma \in (0, \bar{\gamma}]$, $\tau_\gamma$ is non-decreasing can be omitted upon replacing in our study $\tau_\gamma$ by the affine majorant
\[
(4)\qquad \bar{\tau}_\gamma : r \mapsto
\begin{cases}
(1 + L\gamma)r & \text{if } r \in [0, R_1] \;, \\
(1 + L\gamma)R_1 + (1 - m\gamma)(r - R_1) & \text{otherwise.}
\end{cases}
\]
Indeed, by definition and (3), for any $r \in [0, +\infty)$, $\tau_\gamma(r) \leq \bar{\tau}_\gamma(r)$, therefore for any $x, \tilde{x} \in \mathbb{R}^d$, $\|T_\gamma(x) - T_\gamma(\tilde{x})\| \leq \bar{\tau}_\gamma(\|x - \tilde{x}\|)$. In addition, an easy computation leads to, setting $R_2 = 2R_1(L + m)/m$,
\[
\sup_{r\in(0,+\infty)} \{\bar{\tau}_\gamma(r)/r\} \leq 1 + \gamma L \;, \qquad \sup_{r\in(R_2,+\infty)} \{\bar{\tau}_\gamma(r)/r\} \leq 1 - \gamma m/2 \;.
\]
Then, $\bar{\tau}_\gamma$ satisfies H1 and is non-decreasing.

Note that H1 implies that for any $r \in [0, +\infty)$ and $\gamma \in (0, \bar{\gamma}]$, $\tau_\gamma(r) \leq (1 + \gamma L)r$, therefore $T_\gamma$ is $(1 + \gamma L)$-Lipschitz. The second condition in (3) ensures that for any $\gamma \in (0, \bar{\gamma}]$, $T_\gamma$ is a contraction at infinity, i.e. for any $x, \tilde{x} \in \mathbb{R}^d$ with $\|x - \tilde{x}\| \geq R_1$, $\|T_\gamma(x) - T_\gamma(\tilde{x})\| \leq (1 - \gamma m)\|x - \tilde{x}\|$.

The assumption H1 holds for the Euler scheme applied to diffusions with scalar covariance matrices, i.e. (1) with $T_\gamma(x) = x + \gamma b(x)$ and a drift function $b : \mathbb{R}^d \to \mathbb{R}^d$, if, for some $L_b, m_b, R_b > 0$, $b$ is $L_b$-Lipschitz continuous and satisfies
\[
\langle x - y, b(x) - b(y)\rangle \leq -m_b \|x - y\|^2 \;,
\]
for all $x, y \in \mathbb{R}^d$ with $\|x - y\| \geq R_b$. Indeed, this implies that for any $x, y \in \mathbb{R}^d$, $\|T_\gamma(x) - T_\gamma(y)\| \leq (1 + L_b\gamma)\|x - y\|$ and, expanding the square, $\|T_\gamma(x) - T_\gamma(y)\|^2 \leq (1 - 2m_b\gamma + L_b^2\gamma^2)\|x - y\|^2 \leq (1 - m_b\gamma)\|x - y\|^2$, provided $\gamma \in (0, m_b/L_b^2]$ and $\|x - y\| \geq R_b$. Therefore, it suffices to consider $\tau_\gamma$ defined by (4) with $L = L_b$, $m = m_b/2$ and $R_1 = R_b$.
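As a quick numerical sanity check of this verification (our own code; the drift $b$ below and all constants are illustrative assumptions, not taken from the paper), one can test the contraction-at-infinity inequality for the Euler map $T_\gamma(x) = x + \gamma b(x)$ on random pairs of points:

```python
import numpy as np

# Illustrative drift: b(x) = -x + 2*sin(x) is Lipschitz and satisfies
# <x - y, b(x) - b(y)> <= -m_b (x - y)^2 for |x - y| >= R_b with, roughly, m_b = 1/2 and R_b = 10,
# so ||T_gamma(x) - T_gamma(y)||^2 <= (1 - gamma * m_b) ||x - y||^2 for small gamma.
rng = np.random.default_rng(4)
gamma, m_b, R_b = 0.01, 0.5, 10.0
b = lambda x: -x + 2.0 * np.sin(x)
T = lambda x: x + gamma * b(x)
for _ in range(10_000):
    x, y = rng.normal(scale=20.0, size=2)
    if abs(x - y) >= R_b:
        assert (T(x) - T(y)) ** 2 <= (1.0 - gamma * m_b) * (x - y) ** 2
print("contraction at infinity verified on sampled pairs")
```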

Our results will be stated in terms of Wasserstein distances and $V$-norms, whose definitions are the following. Consider a measurable cost function $\mathrm{c} : \mathbb{R}^{2d} \to [0, \infty)$. Then the associated Wasserstein distance $W_{\mathrm{c}}$ is given for two probability measures $\mu, \nu$ on $\mathbb{R}^d$ by
\[
W_{\mathrm{c}}(\nu, \mu) = \inf_{\pi \in \Pi(\nu, \mu)} \int_{\mathbb{R}^{2d}} \mathrm{c}(x, y)\, \pi(\mathrm{d}x, \mathrm{d}y) \;,
\]
where $\Pi(\nu, \mu)$ is the set of transference plans between $\nu$ and $\mu$, namely the set of probability measures on $\mathbb{R}^{2d}$ whose first and second $d$-dimensional marginals are respectively $\nu$ and $\mu$. In the particular case where $\mathrm{c}(x, y) = \mathbf{1}_{\Delta_{\mathbb{R}^d}^{\mathrm{c}}}(x, y)$, $W_{\mathrm{c}}$ is simply the total variation distance $\|\cdot\|_{\mathrm{TV}}$. For $V : \mathbb{R}^d \to [1, +\infty)$, the choice $\mathrm{c}(x, y) = \mathbf{1}_{\Delta_{\mathbb{R}^d}^{\mathrm{c}}}(x, y)\{V(x) + V(y)\}$ yields the $V$-norm (see [11, Theorem 19.1.7]), i.e. $W_{\mathrm{c}}(\nu, \mu) = \|\nu - \mu\|_V$. Finally, for $\mathrm{c}(x, y) = \|x - y\|^p$ with $p \in [1, +\infty)$, $W_{\mathrm{c}}$ is the $p$-th power of the usual Wasserstein distance of order $p$.


Following the same lines as the proof of [9, Theorem 15]¹, we can show that H1 implies that the Markov kernel $R_\gamma$ is $V_c$-uniformly geometrically ergodic, where for any $c > 0$ and $x \in \mathbb{R}^d$, $V_c(x) = \exp(c\|x\|^2)$, with a convergence rate that scales linearly with the step size $\gamma$.

Proposition 1. Assume H1. Then, setting $\bar{\gamma}_1 = \bar{\gamma} \wedge \{1/m\}$, for any $\gamma \in (0, \bar{\gamma}_1]$, $R_\gamma$ admits a unique stationary distribution $\pi_\gamma$. In addition, there exist $c > 0$, $\rho \in [0, 1)$ and $C > 0$ such that for any $x \in \mathbb{R}^d$, $k \in \mathbb{N}$ and $\gamma \in (0, \bar{\gamma}_1]$, $\|\delta_x R_\gamma^k - \pi_\gamma\|_{V_c} \leq C\rho^{k\gamma}\, V_c(x)$.

Proof. This result is a simple consequence of [9, Corollary 11]. Its proof is given for completeness in Section 5.1.1.

Note that this result can be made quantitative, and other convergence results in total variation and Wasserstein distance of order p ∈ [1, +∞) can also be established following the same lines as the proof of [9, Corollary 14]. However, these results are out of the scope of the present paper and would be simple adaptations of those in [9] or [17].

We now consider an assumption which quantifies the perturbation associated with $\tilde{T}_\gamma$ relative to $T_\gamma$, for $\gamma \in (0, \bar{\gamma}]$.

H2. There exists $c_\infty > 0$ such that $\sup_{x\in\mathbb{R}^d} \|T_\gamma(x) - \tilde{T}_\gamma(x)\| \leq \gamma c_\infty$ for all $\gamma \in (0, \bar{\gamma}]$.

Example 2. The assumption H2 holds for the Euler scheme applied to diffusions with scalar covariance matrices, i.e. (1) and (2) with
\[
(5)\qquad T_\gamma(x) = x + \gamma b(x) \quad\text{and}\quad \tilde{T}_\gamma(x) = x + \gamma \tilde{b}(x) \;,
\]
under the condition that $\sup_{x\in\mathbb{R}^d} \|b(x) - \tilde{b}(x)\| \leq c_\infty$. This setting is exactly the one we introduced to motivate our study. In particular, in the case where $b = -\nabla U$ for some potential $U$, $\tilde{b}$ may correspond to a numerical approximation of this gradient.

Note that, compared to $T_\gamma$, $\gamma \in (0, \bar{\gamma}]$, we do not assume any smoothness condition on $\tilde{T}_\gamma$. More precisely, we do not assume that $\tilde{T}_\gamma$ satisfies H1. Regarding the ergodicity properties of $\tilde{R}_\gamma$ associated with $\tilde{T}_\gamma$, $\gamma \in (0, \bar{\gamma}]$, we have the following result.

Proposition 3. Assume H1 and H2 and set $\bar{\gamma}_1 = \bar{\gamma} \wedge \{1/m\}$. Then, for any $\gamma \in (0, \bar{\gamma}_1]$, $\tilde{R}_\gamma$ admits a unique stationary distribution $\tilde{\pi}_\gamma$. In addition, there exists $c > 0$ such that for any $\gamma \in (0, \bar{\gamma}_1]$, there exist $\rho_\gamma \in [0, 1)$ and $C_\gamma > 0$ such that for any $x \in \mathbb{R}^d$ and $k \in \mathbb{N}$, $\|\delta_x \tilde{R}_\gamma^k - \tilde{\pi}_\gamma\|_{V_c} \leq C_\gamma \rho_\gamma^k V_c(x)$, where $V_c(x) = \exp(c\|x\|^2)$.

Proof. The proof is postponed to Section 5.1.2.

Similarly to Proposition 1 for $R_\gamma$, Proposition 3 implies that $\tilde{R}_\gamma$ is $V_c$-uniformly geometrically ergodic. However, in contrast to Proposition 1, the dependency of the rate of convergence on the step size $\gamma$ is no longer explicit, since the results and the methods employed in [9] or [17] cannot be applied anymore.

Note that Proposition 1 and Proposition 3 imply that $R_\gamma$ and $\tilde{R}_\gamma$ converge to $\pi_\gamma$ and $\tilde{\pi}_\gamma$ respectively in total variation and in Wasserstein metric of any order $p \in [1, +\infty)$.

¹ [9, Theorem 15] considers the case where $T_\gamma$ comes from the Euler discretization scheme and has the form $T_\gamma(x) = x + \gamma b(x)$.


Based on the two assumptions above, we can now state one of our main results. Our goal is to quantify the distance between the laws of the iterates of the two chains $(Y_k)_{k\in\mathbb{N}}$ and $(\tilde{Y}_k)_{k\in\mathbb{N}}$, in particular starting from the same initial point $x \in \mathbb{R}^d$ or at equilibrium. Indeed, remark that, in view of Propositions 1 and 3, letting $k \to +\infty$ in the next statement yields quantitative bounds on $W_{\mathrm{c}}(\pi_\gamma, \tilde{\pi}_\gamma)$ for any $\gamma \in (0, \bar{\gamma} \wedge \{1/m\}]$.

Theorem 4. Assume H1 and H2 hold. There exist explicit constants $C, c > 0$ and $\rho \in [0, 1)$ such that for any $k \in \mathbb{N}$, $\gamma \in (0, \bar{\gamma}]$ and $x, \tilde{x} \in \mathbb{R}^d$,
\[
W_{\mathrm{c}}(\delta_x R_\gamma^k, \delta_{\tilde{x}} \tilde{R}_\gamma^k) \leq C\rho^{\gamma k}\, V(\|x - \tilde{x}\|) + c\, c_\infty \;,
\]
where $\mathrm{c}(x, \tilde{x}) = \tilde{\mathrm{c}}(\|x - \tilde{x}\|)$, with
\[
(\tilde{\mathrm{c}}, V) \in \left\{ (\mathbf{1}_{(0,+\infty)}, |\cdot|),\ (|\cdot|, |\cdot|),\ (\mathbf{1}_{(0,+\infty)}\exp(|\cdot|), \mathbf{1}_{(0,+\infty)}\exp(|\cdot|)) \right\} .
\]

Remark 5. It is also possible to treat the case of cost functions $\mathrm{c}$ of the form $\mathrm{c}(x, y) = \tilde{\mathrm{c}}(\|x - y\|)(V(x) + V(y))$ with $\tilde{\mathrm{c}}$ as in Theorem 4 and $V$ a positive function, simply by using Hölder's inequality. Indeed, for $p, q > 1$ with $1/p + 1/q = 1$, we can bound
\[
W_{\mathrm{c}}(\nu, \mu) \leq W_{\mathrm{c}_p}(\nu, \mu)^{1/p} \left[ (\nu(V^q))^{1/q} + (\mu(V^q))^{1/q} \right]
\]
with $\mathrm{c}_p(x, y) = \tilde{\mathrm{c}}^p(\|x - y\|)$. Bounds on the $W_{\mathrm{c}_p}$ distance can then be established as in Theorem 4, while bounds on expected values of $V^q$, independent of $\gamma$, are classically obtained through Lyapunov arguments (see e.g. the proof of Proposition 1 in Section 5.1.1).

The rest of this section is devoted to the proof of this result. In particular, we define in the following the main object of this paper.

2.2. The discrete sticky kernel. We define a Markovian coupling of the two chains $(Y_k)_{k\in\mathbb{N}}$ and $(\tilde{Y}_k)_{k\in\mathbb{N}}$ defined in (1) and (2) by using, at each step, the maximal reflection coupling of the two Gaussian proposals, which is optimal for the total variation distance (i.e. the one that maximizes the probability of coalescence). Let $(U_k)_{k\geq 1}$ be a sequence of i.i.d. uniform random variables on $[0, 1]$, independent from $(Z_k)_{k\geq 1}$, which we recall is a sequence of i.i.d. $d$-dimensional standard Gaussian random variables. We define the discrete sticky Markov coupling $K_\gamma$ of $R_\gamma$ and $\tilde{R}_\gamma$ as the Markov kernel associated with the Markov chain on $\mathbb{R}^{2d}$ given for $k \in \mathbb{N}$ by
\[
(6)\qquad
\begin{aligned}
X_{k+1} &= T_\gamma(X_k) + (\sigma^2\gamma)^{1/2} Z_{k+1} \;, \\
\tilde{X}_{k+1} &= X_{k+1} B_{k+1} + (1 - B_{k+1})\, F_\gamma(X_k, \tilde{X}_k, Z_{k+1}) \;,
\end{aligned}
\]
where $B_{k+1} = \mathbf{1}_{[0,+\infty)}(p_\gamma(X_k, \tilde{X}_k, Z_{k+1}) - U_{k+1})$ and
\[
F_\gamma(x, \tilde{x}, z) = \tilde{T}_\gamma(\tilde{x}) + (\sigma^2\gamma)^{1/2} \left\{ \mathrm{Id} - 2\, e(x, \tilde{x})\, e(x, \tilde{x})^{\mathrm{T}} \right\} z \;,
\]
\[
(7)\qquad \mathrm{E}(x, \tilde{x}) = \tilde{T}_\gamma(\tilde{x}) - T_\gamma(x) \;, \qquad
e(x, \tilde{x}) =
\begin{cases}
\mathrm{E}(x, \tilde{x})/\|\mathrm{E}(x, \tilde{x})\| & \text{if } \mathrm{E}(x, \tilde{x}) \neq 0 \;, \\
e_0 & \text{otherwise,}
\end{cases}
\]
\[
p_\gamma(x, \tilde{x}, z) = 1 \wedge \frac{\varphi_{\sigma^2\gamma}\!\left( \|\mathrm{E}(x, \tilde{x})\| - (\sigma^2\gamma)^{1/2}\langle e(x, \tilde{x}), z\rangle \right)}{\varphi_{\sigma^2\gamma}\!\left( (\sigma^2\gamma)^{1/2}\langle e(x, \tilde{x}), z\rangle \right)} \;,
\]

where $e_0 \in \mathbb{R}^d$ is an arbitrary unit vector, i.e. $\|e_0\| = 1$, and $\varphi_{\sigma^2\gamma}$ is the density of the one-dimensional Gaussian distribution with mean $0$ and variance $\sigma^2\gamma$. In other words, $K_\gamma$ is given for any $\gamma \in (0, \bar{\gamma}]$, $(x, \tilde{x}) \in \mathbb{R}^{2d}$ and $\mathsf{A} \in \mathcal{B}(\mathbb{R}^{2d})$ by
\[
\begin{aligned}
K_\gamma((x, \tilde{x}), \mathsf{A}) = {} & \int_{\mathbb{R}^d} \mathbf{1}_{\mathsf{A}}\!\left( T_\gamma(x) + (\sigma^2\gamma)^{1/2} z,\, T_\gamma(x) + (\sigma^2\gamma)^{1/2} z \right) p_\gamma(x, \tilde{x}, z)\, \frac{\mathrm{e}^{-\|z\|^2/2}}{(2\pi)^{d/2}}\, \mathrm{d}z \\
& + \int_{\mathbb{R}^d} \mathbf{1}_{\mathsf{A}}\!\left( T_\gamma(x) + (\sigma^2\gamma)^{1/2} z,\, F_\gamma(x, \tilde{x}, z) \right) \left(1 - p_\gamma(x, \tilde{x}, z)\right) \frac{\mathrm{e}^{-\|z\|^2/2}}{(2\pi)^{d/2}}\, \mathrm{d}z \;.
\end{aligned}
\]
In words, from the initial conditions $(x, \tilde{x})$, this coupling works as follows: first, a Gaussian variable $Z_{k+1}$ is drawn for the fluctuation of $X_{k+1}$. Then, $\tilde{X}_{k+1}$ is made equal to $X_{k+1}$ with probability $p_\gamma(x, \tilde{x}, Z_{k+1})$ and, otherwise, the fluctuation of $\tilde{X}_{k+1}$ with respect to its average $\tilde{T}_\gamma(\tilde{x})$ is given by the orthogonal reflection of $Z_{k+1}$ in the direction $\tilde{T}_\gamma(\tilde{x}) - T_\gamma(x)$.
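To make the mechanism of (6)–(7) concrete, here is a minimal sketch of one step of the coupling in Python; the function and variable names are our own, the maps T and T_tilde stand for $T_\gamma$ and $\tilde{T}_\gamma$, and the ratio of Gaussian densities defining $p_\gamma$ is evaluated in log form.

```python
import numpy as np

def sticky_coupling_step(x, x_tilde, T, T_tilde, gamma, sigma, rng):
    """One step of the discrete sticky (maximal reflection) coupling (6)-(7); a sketch
    written from the displayed formulas, not the authors' implementation."""
    d = x.shape[0]
    s = np.sqrt(sigma**2 * gamma)
    z = rng.standard_normal(d)
    x_next = T(x) + s * z                                  # first marginal, kernel R_gamma
    E = T_tilde(x_tilde) - T(x)
    norm_E = np.linalg.norm(E)
    e = E / norm_E if norm_E > 0.0 else np.eye(d)[0]       # arbitrary unit vector e0 when E = 0
    g = float(e @ z)
    # p_gamma = 1 ∧ phi_{sigma^2 gamma}(||E|| - s*g) / phi_{sigma^2 gamma}(s*g)
    log_ratio = -((norm_E - s * g) ** 2 - (s * g) ** 2) / (2.0 * sigma**2 * gamma)
    p = min(1.0, np.exp(log_ratio))
    if rng.uniform() <= p:
        x_tilde_next = x_next                              # the two chains coalesce
    else:
        z_reflected = z - 2.0 * g * e                      # (Id - 2 e e^T) z
        x_tilde_next = T_tilde(x_tilde) + s * z_reflected  # second marginal, kernel R_tilde_gamma
    return x_next, x_tilde_next
```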

It is well known that for any $(x, \tilde{x}) \in \mathbb{R}^{2d}$ and $\mathsf{A} \in \mathcal{B}(\mathbb{R}^d)$, $K_\gamma((x, \tilde{x}), \mathsf{A} \times \mathbb{R}^d) = R_\gamma(x, \mathsf{A})$ and $K_\gamma((x, \tilde{x}), \mathbb{R}^d \times \mathsf{A}) = \tilde{R}_\gamma(\tilde{x}, \mathsf{A})$, see e.g. [4, Section 3.3], [13, Section 4.1], [17] or [9].

The starting point of our analysis is the next result, which will enable us to compare the coupling difference process $X_{k+1} - \tilde{X}_{k+1}$ with a Markov chain on $[0, +\infty)$. Define $(G_k)_{k\geq 1}$ for any $k \geq 1$ by
\[
(8)\qquad G_k = \langle e(X_{k-1}, \tilde{X}_{k-1}), Z_k\rangle \;,
\]
where $e$ is given by (7). For any $a \geq 0$, $g \in \mathbb{R}$, $u \in [0, 1]$ and $\gamma \in (0, \bar{\gamma}]$, define
\[
(9)\qquad H_\gamma(a, g, u) = \mathbf{1}_{[0,+\infty)}\!\left( u - p_{\sigma^2\gamma}(a, g) \right)\left( a - 2(\sigma^2\gamma)^{1/2} g \right) ,
\]
where
\[
(10)\qquad p_{\sigma^2\gamma}(a, g) = 1 \wedge \frac{\varphi_{\sigma^2\gamma}\!\left( a - (\sigma^2\gamma)^{1/2} g \right)}{\varphi_{\sigma^2\gamma}\!\left( (\sigma^2\gamma)^{1/2} g \right)} \;.
\]

Proposition 6. Assume H1 and H2 hold. Then for any $\gamma \in (0, \bar{\gamma}]$ and $k \in \mathbb{N}$, almost surely, we have
\[
(11)\qquad \|X_{k+1} - \tilde{X}_{k+1}\| \leq \mathrm{G}_\gamma(\|X_k - \tilde{X}_k\|, G_{k+1}, U_{k+1}) \;,
\]
where $(X_k, \tilde{X}_k)_{k\in\mathbb{N}}$ is defined by (6) and, for any $w \in [0, +\infty)$, $g \in \mathbb{R}$ and $u \in [0, 1]$, $\mathrm{G}_\gamma(w, g, u) = H_\gamma(\tau_\gamma(w) + \gamma c_\infty, g, u)$. In addition, for any $g \in \mathbb{R}$ and $u \in [0, 1]$, $w \mapsto \mathrm{G}_\gamma(w, g, u)$ is non-decreasing.

Proof. The proof is postponed to Section 5.1.3.

Consider now the stochastic process $(W_k)_{k\in\mathbb{N}}$ starting from $\|X_0 - \tilde{X}_0\|$ and defined by induction on $k$ as follows,
\[
(12)\qquad W_{k+1} = \mathrm{G}_\gamma(W_k, G_{k+1}, U_{k+1}) =
\begin{cases}
\tau_\gamma(W_k) + \gamma c_\infty - 2\sigma\sqrt{\gamma}\, G_{k+1} & \text{if } U_{k+1} \geq p_{\sigma^2\gamma}(\tau_\gamma(W_k) + \gamma c_\infty, G_{k+1}) \;, \\
0 & \text{otherwise.}
\end{cases}
\]
By definition, (8) and (7), an easy induction implies that $(G_k)_{k\geq 1}$ and $(U_k)_{k\geq 1}$ are independent, $(G_k)_{k\geq 1}$ are i.i.d. standard Gaussian random variables and $(U_k)_{k\geq 1}$ are


i.i.d. uniform random variables on $[0, 1]$. Therefore, $(W_k)_{k\in\mathbb{N}}$ is a Markov chain with Markov kernel $Q_\gamma$ defined for $w \in [0, +\infty)$ and $\mathsf{A} \in \mathcal{B}([0, +\infty))$ by
\[
(13)\qquad
\begin{aligned}
Q_\gamma(w, \mathsf{A}) = {} & \delta_0(\mathsf{A}) \int_{\mathbb{R}} p_{\sigma^2\gamma}(\tau_\gamma(w) + \gamma c_\infty, g)\, \varphi(g)\, \mathrm{d}g \\
& + \int_{\mathbb{R}} \mathbf{1}_{\mathsf{A}}\!\left( \tau_\gamma(w) + \gamma c_\infty - 2\sigma\gamma^{1/2} g \right) \left\{ 1 - p_{\sigma^2\gamma}(\tau_\gamma(w) + \gamma c_\infty, g) \right\} \varphi(g)\, \mathrm{d}g \;,
\end{aligned}
\]
where $\varphi$ is the density of the standard Gaussian distribution on $\mathbb{R}$. By Proposition 6, we have almost surely for any $k \in \mathbb{N}$,
\[
(14)\qquad \|X_k - \tilde{X}_k\| \leq W_k \;.
\]

Another consequence of Proposition 6 is that $Q_\gamma$ is stochastically monotone (see e.g. [23] or [28]): more precisely, if $(W_k)_{k\in\mathbb{N}}$ and $(\tilde{W}_k)_{k\in\mathbb{N}}$ are two chains given by (12) with the same variables $(G_k, U_k)_{k\geq 1}$ and with $W_0 \leq \tilde{W}_0$, then almost surely $W_k \leq \tilde{W}_k$ for all $k \in \mathbb{N}$. This nice property will be used several times in the analysis of this chain.
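The auxiliary chain (12) is one-dimensional and straightforward to simulate; the following sketch (our own code, with an assumed contraction $\tau_\gamma(w) = (1 - \gamma m)w$, i.e. $R_1 = 0$, and assumed values of $m$ and $c_\infty$) illustrates the atom at $0$ and gives crude Monte Carlo estimates of the quantities bounded in Theorem 11 below.

```python
import numpy as np
from math import sqrt, exp

def w_chain_step(w, gamma, sigma, c_inf, tau, rng):
    """One step of the auxiliary chain (12) on [0, +infinity); a sketch with our own names."""
    a = tau(w) + gamma * c_inf                       # tau_gamma(W_k) + gamma * c_inf
    g = rng.standard_normal()
    s = sqrt(sigma**2 * gamma)
    # p_{sigma^2 gamma}(a, g) = 1 ∧ phi(a - s g) / phi(s g), with phi the N(0, sigma^2 gamma) density
    p = min(1.0, exp(-((a - s * g) ** 2 - (s * g) ** 2) / (2.0 * sigma**2 * gamma)))
    if rng.uniform() < p:
        return 0.0                                   # the chain sticks at the atom 0
    return a - 2.0 * s * g                           # non-negative on this event, see (9)-(10)

rng = np.random.default_rng(1)
gamma, sigma, m, c_inf = 0.01, 1.0, 1.0, 0.1          # assumed values
tau = lambda w: (1.0 - gamma * m) * w
w, traj = 1.0, []
for _ in range(20_000):
    w = w_chain_step(w, gamma, sigma, c_inf, tau, rng)
    traj.append(w)
print(np.mean(traj), np.mean(np.array(traj) > 0))    # rough mean and mass on (0, +infinity)
```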

The main consequence of (14) is the following result.

Corollary 7. Assume H1 and H2 hold. Let $\mathrm{c} : \mathbb{R}^{2d} \to [0, +\infty)$ be of the form $\mathrm{c}(x, y) = \tilde{\mathrm{c}}(\|x - y\|)$ for some non-decreasing function $\tilde{\mathrm{c}} : [0, +\infty) \to [0, +\infty)$ with $\tilde{\mathrm{c}}(0) = 0$. For any $x, \tilde{x} \in \mathbb{R}^d$ and $k \in \mathbb{N}$,
\[
W_{\mathrm{c}}(\delta_x R_\gamma^k, \delta_{\tilde{x}} \tilde{R}_\gamma^k) \leq \int_{\mathbb{R}^{2d}} \mathrm{c}(y, \tilde{y})\, K_\gamma^k((x, \tilde{x}), \mathrm{d}(y, \tilde{y})) \leq \int_0^{+\infty} \tilde{\mathrm{c}}(\tilde{w})\, Q_\gamma^k(\|x - \tilde{x}\|, \mathrm{d}\tilde{w}) \;.
\]
Proof. Let $k \in \mathbb{N}$. By (14), and since $\tilde{\mathrm{c}}$ is non-decreasing, we get almost surely $\tilde{\mathrm{c}}(\|X_k - \tilde{X}_k\|) \leq \tilde{\mathrm{c}}(W_k)$. Taking the expectation concludes the proof.

From Corollary 7, the question of obtaining bounds on $W_{\mathrm{c}}(\delta_x R_\gamma^k, \delta_{\tilde{x}} \tilde{R}_\gamma^k)$ boils down to the study of the Markov kernel $Q_\gamma$ on $[0, +\infty)$, which is the main part of our work.

2.3. Analysis of the auxiliary Markov chain. We start with a Lyapunov/drift result.

Proposition 8. Assume H1 and H2 hold. Then for any $w \geq 0$,
\[
Q_\gamma V_1(w) \leq (1 - \gamma m)\, V_1(w)\, \mathbf{1}_{(R_1,+\infty)}(w) + (1 + \gamma L)\, V_1(w)\, \mathbf{1}_{(0,R_1]}(w) + \gamma c_\infty \;,
\]
where $Q_\gamma$ is defined by (13) and, for any $w \in \mathbb{R}$, $V_1(w) = |w|$.

Proof. The proof is postponed to Section 5.1.4.

Proposition 8 implies in particular that for any $w \in \mathbb{R}$,
\[
Q_\gamma V_1(w) \leq (1 - \gamma m)\, V_1(w) + \gamma[(L + m)R_1 + c_\infty] \;.
\]
Then, a straightforward induction shows that for any $k \in \mathbb{N}$,
\[
Q_\gamma^k V_1(w) \leq (1 - \gamma m)^k V_1(w) + [(L + m)R_1 + c_\infty]/m \;,
\]
and therefore, by Corollary 7 taking $\tilde{\mathrm{c}}(t) = t$,
\[
(15)\qquad W_1(\delta_x R_\gamma^k, \delta_{\tilde{x}} \tilde{R}_\gamma^k) \leq (1 - \gamma m)^k \|x - \tilde{x}\| + [(L + m)R_1 + c_\infty]/m \;.
\]

However, this result is not sharp as $k \to +\infty$. Indeed, in the case $c_\infty = 0$, $R_\gamma = \tilde{R}_\gamma$ and, by Proposition 1, $W_1(\delta_x R_\gamma^k, \delta_{\tilde{x}} \tilde{R}_\gamma^k) \to 0$ as $k \to +\infty$, while the right-hand side of (15) converges to $(L + m)R_1/m \neq 0$. We need to refine our results in order to fill this gap. To this end, we need to analyze more precisely the long-time behavior of $Q_\gamma$. A first step is to show that it is ergodic.


Proposition 9. Assume H1 and H2 hold. For any $\gamma \in (0, \bar{\gamma}]$, $Q_\gamma$ admits a unique invariant probability measure $\mu_\gamma$ and is geometrically ergodic. In addition, $\mu_\gamma(\{0\}) > 0$ and $\mu_\gamma$ is absolutely continuous with respect to the measure $\delta_0 + \mathrm{Leb}$ on $([0, +\infty), \mathcal{B}([0, +\infty)))$. Finally, in the case $c_\infty \neq 0$, $\mu_\gamma$ and $\delta_0 + \mathrm{Leb}$ are equivalent.

Proof. The proof is postponed to Section 5.1.5.

Corollary 10. Assume H1 and H2 hold. Let $\mathrm{c} : \mathbb{R}^{2d} \to [0, +\infty)$ be of the form $\mathrm{c}(x, y) = \tilde{\mathrm{c}}(\|x - y\|)$ for some non-decreasing function $\tilde{\mathrm{c}} : [0, +\infty) \to [0, +\infty)$ with $\tilde{\mathrm{c}}(0) = 0$. For any $x, \tilde{x} \in \mathbb{R}^d$ and $k \in \mathbb{N}$,
\[
(16)\qquad W_{\mathrm{c}}(\delta_x R_\gamma^k, \delta_{\tilde{x}} \tilde{R}_\gamma^k) \leq \int_0^{+\infty} \tilde{\mathrm{c}}(\tilde{w})\, \{Q_\gamma^k(\|x - \tilde{x}\|, \cdot) - \mu_\gamma\}(\mathrm{d}\tilde{w}) + \mu_\gamma(\tilde{\mathrm{c}}) \;,
\]
where $\mu_\gamma$ is the stationary distribution of $Q_\gamma$ given by (13). In particular, if $x = \tilde{x}$, $W_{\mathrm{c}}(\delta_x R_\gamma^k, \delta_x \tilde{R}_\gamma^k) \leq \mu_\gamma(\tilde{\mathrm{c}})$.

Proof. The proof of (16) is a consequence of Proposition 9 and Corollary 7. The last statement follows from the fact that $Q_\gamma$ is stochastically monotone. Indeed, by Proposition 6, for any $w, \tilde{w} \in [0, +\infty)$ with $w \leq \tilde{w}$, and any $a \in [0, +\infty)$, $Q_\gamma(w, [0, a]) \geq Q_\gamma(\tilde{w}, [0, a])$. Therefore, for any $a \in [0, +\infty)$, $w \mapsto Q_\gamma(w, [0, a])$ is non-increasing on $[0, +\infty)$ and, for any non-increasing bounded function $f$, $Q_\gamma f(w) \geq Q_\gamma f(\tilde{w})$ for any $w, \tilde{w} \in [0, +\infty)$ with $w \leq \tilde{w}$. As a result, a straightforward induction shows that for any $k \in \mathbb{N}$, $w, \tilde{w} \in [0, +\infty)$ with $w \leq \tilde{w}$, and $a \in [0, +\infty)$, $Q_\gamma^k(w, [0, a]) \geq Q_\gamma^k(\tilde{w}, [0, a])$. Then, we obtain $Q_\gamma^k(0, [0, a]) \geq \int_0^{+\infty} \mu_\gamma(\mathrm{d}w)\, Q_\gamma^k(w, [0, a]) = \mu_\gamma([0, a])$. Since $\tilde{\mathrm{c}}$ is non-decreasing on $[0, +\infty)$, we get $Q_\gamma^k\tilde{\mathrm{c}}(0) \leq \mu_\gamma(\tilde{\mathrm{c}})$, which combined with (16) completes the proof.

Corollary 10 then naturally brings us to derive moment bounds for the stationary distribution $\mu_\gamma$, $\gamma \in (0, \bar{\gamma}]$, and quantitative convergence of $Q_\gamma$ to $\mu_\gamma$. Our next results address these two problems.

Theorem 11. Assume H1 and H2 hold. For any $\bar{\delta} \in (0, L^{-1} \wedge (\sigma\mathrm{e}^{-1}/c_\infty)^2]$ and $\gamma \in (0, \bar{\gamma}]$,
\[
(17)\qquad \int_{[0,+\infty)} w\, \mu_\gamma(\mathrm{d}w) \leq c_\infty c_1 \;, \qquad \mu_\gamma((0, +\infty)) \leq c_\infty c_2 \;,
\]
where $\mu_\gamma$ is the stationary distribution of $Q_\gamma$ given by (13) and, considering $\zeta$ given below in (58),
\[
\begin{aligned}
c_1 &= \eta_1 R_1(1 + L/m) + 1/m \;, \\
c_2 &= \mathrm{e}^{(\bar{\delta}+\bar{\gamma})L}\left( c_1(1 + \bar{\gamma}L)/\bar{\delta}^{1/2} + [\bar{\delta} + \bar{\gamma}]^{1/2} \right)/(\sqrt{2\pi}\,\sigma) + 2\zeta[\bar{\delta} + \bar{\gamma}]^{1/2}\, \mathrm{e}^{3(\bar{\delta}+\bar{\gamma})L}/\sigma^3 \;, \\
\eta_1 &= [\bar{\delta} + \bar{\gamma}]^{1/2} \left[ \frac{\zeta\, \mathrm{e}^{3(\bar{\delta}+\bar{\gamma})L}}{\sigma^3} + \frac{\mathrm{e}^{(\bar{\delta}+\bar{\gamma})L}}{2\sqrt{2\pi}\,\sigma} \right] \Bigg/\, \Phi\!\left( -\frac{(1 + \bar{\gamma}L)R_1 + (\bar{\delta} + \bar{\gamma})c_\infty}{\bar{\delta}^{1/2}\,\sigma\,\mathrm{e}^{-(\bar{\delta}+\bar{\gamma})L}} \right) .
\end{aligned}
\]
Proof. The proof is postponed to Section 5.1.6.

Theorem 12. Assume H1 and H2 hold. For any $a > 0$ and $\gamma \in (0, \bar{\gamma}]$,
\[
\int_0^{+\infty} \mathbf{1}_{(0,+\infty)}(w)\, \exp(aw)\, \mu_\gamma(\mathrm{d}w) \leq c_\infty c_3 \;,
\]
where $c_3$ is explicitly given in the proof and $\mu_\gamma$ is the stationary distribution of $Q_\gamma$ given by (13).


Proof. The proof is postponed to Section 5.1.7.

We now specify the convergence of $Q_\gamma$ to $\mu_\gamma$ for any $\gamma \in (0, \bar{\gamma}]$.

Theorem 13. Assume H1 and H2 hold. There exist explicit constants $\rho \in [0, 1)$ and $C > 0$ such that for any $\gamma \in (0, \bar{\gamma}]$, $w \geq 0$ and $k \in \mathbb{N}$,
\[
\|\delta_w Q_\gamma^k - \mu_\gamma\|_V \leq C\rho^{\gamma k}\, V(w) \;,
\]
where $V(w) = 1 + |w|$ or $V(w) = \exp(a|w|)$ for $a > 0$.

Proof. The proof is postponed to Section 5.1.8.

Combining the results of Corollary 10, Theorem 11, Theorem 12 and Theorem 13 allows us to address the main questions raised in this section and to prove Theorem 4.

Discussion on the bounds provided by Theorem 11. In this paragraph, we discuss how the constants $c_1, c_2$ given in Theorem 11 behave with respect to the parameters $R_1$, $L$, $m$ in the limits $c_\infty \to 0$ and $\bar{\gamma} \to 0$. For ease of presentation, we also only consider the case $\sigma = 1$.

(1) First consider the case $R_1 = 0$. Then $c_1 \approx m^{-1}$ and $c_2 \approx C_1/[m\bar{\delta}^{1/2}] + C_2\bar{\delta}^{1/2}$ for $\bar{\delta} \in (0, L^{-1} \wedge (\sigma\mathrm{e}^{-1}/c_\infty)^2]$ and numerical constants $C_1, C_2 > 0$. Moreover, in the case $R_1 = 0$ we can take $L \to 0$, so that $\bar{\delta}$ can be chosen arbitrarily in $(0, +\infty)$. Taking $\bar{\delta} \leftarrow m^{-1}$, we obtain that $c_2 \approx \tilde{C}_1/m^{1/2}$. Note that the dependency of $c_1, c_2$ with respect to $m^{1/2}$ is then sharp. Indeed, consider the particular example of two auto-regressive processes for which $T_\gamma(y) = (1 - \varrho\gamma)y$ and $\tilde{T}_\gamma(y) = (1 - \varrho\gamma)y + \gamma\varrho a$ for $\gamma \in (0, \varrho^{-1}]$ and $a, \varrho > 0$. Then, on the one hand, H1 and H2 are satisfied with $R_1 = 0$, $m = \varrho$ and $c_\infty = \varrho a$, which leads to $c_1 c_\infty \sim a$ and $c_2 c_\infty \sim C a\varrho^{1/2}$ for some universal constant $C > 0$. On the other hand, an easy computation (see e.g. [12]) shows that the stationary distributions $\pi_\gamma$ and $\tilde{\pi}_\gamma$ provided by Proposition 1 and Proposition 3 are $\mathrm{N}(0, \varrho^{-1}(2 - \varrho\gamma)^{-1})$ and $\mathrm{N}(a, \varrho^{-1}(2 - \varrho\gamma)^{-1})$ respectively. Therefore, we get $W_1(\pi_\gamma, \tilde{\pi}_\gamma) = a$ and $\|\pi_\gamma - \tilde{\pi}_\gamma\|_{\mathrm{TV}} \sim C a\varrho^{1/2}$ as $\gamma, a \to 0$, so that we recover, up to universal constants, the bounds given by $c_1$ and $c_2$ (a small simulation illustrating this example is sketched after this list).

(2) We now consider the case $R_1 \geq 1$, $L = 0$. Note that in this case $\bar{\delta}$ can be taken arbitrarily in $(0, 1)$. Then, for some universal constants $C_1, C_2, C_3$, $\eta_1 \sim C_1\bar{\delta}^{1/2}/\Phi\{C_2 R_1/\bar{\delta}^{1/2} + C_3 c_\infty\bar{\delta}^{1/2}\}$. Therefore, taking $\bar{\delta} \leftarrow m^{-1} \wedge R_1^2$, we get that for some universal constants $D_1, D_2, E > 0$, $c_1 \approx D_1[(R_1 \vee m^{-1/2}) + m^{-1}]$ and $c_2 \approx E m^{-1/2} \wedge R_1$. Note that the behaviour of $c_2$ with respect to $R_1$ and $m$ is consistent with the results obtained in [18] (see [18, Lemma 1]) for the stationary distributions of continuous sticky processes. Note that it is shown in [18, Example 2] that this bound is sharp with respect to $R_1$ and $m$.

(3) In the case $R_1 \wedge L \geq 1$, taking $\bar{\delta} = L^{-1}$ since we are in the regime $c_\infty \to 0$, we get, up to logarithmic terms and using $\bar{\gamma} \leq L^{-1}$, that $c_1, c_2 \approx \exp[\mathrm{e}^4(R_1 L^{1/2} + c_\infty)^2]$. The estimate for $c_2$ is also consistent with [18, Lemma 1], which holds for stationary distributions of continuous sticky processes.
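The auto-regressive example of item (1) is easy to simulate; the following sketch (with assumed values of $\varrho$, $a$ and $\gamma$) only checks the first-moment discrepancy, which should be close to $W_1(\pi_\gamma, \tilde{\pi}_\gamma) = a$, and is not one of the paper's experiments.

```python
import numpy as np

# Two AR(1) chains: T_gamma(y) = (1 - rho*gamma) y and
# T_tilde_gamma(y) = (1 - rho*gamma) y + gamma*rho*a, with sigma = 1; values below are assumed.
rng = np.random.default_rng(2)
rho, a, gamma, n = 1.0, 0.2, 0.01, 200_000
y = np.zeros(n)
y_t = np.zeros(n)
for k in range(1, n):
    y[k] = (1 - rho * gamma) * y[k - 1] + np.sqrt(gamma) * rng.standard_normal()
    y_t[k] = (1 - rho * gamma) * y_t[k - 1] + gamma * rho * a + np.sqrt(gamma) * rng.standard_normal()

# Stationary laws are N(0, v) and N(a, v) with v = 1/(rho*(2 - rho*gamma)), so the difference of
# the long-run empirical means estimates W_1(pi_gamma, pi_tilde_gamma) = a = 0.2.
print(y_t[n // 2:].mean() - y[n // 2:].mean())
```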

3. Continuous-time limit. In the case where $T_\gamma$ and $\tilde{T}_\gamma$ are specified by (5), then under appropriate conditions on $b$ and $\tilde{b}$ it can be shown, see e.g. [9, Proposition 25], that for any $T > 0$ and $x \in \mathbb{R}^d$,
\[
(18)\qquad \lim_{m\to+\infty} \left\{ \|\delta_x R_{T/m}^m - \delta_x P_T\|_V + \|\delta_x \tilde{R}_{T/m}^m - \delta_x \tilde{P}_T\|_V \right\} = 0 \;,
\]


for some measurable function $V : \mathbb{R}^d \to [1, +\infty)$, where $(P_t)_{t\geq 0}$ and $(\tilde{P}_t)_{t\geq 0}$ are the Markov semigroups corresponding to (1) and (2). Then, this naturally implies convergence in total variation and also in Wasserstein distance of order $p$ if $\inf_{x\in\mathbb{R}^d}\{V(x)/\|x\|^p\} > 0$. As a consequence, the results of Section 2 immediately transfer to the continuous-time processes. More precisely, let $\mathrm{c} : \mathbb{R}^{2d} \to [0, +\infty)$ be of the form $\mathrm{c}(x, y) = \tilde{\mathrm{c}}(\|x - y\|)$ for some non-decreasing function $\tilde{\mathrm{c}} : [0, +\infty) \to [0, +\infty)$ with $\tilde{\mathrm{c}}(0) = 0$. If (18) holds and $\sup_{x,y\in\mathbb{R}^d}\{\mathrm{c}(x, y)/\{V(x) + V(y)\}\} < +\infty$, we get by the triangle inequality that for any $x, \tilde{x} \in \mathbb{R}^d$ and $T > 0$, $W_{\mathrm{c}}(\delta_x P_T, \delta_{\tilde{x}} \tilde{P}_T) \leq \liminf_{m\to+\infty} W_{\mathrm{c}}(\delta_x R_{T/m}^m, \delta_{\tilde{x}} \tilde{R}_{T/m}^m)$. Then, the results of Section 2 can be applied, implying, if H1 and H2 hold, that for any $x, \tilde{x} \in \mathbb{R}^d$ there exist $C_1, C_2 > 0$ such that for any $T > 0$, $W_{\mathrm{c}}(\delta_x P_T, \delta_{\tilde{x}} \tilde{P}_T) \leq C_1\rho^T + C_2 c_\infty$. We therefore generalize the result provided in [18], which is specific to the total variation distance. We do not give a specific statement for this result, which is mainly technical and is not the main subject of this paper. Instead, the goal of this section is to study the continuous-time limit of the coupling (6) (and not only of its marginals) towards some continuous-time sticky diffusion.

More precisely, let $(\gamma_n)_{n\in\mathbb{N}}$ be a sequence of step sizes such that $\lim_{n\to+\infty}\gamma_n = 0$ and let $w_0 \geq 0$. Consider the sequence of Markov chains $\{(W_k^{(n)})_{k\in\mathbb{N}} : n \in \mathbb{N}\}$ where, for any $n \in \mathbb{N}$, $(W_k^{(n)})_{k\in\mathbb{N}}$ is the Markov chain defined by (12) with $W_0^{(n)} = w_0$ and $\gamma = \gamma_n$, and therefore associated with the Markov kernel $Q_{\gamma_n}$. Let $\{(W_t^{(n)})_{t\in[0,+\infty)} : n \in \mathbb{N}\}$ be the continuous linear interpolation of $\{(W_k^{(n)})_{k\in\mathbb{N}} : n \in \mathbb{N}\}$, i.e. the sequence of continuous processes defined for any $n \in \mathbb{N}$, $t \in [0, +\infty)$ by
\[
(19)\qquad W_t^{(n)} = W_{\lfloor t/\gamma_n\rfloor}^{(n)} + \left\{ W_{\lceil t/\gamma_n\rceil}^{(n)} - W_{\lfloor t/\gamma_n\rfloor}^{(n)} \right\}\left\{ t/\gamma_n - \lfloor t/\gamma_n\rfloor \right\} .
\]
Note that for any $k \in \mathbb{N}$ and $h \in [0, \gamma_n]$, $W_{k\gamma_n+h}^{(n)} = W_k^{(n)} + (h/\gamma_n)\{W_{k+1}^{(n)} - W_k^{(n)}\}$. We denote by $\mathsf{W} = \mathrm{C}([0, +\infty), \mathbb{R})$ the space of continuous paths endowed with the uniform topology on compact sets, by $\mathcal{W}$ its corresponding $\sigma$-field and by $(W_t)_{t\geq 0}$ the canonical process defined for any $t \in [0, +\infty)$ and $\omega \in \mathsf{W}$ by $W_t(\omega) = \omega_t$. Denote by $(\mathcal{W}_t)_{t\geq 0}$ the filtration associated with $(W_t)_{t\geq 0}$. Note that $\{(W_t^{(n)})_{t\in[0,+\infty)} : n \in \mathbb{N}\}$ is a sequence of $\mathsf{W}$-valued random variables. The main result of this section concerns the convergence in distribution of this sequence.

We consider the following assumption on the function τ γ .

A1. There exists a function $\kappa : [0, +\infty) \to \mathbb{R}$ such that for any $\gamma \in (0, \bar{\gamma}]$, $\tau_\gamma(w) = w + \gamma\kappa(w)$, and $\kappa(0) = 0$. In addition, $\kappa$ is $L_\kappa$-Lipschitz: for any $w_1, w_2 \in [0, +\infty)$, $|\kappa(w_1) - \kappa(w_2)| \leq L_\kappa|w_1 - w_2|$.

This is not a restrictive condition since, under H 1, up to a possible modification of τ γ , it is always possible to ensure A1.

Under A1, we consider a sticky process [29, 30, 18] which solves the stochastic differential equation
\[
(20)\qquad \mathrm{d}W_t = \{\kappa(W_t) + c_\infty\}\,\mathrm{d}t + 2\sigma\,\mathbf{1}_{(0,+\infty)}(W_t)\,\mathrm{d}B_t \;,
\]
where $(B_t)_{t\geq 0}$ is a one-dimensional Brownian motion. Note that for any initial distribution $\mu_0$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, (20) admits a unique weak solution by [18, Lemma 6].

The main result of this section is the following.

Theorem 14. Assume A1. Then, the sequence $\{(W_t^{(n)})_{t\geq 0} : n \in \mathbb{N}\}$ defined by (19) converges in distribution to the solution $(W_t)_{t\geq 0}$ of the SDE (20).


The proof of this theorem follows the usual strategy employed to show convergence of a sequence of continuous processes to a Markov process. A first step is to show that under A 1, {(W (n) t ) t > 0 : n ∈ N } is uniformly bounded in L q for some q > 2, on [0, T ] for any T > 0.

Proposition 15. Assume A1. Then for any $T > 0$, there exists $C_T > 0$ such that $\sup_{n\in\mathbb{N}} \mathbb{E}[\sup_{t\in[0,T]} \{W_t^{(n)}\}^4] \leq C_T$, where $(W_t^{(n)})_{t\geq 0}$ is defined by (19).

Proof. The proof is postponed to Section 5.2.1.

Then, we are able to obtain the tightness of the sequence of stochastic processes {(W (n) t ) t > 0 : n ∈ N }.

Proposition 16. Assume A1. Then, $\{(W_t^{(n)})_{t\geq 0} : n \in \mathbb{N}\}$ is tight in $\mathsf{W}$.

Proof. The proof is postponed to Section 5.2.2.

Denote, for any $n \in \mathbb{N}$, by $\mu_n$ the distribution of $(W_t^{(n)})_{t\geq 0}$ on $\mathsf{W}$. Then, by Prohorov's Theorem [3, Theorems 5.1, 5.2], $(\mu_n)_{n\in\mathbb{N}}$ admits a limit point. If we now show that every limit point associated with $\{(W_t^{(n)})_{t\geq 0} : n \in \mathbb{N}\}$ is a solution of the SDE (20), then, using again that $\{(W_t^{(n)})_{t\geq 0} : n \in \mathbb{N}\}$ is tight and since (20) admits a unique weak solution, the proof of Theorem 14 will be completed. To establish this result, we use the characterization of solutions of SDEs through martingale problems. More precisely, by [6, Theorem 1.27], the distribution $\mu$ on $\mathsf{W}$ of $(W_t)_{t\geq 0}$, solution of (20), is the unique solution to the martingale problem associated with $\mu_0$, the drift function $w \mapsto \kappa(w) + c_\infty$ and the diffusion coefficient $2\sigma\mathbf{1}_{(0,+\infty)}$, i.e. it is the unique probability measure satisfying, on the filtered probability space $(\mathsf{W}, \mathcal{W}, (\mathcal{W}_t)_{t\geq 0}, \mu)$:

(a) the distribution of $W_0$ is $\mu_0$;

(b) the processes $(M_t)_{t\geq 0}$, $(N_t)_{t\geq 0}$ defined for any $t \geq 0$ by
\[
(21)\qquad M_t = W_t - W_0 - \int_0^t \{c_\infty + \kappa(W_u)\}\,\mathrm{d}u \;, \qquad N_t = M_t^2 - 4\sigma^2 \int_0^t \mathbf{1}_{(0,+\infty)}(W_u)\,\mathrm{d}u \;,
\]
are $(\mathcal{W}_t)_{t\geq 0}$-local martingales.

In other words, it amounts to showing that $(M_t)_{t\geq 0}$ is a $(\mathcal{W}_t)_{t\geq 0}$-local martingale and, by [27, Theorem 1.8], to identifying its quadratic variation $(\langle M\rangle_t)_{t\geq 0}$ with the process $(4\sigma^2\int_0^t \mathbf{1}_{(0,+\infty)}(W_u)\,\mathrm{d}u)_{t\geq 0}$. Therefore, Theorem 14 is a direct consequence of the following result.

Theorem 17. Assume A1. Let $\mu$ be a limit point of $(\mu_n)_{n\in\mathbb{N}}$. Then, the two processes $(M_t)_{t\geq 0}$ and $(N_t)_{t\geq 0}$ defined by (21) are $(\mathcal{W}_t)_{t\geq 0}$-martingales on $(\mathsf{W}, \mathcal{W}, (\mathcal{W}_t)_{t\geq 0}, \mu)$.

Proof. The proof is postponed to Section 5.2.4.

Consider the differential operators $\mathcal{A}$, $\tilde{\mathcal{A}}$ defined for any $\psi \in \mathrm{C}^2(\mathbb{R})$ by
\[
(22)\qquad \mathcal{A}\psi(w) = \{\kappa(w) + c_\infty\}\psi'(w) + 2\,\mathbf{1}_{(0,+\infty)}(w)\,\sigma^2\psi''(w) \;, \qquad
\tilde{\mathcal{A}}\psi(w) = \{\kappa(w) + c_\infty\}\psi'(w) + 2\sigma^2\psi''(w) \;,
\]
where $\kappa$ is arbitrarily extended to $\mathbb{R}$. Note that $\mathcal{A}$ is the extended generator associated with (20). A crucial step in the proof of Theorem 17 is the following.


Proposition 18. Let $\varphi \in \mathrm{C}^3(\mathbb{R})$ satisfy
\[
(23)\qquad \sup_{w\in\mathbb{R}} \left\{ |\varphi|(w)/(1 + w^2) + |\varphi'|(w)/(1 + |w|) + |\varphi''|(w) + |\varphi^{(3)}|(w) \right\} < +\infty \;.
\]
Then, for any $N \in \mathbb{N}$, $(t_1, \ldots, t_N, s, t) \in [0, +\infty)^{N+2}$ with $0 \leq t_1 \leq \cdots \leq t_N \leq s < t$, and $\psi : [0, +\infty)^N \to \mathbb{R}$ positive, continuous and bounded, it holds that
\[
(24)\qquad \lim_{n\to+\infty} \mathbb{E}\left[ \left\{ \varphi(W_t^{(n)}) - \varphi(W_s^{(n)}) - \int_s^t \mathcal{A}\varphi(W_u^{(n)})\,\mathrm{d}u \right\} \psi(W_{t_1}^{(n)}, \ldots, W_{t_N}^{(n)}) \right] = 0 \;.
\]
In addition, if $\varphi''(w) \geq 0$ for any $w \in \mathbb{R}$, it holds that
\[
(25)\qquad \limsup_{n\to+\infty} \mathbb{E}\left[ \left\{ \varphi(W_t^{(n)}) - \varphi(W_s^{(n)}) - \int_s^t \tilde{\mathcal{A}}\varphi(W_u^{(n)})\,\mathrm{d}u \right\} \psi(W_{t_1}^{(n)}, \ldots, W_{t_N}^{(n)}) \right] \leq 0 \;.
\]
Proof. The proof is postponed to Section 5.2.3.

Note that while Proposition 18-(24) is in general sufficient to conclude on the con- vergence of the sequence of processes {(W (n) t ) t > 0 : n ∈ N } (see e.g. [19]), in our setting, it is not enough to complete the proof of Theorem 14 since the diffusion coefficient associated with A is discontinuous. To circumvent this issue, we adapt to our sequence {(W (n) t ) t > 0 : n ∈ N} the same strategy employed in [26, Proposition 6].

4. An application in Bayesian statistics: parameter estimation in an ODE.

4.1. Setting and verifying the assumptions. Consider an ordinary differential equation (ODE) on $\mathbb{R}^n$ of the form
\[
(26)\qquad \dot{x}_\theta(t) = f_\theta(x_\theta(t), t) \;, \qquad x_\theta(0) = x_0 \in \mathbb{R}^n \;,
\]
where $\{f_\theta : \theta \in \mathbb{R}^d\}$ is a family of functions from $\mathbb{R}^n \times [0, +\infty)$ to $\mathbb{R}^n$ parametrized by $\theta \in \mathbb{R}^d$. Throughout this section, $x_0 \in \mathbb{R}^n$ is assumed to be fixed and we consider the following assumption.

AO 1 . For all θ ∈ R d there exists a unique solution of (26) defined for all positive times, which we denote by (x θ (t)) t > 0 . In addition, the functions (θ, x, t) ∈ R d × R n × [0, +∞) 7→ f θ (x, t) and (θ, t) ∈ R d × [0, +∞) 7→ x θ (t) are continuously differentiable.

In fact the continuous differentiability of (θ, t) 7→ x θ (t) is a consequence of the one of (θ, x, t) 7→ f θ (x, t), see e.g. [31, Theorem 4.D].

To fix ideas, throughout this section, we will repeatedly discuss the following case of a logistic equation.

Example 19. For $r \in \mathrm{C}^1(\mathbb{R}, \mathbb{R}_+)$, set $f_\theta(x) = x(1 - r(\theta)x)$ for any $\theta, x \in \mathbb{R}$, so that (26) reads
\[
\dot{x}_\theta(t) = x_\theta(t)\left(1 - r(\theta)x_\theta(t)\right) \;, \qquad x_\theta(0) = x_0 \;,
\]
with $x_0 > 0$. In this example, AO1 holds and, $r$ and $x_0$ being positive, for all $\theta \in \mathbb{R}$ the solution of (26) is such that $x_\theta(t) \in [0, \mathrm{e}^t x_0]$ for all $t \geq 0$. Indeed, $x \mapsto x(1 - r(\theta)x)$ is locally Lipschitz continuous, which yields existence and uniqueness of a maximal solution. Since $0$ is always an equilibrium, solutions stay positive, from which $x_\theta'(t) \leq x_\theta(t)$ for all $t \geq 0$, implying that $x_\theta(t) \leq \mathrm{e}^t x_0$ for all $t \geq 0$. This also implies non-explosion, hence the solution is defined on $[0, +\infty)$.
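A short numerical sanity check for Example 19 (our own illustrative solver and parameter values, with $r(\theta) = 0.5$ assumed): the explicit-Euler solution remains in the interval $[0, \mathrm{e}^t x_0]$ claimed above.

```python
import numpy as np

def solve_logistic(r_theta, x0, t_grid):
    """Explicit Euler solve of the logistic ODE x' = x(1 - r(theta) x) from Example 19.
    The solver and step size are our own illustrative choices."""
    x = np.empty_like(t_grid)
    x[0] = x0
    for i in range(len(t_grid) - 1):
        h = t_grid[i + 1] - t_grid[i]
        x[i + 1] = x[i] + h * x[i] * (1.0 - r_theta * x[i])
    return x

t = np.linspace(0.0, 5.0, 1001)
x = solve_logistic(r_theta=0.5, x0=0.1, t_grid=t)
# Sanity check of the a priori bound 0 <= x_theta(t) <= exp(t) * x0 stated in Example 19.
assert np.all(x >= 0.0) and np.all(x <= np.exp(t) * 0.1 + 1e-8)
print(x[-1])   # tends towards the stable equilibrium 1/r(theta) = 2
```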


We consider the problem of estimating θ based on some observation of a trajectory of the ODE. More precisely, for T > 0, N ∈ N , (t 1 , . . . , t N ) ∈ R N , 0 < t 1 < · · · < t N = T , the statistical model corresponding to the observation y = (y i ) i∈{1,...,N} ∈ ( R n ) N is given by

(27) y i = x θ (t i ) + ε i ,

for $\theta \in \mathbb{R}^d$ and where $(\varepsilon_i)_{i\in\{1,\ldots,N\}}$ are independent and identically distributed random variables on $\mathbb{R}^n$ with some known positive density $\varphi_\varepsilon$ with respect to the Lebesgue measure. Given a prior distribution with positive density $\pi_0$ on $\mathbb{R}^d$, the posterior distribution for this model admits a positive density $\pi$ with respect to the Lebesgue measure, which is characterized by the potential $U$ given (up to an additive constant) by
\[
-\log \pi(\theta) = U(\theta) = -\ln \pi_0(\theta) - \sum_{i=1}^{N} \ln \varphi_\varepsilon(y_i - x_\theta(t_i)) \;.
\]
We consider the following assumption on $\pi_0$ and $\varphi_\varepsilon$, setting $-\log(\pi_0) = U_0$.

AO2. The functions $\pi_0$ and $\varphi_\varepsilon$ are twice continuously differentiable and there exist $m_U > 0$, $L_U, R_U > 0$ such that $\nabla U_0$ is $L_U$-Lipschitz continuous and, for any $\theta, \tilde{\theta} \in \mathbb{R}^d$ with $\|\theta - \tilde{\theta}\| \geq R_U$,
\[
\langle \theta - \tilde{\theta}, \nabla U_0(\theta) - \nabla U_0(\tilde{\theta})\rangle \geq m_U\|\theta - \tilde{\theta}\|^2 \;.
\]

In practice, expectations with respect to the posterior distribution can be approximated by ergodic means of the Unadjusted Langevin Algorithm (ULA), namely the Markov chain
\[
(28)\qquad X_{k+1} = X_k - \gamma\nabla U(X_k) + \sqrt{2\gamma}\, Z_{k+1} \;,
\]
where $\gamma > 0$ and $(Z_k)_{k\in\mathbb{N}}$ are independent and identically distributed standard Gaussian variables.

The long-time convergence of this algorithm and the numerical bias on the invariant measure due to the time discretization are well understood, see e.g. [7, 14, 15, 8, 10]

and references therein. However, in the present case, it is not possible to sample this Markov chain, as the exact computation of

\[
(29)\qquad \nabla U(\theta) = -\nabla_\theta \ln \pi_0(\theta) + \sum_{i=1}^{N} \nabla_\theta x_\theta(t_i)\, \nabla_x \ln \varphi_\varepsilon(y_i - x_\theta(t_i)) \;,
\]

is not possible in most cases because of the term involving $x_\theta$ and $\nabla_\theta x_\theta$. Here $\nabla_\theta$ and $\nabla_x$ denote the gradient operators with respect to $\theta$ and $x$ respectively. Therefore, only approximations of these two functions can be used in place of $(x_\theta(t_i), \nabla_\theta x_\theta(t_i))_{i\in\{1,\ldots,N\}}$, which leads to an additional discretization bias. Our results based on the sticky coupling yield a quantitative bound on this error (with respect to the ideal ULA above). Let us detail this statement.

First, remark that $t \mapsto z_\theta(t) = (x_\theta(t), \nabla_\theta x_\theta(t))$ solves
\[
(30)\qquad \dot{z}_\theta(t) = F_\theta(z_\theta(t), t) \;, \qquad z_\theta(0) = z_0 = (x_0, 0)
\]
on $\mathbb{R}^n \times \mathrm{M}_{d,n}(\mathbb{R})$, with, for any $x \in \mathbb{R}^n$, $A \in \mathrm{M}_{d,n}(\mathbb{R})$, $\theta \in \mathbb{R}^d$ and $t \geq 0$,
\[
(31)\qquad F_\theta((x, A), t) = \left( f_\theta(x, t),\, \nabla_\theta f_\theta(x, t) + A\nabla_x f_\theta(x, t) \right) .
\]
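To make this pipeline concrete, the following sketch runs ULA (28) with an approximate gradient obtained from an explicit Euler solution of the augmented ODE (30)–(31), for the logistic drift of Example 19. Everything here is an illustrative assumption rather than the paper's actual experiment: the parametrization $r(\theta) = \mathrm{e}^{\theta}$, the standard Gaussian prior, the Gaussian observation noise, the Euler solver and all numerical values.

```python
import numpy as np

def solve_augmented(theta, t_grid, x0):
    """Explicit Euler solve of the augmented ODE (30)-(31) for f_theta(x) = x(1 - r(theta) x)
    with r(theta) = exp(theta): returns x_theta and a = d x_theta / d theta on t_grid."""
    r = np.exp(theta)
    x = np.empty_like(t_grid)
    a = np.empty_like(t_grid)
    x[0], a[0] = x0, 0.0
    for i in range(len(t_grid) - 1):
        h = t_grid[i + 1] - t_grid[i]
        x[i + 1] = x[i] + h * x[i] * (1.0 - r * x[i])                           # f_theta(x)
        a[i + 1] = a[i] + h * (-r * x[i] ** 2 + a[i] * (1.0 - 2.0 * r * x[i]))  # sensitivity
    return x, a

def approx_grad_U(theta, y, obs_idx, t_grid, x0, s2):
    """Approximation of (29) for a N(0,1) prior on theta and N(0, s2) observation noise."""
    x, a = solve_augmented(theta, t_grid, x0)
    return theta - np.sum(a[obs_idx] * (y - x[obs_idx])) / s2

# Synthetic data, then ULA (28) driven by the approximate gradient (assumed values throughout).
rng = np.random.default_rng(3)
t_grid = np.linspace(0.0, 5.0, 251)              # ODE time grid, step 0.02
obs_idx = np.array([50, 100, 150, 200, 250])     # observation times t_i = 1, 2, 3, 4, 5
theta_true, x0, s2 = 0.0, 0.1, 0.1 ** 2
x_true, _ = solve_augmented(theta_true, t_grid, x0)
y = x_true[obs_idx] + np.sqrt(s2) * rng.standard_normal(len(obs_idx))

gamma, theta, samples = 1e-3, 0.5, []
for _ in range(5_000):
    theta = (theta - gamma * approx_grad_U(theta, y, obs_idx, t_grid, x0, s2)
             + np.sqrt(2.0 * gamma) * rng.standard_normal())
    samples.append(theta)
print(np.mean(samples[1000:]), np.std(samples[1000:]))   # rough posterior mean and spread
```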
