ASYMPTOTIC EQUIVALENCE FOR INHOMOGENEOUS JUMP DIFFUSION PROCESSES AND WHITE NOISE

(1)

ESAIM: Probability and Statistics

DOI:10.1051/ps/2015005 www.esaim-ps.org

ASYMPTOTIC EQUIVALENCE FOR INHOMOGENEOUS JUMP DIFFUSION PROCESSES AND WHITE NOISE

Ester Mariucci

¹

Abstract. We prove the global asymptotic equivalence between the experiments generated by the discrete (high frequency) or continuous observation of a path of a time inhomogeneous jump-diﬀusion process and a Gaussian white noise experiment. Here, the parameter of interest is the drift function and the observation timeT can be both bounded or diverging. The approximation is given in the sense of the Le Cam Δ-distance, under some smoothness conditions on the unknown drift function. These asymptotic equivalences are established by constructing explicit Markov kernels that can be used to reproduce one experiment from the other.

Mathematics Subject Classification. 62B15, 62G20, 60G51, 62C20.

Received May 3, 2014. Revised February 4, 2015.

1. Introduction

Consider a sequence of one-dimensional time inhomogeneous jump-diﬀusion processes{X_t}t≥0deﬁned by X_t=η+

_t

0

f(s)ds+ _t

0

σ_n(s)dW_s+

Nt

i=1

Y_i, t∈[0, T_n], (1.1)

where:

• η is some random initial condition;

• W ={W_t}t≥0is a standard Brownian motion;

• N={N_t}_t≥0 is an inhomogeneous Poisson process with intensity functionλ(·), independent ofW;

• (Y_i)_i≥1 is a sequence of i.i.d. real random variables with distributionG, independent ofW andN;

• σ_n²(·) is supposed to be known. Either T_n → ∞ and σ_n(·) = σ(·) does not depend on n or T_n ≡ T and σ_n(·) =ε_nσ(·) withε_n→0 asn→ ∞.

• f(·) belongs to some non-parametric class F making its estimation consistent (e.g. if T_n → ∞ one may requireF to consist of a subclass of periodic functions).

• λ(·) andG(·) are unknown and belong to non-parametric classesΛandG, respectively.

Keywords and phrases.Non-parametric experiments, Le Cam distance, asymptotic equivalence, L´evy processes, additive processes, white noise.

1 Laboratoire Jean Kuntzmann, 51 rue des Math´ematiques, 38041 Grenoble cedex 09, France.[email protected]

Article published by EDP Sciences c EDP Sciences, SMAI 2015

(2)

We observe{X_t}_t≥0 at discrete times 0 =t₀ < t₁ < . . . < t_n =T_n such that Δ_n= max_1≤i≤n

|t_i−t_i−1|

↓ 0 asngoes to inﬁnity. We are interested in estimating the drift functionf(·) from the discrete data (X_t_i)ⁿ_i=0. At least two natural questions arise:

1. How much information about the parameterf(·) do we lose by observing (Xti)ⁿ_i=0 instead of{X_t}_t∈[0,T_n]? 2. Can we construct an easier (read: mathematically more tractable), but equivalent, model from (X_t_i)ⁿ_i=0? The aim of this paper is to give an answer to questions (1) and (2) by means of the Le Cam theory of statistical experiments. For the basic concepts and a detailed description of the notion of asymptotic equivalence that we shall adopt, we refer to [25,26]. We recall the relevant deﬁnitions and properties in Section2.2.

One of the main applications of proving an asymptotic equivalence between two sequences of experiments is that it allows to transfer asymptotic risk bounds for any inference problem from one model to the other, at least for bounded loss functions. In particular, if there is an estimator τ₁ in the statistical model P1 = (X1,A1,{P_1,θ:θ∈Θ}) with risk

L(θ, τ(x))P_1,θ(dx), then, for bounded loss functionsL, there is an estimator τ₂in P2 such that

sup

θ

L(θ, τ₁(x))P_1,θ(dx)−

L(θ, τ₂(x))P_2,θ(dx)

→0, asn→ ∞.

More generally, asymptotic equivalence allows to transfer minimax rates of convergence, up to some constants.

The first asymptotic equivalence results for non-parametric experiments date to 1996 and are due to Brown and Low [1] and Nussbaum [31]. This is the first instance of an abundance of works devoted to establishing asymptotic equivalence results for non-parametric experiments. In particular, asymptotic equivalence theory has been developed for non-parametric regression [1,3,7–9,21,29,32,34], non-parametric density estimation models [4,6,23,31], generalized linear models [20], time series [22,30], diffusion models [11–13,17,18,27,33], GARCH model [5], functional linear regression [28] and spectral density estimation [19]. Negative results are somewhat harder to come by; the most notable among them are [2,15,37].

There is however a lack of equivalence results concerning processes with jumps. To our knowledge, this is the first one for what concerns the estimation of a drift function issued from a discretely (high frequency) observed Lévy process. We actually allow it to be inhomogeneous in time,i.e. an additive process. In this setting one should also cite the works [14,16] as they are the only ones we know about treating (pure jumps) Lévy processes.

However, they both give asymptotic results for the estimation of the L´evy’s measure.

The interest in Lévy processes is due to them being a building block for stochastic continuous time models with jumps. Because of that, they are widely used in finance, queueing, telecommunications, extreme value theory, quantum theory or biology. Their stationarity property, however, makes them rather inflexible; as a consequence, in recent years additive processes have been preferred in financial modelling (see [10], Chap. 14).

It is therefore in this more general setting that we present our results.

In order to mathematically reformulate questions (1) and (2), let us denote by (D,D) the Skorokhod’s space;

deﬁneP_T^(f,σ²ⁿ^,λG)

n as the law of{X_t}_t∈[0,T_n] on (D,D) andQ^(f,σ_n ⁿ²^,λG)as the law of the vector (X_t₀, X_t₁, . . . , X_t_n) on (Rⁿ⁺¹,B(Rⁿ⁺¹)).

Consider the parameter setΘ=F. We allow two more degrees of freedom by consideringλ∈ΛandG∈G, although these will not be parameters of interest. Let us then consider the following statistical models:

Pn=

D,D, P_T^(f,σ²ⁿ^,λG)

n :f ∈F

, Qn=

Rⁿ⁺¹,B(Rⁿ⁺¹), Q^(f,σ_n ²ⁿ^,λG):f ∈F .

Finally, let us introduce the Gaussian model that will appear in the statement of our main results. For that, let us denote by (C,C) the space of continuous mappings from [0,∞) intoRendowed with its standard ﬁltration and, coherently with the previous notation, byP_T^(f,σ²ⁿ^,0)

n the law induced on (C,C) by the stochastic process:

dy_t=f(t)dt+σ_n(t)dW_t, y₀= 0, t∈[0, T_n]. (1.2)

(3)

We set:

W_n=

C,C,(P_T^(f,σⁿ²^,0)

n :f ∈F)

.

We have already mentioned that asymptotic equivalences can be used to reduce estimation problems from one model to a simpler ones. This is what happens here, the model associated with the discrete or continuous observation of{X_t}as in (1) has been proved to be equivalent to that in (2), which is much better studied. For example, consider the two following situations:

• T_n is ﬁxed and σ_n(·) =ε_nσ(·) withε_n→0;

• T_n goes to inﬁnity and σ_n(·) is ﬁxed; in this case, also ask that elements of F have some periodicity assumption.

In both these cases, a consistent estimation of f ∈ F is possible. Our equivalence result does not rely on assumptions such as these, but it applies to these cases, as well: indeed, proving equivalence for a class F automatically implies that the same equivalence holds true for any subclass ofF.

We state here our main result in the case in whichF is a functional class consisting ofα-H¨older, uniformly bounded functions onR,i.e. there existB <∞, M <∞andα∈(0,1] such that

|f(x)| ≤B and|f(x)−f(y)| ≤M|x−y|^α, ∀x, y∈R. For the general statements see Section2.4.

Theorem 1.1. Suppose thatF is a subclass ofα-H¨older, uniformly bounded functions onR. Letσ_n(·) =ε_nσ(·) be such that0< m_σ≤σ(·)≤M_σ<∞with derivative σ(·)in L_∞(R). Suppose either:

• T_n ≡T <∞,ε_n→0and there exists anL₂<∞such that for allλ∈Λ, λ _L₂_([0,T])< L₂andλ(s)≤λ(t)∀0

• or T_n → ∞, ε_n ≡ 1 and there exist L₁ < ∞, L₂ < ∞ such that for all λ ∈ Λ, λ _L₁_(R) < L₁ and λ _L₂_(R)< L₂.

Then

Δ(Qn,Wn)→0 and Δ(Pn,Qn)→0 asn→ ∞, as soon as one of the following two conditions holds

(1) G is a subclass of discrete distributions with support on Z: in this case an upper bound for the rate of convergence isO√Δ_n+T_nΔ^2α_n ε⁻²_n +T_nΔ_n

.

(2) G is a subclass of absolutely continuous distributions with respect to the Lebesgue’s measure on R with uniformly bounded densities on a ﬁxed neighborhood of 0: in this case an upper bound for the rate of convergence isO

√4

Δ_n+T_nΔ^2α_n ε⁻²_n +T_nΔ_n .

The paper is organized as follows. Sections2.1 to 2.3ﬁx assumptions and notation. The main results, as well as examples, are given in Section 2.4. A discussion of the results can be found in Section 2.5. The proofs are postponed to Section3. They are obtained as a sequence of results proving diﬀerent (asymptotic) equivalences.

Loosely speaking, we ﬁrst reduce to having in each interval of the discretization at most one jump (Bernoulli’s approximation, Sect.3.1). Secondly, we ﬁlter it outviaan explicit Markov kernel, reducing ourselves to treating independent Gaussian variables (Sect.3.2). Finally, we apply an argument similar to that in [1] (Sect.3.3) and collect all the pieces to conclude the proofs in Section 3.4. An appendix collects some proofs of general facts about the Le Cam distance that we use in the rest of the paper.

2. Assumptions and main results 2.1. Additive processes

Time inhomogeneous jump-diffusion processes are a special case of additive processes. Here we briefly recall definitions and properties of this class of processes.

(4)

Definition 2.1. A stochastic process {X_t}_t≥0 on R deﬁned on a probability space (Ω,A,P) is an additive process if the following conditions are satisﬁed.

(1) X₀= 0P-a.s.

(2) Independent increments: for any choice of n ≥ 1 and 0 ≤ t₀ < t₁ < . . . < t_n, random variables X_t₀, X_t₁−X_t₀, . . . , X_t_n−X_t_n−1 are independent.

(3) There is Ω₀ ∈A with P(Ω0) = 1 such that, for everyω ∈Ω₀,X_t(ω) is right-continuous int≥0 and has left limits int >0.

(4) Stochastic continuity:∀ε >0,P(|X_t+h−X_t| ≥ε)→0 ash→0.

Thanks to the L´evy–Khintchine’s formula (see [10], Thm. 14.1), the characteristic function of any additive processX ={X_t}_t∈[0,T_] can be expressed, for alluinR, as:

E e^iuX^t

= exp

iu _t

0

f(r)dr−u² 2

_t

0

σ²(r)dr−

R(1−e^iuy+iuyI_|y|≤1)ν_t(dy)

, (2.1)

wheref(·) andσ²(·) belongs toL₁(R) andν_t is a positive measure onRsatisfying ν_t({0}) = 0 and

R

(y²∧1)ν_t(dy)<∞, ∀t∈[0, T].

In the sequel we shall refer to (f(t), σ²(t), ν_t)_t∈[0,T] as the local characteristics of the process X and a ν_t as above will be called aL´evy measure, for allt. This data characterizes uniquely the law of the processX. In the case wheref(·) andσ(·) are constant functions andν_t=ν for allt, the processX satisfying (2.1) is stationary, and is called aL´evy process of characteristic triplet (f, σ², ν).

Let D = D([0,∞),R) be the space of mappings ω from [0,∞) into R that are right-continuous with left limits. Deﬁne thecanonical process x:D→D by

∀ω∈D, x_t(ω) =ω_t, ∀t≥0.

Let Dt and D be the σ-algebras generated by {x_s : 0 ≤ s ≤ t} and {x_s : 0 ≤ s < ∞}, respectively.

LetX be an additive process deﬁned on (Ω,A,P) having local characteristics (f(t), σ²(t), ν_t)_t∈[0,T]. It is well known that it induces a probability measure P^(f,σ²^,ν) on (D,D) such that{x_t}deﬁned on

D,D, P^(f,σ²^,ν) is an additive process identical in law with ({X_t},P) (that is the local characteristics of{x_t} under P^(f,σ²^,ν) is (f(t), σ²(t), ν_t)_t≥0).

In the sequel we will denote by

{x_t}, P^(f,σ²^,ν)

such an additive process, stressing the probability measure and byP_t^(f,σ²^,ν)for the restriction ofP^(f,σ²^,ν)toDt.

Further, for every functionω inD, we will denote byΔω_rits jump at the timerand byω^c,ω^dits continuous and discontinuous part, respectively:

Δω_r=ω_r−lim

s↑rω_s, ω_t^d=

r≤t

Δω_r, ω_t^c=ω_t−ω^d_t.

Note that, if ν_t = 0 for all t ≥ 0, then

{x_t}, P^(f,σ²^,0)

is a Gaussian process that can be represented on (Ω,A,P) as

X_t= _t

0

f(s)ds+ _t

0

σ(s)dW_s, t≥0, (2.2)

for some standard Brownian motionW on (Ω,A,P).

A time inhomogeneous jump-diﬀusion process as in (1.1), observed until the time T_n, is an additive process (apart from the possibly non-zero initial condition) with local characteristics (f(t), σ²(t), ν_t)_t∈[0,T_n_], where

(5)

ν_t(·) = λ(t)G(·). We will write

{x_t}, P_T^(f,σ²^,λG)

n

for such a process. Also observe that

{x^c_t}, P_T^(f,σ²^,λG)

n

has the same law as

{x_t}, P_T^(f,σ²^,0)

n

. Moreover, thanks to the independence of the increments, the law of the ith increment of (1.1) is the convolution product between the Gaussian lawN _t^t_i−1ⁱ f(s)ds,_t_i

ti−1σ²(s)ds

and the law of the variable_P_i

j=1Y_j, whereP_i is Poisson of intensityλ_i=_t_i

ti−1λ(s)ds.

2.2. Le Cam theory of statistical experiments

A statistical model is a triplet Pj = (Xj,Aj,{P_j,θ;θ ∈Θ}) where {P_j,θ;θ ∈Θ} is a family of probability distributions all defined on the sameσ-field Aj over the sample space Xj and Θ is the parameter space. The deficiencyδ(P1,P2) ofP1with respect toP2quantifies “how much information we lose” by usingP1instead of P2 and is defined as δ(P1,P2) = inf_Ksup_θ∈Θ||KP_1,θ−P_2,θ||T V, where TV stands for “total variation”

and the inﬁmum is taken over all “transitions”K (see [25], p. 18). The general deﬁnition of transition is quite involved but, for our purposes, it is enough to know that Markov kernels are special cases of transitions.

The Le Cam Δ-distance is defined as the symetrization of δ and it defines a pseudometric. When Δ(P1,P2) = 0 the two statistical models are said to be equivalent. Two sequences of statistical models (P₁ⁿ)_n∈N and (P₂ⁿ)_n∈N are calledasymptotically equivalent ifΔ(P₁ⁿ,P₂ⁿ) tends to zero asngoes to infinity. There are various techniques to bound theΔ-distance. We report below only the properties that are useful for our purposes.

For the proofs see,e.g., [25,35] and the Appendix.

Property 2.2. Let Pj = (X,A,{P_j,θ;θ ∈ Θ}), j = 1,2, be two statistical models having the same sample space and deﬁne Δ₀(P₁,P₂) := sup_θ∈Θ P_1,θ−P_2,θ _{T V}.Then, Δ(P₁,P₂)≤Δ₀(P₁,P₂).

In particular, Property 2.2 allows us to bound the Δ-distance between statistical models sharing the same sample space by means of classical bounds for the total variation distance. Classical bounds on the latter will thus prove useful:

Fact 2.3(see [24], p. 35). LetP₁andP₂be two probability measures onX, dominated by a common measureξ, with densitiesg_i= ^dP_dξⁱ,i= 1,2. Deﬁne

L₁(P₁, P₂) =

X |g₁(x)−g₂(x)|ξ(dx), H(P₁, P₂) =

X

g₁(x)− g₂(x)₂

ξ(dx) _1/2

.

Then,

H²(P, Q)

2 ≤ P₁−P₂ _{T V} = 1

2L₁(P₁, P₂)≤H(P₁, P₂). (2.3) Fact 2.4 (see [35], Lem. 2.19). Let P and Q be two product measures deﬁned on the same sample space:

P =⊗ⁿ_i=1P_i,Q=⊗ⁿ_i=1Q_i. Then

H²(P, Q)≤ n i=1

H²(P_i, Q_i). (2.4)

Using (2.3), it follows that

P−Q _{T V} ≤ ⁿ

i=1

2 P_i−Q_i _{T V}.

Below, we collect some well-known facts that can be used to establish asymptotic equivalences. For the conve- nience of the reader, their proofs can be found in the Appendix.

(6)

Fact 2.5. Let Q₁∼ N(μ₁, σ₁²)andQ₂∼ N(μ₂, σ₂²). Then Q₁−Q₂ _{T V} ≤

1−σ₁

σ₂ ₂

+(μ₁−μ₂)² 2σ₂² ≤

1−σ²₁

σ²₂ ₂

+(μ₁−μ₂)² 2σ₂² · Fact 2.6. Let m_i(·) andσ(·) be real functions such that

Rmi(s)²

σ(s)² ds <∞,i= 1,2, with σ(·)>0. Then, with the same notation as in Section2.1:

L₁

P_t^(m¹^,σ²^,0), P_t^(m²^,σ²^,0)

= 2

1−2φ

−D_t 2

, ∀t >0,

whereφ denotes the cumulative distribution function of a Gaussian random variable N(0,1) and D²_t =

_t

0

(m₁(s)−m₂(s))² σ²(s) ds.

In particular, L₁

P_t^(m¹^,σ²^,0), P_t^(m²^,σ²^,0)

=O(D_t).

Property 2.7. Let Pi = (Xi,Ai,{P_i,θ, θ ∈ Θ}), i = 1,2, be two statistical models. Let S : X1 → X2 be a suﬃcient statistics such that the distribution of S under P_1,θ is equal toP_2,θ. Then Δ(P1,P2) = 0.

2.3. The parameter space

We now state the diﬀerent kinds of assumptions on the non-parametric classes F, Λ andG that will show up in the statements of the theorems:

(F1) Everyf ∈F is continuous and sup_t∈R{|f(t)|:f ∈F} ≤B, for some constantB.

(F2) Deﬁning:

f¯_n(t) =

f(t_i) if t_i−1≤t < t_i, i= 1, . . . , n;

f(T_n) if t=T_n; (2.5)

we have

n→∞lim sup

f∈F

_T_n

0

(f(t)−f¯_n(t))²

σ_n²(t) dt= 0. (2.6)

(L1) Denoting by · 1 theL₁norm onR, we require sup_λ∈Λ λ ₁≤L₁, for some constantL₁. (L2) Denoting by · 2 theL₂norm onR, we ask sup_λ∈Λ λ ²₂≤L₂, for some constantL₂. (G1) G is a subset of discrete distributions concentrated onZ.

(G2) G is a subset of absolutely continuous distributions with respect to Lebesgue,h= _dLeb^dG . We ask that there are uniform constants N₁, N₂>0 such thath≤N₂Leb-a.e. on [−_N¹₁,_N¹

1].

2.4. Main results and examples

Recall that models (1.1) and (1.2) depend on diffusion coefficientsσ_n(·) = ε_nσ(·), where ε_n is either 1 (if T_n→ ∞) orε_n →0 (ifT_n =T finite). We will assume thatσ(·) is absolutely continuous, strictly positive, and its logarithmic derivative is uniformly bounded: There exists a constantC₁such that:

d

dtlnσ(t)

≤C₁, t∈R. (2.7)

(7)

Our main results are then:

Theorem 2.8. Suppose that the parameter space F fulﬁlls the assumptions (F1) and (F2) and let σ(·) sat- isfy (2.7) as above. If, in addition, Λ andG satisfy assumptions (L2) and (G1), respectively, then, for n big enough, we have

Δ Pn,Qn

=Δ Qn,Wn

≤O

f∈Fsup _T_n

0

(f(t)−f¯_n(t))²

σ_n²(t) dt+T_nΔ_n+ Δ_n

.

Here, theO depends only on the constantsC₁ andL₂.

Theorem 2.9. Suppose that the parameter space F fulfills the assumptions (F1) and (F2)and let σ(·) be as above. Suppose also there exist m_σ, M_σ such that 0 < m_σ ≤σ(·) ≤M_σ < ∞. Let β_i = B(t_i−t_i−1) +√σ_i. Moreover suppose thatΛfulfills assumptions(L1),(L2)andG fulfills assumption(G2). Then, fornbig enough, we have

Δ(Pn,Qn) =Δ(Wn,Qn)≤O

sup

f∈F

_T_n

0

(f(t)−f¯_n(t))²

σ²_n(t) dt+T_nΔ_n+Δ_n¹⁴

.

Here the leading terms in theO depend onL₁,N₂ andM_σ only.

As a corollary, when F consists of uniformly bounded α-H¨older functions, one retrieves the rates of convergence stated in Theorem 1.1. We now give some examples of situations where our results can be applied.

Example 2.10. The sum of a diﬀusion process and an inhomogeneous Poisson process: this corresponds to settingY₁≡1, so thatG consists of the only Dirac mass in 1. Letσ_n(·) =ε_nσ(·) satisfy (2.7) as above, andΛ satisfy assumption (L2). If F is a class ofα-H¨older, uniformly bounded functions onR withα∈(0,1], for n big enough, an application of Theorem2.8yields:

Δ(P_n,Q_n) =Δ(Q_n,W_n) =O

Δ_n+T_nΔ_n+T_nΔ^2α_n ε⁻²_n .

Example 2.11. Merton model inhomogeneous in time: This corresponds to G being a parametric class of Gaussian random distributions N(m, Γ²), Γ > 0. Suppose that σ(·) is as in Example 2.10 and Λ satisﬁes assumptions (L1) and (L2). Let F be a class of α-H¨older, uniformly bounded functions on Rwith α∈(0,1].

Then, fornbig enough, an application of Theorem2.9yields:

Δ(Pn,Qn) =Δ(Qn,Wn) =O 4

Δ_n+T_nΔ_n+T_nΔ^2α_n ε⁻²_n .

2.5. Discussion

Remark 2.12. Hypotheses (F1), (F2) are modeled on those in [1]. They are satisﬁed, for example, by any class F of uniformly boundedα-H¨older functions, withα depending on the asymptotics of the dataΔ_n, T_n, ε_n, as well as by uniformly bounded SobolevW^α,2 functions. Hypothesis (2.7) onσ²(·) also appears in [1]. The non-parametric classesΛ andG were introduced to stress that the precise parametersλ,Gchosen do not play any role in the proofs.

Remark 2.13. In the case where G satisﬁes Assumption (G1) (i.e. the Y_i’s are discrete), the Markov kernel K in Lemma 3.2 does not depend onσ(·). Hence, combining our Theorem 2.8 with the one by Carter [8] one can obtain the same equivalence result whenσ(·) is an unknown nuisance parameter.

(8)

Remark 2.14. An important advantage of showing the asymptotic equivalence between statistical models is that it allows to transfer statistical inference procedures from one model to the other. This is done in such a way that the asymptotic risk remains the same, at least for bounded loss functions. When the proof of such an equivalence is constructive, one can provide a precise recipe for producing, from a sequence of procedures in one problem, an asymptotically equivalent sequence in the other one. Formally, let us consider two sequences of statistical modelsP_jⁿ= (Xj,n,Aj,n,{P_j,n,θ;θ∈Θ}) and a decision or action space (A,A). Furthermore, for everyn, let us denote byρ_j,na possiblyrandomized decision procedureinP_jⁿ,i.e.a Markov kernelρ_j,n: (Xj,n,Aj,n)→(A,A) and byR(Pj,n, ρ_j,n, L_n, θ) therisk in the modelPj,n with respect to the decision ruleρ_j,n and the loss func- tionL_n. One says that the sequences of proceduresρ_1,nandρ_2,nareasymptotically equivalentif for any sequence of bounded loss functionL_n one has lim_n→∞sup_θ∈Θ|R(P1,n, ρ_1,n, L_n, θ)−R(P2,n, ρ_2,n, L_n, θ)|= 0.

In this paper there are essentially four statistical models that we prove to be mutually asymptotically equivalent:Pn,Wn,Qn and ˜Qn (which is associated with the observation of the increments of (y_t) as in (1.2)). The proofs of Theorems2.8and2.9allow us to use the knowledge of a sequence of procedures inPn,Wn or ˜Qn for producing one inQn.

For example, suppose thatG satisﬁes assumption (G1) and let (δ_n) be a sequence of procedures in ˜Qn. Deﬁne a sequence of procedures inQn as:

γ_n(z₀, . . . , z_n) :=δ_n

z₁−z₀−[z₁−z₀], . . . , z_n−z_n−1−[z_n−z_n−1]

, z₀, . . . , z_n∈R, where [z] denotes the the closest integer to z. Then (γ_n) is asymptotically equivalent to (δ_n).

Remark that, up to this point, we did not use the knowledge of σ²(·). In particular, if one disposes of a sequence of estimators off(·) in ˜Qn an equivalent one can be deduced inQn also whenσ²(·) is unknown.

3. Proofs 3.1. Bernoulli approximation

Lemma 3.1. Let (N_i)ⁿ_i=1,(P_i)ⁿ_i=1, (Y_i)ⁿ_i=1 and (ε_i)ⁿ_i=1 be samples of, respectively, Gaussian random variables N(m_i, σ²_i), Poisson random variables P(λ_i), random variables with common distribution G and Bernoulli random variables of parameters α_i := λ_ie^−λⁱ. Let us denote by Q_N_i (resp. Q_(Y_i_,P_i₎, Q_(Y₁_,ε_i₎) the law of N_i (resp._P_i

j=1Y_j,ε_iY₁). Then

⊗ⁿ_i=1Q_N_i∗Q_(Y_i_,P_i₎− ⊗ⁿ_i=1Q_N_i∗Q_(Y₁_,ε_i₎ _{T V} ≤2 ⁿ

i=1

λ²_i (3.1)

where the symbol ∗denotes the product convolution between measures.

Proof. Observe that:

Q_N_i∗Q_(Y_i_,P_i₎−Q_N_i∗Q_(Y₁_,ε_i₎ _{T V} = sup

A∈B(R)

k≥0

P

N_i+ k j=1

Y_j∈A

e^−λⁱλ^k_i k!

−(1−α_i)P(Ni∈A)−α_iP(Ni+Y₁∈A)

= sup

A∈B(R)

k≥2

⎛

⎝P N_i+

k j=1

Y_j ∈A

−P(N_i∈A)

⎞

⎠e^−λⁱλ^k_i k!

≤2

k≥2

e^−λⁱλ^k_i k! ≤2λ²_i.

We get (3.1) thanks to Fact2.4.

(9)

3.2. Explicit construction of Markov kernels

Lemma 3.2. Let (N_i)ⁿ_i=1 and (ε_i)ⁿ_i=1 be samples of, respectively, Gaussian random variables N(m_i, σ_i²) with

|m_i| ≤ ¹₃ and Bernoulli random variables of parameters α_i := λ_ie^−λⁱ, λ_i > 0. Moreover, let Y₁ be a discrete random variable taking values inZ and denote by Q_N_i (resp.Q_(Y₁_,ε_i₎) the law ofN_i (resp.ε_iY₁). For all xin Rdenote by [x] the nearest integer toxand deﬁne the Markov kernel

K(x, A) =IA(x−[x]), ∀A∈B(R).

Then

⊗ⁿ_i=1K(Q_N_i∗Q_(Y₁_,ε_i₎)− ⊗ⁿ_i=1Q_N_i

T V ≤ 2

n i=1

6 σ_iϕ

1 6σ_i

+ 4φ

−1 6σ_i

(3.2) where∗ stands for the convolution product,φdenotes the cumulative distribution of a Gaussian random variable N(0,1) andϕthe derivative ofφ.

Proof. Denote byg_i(·) the density of N_i, by h(·) the density of Y₁ with respect to the counting measure and deﬁneG_i(x, k) := (1−α_i)g_i(x) +α_ig_i(x−k),∀x∈R,∀k∈Z. We have, for alli:

K(Q_N_i∗Q_(Y₁_,ε_i₎)−Q_N_i

T V = sup

A∈B(R)

IA(x−[x]) (1−α_i)g_i(x) +α_i

k∈Z

h(k)g_i(x−k)

! dx

−

IA(x)g_i(x)dx

≤ sup

A∈B(R)

k∈Z

h(k)

IA(x−[x])G_i(x, k)−IA(x)g_i(x) dx

. Writing

IA(x−[x])G_i(x, k)dx as

l∈Z

¹₂

−¹₂ IA(x)G_i(x+l, k)dx, one can bound IA(x−[x])G_i(x, k)− IA(x)g_i(x)

dxby the sum of the following three terms:

I= ¹₂

−¹₂ IA(x)"

G_i(x, k) +G_i(x+k, k)−g_i(x)# dx

= ¹

2

−¹₂ I_A(x)

α_ig_i(x−k) + (1−α_i)g_i(x+k) dx

≤ ¹

2

−¹₂

g_i(x−k) +g_i(x+k) dx

II =

l∈Z^∗−{k}

¹₂

−¹₂|G_i(x+l, k)|dx≤

l∈Z^∗−{k}

¹₂

−¹₂

g_i(x+l) +g_i(x+l−k) dx III =

[−¹₂,¹₂]^c

g_i(x)dx.

Since IA(x−[x])G_i(x,0)−IA(x)g_i(x) dx≤

[−¹₂,¹₂]^cg_i(x)dxandh(0)≤1, we obtain K(Q_N_i∗Q_(Y₁_,ε_i₎)−Q_N_i

T V ≤

k∈Z^∗

h(k) ¹

2

−¹₂

g_i(x−k) +g_i(x+k) dx

+

k∈Z^∗,l∈Z^∗−{k}

h(k) ¹₂

−¹₂

g_i(x+l) +g_i(x+l−k) dx+ 2

[−¹₂,¹₂]^c

g_i(x)dx.

(10)

Using the mean value theorem one can write ¹

2

−¹₂

g_i(x−k) +g_i(x+k) dx=φ

1/2−k−m_i σ_i

−φ

−1/2−k−m_i σ_i

+φ

1/2 +k−m_i σ_i

−φ

−1/2 +k−m_i σ_i

= 1

σ_i(ϕ(ξ_1,k) +ϕ(ξ_2,k)) for some ξ_1,k ∈"

−1/2−k−mi

σi ,^1/2−k−m_σ ⁱ

i

#

and ξ_2,k ∈"

−1/2+k−mi

σi ,^1/2+k−m_σ ⁱ

i

#

. In particular, since |m_i| ≤ ¹₃ one has thatϕ(ξ_j,k)≤ϕ

1 6σi

,j= 1,2, hence

k∈Z^∗

h(k) ¹

2

−¹₂

g_i(x−k) +g_i(x+k)

dx≤

k∈Z^∗

2h(k) σ_i ϕ

1 6σ_i

≤ 2 σ_iϕ

1 6σ_i

· In the same way one can write

¹

2

−¹₂

g_i(x+l) +g_i(x+l−k) dx= 1

σ_i

ϕ(η_1,l) +ϕ(η_2,l−k)

for someη_1,l∈"

−1/2+l−mi

σi ,^1/2+l−m_σ ⁱ

i

#

andη_2,l−k ∈"

−1/2+l−k−mi

σi ,^{1/2+l−k−m}_σ ⁱ

i

# . Then:

k∈Z^∗,l∈Z^∗−{k}

h(k) ¹₂

−¹₂

g_i(x+l) +g_i(x+l−k)

dx≤

k∈Z^∗,l∈Z^∗−{k}

h(k) σ_i

ϕ(η_1,l) +ϕ(η_2,l−k)

≤

k∈Z^∗,l∈Z^∗−{k}

h(k)

σ_i ϕ(η_1,l) +

k,w∈Z^∗

h(k) σ_i ϕ(η_2,w)

≤

k∈Z^∗

h(k) σ_i

l∈Z^∗

ϕ(η_1,l) +

k∈Z^∗

h(k) σ_i

w∈Z^∗

ϕ(η_2,w)

≤ 1 σ_i

l∈Z^∗

ϕ(η_1,l) +ϕ(η_2,l) .

Now,|η_i,l| ≥ ^|l|−5/6_σ_i ,i= 1,2, so 1 σ_i

l∈Z^∗

ϕ(η_1,l) +ϕ(η_2,l)

≤ 4 σ_iϕ

1 6σ_i

+ 1

σ_i

|l|≥2

ϕ

|l| −5/6 σ_i

≤ 4 σ_iϕ

1 6σ_i

+ 2

_∞

6σi1

ϕ(x)dx.

Finally,

[−¹₂,¹₂]^cg_i(x)dx≤

[−₆¹_σi,₆_σi¹ ]^cϕ(x)dx= 2φ

−_6σ¹_i

. Using Fact2.4, these computations imply (3.2) Remark 3.3. In the case whereY₁ ≡1 (see Ex. 2.10) one can also consider a, maybe, more natural Markov kernel, that is:

K(x, A) =IA(Ψ(x)), with Ψ(x) =

$x ifx≤ ¹₂, x−1 otherwise.

However, the rate of convergence in (3.2) turns out to be asymptotically the same regardless of the chosen kernel.