HAL Id: hal-01131685
https://hal.archives-ouvertes.fr/hal-01131685v2
Preprint submitted on 12 Aug 2015
Asymptotic equivalence for pure jump Lévy processes with unknown Lévy density and Gaussian white noise
Ester Mariucci
Laboratoire LJK, Université Joseph Fourier UMR 5224, 51 Rue des Mathématiques, Saint Martin d'Hères, BP 53, 38041 Grenoble Cedex 09
Corresponding author: Ester.Mariucci@imag.fr
Abstract
The aim of this paper is to establish a global asymptotic equivalence between the experiments generated by the discrete (high frequency) or continuous observation of a path of a Lévy process and a Gaussian white noise experiment observed up to a time T, with T tending to ∞. These approximations are given in the sense of the Le Cam distance, under some smoothness conditions on the unknown Lévy density. All the asymptotic equivalences are established by constructing explicit Markov kernels that can be used to reproduce one experiment from the other.
Keywords: Nonparametric experiments, Le Cam distance, asymptotic equivalence, Lévy processes.
2000 MSC: 62B15, (62G20, 60G51).
1. Introduction
Lévy processes are a fundamental tool in modelling situations, like the dynamics of asset prices and weather measurements, where sudden changes in values may happen. For that reason they are widely employed, among many other fields, in mathematical finance. To name a simple example, the price of a commodity at time t is commonly given as an exponential function of a Lévy process. In general, exponential Lévy models are proposed for their ability to take into account several empirical features observed in the returns of assets such as heavy tails, high kurtosis and asymmetry (see [15] for an introduction to financial applications).
From a mathematical point of view, Lévy processes are a natural extension of the Brownian motion which preserves the tractable statistical properties of its increments, while relaxing the continuity of paths. The jump dynamics of a Lévy process is dictated by its Lévy density, say f. If f is continuous, its value at a point x_0 determines how frequently jumps of size close to x_0 occur per unit time. Concretely, if X is a pure jump Lévy process with Lévy density f, then the function f is such that
\[
\int_A f(x)\,dx = \frac{1}{t}\,\mathbb{E}\bigg[\sum_{s\leq t}\mathbb{I}_A(\Delta X_s)\bigg],
\]
for any Borel set A and t > 0. Here, ∆X_s ≡ X_s − X_{s−} denotes the magnitude of the jump of X at time s and 𝕀_A is the indicator function of A. Thus, the Lévy measure
\[
\nu(A) := \int_A f(x)\,dx
\]
is the average number of jumps (per unit time) whose magnitudes fall in the set A. Understanding the jump behavior therefore requires estimating the Lévy measure. Several recent works have treated this problem; see e.g. [2] for an overview.
When the available data consist of the whole trajectory of the process during a time interval [0, T], the problem of estimating f may be reduced to estimating the intensity function of an inhomogeneous Poisson process (see, e.g., [23, 42]). However, a continuous-time sampling is never available in practice and thus the relevant problem is that of estimating f based on discrete sample data X_{t_0}, ..., X_{t_n} during a time interval [0, T_n]. In that case the jumps are latent (unobservable) variables, which clearly adds to the difficulty of the problem. From now on we place ourselves in a high-frequency setting, that is, we assume that the sampling interval ∆_n = t_i − t_{i−1} tends to zero as n goes to infinity. Such a high-frequency based statistical approach has played a central role in the recent literature on nonparametric estimation for Lévy processes (see e.g. [22, 13, 14, 1, 19]). Moreover, in order to make consistent estimation possible, we also ask the observation time T_n to tend to infinity, so as to allow the identification of the jump part in the limit.
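The high-frequency sampling scheme just described can be sketched in a short simulation. This is an illustrative example, not taken from the paper: it assumes the simplest pure jump Lévy process, a compound Poisson process with unit intensity and uniform jumps, observed only on the grid t_i = iT_n/n with ∆_n = T_n/n small; all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_compound_poisson_grid(T, n, intensity, jump_sampler, rng):
    """Simulate a compound Poisson path on [0, T] and return its values at the
    high-frequency grid t_i = i*T/n; the jumps themselves stay unobserved."""
    n_jumps = rng.poisson(intensity * T)            # number of jumps on [0, T]
    jump_times = np.sort(rng.uniform(0.0, T, size=n_jumps))
    jump_sizes = jump_sampler(n_jumps)
    grid = np.linspace(0.0, T, n + 1)               # t_0, ..., t_n
    # X_{t_i} = sum of all jumps that occurred up to time t_i
    cum = np.concatenate([[0.0], np.cumsum(jump_sizes)])
    X = cum[np.searchsorted(jump_times, grid, side="right")]
    return grid, X

# Levy density f = indicator of [0, 1]: unit intensity, Uniform(0, 1) jumps
T_n, n = 100.0, 10_000                              # Delta_n = T_n / n = 0.01
grid, X = sample_compound_poisson_grid(T_n, n, 1.0,
                                       lambda k: rng.uniform(0, 1, k), rng)
increments = np.diff(X)
# in the high-frequency regime most sampling intervals contain no jump at all
print("fraction of empty intervals:", np.mean(increments == 0.0))
```

With ∆_n = 0.01 and unit intensity, a proportion close to e^{−∆_n} ≈ 0.99 of the increments is exactly zero, which illustrates why the jumps are latent in the discrete model.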
Our aim is to prove that, under suitable hypotheses, estimating the Lévy density f is equivalent to estimating the drift of an adequate Gaussian white noise model. In general, asymptotic equivalence results for statistical experiments provide a deeper understanding of statistical problems and allow one to single out their main features. The idea is to pass, via asymptotic equivalence, to another experiment which is easier to analyze. By definition, two sequences of experiments P_{1,n} and P_{2,n}, defined on possibly different sample spaces but with the same parameter set, are asymptotically equivalent if the Le Cam distance ∆(P_{1,n}, P_{2,n}) tends to zero. For P_i = (X_i, A_i, {P_{i,θ} : θ ∈ Θ}), i = 1, 2, ∆(P_1, P_2) is the symmetrization of the deficiency δ(P_1, P_2), where
\[
\delta(\mathscr{P}_1, \mathscr{P}_2) = \inf_K \sup_{\theta \in \Theta} \| K P_{1,\theta} - P_{2,\theta} \|_{TV}.
\]
Here the infimum is taken over all randomizations from (X_1, A_1) to (X_2, A_2) and ‖·‖_{TV} denotes the total variation distance. Roughly speaking, the Le Cam distance quantifies how much one fails to reconstruct (with the help of a randomization) one model from the other, and vice versa. Thus, ∆(P_1, P_2) = 0 can be interpreted as "the models P_1 and P_2 contain the same amount of information about the parameter θ." The general definition of randomization is quite involved but, in the most frequent examples (namely, when the sample spaces are Polish and the experiments dominated), it reduces to that of a Markov kernel. One of the most important features of the Le Cam distance is that it can also be interpreted in terms of statistical decision theory (see [32, 33]; a short review is presented in the Appendix).
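As a toy illustration of these definitions (not taken from the paper), consider observing a pair of i.i.d. Bernoulli(θ) draws versus observing their sum, a Binomial(2, θ): the sum is a sufficient statistic, the uniform "splitting" kernel reconstructs the law of the pair exactly, so both deficiencies vanish and the Le Cam distance between the two experiments is zero. A minimal sketch, with all names hypothetical:

```python
import itertools

# Experiment P1: observe (B1, B2), i.i.d. Bernoulli(theta).
# Experiment P2: observe S = B1 + B2 ~ Binomial(2, theta).
# The map (b1, b2) -> s is a deterministic Markov kernel P1 -> P2; the kernel
# K(s, .) = uniform law on {(b1, b2) : b1 + b2 = s} goes the other way and
# transports P2 exactly onto P1, so delta = 0 in both directions.

def p1(theta):  # law of the pair (B1, B2)
    return {(b1, b2): theta ** (b1 + b2) * (1 - theta) ** (2 - b1 - b2)
            for b1, b2 in itertools.product([0, 1], repeat=2)}

def p2(theta):  # law of the sum S
    return {s: [1, 2, 1][s] * theta ** s * (1 - theta) ** (2 - s)
            for s in range(3)}

K = {s: {pair: 1.0 / [1, 2, 1][s]
         for pair in itertools.product([0, 1], repeat=2) if sum(pair) == s}
     for s in range(3)}

def kernel_image(K, law):  # (K law)(.) = sum_s law(s) K(s, .)
    out = {}
    for s, ps in law.items():
        for x, kx in K[s].items():
            out[x] = out.get(x, 0.0) + ps * kx
    return out

for theta in (0.1, 0.5, 0.9):
    img = kernel_image(K, p2(theta))
    tv = 0.5 * sum(abs(img.get(x, 0.0) - px) for x, px in p1(theta).items())
    print(f"theta={theta}: TV(K P2, P1) = {tv:.2e}")
```

The total variation distance is zero (up to float rounding) for every θ, which is the finite-sample analogue of ∆(P_1, P_2) = 0.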
As a consequence, saying that two statistical models are equivalent means that any statistical inference procedure can be transferred from one model to the other in such a way that the asymptotic risk remains the same, at least for bounded loss functions. Also, as soon as two models P_{1,n} and P_{2,n} that share the same parameter space Θ are proved to be asymptotically equivalent, the same result automatically holds for the restrictions of both P_{1,n} and P_{2,n} to a smaller subclass of Θ.
Historically, the first results of asymptotic equivalence in a nonparametric context date from 1996 and are due to [5] and [39]. The first two authors showed the asymptotic equivalence of nonparametric regression and a Gaussian white noise model, while the third showed that of density estimation and white noise. Over the years many generalizations of these results have been proposed, such as [3, 28, 43, 11, 10, 40, 12, 37, 46] for nonparametric regression or [9, 31, 4] for nonparametric density estimation models. Another very active field of study is that of diffusion experiments. The first result of equivalence between diffusion models and the Euler scheme was established in 1998, see [38]. In later papers generalizations of this result have been considered (see [24, 35]). Among others, we can also cite equivalence results for generalized linear models [27], time series [29, 38], diffusion models [18, 25, 16, 17], GARCH models [7], functional linear regression [36], spectral density estimation [26] and volatility estimation [41]. Negative results are somewhat harder to come by; the most notable among them are [20, 6, 48]. There is, however, a lack of equivalence results concerning processes with jumps. A first result in this sense is [34], in which global asymptotic equivalences between the experiments generated by the discrete or continuous observation of a path of a Lévy process and a Gaussian white noise experiment are established. More precisely, in that paper we have shown that estimating the drift function h from a continuously or discretely (high frequency) observed time inhomogeneous jump-diffusion process:
\[
X_t = \int_0^t h(s)\,ds + \int_0^t \sigma(s)\,dW_s + \sum_{i=1}^{N_t} Y_i, \quad t \in [0, T_n], \qquad (1)
\]
is asymptotically equivalent to estimating h in the Gaussian model:
\[
dy_t = h(t)\,dt + \sigma(t)\,dW_t, \quad t \in [0, T_n].
\]
Here we try to push the analysis further and we focus on the case in which the considered parameter is the Lévy density and X = (X_t) is a pure jump Lévy process (see [8] for the interest of such a class of processes when modelling asset returns). More in detail, we consider the problem of estimating the Lévy density (with respect to a fixed, possibly infinite, Lévy measure ν_0 concentrated on I ⊆ ℝ) f := dν/dν_0 : I → ℝ from a continuously or discretely observed pure jump Lévy process X with possibly infinite Lévy measure. Here I ⊆ ℝ denotes a possibly infinite interval and ν_0 is supposed to be absolutely continuous with respect to the Lebesgue measure with a strictly positive density g := dν_0/dLeb. In the case where ν is of finite variation one may write:
\[
X_t = \sum_{0 < s \leq t} \Delta X_s \qquad (2)
\]
or, equivalently, X has a characteristic function given by:
\[
\mathbb{E}\big[e^{iuX_t}\big] = \exp\bigg( -t \int_I \big(1 - e^{iuy}\big)\,\nu(dy) \bigg).
\]
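The characteristic function formula above can be checked by simulation in the compound Poisson case. A hedged sketch (illustrative, not from the paper), assuming ν = Lebesgue on [0, 1], i.e. f ≡ 1, for which ∫_0^1 e^{iuy} dy = (e^{iu} − 1)/(iu):

```python
import numpy as np

rng = np.random.default_rng(1)
t, u, n_mc = 2.0, 1.5, 50_000

# nu = Lebesgue on [0, 1]: compound Poisson with unit intensity and
# Uniform(0, 1) jumps.  Monte Carlo estimate of E[exp(i u X_t)].
n_jumps = rng.poisson(t, size=n_mc)
X_t = np.array([rng.uniform(0, 1, k).sum() for k in n_jumps])
empirical = np.exp(1j * u * X_t).mean()

# Closed form: exp(-t * integral_0^1 (1 - e^{iuy}) dy).
integral = 1.0 - (np.exp(1j * u) - 1.0) / (1j * u)
theoretical = np.exp(-t * integral)
print("|empirical - theoretical| =", abs(empirical - theoretical))
```

The discrepancy is of the Monte Carlo order 1/√n_mc, which numerically confirms the Lévy–Khintchine formula in this simple finite-activity case.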
We suppose that the function f belongs to some a priori set F, nonparametric in general. The discrete observations are of the form X_{t_i}, where t_i = iT_n/n, i = 0, ..., n, with T_n = n∆_n → ∞ and ∆_n → 0 as n goes to infinity.
We will denote by P_n^{ν_0} the statistical model associated with the continuous observation of a trajectory of X until time T_n (which is supposed to go to infinity as n goes to infinity) and by Q_n^{ν_0} the one associated with the observation of the discrete data (X_{t_i})_{i=0}^n. The aim of this paper is to prove that, under adequate hypotheses on F (for example, f must be bounded away from zero and infinity; see Section 2.1 for a complete definition), the models P_n^{ν_0} and Q_n^{ν_0} are both asymptotically equivalent to a sequence of Gaussian white noise models of the form:
\[
dy_t = \sqrt{f(t)}\,dt + \frac{dW_t}{2\sqrt{T_n\,g(t)}}, \quad t \in I.
\]
As a corollary, we then get the asymptotic equivalence between P_n^{ν_0} and Q_n^{ν_0}. The main results are precisely stated as Theorems 2.5 and 2.6. A particular case of special interest arises when X is a compound Poisson process, ν_0 ≡ Leb([0,1]) and F ⊆ F^I_{(γ,K,κ,M)} where, for fixed γ ∈ (0, 1] and strictly positive constants K, κ, M, F^I_{(γ,K,κ,M)} is a class of continuously differentiable functions on I defined as follows:
\[
\mathscr{F}^I_{(\gamma,K,\kappa,M)} = \Big\{ f : \kappa \leq f(x) \leq M,\ |f'(x) - f'(y)| \leq K|x-y|^{\gamma},\ \forall x, y \in I \Big\}. \qquad (3)
\]
In this case, the statistical models P_n^{ν_0} and Q_n^{ν_0} are both equivalent to the Gaussian white noise model:
\[
dy_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{T_n}}\,dW_t, \quad t \in [0, 1].
\]
See Example 3.1 for more details. By a theorem of Brown and Low in [5], we obtain, a posteriori, an asymptotic equivalence with the regression model
\[
Y_i = \sqrt{f\Big(\frac{i}{T_n}\Big)} + \frac{1}{2\sqrt{T_n}}\,\xi_i, \quad \xi_i \sim \mathscr{N}(0,1), \quad i = 1, \dots, [T_n].
\]
Note that a similar form of a Gaussian shift was found to be asymptotically equivalent to a nonparametric density estimation experiment, see [39]. Let us mention that we also treat some explicit examples where ν_0 is neither finite nor compactly supported (see Examples 3.2 and 3.3).
Without entering into any detail, we remark here that the methods are very different from those in [34]. In particular, since f pertains to the discontinuous part of a Lévy process rather than to its continuous part, the Girsanov-type changes of measure are irrelevant here. We thus need new instruments, such as the Esscher changes of measure.
Our proof is based on the construction, for any given Lévy measure ν, of two adequate approximations ν̂_m and ν̄_m of ν; the idea of discretizing the Lévy density already appeared in an earlier work with P. Étoré and S. Louhichi, [21]. The present work is also inspired by the papers [9] (for a multinomial approximation), [4] (for passing from independent Poisson variables to independent normal random variables) and [34] (for a Bernoulli approximation). This method allows us to construct explicit Markov kernels that lead from one model to the other; these may be applied in practice to transfer minimax estimators.
The paper is organized as follows: Sections 2.1 and 2.2 are devoted to making the parameter space and the considered statistical experiments precise. The main results are given in Section 2.3, followed by Section 3 in which some examples can be found. The proofs are postponed to Section 4. The paper includes an Appendix recalling the definition and some useful properties of the Le Cam distance as well as of Lévy processes.
2. Assumptions and main results

2.1. The parameter space
Consider a (possibly infinite) Lévy measure ν_0 concentrated on a possibly infinite interval I ⊆ ℝ, admitting a density g > 0 with respect to the Lebesgue measure. The parameter space of the experiments we are concerned with is a class of functions F = F^{ν_0,I} defined on I that form a class of Lévy densities with respect to ν_0: for each f ∈ F, let ν (resp. ν̂_m) be the Lévy measure having f (resp. f̂_m) as a density with respect to ν_0, where, for every f ∈ F, f̂_m(x) is defined as follows.
Suppose first x > 0. Given a positive integer m = m_n depending on n, let J_j := (v_{j−1}, v_j], where v_1 = ε_m ≥ 0 and the v_j are chosen in such a way that
\[
\mu_m := \nu_0(J_j) = \frac{\nu_0\big( (I \setminus [0, \varepsilon_m]) \cap \mathbb{R}_+ \big)}{m - 1}, \quad \forall j = 2, \dots, m. \qquad (4)
\]
In the sequel, for the sake of brevity, we will only write m without making the dependence on n explicit. Define x*_j := (∫_{J_j} x ν_0(dx))/μ_m and introduce a sequence of functions 0 ≤ V_j ≤ 1/μ_m, j = 2, ..., m, supported on [x*_{j−1}, x*_{j+1}] if j = 3, ..., m−1, on [ε_m, x*_3] if j = 2 and on (I \ [0, x*_{m−1}]) ∩ ℝ_+ if j = m. The V_j's are defined recursively in the following way.
• V_2 is equal to 1/μ_m on the interval (ε_m, x*_2]; on the interval (x*_2, x*_3] it is chosen so that it is continuous (in particular, V_2(x*_2) = 1/μ_m), ∫_{x*_2}^{x*_3} V_2(y) ν_0(dy) = ν_0((x*_2, v_2])/μ_m and V_2(x*_3) = 0.
• For j = 3, ..., m−1 define V_j as the function 1/μ_m − V_{j−1} on the interval [x*_{j−1}, x*_j]. On [x*_j, x*_{j+1}] choose V_j continuous and such that ∫_{x*_j}^{x*_{j+1}} V_j(y) ν_0(dy) = ν_0((x*_j, v_j])/μ_m and V_j(x*_{j+1}) = 0.
• Finally, let V_m be the function supported on (I \ [0, x*_{m−1}]) ∩ ℝ_+ such that
\[
V_m(x) = \frac{1}{\mu_m} - V_{m-1}(x) \ \text{ for } x \in [x^*_{m-1}, x^*_m], \qquad V_m(x) = \frac{1}{\mu_m} \ \text{ for } x \in (I \setminus [0, x^*_m]) \cap \mathbb{R}_+.
\]
(It is immediate to check that such a choice is always possible.) Observe that, by construction,
\[
\sum_{j=2}^m V_j(x)\,\mu_m = 1, \quad \forall x \in (I \setminus [0, \varepsilon_m]) \cap \mathbb{R}_+ \qquad \text{and} \qquad \int_{(I \setminus [0, \varepsilon_m]) \cap \mathbb{R}_+} V_j(y)\,\nu_0(dy) = 1.
\]
Analogously, define μ_{−m} = ν_0((I \ [−ε_m, 0]) ∩ ℝ_−)/(m−1) and J_{−m}, ..., J_{−2} such that ν_0(J_{−j}) = μ_{−m} for all j. Then, for x < 0, x*_{−j} is defined as x*_j, by using J_{−j} and μ_{−m} instead of J_j and μ_m, and the V_{−j}'s are defined with the same procedure as the V_j's, starting from V_{−2} and proceeding by induction.
Define
\[
\hat{f}_m(x) = \mathbb{I}_{[-\varepsilon_m, \varepsilon_m]}(x) + \sum_{j=2}^m \bigg( V_j(x) \int_{J_j} f(y)\,\nu_0(dy) + V_{-j}(x) \int_{J_{-j}} f(y)\,\nu_0(dy) \bigg). \qquad (5)
\]
The definitions of the V_j's above are modeled on the following example:
Example 2.1. Let ν_0 be the Lebesgue measure on [0, 1] and ε_m = 0. Then v_j = (j−1)/(m−1) and x*_j = (2j−3)/(2m−2), j = 2, ..., m. The standard choice for V_j (based on the construction by [9]) is given by the piecewise linear functions interpolating the values in the points x*_j specified above:

[Figure: the piecewise linear functions V_2, V_j and V_m on [0, 1], each with maximum value m − 1: V_2 is constant on [0, x*_2] and decreases to 0 at x*_3; V_j is the triangle with peak at x*_j supported on [x*_{j−1}, x*_{j+1}]; V_m increases from 0 at x*_{m−1} to m − 1 at x*_m and stays constant up to 1.]

Remark 2.2. The function f̂_m has been defined in such a way that the rate of convergence of the L_2 norm between the restrictions of f and f̂_m to I \ [−ε_m, ε_m] is compatible with the rate of convergence of the other quantities appearing in the statements of Theorems 2.5 and 2.6. For that reason, as in [9], we have not chosen a piecewise constant approximation of f but an approximation that is, at least in the simplest cases, piecewise linear. Such a choice allows us to gain an order of magnitude on the convergence rate of ‖f − f̂_m‖_{L_2(ν_0|_{I\[−ε_m,ε_m]})}, at least when F is a class of sufficiently smooth functions.
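The two structural identities of the construction (the V_j's form a partition of unity after multiplication by μ_m, and each V_j has unit ν_0-integral) can be checked numerically in the setting of Example 2.1. A minimal sketch, assuming ν_0 = Leb on [0, 1], ε_m = 0 and the standard triangular choice; helper names are ours:

```python
import numpy as np

m = 20                                    # number of nodes; nu_0 = Leb[0,1], eps_m = 0
mu = 1.0 / (m - 1)                        # mu_m = nu_0(J_j) = 1/(m-1)
xs = lambda j: (2 * j - 3) / (2 * m - 2)  # x*_j, j = 2, ..., m

def V(j, x):
    """Piecewise linear V_j of Example 2.1 (sup V_j = 1/mu_m = m - 1)."""
    if j == 2:                            # flat on [0, x*_2], down to 0 at x*_3
        nodes, vals = [0.0, xs(2), xs(3)], [m - 1.0, m - 1.0, 0.0]
    elif j == m:                          # 0 at x*_{m-1}, flat on [x*_m, 1]
        nodes, vals = [xs(m - 1), xs(m), 1.0], [0.0, m - 1.0, m - 1.0]
    else:                                 # triangle with peak at x*_j
        nodes, vals = [xs(j - 1), xs(j), xs(j + 1)], [0.0, m - 1.0, 0.0]
    return np.interp(x, nodes, vals, left=0.0, right=0.0)

x = np.linspace(0.0, 1.0, 2001)
partition = sum(V(j, x) * mu for j in range(2, m + 1))
print("max |sum_j V_j(x) mu_m - 1| =", np.abs(partition - 1.0).max())

grid = np.linspace(0.0, 1.0, 100_001)     # trapezoidal check of int_0^1 V_j dx = 1
integrals = []
for j in (2, m // 2, m):
    y = V(j, grid)
    integrals.append(float(((y[1:] + y[:-1]) / 2 * np.diff(grid)).sum()))
print("integrals of V_2, V_{m/2}, V_m:", integrals)
```

Both checks hold up to floating-point and quadrature error, matching the displayed identities below (4)–(5).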
We now explain the assumptions we will need to make on the parameter f ∈ F = F^{ν_0,I}. The superscripts ν_0 and I will be suppressed whenever this can lead to no confusion. We require that:

(H1) There exist constants κ, M > 0 such that κ ≤ f(y) ≤ M, for all y ∈ I and all f ∈ F.
For every integer m = m_n, we can consider \(\widehat{\sqrt{f}}_m\), the approximation of √f constructed as f̂_m above, i.e.
\[
\widehat{\sqrt{f}}_m(x) = \mathbb{I}_{[-\varepsilon_m, \varepsilon_m]}(x) + \sum_{\substack{j = -m, \dots, m \\ j \neq -1, 0, 1}} V_j(x) \int_{J_j} \sqrt{f(y)}\,\nu_0(dy),
\]
and introduce the quantities:
\[
A_m^2(f) := \int_{I \setminus [-\varepsilon_m, \varepsilon_m]} \Big( \widehat{\sqrt{f}}_m(y) - \sqrt{f(y)} \Big)^2 \nu_0(dy), \qquad
B_m^2(f) := \sum_{\substack{j = -m, \dots, m \\ j \neq -1, 0, 1}} \bigg( \int_{J_j} \frac{\sqrt{f(y)}}{\sqrt{\nu_0(J_j)}}\,\nu_0(dy) - \sqrt{\nu(J_j)} \bigg)^2,
\]
\[
C_m^2(f) := \int_{-\varepsilon_m}^{\varepsilon_m} \Big( \sqrt{f(t)} - 1 \Big)^2 \nu_0(dt).
\]
The conditions defining the parameter space F are expressed by asking that the quantities introduced above converge quickly enough to zero. To state the assumptions of Theorem 2.5 precisely, we will assume the existence of sequences of discretizations m = m_n → ∞, of positive numbers ε_m = ε_{m_n} → 0 and of functions V_j, j = ±2, ..., ±m, such that:
\[
\textbf{(C1)} \quad \lim_{n \to \infty} n\Delta_n \sup_{f \in \mathscr{F}} \int_{I \setminus (-\varepsilon_m, \varepsilon_m)} \big( f(x) - \hat{f}_m(x) \big)^2 \nu_0(dx) = 0.
\]
\[
\textbf{(C2)} \quad \lim_{n \to \infty} n\Delta_n \sup_{f \in \mathscr{F}} \big( A_m^2(f) + B_m^2(f) + C_m^2(f) \big) = 0.
\]
Remark in particular that Condition (C2) implies the following:
\[
\textbf{(H2)} \quad \sup_{f \in \mathscr{F}} \int_I \big( \sqrt{f(y)} - 1 \big)^2 \nu_0(dy) \leq L,
\]
where L = sup_{f∈F} ∫_{−ε_m}^{ε_m} (√f(x) − 1)² ν_0(dx) + (√M + 1)² ν_0(I \ (−ε_m, ε_m)), for any choice of m such that the quantity in the limit appearing in Condition (C2) is finite.
Theorem 2.6 has slightly stronger hypotheses, defining possibly smaller parameter spaces: we will assume the existence of sequences m_n, ε_m and V_j, j = ±2, ..., ±m (possibly different from the ones above), such that Condition (C1) is verified and the following stronger version of Condition (C2) holds:
\[
\textbf{(C2')} \quad \lim_{n \to \infty} n\Delta_n \sup_{f \in \mathscr{F}} \big( A_m^2(f) + B_m^2(f) + n\,C_m^2(f) \big) = 0.
\]
Finally, some of our results have a more explicit statement under the hypothesis of finite variation, which we state as:
\[
\textbf{(FV)} \quad \int_I (|x| \wedge 1)\,\nu_0(dx) < \infty.
\]
Remark 2.3. Condition (C1) and those involving the quantities A_m(f) and B_m(f) all concern similar but slightly different approximations of f. In concrete examples, they may all be expected to have the same rate of convergence, but to keep the greatest generality we preferred to state them separately. On the other hand, conditions on the quantity C_m(f) are purely local around zero, requiring the parameters f to converge quickly enough to 1.
Examples 2.4. To get a grasp on Conditions (C1), (C2) we analyze here three different examples according to the different behavior of ν_0 near 0 ∈ I. In all of these cases the parameter space F^{ν_0,I} will be a subclass of F^I_{(γ,K,κ,M)} defined as in (3). Recall that the conditions (C1), (C2) and (C2') depend on the choice of the sequences m_n, ε_m and of the functions V_j. For the first two of the three examples, where I = [0, 1], we will make the standard choice for V_j of triangular and trapezoidal functions, similarly to those in Example 2.1. Namely, for j = 3, ..., m−1 we have
\[
V_j(x) = \mathbb{I}_{(x^*_{j-1}, x^*_j]}(x)\,\frac{x - x^*_{j-1}}{x^*_j - x^*_{j-1}}\,\frac{1}{\mu_m} + \mathbb{I}_{(x^*_j, x^*_{j+1}]}(x)\,\frac{x^*_{j+1} - x}{x^*_{j+1} - x^*_j}\,\frac{1}{\mu_m}; \qquad (6)
\]
the two extremal functions V_2 and V_m are chosen so that V_2 ≡ 1/μ_m on (ε_m, x*_2] and V_m ≡ 1/μ_m on (x*_m, 1]. In the second example, where ν_0 is infinite, one is forced to take ε_m > 0 and to keep in mind that the x*_j are not uniformly distributed on [ε_m, 1]. Proofs of all the statements here can be found in Section 5.2.
1. The finite case: ν_0 ≡ Leb([0, 1]).
In this case we are free to choose F^{Leb,[0,1]} = F^{[0,1]}_{(γ,K,κ,M)}. Indeed, as ν_0 is finite, there is no need to single out the first interval J_1 = [0, ε_m], so that C_m(f) does not enter in the proofs and the definitions of A_m(f) and B_m(f) involve integrals on the whole of [0, 1]. Also, the choice of the V_j's as in (6) guarantees that ∫_0^1 V_j(x) dx = 1. Then, the quantities ‖f − f̂_m‖_{L_2([0,1])}, A_m(f) and B_m(f) all have the same rate of convergence, which is given by:
\[
\sqrt{\int_0^1 \big( f(x) - \hat{f}_m(x) \big)^2 \nu_0(dx)} + A_m(f) + B_m(f) = O\big( m^{-\gamma-1} + m^{-\frac{3}{2}} \big),
\]
uniformly in f. See Section 5.2 for a proof.
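The displayed rate can be probed numerically for a smooth f. A sketch under stated assumptions (f(x) = 1 + 0.5 sin(2πx), chosen for illustration, so γ = 1 and the edge term m^{−3/2} dominates; helper names are ours) that builds f̂_m from (5) with the triangular V_j of (6) and reports the L_2 error as m doubles:

```python
import numpy as np

f = lambda x: 1.0 + 0.5 * np.sin(2 * np.pi * x)   # a smooth Levy density, gamma = 1

def l2_error(m, n_grid=40_000):
    """L2 distance between f and the f-hat_m of (5) built from the triangular
    V_j of Example 2.1 (nu_0 = Leb[0,1], eps_m = 0)."""
    grid = np.linspace(0.0, 1.0, n_grid + 1)
    fg = f(grid)
    # cumulative integral of f, to get the bin integrals int_{J_j} f(y) dy
    F = np.concatenate([[0.0], np.cumsum((fg[1:] + fg[:-1]) / 2 * np.diff(grid))])
    v = np.linspace(0.0, 1.0, m)                  # endpoints v_1 = 0, ..., v_m = 1
    bin_int = np.interp(v[1:], grid, F) - np.interp(v[:-1], grid, F)
    xstar = (2 * np.arange(2, m + 1) - 3) / (2 * m - 2)
    # with triangular V_j, f-hat_m interpolates the values (m-1)*int_{J_j} f
    # linearly at the x*_j and is constant on the two edge pieces
    fhat = np.interp(grid, xstar, (m - 1) * bin_int)
    d2 = (fg - fhat) ** 2
    return float(np.sqrt(((d2[1:] + d2[:-1]) / 2 * np.diff(grid)).sum()))

errs = {m: l2_error(m) for m in (10, 20, 40, 80)}
for m, e in errs.items():
    print(m, e)
```

For this f the error should shrink roughly like m^{−3/2} (a factor of about 2.8 per doubling of m), consistent with the bound above; the exact constants depend on f and are not asserted here.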
2. The finite variation case: dν_0/dLeb(x) = x^{−1} 𝕀_{[0,1]}(x).
In this case, the parameter space F^{ν_0,[0,1]} is a proper subset of F^{[0,1]}_{(γ,K,κ,M)}. Indeed, as we are obliged to choose ε_m > 0, we also need to impose that C_m(f) = o(1/√(n∆_n)), with constants uniform with respect to f, that is, that all f ∈ F converge to 1 quickly enough as x → 0. Choosing ε_m = m^{−1−α}, α > 0, we have that μ_m = ln(ε_m^{−1})/(m−1), v_j = ε_m^{(m−j)/(m−1)} and x*_j = (v_j − v_{j−1})/μ_m. In particular, max_j |v_{j−1} − v_j| = |v_m − v_{m−1}| = O(ln m / m). Also in this case one can prove that the standard choice of V_j described above leads to ∫_{ε_m}^1 V_j(x) dx/x = 1. Again, the quantities ‖f − f̂_m‖_{L_2(ν_0|_{I\[0,ε_m]})}, A_m(f) and B_m(f) have the same rate of convergence, given by:
\[
\sqrt{\int_{\varepsilon_m}^1 \big( f(x) - \hat{f}_m(x) \big)^2 \nu_0(dx)} + A_m(f) + B_m(f) = O\bigg( \Big( \frac{\ln m}{m} \Big)^{\gamma+1} \sqrt{\ln(\varepsilon_m^{-1})} \bigg), \qquad (7)
\]
uniformly in f. The condition on C_m(f) depends on the behavior of f near 0. For example, it is ensured if one considers a parametric family of the form f(x) = e^{−λx} with a bounded λ > 0. See Section 5.2 for a proof.
3. The infinite variation, non-compactly supported case: dν_0/dLeb(x) = x^{−2} 𝕀_{ℝ_+}(x).
This example involves significantly more computations than the preceding ones, since the classical triangular choice for the functions V_j would not have integral equal to 1 (with respect to ν_0), and the support is not compact. The parameter space F^{ν_0,[0,∞)} can still be chosen as a proper subclass of F^{[0,∞)}_{(γ,K,κ,M)}, again by imposing that C_m(f) converges to zero quickly enough (more details about this condition are discussed in Example 3.3). We divide the interval [0, ∞) in m intervals J_j = [v_{j−1}, v_j) with:
\[
v_0 = 0; \quad v_1 = \varepsilon_m; \quad v_j = \frac{\varepsilon_m (m-1)}{m-j}; \quad v_m = \infty; \quad \mu_m = \frac{1}{\varepsilon_m (m-1)}.
\]
To deal with the non-compactness problem, we choose some "horizon" H(m) that goes to infinity slowly enough as m goes to infinity and we bound the L_2 distance between f and f̂_m for x > H(m) by 2 sup_{x≥H(m)} f(x)² / H(m). We have:
\[
\|f - \hat{f}_m\|^2_{L_2(\nu_0|_{I \setminus [0,\varepsilon_m]})} + A_m^2(f) + B_m^2(f) = O\bigg( \frac{H(m)^{3+4\gamma}}{(\varepsilon_m m)^{2+2\gamma}} + \frac{\sup_{x \geq H(m)} f(x)^2}{H(m)} \bigg).
\]
In the general case where the best estimate for sup_{x≥H(m)} f(x)² is simply given by M², an optimal choice for H(m) is √(ε_m m), which gives a rate of convergence:
\[
\|f - \hat{f}_m\|^2_{L_2(\nu_0|_{I \setminus [0,\varepsilon_m]})} + A_m^2(f) + B_m^2(f) = O\bigg( \frac{1}{\sqrt{\varepsilon_m m}} \bigg),
\]
independently of γ. See Section 5.2 for a proof.
2.2. Definition of the experiments
Let (x_t)_{t≥0} be the canonical process on the Skorokhod space (D, 𝒟) and denote by P^{(b,0,ν)} the law induced on (D, 𝒟) by a Lévy process with characteristic triplet (b, 0, ν). We will write P_t^{(b,0,ν)} for the restriction of P^{(b,0,ν)} to the σ-algebra 𝒟_t generated by {x_s : 0 ≤ s ≤ t} (see Appendix A.2 for the precise definitions). Let Q_t^{(b,0,ν)} be the marginal law at time t of a Lévy process with characteristic triplet (b, 0, ν). In the case where ∫_{|y|≤1} |y| ν(dy) < ∞ we introduce the notation γ^ν := ∫_{|y|≤1} y ν(dy); then, Condition (H2) guarantees the finiteness of γ^{ν−ν_0} (see Remark 33.3 in [44] for more details).
Recall that we introduced the discretization t_i = iT_n/n of [0, T_n] and denote by Q_n^{(γ^{ν−ν_0},0,ν)} the laws of the n + 1 marginals of (x_t)_{t≥0} at the times t_i, i = 0, ..., n. We will consider the following statistical models, depending on a fixed, possibly infinite, Lévy measure ν_0 concentrated on I (clearly, the models with the subscript FV are meaningful only under the assumption (FV)):
\[
\mathscr{P}^{\nu_0}_{n,FV} = \Big( D, \mathscr{D}_{T_n}, \big\{ P^{(\gamma^{\nu},0,\nu)}_{T_n} : f := \tfrac{d\nu}{d\nu_0} \in \mathscr{F}^{\nu_0,I} \big\} \Big), \qquad
\mathscr{Q}^{\nu_0}_{n,FV} = \Big( \mathbb{R}^{n+1}, \mathscr{B}(\mathbb{R}^{n+1}), \big\{ Q_n^{(\gamma^{\nu},0,\nu)} : f := \tfrac{d\nu}{d\nu_0} \in \mathscr{F}^{\nu_0,I} \big\} \Big),
\]
\[
\mathscr{P}^{\nu_0}_n = \Big( D, \mathscr{D}_{T_n}, \big\{ P^{(\gamma^{\nu-\nu_0},0,\nu)}_{T_n} : f := \tfrac{d\nu}{d\nu_0} \in \mathscr{F}^{\nu_0,I} \big\} \Big), \qquad
\mathscr{Q}^{\nu_0}_n = \Big( \mathbb{R}^{n+1}, \mathscr{B}(\mathbb{R}^{n+1}), \big\{ Q_n^{(\gamma^{\nu-\nu_0},0,\nu)} : f := \tfrac{d\nu}{d\nu_0} \in \mathscr{F}^{\nu_0,I} \big\} \Big).
\]
Finally, let us introduce the Gaussian white noise model that will appear in the statement of our main results. For that, let us denote by (C(I), 𝒞) the space of continuous mappings from I into ℝ endowed with its standard filtration, and by g the density of ν_0 with respect to the Lebesgue measure. We will require g > 0 and let W_n^f be the law induced on (C(I), 𝒞) by the stochastic process satisfying:
\[
dy_t = \sqrt{f(t)}\,dt + \frac{dW_t}{2\sqrt{T_n\,g(t)}}, \quad t \in I, \qquad (8)
\]
where (W_t)_{t∈ℝ} denotes a Brownian motion on ℝ with W_0 = 0. Then we set:
\[
\mathscr{W}^{\nu_0}_n = \big( C(I), \mathscr{C}, \{ W_n^f : f \in \mathscr{F}^{\nu_0,I} \} \big).
\]
Observe that when ν_0 is a finite Lévy measure, W_n^{ν_0} is equivalent to the statistical model associated with the continuous observation of a process (ỹ_t)_{t∈I} defined by:
\[
d\tilde{y}_t = \sqrt{f(t)\,g(t)}\,dt + \frac{dW_t}{2\sqrt{T_n}}, \quad t \in I.
\]
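The limit experiment (8) is straightforward to simulate by Euler discretization; the following sketch is illustrative only, with ν_0 = Leb on I = [0, 1] (so g ≡ 1) and an arbitrary f chosen by us for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_white_noise(f, g, T_n, n_steps=10_000):
    """Euler scheme for dy_t = sqrt(f(t)) dt + dW_t / (2 sqrt(T_n g(t))) on [0,1]."""
    t = np.linspace(0.0, 1.0, n_steps + 1)
    dt = np.diff(t)
    dW = rng.normal(0.0, np.sqrt(dt))                 # Brownian increments
    dy = np.sqrt(f(t[:-1])) * dt + dW / (2.0 * np.sqrt(T_n * g(t[:-1])))
    return t, np.concatenate([[0.0], np.cumsum(dy)])

f = lambda t: 1.0 + 0.5 * np.sin(2 * np.pi * t)
g = lambda t: np.ones_like(t)
t, y = simulate_white_noise(f, g, T_n=1000.0)
# as T_n grows, y_1 concentrates around int_0^1 sqrt(f(t)) dt
print("y_1 =", y[-1])
```

The noise level 1/(2√(T_n g(t))) shrinks as T_n → ∞, so the signal √f becomes identifiable in the limit, mirroring the role of the growing observation horizon in the Lévy models.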
2.3. Main results
Using the notation introduced in Section 2.1, we now state our main results. For brevity of notation, we will denote by H(f, f̂_m) (resp. L_2(f, f̂_m)) the Hellinger distance (resp. the L_2 distance) between the Lévy measures ν and ν̂_m restricted to I \ [−ε_m, ε_m], i.e.:
\[
H^2(f, \hat{f}_m) := \int_{I \setminus [-\varepsilon_m, \varepsilon_m]} \Big( \sqrt{f(x)} - \sqrt{\hat{f}_m(x)} \Big)^2 \nu_0(dx), \qquad
L_2(f, \hat{f}_m)^2 := \int_{I \setminus [-\varepsilon_m, \varepsilon_m]} \big( f(y) - \hat{f}_m(y) \big)^2 \nu_0(dy).
\]
Observe that Condition (H1) implies (see Lemma 5.1)
\[
\frac{1}{4M}\,L_2(f, \hat{f}_m)^2 \leq H^2(f, \hat{f}_m) \leq \frac{1}{4\kappa}\,L_2(f, \hat{f}_m)^2.
\]
Theorem 2.5. Let ν_0 be a known Lévy measure concentrated on a (possibly infinite) interval I ⊆ ℝ and having strictly positive density with respect to the Lebesgue measure. Let us choose a parameter space F^{ν_0,I} such that there exist a sequence m = m_n of integers, functions V_j, j = ±2, ..., ±m, and a sequence ε_m → 0 as m → ∞ such that Conditions (H1), (C1), (C2) are satisfied for F = F^{ν_0,I}. Then, for n big enough we have:
\[
\Delta(\mathscr{P}^{\nu_0}_n, \mathscr{W}^{\nu_0}_n) = O\bigg( \sqrt{n\Delta_n} \sup_{f \in \mathscr{F}} \big( A_m(f) + B_m(f) + C_m(f) \big) \bigg) + O\bigg( \sqrt{n\Delta_n} \sup_{f \in \mathscr{F}} L_2(f, \hat{f}_m) + \sqrt{\frac{m}{n\Delta_n}\Big( \frac{1}{\mu_m} + \frac{1}{\mu_{-m}} \Big)} \bigg). \qquad (9)
\]
Theorem 2.6. Let ν_0 be a known Lévy measure concentrated on a (possibly infinite) interval I ⊆ ℝ and having strictly positive density with respect to the Lebesgue measure. Let us choose a parameter space F^{ν_0,I} such that there exist a sequence m = m_n of integers, functions V_j, j = ±2, ..., ±m, and a sequence ε_m → 0 as m → ∞ such that Conditions (H1), (C1), (C2') are satisfied for F = F^{ν_0,I}. Then, for n big enough we have:
\[
\Delta(\mathscr{Q}^{\nu_0}_n, \mathscr{W}^{\nu_0}_n) = O\bigg( \nu_0\big( I \setminus [-\varepsilon_m, \varepsilon_m] \big) \sqrt{n\Delta_n^2} + \frac{m \ln m}{\sqrt{n}} + \sqrt{n\sqrt{\Delta_n}} \sup_{f \in \mathscr{F}} C_m(f) \bigg) + O\bigg( \sqrt{n\Delta_n} \sup_{f \in \mathscr{F}} \big( A_m(f) + B_m(f) + H(f, \hat{f}_m) \big) \bigg). \qquad (10)
\]
Corollary 2.7. Let ν_0 be as above and let us choose a parameter space F^{ν_0,I} so that there exist sequences m'_n, ε'_m, V'_j and m''_n, ε''_m, V''_j such that:
• Conditions (H1), (C1) and (C2) hold for m'_n, ε'_m, V'_j, and \(\frac{m'_n}{n\Delta_n}\big( \frac{1}{\mu_{m'}} + \frac{1}{\mu_{-m'}} \big)\) tends to zero.
• Conditions (H1), (C1) and (C2') hold for m''_n, ε''_m, V''_j, and \(\nu_0\big( I \setminus [-\varepsilon_{m''}, \varepsilon_{m''}] \big) \sqrt{n\Delta_n^2} + \frac{m'' \ln m''}{\sqrt{n}}\) tends to zero.
Then the statistical models P_n^{ν_0} and Q_n^{ν_0} are asymptotically equivalent:
\[
\lim_{n \to \infty} \Delta(\mathscr{P}^{\nu_0}_n, \mathscr{Q}^{\nu_0}_n) = 0.
\]
If, in addition, the Lévy measures have finite variation, i.e. if we assume (FV), then the same results hold replacing P_n^{ν_0} and Q_n^{ν_0} by P_{n,FV}^{ν_0} and Q_{n,FV}^{ν_0}, respectively (see Lemma Appendix A.14).
3. Examples
We will now analyze three different examples, underlining the different behaviors of the Lévy measure ν_0 (respectively: finite, infinite with finite variation, and infinite with infinite variation). The three chosen Lévy measures are 𝕀_{[0,1]}(x) dx, 𝕀_{[0,1]}(x) dx/x and 𝕀_{ℝ_+}(x) dx/x². In all three cases we assume the parameter f to be uniformly bounded and with uniformly γ-Hölder derivatives: we will describe adequate subclasses F^{ν_0,I} ⊆ F^I_{(γ,K,κ,M)} defined as in (3). It seems very likely that the same results that are highlighted in these examples hold true for more general Lévy measures; however, we limit ourselves to these examples in order to be able to explicitly compute the quantities involved (v_j, x*_j, etc.) and hence estimate the distance between f and f̂_m as in Examples 2.4.
In the first of the three examples, where ν_0 is the Lebesgue measure on I = [0, 1], we are considering the statistical models associated with the discrete and continuous observation of a compound Poisson process with Lévy density f. Observe that W_n^{Leb} reduces to the statistical model associated with the continuous observation of a trajectory from:
\[
dy_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{T_n}}\,dW_t, \quad t \in [0, 1].
\]
In this case we have:
Example 3.1. (Finite Lévy measure). Let ν_0 be the Lebesgue measure on I = [0, 1] and let F = F^{Leb,[0,1]} be any subclass of F^{[0,1]}_{(γ,K,κ,M)} for some strictly positive constants K, κ, M and γ ∈ (0, 1]. Then:
\[
\lim_{n \to \infty} \Delta(\mathscr{P}^{Leb}_{n,FV}, \mathscr{W}^{Leb}_n) = 0 \quad \text{and} \quad \lim_{n \to \infty} \Delta(\mathscr{Q}^{Leb}_{n,FV}, \mathscr{W}^{Leb}_n) = 0.
\]
More precisely,
\[
\Delta(\mathscr{P}^{Leb}_{n,FV}, \mathscr{W}^{Leb}_n) =
\begin{cases}
O\big( (n\Delta_n)^{-\frac{\gamma}{4+2\gamma}} \big) & \text{if } \gamma \in \big( 0, \tfrac{1}{2} \big], \\[4pt]
O\big( (n\Delta_n)^{-\frac{1}{10}} \big) & \text{if } \gamma \in \big( \tfrac{1}{2}, 1 \big].
\end{cases}
\]
In the case where ∆_n = n^{−β}, 1/2 < β < 1, an upper bound for the rate of convergence of ∆(Q^{Leb}_{n,FV}, W^{Leb}_n) is
\[
\Delta(\mathscr{Q}^{Leb}_{n,FV}, \mathscr{W}^{Leb}_n) =
\begin{cases}
O\big( n^{-\frac{\gamma+\beta}{4+2\gamma}} \ln n \big) & \text{if } \gamma \in \big( 0, \tfrac{1}{2} \big] \text{ and } \tfrac{2+2\gamma}{3+2\gamma} \leq \beta < 1, \\[4pt]
O\big( n^{\frac{1}{2}-\beta} \ln n \big) & \text{if } \gamma \in \big( 0, \tfrac{1}{2} \big] \text{ and } \tfrac{1}{2} < \beta < \tfrac{2+2\gamma}{3+2\gamma}, \\[4pt]
O\big( n^{-\frac{1+2\beta}{10}} \ln n \big) & \text{if } \gamma \in \big( \tfrac{1}{2}, 1 \big] \text{ and } \tfrac{3}{4} \leq \beta < 1, \\[4pt]
O\big( n^{\frac{1}{2}-\beta} \ln n \big) & \text{if } \gamma \in \big( \tfrac{1}{2}, 1 \big] \text{ and } \tfrac{1}{2} < \beta < \tfrac{3}{4}.
\end{cases}
\]
See Section 5.3 for a proof.
Example 3.2. (Infinite Lévy measure with finite variation). Let X be a truncated Gamma process with (infinite) Lévy measure of the form:
\[
\nu(A) = \int_A \frac{e^{-\lambda x}}{x}\,dx, \quad A \in \mathscr{B}([0, 1]).
\]
Here F^{ν_0,I} is a 1-dimensional parametric family in λ, assuming that there exists a known constant λ_0 such that 0 < λ ≤ λ_0 < ∞, f(t) = e^{−λt} and dν_0(x) = dx/x. In particular, the f are Lipschitz, i.e. F^{ν_0,[0,1]} ⊂ F^{[0,1]}_{(γ=1,K,κ,M)}. The discrete or continuous observation (up to time T_n) of X is asymptotically equivalent to W_n^{ν_0}, the statistical model associated with the observation of a trajectory of the process (y_t):
\[
dy_t = \sqrt{f(t)}\,dt + \frac{\sqrt{t}\,dW_t}{2\sqrt{T_n}}, \quad t \in [0, 1].
\]
More precisely, in the case where ∆_n = n^{−β}, 1/2 < β < 1, an upper bound for the rate of convergence of ∆(Q_{n,FV}^{ν_0}, W_n^{ν_0}) is
\[
\Delta(\mathscr{Q}^{\nu_0}_{n,FV}, \mathscr{W}^{\nu_0}_n) =
\begin{cases}
O\big( n^{\frac{1}{2}-\beta} \ln n \big) & \text{if } \tfrac{1}{2} < \beta \leq \tfrac{9}{10}, \\[4pt]
O\big( n^{-\frac{1+2\beta}{7}} \ln n \big) & \text{if } \tfrac{9}{10} < \beta < 1.
\end{cases}
\]
Concerning the continuous setting we have:
\[
\Delta(\mathscr{P}^{\nu_0}_{n,FV}, \mathscr{W}^{\nu_0}_n) = O\big( n^{\frac{\beta-1}{6}} (\ln n)^{\frac{5}{2}} \big) = O\big( T_n^{-\frac{1}{6}} (\ln T_n)^{\frac{5}{2}} \big).
\]
See Section 5.4 for a proof.
Example 3.3. (Infinite Lévy measure, infinite variation). Let X be a pure jump Lévy process with infinite Lévy measure of the form:
\[
\nu(A) = \int_A \frac{2 - e^{-\lambda x^3}}{x^2}\,dx, \quad A \in \mathscr{B}(\mathbb{R}_+).
\]
Again, we are considering a parametric family in λ > 0, assuming that the parameter stays bounded by a known constant λ_0. Here, f(t) = 2 − e^{−λt³}, hence 1 ≤ f(t) ≤ 2 for all t ≥ 0, and f is Lipschitz, i.e. F^{ν_0,ℝ_+} ⊂ F^{ℝ_+}_{(γ=1,K,κ,M)}. The discrete or continuous observations (up to time T_n) of X are asymptotically equivalent to the statistical model associated with the observation of a trajectory of the process (y_t):
\[
dy_t = \sqrt{f(t)}\,dt + \frac{t\,dW_t}{2\sqrt{T_n}}, \quad t \geq 0.
\]
More precisely, in the case where ∆_n = n^{−β}, 0 < β < 1, an upper bound for the rate of convergence of ∆(Q_n^{ν_0}, W_n^{ν_0}) is
\[
\Delta(\mathscr{Q}^{\nu_0}_n, \mathscr{W}^{\nu_0}_n) =
\begin{cases}
O\big( n^{\frac{1}{2}-\frac{2}{3}\beta} \big) & \text{if } \tfrac{3}{4} < \beta < \tfrac{12}{13}, \\[4pt]
O\big( n^{-\frac{1}{6}+\frac{1}{8}\beta} (\ln n)^{\frac{7}{6}} \big) & \text{if } \tfrac{12}{13} \leq \beta < 1.
\end{cases}
\]
In the continuous setting, we have
\[
\Delta(\mathscr{P}^{\nu_0}_n, \mathscr{W}^{\nu_0}_n) = O\big( n^{\frac{3\beta-3}{34}} (\ln n)^{\frac{7}{6}} \big) = O\big( T_n^{-\frac{3}{34}} (\ln T_n)^{\frac{7}{6}} \big).
\]
See Section 5.5 for a proof.
4. Proofs of the main results
In order to simplify notations, the proofs will be presented in the case I ⊆ ℝ_+. Nevertheless, this allows us to present all the main difficulties, since they can only appear near 0. To prove Theorems 2.5 and 2.6 we need to introduce several intermediate statistical models. In that regard, let us denote by Q_j^f the law of a Poisson random variable with mean T_n ν(J_j) (see (4) for the definition of J_j). We will denote by L_m the statistical model associated with the family of probabilities {⊗_{j=2}^m Q_j^f : f ∈ F}:
\[
\mathscr{L}_m = \bigg( \bar{\mathbb{N}}^{m-1}, \mathscr{P}(\bar{\mathbb{N}}^{m-1}), \Big\{ \bigotimes_{j=2}^m Q_j^f : f \in \mathscr{F} \Big\} \bigg). \qquad (11)
\]
By N_j^f we mean the law of a Gaussian random variable 𝒩(2√(T_n ν(J_j)), 1) and by N_m the statistical model associated with the family of probabilities {⊗_{j=2}^m N_j^f : f ∈ F}:
\[
\mathscr{N}_m = \bigg( \mathbb{R}^{m-1}, \mathscr{B}(\mathbb{R}^{m-1}), \Big\{ \bigotimes_{j=2}^m N_j^f : f \in \mathscr{F} \Big\} \bigg). \qquad (12)
\]
For each $f\in\mathscr F$, let $\bar\nu_m$ be the measure having $\bar f_m$ as a density with respect to $\nu_0$ where, for every $f\in\mathscr F$, $\bar f_m$ is defined as follows:
$$\bar f_m(x):=\begin{cases}1 & \text{if } x\in J_1,\\ \dfrac{\nu(J_j)}{\nu_0(J_j)} & \text{if } x\in J_j,\ j=2,\dots,m.\end{cases}\qquad(13)$$
Furthermore, define
$$\bar{\mathscr P}_n^{\nu_0}=\Big(D,\mathscr D_{T_n},\Big\{P_{T_n}^{(\gamma^{\bar\nu_m-\nu_0},0,\bar\nu_m)} : \frac{d\bar\nu_m}{d\nu_0}\in\mathscr F\Big\}\Big).\qquad(14)$$
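The piecewise density $\bar f_m$ of (13) replaces $f$ on each bin $J_j$ by the ratio $\nu(J_j)/\nu_0(J_j)$, i.e. the $\nu_0$-weighted average of $f$ over the bin. A minimal numerical sketch, using the densities of Example 3.3 ($\nu_0(dx)=x^{-2}dx$, $f(x)=2-e^{-\lambda x^3}$ with $\lambda=1$); the bin edges and the midpoint rule are illustrative choices, not the paper's construction:

```python
import numpy as np

def bin_ratios(edges, lam=1.0, n_sub=20_000):
    """Approximate nu(J_j)/nu_0(J_j) for the bins J_j = (edges[j-1], edges[j]]."""
    ratios = []
    for a, b in zip(edges[:-1], edges[1:]):
        x = np.linspace(a, b, n_sub + 1)
        x = 0.5 * (x[:-1] + x[1:])            # midpoint rule on each bin
        w = x**-2.0                           # Lebesgue density of nu_0
        f = 2.0 - np.exp(-lam * x**3)         # f = d nu / d nu_0
        ratios.append(np.sum(f * w) / np.sum(w))
    return np.array(ratios)

edges = np.array([0.2, 0.5, 1.0, 2.0, 4.0])   # illustrative bins J_2, ..., J_5
r = bin_ratios(edges)
# Since 1 <= f <= 2 and f is increasing, the ratios lie in [1, 2] and increase.
```

Each ratio is a weighted average of $f$ over its bin, so $\bar f_m$ inherits the bounds $\kappa\le\bar f_m\le M$ satisfied by $f$.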
4.1. Proof of Theorem 2.5
We begin with a series of lemmas that will be needed in the proof. Before doing so, let us outline the scheme of the proof. We recall that the goal is to prove that estimating $f=\frac{d\nu}{d\nu_0}$ from the continuous observation of a Lévy process $(X_t)_{t\in[0,T_n]}$ without Gaussian part and having Lévy measure $\nu$ is asymptotically equivalent to estimating $f$ from the Gaussian white noise model:
$$dy_t=\sqrt{f(t)}\,dt+\frac{1}{2\sqrt{T_n g(t)}}\,dW_t,\quad g=\frac{d\nu_0}{d\mathrm{Leb}},\quad t\in I.$$
Also, recall the definition of $\hat\nu_m$ given in (5) and read $\mathscr P_1\overset{\Delta}{\Longleftrightarrow}\mathscr P_2$ as "$\mathscr P_1$ is asymptotically equivalent to $\mathscr P_2$". Then, we can outline the proof in the following way.
• Step 1: $P_{T_n}^{(\gamma^{\nu-\nu_0},0,\nu)}\overset{\Delta}{\Longleftrightarrow}P_{T_n}^{(\gamma^{\hat\nu_m-\nu_0},0,\hat\nu_m)}$;
• Step 2: $P_{T_n}^{(\gamma^{\hat\nu_m-\nu_0},0,\hat\nu_m)}\overset{\Delta}{\Longleftrightarrow}\bigotimes_{j=2}^m\mathscr P(T_n\nu(J_j))$ (Poisson approximation). Here $\bigotimes_{j=2}^m\mathscr P(T_n\nu(J_j))$ represents a statistical model associated with the observation of $m-1$ independent Poisson random variables of parameters $T_n\nu(J_j)$;
• Step 3: $\bigotimes_{j=2}^m\mathscr P(T_n\nu(J_j))\overset{\Delta}{\Longleftrightarrow}\bigotimes_{j=2}^m\mathscr N\big(2\sqrt{T_n\nu(J_j)},1\big)$ (Gaussian approximation);
• Step 4: $\bigotimes_{j=2}^m\mathscr N\big(2\sqrt{T_n\nu(J_j)},1\big)\overset{\Delta}{\Longleftrightarrow}(y_t)_{t\in I}$.
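The Gaussian approximation in Step 3 rests on the classical variance-stabilizing fact that, for $N\sim\mathscr P(\lambda)$ with $\lambda$ large, $2\sqrt N$ is approximately $\mathscr N(2\sqrt\lambda,1)$. A quick Monte Carlo check of this standard fact ($\lambda$ and the sample size are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 200.0                        # plays the role of T_n * nu(J_j)
z = 2.0 * np.sqrt(rng.poisson(lam, size=100_000))

# Sample mean should be close to 2*sqrt(lam), sample variance close to 1.
mean_err = abs(z.mean() - 2.0 * np.sqrt(lam))
var_err = abs(z.var() - 1.0)
```

This is why the Poisson experiment with means $T_n\nu(J_j)$ is compared to Gaussians $\mathscr N(2\sqrt{T_n\nu(J_j)},1)$ rather than to Gaussians with matching mean and variance.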
Lemmas 4.1–4.3, below, are the key ingredients of Step 2.
Lemma 4.1. Let $\bar{\mathscr P}_n^{\nu_0}$ and $\mathscr L_m$ be the statistical models defined in (14) and (11), respectively. Under Assumption (H2) we have:
$$\Delta(\bar{\mathscr P}_n^{\nu_0},\mathscr L_m)=0,\quad\text{for all } m.$$
Proof. Denote by $\bar{\mathbb N}$ the set $\mathbb N\cup\{\infty\}$ and consider the statistic $S:(D,\mathscr D_{T_n})\to\big(\bar{\mathbb N}^{m-1},\mathcal P(\bar{\mathbb N}^{m-1})\big)$ defined by
$$S(x)=\big(N_{T_n}^{x;2},\dots,N_{T_n}^{x;m}\big)\quad\text{with}\quad N_{T_n}^{x;j}=\sum_{r\le T_n}\mathbb I_{J_j}(\Delta x_r).\qquad(15)$$
An application of Theorem Appendix A.12 to $P_{T_n}^{(\gamma^{\bar\nu_m-\nu_0},0,\bar\nu_m)}$ and $P_{T_n}^{(0,0,\nu_0)}$ yields
$$\frac{dP_{T_n}^{(\gamma^{\bar\nu_m-\nu_0},0,\bar\nu_m)}}{dP_{T_n}^{(0,0,\nu_0)}}(x)=\exp\bigg(\sum_{j=2}^m\ln\Big(\frac{\nu(J_j)}{\nu_0(J_j)}\Big)N_{T_n}^{x;j}-T_n\int_I\big(\bar f_m(y)-1\big)\nu_0(dy)\bigg).$$
Hence, by means of the Fisher factorization theorem, we conclude that $S$ is a sufficient statistic for $\bar{\mathscr P}_n^{\nu_0}$. Furthermore, under $P_{T_n}^{(\gamma^{\bar\nu_m-\nu_0},0,\bar\nu_m)}$, the random variables $N_{T_n}^{x;j}$ have Poisson distributions $Q_j^f$ with means $T_n\nu(J_j)$. Then, by means of Property Appendix A.7, we get $\Delta(\bar{\mathscr P}_n^{\nu_0},\mathscr L_m)=0$, for all $m$.
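Concretely, the sufficient statistic $S$ of (15) retains only, for each bin $J_j$, the number of jumps of the observed path falling in it. A sketch on a simulated compound Poisson path; the jump intensity, jump law and bin edges are illustrative and not part of the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(2)
T_n = 100.0
n_jumps = rng.poisson(5.0 * T_n)                  # number of jumps up to T_n
jump_sizes = rng.exponential(1.0, size=n_jumps)   # i.i.d. jump sizes (the Delta x_r)
edges = np.array([0.1, 0.5, 1.0, 2.0, 5.0])       # consecutive bins J_2, ..., J_m
S, _ = np.histogram(jump_sizes, bins=edges)       # S(x) = (N^{x;2}, ..., N^{x;m})
# Under the Levy-process law, each count N^{x;j} is Poisson with mean T_n * nu(J_j).
```

Jumps smaller than the first edge or larger than the last are simply discarded, mirroring the fact that $S$ ignores the small jumps in $J_1$.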
Let us denote by $\hat Q_j^f$ the law of a Poisson random variable with mean $T_n\int_{J_j}\hat f_m(y)\nu_0(dy)$ and let $\hat{\mathscr L}_m$ be the statistical model associated with the family of probabilities $\big\{\bigotimes_{j=2}^m\hat Q_j^f : f\in\mathscr F\big\}$.
Lemma 4.2.
$$\Delta(\mathscr L_m,\hat{\mathscr L}_m)\le\sup_{f\in\mathscr F}\sqrt{\frac{T_n}{\kappa}\int_{I\setminus[0,\varepsilon_m]}\big(f(y)-\hat f_m(y)\big)^2\nu_0(dy)}.$$
Proof. By means of Facts Appendix A.2–Appendix A.4, we get:
$$\Delta(\mathscr L_m,\hat{\mathscr L}_m)\le\sup_{f\in\mathscr F}H\Big(\bigotimes_{j=2}^m Q_j^f,\bigotimes_{j=2}^m\hat Q_j^f\Big)\le\sup_{f\in\mathscr F}\sqrt{\sum_{j=2}^m 2H^2\big(Q_j^f,\hat Q_j^f\big)}$$
$$=\sup_{f\in\mathscr F}\sqrt2\,\sqrt{\sum_{j=2}^m\bigg[1-\exp\bigg(-\frac{T_n}{2}\bigg(\sqrt{\int_{J_j}\hat f_m(y)\nu_0(dy)}-\sqrt{\int_{J_j}f(y)\nu_0(dy)}\bigg)^{\!2}\,\bigg)\bigg]}.$$
By making use of the fact that $1-e^{-x}\le x$ for all $x\ge0$ and the equality $\sqrt a-\sqrt b=\frac{a-b}{\sqrt a+\sqrt b}$, combined with the lower bound $f\ge\kappa$ (which also implies $\hat f_m\ge\kappa$) and finally the Cauchy-Schwarz inequality, we obtain:
$$1-\exp\bigg(-\frac{T_n}{2}\bigg(\sqrt{\int_{J_j}\hat f_m(y)\nu_0(dy)}-\sqrt{\int_{J_j}f(y)\nu_0(dy)}\bigg)^{\!2}\,\bigg)\le\frac{T_n}{2}\bigg(\sqrt{\int_{J_j}\hat f_m(y)\nu_0(dy)}-\sqrt{\int_{J_j}f(y)\nu_0(dy)}\bigg)^{\!2}$$
$$\le\frac{T_n}{2}\,\frac{\Big(\int_{J_j}\big(f(y)-\hat f_m(y)\big)\nu_0(dy)\Big)^2}{\kappa\,\nu_0(J_j)}\le\frac{T_n}{2\kappa}\int_{J_j}\big(f(y)-\hat f_m(y)\big)^2\nu_0(dy).$$
Hence,
$$H\Big(\bigotimes_{j=2}^m Q_j^f,\bigotimes_{j=2}^m\hat Q_j^f\Big)\le\sqrt{\frac{T_n}{\kappa}\int_{I\setminus[0,\varepsilon_m]}\big(f(y)-\hat f_m(y)\big)^2\nu_0(dy)}.$$
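The first display in the proof uses the closed form of the Hellinger distance between Poisson laws, $H^2(\mathscr P(\lambda),\mathscr P(\mu))=1-\exp\big(-(\sqrt\lambda-\sqrt\mu)^2/2\big)$, applied with $\lambda=T_n\int_{J_j}\hat f_m\,d\nu_0$ and $\mu=T_n\int_{J_j}f\,d\nu_0$. A numerical sanity check of this identity against the defining series $1-\sum_k\sqrt{p_kq_k}$; the values of $\lambda,\mu$ and the truncation level are arbitrary:

```python
import math

def hellinger2_series(lam, mu, kmax=500):
    """1 - sum_k sqrt(p_k q_k) for two Poisson pmfs, truncated at kmax."""
    s = 0.0
    for k in range(kmax):
        log_pk = -lam + k * math.log(lam) - math.lgamma(k + 1)
        log_qk = -mu + k * math.log(mu) - math.lgamma(k + 1)
        s += math.exp(0.5 * (log_pk + log_qk))
    return 1.0 - s

def hellinger2_closed(lam, mu):
    """Closed form H^2 = 1 - exp(-(sqrt(lam) - sqrt(mu))**2 / 2)."""
    return 1.0 - math.exp(-0.5 * (math.sqrt(lam) - math.sqrt(mu)) ** 2)

lam, mu = 7.3, 9.1
# The two expressions agree up to truncation error, and the elementary bound
# 1 - exp(-x) <= x gives exactly the first inequality used in the proof.
```

Working with logarithms of the probability mass functions avoids overflow in $k!$ for moderate truncation levels.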
Lemma 4.3. Let $\hat\nu_m$ and $\bar\nu_m$ be the Lévy measures defined as in (5) and (13), respectively. For every $f\in\mathscr F$, there exists a Markov kernel $K$ such that
$$K P_{T_n}^{(\gamma^{\bar\nu_m-\nu_0},0,\bar\nu_m)}=P_{T_n}^{(\gamma^{\hat\nu_m-\nu_0},0,\hat\nu_m)}.$$
Proof. By construction, $\bar\nu_m$ and $\hat\nu_m$ coincide on $[0,\varepsilon_m]$. Let us denote by $\bar\nu_m^{res}$ and $\hat\nu_m^{res}$ the restrictions to $I\setminus[0,\varepsilon_m]$ of $\bar\nu_m$ and $\hat\nu_m$, respectively; then it is enough to prove:
$$K P_{T_n}^{(\gamma^{\bar\nu_m^{res}-\nu_0},0,\bar\nu_m^{res})}=P_{T_n}^{(\gamma^{\hat\nu_m^{res}-\nu_0},0,\hat\nu_m^{res})}.$$
First of all, let us observe that the kernel $M$:
$$M(x,A)=\sum_{j=2}^m\mathbb I_{J_j}(x)\int_A V_j(y)\nu_0(dy),\quad x\in I\setminus[0,\varepsilon_m],\ A\in\mathcal B(I\setminus[0,\varepsilon_m]),$$
is defined in such a way that $M\bar\nu_m^{res}=\hat\nu_m^{res}$. Indeed, for all $A\in\mathcal B(I\setminus[0,\varepsilon_m])$,
$$M\bar\nu_m^{res}(A)=\sum_{j=2}^m\int_{J_j}M(x,A)\,\bar\nu_m^{res}(dx)=\sum_{j=2}^m\int_{J_j}\Big(\int_A V_j(y)\nu_0(dy)\Big)\bar\nu_m^{res}(dx)$$
$$=\sum_{j=2}^m\Big(\int_A V_j(y)\nu_0(dy)\Big)\nu(J_j)=\int_A\hat f_m(y)\nu_0(dy)=\hat\nu_m^{res}(A).\qquad(16)$$
Observe that $(\gamma^{\bar\nu_m^{res}-\nu_0},0,\bar\nu_m^{res})$ and $(\gamma^{\hat\nu_m^{res}-\nu_0},0,\hat\nu_m^{res})$ are Lévy triplets associated with compound Poisson processes, since $\bar\nu_m^{res}$ and $\hat\nu_m^{res}$ are finite Lévy measures. The Markov kernel $K$ interchanging the laws of the Lévy processes is constructed explicitly in the case of compound Poisson processes.
Indeed, if $\bar X$ is the compound Poisson process having Lévy measure $\bar\nu_m^{res}$, then $\bar X_t=\sum_{i=1}^{N_t}\bar Y_i$, where $N_t$ is a Poisson process of intensity $\iota_m:=\bar\nu_m^{res}(I\setminus[0,\varepsilon_m])$ and the $\bar Y_i$ are i.i.d. random variables with probability law $\frac{1}{\iota_m}\bar\nu_m^{res}$. Moreover, given a trajectory of $\bar X$, both the trajectory $(n_t)_{t\in[0,T_n]}$ of the Poisson process $(N_t)_{t\in[0,T_n]}$ and the realizations $\bar y_i$ of $\bar Y_i$, $i=1,\dots,n_{T_n}$, are uniquely determined. This allows us to construct $n_{T_n}$ i.i.d. random variables $\hat Y_i$ as follows:
for every realization $\bar y_i$ of $\bar Y_i$, we define the realization $\hat y_i$ of $\hat Y_i$ by drawing it according to the probability law $M(\bar y_i,\cdot)$. Hence, thanks to (16), the $(\hat Y_i)_i$ are i.i.d. random variables with probability law $\frac{1}{\iota_m}\hat\nu_m^{res}$. The desired Markov kernel $K$ (defined on the Skorokhod space) is then given by:
$$K:(\bar X_t)_{t\in[0,T_n]}\longmapsto\Big(\hat X_t:=\sum_{i=1}^{N_t}\hat Y_i\Big)_{t\in[0,T_n]}.$$
Finally, observe that, since
$$\iota_m=\int_{I\setminus[0,\varepsilon_m]}\bar f_m(y)\nu_0(dy)=\int_{I\setminus[0,\varepsilon_m]}f(y)\nu_0(dy)=\int_{I\setminus[0,\varepsilon_m]}$$