NONPARAMETRIC REGRESSION ESTIMATION ONTO A POISSON POINT PROCESS COVARIATE


DOI:10.1051/ps/2014023 www.esaim-ps.org


^∗

Benoît Cadre¹ and Lionel Truquet²

Abstract. Let $Y$ be a real random variable and $X$ be a Poisson point process. We investigate rates of convergence of a nonparametric estimate $\hat r(x)$ of the regression function $r(x) = E(Y|X = x)$, based on $n$ independent copies of the pair $(X, Y)$. The estimator $\hat r$ is constructed using a Wiener–Itô decomposition of $r(X)$. In this infinite-dimensional setting, we first obtain a finite sample bound on the expected squared difference $E(\hat r(X) - r(X))^2$. Then, under a condition ensuring that the model is genuinely infinite-dimensional, we obtain the exact rate of convergence of $\ln E(\hat r(X) - r(X))^2$.

Mathematics Subject Classification. 62G05, 62G08.

Received March 13, 2014. Revised September 18, 2014.

1. Introduction

1.1. Functional regression estimation

Let $S$ be a measurable space, and let the data $(X_1, Y_1), \dots, (X_n, Y_n)$ be independent $S \times \mathbb{R}$-valued random variables with the same distribution as a generic pair $(X, Y)$ such that $E|Y| < \infty$. In the regression estimation problem, the goal is to estimate the regression function $r(X) = E(Y|X)$ using the data.

In the classical setting, each covariate $X_i$ is supposed to be a collection of numerical experiments represented by a finite-dimensional vector. Thus, to date, most of the results pertaining to regression estimation have been reported in the finite-dimensional case where $S = \mathbb{R}^d$. We refer the reader to the book by Györfi et al. [7] for a comprehensive introduction to the subject and an overview of most standard methods in $\mathbb{R}^d$.

However, in an increasing number of practical applications, input data items take more complicated forms. In functional data analysis, $S$ is a set of curves and, in this context, regression estimation has applications in a wide class of problems, among them speech recording, the analysis of patient visits to a hospital, and option pricing. The last few years have witnessed important developments in both the theory and practice of functional data analysis, and many traditional statistical tools have been adapted to handle functional inputs. The book by Ramsay and Silverman [13] provides a presentation of this area.

Keywords and phrases. Regression estimation, Poisson point process, Wiener–Itô decomposition, rates of convergence.

∗The authors wish to thank the referee and the Associate Editor for valuable comments and insightful suggestions.

1 IRMAR, ENS Rennes, CNRS, UEB, Campus de Ker Lann, Avenue Robert Schuman, 35170 Bruz, France.

[email protected]

2 IRMAR, Ensai, CNRS, UEB, Campus de Ker Lann, Avenue Robert Schuman, 35170 Bruz, France.[email protected]

Article published by EDP Sciences. © EDP Sciences, SMAI 2015


In the infinite-dimensional setting, the regression problem faces new challenges which require a change of methodology (see [2]). Curiously, despite a huge research activity in the area of infinite-dimensional data analysis, few attempts have been made to connect it with the rich theory of stochastic processes, which provides both a wide class of models and powerful tools. This approach, which uses stochastic process theory for the benefit of nonparametric estimation, has been studied in the closely related problem of supervised classification; in this direction, we refer the reader to the recent papers by Baíllo et al. [5], Biau et al. [4] and Cadre [6].

Among the classical models from the theory of time-dependent stochastic processes, the case of Poisson processes is of great interest. Here, $S$ is the set of counting paths on a subset of $\mathbb{R}_+$. In epidemiology for example, the observed curve $X_i$ represents the dates of the patient's visits to the doctor and the response variable $Y_i$ provides quantitative information on the health status of the patient.

More generally, we consider in this paper the case of a Poisson point process covariate, which corresponds to the situation where each measurement reports the locations of every individual event. In this setting, the state space $S$ is identified with the so-called Poisson space over some measurable space $\mathbb{X}$, i.e.

$$S = \Big\{ \sum_{i=1}^{n} \delta_{x_i},\ n \in \mathbb{N} \text{ and } x_i \in \mathbb{X} \Big\},$$

where $\delta_x$ stands for the Dirac measure at $x$.

1.2. Poisson point process regression estimation

In the sequel, the covariate $X$ is a Poisson point process (see [9]) on a domain $\mathbb{X} \subset \mathbb{R}^d$ with mean measure $\mu$, where $\mu$ is a $\sigma$-finite measure on the Borel $\sigma$-field $\mathcal{X}$ of $\mathbb{X}$. As seen before, this means that $X$ belongs to the space of integer-valued $\sigma$-finite measures on $\mathbb{X}$ and satisfies:

• for any $A \in \mathcal{X}$, the number $X_A$ of points of $X$ lying in $A$ has a Poisson distribution with parameter $\mu(A)$;

• for any family of disjoint sets $A_1, \dots, A_\ell \in \mathcal{X}$, $X_{A_1}, \dots, X_{A_\ell}$ are independent random variables.
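The two defining properties above can be made concrete with a small simulation. The following is a minimal sketch, not part of the original paper: it assumes a domain $[-M, M]^d$ and a bounded intensity, and draws the process by thinning a homogeneous one. The names `sample_poisson_process`, `count_in_box`, `intensity` and `intensity_max` are hypothetical.

```python
import numpy as np

def sample_poisson_process(intensity, intensity_max, M, d, rng):
    """Draw one Poisson point process on [-M, M]^d by thinning.

    intensity: callable mapping an (m, d) array to m nonnegative values,
    assumed bounded above by intensity_max (an assumption of this sketch).
    Returns an (m', d) array of point locations.
    """
    volume = (2.0 * M) ** d
    # Step 1: homogeneous process with constant intensity intensity_max.
    m = rng.poisson(intensity_max * volume)
    candidates = rng.uniform(-M, M, size=(m, d))
    # Step 2: keep each candidate with probability intensity / intensity_max.
    keep = rng.uniform(size=m) * intensity_max <= intensity(candidates)
    return candidates[keep]

def count_in_box(points, lower, upper):
    """Number X_A of points falling in the box A = [lower, upper)."""
    inside = np.all((points >= lower) & (points < upper), axis=1)
    return int(inside.sum())
```

For a Borel set $A$, `count_in_box` returns the count $X_A$; averaging it over many independent draws should approach $\mu(A)$, the integral of the intensity over $A$.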

In the case $\mathbb{X} = \mathbb{R}$, Itô's famous chaos expansion (see [8,15]) says that every square integrable and $\sigma(X)$-measurable random variable can be decomposed as a sum of multiple stochastic integrals, called chaos. This result has been generalized by Nualart and Vives [12], and more recently by Last and Penrose [10].

We now recall some basic facts about the chaotic decomposition in the Poisson space. Fix $k \ge 1$. Provided $g \in L^2(\mu^{\otimes k})$, we can define the $k$th chaos $I_k(g)$ associated with $g$, namely

$$I_k(g) = \int_{\Delta_k} g \, d(X-\mu)^{\otimes k}, \tag{1.1}$$

where $\Delta_k = \{x \in \mathbb{X}^k : x_i \ne x_j \text{ for all } i \ne j\}$. Interestingly, if $g \in L^2(\mu^{\otimes k})$ and $h \in L^2(\mu^{\otimes \ell})$ for $k, \ell \ge 1$, we have

$$E\, I_k(g) I_\ell(h) = k! \int_{\mathbb{X}^k} \bar g\, \bar h \, d\mu^{\otimes k}\, \mathbf{1}_{\{k=\ell\}} \quad \text{and} \quad E\, I_k(g) = 0, \tag{1.2}$$

where $\bar g$ and $\bar h$ are the symmetrizations of $g$ and $h$, that is, for all $(x_1, \dots, x_k) \in \mathbb{X}^k$:

$$\bar g(x_1, \dots, x_k) = \frac{1}{k!} \sum_{\sigma} g(x_{\sigma(1)}, \dots, x_{\sigma(k)}),$$

the sum being taken over all permutations $\sigma = (\sigma(1), \dots, \sigma(k))$ of $\{1, \dots, k\}$, and similarly for $\bar h$. In particular, note that $I_k(g)$ is a square integrable random variable. In Nualart and Vives ([12], p. 160), it is proved that every square integrable $\sigma(X)$-measurable random variable can be decomposed as an infinite sum of chaos. Applied to our regression problem, this statement reads

$$r(X) = EY + \sum_{k \ge 1} \frac{1}{k!} I_k(f_k), \tag{1.3}$$

where equality holds in $L^2$, provided $EY^2 < \infty$. In the above formula, each $f_k$ is an element of $L^2_{\mathrm{sym}}(\mu^{\otimes k})$ (the subset of symmetric functions in $L^2(\mu^{\otimes k})$), and the decomposition is unique. Last and Penrose [10] proved that each $f_k$ can be expressed through a difference operator of order $k$.
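The symmetrization $\bar g$ entering the isometry (1.2) is simply an average over all orderings of the arguments. As an illustration only (the helper name `symmetrize` and the example function are hypothetical, not from the paper), it can be sketched as follows.

```python
import itertools
import math

def symmetrize(g, k):
    """Return g_bar, the average of g over all permutations of its k arguments."""
    perms = list(itertools.permutations(range(k)))

    def g_bar(*xs):
        if len(xs) != k:
            raise ValueError("expected exactly k arguments")
        # Sum g over every reordering of the arguments, then divide by k!.
        total = sum(g(*(xs[s] for s in sigma)) for sigma in perms)
        return total / math.factorial(k)

    return g_bar

# A hypothetical non-symmetric function of three variables.
g = lambda x1, x2, x3: x1 + 2.0 * x2 * x3
g_bar = symmetrize(g, 3)
```

The resulting `g_bar` takes the same value for every reordering of its arguments, and a function that is already symmetric is left unchanged. The cost grows like $k!$, so this direct approach is only practical for small $k$.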

Based on independent copies of $(X, Y)$, we shall construct a nonparametric estimate $\hat r$ of $r$ with the help of decomposition (1.3). The next section is devoted to the model, and to the construction and statistical properties of $\hat r$. In particular, we obtain a finite sample bound on the mean squared error $E(\hat r(X) - r(X))^2$. Moreover, we prove that if the model is genuinely infinite-dimensional in the sense that $\inf_{k \ge 1} \|f_k\|_{L^2(\mu^{\otimes k})} > 0$ then, under some regularity conditions, and letting $\ln_2 n = \ln(\ln n)$:

$$\lim_{n \to \infty} \frac{\ln E(\hat r(X) - r(X))^2}{\sqrt{\ln n \, \ln_2 n}} = -\sqrt{\frac{\alpha}{2d}},$$

for some $\alpha \in\, ]0,1[$. The last two sections contain the proofs.

In this paper, we focus on Poisson point processes, since they model many interesting situations that naturally arise in statistics. However, we believe that our work may be generalized, for example in the direction of Lévy processes, for which a chaotic decomposition property also holds; see [1].

2. Regression estimate

2.1. Model and heuristic of the estimate

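Basic assumptions on the model. We assume throughout that $\mathbb{X}$ is a compact set, say $\mathbb{X} \subset [-M, M]^d$, and that the mean measure $\mu$ has a density $\varphi$ with respect to the Lebesgue measure $\lambda$ on $\mathbb{X}$, that is, $\varphi : \mathbb{X} \to \mathbb{R}_+$ is such that for all Borel sets $A \subset \mathbb{X}$:

$$E X_A = \int_A \varphi \, d\lambda.$$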

Heuristics. To give a short presentation of the heuristics behind the estimate of $r$, we assume for simplicity that $\varphi$ is a known positive function. Suppose also that the intensity $\varphi$ and the $f_k$'s defined by (1.3) are bounded functions. Let $W$ be a bounded density on $\mathbb{X}$, $h = h(n) > 0$ and, for all $x \in \mathbb{X}$:

$$W_h(x) = \frac{1}{h^d} W\Big(\frac{x}{h}\Big). \tag{2.1}$$

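For a real-valued function $g$ defined on $\mathbb{X}$, the notation $g^{\otimes k}$ denotes the real-valued function on $\mathbb{X}^k$ such that

$$g^{\otimes k}(x) = \prod_{i=1}^{k} g(x_i), \quad x = (x_1, \dots, x_k) \in \mathbb{X}^k.$$

By relations (1.2) and (1.3), we have for all $x \in \mathbb{X}^k$ and $k \ge 1$:

$$E\, Y I_k\big(W_h^{\otimes k}(x - \cdot)\big) = E\, r(X) I_k\big(W_h^{\otimes k}(x - \cdot)\big) = \sum_{\ell \ge 1} \frac{1}{\ell!} E\, I_\ell(f_\ell) I_k\big(W_h^{\otimes k}(x - \cdot)\big) = \int_{\mathbb{X}^k} f_k \, \bar W_h^{\otimes k}(x - \cdot)\, \varphi^{\otimes k} \, d\lambda^{\otimes k},$$

where $\bar W_h^{\otimes k}(x - \cdot)$ is the symmetrization of the function $W_h^{\otimes k}(x - \cdot)$. Since $f_k$ is a symmetric function, we can write

$$E\, Y I_k\big(W_h^{\otimes k}(x - \cdot)\big) = \int_{\mathbb{X}^k} f_k \, W_h^{\otimes k}(x - \cdot)\, \varphi^{\otimes k} \, d\lambda^{\otimes k}.$$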

Thus, under smoothness assumptions on $\varphi$ and $f_k$, the right-hand side converges to $f_k(x)\varphi^{\otimes k}(x)$, provided $h \to 0$. From the left-hand side, we thus deduce that a kernel-type estimator of $f_k(x)$ based on independent copies $(X_1, Y_1), \dots, (X_n, Y_n)$ of $(X, Y)$ is

$$\tilde f_k(x) = \frac{1}{n} \sum_{i=1}^{n} Y_i \int_{\Delta_k} \frac{W_h^{\otimes k}(x - \cdot)}{\varphi^{\otimes k}(x)} \, d(X_i - \mu)^{\otimes k}.$$

By (1.1), the $k$th chaos $I_k(f_k)$ is thus estimated by

$$\tilde I_k = \int_{\Delta_k} \tilde f_k(x) \, d(X - \mu)^{\otimes k}, \tag{2.2}$$

which, by (1.3), gives the estimator $\tilde r$ of $r$ defined by

$$\tilde r(X) = \bar Y_n + \sum_{k=1}^{N(n)} \frac{1}{k!} \tilde I_k, \tag{2.3}$$

where $N(n)$ tends to infinity and, as usual,

$$\bar Y_n = \frac{1}{n} \sum_{i=1}^{n} Y_i.$$

However, this construction requires the unrealistic assumption that $\varphi$ is known. Thus we shall first present a nonparametric estimator of $\varphi$, and then adapt the previous idea to this context.

2.2. Construction of the estimator

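Estimation of $\varphi$. The construction of the nonparametric estimate of $\varphi$ is based on Mecke's formula (see [11]), which states in particular that

$$E \int_{\mathbb{X}} \gamma \, dX = \int_{\mathbb{X}} \gamma \, d\mu = \int_{\mathbb{X}} \gamma\, \varphi \, d\lambda,$$

provided $\gamma : \mathbb{X} \to \mathbb{R}$ is in $L^1(\mu)$. Accordingly, we define the nonparametric estimator $\hat\varphi$ of $\varphi$ by

$$\hat\varphi(x) = \frac{1}{n} \sum_{j=1}^{n} \int_{\mathbb{X}} K_b(x, u)\, X_j(du), \quad x \in \mathbb{X},$$

where, for all $x = (x_1, \dots, x_d)$ and $u = (u_1, \dots, u_d) \in \mathbb{X}$,

$$K_b(x, u) = \prod_{j=1}^{d} \big( J_b(x_j - u_j) + J_b(2M + x_j + u_j) + J_b(2M - x_j - u_j) \big). \tag{2.4}$$

In the above formula, $J$ is a continuous and symmetric density on $[-1, 1]$ and, for the bandwidth $b = b(n) > 0$ such that $b \to 0$ as $n \to \infty$:

$$J_b(y) = \frac{1}{b} J\Big(\frac{y}{b}\Big), \quad y \in [-b, b].$$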

This particular construction is a classical way to avoid bias on the boundary of $\mathbb{X}$, a well-known drawback of usual convolution kernel estimators that occurs, for example, when the intensity is positive on the boundary of $\mathbb{X}$. In this case, we need to correct the bias using a one-sided kernel near the boundary (see p. 30 in the book by Silverman [14]).
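To make the reflected kernel (2.4) concrete, here is a minimal numerical sketch. It assumes the Epanechnikov density for $J$ (the paper only requires a continuous symmetric density on $[-1,1]$), and the function names `epanechnikov`, `J_b`, `K_b` and `intensity_estimate` are hypothetical.

```python
import numpy as np

def epanechnikov(y):
    """One possible choice of J: a continuous symmetric density on [-1, 1]."""
    y = np.asarray(y, dtype=float)
    return np.where(np.abs(y) <= 1.0, 0.75 * (1.0 - y ** 2), 0.0)

def J_b(y, b):
    """Rescaled kernel J_b(y) = J(y / b) / b."""
    return epanechnikov(np.asarray(y, dtype=float) / b) / b

def K_b(x, u, b, M):
    """Reflected product kernel of (2.4); x and u have shape (..., d)."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    terms = J_b(x - u, b) + J_b(2 * M + x + u, b) + J_b(2 * M - x - u, b)
    return np.prod(terms, axis=-1)

def intensity_estimate(x, samples, b, M):
    """hat phi(x): average over the n samples of the sum of K_b(x, p) over points p."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for pts in samples:  # each element is an (m_j, d) array of points
        if len(pts) > 0:
            total += K_b(x, pts, b, M).sum()
    return total / len(samples)
```

The point of the two reflected terms is that, for $d = 1$ and $b \le 2M$, the function $u \mapsto K_b(x, u)$ integrates to one over $[-M, M]$ even when $x$ lies within distance $b$ of the boundary, whereas the plain convolution kernel $J_b(x - u)$ loses mass there.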

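Then, we define for all $i = 1, \dots, n$ the leave-one-out nonparametric estimator $\hat\varphi_i$ of $\varphi$ by

$$\hat\varphi_i(x) = \hat\varphi(x) - \frac{1}{n} \int_{\mathbb{X}} K_b(x, u)\, X_i(du) = \frac{1}{n} \sum_{j=1,\, j \ne i}^{n} \int_{\mathbb{X}} K_b(x, u)\, X_j(du),$$

for $x \in \mathbb{X}$. The leave-one-out procedure is only considered here for technical reasons.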

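Chaos estimate. Now consider the vanishing sequences $\rho = \rho(n) > 0$ and $h = h(n) > 0$. Following (2.2), the leave-one-out estimator of the $k$th chaos $I_k(f_k)$ is $\hat I_k$, such that

$$\hat I_k = \frac{1}{n} \sum_{i=1}^{n} Y_i \int_{\Delta_k} \left[ \int_{\Delta_k} \frac{W_h^{\otimes k}(x - \cdot)}{(\hat\varphi_i + \rho)^{\otimes k}(x)} \, (dX_i - \hat\varphi_i \, d\lambda)^{\otimes k} \right] (dX - \hat\varphi_i \, d\lambda)^{\otimes k}(x),$$

where $W_h$ is defined by (2.1). In the sequel, we assume for simplicity that $W$ has a compact support.

Regression estimate. Finally, following the idea drawn by (2.3), the estimator $\hat r$ of $r$ is

$$\hat r(X) = \bar Y_n + \sum_{k=1}^{N(n)} \frac{1}{k!} \hat I_k,$$

where $N(n)$ tends to infinity.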

2.3. Result

In the sequel, $\|\cdot\|$ is the Euclidean norm on any $\mathbb{R}^p$. We assume that $(X, Y)$ is independent of the sample $(X_1, Y_1), \dots, (X_n, Y_n)$. We now introduce the assumptions on the model.

H1. $Y$ is a bounded random variable.

H2. $\inf_{\mathbb{X}} \varphi > 0$ and there exists $L_1 > 0$ such that for all $x, y \in \mathbb{X}$:

$$|\varphi(x) - \varphi(y)| \le L_1 \|x - y\|.$$

H3. There exists a constant $L_2 > 0$ such that for all $k \ge 1$ and $x, y \in \mathbb{X}^k$:

$$|f_k(x) - f_k(y)| \le L_2^k \|x - y\|.$$

Recall that $h = h(n)$, $b = b(n)$ and $\rho = \rho(n)$ are vanishing sequences of positive numbers. Moreover, $N(n)$ tends to infinity; for simplicity, we write $N$ instead of $N(n)$.

H4. There exist two constants $C_1, C_2 > 0$ such that $n b^{d+2}$, $n h^{dN+1}$ and $n b^d \rho^3 / N$ are bounded below by $C_1$, and $b^2 + \rho \le C_2 h^{dN+1}$.

Assumptions H1–H3 are classical in nonparametric estimation. H1 has been introduced for technical reasons; however, similar results can be proved under tail assumptions on $Y$. Observe moreover that, since $\mathbb{X}$ is bounded, H3 holds when, for instance, the pair $(X, Y)$ is such that each $f_k$ can be written as $f_k(x) = \theta^{\otimes k}(x)$ for all $x \in \mathbb{X}^k$, where $\theta$ is a bounded Lipschitz function on $\mathbb{X}$.

Finally, it is easily seen that H4 holds for $b = n^{-\gamma}$, $\rho = n^{-\beta}$ and $h$, $N$ defined by formulas (2.6) below, under the additional constraints that $0 < \gamma < 1/(d+2)$, $0 < 3\beta < 1 - d\gamma$ and $\alpha < \min(1, \beta, 2\gamma)$. Note that the last constraint is sufficient for $b^2 + \rho \le C_2 h^{dN+1}$ to hold because, under the condition $u_N/(N \ln N) \to 0$ as $n \to \infty$:

$$\ln h^{dN+1} = -\alpha \ln n + o(\ln n).$$

Moreover, since

$$\ln\big(n h^{dN+1}\big) = (1 - \alpha) \ln n + o(\ln n),$$

we see that $\alpha < 1$ is a sufficient condition for $n h^{dN+1} \ge C_1$ to hold.
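The expansion of $\ln h^{dN+1}$ used above can be checked in a few lines. The following worked computation is a sketch that reads (2.6) as $N$ being the integer part of $\sqrt{2\alpha \ln n/(d \ln_2 n)}$ and $h = e^{-N\ln N - u_N}$ with $u_N = o(N \ln N)$; the integer-part convention is an assumption and does not affect the asymptotics.

```latex
\begin{align*}
\ln h^{dN+1} &= -(dN+1)\,(N\ln N + u_N) = -\,d\,N^2 \ln N\,(1+o(1)),\\
N^2 &= \frac{2\alpha \ln n}{d\,\ln_2 n}\,(1+o(1)), \qquad
\ln N = \tfrac12 \ln_2 n\,(1+o(1)),\\
d\,N^2 \ln N &= d\cdot\frac{2\alpha \ln n}{d\,\ln_2 n}\cdot \tfrac12 \ln_2 n\,(1+o(1))
 = \alpha \ln n\,(1+o(1)).
\end{align*}
```

Hence $\ln h^{dN+1} = -\alpha \ln n + o(\ln n)$, and adding $\ln n$ gives $\ln(n h^{dN+1}) = (1-\alpha)\ln n + o(\ln n)$.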

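Theorem 2.1. Assume that H1–H4 hold. Then, there exists $C \ge 1$ such that

$$E(\hat r(X) - r(X))^2 \le C^N \Big( h + \frac{1}{N!} \Big). \tag{2.5}$$

In particular, for the following choices:

$$N = \left\lfloor \sqrt{\frac{2\alpha \ln n}{d \ln_2 n}} \right\rfloor, \qquad h = e^{-N \ln N - u_N}, \tag{2.6}$$

where $\alpha > 0$ and $u_N \ge 0$ is such that $u_N/(N \ln N) \to 0$ as $n \to \infty$, we have:

$$\limsup_{n \to \infty} \frac{\ln E(\hat r(X) - r(X))^2}{\sqrt{\ln n \, \ln_2 n}} \le -\sqrt{\frac{\alpha}{2d}}.$$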

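Theorem 2.1 entails that, for all $\omega < \sqrt{\alpha/(2d)}$ and $n$ large enough, the mean squared error between $\hat r(X)$ and $r(X)$ is at most

$$\exp\Big( -\omega \sqrt{\ln n \, \ln_2 n} \Big).$$

Note that this rate is obtained for those bandwidths $h$ that satisfy, as $n \to \infty$:

$$\ln h \sim -\sqrt{\frac{\alpha}{2d} \ln n \, \ln_2 n}.$$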
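To get a feeling for how slowly these tuning parameters move with $n$, the choices of (2.6) can be evaluated numerically. This is a sketch only: it reads $N$ as the integer part of $\sqrt{2\alpha \ln n/(d \ln_2 n)}$, forces $N \ge 1$, and takes $u_N = 0$ by default; the function names `tuning_parameters` and `log_rate` are hypothetical.

```python
import math

def tuning_parameters(n, alpha, d, u_N=0.0):
    """Return (N, h) following (2.6); requires n > e so that ln(ln n) > 0."""
    ln_n = math.log(n)
    ln2_n = math.log(ln_n)
    N = math.floor(math.sqrt(2.0 * alpha * ln_n / (d * ln2_n)))
    N = max(N, 1)  # guard for small n (an assumption of this sketch)
    h = math.exp(-N * math.log(N) - u_N)
    return N, h

def log_rate(n, alpha, d):
    """Leading term -sqrt(alpha / (2d) * ln n * ln ln n) of ln E(r_hat - r)^2."""
    ln_n = math.log(n)
    return -math.sqrt(alpha / (2.0 * d) * ln_n * math.log(ln_n))
```

For instance, with $n = 10^6$, $\alpha = 0.5$ and $d = 1$, this gives $N = 2$ chaos terms and $h = 0.25$, which illustrates that the number of estimated chaoses grows very slowly with the sample size.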

Hence, the general finding here is that the rate of convergence of the mean squared error is much slower than the traditional finite-dimensional rate (see [7]), but faster than the rates obtained by Biau et al. [3] in the infinite-dimensional setting. This, of course, is explained by the fact that we fully exploit the particular nature of the covariate via the chaotic decomposition of square integrable $\sigma(X)$-measurable random variables, whereas Biau et al. [3] utilize a $k$-nearest neighbor estimator whose construction does not depend on the law of $X$.

For fixed $k$, estimation of $f_k$ more or less requires the usual tools of nonparametric estimation in dimension $dk$. In particular, an optimal estimator of $f_k$ could be obtained for this specific problem. However, the mean squared error in the estimation of $r(X)$ depends not only on the risks in the estimation of the $f_k$'s, but also on the remainder term of the chaotic decomposition, which plays a key role. Thus, we proceed to a simultaneous optimization of $N$ and the risks of the estimates of $f_1, \dots, f_N$, leading to non-standard choices for the bandwidths. Nevertheless, due to the lack of information on the minimax rate of convergence in this infinite-dimensional setting, we cannot guarantee that this procedure leads to an optimal estimate for $r(X)$.

Our task is now to prove that the rate of Theorem 2.1 is optimal in a genuinely infinite-dimensional setting. With this goal, we see that it is necessary to strengthen the conditions on the model. Indeed, if all the $f_k$'s for $k \ge k_0$ have a null $L^2_{\mathrm{sym}}(\mu^{\otimes k})$-norm, then $r$ can be decomposed into a finite sum of chaos (see (1.3)), and hence we are faced with a finite-dimensional estimation problem, for which the rate of Theorem 2.1 may not be optimal. To avoid this situation, we assume that all the $f_k$'s have an $L^2(\mu^{\otimes k})$-norm greater than a positive constant.

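Theorem 2.2. Assume that H1–H4 hold, and $\inf_{k \ge 1} \|f_k\|_{L^2(\mu^{\otimes k})} > 0$. If $N$ and $h$ are given by (2.6) with the additional assumption on $u_N$ that $u_N/N \to \infty$, we have

$$\lim_{n \to \infty} \frac{\ln E(\hat r(X) - r(X))^2}{\sqrt{\ln n \, \ln_2 n}} = -\sqrt{\frac{\alpha}{2d}}.$$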

To the best of our knowledge, this is the ﬁrst exact (logarithmic) rate in the regression estimation problem with a Poisson point process covariate. However, it is an open problem to know whether this rate is optimal over the whole class of regression estimates.


3. Proofs of theorems 2.1 and 2.2

In the sequel, $\kappa > 0$ is such that $\sup_{\mathbb{X}} \varphi \, \sup_{\mathbb{R}^{2d}} K \le \kappa$ and we assume for simplicity that the constants $C_1, C_2$ of assumption H4 are equal to 1. Moreover, we assume that $\rho$, $b$ and $h$ are smaller than 1 (recall that they vanish as $n$ tends to infinity). Finally, we let for all $k \ge 1$:

$$\beta_k = \rho^{-k} \exp\Big( -\frac{n b^d \rho^2}{4\kappa} \Big).$$

We start the section with the following result, whose proof is presented later in this section.

Lemma 3.1. Assume that assumptions H1–H3 hold and $n b^{d+2} \ge 1$. Then, there exists $C \ge 1$ such that for all $k \ge 1$:

$$E\big( \hat I_k - I_k(f_k) \big)^2 \le C^k \Big( (k!)^2\, \frac{b^2 + \beta_{2k}}{h^{dk}} + h + \frac{k!}{n h^{dk}} \Big).$$

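Proof of Theorem 2.1. In the sequel, $C \ge 1$ denotes a constant whose value may change from line to line. According to Jensen's inequality and Lemma 3.1:

$$E\Big( \sum_{k=1}^{N} \frac{1}{k!} \big( \hat I_k - I_k(f_k) \big) \Big)^2 \le C^N \sum_{k=1}^{N} \Big( \frac{b^2 + \beta_{2k}}{h^{dk}} + \frac{h}{(k!)^2} + \frac{1}{k!\, n h^{dk}} \Big) \le C^N \Big( \frac{b^2 + \beta_{2N}}{h^{dN}} + h + \frac{1}{n h^{dN}} \Big). \tag{3.1}$$

Moreover, according to Theorem 4.2 in Last and Penrose [10],

$$E\Big( \sum_{k \ge N+1} \frac{I_k(f_k)}{k!} \Big)^2 \le \frac{1}{N!}\, E \int_{\mathbb{X}^{N+2}} \big( D^{N+2}_x r(X) \big)^2 \varphi^{\otimes (N+2)}(x)\, \lambda^{\otimes (N+2)}(dx), \tag{3.2}$$

where for all $x \in \mathbb{X}^{N+2}$, $D^{N+2}_x$ denotes the difference operator of order $N+2$, that is, if $\delta_z$ is the Dirac mass at $z$:

$$D^{N+2}_x r(X) = \sum_{S \subset \{1, \dots, N+2\}} (-1)^{N+2-|S|}\, r\Big( X + \sum_{s \in S} \delta_{x_s} \Big).$$

In the formula above, $|S|$ is the number of elements of $S$. But $|r|$ is bounded since $Y$ is bounded under H1. Hence, $|D^{N+2}_x r(X)| \le C 2^{N+2}$ and, by (3.2):

$$E\Big( \sum_{k \ge N+1} \frac{I_k(f_k)}{k!} \Big)^2 \le \frac{C^N}{N!}.$$

Putting all pieces together, we deduce from (3.1) and the above that

$$E(\hat r(X) - r(X))^2 \le C^N \Big( \frac{1}{n} + \frac{b^2 + \beta_{2N}}{h^{dN}} + h + \frac{1}{n h^{dN}} + \frac{1}{N!} \Big),$$

because $E(EY - \bar Y_n)^2 \le 1/n$. Now, under condition H4, which in particular states that $n b^d \rho^3 / N \ge 1$, we have for $n$ large enough:

$$\beta_{2N} \le \rho^{-2N} e^{3N \ln \rho} \le \rho. \tag{3.3}$$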

Moreover, since $n h^{dN+1} \ge 1$ and $b^2 + \rho \le h^{dN+1}$:

$$E(\hat r(X) - r(X))^2 \le C^N \Big( h + \frac{1}{N!} \Big), \tag{3.4}$$

hence the first part of the theorem.
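The difference operator appearing in (3.2) is an alternating sum over all subsets of the added points. As an illustration only, the sketch below evaluates it for a functional `F` of a finite point configuration; the helper `difference_operator` and the representation of a configuration as a Python list are assumptions of this sketch, not constructions from the paper.

```python
import itertools
import math

def difference_operator(F, config, new_points):
    """Evaluate D^m_x F(X) = sum over S of (-1)^(m - |S|) F(X + sum_{s in S} delta_{x_s}).

    config: list of points of X; new_points: the m points x_1, ..., x_m.
    """
    m = len(new_points)
    total = 0.0
    for size in range(m + 1):
        for S in itertools.combinations(range(m), size):
            # Add the Dirac masses indexed by S to the configuration.
            augmented = list(config) + [new_points[s] for s in S]
            total += (-1) ** (m - size) * F(augmented)
    return total
```

When `F` counts the points of the configuration, the first-order difference equals 1 and the second-order difference vanishes, as expected for a functional that lives in the first chaos. For a functional bounded by a constant $c$, the sum has $2^m$ terms, which gives the bound $|D^m_x F| \le c\, 2^m$ used in the proof.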

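Now consider the case where $N$ and $h$ are given by (2.6). We have

$$C^N h = e^{N \ln C - N \ln N - u_N} \le e^{-N \ln N + N \ln C}. \tag{3.5}$$

Furthermore, according to Stirling's formula,

$$\frac{C^N}{N!} \le e^{N \ln C + N - N \ln N}.$$

Hence by (3.4), we have for all $\varepsilon > 0$:

$$\ln E(\hat r(X) - r(X))^2 \le \ln\Big( e^{-N \ln N + N \ln C} + e^{-N \ln N + N(1 + \ln C)} \Big) \le -N \ln N + \ln\Big( e^{N \ln C} + e^{N(1 + \ln C)} \Big).$$

Moreover, by the very definition of $N$ given in (2.6):

$$N \ln N = \left\lfloor \sqrt{\frac{2\alpha \ln n}{d \ln_2 n}} \right\rfloor \ln \left\lfloor \sqrt{\frac{2\alpha \ln n}{d \ln_2 n}} \right\rfloor = \sqrt{\frac{\alpha}{2d} \ln n \, \ln_2 n}\; (1 + o(1)). \tag{3.6}$$

Putting all pieces together yields

$$\limsup_{n \to \infty} \frac{\ln E(\hat r(X) - r(X))^2}{\sqrt{\ln n \, \ln_2 n}} \le -\sqrt{\frac{\alpha}{2d}},$$

which is the desired result.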

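Proof of Theorem 2.2. For simplicity, we assume that $\inf_{k \ge 1} \|f_k\|_{L^2(\mu^{\otimes k})} \ge 1$. Then, by the triangle inequality and (1.2):

$$\sqrt{E(\hat r(X) - r(X))^2} = \sqrt{E\Big[ \sum_{k=1}^{N} \frac{1}{k!} \big( \hat I_k - I_k(f_k) \big) - \sum_{k \ge N+1} \frac{1}{k!} I_k(f_k) \Big]^2}$$

$$\ge \sqrt{E\Big( \sum_{k \ge N+1} \frac{1}{k!} I_k(f_k) \Big)^2} - \sqrt{E\Big( \sum_{k=1}^{N} \frac{1}{k!} \big( \hat I_k - I_k(f_k) \big) \Big)^2}$$

$$\ge \sqrt{\frac{1}{(N+1)!}} - \sqrt{E\Big( \sum_{k=1}^{N} \frac{1}{k!} \big( \hat I_k - I_k(f_k) \big) \Big)^2}.$$

Applying successively (3.1), (3.3) and H4, we first deduce that

$$E\Big( \sum_{k=1}^{N} \frac{1}{k!} \big( \hat I_k - I_k(f_k) \big) \Big)^2 \le C^N h = e^{-N \ln N + N \ln C - u_N}.$$

Moreover, by Stirling's formula:

$$\frac{1}{(N+1)!} \ge e^{-(N + 3/2) \ln(N+1) + N}.$$

Consequently, $E(\hat r(X) - r(X))^2$ is greater than

$$\Big( e^{-(N/2 + 3/4) \ln(N+1) + N/2} - e^{-(N/2) \ln N + (N/2) \ln C - u_N/2} \Big)^2.$$

Then, using the condition $u_N/N \to \infty$, we get with easy calculations that

$$\ln E(\hat r(X) - r(X))^2 \ge -N \ln N + cN,$$

for some constant $c > 0$. By (3.6), we deduce that

$$\liminf_{n \to \infty} \frac{\ln E(\hat r(X) - r(X))^2}{\sqrt{\ln n \, \ln_2 n}} \ge -\sqrt{\frac{\alpha}{2d}},$$

which, combined with Theorem 2.1, gives the result.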

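Fix $k \ge 1$ and denote, for all $x, y \in \mathbb{X}^k$:

$$\hat g_{k,i}(x, y) = \frac{W_h^{\otimes k}(x - y)}{(\hat\varphi_i + \rho)^{\otimes k}(x)}. \tag{3.7}$$

We also let, for all $i = 1, \dots, n$:

$$d\hat X_i = dX_i - \hat\varphi_i \, d\lambda, \qquad d\bar X_i = dX - \hat\varphi_i \, d\lambda, \tag{3.8}$$

$$d\tilde X_i = dX_i - \varphi \, d\lambda \quad \text{and} \quad d\tilde X = dX - \varphi \, d\lambda. \tag{3.9}$$

With this notation, we have:

$$\hat I_k = \frac{1}{n} \sum_{i=1}^{n} Y_i \int_{\Delta_k^2} \hat g_{k,i}(x, y)\, \hat X_i^{\otimes k}(dy)\, \bar X_i^{\otimes k}(dx).$$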

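Proof of Lemma 3.1. For simplicity, we shall assume in this proof that $|Y| \le 1$. Moreover, $C \ge 1$ denotes a constant whose value may change from line to line. With the help of notations (3.7)–(3.9), we let:

$$\hat J_1 = \frac{1}{n} \sum_{i=1}^{n} Y_i \int_{\Delta_k^2} \hat g_{k,i}(x, y)\, \tilde X_i^{\otimes k}(dy)\, \tilde X^{\otimes k}(dx),$$

$$\hat J_2 = \frac{1}{n} \sum_{i=1}^{n} Y_i \int_{\Delta_k^2} g_k(x, y)\, \tilde X_i^{\otimes k}(dy)\, \tilde X^{\otimes k}(dx),$$

where, for $x, y \in \mathbb{X}^k$:

$$g_k(x, y) = \frac{W_h^{\otimes k}(x - y)}{\varphi^{\otimes k}(x)}.$$

Then, since $|Y| \le 1$, we get by Jensen's inequality:

$$E\big( \hat I_k - \hat J_1 \big)^2 \le E\Big( \int_{\Delta_k^2} \hat g_{k,1}(x, y) \Big[ \hat X_1^{\otimes k}(dy)\, \bar X_1^{\otimes k}(dx) - \tilde X_1^{\otimes k}(dy)\, \tilde X^{\otimes k}(dx) \Big] \Big)^2 \le 2 (R_{1k} + R_{2k}),$$

where $R_{1k}$ and $R_{2k}$ are defined in Lemma 4.4. Hence,

$$E\big( \hat I_k - \hat J_1 \big)^2 \le C^k (k!)^2 (1 + \beta_{2k})\, \frac{b^2}{h^{dk}}. \tag{3.10}$$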

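Moreover, conditioning first on $X_1, \dots, X_n$ and then on $X_2, \dots, X_n$, we find with two successive applications of the isometry formula (1.2) that

$$E\big( \hat J_1 - \hat J_2 \big)^2 \le E\Big( \int_{\Delta_k^2} \big( \hat g_{k,1}(x, y) - g_k(x, y) \big)\, \tilde X_1^{\otimes k}(dy)\, \tilde X^{\otimes k}(dx) \Big)^2 \le (k!)^2 \int_{\mathbb{X}^{2k}} E\big( \hat g_{k,1}(x, y) - g_k(x, y) \big)^2 \varphi^{\otimes k}(x)\, \varphi^{\otimes k}(y) \, dx \, dy.$$

Thus, using the notation of Lemma 4.3, we get:

$$E\big( \hat J_1 - \hat J_2 \big)^2 \le C^k (k!)^2 N_k \int_{\mathbb{X}^{2k}} \big( W_h^{\otimes k}(x - y) \big)^2 dx \, dy \le C^k (k!)^2\, \frac{b^2 + \rho^2 + \beta_{2k}}{h^{dk}}. \tag{3.11}$$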

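Finally, formula (1.2) gives

$$E\big( \hat J_2 - I_k(f_k) \big)^2 = \int_{\mathbb{X}^k} E\Big( \frac{1}{n} \sum_{i=1}^{n} Z_i(x) - f_k(x) \Big)^2 \varphi^{\otimes k}(x) \, dx = \int_{\mathbb{X}^k} \Big( \frac{1}{n} \operatorname{var}(Z_1(x)) + \big( E Z_1(x) - f_k(x) \big)^2 \Big) \varphi^{\otimes k}(x) \, dx, \tag{3.12}$$

where for all $x \in \mathbb{X}^k$ and $i = 1, \dots, n$:

$$Z_i(x) = Y_i \int_{\Delta_k} g_k(x, y)\, \tilde X_i^{\otimes k}(dy).$$

By (1.3) and (1.2):

$$E Z_1(x) = E\, r(X) \int_{\Delta_k} g_k(x, y)\, \tilde X^{\otimes k}(dy) = \int_{\mathbb{X}^k} f_k(y)\, g_k(x, y)\, \varphi^{\otimes k}(y) \, dy = \frac{1}{\varphi^{\otimes k}(x)} \int_{\mathbb{X}^k} f_k(y)\, W_h^{\otimes k}(x - y)\, \varphi^{\otimes k}(y) \, dy.$$

Then, easy calculations prove that, since $W$ has a compact support:

$$\int_{\mathbb{X}^k} \big( E Z_1(x) - f_k(x) \big)^2 \varphi^{\otimes k}(x) \, dx \le C^k h. \tag{3.13}$$

Moreover, by (1.2):

$$\operatorname{var}(Z_1(x)) \le E\Big( \int_{\Delta_k} g_k(x, y)\, \tilde X^{\otimes k}(dy) \Big)^2 \le k! \int_{\mathbb{X}^k} g_k^2(x, y)\, \varphi^{\otimes k}(y) \, dy \le \frac{k!}{[\varphi^{\otimes k}(x)]^2} \int_{\mathbb{X}^k} \big( W_h^{\otimes k}(x - y) \big)^2 \varphi^{\otimes k}(y) \, dy.$$

Hence,

$$\int_{\mathbb{X}^k} \operatorname{var}(Z_1(x))\, \varphi^{\otimes k}(x) \, dx \le \frac{k!\, C^k}{h^{dk}}. \tag{3.14}$$

We can conclude with (3.12)–(3.14) that

$$E\big( \hat J_2 - I_k(f_k) \big)^2 \le C^k \Big( h + \frac{k!}{n h^{dk}} \Big).$$

Finally, the lemma is a consequence of (3.10), (3.11) and the above, since $b$ and $\rho$ vanish.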

4. Auxiliary results

4.1. Intensity estimation

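Recall that

$$\beta_k = \rho^{-k} \exp\Big( -\frac{n b^d \rho^2}{4\kappa} \Big),$$

where $\kappa > 0$ is such that $\sup_{\mathbb{X}} \varphi \, \sup_{\mathbb{R}^{2d}} K \le \kappa$.

Lemma 4.1. Assume H2 holds and $n b^{d+2} \ge 1$. Then, there exists $C \ge 1$ such that for all $k \ge 1$ and $x \in \mathbb{X}$:

(i) $E|\hat\varphi_1(x) - \varphi(x)|^k \le C^k b^k$;

(ii) $E\Big| \dfrac{1}{\hat\varphi_1(x) + \rho} - \dfrac{1}{\varphi(x)} \Big|^k \le C^k \big( b^k + \rho^k + \beta_k \big)$.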

Proof. First we compute the bias of $\hat\varphi_1(x)$. For simplicity, we assume $d = 1$ only for the computation of the bias. By the very definition of $\hat\varphi_1(x)$ (see (2.4) and below), we have

$$E \hat\varphi_1(x) = \int_{-M}^{M} K_b(x, y)\, \varphi(y) \, dy = \int_{(x-M)/b}^{(x+M)/b} J(z)\, \varphi(x - bz) \, dz + \int_{(M+x)/b}^{(3M+x)/b} J(z)\, \varphi(bz - 2M - x) \, dz + \int_{(M-x)/b}^{(3M-x)/b} J(z)\, \varphi(2M - x - bz) \, dz.$$

Distinguishing the cases $x \in [-M, -M+b]$, $x \in [-M+b, M-b]$ and $x \in [M-b, M]$, it is an easy exercise to prove that under the Lipschitz condition H2 on $\varphi$, we have

$$|E \hat\varphi_1(x) - \varphi(x)| \le 3 b L_1 \int_{-1}^{1} (|y| + 2)\, J(y) \, dy \le C b, \tag{4.1}$$

where, here and in the following, $C \ge 1$ is a constant that does not depend on $x$, $k$ and $n$, and may change from line to line.

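Our second task is to give an exponential inequality for the deviation probability of $\hat\varphi_1(x)$. Fix $\alpha > 0$ and, for simplicity, write $F_x(y) = K_b(x, y)$ for $y \in \mathbb{X}$. We have by independence, for all $s > 0$:

$$P\big( \hat\varphi_1(x) - E\hat\varphi_1(x) \ge \alpha \big) = P\Big( \exp\Big( s \sum_{i=2}^{n} \int_{\mathbb{X}} F_x \, (dX_i - \varphi \, d\lambda) \Big) \ge e^{\alpha s n} \Big) \le e^{-\alpha s n} \Big( E \exp\Big( s \int_{\mathbb{X}} F_x \, (dX_1 - \varphi \, d\lambda) \Big) \Big)^{n-1},$$

using Markov's inequality. By Campbell's formula (see [9]), we thus have

$$P\big( \hat\varphi_1(x) - E\hat\varphi_1(x) \ge \alpha \big) \le \exp\Big( -\alpha s n + (n-1) \int_{\mathbb{X}} \big( e^{s F_x} - s F_x - 1 \big)\, \varphi \, d\lambda \Big). \tag{4.2}$$

But, if $K_+, \varphi_+ > 0$ are constants such that $K(x, y) \le K_+$ and $\varphi(x) \le \varphi_+$ for all $x, y \in \mathbb{X}$, we have:

$$\int_{\mathbb{X}} \big| e^{s F_x} - s F_x - 1 \big|\, \varphi \, d\lambda \le \sum_{j \ge 2} \frac{s^j}{j!} \int_{\mathbb{X}} F_x^j\, \varphi \, d\lambda \le \sum_{j \ge 2} \frac{s^j}{j!}\, \frac{K_+^{j-1} \varphi_+}{b^{d(j-1)}} \le \frac{\varphi_+ b^d}{K_+} \Big( \exp\Big( \frac{s K_+}{b^d} \Big) - \frac{s K_+}{b^d} - 1 \Big).$$

Letting $\delta$ be the function defined for all $t \ge 0$ by $\delta(t) = t - (1+t) \ln(1+t)$ and choosing $s$ so that

$$s = \frac{b^d}{K_+} \ln\Big( 1 + \frac{\alpha}{\varphi_+} \Big),$$

we have by (4.2) and the above:

$$P\big( \hat\varphi_1(x) - E\hat\varphi_1(x) \ge \alpha \big) \le \exp\Big( \frac{n b^d \varphi_+}{K_+}\, \delta\Big( \frac{\alpha}{\varphi_+} \Big) - \frac{b^d \alpha}{K_+} + \frac{b^d \varphi_+}{K_+} \ln\Big( 1 + \frac{\alpha}{\varphi_+} \Big) \Big) \le \Big( 1 + \frac{\alpha}{\varphi_+} \Big) \exp\Big( \frac{n b^d \varphi_+}{K_+}\, \delta\Big( \frac{\alpha}{\varphi_+} \Big) \Big),$$

for $n$ large enough, since $b$ vanishes. Considering $-F_x$ instead of $F_x$, we can conclude that

$$P\big( |\hat\varphi_1(x) - E\hat\varphi_1(x)| \ge \alpha \big) \le 2 \Big( 1 + \frac{\alpha}{\varphi_+} \Big) \exp\Big( \frac{n b^d \varphi_+}{K_+}\, \delta\Big( \frac{\alpha}{\varphi_+} \Big) \Big). \tag{4.3}$$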

We are now in a position to establish (i) and (ii). Regarding (i), we have by (4.1):

$$E|\hat\varphi_1(x) - \varphi(x)|^k \le 2^k \Big( C^k b^k + E|\hat\varphi_1(x) - E\hat\varphi_1(x)|^k \Big).$$

Moreover, writing

$$E|\hat\varphi_1(x) - E\hat\varphi_1(x)|^k = \int_0^\infty P\big( |\hat\varphi_1(x) - E\hat\varphi_1(x)| \ge u^{1/k} \big) \, du$$

and observing that $\delta(t)$ is smaller than $-t^2/4$ or $-t/4$, depending on whether $t \le 1$ or not, we deduce from (4.3) and an obvious decomposition of the above integral that:

$$E|\hat\varphi_1(x) - E\hat\varphi_1(x)|^k \le \Big( \frac{C}{\sqrt{n b^d}} \Big)^k.$$

Putting all pieces together gives (i), since $n b^{d+2} \ge 1$.
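The two elementary bounds on $\delta(t) = t - (1+t)\ln(1+t)$ used in the argument above are easy to check numerically. The snippet below is only a sanity check on a grid, not a proof; the grids `t_small` and `t_large` are arbitrary choices.

```python
import numpy as np

def delta(t):
    """delta(t) = t - (1 + t) * ln(1 + t), defined for t >= 0."""
    t = np.asarray(t, dtype=float)
    return t - (1.0 + t) * np.log1p(t)

# Grids on which the two regimes of the bound are checked.
t_small = np.linspace(0.001, 1.0, 1000)   # regime t <= 1
t_large = np.linspace(1.0, 50.0, 1000)    # regime t >= 1

small_ok = bool(np.all(delta(t_small) <= -t_small ** 2 / 4.0))
large_ok = bool(np.all(delta(t_large) <= -t_large / 4.0))
```

Both flags evaluate to `True` on these grids. Analytically, the first bound follows from $\delta(t) = -t^2/2 + t^3/6 - \dots$ near zero, and the second from $\delta(1) = 1 - 2\ln 2 < -1/4$ together with $\delta'(t) = -\ln(1+t) \le -\ln 2$ for $t \ge 1$.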

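Next we prove (ii). Note that, according to (4.1):

$$\Big| \frac{1}{E\hat\varphi_1(x)} - \frac{1}{\varphi(x)} \Big| \le \frac{C b}{\varphi_-(\varphi_- - C b)}.$$

Since $b$ vanishes as $n$ tends to infinity, we thus have

$$\Big| \frac{1}{E\hat\varphi_1(x)} - \frac{1}{\varphi(x)} \Big| \le C b. \tag{4.4}$$

Let now $A = \{ |\hat\varphi_1(x) - E\hat\varphi_1(x)| \ge \rho \}$. Since $\inf_{\mathbb{X}} \varphi > 0$ by assumption, we have according to (4.1):

$$E\Big| \frac{1}{\hat\varphi_1(x) + \rho} - \frac{1}{E\hat\varphi_1(x)} \Big|^k \le E\Big| \frac{1}{\hat\varphi_1(x) + \rho} - \frac{1}{E\hat\varphi_1(x)} \Big|^k \mathbf{1}_{A^c} + E\Big| \frac{1}{\hat\varphi_1(x) + \rho} - \frac{1}{E\hat\varphi_1(x)} \Big|^k \mathbf{1}_{A}$$

$$\le C^k \big( \rho^k + \rho^{-k} P(A) \big) \le C^k \Big( \rho^k + \rho^{-k} \exp\Big( -\frac{n b^d \rho^2}{4\kappa} \Big) \Big).$$

The last inequality is a consequence of (4.3), of the fact that $\rho \to 0$ as $n \to \infty$, and of the fact that $\delta(t) \le -t^2/4$ provided $t > 0$ is small enough. We can now conclude with the above inequality and (4.4).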

4.2. Perturbated chaos

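Lemma 4.2. Let $g \in L^2(\mu^{\otimes k})$ and $\psi \in L^2(\lambda)$. If $d\nu = \psi \, d\lambda$, we have

$$E\Big( \int_{\Delta_k} g \, d\big[ (X - \nu)^{\otimes k} - (X - \mu)^{\otimes k} \big] \Big)^2 \le \sum_{i=0}^{k-1} i! \binom{k}{i}^2 \|\varphi - \psi\|_2^{2(k-i)} \int_{\mathbb{X}^k} g(x)^2 \prod_{j=1}^{i} \varphi(x_j)\, \lambda^{\otimes k}(dx).$$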

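Proof. For simplicity, we assume that $g$ is a symmetric function. Otherwise, one only needs to consider its symmetrized version. First observe that by symmetry of $g$:

$$\int_{\Delta_k} g \, d(X - \nu)^{\otimes k} = \sum_{i=0}^{k} \binom{k}{i} \int_{\Delta_k} g \, d(X - \mu)^{\otimes i} \, d(\mu - \nu)^{\otimes (k-i)}.$$

Consequently,

$$D = \int_{\Delta_k} g \, d(X - \nu)^{\otimes k} - \int_{\Delta_k} g \, d(X - \mu)^{\otimes k} = \sum_{i=0}^{k-1} \binom{k}{i} \int_{\Delta_k} g \, d(\mu - \nu)^{\otimes (k-i)} \, d(X - \mu)^{\otimes i}.$$

Hence, letting for $i = 1, \dots, k$ and $x \in \mathbb{X}^i$,

$$\Gamma(x_1, \dots, x_i) = \big\{ y \in \Delta_{k-i} : y_\ell \notin \{x_1, \dots, x_i\} \text{ for all } \ell = 1, \dots, k-i \big\},$$

we have:

$$D = \int_{\mathbb{X}^k} g\, (\varphi - \psi)^{\otimes k} \, d\lambda^{\otimes k} + \sum_{i=1}^{k-1} \binom{k}{i} \int_{\Delta_i} g_i \, d(X - \mu)^{\otimes i},$$

where for $(x_1, \dots, x_i) \in \Delta_i$,

$$g_i(x_1, \dots, x_i) = \int_{\Gamma(x_1, \dots, x_i)} g(x_1, \dots, x_k) \prod_{j=i+1}^{k} (\varphi - \psi)(x_j)\, \lambda(dx_j).$$

Observe that by Cauchy–Schwarz,

$$g_i(x_1, \dots, x_i)^2 \le \|\varphi - \psi\|_2^{2(k-i)} \int_{\mathbb{X}^{k-i}} g(x_1, \dots, x_k)^2 \, \lambda(dx_{i+1}) \dots \lambda(dx_k).$$

Then, since each $g_i$ is symmetric, we deduce from equations (1.1) and (1.2) that

$$E D^2 = \Big( \int_{\mathbb{X}^k} g\, (\varphi - \psi)^{\otimes k} \, d\lambda^{\otimes k} \Big)^2 + \sum_{i=1}^{k-1} i! \binom{k}{i}^2 \int_{\mathbb{X}^i} g_i^2 \, d\mu^{\otimes i} \le \sum_{i=0}^{k-1} i! \binom{k}{i}^2 \|\varphi - \psi\|_2^{2(k-i)} \int_{\mathbb{X}^k} g(x)^2 \prod_{j=1}^{i} \varphi(x_j)\, \lambda^{\otimes k}(dx),$$

hence the lemma.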

4.3. Technical inequalities

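Lemma 4.3. Assume H2 holds and $n b^{d+2} \ge 1$. There exists $C \ge 1$ such that for all $k, \ell, n \ge 1$:

(i) $M_{k,\ell} = \sup_{x \in \mathbb{X}^k} E\Big( \dfrac{\|\hat\varphi_1 - \varphi\|_2}{(\hat\varphi_1 + \rho)^{\otimes k}(x)} \Big)^2 \le C^k (1 + \beta_{2k})\, b^2$;

(ii) $N_k = \sup_{x \in \mathbb{X}^k} E\Big| \dfrac{1}{(\hat\varphi_1 + \rho)^{\otimes k}(x)} - \dfrac{1}{\varphi^{\otimes k}(x)} \Big|^2 \le C^k \big( b^2 + \rho^2 + \beta_{2k} \big)$.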

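Proof. We only prove (i). First observe that by Cauchy–Schwarz and since $\mathbb{X}$ is bounded:

$$M_{k,\ell}^2 \le E\|\hat\varphi_1 - \varphi\|_2^4 \, \sup_{x \in \mathbb{X}^k} E\big[ (\hat\varphi_1 + \rho)^{\otimes k}(x) \big]^{-4} \le C^k \sup_{x \in \mathbb{X}} E|\hat\varphi_1(x) - \varphi(x)|^4 \, \sup_{x \in \mathbb{X}^k} E\big[ (\hat\varphi_1 + \rho)^{\otimes k}(x) \big]^{-4},$$

where, here and in the following, $C$ is a positive constant that does not depend on $k$, $\ell$ and $n$, and may change from line to line. Hence by Lemma 4.1:

$$M_{k,\ell}^2 \le C^k b^4 \sup_{x \in \mathbb{X}^k} E\big[ (\hat\varphi_1 + \rho)^{\otimes k}(x) \big]^{-4}. \tag{4.5}$$

Thus, we only need to consider the rightmost term. Fix $x \in \mathbb{X}^k$, and note that

$$E\big[ (\hat\varphi_1 + \rho)^{\otimes k}(x) \big]^{-4} \le \frac{8}{(\varphi^{\otimes k}(x))^4} + 8\, E\Big| \frac{1}{(\hat\varphi_1 + \rho)^{\otimes k}(x)} - \frac{1}{\varphi^{\otimes k}(x)} \Big|^4. \tag{4.6}$$

The task is to bound the term

$$A = E\Big| \frac{1}{(\hat\varphi_1 + \rho)^{\otimes k}(x)} - \frac{1}{\varphi^{\otimes k}(x)} \Big|^4.$$

We shall make use of the following inequality:

$$\Big| \prod_{i=1}^{k} a_i - \prod_{i=1}^{k} b_i \Big|^4 \le 16^k \sum_{\emptyset \ne I \subset \{1, \dots, k\}} \prod_{i \in I} |a_i - b_i|^4 \prod_{i \notin I} b_i^4,$$

where the $a_i$'s and the $b_i$'s are positive real numbers. Since $\varphi$ is bounded below by a positive constant:

$$A \le C^k \sum_{\emptyset \ne I \subset \{1, \dots, k\}} E \prod_{i \in I} \Big| \frac{1}{\hat\varphi_1(x_i) + \rho} - \frac{1}{\varphi(x_i)} \Big|^4 \le C^k \sum_{\emptyset \ne I \subset \{1, \dots, k\}} \prod_{i \in I} \Big( E\Big| \frac{1}{\hat\varphi_1(x_i) + \rho} - \frac{1}{\varphi(x_i)} \Big|^{4|I|} \Big)^{1/|I|},$$

according to Hölder's inequality, and where $|I|$ is the cardinality of the set $I$. Thus, by Lemma 4.1:

$$A \le C^k \sum_{j=1}^{k} \binom{k}{j} \sup_{x \in \mathbb{X}} E\Big| \frac{1}{\hat\varphi_1(x) + \rho} - \frac{1}{\varphi(x)} \Big|^{4j} \le C^k \sum_{j=1}^{k} \binom{k}{j} \Big( b^{4j} + \rho^{4j} + \rho^{-4j} \exp\Big( -\frac{n b^d \rho^2}{4\kappa} \Big) \Big) \le C^k \Big( 1 + \rho^{-4k} \exp\Big( -\frac{n b^d \rho^2}{4\kappa} \Big) \Big),$$

because $b$ and $\rho$ vanish as $n \to \infty$. Assertion (i) is then a straightforward consequence of inequalities (4.5) and (4.6).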

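Before stating the next lemma, we recall the notations (3.7)–(3.9).

Lemma 4.4. Assume H2 and H3 hold, and $n b^{d+2} \ge 1$. Then, there exists a constant $C \ge 1$ such that for all $k, n \ge 1$, both quantities below:

$$R_{1k} = E\Big( \int_{\Delta_k^2} \hat g_{k,1}(x, y)\, \hat X_1^{\otimes k}(dy) \Big[ \bar X_1^{\otimes k}(dx) - \tilde X^{\otimes k}(dx) \Big] \Big)^2,$$

$$R_{2k} = E\Big( \int_{\Delta_k^2} \hat g_{k,1}(x, y) \Big[ \hat X_1^{\otimes k}(dy) - \tilde X_1^{\otimes k}(dy) \Big] \tilde X^{\otimes k}(dx) \Big)^2$$

are bounded by

$$C^k (k!)^2 (1 + \beta_{2k})\, \frac{b^2}{h^{dk}}.$$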

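Proof. We only prove the bound for $R_{1k}$, the other proof being similar. In the sequel, $C \ge 1$ is a constant that does not depend on $n$ and $k$, and may change from line to line. Writing (see notations (3.7)–(3.9)):

$$R_{1k} = E\, E\Big[ \Big( \int_{\Delta_k} \Big\{ \int_{\Delta_k} \hat g_{k,1}(x, y)\, \hat X_1^{\otimes k}(dy) \Big\} \Big[ \bar X_1^{\otimes k}(dx) - \tilde X^{\otimes k}(dx) \Big] \Big)^2 \,\Big|\, X_1, \dots, X_n \Big],$$

we obtain with Lemma 4.2, using the independence of $X$ and $X_1, \dots, X_n$ and the fact that $\varphi$ is a bounded function:

$$R_{1k} \le C^k \sum_{i=0}^{k-1} i! \binom{k}{i}^2 E\Big[ V^{k-i} \int_{\mathbb{X}^k} \Big( \int_{\Delta_k} \hat g_{k,1}(x, y)\, \hat X_1^{\otimes k}(dy) \Big)^2 \lambda^{\otimes k}(dx) \Big], \tag{4.7}$$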