MINIMAX REGRESSION ESTIMATION FOR POISSON COPROCESS

(1)

DOI:10.1051/ps/2017004 www.esaim-ps.org

MINIMAX REGRESSION ESTIMATION FOR POISSON COPROCESS

^∗

Benoˆıt Cadre

¹

, Nicolas Klutchnikoff

²

and Gaspar Massiot

²

Abstract. For a Poisson point process X, Itˆo’s famous chaos expansion implies that every square integrable regression function r with covariateX can be decomposed as a sum of multiple stochastic integrals called chaos. In this paper, we consider the case wherercan be decomposed as a sum ofδchaos.

In the spirit of Cadre and Truquet [ESAIM: PS 19(2015) 251–267], we introduce a semiparametric estimate of rbased on i.i.d. copies of the data. We investigate the asymptotic minimax properties of our estimator when δis known. We also propose an adaptive procedure whenδis unknown.

Mathematics Subject Classification. 62G08, 62H12, 62M30.

Received November 29, 2016. Revised March 2, 2017. Accepted March 5, 2017.

1. Introduction 1.1. Regression estimation

Regression estimation is a central problem in statistics. It is widely used and studied in the litterature.

Among all the methods explored to deal with the regression problem, nonparametric statistics have been widely investigated (see the monographies by Tsybakov [14] for a full introduction to nonparametric estimation and Gy¨orﬁ et al. [5] for a clear account on nonparametric regression). A more recent challenge regarding this statistical problem is the regression onto a functional covariate (see the books by Ramsay and Silverman [13]

and Horváth and Kokozska [6] for more precision on functional data analysis). Although very challenging, the functional regression problem in the minimax setting has little coverage up to our knowledge. In the kernel estimation setting, Mas [11] studied the small ball probabilities over some Hilbert spaces to derive minimax lower bounds at fixed points. More recently, Chagny and Roche [4] derived minimax lower bounds at fixed points for adaptive nonparametric estimation of the regression under some Wiener measure domination assumptions on the small ball probabilities. Based on the k-nearest neighbor approach, Biau, C´erou and Guyader [1] used compact embedding theory to get bounds on the minimax risk. See also the references therein for a more complete overview.

Keywords and phrases.Functional statistic, poisson point process, regression estimate, minimax estimation.

∗The authors wish to thank the referee for valuable comments and insightful suggestions, which led to an improvement of the paper.

1 IRMAR, ENS Rennes, CNRS, Campus de Ker Lann, Avenue Robert Schuman, 35170 Bruz, France.[email protected]

2 IRMAR, Universit´e de Rennes 2, CNRS, Campus Villejean, Place du recteur Henri le Moal, CS 24307, 35043 Rennes cedex, France.[email protected]; [email protected]

Article published by EDP Sciences c EDP Sciences, SMAI 2017

(2)

1.2. Minimax regression for Poisson coprocess

In this paper, we focus on a regression problem for which the covariate is a Poisson point process. In the spirit of Cadre and Truquet [3], we use a method based on the chaotic decomposition of Poisson functionals.

LetX be a Poisson point process on a compact domainX⊂R^d equipped with its Borelσ-algebraX. Letting δ_xthe Dirac measure on x∈X, the state space is identiﬁed toS ={s=_m

i=1δ_x_i :m∈N^∗, x_i∈X} equipped with the smallestσ-algebra making the mappingss→s(B) measurable for all Borel setB inX. We denote by PX the distribution ofX whereasL²(PX) denotes the space of all measurable functionsg:S →Rsuch that

g²_L2(PX)=Eg(X)²<+∞.

LetPbe a distribution onS ×Rand (X, Y) with lawP. ProvidedE|Y|<+∞whereEis the expectation with respect to P, we consider the regression functionr:S →Rdeﬁned byr(s) =E(Y |X=s).

We assume thatr belongs toL²(PX) and we aim at estimating ron the basis of an i.i.d. sample randomly drawn from the distribution P of (X, Y). In this context, any measurable map ˜r: (S ×R)ⁿ → L²(PX) is an estimator, the accuracy of which is measured by the risk

R_n(˜r, r) =Eⁿr˜−r²_L2(PX),

where Eⁿ denotes the expectation with respect to the distributionP^⊗n. Following the minimax approach, we deﬁne the maximal risk of ˜rover a classP of distributions for the random pair (X, Y) by

R_n(˜r,P) = sup

P∈PR_n(˜r, r).

We are interested in ﬁnding an estimator ˆrsuch that R_n(ˆr,P)inf

˜

r R_n(˜r,P),

where the inﬁmum is taken over all possible estimates of r and u_n v_n stands for 0 < lim inf_nu_nv_n⁻¹ ≤ lim sup_nu_nv_n⁻¹<+∞. Such an estimate is called asymptotically minimax overP.

1.3. Chaotic decomposition in the Poisson space

Roughly, Itô’s famous chaos expansion (see Itô [8] and Nualart and Vives [12] for technical details) says that every square integrable and σ(X)-measurable random variable can be decomposed as a sum of multiple stochastic integrals, called chaos. To be more precise, we now recall some basic facts about chaos decomposition in the Poisson space. Letμ be the mean measure of the Poisson point Process X, defined byμ(A) =EX(A) forA∈ X, wheneverX(A) is the number of points ofX lying inA. Fixk≥1. Providedg ∈L²(μ^⊗k), we can define thekth chaosI_k(g) associated withg, namely

I_k(g) =

Δk

gd

X−μ)^⊗k, (1.1)

whereΔ_k ={x∈X^k :x_i=x_j for alli=j}.In Nualart and Vives [12], it is proved that every square integrable σ(X)-measurable random variable can be decomposed as an inﬁnite sum of chaos. Applied to our regression problem, this statement writes as

r(X) =EY +

k≥1

1

k!I_k(f_k), (1.2)

where equality holds inL²(PX), providedEY²<∞. In the above formula, eachf_k is an element ofL²_sym(μ^⊗k) –the subset of symmetric functions inL²(μ^⊗k)–, and the decomposition is deﬁned in a unique way.

(3)

1.4. Organization of the paper

In this paper we introduce a new estimator of the regression functionrbased on independent copies of (X, Y) and we study its minimax properties. Section2is devoted to the deﬁnition of a semiparametric modeli.e. the construction of the family P of distributions of (X, Y). In particular, we assume that ris a sum ofδ chaos. In Section3, we provide a lower bound for the minimax risk overP. Whenδis known, we prove that our estimator achieves this bound up to a logarithmic term. Finally, in Section4, we deﬁne an adaptive procedure when δis unknown, the risk of which is also proved to be optimal up to a logarithmic term. Last sections contain proofs.

2. Model

In the rest of the paper, we let Θ ⊂ R^p. For eachθ ∈ Θ, ϕ_θ : X → R+ is a Borel function. The family {ϕθ}_θ∈Θ contains the constant function1_X/λ(X), and is such that there exists three positive constantsϕ, ϕ andγ₁ satisfying, for allx, y∈Xandθ, θ∈Θ,

ϕ≤ϕ_θ(x)≤ϕ, (2.1)

|ϕ_θ(x)−ϕ_θ(x)| ≤ϕ|θ−θ|, (2.2)

|ϕθ(x)−ϕ_θ(y)| ≤γ₁|x−y|, (2.3)

where, here and in the following,| · |stands for the euclidean norm.

Let (X, Y) be a pair of random variables taking values inS ×Rwith distributionP, whereS is the Poisson space over the compact domainX⊂R^d. Here,X is a Poisson point process on Xwith intensity ϕ_θ,i.e. for all setA∈ X:

EX(A) =

A

ϕ_θdλ, (2.4)

whereλis the Lebesgue measure andEis the expectation with respect toP. In other words, the mean measure ofX, sayμ, has a Radon−Nikod`ym derivativeϕ_θ with respect toλ. We assume that for alll≥1, there exists an estimator

θ˜_l:

S ×R)^l→Θ, such that,

E^l|θ˜_l−θ|²≤ κ

l+ 1, (2.5)

where κ >0 is an absolute constant that does not depend on l andE^l is the expectation with respect to P^⊗l. As shown in Birg´e ([2], Prop. 3.1), the above property is satisﬁed by a wide class of models, provided ˜θ_l is a maximum likelihood estimate.

Moreover, the real-valued random variableY satisﬁes, for someu, M >0, the exponential moment condition:

EY²e^u|Y^|≤M, (2.6)

As seen in (1.2), the regression function r(s) = E(Y | X = s) has a chaotic decomposition. In our model, we consider the case of a ﬁnite chaotic decomposition, i.e. there exists a strictly positive integer δ and f₁ ∈ L²_sym(μ), . . . , f_δ ∈L²_sym(μ^⊗δ) such that

r(X) =EY + δ k=1

1

k!I_k(f_k), (2.7)

where theI_k(f_k)’s are deﬁned in (1.1). The coeﬃcientsf_k’s of the chaos lie in a nonparametric family, for which there exists two strictly positive constantsγ₂ and ¯f such that for allk= 1, . . . , δ, andx, y∈X^k

|fk(x)−f_k(y)| ≤γ₂|x−y|, (2.8)

|fk(x)| ≤f .¯ (2.9)

(4)

Remark 2.1(On the ﬁniteness of the number of chaos).

(1) Finiteness of the number of chaos roughly implies that the regression functionr(X) is unbounded. Indeed, consider for simplicity the case wherer(X) is decomposed onto only one chaos,i.e.

r(X) =

Xfd(X−λ).

Here X is a simple Poisson process on the domain X ⊂ R with unit intensity and f is any λ-integrable function onX. Observe that, iff ≥a >0, then

r(X)≥aX(X)−

Xfdλ.

Consequently,r(X) is unbounded. The same tendency may be expected whatever the number of chaos.

(2) Finiteness of the chaotic decomposition relies on the distribution of (X, Y) via the Malliavin calculus.

Indeed, as proved in Proposition 4.1 by Last and Penrose [10], the decomposition inδchaos ofr(X) holds if and only if the (δ+ 1)th Malliavin derivative ofris null.

In the rest of the paper, the constantsϕ, ϕ, γ₁, u, M, δ, γ₂,f¯andκwill be ﬁxed, and we shall denote byP the set of distributionsPof (X, Y) such that the assumptions (2.1)−(2.9) are satisﬁed. In this setting,θimplicitly denotes the true value of the parameter, that isϕ_θis the intensity ofX (with mean measureμ).

3. Minimax properties for known δ 3.1. Chaos estimator

Our task is to construct an estimate of the regression function which achieves fast rates overP. Let P∈ P and (X, Y)∼PwhereX has mean measureμ=ϕ_θ·λ.

First recall some basic facts about chaos decomposition in the Poisson space. Ifg∈L²(μ^⊗k) andh∈L²(μ^⊗l) fork, l≥1, we have the so-called Itˆo Isometry Formula:

EIk(g)I_l(f) =k!

X^kghdμ^⊗k1_{k=l} andEIk(g) = 0, (3.1) whereg andhare the symmetrizations ofg andh, that is, for all (x₁, . . . , x_k)∈X^k:

g(x₁, . . . , x_k) = 1 k!

σ

g(x_σ(1), . . . , x_σ(k)), (3.2)

the sum being taken over all permutationsσ=

σ(1), . . . , σ(k)

of{1, . . . , k}, and similarly forh.

Now let W be a strictly positive constant andW be a density on Xsuch that sup_XW ≤W. Furthermore, leth_k =h_k(n)>0 a bandwidth to be tuned later on and denote

W_h_k(·) = 1 h^d_kW

· h_k

·

One may easily deduce from relations (1.2) and (3.1) that EY I_k

W_h^⊗k

k (x− ·)

=

X^kf_kW_h^⊗k

k (x− ·)ϕ^⊗k_θ dλ^⊗k,

(5)

where, here and in the following, for any real-valued function g deﬁned on X, the notation g^⊗k denotes the real-valued function onX^k such that

g^⊗k(x) =

k i=1

g(x_i), x= (x₁, . . . , x_k)∈X^k.

Thus, under the smoothness assumptions (2.1) on ϕ_θ and (2.9) on f_k, the right-hand side converges to f_k(x)ϕ^⊗k_θ (x), providedh_k→0.

Now let (X, Y),(X₁, Y₁), . . . ,(X_n, Y_n) be i.i.d. with distribution P. Based on this observation, a semiparametric estimate denoted ˆI_k,h_k(X) of the kth chaosI_k(f_k) of (1.1) may be deﬁned as follows:

1 n

n i=1

Y_i1_|Y_i_|≤T_n

Δ²_k

W_h^⊗k_k (x−y) ϕ^⊗k_ˆ

θi (x)

X_i−ϕ_θ_ˆ_i·λ_⊗k (dy)

X−ϕ_θ_ˆ_i·λ_⊗k

(dx), (3.3)

whereT_n >0 is a truncation parameter to be tuned later on and the ˆθ_i’s are the leave-one-out estimates deﬁned by ˆθ_i= ˜θ_n−1

(X_j)_j≤n,j=i

(see Sect.2).

3.2. Results

Based on the estimate (3.3) of thekth chaos, we may deﬁne the following empirical mean type estimator of the regression functionrfor any strictly positive integerl

ˆ

r_l(X) =Y_n+ l k=1

1

k!Iˆ_k,h_k(X), (3.4)

whereY_n is the empirical mean ofY₁, . . . , Y_n.

In this subsection, we study the performance of the estimate ˆr_δ of the regression function from a minimax point of view when the number of chaosδis known.

Theorem 3.1. Let ε >0 and setT_n= (lnn)^1+εandh_k = (T_n²n⁻¹)^1/(2+dk). Then, lim sup

n→+∞

n (lnn)^2+2ε

_2/(2+dδ)

supP∈PR_n ˆ

r_δ, r)<∞.

Remark 3.2. Thus, the optimal rate of convergence overP is upper bounded by

(lnn)^2+2εn⁻¹_2/(2+dδ) .Here it is noticeable that, up to a logarithmic factor, we recover the optimal rate n^−2/(2+dδ) corresponding to the dδ-dimensional regression with a Lipschitz regression function (see,e.g., Thm. 1 in Kohleret al.[9]).

In our next result, we provide a lower bound for the optimal rate of convergence overP in order to assess the tightness of the upper bound obtained in Theorem3.1.

Theorem 3.3. We have,

lim inf

n→+∞n^2/(2+dδ)inf

˜ r sup

P∈PR_n(˜r, r)>0, where the infimum is taken over all estimates ˜r.

Remark 3.4. Theorem3.3indicates that the optimal rate of convergence overPis lower bounded byn^−2/(2+dδ) which, up to a logarithmic factor, corresponds to the upper bound found in Theorem3.1. As a conclusion, up to a logarithmic factor, the estimate ˆr_δ is asymptotically minimax onP.

(6)

4. Adaptive properties for unknown δ

We now consider the case of an unknown number of chaosδ. Form >0, we set P(m) ={P∈ P :fk ≥m; k∈1, . . . , δ},

where·stands for the L²-norm relatively to the Lebesgue measure. Thus, wheneverP∈ P(m), δ= min(k:f_k= 0)−1.

Based on this observation, a natural estimate ofδmay be obtained as follows. Let we assume that the dataset is of size 2n, and let (X₁, Y₁), . . . ,(X_2n, Y_2n) be i.i.d. with distributionP∈ P(m). Fork∈1, . . . , δ, we introduce the empirical counterpart ofϕ_θf_k deﬁned by

ˆ

g_k(x) = 1 n

2n i=n+1

Y_i

Δk

W_b^⊗k_k (x−y)

X_i−ϕ_θ_ˆ·λ_⊗k

(dy), (4.1)

where ˆθ = ˜θ_n(X_n+1, . . . , X_2n) is deﬁned in Sect. 2), and b_k = b_k(n) is a bandwidth to be tuned later. The estimator ˆδofδis then deﬁned by

δˆ= min(k:ˆg_k ≤ρ_k)−1, (4.2)

whereρ_k =ρ_k(n) is a vanishing sequence of positive numbers that we choose later on. We may now deﬁne the adaptative estimator ˆrofrby

ˆ r= ˆr_δ_ˆ, where ˆr_l is deﬁned in (3.4) for all strictly positive integerl.

Theorem 4.1. Let ε > dδ≥2,α, β >0 such thatα+β <1and2α+β >1/(2 +dδ), and setT_n = (lnn)^1+ε. Then, if we take for all integer k,

h_k= (T_n²n⁻¹)^1/(2+dk), ρ_k = ((2k)!)²n^{(α+β−1)/2} andb_k=n^−β/(2dk), we obtain, for all m >0,

lim sup

n→+∞

n (lnn)^2+2ε

_2/(2+dδ) sup

P∈P(m)R_n(ˆr, r)<+∞.

Remark 4.2. Here it is noticeable that despite the estimation of the number of chaosδand up to a logarithmic factor, we recover the optimal raten^−2/(2+dδ)of Theorems3.1and3.3.

5. Proof of Theorem 3.1

In this section, we assume without loss of generality that the constantsϕ,γ₁,γ₂, ¯f,λ(X) andW are greater than 1 and thatϕis smaller than 1. Moreover,Cdenotes a positive number that only depends on the parameters of the model,i.e. u, ϕ, ϕ, γ₁, γ₂,f , δ, θ, κ,¯ λ(X), M andW, and whose value may change from line to line.

We letP∈ P and, for simplicity, we may denoteE=Eⁿ and var stands for the variance relatively to P^⊗n. Finally, let (X, Y),(X₁, Y₁), . . . ,(X_n, Y_n) be i.i.d. with distributionP.

(7)

5.1. Technical results

Letk≥1 be ﬁxed and denote for allx, y∈Xandi= 1, . . . , n:

g_i(x, y) =W_h_k(x−y) ϕ_θ_ˆ

i(x) andg(x, y) = W_h_k(x−y)

ϕ_θ(x) · (5.1)

We also let

d ˆX_i= dX_i−ϕ_θ_ˆ_idλ, d ˆX_i= dX−ϕ_θ_ˆ_idλ, (5.2) d ˜X_i= dX_i−ϕ_θdλand d ˜X = dX−ϕ_θdλ. (5.3) With this respect, we have (see (3.3)):

Iˆ_k,h_k(X) = 1 n

n i=1

Y_i1_|Y_i_|≤T_n

Δ²_k

g^⊗k_i (x, y) ˆX_i^⊗k(dy) ˆX_i^⊗k(dx).

Furthermore, denote for allx∈X^k:

Z_i,k(x) =Y_i1_|Y_i_|≤T_n

Δk

g^⊗k(x, y) ˜X_i^⊗k(dy). (5.4)

Lemma 5.1. Let i= 1, . . . , nandk≤δ be fixed. Then for allx∈X^k: var

Z_i,k(x)

≤T_n²k!C^k

h^dk_k , and |EZi,k(x)−f_k(x)| ≤C^k

φ k!

h^dk_k _1/2

+C^kh_k, whereφ=EY²1_|Y_|>T_n.

Proof. On the one hand, by the isometry formula (3.1) over the setP, var

Z_i,k(x)

≤T_n²E

Δk

g^⊗k(x, y) ˜X^⊗k(dy) ₂

≤T_n²k!

X^kg^⊗k²(x, y)ϕ^⊗k_θ (y)dy

≤T_n²W^kϕ^k ϕ^2k

k!

h^dk_k ,

whereg^⊗k(x,·) is the symmetrization –see (3.2)– of the functiong^⊗k(x,·) deﬁned in (5.1). On the other hand, denote

Z˜_i,k(x) =Y_i

Δk

g^⊗k(x, y) ˜X_i^⊗k(dy), then by the isometry formula (3.1):

EZ˜_i,k(x) =Er(X)

Δk

g^⊗k(x, y) ˜X^⊗k(dy) =

X^kf_k(y)g^⊗k(x, y)ϕ^⊗k_θ (y)dy

= 1

ϕ^⊗k_θ (x)

X^k

f_k(x−h_kz)W^⊗k(z)ϕ^⊗k_θ (x−h_kz)dz.

(8)

Furthermore, by assumptions (2.1), (2.3), (2.8) and (2.9) on the model, we have for allx, y∈X^k:

≤

ϕ^kγ₂+kf ϕ¯ ^k−1γ₁

|x−y|

≤2kf ϕ¯ ^kγ₂γ₁|x−y|.

Hence, lettingω_k =

X^k|z|W^⊗k(z)dz,

|EZi,k(x)−f_k(x)| ≤ |E

Z_i,k(x)−Z˜_i,k(x)

|+|EZ˜_i,k(x)−f_k(x)|

≤EY1_|Y_|>T_n

Δk

g^⊗k(x, y) ˜X^⊗k(dy)+|EZ˜_i,k(x)−f_k(x)|

≤φ^1/2

E

Δk

g^⊗k(x, y) ˜X^⊗k(dy) ₂_1/2

+ 2kf ω¯ _kϕ^kγ₂γ₁ ϕ^k h_k·

One last application of the isometry formula (3.1) to the ﬁrst term on the right-hand side of above gives the

Proposition.

With the help of notations (5.1)–(5.3), deﬁne

Rⁱ_1k =E

Δ²_k

g_i^⊗k(x, y) ˆX_i^⊗k(dy)Xˆ_i^⊗k(dx)−X˜^⊗k(dx)₂

, (5.5)

Rⁱ_2k =E

Δ²_k

g_i^⊗k(x, y)Xˆ_i^⊗k(dy)−X˜_i^⊗k(dy)X˜^⊗k(dx) ₂

. (5.6)

Lemma 5.2. Let i= 1, . . . , nandk≤δ be fixed. Then, for j= 1or 2:

Rⁱ_jk≤C^k(k!)² nh^dk_k ·

Proof. The proofs for the bounds forR_1kⁱ andRⁱ_2k being similar, we only prove the one forR_1kⁱ . We have

Rⁱ_1k =EE

⎡

⎣

Δ²_k

g^⊗k_i (x, y) ˆX_i^⊗k(dy)

Xˆ_i^⊗k(dx)−X˜^⊗k(dx) ²

|(Xl)_l≤n

⎤

⎦,

Using the independence ofX and (X_l)_l≤n, we can apply Lemma 4.2 from Cadre and Truquet [3], which entails Rⁱ_1k ≤^k−1

j=0

j!

k j

₂ ϕ^j

X^kE

V_i^k−j

Δk

g_i^⊗k(x, y) ˆX_i^⊗k(dy) ₂

dx, (5.7)

whereV_i=ϕ_θ_ˆ_i−ϕ_θ².Now letx∈X^k andj= 0, . . . , k−1 be ﬁxed. We have EV_i^k−j

Δk

≤2EV_i^k−j

Δk

g^⊗k_i (x, y)

Xˆ_i^⊗k(dy)−X˜_i^⊗k(dy) ²

+ 2EV_i^k−j

Δk

g_i^⊗k(x, y) ˜X_i^⊗k(dy) ₂

.

(9)

We proceed to bound the two terms on the right-hand side of above. As before, we apply Lemma 4.2 from Cadre and Truquet [3], but conditionally on (X_l)_l≤n,l=i. For notational simplicity, and since it does change the result anymore, we do not specify the symmetrized version of the functions when using the isometry formula. By (3.1) and assumption (2.1), we then get

EV_i^k−j

Δk

≤2

k−1

l=0

l!

k l

₂

ϕ^lEV_i^2k−l−j

X^kg^⊗k_i (x, y)²dy + 2k!ϕ^kEV_i^k−j

X^kg^⊗k_i (x, y)²dy.

Note that for allm≥1, by (2.2), we have V_i^m≤

ϕ²λ(X)_m−1

supX |ϕθˆi−ϕ_θ|₂ λ(X)

≤

ϕ²λ(X)_m

|θˆ_i−θ|², and

X^kg_i^⊗k(x, y)²dy≤ W^k ϕ^2kh^dk_k ·

Hence, sincel!

k l

≤k!, we get with (2.5):

EV_i^k−j

Δk

g^⊗k_i (x, y) ˆX_i^⊗k(dy) ₂

≤2k!W^k φ^2k

κ nh^dk_k

φ²λ(X)_k−j

ϕ+ϕ²λ(X)_k .

Finally, we deduce with similar arguments and inequality (5.7) that Rⁱ_1k ≤2(k!)²W^k

ϕ^2k κ nh^dk_k

ϕ+ϕ²λ(X)_2k

≤2(k!)²4^kW^kϕ^4k ϕ^2k

κ

nh^dk_k λ(X)^2k,

because bothϕandλ(X) are greater than 1. The Lemma is proved.

Lemma 5.3. Let ε >0 be fixed and setT_n= (lnn)^1+ε. Then, for all k≤δ:

EIˆ_k,h_k(X)−I_k(f_k)₂

≤C^k(k!)²

(lnn)^2+2ε nh^dk_k +h²_k

·

Proof. With the help of notations (5.1)−(5.3), we let J₁= 1

n n

i=1

Y_i1_|Y_i_|≤T_n

Δ²_k

g^⊗k_i (x, y) ˜X_i^⊗k(dy) ˜X^⊗k(dx),

J₂= 1 n

n i=1

Y_i1_|Y_i_|≤T_n

Δ²_k

g^⊗k(x, y) ˜X_i^⊗k(dy) ˜X^⊗k(dx).

(10)

Then, using notations of Lemma5.2, by Jensen’s Inequality EIˆ_k,h_k(X)−J₁₂

≤2T_n² n

n i=1

(Rⁱ_1k+Rⁱ_2k), Hence, by Lemma5.2

EIˆ_k,h_k(X)−J₁₂

≤T_n²C^k(k!)²

nh^dk_k · (5.8)

Moreover, sequentially conditioning on (X_l)_l≤n, then on (X_l)_l≤n,l=i, and using assumption (2.1), we ﬁnd with two successive applications of the isometry formula (3.1) that

E

J₁−J₂₂

≤(k!)²T_n²ϕ^2kE

X^2k

g₁^⊗k(x, y)−g^⊗k(x, y) ₂

dxdy.

Now letx, y∈X^k be ﬁxed. We have

g^⊗k₁ (x, y)−g^⊗k(x, y)≤W_h^⊗k

k(x−y) ϕ^⊗k_θ (x) kϕ^k−1

ϕ^k sup

X |ϕ_θ_ˆ

1−ϕ_θ|, so that (2.2) and (2.5) give

E

J₁−J₂₂

≤C^kT_n²(k!)²

nh^dk_k · (5.9)

Finally, using notation (5.4), by the isometry formula (3.1), we have E

J₂−I_k(f_k)₂

=E

Δk

1 n

n i=1

Z_i,k(x)−f_k(x)

X˜^⊗k(dx) ₂

=k!

X^kE1 n

n i=1

Z_i,k(x)−f_k(x)₂

ϕ^⊗k_θ (x)dx

=k!

X^k

1 nvar

Z_1,k(x) +

EZ_1,k(x)−f_k(x)₂

ϕ^⊗k_θ (x)dx.

By Lemma5.1, we thus get E

J₂−I_k(f_k)₂

≤T_n²(k!)²C^k

nh^dk_k +C^k(k!)² φ²

h^dk_k +C^kk!h²_k. Moreover, given that (2.6) gives φ≤e^−uTⁿEY²e^u|Y^|.Consequently,

E

J₂−I_k(f_k)₂

≤C^k(k!)² T_n²

nh^dk_k +e^−2uTⁿ h^dk_k +h²_k

·

Finally, combining inequalities (5.8), (5.9) and above, we deduce that with the choiceT_n= (lnn)^1+ε: EIˆ_k,h_k(X)−I_k(f_k)₂

≤C^k(k!)²

,

hence the Lemma.

(11)

5.2. Proof of Theorem 3.1

According to Jensen Inequality and Lemma5.3, we have by (3.4) and (2.7):

E ˆ

r_δ(X)−r(X)₂

=E

Y¯_n−EY + δ k=1

1 k!

Iˆ_k,h_k(X)−I_k(f_k)²

≤ 2var(Y)

n +δ

δ k=1

C^k

·

Setting h_k =

(lnn)^2+εn⁻¹_1/(2+dk)

, we deduce that, since var(Y)≤M: E

ˆ

r_δ(X)−r(X)₂

≤ 2M n +C

(lnn)^2+2ε n

_2/(2+dδ) ,

hence the theorem.

6. Proof of Theorem 3.3

In this section we assume for simplicity thatXcontains the hypercubeX0= [0,1]^d.

6.1. Technical results

We introduce the setF=Fδ(γ₂,f¯) of functionsf :X^δ→RinL²_sym(λ^⊗δ) for which conditions (2.8) and (2.9) hold, and letR=Rδ(γ₂,f¯) be the class of functionsr_f :S →Rwithf ∈ F such that

r_f(·) = 1 δ!

Δδ

fd(· −λ)^⊗δ. (6.1)

Letting P the distribution of the Poisson point process onX with unit intensity, we may deﬁne the following distanceD onRby

D(r_f₀, r_f₁) =rf0−r_f₁_L²_(P). (6.2) Whenever P∈ P, we can associate the regression functionr. To stress the dependency onr, we now writePr

instead ofP. Now letN >0. We deﬁne the following three conditions for any sequence of sizeN+ 1 of functions r⁽⁰⁾, . . . , r^(N⁾from S toR:

(R1) r^(j)∈ R, forj = 0, . . . , N;

(R2) D(r⁽ⁱ⁾, r^(j))≥2n^−1/(2+dδ), for 0≤i < j≤N;

(R3) 1 N

N j=1

K

P^⊗n_r(j),P^⊗n_r(0)

≤αlogN for some 0< α <1/8 whereKis the Kullback-Leibler divergence (seee.g.

Tsybakov [14]).

Lemma 6.1. Introduce f₀≡0, f₁, . . . , f_N, N+ 1 functions fromX^δ toRsuch that (F1) f_j ∈ F;

(F2) fi−f_j ≥2n^−1/(2+dδ); (F3) 1

N N

i=1

n

2fi²≤αlogN for some0< α <1/8.

Then, the sequence of functions r_f₀, . . . , r_f_N defined by (6.1) verify conditionsR1,R2 andR3.

(12)

Proof. First of all, remark that since f_j ∈ F for j = 0, . . . , N, by definition of the r_f_j’s and the set R, we have r_f_j ∈ R. Hence condition R1 is satisfied by the r_f_j’s. Now remark that the Itô isometry (3.1) gives for any 0≤i, j≤N

D(r_f_j, r_f_i) =fj−f_i.

This ensures that ther_f_j’s statisfy condition R2. Finally, for allj= 0, . . . , N K

P^⊗n_r_fj,P^⊗n_r_f₀

=nK(Pr_fj,Prf0)

=nErf0

logdPrf0

dPr_fj

(X, Y)

=nErf0Erf0

logdPrf0

dPr_fj

(X, Y)|X

,

whereErf0 is the expectation underPrf0. Denote bypthe density of theN(0,1). Then, sincef₀≡0 Erf0

logdPrf0

dPr_fj

(X, Y)|X

=

Rlog

p(u) p

u−r_f_j(X)

p(u)du.

Simple calculus then give

Erf0

logdPrf0

dPr_fj

(X, Y)|X

≤1 2

r_f_j(X)₂ .

Thus, by the Itˆo Isometry,

K

P^⊗n_r_fj,P^⊗n_r_f₀

≤n 2f_j²,

hence the lemma.

6.2. Proof of Theorem 3.3

Let P₀ be the subset of distributions Pr of (X, Y) in P for whichX is a Poisson point process with unit intensity (recall that the unit function lies in{ϕθ}_θ∈Θ, see Sect.2) and such that

Y =r(X) +ε, (6.3)

wherer∈ Randεis independent fromX with distributionN(0,1). SinceP0⊂ P, we have infr˜ sup

Pr∈P0

R_n(˜r, r)≤inf

˜ r sup

Pr∈PR_n(˜r, r).

As a result, in order to prove Theorem3.3, we need only to prove that lim inf

n→+∞n^2/(2+dδ)inf

˜ r sup

Pr∈P0

R_n(˜r, r)>0, which accordingly to (6.2) may be written

lim inf

n→+∞n^2/(2+dδ)inf

˜ r sup

Pr∈P0

Eⁿ_rD²(˜r, r)>0, (6.4)

(13)

whereEⁿ_r denotes expectation with respect toP^⊗n_r . Then, according to Lemma6.1and Theorem 2.5 page 99 in the book by Tsybakov [14], in order to prove (6.4), we need only to prove the existence of a sequence of functions satisfying conditions F1, F2 and F3 deﬁned in the previous subsection. To this end, we letψ∈L²_sym(λ^⊗δ) be a nonzero function such that Supp(ψ) =X^δ₀ and, for allx, y∈X^δ₀:

ψ(y)−ψ(x)≤ γ₂

2|y−x|and|ψ(x)| ≤f .¯ (6.5) LetQ=

c₀n^dδ/(2+dδ)

≥8 wherec₀ >0 and· is the integer part and leta_n = (1/Q)^1/(dδ). One may easily prove that there existst₁, . . . , t_Q in X^δ₀ such that the functions

ψ_q(·) =a_nψ · −t_q

a_n

, for q= 1, . . . , Q, verify the following assumptions

(1) Supp(ψ_q)⊂X^δ₀,forq= 1, . . . , Q;

(2) Supp(ψ_q)∩Supp(ψ_q) =∅, forq=q; (3) λ^⊗δ

Supp(ψ_q)

=Q⁻¹. Now let, for allω∈ {0,1}^Q:

f_ω(·) = Q q=1

ω_qψ_q(·),

According to the Varshamov−Gilbert Lemma (see the book by Tsybakov [14], Lem. 2.8 p. 104), there exists a subsetΩ={ω⁽⁰⁾, . . . , ω^(N⁾} of{0,1}^Qsuch thatω⁽⁰⁾= (0,0, . . .),N ≥2^Q/8,and for allj=k:

Q q=1

|ω_q^(j)−ω^(k)_q | ≥ Q 8· Now ﬁx 0< α <1/8 and set

c₀= (4ψ²α⁻¹)^dδ/(2+dδ).

We may now prove that functions{f_ω(j):j= 0, . . . , N}satisfy conditions F1, F2 and F3. First of all, letj=k be ﬁxed and remark that,

f_ω^(j)−f_ω(k)²≤

X^δ₀

f_ω(j)(x)−f_ω(k)(x)₂ dx

=

X^δ₀

^Q

q=1

ω^(j)_q −ω_q^(k) ψ_q(x)

₂ dx

= Q q=1

Supp(ψq)

ω_q^(j)−ω_q^(k)₂ a²_nψ²

x−t_q a_n

dx,

= a²_n Q

Q q=1

ω^(j)_q −ω_q^(k)

X^δ₀ψ²(x)dx.

Furthermore, by deﬁnition of the setΩ, we have Q

8 ≤ Q q=1

ω_q^(j)−ω_q^(k)≤Q,

(14)

so that

ψ²

8 a²_n≤ f_ω^(j)−f_ω(k)²≤ψ²a²_n. (6.6) Now let 0≤j≤N. Sinceψ∈L²_sym(λ^⊗δ) it is clear thatf_ω(j) inherits that property. Then, using the ﬁrst part of assumption (6.5) onψand assumption (ii) on theψ_q’s, one may easily prove thatf_ω(j) is a Lipschitz function with constantγ₂. Finally, using the second part of assumption (6.5) onψ and assumption (ii) on theψ_q’s, we get that for allx∈X^k,|f_ω(j)(x)| ≤a_nf¯witha_n ≤1 as soon asn≥c−(2+dδ)/(dδ)

0 . We conclude that f_ω(j) ∈ F so that condition F1 is satisﬁed byf_ω(j). Now, takingk= 0 in (6.6) gives

1 N

N j=1

n

2f_ω(j)²≤ nψ²

2 a²_n≤ψ²

2 c−(2+dδ)/(dδ)

0 Q.

SinceN ≥2^Q/8andc₀≥(4ψ²α⁻¹)^dδ/(2+dδ), we get 1

N N j=1

n

2f_ω^(j)²≤4ψ²c−(2+dδ)/(dδ)

0 logN ≤αlogN, so that F3 is satisﬁed. Finally, according to (6.6), we have

f_ω^(j)−f_ω(k) ≥ ψ 2√

2a_n = ψ 2√

2(2c₀)^−1/(dδ)n^−1/(2+dδ)·

We can now conclude that conditions F1, F2 and F3 are satisﬁed so that according to Theorem 2.5 page 99 from the book by Tsybakov [14] and Lemma6.1, (6.4) is veriﬁed. Theorem follows.

7. Proof of Theorem 4.1

In this section, we assume without loss of generality that the constantsϕ,γ₁,γ₂, ¯f,λ(X) andW are greater than 1 and thatϕandmare smaller than 1. Moreover,Cdenotes a positive number that only depends on the parameters of the model,i.e.m, u, ϕ, ϕ, δ, γ₁, γ₂,f , θ, κ,¯ λ(X), M andW, and whose value may change from line to line.

We letP∈ P(m), and for simplicity, we denoteE=E²ⁿ the expectation with respect toP^⊗2n. Recall that the pairs (X, Y),(X₁, Y₁), . . . ,(X_2n, Y_2n) are i.i.d. with distributionP.

7.1. Technical results

Fork≥1, denote for allx∈X^k: g_k(x) = 1

n n i=1

Y_i

Δk

W_b^⊗k_k (x−y)

X_i−ϕ_θ·λ_⊗k

(dy). (7.1)

We also let

S_k= 1 n

n i=1

|Yi|

X_i(X) +ϕλ(X)_k

. (7.2)

Lemma 7.1. We have, for all k:

ˆg_k−g_k ≤C^kS_k|θˆ−θ| b^dk/2_k ·