

HAL Id: hal-01430576

https://hal.archives-ouvertes.fr/hal-01430576

Submitted on 10 Jan 2017


Minimax regression estimation for Poisson coprocess

Benoît Cadre, Nicolas Klutchnikoff, Gaspar Massiot

To cite this version:

Benoît Cadre, Nicolas Klutchnikoff, Gaspar Massiot. Minimax regression estimation for Poisson coprocess. ESAIM: Probability and Statistics, EDP Sciences, 2017, 21, pp. 138-158. doi:10.1051/ps/2017004. hal-01430576


Minimax regression estimation for Poisson coprocess

Benoît CADRE

IRMAR, ENS Rennes, CNRS, UBL
Campus de Ker Lann, Avenue Robert Schuman, 35170 Bruz, France
benoit.cadre@ens-rennes.fr

Nicolas KLUTCHNIKOFF and Gaspar MASSIOT

IRMAR, Université de Rennes 2, CNRS, UBL
Campus Villejean, Place du recteur Henri le Moal, CS 24307, 35043 Rennes cedex, France
{klutchnikoff,massiot}@univ-rennes2.fr

Abstract

For a Poisson point process $X$, Itô's famous chaos expansion implies that every square integrable regression function $r$ with covariate $X$ can be decomposed as a sum of multiple stochastic integrals called chaos. In this paper, we consider the case where $r$ can be decomposed as a sum of $\delta$ chaos. In the spirit of Cadre and Truquet (2015), we introduce a semiparametric estimate of $r$ based on i.i.d. copies of the data. We investigate the asymptotic minimax properties of our estimator when $\delta$ is known. We also propose an adaptive procedure when $\delta$ is unknown.

Index Terms — Functional statistic, Poisson point process, regression estimate, Minimax estimation

AMS 2010 Classification — 62G08, 62H12, 62M30


1 Introduction

1.1 Regression estimation

Regression estimation is a central problem in statistics. It is widely used and studied in the literature. Among all the methods explored to deal with the regression problem, nonparametric statistics have been widely investigated (see the monographs by Tsybakov, 2009 for a full introduction to nonparametric estimation and Györfi et al., 2002 for a clear account on nonparametric regression). A more recent challenge regarding this statistical problem is regression onto a functional covariate (see the books by Ramsay and Silverman, 2005 and Horváth and Kokoszka, 2012 for more on functional data analysis). Although very challenging, the functional regression problem in the minimax setting has received little coverage, to our knowledge. In the kernel estimation setting, Mas (2012) studied small ball probabilities over some Hilbert spaces to derive minimax lower bounds at fixed points. More recently, Chagny and Roche (2016) derived minimax lower bounds at fixed points for adaptive nonparametric estimation of the regression function under some Wiener measure domination assumptions on the small ball probabilities. Based on the k-nearest neighbor approach, Biau, Cérou and Guyader (2010) used compact embedding theory to get bounds on the minimax risk. See also the references therein for a more complete overview.

1.2 Minimax regression for Poisson coprocess

In this paper, we focus on a regression problem for which the covariate is a Poisson point process. In the spirit of Cadre and Truquet (2015), we use a method based on the chaotic decomposition of Poisson functionals.

Let $X$ be a Poisson point process on a compact domain $\mathcal X\subset\mathbb R^d$ equipped with its Borel $\sigma$-algebra $\mathscr X$. Letting $\delta_x$ denote the Dirac measure at $x\in\mathcal X$, the state space is identified with $S=\{s=\sum_{i=1}^m\delta_{x_i} : m\in\mathbb N,\ x_i\in\mathcal X\}$, equipped with the smallest $\sigma$-algebra making the mappings $s\mapsto s(B)$ measurable for all Borel sets $B$ in $\mathcal X$. We denote by $P_X$ the distribution of $X$, and $L^2(P_X)$ denotes the space of all measurable functions $g:S\to\mathbb R$ such that
$$\|g\|_{L^2(P_X)}^2=\mathbb E\,g(X)^2<+\infty.$$

Let $P$ be a distribution on $S\times\mathbb R$ and let $(X,Y)$ be a pair with law $P$. Provided $\mathbb E|Y|<+\infty$, where $\mathbb E$ is the expectation with respect to $P$, we consider the regression function $r:S\to\mathbb R$ defined by $r(s)=\mathbb E(Y\mid X=s)$.


In this paper we assume that $r$ belongs to $L^2(P_X)$, and we aim at estimating $r$ on the basis of an i.i.d. sample randomly drawn from the distribution $P$ of $(X,Y)$.

In this context, any measurable map $\tilde r:(S\times\mathbb R)^n\to L^2(P_X)$ is an estimator, the accuracy of which is measured by the risk
$$R_n(\tilde r,r)=\mathbb E^n\,\|\tilde r-r\|_{L^2(P_X)}^2,$$

where $\mathbb E^n$ denotes the expectation with respect to the distribution $P^{\otimes n}$. Following the minimax approach, we define the maximal risk of $\tilde r$ over a class $\mathcal P$ of distributions for the random pair $(X,Y)$ by
$$R_n(\tilde r,\mathcal P)=\sup_{P\in\mathcal P}R_n(\tilde r,r).$$

We are interested in finding an estimator $\hat r$ such that
$$R_n(\hat r,\mathcal P)\asymp\inf_{\tilde r}R_n(\tilde r,\mathcal P),$$
where the infimum is taken over all possible estimates of $r$, and $u_n\asymp v_n$ stands for $0<\liminf_n u_nv_n^{-1}\le\limsup_n u_nv_n^{-1}<+\infty$. Such an estimate is called asymptotically minimax over $\mathcal P$.

1.3 Chaotic decomposition in the Poisson space

Roughly, Itˆo’s famous chaos expansion (see Itˆo, 1956 and Nualart and Vives, 1990 for technical details) says that every square integrable and σ(X)-measurable ran- dom variable can be decomposed as a sum of multiple stochastic integrals, called chaos.

To be more precise, we now recall some basic facts about chaos decomposition in the Poisson space. Let $\mu$ be the mean measure of the Poisson point process $X$, defined by $\mu(A)=\mathbb E X(A)$ for $A\in\mathscr X$, where $X(A)$ is the number of points of $X$ lying in $A$. Fix $k\ge1$. Provided $g\in L^2(\mu^{\otimes k})$, we can define the $k$-th chaos $I_k(g)$ associated with $g$, namely
$$I_k(g)=\int_{\Delta^k}g\,d(X-\mu)^{\otimes k},\qquad(1.1)$$
where $\Delta^k=\{x\in\mathcal X^k : x_i\ne x_j\ \text{for all}\ i\ne j\}$. In Nualart and Vives (1990), it is proved that every square integrable $\sigma(X)$-measurable random variable can be decomposed as an infinite sum of chaos. Applied to our regression problem, this statement writes as
$$r(X)=\mathbb E Y+\sum_{k\ge1}\frac{1}{k!}\,I_k(f_k),\qquad(1.2)$$
where equality holds in $L^2(P_X)$, provided $\mathbb E Y^2<\infty$. In the above formula, each $f_k$ is an element of $L^2_{\mathrm{sym}}(\mu^{\otimes k})$ (the subset of symmetric functions in $L^2(\mu^{\otimes k})$), and the decomposition is defined in a unique way.
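To make the objects in (1.1) and (1.2) concrete, here is a small numerical illustration (ours, not from the paper): a minimal Python sketch of the first chaos $I_1(g)=\int g\,d(X-\mu)$ for a homogeneous Poisson point process on $[0,1]$, where the intensity value, the test function and all helper names are illustrative assumptions. Averaged over many realizations it should be close to $0$, in line with $\mathbb E\,I_k(g)=0$.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ppp(intensity):
    """Homogeneous Poisson point process on [0,1]: Poisson count, then uniform locations."""
    return rng.random(rng.poisson(intensity))

def first_chaos(g, points, intensity, n_grid=10_000):
    """I_1(g) = int g d(X - mu) = sum_{x in X} g(x) - intensity * int_0^1 g(u) du."""
    grid = np.linspace(0.0, 1.0, n_grid)
    compensator = intensity * g(grid).mean()   # Riemann approximation of int_0^1 g
    return g(points).sum() - compensator

# E I_1(g) = 0: the empirical mean over many realizations is close to 0.
vals = [first_chaos(np.sin, sample_ppp(5.0), 5.0) for _ in range(2000)]
print(np.mean(vals))
```

For $k\ge2$, the integral in (1.1) runs over the off-diagonal tuples $\Delta^k$ only, which is what makes chaos of different orders orthogonal.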

1.4 Organization of the paper

In this paper we introduce a new estimator of the regression function $r$ based on independent copies of $(X,Y)$, and we study its minimax properties. Section 2 is devoted to the definition of a semiparametric model, i.e., the construction of the family $\mathcal P$ of distributions of $(X,Y)$. In particular, we assume that $r$ is a sum of $\delta$ chaos. In Section 3, we provide a lower bound for the minimax risk over $\mathcal P$. When $\delta$ is known, we prove that our estimator achieves this bound up to a logarithmic term. Finally, in Section 4, we define an adaptive procedure for unknown $\delta$, the risk of which is also proved to be optimal up to a logarithmic term. The last sections contain the proofs.

2 Model

In the rest of the paper, we let $\Theta\subset\mathbb R^p$. For each $\theta\in\Theta$, $\varphi_\theta:\mathcal X\to\mathbb R_+$ is a Borel function. The family $\{\varphi_\theta\}_{\theta\in\Theta}$ contains the constant function $\mathbf 1_{\mathcal X}/\lambda(\mathcal X)$, and is such that there exist three positive constants $\underline\varphi$, $\overline\varphi$ and $\gamma_1$ satisfying, for all $x,y\in\mathcal X$ and $\theta,\theta'\in\Theta$,
$$\underline\varphi\le\varphi_\theta(x)\le\overline\varphi,\qquad(2.1)$$
$$|\varphi_\theta(x)-\varphi_{\theta'}(x)|\le\overline\varphi\,|\theta-\theta'|,\qquad(2.2)$$
$$|\varphi_\theta(x)-\varphi_\theta(y)|\le\gamma_1\,|x-y|,\qquad(2.3)$$
where, here and in the following, $|\cdot|$ stands for the Euclidean norm.

Let $(X,Y)$ be a pair of random variables taking values in $S\times\mathbb R$ with distribution $P$, where $S$ is the Poisson space over the compact domain $\mathcal X\subset\mathbb R^d$. Here, $X$ is a Poisson point process on $\mathcal X$ with intensity $\varphi_\theta$, i.e., for all Borel sets $A\in\mathscr X$:
$$\mathbb E X(A)=\int_A\varphi_\theta\,d\lambda,\qquad(2.4)$$


where $\lambda$ is the Lebesgue measure and $\mathbb E$ is the expectation with respect to $P$. In other words, the mean measure of $X$, say $\mu$, has a Radon-Nikodym derivative $\varphi_\theta$ with respect to $\lambda$. We assume that for all $l\ge1$, there exists an estimator
$$\tilde\theta_l:(S\times\mathbb R)^l\to\Theta$$
such that
$$\mathbb E^l\,\big|\tilde\theta_l-\theta\big|^2\le\frac{\kappa}{l+1},\qquad(2.5)$$
where $\kappa>0$ is an absolute constant that does not depend on $l$, and $\mathbb E^l$ is the expectation with respect to $P^{\otimes l}$. As shown in Birgé (1983, Proposition 3.1), the above property is satisfied by a wide class of models, provided $\tilde\theta_l$ is a maximum likelihood estimate.
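As an illustration of (2.5) (ours, not from the paper), consider the hypothetical one-parameter homogeneous family $\varphi_\theta\equiv\theta$ on $\mathcal X=[0,1]$. The maximum likelihood estimate based on $l$ copies is the average number of points per copy, and its quadratic risk is exactly $\theta/l$, which is of the form $\kappa/(l+1)$ up to a constant. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def mle_homogeneous(counts):
    """MLE of theta for phi_theta = theta on [0,1]: average number of points per copy.
    Each count is Poisson(theta), so E|theta_hat - theta|^2 = theta / l."""
    return counts.mean()

theta, l = 3.0, 200
counts = rng.poisson(theta, size=l)   # the counts X_i([0,1]) of l i.i.d. copies
print(mle_homogeneous(counts))        # close to 3.0
```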

Moreover, the real-valued random variable $Y$ satisfies, for some $u,M>0$, the exponential moment condition
$$\mathbb E\,Y^2e^{u|Y|}\le M.\qquad(2.6)$$

As seen in (1.2), the regression function $r(s)=\mathbb E(Y\mid X=s)$ has a chaotic decomposition. In our model, we consider the case of a finite chaotic decomposition, i.e., there exist a strictly positive integer $\delta$ and $f_1\in L^2_{\mathrm{sym}}(\mu),\dots,f_\delta\in L^2_{\mathrm{sym}}(\mu^{\otimes\delta})$ such that
$$r(X)=\mathbb E Y+\sum_{k=1}^\delta\frac{1}{k!}\,I_k(f_k),\qquad(2.7)$$
where the $I_k(f_k)$'s are defined in (1.1). The coefficients $f_k$ of the chaos lie in a nonparametric family, for which there exist two strictly positive constants $\gamma_2$ and $\bar f$ such that for all $k=1,\dots,\delta$ and $x,y\in\mathcal X^k$,
$$|f_k(x)-f_k(y)|\le\gamma_2\,|x-y|,\qquad(2.8)$$
$$|f_k(x)|\le\bar f.\qquad(2.9)$$

In the rest of the paper, the constants $\underline\varphi$, $\overline\varphi$, $\gamma_1$, $u$, $M$, $\delta$, $\gamma_2$, $\bar f$ and $\kappa$ will be fixed, and we shall denote by $\mathcal P$ the set of distributions $P$ of $(X,Y)$ such that assumptions (2.1)-(2.9) are satisfied. In this setting, $\theta$ implicitly denotes the true value of the parameter, that is, $\varphi_\theta$ is the intensity of $X$ (with mean measure $\mu$).
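Covariates from this model can be simulated by thinning, which only uses the bounds (2.1); the intensity below is our own illustrative choice, not one from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_intensity(phi, phi_bar):
    """Thinning: draw a homogeneous PPP at rate phi_bar on [0,1], keep x with prob phi(x)/phi_bar."""
    proposal = rng.random(rng.poisson(phi_bar))
    keep = rng.random(len(proposal)) < phi(proposal) / phi_bar
    return proposal[keep]

phi_theta = lambda x: 2.0 + np.cos(2 * np.pi * x)   # an intensity with 1 <= phi_theta <= 3
X = sample_intensity(phi_theta, phi_bar=3.0)
print(len(X))   # on average, int_0^1 phi_theta = 2 points
```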


3 Minimax properties for known δ

3.1 Chaos estimator

Our task is to construct an estimate of the regression function which achieves fast rates over $\mathcal P$. Let $P\in\mathcal P$ and $(X,Y)\sim P$, where $X$ has mean measure $\mu=\varphi_\theta\cdot\lambda$. First, recall some basic facts about chaos decomposition in the Poisson space.

If g∈L2⊗k) andh∈L2⊗l)fork,l≥1, we have the so-called Itˆo Isometry Formula:

EIk(g)Il(f) =k!

Z

Xk

ghdµ⊗k1{k=l}andEIk(g) =0, (3.1) wheregandhare the symmetrizations ofgandh, that is, for all(x1, . . . ,xk)∈Xk:

g(x1, . . . ,xk) = 1 k!

σ

g(xσ(1), . . . ,xσ(k)), (3.2)

the sum being taken over all permutationsσ= σ(1), . . . ,σ(k)

of{1, . . . ,k}, and similarly forh.

Now let $\overline W$ be a strictly positive constant and let $W$ be a density on $\mathcal X$ such that $\sup_{\mathcal X}W\le\overline W$. Furthermore, let $h_k=h_k(n)>0$ be a bandwidth to be tuned later on, and denote
$$W_{h_k}(\cdot)=\frac{1}{h_k^d}\,W\Big(\frac{\cdot}{h_k}\Big).$$

One may easily deduce from relations (1.2) and (3.1) that
$$\mathbb E\,Y\,I_k\big(W_{h_k}^{\otimes k}(x-\cdot)\big)=\int_{\mathcal X^k}f_k\,W_{h_k}^{\otimes k}(x-\cdot)\,\varphi_\theta^{\otimes k}\,d\lambda^{\otimes k},$$
where, here and in the following, for any real-valued function $g$ defined on $\mathcal X$, the notation $g^{\otimes k}$ denotes the real-valued function on $\mathcal X^k$ such that
$$g^{\otimes k}(x)=\prod_{i=1}^k g(x_i),\qquad x=(x_1,\dots,x_k)\in\mathcal X^k.$$
Thus, under the smoothness assumptions (2.3) on $\varphi_\theta$ and (2.8) on $f_k$, the right-hand side converges to $f_k(x)\,\varphi_\theta^{\otimes k}(x)$, provided $h_k\to0$.


Now let $(X,Y),(X_1,Y_1),\dots,(X_n,Y_n)$ be i.i.d. with distribution $P$. Based on this observation, a semiparametric estimate, denoted $\hat I_{k,h_k}(X)$, of the $k$-th chaos $I_k(f_k)$ of (1.1) may be defined as follows:
$$\hat I_{k,h_k}(X)=\frac{1}{n}\sum_{i=1}^n Y_i\mathbf 1_{|Y_i|\le T_n}\int_{(\Delta^k)^2}\frac{W_{h_k}^{\otimes k}(x-y)}{\varphi_{\hat\theta_i}^{\otimes k}(x)}\,\big(X_i-\varphi_{\hat\theta_i}\cdot\lambda\big)^{\otimes k}(dy)\,\big(X-\varphi_{\hat\theta_i}\cdot\lambda\big)^{\otimes k}(dx),\qquad(3.3)$$
where $T_n>0$ is a truncation parameter to be tuned later on and the $\hat\theta_i$'s are the leave-one-out estimates defined by $\hat\theta_i=\tilde\theta_{n-1}\big((X_j)_{j\le n,\,j\ne i}\big)$ (see Section 2).
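To fix ideas, here is a minimal numerical sketch of the $k=1$ case of (3.3); it is ours rather than the authors', it assumes a homogeneous intensity estimate $\hat\theta_i$ on $[0,1]$, and the kernel, grid quadrature and helper names are illustrative choices:

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 1001)

def W_h(u, h):
    """Rescaled Epanechnikov kernel W_h(u) = W(u / h) / h."""
    return 0.75 * np.maximum(1.0 - (u / h) ** 2, 0.0) / h

def compensated_smooth(atoms, theta, h):
    """x -> int W_h(x - y) d(N - theta * Lebesgue)(y), evaluated on GRID, for a point set N."""
    pts = W_h(GRID[:, None] - atoms[None, :], h).sum(axis=1) if len(atoms) else np.zeros_like(GRID)
    lebesgue = theta * W_h(GRID[:, None] - GRID[None, :], h).mean(axis=1)   # grid quadrature
    return pts - lebesgue

def chaos1_estimate(data, X_new, theta_hats, h, T):
    """k = 1 case of (3.3): truncated Y_i times a doubly compensated kernel integral."""
    total = 0.0
    for (Xi, Yi), th in zip(data, theta_hats):
        if abs(Yi) > T:                                # truncation at T_n
            continue
        inner = compensated_smooth(Xi, th, h) / th     # y-integral, divided by phi_theta_hat(x)
        atoms_part = np.interp(X_new, GRID, inner).sum()
        lebesgue_part = th * inner.mean()              # theta_hat * int inner dx
        total += Yi * (atoms_part - lebesgue_part)
    return total / len(data)

# tiny smoke test with two observations and a new covariate (all made up)
rng = np.random.default_rng(3)
data = [(rng.random(5), 0.7), (rng.random(4), -0.2)]
print(chaos1_estimate(data, X_new=rng.random(6), theta_hats=[5.0, 4.0], h=0.1, T=10.0))
```

For $k\ge2$, the same double compensation applies coordinatewise, with the diagonal tuples excluded as in $(\Delta^k)^2$.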

3.2 Results

Based on the estimate (3.3) of the $k$-th chaos, we may define the following empirical mean type estimator of the regression function $r$, for any strictly positive integer $l$:
$$\hat r_l(X)=\overline Y_n+\sum_{k=1}^l\frac{1}{k!}\,\hat I_{k,h_k}(X),\qquad(3.4)$$
where $\overline Y_n$ is the empirical mean of $Y_1,\dots,Y_n$.
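Combining the chaos estimates is then immediate; a self-contained sketch of (3.4) with made-up inputs:

```python
import math

def r_hat(ybar, chaos_estimates):
    """Estimate (3.4): empirical mean of Y plus the estimated chaos, weighted by 1/k!."""
    return ybar + sum(I / math.factorial(k) for k, I in enumerate(chaos_estimates, start=1))

print(r_hat(0.5, [0.30, -0.12]))   # Ybar_n + I_1/1! + I_2/2! = 0.74
```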

In this subsection, we study the performance of the estimate $\hat r_\delta$ of the regression function from a minimax point of view, when the number of chaos $\delta$ is known.

Theorem 3.1. Let $\varepsilon>0$ and set $T_n=(\ln n)^{1+\varepsilon}$ and $h_k=(T_n^2\,n^{-1})^{1/(2+dk)}$. Then,
$$\limsup_{n\to+\infty}\Big(\frac{n}{(\ln n)^{2+2\varepsilon}}\Big)^{2/(2+d\delta)}\sup_{P\in\mathcal P}R_n(\hat r_\delta,r)<\infty.$$

Remark. Thus, the optimal rate of convergence over $\mathcal P$ is upper bounded by $\big((\ln n)^{2+2\varepsilon}\,n^{-1}\big)^{2/(2+d\delta)}$. Here it is noticeable that, up to a logarithmic factor, we recover the optimal rate $n^{-2/(2+d\delta)}$ corresponding to $d\delta$-dimensional regression with a Lipschitz regression function (see, e.g., Theorem 1 in Kohler et al., 2009).
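For concreteness, the tuning of Theorem 3.1 can be computed directly; a small helper of ours, with an arbitrary $\varepsilon$:

```python
import math

def tuning(n, d, k, eps=0.1):
    """T_n = (ln n)^(1 + eps) and h_k = (T_n^2 / n)^(1 / (2 + d*k)), as in Theorem 3.1."""
    T_n = math.log(n) ** (1 + eps)
    h_k = (T_n ** 2 / n) ** (1.0 / (2 + d * k))
    return T_n, h_k

print(tuning(n=10_000, d=1, k=2))   # the bandwidth shrinks more slowly as dk grows
```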

In our next result, we provide a lower bound for the optimal rate of convergence over $\mathcal P$, in order to assess the tightness of the upper bound obtained in Theorem 3.1.


Theorem 3.2. We have
$$\liminf_{n\to+\infty}n^{2/(2+d\delta)}\inf_{\tilde r}\sup_{P\in\mathcal P}R_n(\tilde r,r)>0,$$
where the infimum is taken over all estimates $\tilde r$.

Remark. Theorem 3.2 indicates that the optimal rate of convergence over $\mathcal P$ is lower bounded by $n^{-2/(2+d\delta)}$, which, up to a logarithmic factor, corresponds to the upper bound found in Theorem 3.1. As a conclusion, up to a logarithmic factor, the estimate $\hat r_\delta$ is asymptotically minimax over $\mathcal P$.

4 Adaptive properties for unknown δ

We now consider the case of an unknown number of chaos $\delta$. For $m>0$, we set
$$\mathcal P(m)=\big\{P\in\mathcal P : \|f_k\|\ge m\ \text{for}\ k=1,\dots,\delta\big\},$$
where $\|\cdot\|$ stands for the $L^2$-norm relative to the Lebesgue measure. Thus, whenever $P\in\mathcal P(m)$,
$$\delta=\min\big\{k : \|f_k\|=0\big\}-1.$$

Based on this observation, a natural estimate of $\delta$ may be obtained as follows. Assume that the dataset is of size $2n$, and let $(X_1,Y_1),\dots,(X_{2n},Y_{2n})$ be i.i.d. with distribution $P\in\mathcal P(m)$. For every $k\ge1$, we introduce the empirical counterpart of $f_k\,\varphi_\theta^{\otimes k}$ defined by
$$\hat g_k(x)=\frac{1}{n}\sum_{i=n+1}^{2n}Y_i\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\big(X_i-\varphi_{\hat\theta}\cdot\lambda\big)^{\otimes k}(dy),\qquad(4.1)$$
where $\hat\theta=\tilde\theta_n(X_{n+1},\dots,X_{2n})$ is defined in Section 2, and $b_k=b_k(n)$ is a bandwidth to be tuned later. The estimator $\hat\delta$ of $\delta$ is then defined by
$$\hat\delta=\min\big\{k : \|\hat g_k\|\le\rho_k\big\}-1,\qquad(4.2)$$
where $\rho_k=\rho_k(n)$ is a vanishing sequence of positive numbers that we choose later on. We may now define the adaptive estimator $\hat r$ of $r$ by
$$\hat r=\hat r_{\hat\delta},$$
where $\hat r_l$ is defined in (3.4) for all strictly positive integers $l$.
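The selection rule (4.2) is a first-crossing scan of the norms $\|\hat g_k\|$ against the thresholds $\rho_k$; a minimal sketch with made-up values:

```python
def select_delta(g_hat_norms, rhos):
    """Rule (4.2): delta_hat = min{k : ||g_hat_k|| <= rho_k} - 1, with k starting at 1."""
    for k, (norm_k, rho_k) in enumerate(zip(g_hat_norms, rhos), start=1):
        if norm_k <= rho_k:
            return k - 1
    return len(g_hat_norms)   # no norm fell below its threshold up to k_max

print(select_delta([2.3, 1.1, 0.02], [0.5, 0.5, 0.5]))   # -> 2
```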


Theorem 4.1. Let $\varepsilon>0$ and $d\delta\ge2$, let $\alpha,\beta>0$ be such that $\alpha+\beta<1$ and $2\alpha+\beta>1/(2+d\delta)$, and set $T_n=(\ln n)^{1+\varepsilon}$. Then, if we take, for all integers $k$, $h_k=(T_n^2\,n^{-1})^{1/(2+dk)}$, $\rho_k=\big(((2k)!)^2\,n^{-\alpha}\big)^{1/2}$ and $b_k=n^{-\beta/(2dk)}$, we obtain, for all $m>0$,
$$\limsup_{n\to+\infty}\Big(\frac{n}{(\ln n)^{2+2\varepsilon}}\Big)^{2/(2+d\delta)}\sup_{P\in\mathcal P(m)}R_n(\hat r,r)<+\infty.$$

Remark. Here it is noticeable that, despite the estimation of the number of chaos $\delta$, and up to a logarithmic factor, we recover the optimal rate $n^{-2/(2+d\delta)}$ of Theorems 3.1 and 3.2.

5 Proof of Theorem 3.1

In this section, we assume without loss of generality that the constants $\overline\varphi$, $\gamma_1$, $\gamma_2$, $\bar f$, $\lambda(\mathcal X)$ and $\overline W$ are greater than 1, and that $\underline\varphi$ is smaller than 1. Moreover, $C$ denotes a positive number that only depends on the parameters of the model, i.e., $u$, $\underline\varphi$, $\overline\varphi$, $\gamma_1$, $\gamma_2$, $\bar f$, $\delta$, $\theta$, $\kappa$, $\lambda(\mathcal X)$, $M$ and $\overline W$, and whose value may change from line to line.

We let $P\in\mathcal P$ and, for simplicity, we write $\mathbb E=\mathbb E^n$; $\mathrm{var}$ stands for the variance relative to $P^{\otimes n}$. Finally, let $(X,Y),(X_1,Y_1),\dots,(X_n,Y_n)$ be i.i.d. with distribution $P$.

5.1 Technical results

Let $k\ge1$ be fixed and denote, for all $x,y\in\mathcal X$ and $i=1,\dots,n$:
$$g_i(x,y)=\frac{W_{h_k}(x-y)}{\varphi_{\hat\theta_i}(x)}\qquad\text{and}\qquad g(x,y)=\frac{W_{h_k}(x-y)}{\varphi_\theta(x)}.\qquad(5.1)$$
We also let
$$d\hat X_i=dX_i-\varphi_{\hat\theta_i}\,d\lambda,\qquad d\hat X_i'=dX-\varphi_{\hat\theta_i}\,d\lambda,\qquad(5.2)$$
$$d\tilde X_i=dX_i-\varphi_\theta\,d\lambda\qquad\text{and}\qquad d\tilde X=dX-\varphi_\theta\,d\lambda.\qquad(5.3)$$


In this notation, we have (see (3.3)):
$$\hat I_{k,h_k}(X)=\frac{1}{n}\sum_{i=1}^n Y_i\mathbf 1_{|Y_i|\le T_n}\int_{(\Delta^k)^2}g_i^{\otimes k}(x,y)\,\hat X_i^{\otimes k}(dy)\,\hat X_i'^{\otimes k}(dx).$$
Furthermore, denote for all $x\in\mathcal X^k$:
$$Z_{i,k}(x)=Y_i\mathbf 1_{|Y_i|\le T_n}\int_{\Delta^k}g^{\otimes k}(x,y)\,\tilde X_i^{\otimes k}(dy).\qquad(5.4)$$

Lemma 5.1. Let $i=1,\dots,n$ and $k\le\delta$ be fixed. Then for all $x\in\mathcal X^k$:
$$\mathrm{var}\big(Z_{i,k}(x)\big)\le T_n^2\,\frac{k!\,C^k}{h_k^{dk}},$$
and
$$\big|\mathbb E Z_{i,k}(x)-f_k(x)\big|\le C^k\Big(\frac{\phi\,k!}{h_k^{dk}}\Big)^{1/2}+C^k h_k,$$
where $\phi=\mathbb E\,Y^2\mathbf 1_{|Y|>T_n}$.

Proof. On the one hand, by the isometry formula (3.1),
$$\mathrm{var}\big(Z_{i,k}(x)\big)\le T_n^2\,\mathbb E\Big(\int_{\Delta^k}g^{\otimes k}(x,y)\,\tilde X^{\otimes k}(dy)\Big)^2\le T_n^2\,k!\int_{\mathcal X^k}\overline{g^{\otimes k}}(x,y)^2\,\varphi_\theta^{\otimes k}(y)\,dy\le T_n^2\,\frac{\overline W^k\,\overline\varphi^k}{\underline\varphi^{2k}}\,\frac{k!}{h_k^{dk}},$$

where $\overline{g^{\otimes k}}(x,\cdot)$ is the symmetrization (see (3.2)) of the function $g^{\otimes k}(x,\cdot)$ defined in (5.1). On the other hand, denote
$$\tilde Z_{i,k}(x)=Y_i\int_{\Delta^k}g^{\otimes k}(x,y)\,\tilde X_i^{\otimes k}(dy);$$
then by the isometry formula (3.1):
$$\mathbb E\tilde Z_{i,k}(x)=\mathbb E\,r(X)\int_{\Delta^k}g^{\otimes k}(x,y)\,\tilde X^{\otimes k}(dy)=\int_{\mathcal X^k}f_k(y)\,g^{\otimes k}(x,y)\,\varphi_\theta^{\otimes k}(y)\,dy=\frac{1}{\varphi_\theta^{\otimes k}(x)}\int f_k(x-h_kz)\,W^{\otimes k}(z)\,\varphi_\theta^{\otimes k}(x-h_kz)\,dz.$$


Furthermore, by assumptions (2.1), (2.3), (2.8) and (2.9) on the model, we have for all $x,y\in\mathcal X^k$:
$$\big|f_k(x)\varphi_\theta^{\otimes k}(x)-f_k(y)\varphi_\theta^{\otimes k}(y)\big|\le\overline\varphi^{\,k}|f_k(x)-f_k(y)|+\bar f\,\big|\varphi_\theta^{\otimes k}(x)-\varphi_\theta^{\otimes k}(y)\big|\le\big(\overline\varphi^{\,k}\gamma_2+k\bar f\,\overline\varphi^{\,k-1}\gamma_1\big)|x-y|\le2k\bar f\,\overline\varphi^{\,k}\gamma_2\gamma_1\,|x-y|.$$
Hence, letting $\omega_k=\int_{\mathcal X^k}|z|\,W^{\otimes k}(z)\,dz$,
$$\big|\mathbb E Z_{i,k}(x)-f_k(x)\big|\le\big|\mathbb E\big(Z_{i,k}(x)-\tilde Z_{i,k}(x)\big)\big|+\big|\mathbb E\tilde Z_{i,k}(x)-f_k(x)\big|\le\mathbb E\Big|Y\mathbf 1_{|Y|>T_n}\int_{\Delta^k}g^{\otimes k}(x,y)\,\tilde X^{\otimes k}(dy)\Big|+\big|\mathbb E\tilde Z_{i,k}(x)-f_k(x)\big|\le\phi^{1/2}\Big(\mathbb E\Big(\int_{\Delta^k}g^{\otimes k}(x,y)\,\tilde X^{\otimes k}(dy)\Big)^2\Big)^{1/2}+\frac{2k\bar f\,\omega_k\,\overline\varphi^{\,k}\gamma_2\gamma_1}{\underline\varphi^{\,k}}\,h_k.$$
One last application of the isometry formula (3.1) to the first term on the right-hand side gives the lemma. $\square$

With the help of notations (5.1), (5.2) and (5.3), define
$$R_{i1k}=\mathbb E\Big(\int_{(\Delta^k)^2}g_i^{\otimes k}(x,y)\,\hat X_i^{\otimes k}(dy)\,\big[\hat X_i'^{\otimes k}(dx)-\tilde X^{\otimes k}(dx)\big]\Big)^2,\qquad(5.5)$$
$$R_{i2k}=\mathbb E\Big(\int_{(\Delta^k)^2}g_i^{\otimes k}(x,y)\,\big[\hat X_i^{\otimes k}(dy)-\tilde X_i^{\otimes k}(dy)\big]\,\tilde X^{\otimes k}(dx)\Big)^2.\qquad(5.6)$$

Lemma 5.2. Let $i=1,\dots,n$ and $k\le\delta$ be fixed. Then, for $j=1$ or $2$:
$$R_{ijk}\le\frac{C^k(k!)^2}{n\,h_k^{dk}}.$$

Proof. The proofs of the bounds for $R_{i1k}$ and $R_{i2k}$ being similar, we only prove the one for $R_{i1k}$. We have
$$R_{i1k}=\mathbb E\,\mathbb E\Big[\Big(\int_{(\Delta^k)^2}g_i^{\otimes k}(x,y)\,\hat X_i^{\otimes k}(dy)\,\big[\hat X_i'^{\otimes k}(dx)-\tilde X^{\otimes k}(dx)\big]\Big)^2\,\Big|\,(X_l)_{l\le n}\Big].$$
Using the independence of $X$ and $(X_l)_{l\le n}$, we can apply Lemma 4.2 from Cadre and Truquet (2015), which entails
$$R_{i1k}\le\sum_{j=0}^{k-1}j!\binom{k}{j}^2\overline\varphi^{\,j}\int_{\mathcal X^k}\mathbb E\Big[V_i^{k-j}\Big(\int_{\Delta^k}g_i^{\otimes k}(x,y)\,\hat X_i^{\otimes k}(dy)\Big)^2\Big]\,dx,\qquad(5.7)$$


where $V_i=\|\varphi_{\hat\theta_i}-\varphi_\theta\|^2$. Now let $x\in\mathcal X^k$ and $j=0,\dots,k-1$ be fixed. We have
$$\mathbb E\,V_i^{k-j}\Big(\int_{\Delta^k}g_i^{\otimes k}(x,y)\,\hat X_i^{\otimes k}(dy)\Big)^2\le2\,\mathbb E\,V_i^{k-j}\Big(\int_{\Delta^k}g_i^{\otimes k}(x,y)\,\big[\hat X_i^{\otimes k}(dy)-\tilde X_i^{\otimes k}(dy)\big]\Big)^2+2\,\mathbb E\,V_i^{k-j}\Big(\int_{\Delta^k}g_i^{\otimes k}(x,y)\,\tilde X_i^{\otimes k}(dy)\Big)^2.$$

We proceed to bound the two terms on the right-hand side. As before, we apply Lemma 4.2 from Cadre and Truquet (2015), but conditionally on $(X_l)_{l\le n,\,l\ne i}$. For notational simplicity, and since it does not change the result, we do not specify the symmetrized version of the functions when using the isometry formula. By (3.1) and assumption (2.1), we then get
$$\mathbb E\,V_i^{k-j}\Big(\int_{\Delta^k}g_i^{\otimes k}(x,y)\,\hat X_i^{\otimes k}(dy)\Big)^2\le2\sum_{l=0}^{k-1}l!\binom{k}{l}^2\overline\varphi^{\,l}\,\mathbb E\,V_i^{2k-l-j}\int_{\mathcal X^k}g_i^{\otimes k}(x,y)^2\,dy+2\,k!\,\overline\varphi^{\,k}\,\mathbb E\,V_i^{k-j}\int_{\mathcal X^k}g_i^{\otimes k}(x,y)^2\,dy.$$

Note that for all $m\ge1$, by (2.2), we have
$$V_i^m\le\big(\overline\varphi^2\lambda(\mathcal X)\big)^{m-1}\Big(\sup_{\mathcal X}\big|\varphi_{\hat\theta_i}-\varphi_\theta\big|^2\Big)\lambda(\mathcal X)\le\big(\overline\varphi^2\lambda(\mathcal X)\big)^m\,\big|\hat\theta_i-\theta\big|^2,$$
and
$$\int_{\mathcal X^k}g_i^{\otimes k}(x,y)^2\,dy\le\frac{\overline W^k}{\underline\varphi^{2k}\,h_k^{dk}}.$$
Hence, since $l!\binom{k}{l}\le k!$, we get with (2.5):

$$\mathbb E\,V_i^{k-j}\Big(\int_{\Delta^k}g_i^{\otimes k}(x,y)\,\hat X_i^{\otimes k}(dy)\Big)^2\le2\,k!\,\frac{\overline W^k}{\underline\varphi^{2k}}\,\frac{\kappa}{n\,h_k^{dk}}\,\big(\overline\varphi^2\lambda(\mathcal X)\big)^{k-j}\big(\overline\varphi+\overline\varphi^2\lambda(\mathcal X)\big)^k.$$

Finally, we deduce with similar arguments and inequality (5.7) that
$$R_{i1k}\le2(k!)^2\,\frac{\overline W^k}{\underline\varphi^{2k}}\,\frac{\kappa}{n\,h_k^{dk}}\,\big(\overline\varphi+\overline\varphi^2\lambda(\mathcal X)\big)^{2k}\le2(k!)^2\,4^k\,\frac{\overline W^k\,\overline\varphi^{4k}}{\underline\varphi^{2k}}\,\frac{\kappa}{n\,h_k^{dk}}\,\lambda(\mathcal X)^{2k},$$
because both $\overline\varphi$ and $\lambda(\mathcal X)$ are greater than 1. The lemma is proved. $\square$


Lemma 5.3. Let $\varepsilon>0$ be fixed and set $T_n=(\ln n)^{1+\varepsilon}$. Then, for all $k\le\delta$:
$$\mathbb E\big(\hat I_{k,h_k}(X)-I_k(f_k)\big)^2\le C^k(k!)^2\Big(\frac{(\ln n)^{2+2\varepsilon}}{n\,h_k^{dk}}+h_k^2\Big).$$

Proof. With the help of notations (5.1), (5.2) and (5.3), we let
$$J_1=\frac{1}{n}\sum_{i=1}^n Y_i\mathbf 1_{|Y_i|\le T_n}\int_{(\Delta^k)^2}g_i^{\otimes k}(x,y)\,\tilde X_i^{\otimes k}(dy)\,\tilde X^{\otimes k}(dx),\qquad J_2=\frac{1}{n}\sum_{i=1}^n Y_i\mathbf 1_{|Y_i|\le T_n}\int_{(\Delta^k)^2}g^{\otimes k}(x,y)\,\tilde X_i^{\otimes k}(dy)\,\tilde X^{\otimes k}(dx).$$
Then, using the notations of Lemma 5.2, by Jensen's inequality,
$$\mathbb E\big(\hat I_{k,h_k}(X)-J_1\big)^2\le\frac{2T_n^2}{n}\sum_{i=1}^n\big(R_{i1k}+R_{i2k}\big).$$
Hence, by Lemma 5.2,
$$\mathbb E\big(\hat I_{k,h_k}(X)-J_1\big)^2\le\frac{T_n^2\,C^k(k!)^2}{n\,h_k^{dk}}.\qquad(5.8)$$

Moreover, sequentially conditioning on $(X_l)_{l\le n}$, then on $(X_l)_{l\le n,\,l\ne i}$, and using assumption (2.1), we find with two successive applications of the isometry formula (3.1) that
$$\mathbb E(J_1-J_2)^2\le(k!)^2\,T_n^2\,\overline\varphi^{2k}\,\mathbb E\int_{\mathcal X^{2k}}\big(g_1^{\otimes k}(x,y)-g^{\otimes k}(x,y)\big)^2\,dx\,dy.$$
Now let $x,y\in\mathcal X^k$ be fixed. We have
$$\big|g_1^{\otimes k}(x,y)-g^{\otimes k}(x,y)\big|\le\frac{W_{h_k}^{\otimes k}(x-y)}{\varphi_\theta^{\otimes k}(x)}\,\frac{k\,\overline\varphi^{\,k-1}}{\underline\varphi^{\,k}}\,\sup_{\mathcal X}\big|\varphi_{\hat\theta_1}-\varphi_\theta\big|,$$
so that (2.2) and (2.5) give
$$\mathbb E(J_1-J_2)^2\le\frac{C^k\,T_n^2\,(k!)^2}{n\,h_k^{dk}}.\qquad(5.9)$$


Finally, using notation (5.4), by the isometry formula (3.1), we have
$$\mathbb E\big(J_2-I_k(f_k)\big)^2=\mathbb E\Big(\int_{\Delta^k}\Big[\frac{1}{n}\sum_{i=1}^n Z_{i,k}(x)-f_k(x)\Big]\tilde X^{\otimes k}(dx)\Big)^2=k!\int_{\mathcal X^k}\mathbb E\Big(\frac{1}{n}\sum_{i=1}^n Z_{i,k}(x)-f_k(x)\Big)^2\varphi_\theta^{\otimes k}(x)\,dx=k!\int_{\mathcal X^k}\Big[\frac{1}{n}\,\mathrm{var}\big(Z_{1,k}(x)\big)+\big(\mathbb E Z_{1,k}(x)-f_k(x)\big)^2\Big]\varphi_\theta^{\otimes k}(x)\,dx.$$
By Lemma 5.1, we thus get
$$\mathbb E\big(J_2-I_k(f_k)\big)^2\le\frac{T_n^2(k!)^2C^k}{n\,h_k^{dk}}+\frac{C^k(k!)^2\,\phi}{h_k^{dk}}+C^k\,k!\,h_k^2.$$
Moreover, (2.6) gives $\phi\le e^{-uT_n}\,\mathbb E\,Y^2e^{u|Y|}\le M\,e^{-uT_n}$. Consequently,
$$\mathbb E\big(J_2-I_k(f_k)\big)^2\le C^k(k!)^2\Big(\frac{T_n^2}{n\,h_k^{dk}}+\frac{e^{-uT_n}}{h_k^{dk}}+h_k^2\Big).$$
Finally, combining inequalities (5.8), (5.9) and the above, we deduce that with the choice $T_n=(\ln n)^{1+\varepsilon}$:
$$\mathbb E\big(\hat I_{k,h_k}(X)-I_k(f_k)\big)^2\le C^k(k!)^2\Big(\frac{(\ln n)^{2+2\varepsilon}}{n\,h_k^{dk}}+h_k^2\Big),$$
hence the lemma. $\square$

5.2 Proof of Theorem 3.1

According to Jensen's inequality and Lemma 5.3, we have by (3.4) and (2.7):
$$\mathbb E\big(\hat r_\delta(X)-r(X)\big)^2=\mathbb E\Big(\overline Y_n-\mathbb E Y+\sum_{k=1}^\delta\frac{1}{k!}\big(\hat I_{k,h_k}(X)-I_k(f_k)\big)\Big)^2\le\frac{2\,\mathrm{var}(Y)}{n}+\delta\sum_{k=1}^\delta C^k\Big(\frac{(\ln n)^{2+2\varepsilon}}{n\,h_k^{dk}}+h_k^2\Big).$$
Setting $h_k=\big((\ln n)^{2+2\varepsilon}\,n^{-1}\big)^{1/(2+dk)}$, we deduce, since $\mathrm{var}(Y)\le M$:
$$\mathbb E\big(\hat r_\delta(X)-r(X)\big)^2\le\frac{2M}{n}+C\Big(\frac{(\ln n)^{2+2\varepsilon}}{n}\Big)^{2/(2+d\delta)},$$
hence the theorem. $\square$


6 Proof of Theorem 3.2

In this section we assume, for simplicity, that $\mathcal X$ contains the hypercube $\mathcal X_0=[0,1]^d$.

6.1 Technical results

We introduce the set $\mathcal F=\mathcal F_\delta(\gamma_2,\bar f)$ of functions $f:\mathcal X^\delta\to\mathbb R$ in $L^2_{\mathrm{sym}}(\lambda^{\otimes\delta})$ for which conditions (2.8) and (2.9) hold, and let $\mathcal R=\mathcal R_\delta(\gamma_2,\bar f)$ be the class of functions $r_f:S\to\mathbb R$ with $f\in\mathcal F$ such that
$$r_f(\cdot)=\frac{1}{\delta!}\int_{\Delta^\delta}f\,d(\cdot-\lambda)^{\otimes\delta}.\qquad(6.1)$$
Letting $\mathbb P$ be the distribution of the Poisson point process on $\mathcal X$ with unit intensity, we may define the following distance $D$ on $\mathcal R$ by

$$D(r_{f_0},r_{f_1})=\|r_{f_0}-r_{f_1}\|_{L^2(\mathbb P)}.\qquad(6.2)$$
Whenever $P\in\mathcal P$, we can associate with it the regression function $r$; to stress the dependency on $r$, we now write $P_r$ instead of $P$. Now let $N>0$. We define the following three conditions for any sequence of $N+1$ functions $r^{(0)},\dots,r^{(N)}$ from $S$ to $\mathbb R$:

R1. $r^{(j)}\in\mathcal R$, for $j=0,\dots,N$;

R2. $D(r^{(i)},r^{(j)})\ge2n^{-1/(2+d\delta)}$, for $0\le i<j\le N$;

R3. $\displaystyle\frac{1}{N}\sum_{j=1}^N K\big(P_{r^{(j)}}^{\otimes n},P_{r^{(0)}}^{\otimes n}\big)\le\alpha\log N$ for some $0<\alpha<1/8$, where $K$ is the Kullback-Leibler divergence (see, e.g., Tsybakov, 2009).

Lemma 6.1. Introduce $f_0\equiv0$ and $f_1,\dots,f_N$, that is, $N+1$ functions from $\mathcal X^\delta$ to $\mathbb R$, such that:

F1. $f_j\in\mathcal F$;

F2. $\|f_i-f_j\|\ge2n^{-1/(2+d\delta)}$, for $i\ne j$;

F3. $\displaystyle\frac{1}{N}\sum_{i=1}^N\frac{n}{2}\|f_i\|^2\le\alpha\log N$ for some $0<\alpha<1/8$.


Then the functions $r_{f_0},\dots,r_{f_N}$ defined by (6.1) verify conditions R1, R2 and R3.

Proof. First of all, remark that since $f_j\in\mathcal F$ for $j=0,\dots,N$, by definition of the $r_{f_j}$'s and of the set $\mathcal R$, we have $r_{f_j}\in\mathcal R$. Hence condition R1 is satisfied by the $r_{f_j}$'s. Now remark that the Itô isometry (3.1) gives, for any $0\le i,j\le N$,
$$D(r_{f_j},r_{f_i})=\|f_j-f_i\|.$$

This ensures that therfj’s statisfy condition R2. Finally, for all j=0, . . . ,N K P⊗nrf j,P⊗nrf0

=nK (Prf j,Prf0)

=nErf0

logdPrf0

dPrf j

(X,Y)

=nErf0Erf0

logdPrf0

dPrf j

(X,Y)|X

,

whereErf0 is the expectation underPrf0. Denote by pthe density of theN (0,1).

Then, since f0≡0 Erf0

logdPrf0

dPrf j

(X,Y)|X

= Z

R

log

p(u) p u−rfj(X)

p(u)du.

A direct computation then gives
$$\mathbb E_{r_{f_0}}\Big[\log\frac{dP_{r_{f_0}}}{dP_{r_{f_j}}}(X,Y)\,\Big|\,X\Big]\le\frac{1}{2}\,r_{f_j}(X)^2.$$
Thus, by the Itô isometry,
$$K\big(P_{r_{f_j}}^{\otimes n},P_{r_{f_0}}^{\otimes n}\big)\le\frac{n}{2}\,\|f_j\|^2,$$
hence the lemma. $\square$
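For completeness, the computation behind this bound, which is in fact an equality for the standard Gaussian density $p$, is a one-line expansion:
$$\int_{\mathbb R}\log\frac{p(u)}{p(u-a)}\,p(u)\,du=\int_{\mathbb R}\Big(\frac{(u-a)^2}{2}-\frac{u^2}{2}\Big)p(u)\,du=\frac{a^2}{2}-a\int_{\mathbb R}u\,p(u)\,du=\frac{a^2}{2},$$
applied with $a=r_{f_j}(X)$; the factor $n$ then comes from the tensorization of the Kullback-Leibler divergence over the $n$ i.i.d. coordinates.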

6.2 Proof of Theorem 3.2

Let $\mathcal P_0$ be the subset of distributions $P_r$ of $(X,Y)$ in $\mathcal P$ for which $X$ is a Poisson point process with unit intensity (recall that the unit function lies in $\{\varphi_\theta\}_{\theta\in\Theta}$, see Section 2) and such that
$$Y=r(X)+\varepsilon,\qquad(6.3)$$


where $r\in\mathcal R$ and $\varepsilon$ is independent from $X$ with distribution $\mathcal N(0,1)$. Since $\mathcal P_0\subset\mathcal P$, we have
$$\inf_{\tilde r}\sup_{P_r\in\mathcal P_0}R_n(\tilde r,r)\le\inf_{\tilde r}\sup_{P_r\in\mathcal P}R_n(\tilde r,r).$$
As a result, in order to prove Theorem 3.2, we need only prove that
$$\liminf_{n\to+\infty}n^{2/(2+d\delta)}\inf_{\tilde r}\sup_{P_r\in\mathcal P_0}R_n(\tilde r,r)>0,$$
which, according to (6.2), may be written
$$\liminf_{n\to+\infty}n^{2/(2+d\delta)}\inf_{\tilde r}\sup_{P_r\in\mathcal P_0}\mathbb E_r^n\,D^2(\tilde r,r)>0,\qquad(6.4)$$
where $\mathbb E_r^n$ denotes expectation with respect to $P_r^{\otimes n}$. Then, according to Lemma 6.1 and Theorem 2.5 page 99 in the book by Tsybakov (2009), in order to prove (6.4), we need only prove the existence of a sequence of functions satisfying conditions F1, F2 and F3 defined in the previous subsection. To this end, we let $\psi\in L^2_{\mathrm{sym}}(\lambda^{\otimes\delta})$ be a nonzero function such that $\mathrm{Supp}(\psi)=\mathcal X_0^\delta$ and, for all $x,y\in\mathcal X_0^\delta$:

$$|\psi(y)-\psi(x)|\le\frac{\gamma_2}{2}\,|y-x|\quad\text{and}\quad|\psi(x)|\le\bar f.\qquad(6.5)$$
Let $Q=\big\lfloor c_0\,n^{d\delta/(2+d\delta)}\big\rfloor\ge8$, where $c_0>0$ and $\lfloor\cdot\rfloor$ is the integer part, and let $a_n=(1/Q)^{1/(d\delta)}$. One may easily prove that there exist $t_1,\dots,t_Q$ in $\mathcal X_0^\delta$ such that the functions
$$\psi_q(\cdot)=a_n\,\psi\Big(\frac{\cdot-t_q}{a_n}\Big),\qquad q=1,\dots,Q,$$
verify the following assumptions:

(i) $\mathrm{Supp}(\psi_q)\subset\mathcal X_0^\delta$, for $q=1,\dots,Q$;

(ii) $\mathrm{Supp}(\psi_q)\cap\mathrm{Supp}(\psi_{q'})=\emptyset$, for $q\ne q'$;

(iii) $\lambda^{\otimes\delta}\big(\mathrm{Supp}(\psi_q)\big)=Q^{-1}$.


Now let, for all $\omega\in\{0,1\}^Q$:
$$f_\omega(\cdot)=\sum_{q=1}^Q\omega_q\,\psi_q(\cdot).$$
According to the Varshamov-Gilbert lemma (see the book by Tsybakov, 2009, Lemma 2.8 page 104), there exists a subset $\Omega=\{\omega^{(0)},\dots,\omega^{(N)}\}$ of $\{0,1\}^Q$ such that $\omega^{(0)}=(0,0,\dots)$, $N\ge2^{Q/8}$, and for all $j\ne k$:
$$\sum_{q=1}^Q\big|\omega_q^{(j)}-\omega_q^{(k)}\big|\ge\frac{Q}{8}.$$
Now fix $0<\alpha<1/8$ and set
$$c_0=\big(4\|\psi\|^2\alpha^{-1}\big)^{d\delta/(2+d\delta)}.$$
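In the simplest case $d=\delta=1$, the construction can be made explicit; a short sketch of ours, with a sine bump standing in for $\psi$, that builds the $Q$ rescaled, disjointly supported functions $\psi_q$:

```python
import numpy as np

def bump_family(n, c0=1.0, d=1, delta=1):
    """Q = floor(c0 * n^(d*delta/(2+d*delta))) cells of volume 1/Q; psi_q = a_n * psi((x - t_q)/a_n)."""
    dd = d * delta
    Q = int(c0 * n ** (dd / (2.0 + dd)))
    a_n = (1.0 / Q) ** (1.0 / dd)
    psi = lambda u: np.sin(np.pi * u) * (u >= 0.0) * (u <= 1.0)   # Lipschitz bump supported on [0,1]
    t = a_n * np.arange(Q)              # left endpoints: the supports [t_q, t_q + a_n] are disjoint
    return [(lambda x, tq=tq: a_n * psi((x - tq) / a_n)) for tq in t], a_n, Q

psis, a_n, Q = bump_family(n=10_000)
print(Q, a_n)   # Q cells, each of length a_n = 1/Q when d = delta = 1
```

A binary word $\omega\in\{0,1\}^Q$ then selects which bumps enter $f_\omega$, and the Varshamov-Gilbert lemma guarantees a large, well-separated subfamily of such words.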

We may now prove that the functions $\{f_{\omega^{(j)}} : j=0,\dots,N\}$ satisfy conditions F1, F2 and F3. First of all, let $j\ne k$ be fixed and remark that
$$\|f_{\omega^{(j)}}-f_{\omega^{(k)}}\|^2=\int_{\mathcal X_0^\delta}\big(f_{\omega^{(j)}}(x)-f_{\omega^{(k)}}(x)\big)^2\,dx=\int_{\mathcal X_0^\delta}\Big(\sum_{q=1}^Q\big(\omega_q^{(j)}-\omega_q^{(k)}\big)\psi_q(x)\Big)^2\,dx=\sum_{q=1}^Q\int_{\mathrm{Supp}(\psi_q)}\big(\omega_q^{(j)}-\omega_q^{(k)}\big)^2\,a_n^2\,\psi^2\Big(\frac{x-t_q}{a_n}\Big)\,dx=\frac{a_n^2}{Q}\sum_{q=1}^Q\big|\omega_q^{(j)}-\omega_q^{(k)}\big|\int_{\mathcal X_0^\delta}\psi^2(x)\,dx.$$

Furthermore, by definition of the set $\Omega$, we have
$$\frac{Q}{8}\le\sum_{q=1}^Q\big|\omega_q^{(j)}-\omega_q^{(k)}\big|\le Q,$$
so that
$$\frac{\|\psi\|^2}{8}\,a_n^2\le\|f_{\omega^{(j)}}-f_{\omega^{(k)}}\|^2\le\|\psi\|^2\,a_n^2.\qquad(6.6)$$


Now let $0\le j\le N$. Since $\psi\in L^2_{\mathrm{sym}}(\lambda^{\otimes\delta})$, it is clear that $f_{\omega^{(j)}}$ inherits that property. Then, using the first part of assumption (6.5) on $\psi$ and assumption (ii) on the $\psi_q$'s, one may easily prove that $f_{\omega^{(j)}}$ is a Lipschitz function with constant $\gamma_2$. Finally, using the second part of assumption (6.5) on $\psi$ and assumption (ii) on the $\psi_q$'s, we get that for all $x\in\mathcal X^\delta$, $|f_{\omega^{(j)}}(x)|\le a_n\bar f$, with $a_n\le1$ as soon as $n\ge c_0^{-(2+d\delta)/(d\delta)}$. We conclude that $f_{\omega^{(j)}}\in\mathcal F$, so that condition F1 is satisfied by $f_{\omega^{(j)}}$. Now, taking $k=0$ in (6.6) gives
$$\frac{1}{N}\sum_{j=1}^N\frac{n}{2}\,\|f_{\omega^{(j)}}\|^2\le\frac{n\,\|\psi\|^2}{2}\,a_n^2\le\frac{\|\psi\|^2}{2}\,c_0^{-(2+d\delta)/(d\delta)}\,Q.$$

Since $N\ge2^{Q/8}$ and $c_0\ge\big(4\|\psi\|^2\alpha^{-1}\big)^{d\delta/(2+d\delta)}$, we get
$$\frac{1}{N}\sum_{j=1}^N\frac{n}{2}\,\|f_{\omega^{(j)}}\|^2\le4\|\psi\|^2\,c_0^{-(2+d\delta)/(d\delta)}\log N\le\alpha\log N,$$

so that F3 is satisfied. Finally, according to (6.6), we have
$$\|f_{\omega^{(j)}}-f_{\omega^{(k)}}\|\ge\frac{\|\psi\|}{2\sqrt2}\,a_n\ge\frac{\|\psi\|}{2\sqrt2}\,(2c_0)^{-1/(d\delta)}\,n^{-1/(2+d\delta)}.$$
We can now conclude that conditions F1, F2 and F3 are satisfied, so that, according to Theorem 2.5 page 99 from the book by Tsybakov (2009) and Lemma 6.1, (6.4) is verified. The theorem follows. $\square$

7 Proof of Theorem 4.1

In this section, we assume without loss of generality that the constants $\overline\varphi$, $\gamma_1$, $\gamma_2$, $\bar f$, $\lambda(\mathcal X)$ and $\overline W$ are greater than 1, and that $\underline\varphi$ and $m$ are smaller than 1. Moreover, $C$ denotes a positive number that only depends on the parameters of the model, i.e., $m$, $u$, $\underline\varphi$, $\overline\varphi$, $\delta$, $\gamma_1$, $\gamma_2$, $\bar f$, $\theta$, $\kappa$, $\lambda(\mathcal X)$, $M$ and $\overline W$, and whose value may change from line to line.

We let $P\in\mathcal P(m)$ and, for simplicity, we denote by $\mathbb E=\mathbb E^{2n}$ the expectation with respect to $P^{\otimes2n}$. Recall that $(X,Y),(X_1,Y_1),\dots,(X_{2n},Y_{2n})$ are i.i.d. with distribution $P$.


7.1 Technical results

For $k\ge1$, denote for all $x\in\mathcal X^k$:
$$g_k(x)=\frac{1}{n}\sum_{i=n+1}^{2n}Y_i\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\big(X_i-\varphi_\theta\cdot\lambda\big)^{\otimes k}(dy).\qquad(7.1)$$
We also let
$$S_k=\frac{1}{n}\sum_{i=n+1}^{2n}|Y_i|\,\big(X_i(\mathcal X)+\overline\varphi\,\lambda(\mathcal X)\big)^k.\qquad(7.2)$$

Lemma 7.1. We have, for all $k$:
$$\|\hat g_k-g_k\|\le\frac{C^k\,S_k\,|\hat\theta-\theta|}{b_k^{dk/2}}.$$

Proof. Let $\hat\mu$ be the signed measure
$$\hat\mu(dy)=\frac{1}{n}\sum_{i=n+1}^{2n}Y_i\Big[\big(X_i-\varphi_{\hat\theta}\cdot\lambda\big)^{\otimes k}(dy)-\big(X_i-\varphi_\theta\cdot\lambda\big)^{\otimes k}(dy)\Big]=\frac{1}{n}\sum_{i=n+1}^{2n}Y_i\sum_{l=0}^{k-1}\binom{k}{l}(-1)^{k-l}\prod_{j=1}^{l}X_i(dy_j)\Big[\prod_{j=l+1}^{k}\varphi_{\hat\theta}(y_j)\,dy_j-\prod_{j=l+1}^{k}\varphi_\theta(y_j)\,dy_j\Big].$$

Then,
$$\|\hat g_k-g_k\|^2=\int_{\mathcal X^k}\Big(\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\hat\mu(dy)\Big)^2dx=\int_{(\Delta^k)^2}\Big(\int_{\mathcal X^k}W_{b_k}^{\otimes k}\big(x-y^{(1)}\big)\,W_{b_k}^{\otimes k}\big(x-y^{(2)}\big)\,dx\Big)\hat\mu\big(dy^{(1)}\big)\,\hat\mu\big(dy^{(2)}\big).$$
But
$$\int_{\mathcal X^k}W_{b_k}^{\otimes k}\big(x-y^{(1)}\big)\,W_{b_k}^{\otimes k}\big(x-y^{(2)}\big)\,dx\le\frac{\overline W^k}{b_k^{dk}}.$$
Furthermore, one gets by induction that for $l=0,\dots,k-1$:
$$\Big|\prod_{j=l+1}^{k}\varphi_{\hat\theta}(y_j)-\prod_{j=l+1}^{k}\varphi_\theta(y_j)\Big|\le\overline\varphi^{\,k-l-1}\sum_{j=l+1}^{k}\big|\varphi_{\hat\theta}(y_j)-\varphi_\theta(y_j)\big|,$$


and by assumptions (2.1) and (2.2),
$$\int_{\mathcal X^{k-l}}\sum_{j=l+1}^{k}\big|\varphi_{\hat\theta}(y_j)-\varphi_\theta(y_j)\big|\,dy_{l+1}\dots dy_k\le(k-l)\,\lambda(\mathcal X)^{k-l-1}\int_{\mathcal X}\big|\varphi_{\hat\theta}-\varphi_\theta\big|\,d\lambda\le(k-l)\,\lambda(\mathcal X)^{k-l}\,\overline\varphi\,|\hat\theta-\theta|.$$

Putting all pieces together, we get
$$\|\hat g_k-g_k\|^2\le\frac{\overline W^k}{b_k^{dk}}\Big(\frac{1}{n}\sum_{i=n+1}^{2n}|Y_i|\sum_{l=0}^{k-1}k\binom{k-1}{l}X_i(\mathcal X)^l\,\overline\varphi^{\,k-l-1}\,\lambda(\mathcal X)^{k-l}\,|\hat\theta-\theta|\Big)^2.$$
We can then conclude that
$$\|\hat g_k-g_k\|^2\le\frac{\overline W^k\,k^2\,\lambda(\mathcal X)^2}{b_k^{dk}}\,|\hat\theta-\theta|^2\,\Big(\frac{1}{n}\sum_{i=n+1}^{2n}|Y_i|\,\big(X_i(\mathcal X)+\overline\varphi\,\lambda(\mathcal X)\big)^{k-1}\Big)^2.$$
The lemma follows, since $\overline\varphi\,\lambda(\mathcal X)\ge1$. $\square$

Denote, for all $i,j\ge0$,
$$s_{i,j}=\mathbb E\,|Y|^i\big(X(\mathcal X)+\overline\varphi\,\lambda(\mathcal X)\big)^j.\qquad(7.3)$$
Moreover, $(V_n)_n$ is a sequence of real numbers, greater than 1 and tending to infinity, to be tuned later.

Lemma 7.2. Let $l\ge0$ be fixed. Then for all $x\in\mathcal X^k$,
$$\mathbb E\,\Big|Y\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big|^{l}\le\frac{s_{l,lk}\,C^{lk}}{b_k^{ldk}}.$$
Moreover,
$$\mathbb E\Big(Y\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big)^2\le C^k\sqrt{s_{4,4k}}\,\Big(\frac{k!\,V_n^2}{b_k^{dk}}+\frac{e^{-uV_n/2}}{b_k^{2dk}}\Big).$$

Proof. First observe that
$$\Big|\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big|\le\frac{\overline W^k}{b_k^{dk}}\sum_{l=0}^{k}\binom{k}{l}\overline\varphi^{\,k-l}\,\lambda(\mathcal X)^{k-l}\,X(\mathcal X)^l\le\frac{\overline W^k}{b_k^{dk}}\,\big(X(\mathcal X)+\overline\varphi\,\lambda(\mathcal X)\big)^k.$$


Hence,
$$\mathbb E\,\Big|Y\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big|^{l}\le\frac{\overline W^{lk}\,s_{l,lk}}{b_k^{ldk}}.$$
Regarding the second inequality, we observe that

$$\mathbb E\Big(Y\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big)^2\le V_n^2\,\mathbb E\Big(\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big)^2+\mathbb E\Big(Y\mathbf 1_{|Y|>V_n}\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big)^2.$$

By (3.1),
$$\mathbb E\Big(\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big)^2\le\frac{k!\,C^k}{b_k^{dk}}.$$
Moreover, by the Cauchy-Schwarz inequality, (2.6) and the above,
$$\Big[\mathbb E\Big(Y\mathbf 1_{|Y|>V_n}\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big)^2\Big]^2\le P\big(|Y|>V_n\big)\;\mathbb E\Big(Y\int_{\Delta^k}W_{b_k}^{\otimes k}(x-y)\,\tilde X^{\otimes k}(dy)\Big)^4\le\frac{C^k\,s_{4,4k}\,e^{-uV_n}}{b_k^{4dk}}.$$
Putting all pieces together gives the result. $\square$

Lemma 7.3. Let $k>\delta$ be fixed. Then,
$$P(\hat\delta=k)\le\frac{C^k\big((2k)!+s_{1,k}\big)^2}{n\,\big(\rho_k\,b_k^{dk/2}\big)^2}+\frac{s_{2,2k}}{n\,\big((2k)!\big)^2}+\frac{C^k}{\rho_k^8}\,s_{8,8k}\,\max\Big\{\frac{n}{\big(n\,b_k^{dk}\big)^8},\ \frac{(k!)^4\,V_n^8}{\big(n\,b_k^{dk}\big)^4}+\frac{e^{-2uV_n}}{\big(n\,b_k^{2dk}\big)^4}\Big\}.$$

Proof. Since $k>\delta$, we have, with notation (7.1),
$$P(\hat\delta=k)\le P\big(\|\hat g_k\|>\rho_k\big)\le P\big(\|\hat g_k-g_k\|+\|g_k\|>\rho_k\big)\le P\big(2\|\hat g_k-g_k\|>\rho_k\big)+P\big(2\|g_k\|>\rho_k\big).\qquad(7.4)$$
