Kernel Selection in Nonparametric Regression


HÉLÈNE HALCONRUY* AND NICOLAS MARIE

Abstract. In the regression model $Y=b(X)+\varepsilon$, where $X$ has a density $f$, this paper deals with an oracle inequality for an estimator of $bf$, involving a kernel in the sense of Lerasle et al. (2016), selected via the PCO method. In addition to the bandwidth selection for kernel-based estimators, already studied in Lacour, Massart and Rivoirard (2017) and Comte and Marie (2020), the dimension selection for anisotropic projection estimators of $f$ and $bf$ is covered.

Contents

1. Introduction
2. Risk bound
3. Kernel selection
4. Basic numerical experiments
Appendix A. Details on kernel sets: proofs of Propositions 2.2, 2.3 and 3.3
A.1. Proof of Proposition 2.2
A.2. Proof of Proposition 2.3
A.3. Proof of Proposition 3.3
Appendix B. Proofs of risk bounds
B.1. Preliminary results
B.2. Proof of Proposition 2.4
B.3. Proof of Theorem 2.5
B.4. Proof of Theorem 3.2

MSC2010: 62G05 ; 62G08.

1. Introduction

Consider $n\in\mathbb N^*$ independent $\mathbb R^d\times\mathbb R$-valued ($d\in\mathbb N^*$) random variables $(X_1,Y_1),\dots,(X_n,Y_n)$, having the same probability distribution, assumed to be absolutely continuous with respect to Lebesgue's measure, and
$$\widehat s_{K,\ell}(n;x):=\frac{1}{n}\sum_{i=1}^{n}K(X_i,x)\ell(Y_i)\ ;\ x\in\mathbb R^d,$$
where $\ell:\mathbb R\rightarrow\mathbb R$ is a Borel function and $K$ is a symmetric continuous map from $\mathbb R^d\times\mathbb R^d$ into $\mathbb R$. This is an estimator of the function $s:\mathbb R^d\rightarrow\mathbb R$ defined by
$$s(x):=\mathbb E(\ell(Y_1)\mid X_1=x)f(x)\ ;\ \forall x\in\mathbb R^d,$$
where $f$ is a density of $X_1$. For $\ell=1$, $\widehat s_{K,\ell}(n;\cdot)$ coincides with the estimator of $f$ studied in Lerasle et al. [11], but for $\ell\neq 1$, it covers estimators involved in nonparametric regression. Assume that for every $i\in\{1,\dots,n\}$,
$$(1)\qquad Y_i=b(X_i)+\varepsilon_i,$$
where $\varepsilon_i$ is a centered random variable, independent of $X_i$, and $b:\mathbb R^d\rightarrow\mathbb R$ is a Borel function.

Key words and phrases. Nonparametric estimators ; Projection estimators ; Model selection ; Regression model.




• If $\ell=\mathrm{Id}_{\mathbb R}$, $k$ is a symmetric kernel and
$$(2)\qquad K(x',x)=\prod_{q=1}^{d}\frac{1}{h_q}k\left(\frac{x'_q-x_q}{h_q}\right)\quad\text{with}\quad h_1,\dots,h_d>0$$
for every $x,x'\in\mathbb R^d$, then $\widehat s_{K,\ell}(n;\cdot)$ is the numerator of the Nadaraya-Watson estimator of the regression function $b$. Precisely, $\widehat s_{K,\ell}(n;\cdot)$ is an estimator of $s=bf$. If $\ell\neq\mathrm{Id}_{\mathbb R}$, then $\widehat s_{K,\ell}(n;\cdot)$ is the numerator of the estimator studied in Einmahl and Mason [5, 6].
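To complete this example: dividing the numerator $\widehat s_{K,\mathrm{Id}_{\mathbb R}}(n;\cdot)$ by the kernel density estimator $\widehat s_{K,1}(n;\cdot)$ gives the usual Nadaraya-Watson estimator of $b$ (the notation $\widehat b_K$ below is ours, not the paper's):
$$\widehat b_K(n;x)=\frac{\widehat s_{K,\mathrm{Id}_{\mathbb R}}(n;x)}{\widehat s_{K,1}(n;x)}=\frac{\sum_{i=1}^{n}K(X_i,x)Y_i}{\sum_{i=1}^{n}K(X_i,x)},$$
defined at every $x$ where the denominator doesn't vanish.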

• If $\ell=\mathrm{Id}_{\mathbb R}$, $\mathcal B_{m_q}=\{\varphi_1^{m_q},\dots,\varphi_{m_q}^{m_q}\}$ ($m_q\in\mathbb N^*$ and $q\in\{1,\dots,d\}$) is an orthonormal family of $L^2(\mathbb R)$ and
$$(3)\qquad K(x',x)=\prod_{q=1}^{d}\sum_{j=1}^{m_q}\varphi_j^{m_q}(x_q)\varphi_j^{m_q}(x'_q)$$
for every $x,x'\in\mathbb R^d$, then $\widehat s_{K,\ell}(n;\cdot)$ is the projection estimator on $S=\operatorname{span}(\mathcal B_{m_1}\otimes\cdots\otimes\mathcal B_{m_d})$ of $s=bf$.

Now, assume that for every $i\in\{1,\dots,n\}$, $Y_i$ is defined by the heteroscedastic model
$$(4)\qquad Y_i=\sigma(X_i)\varepsilon_i,$$
where $\varepsilon_i$ is a centered random variable of variance 1, independent of $X_i$, and $\sigma:\mathbb R^d\rightarrow\mathbb R$ is a Borel function. If $\ell(x)=x^2$ for every $x\in\mathbb R$, then $\widehat s_{K,\ell}(n;\cdot)$ is an estimator of $s=\sigma^2f$.
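To fix ideas, here is a minimal numerical sketch of the generic estimator $\widehat s_{K,\ell}(n;\cdot)$ with a product kernel of the form (2); the code and all names in it (`product_kernel`, `s_hat`) are ours, and the Gaussian choice of $k$ is an assumption made for the example:

```python
import numpy as np

def product_kernel(k, h):
    """Kernel (2): K(x', x) = prod_q (1/h_q) * k((x'_q - x_q) / h_q)."""
    h = np.asarray(h, dtype=float)
    def K(x_prime, x):
        u = (np.atleast_2d(x_prime) - np.atleast_2d(x)) / h
        return np.prod(k(u) / h, axis=-1)
    return K

def s_hat(K, ell, X, Y, x):
    """Generic estimator: s_hat_{K,l}(n; x) = (1/n) * sum_i K(X_i, x) * l(Y_i)."""
    return np.mean(K(X, x) * ell(Y))

# Toy check with d = 1 and l = Id_R (numerator of the Nadaraya-Watson estimator),
# data drawn from Model (1) with b = b_2 of Section 4:
gauss = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (250, 1))
Y = np.cos(5 * np.pi * X[:, 0]) + rng.normal(size=250)
K = product_kernel(gauss, h=[0.1])
print(s_hat(K, lambda y: y, X, Y, np.array([0.5])))
```

Taking `ell = lambda y: 1.0` (i.e. $\ell=1$) turns the same function into the density estimator of Lerasle et al. [11], and `ell = lambda y: y**2` into the estimator of $\sigma^2 f$ under Model (4).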

Over the last ten years, several data-driven procedures have been proposed to select the bandwidth of the Parzen-Rosenblatt estimator ($\ell=1$ and $K$ defined by (2)). First, Goldenshluger-Lepski's method, introduced in [8], which reaches the appropriate bias-variance compromise but is not completely satisfactory on the numerical side (see Comte and Rebafka [4]). More recently, in [10], Lacour, Massart and Rivoirard proposed the PCO (Penalized Comparison to Overfitting) method and proved an oracle inequality for the associated adaptive Parzen-Rosenblatt estimator, using a concentration inequality for U-statistics due to Houdré and Reynaud-Bouret [9]. Together with Varet, they established the numerical efficiency of the PCO method in Varet et al. [13].

Comte and Marie [3] deal with an oracle inequality and numerical experiments for an adaptive Nadaraya-Watson estimator whose numerator and denominator have distinct bandwidths, both selected via the PCO method. Since the output variable in a regression model has no reason to be bounded, there were significant additional difficulties, bypassed in [3], in establishing an oracle inequality for the numerator's adaptive estimator. Via similar arguments, the present article deals with an oracle inequality for $\widehat s_{\widehat K,\ell}(n;\cdot)$, where $\widehat K$ is selected via the PCO method in the spirit of Lerasle et al. [11]. In addition to the bandwidth selection for kernel-based estimators already studied in [10, 3], it covers the dimension selection for anisotropic projection estimators of $f$, $bf$ (when $Y_1,\dots,Y_n$ are defined by Model (1)) and $\sigma^2f$ (when $Y_1,\dots,Y_n$ are defined by Model (4)). As for the bandwidth selection for kernel-based estimators, for $d>1$, the PCO method allows one to bypass the numerical difficulties generated by Goldenshluger-Lepski type methods involved in anisotropic model selection procedures (see Chagny [1]).

In Section 2, some examples of kernel sets are provided and a risk bound for $\widehat s_{K,\ell}(n;\cdot)$ is established. Section 3 deals with an oracle inequality for $\widehat s_{\widehat K,\ell}(n;\cdot)$, where $\widehat K$ is selected via the PCO method. Finally, Section 4 deals with a basic numerical study.

2. Risk bound

Throughout the paper, $s\in L^2(\mathbb R^d)$. Let $\mathcal K_n$ be a set of symmetric continuous maps from $\mathbb R^d\times\mathbb R^d$ into $\mathbb R$, of cardinality at most $n$, fulfilling the following assumption.

Assumption 2.1. There exists a deterministic constant $\mathfrak m_{\mathcal K,\ell}>0$, not depending on $n$, such that:

(1) For every $K\in\mathcal K_n$,
$$\sup_{x'\in\mathbb R^d}\|K(x',\cdot)\|_2^2\le\mathfrak m_{\mathcal K,\ell}\,n.$$

(2) For every $K\in\mathcal K_n$,
$$\|s_{K,\ell}\|_2^2\le\mathfrak m_{\mathcal K,\ell}\quad\text{with}\quad s_{K,\ell}:=\mathbb E(\widehat s_{K,\ell}(n;\cdot))=\mathbb E(K(X_1,\cdot)\ell(Y_1)).$$

(3) For every $K,K'\in\mathcal K_n$,
$$\mathbb E(\langle K(X_1,\cdot),K'(X_2,\cdot)\ell(Y_2)\rangle_2^2)\le\mathfrak m_{\mathcal K,\ell}\,\mathfrak s_{K',\ell}\quad\text{with}\quad\mathfrak s_{K',\ell}:=\mathbb E(\|K'(X_1,\cdot)\ell(Y_1)\|_2^2).$$

(4) For every $K\in\mathcal K_n$ and $\psi\in L^2(\mathbb R^d)$,
$$\mathbb E(\langle K(X_1,\cdot),\psi\rangle_2^2)\le\mathfrak m_{\mathcal K,\ell}\|\psi\|_2^2.$$

The elements of $\mathcal K_n$ are called kernels. Let us provide two natural examples of kernel sets.

Proposition 2.2. Consider
$$\mathcal K_k(h_{\min}):=\left\{(x',x)\mapsto\prod_{q=1}^{d}\frac{1}{h_q}k\left(\frac{x'_q-x_q}{h_q}\right)\ ;\ h_1,\dots,h_d\in\{h_{\min},\dots,1\}\right\},$$
where $k$ is a symmetric kernel (in the usual sense) and $nh_{\min}^d\ge1$. The kernel set $\mathcal K_k(h_{\min})$ fulfills Assumption 2.1 and, for any $K\in\mathcal K_k(h_{\min})$ such that
$$K(x',x)=\prod_{q=1}^{d}\frac{1}{h_q}k\left(\frac{x'_q-x_q}{h_q}\right)\ ;\ \forall x,x'\in\mathbb R^d,$$
with $h_1,\dots,h_d\in\{h_{\min},\dots,1\}$,
$$\mathfrak s_{K,\ell}=\|k\|_2^{2d}\,\mathbb E(\ell(Y_1)^2)\prod_{q=1}^{d}\frac{1}{h_q}.$$
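As a worked special case (ours, not part of the proposition): for $d=1$ and $k$ the standard Gaussian density, $\|k\|_2^2=\int_{-\infty}^{\infty}\frac{1}{2\pi}e^{-x^2}dx=\frac{1}{2\sqrt\pi}$, so the formula above reads
$$\mathfrak s_{K,\ell}=\frac{\mathbb E(\ell(Y_1)^2)}{2\sqrt\pi\,h},$$
which makes explicit how the variance term $\mathfrak s_{K,\ell}/n$ appearing in the risk bounds below grows as the bandwidth $h$ decreases.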

Proposition 2.3. Consider
$$\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max}):=\left\{(x',x)\mapsto\prod_{q=1}^{d}\sum_{j=1}^{m_q}\varphi_j^{m_q}(x_q)\varphi_j^{m_q}(x'_q)\ ;\ m_1,\dots,m_d\in\{1,\dots,m_{\max}\}\right\},$$
where $m_{\max}^d\in\{1,\dots,n\}$ and, for every $m\in\{1,\dots,n\}$, $\mathcal B_m=\{\varphi_1^m,\dots,\varphi_m^m\}$ is an orthonormal family of $L^2(\mathbb R)$ such that
$$\sup_{x'\in\mathbb R}\sum_{j=1}^{m}\varphi_j^m(x')^2\le\mathfrak m_{\mathcal B}\,m$$
with $\mathfrak m_{\mathcal B}>0$ not depending on $m$ and $n$, and
$$(5)\qquad\mathcal B_m\subset\mathcal B_{m+1}\ ;\ \forall m\in\{1,\dots,n-1\}$$
or
$$(6)\qquad\overline{\mathfrak m}_{\mathcal B}:=\sup\{|\mathbb E(K(X_1,x))|\ ;\ K\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})\ \text{and}\ x\in\mathbb R^d\}\ \text{is finite and doesn't depend on}\ n.$$
The kernel set $\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ fulfills Assumption 2.1 and, for any $K\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ such that
$$K(x',x)=\prod_{q=1}^{d}\sum_{j=1}^{m_q}\varphi_j^{m_q}(x_q)\varphi_j^{m_q}(x'_q)\ ;\ \forall x,x'\in\mathbb R^d,$$
with $m_1,\dots,m_d\in\{1,\dots,m_{\max}\}$,
$$\mathfrak s_{K,\ell}\le\mathfrak m_{\mathcal B}^d\,\mathbb E(\ell(Y_1)^2)\prod_{q=1}^{d}m_q.$$

Remark. Note that Condition (5) (resp. (6)) is close to (resp. the same as) Condition (19) (resp. (20)) of Lerasle et al. [11], Proposition 3.2. See also Massart [12], Chapter 7, on these conditions. For instance, the trigonometric basis and the Hermite basis satisfy Condition (5). The regular histogram basis satisfies Condition (6). Indeed, by taking $\varphi_j^m=\psi_j^m:=\sqrt m\,\mathbf 1_{[(j-1)/m,j/m[}$ for every $m\in\{1,\dots,n\}$ and $j\in\{1,\dots,m\}$,
$$\mathbb E\left(\prod_{q=1}^{d}\sum_{j=1}^{m_q}\psi_j^{m_q}(X_{1,q})\psi_j^{m_q}(x_q)\right)=\sum_{j_1=1}^{m_1}\cdots\sum_{j_d=1}^{m_d}\left(\prod_{q=1}^{d}m_q\mathbf 1_{[(j_q-1)/m_q,j_q/m_q[}(x_q)\right)\int_{(j_1-1)/m_1}^{j_1/m_1}\cdots\int_{(j_d-1)/m_d}^{j_d/m_d}f(x'_1,\dots,x'_d)dx'_1\cdots dx'_d$$
$$\le\|f\|_{\infty}\prod_{q=1}^{d}\sum_{j=1}^{m_q}\mathbf 1_{[(j-1)/m_q,j/m_q[}(x_q)\le\|f\|_{\infty}$$
for every $m_1,\dots,m_d\in\{1,\dots,n\}$ and $x\in\mathbb R^d$.

The following proposition provides a suitable control of the variance of $\widehat s_{K,\ell}(n;\cdot)$.

Proposition 2.4. Under Assumption 2.1.(1,2,3), if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha>0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|))<\infty$, then there exists a deterministic constant $c_{2.4}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K\in\mathcal K_n}\left\{\|\widehat s_{K,\ell}(n;\cdot)-s_{K,\ell}\|_2^2-\frac{\mathfrak s_{K,\ell}}{n}-\frac{\theta}{n}\mathfrak s_{K,\ell}\right\}\right)\le c_{2.4}\frac{\log(n)^5}{\theta n}.$$

Finally, let us state the main result of this section.

Theorem 2.5. Under Assumption 2.1, if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha>0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|))<\infty$, then there exists a deterministic constant $c_{2.5}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K\in\mathcal K_n}\left\{\|\widehat s_{K,\ell}(n;\cdot)-s\|_2^2-(1+\theta)\left(\|s_{K,\ell}-s\|_2^2+\frac{\mathfrak s_{K,\ell}}{n}\right)\right\}\right)\le c_{2.5}\frac{\log(n)^5}{\theta n}$$
and
$$\mathbb E\left(\sup_{K\in\mathcal K_n}\left\{\|s_{K,\ell}-s\|_2^2+\frac{\mathfrak s_{K,\ell}}{n}-\frac{1}{1-\theta}\|\widehat s_{K,\ell}(n;\cdot)-s\|_2^2\right\}\right)\le c_{2.5}\frac{\log(n)^5}{\theta(1-\theta)n}.$$

Remark. Note that the first inequality in Theorem 2.5 gives a risk bound on the estimator $\widehat s_{K,\ell}(n;\cdot)$:
$$\mathbb E(\|\widehat s_{K,\ell}(n;\cdot)-s\|_2^2)\le(1+\theta)\left(\|s_{K,\ell}-s\|_2^2+\frac{\mathfrak s_{K,\ell}}{n}\right)+c_{2.5}\frac{\log(n)^5}{\theta n}$$
for every $\theta\in]0,1[$. The second inequality is useful in order to establish a risk bound on the adaptive estimator defined in the next section (see Theorem 3.2).

3. Kernel selection

This section deals with a risk bound on the adaptive estimator $\widehat s_{\widehat K,\ell}(n;\cdot)$, where
$$\widehat K\in\arg\min_{K\in\mathcal K_n}\{\|\widehat s_{K,\ell}(n;\cdot)-\widehat s_{K_0,\ell}(n;\cdot)\|_2^2+\mathrm{pen}(K)\},$$
$K_0$ is an overfitting proposal for $K$ and
$$(7)\qquad\mathrm{pen}(K):=\frac{2}{n^2}\sum_{i=1}^{n}\langle K(\cdot,X_i),K_0(\cdot,X_i)\rangle_2\,\ell(Y_i)^2\ ;\ \forall K\in\mathcal K_n.$$


Example. For $\mathcal K_n=\mathcal K_k(h_{\min})$, one should take
$$K_0(x',x)=\frac{1}{h_{\min}^d}\prod_{q=1}^{d}k\left(\frac{x'_q-x_q}{h_{\min}}\right)\ ;\ \forall x,x'\in\mathbb R^d,$$
and for $\mathcal K_n=\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$, one should take
$$K_0(x',x)=\prod_{q=1}^{d}\sum_{j=1}^{m_{\max}}\varphi_j^{m_{\max}}(x_q)\varphi_j^{m_{\max}}(x'_q)\ ;\ \forall x,x'\in\mathbb R^d.$$
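Here is a minimal sketch of this selection rule for $\mathcal K_n=\mathcal K_k(h_{\min})$ with $d=1$; it is our illustration, not the authors' implementation, and the $L^2$ norms and inner products of (7) are approximated by Riemann sums on a regular grid:

```python
import numpy as np

def pco_bandwidth(k, bandwidths, ell, X, Y, grid):
    """PCO over K_k(h_min) for d = 1: K_h(x', x) = k((x' - x) / h) / h,
    with the overfitting proposal K_0 = K_{h_min}."""
    n, dx = len(X), grid[1] - grid[0]
    lY = ell(Y)
    # K_h(X_i, x) for all observations i and grid points x: shape (n, len(grid))
    K_at = lambda h: k((X[:, None] - grid[None, :]) / h) / h
    K0 = K_at(min(bandwidths))
    s0 = K0.T @ lY / n                         # s_hat_{K_0,l} on the grid
    best_h, best_crit, best_s = None, np.inf, None
    for h in bandwidths:
        Kh = K_at(h)
        s_h = Kh.T @ lY / n                    # s_hat_{K_h,l} on the grid
        # pen(K_h) = (2 / n^2) * sum_i <K_h(., X_i), K_0(., X_i)>_2 * l(Y_i)^2, see (7)
        pen = 2.0 / n**2 * np.sum((Kh * K0).sum(axis=1) * dx * lY**2)
        crit = np.sum((s_h - s0)**2) * dx + pen
        if crit < best_crit:
            best_h, best_crit, best_s = h, crit, s_h
    return best_h, best_s
```

The comparison to the most overfitting estimator $\widehat s_{K_0,\ell}(n;\cdot)$ plus the penalty (7) is exactly the criterion defining $\widehat K$ above, up to the grid approximation.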

In the sequel, in addition to Assumption 2.1, the kernel set $\mathcal K_n$ fulfills the following assumption.

Assumption 3.1. There exists a deterministic constant $\mathfrak m_{\mathcal K,\ell}>0$, not depending on $n$, such that
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_n}\langle K(X_1,\cdot),s_{K',\ell}\rangle_2^2\right)\le\mathfrak m_{\mathcal K,\ell}.$$

The following theorem provides an oracle inequality for the adaptive estimator $\widehat s_{\widehat K,\ell}(n;\cdot)$.

Theorem 3.2. Under Assumptions 2.1 and 3.1, if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha>0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|))<\infty$, then there exists a deterministic constant $c_{3.2}>0$, not depending on $n$, such that for every $\vartheta\in]0,1[$,
$$\mathbb E(\|\widehat s_{\widehat K,\ell}(n;\cdot)-s\|_2^2)\le(1+\vartheta)\min_{K\in\mathcal K_n}\mathbb E(\|\widehat s_{K,\ell}(n;\cdot)-s\|_2^2)+\frac{c_{3.2}}{\vartheta}\left(\|s_{K_0,\ell}-s\|_2^2+\frac{\log(n)^5}{n}\right).$$

Finally, let us discuss Assumption 3.1. Note that if $s$ is bounded and
$$\mathfrak m_{\mathcal K}:=\sup\{\|K(x',\cdot)\|_1^2\ ;\ K\in\mathcal K_n\ \text{and}\ x'\in\mathbb R^d\}$$
doesn't depend on $n$, then $\mathcal K_n$ fulfills Assumption 3.1. Indeed,
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_n}\langle K(X_1,\cdot),s_{K',\ell}\rangle_2^2\right)\le\left(\sup_{K'\in\mathcal K_n}\|s_{K',\ell}\|_{\infty}^2\right)\mathbb E\left(\sup_{K\in\mathcal K_n}\|K(X_1,\cdot)\|_1^2\right)$$
$$\le\mathfrak m_{\mathcal K}\sup\left\{\left(\int_{\mathbb R^d}|K'(x',x)s(x)|dx\right)^2\ ;\ K'\in\mathcal K_n\ \text{and}\ x'\in\mathbb R^d\right\}\le\mathfrak m_{\mathcal K}^2\|s\|_{\infty}^2.$$
In the nonparametric regression framework (see Model (1)), to assume $s$ bounded means that $bf$ is bounded. For instance, this condition is fulfilled by linear regression models with Gaussian inputs.

The following examples focus on the condition on $\mathfrak m_{\mathcal K}$.

Examples:

(1) Consider $K\in\mathcal K_k(h_{\min})$. Then, there exist $h_1,\dots,h_d\in\{h_{\min},\dots,1\}$ such that
$$K(x',x)=\prod_{q=1}^{d}\frac{1}{h_q}k\left(\frac{x'_q-x_q}{h_q}\right)\ ;\ \forall x,x'\in\mathbb R^d.$$
Clearly, $\|K(x',\cdot)\|_1=\|k\|_1^d$ for every $x'\in\mathbb R^d$. So, for $\mathcal K_n=\mathcal K_k(h_{\min})$, $\mathfrak m_{\mathcal K}\le\|k\|_1^{2d}$.

(2) For $\mathcal K_n=\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$, the condition on $\mathfrak m_{\mathcal K}$ seems harder to check in general. Let us show that it is satisfied for the regular histogram basis defined in Section 2. For every $m_1,\dots,m_d\in\{1,\dots,n\}$,
$$\left\|\prod_{q=1}^{d}\sum_{j=1}^{m_q}\psi_j^{m_q}(x'_q)\psi_j^{m_q}(\cdot_q)\right\|_1\le\prod_{q=1}^{d}\left(m_q\sum_{j=1}^{m_q}\mathbf 1_{[(j-1)/m_q,j/m_q[}(x'_q)\int_{(j-1)/m_q}^{j/m_q}dx\right)\le1.$$

The following proposition shows that $\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ fulfills Assumption 3.1 for the trigonometric basis, even if the condition on $\mathfrak m_{\mathcal K}$ is not satisfied.


Proposition 3.3. Consider $\chi_1:=\mathbf 1_{[0,1]}$ and, for every $j\in\mathbb N^*$, the functions $\chi_{2j}$ and $\chi_{2j+1}$ defined on $\mathbb R$ by
$$\chi_{2j}(x):=\sqrt2\cos(2\pi jx)\mathbf 1_{[0,1]}(x)\quad\text{and}\quad\chi_{2j+1}(x):=\sqrt2\sin(2\pi jx)\mathbf 1_{[0,1]}(x)\ ;\ \forall x\in\mathbb R.$$
If $s\in C^2(\mathbb R^d)$ and $\mathcal B_m=\{\chi_1,\dots,\chi_m\}$ for every $m\in\{1,\dots,n\}$, then $\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ fulfills Assumption 3.1.

4. Basic numerical experiments

Throughout this section, $d=1$, $\ell\in\{1,\mathrm{Id}_{\mathbb R}\}$ and $Y_1,\dots,Y_n$ are defined by Model (1) with $\varepsilon_1,\dots,\varepsilon_n$ i.i.d. $\mathcal N(0,1)$. Some numerical experiments on $\widehat s_{K,1}(n;\cdot)$ (resp. $\widehat s_{K,\mathrm{Id}_{\mathbb R}}(n;\cdot)$) for $K\in\mathcal K_k(h_{\min})$ have already been done in Varet et al. [13] (resp. Comte and Marie [3]). So, this section deals with basic numerical experiments on $\widehat s_{K,1}(n;\cdot)$ and $\widehat s_{K,\mathrm{Id}_{\mathbb R}}(n;\cdot)$ for $K\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$ and $\mathcal B_m=\{\psi_1^m,\dots,\psi_m^m\}$ for every $m=1,\dots,n$.

In this case, $\widehat K=K_{\widehat m(\ell)}$, where
$$K_m(x',x):=\sum_{j=1}^{m}\psi_j^m(x')\psi_j^m(x)\ ;\ \forall x,x'\in\mathbb R,\ \forall m\in\mathcal M=\{1,\dots,m_{\max}\},$$
$\widehat m(\ell)$ is a solution of the minimization problem
$$\min_{m\in\mathcal M}\{\|\widehat s_{K_m,\ell}(n;\cdot)-\widehat s_{K_{m_{\max}},\ell}(n;\cdot)\|_2^2+\mathrm{pen}(m)\}$$
and
$$\mathrm{pen}(m):=\frac{2}{n^2}\sum_{i=1}^{n}\langle K_m(\cdot,X_i),K_{m_{\max}}(\cdot,X_i)\rangle_2\,\ell(Y_i)^2\ ;\ \forall m\in\mathcal M.$$

For $\ell\in\{1,\mathrm{Id}_{\mathbb R}\}$, $n=250$ and $m_{\max}=30$, $m$ is selected in $\mathcal M$ for two basic densities and two nonlinear regression functions (a numerical sketch of this experiment follows the list):

• $f=f_1$, the density of $\mathcal E(5)$.
• $f=f_2$, the density of $\mathcal N(1/2,(1/8)^2)$.
• $b(x)=b_1(x):=10(x^2-1/2)$ for every $x\in[0,1]$.
• $b(x)=b_2(x):=\cos(5\pi x)$ for every $x\in[0,1]$.
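Here is a minimal sketch of one run of this experiment; the code is ours (not the authors' implementation) and approximates the $L^2$ quantities by Riemann sums on a grid of $[0,1[$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m_max = 250, 30
grid = np.linspace(0.0, 1.0, 1000, endpoint=False)
dx = grid[1] - grid[0]

def psi(x, m):
    """Regular histogram basis: psi_j^m = sqrt(m) * 1_{[(j-1)/m, j/m[}, j = 1,...,m."""
    return np.sqrt(m) * (np.floor(m * np.asarray(x))[..., None] == np.arange(m))

def s_hat(X, lY, m, x):
    """s_hat_{K_m,l}(n; x) with K_m(x', x) = sum_j psi_j^m(x') * psi_j^m(x)."""
    return psi(x, m) @ (psi(X, m).T @ lY) / len(X)

def m_pco(X, Y, ell):
    """Select m in M = {1,...,m_max} by PCO against the overfitting proposal m_max."""
    lY = ell(Y)
    s_max = s_hat(X, lY, m_max, grid)
    K_max = psi(X, m_max) @ psi(grid, m_max).T     # K_{m_max}(X_i, x) on the grid
    crit = np.empty(m_max)
    for m in range(1, m_max + 1):
        K_m = psi(X, m) @ psi(grid, m).T           # K_m(X_i, x) on the grid
        pen = 2.0 / n**2 * np.sum((K_m * K_max).sum(axis=1) * dx * lY**2)
        crit[m - 1] = np.sum((s_hat(X, lY, m, grid) - s_max)**2) * dx + pen
    return int(np.argmin(crit)) + 1

# One run with f = f_2 and b = b_1 (Model (1)):
X = rng.normal(1/2, 1/8, n)
Y = 10 * (X**2 - 1/2) + rng.normal(size=n)
print(m_pco(X, Y, np.ones_like), m_pco(X, Y, lambda y: y))   # m_hat(1), m_hat(Id_R)
```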

On the one hand, on the four following figures, one can see the beam of all possible estimations of $f$ and $bf$ (i.e. for each $m\in\mathcal M$) on the left, the PCO criteria for $\widehat s_{K,1}(n;\cdot)$ and $\widehat s_{K,\mathrm{Id}_{\mathbb R}}(n;\cdot)$ for each $m\in\mathcal M$ in the middle, and the PCO estimations of $f$ and $bf$ (i.e. for $m=\widehat m(1)$ and $m=\widehat m(\mathrm{Id}_{\mathbb R})$) on the right:

Figure 1. $f=f_1$, $b=b_1$, $\widehat m(1)=10$ and $\widehat m(\mathrm{Id}_{\mathbb R})=10$.


Figure 2. $f=f_1$, $b=b_2$, $\widehat m(1)=12$ and $\widehat m(\mathrm{Id}_{\mathbb R})=20$.

Figure 3. $f=f_2$, $b=b_1$, $\widehat m(1)=10$ and $\widehat m(\mathrm{Id}_{\mathbb R})=15$.

Figure 4. $f=f_2$, $b=b_2$, $\widehat m(1)=15$ and $\widehat m(\mathrm{Id}_{\mathbb R})=6$.


On the other hand, for $(f,b)=(f_1,b_2)$ and $(f,b)=(f_2,b_1)$, let us generate 10 datasets of $n=250$ observations of $(X_1,Y_1)$ and, for each of these, select $m\in\mathcal M$ via the PCO criterion introduced previously. On the two following figures, the beam of all PCO estimations of $f$ (resp. $bf$) is plotted on the left (resp. on the right):

Figure 5. $f=f_1$ and $b=b_2$.

Figure 6. $f=f_2$ and $b=b_1$.

Appendix A. Details on kernel sets: proofs of Propositions 2.2, 2.3 and 3.3

A.1. Proof of Proposition 2.2. Consider $K,K'\in\mathcal K_k(h_{\min})$. Then, there exist $h,h'\in\{h_{\min},\dots,1\}^d$ such that

$$K(x',x)=\prod_{q=1}^{d}\frac{1}{h_q}k\left(\frac{x'_q-x_q}{h_q}\right)\quad\text{and}\quad K'(x',x)=\prod_{q=1}^{d}\frac{1}{h'_q}k\left(\frac{x'_q-x_q}{h'_q}\right)$$
for every $x,x'\in\mathbb R^d$.

(1) For every $x'\in\mathbb R^d$,
$$\|K(x',\cdot)\|_2^2=\|k\|_2^{2d}\prod_{q=1}^{d}\frac{1}{h_q}\le\|k\|_2^{2d}\,n.$$

(2) Since $s_{K,\ell}=K*s$, $\|s_{K,\ell}\|_2^2\le\|k\|_1^{2d}\|s\|_2^2$.

(3) First,
$$\mathfrak s_{K',\ell}=\|k\|_2^{2d}\,\mathbb E(\ell(Y_1)^2)\prod_{q=1}^{d}\frac{1}{h'_q}.$$
Then,
$$\mathbb E(\langle K(X_1,\cdot),K'(X_2,\cdot)\ell(Y_2)\rangle_2^2)=\mathbb E((K*K')(X_1-X_2)^2\ell(Y_2)^2)\le\|f\|_{\infty}\|K*K'\|_2^2\,\mathbb E(\ell(Y_1)^2)\le\|f\|_{\infty}\|k\|_1^{2d}\,\mathfrak s_{K',\ell}.$$

(4) For every $\psi\in L^2(\mathbb R^d)$,
$$\mathbb E(\langle K(X_1,\cdot),\psi\rangle_2^2)=\mathbb E((K*\psi)(X_1)^2)\le\|f\|_{\infty}\|K*\psi\|_2^2\le\|f\|_{\infty}\|k\|_1^{2d}\|\psi\|_2^2.$$

A.2. Proof of Proposition 2.3. Consider $K,K'\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$. Then, there exist $m,m'\in\{1,\dots,m_{\max}\}^d$ such that
$$K(x',x)=\prod_{q=1}^{d}\sum_{j=1}^{m_q}\varphi_j^{m_q}(x_q)\varphi_j^{m_q}(x'_q)\quad\text{and}\quad K'(x',x)=\prod_{q=1}^{d}\sum_{j=1}^{m'_q}\varphi_j^{m'_q}(x_q)\varphi_j^{m'_q}(x'_q)$$
for every $x,x'\in\mathbb R^d$.

(1) For every $x'\in\mathbb R^d$,
$$\|K(x',\cdot)\|_2^2=\prod_{q=1}^{d}\sum_{j,j'=1}^{m_q}\varphi_{j'}^{m_q}(x'_q)\varphi_j^{m_q}(x'_q)\int_{-\infty}^{\infty}\varphi_{j'}^{m_q}(x)\varphi_j^{m_q}(x)dx=\prod_{q=1}^{d}\sum_{j=1}^{m_q}\varphi_j^{m_q}(x'_q)^2\le\mathfrak m_{\mathcal B}^d\prod_{q=1}^{d}m_q\le\mathfrak m_{\mathcal B}^d\,n.$$

(2) Since
$$s_{K,\ell}(\cdot)=\sum_{j_1=1}^{m_1}\cdots\sum_{j_d=1}^{m_d}\langle s,\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d}\rangle_2(\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d})(\cdot),$$
by Pythagoras' theorem, $\|s_{K,\ell}\|_2^2\le\|s\|_2^2$.

(3) First,
$$\mathfrak s_{K',\ell}=\mathbb E\left(\ell(Y_1)^2\prod_{q=1}^{d}\sum_{j=1}^{m'_q}\varphi_j^{m'_q}(X_{1,q})^2\right)\le\mathfrak m_{\mathcal B}^d\,\mathbb E(\ell(Y_1)^2)\prod_{q=1}^{d}m'_q.$$
On the one hand, if $\mathcal B_1,\dots,\mathcal B_n$ satisfy Condition (5), then
$$\mathbb E(\langle K(X_1,\cdot),K'(X_2,\cdot)\ell(Y_2)\rangle_2^2)=\int_{\mathbb R^d}\mathbb E\left(\left(\prod_{q=1}^{d}\sum_{j=1}^{m_q\wedge m'_q}\varphi_j^{m'_q}(x'_q)\varphi_j^{m'_q}(X_{2,q})\right)^2\ell(Y_2)^2\right)f(x')\lambda_d(dx')$$
$$\le\|f\|_{\infty}\,\mathbb E\left(\ell(Y_2)^2\prod_{q=1}^{d}\sum_{j,j'=1}^{m_q\wedge m'_q}\varphi_{j'}^{m'_q}(X_{2,q})\varphi_j^{m'_q}(X_{2,q})\int_{-\infty}^{\infty}\varphi_{j'}^{m'_q}(x')\varphi_j^{m'_q}(x')dx'\right)\le\|f\|_{\infty}\,\mathfrak s_{K',\ell}.$$
On the other hand, if $\mathcal B_1,\dots,\mathcal B_n$ satisfy Condition (6), then
$$\mathbb E(\langle K(X_1,\cdot),K'(X_2,\cdot)\ell(Y_2)\rangle_2^2)\le\mathbb E(\|K(X_1,\cdot)\|_2^2\,\|K'(X_2,\cdot)\|_2^2\,\ell(Y_2)^2)=\mathbb E(K(X_1,X_1))\,\mathbb E(\|K'(X_2,\cdot)\|_2^2\,\ell(Y_2)^2)\le\overline{\mathfrak m}_{\mathcal B}\,\mathfrak s_{K',\ell}.$$

(4) For every $\psi\in L^2(\mathbb R^d)$,
$$\mathbb E(\langle K(X_1,\cdot),\psi\rangle_2^2)=\mathbb E\left(\left(\sum_{j_1=1}^{m_1}\cdots\sum_{j_d=1}^{m_d}\langle\psi,\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d}\rangle_2(\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d})(X_1)\right)^2\right)$$
$$\le\|f\|_{\infty}\left\|\sum_{j_1=1}^{m_1}\cdots\sum_{j_d=1}^{m_d}\langle\psi,\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d}\rangle_2(\varphi_{j_1}^{m_1}\otimes\cdots\otimes\varphi_{j_d}^{m_d})(\cdot)\right\|_2^2\le\|f\|_{\infty}\|\psi\|_2^2.$$

A.3. Proof of Proposition 3.3. For the sake of readability, assume that $d=1$. Consider $K,K'\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})$. Then, there exist $m,m'\in\{1,\dots,m_{\max}\}$ such that
$$K(x',x)=\sum_{j=1}^{m}\chi_j(x)\chi_j(x')\quad\text{and}\quad K'(x',x)=\sum_{j=1}^{m'}\chi_j(x)\chi_j(x')\ ;\ \forall x,x'\in\mathbb R.$$

First, there exist $\overline m_1(m,m')\in\{0,\dots,n\}$ and $c_1>0$, not depending on $n$, $K$ and $K'$, such that for any $x'\in[0,1]$,
$$|\langle K(x',\cdot),s_{K',\ell}\rangle_2|=\left|\sum_{j=1}^{m\wedge m'}\mathbb E(\ell(Y_1)\chi_j(X_1))\chi_j(x')\right|$$
$$\le c_1+2\left|\sum_{j=1}^{\overline m_1(m,m')}\mathbb E\big(\ell(Y_1)(\cos(2\pi jX_1)\cos(2\pi jx')+\sin(2\pi jX_1)\sin(2\pi jx'))\mathbf 1_{[0,1]}(X_1)\big)\right|$$
$$=c_1+2\left|\sum_{j=1}^{\overline m_1(m,m')}\mathbb E(\ell(Y_1)\cos(2\pi j(X_1-x'))\mathbf 1_{[0,1]}(X_1))\right|.$$
Moreover, for any $j\in\{1,\dots,\overline m_1(m,m')\}$,
$$\mathbb E(\ell(Y_1)\cos(2\pi j(X_1-x'))\mathbf 1_{[0,1]}(X_1))=\int_0^1\cos(2\pi j(x-x'))s(x)dx$$
$$=\frac{1}{j}\left[\frac{\sin(2\pi j(x-x'))}{2\pi}s(x)\right]_0^1+\frac{1}{j^2}\left[\frac{\cos(2\pi j(x-x'))}{4\pi^2}s'(x)\right]_0^1-\frac{1}{j^2}\int_0^1\frac{\cos(2\pi j(x-x'))}{4\pi^2}s''(x)dx$$
$$=\frac{s(0)-s(1)}{2\pi}\cdot\frac{\alpha_j(x')}{j}+\frac{\beta_j(x')}{j^2},$$
where $\alpha_j(x'):=\sin(2\pi jx')$ and
$$\beta_j(x'):=\frac{1}{4\pi^2}\left((s'(1)-s'(0))\cos(2\pi jx')-\int_0^1\cos(2\pi j(x-x'))s''(x)dx\right).$$
Then, there exists a deterministic constant $c_2>0$, not depending on $n$, $K$, $K'$ and $x'$, such that
$$(8)\qquad\langle K(x',\cdot),s_{K',\ell}\rangle_2^2\le c_2\left(1+\left(\sum_{j=1}^{\overline m_1(m,m')}\frac{\alpha_j(x')}{j}\right)^2+\left(\sum_{j=1}^{\overline m_1(m,m')}\frac{\beta_j(x')}{j^2}\right)^2\right).$$
Let us show that each term of the right-hand side of Inequality (8) is uniformly bounded in $x'$, $m$ and $m'$. On the one hand,
$$\left|\sum_{j=1}^{\overline m_1(m,m')}\frac{\beta_j(x')}{j^2}\right|\le\max_{j\in\{1,\dots,n\}}\|\beta_j\|_{\infty}\sum_{j=1}^{n}\frac{1}{j^2}\le\frac{1}{24}(2\|s'\|_{\infty}+\|s''\|_{\infty}).$$
On the other hand, for every $x\in]0,\pi[$ such that $[\pi/x]+1\le\overline m_1(m,m')$ (without loss of generality),
$$\left|\sum_{j=1}^{\overline m_1(m,m')}\frac{\sin(jx)}{j}\right|\le\left|\sum_{j=1}^{[\pi/x]}\frac{\sin(jx)}{j}\right|+\left|\sum_{j=[\pi/x]+1}^{\overline m_1(m,m')}\frac{\sin(jx)}{j}\right|$$
$$(9)\qquad\le x\left[\frac{\pi}{x}\right]+\frac{2}{(1+[\pi/x])\sin(x/2)}\le\pi+2.$$
Since $x\mapsto\sin(x)$ is continuous, odd and $2\pi$-periodic, Inequality (9) holds true for every $x\in\mathbb R$. So,
$$\left|\sum_{j=1}^{\overline m_1(m,m')}\frac{\alpha_j(x')}{j}\right|\le\pi+2.$$
Therefore,
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_{\mathcal B_1,\dots,\mathcal B_n}(m_{\max})}\langle K(X_1,\cdot),s_{K',\ell}\rangle_2^2\right)\le c_2\left(1+(\pi+2)^2+\frac{1}{24^2}(2\|s'\|_{\infty}+\|s''\|_{\infty})^2\right).$$

Appendix B. Proofs of risk bounds

In this section, the proofs follow the same pattern as in Comte and Marie [2, 3].

B.1. Preliminary results. This subsection provides three lemmas used several times in the sequel.

Lemma B.1. Consider
$$U_{K,K',\ell}(n):=\sum_{i\ne j}\langle K(X_i,\cdot)\ell(Y_i)-s_{K,\ell},\,K'(X_j,\cdot)\ell(Y_j)-s_{K',\ell}\rangle_2\ ;\ \forall K,K'\in\mathcal K_n.$$
Under Assumption 2.1.(1,2,3), if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha>0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|))<\infty$, then there exists a deterministic constant $c_{B.1}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_n}\left\{\frac{|U_{K,K',\ell}(n)|}{n^2}-\frac{\theta}{n}\mathfrak s_{K',\ell}\right\}\right)\le c_{B.1}\frac{\log(n)^5}{\theta n}.$$

Lemma B.2. Consider
$$V_{K,\ell}(n):=\frac{1}{n}\sum_{i=1}^{n}\|K(X_i,\cdot)\ell(Y_i)-s_{K,\ell}\|_2^2\ ;\ \forall K\in\mathcal K_n.$$
Under Assumption 2.1.(1,2), if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha>0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|))<\infty$, then there exists a deterministic constant $c_{B.2}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K\in\mathcal K_n}\left\{\frac{1}{n}|V_{K,\ell}(n)-\mathfrak s_{K,\ell}|-\frac{\theta}{n}\mathfrak s_{K,\ell}\right\}\right)\le c_{B.2}\frac{\log(n)^3}{\theta n}.$$

Lemma B.3. Consider
$$W_{K,K',\ell}(n):=\langle\widehat s_{K,\ell}(n;\cdot)-s_{K,\ell},\,s_{K',\ell}-s\rangle_2\ ;\ \forall K,K'\in\mathcal K_n.$$
Under Assumption 2.1.(1,2,4), if $s\in L^2(\mathbb R^d)$ and if there exists $\alpha>0$ such that $\mathbb E(\exp(\alpha|\ell(Y_1)|))<\infty$, then there exists a deterministic constant $c_{B.3}>0$, not depending on $n$, such that for every $\theta\in]0,1[$,
$$\mathbb E\left(\sup_{K,K'\in\mathcal K_n}\{|W_{K,K',\ell}(n)|-\theta\|s_{K',\ell}-s\|_2^2\}\right)\le c_{B.3}\frac{\log(n)^4}{\theta n}.$$

B.1.1. Proof of Lemma B.1. Consider $\mathfrak m(n):=8\log(n)/\alpha$. For any $K,K'\in\mathcal K_n$,
$$U_{K,K',\ell}(n)=U^1_{K,K',\ell}(n)+U^2_{K,K',\ell}(n)+U^3_{K,K',\ell}(n)+U^4_{K,K',\ell}(n),$$
where
$$U^l_{K,K',\ell}(n):=\sum_{i\ne j}g^l_{K,K',\ell}(n;X_i,Y_i,X_j,Y_j)\ ;\ l=1,2,3,4,$$
with, for every $(x',y),(x'',y')\in E=\mathbb R^d\times\mathbb R$,
$$g^1_{K,K',\ell}(n;x',y,x'',y'):=\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le\mathfrak m(n)}-s^+_{K,\ell}(n;\cdot),\,K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|\le\mathfrak m(n)}-s^+_{K',\ell}(n;\cdot)\rangle_2,$$
$$g^2_{K,K',\ell}(n;x',y,x'',y'):=\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|>\mathfrak m(n)}-s^-_{K,\ell}(n;\cdot),\,K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|\le\mathfrak m(n)}-s^+_{K',\ell}(n;\cdot)\rangle_2,$$
$$g^3_{K,K',\ell}(n;x',y,x'',y'):=\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le\mathfrak m(n)}-s^+_{K,\ell}(n;\cdot),\,K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|>\mathfrak m(n)}-s^-_{K',\ell}(n;\cdot)\rangle_2,$$
$$g^4_{K,K',\ell}(n;x',y,x'',y'):=\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|>\mathfrak m(n)}-s^-_{K,\ell}(n;\cdot),\,K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|>\mathfrak m(n)}-s^-_{K',\ell}(n;\cdot)\rangle_2,$$
and, for every $k\in\mathcal K_n$,
$$s^+_{k,\ell}(n;\cdot):=\mathbb E(k(X_1,\cdot)\ell(Y_1)\mathbf 1_{|\ell(Y_1)|\le\mathfrak m(n)})\quad\text{and}\quad s^-_{k,\ell}(n;\cdot):=\mathbb E(k(X_1,\cdot)\ell(Y_1)\mathbf 1_{|\ell(Y_1)|>\mathfrak m(n)}).$$


On the one hand, since $\mathbb E(g^1_{K,K',\ell}(n;x',y,X_1,Y_1))=0$ for every $(x',y)\in E$, by Giné and Nickl [7], Theorem 3.4.8, there exists a universal constant $\mathfrak m>1$ such that for any $\lambda>0$, with probability larger than $1-5.4e^{-\lambda}$,
$$\frac{|U^1_{K,K',\ell}(n)|}{n^2}\le\frac{\mathfrak m}{n^2}\big(c_{K,K',\ell}(n)\lambda^{1/2}+d_{K,K',\ell}(n)\lambda+b_{K,K',\ell}(n)\lambda^{3/2}+a_{K,K',\ell}(n)\lambda^2\big),$$
where the constants $a_{K,K',\ell}(n)$, $b_{K,K',\ell}(n)$, $c_{K,K',\ell}(n)$ and $d_{K,K',\ell}(n)$ are defined and controlled later. First, note that
$$(10)\qquad U^1_{K,K',\ell}(n)=\sum_{i\ne j}\big(\varphi_{K,K',\ell}(n;X_i,Y_i,X_j,Y_j)-\psi_{K,K',\ell}(n;X_i,Y_i)-\psi_{K',K,\ell}(n;X_j,Y_j)+\mathbb E(\varphi_{K,K',\ell}(n;X_i,Y_i,X_j,Y_j))\big),$$
where
$$\varphi_{K,K',\ell}(n;x',y,x'',y'):=\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le\mathfrak m(n)},\,K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|\le\mathfrak m(n)}\rangle_2$$
and
$$\psi_{k,k',\ell}(n;x',y):=\langle k(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le\mathfrak m(n)},\,s^+_{k',\ell}(n;\cdot)\rangle_2=\mathbb E(\varphi_{k,k',\ell}(n;x',y,X_1,Y_1))$$
for every $k,k'\in\mathcal K_n$ and $(x',y),(x'',y')\in E$. Let us now control $a_{K,K',\ell}(n)$, $b_{K,K',\ell}(n)$, $c_{K,K',\ell}(n)$ and $d_{K,K',\ell}(n)$:

• The constant $a_{K,K',\ell}(n)$. Consider
$$a_{K,K',\ell}(n):=\sup_{(x',y),(x'',y')\in E}|g^1_{K,K',\ell}(n;x',y,x'',y')|.$$
By (10), the Cauchy-Schwarz inequality and Assumption 2.1.(1),
$$a_{K,K',\ell}(n)\le4\sup_{(x',y),(x'',y')\in E}|\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le\mathfrak m(n)},\,K'(x'',\cdot)\ell(y')\mathbf 1_{|\ell(y')|\le\mathfrak m(n)}\rangle_2|$$
$$\le4\mathfrak m(n)^2\left(\sup_{x'\in\mathbb R^d}\|K(x',\cdot)\|_2\right)\left(\sup_{x''\in\mathbb R^d}\|K'(x'',\cdot)\|_2\right)\le4\mathfrak m_{\mathcal K,\ell}\,\mathfrak m(n)^2\,n.$$
So,
$$\frac{1}{n^2}a_{K,K',\ell}(n)\lambda^2\le\frac{4}{n}\mathfrak m_{\mathcal K,\ell}\,\mathfrak m(n)^2\lambda^2.$$

• The constant $b_{K,K',\ell}(n)$. Consider
$$b_{K,K',\ell}(n)^2:=n\sup_{(x',y)\in E}\mathbb E(g^1_{K,K',\ell}(n;x',y,X_1,Y_1)^2).$$
By (10), Jensen's inequality, the Cauchy-Schwarz inequality and Assumption 2.1.(1),
$$b_{K,K',\ell}(n)^2\le16n\sup_{(x',y)\in E}\mathbb E(\langle K(x',\cdot)\ell(y)\mathbf 1_{|\ell(y)|\le\mathfrak m(n)},\,K'(X_1,\cdot)\ell(Y_1)\mathbf 1_{|\ell(Y_1)|\le\mathfrak m(n)}\rangle_2^2)$$
$$\le16n\,\mathfrak m(n)^2\left(\sup_{x'\in\mathbb R^d}\|K(x',\cdot)\|_2^2\right)\mathbb E(\|K'(X_1,\cdot)\ell(Y_1)\mathbf 1_{|\ell(Y_1)|\le\mathfrak m(n)}\|_2^2)\le16\mathfrak m_{\mathcal K,\ell}\,n^2\,\mathfrak m(n)^2\,\mathfrak s_{K',\ell}.$$
So, for any $\theta\in]0,1[$,
$$\frac{1}{n^2}b_{K,K',\ell}(n)\lambda^{3/2}\le2\left[\left(\frac{3\mathfrak m}{\theta}\right)^{1/2}\frac{2}{n^{1/2}}\mathfrak m_{\mathcal K,\ell}^{1/2}\,\mathfrak m(n)\lambda^{3/2}\right]\left[\left(\frac{\theta}{3\mathfrak m}\right)^{1/2}\frac{1}{n^{1/2}}\mathfrak s_{K',\ell}^{1/2}\right]\le\frac{\theta}{3\mathfrak m n}\mathfrak s_{K',\ell}+\frac{12\mathfrak m\lambda^3}{\theta n}\mathfrak m_{\mathcal K,\ell}\,\mathfrak m(n)^2.$$

• The constant $c_{K,K',\ell}(n)$. Consider
$$c_{K,K',\ell}(n)^2:=n^2\,\mathbb E(g^1_{K,K',\ell}(n;X_1,Y_1,X_2,Y_2)^2).$$
By (10), Jensen's inequality and Assumption 2.1.(3),
$$c_{K,K',\ell}(n)^2\le16n^2\,\mathbb E(\langle K(X_1,\cdot)\ell(Y_1)\mathbf 1_{|\ell(Y_1)|\le\mathfrak m(n)},\,K'(X_2,\cdot)\ell(Y_2)\mathbf 1_{|\ell(Y_2)|\le\mathfrak m(n)}\rangle_2^2)$$
$$\le16n^2\,\mathfrak m(n)^2\,\mathbb E(\langle K(X_1,\cdot),K'(X_2,\cdot)\ell(Y_2)\rangle_2^2)\le16\mathfrak m_{\mathcal K,\ell}\,n^2\,\mathfrak m(n)^2\,\mathfrak s_{K',\ell}.$$
So,
$$\frac{1}{n^2}c_{K,K',\ell}(n)\lambda^{1/2}\le\frac{\theta}{3\mathfrak m n}\mathfrak s_{K',\ell}+\frac{12\mathfrak m\lambda}{\theta n}\mathfrak m_{\mathcal K,\ell}\,\mathfrak m(n)^2.$$

• The constant $d_{K,K',\ell}(n)$. Consider
$$d_{K,K',\ell}(n):=\sup_{(a,b)\in\mathcal A}\mathbb E\left(\sum_{i<j}a_i(X_i,Y_i)b_j(X_j,Y_j)g^1_{K,K',\ell}(n;X_i,Y_i,X_j,Y_j)\right),$$
where
$$\mathcal A:=\left\{(a,b)\ :\ \sum_{i=1}^{n-1}\mathbb E(a_i(X_i,Y_i)^2)\le1\ \text{and}\ \sum_{j=2}^{n}\mathbb E(b_j(X_j,Y_j)^2)\le1\right\}.$$
By (10), Jensen's inequality, the Cauchy-Schwarz inequality and Assumption 2.1.(3),
$$d_{K,K',\ell}(n)\le4\sup_{(a,b)\in\mathcal A}\mathbb E\left(\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}|a_i(X_i,Y_i)b_j(X_j,Y_j)\varphi_{K,K',\ell}(n;X_i,Y_i,X_j,Y_j)|\right)$$
$$\le4n\,\mathfrak m(n)\,\mathbb E(\langle K(X_1,\cdot),K'(X_2,\cdot)\ell(Y_2)\rangle_2^2)^{1/2}\le4\mathfrak m_{\mathcal K,\ell}^{1/2}\,n\,\mathfrak m(n)\,\mathfrak s_{K',\ell}^{1/2}.$$
So,
$$\frac{1}{n^2}d_{K,K',\ell}(n)\lambda\le\frac{\theta}{3\mathfrak m n}\mathfrak s_{K',\ell}+\frac{12\mathfrak m\lambda^2}{\theta n}\mathfrak m_{\mathcal K,\ell}\,\mathfrak m(n)^2.$$
Then, since $\mathfrak m>1$ and $\lambda>0$, with probability larger than $1-5.4e^{-\lambda}$,

$$\frac{|U^1_{K,K',\ell}(n)|}{n^2}\le\frac{\theta}{n}\mathfrak s_{K',\ell}+\frac{40\mathfrak m^2}{\theta n}\mathfrak m_{\mathcal K,\ell}\,\mathfrak m(n)^2(1+\lambda)^3.$$
So, with probability larger than $1-5.4|\mathcal K_n|e^{-\lambda}$,
$$S_{\mathcal K,\ell}(n,\theta):=\sup_{K,K'\in\mathcal K_n}\left\{\frac{|U^1_{K,K',\ell}(n)|}{n^2}-\frac{\theta}{n}\mathfrak s_{K',\ell}\right\}\le\frac{40\mathfrak m^2}{\theta n}\mathfrak m_{\mathcal K,\ell}\,\mathfrak m(n)^2(1+\lambda)^3.$$
For every $t\in\mathbb R_+$, consider
$$\lambda_{\mathcal K,\ell}(n,\theta,t):=-1+\left(\frac{t}{\mathfrak m_{\mathcal K,\ell}(n,\theta)}\right)^{1/3}\quad\text{with}\quad\mathfrak m_{\mathcal K,\ell}(n,\theta)=\frac{40\mathfrak m^2}{\theta n}\mathfrak m_{\mathcal K,\ell}\,\mathfrak m(n)^2.$$
Then, for any $T>0$,
$$\mathbb E(S_{\mathcal K,\ell}(n,\theta))\le T+\int_T^{\infty}\mathbb P\big(S_{\mathcal K,\ell}(n,\theta)\ge(1+\lambda_{\mathcal K,\ell}(n,\theta,t))^3\,\mathfrak m_{\mathcal K,\ell}(n,\theta)\big)dt$$
$$\le2T+5.4\,c_1|\mathcal K_n|\,\mathfrak m_{\mathcal K,\ell}(n,\theta)\exp\left(-\frac{T^{1/3}}{2\,\mathfrak m_{\mathcal K,\ell}(n,\theta)^{1/3}}\right)\quad\text{with}\quad c_1=\int_0^{\infty}e^{1-r^{1/3}/2}dr.$$
Moreover,
$$\mathfrak m_{\mathcal K,\ell}(n,\theta)\le c_2\frac{\log(n)^2}{\theta n}\quad\text{with}\quad c_2=\frac{40\cdot8^2\,\mathfrak m^2}{\alpha^2}\mathfrak m_{\mathcal K,\ell}.$$
So, by taking
$$T=2^3c_2\frac{\log(n)^5}{\theta n},$$
and since $|\mathcal K_n|\le n$,
$$\mathbb E(S_{\mathcal K,\ell}(n,\theta))\le2^4c_2\frac{\log(n)^5}{\theta n}+5.4\,c_1\,\mathfrak m_{\mathcal K,\ell}(n,\theta)\frac{|\mathcal K_n|}{n}\le(2^4+5.4\,c_1)c_2\frac{\log(n)^5}{\theta n}.$$
