
Online Appendix to the Article “Choice of V for V -Fold Cross-Validation in Least-Squares Density Estimation”


Sylvain Arlot sylvain.arlot@math.u-psud.fr

Laboratoire de Mathématiques d’Orsay

Univ. Paris-Sud, CNRS, Université Paris-Saclay 91405 Orsay, France

Matthieu Lerasle mlerasle@unice.fr

CNRS

Univ. Nice Sophia Antipolis LJAD CNRS UMR 7351 06100 Nice France

Editor: Xiaotong Shen

This appendix is organized as follows. The first section (called Section B, for consistency with the numbering of the article) gives complementary computations of variances. Then, results concerning hold-out penalization are detailed in Section D, with the proof of the oracle inequality stated in Section 8.2 (Theorem 12) and an exact computation of the variance. Section E provides complements on the computational aspects stated in Section 7.

In particular, we state and analyse the basic algorithm for computing the V-fold criteria and we give the proof of Proposition 8. A useful concentration inequality is recalled in Section F. Finally, some simulation results are detailed in Section G, as a supplement to the ones of Section 6.

Appendix B. Additional Variance Computations

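Proposition 17 Let $(\psi_\lambda)_{\lambda\in\Lambda_{m_1}}$ and $(\psi_\lambda)_{\lambda\in\Lambda_{m_2}}$ be two finite orthonormal families of vectors of $L^4(\mu)$. Assume that $\mathcal{B}$ satisfies (Reg) and, for any $m\in\{m_1,m_2\}$, let

$$\mathcal{C}_{\mathrm{id}}(m) = P_n\gamma(\widehat{s}_m) + \mathbb{E}\big[\mathrm{pen}_{\mathrm{id}}(m)\big] .$$

Then, with the notation of Theorem 6,

$$\operatorname{Var}\big(\mathcal{C}_{\mathrm{id}}(m_1)\big) = \frac{2(n-1)}{n^3}\,\beta(m_1,m_1) + \frac{4}{n}\operatorname{Var}\Big(\Big(1-\frac{1}{n}\Big)s_{m_1}(\xi) + \frac{1}{2n}\Psi_{m_1}(\xi)\Big) .$$

We also have

$$\operatorname{Var}\big(\mathcal{C}_{\mathrm{id}}(m_1)-\mathcal{C}_{\mathrm{id}}(m_2)\big) = \frac{2(n-1)}{n^3}\,B(m_1,m_2) + \frac{4}{n}\operatorname{Var}\Big(\Big(1-\frac{1}{n}\Big)\big(s_{m_1}(\xi)-s_{m_2}(\xi)\big) + \frac{1}{2n}\big(\Psi_{m_1}(\xi)-\Psi_{m_2}(\xi)\big)\Big) .$$

Proof Since $\mathbb{E}[\mathrm{pen}_{\mathrm{id}}(m_1)]$ is deterministic, simply notice that

$$\operatorname{Var}\big(\mathcal{C}_{\mathrm{id}}(m_1)\big) = \operatorname{Var}\big(P_n\gamma(\widehat{s}_{m_1})\big) .$$

Therefore, from (57), the variance of $\mathcal{C}_{\mathrm{id}}(m_1)$ is the one of

$$-\frac{1}{n^2}\sum_{1\le i,j\le n} U_{m_1}(\xi_i,\xi_j) - \sum_{i=1}^n \frac{2\,s_{m_1}(\xi_i)}{n} ,$$

so that, by Lemma 16,

$$\begin{aligned}
\operatorname{Var}\big(\mathcal{C}_{\mathrm{id}}(m_1)\big)
&= \frac{2(n-1)}{n^3}\,\beta(m_1,m_1) + \frac{1}{n^3}\operatorname{Var}\big(\Psi_{m_1}(\xi)-2s_{m_1}(\xi)\big) \\
&\quad + \frac{4}{n^3}\sum_{i=1}^n \operatorname{Cov}\big(\Psi_{m_1}(\xi)-2s_{m_1}(\xi),\, s_{m_1}(\xi)\big) + \frac{4}{n}\operatorname{Var}\big(s_{m_1}(\xi)\big) \\
&= \frac{2(n-1)}{n^3}\,\beta(m_1,m_1) + \frac{4}{n}\operatorname{Var}\Big(\Big(1-\frac{1}{n}\Big)s_{m_1}(\xi) + \frac{1}{2n}\Psi_{m_1}(\xi)\Big) .
\end{aligned}$$

The last equality expands $\frac{4}{n}\operatorname{Var}\big(s_{m_1}(\xi) + \frac{1}{2n}(\Psi_{m_1}(\xi)-2s_{m_1}(\xi))\big)$. The variance of the increments follows from the same computations.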

B.1 Evaluation of the Terms in the Variance Formula

The following proposition gives a formula for the terms appearing in Theorem 6 and Proposition 17 which does not depend on the basis $(\psi_\lambda)_{\lambda\in\Lambda_m}$.

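Proposition 18 For any $m_1, m_2\in\mathcal{M}_n$, we have

$$\begin{aligned}
\beta(m_1,m_2) &= n\operatorname{Cov}\big(\widehat{s}_{m_1}(\xi),\widehat{s}_{m_2}(\xi)\big) - (n+1)\operatorname{Cov}\big(s_{m_1}(\xi),s_{m_2}(\xi)\big) \\
B(m_1,m_2) &= n\operatorname{Var}\big((\widehat{s}_{m_1}-\widehat{s}_{m_2})(\xi)\big) - (n+1)\operatorname{Var}\big((s_{m_1}-s_{m_2})(\xi)\big) ,
\end{aligned} \tag{59}$$

where $\xi$ denotes a copy of $\xi_1$, independent of $\xi_{[\![n]\!]}$.

Proof By definition, we have

$$\begin{aligned}
\beta(m_1,m_2) &= \sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \Big(\operatorname{Cov}\big(\psi_\lambda(\xi_1),\psi_{\lambda'}(\xi_1)\big)\Big)^2 \\
&= \sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \big(P(\psi_\lambda\psi_{\lambda'}) - P\psi_\lambda\,P\psi_{\lambda'}\big)^2 \\
&= \sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \big(P(\psi_\lambda\psi_{\lambda'})\big)^2 - 2\sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} P\psi_\lambda\,P\psi_{\lambda'}\,P(\psi_\lambda\psi_{\lambda'}) + \sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \big(P\psi_\lambda\,P\psi_{\lambda'}\big)^2 \\
&= \sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \big(P(\psi_\lambda\psi_{\lambda'})\big)^2 - 2P(s_{m_1}s_{m_2}) + \|s_{m_1}\|^2\|s_{m_2}\|^2 .
\end{aligned}$$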

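Now, by Eq. (31), we have

$$\begin{aligned}
\operatorname{Cov}\big(\widehat{s}_{m_1}(\xi),\widehat{s}_{m_2}(\xi)\big)
&= \frac{1}{n^2}\sum_{1\le i,j\le n}\sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \operatorname{Cov}\big(\psi_\lambda(\xi_i)\psi_\lambda(\xi),\, \psi_{\lambda'}(\xi_j)\psi_{\lambda'}(\xi)\big) \\
&= \frac{1}{n}\sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \Big[\big(P(\psi_\lambda\psi_{\lambda'})\big)^2 - \big(P\psi_\lambda\,P\psi_{\lambda'}\big)^2\Big] \\
&\quad + \frac{n-1}{n}\sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \big(P(\psi_\lambda\psi_{\lambda'}) - P\psi_\lambda\,P\psi_{\lambda'}\big)\,P\psi_\lambda\,P\psi_{\lambda'} \\
&= \frac{1}{n}\sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \big(P(\psi_\lambda\psi_{\lambda'})\big)^2 - \frac{1}{n}\|s_{m_1}\|^2\|s_{m_2}\|^2 + \frac{n-1}{n}\operatorname{Cov}\big(s_{m_1}(\xi),s_{m_2}(\xi)\big) .
\end{aligned}$$

It follows that

$$\sum_{\lambda\in\Lambda_{m_1}}\sum_{\lambda'\in\Lambda_{m_2}} \big(P(\psi_\lambda\psi_{\lambda'})\big)^2 = n\operatorname{Cov}\big(\widehat{s}_{m_1}(\xi),\widehat{s}_{m_2}(\xi)\big) + \|s_{m_1}\|^2\|s_{m_2}\|^2 - (n-1)\operatorname{Cov}\big(s_{m_1}(\xi),s_{m_2}(\xi)\big) .$$

Thus, since $P(s_{m_1}s_{m_2}) - \|s_{m_1}\|^2\|s_{m_2}\|^2 = \operatorname{Cov}\big(s_{m_1}(\xi),s_{m_2}(\xi)\big)$,

$$\beta(m_1,m_2) = n\operatorname{Cov}\big(\widehat{s}_{m_1}(\xi),\widehat{s}_{m_2}(\xi)\big) - (n+1)\operatorname{Cov}\big(s_{m_1}(\xi),s_{m_2}(\xi)\big) .$$

Eq. (59) follows.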

B.2 Evaluation of the Variance in the Regular Histogram Case

The following lemma gives the value of the terms appearing in Theorem 6 for two nested regular histogram models.

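Lemma 19 Let $m_1=\Lambda_{m_1}$ and $m_2=\Lambda_{m_2}$ be two regular partitions of $\mathbb{R}$, as defined by Example 1 in Section 3.2, so that for $i\in\{1,2\}$, for any $\lambda\in m_i$, $\mu(\lambda)=d_{m_i}^{-1}$. We assume that $m_2$ is a subpartition of $m_1$, that is, any element of $m_2$ is a subset of an element of $m_1$. For any $m_\star\in\{m_1,m_2\}$, we define

$$T_{m_\star}(x) = \sum_{\lambda\in m_\star}\big(\psi_\lambda(x)-P\psi_\lambda\big)^2 = \sup_{t\in B_{m_\star}}\big(t(x)-Pt\big)^2 ,$$

where we recall that $B_{m_\star}=\{t\in S_{m_\star} \,/\, \|t\|\le 1\}$ and for any $\lambda\in m_1\cup m_2$, $\psi_\lambda=(\mu(\lambda))^{-1/2}\mathbf{1}_\lambda$. Then, we have

$$\beta(m_1,m_2) = d_{m_1}\|s_{m_2}\|^2 - 2P(s_{m_1}s_{m_2}) + \|s_{m_1}\|^2\|s_{m_2}\|^2 = P(T_{m_1}s_{m_2}) \tag{60}$$

and

$$\begin{aligned}
B(m_1,m_2) &= P\big(T_{m_1}(s_{m_1}-s_{m_2}) + (T_{m_2}-T_{m_1})s_{m_2}\big) \\
&= (d_{m_2}-d_{m_1})\|s_{m_2}\|^2 - d_{m_1}\|s_{m_1}-s_{m_2}\|^2 - 2\operatorname{Var}_P(s_{m_1}-s_{m_2}) - \|s_{m_1}-s_{m_2}\|^4 .
\end{aligned}$$

Proof On the one hand, by definition,

$$\begin{aligned}
\beta(m_1,m_2) &= \sum_{\lambda\in m_1}\sum_{\lambda'\in m_2} \Big(\mathbb{E}\Big[\big(\psi_\lambda(\xi_1)-P\psi_\lambda\big)\big(\psi_{\lambda'}(\xi_1)-P\psi_{\lambda'}\big)\Big]\Big)^2 \\
&= \sum_{\lambda\in m_1}\sum_{\lambda'\in m_2} \Big[\big(P(\psi_\lambda\psi_{\lambda'})\big)^2 - 2P(\psi_\lambda\psi_{\lambda'})\,P\psi_\lambda\,P\psi_{\lambda'} + (P\psi_\lambda)^2(P\psi_{\lambda'})^2\Big] \\
&= \sum_{\lambda\in m_1}\sum_{\lambda'\in m_2} \big(P(\psi_\lambda\psi_{\lambda'})\big)^2 - 2P\Big(\underbrace{\sum_{\lambda\in m_1}(P\psi_\lambda)\psi_\lambda}_{=s_{m_1}}\ \underbrace{\sum_{\lambda'\in m_2}(P\psi_{\lambda'})\psi_{\lambda'}}_{=s_{m_2}}\Big) + \underbrace{\sum_{\lambda\in m_1}(P\psi_\lambda)^2}_{=\|s_{m_1}\|^2}\ \underbrace{\sum_{\lambda'\in m_2}(P\psi_{\lambda'})^2}_{=\|s_{m_2}\|^2} .
\end{aligned}$$

For computing the first term, we use that $\psi_\lambda\psi_{\lambda'}=0$ if $\lambda\cap\lambda'=\emptyset$ and that $m_2$ is a subpartition of $m_1$, so that

$$\sum_{\lambda\in m_1}\sum_{\lambda'\in m_2} \big(P(\psi_\lambda\psi_{\lambda'})\big)^2 = \sum_{\lambda\in m_1}\sum_{\substack{\lambda'\in m_2\\ \lambda'\subset\lambda}} \big(P(\psi_\lambda\psi_{\lambda'})\big)^2 = \sum_{\lambda\in m_1}\frac{1}{\mu(\lambda)}\sum_{\substack{\lambda'\in m_2\\ \lambda'\subset\lambda}} (P\psi_{\lambda'})^2 = d_{m_1}\sum_{\lambda'\in m_2}(P\psi_{\lambda'})^2 = d_{m_1}\|s_{m_2}\|^2 ,$$

hence

$$\beta(m_1,m_2) = d_{m_1}\|s_{m_2}\|^2 - 2P(s_{m_1}s_{m_2}) + \|s_{m_1}\|^2\|s_{m_2}\|^2 .$$

On the other hand, by definition of $T_m$,

$$\begin{aligned}
P(T_{m_1}s_{m_2}) &= \sum_{\lambda\in m_1}\sum_{\lambda'\in m_2} P\big((\psi_\lambda-P\psi_\lambda)^2\psi_{\lambda'}\big)\,P(\psi_{\lambda'}) \\
&= \sum_{\lambda\in m_1}\sum_{\lambda'\in m_2} \Big[P(\psi_\lambda^2\psi_{\lambda'})(P\psi_{\lambda'}) - 2P(\psi_\lambda\psi_{\lambda'})(P\psi_\lambda)(P\psi_{\lambda'}) + (P\psi_\lambda)^2(P\psi_{\lambda'})^2\Big] \\
&= P\Big(\underbrace{\sum_{\lambda\in m_1}\psi_\lambda^2}_{=d_{m_1}}\ \underbrace{\sum_{\lambda'\in m_2}(P\psi_{\lambda'})\psi_{\lambda'}}_{=s_{m_2}}\Big) - 2P(s_{m_1}s_{m_2}) + \|s_{m_1}\|^2\|s_{m_2}\|^2 ,
\end{aligned}$$

which proves Eq. (60) since $P(s_{m_2})=\|s_{m_2}\|^2$.

Now, we remark that Eq. (60) also gives formulas for $\beta(m_i,m_i)$, $i\in\{1,2\}$, since $m_i$ is a subpartition of itself. So, the second formula for $\beta(m_i,m_j)$ in Eq. (60) yields

$$B(m_1,m_2) = P\big(T_{m_1}s_{m_1} + T_{m_2}s_{m_2} - 2T_{m_1}s_{m_2}\big) = P\big(T_{m_1}(s_{m_1}-s_{m_2}) + (T_{m_2}-T_{m_1})s_{m_2}\big) .$$

Similarly, the first formula for $\beta(m_i,m_j)$ in Eq. (60) gives

$$\begin{aligned}
B(m_1,m_2) &= d_{m_1}\big(\|s_{m_1}\|^2-\|s_{m_2}\|^2\big) + (d_{m_2}-d_{m_1})\|s_{m_2}\|^2 - 2P\big((s_{m_1}-s_{m_2})^2\big) + \big(\|s_{m_1}\|^2-\|s_{m_2}\|^2\big)^2 \\
&= (d_{m_2}-d_{m_1})\|s_{m_2}\|^2 - d_{m_1}\|s_{m_1}-s_{m_2}\|^2 - 2\operatorname{Var}_P(s_{m_1}-s_{m_2}) - \|s_{m_1}-s_{m_2}\|^4 ,
\end{aligned}$$

where we used that $P(s_m)=\|s_m\|^2$ and that $\|s_{m_1}-s_{m_2}\|^2=\|s_{m_2}\|^2-\|s_{m_1}\|^2$. The latter holds because $S_{m_1}\subset S_{m_2}$, so that $s_{m_1}$ is the orthogonal projection of $s_{m_2}$ onto $S_{m_1}$.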
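As a sanity check on Eq. (60), the closed form for $\beta(m_1,m_2)$ can be compared with its definition in exact rational arithmetic. The sketch below is only an illustration under assumptions that are not in the text: the space is $[0,1)$ with Lebesgue measure, the density is piecewise constant on `K` equal fine cells with probabilities `q`, and a partition is encoded as a list of tuples of fine-cell indices. The function names are hypothetical.

```python
from fractions import Fraction


def projection_values(q, m):
    """Values of s_m on each fine cell: P(lambda) / mu(lambda) on lambda."""
    K = len(q)
    values = [Fraction(0)] * K
    for lam in m:
        height = sum(q[k] for k in lam) / Fraction(len(lam), K)
        for k in lam:
            values[k] = height
    return values


def beta_from_definition(q, m1, m2):
    """Sum over (lambda, lambda') of Cov(psi_lambda(xi), psi_lambda'(xi))^2."""
    K = len(q)
    total = Fraction(0)
    for lam in m1:
        for lam2 in m2:
            p1 = sum(q[k] for k in lam)
            p2 = sum(q[k] for k in lam2)
            p12 = sum(q[k] for k in set(lam) & set(lam2))
            mu1, mu2 = Fraction(len(lam), K), Fraction(len(lam2), K)
            # psi = mu^{-1/2} * indicator, so the squared covariance is rational.
            total += (p12 - p1 * p2) ** 2 / (mu1 * mu2)
    return total


def beta_closed_form(q, m1, m2):
    """First expression in Eq. (60); requires m2 to be a subpartition of m1."""
    K = len(q)
    s1, s2 = projection_values(q, m1), projection_values(q, m2)
    norm1 = sum(v * v for v in s1) / K          # ||s_{m1}||^2
    norm2 = sum(v * v for v in s2) / K          # ||s_{m2}||^2
    cross = sum(q[k] * s1[k] * s2[k] for k in range(K))  # P(s_{m1} s_{m2})
    d1 = len(m1)                                # d_{m1} on [0, 1)
    return d1 * norm2 - 2 * cross + norm1 * norm2


q = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]
m1 = [(0, 1), (2, 3)]            # two cells of measure 1/2
m2 = [(0,), (1,), (2,), (3,)]    # four cells of measure 1/4
print(beta_from_definition(q, m1, m2), beta_closed_form(q, m1, m2))
```

Both functions also agree when `m1` and `m2` are the same partition, which is the case used in the proof to obtain $\beta(m_i,m_i)$.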

Appendix C. Results on MCCV and Some Other Cross-Validation Criteria

We prove here the results stated in Section 8.1. Note that we here prove slightly more general results (Theorems 23 and 24), from which Theorems 9 and 10 are corollaries. In particular, we do not always restrict to MCCV criteria: we always assume (SameSize) and (Ind) hold true, but we sometimes do not need to have (MCCV) satisfied.

C.1 Preliminary Computations

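Our proofs rely on a simple closed-form formula for cross-validation criteria. Let us start with the hold-out criterion. Let $T\subset[\![n]\!]$ with $|T|=n-p$, independent from $D_n$. Then,

$$\begin{aligned}
\operatorname{crit}_{\mathrm{HO}}(m,T) &= P_n^{(T^c)}\gamma\big(\widehat{s}_m^{(T)}\big) = \big\|\widehat{s}_m^{(T)}\big\|^2 - 2P_n^{(T^c)}\big(\widehat{s}_m^{(T)}\big) \\
&= \big\|\widehat{s}_m^{(T)}-s_m\big\|^2 + \|s_m\|^2 + 2\big\langle \widehat{s}_m^{(T)}-s_m,\, s_m\big\rangle \\
&\quad - 2\big(P_n^{(T^c)}-P\big)\big(\widehat{s}_m^{(T)}-s_m\big) - 2P\big(\widehat{s}_m^{(T)}-s_m\big) - 2P_n^{(T^c)}(s_m) \\
&= \big\|\widehat{s}_m^{(T)}-s_m\big\|^2 - 2\big(P_n^{(T^c)}-P\big)\big(\widehat{s}_m^{(T)}-s_m\big) - 2P_n^{(T^c)}(s_m) + \|s_m\|^2 ,
\end{aligned} \tag{61}$$

where the last equality uses that

$$P\big(\widehat{s}_m^{(T)}-s_m\big) = \big\langle \widehat{s}_m^{(T)}-s_m,\, s\big\rangle = \big\langle \widehat{s}_m^{(T)}-s_m,\, s_m\big\rangle ,$$

since $s_m$ is the orthogonal projection in $L^2(\mu)$ of $s$ onto $S_m$ and $\widehat{s}_m^{(T)}-s_m\in S_m$. The last two terms in the right-hand side of Eq. (61) can be rewritten as

$$-2P_n^{(T^c)}(s_m) + \|s_m\|^2 = -2\big(P_n^{(T^c)}-P\big)(s_m) - 2P(s_m) + \|s_m\|^2 = -2\big(P_n^{(T^c)}-P\big)(s_m) - \|s_m\|^2 ,$$

since $\|s_m\|^2=P(s_m)$. For the first two terms, we write that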

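$$\begin{aligned}
&\big\|\widehat{s}_m^{(T)}-s_m\big\|^2 - 2\big(P_n^{(T^c)}-P\big)\big(\widehat{s}_m^{(T)}-s_m\big) \\
&\quad = \sum_{\lambda\in\Lambda_m}\Big[\big((P_n^{(T)}-P)(\psi_\lambda)\big)^2 - 2\big(P_n^{(T^c)}-P\big)(\psi_\lambda)\,\big(P_n^{(T)}-P\big)(\psi_\lambda)\Big] \\
&\quad = \sum_{\lambda\in\Lambda_m}\Big[\frac{1}{(n-p)^2}\sum_{1\le i,j\le n}\mathbf{1}_{i\in T,\,j\in T}\big(\psi_\lambda(\xi_i)-P\psi_\lambda\big)\big(\psi_\lambda(\xi_j)-P\psi_\lambda\big) \\
&\qquad\qquad - \frac{2}{p(n-p)}\sum_{1\le i,j\le n}\mathbf{1}_{i\in T^c,\,j\in T}\big(\psi_\lambda(\xi_i)-P\psi_\lambda\big)\big(\psi_\lambda(\xi_j)-P\psi_\lambda\big)\Big] \\
&\quad = \sum_{1\le i,j\le n}\frac{\mathbf{1}_{j\in T}}{n-p}\Big(\frac{\mathbf{1}_{i\in T}}{n-p} - \frac{2\,\mathbf{1}_{i\in T^c}}{p}\Big)U_m(\xi_i,\xi_j) ,
\end{aligned}$$

where we recall that for any $x,y\in\mathcal{X}$,

$$U_m(x,y) = \sum_{\lambda\in\Lambda_m}\big(\psi_\lambda(x)-P\psi_\lambda\big)\big(\psi_\lambda(y)-P\psi_\lambda\big) = \sum_{\lambda\in\Lambda_m}\psi_\lambda(x)\psi_\lambda(y) - s_m(x) - s_m(y) + \|s_m\|^2$$

is defined by Eq. (56), and that $U_m(x,x)=\Psi_m(x)-2s_m(x)+\|s_m\|^2$.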

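Therefore, Eq. (61) can be rewritten as

$$\begin{aligned}
\operatorname{crit}_{\mathrm{HO}}(m,T) &= \sum_{1\le i,j\le n}\frac{\mathbf{1}_{j\in T}}{n-p}\Big(\frac{\mathbf{1}_{i\in T}}{n-p} - \frac{2\,\mathbf{1}_{i\in T^c}}{p}\Big)U_m(\xi_i,\xi_j) - 2\big(P_n^{(T^c)}-P\big)(s_m) - \|s_m\|^2 \\
&= \sum_{1\le i,j\le n}\omega^{\mathrm{HO}}_{i,j}(T)\,U_m(\xi_i,\xi_j) + \sum_{i=1}^n \sigma^{\mathrm{HO}}_i(T)\big(s_m(\xi_i)-P(s_m)\big) - \|s_m\|^2
\end{aligned} \tag{62}$$

with

$$\omega^{\mathrm{HO}}_{i,j}(T) = \frac{\mathbf{1}_{j\in T}}{n-p}\Big(\frac{\mathbf{1}_{i\in T}}{n-p} - \frac{2\,\mathbf{1}_{i\in T^c}}{p}\Big) , \qquad \sigma^{\mathrm{HO}}_i(T) = -\frac{2}{p}\,\mathbf{1}_{i\in T^c} .$$

As a consequence, under assumption (SameSize),

$$\operatorname{crit}_{\mathrm{CV}}\big(m,(T_K)_{1\le K\le B}\big) = \sum_{1\le i,j\le n}\omega_{i,j}\,U_m(\xi_i,\xi_j) + \sum_{i=1}^n \sigma_i\big(s_m(\xi_i)-P(s_m)\big) - \|s_m\|^2 \tag{63}$$

with

$$\omega_{i,j} = \frac{1}{B}\sum_{K=1}^B \frac{\mathbf{1}_{j\in T_K}}{n-p}\Big(\frac{\mathbf{1}_{i\in T_K}}{n-p} - \frac{2\,\mathbf{1}_{i\in T_K^c}}{p}\Big) , \qquad \sigma_i = -\frac{2}{pB}\sum_{K=1}^B \mathbf{1}_{i\in T_K^c} .$$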

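Note that Eq. (63) is consistent with previously obtained formulas. For V-fold cross-validation, under assumption (Reg), Eq. (63) holds with

$$\omega_{i,j} = \omega^{\mathrm{VF}}_{i,j} := \frac{1}{n^2}\times\begin{cases} \dfrac{V}{V-1} & \text{if } i \text{ and } j \text{ belong to the same block} \\[2mm] -\Big(\dfrac{V}{V-1}\Big)^2 & \text{otherwise} \end{cases} \qquad \sigma_i = \sigma_i^{\mathrm{VF}} := -\frac{2}{n} ,$$

which can also be obtained from the combination of Eq. (8) in Lemma 1 and Eq. (58). For the leave-$p$-out, Eq. (63) holds with

$$\omega_{i,j} = \omega^{\mathrm{LPO}}_{i,j} := \begin{cases} \dfrac{1}{n(n-p)} & \text{if } i=j \\[2mm] \dfrac{-(n-p+1)}{n(n-1)(n-p)} & \text{otherwise} \end{cases} \qquad \sigma_i = \sigma_i^{\mathrm{LPO}} := -\frac{2}{n} ,$$

which can also be obtained from Eq. (10) in Lemma 1 and Eq. (58).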
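The general weights of Eq. (63) can be evaluated directly from a list of training sets, which gives a quick way to inspect the V-fold and leave-$p$-out special cases. The following is a minimal sketch, not the paper's algorithm: the function name `cv_weights`, the 0-based indexing of observations, and the small values of `n`, `V` and `p` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import combinations


def cv_weights(n, training_sets):
    """Return (omega, sigma) of Eq. (63) for training sets of common size n - p."""
    B = len(training_sets)
    n_train = len(training_sets[0])
    assert all(len(T) == n_train for T in training_sets)  # (SameSize)
    p = n - n_train
    omega = [[Fraction(0)] * n for _ in range(n)]
    sigma = [Fraction(0)] * n
    for T in training_sets:
        T = set(T)
        for i in range(n):
            if i not in T:
                sigma[i] -= Fraction(2, p * B)
            for j in T:
                if i in T:
                    omega[i][j] += Fraction(1, B * n_train * n_train)
                else:
                    omega[i][j] -= Fraction(2, B * n_train * p)
    return omega, sigma


# V-fold with n = 6 and V = 3: blocks {0,1}, {2,3}, {4,5}.
n, V = 6, 3
blocks = [set(range(2 * k, 2 * k + 2)) for k in range(V)]
vf_omega, vf_sigma = cv_weights(n, [set(range(n)) - b for b in blocks])
print(vf_omega[0][1], vf_omega[0][2], vf_sigma[0])

# Leave-p-out with n = 5 and p = 2: all training sets of size 3.
lpo_omega, lpo_sigma = cv_weights(5, list(combinations(range(5), 3)))
print(lpo_omega[0][0], lpo_omega[0][1], lpo_sigma[0])
```

In both cases the diagonal weights sum to $1/(n-p)$, and the off-diagonal weights between different blocks (V-fold) or different indices (leave-$p$-out) are negative.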

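Using that $U_m(x,x)=\Psi_m(x)-2s_m(x)+\|s_m\|^2$, Eq. (63) can be rewritten as

$$\begin{aligned}
\operatorname{crit}_{\mathrm{CV}}\big(m,(T_K)_{1\le K\le B}\big) &= \Big(\sum_{i=1}^n\omega_{i,i}\Big)\big(\mathcal{D}_m-\|s_m\|^2\big) - \|s_m\|^2 + \sum_{i=1}^n\omega_{i,i}\big(\Psi_m(\xi_i)-\mathcal{D}_m\big) \\
&\quad + \sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)\big(s_m(\xi_i)-Ps_m\big) + \sum_{1\le i\ne j\le n}\omega_{i,j}\,U_m(\xi_i,\xi_j) .
\end{aligned}$$

Using (SameSize) we have

$$\sum_{i=1}^n\omega_{i,i} = \frac{1}{B(n-p)^2}\sum_{K=1}^B\sum_{i=1}^n\mathbf{1}_{i\in T_K} = \frac{1}{n-p} ,$$

and we get

$$\begin{aligned}
\operatorname{crit}_{\mathrm{CV}}\big(m,(T_K)_{1\le K\le B}\big) &= \frac{\mathcal{D}_m-\|s_m\|^2}{n-p} - \|s_m\|^2 + \sum_{i=1}^n\omega_{i,i}\big(\Psi_m(\xi_i)-\mathcal{D}_m\big) \\
&\quad + \sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)\big(s_m(\xi_i)-Ps_m\big) + \sum_{1\le i\ne j\le n}\omega_{i,j}\,U_m(\xi_i,\xi_j) .
\end{aligned} \tag{64}$$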

C.2 Concentration Inequalities

In the proof of Theorem 9 in Section C.3, given formula (64) for the cross-validation criterion, we need concentration inequalities for the three random sums appearing in Eq. (64). These are stated and proved in three lemmas below.

Concentration of $\sum_{i=1}^n\omega_{i,i}\big(\Psi_m(\xi_i)-\mathcal{D}_m\big)$.

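Lemma 20 Assume that (SameSize), (Ind) and (H1) hold true. Then, for any $x > 0$, an event of probability at least $1-2e^{-x}$ exists on which the following holds true: for any $\epsilon\in(0,1]$,

$$\Big|\sum_{i=1}^n\omega_{i,i}\big(\Psi_m(\xi_i)-\mathcal{D}_m\big)\Big| \le \epsilon\,\frac{\mathcal{D}_m}{n-p} + \frac{5x(n+A)}{3\epsilon(n-p)^2} .$$

Proof By (Ind), conditionally to $(\omega_{i,i})_{1\le i\le n}$, $\sum_{i=1}^n\omega_{i,i}\big(\Psi_m(\xi_i)-\mathcal{D}_m\big)$ is a sum of independent real-valued random variables. So, we can apply Bernstein’s inequality.

First, for any $i\in[\![n]\!]$, using (SameSize),

$$\omega_{i,i} = \frac{1}{B}\sum_{K=1}^B\frac{\mathbf{1}_{i\in T_K}}{(n-p)^2} \le \frac{1}{(n-p)^2}$$

and using Eq. (49),

$$\|\Psi_m\|_\infty \le \|U_m\|_\infty \le 2\big(b_m^2+\|s_m\|^2\big) ,$$

so that

$$\omega_{i,i}\Psi_m(\xi_i) \le \max_{1\le i\le n}\omega_{i,i}\times\|\Psi_m\|_\infty \le \frac{2\big(b_m^2+\|s_m\|^2\big)}{(n-p)^2} \quad\text{almost surely.}$$

Second, using (SameSize), we have

$$\sum_{i=1}^n\omega_{i,i}^2 \le \max_{1\le i\le n}\omega_{i,i}\times\sum_{i=1}^n\omega_{i,i} \le \frac{1}{(n-p)^3}$$

and using Eq. (49) again,

$$\mathbb{E}\big[\Psi_m(\xi_i)^2\big] \le \|\Psi_m\|_\infty\times P(\Psi_m) = \|\Psi_m\|_\infty\times\mathcal{D}_m \le 2\big(b_m^2+\|s_m\|^2\big)\mathcal{D}_m ,$$

so that

$$\sum_{i=1}^n\omega_{i,i}^2\,\mathbb{E}\big[\Psi_m(\xi_i)^2\big] \le \frac{2\big(b_m^2+\|s_m\|^2\big)\mathcal{D}_m}{(n-p)^3} .$$

Then, by Bernstein’s inequality (Boucheron et al., 2013, Theorem 2.10), conditionally to $(\omega_{i,i})_{1\le i\le n}$, an event of probability at least $1-2e^{-x}$ exists on which

$$\begin{aligned}
\Big|\sum_{i=1}^n\omega_{i,i}\big(\Psi_m(\xi_i)-\mathcal{D}_m\big)\Big| &\le 2\sqrt{\frac{x\big(b_m^2+\|s_m\|^2\big)\mathcal{D}_m}{(n-p)^3}} + \frac{2\big(b_m^2+\|s_m\|^2\big)}{(n-p)^2}\,\frac{x}{3} \\
&\le \epsilon\,\frac{\mathcal{D}_m}{n-p} + \Big(\frac{2}{3}+\frac{1}{\epsilon}\Big)\frac{x\big(b_m^2+\|s_m\|^2\big)}{(n-p)^2} \\
&\le \epsilon\,\frac{\mathcal{D}_m}{n-p} + \frac{5}{3\epsilon}\,\frac{x(n+A)}{(n-p)^2}
\end{aligned}$$

for any $\epsilon\in(0,1]$, where we used that $b_m^2\le n$ by (H1), and that $\|s_m\|^2\le\|s\|^2\le\|s\|_\infty\le A$.

The result follows by integrating this conditional concentration inequality with respect to $(\omega_{i,i})_{1\le i\le n}$.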

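Concentration of $\sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)\big(s_m(\xi_i)-Ps_m\big)$.

Lemma 21 Assume that (SameSize) and (Ind) hold true. Then, for any $x > 0$, an event of probability at least $1-e^{-x}$ exists on which the following holds true: for any $\epsilon\in(0,1]$,

$$\sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)\Big(s_m(\xi_i)-Ps_m-s_{m'}(\xi_i)+Ps_{m'}\Big) \le \epsilon\,\|s_m-s_{m'}\|^2 + R^{21}_n(x,\epsilon,\pi,A) \tag{65}$$

where the remainder term depends on the additional assumption that we make. If (H2) holds true, then

$$R^{21}_n(x,\epsilon,\pi,A) := \frac{16Ax}{3\epsilon}\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big) .$$

If (H1) and (H2') hold true, then some numerical constant $\kappa > 0$ exists such that

$$R^{21}_n(x,\epsilon,\pi,A) := \frac{\kappa}{\epsilon}\bigg[Ax\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big) + x^2 n\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)^2\bigg] .$$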

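Before proving Lemma 21, let us introduce some useful notation: given a sequence $T_1,\ldots,T_B$ of subsets of $[\![n]\!]$, for every $i,j\in[\![n]\!]$, we define

$$\pi_i = \frac{1}{B}\sum_{K=1}^B\mathbf{1}_{i\in T_K^c} , \qquad \pi_{i,j} = \frac{1}{B}\sum_{K=1}^B\mathbf{1}_{i\in T_K^c}\mathbf{1}_{j\in T_K^c} \qquad\text{and}\qquad \pi = \max_{i=1,\ldots,n}\pi_i .$$

Note that, assuming (SameSize), we have

$$0\le\pi_{i,j}\le\min(\pi_i,\pi_j)\le\pi\le 1 , \qquad \sum_{i=1}^n\pi_i = p , \qquad \sum_{i=1}^n\pi_{i,j} = p\,\pi_j\le p\,\pi \qquad\text{and}\qquad \sum_{1\le i,j\le n}\pi_{i,j} = p^2 . \tag{66}$$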
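The identities in Eq. (66) only use the fact that every validation set $T_K^c$ has size $p$, so they can be checked on arbitrary same-size splits. The sketch below does so in exact arithmetic on randomly drawn training sets. The function name `pi_quantities`, the random seed and the values of `n`, `p` and `B` are illustrative assumptions.

```python
from fractions import Fraction
import random


def pi_quantities(n, training_sets):
    """Return (pi_i) and (pi_{i,j}) for training sets T_1, ..., T_B of [0, n)."""
    B = len(training_sets)
    validation_sets = [set(range(n)) - set(T) for T in training_sets]
    pi = [Fraction(sum(i in c for c in validation_sets), B) for i in range(n)]
    pi2 = [
        [Fraction(sum((i in c) and (j in c) for c in validation_sets), B)
         for j in range(n)]
        for i in range(n)
    ]
    return pi, pi2


random.seed(0)
n, p, B = 7, 3, 5
training_sets = [random.sample(range(n), n - p) for _ in range(B)]
pi, pi2 = pi_quantities(n, training_sets)

print(sum(pi))                                   # equals p
print(sum(pi2[i][j] for i in range(n) for j in range(n)))  # equals p ** 2
```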

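Proof of Lemma 21 By (Ind), conditionally to $(-2\omega_{i,i}+\sigma_i)_{1\le i\le n}$,

$$\sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)\Big(s_m(\xi_i)-Ps_m-s_{m'}(\xi_i)+Ps_{m'}\Big)$$

is a sum of independent real-valued random variables. So, we can apply Bernstein’s inequality.

First, we notice that for every $i\in[\![n]\!]$,

$$-2\omega_{i,i}+\sigma_i = \frac{1}{B}\sum_{K=1}^B\Big(\frac{-2}{(n-p)^2}\mathbf{1}_{i\in T_K} - \frac{2}{p}\mathbf{1}_{i\notin T_K}\Big) = -2\Big(\frac{1}{(n-p)^2}(1-\pi_i)+\frac{\pi_i}{p}\Big) ,$$

hence

$$|-2\omega_{i,i}+\sigma_i| = 2\Big(\frac{1}{(n-p)^2}(1-\pi_i)+\frac{\pi_i}{p}\Big) \le 2\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)$$

since $0\le\pi_i\le 1$. So, for every $i\in[\![n]\!]$,

$$\Big|(-2\omega_{i,i}+\sigma_i)\big(s_m(\xi_i)-s_{m'}(\xi_i)\big)\Big| \le 2\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)\|s_m-s_{m'}\|_\infty \quad\text{almost surely.}$$

Second,

$$\sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)^2 \le 2\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)\sum_{i=1}^n|-2\omega_{i,i}+\sigma_i| = 2\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)\times 2\Big(\frac{1}{n-p}+1\Big) \le 8\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)$$

and

$$\mathbb{E}\Big[\big(s_m(\xi)-s_{m'}(\xi)\big)^2\Big] \le \|s\|_\infty\|s_m-s_{m'}\|^2 \le A\,\|s_m-s_{m'}\|^2 ,$$

so that

$$\sum_{i=1}^n\mathbb{E}\Big[(-2\omega_{i,i}+\sigma_i)^2\big(s_m(\xi)-s_{m'}(\xi)\big)^2\Big] \le 8A\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)\|s_m-s_{m'}\|^2 .$$

Then, by Bernstein’s inequality (Boucheron et al., 2013, Theorem 2.10), conditionally to $(-2\omega_{i,i}+\sigma_i)_{1\le i\le n}$, an event of probability at least $1-e^{-x}$ exists on which

$$\sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)\Big(s_m(\xi_i)-Ps_m-s_{m'}(\xi_i)+Ps_{m'}\Big) \le R_0(m,m') ,$$

$$R_0(m,m') := \sqrt{16xA\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)\|s_m-s_{m'}\|^2} + \frac{2x\,\|s_m-s_{m'}\|_\infty}{3}\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big) .$$

Since the probability bound $1-e^{-x}$ is deterministic, the same inequality holds unconditionally on an event of probability at least $1-e^{-x}$.

We now upper bound $R_0(m,m')$, differently depending on the assumption we make. On the one hand, if (H2) holds true,

$$\|s_m-s_{m'}\|_\infty \le \|s_m\|_\infty+\|s_{m'}\|_\infty \le 2A$$

and we get

$$\begin{aligned}
R_0(m,m') &\le \sqrt{16Ax\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)\|s_m-s_{m'}\|^2} + \frac{4Ax}{3}\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big) \\
&\le \epsilon\,\|s_m-s_{m'}\|^2 + \frac{16Ax}{3\epsilon}\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)
\end{aligned}$$

for any $\epsilon\in(0,1]$, which proves Eq. (65). On the other hand, if (H1) and (H2') hold true, $s_m-s_{m'}\in S_{m''}$ with $m''\in\{m,m'\}$, so that

$$\|s_m-s_{m'}\|_\infty \le b_{m''}\|s_m-s_{m'}\| \le \sqrt{n}\,\|s_m-s_{m'}\|$$

and we get

$$\begin{aligned}
R_0(m,m') &\le \sqrt{16xA\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)\|s_m-s_{m'}\|^2} + \frac{2x\sqrt{n}\,\|s_m-s_{m'}\|}{3}\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big) \\
&\le \epsilon\,\|s_m-s_{m'}\|^2 + \frac{1}{\epsilon}\bigg[8Ax\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big) + \frac{2}{9}x^2n\Big(\frac{1}{(n-p)^2}+\frac{\pi}{p}\Big)^2\bigg]
\end{aligned}$$

for any $\epsilon\in(0,1]$, which proves Eq. (65) with $\kappa=8$.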

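Concentration of $\sum_{1\le i\ne j\le n}\omega_{i,j}\,U_m(\xi_i,\xi_j)$.

Lemma 22 Suppose that assumptions (SameSize), (Ind) and (H1) hold true. Then, an absolute constant $\kappa > 0$ exists such that, for any $x > 1$, with probability larger than $1-6e^{-x}$, for any $\epsilon\in(0,1]$,

$$\Big|\sum_{1\le i\ne j\le n}\omega_{i,j}\,U_m(\xi_i,\xi_j)\Big| \le \epsilon\,\frac{\mathcal{D}_m}{n-p} + \frac{\kappa n}{(n-p)^2}\Big(1+\frac{n\pi}{p}\Big)\bigg[\frac{nAx}{\epsilon(n-p)} + \Big(1+\frac{A}{n}\Big)x^2\bigg] . \tag{67}$$

Proof We start with the following symmetrization trick:

$$\begin{aligned}
\sum_{1\le i\ne j\le n}\omega_{i,j}\,U_m(\xi_i,\xi_j) &= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\big(\omega_{i,j}\,U_m(\xi_i,\xi_j)+\omega_{j,i}\,U_m(\xi_j,\xi_i)\big) \\
&= \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}(\omega_{i,j}+\omega_{j,i})\,U_m(\xi_i,\xi_j) = \sum_{1\le i\ne j\le n}\omega'_{i,j}\,U_m(\xi_i,\xi_j) ,
\end{aligned}$$

where

$$\begin{aligned}
\omega'_{i,j} = \frac{\omega_{i,j}+\omega_{j,i}}{2} &= \frac{1}{(n-p)^2}\Big[1-(\pi_i+\pi_j)\frac{n}{p}+\Big(\frac{2n}{p}-1\Big)\pi_{i,j}\Big] \\
&= \frac{1}{(n-p)^2}\Big[(1-\pi_{i,j})+\frac{n}{p}(\pi_{i,j}-\pi_i)+\frac{n}{p}(\pi_{i,j}-\pi_j)\Big] .
\end{aligned}$$

From the last formula for $\omega'_{i,j}$, using Eq. (66), we get that

$$(\omega'_{i,j})^2 \le \frac{1}{(n-p)^4}\Big[1+\frac{n^2}{p^2}(\pi_i+\pi_j)^2\Big] \tag{68}$$

and

$$\max_{i,j\in[\![n]\!]}|\omega'_{i,j}| \le \frac{1}{(n-p)^2}\Big(1+\frac{2n}{p}\pi\Big) . \tag{69}$$

The concentration of the U-statistics follows from Houdré and Reynaud-Bouret (2003, Theorem 3.4), that is Eq. (44) with $g_{i,j}(\xi_i,\xi_j)=\omega'_{i,j}\,U_m(\xi_i,\xi_j)$. To apply this result, it remains to compute the terms A, B, C, D. First,

$$2\mathrm{A}^2 = \sum_{1\le i\ne j\le n}(\omega'_{i,j})^2\,\mathbb{E}\big[U_m(\xi_i,\xi_j)^2\big] \le \|s\|_\infty\mathcal{D}_m\sum_{1\le i\ne j\le n}(\omega'_{i,j})^2$$

by Eq. (45). Algebraic computations and Eqs. (68) and (66) show that

$$\begin{aligned}
\sum_{1\le i\ne j\le n}(\omega'_{i,j})^2 &\le \frac{1}{(n-p)^4}\sum_{1\le i\ne j\le n}\Big[1+\frac{n^2}{p^2}(\pi_i+\pi_j)^2\Big] \\
&\le \frac{1}{(n-p)^4}\sum_{1\le i,j\le n}\Big[1+\frac{n^2}{p^2}\big(\pi\pi_i+\pi\pi_j+2\pi_i\pi_j\big)\Big] = \frac{n^2}{(n-p)^4}\Big(3+\frac{2\pi n}{p}\Big) .
\end{aligned}$$

Hence,

$$\mathrm{A} \le \frac{n}{(n-p)^2}\sqrt{\Big(\frac{3}{2}+\frac{\pi n}{p}\Big)\|s\|_\infty\mathcal{D}_m} .$$

Second, let $a_i$ and $b_j$ be functions such that $\sum_{i=1}^n\mathbb{E}\big[a_i(\xi)^2\big]\le 1$ and $\sum_{i=1}^n\mathbb{E}\big[b_i(\xi)^2\big]\le 1$. Eq. (47) shows that

$$\mathbb{E}\big[a_i(\xi)\,b_j(\xi')\,U_m(\xi,\xi')\big] \le \frac{\|s\|_\infty}{2}\Big(\mathbb{E}\big[a_i(\xi)^2\big]+\mathbb{E}\big[b_j(\xi)^2\big]\Big) ,$$

hence, using Eq. (69),

$$\mathrm{B} = \sum_{1\le i\ne j\le n}\omega'_{i,j}\,\mathbb{E}\big[a_i(\xi)\,b_j(\xi')\,U_m(\xi,\xi')\big] \le \max_{1\le i\ne j\le n}|\omega'_{i,j}|\,\frac{\|s\|_\infty}{2}\sum_{1\le i\ne j\le n}\Big(\mathbb{E}\big[a_i(\xi)^2\big]+\mathbb{E}\big[b_j(\xi)^2\big]\Big) \le \frac{n\,\|s\|_\infty}{(n-p)^2}\Big(1+\frac{2n}{p}\pi\Big) .$$

Third, Eq. (48) shows that, for any $x\in\mathcal{X}$,

$$\mathbb{E}\big[U_m(\xi,x)^2\big] \le 2\big(b_m^2+\|s_m\|^2\big)\|s\|_\infty$$

and by Eq. (68) we have

$$\sum_{i=2}^n(\omega'_{i,1})^2 \le \frac{1}{(n-p)^4}\sum_{i=2}^n\Big[1+(\pi_i+\pi_1)^2\frac{n^2}{p^2}\Big] \le \frac{n}{(n-p)^4}\Big(1+\frac{2\pi n}{p}\Big)^2 .$$

So, for any $x\in\mathcal{X}$,

$$\sum_{i=2}^n(\omega'_{i,1})^2\,\mathbb{E}\big[U_m(\xi,x)^2\big] \le 2\big(b_m^2+\|s_m\|^2\big)\|s\|_\infty\times\frac{n}{(n-p)^4}\Big(1+\frac{2\pi n}{p}\Big)^2 ,$$

hence

$$\mathrm{C} \le \Big(1+\frac{2\pi n}{p}\Big)\frac{n}{(n-p)^2}\sqrt{\frac{2\big(b_m^2+\|s_m\|^2\big)\|s\|_\infty}{n}} .$$

Fourth, using Eqs. (49) and (69),

$$\mathrm{D} \le \max_{i,j\in[\![n]\!]}|\omega'_{i,j}|\,\sup_{x,y}|U_m(x,y)| \le \Big(1+\frac{2n}{p}\pi\Big)\frac{n}{(n-p)^2}\,\frac{2\big(b_m^2+\|s_m\|^2\big)}{n} .$$

Now, we remark that $b_m^2\le n$ by (H1), and $\|s_m\|^2\le\|s\|^2\le\|s\|_\infty\le A$, and we can plug these two inequalities into the upper bounds above. By (Ind), we can apply Houdré and Reynaud-Bouret (2003, Theorem 3.4), conditionally on the weights $\omega_{i,j}$. We obtain that an absolute constant $\kappa > 0$ exists such that, for any $x > 1$, with probability larger than $1-6e^{-x}$, for any $\epsilon\in(0,1]$, Eq. (67) holds true.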

C.3 Oracle Inequality (Proof of Theorem 9)

Theorem 9 actually is a corollary of the following general result.

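Theorem 23 Let $\xi_{[\![n]\!]}$ be i.i.d. real-valued random variables with common density $s$ with respect to $\mu$, such that $s\in L^\infty(\mu)$, $(T_K)_{1\le K\le B}$ be some sequence of subsets of $[\![n]\!]$ satisfying (SameSize) and (Ind), and $(S_m)_{m\in\mathcal{M}_n}$ be a collection of separable linear spaces satisfying (H1). Assume that either (H2) or (H2') holds true. For every $m\in\mathcal{M}_n$, let $\widehat{s}_m$ be the estimator defined by Eq. (1), and $\widetilde{s}=\widehat{s}_{\widehat{m}}$ where

$$\widehat{m}\in\operatorname*{argmin}_{m\in\mathcal{M}_n}\Big\{\operatorname{crit}_{\mathrm{CV}}\big(m,(T_K)_{1\le K\le B}\big)\Big\}$$

and $\operatorname{crit}_{\mathrm{CV}}$ is defined by Eq. (25). Define $\pi=\max_{i=1,\ldots,n}\frac{1}{B}\sum_{K=1}^B\mathbf{1}_{i\in T_K^c}$ and for any $x,\epsilon,\kappa > 0$,

$$\rho_4(\epsilon,x,\kappa,n,\tau_n,\pi,A) := \frac{\kappa}{n\tau_n^2}\Big(1+\frac{\pi}{1-\tau_n}\Big)^{\alpha}\bigg[\frac{Ax}{\epsilon\tau_n}+\frac{(A\vee 1)x^2}{\epsilon^3}\bigg]$$

with $\alpha=1$ under assumption (H2) and $\alpha=2$ under assumption (H2'). Then, an absolute constant $\kappa > 0$ exists such that, for any $x > 0$, with probability at least $1-12|\mathcal{M}_n|^2e^{-x}$, for any $\epsilon\in(0,\kappa^{-1})$,

$$\Big(1-\frac{\epsilon}{\tau_n}\Big)\|\widetilde{s}-s\|^2 \le \frac{1+\epsilon}{\tau_n}\inf_{m\in\mathcal{M}_n}\big\{\|\widehat{s}_m-s\|^2\big\} + \rho_4(\epsilon,x,\kappa,n,\tau_n,\pi,A) .$$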

The oracle inequality of Theorem 23 is similar to the one of Theorem 5, with $\delta$ replaced by $1/\tau_n-1$ (both quantities correspond to the bias of the criterion as an estimator of the risk) and a slightly different remainder term. In addition to the remarks already made about Theorem 5, we can make the following comments.

• The remainder term $\rho_4$ is of order $x^2/n$, as in Theorem 5, under the following sufficient conditions: (i) $\tau_n$ stays away from 0, (ii) $\pi/(1-\tau_n)$ is bounded.

• For V-fold criteria, $\tau_n=(V-1)/V\ge 1/2$ and $\pi/(1-\tau_n)=1$, so conditions (i) and (ii) are satisfied and we recover an oracle inequality for V-fold cross-validation similar to Theorem 5.

• The leading constant in front of the oracle inequality of Theorem 23 is of order $1/\tau_n$, so we can get asymptotic optimality only if $\tau_n\to 1$, that is, $p\ll n$. This is consistent with the fact that the bias of the cross-validation criterion is negligible at first order if and only if $\tau_n\to 1$.

• For hold-out criteria, $\pi=1$ so the remainder term is of order $x^2/(n(1-\tau_n)^\alpha)\ge x^2/p$, which is large when $\tau_n$ is close to 1, that is, when $p$ is small. Hence, for such criteria, we cannot get a leading constant close to 1 and a “small” remainder term.
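The quantities driving these comments are easy to tabulate for a given splitting scheme. The sketch below computes $\tau_n$, $\pi$ and $\pi/(1-\tau_n)$ for a V-fold scheme and for a single hold-out split. It assumes that $\tau_n=(n-p)/n$, which is consistent with the value $(V-1)/V$ quoted above for V-fold but is not restated in this appendix. The function name and the numerical values are illustrative.

```python
from fractions import Fraction


def split_summary(n, training_sets):
    """Return (tau_n, pi, pi / (1 - tau_n)) for same-size training sets."""
    B = len(training_sets)
    sets = [set(T) for T in training_sets]
    p = n - len(sets[0])
    tau = Fraction(n - p, n)  # assumed definition: tau_n = (n - p) / n
    pi = max(Fraction(sum(i not in T for T in sets), B) for i in range(n))
    return tau, pi, pi / (1 - tau)


n, V = 12, 4
block_size = n // V
vfold_sets = [
    set(range(n)) - set(range(k * block_size, (k + 1) * block_size))
    for k in range(V)
]
vfold = split_summary(n, vfold_sets)

# Single hold-out split with p = 2 validation points.
holdout = split_summary(n, [set(range(n - 2))])

print(vfold)    # tau_n = 3/4, pi = 1/4, ratio = 1
print(holdout)  # tau_n = 5/6, pi = 1, ratio = n / p = 6
```

For the hold-out split, the ratio $\pi/(1-\tau_n)$ equals $n/p$, which grows as $p$ decreases; this is the mechanism behind the last comment above.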

Let us now explain why Theorem 9 is also a corollary of Theorem 23.

Proof of Theorem 9 We only have to prove some upper bound on $\pi$ under assumption (MCCV), thanks to which Theorem 9 is a straightforward corollary of Theorem 23.

By (SameSize) and (MCCV), for any $i\in[\![n]\!]$, $\pi_i$ is the empirical mean of $B$ independent Bernoulli random variables with common parameter $\mathbb{P}(i\in T_K^c)=p/n$. Then, by Bernstein’s inequality (Boucheron et al., 2013, Theorem 2.10),

$$\forall y > 0,\ \forall i\in[\![n]\!],\qquad \mathbb{P}\bigg(\pi_i-\frac{p}{n} > \sqrt{\frac{2p(n-p)y}{n^2B}}+\frac{y}{3B}\bigg) \le e^{-y} .$$

A union bound over $i\in[\![n]\!]$, applied with $y=x+\log n$, yields that for any $x > 0$,

$$\mathbb{P}\bigg(\pi\le 1\wedge\Big(\frac{2p}{n}+\frac{\log n+x}{B}\Big)\bigg) \ge 1-e^{-x} ,$$

where we used also that $\pi\le 1$ almost surely. Theorem 9 follows.

We finally prove Theorem 23.

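Proof of Theorem 23 Throughout the proof, $L$ denotes some positive numerical constant, whose value may change from line to line. Given Eq. (64), the proof relies on concentration inequalities that are detailed in Section C.2. Let us fix $x > 0$ and define for every $\kappa > 1$ the event $\Omega_{\mathrm{good}}(\kappa,x)$ where all the following inequalities hold for any $m,m'\in\mathcal{M}_n$ and any $\epsilon\in(0,1]$:

$$\Big|\sum_{i=1}^n\omega_{i,i}\big(\Psi_m(\xi_i)-\mathcal{D}_m\big)\Big| \le \epsilon\,\frac{\mathcal{D}_m}{n-p}+\frac{\kappa(n+A)x}{\epsilon(n-p)^2}$$

$$\sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)\Big(s_m(\xi_i)-Ps_m-s_{m'}(\xi_i)+Ps_{m'}\Big) \le \epsilon\,\|s_m-s_{m'}\|^2+R^{21}_n(x,\epsilon,\pi,A)$$

$$\Big|\sum_{1\le i\ne j\le n}\omega_{i,j}\,U_m(\xi_i,\xi_j)\Big| \le \epsilon\,\frac{\mathcal{D}_m}{n-p}+\frac{\kappa n}{(n-p)^2}\Big(1+\frac{\pi n}{p}\Big)\bigg[\frac{nAx}{\epsilon(n-p)}+\frac{(n+A)x^2}{n}\bigg]$$

$$\Big|\|\widehat{s}_m-s_m\|^2-\frac{\mathcal{D}_m}{n}\Big| \le \epsilon\,\frac{\mathcal{D}_m}{n}+\frac{\kappa Ax^2}{\epsilon^3 n} .$$

It follows from Lemmas 14, 20, 21 and 22 that an absolute constant $\kappa > 0$ exists such that $\mathbb{P}\big(\Omega_{\mathrm{good}}(\kappa,x)\big)\ge 1-|\mathcal{M}_n|^2e^{-x}-10|\mathcal{M}_n|e^{-x}$. Let us remark that we can assume $x > \log(11) > 1$ in the following, since otherwise the above probability bound is negative.

On $\Omega_{\mathrm{good}}(\kappa,x)$, for every $m\in\mathcal{M}_n$ and $\epsilon\in(0,1)$,

$$\frac{\mathcal{D}_m}{n} \le \frac{1}{1-\epsilon}\|\widehat{s}_m-s_m\|^2+\frac{LAx^2}{\epsilon^3(1-\epsilon)n} . \tag{70}$$

By definition of $\widehat{m}$, for every $m\in\mathcal{M}_n$,

$$\|\widehat{s}_{\widehat{m}}-s\|^2 \le \|\widehat{s}_m-s\|^2+\Big(\operatorname{crit}_{\mathrm{CV}}(m)-\|\widehat{s}_m-s\|^2\Big)-\Big(\operatorname{crit}_{\mathrm{CV}}(\widehat{m})-\|\widehat{s}_{\widehat{m}}-s\|^2\Big) . \tag{71}$$

In addition, by Eq. (64),

$$\begin{aligned}
\operatorname{crit}_{\mathrm{CV}}(m)-\|\widehat{s}_m-s\|^2 &= \frac{\mathcal{D}_m-\|s_m\|^2}{n-p}-\underbrace{\big(\|s_m\|^2+\|s_m-s\|^2\big)}_{=\|s\|^2}-\frac{\mathcal{D}_m}{n} \\
&\quad +\sum_{i=1}^n\omega_{i,i}\big(\Psi_m(\xi_i)-\mathcal{D}_m\big)+\sum_{i=1}^n(-2\omega_{i,i}+\sigma_i)\big(s_m(\xi_i)-Ps_m\big) \\
&\quad +\sum_{1\le i\ne j\le n}\omega_{i,j}\,U_m(\xi_i,\xi_j)-\Big(\|\widehat{s}_m-s_m\|^2-\frac{\mathcal{D}_m}{n}\Big) .
\end{aligned}$$

So, on $\Omega_{\mathrm{good}}(\kappa,x)$, for every $m,m'\in\mathcal{M}_n$ and $\epsilon\in(0,1/5)$,

$$\begin{aligned}
&\Big(\operatorname{crit}_{\mathrm{CV}}(m)-\|\widehat{s}_m-s\|^2\Big)-\Big(\operatorname{crit}_{\mathrm{CV}}(m')-\|\widehat{s}_{m'}-s\|^2\Big) \\
&\quad\le \mathcal{D}_m\Big(\frac{1+2\epsilon}{n-p}-\frac{1-\epsilon}{n}\Big)_+ + \mathcal{D}_{m'}\Big(\frac{1+\epsilon}{n}-\frac{1-2\epsilon}{n-p}\Big)_+ + \epsilon\,\|s_m-s_{m'}\|^2 \\
&\qquad + R^{21}_n(x,\epsilon,\pi,A)+\frac{Ln}{(n-p)^2}\Big(1+\frac{\pi n}{p}\Big)\bigg[\frac{nAx}{\epsilon(n-p)}+\frac{(A\vee 1)x^2}{\epsilon^3}\bigg]+\frac{\|s_{m'}\|^2}{n-p} \\
&\quad\le \frac{n}{1-\epsilon}\Big(\frac{1+2\epsilon}{n-p}-\frac{1-\epsilon}{n}\Big)_+\|\widehat{s}_m-s_m\|^2+\frac{n}{1-\epsilon}\Big(\frac{1+\epsilon}{n}-\frac{1-2\epsilon}{n-p}\Big)_+\|\widehat{s}_{m'}-s_{m'}\|^2 \\
&\qquad + 2\epsilon\,\|s_m-s\|^2+2\epsilon\,\|s_{m'}-s\|^2+R^{21}_n(x,\epsilon,\pi,A)+\frac{Ln}{(n-p)^2}\Big(1+\frac{\pi n}{p}\Big)\bigg[\frac{nAx}{\epsilon(n-p)}+\frac{(A\vee 1)x^2}{\epsilon^3}\bigg] \\
&\quad\le \max\bigg\{\frac{1}{1-\epsilon}\Big(\frac{1}{\tau_n}-1+\epsilon+\frac{2\epsilon}{\tau_n}\Big)_+,\ 2\epsilon\bigg\}\|\widehat{s}_m-s\|^2+\max\bigg\{\frac{1}{1-\epsilon}\Big(1-\frac{1}{\tau_n}+\epsilon+\frac{2\epsilon}{\tau_n}\Big)_+,\ 2\epsilon\bigg\}\|\widehat{s}_{m'}-s\|^2 \\
&\qquad + R^{21}_n(x,\epsilon,\pi,A)+\frac{L}{n\tau_n^2}\Big(1+\frac{\pi}{1-\tau_n}\Big)\bigg[\frac{Ax}{\epsilon\tau_n}+\frac{(A\vee 1)x^2}{\epsilon^3}\bigg] \\
&\quad\le \Big(\frac{1}{\tau_n}-1+\frac{L\epsilon}{\tau_n}\Big)\|\widehat{s}_m-s\|^2+\frac{4\epsilon}{\tau_n}\|\widehat{s}_{m'}-s\|^2 \\
&\qquad + R^{21}_n(x,\epsilon,\pi,A)+\frac{L}{n\tau_n^2}\Big(1+\frac{\pi}{1-\tau_n}\Big)\bigg[\frac{Ax}{\epsilon\tau_n}+\frac{(A\vee 1)x^2}{\epsilon^3}\bigg]
\end{aligned}$$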
