Supplement to "Nonlinear dimension reduction for regression with nearest neighbors"

(1)

HAL Id: hal-00785657

https://hal.archives-ouvertes.fr/hal-00785657v2

Preprint submitted on 20 Mar 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Supplement to ”Nonlinear dimension reduction for regression with nearest neighbors”

Quentin Paris

To cite this version:

Quentin Paris. Supplement to ”Nonlinear dimension reduction for regression with nearest neighbors”.

2013. �hal-00785657v2�

(2)

Supplement to Nonlinear dimension reduction for regression with nearest neighbors

Quentin PARIS

IRMAR, ENS Cachan Bretagne, CNRS, UEB Campus de Ker Lann

Avenue Robert Schuman, 35170 Bruz, France quentin.paris@bretagne.ens-cachan.fr

1 Proof of Lemma 8.3

Lemma 1.1 Suppose assumption(A3)holds and d≥3. Then, for all h∈H_d, we have

EkX₍₁₎^h (X^h)−X^hk²≤ 16R² n^2/d

d 2−1

4/d

.

Proof– Fixh∈H_dand denoteµ_hthe distribution ofX^h. Since dimS(h)≤d, one may consider thatX^hisR^d-valued and thatµ_his of support inR^d. Hence, for all ε >0, we have

P

kX₍₁₎^h (X^h)−X^hk>ε

= E

h P

kX₍₁₎^h (X^h)−X^hk>ε X

i

= E

h P

kX₁^h−X^hk>ε X

ni

= E

h

1−µ_h B_d

X^h,εni

= Z

kukd≤R(1−µ_h(Bd(u,ε)))ⁿµ_h(du), since, under assumption (A3), the variable X^h is B_d(0,R)-valued. Now, for all ε >0, theε-covering numberN(ε)ofB_d(0,R)satisfies

N(ε)≤ 4R

ε d

(seee.g. Proposition 5 in Cucker and Smale, 2001). Thus, givenε >0, one may find a finite collection of Euclidian balls B₁, . . . ,B_N(ε) of radius at most ε in R^d

(3)

such that

B_d(0,R)⊂

N(ε)

[

i=1

Bi.

We can notice that for alli∈ {1, . . . ,N(ε)}, we haveu∈Bi⇒Bi⊂B_d(u,ε).Then P

= Z

kukd≤R(1−µ_h(Bd(u,ε)))ⁿµ_h(du)

≤

N(ε)

∑

i=1

Z

B_i(1−µ_h(Bi))ⁿµ_h(du)

=

N(ε)

∑

i=1

µ_h(Bi) (1−µ_h(Bi))ⁿ.

Then, since for allt∈[0,1]we havet(1−t)ⁿ≤ ¹_n, it follows that for allε>0

P

≤N(ε) n ≤ 1

n 4R

ε d

.

Now writeC_n:=^(4R)_n ^d. Using the fact thatd≥3, we have EkX₍₁₎^h (X^h)−X^hk² =

Z _+∞

0 P

kX₍₁₎^h (X^h)−X^hk>√ ε

dε

≤

Z _+∞

0

min

1, C_n ε^d/2

dε

≤ inf

δ>0

Z _δ

0

dε+C_n Z _+∞

δ

dε ε^d/2

= inf

δ>0

δ+Cn

d 2−1

δ¹⁻^d/2

= C_n^2/d d

2−1 4/d

= 16R² n^2/d

d 2−1

4/d

, which concludes the proof.

(4)

Proof of Lemma 8.3 – Fix k∈ {1, . . . ,n}, ρ >0 and h∈H_d(ρ) such thatkh− h^∗k^∞≤ρ. To ease notations we set

ˆ

r_h:=rˆ_h[k] and W_i(h, .):=W_i[k](h, .), i=n+1, . . . ,2n.

First we have Eh

(r(X)−rˆ_h(h(X)))²i

≤ 2Eh

(r(X)−r_h(h(X)))²i + 2E

h

(rh(h(X))−rˆ_h(h(X)))²i .

Sincer=r_h∗◦h^∗andr_h∗ ∈G, we have under assumptions(A4)and(A5) E

h

(r(X)−r_h(h(X)))²i

≤ 2E h

(rh^∗(h^∗(X))−r_h∗(h(X)))²i + 2Eh

(rh^∗(h(X))−r_h(h(X)))²i

≤ 2(L²+K²)kh−h^∗k²∞

≤ 2(L²+K²)ρ². Therefore, we deduce that

E

h(r(X)−rˆ_h(h(X)))²i

≤4(L²+K²)ρ²+2E

h(r_h(h(X))−rˆ_h(h(X)))²i .

Next, denotingS ={X_n+1, . . . ,X_2n}, we have Eh

(rh(h(X))−rˆ_h(h(X)))² Xi

= E

r_h(h(X))−Eh ˆ

r_h(h(X))

X,Si2 X

+ E

ˆ

r_h(h(X))−Eh ˆ

r_h(h(X))

X,Si2 X

=: E1+E2.

Here, we have used the fact that E

h

r_h(h(X))−E h

ˆ

r_h(h(X))

X,Si ˆ

r_h(h(X))−E h

ˆ

r_h(h(X))

X,Si X

i

=Eh Eh

r_h(h(X))−Eh ˆ

r_h(h(X))

X,Si ˆ

r_h(h(X))−Eh ˆ

r_h(h(X))

X,Si

X,Si Xi

=Eh

r_h(h(X))−Eh ˆ

r_h(h(X))

X,Si Eh

ˆ

r_h(h(X))−Eh ˆ

r_h(h(X))

X,Si

X,Si Xi

=0.

(5)

First, we bound the termE₁. We have

Eh ˆ

r_h(h(X)) X,Si

= E

"

2n

∑

i=n+1

i=n+1

W_i(h,h(X)) (rh^∗(h(Xi))−r_h∗(h^∗(Xi)))

!2

X





=: J₁+J₂+J₃.

Sincer=r_h∗◦h^∗andr_h∗ ∈G, assumptions(A4)and(A5)lead to J₁≤3K²ρ² and J₃≤3L²ρ².

The fact thatr_h∗ isL-Lipschitz leads to

J₂≤3L²E



 1 k

k

∑

i=1

kX^h−X_(i)^h (X^h)k

!2

X



.

(6)

Now let ˜n=⌊n/k⌋. We split the sample

X_n+1^h , . . . ,X_n+k^h _n_˜ intoksubsamples of size ˜n:

Zj=n

X_n+(^h _j₋₁₎_n+1_˜ , . . . ,X_n+^h _j˜_n o

, j=1, . . . ,k, and denote byZ⁽¹⁾_j the closest element ofZ_jfromX^h. Then

k

∑

j=1kX₍^h_j)(X^h)−X^hk ≤

k

∑

j=1kZ⁽¹⁾_j −X^hk. Therefore, by Jensen’s inequality and lemma 1.1, we have

E[E1] ≤ 3L² k

k

∑

j=1

EkZ⁽¹⁾_j −X^hk²+3(L²+K²)ρ²

≤ 48L²R² n˜^2/d

d−2 2

4/d

+3(L²+K²)ρ²

≤ 48L²R²4^1/d

d−2 2

4/d k n

2/d

+3(L²+K²)ρ²,

where the last inequality holds provided ^k_n⌊n/k⌋ ≥ ¹2.

Now we turn to bounding the termE₂. According to (1.1), we have

E₂ = E





2n

∑

i=n+1

W_i(h,h(X))(Yi−r(Xi))

!2

X





= E

"

2n

∑

i=n+1

W_i(h,h(X))²(Yi−r(Xi))² X

# ,

where we have used the fact that ifi6= j∈ {n+1, . . . ,2n} Eh

W_i(h,h(X))(Yi−r(Xi))Wj(h,h(X))(Yj−r(Xj)) Xi

=0.

Using the properties∑²ⁿ_i=n+1W_i(h,h(X)) =1, W_i(h,h(X))≤ ¹_k and |Y_i−r(Xi)| ≤ B+L, for alli∈ {n+1, . . . ,2n}, we conclude that

E[E2]≤ (B+L)² k .

(7)

To complete the proof, write Eh

(r(X)−rˆ_h(h(X)))²i

≤ 2E[E1] +2E[E2] +4(L²+K²)ρ²

≤ 2(B+L)²

k +96L²R²4^1/d

d−2 2

4/d k n

2/d

+10(L²+K²)ρ²

≤ C (1

k+ k

n

2/d)

+Cρ²,

whereC:=maxn

2(B+L)²; 10(L²+K²); 96L²R²4^1/d ^d⁻₂²4/do .

References

Cucker, F. and Smale, S. (2001). On the mathematical foundations of learning theory. Bulletin of the American Mathematical Society. Vol. 39, pp. 1-49.