HAL Id: inria-00364555

https://hal.inria.fr/inria-00364555

Submitted on 26 Feb 2009


On the Rate of Convergence of the Functional k-NN Estimates

Gérard Biau, Frédéric Cérou, Arnaud Guyader

To cite this version:

Gérard Biau, Frédéric Cérou, Arnaud Guyader. On the Rate of Convergence of the Functional k-NN Estimates. [Research Report] RR-6861, INRIA. 2009. ⟨inria-00364555⟩


On the Rate of Convergence of the Functional k-NN Estimates

G. Biau, F. Cérou, A. Guyader

Thème NUM — Systèmes numériques. Équipe-Projet ASPI

Rapport de recherche n° 6861 — Février 2009 — 13 pages

Abstract: Let $\mathcal F$ be a general separable metric space and denote by $\mathcal D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ independent and identically distributed $\mathcal F \times \mathbb R$-valued random variables with the same distribution as a generic pair $(X, Y)$. In the regression function estimation problem, the goal is to estimate, for fixed $x \in \mathcal F$, the regression function $r(x) = \mathbb E[Y \,|\, X = x]$ using the data $\mathcal D_n$. Motivated by a broad range of potential applications, we propose, in the present contribution, to investigate the properties of the so-called $k_n$-nearest neighbor regression estimate. We present explicit general finite sample upper bounds, and particularize our results to important function spaces, such as reproducing kernel Hilbert spaces, Sobolev spaces or Besov spaces.

Key-words: Regression, Nearest neighbors, Rates of convergence.

The author names appear in alphabetical order.

LSTA & LPMA - Université Pierre et Marie Curie - Paris VI, Boîte 158, 175 rue du Chevaleret, 75013 Paris, France

INRIA Rennes - Bretagne Atlantique, Campus de Beaulieu, 35042 Rennes Cedex, France

Équipe de Statistique IRMAR, Université de Haute Bretagne, Place du Recteur H. Le Moal, CS 24307, 35043 Rennes Cedex, France


On the rate of convergence of the functional $k$-nearest neighbor estimates

Abstract (translation of the French résumé): Let $\mathcal F$ be a separable metric space. We are interested in estimating the regression function $r(x) = \mathbb E[Y \,|\, X = x]$ associated with a random pair $(X, Y)$ with values in $\mathcal F \times \mathbb R$, from an i.i.d. sample $\mathcal D_n = \{(X_i, Y_i),\ 1 \le i \le n\}$ with the same distribution as $(X, Y)$. In this context, the $k$-nearest neighbor estimate consists in selecting the $k$ points of $\mathcal D_n$ closest to $X$ and averaging the associated $Y_i$'s. As a growing number of applications involve functional data $X_i$, we study the rates of convergence of the $k$-nearest neighbor method in this setting. In particular, we present explicit bounds for classical function spaces $\mathcal F$: reproducing kernel Hilbert spaces, Sobolev spaces and Besov spaces.

Keywords: Regression, Nearest neighbors, Rates of convergence.


1 Introduction

Let $\mathcal F$ be a general separable metric space equipped with metric $d$. Denote by $\mathcal D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$ independent and identically distributed $\mathcal F \times \mathbb R$-valued random variables with the same distribution as a generic pair $(X, Y)$ satisfying $\mathbb E|Y| < \infty$. In the regression function estimation problem, the goal is to estimate, for fixed $x \in \mathcal F$, the regression function $r(x) = \mathbb E[Y \,|\, X = x]$ using the data $\mathcal D_n$. In this respect, we say that a regression function estimate $r_n(x)$ is consistent if $\mathbb E|r_n(X) - r(X)|^2 \to 0$ as $n \to \infty$.

In the classical statistical setting, each observation $X_i$ is supposed to be a collection of numerical measurements represented by a $d$-dimensional vector. Thus, most of the results to date have been reported in the finite-dimensional case, where it is assumed that $\mathcal F$ is the standard Euclidean space $\mathbb R^d$. We refer the reader to the monograph of Györfi, Kohler, Krzyżak and Walk [8] for a comprehensive introduction to the subject and an overview of most standard methods and developments.

However, in an increasing number of practical applications, input data items are in the form of random functions (speech recordings, time series, images, ...) rather than standard vectors, and this casts the regression problem into the general class of functional data analysis. Here, "random functions" means that the variable $X$ takes values in a space $\mathcal F$ of functions on a compact subset of $\mathbb R^d$ with an appropriate norm. For example, $\mathcal F$ could be the Banach space of continuous functions on $\mathcal X = [0,1]^d$ with the norm
$$\|f\|_\infty = \sup_{x \in \mathcal X} |f(x)|,$$
but many other choices are possible. The challenge in this context is to infer the regression structure by exploiting the infinite-dimensional nature of the observations. The last few years have witnessed important developments in both the theory and practice of functional data analysis, and many traditional data analysis tools have been adapted to handle functional inputs. The book of Ramsay and Silverman [9] provides a presentation of the area.

Interestingly, functional observations also arise naturally in the so-called kernel methods for general pattern analysis. These methods are based on the choice of a proper similarity measure, given by a positive definite kernel defined between pairs of objects of interest, to be used for inferring general types of relations. The key idea is to embed the observations at hand into a (possibly infinite-dimensional) Hilbert space, called the feature space, and to compute inner products efficiently, directly from the original data items, using the kernel function. For an exhaustive presentation of kernel methodologies and related algorithms, we refer the reader to Schölkopf and Smola [10], and Shawe-Taylor and Cristianini [11].

Motivated by this broad range of potential applications, we propose, in the present contribution, to investigate the properties of the so-called $k_n$-nearest neighbor ($k_n$-NN) regression estimate, assuming that the $X_i$'s take values in a general separable metric space $(\mathcal F, d)$, possibly infinite-dimensional. Recall that, for $x$ in $\mathcal F$, the $k_n$-NN estimate is defined by
$$r_n(x) = \frac{1}{k_n}\sum_{i=1}^{k_n} Y_{(i,n)}(x),$$
where $(X_{(1,n)}(x), Y_{(1,n)}(x)), \ldots, (X_{(n,n)}(x), Y_{(n,n)}(x))$ denotes a reordering of the data according to the increasing values of $d(x, X_i)$ (ties are broken in favor of smallest indices). This procedure is one of the oldest approaches to regression, dating back to Fix and Hodges [4, 5]. It is among the most popular nonparametric methods used in regression analysis, with over 900 research articles published on the method since 1981 alone. For implementation, it requires only a measure of distance in the sample space, hence its popularity as a starting point for refinement, improvement and adaptation to new settings (see for example Devroye, Györfi and Lugosi [2], Chapter 19).
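For concreteness, here is a minimal computational sketch of this estimate (an illustration under simplifying assumptions, not code from the report): elements of $\mathcal F$ are represented by curves discretized on a grid, the metric $d$ is the supremum distance, and all data and names are invented for the example.

```python
# A sketch of the k_n-NN regression estimate r_n(x) defined above, for an
# arbitrary metric d on F. Here F is represented by discretized curves and
# d is the supremum distance; everything below is illustrative.
import numpy as np

def knn_regression(x, X, Y, k, metric):
    """r_n(x) = average of the Y-values of the k nearest X_i's under `metric`."""
    dists = np.array([metric(x, xi) for xi in X])
    # A stable sort breaks ties in favor of smallest indices, as in the text.
    nearest = np.argsort(dists, kind="stable")[:k]
    return Y[nearest].mean()

sup_dist = lambda f, g: np.max(np.abs(f - g))  # d(f, g) = sup_x |f(x) - g(x)|

# Toy functional sample: Y = r(X) + noise, with r(f) = mean value of the curve f.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
X = np.array([np.sin(2 * np.pi * rng.uniform(0.1, 1.0) * t) for _ in range(200)])
Y = X.mean(axis=1) + 0.1 * rng.standard_normal(len(X))
x_new = np.sin(2 * np.pi * 0.3 * t)
print(knn_regression(x_new, X, Y, k=10, metric=sup_dist))
```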

Stone [13] proved the striking result that the estimate $r_n$ is universally consistent if $\mathcal F = \mathbb R^d$, provided $k_n \to \infty$ and $k_n/n \to 0$. Here universally consistent means that the method is consistent regardless of the underlying distribution of $(X, Y)$ (universally consistent regression estimates can also be obtained by other local averaging methods as long as $\mathcal F = \mathbb R^d$, see e.g. [8]). It turns out that the story is radically different in general metric spaces $\mathcal F$. In this respect, Cérou and Guyader [1] present counterexamples indicating that the estimate $r_n$ is not universally consistent for general $\mathcal F$, and they argue that restrictions on $\mathcal F$ and the regression function $r$ cannot be dispensed with.

In this paper, we go one step further in the analysis and study the rate of convergence of $\mathbb E|r_n(X) - r(X)|^2$ as $n \to \infty$, when $X$ is allowed to take values in a general separable space $\mathcal F$. This important question was first addressed by Kulkarni and Posner [7], who put forward the essential role played by the covering numbers of the support of the distribution. Building upon the ideas in [7] and exploiting results of Cucker and Smale [12] on metric entropy, we present explicit general finite sample upper bounds, and particularize our results to important function spaces, such as reproducing kernel Hilbert spaces, Sobolev spaces or Besov spaces.

2 Rates of convergence

Setting
$$\tilde r_n(x) = \frac{1}{k_n}\sum_{i=1}^{k_n} r\big(X_{(i,n)}(x)\big),$$
we start the analysis with the standard variance/bias decomposition (Györfi, Kohler, Krzyżak and Walk [8])
$$\mathbb E|r_n(X) - r(X)|^2 = \mathbb E|r_n(X) - \tilde r_n(X)|^2 + \mathbb E|\tilde r_n(X) - r(X)|^2. \qquad (1)$$
The first term is a variance term, which can be upper bounded independently of the space $\mathcal F$. The proof of the next proposition can be found in [8], page 94:


Proposition 2.1. Suppose that, for all $x \in (\mathcal F, d)$, $\sigma^2(x) = \mathbb V[Y \,|\, X = x] \le \sigma^2$. Then, for all $x \in \mathcal F$ and all $n \ge 1$,
$$\mathbb E|r_n(x) - \tilde r_n(x)|^2 \le \frac{\sigma^2}{k_n}.$$
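For intuition, the argument behind Proposition 2.1 (given in full in [8], page 94) can be sketched as follows. Conditionally on $X_1, \ldots, X_n$, the variables $Y_{(i,n)}(x) - r(X_{(i,n)}(x))$ are centered and uncorrelated (the reordering depends only on the $X_i$'s), so the variance of their average is the average of their variances:
$$\mathbb E|r_n(x) - \tilde r_n(x)|^2 = \mathbb E\Bigg|\frac{1}{k_n}\sum_{i=1}^{k_n}\Big(Y_{(i,n)}(x) - r\big(X_{(i,n)}(x)\big)\Big)\Bigg|^2 = \frac{1}{k_n^2}\sum_{i=1}^{k_n}\mathbb E\big[\sigma^2\big(X_{(i,n)}(x)\big)\big] \le \frac{\sigma^2}{k_n}.$$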

The right-hand term in (1), which is a bias term, needs more careful attention. Let the symbol $\lfloor\cdot\rfloor$ denote the integer part function. First, a quick inspection of the finite-dimensional proof ([8], page 95) reveals the following.

Proposition 2.2. Suppose that, for all $x$ and $x'$ in $(\mathcal F, d)$,
$$|r(x) - r(x')| \le C\,d(x, x'),$$
for some nonnegative constant $C$. Then
$$\mathbb E|\tilde r_n(X) - r(X)|^2 \le C^2\,\mathbb E\Big[d^2\big(X, X_{(1,\lfloor n/k_n\rfloor)}(X)\big)\Big].$$

Putting Propositions 2.1 and 2.2 together, we obtain
$$\mathbb E|r_n(X) - r(X)|^2 \le \frac{\sigma^2}{k_n} + C^2\,\mathbb E\Big[d^2\big(X, X_{(1,\lfloor n/k_n\rfloor)}(X)\big)\Big].$$

Thus, in order to bound the rate of convergence of $\mathbb E|r_n(X) - r(X)|^2$, we need to analyse the rate of convergence of the nearest neighbor distance in a general separable metric setting. As noticed in Kulkarni and Posner [7], this task can be achieved via the use of covering numbers of totally bounded sets (Kolmogorov and Tihomirov [6]). Let $B(x, \varepsilon)$ denote the open ball in $\mathcal F$ centered at $x$ of radius $\varepsilon$.

Definition 2.1. Let $\mathcal A$ be a subset of $\mathcal F$. The $\varepsilon$-covering number $\mathcal N(\varepsilon)$ $[= \mathcal N(\varepsilon, \mathcal A)]$ is defined as the smallest number of open balls of radius $\varepsilon$ that cover the set $\mathcal A$. That is,
$$\mathcal N(\varepsilon) = \inf\Big\{k \ge 1 : \exists\, x_1, \ldots, x_k \in \mathcal F \text{ such that } \mathcal A \subset \bigcup_{i=1}^{k} B(x_i, \varepsilon)\Big\}.$$

A set $\mathcal A \subset \mathcal F$ is said to be totally bounded if $\mathcal N(\varepsilon) < \infty$ for all $\varepsilon > 0$. In particular, every relatively compact set is totally bounded and all totally bounded sets are bounded. The converse assertions are not true in general. Figure 1 below illustrates this important concept in the finite-dimensional setting $(\mathcal F, d) = (\mathbb R^2, \|\cdot\|_\infty)$ and $\mathcal A = (-1,1)^2 = B(0,1)$.

As a function of $\varepsilon$, $\mathcal N(\varepsilon)$ is a non increasing, piecewise-constant and right-continuous function. The following discrete function, called the metric covering radius, can be interpreted as a pseudo-inverse of the function $\mathcal N(\varepsilon)$.

Definition 2.2. The metric covering radius $\mathcal N^{-1}(r)$ $[= \mathcal N^{-1}(r, \mathcal A)]$ is defined as the smallest radius such that there exist $r$ balls of this radius which cover the set $\mathcal A$. That is,
$$\mathcal N^{-1}(r) = \inf\Big\{\varepsilon > 0 : \exists\, x_1, \ldots, x_r \in \mathcal F \text{ such that } \mathcal A \subset \bigcup_{i=1}^{r} B(x_i, \varepsilon)\Big\}.$$


[Figure 1: An $\varepsilon$-covering of $B(0,1)$ in dimension 2, showing the unit ball $B(0,1)$ covered by small balls $B(x, \varepsilon)$.]

We note that $\mathcal N^{-1}(r)$ is a non increasing function of $r$ (see Figure 2).

Example 2.1. If $(\mathcal F, d) = (\mathbb R^d, \|\cdot\|_\infty)$ and $\mathcal A = B(0,1) = (-1,+1)^d$, then
$$\mathcal N(\varepsilon, \mathcal A) = \Big(\frac{1}{\varepsilon}\Big)^d\,\mathbf 1_{\{1/\varepsilon \in \mathbb N\}} + \Big(\Big\lfloor\frac{1}{\varepsilon}\Big\rfloor + 1\Big)^d\,\mathbf 1_{\{1/\varepsilon \notin \mathbb N\}}.$$
In addition,
$$\mathcal N^{-1}(i, \mathcal A) = i^{-1/d}.$$

[Figure 2: Covering numbers $\mathcal N(\varepsilon, B(0,1))$ and covering radii $\mathcal N^{-1}(i, B(0,1))$ for the unit ball $(-1,+1)$: two staircase plots, with $\mathcal N$ jumping through the values $1, 2, 3, 4$ as $\varepsilon$ decreases through $1, \tfrac12, \tfrac13, \tfrac14$, and $\mathcal N^{-1}$ taking the values $1, \tfrac12, \tfrac13, \tfrac14$ at $i = 1, 2, 3, 4$.]
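A small numeric check of the formulas in Example 2.1, as reconstructed above (the $d = 1$ values reproduce the staircases of Figure 2; the sample values of $\varepsilon$ are chosen so the floating-point reciprocals are exact):

```python
# Numeric check of Example 2.1 for A = (-1, 1)^d with the supremum norm:
# N(eps, A) = (1/eps)^d if 1/eps is an integer, (floor(1/eps) + 1)^d otherwise,
# and N^{-1}(i, A) = i^(-1/d).
import math

def covering_number(eps, d):
    inv = 1.0 / eps
    return round(inv) ** d if inv.is_integer() else (math.floor(inv) + 1) ** d

def covering_radius(i, d):
    return i ** (-1.0 / d)

# d = 1 reproduces the staircases of Figure 2.
print([covering_number(e, d=1) for e in (1.0, 0.5, 0.4, 0.25, 0.2)])  # [1, 2, 3, 4, 5]
print([covering_radius(i, d=1) for i in (1, 2, 3, 4)])  # [1.0, 0.5, 0.333..., 0.25]
```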

Finally, we let the support $S(\mu)$ of the probability measure $\mu$ of $X$ be defined as the collection of all $x$ with $\mu(B(x, \varepsilon)) > 0$ for all $\varepsilon > 0$. Throughout the paper, it will be assumed that $S(\mu)$ is totally bounded. We observe that $2\mathcal N^{-1}(1, S(\mu))$ is an upper bound on the diameter of the support of $\mu$.

Proposition 2.3 below bounds the convergence rate of the expected nearest neighbor distance in terms of the metric covering radii of the support of the distribution $\mu$. It sharpens the constant of Theorem 1, page 1032, in Kulkarni and Posner [7].


Proposition 2.3. Let $X_1, \ldots, X_N$ be i.i.d. according to a probability measure $\mu$ with $S(\mu)$ a totally bounded subset of $(\mathcal F, d)$. Then
$$\mathbb E\big[d^2\big(X, X_{(1,N)}\big)\big] \le \frac{4}{N}\sum_{i=1}^{N} \big[\mathcal N^{-1}\big(i, S(\mu)\big)\big]^2.$$

Proof. All the covering numbers and covering radii used in this proof pertain to the totally bounded set $S(\mu)$. Therefore, to lighten notation a bit, we set $\mathcal N(\varepsilon) = \mathcal N(\varepsilon, S(\mu))$ and $\mathcal N^{-1}(r) = \mathcal N^{-1}(r, S(\mu))$.

Let $X$ be a random variable distributed as, and independent of, $X_1, \ldots, X_N$, and let, for $\varepsilon > 0$,
$$F_X(\varepsilon) = \mathbb P\big(d(X, X_1) \le \varepsilon \,\big|\, X\big)$$
be the cumulative distribution function of the distance between $X$ and $X_1$, conditionally on $X$. Set finally
$$D_{(1)}(X) = d\big(X, X_{(1,N)}(X)\big).$$
Clearly,
$$\mathbb P\big(D^2_{(1)}(X) > \varepsilon\big) = \mathbb E\big[\mathbb P\big(D_{(1)}(X) > \sqrt{\varepsilon} \,\big|\, X\big)\big] = \mathbb E\big[\big(1 - F_X(\sqrt{\varepsilon})\big)^N\big].$$
Take $B_1, \ldots, B_{\mathcal N(\sqrt{\varepsilon}/2)}$ a $\sqrt{\varepsilon}/2$-covering of $S(\mu)$, and define an $\mathcal N(\sqrt{\varepsilon}/2)$-partition of $S(\mu)$ as follows. For each $\ell = 1, \ldots, \mathcal N(\sqrt{\varepsilon}/2)$, let
$$\mathcal P_\ell = B_\ell \setminus \bigcup_{j=1}^{\ell-1} B_j.$$
Then $\mathcal P_\ell \subset B_\ell$ and
$$\bigcup_{\ell=1}^{\mathcal N(\sqrt{\varepsilon}/2)} B_\ell = \bigcup_{\ell=1}^{\mathcal N(\sqrt{\varepsilon}/2)} \mathcal P_\ell, \qquad \text{with } \mathcal P_\ell \cap \mathcal P_{\ell'} = \emptyset \text{ for } \ell \ne \ell'.$$
Also,
$$\sum_{\ell=1}^{\mathcal N(\sqrt{\varepsilon}/2)} \mu(\mathcal P_\ell) = 1.$$
Thus, letting $p_\ell = \mu(\mathcal P_\ell)$ and noting that two points falling in the same $\mathcal P_\ell$ are at distance less than $\sqrt{\varepsilon}$, we may write
$$F_X(\sqrt{\varepsilon}) \ge \mathbb P\big(\exists\, \ell = 1, \ldots, \mathcal N(\sqrt{\varepsilon}/2) : X \in \mathcal P_\ell \text{ and } X_1 \in \mathcal P_\ell \,\big|\, X\big) = \mathbb E\Bigg[\sum_{\ell=1}^{\mathcal N(\sqrt{\varepsilon}/2)} \mathbf 1_{X \in \mathcal P_\ell}\,\mathbf 1_{X_1 \in \mathcal P_\ell} \,\Bigg|\, X\Bigg] = \sum_{\ell=1}^{\mathcal N(\sqrt{\varepsilon}/2)} p_\ell\,\mathbf 1_{X \in \mathcal P_\ell}.$$
As a by-product, we remark that for all $\varepsilon > 0$ we have $F_X(\varepsilon) > 0$ almost surely. Moreover,
$$\mathbb E\bigg[\frac{1}{F_X(\sqrt{\varepsilon})}\bigg] \le \mathbb E\Bigg[\frac{1}{\sum_{\ell=1}^{\mathcal N(\sqrt{\varepsilon}/2)} p_\ell\,\mathbf 1_{X \in \mathcal P_\ell}}\Bigg] = \mathbb E\Bigg[\sum_{\ell=1}^{\mathcal N(\sqrt{\varepsilon}/2)} \frac{1}{p_\ell}\,\mathbf 1_{X \in \mathcal P_\ell}\Bigg],$$
which leads to
$$\mathbb E\bigg[\frac{1}{F_X(\sqrt{\varepsilon})}\bigg] \le \mathcal N(\sqrt{\varepsilon}/2).$$
Since $p(1-p)^N \le \frac{1}{eN} \le \frac{1}{2N}$ for all $p \in [0,1]$, we can deduce
$$\mathbb P\big(D^2_{(1)}(X) > \varepsilon\big) = \mathbb E\big[\big(1 - F_X(\sqrt{\varepsilon})\big)^N\big] = \mathbb E\bigg[\frac{1}{F_X(\sqrt{\varepsilon})}\,F_X(\sqrt{\varepsilon})\big(1 - F_X(\sqrt{\varepsilon})\big)^N\bigg] \le \frac{1}{2N}\,\mathbb E\bigg[\frac{1}{F_X(\sqrt{\varepsilon})}\bigg] \le \frac{\mathcal N(\sqrt{\varepsilon}/2)}{2N}.$$
Using the fact that $\mathbb P(D_{(1)}(X) > \varepsilon) = 0$ for $\varepsilon > 2\mathcal N^{-1}(1)$, we may write
$$\mathbb E\big[D^2_{(1)}(X)\big] = \int_0^{\infty} \mathbb P\big(D^2_{(1)}(X) > \varepsilon\big)\,\mathrm d\varepsilon = \int_0^{4[\mathcal N^{-1}(1)]^2} \mathbb P\big(D^2_{(1)}(X) > \varepsilon\big)\,\mathrm d\varepsilon$$
$$= \int_0^{4[\mathcal N^{-1}(N)]^2} \mathbb P\big(D^2_{(1)}(X) > \varepsilon\big)\,\mathrm d\varepsilon + \int_{4[\mathcal N^{-1}(N)]^2}^{4[\mathcal N^{-1}(1)]^2} \mathbb P\big(D^2_{(1)}(X) > \varepsilon\big)\,\mathrm d\varepsilon$$
$$\le 4\big[\mathcal N^{-1}(N)\big]^2 + \frac{1}{2N}\int_{4[\mathcal N^{-1}(N)]^2}^{4[\mathcal N^{-1}(1)]^2} \mathcal N(\sqrt{\varepsilon}/2)\,\mathrm d\varepsilon = 4\big[\mathcal N^{-1}(N)\big]^2 + \frac{2}{N}\int_{[\mathcal N^{-1}(N)]^2}^{[\mathcal N^{-1}(1)]^2} \mathcal N(\sqrt{\varepsilon})\,\mathrm d\varepsilon$$
$$= 4\big[\mathcal N^{-1}(N)\big]^2 + \frac{2}{N}\sum_{i=2}^{N} \int_{[\mathcal N^{-1}(i)]^2}^{[\mathcal N^{-1}(i-1)]^2} \mathcal N(\sqrt{\varepsilon})\,\mathrm d\varepsilon.$$
Since $\mathcal N(\sqrt{\varepsilon}) = i$ for $[\mathcal N^{-1}(i)]^2 \le \varepsilon < [\mathcal N^{-1}(i-1)]^2$, we can upper bound this last quantity as follows:
$$\mathbb E\big[D^2_{(1)}(X)\big] \le 4\big[\mathcal N^{-1}(N)\big]^2 + \frac{2}{N}\sum_{i=2}^{N} i\Big(\big[\mathcal N^{-1}(i-1)\big]^2 - \big[\mathcal N^{-1}(i)\big]^2\Big)$$
$$= \frac{4}{N}\big[\mathcal N^{-1}(1)\big]^2 + \frac{2}{N}\sum_{i=2}^{N-1} \big[\mathcal N^{-1}(i)\big]^2 + 2\big[\mathcal N^{-1}(N)\big]^2 \le \frac{4}{N}\sum_{i=1}^{N} \big[\mathcal N^{-1}(i)\big]^2.$$
For the last inequality, recall that the sequence $(\mathcal N^{-1}(i))$ is non increasing, so that
$$\big[\mathcal N^{-1}(N)\big]^2 \le \frac{\sum_{i=2}^{N} \big[\mathcal N^{-1}(i)\big]^2}{N-1}.$$
Then the decomposition
$$\big[\mathcal N^{-1}(N)\big]^2 = \frac{N-1}{N}\big[\mathcal N^{-1}(N)\big]^2 + \frac{1}{N}\big[\mathcal N^{-1}(N)\big]^2$$
leads to the desired result. $\square$

Example 2.2. If $(\mathcal F, d) = (\mathbb R^d, \|\cdot\|_\infty)$, $S(\mu) \subset B(0,1)$ and $d \ge 3$, then
$$\mathbb E\big\|X_{(1,N)} - X\big\|^2 \le \frac{4}{N}\sum_{i=1}^{N} i^{-2/d} \le \frac{4d}{d-2}\,N^{-2/d}.$$
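A Monte Carlo illustration of Example 2.2 (a sketch: uniform draws on $B(0,1)$ are just one admissible choice of $\mu$, and all parameters are illustrative):

```python
# Monte Carlo check of Example 2.2: for d >= 3 and any mu supported in
# B(0,1) = (-1,1)^d (uniform here), the mean squared sup-norm distance to the
# nearest of N sample points should not exceed 4d/(d-2) * N^(-2/d).
import numpy as np

rng = np.random.default_rng(1)
d, N, trials = 3, 500, 200
sq = []
for _ in range(trials):
    X = rng.uniform(-1.0, 1.0, size=(N, d))   # the sample X_1, ..., X_N
    x = rng.uniform(-1.0, 1.0, size=d)        # the independent query X
    sq.append(np.min(np.max(np.abs(X - x), axis=1)) ** 2)
print("empirical E[d^2]        :", np.mean(sq))
print("bound 4d/(d-2) N^(-2/d) :", 4 * d / (d - 2) * N ** (-2 / d))
```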

Remark 2.1. In the same way as in the proof of Proposition 2.3, one can easily show that
$$\mathbb E\big[d\big(X, X_{(1,N)}\big)\big] \le \frac{2}{N}\sum_{i=1}^{N} \mathcal N^{-1}\big(i, S(\mu)\big),$$
which also sharpens the constant of Theorem 1, page 1032, in Kulkarni and Posner [7].

In our context, the goal is to estimate $\mathcal N^{-1}(i, S(\mu))$ when $X$ takes values in a general metric space. Unfortunately, in infinite dimension, even if the support is bounded (e.g., $S(\mu)$ contained in the unit ball), it will not be totally bounded, so that $\mathcal N^{-1}(i, S(\mu)) = +\infty$ for all $i$ and the previous proposition is useless. The trick is then to use an auxiliary distance to measure the distances between $X$ and the training data. This is the key idea of compact embedding.

In the sequel, we assume that $(\mathcal F, \|\cdot\|_{\mathcal F})$ and $(\mathcal G, \|\cdot\|_{\mathcal G})$ are two general normed vector spaces, and $B_{\mathcal F}(0,1)$ denotes the open unit ball in $(\mathcal F, \|\cdot\|_{\mathcal F})$:
$$B_{\mathcal F}(0,1) = \{x \in \mathcal F : \|x\|_{\mathcal F} < 1\}.$$

Definition 2.3. An inclusion $I : (\mathcal F, \|\cdot\|_{\mathcal F}) \hookrightarrow (\mathcal G, \|\cdot\|_{\mathcal G})$ is called a compact embedding if $I(B_{\mathcal F}(0,1))$ is totally bounded in $(\mathcal G, \|\cdot\|_{\mathcal G})$.

We now give several examples of compact embeddings.

1. Reproducing Kernel Hilbert Spaces.

Let $\mathcal X$ be a compact domain in $\mathbb R^d$ with smooth boundary and $K : \mathcal X \times \mathcal X \to \mathbb R$ a Mercer kernel, i.e., $K$ is continuous, symmetric and positive definite. This last property means that for all finite sets $\{x_1, \ldots, x_m\} \subset \mathcal X$, the matrix $[K(x_i, x_j)]_{1 \le i,j \le m}$ is positive definite. The reproducing kernel Hilbert space $\mathcal H_K$ associated with the kernel $K$ is defined as the closure of the linear span of the set of functions $\{K_x = K(x, \cdot),\ x \in \mathcal X\}$, with the inner product satisfying
$$\forall x \in \mathcal X,\ \forall f \in \mathcal H_K, \quad \langle K_x, f\rangle_K = f(x).$$
Let us also denote by $(\mathcal C(\mathcal X), \|\cdot\|_\infty)$ the space of continuous functions on $\mathcal X$ equipped with the supremum norm. If $K$ is $C^\infty$, then it has been proved by Cucker and Smale [12] that the inclusion
$$I_K : (\mathcal H_K, \|\cdot\|_K) \hookrightarrow (\mathcal C(\mathcal X), \|\cdot\|_\infty)$$
is a compact embedding. If $B_R$ stands for the closed ball with radius $R$ in $(\mathcal H_K, \|\cdot\|_K)$, this implies that the closure $\overline{I_K(B_R)}$ is a compact subset of $(\mathcal C(\mathcal X), \|\cdot\|_\infty)$, and we can even obtain upper bounds on the covering numbers $\mathcal N(\varepsilon, I_K(B_R))$.

Indeed, Cucker and Smale have shown (see [12], Theorem D) that for $h > d$, $\varepsilon > 0$ and $R > 0$,
$$\ln \mathcal N\big(\varepsilon, I_K(B_R)\big) \le \Big(\frac{R\,C_h}{\varepsilon}\Big)^{2d/h},$$
where $C_h$ is a constant independent of $\varepsilon$ and $R$. This readily implies that for $h > d$, $i \in \mathbb N$ and $R > 0$,
$$\mathcal N^{-1}\big(i, I_K(B_R)\big) \le R\,C_h\,(\ln i)^{-h/(2d)}.$$

More recently, under stronger assumptions, Zhou [14] has even improved this result. For convolution-type kernels $K(x, x') = k(x - x')$ on $[0,1]^d$, he provides estimates depending on the decay of $\hat k$, the Fourier transform of $k$. For example, when $\hat k$ decays exponentially, we have
$$\ln \mathcal N\big(\varepsilon, I_K(B_R)\big) \le C_{k,d}\Big(\ln\frac{R}{\varepsilon}\Big)^{d+1},$$
where $C_{k,d}$ depends only on the kernel and on the dimension. This implies that
$$\mathcal N^{-1}\big(i, I_K(B_R)\big) \le R\,\exp\Bigg\{-\Big(\frac{\ln i}{C_{k,d}}\Big)^{\frac{1}{d+1}}\Bigg\}.$$
In particular, this last result can be applied to the classical Gaussian kernel
$$K(x, x') = k(x - x') = \exp\Big(-\frac{\|x - x'\|^2}{\sigma^2}\Big).$$
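In each of these examples, the covering-radius bound is obtained by simply inverting the entropy bound; for instance, with the Cucker and Smale estimate above, setting $i = \mathcal N(\varepsilon)$ gives
$$\ln i \le \Big(\frac{R\,C_h}{\varepsilon}\Big)^{2d/h} \iff \varepsilon \le R\,C_h\,(\ln i)^{-h/(2d)},$$
so that $i$ balls of radius $R\,C_h\,(\ln i)^{-h/(2d)}$ suffice to cover $I_K(B_R)$, which is the stated bound on $\mathcal N^{-1}(i, I_K(B_R))$. The Zhou, Sobolev and Besov bounds below follow by the same inversion.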

2. Sobolev Spaces.

Here again $\mathcal X$ is a compact domain in $\mathbb R^d$ with smooth boundary. For all $m \in \mathbb N$ and $p \ge 1$, define the usual Sobolev space $W^{m,p}(\mathcal X)$ equipped with the norm
$$\|f\|_{W^{m,p}} = \sum_{|\alpha| \le m} \|D^\alpha f\|_p,$$
where the sum is over the multi-indexes $\alpha = (\alpha_1, \ldots, \alpha_d)$ such that $\alpha_1 + \cdots + \alpha_d \le m$. The first results date back to Sobolev in 1938, and it is now well known that for all $m > d/2$, the inclusion
$$I_m : (W^{m,p}(\mathcal X), \|\cdot\|_{W^{m,p}}) \hookrightarrow (\mathcal C(\mathcal X), \|\cdot\|_\infty)$$
is a compact embedding. Moreover, with the same notations as above, it can be proved (see [3], p. 105, and [12], Proposition 6) that for $\varepsilon > 0$ and $R > 0$,
$$\ln \mathcal N\big(\varepsilon, I_m(B_R)\big) \le \Big(\frac{R\,C_m}{\varepsilon}\Big)^{d/m},$$
where $C_m$ is a constant independent of $\varepsilon$ and $R$. This implies that for $m > d/2$, $i \in \mathbb N$ and $R > 0$,
$$\mathcal N^{-1}\big(i, I_m(B_R)\big) \le R\,C_m\,(\ln i)^{-m/d}.$$
The results mentioned in [3] can in fact be applied in many more situations, since they also give covering number estimates for many classical "Sobolev-type" function spaces.

3. Besov Spaces.

Here again $\mathcal X$ is a compact domain in $\mathbb R^d$ with smooth boundary. Without delving into the details of the construction, let us just say that $(B^s_{pq}(\mathcal X), \|\cdot\|^s_{pq})$ is the classical Besov space on $\mathcal X$. If $1 \le p, q \le \infty$ and $s > d/p$, then the inclusion
$$I_s : (B^s_{pq}(\mathcal X), \|\cdot\|^s_{pq}) \hookrightarrow (\mathcal C(\mathcal X), \|\cdot\|_\infty)$$
is a compact embedding. It can be deduced from a general result by Edmunds and Triebel (see [3], p. 105) that for $\varepsilon > 0$ and $R > 0$,
$$\ln \mathcal N\big(\varepsilon, I_s(B_R)\big) \le \Big(\frac{R\,C_s}{\varepsilon}\Big)^{d/s},$$
which gives rise to the same upper bound as for Sobolev spaces:
$$\mathcal N^{-1}\big(i, I_s(B_R)\big) \le R\,C_s\,(\ln i)^{-s/d}.$$
Here again, the result mentioned in [3] can be applied in many more situations, since [3] gives compact embeddings for many classical "Besov-type" function spaces.

In the general case, we suppose that the support $S(\mu)$ is bounded in $(\mathcal F, d)$ and denote by $R > 0$ a radius such that $S(\mu) \subset B_R$. In all the above-mentioned cases, there exists a compact embedding
$$I : (\mathcal F, d) \hookrightarrow (\mathcal C(\mathcal X), \|\cdot\|_\infty),$$
and we can upper bound the covering radius:
$$\mathcal N^{-1}\big(i, S(\mu)\big) \le \mathcal N^{-1}\big(i, I(B_R)\big) \le \varphi(\ln i),$$
where $\varphi$ is a function depending on the compact embedding under consideration. In all these cases, $\varphi$ shares some common features, which lead to the following result.

Lemma 2.1. Let $\varphi : \mathbb R^+ \to \mathbb R^+$ be a differentiable function satisfying the following assumptions:

(i) $\varphi$ is non increasing and $\lim_{t \to +\infty} \varphi(t) = 0$;

(ii) $\varphi'(t)/\varphi(t) \to 0$ as $t \to +\infty$;

(iii) $\int_1^{+\infty} \varphi(\ln u)\,\mathrm du = +\infty$.

Then we have the following equivalence when $N$ goes to infinity:
$$\frac{1}{N}\sum_{i=1}^{N} \varphi(\ln i) \sim \varphi(\ln N).$$

Proof. Since $\varphi$ is non increasing and $\int_1^{+\infty} \varphi(\ln u)\,\mathrm du = +\infty$, the classical comparison between sums and integrals gives, as $N$ goes to $+\infty$,
$$\sum_{i=1}^{N} \varphi(\ln i) \sim \int_1^{N} \varphi(\ln u)\,\mathrm du.$$
By the assumption $\lim_{u \to +\infty} \varphi'(\ln u)/\varphi(\ln u) = 0$, we also have the following equivalence when $u$ goes to $+\infty$:
$$\varphi(\ln u) \sim \varphi(\ln u)\Big(1 + \frac{\varphi'(\ln u)}{\varphi(\ln u)}\Big),$$
and since $\int_1^{+\infty} \varphi(\ln u)\,\mathrm du = +\infty$, we deduce that
$$\int_1^{N} \varphi(\ln u)\,\mathrm du \sim \int_1^{N} \varphi(\ln u)\Big(1 + \frac{\varphi'(\ln u)}{\varphi(\ln u)}\Big)\,\mathrm du.$$
But this last term is simply
$$\int_1^{N} \varphi(\ln u)\Big(1 + \frac{\varphi'(\ln u)}{\varphi(\ln u)}\Big)\,\mathrm du = \big[u\,\varphi(\ln u)\big]_1^{N} \sim N\,\varphi(\ln N),$$
since the integrand is exactly the derivative of $u \mapsto u\,\varphi(\ln u)$. Putting all things together, we conclude that
$$\frac{1}{N}\sum_{i=1}^{N} \varphi(\ln i) \sim \varphi(\ln N). \qquad \square$$
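As a sanity check, the equivalence of Lemma 2.1 can be observed numerically for the Sobolev/Besov-type shape $\varphi(t) = t^{-a}$ (a sketch; the exponent $a$ stands in for $m/d$ or $s/d$, and convergence is slow because everything lives on a logarithmic scale):

```python
# Numeric check of Lemma 2.1 for phi(t) = t^(-a): the Cesaro mean
# (1/N) sum_{i<=N} phi(ln i) should approach phi(ln N) as N grows.
import math

def ratio(N, a=1.5):
    phi = lambda t: t ** (-a)
    # Start at i = 2: phi(ln 1) = phi(0) is infinite for this phi, and a
    # single term does not affect the N -> infinity behavior.
    cesaro = sum(phi(math.log(i)) for i in range(2, N + 1)) / N
    return cesaro / phi(math.log(N))

for N in (10**3, 10**4, 10**5, 10**6):
    print(N, round(ratio(N), 3))   # decreases slowly towards 1
```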

We now summarize the assumptions on the model.

Hypothesis 2.1 (H). We make the following assumptions:

(i) The support $S(\mu)$ is bounded in $(\mathcal F, \|\cdot\|_{\mathcal F})$: there exists an $R > 0$ such that $S(\mu) \subset B_{\mathcal F}(0, R)$;

(ii) There exist a compact embedding
$$I : (\mathcal F, \|\cdot\|_{\mathcal F}) \hookrightarrow (\mathcal G, \|\cdot\|_{\mathcal G})$$
and a function $\varphi$ satisfying the assumptions of Lemma 2.1 such that, for all $i \in \mathbb N$,
$$\mathcal N^{-1}\big(i, I(B_R)\big) \le \varphi(\ln i);$$

(iii) There exists a nonnegative constant $C$ such that
$$\forall (x, x') \in \mathcal F \times \mathcal F, \quad |r(x) - r(x')| \le C\,\|x - x'\|_{\mathcal G}.$$

Finally, we can give a complete overview of the preceding results.

Theorem 2.1. Under assumption (H), we have, as $n \to \infty$,
$$\mathbb E|r_n(X) - r(X)|^2 \lesssim \frac{\sigma^2}{k_n} + 4C^2\,\varphi^2\Big(\ln\Big\lfloor\frac{n}{k_n}\Big\rfloor\Big),$$
the bias term following from Propositions 2.2 and 2.3 combined with Lemma 2.1 applied to $\varphi^2$.
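To see what Theorem 2.1 yields in practice, here is a small sketch (with the Sobolev-type radius $\varphi(t) = K t^{-m/d}$ and purely illustrative constants) that balances the two terms of the bound over $k_n$; the resulting rate decays only logarithmically in $n$, as the bias term suggests in this functional setting.

```python
# Balancing the two terms of Theorem 2.1 numerically for the Sobolev-type
# covering radius phi(t) = K * t^(-m/d). All constants are illustrative.
import math

def bound(k, n, sigma2=1.0, C=1.0, K=1.0, m=2, d=1):
    # sigma^2/k_n + 4 C^2 phi(ln floor(n/k_n))^2, as in the theorem above.
    t = math.log(n // k)
    return sigma2 / k + 4 * C**2 * (K * t ** (-m / d)) ** 2

n = 10**6
ks = sorted({max(1, int(10 ** (e / 20))) for e in range(101)})  # log grid up to 10^5
best_k = min(ks, key=lambda k: bound(k, n))
print("near-optimal k_n:", best_k, " bound value:", bound(best_k, n))
```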


References

[1] Cérou, F. and Guyader, A. (2006). Nearest neighbor classification in infinite dimension, ESAIM: Probability and Statistics, 10, 340-355.

[2] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition, Springer-Verlag, New York.

[3] Edmunds, D.E. and Triebel, H. (1996). Function Spaces, Entropy Numbers and Differential Operators, Cambridge University Press, Cambridge.

[4] Fix, E. and Hodges, J.L. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties, Technical Report 4, Project Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas.

[5] Fix, E. and Hodges, J.L. (1952). Discriminatory analysis: Small sample performance, Technical Report 11, Project Number 21-49-004, USAF School of Aviation Medicine, Randolph Field, Texas.

[6] Kolmogorov, A.N. and Tihomirov, V.M. (1961). ε-entropy and ε-capacity of sets in functional spaces, American Mathematical Society Translations, 17, 277-364.

[7] Kulkarni, S.R. and Posner, S.E. (1995). Rates of convergence of nearest neighbor estimation under arbitrary sampling, IEEE Transactions on Information Theory, 41, 1028-1039.

[8] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression, Springer-Verlag, New York.

[9] Ramsay, J.O. and Silverman, B.W. (1997). Functional Data Analysis, Springer, New York.

[10] Schölkopf, B. and Smola, A.J. (2002). Learning with Kernels, The MIT Press, Cambridge.

[11] Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge.

[12] Cucker, F. and Smale, S. (2002). On the mathematical foundations of learning, Bulletin of the American Mathematical Society, 39, 1-49.

[13] Stone, C.J. (1977). Consistent nonparametric regression, The Annals of Statistics, 5, 595-645.

[14] Zhou, D.-X. (2002). The covering number in learning theory, Journal of Complexity, 18, 739-767.

