THE k-NEAREST NEIGHBORS ESTIMATION OF THE CONDITIONAL MODE FOR FUNCTIONAL DATA


MOHAMMED KADI ATTOUCH and WAHIBA BOUABÇA

Communicated by Dan Crişan

The aim of this article is to study the nonparametric estimation of the conditional mode by the k-Nearest Neighbor method for functional explanatory variables.

We give the rate of the almost complete convergence of the k-NN kernel estimator of the conditional mode. A real data application is presented to exhibit the effectiveness of this estimation method with respect to the classical kernel estimation.

AMS 2010 Subject Classification: 62G05, 62G08, 62G20, 62G35.

Key words: functional data, the conditional mode, nonparametric regression, k-NN estimator, rate of convergence, random bandwidth.

1. INTRODUCTION

The conditional mode, because of its importance in the nonparametric forecasting field, has motivated a number of researchers to investigate mode estimators; it constitutes an alternative method to conditional regression estimation (see Ould Saïd [24] for more discussion and examples).

In finite-dimensional spaces, there exists an extensive bibliography for the independent and the dependent data cases. In the independent case, strong consistency and asymptotic normality of the kernel estimator of the conditional mode are given in Samanta and Thavaneswaran [26]. In the dependent case, the strong consistency of the conditional mode estimator was obtained by Collomb et al. [6] and Ould Saïd [24] under strong mixing conditions (see Azzahrioui [14] for an extensive literature review).

In infinite-dimensional spaces, the almost complete convergence of the conditional mode estimator is obtained by Ferraty et al. [18]; Ezzahrioui and Ould Saïd [15] establish the asymptotic normality of nonparametric estimators of the conditional mode for both independent and dependent functional data. The consistency in Lp-norm of the conditional mode estimator is given in Dabo-Niang and Laksaci [10].

Estimating the conditional mode is directly linked to density estimation, and for the latter the choice of the bandwidth is extremely important for the performance of the estimate. The bandwidth must not be too large, so as to prevent over-smoothing, i.e. substantial bias, and must not be too small either, so as to avoid under-smoothing, i.e. excessive variance that masks the underlying structure. In nonparametric curve estimation in particular, the smoothing parameter is critical for performance.

Based on this fact, this work deals with nonparametric estimation by the k nearest neighbors (k-NN) method. More precisely, we consider a kernel estimator of the conditional mode built from a local window taking into account the exact k nearest neighbors, with real response variable Y and functional curves X.

The k-Nearest Neighbor or k-NN estimator is a weighted average of response variables in the neighborhood of x. The bibliography of the k-NN estimation method dates back to Royall [25] and Stone [27] and has received continuous developments: Mack [22] derived the rates of convergence for the bias and variance as well as asymptotic normality in the multivariate case; Collomb [5] studied different types of convergence (in probability, almost sure (a.s.) and almost complete (a.co.)) of the estimator of the regression function; Devroye [12] obtained the strong consistency and the uniform convergence. For functional data, the k-NN kernel estimate was first introduced in the monograph of Ferraty and Vieu [18]; Burba et al. [4] obtained the rate of almost complete convergence of the regression function using the k-NN method for independent data, and Attouch and Benchikh [1] established the asymptotic normality of a robust nonparametric regression function.

The paper is organized as follows. The following section presents the k-NN estimator of the conditional mode. Section 3 gathers the technical tools, and Section 4 states the hypotheses. Section 5 is devoted to the almost complete convergence and the almost complete convergence rate, and in Section 6 we illustrate the effectiveness of the k-NN estimation method on real data. All proofs are given in the Appendix.

2. MODELS AND ESTIMATORS

Let $(X_1,Y_1),\dots,(X_n,Y_n)$ be $n$ independent pairs, identically distributed as $(X,Y)$, a random pair valued in $\mathcal{F}\times\mathbb{R}$, where $\mathcal{F}$ is a semi-metric space with semi-metric $d(\cdot,\cdot)$. We do not assume the existence of a density for the functional random variable (f.r.v.) $X$.

Throughout the paper, when no confusion is possible, we denote by $c$, $c'$ or $C_x$ a generic constant in $\mathbb{R}^{+}$, and any real function carrying a bracketed integer as an exponent denotes its derivative of the corresponding order. From a sample of independent pairs $(X_i,Y_i)$, each having the same distribution as $(X,Y)$, our aim is to build nonparametric estimates of several functions related to the conditional probability distribution of $Y$ given $X$. For $x\in\mathcal{F}$ we denote the conditional cdf of $Y$ given $X=x$ by
$$\forall y\in\mathbb{R},\qquad F^{x}(y)=\mathbb{P}(Y\le y\mid X=x).$$
If this distribution is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$, we denote by $f^{x}$ (resp. $f^{x(j)}$) the conditional density (resp. its $j$th order derivative) of $Y$ given $X=x$. We then give almost complete convergence results (with rates) for nonparametric estimates of $f^{x(j)}$; the convergence of the conditional density estimate follows immediately from the general results concerning $f^{x(j)}$.

In the following, $x$ is a fixed point in $\mathcal{F}$, $N_x$ denotes a fixed neighborhood of $x$, $S$ is a fixed compact subset of $\mathbb{R}$, and we use the notation $B(x,h)=\{x'\in\mathcal{F} : d(x,x')<h\}$.

The $k$ nearest neighbors ($k$-NN) estimator of $F^{x}$ at $y$ is defined by
$$\widehat{F}^{x}_{kNN}(y)=\frac{\displaystyle\sum_{i=1}^{n}K\big(H_{n,k}^{-1}d(x,X_i)\big)\,G\big(h_G^{-1}(y-Y_i)\big)}{\displaystyle\sum_{i=1}^{n}K\big(H_{n,k}^{-1}d(x,X_i)\big)}.$$

Now, we focus on the estimation of the $j$th order derivative of the conditional density $f^{x}$ by the $k$ nearest neighbors ($k$-NN) method, where $\widehat{f}^{x(j)}_{kNN}(y)$ is defined by
$$\widehat{f}^{x(j)}_{kNN}(y)=\frac{h_G^{-j-1}\displaystyle\sum_{i=1}^{n}K\big(H_{n,k}^{-1}d(x,X_i)\big)\,G^{(j+1)}\big(h_G^{-1}(y-Y_i)\big)}{\displaystyle\sum_{i=1}^{n}K\big(H_{n,k}^{-1}d(x,X_i)\big)},\qquad(1)$$

where $K$ is a kernel function and $H_{n,k}(\cdot)$ is defined as follows:
$$H_{n,k}(x)=\min\Big\{h\in\mathbb{R}^{+} :\ \sum_{i=1}^{n}\mathbf{1}_{B(x,h)}(X_i)=k\Big\}.$$
$H_{n,k}(x)$ is a positive random variable (r.v.) which depends on $(X_1,\dots,X_n)$, $G$ is a cdf, and $h_G=h_{G,n}$ is a sequence of positive real numbers.
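As an illustration only, the following minimal Python sketch (NumPy, hypothetical helper names) computes the k-NN bandwidth $H_{n,k}(x)$ as the distance from $x$ to its $k$-th nearest curve (one common convention) and evaluates the estimator (1) for $j=0$ on a grid of $y$ values; the functions `K` and `G1` stand for any kernel $K$ and derivative $G'$ satisfying the hypotheses stated later.

```python
import numpy as np

def knn_bandwidth(dists, k):
    """H_{n,k}(x): distance from x to its k-th nearest curve.

    dists : array of semi-metric distances d(x, X_i), i = 1..n.
    """
    return np.sort(dists)[k - 1]

def knn_conditional_density(dists, Y, y_grid, k, h_G, K, G1):
    """k-NN estimate of the conditional density f^x(y) on y_grid (case j = 0 of (1)).

    K  : kernel with support (0, 1)
    G1 : derivative G' of the cdf kernel G
    """
    H = knn_bandwidth(dists, k)
    w = K(dists / H)                     # weights K(H_{n,k}^{-1} d(x, X_i))
    num = np.array([np.sum(w * G1((y - Y) / h_G)) for y in y_grid]) / h_G
    return num / np.sum(w)
```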

We propose the estimator $\widehat{\theta}_{kNN}$ of the conditional mode $\theta$ (assumed to be uniquely defined in the compact set $S$), defined below in (2). Our estimator is based on the functional conditional density estimate defined in (1):
$$\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN})=\sup_{y\in S}\widehat{f}^{x}_{kNN}(y).\qquad(2)$$


Note that the estimator $\widehat{\theta}_{kNN}$ is not necessarily unique; if this is the case, all the remaining results of our paper concern any value $\widehat{\theta}_{kNN}$ satisfying (2).
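In practice the supremum in (2) is taken over a finite grid of candidate values; a minimal sketch, reusing the hypothetical `knn_conditional_density` helper from the previous sketch:

```python
def knn_conditional_mode(dists, Y, y_grid, k, h_G, K, G1):
    """Estimate the conditional mode by maximizing the k-NN density over y_grid."""
    dens = knn_conditional_density(dists, Y, y_grid, k, h_G, K, G1)
    return y_grid[np.argmax(dens)]
```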

To prove the almost complete convergence and the almost complete convergence rate of the $k$-NN conditional mode estimator, we need some results on the kernel density estimator of Ferraty et al. [19]. This density estimator is
$$\widehat{f}^{x(j)}(y)=\frac{h_G^{-j-1}\displaystyle\sum_{i=1}^{n}K\big(h_K^{-1}d(x,X_i)\big)\,G^{(j+1)}\big(h_G^{-1}(y-Y_i)\big)}{\displaystyle\sum_{i=1}^{n}K\big(h_K^{-1}d(x,X_i)\big)},$$
and the corresponding conditional mode estimator is
$$\widehat{f}^{x}(\widehat{\theta}_{mode})=\sup_{y\in S}\widehat{f}^{x}(y).\qquad(3)$$

In this case the bandwidth $h:=h_n$ is non-random (a sequence of positive real numbers which goes to zero as $n$ goes to infinity). The random nature of the $k$-NN bandwidth is both its main quality and its major disadvantage. Indeed, the fact that $H_{n,k}(x)$ is an r.v. creates technical difficulties in the proofs, because we cannot use the same tools as in the standard kernel method. But the randomness of $H_{n,k}(x)$ allows us to define a neighborhood adapted to $x$ and to respect the local structure of the data.

We will consider two kinds of nonparametric models. The first one, called the "continuity-type" model, is defined as
$$f\in C^{0}_{E}=\Big\{f:\mathcal{F}\to\mathbb{R} :\ \lim_{d(x,x')\to 0} f(x')=f(x)\Big\},\qquad(4)$$
and yields pointwise consistency results. The second one, called the "Lipschitz-type" model, assumes the existence of an $\alpha>0$ such that
$$f\in \mathrm{Lip}_{E,\alpha}=\big\{f:\mathcal{F}\to\mathbb{R} :\ \exists\,C>0,\ \forall x'\in E,\ |f(x')-f(x)|< C\,d(x,x')^{\alpha}\big\},\qquad(5)$$
and allows us to obtain rates of convergence.

3. TECHNICAL TOOLS

The first difficulty comes from the fact that the window $H_{n,k}(x)$ is random: the numerator and the denominator of $\widehat{f}^{x}_{kNN}(y)$ are therefore not sums of independent variables. To solve this problem, the idea is to bracket $H_{n,k}(x)$ between two suitably chosen non-random windows. More generally, these technical tools can be useful whenever one has to deal with random bandwidths.


Lemma 1 (Burba et al. [4]). Let $(A_i,B_i)_{i=1,\dots,n}$ be $n$ random pairs (independent but not necessarily identically distributed) valued in $\mathcal{F}\times\mathbb{R}^{+}$, where $(\mathcal{F},\mathcal{A})$ is a general measurable space. Let $W:\mathbb{R}\times\mathcal{F}\to\mathbb{R}$ be a measurable function such that:

(L0) $\forall z\in\mathcal{F},\ \forall t,t'\in\mathbb{R}:\ t\le t'\Rightarrow W(t,z)\le W(t',z)$.

For every $n\in\mathbb{N}$ and every real r.v. $T$, put
$$C_n(T)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W(T,A_i)\,h_G^{-1}G^{(1)}\big(h_G^{-1}(y-B_i)\big)}{\displaystyle\sum_{i=1}^{n}W(T,A_i)}.$$

Let $(D_n)_{n\in\mathbb{N}}$ be a sequence of real random variables. If, for any $\beta\in\,]0,1[$, there exist two sequences of real r.v. $D_n^{-}(\beta)$ and $D_n^{+}(\beta)$ which verify:

(L1) $\forall n\in\mathbb{N}:\ D_n^{-}\le D_n^{+}$, and $\mathbf{1}_{\{D_n^{-}\le D_n\le D_n^{+}\}}\xrightarrow{a.co.}1$;

(L2) $\dfrac{\displaystyle\sum_{i=1}^{n}W(D_n^{-},A_i)}{\displaystyle\sum_{i=1}^{n}W(D_n^{+},A_i)}\xrightarrow{a.co.}\beta$;

(L3) $\exists\,c>0$ such that $C_n(D_n^{-})\xrightarrow{a.co.}c$ and $C_n(D_n^{+})\xrightarrow{a.co.}c$;

then we have
$$C_n(D_n)\xrightarrow{a.co.}c.$$

Lemma 2 (Burba et al. [4]). Let $(A_i,B_i)_{i=1,\dots,n}$ be $n$ random pairs (independent but not necessarily identically distributed) valued in $\mathcal{F}\times\mathbb{R}^{+}$, where $(\mathcal{F},\mathcal{A})$ is a general measurable space. Let $W:\mathbb{R}\times\mathcal{F}\to\mathbb{R}$ be a measurable function such that:

(L0) $\forall z\in\mathcal{F},\ \forall t,t'\in\mathbb{R}:\ t\le t'\Rightarrow W(t,z)\le W(t',z)$.

For every $n\in\mathbb{N}$ and every real r.v. $T$, put
$$C_n(T)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W(T,A_i)\,h_G^{-1}G^{(1)}\big(h_G^{-1}(y-B_i)\big)}{\displaystyle\sum_{i=1}^{n}W(T,A_i)}.$$
Let $(D_n)_{n\in\mathbb{N}}$ be a sequence of real r.v. and let $(v_n)_{n\in\mathbb{N}}$ be a positive decreasing sequence.


1. If $l=\lim v_n\neq 0$ and if, for every increasing sequence $\beta_n\in\,]0,1[$, there exist two sequences of real r.v. $D_n^{-}(\beta_n)$ and $D_n^{+}(\beta_n)$ which verify:

(L1) $\forall n\in\mathbb{N}:\ D_n^{-}\le D_n^{+}$, and $\mathbf{1}_{\{D_n^{-}\le D_n\le D_n^{+}\}}\xrightarrow{a.co.}1$,

(L2') $\dfrac{\displaystyle\sum_{i=1}^{n}W(D_n^{-},A_i)}{\displaystyle\sum_{i=1}^{n}W(D_n^{+},A_i)}-\beta_n=o_{a.co.}(v_n)$,

(L3') $\exists\,c>0$ such that $C_n(D_n^{-})-c=o_{a.co.}(v_n)$ and $C_n(D_n^{+})-c=o_{a.co.}(v_n)$;

2. or if $l=0$ and if, for every increasing sequence $\beta_n\in\,]0,1[$ with limit $1$, (L1), (L2') and (L3') are verified;

then we have
$$C_n(D_n)-c=o_{a.co.}(v_n).$$

Burba et al. [4] use, in their consistency proof of the k-NN kernel estimate for independent data, a Chernoff-type exponential inequality to check condition (L1). In the case of the k-NN conditional mode, we can also use this exponential inequality.

Proposition 1 (Burba et al. [4]). Let $X_1,\dots,X_n$ be independent random variables valued in $\{0,1\}$ with $\mathbb{P}(X_i=1)=p_i$. Set $X=\sum_{i=1}^{n}X_i$ and $\mu=\mathbb{E}[X]$. Then, for all $\delta>0$:

1. $\mathbb{P}\big(X>(1+\delta)\mu\big)<\Big(\dfrac{e^{\delta}}{(1+\delta)^{1+\delta}}\Big)^{\mu}$;

2. $\mathbb{P}\big(X<(1-\delta)\mu\big)<e^{-\mu\delta^{2}/2}$.
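As a side illustration (not part of the paper's argument), the two bounds are easy to check by simulation; the sketch below compares the empirical tail probability of a sum of independent Bernoulli variables with the first bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 200, 0.2
p = rng.uniform(0.1, 0.9, size=n)               # arbitrary success probabilities p_i
mu = p.sum()                                    # mu = E[X]

# empirical tail probability of {X > (1 + delta) mu} over many replications
X = (rng.uniform(size=(100_000, n)) < p).sum(axis=1)
empirical = np.mean(X > (1 + delta) * mu)

chernoff = (np.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
print(empirical, "<=", chernoff)                # the empirical tail stays below the bound
```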

Lemma 3 (Ferraty et al. [18]). Under hypotheses (H1) and (H12) we have
$$\frac{1}{n\,\phi_x(h_K)}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{h_K}\Big)\xrightarrow{a.co.}1.$$

(7)

4. HYPOTHESES AND RESULTS

We need the following hypotheses gathered together for easy reference:

(H1) There exists a nonnegative, strictly increasing, differentiable function $\phi$ such that
$$\phi_x(h)=\mathbb{P}\big(X\in B(x,h)\big)>0.$$

(H2) $\forall (y_1,y_2)\in S\times S,\ \forall (x_1,x_2)\in N_x\times N_x$:
$$|F^{x_1}(y_1)-F^{x_2}(y_2)|\le C_x\big(d(x_1,x_2)^{b_1}+|y_1-y_2|^{b_2}\big).$$

(H3) For some $j\ge 0$, $\forall (y_1,y_2)\in S\times S,\ \forall (x_1,x_2)\in N_x\times N_x$:
$$|f^{x_1(j)}(y_1)-f^{x_2(j)}(y_2)|\le C_x\big(d(x_1,x_2)^{b_1}+|y_1-y_2|^{b_2}\big).$$

(H4) $\forall (y_1,y_2)\in\mathbb{R}^{2}$, $|G(y_1)-G(y_2)|\le C|y_1-y_2|$ and $\int |t|^{b_2}G^{(1)}(t)\,dt<\infty$.

(H5) $K$ is a function with support $(0,1)$ such that $0<C_1<K(t)<C_2<\infty$.

(H6) The kernel $G$ is a positive function supported on $[0,1]$; its derivative $G'$ exists and there exist two constants $C_3$ and $C_4$ with $-\infty<C_3<G'(t)<C_4<0$ for $0\le t\le 1$.

(H7) $\lim_{n\to\infty}h_G=0$ and $\lim_{n\to\infty}n^{\alpha}h_G=\infty$ for some $\alpha>0$.

(H8) $\forall (y_1,y_2)\in\mathbb{R}^{2}$, $\big|G^{(j+1)}(y_1)-G^{(j+1)}(y_2)\big|\le C|y_1-y_2|$; $\exists\,\nu>0$, $\forall j'\le j+1$, $\lim_{y\to\infty}|y|^{1+\nu}\big|G^{(j'+1)}(y)\big|=0$; and $G^{(j+1)}$ is bounded.

(H9) $\exists\,\xi>0$ such that $f^{x}$ is increasing on $(\theta-\xi,\theta)$ and decreasing on $(\theta,\theta+\xi)$.

(H10) $f^{x}$ is $j$-times continuously differentiable with respect to $y$ on $(\theta-\xi,\theta+\xi)$.

(H11) $f^{x(l)}(\theta)=0$ if $1\le l<j$, and $|f^{x(j)}(\theta)|>0$.

(H12) The sequence of positive real numbers $k_n=k$ satisfies $\dfrac{k}{n}\to 0$, $\dfrac{\log n}{k}\to 0$ and $\dfrac{\log n}{k\,h_G^{2j+1}}\to 0$ as $n\to\infty$.


Comments on the hypotheses:

1. (H1) is an assumption commonly used for the explanatory variable $X$ (see Ferraty and Vieu [17] for more details and examples).

2. Hypotheses (H2)–(H3) characterize the structural functional space of our model and are needed to evaluate the bias term in our asymptotic properties.

3. (H5) describes complementary assumptions concerning the kernel function, with a focus on non-continuous kernels.

4. Hypotheses (H6)–(H8) are the same as those used in Ferraty et al. [18].

5. The convergence of the estimator can be obtained under the minimal assumption (H9).

6. (H10) and (H11) are classical hypotheses in functional estimation, in finite- or infinite-dimensional spaces.

7. (H12) describes technical conditions imposed for the brevity of proofs.

5. ALMOST COMPLETE CONVERGENCE AND ALMOST COMPLETE CONVERGENCE RATE

In order to link the existing literature with this work, we first recall the consistency result for the kernel conditional mode estimator given in Ferraty et al. [18].

Theorem 1 (Ferraty et al. [18]). Under the model (4), suppose that hypotheses (H3) and (H8)–(H12) are verified for $j=0$; then, if (H1), (H4)–(H5), (H7)–(H9) and (H11) hold, we have
$$\lim_{n\to\infty}\widehat{\theta}_{mode}(x)=\theta(x)\quad\text{almost completely (a.co.).}$$
Under the model (5) and the same conditions of Theorem 1, we have
$$\widehat{\theta}_{mode}(x)-\theta(x)=O\Big(\big(h_K^{b_1}+h_G^{b_2}\big)^{1/j}\Big)+O\bigg(\Big(\frac{\log n}{n\,h_G\,\phi_x(h_K)}\Big)^{1/(2j)}\bigg)\quad\text{a.co.}$$

Now, we state the almost complete convergence and the almost complete convergence rate of the nonparametric k-NN mode estimator introduced in (2).

Theorem 2. Under the model (4), suppose that hypotheses (H3) and (H8) are verified for $j=0$. Then, if (H1), (H4)–(H5), (H7)–(H9) and (H12) hold, we have
$$\lim_{n\to\infty}\widehat{\theta}_{kNN}(x)=\theta(x)\quad\text{a.co.}\qquad(6)$$
Under the model (5) and the same conditions of Theorem 2, we have
$$\widehat{\theta}_{kNN}(x)-\theta(x)=O\bigg(\Big(\phi_x^{-1}\big(\tfrac{k}{n}\big)^{b_1}+h_G^{b_2}\Big)^{1/j}\bigg)+O\bigg(\Big(\frac{\log n}{h_G\,k}\Big)^{1/(2j)}\bigg)\quad\text{a.co.}\qquad(7)$$

Proof. The conditional density $f^{x}(\cdot)$ is continuous (see (H3) and (H9)). We get
$$\forall\epsilon>0,\ \exists\,\sigma(\epsilon)>0,\ \forall y\in(\theta(x)-\xi,\theta(x)+\xi),\quad |f^{x}(y)-f^{x}(\theta(x))|\le\sigma(\epsilon)\Rightarrow|y-\theta(x)|\le\epsilon.$$
By construction $\widehat{\theta}_{kNN}(x)\in(\theta(x)-\xi,\theta(x)+\xi)$, so
$$\forall\epsilon>0,\ \exists\,\sigma(\epsilon)>0,\quad |f^{x}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|\le\sigma(\epsilon)\Rightarrow|\widehat{\theta}_{kNN}(x)-\theta(x)|\le\epsilon.$$
So, we finally arrive at
$$\exists\,\sigma(\epsilon)>0,\quad \mathbb{P}\big(|\widehat{\theta}_{kNN}(x)-\theta(x)|>\epsilon\big)\le\mathbb{P}\big(|f^{x}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|>\sigma(\epsilon)\big).$$
On the other hand, it follows directly from the definitions of $\theta(x)$ and $\widehat{\theta}_{kNN}(x)$ that
$$|f^{x}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|=|f^{x}(\widehat{\theta}_{kNN}(x))-\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN}(x))+\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|$$
$$\le |f^{x}(\widehat{\theta}_{kNN}(x))-\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN}(x))|+|\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|\le 2\sup_{y\in S}|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)|.\qquad(8)$$
The uniform almost complete convergence of the conditional density estimate over the compact set $[\theta-\xi,\theta+\xi]$ (see Lemma 4 below) leads, together with the previous inequalities, directly to
$$\forall\epsilon>0,\quad \sum_{n=1}^{\infty}\mathbb{P}\big(|\widehat{\theta}_{kNN}(x)-\theta(x)|>\epsilon\big)<\infty.$$
Finally, the claimed consistency result (6) is proved as soon as the following lemma is checked.

Lemma 4. Under the conditions of Theorem 2 we have
$$\lim_{n\to\infty}\sup_{y\in S}|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)|=0\quad\text{a.co.}$$
We have already shown in (8) that
$$|f^{x}(\widehat{\theta}_{kNN})-f^{x}(\theta)|\le 2\sup_{y\in S}|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)|.\qquad(9)$$


Let us now write the following Taylor expansion of the function $f^{x}$:
$$f^{x}(\widehat{\theta}_{kNN})=f^{x}(\theta)+\frac{1}{j!}\,f^{x(j)}(\theta^{*})\,(\widehat{\theta}_{kNN}-\theta)^{j},$$
for some $\theta^{*}$ between $\theta$ and $\widehat{\theta}_{kNN}$. Because of (9), as long as we are able to check that
$$\forall\tau>0,\quad \sum_{n=1}^{\infty}\mathbb{P}\big(|f^{x(j)}(\theta^{*})|<\tau\big)<\infty,\qquad(10)$$
we would have
$$(\widehat{\theta}_{kNN}-\theta)^{j}=O\Big(\sup_{y\in S}|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)|\Big)\quad\text{a.co.}\qquad(11)$$
So it suffices to check (10), and this is done directly by using the second part of (H11) together with (6).

6. REAL DATA APPLICATION

Now, we apply the described method to real chemometric data. We work on spectrometric data used to classify samples according to a physico-chemical property which is not directly accessible (and therefore requires a specific analysis). The data were obtained using a Tecator Infratec Food and Feed Analyzer working in the near-infrared range (wavelengths between 850 and 1050 nanometers). The measurement is performed by transmission through a sample of finely chopped meat, which is then analyzed by a chemical process to determine its fat content. Each spectrum gives the absorbance (−log10 of the transmittance measured by the device) at 100 regularly spaced wavelengths between 850 and 1050 nm. Meat samples were divided into two classes according to whether their fat content is above or below 20% (77 spectra correspond to more than 20% fat and 138 to less than 20%). The problem is then to discriminate the spectra in order to avoid the chemical analysis, which is costly and time consuming. The figure shows absorbance versus wavelength (850–1050 nm) for the 215 selected pieces of meat. Note that the main goal of spectrometric analysis is to allow the discovery of the proportion of some specific chemical content (see Ferraty and Vieu (2006) for further details related to spectrometric data). At this stage, one would like to use the spectrometric curve X to predict Y, the proportion of fat content in the piece of meat. The data are available on the web site "http://lib.stat.cmu.edu/datasets/tecator".


Fig. 1. The 215 spectrometric curves $\{X_i(t),\ t\in[850,1050],\ i=1,\dots,215\}$.

However, as described in the real data application of Ferraty et al. [18], the prediction problem can be studied by using the conditional mode approach. Starting from this idea, our objective is to give a comparative study of the two conditional mode estimators: the kernel estimator defined in (3) and the k-NN estimator defined in (2).

For both methods, we use the quadratic kernel function $K$ defined by
$$K(x)=\frac{3}{2}\,(1-x^{2})\,\mathbf{1}_{[0,1]}(x),$$
and the distribution function $G(\cdot)$ defined by
$$G(x)=\int_{-\infty}^{x}\frac{15}{4}\,t^{2}(1-t^{2})\,\mathbf{1}_{[-1,1]}(t)\,dt.$$
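For reference, these two functions are straightforward to code; the short sketch below (NumPy, hypothetical names) implements $K$ and the derivative $G'=G^{(1)}$ that actually enters the density estimator.

```python
import numpy as np

def K(u):
    """Quadratic kernel K(u) = (3/2)(1 - u^2) on [0, 1]."""
    u = np.asarray(u, dtype=float)
    return 1.5 * (1.0 - u**2) * ((u >= 0) & (u <= 1))

def G1(t):
    """Derivative G'(t) = (15/4) t^2 (1 - t^2) on [-1, 1] of the cdf G."""
    t = np.asarray(t, dtype=float)
    return 3.75 * t**2 * (1.0 - t**2) * (np.abs(t) <= 1)
```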

Note that the shape of these spectrometric curves is very smooth; for this reason we use the semi-metric defined by the $L^2$ distance between the second derivatives of the curves (see Ferraty and Vieu [16] for more motivation of this choice). We proceed by the following algorithm (a code sketch of the semi-metric and of the cross-validation step is given after the error criterion below):

• Step 1. We split our data into two subsets:

– $(X_j,Y_j)_{j=1,\dots,172}$: training sample,
– $(X_i,Y_i)_{i=173,\dots,215}$: test sample.

• Step 2. We compute the mode estimator $\widehat{\theta}(X_j)$, for all $j$, using the training sample.


• Step 3. For each $X_i$ in the test sample, we set $i^{*}=\arg\min_{j=1,\dots,172} d(X_i,X_j)$.

• Step 4. For all $i=173,\dots,215$ we take
$$\widehat{Y}_i=\widehat{\theta}_{mode}(X_{i^{*}}),$$
where $X_{i^{*}}$ is the nearest curve to $X_i$ in the training sample, and
$$\widetilde{Y}_i=\widehat{\theta}_{kNN}(X_{i^{*}}),$$
computed with the optimal number of neighbors $k_{opt}$,

where $k_{opt}$ is defined by $k_{opt}=\arg\min_{k}\mathrm{CV}(k)$, with
$$\mathrm{CV}(k)=\sum_{i=171}^{215}\big(Y_i-\widehat{\theta}^{-i}_{kNN}(X_i)\big)^{2},\qquad
\widehat{\theta}^{-i}_{kNN}(X_i)=\arg\max_{y}\,(\widehat{f}^{X_i}_{kNN})^{-i}(y),$$
and
$$(\widehat{f}^{x}_{kNN})^{-i}(y)=\frac{h_G^{-1}\displaystyle\sum_{j=1,\,j\neq i}^{172}K\big(H_k^{-1}(x)\,d(x,X_j)\big)\,G^{(1)}\big(h_G^{-1}(y-Y_j)\big)}{\displaystyle\sum_{j=1,\,j\neq i}^{172}K\big(H_k^{-1}(x)\,d(x,X_j)\big)}.$$
The error used to evaluate this comparison is the mean square error (MSE)
$$\frac{1}{45}\sum_{i=173}^{215}\big|Y_i-\widehat{T}(X_i)\big|^{2},$$
where $\widehat{T}$ designates the estimator used (kernel or $k$-NN) to estimate the conditional mode.
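The sketch below (hypothetical, assuming the training curves are stored row-wise in a NumPy array `X_train` observed on a common wavelength grid `t`, with fat contents in `Y_train`) illustrates the semi-metric based on second derivatives and a leave-one-out reading of the cross-validation choice of $k$; it reuses the `K`, `G1` and `knn_conditional_mode` helpers sketched earlier.

```python
import numpy as np

def semimetric_d2(a, b, t):
    """L2 distance between the second derivatives of two discretized curves a, b on grid t."""
    step = t[1] - t[0]
    d2a = np.gradient(np.gradient(a, step), step)
    d2b = np.gradient(np.gradient(b, step), step)
    return np.sqrt(np.trapz((d2a - d2b) ** 2, t))

def choose_k(X_train, Y_train, t, k_values, y_grid, h_G):
    """Cross-validated choice of the number of neighbors k (leave-one-out version)."""
    n = len(Y_train)
    D = np.array([[semimetric_d2(X_train[i], X_train[j], t) for j in range(n)]
                  for i in range(n)])
    cv = []
    for k in k_values:
        err = 0.0
        for i in range(n):
            keep = np.arange(n) != i                     # leave curve i out
            theta = knn_conditional_mode(D[i, keep], Y_train[keep],
                                         y_grid, k, h_G, K, G1)
            err += (Y_train[i] - theta) ** 2
        cv.append(err)
    return k_values[int(np.argmin(cv))]
```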

The MSE of the k-NN mode estimator is 1.716115, while the MSE of the classical kernel mode estimator used in Ferraty et al. [18] is 2.97524; this result exhibits the effectiveness of the k-NN method.

7. APPENDIX

Proof of Lemma 1. For $i=1,\dots,n$, we consider the quantities $G^{(1)}_i(y)=G^{(1)}\big(h_G^{-1}(y-B_i)\big)$.

On the event $\{D_n^{-}\le D_n\le D_n^{+}\}$, it is clear from (L0) that
$$W\big(D_n^{-}(\beta),A_i\big)\le W\big(D_n,A_i\big)\le W\big(D_n^{+}(\beta),A_i\big),$$
$$\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)\le \sum_{i=1}^{n}W\big(D_n,A_i\big)\le \sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big),$$
and therefore
$$\frac{1}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}\le \frac{1}{\displaystyle\sum_{i=1}^{n}W\big(D_n,A_i\big)}\le \frac{1}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}.$$

Under hypotheses (H7), (H8) and (H9), we then have
$$\underbrace{\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}}_{C_n^{-}(\beta)}
\ \le\
\underbrace{\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n,A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n,A_i\big)}}_{C_n(D_n)}
\ \le\
\underbrace{\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}}_{C_n^{+}(\beta)}.$$

That is, we put the following real random variables:
$$C_n^{-}(\beta)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}
\quad\text{and}\quad
C_n^{+}(\beta)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}.$$
We consider, for every $\epsilon>0$, the events
$$T_n(\epsilon)=\big\{c-\epsilon\le C_n(D_n)\le c+\epsilon\big\},$$
and, for every $\beta\in\,]0,1[$,
$$S_n^{-}(\epsilon,\beta)=\big\{c-\epsilon\le C_n^{-}(\beta)\le c+\epsilon\big\},\qquad
S_n^{+}(\epsilon,\beta)=\big\{c-\epsilon\le C_n^{+}(\beta)\le c+\epsilon\big\},$$
$$S_n(\beta)=\big\{C_n^{-}(\beta)\le C_n(D_n)\le C_n^{+}(\beta)\big\}.$$

It is obvious that
$$S_n(\beta)\,\cap\,S_n^{-}(\epsilon,\beta)\,\cap\,S_n^{+}(\epsilon,\beta)\ \subset\ T_n(\epsilon).\qquad(12)$$
Let $\epsilon_0=\dfrac{3c}{2}$ and $\beta_\epsilon=1-\dfrac{\epsilon}{3c}$. Denote
$$W_n^{-}(\epsilon)=\Big\{\beta_\epsilon c-\frac{\epsilon}{3}\le C_n^{-}(\beta_\epsilon)\le \beta_\epsilon c+\frac{\epsilon}{3}\Big\},\qquad
W_n^{+}(\epsilon)=\Big\{\frac{c}{\beta_\epsilon}-\frac{\epsilon}{3}\le C_n^{+}(\beta_\epsilon)\le \frac{c}{\beta_\epsilon}+\frac{\epsilon}{3}\Big\},$$
$$W_n(\epsilon)=\big\{D_n^{-}(\beta_\epsilon)\le D_n\le D_n^{+}(\beta_\epsilon)\big\}.$$
We see that, for $\epsilon\in\,]0,\epsilon_0[$,
$$\beta_\epsilon c-\frac{\epsilon}{3}>c-\epsilon,\qquad \beta_\epsilon c+\frac{\epsilon}{3}\le c+\epsilon,$$
and
$$\frac{c}{\beta_\epsilon}-\frac{\epsilon}{3}>c-\epsilon,\qquad \frac{c}{\beta_\epsilon}+\frac{\epsilon}{3}\le c+\epsilon.$$
So, for $\epsilon\in\,]0,\epsilon_0[$,
$$W_n^{-}(\epsilon)\subset S_n^{-}(\epsilon,\beta_\epsilon)\quad\text{and}\quad W_n^{+}(\epsilon)\subset S_n^{+}(\epsilon,\beta_\epsilon).\qquad(13)$$
(L0) implies that
$$\forall z\in\mathcal{F},\ \forall t,t'\in\mathbb{R}:\ t\le t'\Rightarrow W(t,z)\le W(t',z),$$
and in particular
$$D_n\in W_n(\epsilon)\ \Rightarrow\ D_n^{-}(\beta_\epsilon)\le D_n\le D_n^{+}(\beta_\epsilon)\ \Rightarrow\ C_n^{-}(\beta_\epsilon)\le C_n(D_n)\le C_n^{+}(\beta_\epsilon)\ \Rightarrow\ C_n(D_n)\in S_n(\beta_\epsilon).$$
So
$$W_n(\epsilon)\ \subset\ S_n(\beta_\epsilon).\qquad(14)$$
The results (12), (13) and (14) imply
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad W_n(\epsilon)\,\cap\,W_n^{+}(\epsilon)\,\cap\,W_n^{-}(\epsilon)\ \subset\ T_n(\epsilon).$$
In other words,
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{T_n(\epsilon)^{c}}\ \le\ \mathbf{1}_{W_n^{-}(\epsilon)^{c}}+\mathbf{1}_{W_n^{+}(\epsilon)^{c}}+\mathbf{1}_{W_n(\epsilon)^{c}}.\qquad(15)$$
On the other hand, we can express the real random variables $C_n^{-}$ and $C_n^{+}$ in the following way:

$$C_n^{-}(\beta)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}
= C_n(D_n^{-})\cdot\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)},$$
and
$$C_n^{+}(\beta)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}
= C_n(D_n^{+})\cdot\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}.$$
So, under (L2) and (L3),
$$C_n^{-}(\beta)\xrightarrow{a.co.}\beta c\quad\text{and}\quad C_n^{+}(\beta)\xrightarrow{a.co.}c/\beta.\qquad(16)$$

Because
$$W_n^{-}(\epsilon)^{c}=\Big\{\big|C_n^{-}(\beta_\epsilon)-\beta_\epsilon c\big|>\frac{\epsilon}{3}\Big\},$$
the first part of (16) implies
$$C_n^{-}(\beta_\epsilon)\xrightarrow{a.co.}\beta_\epsilon c\ \Rightarrow\ \forall\epsilon'>0:\ \sum_{n}\mathbb{P}\big(|C_n^{-}(\beta_\epsilon)-\beta_\epsilon c|>\epsilon'\big)<\infty
\ \Rightarrow\ \forall\epsilon\in\,]0,\epsilon_0[:\ \sum_{n}\mathbb{P}\big(\mathbf{1}_{W_n^{-}(\epsilon)^{c}}>\epsilon\big)<\infty.$$
So,
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{W_n^{-}(\epsilon)^{c}}\xrightarrow{a.co.}0.\qquad(17)$$
In the same way,
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{W_n^{+}(\epsilon)^{c}}\xrightarrow{a.co.}0.\qquad(18)$$
Under (L1), $\mathbf{1}_{\{D_n^{-}\le D_n\le D_n^{+}\}}\xrightarrow{a.co.}1$, which implies that
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{W_n(\epsilon)}\xrightarrow{a.co.}1,$$
in other words,
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{W_n(\epsilon)^{c}}\xrightarrow{a.co.}0.\qquad(19)$$
Combining the results (15), (17), (18) and (19), we get
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{T_n(\epsilon)^{c}}\xrightarrow{a.co.}0.$$
We can rewrite this as follows:
$$\mathbf{1}_{T_n(\epsilon)^{c}}\xrightarrow{a.co.}0\ \Rightarrow\ \forall\epsilon\in\,]0,\epsilon_0[:\ \sum_{n}\mathbb{P}\big(|\mathbf{1}_{T_n(\epsilon)^{c}}|>\epsilon\big)<\infty
\ \Rightarrow\ \forall\epsilon\in\,]0,\epsilon_0[:\ \sum_{n}\mathbb{P}\big(|C_n(D_n)-c|>\epsilon\big)<\infty.$$
And we finally obtain
$$C_n(D_n)\xrightarrow{a.co.}c.$$

Proof of Theorem 2 (Almost Complete Convergence). Let $G^{(1)}_i(y)=G^{(1)}\big(h_G^{-1}(y-Y_i)\big)$.

For the proof of this theorem we apply Lemma 1 with:

• $\mathcal{F}=E$, $\mathcal{A}=\mathcal{B}_E$;

• $\forall i=1,\dots,n:\ (A_i,B_i)=(X_i,Y_i)$;

• $D_n=H_{n,k}$ and $C_n(D_n)=\sup_{y\in S}\widehat{f}^{x}_{kNN}(y)$;

• $\forall (t,z)\in\mathbb{R}^{+}\times E:\ W(t,z)=K\Big(\dfrac{d(x,z)}{t}\Big)$.

For a given $\beta\in\,]0,1[$, we choose $D_n^{-}$ and $D_n^{+}$ such that
$$\phi_x(D_n^{-})=\sqrt{\beta}\,\frac{k}{n}\quad\text{and}\quad \phi_x(D_n^{+})=\frac{1}{\sqrt{\beta}}\,\frac{k}{n}.$$
For the proof, we just have to show that the kernel $K$ verifies (L0) and that the sequences $(D_n^{-})_{n\in\mathbb{N}}$, $(D_n^{+})_{n\in\mathbb{N}}$ and $(D_n)_{n\in\mathbb{N}}$ verify (L1), (L2) and (L3).

Verification of (L0):
$$\forall t,t'\in\mathbb{R}^{+}:\ t\le t'\ \Leftrightarrow\ \exists v\in[1,+\infty[\,:\ t'=vt\ \Leftrightarrow\ \exists u\in\,]0,1]:\ \frac{1}{t'}=u\,\frac{1}{t}
\ \Leftrightarrow\ \exists u\in\,]0,1]:\ K\Big(\frac{d(x,z)}{t'}\Big)=K\Big(\frac{u\,d(x,z)}{t}\Big),\quad\forall z\in E.$$
Since $K$ is a bounded kernel verifying $\forall z\in E,\ \forall u\in[0,1]:\ K(uz)\ge K(z)$, we have
$$K\Big(\frac{d(x,z)}{t}\Big)\le K\Big(\frac{u\,d(x,z)}{t}\Big),$$
whence
$$\forall t,t'\in\mathbb{R}^{+},\ \forall z\in E:\ t\le t'\ \Rightarrow\ K\Big(\frac{d(x,z)}{t}\Big)\le K\Big(\frac{d(x,z)}{t'}\Big).$$
So, (L0) is verified.

Verification of (L1): the proof of (L1) is the same as in Burba et al. [4].

Verification of (L2): according to (H12),
$$\lim_{n\to+\infty}\phi_x(D_n^{-})=\lim_{n\to+\infty}\sqrt{\beta}\,\frac{k}{n}=0,\qquad
\lim_{n\to+\infty}\phi_x(D_n^{+})=\lim_{n\to+\infty}\frac{1}{\sqrt{\beta}}\,\frac{k}{n}=0.$$
From (H1), we deduce that
$$\lim_{n\to+\infty}D_n^{-}=0,\qquad \lim_{n\to+\infty}D_n^{+}=0,$$
and by (H12) we have
$$\lim_{n\to+\infty}\frac{\log n}{n\,\phi_x(D_n^{-})}=\lim_{n\to+\infty}\frac{\log n}{n\,\sqrt{\beta}\,\frac{k}{n}}=0,\qquad
\lim_{n\to+\infty}\frac{\log n}{n\,\phi_x(D_n^{+})}=\lim_{n\to+\infty}\frac{\sqrt{\beta}\,\log n}{n\,\frac{k}{n}}=0.$$

Because $(D_n^{-})$ and $(D_n^{+})$ verify the conditions of Lemma 3, we have
$$Q(D_n^{-})=\frac{1}{n\,\phi_x(D_n^{-})}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{D_n^{-}}\Big)\xrightarrow{a.co.}1,\qquad
Q(D_n^{+})=\frac{1}{n\,\phi_x(D_n^{+})}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{D_n^{+}}\Big)\xrightarrow{a.co.}1.$$
So,
$$\frac{Q(D_n^{-})}{Q(D_n^{+})}=\frac{\dfrac{1}{n\,\phi_x(D_n^{-})}\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}{\dfrac{1}{n\,\phi_x(D_n^{+})}\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}\xrightarrow{a.co.}1.$$
Since $\phi_x(D_n^{-})/\phi_x(D_n^{+})=\beta$, we can deduce that
$$\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}\xrightarrow{a.co.}\beta.$$
So, (L2) is checked.

Verification of (L3):

Because $D_n^{-}$ and $D_n^{+}$ verify the conditions of Theorem 1, we obtain
$$\sup_{y\in S}\bigg|\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}-f^{x}(y)\bigg|\xrightarrow{a.co.}0,$$
$$\sup_{y\in S}\bigg|\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}-f^{x}(y)\bigg|\xrightarrow{a.co.}0,$$
so that $C_n(D_n^{-})\xrightarrow{a.co.}c$ and $C_n(D_n^{+})\xrightarrow{a.co.}c$ with $c=\sup_{y\in S}f^{x}(y)$. So, (L3) is verified, which finishes the proof of the first part of Theorem 2.

Proof of Theorem 2 (Almost Complete Convergence Rate). We proceed in the same way as for the first part of Theorem 2 (almost complete convergence), with the following notations:

• $\mathcal{F}=E$, $\mathcal{A}=\mathcal{B}_E$;

• $\forall i=1,\dots,n:\ (A_i,B_i)=(X_i,Y_i)$;

• $D_n=H_{n,k}$ and $C_n(D_n)=\sup_{y\in S}\widehat{f}^{x}_{kNN}(y)$;

• $\forall (t,z)\in\mathbb{R}^{+}\times E:\ W(t,z)=K\Big(\dfrac{d(x,z)}{t}\Big)$.

For a given sequence $\beta_n\in\,]0,1[$, we choose $D_n^{-}$ and $D_n^{+}$ such that
$$\phi_x(D_n^{-})=\sqrt{\beta_n}\,\frac{k}{n}\quad\text{and}\quad \phi_x(D_n^{+})=\frac{1}{\sqrt{\beta_n}}\,\frac{k}{n}.$$
For the proof, it is enough to show that the kernel $K$ verifies (L0) and that the sequences $(D_n^{-})_{n\in\mathbb{N}}$, $(D_n^{+})_{n\in\mathbb{N}}$ and $(D_n)_{n\in\mathbb{N}}$ verify (L1), (L2') and (L3').

Verification of (L0) and (L1): these are checked in the same way as in the first part of the proof (application of Lemma 1).

Verification of (L2'): it is shown in Ferraty et al. [18] that, writing
$$\widehat{f}_N^{x}(y)=\frac{1}{n\,h_G\,\mathbb{E}[K_1]}\sum_{i=1}^{n}K\big(h_K^{-1}d(x,X_i)\big)\,G^{(1)}_i(y)
\quad\text{and}\quad
\widehat{F}_D^{x}=\frac{1}{n\,\mathbb{E}[K_1]}\sum_{i=1}^{n}K\big(h_K^{-1}d(x,X_i)\big),$$
we have the decomposition
$$\widehat{f}^{x}(y)-f^{x}(y)=\frac{1}{\widehat{F}_D^{x}}\Big\{\big(\widehat{f}_N^{x}(y)-\mathbb{E}\widehat{f}_N^{x}(y)\big)-\big(f^{x}(y)-\mathbb{E}\widehat{f}_N^{x}(y)\big)\Big\}
+\frac{f^{x}(y)}{\widehat{F}_D^{x}}\Big\{\mathbb{E}\widehat{F}_D^{x}-\widehat{F}_D^{x}\Big\}.$$
To show that

$$\sup_{y\in S}\big|\widehat{f}^{x}(y)-f^{x}(y)\big|=O\big(h_K^{b_1}\big)+O\big(h_G^{b_2}\big)+O\bigg(\sqrt{\frac{\log n}{n\,h_G\,\phi_x(h_K)}}\bigg)\quad\text{a.co.,}$$
it is enough to show that:

(A) $\dfrac{1}{\widehat{F}_D^{x}}\displaystyle\sup_{y\in S}\big|\mathbb{E}\widehat{f}_N^{x(j)}(y)-f^{x(j)}(y)\big|=O\big(h_K^{b_1}\big)+O\big(h_G^{b_2}\big)$, a.co.;

(B) $\dfrac{1}{\widehat{F}_D^{x}}\displaystyle\sup_{y\in S}\big|\widehat{f}_N^{x(j)}(y)-\mathbb{E}\widehat{f}_N^{x(j)}(y)\big|=O\bigg(\sqrt{\dfrac{\log n}{n\,h_G^{2j+1}\,\phi_x(h_K)}}\bigg)$, a.co.;

(C) $\widehat{F}_D^{x}-\mathbb{E}\widehat{F}_D^{x}=O\bigg(\sqrt{\dfrac{\log n}{n\,\phi_x(h_K)}}\bigg)$, a.co.;

(D) $\exists\,\delta>0$ such that $\displaystyle\sum_{n}\mathbb{P}\big[\widehat{F}_D^{x}<\delta\big]<+\infty$.

We put
$$h^{-}=D_n^{-}=\phi_x^{-1}\Big(\sqrt{\beta_n}\,\frac{k}{n}\Big)\quad\text{and}\quad h^{+}=D_n^{+}=\phi_x^{-1}\Big(\frac{1}{\sqrt{\beta_n}}\,\frac{k}{n}\Big).$$
Then $\lim_{n\to+\infty}h^{-}=\lim_{n\to+\infty}h^{+}=0$ and
$$\lim_{n\to+\infty}\frac{\log n}{n\,h_G\,\phi_x(D_n^{-})}=\lim_{n\to+\infty}\frac{\log n}{n\,h_G\,\phi_x(D_n^{+})}=\lim_{n\to+\infty}\frac{\log n}{h_G\,k}=0$$
(because $\beta_n$ is bounded by $1$ and by (H12)).

We apply Lemma 3 and obtain the two following results:

1. $\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{h^{-}}\Big)-\phi_x(h^{-})=o_{a.co.}\bigg(\sqrt{\dfrac{\log n}{n\,\phi_x(h^{-})}}\bigg)$, which implies
$$\frac{1}{n}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{D_n^{-}}\Big)-\sqrt{\beta_n}\,\frac{k}{n}=o_{a.co.}\bigg(\sqrt{\frac{\log n}{k}}\bigg).$$

2. $\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{h^{+}}\Big)-\phi_x(h^{+})=o_{a.co.}\bigg(\sqrt{\dfrac{\log n}{n\,\phi_x(h^{+})}}\bigg)$, so
$$\frac{1}{n}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{D_n^{+}}\Big)-\frac{1}{\sqrt{\beta_n}}\,\frac{k}{n}=o_{a.co.}\bigg(\sqrt{\frac{\log n}{k}}\bigg).$$

We then have
$$\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}-\beta_n=o_{a.co.}\bigg(\sqrt{\frac{\log n}{k}}\bigg),$$
and, according to (H12),
$$\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}-\beta_n=o_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg).$$
So, (L2') is also verified.

Verification of (L3'):

Because $h^{+}$ and $h^{-}$ verify the required conditions, we can apply Theorem 1, which gives
$$\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}-\sup_{y\in S}f^{x}(y)
=O\big((D_n^{-})^{b_1}\big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{n\,h_G\,\phi_x(D_n^{-})}}\bigg),$$
$$\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}-\sup_{y\in S}f^{x}(y)
=O\big((D_n^{+})^{b_1}\big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{n\,h_G\,\phi_x(D_n^{+})}}\bigg).$$
In other words,
$$C_n(D_n^{-})-\sup_{y\in S}f^{x}(y)=O\Big(\phi_x^{-1}\big(\tfrac{k}{n}\big)^{b_1}\Big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg),$$
$$C_n(D_n^{+})-\sup_{y\in S}f^{x}(y)=O\Big(\phi_x^{-1}\big(\tfrac{k}{n}\big)^{b_1}\Big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg).$$
So there exists $c=\sup_{y\in S}f^{x}(y)>0$ such that
$$\big|C_n(D_n^{-})-c\big|=O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg)
\quad\text{and}\quad
\big|C_n(D_n^{+})-c\big|=O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg).$$
So, (L3') is verified.


We finally have
$$\sup_{y\in S}\big|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)\big|=O\Big(\phi_x^{-1}\big(\tfrac{k}{n}\big)^{b_1}\Big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg).$$

8. CONCLUSION AND PERSPECTIVES

In this work we studied the nonparametric k-NN estimation of the conditional mode for functional data; the almost complete pointwise convergence of this estimator, with rates, was established.

The effectiveness of the k-NN method shows up in practical examples: since the smoothing parameter k takes values in a discrete set, it also provides an easier implementation.

This work can be generalized to dependent data (see Azzahrioui [14]).

In nonparametric statistics, uniform convergence is considered a preliminary step towards sharper results; in this context, it would be very interesting to extend the results of Ferraty et al. [20].

Obtaining explicit expressions for the dominant terms of the centered moments can be envisaged once an asymptotic normality result is obtained (see Delsol [11]). This idea can be investigated in the future.

Acknowledgments. The authors thank the referee and the editors-in-chief for helpful comments. Supported by "L'Agence Thématique de Recherche en Sciences et Technologie ATRST (Ex ANDRU)" in P.N.R., No. 46/15.

REFERENCES

[1] M. Attouch and T. Benchikh, Asymptotic distribution of robust k-nearest neighbour estimator for functional nonparametric models. Math. Vesnik 64 (2012), 4, 275–285.

[2] D. Bosq, Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statist. 149, Springer-Verlag, New York, 2000.

[3] F. Burba, F. Ferraty and P. Vieu, Convergence de l'estimateur noyau des k plus proches voisins en régression fonctionnelle non-paramétrique. C. R. Math. Acad. Sci. Paris 346 (2008), 5–6, 339–342.

[4] F. Burba, F. Ferraty and P. Vieu, k-nearest neighbour method in functional nonparametric regression. J. Nonparametr. Stat. 21 (2009), 4, 453–469.

[5] G. Collomb, Estimation de la régression par la méthode des k points les plus proches avec noyau: quelques propriétés de convergence ponctuelle. In: Nonparametric Asymptotic Statistics, Lecture Notes in Math. 821, Springer-Verlag, 1980, pp. 159–175.

[6] G. Collomb, W. Härdle and S. Hassani, A note on prediction via conditional mode estimation. J. Statist. Plann. Inference 15 (1987), 227–236.

[7] S. Dabo-Niang, Estimation de la densité dans un espace de dimension infinie: application aux diffusions. C. R. Acad. Sci. Paris 334 (2002), 3, 213–216.
