THE k-NEAREST NEIGHBORS ESTIMATION OF THE CONDITIONAL MODE FOR FUNCTIONAL DATA


MOHAMMED KADI ATTOUCH and WAHIBA BOUABÇA

Communicated by Dan Crişan

The aim of this article is to study the nonparametric estimation of the conditional mode by the k-Nearest Neighbor method for functional explanatory variables.

We give the rate of the almost complete convergence of the k-NN kernel estimator of the conditional mode. A real data application is presented to exhibit the effectiveness of this estimation method with respect to the classical kernel estimation.

AMS 2010 Subject Classification: 62G05, 62G08, 62G20, 62G35.

Key words: functional data, the conditional mode, nonparametric regression, k-NN estimator, rate of convergence, random bandwidth.

1. INTRODUCTION

The conditional mode, because of its importance in the nonparametric forecasting field, has motivated a number of researchers to investigate mode estimators; it constitutes an alternative method to conditional regression estimation (see Ould Saïd [24] for more discussion and examples).

In finite-dimensional spaces, there exists an extensive bibliography for the independent and the dependent data cases. In the independent case, strong consistency and asymptotic normality of the kernel estimator of the conditional mode are given in Samanta and Thavaneswaran [26]. In the dependent case, the strong consistency of the conditional mode estimator was obtained by Collomb et al. [6] and Ould Saïd [24] under strong mixing conditions (see Azzahrioui [14] for an extensive literature review).

In infinite-dimensional spaces, the almost complete convergence of the conditional mode estimator is obtained by Ferraty et al. [18]; Ezzahrioui and Ould Saïd [15] establish the asymptotic normality of nonparametric estimators of the conditional mode for both independent and dependent functional data. The consistency in Lp-norm of the conditional mode estimator is given in Dabo-Niang and Laksaci [10].

Estimating the conditional mode is directly linked to density estimation, and for the latter the choice of the bandwidth is extremely important for the performance of the estimate. The bandwidth must not be too large, so as to prevent over-smoothing, i.e. substantial bias, and must not be too small either, so as to avoid under-smoothing, i.e. excessive variance that masks the underlying structure. In nonparametric curve estimation in particular, the smoothing parameter is critical for performance.

Based on this fact, this work deals with nonparametric estimation by the k nearest neighbors (k-NN) method. More precisely, we consider a kernel estimator of the conditional mode built from a local window taking into account the exact k nearest neighbors, with real response variable Y and functional curves X.

The k-Nearest Neighbor or k-NN estimator is a weighted average of response variables in the neighborhood of x. The bibliography of the k-NN estimation method dates back to Royall [25] and Stone [27] and has received continuous developments: Mack [22] derived the rates of convergence for the bias and variance as well as asymptotic normality in the multivariate case; Collomb [5] studied different types of convergence (in probability, almost sure (a.s.) and almost complete (a.co.)) of the estimator of the regression function; Devroye [12] obtained the strong consistency and the uniform convergence. For functional data, the k-NN kernel estimate was first introduced in the monograph of Ferraty and Vieu [18]; Burba et al. [4] obtained the rate of almost complete convergence of the regression function using the k-NN method for independent data, and Attouch and Benchikh [1] established the asymptotic normality of a robust nonparametric regression function.

The paper is organized as follows. The following section presents the k-NN estimator of the conditional mode. Section 3 gathers the technical tools, and Section 4 states the hypotheses. Section 5 is devoted to the almost complete convergence and the almost complete convergence rate, and in Section 6 we illustrate the effectiveness of the k-NN estimation method on real data. All proofs are given in the Appendix.

2. MODELS AND ESTIMATORS

Let $(X_1,Y_1),\dots,(X_n,Y_n)$ be $n$ independent pairs, identically distributed as $(X,Y)$, a random pair valued in $\mathcal{F}\times\mathbb{R}$, where $\mathcal{F}$ is a semi-metric space with semi-metric $d(\cdot,\cdot)$. We do not assume the existence of a density for the functional random variable (f.r.v.) $X$.

Throughout the paper, when no confusion is possible, we denote by $c$, $c'$ or $C_x$ a generic constant in $\mathbb{R}^{+}$, and any real function carrying a bracketed integer as an exponent denotes its derivative of the corresponding order. From a sample of independent pairs $(X_i,Y_i)$, each having the same distribution as $(X,Y)$, our aim is to build nonparametric estimates of several functions related to the conditional probability distribution of $Y$ given $X$. For $x\in\mathcal{F}$ we denote the conditional cdf of $Y$ given $X=x$ by
$$\forall y\in\mathbb{R},\qquad F^{x}(y)=\mathbb{P}(Y\le y\mid X=x).$$
If this distribution is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}$, we denote by $f^{x}$ (resp. $f^{x(j)}$) the conditional density (resp. its $j$th order derivative) of $Y$ given $X=x$. We then give almost complete convergence results (with rates) for nonparametric estimates of $f^{x(j)}$; the convergence of the conditional density estimate follows immediately from the general results concerning $f^{x(j)}$.

In the following, $x$ is a fixed point in $\mathcal{F}$, $N_x$ denotes a fixed neighborhood of $x$, $S$ is a fixed compact subset of $\mathbb{R}$, and we use the notation $B(x,h)=\{x'\in\mathcal{F} : d(x,x')<h\}$.

The $k$ nearest neighbors ($k$-NN) estimator of $F^{x}$ at $y$ is defined by
$$\widehat{F}^{x}_{kNN}(y)=\frac{\displaystyle\sum_{i=1}^{n}K\big(H_{n,k}^{-1}d(x,X_i)\big)\,G\big(h_G^{-1}(y-Y_i)\big)}{\displaystyle\sum_{i=1}^{n}K\big(H_{n,k}^{-1}d(x,X_i)\big)}.$$

Now, we focus on the estimation of the $j$th order derivative of the conditional density $f^{x}$ by the $k$ nearest neighbors ($k$-NN) method, where $\widehat{f}^{x(j)}_{kNN}(y)$ is defined by
$$\widehat{f}^{x(j)}_{kNN}(y)=\frac{h_G^{-j-1}\displaystyle\sum_{i=1}^{n}K\big(H_{n,k}^{-1}d(x,X_i)\big)\,G^{(j+1)}\big(h_G^{-1}(y-Y_i)\big)}{\displaystyle\sum_{i=1}^{n}K\big(H_{n,k}^{-1}d(x,X_i)\big)},\qquad(1)$$

where $K$ is a kernel function and $H_{n,k}(\cdot)$ is defined as follows:
$$H_{n,k}(x)=\min\Big\{h\in\mathbb{R}^{+} :\ \sum_{i=1}^{n}\mathbf{1}_{B(x,h)}(X_i)=k\Big\}.$$
$H_{n,k}(x)$ is a positive random variable (r.v.) which depends on $(X_1,\dots,X_n)$, $G$ is a cdf, and $h_G=h_{G,n}$ is a sequence of positive real numbers.
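As an illustration only, the following minimal Python sketch (NumPy, hypothetical helper names) computes the k-NN bandwidth $H_{n,k}(x)$ as the distance from $x$ to its $k$-th nearest curve (one common convention) and evaluates the estimator (1) for $j=0$ on a grid of $y$ values; the functions `K` and `G1` stand for any kernel $K$ and derivative $G'$ satisfying the hypotheses stated later.

```python
import numpy as np

def knn_bandwidth(dists, k):
    """H_{n,k}(x): distance from x to its k-th nearest curve.

    dists : array of semi-metric distances d(x, X_i), i = 1..n.
    """
    return np.sort(dists)[k - 1]

def knn_conditional_density(dists, Y, y_grid, k, h_G, K, G1):
    """k-NN estimate of the conditional density f^x(y) on y_grid (case j = 0 of (1)).

    K  : kernel with support (0, 1)
    G1 : derivative G' of the cdf kernel G
    """
    H = knn_bandwidth(dists, k)
    w = K(dists / H)                     # weights K(H_{n,k}^{-1} d(x, X_i))
    num = np.array([np.sum(w * G1((y - Y) / h_G)) for y in y_grid]) / h_G
    return num / np.sum(w)
```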

We propose the estimator $\widehat{\theta}_{kNN}$ of the conditional mode $\theta$ (assumed to be uniquely defined in the compact set $S$), defined below in (2). Our estimator is based on the functional conditional density estimate defined in (1):
$$\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN})=\sup_{y\in S}\widehat{f}^{x}_{kNN}(y).\qquad(2)$$


Note that the estimator $\widehat{\theta}_{kNN}$ is not necessarily unique; if this is the case, all the remaining results of our paper concern any value $\widehat{\theta}_{kNN}$ satisfying (2).
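In practice the supremum in (2) is taken over a finite grid of candidate values; a minimal sketch, reusing the hypothetical `knn_conditional_density` helper from the previous sketch:

```python
def knn_conditional_mode(dists, Y, y_grid, k, h_G, K, G1):
    """Estimate the conditional mode by maximizing the k-NN density over y_grid."""
    dens = knn_conditional_density(dists, Y, y_grid, k, h_G, K, G1)
    return y_grid[np.argmax(dens)]
```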

To prove the almost complete convergence and the almost complete convergence rate of the $k$-NN conditional mode estimator, we need some results on the kernel density estimator of Ferraty et al. [19]. This density estimator is
$$\widehat{f}^{x(j)}(y)=\frac{h_G^{-j-1}\displaystyle\sum_{i=1}^{n}K\big(h_K^{-1}d(x,X_i)\big)\,G^{(j+1)}\big(h_G^{-1}(y-Y_i)\big)}{\displaystyle\sum_{i=1}^{n}K\big(h_K^{-1}d(x,X_i)\big)},$$
and the corresponding conditional mode estimator is
$$\widehat{f}^{x}(\widehat{\theta}_{mode})=\sup_{y\in S}\widehat{f}^{x}(y).\qquad(3)$$

In this case the bandwidth $h:=h_n$ is non-random (a sequence of positive real numbers which goes to zero as $n$ goes to infinity). The random nature of the $k$-NN bandwidth is both its main quality and its major disadvantage. Indeed, the fact that $H_{n,k}(x)$ is an r.v. creates technical difficulties in the proofs, because we cannot use the same tools as in the standard kernel method. But the randomness of $H_{n,k}(x)$ allows us to define a neighborhood adapted to $x$ and to respect the local structure of the data.

We will consider two kinds of nonparametric models. The first one, called the "continuity-type" model, is defined as
$$f\in C^{0}_{E}=\Big\{f:\mathcal{F}\to\mathbb{R} :\ \lim_{d(x,x')\to 0} f(x')=f(x)\Big\},\qquad(4)$$
and yields pointwise consistency results. The second one, called the "Lipschitz-type" model, assumes the existence of an $\alpha>0$ such that
$$f\in \mathrm{Lip}_{E,\alpha}=\big\{f:\mathcal{F}\to\mathbb{R} :\ \exists\,C>0,\ \forall x'\in E,\ |f(x')-f(x)|< C\,d(x,x')^{\alpha}\big\},\qquad(5)$$
and allows us to obtain rates of convergence.

3. TECHNICAL TOOLS

The first difficulty comes from the fact that the window $H_{n,k}(x)$ is random: the numerator and the denominator of $\widehat{f}^{x}_{kNN}(y)$ are therefore not sums of independent variables. To solve this problem, the idea is to bracket $H_{n,k}(x)$ between two suitably chosen non-random windows. More generally, these technical tools can be useful whenever one has to deal with random bandwidths.


Lemma 1 (Burba et al. [4]). Let $(A_i,B_i)_{i=1,\dots,n}$ be $n$ random pairs (independent but not necessarily identically distributed) valued in $\mathcal{F}\times\mathbb{R}^{+}$, where $(\mathcal{F},\mathcal{A})$ is a general measurable space. Let $W:\mathbb{R}\times\mathcal{F}\to\mathbb{R}$ be a measurable function such that:

(L0) $\forall z\in\mathcal{F},\ \forall t,t'\in\mathbb{R}:\ t\le t'\Rightarrow W(t,z)\le W(t',z)$.

For every $n\in\mathbb{N}$ and every real r.v. $T$, put
$$C_n(T)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W(T,A_i)\,h_G^{-1}G^{(1)}\big(h_G^{-1}(y-B_i)\big)}{\displaystyle\sum_{i=1}^{n}W(T,A_i)}.$$

Let $(D_n)_{n\in\mathbb{N}}$ be a sequence of real random variables. If, for any $\beta\in\,]0,1[$, there exist two sequences of real r.v. $D_n^{-}(\beta)$ and $D_n^{+}(\beta)$ which verify:

(L1) $\forall n\in\mathbb{N}:\ D_n^{-}\le D_n^{+}$, and $\mathbf{1}_{\{D_n^{-}\le D_n\le D_n^{+}\}}\xrightarrow{a.co.}1$;

(L2) $\dfrac{\displaystyle\sum_{i=1}^{n}W(D_n^{-},A_i)}{\displaystyle\sum_{i=1}^{n}W(D_n^{+},A_i)}\xrightarrow{a.co.}\beta$;

(L3) $\exists\,c>0$ such that $C_n(D_n^{-})\xrightarrow{a.co.}c$ and $C_n(D_n^{+})\xrightarrow{a.co.}c$;

then we have
$$C_n(D_n)\xrightarrow{a.co.}c.$$

Lemma 2 (Burba et al. [4]). Let $(A_i,B_i)_{i=1,\dots,n}$ be $n$ random pairs (independent but not necessarily identically distributed) valued in $\mathcal{F}\times\mathbb{R}^{+}$, where $(\mathcal{F},\mathcal{A})$ is a general measurable space. Let $W:\mathbb{R}\times\mathcal{F}\to\mathbb{R}$ be a measurable function such that:

(L0) $\forall z\in\mathcal{F},\ \forall t,t'\in\mathbb{R}:\ t\le t'\Rightarrow W(t,z)\le W(t',z)$.

For every $n\in\mathbb{N}$ and every real r.v. $T$, put
$$C_n(T)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W(T,A_i)\,h_G^{-1}G^{(1)}\big(h_G^{-1}(y-B_i)\big)}{\displaystyle\sum_{i=1}^{n}W(T,A_i)}.$$
Let $(D_n)_{n\in\mathbb{N}}$ be a sequence of real r.v. and let $(v_n)_{n\in\mathbb{N}}$ be a positive decreasing sequence.


1. If $l=\lim v_n\neq 0$ and if, for every increasing sequence $\beta_n\in\,]0,1[$, there exist two sequences of real r.v. $D_n^{-}(\beta_n)$ and $D_n^{+}(\beta_n)$ which verify:

(L1) $\forall n\in\mathbb{N}:\ D_n^{-}\le D_n^{+}$, and $\mathbf{1}_{\{D_n^{-}\le D_n\le D_n^{+}\}}\xrightarrow{a.co.}1$,

(L2') $\dfrac{\displaystyle\sum_{i=1}^{n}W(D_n^{-},A_i)}{\displaystyle\sum_{i=1}^{n}W(D_n^{+},A_i)}-\beta_n=o_{a.co.}(v_n)$,

(L3') $\exists\,c>0$ such that $C_n(D_n^{-})-c=o_{a.co.}(v_n)$ and $C_n(D_n^{+})-c=o_{a.co.}(v_n)$;

2. or if $l=0$ and if, for every increasing sequence $\beta_n\in\,]0,1[$ with limit $1$, (L1), (L2') and (L3') are verified;

then we have
$$C_n(D_n)-c=o_{a.co.}(v_n).$$

Burba et al. [4] use, in their consistency proof of the k-NN kernel estimate for independent data, a Chernoff-type exponential inequality to check condition (L1). In the case of the k-NN conditional mode, we can also use this exponential inequality.

Proposition 1 (Burba et al. [4]). Let $X_1,\dots,X_n$ be independent random variables valued in $\{0,1\}$ with $\mathbb{P}(X_i=1)=p_i$. Set $X=\sum_{i=1}^{n}X_i$ and $\mu=\mathbb{E}[X]$. Then, for all $\delta>0$:

1. $\mathbb{P}\big(X>(1+\delta)\mu\big)<\Big(\dfrac{e^{\delta}}{(1+\delta)^{1+\delta}}\Big)^{\mu}$;

2. $\mathbb{P}\big(X<(1-\delta)\mu\big)<e^{-\mu\delta^{2}/2}$.
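As a side illustration (not part of the paper's argument), the two bounds are easy to check by simulation; the sketch below compares the empirical tail probability of a sum of independent Bernoulli variables with the first bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 200, 0.2
p = rng.uniform(0.1, 0.9, size=n)               # arbitrary success probabilities p_i
mu = p.sum()                                    # mu = E[X]

# empirical tail probability of {X > (1 + delta) mu} over many replications
X = (rng.uniform(size=(100_000, n)) < p).sum(axis=1)
empirical = np.mean(X > (1 + delta) * mu)

chernoff = (np.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
print(empirical, "<=", chernoff)                # the empirical tail stays below the bound
```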

Lemma 3 (Ferraty et al. [18]). Under hypotheses (H1) and (H12) we have
$$\frac{1}{n\,\phi_x(h_K)}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{h_K}\Big)\xrightarrow{a.co.}1.$$

(7)

4. HYPOTHESES AND RESULTS

We need the following hypotheses gathered together for easy reference:

(H1) There exists a nonnegative, strictly increasing, differentiable function $\phi$ such that
$$\phi_x(h)=\mathbb{P}\big(X\in B(x,h)\big)>0.$$

(H2) $\forall (y_1,y_2)\in S\times S,\ \forall (x_1,x_2)\in N_x\times N_x$:
$$|F^{x_1}(y_1)-F^{x_2}(y_2)|\le C_x\big(d(x_1,x_2)^{b_1}+|y_1-y_2|^{b_2}\big).$$

(H3) For some $j\ge 0$, $\forall (y_1,y_2)\in S\times S,\ \forall (x_1,x_2)\in N_x\times N_x$:
$$|f^{x_1(j)}(y_1)-f^{x_2(j)}(y_2)|\le C_x\big(d(x_1,x_2)^{b_1}+|y_1-y_2|^{b_2}\big).$$

(H4) $\forall (y_1,y_2)\in\mathbb{R}^{2}$, $|G(y_1)-G(y_2)|\le C|y_1-y_2|$ and $\int |t|^{b_2}G^{(1)}(t)\,dt<\infty$.

(H5) $K$ is a function with support $(0,1)$ such that $0<C_1<K(t)<C_2<\infty$.

(H6) The kernel $G$ is a positive function supported on $[0,1]$; its derivative $G'$ exists and there exist two constants $C_3$ and $C_4$ with $-\infty<C_3<G'(t)<C_4<0$ for $0\le t\le 1$.

(H7) $\lim_{n\to\infty}h_G=0$ and $\lim_{n\to\infty}n^{\alpha}h_G=\infty$ for some $\alpha>0$.

(H8) $\forall (y_1,y_2)\in\mathbb{R}^{2}$, $\big|G^{(j+1)}(y_1)-G^{(j+1)}(y_2)\big|\le C|y_1-y_2|$; $\exists\,\nu>0$, $\forall j'\le j+1$, $\lim_{y\to\infty}|y|^{1+\nu}\big|G^{(j'+1)}(y)\big|=0$; and $G^{(j+1)}$ is bounded.

(H9) $\exists\,\xi>0$ such that $f^{x}$ is increasing on $(\theta-\xi,\theta)$ and decreasing on $(\theta,\theta+\xi)$.

(H10) $f^{x}$ is $j$-times continuously differentiable with respect to $y$ on $(\theta-\xi,\theta+\xi)$.

(H11) $f^{x(l)}(\theta)=0$ if $1\le l<j$, and $|f^{x(j)}(\theta)|>0$.

(H12) The sequence of positive real numbers $k_n=k$ satisfies $\dfrac{k}{n}\to 0$, $\dfrac{\log n}{k}\to 0$ and $\dfrac{\log n}{k\,h_G^{2j+1}}\to 0$ as $n\to\infty$.


Comments on the hypotheses:

1. (H1) is an assumption commonly used for the explanatory variable $X$ (see Ferraty and Vieu [17] for more details and examples).

2. Hypotheses (H2)–(H3) characterize the structural functional space of our model and are needed to evaluate the bias term in our asymptotic properties.

3. (H5) describes complementary assumptions concerning the kernel function, with a focus on non-continuous kernels.

4. Hypotheses (H6)–(H8) are the same as those used in Ferraty et al. [18].

5. The convergence of the estimator can be obtained under the minimal assumption (H9).

6. (H10) and (H11) are classical hypotheses in functional estimation, in finite- or infinite-dimensional spaces.

7. (H12) describes technical conditions imposed for the brevity of proofs.

5. ALMOST COMPLETE CONVERGENCE AND ALMOST COMPLETE CONVERGENCE RATE

In order to link the existing literature with this work, we first recall the consistency result for the kernel conditional mode estimator given in Ferraty et al. [18].

Theorem 1 (Ferraty et al. [18]). Under the model (4), suppose that hypotheses (H3) and (H8)–(H12) are verified for $j=0$; then, if (H1), (H4)–(H5), (H7)–(H9) and (H11) hold, we have
$$\lim_{n\to\infty}\widehat{\theta}_{mode}(x)=\theta(x)\quad\text{almost completely (a.co.).}$$
Under the model (5) and the same conditions of Theorem 1, we have
$$\widehat{\theta}_{mode}(x)-\theta(x)=O\Big(\big(h_K^{b_1}+h_G^{b_2}\big)^{1/j}\Big)+O\bigg(\Big(\frac{\log n}{n\,h_G\,\phi_x(h_K)}\Big)^{1/(2j)}\bigg)\quad\text{a.co.}$$

Now, we state the almost complete convergence and the almost complete convergence rate of the nonparametric k-NN mode estimator introduced in (2).

Theorem 2. Under the model (4), suppose that hypotheses (H3) and (H8) are verified for $j=0$. Then, if (H1), (H4)–(H5), (H7)–(H9) and (H12) hold, we have
$$\lim_{n\to\infty}\widehat{\theta}_{kNN}(x)=\theta(x)\quad\text{a.co.}\qquad(6)$$
Under the model (5) and the same conditions of Theorem 2, we have
$$\widehat{\theta}_{kNN}(x)-\theta(x)=O\bigg(\Big(\phi_x^{-1}\big(\tfrac{k}{n}\big)^{b_1}+h_G^{b_2}\Big)^{1/j}\bigg)+O\bigg(\Big(\frac{\log n}{h_G\,k}\Big)^{1/(2j)}\bigg)\quad\text{a.co.}\qquad(7)$$

Proof. The conditional density $f^{x}(\cdot)$ is continuous (see (H3) and (H9)). We get
$$\forall\epsilon>0,\ \exists\,\sigma(\epsilon)>0,\ \forall y\in(\theta(x)-\xi,\theta(x)+\xi),\quad |f^{x}(y)-f^{x}(\theta(x))|\le\sigma(\epsilon)\Rightarrow|y-\theta(x)|\le\epsilon.$$
By construction $\widehat{\theta}_{kNN}(x)\in(\theta(x)-\xi,\theta(x)+\xi)$, so
$$\forall\epsilon>0,\ \exists\,\sigma(\epsilon)>0,\quad |f^{x}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|\le\sigma(\epsilon)\Rightarrow|\widehat{\theta}_{kNN}(x)-\theta(x)|\le\epsilon.$$
So, we finally arrive at
$$\exists\,\sigma(\epsilon)>0,\quad \mathbb{P}\big(|\widehat{\theta}_{kNN}(x)-\theta(x)|>\epsilon\big)\le\mathbb{P}\big(|f^{x}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|>\sigma(\epsilon)\big).$$
On the other hand, it follows directly from the definitions of $\theta(x)$ and $\widehat{\theta}_{kNN}(x)$ that
$$|f^{x}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|=|f^{x}(\widehat{\theta}_{kNN}(x))-\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN}(x))+\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|$$
$$\le |f^{x}(\widehat{\theta}_{kNN}(x))-\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN}(x))|+|\widehat{f}^{x}_{kNN}(\widehat{\theta}_{kNN}(x))-f^{x}(\theta(x))|\le 2\sup_{y\in S}|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)|.\qquad(8)$$
The uniform almost complete convergence of the conditional density estimate over the compact set $[\theta-\xi,\theta+\xi]$ (see Lemma 4 below) leads, together with the previous inequalities, directly to
$$\forall\epsilon>0,\quad \sum_{n=1}^{\infty}\mathbb{P}\big(|\widehat{\theta}_{kNN}(x)-\theta(x)|>\epsilon\big)<\infty.$$
Finally, the claimed consistency result (6) is proved as soon as the following lemma is checked.

Lemma 4. Under the conditions of Theorem 2 we have
$$\lim_{n\to\infty}\sup_{y\in S}|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)|=0\quad\text{a.co.}$$
We have already shown in (8) that
$$|f^{x}(\widehat{\theta}_{kNN})-f^{x}(\theta)|\le 2\sup_{y\in S}|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)|.\qquad(9)$$


Let us now write the following Taylor expansion of the function $f^{x}$:
$$f^{x}(\widehat{\theta}_{kNN})=f^{x}(\theta)+\frac{1}{j!}\,f^{x(j)}(\theta^{*})\,(\widehat{\theta}_{kNN}-\theta)^{j},$$
for some $\theta^{*}$ between $\theta$ and $\widehat{\theta}_{kNN}$. Because of (9), as long as we are able to check that
$$\forall\tau>0,\quad \sum_{n=1}^{\infty}\mathbb{P}\big(|f^{x(j)}(\theta^{*})|<\tau\big)<\infty,\qquad(10)$$
we would have
$$(\widehat{\theta}_{kNN}-\theta)^{j}=O\Big(\sup_{y\in S}|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)|\Big)\quad\text{a.co.}\qquad(11)$$
So it suffices to check (10), and this is done directly by using the second part of (H11) together with (6).

6. REAL DATA APPLICATION

Now, we apply the described method to real chemometric data. We work on spectrometric data used to classify samples according to a physico-chemical property which is not directly accessible (and therefore requires a specific analysis). The data were obtained using a Tecator Infratec Food and Feed Analyzer working in the near-infrared range (wavelengths between 850 and 1050 nanometers). The measurement is performed by transmission through a sample of finely chopped meat, which is then analyzed by a chemical process to determine its fat content. Each spectrum gives the absorbance (−log10 of the transmittance measured by the device) at 100 regularly spaced wavelengths between 850 and 1050 nm. Meat samples were divided into two classes according to whether their fat content is above or below 20% (77 spectra correspond to more than 20% fat and 138 to less than 20%). The problem is then to discriminate the spectra in order to avoid the chemical analysis, which is costly and time consuming. The figure shows absorbance versus wavelength (850–1050 nm) for the 215 selected pieces of meat. Note that the main goal of spectrometric analysis is to allow the discovery of the proportion of some specific chemical content (see Ferraty and Vieu (2006) for further details related to spectrometric data). At this stage, one would like to use the spectrometric curve X to predict Y, the proportion of fat content in the piece of meat. The data are available on the web site "http://lib.stat.cmu.edu/datasets/tecator".


Fig. 1. The 215 spectrometric curves $\{X_i(t),\ t\in[850,1050],\ i=1,\dots,215\}$.

However, as described in the real data application of Ferraty et al. [18], the prediction problem can be studied by using the conditional mode approach. Starting from this idea, our objective is to give a comparative study of the two conditional mode estimators: the kernel estimator defined in (3) and the k-NN estimator defined in (2).

For both methods, we use the quadratic kernel function $K$ defined by
$$K(x)=\frac{3}{2}\,(1-x^{2})\,\mathbf{1}_{[0,1]}(x),$$
and the distribution function $G(\cdot)$ defined by
$$G(x)=\int_{-\infty}^{x}\frac{15}{4}\,t^{2}(1-t^{2})\,\mathbf{1}_{[-1,1]}(t)\,dt.$$
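For reference, these two functions are straightforward to code; the short sketch below (NumPy, hypothetical names) implements $K$ and the derivative $G'=G^{(1)}$ that actually enters the density estimator.

```python
import numpy as np

def K(u):
    """Quadratic kernel K(u) = (3/2)(1 - u^2) on [0, 1]."""
    u = np.asarray(u, dtype=float)
    return 1.5 * (1.0 - u**2) * ((u >= 0) & (u <= 1))

def G1(t):
    """Derivative G'(t) = (15/4) t^2 (1 - t^2) on [-1, 1] of the cdf G."""
    t = np.asarray(t, dtype=float)
    return 3.75 * t**2 * (1.0 - t**2) * (np.abs(t) <= 1)
```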

Note that the shape of these spectrometric curves is very smooth; for this reason we use the semi-metric defined by the $L^2$ distance between the second derivatives of the curves (see Ferraty and Vieu [16] for more motivation of this choice). We proceed by the following algorithm (a code sketch of the semi-metric and of the cross-validation step is given after the error criterion below):

• Step 1. We split our data into two subsets:

– $(X_j,Y_j)_{j=1,\dots,172}$: training sample,
– $(X_i,Y_i)_{i=173,\dots,215}$: test sample.

• Step 2. We compute the mode estimator $\widehat{\theta}(X_j)$, for all $j$, using the training sample.


• Step 3. For each $X_i$ in the test sample, we set $i^{*}=\arg\min_{j=1,\dots,172} d(X_i,X_j)$.

• Step 4. For all $i=173,\dots,215$ we take
$$\widehat{Y}_i=\widehat{\theta}_{mode}(X_{i^{*}}),$$
where $X_{i^{*}}$ is the nearest curve to $X_i$ in the training sample, and
$$\widetilde{Y}_i=\widehat{\theta}_{kNN}(X_{i^{*}}),$$
computed with the optimal number of neighbors $k_{opt}$,

where $k_{opt}$ is defined by $k_{opt}=\arg\min_{k}\mathrm{CV}(k)$, with
$$\mathrm{CV}(k)=\sum_{i=171}^{215}\big(Y_i-\widehat{\theta}^{-i}_{kNN}(X_i)\big)^{2},\qquad
\widehat{\theta}^{-i}_{kNN}(X_i)=\arg\max_{y}\,(\widehat{f}^{X_i}_{kNN})^{-i}(y),$$
and
$$(\widehat{f}^{x}_{kNN})^{-i}(y)=\frac{h_G^{-1}\displaystyle\sum_{j=1,\,j\neq i}^{172}K\big(H_k^{-1}(x)\,d(x,X_j)\big)\,G^{(1)}\big(h_G^{-1}(y-Y_j)\big)}{\displaystyle\sum_{j=1,\,j\neq i}^{172}K\big(H_k^{-1}(x)\,d(x,X_j)\big)}.$$
The error used to evaluate this comparison is the mean square error (MSE)
$$\frac{1}{45}\sum_{i=173}^{215}\big|Y_i-\widehat{T}(X_i)\big|^{2},$$
where $\widehat{T}$ designates the estimator used (kernel or $k$-NN) to estimate the conditional mode.
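The sketch below (hypothetical, assuming the training curves are stored row-wise in a NumPy array `X_train` observed on a common wavelength grid `t`, with fat contents in `Y_train`) illustrates the semi-metric based on second derivatives and a leave-one-out reading of the cross-validation choice of $k$; it reuses the `K`, `G1` and `knn_conditional_mode` helpers sketched earlier.

```python
import numpy as np

def semimetric_d2(a, b, t):
    """L2 distance between the second derivatives of two discretized curves a, b on grid t."""
    step = t[1] - t[0]
    d2a = np.gradient(np.gradient(a, step), step)
    d2b = np.gradient(np.gradient(b, step), step)
    return np.sqrt(np.trapz((d2a - d2b) ** 2, t))

def choose_k(X_train, Y_train, t, k_values, y_grid, h_G):
    """Cross-validated choice of the number of neighbors k (leave-one-out version)."""
    n = len(Y_train)
    D = np.array([[semimetric_d2(X_train[i], X_train[j], t) for j in range(n)]
                  for i in range(n)])
    cv = []
    for k in k_values:
        err = 0.0
        for i in range(n):
            keep = np.arange(n) != i                     # leave curve i out
            theta = knn_conditional_mode(D[i, keep], Y_train[keep],
                                         y_grid, k, h_G, K, G1)
            err += (Y_train[i] - theta) ** 2
        cv.append(err)
    return k_values[int(np.argmin(cv))]
```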

The MSE of the k-NN mode estimator is 1.716115, while the MSE of the classical kernel mode estimator used in Ferraty et al. [18] is 2.97524; this result exhibits the effectiveness of the k-NN method.

7. APPENDIX

Proof of Lemma 1. For $i=1,\dots,n$, we consider the quantities $G^{(1)}_i(y)=G^{(1)}\big(h_G^{-1}(y-B_i)\big)$.

On the event $\{D_n^{-}\le D_n\le D_n^{+}\}$, it is clear from (L0) that
$$W\big(D_n^{-}(\beta),A_i\big)\le W\big(D_n,A_i\big)\le W\big(D_n^{+}(\beta),A_i\big),$$
$$\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)\le \sum_{i=1}^{n}W\big(D_n,A_i\big)\le \sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big),$$
and therefore
$$\frac{1}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}\le \frac{1}{\displaystyle\sum_{i=1}^{n}W\big(D_n,A_i\big)}\le \frac{1}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}.$$

Under hypotheses (H7), (H8) and (H9), we then have
$$\underbrace{\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}}_{C_n^{-}(\beta)}
\ \le\
\underbrace{\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n,A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n,A_i\big)}}_{C_n(D_n)}
\ \le\
\underbrace{\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}}_{C_n^{+}(\beta)}.$$

That is, we put the following real random variables:
$$C_n^{-}(\beta)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}
\quad\text{and}\quad
C_n^{+}(\beta)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}.$$
We consider, for every $\epsilon>0$, the events
$$T_n(\epsilon)=\big\{c-\epsilon\le C_n(D_n)\le c+\epsilon\big\},$$
and, for every $\beta\in\,]0,1[$,
$$S_n^{-}(\epsilon,\beta)=\big\{c-\epsilon\le C_n^{-}(\beta)\le c+\epsilon\big\},\qquad
S_n^{+}(\epsilon,\beta)=\big\{c-\epsilon\le C_n^{+}(\beta)\le c+\epsilon\big\},$$
$$S_n(\beta)=\big\{C_n^{-}(\beta)\le C_n(D_n)\le C_n^{+}(\beta)\big\}.$$

It is obvious that
$$S_n(\beta)\,\cap\,S_n^{-}(\epsilon,\beta)\,\cap\,S_n^{+}(\epsilon,\beta)\ \subset\ T_n(\epsilon).\qquad(12)$$
Let $\epsilon_0=\dfrac{3c}{2}$ and $\beta_\epsilon=1-\dfrac{\epsilon}{3c}$. Denote
$$W_n^{-}(\epsilon)=\Big\{\beta_\epsilon c-\frac{\epsilon}{3}\le C_n^{-}(\beta_\epsilon)\le \beta_\epsilon c+\frac{\epsilon}{3}\Big\},\qquad
W_n^{+}(\epsilon)=\Big\{\frac{c}{\beta_\epsilon}-\frac{\epsilon}{3}\le C_n^{+}(\beta_\epsilon)\le \frac{c}{\beta_\epsilon}+\frac{\epsilon}{3}\Big\},$$
$$W_n(\epsilon)=\big\{D_n^{-}(\beta_\epsilon)\le D_n\le D_n^{+}(\beta_\epsilon)\big\}.$$
We see that, for $\epsilon\in\,]0,\epsilon_0[$,
$$\beta_\epsilon c-\frac{\epsilon}{3}>c-\epsilon,\qquad \beta_\epsilon c+\frac{\epsilon}{3}\le c+\epsilon,$$
and
$$\frac{c}{\beta_\epsilon}-\frac{\epsilon}{3}>c-\epsilon,\qquad \frac{c}{\beta_\epsilon}+\frac{\epsilon}{3}\le c+\epsilon.$$
So, for $\epsilon\in\,]0,\epsilon_0[$,
$$W_n^{-}(\epsilon)\subset S_n^{-}(\epsilon,\beta_\epsilon)\quad\text{and}\quad W_n^{+}(\epsilon)\subset S_n^{+}(\epsilon,\beta_\epsilon).\qquad(13)$$
(L0) implies that
$$\forall z\in\mathcal{F},\ \forall t,t'\in\mathbb{R}:\ t\le t'\Rightarrow W(t,z)\le W(t',z),$$
and in particular
$$D_n\in W_n(\epsilon)\ \Rightarrow\ D_n^{-}(\beta_\epsilon)\le D_n\le D_n^{+}(\beta_\epsilon)\ \Rightarrow\ C_n^{-}(\beta_\epsilon)\le C_n(D_n)\le C_n^{+}(\beta_\epsilon)\ \Rightarrow\ C_n(D_n)\in S_n(\beta_\epsilon).$$
So
$$W_n(\epsilon)\ \subset\ S_n(\beta_\epsilon).\qquad(14)$$
The results (12), (13) and (14) imply
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad W_n(\epsilon)\,\cap\,W_n^{+}(\epsilon)\,\cap\,W_n^{-}(\epsilon)\ \subset\ T_n(\epsilon).$$
In other words,
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{T_n(\epsilon)^{c}}\ \le\ \mathbf{1}_{W_n^{-}(\epsilon)^{c}}+\mathbf{1}_{W_n^{+}(\epsilon)^{c}}+\mathbf{1}_{W_n(\epsilon)^{c}}.\qquad(15)$$
On the other hand, we can express the real random variables $C_n^{-}$ and $C_n^{+}$ in the following way:

$$C_n^{-}(\beta)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}
= C_n(D_n^{-})\cdot\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)},$$
and
$$C_n^{+}(\beta)=\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}
= C_n(D_n^{+})\cdot\frac{\displaystyle\sum_{i=1}^{n}W\big(D_n^{+}(\beta),A_i\big)}{\displaystyle\sum_{i=1}^{n}W\big(D_n^{-}(\beta),A_i\big)}.$$
So, under (L2) and (L3),
$$C_n^{-}(\beta)\xrightarrow{a.co.}\beta c\quad\text{and}\quad C_n^{+}(\beta)\xrightarrow{a.co.}c/\beta.\qquad(16)$$

Because
$$W_n^{-}(\epsilon)^{c}=\Big\{\big|C_n^{-}(\beta_\epsilon)-\beta_\epsilon c\big|>\frac{\epsilon}{3}\Big\},$$
the first part of (16) implies
$$C_n^{-}(\beta_\epsilon)\xrightarrow{a.co.}\beta_\epsilon c\ \Rightarrow\ \forall\epsilon'>0:\ \sum_{n}\mathbb{P}\big(|C_n^{-}(\beta_\epsilon)-\beta_\epsilon c|>\epsilon'\big)<\infty
\ \Rightarrow\ \forall\epsilon\in\,]0,\epsilon_0[:\ \sum_{n}\mathbb{P}\big(\mathbf{1}_{W_n^{-}(\epsilon)^{c}}>\epsilon\big)<\infty.$$
So,
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{W_n^{-}(\epsilon)^{c}}\xrightarrow{a.co.}0.\qquad(17)$$
In the same way,
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{W_n^{+}(\epsilon)^{c}}\xrightarrow{a.co.}0.\qquad(18)$$
Under (L1), $\mathbf{1}_{\{D_n^{-}\le D_n\le D_n^{+}\}}\xrightarrow{a.co.}1$, which implies that
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{W_n(\epsilon)}\xrightarrow{a.co.}1,$$
in other words,
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{W_n(\epsilon)^{c}}\xrightarrow{a.co.}0.\qquad(19)$$
Combining the results (15), (17), (18) and (19), we get
$$\forall\epsilon\in\,]0,\epsilon_0[,\qquad \mathbf{1}_{T_n(\epsilon)^{c}}\xrightarrow{a.co.}0.$$
We can rewrite this as follows:
$$\mathbf{1}_{T_n(\epsilon)^{c}}\xrightarrow{a.co.}0\ \Rightarrow\ \forall\epsilon\in\,]0,\epsilon_0[:\ \sum_{n}\mathbb{P}\big(|\mathbf{1}_{T_n(\epsilon)^{c}}|>\epsilon\big)<\infty
\ \Rightarrow\ \forall\epsilon\in\,]0,\epsilon_0[:\ \sum_{n}\mathbb{P}\big(|C_n(D_n)-c|>\epsilon\big)<\infty.$$
And we finally obtain
$$C_n(D_n)\xrightarrow{a.co.}c.$$

Proof of Theorem 2 (Almost Complete Convergence). Let $G^{(1)}_i(y)=G^{(1)}\big(h_G^{-1}(y-Y_i)\big)$.

For the proof of this theorem we apply Lemma 1 with:

• $\mathcal{F}=E$, $\mathcal{A}=\mathcal{B}_E$;

• $\forall i=1,\dots,n:\ (A_i,B_i)=(X_i,Y_i)$;

• $D_n=H_{n,k}$ and $C_n(D_n)=\sup_{y\in S}\widehat{f}^{x}_{kNN}(y)$;

• $\forall (t,z)\in\mathbb{R}^{+}\times E:\ W(t,z)=K\Big(\dfrac{d(x,z)}{t}\Big)$.

For a given $\beta\in\,]0,1[$, we choose $D_n^{-}$ and $D_n^{+}$ such that
$$\phi_x(D_n^{-})=\sqrt{\beta}\,\frac{k}{n}\quad\text{and}\quad \phi_x(D_n^{+})=\frac{1}{\sqrt{\beta}}\,\frac{k}{n}.$$
For the proof, we just have to show that the kernel $K$ verifies (L0) and that the sequences $(D_n^{-})_{n\in\mathbb{N}}$, $(D_n^{+})_{n\in\mathbb{N}}$ and $(D_n)_{n\in\mathbb{N}}$ verify (L1), (L2) and (L3).

Verification of (L0):
$$\forall t,t'\in\mathbb{R}^{+}:\ t\le t'\ \Leftrightarrow\ \exists v\in[1,+\infty[\,:\ t'=vt\ \Leftrightarrow\ \exists u\in\,]0,1]:\ \frac{1}{t'}=u\,\frac{1}{t}
\ \Leftrightarrow\ \exists u\in\,]0,1]:\ K\Big(\frac{d(x,z)}{t'}\Big)=K\Big(\frac{u\,d(x,z)}{t}\Big),\quad\forall z\in E.$$
Since $K$ is a bounded kernel verifying $\forall z\in E,\ \forall u\in[0,1]:\ K(uz)\ge K(z)$, we have
$$K\Big(\frac{d(x,z)}{t}\Big)\le K\Big(\frac{u\,d(x,z)}{t}\Big),$$
whence
$$\forall t,t'\in\mathbb{R}^{+},\ \forall z\in E:\ t\le t'\ \Rightarrow\ K\Big(\frac{d(x,z)}{t}\Big)\le K\Big(\frac{d(x,z)}{t'}\Big).$$
So, (L0) is verified.

Verification of (L1): the proof of (L1) is the same as in Burba et al. [4].

Verification of (L2): according to (H12),
$$\lim_{n\to+\infty}\phi_x(D_n^{-})=\lim_{n\to+\infty}\sqrt{\beta}\,\frac{k}{n}=0,\qquad
\lim_{n\to+\infty}\phi_x(D_n^{+})=\lim_{n\to+\infty}\frac{1}{\sqrt{\beta}}\,\frac{k}{n}=0.$$
From (H1), we deduce that
$$\lim_{n\to+\infty}D_n^{-}=0,\qquad \lim_{n\to+\infty}D_n^{+}=0,$$
and by (H12) we have
$$\lim_{n\to+\infty}\frac{\log n}{n\,\phi_x(D_n^{-})}=\lim_{n\to+\infty}\frac{\log n}{n\,\sqrt{\beta}\,\frac{k}{n}}=0,\qquad
\lim_{n\to+\infty}\frac{\log n}{n\,\phi_x(D_n^{+})}=\lim_{n\to+\infty}\frac{\sqrt{\beta}\,\log n}{n\,\frac{k}{n}}=0.$$

Because $(D_n^{-})$ and $(D_n^{+})$ verify the conditions of Lemma 3, we have
$$Q(D_n^{-})=\frac{1}{n\,\phi_x(D_n^{-})}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{D_n^{-}}\Big)\xrightarrow{a.co.}1,\qquad
Q(D_n^{+})=\frac{1}{n\,\phi_x(D_n^{+})}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{D_n^{+}}\Big)\xrightarrow{a.co.}1.$$
So,
$$\frac{Q(D_n^{-})}{Q(D_n^{+})}=\frac{\dfrac{1}{n\,\phi_x(D_n^{-})}\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}{\dfrac{1}{n\,\phi_x(D_n^{+})}\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}\xrightarrow{a.co.}1.$$
Since $\phi_x(D_n^{-})/\phi_x(D_n^{+})=\beta$, we can deduce that
$$\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}\xrightarrow{a.co.}\beta.$$
So, (L2) is checked.

Verification of (L3):

Because $D_n^{-}$ and $D_n^{+}$ verify the conditions of Theorem 1, we obtain
$$\sup_{y\in S}\bigg|\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}-f^{x}(y)\bigg|\xrightarrow{a.co.}0,$$
$$\sup_{y\in S}\bigg|\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}-f^{x}(y)\bigg|\xrightarrow{a.co.}0,$$
so that $C_n(D_n^{-})\xrightarrow{a.co.}c$ and $C_n(D_n^{+})\xrightarrow{a.co.}c$ with $c=\sup_{y\in S}f^{x}(y)$. So, (L3) is verified, which finishes the proof of the first part of Theorem 2.

Proof of Theorem 2 (Almost Complete Convergence Rate). We proceed in the same way as for the first part of Theorem 2 (almost complete convergence), with the following notations:

• $\mathcal{F}=E$, $\mathcal{A}=\mathcal{B}_E$;

• $\forall i=1,\dots,n:\ (A_i,B_i)=(X_i,Y_i)$;

• $D_n=H_{n,k}$ and $C_n(D_n)=\sup_{y\in S}\widehat{f}^{x}_{kNN}(y)$;

• $\forall (t,z)\in\mathbb{R}^{+}\times E:\ W(t,z)=K\Big(\dfrac{d(x,z)}{t}\Big)$.

For a given sequence $\beta_n\in\,]0,1[$, we choose $D_n^{-}$ and $D_n^{+}$ such that
$$\phi_x(D_n^{-})=\sqrt{\beta_n}\,\frac{k}{n}\quad\text{and}\quad \phi_x(D_n^{+})=\frac{1}{\sqrt{\beta_n}}\,\frac{k}{n}.$$
For the proof, it is enough to show that the kernel $K$ verifies (L0) and that the sequences $(D_n^{-})_{n\in\mathbb{N}}$, $(D_n^{+})_{n\in\mathbb{N}}$ and $(D_n)_{n\in\mathbb{N}}$ verify (L1), (L2') and (L3').

Verification of (L0) and (L1): these are checked in the same way as in the first part of the proof (application of Lemma 1).

Verification of (L2'): it is shown in Ferraty et al. [18] that, writing
$$\widehat{f}_N^{x}(y)=\frac{1}{n\,h_G\,\mathbb{E}[K_1]}\sum_{i=1}^{n}K\big(h_K^{-1}d(x,X_i)\big)\,G^{(1)}_i(y)
\quad\text{and}\quad
\widehat{F}_D^{x}=\frac{1}{n\,\mathbb{E}[K_1]}\sum_{i=1}^{n}K\big(h_K^{-1}d(x,X_i)\big),$$
we have the decomposition
$$\widehat{f}^{x}(y)-f^{x}(y)=\frac{1}{\widehat{F}_D^{x}}\Big\{\big(\widehat{f}_N^{x}(y)-\mathbb{E}\widehat{f}_N^{x}(y)\big)-\big(f^{x}(y)-\mathbb{E}\widehat{f}_N^{x}(y)\big)\Big\}
+\frac{f^{x}(y)}{\widehat{F}_D^{x}}\Big\{\mathbb{E}\widehat{F}_D^{x}-\widehat{F}_D^{x}\Big\}.$$
To show that

$$\sup_{y\in S}\big|\widehat{f}^{x}(y)-f^{x}(y)\big|=O\big(h_K^{b_1}\big)+O\big(h_G^{b_2}\big)+O\bigg(\sqrt{\frac{\log n}{n\,h_G\,\phi_x(h_K)}}\bigg)\quad\text{a.co.,}$$
it is enough to show that:

(A) $\dfrac{1}{\widehat{F}_D^{x}}\displaystyle\sup_{y\in S}\big|\mathbb{E}\widehat{f}_N^{x(j)}(y)-f^{x(j)}(y)\big|=O\big(h_K^{b_1}\big)+O\big(h_G^{b_2}\big)$, a.co.;

(B) $\dfrac{1}{\widehat{F}_D^{x}}\displaystyle\sup_{y\in S}\big|\widehat{f}_N^{x(j)}(y)-\mathbb{E}\widehat{f}_N^{x(j)}(y)\big|=O\bigg(\sqrt{\dfrac{\log n}{n\,h_G^{2j+1}\,\phi_x(h_K)}}\bigg)$, a.co.;

(C) $\widehat{F}_D^{x}-\mathbb{E}\widehat{F}_D^{x}=O\bigg(\sqrt{\dfrac{\log n}{n\,\phi_x(h_K)}}\bigg)$, a.co.;

(D) $\exists\,\delta>0$ such that $\displaystyle\sum_{n}\mathbb{P}\big[\widehat{F}_D^{x}<\delta\big]<+\infty$.

We put
$$h^{-}=D_n^{-}=\phi_x^{-1}\Big(\sqrt{\beta_n}\,\frac{k}{n}\Big)\quad\text{and}\quad h^{+}=D_n^{+}=\phi_x^{-1}\Big(\frac{1}{\sqrt{\beta_n}}\,\frac{k}{n}\Big).$$
Then $\lim_{n\to+\infty}h^{-}=\lim_{n\to+\infty}h^{+}=0$ and
$$\lim_{n\to+\infty}\frac{\log n}{n\,h_G\,\phi_x(D_n^{-})}=\lim_{n\to+\infty}\frac{\log n}{n\,h_G\,\phi_x(D_n^{+})}=\lim_{n\to+\infty}\frac{\log n}{h_G\,k}=0$$
(because $\beta_n$ is bounded by $1$ and by (H12)).

We apply Lemma 3 and obtain the two following results:

1. $\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{h^{-}}\Big)-\phi_x(h^{-})=o_{a.co.}\bigg(\sqrt{\dfrac{\log n}{n\,\phi_x(h^{-})}}\bigg)$, which implies
$$\frac{1}{n}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{D_n^{-}}\Big)-\sqrt{\beta_n}\,\frac{k}{n}=o_{a.co.}\bigg(\sqrt{\frac{\log n}{k}}\bigg).$$

2. $\dfrac{1}{n}\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{h^{+}}\Big)-\phi_x(h^{+})=o_{a.co.}\bigg(\sqrt{\dfrac{\log n}{n\,\phi_x(h^{+})}}\bigg)$, so
$$\frac{1}{n}\sum_{i=1}^{n}K\Big(\frac{d(x,X_i)}{D_n^{+}}\Big)-\frac{1}{\sqrt{\beta_n}}\,\frac{k}{n}=o_{a.co.}\bigg(\sqrt{\frac{\log n}{k}}\bigg).$$

We then have
$$\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}-\beta_n=o_{a.co.}\bigg(\sqrt{\frac{\log n}{k}}\bigg),$$
and, according to (H12),
$$\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}-\beta_n=o_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg).$$
So, (L2') is also verified.

Verification of (L3'):

Because $h^{+}$ and $h^{-}$ verify the required conditions, we can apply Theorem 1, which gives
$$\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{-}}\Big)}-\sup_{y\in S}f^{x}(y)
=O\big((D_n^{-})^{b_1}\big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{n\,h_G\,\phi_x(D_n^{-})}}\bigg),$$
$$\sup_{y\in S}\frac{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)\,h_G^{-1}G^{(1)}_i(y)}{\displaystyle\sum_{i=1}^{n}K\Big(\dfrac{d(x,X_i)}{D_n^{+}}\Big)}-\sup_{y\in S}f^{x}(y)
=O\big((D_n^{+})^{b_1}\big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{n\,h_G\,\phi_x(D_n^{+})}}\bigg).$$
In other words,
$$C_n(D_n^{-})-\sup_{y\in S}f^{x}(y)=O\Big(\phi_x^{-1}\big(\tfrac{k}{n}\big)^{b_1}\Big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg),$$
$$C_n(D_n^{+})-\sup_{y\in S}f^{x}(y)=O\Big(\phi_x^{-1}\big(\tfrac{k}{n}\big)^{b_1}\Big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg).$$
So there exists $c=\sup_{y\in S}f^{x}(y)>0$ such that
$$\big|C_n(D_n^{-})-c\big|=O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg)
\quad\text{and}\quad
\big|C_n(D_n^{+})-c\big|=O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg).$$
So, (L3') is verified.


We finally have
$$\sup_{y\in S}\big|\widehat{f}^{x}_{kNN}(y)-f^{x}(y)\big|=O\Big(\phi_x^{-1}\big(\tfrac{k}{n}\big)^{b_1}\Big)+O\big(h_G^{b_2}\big)+O_{a.co.}\bigg(\sqrt{\frac{\log n}{h_G\,k}}\bigg).$$

8. CONCLUSION AND PERSPECTIVES

In this work we studied the nonparametric k-NN estimation of the conditional mode for functional data; the almost complete pointwise convergence of this estimator, with rates, was established.

The effectiveness of the k-NN method shows up in practical examples: since the smoothing parameter k takes values in a discrete set, it also provides an easier implementation.

This work can be generalized to dependent data (see Azzahrioui [14]).

In nonparametric statistics, uniform convergence is considered a preliminary step towards sharper results; in this context, it would be very interesting to extend the results of Ferraty et al. [20].

Obtaining explicit expressions for the dominant terms of the centered moments can be envisaged once an asymptotic normality result is obtained (see Delsol [11]). This idea can be investigated in the future.

Acknowledgments. The authors thank the referee and the editors-in-chief for helpful comments. Supported by "L'Agence Thématique de Recherche en Sciences et Technologie ATRST (Ex ANDRU)" in P.N.R., No. 46/15.

REFERENCES

[1] M. Attouch and T. Benchikh, Asymptotic distribution of robust k-nearest neighbour estimator for functional nonparametric models. Math. Vesnik 64 (2012), 4, 275–285.

[2] D. Bosq, Linear Processes in Function Spaces: Theory and Applications. Lecture Notes in Statist. 149, Springer-Verlag, New York, 2000.

[3] F. Burba, F. Ferraty and P. Vieu, Convergence de l'estimateur noyau des k plus proches voisins en régression fonctionnelle non-paramétrique. C. R. Math. Acad. Sci. Paris 346 (2008), 5–6, 339–342.

[4] F. Burba, F. Ferraty and P. Vieu, k-nearest neighbour method in functional nonparametric regression. J. Nonparametr. Stat. 21 (2009), 4, 453–469.

[5] G. Collomb, Estimation de la régression par la méthode des k points les plus proches avec noyau: quelques propriétés de convergence ponctuelle. In: Nonparametric Asymptotic Statistics, Lecture Notes in Math. 821, Springer-Verlag, 1980, pp. 159–175.

[6] G. Collomb, W. Härdle and S. Hassani, A note on prediction via conditional mode estimation. J. Statist. Plann. Inference 15 (1987), 227–236.

[7] S. Dabo-Niang, Estimation de la densité dans un espace de dimension infinie: application aux diffusions. C. R. Acad. Sci. Paris 334 (2002), 3, 213–216.
