Strong consistency of kernel estimator in a semiparametric regression model

(1)

HAL Id: hal-01897055

https://hal.archives-ouvertes.fr/hal-01897055

Preprint submitted on 16 Oct 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Strong consistency of kernel estimator in a semiparametric regression model

Emmanuel de Dieu Nkou, Guy Martial Nkiet

To cite this version:

Emmanuel de Dieu Nkou, Guy Martial Nkiet. Strong consistency of kernel estimator in a semiparametric regression model. 2018. �hal-01897055�

(2)

Strong consistency of kernel estimator in a semiparametric regression model

Emmanuel de Dieu NKOU and Guy Martial NKIET

Universit´e des Sciences et Techniques de Masuku, Franceville, Gabon;

ARTICLE HISTORY Compiled September 23, 2018

ABSTRACT

Estimating the effective dimension reduction (EDR) space, related to the semiparametric regression model introduced by Li [9], is based on the estimation of the covariance matrix Λ of the conditional expectation of the vector of predictors given the response. An estimatorΛbnof Λ based on kernel method was introduced by Zhu and Fang [17] who then derived, under some conditions, the asymptotic distribution of√

n

Λbn−Λ

, asn→+∞. In this paper, we obtain, under specified conditions, the almost sure convergence ofΛbnto Λ, asn→+∞.

KEYWORDS

Strong consistency; Kernel estimator; Semiparametric; Regression

1. Introduction

Given a univariate response variableY, we consider the regression model:

Y =F(β₁^TX, ..., β_N^TX, ε), (1) whereX is a d-dimensional random vector with covariance matrix assumed, without loss of generality, to be the identity matrix,N is an integer of N^∗ such that N < d, β₁,· · · , β_N are vectors inR^d, andεis a real random variable that is independent ofX, and F is an arbitrary unknown function onR^N+1. This model, introduced by Li [9], permits to achieve dimension reduction since the numberN of variables to be considered for estimatingF is less than the initial dimension dof the regressor vectorX. It expresses the fact that the projection ofX onto theN-dimensional subspace spanned byβ₁^TX, ..., β_N^TX, named the effective dimension reduction (EDR) space, contains all information about the response variableY. Estimating N and the EDR space is then a crucial issue that has been tackled in several works (e.g., [1,2,3,4,8,9,10,11,12,13,16]).

Since the directionsβ₁,· · ·, β_K are, under some conditions, characterized as eigenvec- tors of the covariance matrix Λ ofE(X|Y), the aforementioned estimation problem is based on estimation of Λ. The most popular method for doing that is based on slicing the range of Y and leads to the well known sliced inverse regression (SIR) method that was introduced by Li [9] (see also [8]). An alternative method was introduced

(3)

by Zhu and Fang [17]; in this work an estimator Λbn of Λ based on kernel method is proposed and the limiting distribution of √

n

Λb_n−Λ

, as n → +∞, is derived under some conditions. Since this result just implies weak consistency of Λb_n, that is the convergence in probability ofΛbn to Λ as n→+∞, it is natural to wonder if one could obtain strong consistency forΛb_n.

In this paper, we tackle this problem and we prove, under some conditions, the almost sure convergence ofΛb_n to Λ, as n →+∞. The paper is organized as follows:

Section 2 is devoted to the presentation of the used estimator, that is the estimator given in [17]. In Section 3, the assumptions needed for our results are given, and the main theorems that establish the aforementioned consistency are given. Then, the proofs of all lemmas and theorems are postponed in Section 4.

2. Preliminaries and notations

Lettingf be the density ofY, we suppose that, for all y∈R, we havef(y)>0; then, for any j= 1,· · ·, d, we consider

Rj(y) =E(Xj|Y =y) = gj(y)

f(y) where gj(y) = Z

R

zf₍_{Xj ,Y}₎(z, y)dz, f₍_{Xj ,Y}₎ being the density of the pair (X_j, Y). Then, we consider the random vector

R(Y) =

R₁(Y), ..., R_d(Y) T

=

E(X₁|Y), ...,E(X_d|Y) T

=E

X|Y

and its covariance matrix Λ =Cov

E(X|Y)

which is of great importance since the EDR space is obtained from its spectral analysis (e.g. [9]). It cannot be computed in practice since it depends on the distribution of (X, Y) which is generally unknown;

that is why approaches for its estimation have been investigated by several authors.

Li [9] considered an estimation method based on slicing the range ofY, so introducing sliced inverse regression, whereas Zhu and Fang [17] introduced a kernel estimator.

More precisely, considering an i.i.d. sample (Y_i, X_i)_i=1,...nof the pair (Y, X) of random variables connected according to model (1) and putting

X_i = (X_i1,· · ·, X_id)^T , we define kernel estimates off and theg_j’s by:

fb_n(y) = 1 n

n

X

i=1

1 hn

K

y−Y_i hn

, bg_j,n(y) = 1 n

n

X

i=1

X_ij 1 hn

K

y−Y_i hn

,

wherehn is a bandwidth andK(·) is a kernel function. In order to avoid small values in the denominator, Zhu and Fang (1996) proposed to consider

f_b_n(y) = max f(y), b_n

and fb_b_n(y) = max

fb_n(y), b_n ,

(4)

where (bn)_n∈_N∗ is a sequence of positive real numbers that satisfies the property:

limn→+∞(b_n) = 0. Then, the R_b_n_,j’s defined by R_b_n_,j(y) = g_j(y)

fbn(y) are estimated by

Rbbn,j(y) = bgj,n(y) fb_b_n(y) and putting

Rb_b_n(y) =

Rb_b_n_,1(y), ...,Rb_b_n_,d(y) T

,

we take as estimator of Λ the random matrix:

Λb_n= 1 n

n

X

i=1

Rb_b_n(Y_i) Rb_b_n(Y_i)T

.

This estimator was considered in [17] who then proved that√ n

Λb_n−Λ

converges in distribution, as n → +∞, to a normal distribution. This result implies that Λb_n converges in probability, as n → +∞, to Λ, that is weak consistency of the estimator. In the following section, we establish, under specified conditions, the almost sure convergence ofΛbn to Λ as n→+∞.

3. Assumptions and main results

In this section, we present our assumptions, then we give the main results that establish almost sure convergence ofΛb_n to Λ asn→+∞.

Assumption 1. The random variableXis bounded, i.e. there existsG >0 such that kXk_d≤G, wherek · k_d is the usual Euclidean norm of R^d.

Assumption 2. The random variableY has a bounded density f.

Assumption 3. Theg_j’s andf are 3-times differentiable and their third derivatives satisfy the following condition: there exists a neighborhood of the origin, say U, and a constantc >0 such that, for any u∈U,

f⁽³⁾(y+u)−f⁽³⁾(u)

≤c|u| and

g_j⁽³⁾(y+u)−g⁽³⁾_j (u)

≤c|u|, forj= 1,· · ·, d.

Assumption 4. For any pair (k, `) such that 1≤k, `≤d, and any u∈U,

|R_k(y+u)R_`(y+u)−R_k(y)R_`(y)| ≤c|u|.

(5)

Assumption 5. There exists an integerr >6 such that, for any j∈ {1,· · ·, d}, the functiong_j belongs to the set

Σ (r, L, α) =n

g∈ D^r / ∀(x, y),

g^(r)(x)−g^(r)(y)

≤L|x−y|^αo ,

where α ∈]0,1], L > 0 and β := r+α satisfies β > 7, and D^r denotes the space of r-times differentiable functions.

Assumption 6. (i) The kernel K is continuous and its support is the interval [−1,1];

(ii) K is symmetric about 0;

(iii) The kernelK is bounded, that is: sup_u∈_R|K(u)|=D <+∞.

(iv) The kernelK is of order r, that is Z

u^kK(u)du= 0 for k∈ {1,2,· · ·, r}; (v)

Z

|K(u)|du <+∞ and Z

|u|^β|K(u)|du <+∞.

Assumption 7. When n is large enoughh_n ∼ n^−c¹ and b_n ∼n^−c² where c₁ and c₂ are numbers satisfyingc₁ >0, 0< c₂ <1/10 and 1/8 +c₂/4< c₁ <1/4−c₂.

Assumption 8. The eigenvaluesλ₁,· · ·, λ_d of Λ verify: λ₁ >· · ·> λ_d>0.

The assumptions 3, 4, 6-(i), 6-(ii) and 7 was introduced in [17] and are necessary here to use some results of this paper. Assumption 2 concerns the density ofY and is classical since it is satisfied for the usual probability distributions. The assumptions 5, 6-(iv) and 6-(v) are classical assumptions of nonparametric statistics literature (see, e.g., [15]). Assumption 6-(iii) is satisfied, for instance, by the gaussian kernel.

Remark 1. For overcoming technical difficulties due to small values in the denominator, Zhu and Fang (1996) introduced the modified versionfb_b_n = max(fbn, bn) of the kernel estimatefbnof the densityf. But this approach does not guarantee that we get a good estimator off. Indeed, if we takeb_n =n^−1/11, then b_n is still larger than 1/2 for very large values ofn (for example n= 2000). So, every value of fbn could be cut off and, therefore, fbbn would have a constant value. This is an undesirable property that makesfb_b_n a bad estimator of the density. To overcome this problem, we can take bn = min(a, n^−c²), where ais a fixed strictly positive number. When a is sufficiently small fbbn is near fromfbn and is, therefore, a good estimate of f. Indeed, it is easy to check that sup_x∈_R|fb_b_n(x)−fb_n(x)| ≤a. Finally, by takingb_n= min(a, n^−c²), we obtain a good estimate of the density and we still havebn∼n^−c² as required in Assumption 7.

For a symmetric (d×d) matrixA= (a_k,`)1≤k,`≤d, we denote byV ech(A) thed(d+1)/2- dimensional vector

(a11,· · · , ad1, a22, a32,· · · , ad2, a33, a43,· · · , ad3,· · ·, add)^T.

(6)

For a vectorV = (v1, v2,· · ·, vm)∈R^m, we denotekVk_∞= max1≤i≤m|v_i|.

Now, we give results which establish strong consistency forΛbn as estimator of Λ.

Theorem 1. Under the assumptions 1, 6 and 7 we have

V ech

Λb_n−E

Λb_n

_∞=O_a.s.

logn n

ν

withν = 1/2−2(c1+c2).

Putting

Λ = (λk,`)_1≤k,`≤d and Λbn=

bλ⁽ⁿ⁾_k,`

1≤k,`≤d, we have:

Theorem 2. Under the assumptions 1 to 7, we have for any 1≤k, `≤d:

n→+∞lim E

bλ⁽ⁿ⁾_k,`

=λk,`.

The following theorem is our main result; it results from Theorems 1 and 2.

Theorem 3. Under the assumptions 1 to 7, Λb_n converges almost surely to Λ, as n→+∞.

As a consequence of this theorem, we can deduce strong consistency for estimators of theβ_k’s. Since the covariance matrix ofX is assumed to be equal to thed×didentity matrixId , thenβk (fork= 1,· · ·, N) is an eigenvector of Λ associated with thek-th largest eigenvalueλ_k (see [9]). We consider the empirical covariance matrix

Σb_n= 1 n

n

X

i=1

X_i−X_n

X_i−X_nT

where X_n= 1 n

X

i=1

X_i,

and we denote byηbk an eigenvector ofΛbn associated with thek-th largest eigenvalue bλ_k. Clearly, from strong law of large numbers, Σbn converges almost surely to Id as n→+∞; thenΣb_nis also invertible for large values ofn, and we can take as estimator ofβk the vectorβbk=Σb^−1/2n ηbk. Then, we have:

Corollary 1. Under the assumptions 1 to 8, for any k ∈ {1,· · ·, N}, βb_k converges almost surely toβ_k, asn→+∞.

(7)

4. Proofs

4.1. Preliminary results

First, we recall below a lemma given in [17] (see p. 1058) and which will be useful for proving other results.

Lemma 1. Under the assumptions 1, 3, 4 and 7, we have almost surely:

sup

y∈R

fbn(y)−f(y) =O

h⁴_n+n^−1/2h⁻¹_n logn

,

asn→+∞.

Lemma 2. Under assumptions 1, 2, 3, 4 and 7, we have for any j ∈ {1,· · · , d}:

E

g²_j(Y)

<+∞.

Proof.According to [17] (see p. 1059), we have:

sup

y∈R

|E(gb_j,n(y))−g_j(y)|=O(h⁴_n).

Then, there existsM1 >0 such that sup_y∈_R|E(bgj,n(y))−gj(y)| ≤M1 for anyn∈N^∗. On the other hand,

|E(bgj,n(y))| ≤ Gh⁻¹_n E

K

y−Y1

h_n

=Gh⁻¹_n Z

K

y−t h_n

f(t)dt

= G

Z

|K(u)|f(hny−u) du≤Gkfk_∞ Z

|K(u)|du, wherekfk_∞= sup_t∈_Rf(t). Therefore, for anyy∈R,

|g_j(y)| ≤ |E(bg_j,n(y))−g_j(y)|+|E(bg_j,n(y))|

≤ M₁+Gkfk_∞ Z

|K(u)|du.

This shows thatg_j(Y) is a bounded real random variable and, therefore,E

g²_j(Y)

<

+∞.

Lemma 3. Under the assumptions 5 and 6, we have for any y ∈ R and any j ∈ {1,· · · , d}:

Z

gj(y−uhn)K(u)du−gj(y)

≤C h^β_n, whereC >0.

(8)

Proof.By a Taylor expansion, we have:

gj(y−uhn) =gj(y) +

r−1

X

k=1

g^(k)(y)

k! (−1)^ku^kh^k_n + (−1)^ru^rh^r_n

r! g^(r)_j (y−θuhn), whereθ∈]0,1[. Thus,

Z

g_j(y−uh_n)K(u)du = Z

g_j(y)K(u)du +

r−1

X

k=1

g^(k)(y)

k! (−1)^ku^kh^k_n Z

u^kK(u)du

+ (−1)^rh^r_n r!

Z

g^(r)_j (y−θuhn)u^rK(u)du,

= g_j(y) + (−1)^rh^r_n r!

Z

g^(r)_j (y−θuh_n)u^rK(u)du.

Furthermore, since Z

g_j^(r)(y)u^rK(u)du=g^(r)_j (y) Z

u^rK(u)du= 0, it follows:

Z

g_j(y−uh_n)K(u)du−g_j(y) = (−1)^rh^r_n r!

Z

g_j^(r)(y−θuh_n)u^rK(u)du− Z

g_j^(r)(y)u^rK(u)du

= (−1)^rh^r_n r!

Z

g^(r)_j (y−θuh_n)−g_j^(r)(y)

u^rK(u)du.

Thus, under Assumption 5,

Z

g_j(y−uh_n)K(u)du−g_j(y)

≤ h^r_n r!

Z

g_j^(r)(y−θuh_n)−g_j^(r)(y)

|u|^r|K(u)|du

≤ h^r_n r!

Z

L θ^α|u|^αh^α_n|u|^r|K(u)|du

≤ h^r+α_n L r!

Z

|u|^r+α|K(u)|du

≤ h^β_nL r!

Z

|u|^β|K(u)|du,

what gives the required inequality withC = ^L_r!R

|u|^β|K(u)|du.

Lemma 4. Considering E_j,n=

Z f(y)

Z

g_j(y−uh_n)K(u)du 2

dy and E_j = Z

f(y)g²_j(y)dy=E g²_j(Y) ,

then, under the assumptions 1 to 7, we have : |E_j,n− E_j| ≤C²h^2βn +2C h^βnE

|g_j(Y)|

.

(9)

Proof.Using the equalitya²−b² = (a−b)²+ 2b(a−b), we obtain : E_j,n− E_j =

Z f(y)

"

Z

g_j(y−uh_n)K(u)du 2

−

g_j(y) 2#

dy

= Z

f(y)

"

Z

gj(y−uhn)K(u)du−gj(y) 2

+ 2g_j(y) Z

dy.

Thus

|E_j,n− E_j| ≤ Z

f(y)

"

Z

gj(y−uhn)K(u)du−gj(y) 2

+ 2 |g_j(y)|

Z

dy.

Then, from Lemma 3, it follows

|E_j,n− E_j| ≤ Z

f(y)h

C²h^2β_n + 2|g_j(y)|C h^β_ni dy

= C²h^2β_n Z

f(y)dy+ 2Ch^β_n Z

|g_j(y)|f(y)dy

= C²h^2β_n + 2Ch^β_nE(|g_j(Y)|).

Lemma 5. Putting δn = nhn

1− ¹_n

E_j,n− E_j

, we have under the assumptions 1 to 7, limn→+∞δ_n= 0.

Proof.First,

|E_j,n| ≤ |E_j,n− E_j|+|E_j| ≤C²h^2β_n + 2C h^β_nE

|g_j(Y)|

+E g_j²(Y) .

Therefore,

|δ_n| = nhn

(E_j,n− E_j)− 1 nE_j,n

≤nhn

|E_j,n− E_j|+ 1 n|E_j,n|

≤ nh^β+1_n

C²h^β_n+ 2CE

|g_j(Y)|

+1 n

E g_j²(Y)

+C²h^2β_n + 2C h^β_nE

|g_j(Y)|

.

Clearly,

n→+∞lim 1

n

E g_j²(Y)

+C²h^2β_n + 2C h^β_nE

|g_j(Y)| = 0.

(10)

On the other hand, sincehn∼n^−c¹, it follows thatnh^β+1n ∼n^1−(β+1)c¹. Further, from β >7 we deduce that_β+1¹ < ¹₈ < c1, that is 1−(β+1)c1<0. Thus limn→+∞nh^β+1n = 0 and, therefore,

n→+∞lim

nh^β+1_n

C²h^β_n+ 2CE

|g_j(Y)| = 0.

Finally, limn→+∞δn= 0.

Lemma 6. Under the assumptions 1 to 7, we have:

E

(bgj,n(Y)−gj(Y))²

=O 1

n hn

.

Proof.Considering the random variable W_i,j,n =X_ijK

Y−Yi

hn

, we have:

E bgj,n(Y)²

= 1

nh²_nE W_1,j,n² + 1

h²_n

1− 1 n

E(W1,j,nW2,j,n). (2) Clearly, E

W_1,j,n²

= h_nJ_n, where J_n = R R ₁

hnK² u

hn

V(y − u)f(u)dudy with V(y) = R

z²f_(X,Y₎(z, y)dz, and from Theorem 2.1.1 in [11] it is known that limn→∞J_n=J :=R R

V(y)K²(u)f(u)du dy. Furthermore, E(W_1,j,nW_2,j,n) =

Z Z K

u h_n

g_j(y−u)du Z

K v

h_n

g_j(y−v)dv

f(y)dy

= Z

f(y) Z

K u

hn

g_j(y−u)du 2

dy.

Puttingt= _h^u

n, we obtain E(W_1,j,nW_2,j,n) =h²_n

Z f(y)

Z

K(t)g_j(y−th_n)dt 2

dy=h²_nE_j,n. On the other handE g_j(Y)²

=E_j. Then, we deduce from (2) that E bgj,n(Y)²

−E gj(Y)²

= 1

nh²_nhnJn+ 1 h²_n

1− 1

n

h²_nE_j,n− E_j = 1 nhn

(Jn+δn). On the other hand, it is easy to check that

E

[bg_j,n(Y)−g_j(Y)]²

= E bg_j,n(Y)²

−E g_j(Y)²

−2

E

bgj,n(Y)gj(Y)

−E gj(Y)²

= 1

nh_n(J_n+δ_n)−2∆_j,n,

(11)

where ∆_j,n=E

bg_j,n(Y)g_j(Y)

−E g_j(Y)²

. We have:

E

bg_j,n(Y)g_j(Y)

= E

"

g_j(Y) 1 nhn

n

X

i=1

X_ijK

Y −Y_i hn

#

= 1

hnE

g_j(Y)X_1jK

Y −Y₁ hn

= 1

h_n Z

(3)

z gj(y)K

y−u h_n

f₍_{Xj ,Y}

1,Y)(z, u, y)dz du dy

= 1

hn

Z

(3)

z gj(y)K

y−u hn

f₍_{Xj ,Y1}₎(z, u)f_Y(y)dz du dy

= 1

hn

Z

g_j(y)f(y) Z

K

y−u hn

Z

zf₍_{Xj ,Y}

1)(z, u)

du

dy

= 1

h_n Z

gj(y)f(y) Z

K

y−u h_n

gj(u)du

dy.

Puttingt= ^y−u_h

n , we obtain E

gj(Y)bgj,n(Y)

= Z

gj(y)f(y) Z

gj(y−thn)K(t)dt

dy.

Hence

∆_j,n = Z

g_j(y)f(y) Z

g_j(y−th_n)K(t)dt

dy− Z

g²_j(y)f(y);

= Z

gj(y)f(y) Z

gj(y−thn)K(t)dt−gj(y)

dy

and

|∆_j,n| ≤ Z

|g_j(y)|f(y)

Z

g_j(y−th_n)K(t)dt−g_j(y)

dy

≤ Ch^β_n Z

|g_j(y)|f(y)dy=Ch^β_nE(|g_j(Y)|), the second inequality coming from Lemma 3. Then, we have:

E

[gbj,n(Y)−gj(Y)]²

≤ 1

nhn

|J_n+δn|+ 2|∆_j,n|

≤ 1

nh_n n

|J_n+δn|+ 2Cnh^β+1_n E(|g_j(Y)|)o .

Moreover,nh^β+1n ∼n^1−(β+1)c1 and, sinceβ+ 1>8 and ¹₈ < c1 we have the inequality 1−(β+ 1)c₁ <0 which implies limn→∞nh^β+1n = 0. Consequently,

n→+∞lim

|J_n+δn|+ 2Cnh^β+1_n E(|g_j(Y)|)

=|J|,

(12)

from what we deduce thatE

(bg_j,n(Y)−g_j(Y))²

=O

1 nhn

.

Lemma 7. Under the assumptions 1, 2, 3, 4 and 7, we have:

n→+∞lim E





"

bgj,n(Y)−gj(Y)fb_b_n(Y) f_b_n(Y)

#2

= 0.

Proof.We have:

E





"

bg_j,n(Y)−g_j(Y)fb_b_n(Y) fbn(Y)

#2



= E





"

bg_j,n(Y)−g_j(Y)

+g_j(Y)

1−fb_b_n(Y) fbn(Y)

#2



≤ 2E

bgj,n(Y)−gj(Y) 2!

+ 2E g_j²(Y)

1−fbbn(Y) f_b_n(Y)

2!

. (3)

Equation 4.4 in [17] and Lemma 1 allow to obtain, almost surely, the inequality sup

y∈R

|fb_b_n(y)−f_b_n(y)| ≤sup

y∈R

|fb_n(y)−f(y)| ≤M₂

h⁴_n+n^−1/2h⁻¹_n logn ,

whereM₂ is a positive constant. Therefore, almost surely,

1−fbbn(Y) f_b_n(Y)

2

=

fbbn(Y)−fbn(Y) f_b_n(Y)

2

≤

M2b⁻¹_n

h⁴_n+n^−1/2h⁻¹_n logn 2

,

and, consequently, E g²_j(Y)

1−fb_b_n(Y) fbn(Y)

2!

≤

M2b⁻¹_n

h⁴_n+n^−1/2h⁻¹_n logn 2

E g²_j(Y)

. (4) Clearly, b⁻¹_n h⁴_n ∼ n^c²^−4c¹ and b⁻¹_n n^−1/2h⁻¹_n ∼ n^c¹^+c²^−1/2 as n → +∞. Since, under assumption 4, we have c₂ −4c₁ < 0 and c₁ +c₂ −1/2 < 0, it follows that limn→+∞ b⁻¹_n h⁴_n+n^−1/2h⁻¹_n logn

= 0. Then, from (4), (3) and Lemma 6, we de-

duce the required result.

4.2. Proof of Theorem 1 Since the class of functions

H_n=

h_(k,l) :y7−→h_k,l(y) = 1 n

bgk,n(y)bgl,n(y)

fb_b²_n(y) ,1≤k, l≤d

,

(13)

is finite, we deduce from Lemma 3 in [6] that it is a Vapnik- ˇCervonenkis (VC) class of functions with respect to the envelope

h= max{|h_k,l|:h_k,l ∈ H_n,1≤k, l≤d}. The related covering number N

H_n,k·k_L2(P), εkhk_L2(P)

satisfies, for all ε ∈]0,1[

and for all probability measuresP on (S,S), N

H_n,k·k_L2(P), εkhk_L2(P)

≤ A

ε ν

,

whereAandν are postive constant named the VC characteristics ofH_n. Assumptions 1 and 6 implyh≤ _nh^D²2^G²

nb²_n, then we obtain, for allh∈ H_n, E(h(Y))≤ D²G²

h²_nnb²_n and E h²(Y)

≤ D⁴G⁴ h⁴_nn²b⁴_n.

Takingµn= _nb^D²2^G²

nh²_n andσ_n² = _h^D4⁴^G⁴

nn²b⁴_n, we can apply Talagrand’s inequality (see [14] and Proposition 2.2 in [7]): there exist positive constantsK₁ and K₂, depending only on Aand ν, such that for allt>K₁

"

µ_nlog^Aµ_σⁿ+√ nσ

q log^Aµ_σⁿ

n

# ,

P (

sup

h∈H_n

n

X

i=1

h(Yi)−E

h(Y)

> t )

≤ K2exp







− 1 K2

t µn

log





1 + tµn

K₂√

nσ_n+µ_n q

log^Aµ_σⁿ

n

2











 ,

that is,

P (

sup

1≤k,l≤d

1 n

n

X

i=1

bgk,n(Yi)bgl,n(Yi) fb_b²

n(Y_i) −E bgk,n(Y)bgl,n(Y) fb_b²

n(Y)

!

> t )

≤ K₂exp (

− 1 K₂

tnb²_nh²_n

D²G² log 1 + th²_nb²_n K₂D²G² 1 +√

logA2

!)

. (5)

Since hn∼n^−c¹ and bn∼n^−c² , we have limn→∞ hn

n^−c¹

=B1 and limn→∞ bn

n^−c²

= B₂, where B₁ >0 and B₂ >0. Thus for ε such as 0 < ε < min (1, B₁, B₂) and n is large enough, we have

B₁−ε < h_n

n^−c¹ < B₁+ε and B₂−ε < b_n

n^−c² < B₂+ε, that is

n^−c¹(B1−ε)< hn< n^−c¹(B1+ε) and n^−c²(B2−ε)< bn< n^−c²(B2+ε).

(14)

Then, puttingδ= (B1−ε)²(B2−ε)², we deduce from (5) that

P (

sup

1≤k,l≤d

1 n

n

X

i=1

bg_k,n(Y_i)bg_l,n(Y_i) fb_b²

n(Yi) −E bg_k,n(Y)gb_l,n(Y) fb_b²

n(Y)

!

> t )

(6)

≤ K2exp (

− 1 K2

δ t

D²G²n^1−2(c¹^+c²⁾×log 1 + δ t K2D²G² 1 +√

logA2n^−2(c¹^+c²⁾

!)

Let us put t_n =

logn n

1/2−2(c1+c2)

; Assumption 7 implies 0 < c₁ +c₂ < 1/4 and, consequently, thatα= 1/2−2(c1+c2) is strictly positive. Then, limn→+∞(logn)^α = +∞ and, therefore, putting U = K₁G D√

logA we have for nlarge enough (logn)^α>2U =U(1 + 1)>U 1 +

rlogA n

!

=U √

n+√ logA

√n

,

that is

logn n

¹

2−2(c₁+c2)

> K1

D²G² nb²_nh²_n ×p

logA √

n+p logA

what means that

tn>K1

"

µnlogAµ_n σn

+√ nσ

r

logAµ_n σn

# .

Then, (6) can be applied tot_n and we obtain

P (

sup

1≤k,l≤d

1 n

n

X

i=1

gbk,n(Yi)bgl,n(Yi) fb_b²

n(Y)

!

> t_n )

≤v_n

where

v_n=K₂exp (

− 1 K2

δ√

n(logn)^α

D²G² log 1 + δ(logn)^α K₂D²G²√

n 1 +√ logA2

!) .

Clearly,vn∼wnas n→+∞, where w_n=K₂exp

(

− δ²(logn)^2α K₂²D⁴G⁴ 1 +√

logA2

) ,

and since P+∞

n=1w_n < +∞, we deduce that P+∞

n=1v_n < +∞. Then from the above inequality it follows that

+∞

X

n=1

P (

sup

1≤k,l≤d

1 n

n

X

i=1

bgk,n(Yi)bgl,n(Yi) fb_b²

n(Y)

!

> tn

)

<+∞,

(15)

and the required result is obtained from Borel Cantelli’s lemma.

4.3. Proof of Theorem 2 Let us consider

R_b_n_,j(y) = g_j(y)

f_b_n(y), I_kl⁽¹⁾(y) = g_k(y)g_l(y) f_b²

n(y) =R_b_n_,k(y)R_b_n_,l(y),

I_kl⁽²⁾(y) = g_k(y)bg_l,n(y) +g_l(y)bg_k,n(y) f_b²

n(y) = R_b_n_,k(y)bg_l,n(y)

f_b_n(y) +R_b_n_,l(y)bg_k,n(y) f_b_n(y) , and

I_kl⁽³⁾(y) = 2R_b_n_,k(y)R_b_n_,l(y)fb_b_n(y) fbn(y).

Denoting bybλ⁽ⁿ⁾_k,l the element at the k-th row and the l-th column of the matrixΛb_n , it is known from [17] (see pp. 1059-1060) that

bλ⁽ⁿ⁾_k,l = 1 n

n

X

i=1

bg_k,n(Y_i)bg_l,n(Y_i) fb_b²

n(Yi)

!

= 1

n

X

i=1

n

I_kl⁽¹⁾(Y_i) +I_kl⁽²⁾(Y_i)−I_kl⁽³⁾(Y_i)o

−A_n+Bn+Cn−Dn, (7) where

An= 1 n

n

X

i=1

g_k(Yi)

gb_l,n(Yi)−g_l(Yi)

+g_l(Yi)

bg_k,n(Yi)−g_k(Yi)

fb_b²

n(Yi)−f_b²

n(Yi) fb_b²

n(Y_i)f_b²

n(Y_i)

! ,

Bn= 1 n

n

X

i=1

(bg_k,n(Y_i)−g_k(Y_i)) (bg_l,n(Y_i)−g_l(Y_i)) fb_b²

n(Y_i) ,

C_n= 1 n

n

X

i=1

R_(b_n_,k)(Y_i)R_(b_n_,l)(Y_i)

fb_b²

n(Y_i)−f_b²

n(Y_i)2

fb_b²

n(Yi)f_b²

n(Yi) , and

D_n= 1 n

n

X

i=1

fb_b²_n(Y_i)−f_b²_n(Y_i) 2

R_b_n_,k(Yi)R_b_n_,l(Yi) f_b²

n(Yi) .

(16)

First, we will obtain the rates of convergence of the sequencesE(An),E(Bn), E(Cn) andE(D_n) to 0 as n→+∞. Clearly,E(A_n) =E

A^(k,l)n

+E

A^(l,k)n

, where

A^(k,l)_n =g_k(Y)

gb_l,n(Y)−g_l(Y) fb_b²

n(Y)−f_b²

n(Y) fb_b²

n(Y)f_b²

n(Y)

! .

Further,

E

A^(k,l)_n

≤ 1 b⁴_nE

g_k(Y)

bg_l,n(Y)−g_l(Y)

fb_b²_n(Y)−f_b²_n(Y)

≤ 1 b⁴_nE

(

gk(Y)

bgl,n(Y)−gl(Y)

fbbn(Y)−fbn(Y) 2

+ 2f_b_n(Y)

fb_b_n(Y)−f_b_n(Y)

≤ 1 b⁴_nE

g_k(Y)

bg_l,n(Y)−g_l(Y)

fb_b_n(Y)−f_b_n(Y)2

+ 2

b⁴_nE

gk(Y)

bgl,n(Y)−gl(Y)

fbn(Y)

fbbn(Y)−fbn(Y)

≤ 1 b⁴_nE

g_k(Y)

bg_l,n(Y)−g_l(Y)

fb_n(Y)−f(Y)2

(8)

+ 2

b⁴_nE

gk(Y)

bgl,n(Y)−gl(Y)

fbn(Y)

fbn(Y)−f(Y)

.

Putting α_n = h⁴_n+n^−1/2h⁻¹_n logn and using Lemma 1, Cauchy-Schwartz inequality and Lemma 6, we obtain

E

g_k(Y)

bg_l,n(Y)−g_l(Y)

≤ α²_n v u u tE

bg_l,n(Y)−g_l(Y) 2!

q

E g_k²(Y)

≤M3 α²_nλ^1/2_n q

E g_k²(Y) ,

where λ_n = n⁻¹h⁻¹_n and M₃ is a positive constant. On the other hand, since for n large enoughf_b_n(Y)≤ kfk_∞, it follows

E

gk(Y)

bgl,n(Y)−gl(Y)

fbn(Y)

fbn(Y)−f(Y)

≤ α_nkfk_∞ v u u tE

bg_l,n(Y)−g_l(Y) 2!

q

E g²_k(Y)

≤M₃α_nλ^1/2_n kfk_∞ q

E g_k²(Y) .

Therefore, from (8) we deduce that

|E(A_n)|=O

b⁻⁴_n α_nλ^1/2_n

=O(β_n) (9)

(17)

whereβn=b⁻⁴_n αnλ^1/2n . In addition, E(Bn) = E

fb_b⁻²

n (Y) (bgk,n(Y)−gk(Y)) (bgl,n(Y)−gl(Y))

≤ b⁻²_n s

E

(bg_k,n(Y)−g_k(Y))² s

E

(bg_l,n(Y)−g_l(Y))²

≤ M₃²b⁻²_n λn; thus

|E(B_n)|=O b_n⁻²λ_n

. (10)

Next, we have

|E(C_n)| ≤ b⁻⁴_n E

|R_k(Y)R_l(Y)|

fb_b²_n(Y)−f_b²_n(Y)2

≤ b⁻⁴_n E |R_k(Y)Rl(Y)|

fbbn(Y)−fbn(Y) 4!

+ 4b⁻⁴_n E |R_k(Y)R_l(Y)|f_b_n(Y)

fb_b_n(Y)−f_b_n(Y) 3!

+ 4b⁻⁴_n E |R_k(Y)R_l(Y)|f_b²_n(Y)

fb_b_n(Y)−f_b_n(Y) 2!

≤ b⁻⁴_n α⁴_n+ 4kfk_∞α³_n+ 4kfk²_∞α²_n

E(|R_k(Y)R_l(Y)|). Thus

|E(Cn)|=O b⁻⁴_n α²_n

. (11)

Similarly, we have

|E(Dn)| ≤ b⁻²_n E

2

R_(b_n_,k)(Y)R_(b_n_,l)(Y)

= b⁻²_n E

"

|R_k(Y)R_l(Y)|

fb_b_n(Y)−f_b_n(Y) 4#

+ 4b⁻²_n E

"

|R_k(Y)R_l(Y)|f_b_n(Y)

fb_b_n(Y)−f_b_n(Y) 3#

+ 4b⁻²_n E

"

|R_k(Y)Rl(Y)|f_b²_n(Y)

fbbn(Y)−fbn(Y) 2#

≤ b⁻²_n α⁴_n+ 4kfk∞α³_n+ 4kfk²_∞α²_n

E(|R_k(Y)Rl(Y)|), what implies

|E(D_n)|=O α²_nb⁻²_n

. (12)

(18)

From Eq. (9) to Eq. (12), we obtain

|E(−A_n+Bn+Cn−Dn)|=O

n^−(1−4c¹^−2c²⁾logn

.

Then, from Eq.(7) we deduce that E

bλ⁽ⁿ⁾_k,l

=E h

I_kl⁽¹⁾(Y) +I_kl⁽²⁾(Y)−I_kl⁽³⁾(Y)i

+ ∆_n, (13)

where|∆_n|=O n^−(1−4c¹^−2c²⁾logn

. On the other hand,

E

I_kl⁽²⁾(Y)−I_kl⁽³⁾

≤E

R_b_n_,k(Y)bg_l,n(Y)

f_b_n(Y) −R_b_n_,k(Y)R_b_n_,l(Y)fb_b_n(Y) f_b_n(Y)

!

+E

Rbn,l(Y)bgk,n(Y)

f_b_n(Y) −Rbn,k(Y)Rbn,l(Y)fbbn(Y) f_b_n(Y)

! (14) and

E

R_b_n_,k(Y)bg_l,n(Y)

!

= E

R_b_n_,k(Y) f_b_n(Y)

bgl,n(Y)−Rbn,l(Y)fbbn(Y)

= E

Rbn,k(Y) f_b_n(Y)

bg_l,n(Y)−g_l(Y)fb_b_n(Y) f_b_n(Y)

!

≤ E

R_k(Y) f(Y)

bg_l,n(Y)−g_l(Y)fb_b_n(Y) f_b_n(Y)

! .

Then, using Cauchy-Schwarz inequality and Lemma 7 we obtain

n→+∞lim E

Rbn,k(Y)bgl,n(Y)

f_b_n(Y) −Rbn,k(Y)Rbn,l(Y)fbbn(Y) f_b_n(Y)

!

= 0.

Since, from similar arguments, we also obtain

n→+∞lim E

R_b_n_,l(Y)bg_k,n(Y)

!

= 0, we deduce from (14) that

n→+∞lim E

I_kl⁽²⁾(Y)−I_kl⁽³⁾

= 0. (15) Moreover, since

I_kl,n⁽¹⁾ (Y) ≤

gk(Y)gl(Y) f²(Y)

=|R_k(Y)R_l(Y)|, we can apply the dominated convergence theorem that gives:

n→+∞lim E

I_kl⁽¹⁾(Y)

=E

g_k(Y)g_l(Y) f²(Y)

=E

R_k(Y)R_l(Y)

=λ_k,l. (16)