• Aucun résultat trouvé

Nonparametric estimation of random effects densities in linear mixed-effects model

N/A
N/A
Protected

Academic year: 2021

Partager "Nonparametric estimation of random effects densities in linear mixed-effects model"

Copied!
30
0
0

Texte intégral

(1)

HAL Id: hal-00657052

https://hal.archives-ouvertes.fr/hal-00657052

Submitted on 5 Jan 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Nonparametric estimation of random effects densities in linear mixed-effects model

Fabienne Comte, Adeline Samson

To cite this version:

Fabienne Comte, Adeline Samson. Nonparametric estimation of random effects densities in linear mixed-effects model. Journal of Nonparametric Statistics, American Statistical Association, 2012, 24 (4), pp.951-975. �hal-00657052�

(2)

IN LINEAR MIXED-EFFECTS MODEL

FABIENNE COMTE(1) AND ADELINE SAMSON(1)

January 5, 2012

Abstract. We consider a linear mixed-effects model whereYk,j=αkktjk,jis the observed value for individualkat timetj,k= 1, . . . , N,j= 1, . . . , J. The random effects αk,βkare independent identically distributed random variables with unknown densities fα and fβ and are independent of the noise. We develop nonparametric estimators of these two densities, which involve a cutoff parameter. We study their mean integrated square risk and propose cutoff-selection strategies, depending on the noise distribution assumptions. Lastly, in the particular case of fixed interval between times tj, we show that a completely data driven strategy can be implemented without any knowledge on the noise density. Intensive simulation experiments illustrate the method.

Keywords. Linear mixed-effects model, Nonparametric density estimation, Random effect density

MSC 2010 subject classification 62G07

1. Introduction

Longitudinal data and repeated measurements along time of a process are classically analyzed with mixed-effects models. This allows taking into account both the inter-subjects and the intra-subjects variabilities. In this paper, we focus on a simple linear mixed-effects model written as follows. Let Yk,j denote the observed value for individual k at time tj, fork= 1, . . . , N,j= 1, . . . , J. The linear mixed-effects model is defined as

(1) Yk,jkktjk,j, k= 1, . . . , N j= 1, . . . , J,

where(αk, βk) represent the individual random variables of subject k, also called random effects, and(εk,j) are the measurement errors. We assume that:

[A1] times(tj)1jJ are known and deterministic,

[A2] measurement errors εk,j are independent identically distributed (i.i.d.) with a densityfε, such that E(eiuε)6= 0, for any u∈R,

[A3] variables(αk, βk) are i.i.d. and we denote by fα and fβ the densities ofα1 and β1,

[A4] the sequence(αk, βk)1kN is independent of the sequence(εk,j)1kN,1jJ. When the random effects (αk, βk) and errors(εk,j) are normally distributed, the maxi- mum likelihood has been widely studied, the marginal density of Y having a closed form

(1). MAP5, UMR CNRS 8145, Université Paris Descartes, Sorbonne Paris Cité, 45 rue des Saints Pères, 75006 Paris, France.

1

(3)

[see e.g. Pinheiro and Bates, 2000, and references therein]. However, the normality as- sumption of the random effects may be inappropriate in some situations. For example when an important covariate is omitted, a bimodal density may be more pertinent. As recalled by Ghidey et al. [2010], wrongly assuming normality of the random effects can lead to poor estimation results. Some authors propose to relax this assumption by developing estimation of the random effects density first four moments [see e.g. Wu and Zhu, 2010, and references therein]. But estimation of the random effects complete density may even be more appropriate, especially if the true density is multimodal. Our aim is therefore to estimate the densitiesfα and fβ.

Several approaches have been proposed for this purpose. Shen and Louis [1999] consider a smoothing by roughening method without any assumption onfε. AssumingεGaussian, Zhang and Davidian [2001], Chen et al. [2002], Vock et al. [2011] propose a semi nonpara- metric approach based on the approximation of the random effects density by an Hermite series. Verbeke and Lesaffre [1996] develop an heterogeneity model where the random effects have a finite mixture Gaussian density. Ghidey et al. [2004] propose a penalised Gaussian mixture approach. Morris and Carroll [2006] use a wavelet-based approach. Non parametric maximum likelihood has also been studied by Laird [1978], Mallet et al. [1988], Kuhn [2003], Chafaï and Loubes [2006]. Recently Antic et al. [2009] compare several of these approaches with an intensive simulation study.

In this paper, we consider a different approach based on deconvolution tools. Deconvolu- tion has been widely studied in various contexts. First, the noise density was systematically assumed known and different estimators have been proposed: kernel estimators (e.g. Ste- fanski and Carroll [1990], Fan [1991]), kernel estimators with bandwidth selection strategies (Delaigle and Gijbels [2004]), wavelet estimators (Pensky and Vidakovic [1999]), or pro- jection methods with model selection (Comte et al. [2006]). Then several extensions have been considered to relax the assumption about noise density knowledge. Neumann [1997]

first studied the case where the noise density is estimated from a preliminary noise sample.

A complete adaptive procedure has been provided by Comte and Lacour [2011]. Recently, several papers focus on repeated observations, which provide another way to estimate the noise density, see Neumann [2007], Delaigle et al. [2008], Meister and Neumann [2010] or Comte et al. [2011].

We propose now a repeated observations strategy applied to mixed-effects models. We describe different estimators whether the noise density is partly or completely unknown.

A preliminary estimation of fε Fourier transform is used, if needed. We study the risk bounds of these estimators and show how the context of longitudinal data affects the variance terms of the bounds. We also propose cut-off selections depending on the noise distribution assumptions. Especially, a completely data driven estimator is proposed in the case of unknown noise density.

The paper is organized as follows. Sections 2 and 3 present the estimators offα andfβ, their risk bounds and the cut-off selection assumingfε is known. In section 4, the partic- ular case of a Gaussian noisefε is considered. We propose an optimal cut-off selection in the two casesσεknown or unknown. Finally, the general case withfεcompletely unknown is considered in section 5 and new estimators and cut-off selections are proposed. Perfor- mances of the different estimators are evaluated by simulation in section 6. Concluding remarks are proposed in section 7. Proofs are gathered in Appendix.

(4)

2. Definition of the estimators with a known noise density

2.1. Model and notations. In this section, we consider model (1) under Assumptions [A1]–[A4] and the additional assumption:

[A5] the noise densityfε is known.

For the estimation of fα, we shall distinguish two cases.

– First case, the observation at time 0 is available. For notation simplicity, we will denote byYk,0 the observation associated to timet0= 0, and we will considerJ as the number of other observations withtj 6= 0.

– Second case, no observation is available at time 0, then we have tj 6= 0 for all j= 1, . . . , J.

For the estimation offβ, we do not distinguish the two cases. We shall also consider that the time sequence(tj)1jJ is in increasing order.

In the following, for notation simplicity and without loss of generality, we will assume thatJ is even. Then forj= 1, . . . , J/2, we denote by

j =t2j−t2j1 the time step between two successive observations.

Assumptions [A1]–[A4] on model (1) imply that for a given j, (Yk,j)k=1,...,N are i.i.d.

Thus we denote byfYj the density of Yk,j.

We also need few notations related to Fourier transform theory. If f is an integrable function, then we denote by f(u) = R

eiuxf(x)dx the Fourier transform of f on R. For two real valued square integrable functionsf and g, we denote the convolution product of f andgby(f ⋆ g)(x) =R

f(x−y)g(y)dyand we recall that, iff andgare both integrable and square integrable, then (f ⋆ g) = fg. If f is integrable and square integrable we recall that inverse Fourier transform formula yieldsf(x) = 1/(2π)R

eixuf(u)du.

2.2. Estimator of fβ. We first remark that by introducing the difference between two successive observations normalized by the length of the time interval, forj= 1, . . . , J/2,

Zk,j= Yk,2j −Yk,2j1

j ,

we have

(2) Zk,jkk,2j−εk,2j1

j .

For a fixedj, the variables (Zk,j)k=1,...,N are i.i.d. but the variablesZk,j and Zk,l for j6=l are not independent. Let us denote fZj the density of the variables Zk,j. It follows from (2) and the independence of(βk) and(εk,j) under [A4], that

(3) fZj =fβ⋆ fk,2jεk,2j−1)/∆j.

Thus, by noting that, for allj= 1, . . . , J/2, fk,2jεk,2j−1)/∆j(u) = E

eiu

εk,2jεk,2j−1

j

=E ei

u

jεk,2j

ei

u

jεk,2j−1

= E ei

u

jε E

ei

u

jε

= fε( u

j)

2

,

(5)

we get, by taking the Fourier transform of equality (3), (4) fZj(u) =fβ(u)|fε(u/∆j)|2. It follows from (4) that, for allj = 1, . . . , J/2,

fβ(u) = fZ

j(u)

|fε(u/∆j)|2.

In order to exploit all the available observations, we can also write:

fβ(u) = 2 J

XJ/2 j=1

fZ

j(u)

|fε(u/∆j)|2. Now, Fourier inversion implies

(5) fβ(x) = 1 2π

Z

eiuxfβ(u)du= 1 2π

Z

eiux2 J

XJ/2 j=1

fZj(u)

|fε(u/∆j)|2du.

This formula allows us to define the estimator of fβ based on the natural estimator of fZj(u)

(6) fcZj(u) = 1 N

XN k=1

eiuZk,j = 1 N

XN k=1

eiu

Yk,2jYk,2j−1

j ,

Plugging (6) in (5) would give a proposal but may induce convergence problems of the integral. Thus, we introduce a cutoffπm in the integral defining the estimator of fβ:

fdβ,m(x) = 1 2π

Z πm

πm

eixu2 J

XJ/2 j=1

fcZj(u)

|fε(u/∆j)|2du wherefcZ

j(u) is given by (6). To summarize, our proposal to estimate fβ is:

(7) fdβ,m(x) = 2

N J XN k=1

XJ/2 j=1

1 2π

Z πm

πm

eixu eiuZk,j

|fε(u/∆j)|2du Note that under [A5], the estimator can indeed be computed.

2.3. Estimator of fα. First, we can notice that, if observations for t0 = 0 are available, then we have

Yk,0kk,0, k= 1, . . . , N.

This model is a classical deconvolution model. Thus, we propose to estimate fα with the deconvolution estimator proposed by Fan [1991] with specific kernel as in Comte et al.

[2006]:

(8) fdα,m0 (x) = 1

2πN XN k=1

Z πm

πm

eixueiuYk,0 fε(u)du.

Its theoretical properties have been first studied by Stefanski and Carroll [1990], Fan [1991], and then by Comte et al. [2006] for cutoff selection and in more general context of functional

(6)

regularities (see also Pensky and Vidakovic [1999] for such ideas in wavelet framework).

Now, we also provide an estimator when observations at t0 = 0 are not available. In that case, we follow a construction similar to the one used forfdβ,m. Set, forj= 1, . . . , J/2,

Vk,j= Yk,2j

t2j −Yk,2j1 t2j1 . By definition ofYk,j, we have

Vk,j = 1

t2j − 1 t2j1

αk+

εk,2j

t2j −εk,2j1 t2j1

Remark that for a fixedj, the variables (Vk,j)k=1,...,N are i.i.d. but the variables Vk,j and Vk,l for j 6= l are not independent. Let us denote by fVj the Fourier transform of the density of the variablesVk,j and

pj = 1

t2j − 1 t2j1. We have, for allj= 1, . . . , J/2,

(9) fα(u) = fVj(u/pj)

fε(u/(pjt2j))fε(−u/(pjt2j1)).

A natural estimator offα(u)would be to compute the mean of the estimators of (9) for j= 1, . . . , J/2. However, this choice can lead to numerical instability because the quantity 1/(pjt2j)involved in the denominator of (9) can be large for large values ofj. Indeed, when

j = ∆is fixed, thentj =j∆,1/(pjt2j) =−(2j−1)and fε(u/(pjt2j)) =fε(−(2j−1)u).

Since fε tends to zero near infinity, fε(u/(pjt2j)) decreases when j increases. Therefore the estimator offα based on (9) may artificially take large values for large j. It shall be noted this is not the case for the estimator offβ(u)which only involves the step size ∆j.

Thus, for numerical reasons, we propose an estimator of fα(u) which is only based on the first observationVk,1

fcα(u) = 1 N

XN k=1

eiVk,1u/p1

fε(u/(p1t2))fε(−u/(p1t1)) Finally, the estimator offα is defined by

(10) fdα,m(x) = 1 2πN

XN k=1

Z πm

πm

eiux eiVk,1u/p1

fε(u/(p1t2))fε(−u/(p1t1))du 3. Risk bounds and cutoff selection with known noise density

3.1. Risk bound for the estimator offβ. Let us definefβ,msuch thatfβ,m =fβ1[πm;πm]. The function fβ,m is the function which is in fact estimated by fdβ,m. We wish to bound the mean integrated squared error (MISE) defined byE

||fβ −fdβ,m||2

. We first remark

(7)

that the MISE is the sum of the integrated bias and the integrated variance:

E dfβ,m−fβ 2 =

E

fdβ,m

−fβ

2+E dfβ,m−E

fdβ,m2

We can easily calculate the expectationE fdβ,m

of the estimatorfdβ,m. We have

E

fdβ,m(x)

= 2

N J XN k=1

XJ/2 j=1

1 2π

Z πm

πm

eiux E eiuZk,j

|fε(u/∆j)|2 du

= 2

N J XN k=1

XJ/2 j=1

1 2π

Z πm

πm

eiux fZj(u)

|fε(u/∆j)|2 du

= 1

2π Z πm

πm

eiuxfβ(u)du=fβ,m(x)

where the last line follows from (4). Therefore the pointwise bias is

fβ(x)−E

fdβ,m(x)

= 1

2π Z

eiux(fβ(u)−fβ,m (u))du= 1 2π

Z

|u|≥πm

eiuxfβ(u)du,

and the integrated bias is equal to

kfβ−fβ,mk2= 1 2π

Z

|u|≥πm|fβ(u)|2 du.

To compute the integrated variance we first write, using Parseval formula,

dfβ,m−fβ,m2 = 2 J

1 2π

XJ/2 j=1

Z

eiux fcZ

j(u)−fZ

j(u)

|fε(u/∆j)|2 1[πm,πm](u)du

2

= 1

2π Z πm

πm

2 J

XJ/2 j=1

fcZj(u)−fZj(u)

|fε(u/∆j)|2

2

du

= 1

2π Z πm

πm

2 N J

XJ/2 j=1

XN k=1

eiuZk,j −E(eiuZk,j)

|fε(u/∆j)|2

2

du (11)

(8)

Thus, we get E dfβ,m−fβ,m

2

= 2

πN J2 Z πm

πm

Var

 XJ/2 j=1

eiuZ1,j

|fε(u/∆j)|2

 du

= 2

πN J2 Z πm

πm

 XJ/2 j=1

Var(eiuZ1,j)

|fε(u/∆j)|4 + XJ/2 j,j=1,j6=j

cov(eiuZ1,j, eiuZ1,j)

|fε(u/∆j)|2fε(u/∆j)2

du

≤ 2

πN J2 Z πm

πm

 XJ/2 j=1

1

|fε(u/∆j)|4 + XJ/2 j,j=1

(1− |fβ(u)|2)

du

≤ 4

N J2 XJ/2 j=1

1 2π

Z πm

πm

du

|fε(u/∆j)|4

! +m

N

Note that when the observation times are equally spaced (∆j = ∆), this reduces to E dfβ,m−fβ,m2 ≤ 1

πN J Z πm

πm

du

|fε(u/∆)|4 +m N

This result shows that we reduce the variance of a factor1/J by taking the mean of allJ available values ofj in (7).

To summarize, the following result holds.

Proposition 1. Consider Model (1) under Assumptions [A1]–[A5] andfdβ,m the estimator given by (7). Iffβ is integrable and square-integrable, then

(12) E dfβ,m−fβ2 ≤ 1 2π

Z

|u|≥πm|fβ(u)|2 du+ 4 N J2

XJ/2 j=1

1 2π

Z πm

πm

du

|fε(u/∆j)|4

! +m

N Inequality (12) requires few comments. First, the term

1 2π

Z

|u|≥πm|fβ(u)|2du

is a squared bias term due to the truncation of the integral. It decreases whenmincreases, and the rate of decrease is faster when the function fβ is more regular. Indeed, classical regularity spaces considered for densityfβ on Rare described by:

[A6]fβ ∈ Ab(L) ={fβ ∈L1∩L2, R

|fβ(x)|2(x2+ 1)bdx≤L} withb >1/2,L >0.

Then, under [A6], we have the following bias order:

(13) kfβ−fβ,mk2 ≤CL(πm)2b.

Obviously, the larger the regularity indexb offβ, the faster the bias decreases.

The two other terms of inequality (12) are variance terms. They clearly increase whenm increases. Moreover, the first of these two terms is dominating: since|f(u)| ≤1,∀u∈R, we have Rπm

πmdu/|fε(u/∆j)|4 ≥ 2πm and it is usually much larger. For instance, if the noise is Gaussian,Rπm

πmdu/|fε(u/∆j)|4 is larger than e2ε(πm/∆j)22j/(2σ2επm).

We have therefore to find how to realize a compromise between the bias and the variance terms. This is the purpose of section 3.3 in which a cutoff selection is proposed. Note that

(9)

we can also provide pointwise risk bounds for the estimator fdβ,m; details about this are gathered in Appendix A.

3.2. Risk bound for the estimator offα. Let us definefα,msuch thatfα,m =fα1[πm;πm]. The functionfα,m is the function which is in fact estimated byfdα,m. Using similar calcu- lations to those used forfdβ,m, we get the following global MISE bound.

Proposition 2. Consider Model (1) under Assumptions [A1]–[A5] and estimator fdα,m

given by (10). Iffα is integrable and square integrable, then we have

(14) E dfα,m−fα

2 ≤ 1

2π Z

|u|≥πm|fα(u)|2 du+ 1 2πN

Z πm

πm

du fε

u p1t2

fε

u p1t1

2 +m

N.

The same comments as for bound (12) apply here.

3.3. Cutoff selection. We propose the following model selection procedure for choosing a relevant cutoffm for the estimators fdβ,m and fdα,m. We define forω =α or ω=β

ˆ

mω = arg min

m∈Mω,N

n−kfdω,mk2+ penω(m)o ,

where forω =β:

(15) penβ(m) =κβ

 4 N J2

XJ/2 j=1

1 2π

Z πm

πm

du

|fε(u/∆j)|4

! + m

N

,

and forω=α,

(16) penα(m) =κα

 1 2πN

Z πm

πm

du fε

u p1t2

fε

u p1t1

2 + m

N.

.

Hereκβ and κα are constants which are calibrated once for all on preliminary simulation experiments. Moreover, we set

Mω,N ={m∈ {1, . . . N}, such thatpenω(m)≤1}. We can prove the following result.

Theorem 1. Consider Model (1) under Assumptions [A1]–[A5] with fω integrable and square integrable. Assume that the noise is ordinary smooth, i.e. that there exist two constantscǫ, Cǫ such that, ∀x∈R,

(17) cǫ(1 +x2)δ ≤1/|fε(x)|2 ≤Cǫ(1 +x2)δ. Then, for ω=α, β,

(18) E

kf\ω,mˆω−fωk2

≤C inf

m∈Mω,N kfω−fω,mk2+ penω(m) +C

N, whereC and C are constants depending on the problem.

(10)

Inequality (18) shows that the resulting estimator automatically realizes the squared- bias/variance trade-off, up to a multiplicative constant C. For the sake of simplicity, the result is given only in the ordinary smooth case. It can be obtained in general setting of ordinary or super smooth noise, provided that a factor is added in the penalty for the super smooth case, as detailed in Comte et al. [2006]. In practice, in the case of super smooth noise, we multiply the penaltypenβ by the factor

(19) log

 4 N J2

XJ/2 j=1

1 2π

Z πm

πm

du

|fε(u/∆j)|4

!

/log(m+ 1)

and the penaltypenα by the factor

(20) log

 1 2πN

Z πm

πm

du fε

u p1t2

fε

u p1t1

2

/log(m+ 1).

Indeed, it is shown in Comte and Lacour [2011] that this is a slight over-penalization which has the advantage of being easy to generalize to the unknown noise case.

4. Special case of Gaussian noise

In this section, we assume that fε is Gaussian and centered, and that [A6] holds. We distinguish two cases whetherσε2 is known or not.

4.1. Cutoff choice forfdβ,m whenσ2ε is known. The Fourier transform offεisfε(u) = exp(−σε2u2/2). Let∆min be a constant such that for all j = 1, . . . , J,∆min ≤∆j. Then we get

Z πm

πm

du

|fε(u/∆j)|4 ≤ Z πm

πm

exp(2σ2ε(u/∆min)2)du

≤ 2πmexp(2σε2min2 π2m2) which, associated to [A6] and inequality (12), gives the following result.

Proposition 3. Consider Model (1) under Assumptions [A1]–[A6] with fβ integrable and square integrable, and assume that ε is Gaussian and ∆j ≥ ∆min, ∀j = 1, . . . , J . Then the choice

(21) m0,β=m0,βε) = κβlog(N J) 4π2σε2min2

!1/2

gives the boundE(kf\β,m0,β−fβk2) =O(1/[log(N J)]b) provided that κβ <1.

The consequence is that the convergence rate of the estimator is logarithmic, which is rather slow. Nevertheless, simulation experiments show that deconvolution estimators behave well also in this setting. It is easy to see that the rate will be much improved iffβ is also Gaussian or more generally super-smooth.

(11)

4.2. Cutoff choice for fdα,m when σε2 is known. Now, we assume that∆j = ∆is fixed.

The main variance term of Inequality (14) withp1t1 =−1/2 and p1t2=−1 is of order:

2 πN

Z πm

πm

exp(5σε2u2)du ≤ 4m

N exp(5π2σε2m2).

Under [A6], the bias order is given by (13). We deduce that a good choice ofm is

(22) m0,α=m0,αε) =

καlog(N) 5π2σε2

1/2

withκα<1. It is worth noticing that here and contrary to m0,β, the choicem0,α does not depend on the step∆.

4.3. Cutoff selection when σε2 is unknown. The optimal choicesm0,β and m0,α pro- vided in the two previous sections depend on σε2. When this variance is unknown, we propose to replace it by an estimator. A natural estimator can be obtained based on the following relations

V ar(Yk,j) =σj2 = σα2 +t2jσβ2+ 2tjσα,βε2 (23)

cov(Yk,j, Yk,j) =σjj2 = σα2 +tjtjσβ2+ (tj+tjα,β (24)

which hold for all k, j, j and where σα2, σβ2 and σα,β are the variances of αk, βk and the covariance of(α1, β1), respectively. SetY.j= N1 PN

k=1Yk,j. The following quantities ˆ

σY.j2 = 1 N

XN k=1

(Yk,j−Y.j)2, and σˆY.j,Y.j2 = 1 N

XN k=1

(Yk,j−Y.j)(Yk,j −Y.j) are natural estimators of σ2j and σjj2, respectively. From equations (23) and (24), let us define the4×4- matrix A

A=





1 J3 PJ/3

j=1t23j J3 PJ/3

j=1t3j 1

1 J3 PJ/3

j=1t23j1 J3 PJ/3

j=1t3j1 1 1 J3 PJ/3

j=1t23j2 J3 PJ/3

j=1t3j2 1 1 J(J21)PJ/2

1j<jtjtj 2 J(J1)

PJ/2

1j<j(tj+tj) 0





,

and the vector

ΣbY =





3 J

PJ/3 j=1σˆY.3j2

3 J

PJ/3

j=1ˆσY.3j2 1

3 J

PJ/3

j=1ˆσY.3j2 2

2 J(J1)

PJ/2

1j<jσˆY.j,Y.j2





.

We assume thatAis invertible. In this case, we deduce from equations (23)-(24) estimators ofσ2 = (σ2αβ2α,βε2), which are defined as

b

σ2 = ˆσ2α,σˆβ2,σˆαβ,σˆ2ε

=A1ΣbY

(12)

AsΣbY is a M-estimator of parameters(3JPJ/3

j=1σ3j2 ,J3 PJ/3

j=1σ3j21,J3 PJ/3

j=1σ3j22,J(J21)PJ/2

1j<jσ2jj), ifE(|Yk,j|4)<∞, there exists an explicit matrixI such that

√N

bΣY

3 J

XJ/3 j=1

σ3j2 ,3 J

XJ/3 j=1

σ3j21,3 J

XJ/3 j=1

σ3j22, 2 J(J −1)

XJ/2 1j<j

σjj2

−−−−→L

N→∞ N(0,I) We deduce that

(25) √

N bσ2−σ2 L

−−−−→

N→∞ N(0, A1IA1)

We assume that there exists a known upper bound for the unknown value ofσε, denoted byσε,max. By pluggingσˆε2 in the definition (21) of m0,β, we obtain a random cutoff m:

(26) m[0,β =m0,β(ˆσε)/√

2∧mn=

plog(N J) 2√

2ˆσεmin1 π ∧mn, wheremn=m0,βε,max)/√

2.

The study of f\β,m[0,β is complex in that case. We can prove the following upper bound of the integrated risk.

Proposition 4. Consider Model (1) under Assumptions [A1]–[A6], and assume that ε is Gaussian N(0, σ2ε) with unknown σε ≤ σε,max. Then the estimator f\β,md0,β with fdβ,m defined by (7) andmd0,β defined by (26) is such that

(27) E(kf\β,md0,β −fβk2)≤C

[log(N J)]b+ 1 N

.

It follows from Inequality (27) that the estimator f\β,m[0,β automatically reaches its best possible rate, without requiring any information on the unknown function.

A similar proposal can be done for fdα,m0 .

5. Estimators with unknown noise density

In this section, we consider Model (1) under Assumptions [A1], [A3], [A4] and Assump- tion [A2’] replacing Assumption [A2]:

[A2’] Assumption [A2] holds and the measurement errors εk,j are symmetric.

Note that the symmetry of the noise together with the conditionfε(u)6= 0 imply thatfε takes values inR+/{0}, i.e. |f(u)|=f(u)>0, ∀u∈R.

Furthermore, we restrict to the following assumption on the observations design:

[A7]∆j = ∆for allj, such that tj =j∆and J ≥6.

5.1. Estimator of fε. Under Assumptions [A2’] and [A7], we can propose an estimator of the densityfε. Indeed, let us introduce

Wk = Zk,2−Zk,1

= βkk,4−εk,3

∆ −βk−εk,2−εk,1

∆ = 1

∆(εk,4−εk,3−εk,2k,1)

(13)

which implies that

fW (u) =E(eiuWk) =E(cos(uWk)) =|fεu

|4 = fε(u

∆)4

.

Under Assumption [A2’], we can estimate(fε)4 via (f[ε)4(u/∆) = 1

N XN k=1

cos(uWk).

Let us define estimators offβ and fα when fε is unknown. For numerical reasons, we can not directly plug estimators of1/(fε)2 and1/fε in (4) and (9) by considering1/((f[ε)4)1/2 and1/((f[ε)4)1/4. Therefore, following Neumann [1997], Comte and Lacour [2011] or Comte et al. [2011], we define a truncated estimator of1/(fε)2

1

(f]ε)2(u) = 1\(f

ε)4(u)N−1/2

h(f[ε)4(u)i1/2

to be plugged infdβ,m and a truncated estimator of1/fε 1

ffε(u) = 1\(f

ε)4(u)N−1/2

h(f[ε)4(u)i1/4

to be plugged infdα,m0 and fdα,m.

The error induced by the truncation is studied in the following lemma, which is an extension of Neumann [1997]’s lemma for the case we study here.

Lemma 1. Assume that Assumption [A2’] holds.

(1) There exists a constant C0 such that

(28) E

1

(ffε)2(u) − 1 (fε)2(u)

2

≤ 2

|fε(u)|4 ∧ C0N1/2

|fε(u)|8 ∧ C0N1

|fε(u)|12. (2) There exists a constant C1 such that

(29) E

1

ffε(u) − 1 fε(u)

2

≤ 1

|fε(u)|2 ∧ C1 min

p∈{1,2,3,4}

Np/4

|fε(u)|2+2p

! .

5.2. Estimator of fβ with unknown fε. We can now define an estimator for fβ by plugging(f]ε)2in (4). Under Assumption [A7],J ≥6and the observations used to estimate fεcan be different from those used to estimatefβ. The estimator offβwhenfεis unknown is denoted byfgβ,m and defined by

(30) fgβ,m(x) = 2

N(J−4) XN k=1

XJ/2 j=3

1 2π

Z πm

πm

eiu(xZk,j) (f]ε)2(u/∆)du.

(14)

Applying Lemma 1 shows that the risk bound in the case of estimated noise density is getting more complicated. We denote for any functionf integrable and square integrable

Dk(m, f) = 1 2π

Z πm

πm

|f(u)|2

|fε(u/∆)|2kdu.

Then we obtain the following risk bound

Proposition 5. Consider Model (1) under Assumptions [A1], [A2’], [A3], [A4], [A7].

Assume moreover thatfβ is integrable and square-integrable, then the estimator defined by (30) satisfies

E(kfgβ,m−fβk2)≤ kfβ,m−fβk2+ 16D2(m,1) N(J−4)+ 4C0

D2(m, fβ)

√N

D4(m, fβ) N

+ 6m

N, whereC0 is defined in Lemma 1.

This risk bound implies the usual bias termkfβ,m−fβk2and three terms of variance. The terms16D2(m,1)/[N(J−4)]and6m/N correspond to the terms obtained in Inequality (12) of Proposition 1 for knownfε. The additional term,4C0(D2(m, fβ)/√

N)∧(D4(m, fβ)/N) comes from estimatingfε. If|fβ|2decreases faster than|fε|8, it can happen thatD4(m, fβ) is bounded by a fixed constant. Then this term is negligible compared toD2(m,1)/[N(J− 4)]. Moreover, in Comte et al. [2011], in a context of repeated measurements, similar variance terms are also obtained and their simulation experiments show that the first variance term remains the dominating one. We conjecture that the same thing happens here, even if the estimation may be more difficult.

For the adaptive procedure, we customize the proposals of Comte and Lacour [2011] to the present case. More precisely, we replace fε by its estimate in the penalty and in the definition of the collection of cutoffs. For super smooth noise, an additional multiplicative factor has to be added. Therefore, when the noise density is unknown, this multiplicative factor must be systematically added “blindly". Practically, our estimator isf^β,m˜β defined by (30) with

˜

mβ = arg min

mMfβ,N

n−kfgβ,mk2+pengβ(m)o with

g

penβ(m) = ˜κβlog(DˆJ2(m,1)4 ) log(m+ 1)

4πDˆ2(m,1) N(J−4) +m

N

!

, Dˆ2(m,1) = 1 2π

Z πm

πm

du (f]ε)4(u) andMfβ,N ={m∈ {1, . . . , N},pengβ(m)≤1}.

For ordinary smooth noise, this multiplicative factor behaves roughly like a constant. For super smooth noise, it provides a slight overpenalization in the Gaussian case, as required by the theory, see Comte et al. [2006] and Comte and Lacour [2011].

5.3. Estimator for fα with unknown fε. Analogously, we define for the estimation of fα, either, if observations for j= 0,t0 = 0 are available

(31) fgα,m0 (x) = 1

2π Z πm

πm

eiux 1 N

XN k=1

eiuYk,0

ffε(u) du,

(15)

or when observations at time0are not available

(32) fgα,m(x) = 1

2π Z πm

πm

eiux 1 N

XN k=1

e2i∆uVk,3

ffε(−u)ffε(2u) du,

otherwise. In that second case, observations used for the estimation offε are not used for the estimation of the numerator.

Now we detail the risk bound of the estimator fgα,m0 deduced from Lemma 1.

Proposition 6. Consider Model (1) under Assumptions [A1], [A2’], [A3], [A4] and [A7].

Assume moreover that fα is integrable and square-integrable and observations for t0 = 0 are available, then the estimator defined by (31) satisfies

(33)

E(kfgα,m0 −fαk2)≤ kfα,m−fαk2+ 3 πN

Z πm

πm

du

|fε(u)|2+4C1 min

p∈{1,...,4}

Np/4

Z πm

πm

|fα(u)|2

|fε(u)|2pdu

! ,

whereC1 is defined in Lemma 1.

The first two terms of the right-hand-side of inequality (33) are the standard terms, which are also obtained for knownfε. The last term comes from the substitution offε by its estimate. The main difference with the deconvolution estimators studied in Comte and Lacour [2011] comes from the fact that we replace fε by a truncated estimator based on (f[ε)4, while in Comte and Lacour [2011], the truncated estimator is based on an estimator of fε. We notice that a similar (but still different) phenomenon happens for fgβ,m where we replace (fε)2 by a truncated estimator based on (f[ε)4. The risk bound forfgβ,m is in fact similar to the one found in Comte et al. [2011].

If fα is very smooth, and in particular much smoother than fε, then the integrals Rπm

πm|fα(u)|2/(fε(u))2pdu may be convergent and the last term negligible. For instance, if α is Gaussian and ε is Laplace i.e. ε = σεη/√

2 where η has density fη(x) = e−|x|/2, then, forp= 1,2,3,4,

Z πm

πm

|fα(u)|2

(fε(u))2pdu≤κp

and the last term is less than (2C1κ4/π)/N. If α is Gaussian with variance σ2α and ε is Gaussian with varianceσ2ε, then the same behavior happens provided thatσ2α>4σε2.

Practically, our estimator is f^α,0m˜α defined by (31) with

˜

mα = arg min

mMfα,N

n

−kfgα,m0 k2+peng0α(m)o with

(34) peng0α(m) = ˜κ0α log(2πIˆ) log(m+ 1)

2πIˆ

N , Iˆ= 1 2π

Z πm

πm

du (f]ε)2(u) andMfα,N ={m∈ {1, . . . , N},peng0α(m)≤1}.

A similar study can be performed for the estimator fgα,m.

Références

Documents relatifs

Nonparametric density estimation in compound Poisson process using convolution power estimators.. Fabienne Comte, Céline Duval,

In this section, we adapt the results of Kappus and Mabon (2014) who proposed a completely data driven procedure in the framework of density estimation in deconvo- lution problems

The method permits to handle various problems such as addi- tive and multiplicative regression, conditional density estimation, hazard rate estimation based on randomly right

The estimation of the regression function and its derivatives is a largely addressed subject and several types of estimators can be used such as smooth- ing splines (more

In a second time, basing on these result, a functional Weissman estimator is derived, permitting to estimate large conditional quantiles of arbitrary large order.. While

Abstract. Since its introduction, the pointwise asymptotic properties of the kernel estimator f b n of a probability density function f on R d , as well as the asymptotic behaviour

First, the noise density was systematically assumed known and different estimators have been proposed: kernel estimators (e.g. Ste- fanski and Carroll [1990], Fan [1991]),

The first result, which has an extremely simple proof, is based on direct application of an exponential bound for empirical processes indexed by V C classes of functions from Giné