HAL Id: hal-00079007
https://hal.archives-ouvertes.fr/hal-00079007
Preprint submitted on 8 Jun 2006
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Semi-parametric estimation of the hazard function in a model with covariate measurement error
Marie-Laure Martin-Magniette, Marie-Luce Taupin
To cite this version:
Marie-Laure Martin-Magniette, Marie-Luce Taupin. Semi-parametric estimation of the hazard func-
tion in a model with covariate measurement error. 2006. �hal-00079007�
SEMI-PARAMETRIC ESTIMATION OF THE HAZARD FUNCTION IN A MODEL WITH COVARIATE MEASUREMENT ERROR
MARIE-LAURE MARTIN-MAGNIETTE^{1,2}, MARIE-LUCE TAUPIN^3
^1 I.N.A. Paris-Grignon, Mathématique et Informatique Appliquées, Paris, France
^2 I.N.R.A., Unité de Recherche en Génomique Végétale, Évry, France
^3 Université Paris-Sud, Orsay, and Université Paris René Descartes, IUT de Paris
Abstract. We consider a model where the failure hazard function, conditional on a time-independent covariate $Z$, is given by $R(t, \theta_0 \mid Z) = \eta_{\gamma_0}(t) f_{\beta_0}(Z)$, with $\theta_0 = (\beta_0, \gamma_0)^\top \in \mathbb{R}^{m+p}$. The baseline hazard function $\eta_{\gamma_0}$ and the relative risk $f_{\beta_0}$ both belong to parametric families. The covariate $Z$ is measured with error through the additive error model $U = Z + \varepsilon$, where $\varepsilon$ is a random variable, independent of $Z$, with known density $f_\varepsilon$. We observe an $n$-sample $(X_i, D_i, U_i)$, $i = 1, \dots, n$, where $X_i$ is the minimum of the failure time and the censoring time, and $D_i$ is the censoring indicator. We aim at estimating $\theta_0$ in the presence of the unknown density $g$ of the covariate $Z$, using the observations $(X_i, D_i, U_i)$, $i = 1, \dots, n$. Our estimation procedure, based on a least squares criterion, provides two estimators of $\theta_0$. The first is obtained by minimizing an estimate of the least squares criterion in which $g$ is estimated by density deconvolution. We give upper bounds for its risk that depend on the smoothness properties of $f_\varepsilon$ and of $f_\beta(z)$ as a function of $z$, and we derive from this construction sufficient conditions ensuring $\sqrt{n}$-consistency. The second estimator is constructed under conditions ensuring that the least squares criterion can be directly estimated at the parametric rate. We propose a detailed study of examples, considering various types of relative risk $f_\beta$ and various types of error density $f_\varepsilon$. We show in particular that in the Cox model and in the excess risk model, the estimators are $\sqrt{n}$-consistent and asymptotically Gaussian estimators of $\theta_0$, whatever $f_\varepsilon$ is.
Key Words and Phrases: Semiparametric estimation, errors-in-variables model, nonparametric estimation, excess risk model, Cox model, censoring, survival analysis.
MSC Classifications (2000): Primary 62G05, 62F12, 62N01, 62N02; Secondary 62J02.
1. Introduction
In a proportional hazard model the hazard function is defined by
$$R(t, \theta_0 \mid Z) = \eta_{\gamma_0}(t) f_{\beta_0}(Z), \tag{1.1}$$
where $\eta_{\gamma_0}$ is the baseline hazard function and $f_{\beta_0}$ is the relative risk, i.e. the risk associated with the value of the covariate $Z$, relative to the risk under the standard condition given by $f_{\beta_0}(0) = 1$. In this paper we consider a general relative risk $f_\beta$, with special interest in $f_\beta(z) = \exp(\beta z)$ and $f_\beta(z) = 1 + \beta z$, which define respectively the Cox model and the model of excess relative risk. The functions $\eta_{\gamma_0}$ and $f_{\beta_0}$ both belong to parametric families, and $\theta_0 = (\beta_0, \gamma_0)^\top$ belongs to the interior of a compact set $\Theta = B \times \Gamma \subset \mathbb{R}^{m+p}$. To ensure that the hazard function is positive, we assume that both functions are positive.
We are interested in the estimation of $\theta_0$ when the covariate $Z$ is measured with error. If $Z$ were measured without error, we would consider a cohort of $n$ individuals during a fixed time interval $[0, \tau]$. For each individual, we would observe a triplet $(X_i, D_i, Z_i)$, where $X_i = \min(T_i, C_i)$ is the minimum of the failure time $T_i$ and the censoring time $C_i$, $D_i = \mathbb{1}_{T_i \le C_i}$ denotes the failure indicator, and $Z_i$ is the value of the covariate. In this paper we consider that the covariate is mismeasured. For example, the covariate $Z$ may be the stage of a disease, not correctly diagnosed, or a dose of ingested pathogenic agent, not correctly evaluated, so that the error between the unknown dose and the evaluated dose is sizeable. In this context, the available observation for each individual is the triplet $\Delta_i = (X_i, D_i, U_i)$, where $U_i$ is an evaluation of the unobservable covariate $Z_i$. The random variables $U$ and $Z$ are related by the error model
$$U = Z + \varepsilon, \tag{1.2}$$
where $\varepsilon$ is a centered random variable, independent of $Z$, $T$, and $C$. The density of $\varepsilon$ is known and denoted by $f_\varepsilon$. Our aim is thus to estimate the parameter $\theta_0 = (\beta_0, \gamma_0)^\top$ from the $n$-sample $(\Delta_1, \dots, \Delta_n)$ in the presence of the unknown density $g$ of the unobservable covariate $Z$, seen as a nuisance parameter belonging to a functional space.
1.1. Previous known results and ideas. Models with measurement errors have been studied extensively since the 1950s, starting with the papers of Kiefer and Wolfowitz (1956) and Reiersøl (1950) on regression models with errors-in-variables. We refer to Fuller (1987) and Carroll et al. (1995) for a presentation of such models and of results related to measurement error models.
The interest in survival models whose covariates are subject to measurement errors is more recent.
To take into account that the covariate Z is measured with error, the first idea is simply to replace Z with the observation U in the score function defined by
$$L_n^{(1)}(\beta, Z^{(n)}) = \frac{1}{n} \sum_{i=1}^n \int_0^\tau \left( \frac{f_\beta^{(1)}(Z_i)}{f_\beta(Z_i)} - \frac{\sum_{j=1}^n Y_j(t) f_\beta^{(1)}(Z_j)}{\sum_{j=1}^n Y_j(t) f_\beta(Z_j)} \right) dN_i(t), \tag{1.3}$$
where $N_i(t) = \mathbb{1}_{X_i \le t, D_i = 1}$, $Y_i(t) = \mathbb{1}_{X_i \ge t}$, $Z^{(n)} = (Z_1, \dots, Z_n)$, and where $f_\beta^{(1)}$ is the first derivative of $f_\beta$ with respect to $\beta$. We refer to Gill and Andersen (1982) for further details on (1.3). This method, called the naive method, is known to provide, even in the Cox model, a biased estimator of $\beta_0$. This comes from the fact that
$$\lim_{n \to \infty} \mathbb{E}[L_n^{(1)}(\beta_0, U^{(n)})] \ne \lim_{n \to \infty} \mathbb{E}[L_n^{(1)}(\beta_0, Z^{(n)})] = 0.$$
To our knowledge, all previously known results about consistency for the semi-parametric
estimation of the hazard function when the covariate is mismeasured are obtained in the
Cox model. Let us present those results. Various authors propose estimation procedures
based on corrections of the score function $L_n^{(1)}(\beta, U^{(n)})$. Among them, one can cite Kong (1999), who calculates the asymptotic bias of the naive estimator obtained by minimization of $L_n^{(1)}(\beta_0, U^{(n)})$ and defines an adjusted estimator. His estimator is not consistent, but a simulation study indicates that it is less biased than the naive estimator. In the same context, Buzas (1998) proposes an unbiased score function and shows, through a simulation study, that his method yields an estimator with a small bias. Following the approach developed first by Stefanski (1989) and Nakamura (1990) for generalized linear models, Nakamura (1992) constructs an approximately corrected partial score likelihood, defined by $L_n^{(1)}(\beta, U^{(n)}) + \sigma^2 \beta N(\tau)$, where $N(\tau)$ is the number of failures in the interval $[0, \tau]$ and where $\varepsilon$ is a centered Gaussian random variable with variance $\sigma^2$. Under the error model defined in (1.2), this correction is based on the following facts:
$$\lim_{n \to \infty} \mathbb{E}[L_n^{(1)}(\beta, Z^{(n)})] \text{ only depends on } \mathbb{E}(Z) \text{ and } \mathbb{E}[\exp(\beta Z)], \tag{1.4}$$
$$\mathbb{E}(Z) = \mathbb{E}(U), \tag{1.5}$$
$$\mathbb{E}[\exp(\beta U)] = \mathbb{E}[\exp(\beta Z)] \, \mathbb{E}[\exp(\beta \varepsilon)]. \tag{1.6}$$
Kong and Gu (1999) prove that Nakamura's (1992) estimator is a $\sqrt{n}$-consistent and asymptotically Gaussian estimator of $\beta_0$. One can also cite Augustin (2004), who proposes an exact correction of the log-likelihood function.
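To see why (1.4)-(1.6) suffice in the Gaussian case (an illustration on our part, not a restatement of Nakamura's derivation), note that the moment generating function of a centered Gaussian $\varepsilon$ with variance $\sigma^2$ is explicit, so (1.6) can be inverted:
$$\mathbb{E}[\exp(\beta \varepsilon)] = \exp(\beta^2 \sigma^2 / 2), \qquad \text{hence} \qquad \mathbb{E}[\exp(\beta Z)] = \exp(-\beta^2 \sigma^2 / 2)\, \mathbb{E}[\exp(\beta U)].$$
Together with (1.5), both quantities on which the limit in (1.4) depends can thus be recovered from the distribution of the observable $U$ alone, which is precisely what a corrected score can exploit.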
Again in the Cox model, an extension of the previously mentioned works is presented in Hu and Lin (2002). They obtain a broad class of consistent estimators for the regression parameter when $U$ is measured on all study individuals and the true covariate is ascertained on a randomly selected validation set. A nonparametric correction approach of the partial score function is also developed by Huang and Wang (2000) when repetitions are available.
We point out that those results strongly depend on the exponential form of the relative risk of the Cox model, through the use of (1.4)-(1.6), and that the extension of such methods to other relative risks is not conclusive. For instance, in the model of excess relative risk without errors, the hazard function is defined by $R(t, \theta_0 \mid Z) = \eta_{\gamma_0}(t)(1 + \beta_0 Z)$ and the score function is given by
$$L_n^{(1)}(\beta, Z^{(n)}) = \frac{1}{n} \sum_{i=1}^n \int_0^\tau \left( \frac{Z_i}{1 + \beta Z_i} - \frac{\sum_{j=1}^n Y_j(t) Z_j}{\sum_{j=1}^n Y_j(t)(1 + \beta Z_j)} \right) dN_i(t).$$
In this model, the naive method also provides a biased estimator of $\beta_0$, since
$$\lim_{n \to \infty} \mathbb{E}[L_n^{(1)}(\beta_0, U^{(n)})] \ne \lim_{n \to \infty} \mathbb{E}[L_n^{(1)}(\beta_0, Z^{(n)})] = 0.$$
Indeed, easy calculations combined with the Law of Large Numbers give that the limit $\lim_{n \to \infty} \mathbb{E}[L_n^{(1)}(\beta, Z^{(n)})]$ depends on $\mathbb{E}[Z/(1 + \beta Z)]$, whereas the limit $\lim_{n \to \infty} \mathbb{E}[L_n^{(1)}(\beta, U^{(n)})]$ depends on $\mathbb{E}[U/(1 + \beta U)]$. Since the error model (1.2) does not provide any expression of $\mathbb{E}[Z/(1 + \beta Z)]$ in terms of $\mathbb{E}[U/(1 + \beta U)]$, corrections analogous to the ones proposed in the Cox model cannot be exhibited. In other words, it seems impossible to find a function $\Psi(\beta, U)$, independent of the unknown density $g$, satisfying $\mathbb{E}[\Psi(\beta, U)] = \mathbb{E}[Z/(1 + \beta Z)]$.
Consequently, the methods proposed in the Cox model by Nakamura (1992), Kong and Gu (1999), Buzas (1998), Hu and Lin (2002), Huang and Wang (2000) or Augustin (2004) cannot be applied to the model of excess relative risk, and a fortiori not to a model with a general relative risk. In conclusion, methods based on a correction of the partial score likelihood (1.3), where $Z$ is replaced with $U$, do not seem conclusive in a general setting.
Another possible approach is to consider a partial log-likelihood related to the filtration generated by the observations. If the covariate $Z$ were observable, the filtration at time $t$ generated by the observations would be $\sigma\{Z, N(s), \mathbb{1}_{X > s}, 0 \le s \le t \le \tau\}$, and the intensity of the censored process $N(t)$ with respect to this filtration would equal $\lambda(t, \theta_0, Z) = \eta_{\gamma_0}(t) Y(t) f_{\beta_0}(Z)$. In the case of covariate measurement error, $Z$ is unobservable and only the evaluation $U$ is available. In this context, the filtration generated by the observations is $\mathcal{E}_t = \sigma\{U, N(s), \mathbb{1}_{X > s}, 0 \le s \le t \le \tau\}$, and the intensity of the censored process $N(t)$ with respect to the filtration $\mathcal{E}_t$ equals
$$\mathbb{E}[\lambda(t, \theta_0, Z) \mid \mathcal{E}_t] = \eta_{\gamma_0}(t) Y(t)\, \mathbb{E}[f_{\beta_0}(Z) \mid \sigma(U, \mathbb{1}_{T \ge t})].$$
This is studied in the Cox model by Prentice (1982), who proposes the following induced failure hazard function:
$$\eta_{\gamma_0}(t)\, \mathbb{E}[\exp(\beta_0 Z) \mid \sigma(\mathbb{1}_{T \ge t}, U)].$$
The presence of $\{T \ge t\}$ in the conditioning usually implies that the induced partial log-likelihood has no explicit form. When the marginal distribution of $Z$ given $\{T \ge t, U\}$ is specified at each time $t$, Prentice (1982) proposes an approximation of the induced partial log-likelihood that is independent of the baseline hazard function. Nevertheless, this approximation is appropriate only when the disease is rare. Tsiatis et al. (1995) propose another approximation of the induced partial log-likelihood, but emphasise that their method cannot be used for the model of excess relative risk.
In the Cox model with missing covariates, Pons (2002) also uses the partial likelihood. She proposes an estimator based on (1.3) where $\mathbb{E}[\exp(\beta Z) \mid \sigma(U, \mathbb{1}_{T \ge t})]$ is replaced with $\mathbb{E}[\exp(\beta Z) \mid U]$, by considering that
$$\mathbb{E}[\exp(\beta Z) \mid \sigma(U, \mathbb{1}_{T \ge t})] = \mathbb{E}[\exp(\beta Z) \mid U]. \tag{1.7}$$
Nevertheless, obvious examples can be exhibited to prove that the equality (1.7) does not hold in a general setting. In conclusion, the partial likelihood related to the filtration $\mathcal{E}_t$ seems unusable, since it is difficult to separate the estimation of $\beta_0$ from the estimation of $\eta_{\gamma_0}$.
1.2. Our results. Our estimation procedure is based on the estimation of a least squares criterion using deconvolution methods. More precisely, using the observations $\Delta_i = (X_i, D_i, U_i)$ for $i = 1, \dots, n$, we estimate the least squares criterion
$$S_{\theta_0, g}(\theta) = \mathbb{E}\left[ f_\beta^2(Z) W(Z) \int_0^\tau Y(t)\, \eta_\gamma^2(t)\, dt \right] - 2\, \mathbb{E}\left[ f_\beta(Z) W(Z) \int_0^\tau \eta_\gamma(t)\, dN(t) \right]. \tag{1.8}$$
The function $W$ is a positive weight function, to be chosen such that $W f_\beta$ and its derivatives up to order 3 with respect to $\beta$ are in $L_1(\mathbb{R}) \cap L_2(\mathbb{R})$ and have the best possible smoothness properties as functions of $z$. Under reasonable identifiability assumptions, $S_{\theta_0, g}(\theta)$ is minimum if and only if $\theta = \theta_0$. We propose to estimate $S_{\theta_0, g}(\theta)$ for all $\theta \in \Theta$ by a quantity depending on the observations $\Delta_1, \dots, \Delta_n$, expecting that the argument minimum of the estimator converges to the argument minimum of $S_{\theta_0, g}(\theta)$, namely $\theta_0$.
We propose a first estimator of $\theta_0$, say $\hat\theta_1$, constructed by minimizing $S_{n,1}(\theta)$, a consistent estimator of $S_{\theta_0, g}$ in which $g$ is replaced by a kernel deconvolution estimator. We show that, under classical assumptions, this estimator is a consistent estimator of $\theta_0$. Its rate of convergence depends on the smoothness of $f_\varepsilon$ and on the smoothness of $W(z) f_\beta(z)$ as a function of $z$. More precisely, it depends on the behavior of the ratios of the Fourier transforms $(W f_\beta)^*(t)/f_\varepsilon^*(t)$ and $(W f_\beta^2)^*(t)/f_\varepsilon^*(t)$ as $t$ tends to infinity. We give upper bounds for the risk of $\hat\theta_1$ for various relative risks and various types of error density, and derive sufficient conditions ensuring $\sqrt{n}$-consistency and asymptotic normality. These upper bounds and sufficient conditions are studied in depth through examples. In particular, we show that $\hat\theta_1$ is a $\sqrt{n}$-consistent, asymptotically Gaussian estimator of $\theta_0$ in the Cox model, in the model of excess relative risk, and when $f_\beta$ is a general polynomial function.
The estimation procedure is related to the problem of estimating $S_{\theta_0, g}(\theta)$. Under conditions ensuring that it can be estimated at the parametric rate, we propose a second estimator $\hat\theta_2$, which is a $\sqrt{n}$-consistent and asymptotically Gaussian estimator of $\theta_0$. Clearly, these conditions are not always fulfilled and $\hat\theta_2$ does not always exist, whereas $\hat\theta_1$ can be constructed and studied in all setups.
The paper is organized as follows. Section 2 presents the model and the assumptions. In Sections 3 and 4 we present the two estimators and their asymptotic properties, illustrated in Section 5. In Section 6 we comment on the use of the least squares criterion. The proofs are gathered in Section 7 and in the Appendix.
2. Model, assumptions and notations
Before describing the estimation procedure, we give the notations used throughout the paper and assumptions commonly made in survival data analysis.
Notations. For two complex-valued functions $u$ and $v$ in $L_2(\mathbb{R}) \cap L_1(\mathbb{R})$, let
$$u^*(x) = \int e^{itx} u(t)\, dt, \qquad u \star v(x) = \int u(y) v(x - y)\, dy, \qquad \langle u, v \rangle = \int u(x) \overline{v(x)}\, dx,$$
where $\overline{z}$ denotes the conjugate of a complex number $z$. We also use the notations
$$\|u\|_1 = \int |u(x)|\, dx, \qquad \|u\|_2^2 = \int |u(x)|^2\, dx, \qquad \|u\|_\infty = \sup_{x \in \mathbb{R}} |u(x)|,$$
and, for $\theta \in \mathbb{R}^d$, $\|\theta\|_{\ell_2}^2 = \sum_{k=1}^d \theta_k^2$. For a map
$$\varphi_\theta : \Theta \times \mathbb{R} \longrightarrow \mathbb{R}, \quad (\theta, u) \mapsto \varphi_\theta(u),$$
whenever they exist, the first and second derivatives with respect to $\theta$ are denoted by $\varphi_\theta^{(1)}(\cdot) = \big(\varphi_{\theta,j}^{(1)}(\cdot)\big)_j$ with $\varphi_{\theta,j}^{(1)}(\cdot) = \partial \varphi_\theta(\cdot)/\partial \theta_j$ for $j \in \{1, \dots, m+p\}$, and $\varphi_\theta^{(2)}(\cdot) = \big(\varphi_{\theta,j,k}^{(2)}(\cdot)\big)_{j,k}$ with $\varphi_{\theta,j,k}^{(2)}(\cdot) = \partial^2 \varphi_\theta(\cdot)/\partial \theta_j \partial \theta_k$ for $j, k \in \{1, \dots, m+p\}$. Throughout the paper, $\mathbb{P}$, $\mathbb{E}$ and $\mathrm{Var}$ denote respectively the probability, the expectation and the variance when the underlying, unknown, true parameters are $\theta_0$ and $g$. Finally, we use the notation $a_-$ for the negative part of $a$, which equals $a$ if $a \le 0$ and $0$ otherwise.
Model assumptions
(A1) The function $\eta_{\gamma_0}$ is non-negative and integrable on $[0, \tau]$.
(A2) Conditionally on $Z$ and $U$, the failure time $T$ and the censoring time $C$ are independent.
(A3) The distribution of the censoring time $C$, conditional on $Z$ and $U$, does not depend on $Z$ and $U$.
(A4) The distribution of the failure time $T$, conditional on $Z$ and $U$, does not depend on $U$.
These assumptions are usual in most frameworks dealing with survival data analysis and covariates measured with error; see Andersen et al. (1993), Prentice and Self (1983), Prentice (1982), Gong (1990) and Tsiatis et al. (1995). Assumptions (A2) and (A3) state that a general censorship model is considered, where the censoring time has an arbitrary distribution independent of the covariates. Assumption (A4) states that the failure time is independent of the observed covariate when the observed and true covariates are both given, i.e. the measurement error is not prognostic.
We define the filtration
$$\mathcal{F}_t = \sigma\{Z, U, N(s), \mathbb{1}_{X \ge s}, 0 \le s \le t \le \tau\}.$$
The intensity of the censored process $N(t)$ with respect to the filtration $\mathcal{F}_t$ equals
$$\lambda(t, \theta_0, Z) = \eta_{\gamma_0}(t) Y(t) f_{\beta_0}(Z). \tag{2.1}$$
It follows from (2.1) and from the independence of the observations $\Delta_i$ that, for individual $i$, the intensity and the compensator of the censored process $N_i(t) = \mathbb{1}_{X_i \le t, D_i = 1}$ with respect to the filtration $\mathcal{F}_t$ are respectively
$$\lambda_i(t, \theta_0, Z_i) = \eta_{\gamma_0}(t) Y_i(t) f_{\beta_0}(Z_i) \quad \text{and} \quad \Lambda_i(t, \theta_0, Z_i) = \int_0^t \lambda_i(s, \theta_0, Z_i)\, ds. \tag{2.2}$$
Moreover, the process $M_i(t) = N_i(t) - \Lambda_i(t, \theta_0, Z_i)$ is a local square integrable martingale. As a consequence, the least squares criterion defined in (1.8) can be rewritten as
$$S_{\theta_0, g}(\theta) = \int_0^\tau \mathbb{E}\Big[ \big( \eta_\gamma(t) f_\beta(Z) - \eta_{\gamma_0}(t) f_{\beta_0}(Z) \big)^2\, Y(t) W(Z) \Big] dt - \int_0^\tau \mathbb{E}\Big[ \big( \eta_{\gamma_0}(t) f_{\beta_0}(Z) \big)^2\, Y(t) W(Z) \Big] dt. \tag{2.3}$$
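The interest of this rewriting can be made explicit (a short remark of ours): the second term does not depend on $\theta$, so
$$\arg\min_{\theta \in \Theta} S_{\theta_0, g}(\theta) = \arg\min_{\theta \in \Theta} \int_0^\tau \mathbb{E}\Big[ \big( \eta_\gamma(t) f_\beta(Z) - \eta_{\gamma_0}(t) f_{\beta_0}(Z) \big)^2\, Y(t) W(Z) \Big] dt,$$
a weighted $L_2$-distance between hazards, which vanishes at $\theta = \theta_0$ and, under the identifiability assumption (A6) below, only there.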
Since we consider general relative risk functions, we assume the following minimal smoothness conditions with respect to $\theta$.
Smoothness assumptions
(A5) The functions $\beta \mapsto f_\beta$ and $\gamma \mapsto \eta_\gamma$ admit continuous derivatives up to order 3 with respect to $\beta$ and $\gamma$ respectively.
We denote by $S_{\theta_0, g}^{(1)}(\theta)$ and $S_{\theta_0, g}^{(2)}(\theta)$ the first and second derivatives of $S_{\theta_0, g}(\theta)$ with respect to $\theta$. For all $t$ in $[0, \tau]$, let $S_{\theta_0, g}^{(2)}(\theta, t)$ denote the second derivative of $S_{\theta_0, g}$ when the integral is taken over $[0, t]$, with the convention that $S_{\theta_0, g}^{(2)}(\theta) = S_{\theta_0, g}^{(2)}(\theta, \tau)$.
Identifiability and moment assumptions
(A6) $S_{\theta_0, g}^{(1)}(\theta) = 0$ if and only if $\theta = \theta_0$.
(A7) For all $t \in [0, \tau]$, the matrix $S_{\theta_0, g}^{(2)}(\theta_0, t)$ exists and is positive definite.
(A8) The quantity $\mathbb{E}(f_\beta^2(Z) W(Z))$ is finite.
(A9) For $j = 1, \dots, m$, $\mathbb{E}|f_{\beta_0}(Z) f_{\beta_0, j}^{(1)}(Z) W(Z)|^3$ and $\mathbb{E}|f_{\beta_0}(Z) W(Z)|^3$ are finite.
We denote by $\mathcal{G}$ the set of densities $g$ such that assumptions (A2)-(A4) and (A6)-(A9) hold.
3. Construction and study of the first estimator $\hat\theta_1$
3.1. Construction. If the $Z_i$'s were observed, $S_{\theta_0, g}(\theta)$ would be estimated by
$$\tilde S_n(\theta) = -\frac{2}{n} \sum_{i=1}^n f_\beta(Z_i) W(Z_i) \int_0^\tau \eta_\gamma(t)\, dN_i(t) + \frac{1}{n} \sum_{i=1}^n f_\beta^2(Z_i) W(Z_i) \int_0^\tau \eta_\gamma^2(t) Y_i(t)\, dt \tag{3.1}$$
and $\theta_0$ would be estimated by minimizing $\tilde S_n(\theta)$. Since the $Z_i$'s are unobservable and $Z_i$ is independent of $\varepsilon_i$, the density $h$ of $U_i$ equals $h = g \star f_\varepsilon$. We thus estimate $S_{\theta_0, g}$ by
$$S_{n,1}(\theta) = -\frac{2}{n} \sum_{i=1}^n (f_\beta W) \star K_{n,C_n}(U_i) \int_0^\tau \eta_\gamma(t)\, dN_i(t) + \frac{1}{n} \sum_{i=1}^n (f_\beta^2 W) \star K_{n,C_n}(U_i) \int_0^\tau \eta_\gamma^2(t) Y_i(t)\, dt, \tag{3.2}$$
where $K_{n,C_n}(\cdot) = C_n K_n(C_n\, \cdot)$ is a deconvolution kernel defined via its Fourier transform, such that $\int K_n(x)\, dx = 1$ and
$$K_{n,C_n}^*(t) = \frac{K_{C_n}^*(t)}{f_\varepsilon^*(t)} = \frac{K^*(t/C_n)}{f_\varepsilon^*(t)}, \tag{3.3}$$
with $K^*$ compactly supported and satisfying $|1 - K^*(t)| \le \mathbb{1}_{|t| \ge 1}$, and $C_n \to \infty$ as $n \to \infty$. Using this criterion, we propose to estimate $\theta_0$ by
$$\hat\theta_1 = (\hat\beta_1, \hat\gamma_1) = \arg\min_{\theta = (\beta, \gamma)^\top \in \Theta} S_{n,1}(\theta). \tag{3.4}$$
We require for the construction of $S_{n,1}(\theta)$ that
(A10) the density $f_\varepsilon$ belongs to $L_2(\mathbb{R}) \cap L_\infty(\mathbb{R})$ and $f_\varepsilon^*(x) \ne 0$ for all $x \in \mathbb{R}$.
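The link with density deconvolution can be sketched as follows (our rephrasing, under the additional assumption that the kernel $K_n$ is symmetric): the criterion $S_{n,1}$ amounts to replacing integrals against the unknown $g$ by integrals against the standard deconvolution density estimator
$$\hat g(z) = \frac{1}{n} \sum_{i=1}^n K_{n,C_n}(z - U_i), \qquad \text{since then} \qquad \frac{1}{n} \sum_{i=1}^n (f_\beta W) \star K_{n,C_n}(U_i) = \int (f_\beta W)(z)\, \hat g(z)\, dz,$$
so that the empirical sums in (3.2) estimate the expectations $\mathbb{E}[(f_\beta W)(Z)\, \cdot\,]$ appearing in (1.8).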
3.2. Asymptotic properties of the first estimator $\hat\theta_1$. Assume that
(A11) $\sup_{g \in \mathcal{G}} \|f_{\beta_0}^2 g\|_2^2 \le C_2(f_{\beta_0}^2)$ and $\sup_{g \in \mathcal{G}} \|f_{\beta_0} g\|_2^2 \le C_2(f_{\beta_0})$;
(A12) $\sup_{\beta \in B}(W f_\beta)$, $W$ and $\sup_{\beta \in B}(W f_\beta^2)$ belong to $L_1(\mathbb{R})$;
(A13) $\sup_{\beta \in B}(W f_\beta^{(1)})$ and $\sup_{\beta \in B}(W f_\beta f_\beta^{(1)})$ belong to $L_1(\mathbb{R})$.
As in density deconvolution, or in the estimation of the regression function in errors-in-variables models, the rate of convergence for estimating $\theta_0$ is governed by both the smoothness of $f_\varepsilon$ and the smoothness of $(f_\beta W)(z)$ and $\partial(f_\beta W)(z)/\partial\beta$ as functions of $z$. The smoothness of the error density $f_\varepsilon$ is described by the decrease of its Fourier transform.
(A14) There exist positive constants $\underline{C}(f_\varepsilon)$, $\overline{C}(f_\varepsilon)$, and nonnegative $\delta$, $\rho$, $\alpha$ and $u_0$ such that $\underline{C}(f_\varepsilon) \le |f_\varepsilon^*(u)|\, |u|^\alpha \exp(\delta |u|^\rho) \le \overline{C}(f_\varepsilon)$ for all $|u| \ge u_0$.
If $\rho = 0$, by convention $\delta = 0$. When $\rho = \delta = 0$ in (A14), $f_\varepsilon$ is called "ordinary smooth"; when $\delta > 0$ and $\rho > 0$, it is called "super smooth". Densities satisfying (A14) with $\rho > 0$ and $\delta > 0$ are infinitely differentiable. The standard examples of super smooth densities are the Gaussian and Cauchy distributions, which are super smooth of respective orders $\alpha = 0$, $\rho = 2$ and $\alpha = 0$, $\rho = 1$. For ordinary smooth densities, one can cite for instance the double exponential (also called Laplace) distribution, with $\rho = \delta = 0$ and $\alpha = 2$. We consider here that $0 \le \rho \le 2$.
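As a concrete check of these orders (our computation, with unit scale parameters and the Fourier convention $u^*(x) = \int e^{itx} u(t)\, dt$), the transforms are explicit:
$$f_\varepsilon(x) = \tfrac{1}{2} e^{-|x|} \;\Rightarrow\; f_\varepsilon^*(u) = \frac{1}{1 + u^2} \quad (\alpha = 2,\ \rho = \delta = 0);$$
$$f_\varepsilon(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}} \;\Rightarrow\; f_\varepsilon^*(u) = e^{-u^2/2} \quad (\alpha = 0,\ \rho = 2,\ \delta = \tfrac{1}{2}); \qquad f_\varepsilon(x) = \frac{1}{\pi(1 + x^2)} \;\Rightarrow\; f_\varepsilon^*(u) = e^{-|u|} \quad (\alpha = 0,\ \rho = 1,\ \delta = 1).$$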
The square integrability of $f_\varepsilon$ in (A10) requires that $\alpha > 1/2$ when $\rho = 0$ in (A14).
The smoothness of $f_\beta W$ is described by the following assumption.
(A15) There exist positive constants $\underline{L}(f)$, $\overline{L}(f)$ and nonnegative numbers $a$, $d$, $u_0$, $r$ such that, for all $\beta \in B$, the functions $f_\beta W$ and $f_\beta^2 W$ and their derivatives up to order 3 with respect to $\beta$ belong to
$$\mathcal{H}_{a,d,r} = \big\{ f \in L_1(\mathbb{R}) : \underline{L}(f) \le |f^*(u)|\, |u|^a \exp(d |u|^r) \le \overline{L}(f) < \infty \text{ for all } |u| \ge u_0 \big\}. \tag{3.5}$$
If $r = 0$, by convention $d = 0$.
Theorem 3.1. Let (A1)-(A15) hold. Let $\hat\theta_1 = \hat\theta_1(C_n)$ be defined by (3.2) and (3.4), with $C_n$ a sequence such that
$$C_n^{2\alpha - 2a + 1 - \rho + (1 - \rho)_-} \exp\{-2 d C_n^r + 2 \delta C_n^\rho\}/n = o(1) \text{ as } n \to +\infty. \tag{3.6}$$
1) Then $\mathbb{E}(\|\hat\theta_1(C_n) - \theta_0\|_{\ell_2}^2) = o(1)$ as $n \to \infty$, and $\hat\theta_1(C_n)$ is a consistent estimator of $\theta_0$.
2) Moreover, $\mathbb{E}(\|\hat\theta_1 - \theta_0\|_{\ell_2}^2) = O(\varphi_n^2)$ with $\varphi_n^2 = \|(\varphi_{n,j})\|_{\ell_2}^2$ and $\varphi_{n,j}^2 = B_{n,j}^2(\theta_0) + V_{n,j}(\theta_0)/n$, where $B_{n,j}^2(\theta_0) = \min\{B_{n,j}^{[1]}(\theta_0), B_{n,j}^{[2]}(\theta_0)\}$ and $V_{n,j}(\theta_0) = \min\{V_{n,j}^{[1]}(\theta_0), V_{n,j}^{[2]}(\theta_0)\}$, with
$$B_{n,j}^{[q]}(\theta_0) = \big\| (f_{\beta_0}^2 W)^* (K_{C_n}^* - 1) \big\|_q^2 + \big\| (f_{\beta_0} W)^* (K_{C_n}^* - 1) \big\|_q^2 + \big\| (f_{\beta_0, j}^{(1)} W)^* (K_{C_n}^* - 1) \big\|_q^2 + \big\| (f_{\beta_0, j}^{(1)} f_{\beta_0} W)^* (K_{C_n}^* - 1) \big\|_q^2,$$
$$V_{n,j}^{[q]}(\theta_0) = \bigg\| \frac{(f_{\beta_0}^2 W)^* K_{C_n}^*}{f_\varepsilon^*} \bigg\|_q^2 + \bigg\| \frac{(f_{\beta_0} W)^* K_{C_n}^*}{f_\varepsilon^*} \bigg\|_q^2 + \bigg\| \frac{(f_{\beta_0, j}^{(1)} W)^* K_{C_n}^*}{f_\varepsilon^*} \bigg\|_q^2 + \bigg\| \frac{(f_{\beta_0, j}^{(1)} f_{\beta_0} W)^* K_{C_n}^*}{f_\varepsilon^*} \bigg\|_q^2.$$
3) Furthermore, the resulting rate $\varphi_n^2$ is given in Table 1.
We point out that the rate for estimating $\beta_0$ depends on the smoothness properties of $\partial(W f_\beta)(z)/\partial\beta$ and $\partial(W f_\beta^2)(z)/\partial\beta$ as functions of $z$, whereas the rate for estimating $\gamma_0$ depends on the smoothness properties of $W f_\beta(z)$ and $W f_\beta^2(z)$ as functions of $z$. In both cases, the smoothness of $\eta_\gamma$ as a function of $t$ has no influence on the rate of convergence.
The terms $B_{n,j}^2$ and $V_{n,j}$ are respectively the squared bias and the variance terms. As usual, the bias is smallest for the smoothest functions $(W f_\beta)(z)$ and $\partial(f_\beta W)(z)/\partial\beta$, as functions of $z$. As in density deconvolution, the largest variances are obtained for the smoothest error densities $f_\varepsilon$. Hence, the slowest rates are obtained for the smoothest error densities $f_\varepsilon$, for instance for Gaussian $\varepsilon$'s.
The rate of convergence of the estimator $\hat\theta_1$ could be improved by assuming smoothness properties for the density $g$. But since $g$ is unknown, we choose not to assume such properties. Consequently, without any additional assumption on $g$, the parametric rate of convergence is achieved as soon as $(W f_\beta)$ and $(W f_\beta^2)$ and their derivatives, as functions of $z$, are smoother than the error density $f_\varepsilon$.
3.3. Consequence: a sufficient condition to obtain the parametric rate of convergence with $\hat\theta_1$.
(C1) There exists a weight function $W$ such that the functions $\sup_{\beta \in B} \big| (f_\beta W)^*/f_\varepsilon^* \big|$ and $\sup_{\beta \in B} \big| (f_\beta^2 W)^*/f_\varepsilon^* \big|$ belong to $L_1(\mathbb{R}) \cap L_2(\mathbb{R})$.
(C2) The functions $\sup_{\beta \in B} \big| (f_\beta^{(1)} W)^*/f_\varepsilon^* \big|$ and $\sup_{\beta \in B} \big| (f_\beta^{(1)} f_\beta W)^*/f_\varepsilon^* \big|$ belong to $L_1(\mathbb{R}) \cap L_2(\mathbb{R})$.
(C3) The functions $(f_\beta^{(2)} W)^*/f_\varepsilon^*$ and $\big( \partial^2 (f_\beta^2 W)/\partial\beta^2 \big)^*/f_\varepsilon^*$ belong to $L_1(\mathbb{R}) \cap L_2(\mathbb{R})$ for all $\beta \in B$.
Table 1. Rates of convergence $\varphi_n^2$ of $\hat\theta_1$.
If $f_\varepsilon$ is ordinary smooth ($\rho = 0$ in (A14)):
- $W f_{\beta_0}$ in a Sobolev class ($d = r = 0$ in (A15)): $\varphi_n^2 = n^{-(2a-1)/(2\alpha)}$ if $a < \alpha + 1/2$, and $\varphi_n^2 = n^{-1}$ if $a \ge \alpha + 1/2$;
- $W f_{\beta_0}$ infinitely differentiable ($r > 0$ in (A15)): $\varphi_n^2 = n^{-1}$.
If $f_\varepsilon$ is super smooth ($\rho > 0$ in (A14)):
- $W f_{\beta_0}$ in a Sobolev class ($d = r = 0$ in (A15)): $\varphi_n^2 = (\log n)^{-(2a-1)/\rho}$;
- $W f_{\beta_0}$ infinitely differentiable ($r > 0$ in (A15)):
  - if $r < \rho$: $\varphi_n^2 = (\log n)^{A(a,r,\rho)} \exp\big\{ -2d \big( \log n/(2\delta) \big)^{r/\rho} \big\}$;
  - if $r = \rho$ and $d < \delta$: $\varphi_n^2 = (\log n)^{A(a,r,\rho) + 2\alpha d/(\delta r)}\, n^{-d/\delta}$;
  - if $r = \rho$, $d = \delta$ and $a < \alpha + 1/2$: $\varphi_n^2 = (\log n)^{(2\alpha - 2a + 1)/r}\, n^{-1}$;
  - if $r = \rho$, $d = \delta$ and $a \ge \alpha + 1/2$: $\varphi_n^2 = n^{-1}$;
  - if $r = \rho$ and $d > \delta$: $\varphi_n^2 = n^{-1}$;
  - if $r > \rho$: $\varphi_n^2 = n^{-1}$;
where $A(a, r, \rho) = (-2a + 1 - r + (1 - r)_-)/\rho$.
Theorem 3.2. Let (A1)-(A12) and (C1)-(C3) hold. Then $\hat\theta_1$ defined by (3.4) is a $\sqrt{n}$-consistent estimator of $\theta_0$. Moreover,
$$\sqrt{n}(\hat\theta_1 - \theta_0) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}(0, \Sigma_1),$$
where $\Sigma_1$ equals
$$\left( \mathbb{E}\left[ -2 \int_0^\tau \frac{\partial^2 \{(f_\beta W)(Z)\, \eta_\gamma(s)\}}{\partial\theta^2} \bigg|_{\theta = \theta_0} dN(s) + \int_0^\tau \frac{\partial^2 \{(f_\beta^2 W)(Z)\, \eta_\gamma^2(s)\}}{\partial\theta^2} \bigg|_{\theta = \theta_0} Y(s)\, ds \right] \right)^{-1} \tag{3.7}$$
$$\times\; \Sigma_{0,1} \;\times\; \left( \mathbb{E}\left[ -2 \int_0^\tau \frac{\partial^2 \{(f_\beta W)(Z)\, \eta_\gamma(s)\}}{\partial\theta^2} \bigg|_{\theta = \theta_0} dN(s) + \int_0^\tau \frac{\partial^2 \{(f_\beta^2 W)(Z)\, \eta_\gamma^2(s)\}}{\partial\theta^2} \bigg|_{\theta = \theta_0} Y(s)\, ds \right] \right)^{-1},$$
with
$$\Sigma_{0,1} = \mathbb{E}\Bigg\{ \left[ -2 \int_0^\tau \frac{\partial (R_{\beta, f_\varepsilon, 1}(U)\, \eta_\gamma(s))}{\partial\theta} \bigg|_{\theta = \theta_0} dN(s) + \int_0^\tau \frac{\partial (R_{\beta, f_\varepsilon, 2}(U)\, \eta_\gamma^2(s))}{\partial\theta} \bigg|_{\theta = \theta_0} Y(s)\, ds \right]$$
$$\times \left[ -2 \int_0^\tau \frac{\partial (R_{\beta, f_\varepsilon, 1}(U)\, \eta_\gamma(s))}{\partial\theta} \bigg|_{\theta = \theta_0} dN(s) + \int_0^\tau \frac{\partial (R_{\beta, f_\varepsilon, 2}(U)\, \eta_\gamma^2(s))}{\partial\theta} \bigg|_{\theta = \theta_0} Y(s)\, ds \right]^\top \Bigg\},$$
where
$$R_{\beta, f_\varepsilon, 1}(U) = \int \frac{(W f_\beta)^*(t)\, e^{-itU}}{f_\varepsilon^*(t)}\, dt \quad \text{and} \quad R_{\beta, f_\varepsilon, 2}(U) = \int \frac{(W f_\beta^2)^*(t)\, e^{-itU}}{f_\varepsilon^*(t)}\, dt.$$
The conditions (C1)-(C3), stronger than (A14) and (A15), ensure the existence of the functions $R_{\beta, f_\varepsilon, j}$ for $j = 1, 2$.
4. Construction and study of the second estimator $\hat\theta_2$
4.1. Construction. Our estimation procedure, based on the estimation of the least squares criterion, requires the estimation of $\mathbb{E}[\int_0^\tau (f_\beta W)(Z)\, dN(t)]$ and $\mathbb{E}[\int_0^\tau (f_\beta^2 W)(Z) Y(t)\, dt]$, which are linear functionals of $g$. It may happen that these linear functionals can be estimated directly, without kernel deconvolution plug-in. In this context, we propose another estimator of $\theta_0$. It is based on sufficient conditions allowing the construction of a $\sqrt{n}$-consistent estimator of these linear functionals, and hence the estimation of $S_{\theta_0, g}$ at the parametric rate.
We say that conditions (C4)-(C6) hold if there exist a weight function $W$ and two functions $\Phi_{\beta, f_\varepsilon, 1}$ and $\Phi_{\beta, f_\varepsilon, 2}$, not depending on $g$, such that for all $\beta \in B$ and for all $g$:
(C4) $\mathbb{E}_{\theta_0, g}\left[ \int_0^\tau (f_\beta W)(Z)\, dN(t) \right] = \mathbb{E}_{\theta_0, h}\left[ \int_0^\tau \Phi_{\beta, f_\varepsilon, 1}(U)\, dN(t) \right]$ and $\mathbb{E}_{\theta_0, g}\left[ \int_0^\tau (f_\beta^2 W)(Z) Y(t)\, dt \right] = \mathbb{E}_{\theta_0, h}\left[ \int_0^\tau \Phi_{\beta, f_\varepsilon, 2}(U) Y(t)\, dt \right]$;
(C5) for $k = 0, 1, 2$ and $j = 1, 2$, $\mathbb{E}\big[ \sup_{\beta \in B} \|\Phi_{\beta, f_\varepsilon, j}^{(k)}(U)\|_{\ell_2} \big] < \infty$;
(C6) for $j = 1, 2$ and all $\beta \in B$, $\mathbb{E}\big[ \|\Phi_{\beta, f_\varepsilon, j}^{(1)}(U)\|_{\ell_2}^2 \big] < \infty$.
Under (C4)-(C6), we estimate $S_{\theta_0, g}$ by
$$S_{n,2}(\theta) = -\frac{2}{n} \sum_{i=1}^n \int_0^\tau \Phi_{\beta, f_\varepsilon, 1}(U_i)\, \eta_\gamma(t)\, dN_i(t) + \frac{1}{n} \sum_{i=1}^n \int_0^\tau \Phi_{\beta, f_\varepsilon, 2}(U_i)\, \eta_\gamma^2(t) Y_i(t)\, dt \tag{4.1}$$
and $\theta_0$ is estimated by
$$\hat\theta_2 = \arg\min_{\theta \in \Theta} S_{n,2}(\theta). \tag{4.2}$$
The main difficulty in finding such functions $\Phi_{\beta, f_\varepsilon, 1}$ and $\Phi_{\beta, f_\varepsilon, 2}$ lies in the constraint that they must not depend on the unknown density $g$. We refer to Section 4.3 for details on how to construct such functions $\Phi_{\beta, f_\varepsilon, j}$, $j = 1, 2$.
4.2. Asymptotic properties of $\hat\theta_2$.
Theorem 4.1. Let (A1)-(A9) and conditions (C4)-(C6) hold. Then $\hat\theta_2$, defined by (4.2), is a $\sqrt{n}$-consistent estimator of $\theta_0$. Moreover,
$$\sqrt{n}(\hat\theta_2 - \theta_0) \xrightarrow[n \to \infty]{\mathcal{L}} \mathcal{N}(0, \Sigma_2),$$
where $\Sigma_2$ equals
$$\left( \mathbb{E}\left[ -2 \int_0^\tau \frac{\partial^2 (\Phi_{\beta, f_\varepsilon, 1}(U)\, \eta_\gamma(s))}{\partial\theta^2} \bigg|_{\theta = \theta_0} dN(s) + \int_0^\tau \frac{\partial^2 (\Phi_{\beta, f_\varepsilon, 2}(U)\, \eta_\gamma^2(s))}{\partial\theta^2} \bigg|_{\theta = \theta_0} Y(s)\, ds \right] \right)^{-1} \tag{4.3}$$
$$\times\; \Sigma_{0,2} \;\times\; \left( \mathbb{E}\left[ -2 \int_0^\tau \frac{\partial^2 (\Phi_{\beta, f_\varepsilon, 1}(U)\, \eta_\gamma(s))}{\partial\theta^2} \bigg|_{\theta = \theta_0} dN(s) + \int_0^\tau \frac{\partial^2 (\Phi_{\beta, f_\varepsilon, 2}(U)\, \eta_\gamma^2(s))}{\partial\theta^2} \bigg|_{\theta = \theta_0} Y(s)\, ds \right] \right)^{-1},$$
with
$$\Sigma_{0,2} = \mathbb{E}\Bigg\{ \left[ -2 \int_0^\tau \frac{\partial (\Phi_{\beta, f_\varepsilon, 1}(U)\, \eta_\gamma(s))}{\partial\theta} \bigg|_{\theta = \theta_0} dN(s) + \int_0^\tau \frac{\partial (\Phi_{\beta, f_\varepsilon, 2}(U)\, \eta_\gamma^2(s))}{\partial\theta} \bigg|_{\theta = \theta_0} Y(s)\, ds \right]$$
$$\times \left[ -2 \int_0^\tau \frac{\partial (\Phi_{\beta, f_\varepsilon, 1}(U)\, \eta_\gamma(s))}{\partial\theta} \bigg|_{\theta = \theta_0} dN(s) + \int_0^\tau \frac{\partial (\Phi_{\beta, f_\varepsilon, 2}(U)\, \eta_\gamma^2(s))}{\partial\theta} \bigg|_{\theta = \theta_0} Y(s)\, ds \right]^\top \Bigg\}.$$
4.3. Comments on conditions ensuring $\sqrt{n}$-consistency: comparison of $\hat\theta_1$ and $\hat\theta_2$.
Let us briefly compare conditions (C1)-(C3) with conditions (C4)-(C6). It is noteworthy that conditions (C4)-(C6) are more general. First, condition (C4) does not require that $f_\beta W$ and $f_\beta^2 W$ belong to $L_1(\mathbb{R})$ (as is the case, for instance, in the Cox model). Second, we point out that condition (C1) implies (C4), with $\Phi_{\beta, f_\varepsilon, j} = R_{\beta, f_\varepsilon, j}$. This comes from the fact that, under (C1)-(C3), by setting $\Phi_{\beta, f_\varepsilon, 1}^* = (W f_\beta)^*/f_\varepsilon^*$ and $\Phi_{\beta, f_\varepsilon, 2}^* = (W f_\beta^2)^*/f_\varepsilon^*$, we have
$$\mathbb{E}[Y(t)\, \Phi_{\beta, f_\varepsilon, 2}(U)] = \iiint \mathbb{1}_{x \ge t}\, \Phi_{\beta, f_\varepsilon, 2}(u)\, f_{X,Z}(x, z)\, f_\varepsilon(u - z)\, dx\, du\, dz$$
$$= \iint \mathbb{1}_{x \ge t}\, f_{X,Z}(x, z)\, \frac{1}{2\pi} \int \Phi_{\beta, f_\varepsilon, 2}^*(s)\, e^{-isz}\, f_\varepsilon^*(s)\, ds\, dx\, dz$$
$$= \iint \mathbb{1}_{x \ge t}\, f_{X,Z}(x, z)\, \frac{1}{2\pi} \int \frac{(W f_\beta^2)^*(s)}{f_\varepsilon^*(s)}\, e^{-isz}\, f_\varepsilon^*(s)\, ds\, dx\, dz$$
$$= \mathbb{E}[Y(t)\, (W f_\beta^2)(Z)].$$
Consequently,
$$\mathbb{E}\left[ \int_0^\tau \Phi_{\beta, f_\varepsilon, 2}(U)\, \eta_\gamma^2(t)\, Y(t)\, dt \right] = \mathbb{E}\left[ \int_0^\tau f_\beta^2(Z) W(Z)\, \eta_\gamma^2(t)\, Y(t)\, dt \right],$$
and analogously
$$\mathbb{E}\left[ \int_0^\tau \Phi_{\beta, f_\varepsilon, 1}(U)\, \eta_\gamma(t)\, dN(t) \right] = \mathbb{E}\left[ \int_0^\tau f_\beta(Z) W(Z)\, \eta_\gamma(t)\, dN(t) \right].$$