ANNALES DE L’I. H. P., SECTION B
RICHARD D. GILL
MARK J. VAN DER LAAN
JON A. WELLNER
Inefficient estimators of the bivariate survival function for three models
Annales de l’I. H. P., section B, tome 31, no 3 (1995), p. 545-597
<http://www.numdam.org/item?id=AIHPB_1995__31_3_545_0>
© Gauthier-Villars, 1995, all rights reserved.
Access to the archives of the journal « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implies agreement with the general conditions of use (http://www.numdam.org/conditions). Any commercial use or systematic printing constitutes a criminal offence. Any copy or printout of this file must contain this copyright notice.
Article digitized as part of the programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
Inefficient estimators of the bivariate survival function for three models
Richard D. GILL
Mathematical Institute, Utrecht University, P.O. Box 80.010, 3508 TA Utrecht, Netherlands
Mark J. van der LAAN
Mathematical Institute, Utrecht University, P.O. Box 80.010, 3508 TA Utrecht, Netherlands
Jon A. WELLNER
Department of Statistics, GN-22, B320 Padelford Hall, University of Washington, Seattle, Washington 98195, U.S.A.
Vol. 31, n° 3, 1995, p. 545-597. Probabilités et Statistiques
ABSTRACT. - Three explicit estimators of the bivariate survival function for three types of data are analyzed by proving the analytical and probabilistic conditions necessary for application of the functional delta-method. The method tells us that the estimators converge weakly at $\sqrt{n}$-rate to a Gaussian process and that the bootstrap works well for estimating their asymptotic distribution. We also prove efficiency of the Dabrowska and Prentice-Cai estimators for the bivariate censoring model under independence.
Key words: Functional delta-method, compact differentiability, weak convergence, bootstrap, efficient estimator.
Annales de l’Institut Henri Poincaré - Probabilités Statistiques - 0246-0203
R. D. GILL, M. J. VAN DER LAAN AND J. A. WELLNER
1. THREE MULTIVARIATE MODELS AND APPROACHES TO ESTIMATION
We begin by describing the three models and the resulting estimation problems upon which we will focus. In all three models the parameter to be estimated is the bivariate survival function. In this section we describe three representations of the bivariate survival function, as maps from the distribution function of the data, on which the three estimators for each model are based. The estimators are obtained by substituting the empirical counterpart of the distribution of the data into the representation. Our aim is to prove that these estimators are uniformly consistent and that they converge weakly in supnorm at root-n rate to a Gaussian process. Moreover, we want to show that the bootstrap can be used to estimate the variance of these estimators, and to obtain some efficiency results for them.
The weak convergence and the validity of the bootstrap can be proved by applying the functional delta-method (Gill, 1989; van der Vaart and Wellner, 1995). This means that we have to verify the differentiability of the representation required by the functional delta-method, and the weak convergence of the empirical process which we plug into the representation. We were able to verify these conditions under essentially no conditions on the model. For a formal statement of our results see our final theorem in section 5. We also succeeded in proving that for the bivariate censoring model (our third model) the Dabrowska and Prentice-Cai estimators are efficient under independence. Simulations show that the asymptotic distribution is closely approached for surprisingly small samples (Bakker, 1990; Prentice and Cai, 1992a).
The organisation of the paper is as follows. In section 2 we give the basic techniques, as lemmas, for obtaining the required differentiability result for the representations. In section 3 we prove the differentiability results by applying these lemmas. In section 4 we show how each representation leads to an estimator for each model by just substituting the empirical distribution of the data. In section 5 we verify the weak convergence of these empirical processes, which provides us, by application of the functional delta-method, with results which are summarized in our final theorem. Finally, in section 6 we prove that for the bivariate censoring model the Dabrowska and Prentice-Cai estimators are efficient under independence.
MODEL 1. - Estimation of a bivariate distribution with known marginals. Suppose that $(X_1, Y_1), \ldots, (X_n, Y_n)$ are independent and identically distributed with distribution function $F$ on $\mathbb{R}^2_{\geq 0}$, and suppose further that the marginal distribution functions $F_X$, $F_Y$ of $F$ are known: $F_X = F_X^0$ and $F_Y = F_Y^0$, where $F_X^0$ and $F_Y^0$ are known one-dimensional marginal distributions.
Problem: Use the data and the knowledge of the marginal distributions to estimate the joint distribution $F$.
MODEL 2. - The bivariate "three-sample" model. Suppose that $(X_{1,1}, Y_{1,1}), \ldots, (X_{1,n_1}, Y_{1,n_1})$ are independent and identically distributed with distribution function $F$ on $\mathbb{R}^2_{\geq 0}$, that $X_{2,1}, \ldots, X_{2,n_2}$ are independent and identically distributed with distribution function $F_X = F(\cdot, \infty)$ on $\mathbb{R}$, and that $Y_{3,1}, \ldots, Y_{3,n_3}$ are independent and identically distributed with distribution function $F_Y = F(\infty, \cdot)$ on $\mathbb{R}$. Here the three samples are all independent. We can regard this either as a "missing data" model (the Y's are missing in sample 2 and the X's are missing in sample 3), or as an "auxiliary samples" model in which, in addition to the first sample of (X, Y) pairs, we also have some auxiliary data (samples 2 and 3) concerning the marginal distributions.
Problem: Use all the data to estimate F.
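MODEL 3. - Bivariate random censoring. Suppose that $(S_1, T_1), \ldots, (S_n, T_n)$ are independent and identically distributed with distribution function $F$ on $\mathbb{R}^2_{\geq 0} = [0, \infty)^2$, that $(C_1, D_1), \ldots, (C_n, D_n)$ are independent and identically distributed with distribution function $G$, independent of all of the $(S, T)$'s, and that we observe the componentwise minima $S_i \wedge C_i$ and $T_i \wedge D_i$ together with the corresponding censoring indicators.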
Problem: Use the observed data to estimate F.
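As an illustration of the data structure in model 3, the following sketch simulates bivariate right-censored observations. The exponential distributions, the function name `simulate_bivariate_censoring`, and the convention that an indicator equals 1 for an uncensored coordinate are assumptions made for this example only.

```python
import numpy as np

def simulate_bivariate_censoring(n, rng):
    """Draw n observations under bivariate random censoring (illustrative choices)."""
    # Survival times (S, T): independent exponentials, purely for illustration.
    s = rng.exponential(1.0, size=n)
    t = rng.exponential(1.0, size=n)
    # Censoring times (C, D), drawn independently of (S, T).
    c = rng.exponential(2.0, size=n)
    d = rng.exponential(2.0, size=n)
    # Observed data: componentwise minima and censoring indicators.
    s_obs = np.minimum(s, c)
    t_obs = np.minimum(t, d)
    delta1 = (s <= c).astype(int)
    delta2 = (t <= d).astype(int)
    return s_obs, t_obs, delta1, delta2

rng = np.random.default_rng(0)
s_obs, t_obs, delta1, delta2 = simulate_bivariate_censoring(5, rng)
```

Each coordinate is censored separately, so a pair can be fully observed, censored in one coordinate, or censored in both.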
For all three models we assumed that we have observations in $\mathbb{R}^2_{\geq 0}$. The estimators we propose are invariant under translations. Therefore our results can be generalized to data on $\mathbb{R}^2$.
We have the following relationships between these models. If $n_2$ and $n_3$ are very large relative to $n_1$, then model 2 approximates the known marginals model 1. If we allow $G$ to have three atoms at $(0, \infty)$, $(\infty, 0)$ and $(\infty, \infty)$ adding up to 1, then model 3 includes model 2 with $n_1$, $n_2$ and $n_3$ random.
Many other related models are of interest. For example, suppose that $F$ has marginal distributions $F_X$ and $F_Y$ which are equal, but unknown: $F_X(x) = F_Y(x)$, $x \in \mathbb{R}$. All three models have generalizations to the case of k-variate $F$ with $k \geq 3$, but of course there are many different generalizations of models 1 and 2 depending, for example, on which of the $2^k - 2$ (multivariate) marginals are known in model 1. In this paper we restrict ourselves to the bivariate case. The analyzed estimators have natural generalizations to the k-variate case, and the k-variate analysis can be done by simply using k-variate analogues of the ingredients we use in the analysis for the case $k = 2$; for some of these, see Gill (1990).
Efficient estimation in models 1-3 has proved to be difficult. Bickel, Ritov, and Wellner (1989) have studied estimates of linear functionals of F for model 1 which are efficient under additional regularity conditions. Van der Laan (1992) proposed an implicit modified maximum likelihood estimator for model 3 which depends on a bandwidth $h_n$ (n is the number of observations) and which is proved to be efficient for $h_n \to 0$ slowly enough. The choice of the bandwidth in practice is still an open problem and practical tests remain to be done. Here, too, additional smoothness assumptions were needed. The estimator can be computed with the EM-algorithm, an iterative algorithm which is well known to converge slowly. For information calculations for models 1 and 3, see Bickel, Klaassen, Ritov, and Wellner (1993), sections 6.3 and 6.6, and van der Laan (1992); for information calculations for model 2, see van der Vaart and Wellner (1990). Explicit information calculations turned out to be extremely difficult. Only in the special case of independence in model 3 did we succeed in obtaining an explicit expression for the information bound. These difficulties in constructing efficient estimators, and the fact that they only seem to work under additional regularity assumptions, are a motivation for considering inefficient estimators.
Considerable work on construction of inefficient estimates in model 3 has been done: among them Burke
(1988), Dabrowska (1988) (where the Volterra estimator of P. J. Bickel is also mentioned), Dabrowska (1989), Prentice and Cai (1992a, 1992b), Pruitt (1991c), and van der Laan (1991). Many other explicit inefficient estimators have been proposed (see the reference list in Dabrowska, 1988). Bakker (1990) and Prentice and Cai (1992b) show that the Dabrowska and the Prentice-Cai estimators have a very good practical performance and beat the other proposed explicit estimators. Under independence they approximate the optimal variance surprisingly quickly. Pruitt's (1991c) estimator is an implicit estimator which solves an approximation to the self-consistency equations. This estimator uses kernel density estimators and consequently depends on a bandwidth. His estimator has a good practical performance (Pruitt, 1991b). Here, too, we have the problem of selecting a good bandwidth, and additional smoothness assumptions on the model are necessary (van der Laan, 1991).
In this paper we focus on three inefficient estimators which (except for the Volterra estimator) have been shown to have a very good practical performance. We included the Volterra estimator because its representation is derived similarly to the Prentice-Cai representation, and the analysis of the Dabrowska and Volterra estimators gives the analysis of the Prentice-Cai estimator for free. The estimators are explicit and easy (quick) to compute, in contrast with efficient estimators, which have to be computed with quite computer-intensive algorithms (such as the EM-algorithm in van der Laan, 1992). The estimators are very smooth functions of the observations and therefore they are very robust, i.e. insensitive to small changes of the underlying distributions. Moreover, for each rectangle $[0, \tau]$ for which there is still probability mass left, we do not need any assumptions on the model for obtaining the consistency, weak convergence and bootstrap results on $[0, \tau]$. These last two properties are certainly not shared by efficient estimators.
Our approach to estimation of F in these three models is as follows: we find representations of F as maps $\Phi$ from the distribution of the data, which can be estimated from the observed data, to F. The three particular representations which we study here are given by:
A. Dabrowska's (1988) representation.
B. The Volterra equation.
C. Prentice and Cai (1992a) representation.
We give a new proof of the Prentice and Cai (1992a) representation.
NOTATION AND DEFINITION OF $[0, \tau]$. - If we write $\leq$, $\geq$, $<$, $>$ then this should hold componentwise for both components: so if $x, y \in \mathbb{R}^2$, then $x \leq y$ means $x_1 \leq y_1$, $x_2 \leq y_2$. We often will not use a special notation for the bivariate time-vector; if we do not mean a vector this will be made clear. If $F(t) = P(X \leq t)$ is a distribution function we will denote its survival function with $\bar F(t) = P(X > t)$. All functions we encounter are defined on a rectangle $[0, \tau] \subset \mathbb{R}^2_{\geq 0}$, where $\tau$ can be chosen arbitrarily large except that $\bar F(\tau-) > 0$ and $\bar G(\tau-) > 0$ is required. Finally, we define for a bivariate function $f : \mathbb{R}^2 \to \mathbb{R}$ the supnorm $\|f\|_\infty = \sup_{x \in [0, \tau]} |f(x)|$.
A. Dabrowska's representation
The representation, the estimator and the L-measure were all introduced by Dabrowska (1988), but in a rather different way than we do here; we take the representation in terms of product-integrals as done in Gill (1990) (see also Andersen, Borgan, Gill, Keiding, 1992). We define the following three hazard measures with their heuristic interpretation:
Formally, we introduce a vector hazard function $\vec\Lambda$ on $[0, \tau]$ as follows:
~1(t) -
ER50,
where- -
One of the main advantages of model building in terms of hazards is that they are undisturbed by censoring, and therefore in all three models we can get estimates of the integrated hazards by replacing them by their natural empirical counterparts.
For a bivariate distribution M (i.e. measure), $\prod_{[0,t]} (1 + dM)$ is the bivariate product integral over the rectangle $[0, t]$ (see Gill and Johansen, 1990, or our section 3.2). It is, just like the univariate product integral, defined as the limit of finite products over finite rectangular partitions of $[0, t]$. Now, the following representation can be proved:
L is defined
by
and
f2 represents
the bivariateproduct-integral mapping.
With $\Lambda_{10}(\Delta u, v-) = \Lambda_{10}(u, v-) - \Lambda_{10}(u-, v-)$ we denote the jump of $s \mapsto \Lambda_{10}(s, v-)$ at $u$. Assume that for each $v$, $(u \mapsto F(u, v)) \ll \mu_1$, and for each $u$, $(v \mapsto F(u, v)) \ll \mu_2$, for certain (signed) measures $\mu_1$ and $\mu_2$. We define
assumptions
areeasily
verified for the hazard measuresby choosing
III =
Fi
and 112 =F2,
themarginals
of F. We will see in section 4 that theempirical counterpart in
ofA
is obtainedby replacing
in therepresentation
of
A
Fby
anempirical
survival function.Therefore,
theassumptions
areverified in
exactly
the same wayby choosing
III and 112 themarginals
ofthis
empirical
survival function. We will do this in theproof
of the final theorem in section 5.Note that
by (2)
and(3)
thisgives
a map f such thatThis
representation
caneasily
beheuristically verified, just
as the one-dimensional
Kaplan-Meier product-integral.
For models 1-3 there are natural empirical estimators of $\vec\Lambda$ which generalize the famous Nelson-Aalen estimator from the one-dimensional case; we will do this explicitly and in detail in section 4.
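For orientation, the one-dimensional case can be sketched as follows: the Nelson-Aalen jumps are the number of events divided by the number at risk, and the Kaplan-Meier estimator is their product-integral. The function name `nelson_aalen_increments` and the toy data are assumptions of this example; the bivariate estimators of section 4 are not reproduced here.

```python
import numpy as np

def nelson_aalen_increments(times, events):
    """Return the distinct event times and the Nelson-Aalen jumps d_j / r_j."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    # r_j: number of subjects still at risk just before each event time.
    at_risk = np.array([(times >= u).sum() for u in event_times])
    # d_j: number of observed (uncensored) events at each event time.
    deaths = np.array([((times == u) & events).sum() for u in event_times])
    return event_times, deaths / at_risk

times = [2.0, 3.0, 3.0, 5.0, 7.0, 8.0]
events = [1, 1, 0, 1, 0, 1]  # 1 = event observed, 0 = censored
grid, jumps = nelson_aalen_increments(times, events)
# Kaplan-Meier as the product-integral of the Nelson-Aalen estimator.
kaplan_meier = np.cumprod(1.0 - jumps)
```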
If we denote the estimate of $\vec\Lambda$ with $\vec\Lambda_n$, then the estimate of F based on Dabrowska's representation is simply
This estimator was studied in the case of model 3 by Dabrowska (1988, 1989). Gill (1990) generalized the representation to dimension $k \geq 2$ and analyzed the estimator by applying the functional delta-method.
B. The Volterra equation
This equation is derived by extending the following argument for k = 1: let
$\Lambda(t) = \int_{[0,t]} \frac{F(ds)}{\bar F(s-)}$
be the cumulative hazard function corresponding to F. Then $F(ds) = \bar F(s-)\,\Lambda(ds)$ and consequently
$\bar F(t) = 1 - \int_{[0,t]} \bar F(s-)\,\Lambda(ds)$.
For a given function $\Lambda$, this is a homogeneous Volterra equation for $\bar F$, where the solution is given by the Peano series (a special Neumann series)
$\bar F = 1 + \sum_{i=1}^{\infty} A^i(1)$,
where $A(\bar F) = -\int_{[0,\cdot]} \bar F_- \, d\Lambda$. In this case, k = 1, this is solved explicitly by the product-integral of $\Lambda$:
$\bar F(t) = \prod_{[0,t]} (1 - d\Lambda)$.
For theory about the univariate product-integral, and in particular the equivalence between the univariate Peano series and the univariate product-integral, we refer to Gill and Johansen (1990).
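The equivalence between the Peano series and the product-integral can be checked numerically for a purely discrete hazard. The sketch below assumes the standard univariate forms $\bar F(t) = 1 - \int \bar F(s-)\,\Lambda(ds)$ and $\bar F(t) = \prod (1 - d\Lambda)$; the function names and the jump sizes are hypothetical.

```python
import numpy as np

def survival_by_product_integral(hazard_jumps):
    """Product-integral: running product of (1 - dLambda) over the jump points."""
    return np.cumprod(1.0 - hazard_jumps)

def survival_by_peano_iteration(hazard_jumps):
    """Iterate S <- 1 - integral of S(s-) dLambda(s), starting from S = 1."""
    k = len(hazard_jumps)
    surv = np.ones(k)
    for _ in range(k):
        # Left limits S(t_j-): the value just before each jump (1 before the first).
        left = np.concatenate(([1.0], surv[:-1]))
        surv = 1.0 - np.cumsum(left * hazard_jumps)
    return surv

jumps = np.array([0.1, 0.25, 0.0, 0.4, 0.5])
s_prod = survival_by_product_integral(jumps)
s_peano = survival_by_peano_iteration(jumps)
```

With k jump points the iteration is exact after k passes, because each pass fixes one more coordinate of the solution.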
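For k = 2 the argument generalizes as follows. For F on $[0, \tau]$ and $\Lambda_{11}$ as defined above,
$F(ds) = \bar F(s-)\,\Lambda_{11}(ds)$,
where $\bar F(x) = P(X > x)$. This implies that
$F(t) = \int_{[0,t]} \bar F(s-)\,\Lambda_{11}(ds). \quad (5)$
It remains only to relate $\bar F$ to $F$ and the marginal distributions: let $F_1$ and $F_2$ denote the marginal distributions of F. Then since
$\bar F(t) = 1 - F_1(t_1) - F_2(t_2) + F(t)$,
(5) yields
$\bar F(t) = \Psi(t) + \int_{[0,t]} \bar F(s-)\,\Lambda_{11}(ds), \quad (6)$
where $\Psi(t) = 1 - F_1(t_1) - F_2(t_2)$ involves only the marginal distributions $F_1$ and $F_2$. Regarded as an equation for $\bar F$ given fixed functions $\Psi$ and $\Lambda_{11}$, (6)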
is an inhomogeneous Volterra equation with a unique solution $\Phi(\Psi, \Lambda_{11})$ [Gill and Johansen (1990), Kantorovich and Akilov (1982), p. 396]. This can be seen as follows. Represent the equation as $(I - A_\Lambda)\bar F = \Psi$, where $A_\Lambda(\bar F)(t) = \int_{[0,t]} \bar F_- \, d\Lambda$. It is easy to check that this structure takes care that
where one has to notice that by definition of $\tau$ the supnorm (over $[0, \tau]$) is bounded.
Consequently, $\sum_{k=0}^{\infty} A_\Lambda^k$ is a bounded operator:
This proves that $\bar F$ is given by the Neumann series of $A_{\Lambda_{11}}$:
$\bar F = \sum_{k=0}^{\infty} A_{\Lambda_{11}}^k \Psi. \quad (7)$
Because $A_{\Lambda_{11}}$ depends only on $\Lambda_{11}$, (7) defines a map
It is not clear from (7) that $\bar F$ also depends continuously on $\Lambda_{11}$, but we will prove in section 3, as in Gill and Johansen (1990), that the bivariate Peano series, and thereby $\Phi$, satisfies the characterization of weakly continuously compactly differentiable at $(\Psi, \Lambda_{11})$.
Finally, it should be noticed that because of the exponential convergence of the terms to zero, (7) also provides us with an exponentially fast algorithm for finding a solution of the Volterra equation for known $(\Psi, \Lambda_{11})$.
Now, the Volterra estimator of $\bar F$ is given by:
C. Prentice-Cai representation
We give a new proof of the Prentice-Cai representation (see also Prentice and Cai, 1992a). For still another proof, see Wellner (1993). For this we need the following differentiability rules for $U : \mathbb{R} \to \mathbb{R}$ and $V : \mathbb{R} \to \mathbb{R}$:
If we apply these one-dimensional rules to the sections $u \mapsto F(u, v)$ and $v \mapsto F(u, v)$ of a bivariate function F, then we denote these with $d_1$ and $d_2$, respectively. We apply these two one-dimensional rules to each of the two variables of $R \equiv \bar F/(\bar F_1 \bar F_2)$ in turn in order to express $dR = d_{12}R = d_1(d_2(R))$ as follows:
for a certain measure $\tilde L$. Define the well-known univariate hazards
A2( ds2) ==
The reader can easily verify the following result [when applying the product rule to $\bar F/\bar F_i$ we give the left continuous version to $\bar F$ instead of one of the $\bar F_i$, $i = 1, 2$, and we denote
F(~i)(~i~2) = ~1-~2). F - 2> ( s i , s2 ) > F(SI, S2- )):
At the third
equality
notice that1/(1 - Al(6sl))(1 -
Integrating the left and right-hand side over the rectangle $(0, t]$ provides us with:
Here one has to notice that $R(t_1, 0) = R(0, t_2) = 1$ for all $t$. This is a homogeneous Volterra equation with a unique solution given by the Peano series of $\tilde L$ which we will write out below. By definition of $R$ and the well-known product integral representation of the univariate $\bar F_i$, $i = 1, 2$, this provides us with the following representation for the bivariate survival function:
where $R$ is the unique solution of (9), just the Neumann series as given in (7), given by the Peano series:
Define
Above, we derived the following representation of $\tilde L$ in terms of the hazards
A
Of course, we can write
63 (A),
but because of model 1 (marginals known) we want to distinguish between hazards which are known in model 1 and hazards which are not known in model 1.
Note that this gives a map
Again, the estimate of F based on the Prentice and Cai representation is simply
Prentice and Cai (1992a) motivated this representation through a connection between $\tilde L$ and the covariance of univariate counting process martingales. Moreover, they proved almost sure consistency of the resulting estimator for model 3 via continuity of $\Theta$.
Remark. - Firstly,
the Volterra estimator is based on the idea of expressing $dF$ as $\bar F_- \, d\Lambda$ for a certain measure $\Lambda$, which makes $\bar F$ a solution of an inhomogeneous Volterra equation, while in Prentice-Cai's representation we do the same with $R$, which leads in this case to a homogeneous Volterra equation. The Volterra estimator uses an estimate of only one bivariate hazard, $\Lambda_{11}$, while the Dabrowska and the Prentice-Cai representations involve other functions $L$ and $\tilde L$ which describe the covariance structure. Furthermore, notice the similarity in the structure of $L$ and $\tilde L$.
Elaborating on the functional delta-method and results
Our approach to studying the estimators, which we will denote by
Fn , FD, F n
will be to study the maps $\Phi$, $\Gamma$ and $\Theta$ which define them (analytically). In sections 2 and 3 we show that these satisfy the characterization of weakly continuous Hadamard differentiability with respect to the supremum norm for the sequences which can occur in practice. In section 4 we represent $\vec\Lambda$ as maps from the distribution function of the data to $\vec\Lambda$ itself for all three models. By applying the functional delta-method to these representations we prove the required weak convergence and validity of the bootstrap for $\vec\Lambda_n$. Now, application of the functional delta-method to the mappings $\Theta$ and $\Gamma$ provides us with consistency, weak convergence, and asymptotic validity of the bootstrap for the estimators. These results are summarized in Theorem 5.1.
This method of analysis provides us with optimal results for the estimators in the sense that we essentially do not need any conditions. The only improvements can be made by extending these results to the whole plane and by investigating the rate at which the normalized estimators converge to their linearization in terms of the empirical processes we plug in.
This analysis of an estimator separates the analysis into a purely analytical part (differentiability of $\Phi$) and a purely probabilistic part [weak convergence of $Z_n = \sqrt{n}(\vec\Lambda_n - \vec\Lambda)$]. After establishing purely analytical properties of components of $\Phi$ one can deduce several results for different sampling methods, models (e.g. our models 1-3) or statistics without repeating the analysis. The supnorm might be considered a quite naive choice in order to get an optimal weak convergence result, but the supnorm is easy to use, easy to interpret, and has a natural generalization to higher dimensions.
After establishing the differentiability of the functionals which appear in the representations $\Gamma$ and $\Phi$ we got the differentiability of the Prentice-Cai representation for free: by the chain rule a differentiability result for a functional can be used for establishing differentiability for any composition of several mappings (like the three representations) which involves this functional. Because of this property of the analysis, other proposed explicit estimators can immediately be put in our framework.
2. LEMMAS
In this section we give the basic techniques, as lemmas, for obtaining the differentiability result for the representations required by the functional delta-method. Moreover, we will give two illustrations which show how these techniques easily lead to differentiability of relevant functionals.
DEFINITIONS AND NOTATION. - All functions we encounter are considered as elements of the space of bivariate cadlag functions on $[0, \tau]$ (Neuhaus, 1972) endowed with the supnorm. We will denote this space with $D[0, \tau]$. If $F_n$ converges in supnorm to F, then we will denote this with $F_n \to F$ (this does not lead to problems because we only use supnorm convergence). The variation norm of a function F is defined as usual as the supremum over all partitions into rectangles of the sum of the absolute values of the generalized differences of F over the partition elements (i.e. rectangles), and it will be denoted $\|F\|_v$. The uniform sectional variation norm of a bivariate function F is defined as the supremum of the variations of the sections $u \mapsto F(u, v)$, $v \mapsto F(u, v)$, $(u, v) \in [0, \tau]$, and the variation of F itself. It will be denoted $\|F\|_v^*$. If a cadlag function is of bounded variation, then it generates a signed measure. If for a function F we write $F(du, v)$, $F(u, dv)$, $F(du, dv)$, we mean the one-dimensional measures generated by the sections $u \mapsto F(u, v)$, $v \mapsto F(u, v)$ and the two-dimensional measure generated by $(u, v) \mapsto F(u, v)$, respectively, and it will be automatically assumed that these sections and the function itself are of bounded variation. Finally, integrals like $\int F(du, v)\, G(u, dv)$ are defined as $\int \frac{dF(\cdot, v)}{d\mu_1}(u) \frac{dG(u, \cdot)}{d\mu_2}(v) \, d\mu_1(u) \, d\mu_2(v)$, assuming that for each $v$ the mapping $(u \mapsto F(u, v)) \ll \mu_1$ and for each $u$ the mapping $(v \mapsto G(u, v)) \ll \mu_2$ for certain (signed) measures $\mu_1$ and $\mu_2$. This assumption is satisfied by a simple choice of measures $\mu_1$, $\mu_2$ in all our applications, as will be shown in the proof of the final theorem in section 5 (see also illustration 2 at the end of this section).
LEMMA 2.1 (Telescoping). - Let $a_i$, $i = 1, \ldots, k$, and $b_i$, $i = 1, \ldots, k$, be real numbers. Then
$\prod_{i=1}^{k} a_i - \prod_{i=1}^{k} b_i = \sum_{j=1}^{k} \Big(\prod_{i<j} a_i\Big)(a_j - b_j)\Big(\prod_{i>j} b_i\Big)$.
This can be easily verified and it also holds for matrices (see Gill and Johansen, 1990). It is a very useful lemma for proving that differences of two products converge to zero, and that is what we often have to do in the differentiability proofs.
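A quick numerical check of the telescoping identity for real numbers, written in the ordering used above (a-factors before the difference, b-factors after it). The function name `telescoping_sum` and the test values are assumptions of this example.

```python
import numpy as np

def telescoping_sum(a, b):
    """Sum over j of prod(a[:j]) * (a[j] - b[j]) * prod(b[j+1:])."""
    total = 0.0
    for j in range(len(a)):
        # np.prod of an empty slice is 1, which handles the first and last terms.
        total += np.prod(a[:j]) * (a[j] - b[j]) * np.prod(b[j + 1:])
    return total

a = np.array([0.9, 1.2, 0.5, 2.0])
b = np.array([1.1, 0.7, 0.6, 1.5])
lhs = np.prod(a) - np.prod(b)
rhs = telescoping_sum(a, b)
```

Each summand contains exactly one difference $a_j - b_j$, which is why a bound on the individual differences transfers to the difference of the products.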
In our applications we want to be able to define integrals $\int F \, dH$ when H is of unbounded variation. This can be done by applying integration by parts so that H appears as a function.
LEMMA 2.2 (Integration by Parts).
Notice that with these formulas we can also define these integrals for H of unbounded variation. Also, we can bound $\int F \, dH$ by
Notice that if F is zero at the edges of $[0, \tau]$, then only the first term at the right-hand side is non-zero.
Proof. - We refer to Gill (1990) for the general case. It works as follows. For the first integral, substitute
and for the second integral substitute
and apply Fubini's theorem. □
This integration by parts formula is a special case of the following general integration by parts formula by letting $A_s = (0, s]$, but it is one which takes account of mass given to the edge of the rectangle $[0, \tau]$: Let $A_s \subset \mathbb{R}^2$ be a measurable set indexed by $s \in \mathbb{R}^2$. Then we have:
Assume that $F_1$, $F_2$ are of bounded variation and that $F_i(s_1, s_2)$ or $F_i(s_1-, s_2)$ or $F_i(s_1, s_2-)$ or $F_i(s_1-, s_2-)$ is cadlag, $i = 1, 2$. Then $F_1$ and $F_2$ generate signed measures. We have
So by applying lemma 2.2 twice to
we can do integration by parts so that H appears as function and $F_1$, $F_2$ as measures.
The following lemma is trivially checked, but useful.
LEMMA 2.3 (d-Δ interchange). - We have:
Recall the denominator in the mappings $L$ and $\tilde L$ in $\Gamma$ and $\Theta$, which are of the form $1/\{(1 - a)(1 - b)\}$, where a, b are nice functions in one coordinate only, and therefore certainly do not generate a measure. Therefore it is not clear how we can integrate w.r.t. this denominator. The following lemma will take care of this problem.
LEMMA 2.4 (Denominator splitting). - Let $a_1$, $a_2$ be real numbers. Then the following holds:
or
In general, we have:
This follows from the identity $\frac{1}{1 - a} = 1 + \frac{a}{1 - a}$.
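The identity $1/(1 - a) = 1 + a/(1 - a)$ can be applied to each factor of a product denominator. The sketch below checks one such expansion for two factors; the expanded four-term form and the name `split_denominator` are assumptions of this example and need not coincide with the exact statement of the lemma.

```python
def split_denominator(a1, a2):
    """Expand 1/((1-a1)(1-a2)) using 1/(1-a) = 1 + a/(1-a) in each factor."""
    r1 = a1 / (1.0 - a1)
    r2 = a2 / (1.0 - a2)
    # (1 + r1)(1 + r2) multiplied out into four separate terms.
    return 1.0 + r1 + r2 + r1 * r2

a1, a2 = 0.3, -0.45
direct = 1.0 / ((1.0 - a1) * (1.0 - a2))
expanded = split_denominator(a1, a2)
```

The point of such a splitting is that each term depends on the two coordinates through separate factors, so it can be integrated coordinate by coordinate.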
Now, we are able to define the following terms with integration by parts as follows:
COROLLARY 2.1. - Define $\beta(u, v) \equiv (1 - \Lambda_{10}(\Delta u, v))(1 - \Lambda_{01}(u, \Delta v))$. We have:
H plays the role of a function of unbounded variation (Brownian bridge) and $\Lambda$, $\Lambda_{10}$, $\Lambda_{01}$ are cadlag functions of bounded variation. Notice that all terms at the right-hand side of the equalities where H appears as measure are of the form $\int F \, dH$ where F generates a finite measure. Therefore, for all these terms we can apply the integration by parts formulas of Lemma 2.2 in order to make H appear as function. Again, this corollary is simple to prove by applying denominator splitting and d-Δ-interchange. In the differentiability proof of the L-mapping we have to be able to bound the terms above in the supnorm of H. It is now clear that this can be done with the integration by parts formulas.
We will see that this is the whole story of the differentiability proofs: we use denominator-splitting and d-Δ-interchange in order to get an integral $\int F \, dH$, where F generates a measure (so it must be of bounded variation), and then we apply our integration by parts formulas in order to bound these terms in the supnorm of H.
We did not
yet deal with an integral of the form $\int H \, dF_n$, where $F_n$ converges to zero in supnorm and H is of unbounded variation, which we want to show to converge to zero. Here, one cannot do integration by parts in order to bound this in the supnorm of $F_n$, because H is not of finite variation. The next ingredient takes care of this, the so-called Helly-Bray technique:
LEMMA 2.5 (Helly-Bray lemma). - If $H \in D[0, \tau]$ is of unbounded variation, then we can approximate H with a sequence $H_m$ whose variation norm is bounded by $M(m) < \infty$ and for which $\|H - H_m\|_\infty \to 0$. This gives us the following bound:
For $H_m$ one can (e.g.) take the step function equal to H on a grid $\pi_m$.
We did the substitution $H = (H - H_m) + H_m$, integration by parts, and bounding of terms like $\int F \, dH_m$ by
The bound in Lemma 2.5 is useful because it proves that integrals of the form $\int H \, dF_n$ converge to zero if $F_n$ converges to zero in supnorm, even if H is of unbounded variation, provided the variation of $F_n$ is bounded uniformly in n (just let $m \to \infty$ slowly enough).
ILLUSTRATION 1. - We will illustrate how these lemmas easily provide us with the characterization of compact differentiability of $(F, G) \mapsto \int F \, dG$ at a point (F, G) with F and G cadlag functions of bounded variation, for sequences $F_n$, $G_n$ of uniformly (in n) bounded variation: if $Y_n$ and $Z_n$, the normalized differences of $F_n - F$ and $G_n - G$, converge in supnorm to $Y$ and $Z$, then
for a certain continuous linear mapping from $(D[0, \tau])^2$ to $\mathbb{R}$. We have by telescoping:
So if we subtract from this its supposed limit
then we get (again) by telescoping:
where the last two integrals are defined by integration by parts. The first integral can immediately be bounded by a term which converges to 0. The second integral converges to zero by the Helly-Bray lemma. For the third integral we can do integration by parts with respect to $F_n$ and thereby bound this term by $c\|Z_n - Z\|_\infty \to 0$.
ILLUSTRATION 2. - We will
give an illustration of how these lemmas are used to prove convergence to zero of quite complicated terms which we will encounter in our analysis of Dabrowska's estimator. Consider the term $\int (1/\beta_n(u, v) - 1/\beta(u, v))\, H(du, dv)$, where H is of unbounded variation, and $\beta_n(u, v)$, $\beta(u, v)$ are the denominators of L as defined above corresponding to $(\Lambda_{10}^n, \Lambda_{01}^n)$ and $(\Lambda_{10}, \Lambda_{01})$, where $(\Lambda_{10}^n, \Lambda_{01}^n)$ converges in supnorm to $(\Lambda_{10}, \Lambda_{01})$.
We will show that this term converges to zero if $\Lambda_{10}$, $\Lambda_{01}$, $\Lambda_{10}^n$, $\Lambda_{01}^n$ have the following four properties:
1) $\beta > \delta > 0$ on $[0, \tau]$ for certain $\delta > 0$.
2) There exists a sequence of uniformly in n finite (signed) measures $\mu_{2n}$ so that $\Lambda_{10}^n(u, dv) \ll \mu_{2n}(dv)$ for all u. Similarly for $\Lambda_{10}$, $\Lambda_{01}$, $\Lambda_{01}^n$.
3) There exists a sequence of uniformly in n finite (signed) measures $\mu_{1n}$ so that $\Lambda_{10}^n(du, v) \ll \mu_{1n}(du)$ for all v. Similarly for $\Lambda_{10}$, $\Lambda_{01}$, $\Lambda_{01}^n$.
4) $d\Lambda_{10}^n(u, \cdot)/d\mu_{2n} \leq M$ and $d\Lambda_{10}^n(\cdot, v)/d\mu_{1n} \leq M$ for certain $M < \infty$ (uniform boundedness of the Radon-Nikodym derivatives). Similarly for $\Lambda_{10}$, $\Lambda_{01}$, $\Lambda_{01}^n$.
In our applications the assumptions 2-4 are easily verified by a simple choice of $\mu_{1n}$, $\mu_{2n}$, and by definition of $[0, \tau]$ assumption 1 will hold trivially. This will be done in the proof of the final theorem in section 5.
This term involves all the above techniques. Apply the denominator splitting lemma to rewrite $1/\beta_n(u, v) - 1/\beta(u, v)$. This gives
Then the integral is the sum of three integrals which we will denote with A, B and C respectively. The first term A is given by:
We did the substitution $H = (H - H_m) + H_m$ (Helly-Bray) and applied d-Δ-interchange. Consider the first term, say A1.
A1. Here, we can apply the second integration by parts formula of Lemma 2.2. Then one of the terms we get is the following:
where we used $\beta_n > \delta > 0$ on $[0, \tau]$ for certain $\delta > 0$ in the last line, which follows from assumption 1 and the uniform convergence of $\beta_n$ to $\beta$. The other terms one gets after applying integration by parts are dealt with in the same way. By assumptions 2-4 we have:
for certain $M' < \infty$. So if $\Lambda_{10}^n$ satisfies assumptions 1-4, then A1 converges to 0 for $m \to \infty$. The other terms are dealt with similarly using the assumptions 1-4 for $\Lambda_{10}^n$ and $\Lambda_{01}^n$.
A2. The second term can be bounded by the supnorm of the difference of the corresponding terms in $\Lambda_{10}^n(\Delta u, v)$ and $\Lambda_{10}(\Delta u, v)$ (which converges to zero) times the variation norm of $H_m$. So if we let $m = m(n) \to \infty$ slowly enough for $n \to \infty$, then both terms A1 and A2 converge to zero.
The second term B is dealt with similarly. Now, we will deal with the third term C. Firstly, by telescoping we can rewrite:
We have to integrate these terms with respect to H. We set $H = (H - H_m) + H_m$ (here an application of the Helly-Bray lemma starts). By using the d-Δ-interchange trick we can transform all three terms with $H - H_m$ into integrals where $H - H_m$ appears as a function: e.g.
So if assumptions 1-4 hold, then as we did above we can bound this term by
Similarly, we have this bound for the other terms with $H - H_m$. The three terms with $H_m$ we can directly bound by
where $M(m)$ stands for a constant times the variation norm of $H_m$. So we can conclude that we have the following bound:
where $\epsilon_n$ converges to zero. Let now $m \to \infty$ slowly enough to obtain that this bound converges to zero (here ends Helly-Bray). This proves the statement.
In general all terms we will encounter in the differentiability proofs are dealt with in the following way:
Telescoping
STEP 1. - Firstly, we do telescoping in order to rewrite a difference of two products as a sum of single differences: $\int A_n B_n - \int AB = \int (A_n - A)B + \int A_n(B_n - B)$. Consider one term, e.g. $\int (A_n - A)B$. Here, we know that $A_n \to A$, but $A_n$ can appear as a measure in one or two coordinates: $\int (A_n - A)(du, dv)\, B(u, v)$, $\int (A_n - A)(du, v)\, B(u, dv)$, or the easiest case $\int (A_n - A)(u, v)\, B(du, dv)$.
Goal
STEP 2. - We want to bound the term $\int (A_n - A)B$, where we usually have that $A_n - A$ appears as a measure, in the supnorm of $A_n - A$, which is known to converge to zero. Therefore our goal is to get this term in a form so that we can apply integration by parts with respect to B.
Denominator-splitting, d-Δ-interchange
STEP 3.
Case 0. - If $A_n - A$ appears as function we do not even have to apply integration by parts and we are ready.
Case 1. - If B generates a measure of bounded variation, or is a product of such functions (of bounded variation but some left and some right continuous), we can bound the term in the supnorm of $A_n - A$ by applying the integration by parts formula of Lemma 2.2.
Case 2. - If B is of unbounded variation, we substitute $B = (B - B_m) + B_m$ and we now want to bound the term with $B - B_m$ in the supnorm of $B - B_m$ and the term with $B_m$ in the variation norm of $B_m$ (Helly-Bray Lemma 2.5). We go back to step 3.
Case 3. - If B involves the denominator $\beta$ we firstly apply the denominator trick Lemma 2.4 and d-Δ-interchange Lemma 2.3 as in Corollary 2.1 in order to rewrite the term to a term of Case 0 or 1.
3. DIFFERENTIABILITY OF THE DABROWSKA, VOLTERRA, AND PRENTICE AND CAI REPRESENTATIONS OF F
In this section our goal is to establish Hadamard differentiability of the Volterra, Dabrowska, and Prentice-Cai representations of F. In fact, we will prove that each of these representations is continuously Hadamard differentiable, thereby paving the way for validity of the bootstrap in each case.
NOTATION AND ASSUMPTIONS ON SEQUENCES. - For any symbol which occurs as argument of the analyzed mapping, say $\Lambda$, $\Lambda_n$ and $\Lambda_n^{\#}$ are sequences which both converge in supnorm to $\Lambda$, and moreover it will be automatically assumed that they are of bounded variation uniformly in n. $\Lambda_n$ plays the role of the estimator of $\Lambda$ using the original data and $\Lambda_n^{\#}$ plays the role of the same estimator, but using a bootstrap sample of the original data.
3.1. The Volterra Representation
We do the proof of the Volterra representation before the proof of the bivariate product integral (as part of the Dabrowska representation), because