A NNALES DE L ’I. H. P., SECTION B
J EAN B RETAGNOLLE
J AN M IELNICZUK
On asymptotic minimaxity of the adaptative kernel estimate of a density function
Annales de l’I. H. P., section B, tome 25, n
o2 (1989), p. 143-152
<http://www.numdam.org/item?id=AIHPB_1989__25_2_143_0>
© Gauthier-Villars, 1989, tous droits réservés.
L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les condi- tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti- lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
On asymptotic minimaxity of the adaptative kernel
estimate of
adensity function
Jean BRETAGNOLLE and Jan MIELNICZUK
(*)
Universite de Paris-Sud U.A. 743, C.N.R.S., Statistique Appliquee, Mathématiques, Bat. n° 425,
91405 Orsay Cedex, France
Ann. Inst. Henri Poincaré,
Vol. 25, n° 2, 1989, p. 143-152. Probabilités et Statistiques
ABSTRACT. - The
adaptative
kernel estimate of adensity
function witha data
dependent
bandwidtharising naturally
as anempirical counterpart
ofMSE-optimal parameter
isdealt
with. It is shown to beasymptotically
minimax in a
family
of densitieshaving
bounded second derivative in aneighbourhood
of 0. As aby-product
asimple proof
of Farell’s[4]
resulton minimax bounds for
density
estimates is obtained.Key words : Data dependent bandwidth, kernel estimator, mean square error, minimax estimate.
RESUME. - On considère l’estimateur a noyau
(adaptatif)
d’une densitéconstruit pour minimiser
asymptotiquement
lerisque quadratique
en unpoint.
On prouvequ’il
est minimax dans la famille des densitésayant
une dérivée seconde bornée auvoisinage
dupoint.
Du même coup on trouveune preuve
simple
des résultats de Farrell[4]
basée surl’inégalité
deCramer-Rao.
Mots clés : Estimateur a noyau, fenêtre adaptative, risque quadratique, estimateur mini-
max.
Classi fication A.M.S. : 62 G 05, 62 G 20.
(*) On leave from Institute of Computer Science, Polish Academy of Sciences Warsaw, Poland. Supported by the grant of the French government, No. 93572.
Annales de l’Institut Henri Poincaré - Probabilités et Statistiques - 0294-1449
Vol. 25/89/02/143/10/$3,00/6) Gauthier-Villars
144 J. BRETAGNOLLE AND J. MIELNICZUK
1. INTRODUCTION
Let
Xi,
...,Xn
beindependent
real random variables with a commondistribution function F and a
density f
withrespect
toLebesgue
measureon R. Consider the
problem
ofnonparametric
estimationof 1(0) by
meansof kernel
estimate f ’h ( o) (Parzen [7]):
where kernel K
integrates
to 1 andh = h (n)
is asmoothing
parameter(bandwidth).
The natural bandwidth’s choice isprovided by computation
of the mean square error
R ( f, fh) = E f ( f (o) - fh (O)) 2;
under standardassumptions
onK, ~ f , f ":
If
f (o) ~ f " (o) ~ 0 minimizing
main terms of( 1. 2) yields ho satisfying
Replacing f(O)
andf " (0) by
somepreliminary
estimates weget
a naturalempirical counterpart ho, n
ofho
with thefollowing property
of theresulting adaptive
estimate(Woodroofe [11])
(Recall
thatho
andhho depend
onn. )
The
problem
offinding
suitable modification of(1.1)
which for fixeddensity f
willyield
smaller mean square error attracted attention of many authors. Theproposed
methods consist in bias reductionby considering
of so-called Parzen kernels
(Parzen [7]), geometric extrapolation (Tarell
and Scott
[10])
orchoosing appropriately
constructed bandwidthsdepend- ing
on data andpoint
at whichdensity
is estimated(Abramson [1]).
Suchprocedures usually yield
an estimate which is not a bona fidedensity
function
although
there exist methods to tackle with thisproblem (cf Gajek [5]).
All theseresults, however,
leave unanswered thequestion
about
possible improvement
or kernel estimateuniformly
over reasonable145
ASYMPTOTIC MINIMAXITY OF ADAPTATIVE KERNEL ESTIMATE
family
of densities. Forexample,
Farrell’s[4]
result shows that it is notpossible
toget
a better rate thann - 4~5
whenconsidering
thefamily
~ ofdensities
having
bounded second derivative in aneighbourhood
of 0 butimprovement
of a constant is apriori possible.
For the relatedproblems
in the area of
global
estimation we refer toBretagnolle
and Huber[3].
We show however that the answer to this
question
isnegative, namely
that the
adaptive
kernel estimator isasymptotically
minimax in ~ . The method is to construct aparametric family {f03B8}
of densitiesintrinsically joined
with the kernel estimatefh’
forwhich {f03B8}
oe W andcan be minorized for an
arbitrary
estimateIn of f(0).
As aby-product,
Farrell’s result on a minimax bound for
density
estimates is obtained.Note. - Our referee
pointed
out to our attention two recentpreprints
from Brown and Farrell
[12]
and from Donoho and Liu[13]: Estimating f(0)
in various families(slightly
different ofours) they
compare minimax lower bounds(on
allestimators)
to minimax lower bounds on kernel estimators or linearprocedures.
Brown and Farrellgive
tables forfinite samples.
2. THE CONSTRUCTION
From now on we will
impose
thefollowing
conditions on K and h(n):
(a)
kernel K is a functionintegrating
to1;
(b) ~ x : K {x) ~ 0 ~
c[ - A, A] for some OAoo;
Let us observe that if K is
nonnegative
function such that(a)
and(c) hold,
then(d)
isautomatically
satisfied.Let us
denote fo
anarbitrary
smoothdensity equal
to 1 on[ -1 i4, 1 /4}
and
put Kh ( . ) = h -1 K ( . /h).
From now on we will assume thatso that the support of
Kh
is contained in[-1/4, 1/4].
Wedefine eh by (we
center and reduceKh
wrtfo),
thusVol. 25, n° 2-1989.
146 J. BRETAGNOLLE AND J. MIELNICZUK
Moreover, using (c)
weget
Let 8 be some real number. If we assume
ph (o) = 0 (a + h)/~, h _ 1/2,
is a
density.
We shall useparametric
families=
f fe, ~ ~ 9 E O ~
where sup Ph(o) _ 1/2.
8
The
density f03B8,h
can berepresented
in theequivalent
wayLet f~
=fn (X
1, ...,~n)
denotearbitrary
estimate off (o).
The results arebased on the
following
LEMMA. - Let
robe (a - Taking
at stage n where n = T >1,
we haveProof -
First we observethat,
sincenh (n)
-~ ooby
condition(e),
thecondition on ph
(0)
isasymptotically
fulfilled as lim sup ph(8)
= 0.n 9 E 9n
Further
thus the bias
(o) - fe (0)
= 0( (3 - oc)/~,
h.Moreover
and
straightforward computations using (2. 1), (2.2)
and(2. 3) give:
Thus
using variance-squared
biasdecomposition
offh)
we haveOn the other
hand, putting cp (8) = E~ ( f~ (o))
andusing
Cramer-Rao’sbound,
147
ASYMPTOTIC MINIMAXITY OF ADAPTATIVE KERNEL ESTIMATE
where
(Cramer-Rao’s
conditions are fullfilled since allexpressions
arepolynomial
wrt
8). Denoting by
R( fn/fh)
the ratio of therespective
mean square errorswe have
Setting cp { 8) =1 + ~8 + ~, Y ( 8),
weget, using ( 2 . 1 ) :
Substituting
andwe obtain
Since lim then
denoting by Ty
the supremum of then e
above
expression
on[-T~~2, T1~2]
it isenough
to show thatTy
is bounded from belowby
theexpressions given
in(2. 4).
Let usproceed by
contradic-tion. Assume that 1 and
consider
R definedby:
Since R
(t)
is convex, with R( 1) 1-_ R (0), R’ (o) _- o,
thusrxy- y’ 0.
Whence
putting
we obtain that z isnondecreasing.
Observe futher that
y (x) _ -y ( - x) provides
the same bound T.Thus, by convexity,
r is greater that the bound obtained for(y
+y)/2.
In otherwords,
we can assume y an oddfunction, whet, together
withz(0)=0 implies
foro _ x T l2.
Consider first the case Note that r > ro.
Since y
ispositive
on[0,
it is sufficient toverify
thatinequality
holds atx = Tl/2
where(using y > 0)
it isimplied by ro T > (1 + ro T) ( 1- (ra T) -1).
For
r = ro = 0
we argue as follows: ify (T1~2) _ 1- T -1~2
then in view of there existsT1~2]
such thaty’ (s) _ T-1~2 -1/T
and1 -Y’ (s) ~ ~ - T’ 1~2~
DVol. 25, n° 2-1989.
148 J. BRETAGNOLLE AND J. MIELNICZUK
3. RESULTS
Let us define
L) (y = k + cx, 0 a __ 1)
the class of functions pos-sessing
k-th order derivative(k >_ 1)
such thatfor -1/4 _ x, y _ 1/4
We will show that the lemma
yields
instraightforward
manner the minimax bounds ondensity
estimates(Farrell [4]),
see also theorem 5 .1 inIbragi-
mov, Hasminskii
[6]).
COROLLARY 1.
Proof -
In order to prove thecorollary
1 let us consider anarbitrary,
nonuniform kernel
satisfying
conditions(a) - (d)
and such that Holdercondition
(3 . 1)
is satisfiedby K(k)
with constantC (K)
and the power a.It is easy to see that in this case
_fR
definedby (2. 3) satisfy
Thus if E>h and
then
L).
Underassumption nh2Y + 1 = C
the condition(3. 2)
isequivalent
to thefollowing
condition on the range of 8: e E[
-8max]
where Observe also that the
proof
of thelemma
yields
forsufficiently large n
Thus
using (2 . 4)
we obtain(?-o > 0)
(1+o(1))R(.~e~.f’n)~~’’o2~+uC~~t2Y+1~(~W) l~~-~2~~)/(r21-~C(I~’-E)).
Whence we obtain that lim sup R
( fo, fn) n2’’~t2~r +
1is
bounded from below~
0by
It is easy to check that the maximum of the above
expression
is attained atC = a (2 y + 1 )
and isequal
toSince E > 0 can be chosen
arbitrarily
thisgives
the boundequal
toL(0),
for y = 2 L
(0) = (5/6) (r L2 ~i6/6 C2 (K)) 1~5.
DLet
I ~ . I denote
supremum norm on[ -1 /4, 1/4].
We will prove.149
ASYMPTOTIC MINIMAXITY OF ADAPTATIVE KERNEL ESTIMATE
COROLLARY 2. - Let ~ be a
family of
twicedifferentiable
densities such thatI I f " I I
~. Let K benonuniform
kernelsatisfying
conditions(a) - (d)
such that K" exists and is bounded. Let
f(n, K, C)
be the kernel estimateof j(O)
withnh5
= C.There exists a constant D
(K) depending only
on K such thatProof - Put 03B4=sup|k’’| I
and let range of 8 be definedby
Since
82/h = o (1)
for such choice of 8and h,
we have2 for
sufficiently large
n.Moreover,
it is easy to see thatI - n 82 ~2/(~ - h) nh5 = n 82 ~2/~ C + o ( 1)-
Thus
2
forsufficiently large n.
The relation(2. 4) yields
theconclusion of the
corollary
with boundgreater
than 1-82 /(r2 [3)
C. DThe
Corollary
2 should becompared
with thefollowing
result. Leta E [ao, 03B11]
whereSo
will stand for an interval[ - so, so]
and F03B1
consists of densities of thefollowing
form:(a)
(b) r (x) x~S0.
It is easy to see that is
nonempty
for every choice of r(x) satisfying ( b) if
and
Put fF = summation
being
overaE[ao, Xi].
It is known(cf.
Sacksand Ylvisacker
[9]
and Sacks and Strawderman[8]), respectively
that thekernel estimate based on
Epanechnikov
kernel andm)n-1/5)
isasymptotically
minimax in Fagainst
the class of kernelestimates fn
with deterministicbandwidths, namely
but it is not
asymptotically
minimaxagainst
allpossible
choicesof fn
i. e.there exists an
estimate In
such thatfor some a > 0.
The link of this result with our
Corollary
2 is thefollowing.
It is clearthat under the
imposed assumptions
oe W forappropriate
choice ofVol. 25, n° 2-1989.
150 J. BRETAGNOLLE AND J. MIELNICZUK
so, ocl and m. Thus
taking
inCorollary
2 theEpanechnikov
kernel for K and we obtain that there exists8o
such thatfor some small
positive
number c. At the same timeThus Sacks and Strawderman’s result shows that it is not
possible
toreplace
the lower bound in theCorollary
2by
1. However we will showthat the
adaptive
kernel estimate isasymptotically
minimax among allestimates fn.
Put y = and consider the
following adaptive
kernel estimate(cf
Woodroofe[11]).
Let H besymmetric
two times differentiable kernel with acompact support
and let~s (0)
be a kernel estimate off(0)
definedin
( 1. 1 ) pertaining
to the kernel H and bandwidthsequence 03B4(n).
Denoteby (0)
its second derivativeThe
adaptive
bandwidth is definedby
thefollowing equation (cf (1.3)].
If
then
where cn ~
0, Cn ~
~.We prove
that fn
cannot beimproved uniformly
COROLLARY 3. - Assume that K is
symmetric and H" I
andI K" I
arebounded. Let 6
(n)
= A +£for
somepositive
A and E and assume thatCn = nb
wherea > 0, 0 11 b/5 6 E.
Leth
bedefined by ( 3 . 3).
Then
Proo f -
LethM(hm) be
definedby
theequality
Con-sider the
parametric family
defined inCorollary
2 with Creplaced by C . n K, CJ. By Corollary
2To
enlighten notations,
we set151
ASYMPTOTIC MINIMAXITY OF ADAPTATIVE KERNEL ESTIMATE
Now it sufficies to show that
which follows from
Observe that in view of
(2. 5)
thus it suffices to show that
uniformly
in e. SinceI K is
boundedby
a we haveFurther
and observe that twofold
integration by parts yields
and
substituting f e (x) = (~i - h~ - ~~2
ohM
s~2 K" for small x weobtain that the above
integral
is of the formExpanding
H around 0 andusing
the fact thatJK" (y) dy = 0
and K" issymmetric
we obtain that the aboveintegral
is of order(C~ ha~/n)~’~ 6~ ~.
Thus it is easy to see that a condition
- ..- -
implies
thatAnalogously
we obtain thatprovided
thatVol. 25, n° 2-1989.
152 J. BRETAGNOLLE AND J. MIELNICZUK
Observe further
that f03B4
andf’’03B4
are sums of i. i. d. r. v’s andThus Benett
inequality (Benett [2]) together
with(3.6)
and(3.7) yields
for some
C, C1, C~ > o.
Thus(3 . 4)
is satisfiedprovided
thatC;/5 e~
z~s exp( - n5E /Cn) =
o(I)
which is obvious in view of theimposed assumptions.
DRemark. - Let us devide the
sample
into threeparts
of size m, m,n - m
respectively,
wherem = o (n)
and let denote now thepilot
estimate of the
density (the
secondderivative)
based on the first(the second) subsample.
Denoteby
the kernel estimate based on the thirdsubsample
with h defined as in(3.3).
The similarreasoning
to thatpresented
above shows thatCorollary
3 is also true forf n
under suitablegrowth
conditionsimposed
on c~ andCn.
REFERENCES
[1] I. S. ABRAMSON, On Bandwidth Variation in Kernel Estimates a Square Root Law, Ann. Stat., Vol. 10, 1982, pp. 1217-1223.
[2] G. BENETT, Probability Inequalities for the Sum of Independent Random Variables,
J. Amer. Statist., Vol. 57, 1962, pp. 33-45.
[3] J. BRETAGNOLLE and C. HUBER, Estimation des densités : risque minimax,
Z. Wachrscheinlichkeitstheorie und verw. Gebiete, Vol. 47, 1979, pp. 113-137.
[4] R. H. FARRELL, On the Best Obtainable Rates of Convergence in Estimation of a
Density Function at a Point, Ann. Math. Statist., Vol. 43, 1972, pp. 170-180.
[5] L. GAJEK, On Improving Density Functions Wchich are not Bona Fide Functions, Ann.
Statist., Vol. 14, 1986, pp. 1612-1618.
[6] I. A. IBRAGIMOV and R. Z. HASMINSKII, Statistical Estimation, Asymptotic Theory, Springer, 1981.
[7] E. PARZEN, On Estimation of Probability Density and Mode, Ann. Math. Statist., Vol. 33, 1962, pp. 1065-1076.
[8] J. SACKS and W. STRAWDERMAN, Improvements on Linear Minimax Estimates. In Statistical Decision Theory and Related Topics 3, Vol. 2, S. S. Gupta and J. O. Berger Eds., New York, Academic Press, 1982, pp. 287-304.
[9] J. SACKS and D. YLVISACKER, Asymptotically Optimal Kernels for Density Estimation
at a Point, Ann. Statist., Vol. 9, 1981, pp. 334-346.
[10] G. R. TARELL and D. W. SCOTT, On Improving Convergence Rates for Nonnegative
Kernel Density Estimators, Ann. Statist., Vol. 8, 1980, pp. 1160-1163.
[11] M. WOODROOFE, On Chosing a Delta Sequence, Ann. Math. Statist., Vol. 41, 1970, pp. 1665-1671.
[12] L. D. BROWN and R. H. FARRELL, A Lower Bound for the Risk in Estimating the Value of a Probability Density, Preprint Math. Dt Cornell University, 1987.
[13] D. L. DONOHO and R. L. LIU, Geometrizing Rates of Convergence, III, Preprint Dt of Statistics, University of California, 1987.
(Manuscrit reçu le 11 avril 1988) (corrigé le 28 novembre 1988.)