On asymptotic minimaxity of the adaptative kernel estimate of a density function

(1)

A NNALES DE L ’I. H. P., SECTION B

J EAN B RETAGNOLLE

J ^AN M ^IELNICZUK

On asymptotic minimaxity of the adaptative kernel estimate of a density function

Annales de l’I. H. P., section B, tome 25, n

^o

2 (1989), p. 143-152

<http://www.numdam.org/item?id=AIHPB_1989__25_2_143_0>

L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les conditions générales d’utilisation (http://www.numdam.org/conditions). Toute utilisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.

Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques

(2)

On asymptotic minimaxity ^{of the} adaptative ^kernel

estimate of

a

density ^function

Jean BRETAGNOLLE and Jan MIELNICZUK

(*)

Universite de Paris-Sud U.A. 743, C.N.R.S., Statistique Appliquee, Mathématiques, ^Bat.^n°425,

91405 Orsay Cedex, ^France

Ann. Inst. Henri Poincaré,

Vol. 25, ^n°2, 1989, p. 143-152. Probabilités et Statistiques

ABSTRACT. - The

adaptative

kernel estimate of a

density

^function^with

a data

dependent

^bandwidth

arising naturally

^{as an}

empirical counterpart

of

MSE-optimal parameter

^is

dealt

with. It is shown to be

asymptotically

minimax in a

family

of densities

having

bounded second derivative in a

neighbourhood

^{of 0. As}^a

by-product

^a

simple proof

of Farell’s

[4]

^result

on minimax bounds for

density

^estimatesis obtained.

Key ^{words :}^Datadependent bandwidth, kernel estimator, ^meansquare error, minimax estimate.

RESUME. - On considère l’estimateur a _noyau

(adaptatif)

^d’une^densité

construit pour minimiser

asymptotiquement

^le

risque quadratique

^{en un}

point.

^Onprouve

qu’il

^estminimax dans la famille des densités

ayant

^une dérivée seconde bornée au

voisinage

^du

point.

^Du^mêmecoup ^on^trouve

une preuve

simple

^desrésultats de Farrell

[4]

^basée^sur

l’inégalité

^de

Cramer-Rao.

Mots clés : Estimateur a noyau, fenêtre adaptative, risque quadratique, estimateur mini-

max.

Classi fication A.M.S. : 62 G 05, ^{62 G}^20.

(*) On leave from Institute of Computer Science, Polish Academy of Sciences Warsaw, Poland. Supported by ^thegrant of the French government, No. 93572.

Annales de l’Institut Henri Poincaré - Probabilités et Statistiques - ^0294-1449

Vol. 25/89/02/143/10/$3,00/6) Gauthier-Villars

(3)

144 J. BRETAGNOLLE AND J. MIELNICZUK

1. INTRODUCTION

Let

Xi,

^...,

Xn

^be

independent

real random variables with a common

distribution function F and a

density f

^with

respect

^to

Lebesgue

^measure

on R. Consider the

problem

^of

nonparametric

estimation

of 1(0) by

^means

of kernel

estimate f ’h ( o) (Parzen [7]):

where kernel K

integrates

^to¹^and

h = h (n)

^is^a

^smoothing

parameter

(bandwidth).

The natural bandwidth’s choice is

provided by computation

of the mean square ^error

R ( f, fh) = E f ( f (o) - fh (O)) 2;

under standard

assumptions

^on

K, ~ f , f ":

If

f (o) ~ f " (o) ~ 0 minimizing

^main^terms^of

( 1. 2) yields ho satisfying

Replacing f(O)

^and

f " (0) by

^some

preliminary

^estimates^we

get

^a^natural

empirical counterpart ho, n

^of

ho

^{with the}

following property

^{of the}

resulting adaptive

^estimate

(Woodroofe [11])

(Recall

^that

ho

^and

hho ^depend

^on

^{n. )}

The

problem

^of

finding

suitable modification of

(1.1)

^{which for}^fixed

density f

^will

yield

^smaller^meansquare ^errorattracted attention of _many authors. The

proposed

methods consist in bias reduction

by considering

of so-called Parzen kernels

(Parzen [7]), geometric extrapolation (Tarell

and Scott

[10])

^or

choosing appropriately

constructed bandwidths

depend- ing

^on^{data and}

point

^at^which

density

is estimated

(Abramson [1]).

^Such

procedures usually yield

^anestimate which is not a bona fide

density

function

although

there exist methods to tackle with this

problem (cf Gajek [5]).

All these

results, however,

leave unanswered the

question

about

possible improvement

^orkernel estimate

uniformly

^over^reasonable

(4)

145

ASYMPTOTIC MINIMAXITY OF ADAPTATIVE KERNEL ESTIMATE

family

of densities. For

example,

^Farrell’s

[4]

result shows that it is not

possible

^to

^get

^a^better^rate^than

^{n - 4~5}

^when

considering

^the

family

^~^of

densities

having

bounded second derivative in a

neighbourhood

^{of 0}^but

improvement

ôfâ^constantîsâ

priori possible.

^{For the}^related

problems

in the ^areaof

global

estimation ^werefer to

Bretagnolle

^{and Huber}

[3].

We show however that the answer to this

question

^is

negative, namely

that the

adaptive

kernel estimator is

asymptotically

minimax in ~ . The method is to construct a

parametric family {f03B8}

^of^densities

intrinsically joined

with the kernel estimate

fh’

^for

which {f03B8}

^{oe W and}

can be minorized for an

arbitrary

^estimate

In of f(0).

^As^a

by-product,

Farrell’s result on a minimax bound for

density

estimates is obtained.

Note. - Our referee

pointed

ôut^toôurâttentiontwo recent

preprints

from Brown and Farrell

[12]

^andfrom Donoho and Liu

[13]: Estimating f(0)

in various families

(slightly

different of

ours) they

compare minimax lower bounds

(on

^all

estimators)

^tominimax lower bounds ^onkernel estimators or linear

procedures.

^{Brown and}^Farrell

give

tables for

finite samples.

2. THE CONSTRUCTION

From now on we will

impose

^the

following

conditions on K and h

(n):

(a)

^kernel^K^is^a^function

integrating

^to

1;

(b) ~ x : K {x) ~ 0 ~

^c

^{[ - A, A]} for some OAoo;

Let us observe that if K is

nonnegative

function such that

(a)

^and

(c) hold,

^then

(d)

^is

automatically

^satisfied.

Let us

denote fo

^an

arbitrary

^smooth

density equal

^to¹^on

[ -1 i4, 1 /4}

and

put Kh ( . ) = h -1 K ( . /h).

^From^{now on we}^will^assume^that

so that the support ^of

Kh

is contained in

[-1/4, 1/4].

^We

define eh by (we

^centerand reduce

Kh

^wrt

fo),

^thus

Vol. 25, ^n°2-1989.

(5)

Moreover, using (c)

^we

get

Let 8 be some real number. If we assume

ph (o) = 0 (a + h)/~, h _ 1/2,

is a

density.

^We^shall ^use

parametric

^families

=

f fe, ~ ~ 9 E O ~

where sup Ph

(o) _ 1/2.

8

The

density f03B8,h

^can^be

represented

^{in the}

equivalent

way

Let f~

⁼

fn (X

^1,^...,

~n)

^denote

arbitrary

estimate of

f (o).

The results are

based on the

following

LEMMA. - Let

robe (a - Taking

^atstage ⁿ where n ⁼T >

1,

^we^have

Proof -

^First^we^observe

that,

^since

nh (n)

^-~^oo

by

^condition

(e),

^the

condition on ph

(0)

^is

asymptotically

^fulfilled^as^limsup ph

(8)

⁼^0.

n 9 E 9n

Further

thus the bias

(o) - fe (0)

⁼⁰

( (3 - oc)/~,

^h.

Moreover

and

straightforward computations using (2. 1), (2.2)

^and

(2. 3) give:

Thus

using variance-squared

^bias

decomposition

^of

fh)

^we^have

On the other

hand, putting cp (8) = E~ ( f~ (o))

^and

using

Cramer-Rao’s

bound,

(6)

147

where

(Cramer-Rao’s

conditions ^arefullfilled since all

expressions

^are

polynomial

wrt

8). Denoting by

^R

( fn/fh)

the ratio of the

respective

^meansquare ^errors

we have

Setting cp { 8) =1 + ~8 + ~, Y ( 8),

^we

get, using ( 2 . 1 ) :

Substituting

^and

we obtain

Since lim then

denoting by _Ty

the supremum of the

n e

above

expression

^on

[-T~~2, T1~2]

^{it is}

^enough

^to^{show that}

Ty

is bounded from below

by

^the

expressions given

ⁱⁿ

(2. 4).

^Let^us

proceed by

^contradic-

tion. Assume that 1 and

consider

^Rdefined

by:

Since R

(t)

^is^convex,^with^R

( 1) ^{1-_ R} (0), R’ (o) _- o,

^thus

rxy- y’ 0.

Whence

putting

^weobtain that z is

nondecreasing.

Observe futher that

y (x) _ -y ( - x) ^provides

^the^same^bound^T.

^Thus, ^by convexity,

^r^isgreater that the bound obtained for

(y

⁺

y)/2.

^In^other

words,

we can assume y ^anodd

function, whet, together

^with

z(0)=0 implies

^for

o _ x T l2.

Consider first the case Note that r > ro.

Since y

^is

positive

^on

[0,

^{it is}sufficient to

verify

^that

inequality

^holds^at

^{x = Tl/2}

^where

(using y > 0)

^{it is}

implied by ro ^{T >} (1 + ro T) ( 1- (ra T) -1).

For

r = ro = 0

^weargue ^asfollows: if

y (T1~2) _ ^{1- T -1~2}

then in view of there exists

T1~2]

such that

y’ (s) _ T-1~2 -1/T

^and

1 -Y’ (s) ~ ~ - T’ 1~2~

^D

Vol. 25, n° ^2-1989.

(7)

3. RESULTS

Let us define

L) (y = k + cx, 0 a __ 1)

the class of functions _pos-

sessing

k-th order derivative

(k >_ 1)

^{such that}

for -1/4 _ x, y _ 1/4

We will show that the lemma

yields

ⁱⁿ

straightforward

^mannerthe minimax bounds on

density

^estimates

(Farrell [4]),

^seealso theorem 5 .1 in

Ibragi-

mov, Hasminskii

[6]).

COROLLARY 1.

Proof -

^In^order^to^prove^the

corollary

^{1 let}^us^consider^an

arbitrary,

nonuniform kernel

satisfying

conditions

(a) - (d)

^{and such}^that^Holder

condition

(3 . 1)

^is^satisfied

by ^K(k)

^with^constant

C (K)

and the power ^a.

It is easy to ^seethat in this case

_fR

^defined

by (2. 3) satisfy

Thus if E>h and

then

L).

^Under

assumption nh2Y + 1 = C

the condition

(3. 2)

^is

equivalent

^to^the

following

^condition^on^the^range^{of 8: e}^E

[

^-

8max]

where Observe also that the

proof

^of^the

lemma

yields

^for

sufficiently large n

Thus

using (2 . 4)

^we^obtain

(?-o > 0)

(1+o(1))R(.~e~.f’n)~~’’o2~+uC~~t2Y+1~(~W) l~~-~2~~)/(r21-~C(I~’-E)).

Whence we obtain that _{lim sup}^R

( fo, fn) n2’’~t2~r +

¹

is

bounded from below

~

⁰

by

It is _easyto check that the maximum of the above

expression

is attained at

C = a (2 y + 1 )

^{and is}

equal

^to

Since E > 0 can be chosen

arbitrarily

^this

gives

^{the bound}

equal

^to

L(0),

for _y⁼2 L

(0) = (5/6) (r ^L2 ~i6/6 ^C2 (K)) 1~5.

^D

Let

I ~ . I denote

^supremum^{norm on}

^{[ -1 /4,} ^1/4].

We will prove.

(8)

149

COROLLARY 2. ^-Let ~ be a

family of

^twice

differentiable

densities such that

I I f " I I

^~.^Let^K^be

nonuniform

^kernel

satisfying

conditions

(a) - (d)

such that K" exists and is bounded. Let

f(n, ^K, C)

be the kernel estimate

of j(O)

^with

^nh5

⁼^C.

There exists a constant D

(K) depending only

^on^K^{such that}

Proof - Put 03B4=sup|k’’| I

^and ^let ^range ^{of 8 be}^defined

^by

Since

82/h = o (1)

^forsuch choice of 8

and h,

^we^have

2 for

sufficiently large

^n.

^Moreover,

^{it is}^easy^to^see^that

I - n 82 ~2/(~ - h) nh5 = n 82 ~2/~ C + o ( 1)-

Thus

2

^for

sufficiently large n.

The relation

(2. 4) yields

^the

conclusion of the

corollary

with bound

greater

^{than 1-}

82 /(r2 [3)

^C. ^D

The

Corollary

²should be

compared

^{with the}

following

result. Let

a E [ao, 03B11]

^where

So

^will^{stand for}^an^interval

[ - so, so]

and F03B1

consists of densities of the

following

^form:

(a)

(b) r (x) ^x~S0.

It is easy ^to^seethat is

nonempty

for every choice of ^r

(x) satisfying ( b) if

and

Put fF = summation

being

^over

aE[ao, Xi].

It is known

(cf.

^Sacks

and Ylvisacker

[9]

and Sacks and Strawderman

[8]), respectively

^{that the}

kernel estimate based on

Epanechnikov

^kernel ^and

m)n-1/5)

^is

asymptotically

minimax in F

against

the class of kernel

estimates fn

with deterministic

bandwidths, namely

but it is not

asymptotically

^minimax

against

^all

possible

^choices

of fn

^i.^e.

there exists an

estimate In

^{such that}

for some a > 0.

The link of this result with our

Corollary

^{2 is the}

following.

^It^is^clear

that under the

imposed assumptions

^{oe W for}

appropriate

^{choice of}

Vol. 25, n° ^2-1989.

(9)

so, ocl and m. Thus

taking

ⁱⁿ

Corollary

^{2 the}

Epanechnikov

kernel for K and we obtain that there exists

8o

^{such that}

for some small

positive

^number^c.^{At the}^same^time

Thus Sacks and Strawderman’s result shows that it is not

possible

^to

replace

^thelower bound in the

Corollary

²

by

^1.^However^we^{will show}

that the

adaptive

kernel estimate is

asymptotically

minimax among all

estimates fn.

Put _{y =} and consider the

following adaptive

kernel estimate

(cf

^Woodroofe

[11]).

^Let^H^be

symmetric

^twotimes differentiable kernel with ^a

compact support

^{and let}

~s (0)

^be^a^kernelestimate of

f(0)

^defined

in

( 1. 1 ) pertaining

^tothe kernel ^Hand bandwidth

sequence 03B4(n).

^Denote

by (0)

its second derivative

The

adaptive

^bandwidth^is^defined

by

^the

following equation (cf (1.3)].

If

then

where cn ^~

0, Cn ~

^~.

We prove

that fn

^cannot^be

improved uniformly

COROLLARY 3. ^-Assume that K is

symmetric and H" I

^and

I K" I

^are

bounded. Let 6

(n)

^{= A} ^+£

for

^some

positive

^{A and}Êândâssume^that

Cn = nb

^where

^{a > 0,} 0 11 b/5 6 E.

^Let

h

be

defined by ( 3 . 3).

Then

Proo f -

^Let

hM(hm) be

^defined

^by

^the

^equality

^Con-

sider the

parametric family

^definedⁱⁿ

Corollary

^{2 with C}

replaced by C . n K, CJ. By Corollary

²

To

enlighten notations,

^we^set

(10)

151

Now it sufficies to show that

which follows from

Observe that in view of

(2. 5)

thus it suffices to show that

uniformly

in e. Since

I K is

^bounded

^by

^{a we}^have

Further

and observe that twofold

integration by parts yields

and

substituting f e (x) = (~i - h~ - ~~2

^o

hM

^s~2^K" ^for^small^{x we}

obtain that the above

integral

^{is of the}^form

Expanding

^H^{around 0}^and

using

^the^fact^that

JK" (y) dy = 0

^and^{K" is}

symmetric

^we^obtainthat the above

integral

is of order

(C~ ha~/n)~’~ 6~ ~.

Thus it is _{easy to}see that ^acondition

- ..- -

implies

^that

Analogously

^we^obtain^that

provided

^that

Vol. 25, ^n°^2-1989.

(11)

Observe further

that f03B4

^and

f’’03B4

^{are sums}of i. i. d. r. v’s and

Thus Benett

inequality (Benett [2]) together

^with

(3.6)

^and

(3.7) yields

for some

C, C1, C~ > o.

^Thus

(3 . 4)

^is ^satisfied

provided

^that

C;/5 e~

^z~s^exp

( - n5E /Cn) =

^o

^(I)

which is obvious in view of the

imposed assumptions.

^D

Remark. - Let us devide the

sample

^into^three

parts

^{of size}m, m,

n - m

respectively,

^where

m = o (n)

^{and let} ^denote^now^the

pilot

estimate of the

density (the

^second

derivative)

^based^onthe first

(the second) subsample.

^Denote

by

the kernel estimate based ^onthe third

subsample

with h defined as in

(3.3).

The similar

reasoning

^to^that

presented

above shows that

Corollary

3 is also ^truefor

f n

under suitable

growth

conditions

imposed

^onc~ and

Cn.

REFERENCES

[1] ^I.^S.ABRAMSON, ^OnBandwidth Variation in Kernel Estimates a Square ^RootLaw, Ann. Stat., ^Vol.10, 1982, pp. 1217-1223.

[2] ^G.BENETT, Probability Inequalities ^{for the}^Sum^ofIndependent ^RandomVariables,

J. Amer. Statist., ^Vol.57, 1962, pp. 33-45.

[3] ^J. BRETAGNOLLE and C. HUBER, Estimation des densités : risque minimax,

Z. Wachrscheinlichkeitstheorie und verw. Gebiete, ^Vol.47, 1979, pp. 113-137.

[4] ^{R. H.}FARRELL, On the Best Obtainable Rates of Convergence in Estimation of a

Density ^Function^at^aPoint, Ann. Math. Statist., Vol. 43, 1972, pp. 170-180.

[5] ^L.GAJEK, ^OnImproving Density Functions Wchich are not Bona Fide Functions, ^Ann.

Statist., Vol. 14, 1986, pp. 1612-1618.

[6] ^I.A. IBRAGIMOV and R. Z. HASMINSKII, Statistical Estimation, Asymptotic Theory, Springer, ^1981.

[7] Ê.PARZEN, ÔnEstimation of Probability Density ând^Mode,Ânn.^Math.Statist., Vol. 33, 1962, pp. 1065-1076.

[8] J. SACKS and W. STRAWDERMAN, Improvements ^onLinear Minimax Estimates. In Statistical Decision Theory and Related Topics ^3,^Vol.2, ^{S. S.}Gupta and J. O. Berger Eds., ^NewYork, Academic Press, 1982, pp. 287-304.

[9] J. SACKS and D. YLVISACKER, Asymptotically Optimal ^Kernels^forDensity ^Estimation

at a Point, ^Ann.Statist., Vol. 9, 1981, pp. 334-346.

[10] G. R. TARELL and D. W. SCOTT, ^OnImproving Convergence ^{Rates for}Nonnegative

Kernel Density Estimators, ^Ann.Statist., Vol. 8, 1980, pp. 1160-1163.

[11] ^M.WOODROOFE, ÔnChosing â^DeltaSequence, Ânn.^Math.Statist., ^Vol.41, 1970, pp. 1665-1671.

[12] L. D. BROWN and R. H. FARRELL, ^{A Lower}^Boundfor the Risk in Estimating ^the^Value of a Probability Density, Preprint Math. Dt Cornell University, ^1987.

[13] D. L. DONOHO and R. L. LIU, Geometrizing ^Ratesof Convergence, III, Preprint ^Dt^of Statistics, University ^ofCalifornia, ^1987.

(Manuscrit reçu le 11 avril 1988) (corrigé ^le²⁸^novembre1988.)

On asymptotic minimaxity of the adaptative kernel estimate of a density function

A NNALES DE L ’I. H. P., SECTION B

J EAN B RETAGNOLLE

J AN M IELNICZUK

On asymptotic minimaxity of the adaptative kernel estimate of a density function

Annales de l’I. H. P., section B, tome 25, n

2 (1989), p. 143-152

On asymptotic minimaxity of the adaptative kernel

estimate of

density function

(*)

adaptative

density

dependent

arising naturally

empirical counterpart

MSE-optimal parameter

dealt

asymptotically

family

having

neighbourhood

by-product

simple proof

[4]

density

(adaptatif)

asymptotiquement

risque quadratique

point.

qu’il

ayant

voisinage

point.

simple

[4]

l’inégalité

Xi,

Xn

independent

density f

respect

Lebesgue

problem

nonparametric

of 1(0) by

estimate f ’h ( o) (Parzen [7]):

integrates

h = h (n)

smoothing

(bandwidth).

provided by computation

R ( f, fh) = E f ( f (o) - fh (O)) 2;

assumptions

K, ~ f , f ":

f (o) ~ f " (o) ~ 0 minimizing

( 1. 2) yields ho satisfying

Replacing f(O)

f " (0) by

preliminary

get

empirical counterpart ho, n

ho

following property

resulting adaptive

(Woodroofe [11])

(Recall

ho

hho depend

n. )

problem

finding

(1.1)

density f

yield

proposed

by considering

(Parzen [7]), geometric extrapolation (Tarell

[10])

choosing appropriately

J ^AN M ^IELNICZUK

On asymptotic minimaxity ^{of the} adaptative ^kernel

density ^function

^smoothing

hho ^depend

^{n. )}

^get

^{n - 4~5}

^{[ - A, A]} for some OAoo;