A NNALES DE L ’I. H. P., SECTION B
K ARINE T RIBOULEY G ABRIELLE V IENNET
L
padaptive density estimation in a β mixing framework
Annales de l’I. H. P., section B, tome 34, n
o2 (1998), p. 179-208
<http://www.numdam.org/item?id=AIHPB_1998__34_2_179_0>
© Gauthier-Villars, 1998, tous droits réservés.
L’accès aux archives de la revue « Annales de l’I. H. P., section B » (http://www.elsevier.com/locate/anihpb) implique l’accord avec les condi- tions générales d’utilisation (http://www.numdam.org/conditions). Toute uti- lisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
179
Lp adaptive density estimation
in
a03B2 mixing framework
Karine TRIBOULEY and Gabrielle VIENNET Laboratoire de modelisation stochastique et statistique, Bat 425,
Universite Paris Sud, 91405 Orsay Cedex, France.
Ann. Inst. Henri Poincaré,
Vol. 34, n° 2, 1998, p. -208. Probabilités et Statistiques
ABSTRACT. - We
study
theLx-integrated
risk with x > 2 of anadaptive density
estimatorby
wavelets method forabsolutely regular
observations.By
aduality argument,
thestudy
of the risk is linked to the control of the supremum of theempirical
process over a suitable class of functions.The main
argument
is ageneralization
toabsolutely regular
variables of aresult of
Talagrand
stated for i.i.d. variables.Assuming
that the sequence of the,8-mixing
coefficients isarithmetically decreasing,
we prove that our estimator isadaptive
in a class of Besov spaces with unknown smoothness. (c)Elsevier,
ParisKey words and phrases: Adaptive estimation, absolutely regular variables, Besov spaces,
density estimation, strictly stationary sequences, wavelet orthonormal basis.
RESUME. - Dans un cadre des variables absolument
regulieres,
on etudiele
risque
7r> 2,
d’un estimateur par methode d’ ondelettesadaptatif.
A l’aide d’unargument
dedualite,
F etude durisque
est lieeau controle du supremum du processus
empirique
sur une classeadequate
de fonctions.
L’ argument principal
est unegeneralisation
a des variables absolumentregulieres
d’un resultat deTalagrand
enonce pour des variables i.i.d. Ensupposant
que la suite des coefficients de,8-mélange
est
arithmetiquement decroissante,
on demontre que notre estimateur estadaptatif
dans une classed’ espace
de Besov deregularite
inconnue.@
Elsevier,
ParisAnnales de l’Institut Henri Poincaré - Probabilités et Statistiques - 0246-0203 Vol. 34/98/02/@ Elsevier, Paris
180 K. TRIBOULEY AND G. VIENNET
1. INTRODUCTION
Let
(Xi,..., Xn)
be n observations drawn from somestrictly stationary
process The aim of this paper is to
study adaptive
estimationof the
stationary density f
of the process under someadequate mixing assumptions. Assuming
someprior knowledge
onf (such
as itsdegree
of smoothness for
instance)
it ispossible
to prove theoptimality
of many estimators.But,
from apractical point
ofview,
this is notsatisfactory:
onewould rather
prefer
toget optimal
estimation without any extraknowledge
on the
density. Recently
differentprocedures
have beenproposed
in thei.i.d. case which all tend to reduce the
prior
information needed to estimate the unknowndensity. Roughly,
we say that suchprocedures
areadaptive,
the
precise
sensebeing
defined in eachspecific
situation. Let uspresent
the definition ofadaptivity
that we use in thesequel.
We first recall the minimax risk associated to the loss function > 1 for a set of functions~’~ :
where the infimum is taken over all estimators
f .
We say that an estimatorf *
isadaptive
in a class of functions{.F0152;
a EA~
if andonly
if thereexists a
positive
constantC( a)
such that:We aim at
giving
an account of the different constructions ofadaptive
estimators and of their
performances
in theindependent
framework. One of the mostpopular
method is the cross-validation method. It consists inminimizing
anempirical
criterion which tends to estimate the unknownquadratic
loss. The firstexample
ofadaptation
to unknown smoothness in the sense of Definition(2)
appears in a crucial paper of Efromovich and Pinsker[ 11 ] . They
deal with the white noise model in the context of the Fourier basis and theirprocedure
is based onthresholding.
Efromovich[ 10]
has
adapted
their method todensity
estimation.They
obtainadaptation
overa class
5i~
of Sobolevellipsoids relatively
toL2
loss. The introduction of wavelet basesprovides
more accurateapproximation
than the Fourier basis for functions withspacially inhomogeneous
smoothness.They
allowadaptation
over morecomplicated
functions classes. Waveletthresholding
methods have been
extensively developed
these last years and we refer to Donoho et al.[7]
for numerous references. Let us describe the methodAnnales de l ’Institut Henri Poincaré - Probabilités et Statistiques
Lp ADAPTATIVE DENSITY ESTIMATION IN A (3 MIXING FRAMEWORK 181
of local
thresholding.
Letf
be the unknowndensity
to be estimated. Oneassumes that
f belongs
to a ball of the Besov spaceBs,p,q
and iscompactly supported.
Moreprecisely,
one assumes thatdenotes the Besov norm. The
density
isexpanded
onsome wavelet basis into a sum of a low
frequency
term and a detailterm, and one considers the
projection
estimator. The localthresholding
method consists in
keeping only
theempirical
detailsgreater
than a fixed level and thecomputation
of the estimatorrequires
theknowledge
ofthe radius
Mi.
Let N be the number ofvanishing
moments of thewavelet. Donoho et al. show that the local
thresholding
estimator isadaptive (up
to a power oflog(n)
for the smallregularities)
over the class{Fs,p,q(M1,M2,B), 1/p
sN + 1,p
>l, q
>1,B
>0} relatively
toany
L,-loss function, x >
1. Aglobal thresholding procedure
is studiedby Kerkyacharian et
al.[14]: they compute
a statistic close to thelx-
norm of the
empirical
details at a fixed resolutionlevel; they keep
all theempirical
details of the resolutionlevel,
if thisquantity
isgreater
than a fixed amount. Theglobal procedure
ofKerkyacharian et
al. isadaptive
over the class of functions s N +
l, p
> x, q >1,
B >0, Mi
>0, M2
>0} relatively
to the loss function x > 2.This latter
procedure
has beeninspired by
the work of Efromovich[10].
For the
particular
case x =2,
it is similar to the cross validationprocedure
with the
advantage
toprovide
anexplicit
estimator.The
problem
ofadaptive
estimation inweakly dependent
framework isquite
new. In the case of the estimation of theregression
function(in
linear AR
models),
some results have been obtainedby
Dahlaus et al.[5],
Hoffmann
[12].
In this paper, we propose to extend the
thresholding
methods tothe
dependent
framework. Infact,
without any apriori independence assumption
on thedata,
it isinteresting
to get some robustness results withrespect
todependence.
We focus on the situation where the data areabsolutely regular.
It covers alarge
class ofexamples
and allows us to usecoupling
technics in theproofs. Examples
of such processes may be found in Doukhan[8].
Even if the localthresholding
method caneasily
be
generalized
in the weaker case of~2014mixing,
it is notadapted
to theabsolutely regular
context.Nevertheless,
theglobal thresholding
method iswell fitted for this
dependent
framework.Introducing
a small modification of thethreshold,
we show that theglobal thresholding
method preserves itsVol. 34, n° 2-1998.
182 K. TRIBOULEY AND G. VIENNET
adaptive properties
in the minimax > 2. Our results may be extend to other bases such assplines;
we focus on the wavelet estimator whichprovides
clearestproofs.
Let us
present
our results. Let(Xi )iEZ
be astrictly stationary absolutely regular
process, also called3-mixing
with a sequence of3-mixing
coefficients We assume that the rate of
decay
of the sequencez >o is arithmetic. More
precisely,
we assume that there exists () > x - 2 and apositive
constant such thatand we denote
by B7r
the bound for the seriesSuch an
assumption
of arithmeticdecay
of the coefficients is often made in papers ondensity
estimation in adependent
framework. Let us remark that for x =2,
the conditionB2
oo is known to be a minimal condition for results like the central limit theorem(see
Doukhan et al.[9]).
In this paper we consider a
compactly supported
wavelet basis withscaling
function cp and waveletfunction ~
and we denoteby
N the numberof
vanishing
moments of the wavelet. We use an estimatorf
similar to theglobal
threshold estimator introducedby Kerkyacharian et
al.114]
whichcomputation depends
and onB~
if x >2,
butonly
onB~
if x = 2.We show that if 2 x, as soon as
our
global
threshold estimator isadaptive
in the class:For 1r =
2,
as soon as e >2,
the estimator isadaptive
in the class:The idea of the
proof
ofadaptation
iscompletely
different as the oneused in the
independent
case. It is based on aninteresting
result stated in Theorem 2.2 which relies on animportant
Theorem ofTalagrand [22].
In the same
spirit
as inBirge
and Massart[4],
itprovides
a control ofAnnales de Henri Poincaré - Probabilités
Lp ADAPTATIVE DENSITY ESTIMATION IN A Q MIXING FRAMEWORK 183
the supremum of the
empirical
process over a suitable class of functions.This method has the
advantage
ofproviding simpler proofs
and estimates than those ofKerkyacharian et
al.[14]
at theprice
ofintroducing
theunknown constants
Bx
if x > 2.Nevertheless,
we propose apractical procedure using
an over-estimate of the unknownquantity 1111100.
We prove that the error due to this
practical procedure is,
inprobability,
of the same order than the error due to the
adaptive procedure.
Under thesame
mixing
conditions asbefore,
this estimator is shown to beadaptive
in
probability
in the classwith
that is
Let us remark that the a
priori knowledge
about the process needed to make the estimation isquite
reasonable.Indeed,
as noticedbefore,
theassumption
of arithmeticdecay
of themixing
coefficients is usual andsuggest
thatBx
is not toolarge.
It is alsoimportant
to notice that in contrast toprevious authors,
noassumption
isrequired
on thejoint
lawof
(Xo, Xl).
In fact asmentioned,
our result could be understood in thefollowing
way: when we think that the process isnearly independent
butwhen
independency
isdebatable,
a safestrategy
to avoid toolarge
errorsis to increase the threshold.
The paper is
organized
as follows. In the section2,
webriefly
recall someresults about Besov spaces, wavelets and
absolutely regular
processes. Westate our main tool Theorem 2.2. We introduce in section 3 the two estimators of interest and
study
theiradaptive properties.
All theproofs
are
given
in section 4.Vol. 34, n° 2-1998.
184 K. TRIBOULEY AND G. VIENNET
2. WAVELETS AND ABSOLUTELY REGULAR PROCESSES We first review the very basic features of the multiresolution
analysis
ofMeyer [17]
andgive
useful elements of waveletsanalysis.
We recall then the definition ofabsolutely regular mixing
coefficients and a result usedon several occasions
along
this work. This resultprovides
asharp
controlof the variance of the
empirical
process and a Rosenthaltype
momentinequality
for boundedabsolutely regular
variables. It isproved
in Viennet[23]. Finally,
we state in Theorem 2.2 a control on theprobability
tails forthe supremum of the
empirical
process on a suitable class of functions. This result is based on a recent result ofTalagrand [21] avaible
forindependent
and
identically
distributed variables and is shown in the last section of this paper.2.1. Wavelets and Besov spaces
One can construct a real function p
(the scaling function)
such that1. the
sequence {03C60k
=cp ( . - k)
EZ}
is an orthonormalfamily
of)L-2(tR).
Let us callVo
thesubspace spanned by
this sequence.2. if
Vj
denotes thesubspace spanned by
thesequence {03C6jk
=2~(~(2~. 2014
E~},
then is anincreasing
sequence of nested spaces such thatIt is
possible
torequire
in addition that cp is of class C’~° andcompactly supported (Daubechies
wavelets[6]).
We define the spaceWj by
thefollowing:
=Wj. Then,
there also exists afunction ~ (the wavelet)
such that1.
~
is of class C~’°compactly supported,
For
jo
EZ,
thefollowing decomposition
is also truewhere
Annales de l’Institut
Lp ADAPTATIVE DENSITY ESTIMATION IN A # MIXING FRAMEWORK 185
According
to a result ofMeyer [17],
we link theLr-norm
of the details at thelevel j
or theLr-norm
of the lowfrequency part
to thelr-norm
ofthe wavelets coefficients.
the
lr-norm.
Besov spaces are characterized in terms of wavelet coefficients
(see Meyer [17]);
we do not use this characterization butjust
thefollowing
property (8).
Let N be apositive integer.
We consider in thesequel
awavelet basis such that the
scaling
function cp and the waveletfunction ~ satisfy
thefollowing properties
(i)
for any m =0,..., N
(iii)
for any m =0, ... , N
As a
typical example,
the Daubechies wavelets DB2Nsatisfy P(N)
(Daubechies [6]). Then,
for anyf
E 0 sN + 1,
1 p oo,1 g oo
We can notice moreover
that,
as soon asf
iscompactly supported, only
afinite number of coefficients
(or (jk),
is nonzero. Infact,
this number is less than2j A B-1
where 2B and 2A are therespective lengths
of thesupports of f and 1jJ.
Vol. 34, n° 2-1998.
186 K. TRIBOULEY AND G. VIENNET
2.2.
Absolutely regular
sequencesLet
([1, A; IP)
be aprobability
space. For any twoa-algebra
U and Vof
A,
theabsolutely regular mixing (or j3 mixing)
coefficient(3(U, V )
isdefined
by
where the supremum is taken over all the finite
partitions (Ui)iEI
andrespectively U
and V measurable(Kolmogorov
and Rozanov[15]).
Let
(Xi)iEZ
be astrictly stationary
process of R-valued random elements of a Polish space ~. If we denote:Fo
=a ( X j , j 0)
andsii
=a( Xj, j > l ),
the;3-mixing coefficient 03B2l
is definedby 03B2l
=for any
integer l.
The process is calledabsolutely regular (or /3 mixing)
if the sequence of its;3-mixing
coefficients tend to zerowhen l tends to
infinity.
Let us introduce some notation. Let P be the distribution of
Xo
on X.For any measurable function h which is
P-integrable, Ix
hdP is denotedby Ep(h) .
We denoteby vn
the centeredempirical
process vn P whereIP n = n ¿~ 1 8_Yi
is theempirical
measure.Finally,
for r >2,
let,C~r, ,~, P)
be the set of functions b :R+
such thatLet us recall that
Br
is the bound of the series(l
+The
following
lemma whichgives
an evaluation of the of the function b is very useful in thesequel.
For more details about Lemma(2.2)
and Theorem(2.1)
we refer to Viennet[23].
LEMMA 2.2. - Let 1 r oo. As soon as ~,
for
anyfunction
b in
/;(2,/3,P),
Finally,
Theorem 2.1provides
asharp
control of the variance of a sum ofabsolutely regular
variables and states a Rosenthal typeinequality
forabsolutely regular
variables.THEOREM 2.1. - Let be a
strictly stationary absolutely regular
process with sequence
of Q-mixing coefficients
Statistiques
Lp ADAPTATIVE DENSITY ESTIMATION IN A f3 MIXING FRAMEWORK 187
2022
If
the03B2-mixing coefficients satisfy
thesummability
conditionB2
00,there exists a
function
b in£(2, Q, P),
such thatfor
anypositive integer
nand any measurable
function
h E~~ (P),
. Let r > 2.
If
the03B2-mixing coefficients satisfy
thesummability
conditionBr
oo, there exist twofunctions
band b’ in,~ ( 2, ~, P )
and,C ( r, ~, P ) respectively,
suchthat for
any measurable boundedfunction
hwhere
Cl (r)
andC2 (r)
arepositive
constantsdepending only
on r.The
following
theorem is our first main result. Weexplain
in someremarks how to use it for our statistical purpose.
THEOREM 2.2. - Let K be a set
of
indicesof cardinality
D.be a real basis such that:
for
any m’ > 1, there exists a constantCm~
> 0depending
on m’ such thatfor
any ERD:
For any r >
2,
we denoteby
the classof functions defined by
Let 2
m oo and(Xi)i>Ü
be astrictly stationary absolutely regular
process such that its sequence
of 03B2-mixing coefficients satisfies
We assume that the
stationary
distributionPf of (Xi)2>o
admits adensity f
with respect to the
Lebesgue
measure and thatf
isuniformly
bounded.Then,
there exists a
positive
constantKl depending
on r,Br
and and thereexists a
function
b infL?.,.t (P f ) C mBm+1
oo suchthat,
assoon as D n,
for
anyinteger
q, q n,for any 03BB1
> 0 andfor
any03BB2
> 0Vol. 34. n° 2-1998.
188 K. TRIBOULEY AND G. VIENNET
where
with
Rr is
the Rosenthal constant, isdefined
in( 11 )
and~’1 (r)
andO2 (r)
are the same as in Theorem 2.1.According
to Ledoux andTalagrand [16],
the Rosenthal constant is smaller than 4’~ . Thefollowing
remarks will be detailed in theproof
ofTheorem 2.2.
Remark 1. - We will
apply
this theorem in our wavelet framework for thefollowing
class of functionsIn that case, condition
( 11 )
is satisfied for any m’ > 1(see
Lemma2.1 )
with
Cm,
=111m’.
Remark 2. - Let us comment the theorem in the classical case r = 2.
The constant is then more
tractable, namely K2
=When
(Xi)iEZ
is auniformly mixing
process(also
called~-mixing)
suchthat its sequence of
mixing
coefficients( ~ 1 ) l > o
satisfies thesummability
condition oo, the function b may be taken as a constant,
namely
b =~l ~2, Then,
the conclusion of Theorem 2.2 isagain
validfor m = 00 and with
1>~/2.
Remark 3. -
When
is anindependent
process one can take~ -
3. ESTIMATION
In the first
part
of thissection,
wepresent
our estimationprocedure.
Wegive
in Theorem 3.1 the main statistical result of the paperconcerning
theadaptation (in
the sense defined in theintroduction)
of our estimator. Weexplain
in the secondpart
how tocompute
our estimator inpractice.
3.1. Estimation and result
Let 7r
>
2. We consider in thesequel
astrictly stationary absolutely regular
process withQ-mixing
coefficientssatisfying: Blf
o0Annales de l’Institut Henri Poincaré - Probabilités Statistiques
Lp ADAPTATIVE DENSITY ESTIMATION IN A # MIXING FRAMEWORK 189
where
B7r
is defined in(3).
Theseassumptions
arequite
natural in view ofapplying
Theorem 2.1 and Lemma 2.2. We assume moreover that themixing
coefficients arearithmetically decreasing,
moreprecisely:
Let
P f
be themarginal
distribution of the process,absolutely
continuouswith
respect
to theLebesgue
measure. We denoteby f
its unknowndensity
to estimate. Because of theexpansions (6)
and(7),
we considerthe
weighted
estimator:where
and (
are theempirical
coefficients:and
where j
is athresholding
statistic:for a
threshold
to be determined below.This estimator is
nearly
the same as the one introducedby Kerkyacharian
et al.
[14]
in anindependent
andidentically
distributed framework. Undersome Besov
regular assumption
onf,
the first sum in(15)
is an estimationof the
low-frequency part of f
and we choose thelevel jo (n) =
0 in orderto make the term of linear
variance E ~~(c~o~ " negligible
in the
global
error. In the samespirit,
the level canalso be chosen such that the bias
term E~03A3j~j1 03A3k03B6jk03C8jk~03C0
will nevercontribute in the
global
error.We use the same idea as in the i.i.d. case to determine the threshold statistic: we
keep
all the details of thelevel j if,
at thislevel,
the17r-norm
CE k 17r) 1/ 7r
of the coefficients isgreater
than the threshold We havenow to estimate the
quantity
i.i.d.case, the study of
the
properties
of thedensity
estimator is based on thecomputation
of themoments of the estimator of
~ ~ ~ ~a ~ ~ ~ :
it is then necessary to estimate~~
with its associated U-statistic because of thecrossing
terms. Thismethod allows to
choose sa
=C2j n03C0/2
with C= 1. Since it is unreasonableVol. 34, n° 2-1998.
190 K. TRIBOULEY AND G. VIENNET
to
compute
the moments of the U-statistic in amixing setting,
we use another
approach.
Theadvantage
is that we can use the natural estimator~~
theprice
to pay for thissimplicity
ofimplementation
is that the constant C is nowdepending
on thequantities 11.11100
andEn.
The main idea is contained in the
following
remark: thestudy
oflinked to the
study
of the supremum of theempirical
process on a suitable function class.
Indeed,
if~~.
is the class of functions definedby (14) (with
r =7r),
wehave
== It followsby duality arguments
that -where = 1. This linearization is crucial in our
mixing framework;
indeed it allows to take
advantage
of Theorem 2.2. Let us now state ourstatistical result.
THEOREM 3.1. - Let 7r > 2. We assume
that f belongs
to the classwhere s N + 1 and q > 1. Let
with:
where
with
R7r
is the Rosenthal constant,~’~
isdefined
in(11)
and andare the same as in Theorem 2.1.
Then
for
sE]
N + 1[,
q E[1, +00],
p > 1r andfor mixing coefficients arithmetically decreasing
such thatAnnales de l’Institut Henri Poincaré - Probabilités
Lp ADAPTATIVE DENSITY ESTIMATION IN A {3 MIXING FRAMEWORK 191 there exists a
positive
constant C which is anincreasing function of Ml
and
M2
such that:Let us recall
(see
Donoho et al.[7]
for thedensity,
Nemirovskii[ 18]
forthe
regression),
that theoptimal
theoretical rate of the minimax riskRn (a)
defined in
(1)
for the set of functionsB)
is-1+2s
for the loss function 2. Weimmediately
deduce thefollowing corollary:
COROLLARY 3.1. - Under the same
assumptions
as in Theorem3.2,
j is adaptive
in the classB),
s N +1, p
> 1f, q >1, B
>0,Mi
>o~.
3.2. Practical estimation
When 7f >
2, f depends
on thequantity M2.
We propose hereafter a newestimator /
thecomputation
of which does not needprior knowledge
ofM2.
Letjo
andji
be defined aspreviously.
We assume now thatf belongs
to a Besov space Let S be a
positive
constant(0
S00).
We consider
We introduce the
estimator f
andspecify
itsadaptivity property
in thefollowing
theorem.THEOREM 3.2. - Let 03C0 > 2. We assume that
f belongs
to the classwith:
Vol, 34, nO 2-] 998.
192 K. TRIBOULEY AND G. VIENNET
where
with
R7r
is the Rosenthal constant,C~
isdefined
in(11)
andCl
andare the same as in Theorem 2.1.
Then
for
s V1 2(1+03B8),
N + andfor mixing coefficients arithmetically decreasing
such thatWe
immediately
deduce thefollowing corollary:
COROLLARY 3.2. - Under the same
assumptions
as in Theorem 3.2f
isadaptive
inprobabil ity
in the classRemark. - We recall that the
computation of / only requires
theknowledge
ofB7ï,
the upper boundfor ~ l ~ o { l -~-1 )’~ -2 ~l .
Theprice
to payfor the reduction of the
prior knowledge
is the diminution of theadaptive regularity
bandwidth. In theindependent setting (0
=oo),
ourapproach
shows that without any
prior knowledge f
isadaptive
4. PROOFS
C denotes
throughout
a constant whose value maychange
from line toline and may
depend
on The constantR~
denotes the Rosenthal constant of Lemma 4.3 and and62 (7r)
are the contantsintroduced in Theorem 2.1. We first state some
preliminary
results usedto establish the theorems.
4.1.
Preliminary
resultsWe first need a bound for the
l7ï-norm
of wavelet coefficients at a fixed levelj.
Thefollowing
lemma is a direct consequence of Theorem 2.1.Lp ADAPTATIVE DENSITY ESTIMATION IN A {3 MIXING FRAMEWORK 193
PROPOSITION 4. l. - Let 1[" > 2. Under the
summability
conditionB.~
oo, there exists constantsC~
and~’2 depending
onB.~ and ~ ~ f ~ ( ~
such thatfor any j
with 2J n:where â
and 03B6
are theempirical
waveletcoefficients.
In a second
time,
we derive from Theorem 2.2 aproposition
which is its directapplication
to the wavelet framework.PROPOSITION 4.2. - Let 03C0 > 2.
Assuming
that the sequenceof 03B2 mixing coefficients satisfies B.~
00, there exists apositive
constant C suchthat,
under the condition () > 7r, we havefor any j
where
As a
particular
case,if j
is such that2~~
= 12j 2~S
= n1+2S :
Finally,
we userepeatedly
in theproofs
thefollowing
lemma which issimply
due to a combination of thetriangular inequality
and of Lemma 2.1.LEMMA 4.1. - Let 7r > 2 and r~ be an
appropriately
chosen constant.4.2. Proof of Theorem 2.2
The scheme of this
proof
isquite
classical. We takeadvantage
of the absolute
regularity using
acorollary
of Berbee’scoupling
lemma
(Lemma 4.2).
Thanks to thislemma,
weapproximate
ouroriginal
process
by
a sequence ofindependent
variables. Wedecompose
the centeredempirical
process into an error term and a centeredempirical
processVol. 34, n° 2-1998.
194 K. TRIBOULEY AND G. VIENNET
associated to
independent
variables. The control of the first term isquite direct,
whereas the control of the second onerequires
a closer attention.We use in the
following
four essentialresults,
the sub-citedcorollary
of Berbee’s
lemma,
thefollowing
version of aTalagrand
result(Talagrand [22],
the Rosenthalinequality (Rosenthal [ 19])
and the momentinequality
for
absolutely regular
variables(2.1 ).
As we shall see, for r = 2 theproof
is
simpler,
we use varianceinequalities
instead of Rosenthalinequalities.
LEMMA 4.2. - Let
(X 2 ) i > o
be a sequenceof
random variablestaking
their values in a Polish space X.
Then,
there exists a sequenceof independent
random variables such thatfor
anypositive integer i, XZ
hasthe same distribution as
Xi
andThe result of
Talagrand
is not stated in this form but one canactually
write it as follows.
THEOREM 4. I . - Let
X 1,
...,Xn,
be nindependent identically
distributed randomvariables,
and F afamily of functions
that areuniformly
boundedby
some constantM1.
Let H and v bedefined by
Then,
there exists universal constantK~
suchthat for
any.~1
>0,
Moreover, according
toCorollary
3.4 inTalagrand [21],
thefollowing
bound holds
where v and H are defined
by
with 61,..., En n
independent
Rademacher variables. We derive then thefollowing inequality:
there exist apositive
constantK1
such that for anyAi
>0,
Annales de l’Institut Henri Poincaré - Probabilités
Lp ADAPTATIVE DENSITY ESTIMATION IN A /3 MIXING FRAMEWORK 195
Let us now start the
proof.
For sake ofsimplicity,
we assume in thefollowing
thatX2
= 0 if i > n, and thatEp(h)
= 0. It is also useful to notice that the condition(11)
for m’ = ooimplies
(see Birge
and Massart[3] ). Let q
be a fixedinteger, q
= where~-~
denotes theinteger part. According
to Lemma4.2,
we build a sequence ofindependent
variables(Xi )i>o
such thatand
Y/ = (Xqk+~, ... , X~~~+l~ )
fulfill the conditions: for any k >0, Y~
and
Yt
have the samedistributions,
for any k >0, Yk*) /3~,
the random variables are
independent,
and the areindependent.
The centeredempirical
processVn (h)
is thendecomposed
aswhere
vn * ( h)
is theempirical
process associated with the random variables Sincewe
just
have to control theses twoquantities
to prove Theorem 2.2.First,
sinceaccording
to theassumption (11)
with m = oo, we haveBy
Markovinequality,
we deduceLet us now control
> Ai
+~-~2014~j.
Letp(n)
be thegreatest
integer
such that n =qp(n)
+ x for 0 ~ q. ThenVol. 34, n° 2-1998.
196 K. TRIBOULEY AND G. VIENNET
The control is made in two
steps, considering
odd terms and evenones.
They
are both treated in the same way; weonly
detail the evenpart.
Since the variables areindependent by
construction,
we are allowed toapply
theinequality (19)
to~.~ =
with
adequate
choices ofAi, Mi, H, H
andv. Let
J~
be thefamily
of functions defined in( 12).
We have to determinethe
quantities Mi,~f,7:f
and v. Sincewe
put
Under the
summability
condition on themixing
coefficientsB2m
oo, Theorem 2.1 and Lemma 2.2 ensure that there exists a function bbelonging
to~(2,~,P~)
n suchthat, using
the Holderinequality
with m
+~,
= 1 and aconvexity inequality {2m’ > r’),
we haveThus we set
Annales de I ’Institut Henri Poincaré - Probabilités et Statistiques
Lp ADAPTATIVE DENSITY ESTIMATION IN A ~3 MIXING FRAMEWORK 197
Let us now determine the
quantity
H.Applying
Holder’sinequality
with~ +
-L
= 1 and Jensen’sinequality,
we haveIn order to bound this last term, we use Rosenthal’s
inequality.
LEMMA 4.3. - Let
Y1, ... , Yn
be n realindependent identically
distributedvariables,
such thatfor
any 0 i nThen,
there exists a constantRr
such thatfor
any r > 2We
apply
this lemma to the variables(5fi,k = ¿(2~;21
which are
independent
andsatisfy
The variance is bounded
by inequality (9)
and the control of theLr-
norm is obtained
combining inequality (10)
and Holderinequality
with..20141
Vol. 34, n° 2-1998.
198 K. TRIBOULEY AND G. VIENNET
Thus,
and
since q nand
D n,We deduce then
and we take
with
Annales de l’Institut Henri Poincaré - Probabilités et Statistiques
Lp ADAPTATIVE DENSITY ESTIMATION IN A ~3 MIXING FRAMEWORK 199 and
by inequality (9)
andassumption (11)
Finally,
we notice thatand we take H = H.
Now, applying inequality (19)
withMi, v,
Hand77
defined in(22), (23), (24)
and(25),
andreminding
the choice ofp(n),
we obtain for any m >
r/2 -
1where
Ki
is apositive
constantdepending
on r,Br
andC~ . Similarly,
we find for the odd
part
Regrouping (21), (26), (27),
theproof
of Theorem 22 iscomplete.
Proof of
Remark 2. - If is auniformly mixing
process such that its sequence bfmixing
coefficients satisfies thesummability
condition
~l ~2
oo,according
to Viennet[23]
the function b may be taken as a constant,namely
b= ~ l ~ o ~ l 1 ~ 2 . Thus,
we setand
Vol. 34, n° 2-1998.
200 K. TRIBOULEY AND G. VIENNET
The conclusion of Theorem 2.2 is then valid for m = oo and with
Ilb(Xo) 1100 == ~~/2.
4.3. Proof of
Proposition
4.2We
apply
Theorem 2.2 to the withT7
and q
= [n2j(1/03C0-1/2)(~j)-1] where ~
is apositive
constant to choose.Clearly
under thesummability
condition on themixing coefficients,
b(Xo) E Thus,
there exists somepositive
constants C and C‘ such thatSince m >
Tr/2,
if we choose 7ybig enough,
theexponential
term isof smaller order than the
mixing
term and we obtain for somepositive
constant
C2:
In order to
complete
theproof,
we have to bound this termby
aquantity
of order
2js03C0(2j n)03C0/2
. We have to prove that:Under the
assumption B
> 7r, the exponent of2j
ispositive
and wejust
haveto
verify
theinequality for j == j’g.
Since 8 > 7r -2,
we obtain the assertion.4.4. Proof of
Proposition
4.1We
only 0152j.I;],
the other term is boundedexactly
inthe same way. This lemma is a direct
application
of the firstpart
ofAnnales de l’Institut Henri Poincaré - Probabilités Statistiques
Lp ADAPTATIVE DENSITY ESTIMATION IN A (3 MIXING FRAMEWORK 201
Theorem 2.1. There exists two functions b and b’
respectively
in£(2 , /?, Pf)
and
£( 7r, Pf),
and apositive
constant Cdepending
on 03C0 such thatUsing (20)
and Holderinequality,
we deduce4.5. Proof of Theorem 3.1 Let us denote
where ~
+~,
= i andlet js
be such that235
=(~~~’) n 1+~S .
Classically,
wedecompose E~ - f~03C003C0
into three terms: a bias term, a linear stochastic term and a non linear stochastic term:We have to prove that each term of the above bound is bounded
by
C n - 1+2s .
1. Bias term: it is the same term as for the
independent
variables case.By
a directapplication
of the caracterization(8)
of Besov spaces, we have:Vol. 34, n° 2-1998.
202 K. TRIBOULEY AND G. VIENNET
2. Linear stochastic term:
using
Lemma 2.1 andProposition
4.1 forthe we have:
3. Non linear stochastic term: we
decompose
this term into four terms.where
Study
of the termE4=E~ 03A3j1j=js 03A3k(jk-03B6jk)03C8jk{03A3k|jk|03C0203C0s03C0j}~03C003C0:
in the same way as for the bias term, we can derive the upper bound:
Study
of=E!
we bound this term as the linear
stochastic term. Using Lemma 4.1 for
Annales de l’Institut Henri Poincaré - Probabilités et Statistiques
Lp ADAPTATIVE DENSITY ESTIMATION IN A /3 MIXING FRAMEWORK 203
some 77 >
0,
Lemma 4.1 andProposition
4.1 for the(j k,
weget:
Study
of the termE2
=using
Lemma 4.1 for some ~ > 0 and Lemma2.1,
weget:
On the one hand:
Vol. 34, n° 2-1998.
204 K. TRIBOULEY AND G. VIENNET
On the other
hand, according
to thetriangular inequality
and
using
the caracterization(8)
of the Besov spaces andapplying
thenProposition 4.2,
weget:
Study
of the termE3
=using
thecaracterization (8)
of theBesov spaces and the definition
of js,
wehave,
forall j
Let us now
applying
Lemma 2.1:which
imply
Combining (28)
and(29),
we deduce thenAnnales de l’Institut Henri Poincaré - Probabilités
Lp ADAPTATIVE DENSITY ESTIMATION IN A /1 MIXING FRAMEWORK 205
According
to thetriangular inequality
and
using
Lemma 4.1 for some ?? >0,
Lemma 2. t and the Holderinequality
for m -~- m~
=l,
weget:
This
quantity
is boundedby
Cn-1+;8
as soon as 6~ > + 7r - 2.Using
the fact that s N + 1 and
choosing
and m > 1 as small aspossible,
wehave the result under the
assumption 8 >
2. The choicem == 1 +
completes
theproof
of Theorem 3.1. °4.6. Proof of Theorem 3.2
We first state a lemma which is
proved immediately
after. It ensuresthat,
for an
adequate
choice ofji,
the statisticM2
is bounded inprobability.
According
to, Lemma2.1,
we defineM2
an upper boundfor by
Vol. 34, n° 2-1998.