HAL Id: hal-00634423
https://hal.archives-ouvertes.fr/hal-00634423
Submitted on 21 Oct 2011
Adaptive Dantzig density estimation
Karine Bertin, Erwan Le Pennec, Vincent Rivoirard
To cite this version:
Karine Bertin, Erwan Le Pennec, Vincent Rivoirard. Adaptive Dantzig density estimation. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, Institut Henri Poincaré (IHP), 2011, 47 (1), pp. 43-74. doi:10.1214/09-AIHP351. hal-00634423.
K. Bertin∗, E. Le Pennec†, V. Rivoirard‡
Abstract
This paper deals with the problem of density estimation. We aim at building an estimate of an unknown density as a linear combination of functions of a dictionary. Inspired by Candès and Tao's approach, we propose an $\ell_1$-minimization under an adaptive Dantzig constraint coming from sharp concentration inequalities. This allows us to consider a wide class of dictionaries. Under local or global coherence assumptions, oracle inequalities are derived. These theoretical results are also proved to be valid for the natural Lasso estimate associated with our Dantzig procedure. Then, the issue of calibrating these procedures is studied from both theoretical and practical points of view. Finally, a numerical study shows the significant improvement obtained by our procedures when compared with other classical procedures.
Keywords: Calibration, Concentration inequalities, Dantzig estimate, Density estimation, Dictionary, Lasso estimate, Oracle inequalities, Sparsity.
AMS subject classification: 62G07, 62G05, 62G20
1 Introduction
Various estimation procedures based on $\ell_1$ penalization (exemplified by the Dantzig procedure in [13] and the LASSO procedure in [28]) have been extensively studied recently. These procedures are computationally efficient, as shown in [17, 24, 25], and are thus well suited to high-dimensional data. They have been widely used in regression models, but only the Lasso estimator has been studied in the density model (see [7, 10, 29]). Although we will mostly consider the Dantzig estimator in the density model, for which no result exists so far, we recall some of the classical results obtained in different settings by procedures based on $\ell_1$ penalization.
∗ Supported by Project PBCT13 laboratorio ANESTOC and Project FONDECYT 1090285. Departamento de Estadística, CIMFAV, Universidad de Valparaíso, Avenida Gran Bretaña 1091, Valparaíso, Chile. Tel: 0056-(0)32-2508324. Email: karine.bertin@uv.cl
† Supported by the ANR project PARCIMONIE. Laboratoire de Probabilité et Modèles Aléatoires, Université Paris Diderot, 175 rue de Chevaleret, F-75013 Paris, France. Email: lepennec@math.jussieu.fr
‡ Supported by the ANR project PARCIMONIE. Laboratoire de Mathématique, U.M.R. C.N.R.S. 8628, Université Paris Sud, 91405 Orsay Cedex, France and Département de Mathématiques et Applications, U.M.R. C.N.R.S. 8553, ENS-Paris, 45 Rue d'Ulm, 75230 Paris Cedex 05, France. Email: vincent.rivoirard@math.u-psud.fr
The Dantzig selector has been introduced by Candès and Tao [13] in the linear regression model. More precisely, given
$$Y = A\lambda_0 + \varepsilon,$$
where $Y \in \mathbb{R}^n$, $A$ is an $n \times M$ matrix, $\varepsilon \in \mathbb{R}^n$ is the noise vector and $\lambda_0 \in \mathbb{R}^M$ is the unknown regression parameter to estimate, the Dantzig estimator is defined by
$$\hat\lambda^D = \arg\min_{\lambda \in \mathbb{R}^M} \|\lambda\|_{\ell_1} \quad \text{subject to} \quad \|A^T(A\lambda - Y)\|_{\ell_\infty} \le \eta,$$
where $\|\cdot\|_{\ell_\infty}$ is the sup-norm in $\mathbb{R}^M$, $\|\cdot\|_{\ell_1}$ is the $\ell_1$ norm in $\mathbb{R}^M$, and $\eta$ is a regularization parameter. A natural companion of this estimator is the Lasso procedure or, more precisely, its relaxed form
$$\hat\lambda^L = \arg\min_{\lambda \in \mathbb{R}^M} \left\{ \frac{1}{2}\|A\lambda - Y\|_{\ell_2}^2 + \eta\|\lambda\|_{\ell_1} \right\},$$
where $\eta$ plays exactly the same role as for the Dantzig estimator. This $\ell_1$-penalized method is also called basis pursuit in signal processing (see [14, 15]).
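To make the relaxed Lasso problem concrete, here is a minimal sketch of one standard solver, iterative soft-thresholding (ISTA, a proximal-gradient method). It is not claimed to be the algorithm of [17, 24, 25]; the function names `soft_threshold` and `lasso_ista` and the fixed iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Componentwise soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(A, Y, eta, n_iter=500):
    """Approximately minimize 0.5 * ||A lam - Y||_2^2 + eta * ||lam||_1 (ISTA).

    Step size 1/L, where L = (largest singular value of A)^2 is the Lipschitz
    constant of the gradient of the quadratic part. n_iter is a hypothetical
    fixed budget; no convergence test is performed.
    """
    L = np.linalg.norm(A, 2) ** 2
    lam = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ lam - Y)          # gradient of 0.5 * ||A lam - Y||^2
        lam = soft_threshold(lam - grad / L, eta / L)
    return lam

# Usage on illustrative data: with orthonormal columns (A^T A = I) the Lasso
# decouples and its solution is soft-thresholding of A^T Y at level eta.
rng = np.random.default_rng(0)
A, _ = np.linalg.qr(rng.standard_normal((50, 5)))
Y = rng.standard_normal(50)
print(np.allclose(lasso_ista(A, Y, eta=0.3), soft_threshold(A.T @ Y, 0.3)))
```

The orthonormal case is a convenient sanity check because the penalized objective then splits into $M$ one-dimensional problems.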
Candès and Tao [13] have obtained a bound for the $\ell_2$ risk of the estimator $\hat\lambda^D$, with large probability, under a global condition on the matrix $A$ (the Restricted Isometry Property) and a sparsity assumption on $\lambda_0$, even for $M \ge n$. Bickel et al. [3] have obtained oracle inequalities and bounds of the $\ell_p$ loss for both estimators under weaker assumptions. Actually, Bickel et al. [3] deal with the nonparametric regression framework in which one observes
$$Y_i = f(x_i) + e_i, \quad i = 1, \ldots, n,$$
where $f$ is an unknown function while $(x_i)_{i=1,\ldots,n}$ are known design points and $(e_i)_{i=1,\ldots,n}$ is a noise vector. There is no intrinsic matrix $A$ in this problem, but for any dictionary of functions $\Upsilon = (\varphi_m)_{m=1,\ldots,M}$ one can search for $f$ as a weighted sum $f_\lambda$ of elements of $\Upsilon$,
$$f_\lambda = \sum_{m=1}^M \lambda_m \varphi_m,$$
and introduce the matrix $A = (\varphi_m(x_i))_{i,m}$, which summarizes the information on the dictionary and on the design. Notice that if there exists $\lambda_0$ such that $f = f_{\lambda_0}$, then the model can be rewritten exactly as the classical linear model. However, if this is not the case and a model bias exists, the Dantzig and Lasso procedures can still be applied under similar assumptions on $A$. Oracle inequalities, in which approximation theory plays an important role, are obtained in [3, 8, 9, 29].
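As an illustration of how the matrix $A = (\varphi_m(x_i))_{i,m}$ is formed, the sketch below uses a hypothetical dictionary of $L_2$-normalized dyadic histogram functions $\varphi_{j,k} = 2^{j/2}\mathbb{1}_{[k2^{-j},(k+1)2^{-j})}$ on $[0,1)$, for levels $j = 0, \ldots, J$. The dictionary and the helper names `histogram_dictionary`, `phi` and `design_matrix` are assumptions made for the example; the text allows any dictionary.

```python
import numpy as np

def histogram_dictionary(J):
    """(j, k) indices of dyadic histogram atoms for levels 0..J (illustrative)."""
    return [(j, k) for j in range(J + 1) for k in range(2 ** j)]

def phi(j, k, x):
    """phi_{j,k}(x) = 2^{j/2} * 1{k 2^-j <= x < (k+1) 2^-j}; unit L2 norm on [0,1)."""
    x = np.asarray(x, dtype=float)
    inside = (x >= k / 2 ** j) & (x < (k + 1) / 2 ** j)
    return 2.0 ** (j / 2) * inside

def design_matrix(x, atoms):
    """A[i, m] = phi_m(x_i) for the design points x."""
    return np.column_stack([phi(j, k, x) for (j, k) in atoms])

atoms = histogram_dictionary(J=3)               # M = 1 + 2 + 4 + 8 = 15 atoms
x = np.linspace(0.0, 1.0, 20, endpoint=False)   # hypothetical design points
A = design_matrix(x, atoms)
lam = np.zeros(len(atoms))
lam[0] = 1.0                                    # f_lambda = phi_{0,0} = 1 on [0,1)
f_lam_at_x = A @ lam                            # values f_lambda(x_i)
```

Atoms of different levels overlap, so this dictionary is redundant ($M = 15$ atoms span a space of dimension 8), and the sup-norms $2^{j/2}$ grow with the level, which is the kind of dictionary for which a uniform bound on $\|\varphi_m\|_\infty$ is unnatural.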
Let us also mention that in various settings, under various assumptions on the matrix $A$ (or more precisely on the associated Gram matrix $G = A^T A$), properties of these estimators have been established for subset selection (see [11, 20, 22, 23, 30, 31]) and for prediction (see [3, 19, 20, 23, 32]).
1.1 Our goals and results
We consider in this paper the density estimation framework already studied for the Lasso estimate by Bunea et al. [7, 10] and van de Geer [29]. Namely, our goal is to estimate $f_0$, an unknown density function, by using the observations of an $n$-sample of variables $X_1, \ldots, X_n$ of density $f_0$ with respect to a known measure $dx$ on $\mathbb{R}$. As in the nonparametric regression setting, we introduce a dictionary of functions $\Upsilon = (\varphi_m)_{m=1,\ldots,M}$ and search again for estimates of $f_0$ as linear combinations $f_\lambda$ of the dictionary functions. We rely on the Gram matrix $G$ associated with $\Upsilon$, defined by $G_{m,m'} = \int \varphi_m(x)\varphi_{m'}(x)\,dx$, and on the empirical scalar products of $f_0$ with $\varphi_m$,
$$\hat\beta_m = \frac{1}{n}\sum_{i=1}^n \varphi_m(X_i).$$
The Dantzig estimate $\hat f^D$ is then obtained by minimizing $\|\lambda\|_{\ell_1}$ over the set of parameters $\lambda$ satisfying the adaptive Dantzig constraint:
$$\forall m \in \{1, \ldots, M\}, \quad |(G\lambda)_m - \hat\beta_m| \le \eta_{\gamma,m},$$
where, for $m \in \{1, \ldots, M\}$, $(G\lambda)_m$ is the scalar product of $f_\lambda$ with $\varphi_m$,
$$\eta_{\gamma,m} = \sqrt{\frac{2\tilde\sigma_m^2\gamma\log M}{n}} + \frac{2\|\varphi_m\|_\infty\gamma\log M}{3n},$$
$\tilde\sigma_m^2/n$ is a sharp estimate of the variance of $\hat\beta_m$ and $\gamma$ is a constant to be chosen. Section 2 gives precise definitions and heuristics for using this constraint. We just mention here that $\eta_{\gamma,m}$ comes from sharp concentration inequalities that give tight constraints. Our idea is that if $f_0$ can be decomposed on $\Upsilon$ as
$$f_0 = \sum_{m=1}^M \lambda_{0,m}\varphi_m,$$
then we force the set of feasible parameters $\lambda$ to contain $\lambda_0$ with large probability and to be as small as possible. Significant improvements in practice are expected.
Our goals in this paper are mainly twofold. First, we aim at establishing sharp oracle inequalities under very mild assumptions on the dictionary. Our starting point is that most of the papers in the literature assume that the functions of the dictionary are bounded by a constant independent of $M$ and $n$, which constitutes a strong limitation, in particular for dictionaries based on histograms or wavelets (see for instance [6], [7], [8], [9], [11] or [29]). Such assumptions on the functions of $\Upsilon$ will not be considered in our paper. Likewise, our methodology does not rely on the knowledge of $\|f_0\|_\infty$, which can even be infinite (as noticed by Birgé [4] for the study of the integrated $L_2$-risk, most of the papers in the literature typically assume that the sup-norm of the unknown density is finite with a known or estimated bound for this quantity). Finally, let us mention that, in contrast with what Bunea et al. [10] did, we obtain oracle inequalities with leading constant 1, and furthermore these are established under much weaker assumptions on the dictionary than in [10].
The second goal of this paper deals with the problem of calibrating the so-called Dantzig constant $\gamma$: how should this constant be chosen to obtain good results in both theory and practice? Most of the time, for Lasso-type estimators, the regularization parameter is of the form $a\sqrt{\frac{\log M}{n}}$ with $a$ a positive constant (see [3], [7], [6], [9], [12], [20] or [23] for instance). These results are obtained with large probability that depends on the tuning coefficient $a$. In practice, it is not simple to calibrate the constant $a$. Unfortunately, most of the time, the theoretical choice of the regularization parameter is not suitable for practical issues. This is true for Lasso-type estimates but also for many algorithms for which the regularization parameter provided by the theory is often too conservative for practical purposes (see [18], who clearly explain and illustrate this point for their thresholding procedure). So, one of the main goals of this paper is to fill the gap between the optimal parameter choice provided by theoretical results on the one hand and by a simulation study on the other hand. Only a few papers are devoted to this problem. In the model selection setting, the issue of calibration has been addressed by Birgé and Massart [5], who considered $\ell_0$-penalized estimators in a Gaussian homoscedastic regression framework and showed that there exists a minimal penalty in the sense that taking smaller penalties leads to inconsistent estimation procedures. Arlot and Massart [1] generalized these results for non-Gaussian or heteroscedastic data and Reynaud-Bouret and Rivoirard [26] addressed this question for thresholding rules in the Poisson intensity framework.
Now, let us describe our results. By using the previous data-driven Dantzig constraint, oracle inequalities are derived under local conditions on the dictionary that are valid under classical assumptions on the structure of the dictionary. We extensively discuss these assumptions and we show their own interest in the context of the paper. Each term of these oracle inequalities is
$$\|\varphi_m\|_\infty^2 \le c_1\frac{n}{\log M}\|f_0\|_\infty,$$
where $c_1$ is a constant. This assumption is very mild and, unlike in classical works, allows us to consider dictionaries based on wavelets. Then, relying on our Dantzig estimate, we build an adaptive Lasso procedure whose oracle performances are similar. This illustrates the closeness between Lasso and Dantzig-type estimates.
Our results are proved for $\gamma > 1$. For the theoretical calibration issue, we study the performance of our procedure when $\gamma < 1$. We show that in a simple framework, estimation of the straightforward signal $f_0 = \mathbb{1}_{[0,1]}$ cannot be performed at a convenient rate of convergence when $\gamma < 1$. This result shows that the assumption $\gamma > 1$ is not too conservative.
Finally, a simulation study illustrates how dictionary-based methods outperform classical ones. More precisely, we show that our Dantzig and Lasso procedures with $\gamma > 1$, but close to 1, outperform classical ones, such as simple histogram procedures, wavelet thresholding or Dantzig procedures based on the knowledge of $\|f_0\|_\infty$ and less tight Dantzig constraints.
1.2 Outline
Section 2 introduces the density estimator of $f_0$ whose theoretical performances are studied in Section 3. Section 4 studies the Lasso estimate proposed in this paper. The calibration issue is studied in Section 5.1 and numerical experiments are performed in Section 5.2. Finally, Section 6 is devoted to the proofs of our results.
2 The Dantzig estimator of the density $f_0$
As said in the Introduction, our goal is to build an estimate of the density $f_0$ with respect to the measure $dx$ as a linear combination of functions of $\Upsilon = (\varphi_m)_{m=1,\ldots,M}$, where we assume without any loss of generality that, for any $m$, $\|\varphi_m\|_2 = 1$:
$$f_\lambda = \sum_{m=1}^M \lambda_m\varphi_m.$$
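For this purpose, we naturally rely on estimates of the $L_2$-scalar products between $f_0$ and the $\varphi_m$'s. So, for $m \in \{1, \ldots, M\}$, we set
$$\beta_{0,m} = \int \varphi_m(x) f_0(x)\,dx, \tag{1}$$
and we consider its empirical counterpart
$$\hat\beta_m = \frac{1}{n}\sum_{i=1}^n \varphi_m(X_i), \tag{2}$$
which is an unbiased estimate of $\beta_{0,m}$. The variance of this estimate is $\mathrm{Var}(\hat\beta_m) = \frac{\sigma_{0,m}^2}{n}$, where
$$\sigma_{0,m}^2 = \int \varphi_m^2(x) f_0(x)\,dx - \beta_{0,m}^2. \tag{3}$$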
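Note also that for any $\lambda$ and any $m$, the $L_2$-scalar product between $f_\lambda$ and $\varphi_m$ can be easily computed:
$$\int \varphi_m(x) f_\lambda(x)\,dx = \sum_{m'=1}^M \lambda_{m'}\int \varphi_{m'}(x)\varphi_m(x)\,dx = (G\lambda)_m,$$
where $G$ is the Gram matrix associated with the dictionary $\Upsilon$, defined for any $1 \le m, m' \le M$ by
$$G_{m,m'} = \int \varphi_m(x)\varphi_{m'}(x)\,dx.$$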
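For a hypothetical dictionary of dyadic histogram atoms $\varphi_{j,k} = 2^{j/2}\mathbb{1}_{[k2^{-j},(k+1)2^{-j})}$ on $[0,1)$ (an assumption for illustration only), the Gram matrix has a closed form: $G_{(j,k),(j',k')} = 2^{(j+j')/2}\,|I_{j,k} \cap I_{j',k'}|$, which is nonzero only when one interval contains the other. The sketch builds $G$ and checks the identity $(G\lambda)_m = \int \varphi_m f_\lambda$ with a midpoint rule on a dyadic grid, which is exact for these piecewise-constant functions up to rounding.

```python
import numpy as np

J = 3
atoms = [(j, k) for j in range(J + 1) for k in range(2 ** j)]   # illustrative dictionary

def overlap(a, b):
    """Length of the intersection of the dyadic intervals of atoms a and b."""
    (j, k), (jp, kp) = a, b
    lo = max(k / 2 ** j, kp / 2 ** jp)
    hi = min((k + 1) / 2 ** j, (kp + 1) / 2 ** jp)
    return max(hi - lo, 0.0)

# G[m, m'] = 2^{(j + j')/2} * |I_{j,k} intersected with I_{j',k'}|
G = np.array([[2.0 ** ((a[0] + b[0]) / 2) * overlap(a, b) for b in atoms]
              for a in atoms])

def phi(j, k, x):
    return 2.0 ** (j / 2) * ((x >= k / 2 ** j) & (x < (k + 1) / 2 ** j))

# Midpoint grid on [0,1), finer than the finest atom: exact for these atoms.
N = 2 ** (J + 4)
x = (np.arange(N) + 0.5) / N
Phi = np.column_stack([phi(j, k, x) for (j, k) in atoms])   # Phi[i, m] = phi_m(x_i)

lam = np.random.default_rng(0).standard_normal(len(atoms))
f_lam = Phi @ lam                        # f_lambda on the grid
inner = Phi.T @ f_lam / N                # midpoint rule for the integral of phi_m * f_lambda
print(np.max(np.abs(inner - G @ lam)))   # rounding error only
```

Because atoms of different levels overlap, this $G$ is far from the identity, which is why the constraint is written in terms of $(G\lambda)_m$ rather than $\lambda_m$.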
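Any reasonable choice of $\lambda$ should ensure that the coefficients $(G\lambda)_m$ are close to $\hat\beta_m$ for all $m$. Therefore, using Candès and Tao's approach, we define the Dantzig constraint:
$$\forall m \in \{1, \ldots, M\}, \quad |(G\lambda)_m - \hat\beta_m| \le \eta_{\gamma,m} \tag{4}$$
and the Dantzig estimate $\hat f^D$ by $\hat f^D = f_{\hat\lambda^{D,\gamma}}$ with
$$\hat\lambda^{D,\gamma} = \arg\min_{\lambda \in \mathbb{R}^M} \|\lambda\|_{\ell_1} \quad \text{such that } \lambda \text{ satisfies the Dantzig constraint (4)},$$
where, for $\gamma > 0$ and $m \in \{1, \ldots, M\}$,
$$\eta_{\gamma,m} = \sqrt{\frac{2\tilde\sigma_m^2\gamma\log M}{n}} + \frac{2\|\varphi_m\|_\infty\gamma\log M}{3n}, \tag{5}$$
with
$$\tilde\sigma_m^2 = \hat\sigma_m^2 + 2\|\varphi_m\|_\infty\sqrt{\frac{2\hat\sigma_m^2\gamma\log M}{n}} + \frac{8\|\varphi_m\|_\infty^2\gamma\log M}{n} \tag{6}$$
and
$$\hat\sigma_m^2 = \frac{1}{n(n-1)}\sum_{i=2}^n\sum_{j=1}^{i-1}\left(\varphi_m(X_i) - \varphi_m(X_j)\right)^2. \tag{7}$$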
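Formulas (5) to (7) translate directly into code. The sketch below assumes the values $\varphi_m(X_i)$ are stored in an $n \times M$ array `Phi_X` and the sup-norms $\|\varphi_m\|_\infty$ in a vector `sup_norms` (names chosen for the example). It uses the identity $\frac{1}{n(n-1)}\sum_{i<j}(a_i - a_j)^2 = \frac{1}{n-1}\sum_i (a_i - \bar a)^2$, so $\hat\sigma_m^2$ in (7) is the usual unbiased sample variance and the double sum need not be formed.

```python
import numpy as np

def adaptive_thresholds(Phi_X, sup_norms, gamma):
    """Return (eta, sigma_hat_sq, sigma_tilde_sq) following (5), (6) and (7).

    Phi_X[i, m] = phi_m(X_i) (shape n x M); sup_norms[m] = ||phi_m||_inf.
    """
    n, M = Phi_X.shape
    logM = np.log(M)
    # (7): the pairwise form equals the unbiased sample variance (ddof=1)
    sigma_hat_sq = Phi_X.var(axis=0, ddof=1)
    # (6): slightly inflated variance estimate
    sigma_tilde_sq = (sigma_hat_sq
                      + 2 * sup_norms * np.sqrt(2 * sigma_hat_sq * gamma * logM / n)
                      + 8 * sup_norms ** 2 * gamma * logM / n)
    # (5): random term plus deterministic term
    eta = (np.sqrt(2 * sigma_tilde_sq * gamma * logM / n)
           + 2 * sup_norms * gamma * logM / (3 * n))
    return eta, sigma_hat_sq, sigma_tilde_sq
```

Vectorizing over $m$ keeps the cost at $O(nM)$, whereas evaluating (7) literally would cost $O(n^2 M)$.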
Note that $\eta_{\gamma,m}$ depends on the data, so the constraint (4) will be referred to as the adaptive Dantzig constraint in the sequel. We now justify the introduction of the density estimate $\hat f^D$.
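The definition of $\eta_{\gamma,m}$ is based on the following heuristics. Given $m$, when there exists a constant $c_0 > 0$ such that $f_0(x) \ge c_0$ for $x$ in the support of $\varphi_m$, and $\varphi_m$ satisfies $\|\varphi_m\|_\infty^2 = o_n(n(\log M)^{-1})$, then, with large probability, the deterministic term of (5), $\frac{2\|\varphi_m\|_\infty\gamma\log M}{3n}$, is negligible with respect to the random one, $\sqrt{\frac{2\tilde\sigma_m^2\gamma\log M}{n}}$. In this case, the random term is the main one and we asymptotically derive
$$\eta_{\gamma,m} \approx \sqrt{\frac{2\gamma\log M\,\tilde\sigma_m^2}{n}}. \tag{8}$$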
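Having in mind that $\tilde\sigma_m^2/n$ is a convenient estimate for $\mathrm{Var}(\hat\beta_m)$ (see the proof of Theorem 1), the right-hand side of formula (8) has the same shape as the bound proposed by Candès and Tao [13] to define the Dantzig constraint in the linear model. Actually, the deterministic term of (5) allows us to get sharp concentration inequalities. As often done in the literature, instead of estimating $\mathrm{Var}(\hat\beta_m)$, we could use the inequality
$$\mathrm{Var}(\hat\beta_m) = \frac{\sigma_{0,m}^2}{n} \le \frac{\|f_0\|_\infty}{n}$$
(which holds since $\sigma_{0,m}^2 \le \int\varphi_m^2 f_0 \le \|f_0\|_\infty\|\varphi_m\|_2^2 = \|f_0\|_\infty$) and we could replace $\tilde\sigma_m^2$ with $\|f_0\|_\infty$ in the definition of $\eta_{\gamma,m}$. But this requires a strong assumption: $f_0$ is bounded and $\|f_0\|_\infty$ is known. In our paper, $\mathrm{Var}(\hat\beta_m)$ is estimated, which allows us not to impose these conditions. More precisely, we slightly overestimate $\sigma_{0,m}^2$ to control large deviation terms, and this is the reason why we introduce $\tilde\sigma_m^2$ instead of using $\hat\sigma_m^2$, an unbiased estimate of $\sigma_{0,m}^2$. Finally, $\gamma$ is a constant that has to be suitably calibrated and plays a capital role in practice.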
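The heuristic above can be checked numerically. The ratio of the deterministic term to the random term of (5) equals $\frac{\|\varphi_m\|_\infty}{3\tilde\sigma_m}\sqrt{\frac{2\gamma\log M}{n}}$, which vanishes when $\|\varphi_m\|_\infty^2 = o_n(n/\log M)$ and $\tilde\sigma_m$ stays bounded away from 0. The sketch below is illustrative only: it fixes $\|\varphi_m\|_\infty = 2$ and $\gamma = 1.01$, takes $M = n$, and replaces $\tilde\sigma_m^2$ by a hypothetical constant value 0.6.

```python
import numpy as np

def term_ratio(sup_norm, sigma_tilde_sq, gamma, M, n):
    """Deterministic term of (5) divided by its random term."""
    logM = np.log(M)
    random_term = np.sqrt(2 * sigma_tilde_sq * gamma * logM / n)
    deterministic_term = 2 * sup_norm * gamma * logM / (3 * n)
    return deterministic_term / random_term

# Illustrative setting: M = n, ||phi_m||_inf = 2, sigma_tilde^2 = 0.6 (hypothetical).
for n in [100, 1_000, 10_000, 100_000]:
    print(n, round(term_ratio(2.0, 0.6, 1.01, M=n, n=n), 4))
```

The ratio decreases like $\sqrt{\log n / n}$ here, so for large samples $\eta_{\gamma,m}$ is essentially the random term, as in (8).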
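The following result justifies the previous heuristics by showing that, if $\gamma > 1$, with high probability, the quantity $|\hat\beta_m - \beta_{0,m}|$ is smaller than $\eta_{\gamma,m}$ for all $m$. The parameter $\eta_{\gamma,m}$ with $\gamma$ close to 1 can be viewed as the smallest quantity that ensures this property.
Theorem 1. Let us assume that $M$ satisfies
$$n \le M \le \exp(n^\delta) \tag{9}$$
for $\delta < 1$. Let $\gamma > 1$. Then, for any $\varepsilon > 0$, there exists a constant $C_1(\varepsilon, \delta, \gamma)$ depending on $\varepsilon$, $\delta$ and $\gamma$ such that
$$\mathbb{P}\left(\exists m \in \{1, \ldots, M\},\ |\beta_{0,m} - \hat\beta_m| \ge \eta_{\gamma,m}\right) \le C_1(\varepsilon, \delta, \gamma)\,M^{1-\frac{\gamma}{1+\varepsilon}}.$$
In addition, there exists a constant $C_2(\delta, \gamma)$ depending on $\delta$ and $\gamma$ such that
$$\mathbb{P}\left(\forall m \in \{1, \ldots, M\},\ \eta_{\gamma,m}^{(-)} \le \eta_{\gamma,m} \le \eta_{\gamma,m}^{(+)}\right) \ge 1 - C_2(\delta, \gamma)\,M^{1-\gamma},$$
where, for $m \in \{1, \ldots, M\}$,
$$\eta_{\gamma,m}^{(-)} = \sigma_{0,m}\sqrt{\frac{8\gamma\log M}{7n}} + \frac{2\|\varphi_m\|_\infty\gamma\log M}{3n}$$
and
$$\eta_{\gamma,m}^{(+)} = \sigma_{0,m}\sqrt{\frac{16\gamma\log M}{n}} + \frac{10\|\varphi_m\|_\infty\gamma\log M}{n}.$$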
This result is proved in Section 6.1. The first part is a sharp concentration inequality proved by using Bernstein-type controls. The second part of the theorem proves that, up to constants depending on $\gamma$, $\eta_{\gamma,m}$ is of order $\sigma_{0,m}\sqrt{\frac{\log M}{n}} + \|\varphi_m\|_\infty\frac{\log M}{n}$ with high probability. Note that the assumption $\gamma > 1$ is essential to obtain probabilities going to 0.
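A small simulation can illustrate Theorem 1 under assumptions chosen only for the example: $f_0$ is uniform on $[0,1]$, the dictionary consists of the $M$ histogram atoms $\varphi_m = \sqrt{M}\,\mathbb{1}_{[(m-1)/M, m/M)}$ (so $\beta_{0,m} = 1/\sqrt{M}$, $\sigma_{0,m}^2 = 1 - 1/M$ and $\|\varphi_m\|_\infty = \sqrt{M}$), $n = 200$, $M = 256$ (which satisfies (9)) and $\gamma = 1.01$. The code estimates the two probabilities of the theorem by Monte Carlo. It is a sanity check, not a proof; in this configuration the thresholds are large compared with the observed deviations, so one should see frequencies 0 and 1.

```python
import numpy as np

def thresholds(Phi_X, sup_norms, gamma):
    """eta_{gamma,m} from (5), (6) and (7), vectorized over m."""
    n, M = Phi_X.shape
    L = np.log(M)
    s_hat = Phi_X.var(axis=0, ddof=1)                                   # (7)
    s_tilde = (s_hat + 2 * sup_norms * np.sqrt(2 * s_hat * gamma * L / n)
               + 8 * sup_norms ** 2 * gamma * L / n)                    # (6)
    return np.sqrt(2 * s_tilde * gamma * L / n) + 2 * sup_norms * gamma * L / (3 * n)  # (5)

rng = np.random.default_rng(0)
n, M, gamma, reps = 200, 256, 1.01, 200       # illustrative choices
L = np.log(M)
sup = np.full(M, np.sqrt(M))                  # ||phi_m||_inf = sqrt(M)
beta0 = np.full(M, 1 / np.sqrt(M))            # (1) for the uniform density
sigma0 = np.sqrt(1 - 1 / M)                   # square root of (3)
eta_minus = sigma0 * np.sqrt(8 * gamma * L / (7 * n)) + 2 * sup * gamma * L / (3 * n)
eta_plus = sigma0 * np.sqrt(16 * gamma * L / n) + 10 * sup * gamma * L / n

fail = sandwich = 0
for _ in range(reps):
    X = rng.random(n)                                   # sample from f0 = 1 on [0,1)
    bins = np.minimum((X * M).astype(int), M - 1)       # bin index of each X_i
    Phi_X = np.zeros((n, M))
    Phi_X[np.arange(n), bins] = np.sqrt(M)              # phi_m(X_i)
    beta_hat = Phi_X.mean(axis=0)                       # (2)
    eta = thresholds(Phi_X, sup, gamma)
    fail += np.any(np.abs(beta0 - beta_hat) >= eta)
    sandwich += np.all((eta_minus <= eta) & (eta <= eta_plus))

print("estimated P(exists m: |beta0 - beta_hat| >= eta):", fail / reps)
print("estimated P(eta- <= eta <= eta+ for all m):", sandwich / reps)
```

Other dictionaries, densities or values of $\gamma$ can be explored by changing only the sampling and the arrays `beta0`, `sigma0` and `sup`.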
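Finally, let $\lambda_0 = (\lambda_{0,m})_{m=1,\ldots,M} \in \mathbb{R}^M$ be such that
$$P_\Upsilon f_0 = \sum_{m=1}^M \lambda_{0,m}\varphi_m,$$
where $P_\Upsilon$ is the projection onto the space spanned by $\Upsilon$. We have
$$(G\lambda_0)_m = \int (P_\Upsilon f_0)\varphi_m = \int f_0\varphi_m = \beta_{0,m},$$
where the second equality holds because $f_0 - P_\Upsilon f_0$ is orthogonal to each $\varphi_m$. So, Theorem 1 proves that $\lambda_0$ satisfies the adaptive Dantzig constraint (4) with probability larger than $1 - C_1(\varepsilon, \delta, \gamma)M^{1-\frac{\gamma}{1+\varepsilon}}$ for any $\varepsilon > 0$. Indeed, we force the set of parameters $\lambda$ satisfying the adaptive Dantzig constraint to contain $\lambda_0$ with large probability and to be as small as possible. Therefore, $\hat f^D = f_{\hat\lambda^{D,\gamma}}$ is a good candidate among sparse estimates linearly decomposed on $\Upsilon$ for estimating $f_0$.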
We mention that Assumption (9) can be relaxed and we can take $M < n$ provided the definition of $\eta_{\gamma,m}$ is modified.
In the sequel, we will write $\hat\lambda^D = \hat\lambda^{D,\gamma}$ to simplify the notation, but the Dantzig estimator $\hat f^D$ still depends on $\gamma$. Moreover, we assume that (9) holds and we denote by $\eta_\gamma = (\eta_{\gamma,m})_{m=1,\ldots,M}$ the vector of thresholds associated with the Dantzig constant $\gamma > 1$.
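Since (4) is a set of linear inequalities, $\hat\lambda^{D,\gamma}$ solves a linear program. A minimal sketch, assuming SciPy's `scipy.optimize.linprog` with the HiGHS method (the solver used in the paper is not specified in this section), writes $\lambda = u - v$ with $u, v \ge 0$ and minimizes $\sum_m (u_m + v_m)$ subject to $G(u - v) \le \hat\beta + \eta_\gamma$ and $-G(u - v) \le \eta_\gamma - \hat\beta$. The function name `dantzig_density` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_density(G, beta_hat, eta):
    """Solve min ||lam||_1 s.t. |(G lam)_m - beta_hat_m| <= eta_m for all m.

    Splitting lam = u - v with u, v >= 0 gives a linear program; at the optimum
    sum(u) + sum(v) = ||lam||_1. Returns None if the solver reports failure
    (for instance an empty constraint set).
    """
    M = G.shape[0]
    c = np.ones(2 * M)
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([beta_hat + eta, eta - beta_hat])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        return None
    return res.x[:M] - res.x[M:]

# Usage with an orthonormal dictionary (G = I): the program decouples and the
# solution is soft-thresholding of beta_hat_m at level eta_m.
beta_hat = np.array([0.5, -0.05, 0.2])
eta = np.array([0.1, 0.1, 0.1])
print(dantzig_density(np.eye(3), beta_hat, eta))   # approximately [0.4, 0.0, 0.1]
```

For a redundant dictionary $G$ is not the identity and the constraints couple the coefficients, which is where a generic LP solver is needed.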
3.1 The main result under local assumptions
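Let us state the main result of this paper. For any $J \subset \{1, \ldots, M\}$, we set $J^C = \{1, \ldots, M\} \setminus J$ and define $\lambda_J$ as the vector which has the same coordinates as $\lambda$ on $J$ and zero coordinates on $J^C$. We introduce a local assumption indexed by a subset $J_0$.
• Local Assumption. Given $J_0 \subset \{1, \ldots, M\}$, for some constants $\kappa_{J_0} > 0$ and $\mu_{J_0} \ge 0$ depending on $J_0$, we have for any $\lambda$,
$$\|f_\lambda\|_2 \ge \kappa_{J_0}\|\lambda_{J_0}\|_{\ell_2} - \frac{\mu_{J_0}}{\sqrt{|J_0|}}\left(\|\lambda_{J_0^C}\|_{\ell_1} - \|\lambda_{J_0}\|_{\ell_1}\right)_+. \tag{$\mathrm{LA}(J_0, \kappa_{J_0}, \mu_{J_0})$}$$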
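Because LA must hold for every $\lambda$, it cannot be verified by sampling: a random search can only exhibit a violation. The hypothetical helper `la_margin` below evaluates both sides of $\mathrm{LA}(J_0, \kappa_{J_0}, \mu_{J_0})$ for one $\lambda$, using $\|f_\lambda\|_2 = \sqrt{\lambda^T G\lambda}$. It illustrates two extreme cases: for an orthonormal dictionary ($G = I$) the assumption holds with $\kappa_{J_0} = 1$ and $\mu_{J_0} = 0$, since $\|f_\lambda\|_2 = \|\lambda\|_{\ell_2} \ge \|\lambda_{J_0}\|_{\ell_2}$; with two identical atoms, $\lambda = (1, -1)$ gives $f_\lambda = 0$, so LA fails for $J_0$ equal to the first atom and every $\kappa_{J_0} > 0$, whatever $\mu_{J_0}$.

```python
import numpy as np

def la_margin(G, lam, J0, kappa, mu):
    """LHS minus RHS of LA(J0, kappa, mu) for one lambda (>= 0 means it holds).

    G is the Gram matrix, so ||f_lambda||_2 = sqrt(lam^T G lam).
    J0 is a list of 0-based indices.
    """
    mask = np.zeros(len(lam), dtype=bool)
    mask[J0] = True
    norm_f = np.sqrt(max(lam @ G @ lam, 0.0))        # guard tiny negative rounding
    cone = max(np.abs(lam[~mask]).sum() - np.abs(lam[mask]).sum(), 0.0)
    rhs = kappa * np.linalg.norm(lam[mask]) - mu / np.sqrt(len(J0)) * cone
    return norm_f - rhs

rng = np.random.default_rng(0)
# Orthonormal dictionary: no violation found with kappa = 1, mu = 0.
worst = min(la_margin(np.eye(5), rng.standard_normal(5), [0, 2], 1.0, 0.0)
            for _ in range(1000))
# Duplicated atom (G full of ones): lambda = (1, -1) gives f_lambda = 0.
G_dup = np.ones((2, 2))
print(worst >= -1e-12, la_margin(G_dup, np.array([1.0, -1.0]), [0], 0.5, 10.0))
```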
We obtain the following oracle-type inequality without any assumption on $f_0$.
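Theorem 2. With probability at least $1 - C_1(\varepsilon, \delta, \gamma)M^{1-\frac{\gamma}{1+\varepsilon}}$, for all $J_0 \subset \{1, \ldots, M\}$ such that there exist $\kappa_{J_0} > 0$ and $\mu_{J_0} \ge 0$ for which $\mathrm{LA}(J_0, \kappa_{J_0}, \mu_{J_0})$ holds, we have, for any $\alpha > 0$,
$$\|\hat f^D - f_0\|_2^2 \le \inf_{\lambda \in \mathbb{R}^M}\left\{\|f_\lambda - f_0\|_2^2 + \alpha\left(1 + \frac{2\mu_{J_0}}{\kappa_{J_0}}\right)^2\frac{\Lambda(\lambda, J_0^c)^2}{|J_0|} + 16|J_0|\left(\frac{1}{\alpha} + \frac{1}{\kappa_{J_0}^2}\right)\|\eta_\gamma\|_{\ell_\infty}^2\right\}, \tag{10}$$
with
$$\Lambda(\lambda, J_0^c) = \|\lambda_{J_0^C}\|_{\ell_1} + \frac{\left(\|\hat\lambda^D\|_{\ell_1} - \|\lambda\|_{\ell_1}\right)_+}{2}.$$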
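For fixed $\lambda$ and $J_0$, the bracket in (10) has the form $a_0 + \alpha a + b/\alpha + c$ with $a = (1 + 2\mu_{J_0}/\kappa_{J_0})^2\Lambda(\lambda, J_0^c)^2/|J_0|$ and $b = 16|J_0|\|\eta_\gamma\|_{\ell_\infty}^2$. When $\Lambda(\lambda, J_0^c) > 0$, the best $\alpha$ is $\sqrt{b/a}$, and the two $\alpha$-dependent terms then sum to $2\sqrt{ab} = 8(1 + 2\mu_{J_0}/\kappa_{J_0})\Lambda(\lambda, J_0^c)\|\eta_\gamma\|_{\ell_\infty}$. This optimization is offered as an illustrative remark, not as a statement from the paper; the function `oracle_bracket` and its argument names are hypothetical and simply evaluate the bracket for given numbers.

```python
import numpy as np

def oracle_bracket(approx_err, Lambda, J0_size, kappa, mu, eta_sup, alpha=None):
    """Value of the bracket in (10) for one lambda, and the alpha used.

    approx_err = ||f_lambda - f_0||_2^2, Lambda = Lambda(lambda, J0^c),
    eta_sup = ||eta_gamma||_inf. If alpha is None, the minimizing alpha
    sqrt(b / a) is used (this requires Lambda > 0).
    """
    a = (1 + 2 * mu / kappa) ** 2 * Lambda ** 2 / J0_size   # coefficient of alpha
    b = 16 * J0_size * eta_sup ** 2                          # coefficient of 1/alpha
    if alpha is None:
        alpha = np.sqrt(b / a)
    variance = 16 * J0_size * eta_sup ** 2 / kappa ** 2      # alpha-free part
    return approx_err + alpha * a + b / alpha + variance, alpha

# Illustrative numbers only.
val_opt, alpha_opt = oracle_bracket(0.01, 0.2, 5, 0.8, 0.1, 0.05)
closed_form = 0.01 + 8 * (1 + 2 * 0.1 / 0.8) * 0.2 * 0.05 + 16 * 5 * 0.05 ** 2 / 0.8 ** 2
print(val_opt, closed_form, alpha_opt)
```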
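Let us comment on each term of the right-hand side of (10). The first term is an approximation term which measures the closeness between $f_0$ and $f_\lambda$. This term can vanish if $f_0$ can be decomposed on the dictionary. The second term, a bias term, is a price to pay when either $\lambda$ is not supported by the subset $J_0$ considered or it does not satisfy the condition $\|\hat\lambda^D\|_{\ell_1} \le \|\lambda\|_{\ell_1}$, which holds as soon as $\lambda$ satisfies the adaptive Dantzig constraint (since $\hat\lambda^D$ minimizes the $\ell_1$ norm over that constraint set). Finally, the last term, which does not depend on $\lambda$, can be viewed as a variance term corresponding to the estimation on the subset $J_0$. The parameter $\alpha$ calibrates the weights given to the bias and variance terms in the oracle inequality. Concerning the last term, remember that $\eta_{\gamma,m}$ relies on an estimate of the variance of $\hat\beta_m$. Furthermore, we have with high probability:
$$\|\eta_\gamma\|_{\ell_\infty}^2 \le 2\sup_m\left(\frac{16\sigma_{0,m}^2\gamma\log M}{n} + \left(\frac{10\|\varphi_m\|_\infty\gamma\log M}{n}\right)^2\right).$$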
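The bound above follows from the upper bound $\eta_{\gamma,m} \le \eta_{\gamma,m}^{(+)}$ of Theorem 1, which holds for all $m$ simultaneously with probability at least $1 - C_2(\delta, \gamma)M^{1-\gamma}$, combined with the elementary inequality $(x + y)^2 \le 2(x^2 + y^2)$; taking the supremum over $m$ then gives the display:
$$\eta_{\gamma,m}^2 \le \left(\eta_{\gamma,m}^{(+)}\right)^2 = \left(\sigma_{0,m}\sqrt{\frac{16\gamma\log M}{n}} + \frac{10\|\varphi_m\|_\infty\gamma\log M}{n}\right)^2 \le 2\left(\frac{16\sigma_{0,m}^2\gamma\log M}{n} + \left(\frac{10\|\varphi_m\|_\infty\gamma\log M}{n}\right)^2\right).$$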
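So, if $f_0$ is bounded, then $\sigma_{0,m}^2 \le \|f_0\|_\infty$, and if there exists a constant $c_1$ such that for any $m$,
$$\|\varphi_m\|_\infty^2 \le c_1\frac{n}{\log M}\|f_0\|_\infty \tag{11}$$
(which is true for instance for a bounded dictionary), then
$$\|\eta_\gamma\|_{\ell_\infty}^2 \le C\|f_0\|_\infty\frac{\log M}{n},$$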