
Adaptive Dantzig density estimation


HAL Id: hal-00634423

https://hal.archives-ouvertes.fr/hal-00634423

Submitted on 21 Oct 2011



Karine Bertin, Erwan Le Pennec, Vincent Rivoirard

To cite this version:

Karine Bertin, Erwan Le Pennec, Vincent Rivoirard. Adaptive Dantzig density estimation. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, Institut Henri Poincaré (IHP), 2011, 47 (1), pp. 43-74. ⟨10.1214/09-AIHP351⟩. ⟨hal-00634423⟩


K. Bertin, E. Le Pennec, V. Rivoirard

Abstract

This paper deals with the problem of density estimation. We aim at building an estimate of an unknown density as a linear combination of functions of a dictionary. Inspired by Candès and Tao's approach, we propose an $\ell_1$-minimization under an adaptive Dantzig constraint coming from sharp concentration inequalities. This allows us to consider a wide class of dictionaries. Under local or global coherence assumptions, oracle inequalities are derived. These theoretical results are also proved to be valid for the natural Lasso estimate associated with our Dantzig procedure. Then, the issue of calibrating these procedures is studied from both theoretical and practical points of view. Finally, a numerical study shows the significant improvement obtained by our procedures when compared with other classical procedures.

Keywords: Calibration, Concentration inequalities, Dantzig estimate, Density estimation, Dictionary, Lasso estimate, Oracle inequalities, Sparsity.

AMS subject classification: 62G07, 62G05, 62G20

1 Introduction

Various estimation procedures based on $\ell_1$ penalization (exemplified by the Dantzig procedure in [13] and the LASSO procedure in [28]) have been extensively studied recently. These procedures are computationally efficient, as shown in [17, 24, 25], and are thus suited to high-dimensional data. They have been widely used in regression models, but only the Lasso estimator has been studied in the density model (see [7, 10, 29]). Although we will mostly consider the Dantzig estimator in the density model, for which no result exists so far, we recall some of the classical results obtained in different settings by procedures based on $\ell_1$ penalization.

Supported by Project PBCT13 laboratorio ANESTOC and Project FONDECYT 1090285. Departamento de Estadística, CIMFAV, Universidad de Valparaíso, Avenida Gran Bretaña 1091, Valparaíso, Chile. Tel 0056-(0)32-2508324. Email: karine.bertin@uv.cl

Supported by the ANR project PARCIMONIE. Laboratoire de Probabilité et Modèles Aléatoires, Université Paris Diderot, 175 rue de Chevaleret, F-75013 Paris, France. Email: lepennec@math.jussieu.fr

Supported by the ANR project PARCIMONIE. Laboratoire de Mathématique, U.M.R. C.N.R.S. 8628, Université Paris Sud, 91405 Orsay Cedex, France and Département de Mathématiques et Applications, U.M.R. C.N.R.S. 8553, ENS-Paris, 45 Rue d'Ulm, 75230 Paris Cedex 05, France. Email: vincent.rivoirard@math.u-psud.fr

The Dantzig selector was introduced by Candès and Tao [13] in the linear regression model. More precisely, given

$$Y = A\lambda_0 + \varepsilon,$$

where $Y \in \mathbb{R}^n$, $A$ is an $n \times M$ matrix, $\varepsilon \in \mathbb{R}^n$ is the noise vector and $\lambda_0 \in \mathbb{R}^M$ is the unknown regression parameter to estimate, the Dantzig estimator is defined by

$$\hat\lambda^D = \arg\min_{\lambda \in \mathbb{R}^M} \|\lambda\|_{\ell_1} \quad \text{subject to} \quad \|A^T(A\lambda - Y)\|_{\ell_\infty} \le \eta,$$

where $\|\cdot\|_{\ell_\infty}$ is the sup-norm in $\mathbb{R}^M$, $\|\cdot\|_{\ell_1}$ is the $\ell_1$ norm in $\mathbb{R}^M$, and $\eta$ is a regularization parameter. A natural companion of this estimator is the Lasso procedure or, more precisely, its relaxed form

$$\hat\lambda^L = \arg\min_{\lambda \in \mathbb{R}^M} \left\{ \frac{1}{2}\|A\lambda - Y\|_{\ell_2}^2 + \eta\|\lambda\|_{\ell_1} \right\},$$

where $\eta$ plays exactly the same role as for the Dantzig estimator. This $\ell_1$-penalized method is also called basis pursuit in signal processing (see [14, 15]).
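The remark that $\eta$ plays the same role in both estimators can be made concrete through the Lasso optimality (KKT) conditions. Any minimizer $\hat\lambda^L$ of the relaxed Lasso satisfies $\|A^T(A\hat\lambda^L - Y)\|_{\ell_\infty} \le \eta$, so it is feasible for the Dantzig constraint with the same $\eta$. The sketch below checks this on a hypothetical toy problem: a random Gaussian design with $n = 50$, $M = 100$ and a 3-sparse $\lambda_0$. It solves the Lasso by proximal gradient descent (ISTA). ISTA is a generic solver chosen here for self-containedness; it is not the algorithm used in the cited works, and `lasso_ista` and `soft_threshold` are illustrative names.

```python
import numpy as np

def soft_threshold(z, t):
    """Componentwise soft-thresholding: the proximal map of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(A, Y, eta, n_iter=5000):
    """Minimize 0.5 * ||A lam - Y||_2^2 + eta * ||lam||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2    # 1 / Lipschitz constant of the gradient
    lam = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ lam - Y)
        lam = soft_threshold(lam - step * grad, step * eta)
    return lam

rng = np.random.default_rng(0)
n, M = 50, 100                                # more unknowns than observations
A = rng.standard_normal((n, M))
lam0 = np.zeros(M)
lam0[:3] = [3.0, -2.0, 1.5]                   # hypothetical sparse parameter
Y = A @ lam0 + 0.1 * rng.standard_normal(n)
eta = 5.0                                     # illustrative value of the tuning parameter

lam_L = lasso_ista(A, Y, eta)
# The KKT conditions of the Lasso give ||A^T(A lam_L - Y)||_inf <= eta, i.e. lam_L
# satisfies the Dantzig constraint with the same eta (up to solver tolerance).
dantzig_residual = np.max(np.abs(A.T @ (A @ lam_L - Y)))
print(dantzig_residual, eta)
```

Because $\hat\lambda^L$ lies in the Dantzig feasible set, the Dantzig solution for the same $\eta$ has $\ell_1$ norm at most $\|\hat\lambda^L\|_{\ell_1}$.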

Candès and Tao [13] have obtained a bound for the $\ell_2$ risk of the estimator $\hat\lambda^D$, with large probability, under a global condition on the matrix $A$ (the Restricted Isometry Property) and a sparsity assumption on $\lambda_0$, even for $M \ge n$. Bickel et al. [3] have obtained oracle inequalities and bounds of the $\ell_p$ loss for both estimators under weaker assumptions. Actually, Bickel et al. [3] deal with the nonparametric regression framework in which one observes

$$Y_i = f(x_i) + e_i, \qquad i = 1, \dots, n,$$

where $f$ is an unknown function while $(x_i)_{i=1,\dots,n}$ are known design points and $(e_i)_{i=1,\dots,n}$ is a noise vector. There is no intrinsic matrix $A$ in this problem, but for any dictionary of functions $\Upsilon = (\varphi_m)_{m=1,\dots,M}$ one can search for $f$ as a weighted sum $f_\lambda$ of elements of $\Upsilon$,

$$f_\lambda = \sum_{m=1}^M \lambda_m \varphi_m,$$

and introduce the matrix $A = (\varphi_m(x_i))_{i,m}$, which summarizes the information on the dictionary and on the design. Notice that if there exists $\lambda_0$ such that $f = f_{\lambda_0}$, then the model can be rewritten exactly as the classical linear model. However, if this is not the case and a model bias exists, the Dantzig and Lasso procedures can still be applied under similar assumptions on $A$. Oracle inequalities, in which approximation theory plays an important role, are then obtained in [3, 8, 9, 29].
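The sketch below builds the matrix $A = (\varphi_m(x_i))_{i,m}$ under two hypothetical choices made for illustration: an equispaced design on $[0,1)$ and a Haar dictionary. `haar_design` is an illustrative helper, not code from [3]. When $f = f_{\lambda_0}$, the noiseless part of the observations is exactly $A\lambda_0$, which is the classical linear model.

```python
import numpy as np

def haar_design(x, J):
    """Matrix A with A[i, m] = phi_m(x_i) for the Haar dictionary on [0, 1).

    Column 0 is the scaling function 1_[0,1); the other 2**J - 1 columns are the
    wavelets 2^{j/2} psi(2^j x - k), 0 <= j < J, 0 <= k < 2^j, with
    psi = 1_[0,1/2) - 1_[1/2,1).
    """
    x = np.asarray(x, dtype=float)
    cols = [((x >= 0) & (x < 1)).astype(float)]
    for j in range(J):
        for k in range(2 ** j):
            t = 2.0 ** j * x - k
            psi = ((t >= 0) & (t < 0.5)).astype(float) - ((t >= 0.5) & (t < 1)).astype(float)
            cols.append(2.0 ** (j / 2) * psi)
    return np.column_stack(cols)

n, J = 32, 4
x = (np.arange(n) + 0.5) / n                 # known design points x_i
A = haar_design(x, J)                        # n x M matrix with M = 2**J = 16
lam0 = np.zeros(A.shape[1])
lam0[[0, 1, 5]] = [1.0, -0.5, 0.25]          # hypothetical sparse f = f_{lam0}
rng = np.random.default_rng(1)
Y = A @ lam0 + 0.1 * rng.standard_normal(n)  # Y_i = f(x_i) + e_i = (A lam0)_i + e_i
print(A.shape, np.allclose(A.T @ A / n, np.eye(A.shape[1])))
```

On this grid, which is aligned with the dyadic breakpoints, $A^TA/n$ is the identity. For irregular designs or redundant dictionaries it is not, and the properties of the estimators then depend on assumptions on this Gram matrix.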

Let us also mention that in various settings, under various assumptions on the matrix $A$ (or more precisely on the associated Gram matrix $G = A^T A$), properties of these estimators have been established for subset selection (see [11, 20, 22, 23, 30, 31]) and for prediction (see [3, 19, 20, 23, 32]).

1.1 Our goals and results

We consider in this paper the density estimation framework already studied for the Lasso estimate by Bunea et al. [7, 10] and van de Geer [29]. Namely, our goal is to estimate $f_0$, an unknown density function, by using the observations of an $n$-sample of variables $X_1, \dots, X_n$ of density $f_0$ with respect to a known measure $dx$ on $\mathbb{R}$. As in the nonparametric regression setting, we introduce a dictionary of functions $\Upsilon = (\varphi_m)_{m=1,\dots,M}$ and again search for estimates of $f_0$ as linear combinations $f_\lambda$ of the dictionary functions. We rely on the Gram matrix $G$ associated with $\Upsilon$, defined by $G_{m,m'} = \int \varphi_m(x)\varphi_{m'}(x)\,dx$, and on the empirical scalar products of $f_0$ with $\varphi_m$,

$$\hat\beta_m = \frac{1}{n}\sum_{i=1}^n \varphi_m(X_i).$$

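The Dantzig estimate $\hat f^D$ is then obtained by minimizing $\|\lambda\|_{\ell_1}$ over the set of parameters $\lambda$ satisfying the adaptive Dantzig constraint:

$$\forall m \in \{1, \dots, M\}, \qquad |(G\lambda)_m - \hat\beta_m| \le \eta_{\gamma,m},$$

where, for $m \in \{1, \dots, M\}$, $(G\lambda)_m$ is the scalar product of $f_\lambda$ with $\varphi_m$,

$$\eta_{\gamma,m} = \sqrt{\frac{2\tilde\sigma_m^2\,\gamma\log M}{n}} + \frac{2\|\varphi_m\|_\infty\,\gamma\log M}{3n},$$

$\tilde\sigma_m^2/n$ is a sharp estimate of the variance of $\hat\beta_m$ and $\gamma$ is a constant to be chosen. Section 2 gives precise definitions and heuristics for using this constraint. We just mention here that $\eta_{\gamma,m}$ comes from sharp concentration inequalities, which yield tight constraints. Our idea is that if $f_0$ can be decomposed on $\Upsilon$ as

$$f_0 = \sum_{m=1}^M \lambda_{0,m}\varphi_m,$$

then we force the set of feasible parameters $\lambda$ to contain $\lambda_0$ with large probability and to be as small as possible. Significant improvements in practice are expected.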

Our goals in this paper are mainly twofold. First, we aim at establishing sharp oracle inequalities under very mild assumptions on the dictionary. Our starting point is that most of the papers in the literature assume that the functions of the dictionary are bounded by a constant independent of $M$ and $n$. This constitutes a strong limitation, in particular for dictionaries based on histograms or wavelets (see for instance [6], [7], [8], [9], [11] or [29]). Such assumptions on the functions of $\Upsilon$ will not be considered in our paper. Likewise, our methodology does not rely on the knowledge of $\|f_0\|_\infty$, which can even be infinite (as noticed by Birgé [4] for the study of the integrated $L_2$-risk, most of the papers in the literature typically assume that the sup-norm of the unknown density is finite with a known or estimated bound for this quantity). Finally, let us mention that, in contrast with what Bunea et al. [10] did, we obtain oracle inequalities with leading constant 1, and furthermore these are established under much weaker assumptions on the dictionary than in [10].

The second goal of this paper deals with the problem of calibrating the so-called Dantzig constant $\gamma$: how should this constant be chosen to obtain good results in both theory and practice? Most of the time, for Lasso-type estimators, the regularization parameter is of the form $a\sqrt{\frac{\log M}{n}}$ with $a$ a positive constant (see [3], [7], [6], [9], [12], [20] or [23] for instance). These results are obtained with large probability that depends on the tuning coefficient $a$. In practice, it is not simple to calibrate the constant $a$. Unfortunately, most of the time, the theoretical choice of the regularization parameter is not suitable for practical purposes. This is true for Lasso-type estimates but also for many algorithms for which the regularization parameter provided by the theory is often too conservative in practice (see [18], whose authors clearly explain and illustrate this point for their thresholding procedure). So, one of the main goals of this paper is to fill the gap between the optimal parameter choice provided by theoretical results on the one hand and by a simulation study on the other hand. Only a few papers are devoted to this problem. In the model selection setting, the issue of calibration has been addressed by Birgé and Massart [5], who considered $\ell_0$-penalized estimators in a Gaussian homoscedastic regression framework and showed that there exists a minimal penalty in the sense that taking smaller penalties leads to inconsistent estimation procedures. Arlot and Massart [1] generalized these results to non-Gaussian or heteroscedastic data, and Reynaud-Bouret and Rivoirard [26] addressed this question for thresholding rules in the Poisson intensity framework.

Now, let us describe our results. By using the previous data-driven Dantzig constraint, oracle inequalities are derived under local conditions on the dictionary that are valid under classical assumptions on the structure of the dictionary. We extensively discuss these assumptions and we show their own interest in the context of the paper. Each term of these oracle inequalities is

$$\|\varphi_m\|_\infty^2 \le c_1\,\frac{n}{\log M}\,\|f_0\|_\infty,$$

where $c_1$ is a constant. This assumption is very mild and, unlike in classical works, allows us to consider dictionaries based on wavelets. Then, relying on our Dantzig estimate, we build an adaptive Lasso procedure whose oracle performances are similar. This illustrates the closeness between Lasso and Dantzig-type estimates.
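To see why this condition accommodates wavelets, consider, purely as an illustration, the Haar wavelets $\psi_{j,k} = 2^{j/2}\psi(2^j\cdot - k)$ with $\psi = \mathbf 1_{[0,1/2)} - \mathbf 1_{[1/2,1)}$, which have unit $L_2$ norm. Their sup-norms grow with the resolution level $j$, and the condition only caps the finest level used:

```latex
\|\psi_{j,k}\|_\infty^2 = 2^{j},
\qquad\text{so}\qquad
\|\psi_{j,k}\|_\infty^2 \le c_1\,\frac{n}{\log M}\,\|f_0\|_\infty
\iff
2^{j} \le c_1\,\frac{n}{\log M}\,\|f_0\|_\infty .
```

Levels with $2^j$ up to the order of $n/\log M$ are therefore allowed, although these functions are not bounded by a constant independent of $n$.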

Our results are proved for $\gamma > 1$. For the theoretical calibration issue, we study the performance of our procedure when $\gamma < 1$. We show that, in a simple framework, estimation of the straightforward signal $f_0 = \mathbf 1_{[0,1]}$ cannot be performed at a convenient rate of convergence when $\gamma < 1$. This result shows that the assumption $\gamma > 1$ is not too conservative.

Finally, a simulation study illustrates how dictionary-based methods outperform classical ones. More precisely, we show that our Dantzig and Lasso procedures with $\gamma > 1$, but close to 1, outperform classical ones, such as simple histogram procedures, wavelet thresholding or Dantzig procedures based on the knowledge of $\|f_0\|_\infty$ and less tight Dantzig constraints.

1.2 Outline

Section 2 introduces the density estimator of $f_0$, whose theoretical performances are studied in Section 3. Section 4 studies the Lasso estimate proposed in this paper. The calibration issue is studied in Section 5.1 and numerical experiments are performed in Section 5.2. Finally, Section 6 is devoted to the proofs of our results.

2 The Dantzig estimator of the density $f_0$

As said in the Introduction, our goal is to build an estimate of the density $f_0$ with respect to the measure $dx$ as a linear combination of functions of $\Upsilon = (\varphi_m)_{m=1,\dots,M}$, where we assume without any loss of generality that, for any $m$, $\|\varphi_m\|_2 = 1$:

$$f_\lambda = \sum_{m=1}^M \lambda_m\varphi_m.$$

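For this purpose, we rely on natural estimates of the $L_2$-scalar products between $f_0$ and the $\varphi_m$'s. So, for $m \in \{1, \dots, M\}$, we set

$$\beta_{0,m} = \int \varphi_m(x) f_0(x)\,dx, \tag{1}$$

and we consider its empirical counterpart

$$\hat\beta_m = \frac{1}{n}\sum_{i=1}^n \varphi_m(X_i), \tag{2}$$

which is an unbiased estimate of $\beta_{0,m}$. The variance of this estimate is $\mathrm{Var}(\hat\beta_m) = \sigma_{0,m}^2/n$, where

$$\sigma_{0,m}^2 = \int \varphi_m^2(x) f_0(x)\,dx - \beta_{0,m}^2. \tag{3}$$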
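Definitions (1) to (3) can be checked by simulation. The sketch below assumes, for illustration only, $f_0(x) = 2x$ on $[0,1]$ and a single dictionary function $\varphi_m$ equal to the Haar wavelet $\psi = \mathbf 1_{[0,1/2)} - \mathbf 1_{[1/2,1)}$. Then $\beta_{0,m} = -1/2$ and $\sigma_{0,m}^2 = 3/4$. Over many replications, the empirical mean and variance of $\hat\beta_m$ should be close to $\beta_{0,m}$ and $\sigma_{0,m}^2/n$.

```python
import numpy as np

def psi(x):
    """Haar wavelet 1_[0,1/2) - 1_[1/2,1); it has unit L2 norm on [0, 1)."""
    return np.where(x < 0.5, 1.0, -1.0) * ((x >= 0) & (x < 1))

rng = np.random.default_rng(2)
n, reps = 50, 20000
# f0(x) = 2x on [0, 1]: its CDF is x^2, so X = sqrt(U) with U uniform.
X = np.sqrt(rng.uniform(size=(reps, n)))
beta_hat = psi(X).mean(axis=1)          # (2): one estimate per replication

beta0 = 0.25 - 0.75                     # (1): int_0^{1/2} 2x dx - int_{1/2}^1 2x dx
sigma2_0 = 1.0 - beta0 ** 2             # (3): int psi^2 f0 = int f0 = 1, hence 3/4
print(beta_hat.mean(), beta_hat.var(), sigma2_0 / n)
```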


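Note also that, for any $\lambda$ and any $m$, the $L_2$-scalar product between $f_\lambda$ and $\varphi_m$ can be easily computed:

$$\int \varphi_m(x) f_\lambda(x)\,dx = \sum_{m'=1}^M \lambda_{m'}\int \varphi_{m'}(x)\varphi_m(x)\,dx = (G\lambda)_m,$$

where $G$ is the Gram matrix associated to the dictionary $\Upsilon$, defined for any $1 \le m, m' \le M$ by

$$G_{m,m'} = \int \varphi_m(x)\varphi_{m'}(x)\,dx.$$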
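When the dictionary is redundant, $G$ need not be the identity, nor even invertible, and several $\lambda$ can give the same scalar products $(G\lambda)_m$. The sketch below uses a hypothetical dictionary (Haar functions up to level 1 together with normalized histograms on quarters; an illustration, not a dictionary prescribed here) and computes $G$ with a midpoint rule. It then exhibits two parameter vectors with identical $G\lambda$ but different $\ell_1$ norms. The $\ell_1$ minimization in the Dantzig estimate resolves this kind of ambiguity in favour of the smaller $\ell_1$ norm.

```python
import numpy as np

def dictionary(x):
    """Hypothetical redundant dictionary on [0, 1), each element with unit L2 norm:
    Haar functions up to level 1, then the histograms 2 * 1_[k/4, (k+1)/4)."""
    x = np.asarray(x, dtype=float)
    r2 = np.sqrt(2.0)
    cols = [np.ones_like(x),                                          # 1_[0,1)
            np.where(x < 0.5, 1.0, -1.0),                             # psi_{0,0}
            r2 * np.where(x < 0.25, 1.0, np.where(x < 0.5, -1.0, 0.0)),   # psi_{1,0}
            r2 * np.where(x < 0.5, 0.0, np.where(x < 0.75, 1.0, -1.0))]   # psi_{1,1}
    cols += [2.0 * ((x >= k / 4) & (x < (k + 1) / 4)) for k in range(4)]
    return np.column_stack(cols)                                      # M = 8 columns

# Midpoint rule on a grid aligned with the breakpoints: exact for these
# piecewise-constant functions.
K = 1024
grid = (np.arange(K) + 0.5) / K
Phi = dictionary(grid)
G = Phi.T @ Phi / K                      # G[m, m'] = int phi_m(x) phi_m'(x) dx

# Two parametrizations of the same function 2 * 1_[0, 1/4):
lam_a = np.array([0, 0, 0, 0, 1.0, 0, 0, 0])                  # a single histogram
lam_b = np.array([0.5, 0.5, np.sqrt(2) / 2, 0, 0, 0, 0, 0])    # its Haar expansion
print(np.linalg.matrix_rank(G), G @ lam_a, G @ lam_b)
```

Here the eight functions span only a 4-dimensional space, so $G$ has rank 4 and the equations $(G\lambda)_m = \hat\beta_m$ alone cannot single out $\lambda$.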

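Any reasonable choice of $\lambda$ should ensure that the coefficients $(G\lambda)_m$ are close to $\hat\beta_m$ for all $m$. Therefore, using Candès and Tao's approach, we define the Dantzig constraint:

$$\forall m \in \{1, \dots, M\}, \qquad |(G\lambda)_m - \hat\beta_m| \le \eta_{\gamma,m} \tag{4}$$

and the Dantzig estimate $\hat f^D$ by $\hat f^D = f_{\hat\lambda^{D,\gamma}}$ with

$$\hat\lambda^{D,\gamma} = \arg\min_{\lambda \in \mathbb{R}^M} \|\lambda\|_{\ell_1} \quad \text{such that } \lambda \text{ satisfies the Dantzig constraint (4)},$$

where, for $\gamma > 0$ and $m \in \{1, \dots, M\}$,

$$\eta_{\gamma,m} = \sqrt{\frac{2\tilde\sigma_m^2\,\gamma\log M}{n}} + \frac{2\|\varphi_m\|_\infty\,\gamma\log M}{3n}, \tag{5}$$

with

$$\tilde\sigma_m^2 = \hat\sigma_m^2 + 2\|\varphi_m\|_\infty\sqrt{\frac{2\hat\sigma_m^2\,\gamma\log M}{n}} + \frac{8\|\varphi_m\|_\infty^2\,\gamma\log M}{n} \tag{6}$$

and

$$\hat\sigma_m^2 = \frac{1}{n(n-1)}\sum_{i=2}^{n}\sum_{j=1}^{i-1}\bigl(\varphi_m(X_i) - \varphi_m(X_j)\bigr)^2. \tag{7}$$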
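The thresholds (5) to (7) only involve the sample and the sup-norms of the dictionary functions. Below is a minimal sketch assuming a Haar dictionary on $[0,1)$ and a uniform sample, both illustrative choices; `haar_dictionary`, `sigma2_hat` and `adaptive_eta` are hypothetical helper names. It also checks that the U-statistic (7) coincides with the usual unbiased sample variance of the $\varphi_m(X_i)$, which is why $\hat\sigma_m^2$ is unbiased for $\sigma_{0,m}^2$.

```python
import numpy as np

def haar_dictionary(x, J):
    """Values phi_m(x_i) (n x M, M = 2**J) and sup-norms ||phi_m||_inf of the Haar basis on [0, 1)."""
    cols, norms = [((x >= 0) & (x < 1)).astype(float)], [1.0]
    for j in range(J):
        for k in range(2 ** j):
            t = 2.0 ** j * x - k
            psi = ((t >= 0) & (t < 0.5)).astype(float) - ((t >= 0.5) & (t < 1)).astype(float)
            cols.append(2.0 ** (j / 2) * psi)
            norms.append(2.0 ** (j / 2))
    return np.column_stack(cols), np.array(norms)

def sigma2_hat(a):
    """Estimate (7): sum over pairs j < i of (a_i - a_j)^2, divided by n(n - 1)."""
    n = a.size
    diffs = a[:, None] - a[None, :]                    # diffs[i, j] = a_i - a_j
    return np.tril(diffs ** 2, k=-1).sum() / (n * (n - 1))

def adaptive_eta(Phi, sup_norms, gamma):
    """Thresholds eta_{gamma,m} of (5), built from sigma2_tilde of (6)."""
    n, M = Phi.shape
    c = gamma * np.log(M) / n                          # gamma log M / n
    s2_hat = np.array([sigma2_hat(Phi[:, m]) for m in range(M)])
    s2_tilde = s2_hat + 2 * sup_norms * np.sqrt(2 * s2_hat * c) + 8 * sup_norms ** 2 * c
    eta = np.sqrt(2 * s2_tilde * c) + 2 * sup_norms * c / 3
    return eta, s2_hat, s2_tilde

rng = np.random.default_rng(3)
X = rng.uniform(size=100)                     # n = 100 draws from f0 = 1_[0,1]
Phi, sup_norms = haar_dictionary(X, J=7)      # M = 128 >= n
beta_hat = Phi.mean(axis=0)                   # (2)
eta, s2_hat, s2_tilde = adaptive_eta(Phi, sup_norms, gamma=1.01)
print(eta[:4])
```

The pairwise form costs $O(n^2)$ per function; the identity with the sample variance gives an $O(n)$ alternative.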

Note that $\eta_{\gamma,m}$ depends on the data, so the constraint (4) will be referred to as the adaptive Dantzig constraint in the sequel. We now justify the introduction of the density estimate $\hat f^D$.

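The definition of $\eta_{\gamma,m}$ is based on the following heuristics. Given $m$, assume that there exists a constant $c_0 > 0$ such that $f_0(x) \ge c_0$ for $x$ in the support of $\varphi_m$, and that $\varphi_m$ satisfies $\|\varphi_m\|_\infty^2 = o_n(n(\log M)^{-1})$. Then, with large probability, the deterministic term of (5), $\frac{2\|\varphi_m\|_\infty\gamma\log M}{3n}$, is negligible with respect to the random one, $\sqrt{\frac{2\tilde\sigma_m^2\gamma\log M}{n}}$. In this case, the random term is the main one and we asymptotically derive

$$\eta_{\gamma,m} \approx \sqrt{\frac{2\gamma\log M\,\tilde\sigma_m^2}{n}}. \tag{8}$$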
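The heuristic can be made explicit by dividing the deterministic term of (5) by the random one. This one-line computation is added for the reader:

```latex
\frac{\dfrac{2\|\varphi_m\|_\infty\,\gamma\log M}{3n}}{\sqrt{\dfrac{2\tilde\sigma_m^2\,\gamma\log M}{n}}}
= \frac{\sqrt{2}}{3\,\tilde\sigma_m}\,\sqrt{\frac{\gamma\,\|\varphi_m\|_\infty^2\,\log M}{n}} .
```

As long as $\tilde\sigma_m$ stays bounded away from 0 (the role we understand the lower bound $f_0 \ge c_0$ on the support of $\varphi_m$ to play), the ratio tends to 0 precisely when $\|\varphi_m\|_\infty^2\log M/n \to 0$, that is, under the condition $\|\varphi_m\|_\infty^2 = o_n(n(\log M)^{-1})$.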

Having in mind that $\tilde\sigma_m^2/n$ is a convenient estimate for $\mathrm{Var}(\hat\beta_m)$ (see the proof of Theorem 1), the shape of the right-hand side of (8) resembles the bound proposed by Candès and Tao [13] to define the Dantzig constraint in the linear model. Actually, the deterministic term of (5) allows us to obtain sharp concentration inequalities. As often done in the literature, instead of estimating $\mathrm{Var}(\hat\beta_m)$, we could use the inequality

$$\mathrm{Var}(\hat\beta_m) = \frac{\sigma_{0,m}^2}{n} \le \frac{\|f_0\|_\infty}{n}$$

and we could replace $\tilde\sigma_m^2$ with $\|f_0\|_\infty$ in the definition of $\eta_{\gamma,m}$. But this requires a strong assumption: $f_0$ is bounded and $\|f_0\|_\infty$ is known. In our paper, $\mathrm{Var}(\hat\beta_m)$ is estimated, which allows us not to impose these conditions. More precisely, we slightly overestimate $\sigma_{0,m}^2$ to control large deviation terms, and this is the reason why we introduce $\tilde\sigma_m^2$ instead of using $\hat\sigma_m^2$, an unbiased estimate of $\sigma_{0,m}^2$. Finally, $\gamma$ is a constant that has to be suitably calibrated and plays a capital role in practice.
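The variance bound used above follows in one line from (3) and the normalization $\|\varphi_m\|_2 = 1$. We spell out the step for completeness:

```latex
\sigma_{0,m}^2 = \int \varphi_m^2 f_0 - \beta_{0,m}^2
\le \int \varphi_m^2 f_0
\le \|f_0\|_\infty \int \varphi_m^2
= \|f_0\|_\infty ,
\qquad\text{hence}\qquad
\mathrm{Var}(\hat\beta_m) = \frac{\sigma_{0,m}^2}{n} \le \frac{\|f_0\|_\infty}{n}.
```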

The following result justifies the previous heuristics by showing that, if $\gamma > 1$, then with high probability the quantity $|\hat\beta_m - \beta_{0,m}|$ is smaller than $\eta_{\gamma,m}$ for all $m$. The parameter $\eta_{\gamma,m}$ with $\gamma$ close to 1 can be viewed as the smallest quantity that ensures this property.

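Theorem 1. Let us assume that $M$ satisfies

$$n \le M \le \exp(n^\delta) \tag{9}$$

for $\delta < 1$. Let $\gamma > 1$. Then, for any $\varepsilon > 0$, there exists a constant $C_1(\varepsilon, \delta, \gamma)$ depending on $\varepsilon$, $\delta$ and $\gamma$ such that

$$\mathbb{P}\Bigl(\exists m \in \{1, \dots, M\},\ |\beta_{0,m} - \hat\beta_m| \ge \eta_{\gamma,m}\Bigr) \le C_1(\varepsilon, \delta, \gamma)\, M^{1 - \frac{\gamma}{1+\varepsilon}}.$$

In addition, there exists a constant $C_2(\delta, \gamma)$ depending on $\delta$ and $\gamma$ such that

$$\mathbb{P}\Bigl(\forall m \in \{1, \dots, M\},\ \eta^{(-)}_{\gamma,m} \le \eta_{\gamma,m} \le \eta^{(+)}_{\gamma,m}\Bigr) \ge 1 - C_2(\delta, \gamma)\, M^{1-\gamma}$$

where, for $m \in \{1, \dots, M\}$,

$$\eta^{(-)}_{\gamma,m} = \sigma_{0,m}\sqrt{\frac{8\gamma\log M}{7n}} + \frac{2\|\varphi_m\|_\infty\,\gamma\log M}{3n}$$

and

$$\eta^{(+)}_{\gamma,m} = \sigma_{0,m}\sqrt{\frac{16\gamma\log M}{n}} + \frac{10\|\varphi_m\|_\infty\,\gamma\log M}{n}.$$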

This result is proved in Section 6.1. The first part is a sharp concentration inequality proved by using Bernstein-type controls. The second part of the theorem proves that, up to constants depending on $\gamma$, $\eta_{\gamma,m}$ is of order $\sigma_{0,m}\sqrt{\frac{\log M}{n}} + \|\varphi_m\|_\infty\frac{\log M}{n}$ with high probability. Note that the assumption $\gamma > 1$ is essential to obtain probabilities going to 0.
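Theorem 1 can be illustrated by simulation in the simplest case $f_0 = \mathbf 1_{[0,1]}$ with the Haar basis, for which $\beta_{0,m}$ and $\sigma_{0,m}$ are known in closed form. The sketch below uses a hypothetical setup ($n = 200$, $M = 256$, $\gamma = 1.01$, 100 replications), not the experimental design of Section 5. It counts how often the event of the first part occurs, and how often the event of the second part fails. Both frequencies should be small. $\hat\sigma_m^2$ is computed with NumPy's unbiased variance (`ddof=1`), which equals (7). `haar_dictionary` and `thresholds` are illustrative names.

```python
import numpy as np

def haar_dictionary(x, J):
    """Values phi_m(x_i) (n x M, M = 2**J) and sup-norms of the Haar basis on [0, 1)."""
    cols, norms = [((x >= 0) & (x < 1)).astype(float)], [1.0]
    for j in range(J):
        for k in range(2 ** j):
            t = 2.0 ** j * x - k
            psi = ((t >= 0) & (t < 0.5)).astype(float) - ((t >= 0.5) & (t < 1)).astype(float)
            cols.append(2.0 ** (j / 2) * psi)
            norms.append(2.0 ** (j / 2))
    return np.column_stack(cols), np.array(norms)

def thresholds(Phi, sup_norms, sigma0, gamma):
    """eta_{gamma,m} from (5)-(7) and the bounds eta^(-), eta^(+) of Theorem 1."""
    n, M = Phi.shape
    c = gamma * np.log(M) / n
    s2_hat = Phi.var(axis=0, ddof=1)                      # equals the U-statistic (7)
    s2_tilde = s2_hat + 2 * sup_norms * np.sqrt(2 * s2_hat * c) + 8 * sup_norms ** 2 * c
    eta = np.sqrt(2 * s2_tilde * c) + 2 * sup_norms * c / 3
    eta_minus = sigma0 * np.sqrt(8 * c / 7) + 2 * sup_norms * c / 3
    eta_plus = sigma0 * np.sqrt(16 * c) + 10 * sup_norms * c
    return eta, eta_minus, eta_plus

rng = np.random.default_rng(4)
n, J, gamma, reps = 200, 8, 1.01, 100        # M = 2**J = 256 >= n, as in (9)
M = 2 ** J
beta0 = np.zeros(M)
beta0[0] = 1.0                               # f0 = 1_[0,1]: only the scaling coefficient is nonzero
sigma0 = np.ones(M)
sigma0[0] = 0.0                              # sigma_{0,m}^2 = 1 - beta_{0,m}^2
fail_first = fail_second = 0
for _ in range(reps):
    Phi, sup_norms = haar_dictionary(rng.uniform(size=n), J)
    eta, eta_minus, eta_plus = thresholds(Phi, sup_norms, sigma0, gamma)
    # Event of the first part of Theorem 1 (should be rare).
    fail_first += bool(np.any(np.abs(beta0 - Phi.mean(axis=0)) >= eta))
    # Complement of the event of the second part (should be rare too).
    fail_second += not bool(np.all((eta_minus <= eta) & (eta <= eta_plus)))
print(fail_first / reps, fail_second / reps)
```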

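Finally, let $\lambda_0 = (\lambda_{0,m})_{m=1,\dots,M} \in \mathbb{R}^M$ be such that

$$P_\Upsilon f_0 = \sum_{m=1}^M \lambda_{0,m}\varphi_m,$$

where $P_\Upsilon$ is the projection on the space spanned by $\Upsilon$. Since $f_0 - P_\Upsilon f_0$ is orthogonal to each $\varphi_m$, we have

$$(G\lambda_0)_m = \int (P_\Upsilon f_0)\,\varphi_m = \int f_0\,\varphi_m = \beta_{0,m}.$$

So, Theorem 1 proves that $\lambda_0$ satisfies the adaptive Dantzig constraint (4) with probability larger than $1 - C_1(\varepsilon, \delta, \gamma)M^{1-\frac{\gamma}{1+\varepsilon}}$ for any $\varepsilon > 0$. Actually, we force the set of parameters $\lambda$ satisfying the adaptive Dantzig constraint to contain $\lambda_0$ with large probability and to be as small as possible. Therefore, $\hat f^D = f_{\hat\lambda^{D,\gamma}}$ is a good candidate among sparse estimates linearly decomposed on $\Upsilon$ for estimating $f_0$.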
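Computing $\hat\lambda^{D,\gamma}$ is a linear program. The sketch below makes three assumptions of ours: SciPy's `linprog` with the HiGHS backend as the solver (the text does not prescribe one), a Haar dictionary, and a hypothetical piecewise-constant $f_0$. `dantzig_density` and `haar_dictionary` are illustrative names. Because the Haar basis is orthonormal, $G$ is the identity, and the solution should reduce to soft-thresholding $\hat\beta_m$ at level $\eta_{\gamma,m}$. This gives a simple correctness check. For a redundant dictionary only `G` changes.

```python
import numpy as np
from scipy.optimize import linprog

def haar_dictionary(x, J):
    """Values phi_m(x_i) (n x M, M = 2**J) and sup-norms of the Haar basis on [0, 1)."""
    cols, norms = [((x >= 0) & (x < 1)).astype(float)], [1.0]
    for j in range(J):
        for k in range(2 ** j):
            t = 2.0 ** j * x - k
            psi = ((t >= 0) & (t < 0.5)).astype(float) - ((t >= 0.5) & (t < 1)).astype(float)
            cols.append(2.0 ** (j / 2) * psi)
            norms.append(2.0 ** (j / 2))
    return np.column_stack(cols), np.array(norms)

def dantzig_density(Phi, sup_norms, G, gamma):
    """Minimize ||lambda||_1 subject to |(G lambda)_m - beta_hat_m| <= eta_{gamma,m}.

    With lambda = u - v and u, v >= 0 this is the linear program
        minimize sum(u) + sum(v)
        subject to G(u - v) <= beta_hat + eta and -G(u - v) <= eta - beta_hat.
    """
    n, M = Phi.shape
    c = gamma * np.log(M) / n
    beta_hat = Phi.mean(axis=0)                                              # (2)
    s2_hat = Phi.var(axis=0, ddof=1)                                         # equals (7)
    s2_tilde = s2_hat + 2 * sup_norms * np.sqrt(2 * s2_hat * c) + 8 * sup_norms ** 2 * c  # (6)
    eta = np.sqrt(2 * s2_tilde * c) + 2 * sup_norms * c / 3                  # (5)
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([beta_hat + eta, eta - beta_hat])
    res = linprog(np.ones(2 * M), A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.x[:M] - res.x[M:], beta_hat, eta

# Hypothetical f0 = 1.5 on [0, 1/2) and 0.5 on [1/2, 1), i.e. f0 = 1 + 0.5 * psi_{0,0}.
rng = np.random.default_rng(5)
u = rng.uniform(size=200)
X = np.where(u < 0.75, u / 1.5, 0.5 + (u - 0.75) / 0.5)   # inverse CDF of f0
Phi, sup_norms = haar_dictionary(X, J=8)                   # M = 256 >= n = 200
G = np.eye(Phi.shape[1])                                   # orthonormal Haar basis: G = I
lam_D, beta_hat, eta = dantzig_density(Phi, sup_norms, G, gamma=1.01)
print(np.flatnonzero(np.abs(lam_D) > 1e-8))
```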

We mention that Assumption (9) can be relaxed and we can take $M < n$ provided the definition of $\eta_{\gamma,m}$ is modified.


In the sequel, we will denote $\hat\lambda^D = \hat\lambda^{D,\gamma}$ to simplify the notation, but the Dantzig estimator $\hat f^D$ still depends on $\gamma$. Moreover, we assume that (9) is true and we denote by $\eta_\gamma = (\eta_{\gamma,m})_{m=1,\dots,M}$ the vector considered with the Dantzig constant $\gamma > 1$.

3.1 The main result under local assumptions

Let us state the main result of this paper. For any $J \subset \{1, \dots, M\}$, we set $J^C = \{1, \dots, M\} \setminus J$ and define $\lambda_J$ as the vector which has the same coordinates as $\lambda$ on $J$ and zero coordinates on $J^C$. We introduce a local assumption indexed by a subset $J_0$.

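Local Assumption. Given $J_0 \subset \{1, \dots, M\}$, for some constants $\kappa_{J_0} > 0$ and $\mu_{J_0} \ge 0$ depending on $J_0$, we have for any $\lambda$,

$$\|f_\lambda\|_2 \ge \kappa_{J_0}\|\lambda_{J_0}\|_{\ell_2} - \frac{\mu_{J_0}}{\sqrt{|J_0|}}\Bigl(\|\lambda_{J_0^C}\|_{\ell_1} - \|\lambda_{J_0}\|_{\ell_1}\Bigr)_+. \tag{$\mathrm{LA}(J_0, \kappa_{J_0}, \mu_{J_0})$}$$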
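As a simple special case (our own illustration, not a statement from the text), an orthonormal dictionary satisfies the local assumption for every $J_0$ with $\kappa_{J_0} = 1$ and $\mu_{J_0} = 0$, because $G$ is then the identity:

```latex
\|f_\lambda\|_2^2 = \int\Bigl(\sum_{m=1}^M \lambda_m\varphi_m\Bigr)^2
= \lambda^T G\,\lambda
= \|\lambda\|_{\ell_2}^2
\ge \|\lambda_{J_0}\|_{\ell_2}^2
\quad\Longrightarrow\quad
\mathrm{LA}(J_0, 1, 0) \text{ holds for all } J_0 \subset \{1,\dots,M\}.
```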

We obtain the following oracle type inequality without any assumption on $f_0$.

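Theorem 2. With probability at least $1 - C_1(\varepsilon, \delta, \gamma)M^{1-\frac{\gamma}{1+\varepsilon}}$, for all $J_0 \subset \{1, \dots, M\}$ such that there exist $\kappa_{J_0} > 0$ and $\mu_{J_0} \ge 0$ for which $\mathrm{LA}(J_0, \kappa_{J_0}, \mu_{J_0})$ holds, we have, for any $\alpha > 0$,

$$\|\hat f^D - f_0\|_2^2 \le \inf_{\lambda \in \mathbb{R}^M}\left\{\|f_\lambda - f_0\|_2^2 + \alpha\left(1 + \frac{2\mu_{J_0}}{\kappa_{J_0}}\right)^2\frac{\Lambda(\lambda, J_0^c)^2}{|J_0|} + 16|J_0|\left(\frac{1}{\alpha} + \frac{1}{\kappa_{J_0}^2}\right)\|\eta_\gamma\|_{\ell_\infty}^2\right\}, \tag{10}$$

with

$$\Lambda(\lambda, J_0^c) = \|\lambda_{J_0^C}\|_{\ell_1} + \frac{\bigl(\|\hat\lambda^D\|_{\ell_1} - \|\lambda\|_{\ell_1}\bigr)_+}{2}.$$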

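Let us comment on each term of the right-hand side of (10). The first term is an approximation term, which measures the closeness between $f_0$ and $f_\lambda$. This term can vanish if $f_0$ can be decomposed on the dictionary. The second term, a bias term, is a price to pay when either $\lambda$ is not supported by the subset $J_0$ considered or it does not satisfy the condition $\|\hat\lambda^D\|_{\ell_1} \le \|\lambda\|_{\ell_1}$, which holds as soon as $\lambda$ satisfies the adaptive Dantzig constraint. Finally, the last term, which does not depend on $\lambda$, can be viewed as a variance term corresponding to the estimation on the subset $J_0$. The parameter $\alpha$ calibrates the weights given to the bias and variance terms in the oracle inequality. Concerning the last term, remember that $\eta_{\gamma,m}$ relies on an estimate of the variance of $\hat\beta_m$. Furthermore, we have with high probability:

$$\|\eta_\gamma\|_{\ell_\infty}^2 \le 2\sup_m\left(\frac{16\sigma_{0,m}^2\,\gamma\log M}{n} + \left(\frac{10\|\varphi_m\|_\infty\,\gamma\log M}{n}\right)^2\right).$$

So, if $f_0$ is bounded, then $\sigma_{0,m}^2 \le \|f_0\|_\infty$, and if there exists a constant $c_1$ such that for any $m$,

$$\|\varphi_m\|_\infty^2 \le c_1\,\frac{n}{\log M}\,\|f_0\|_\infty \tag{11}$$

(which is true, for instance, for a bounded dictionary), then

$$\|\eta_\gamma\|_{\ell_\infty}^2 \le C\,\|f_0\|_\infty\,\frac{\log M}{n}.$$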
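One admissible value of $C$ can be read off the previous display. This explicit constant is our own computation, given only as an illustration. Using $\sigma_{0,m}^2 \le \|f_0\|_\infty$ and (11):

```latex
\frac{16\sigma_{0,m}^2\,\gamma\log M}{n} \le 16\,\gamma\,\|f_0\|_\infty\,\frac{\log M}{n},
\qquad
\left(\frac{10\|\varphi_m\|_\infty\,\gamma\log M}{n}\right)^2
= \frac{100\,\gamma^2(\log M)^2}{n^2}\,\|\varphi_m\|_\infty^2
\le 100\,c_1\gamma^2\,\|f_0\|_\infty\,\frac{\log M}{n},

\text{so}\quad
\|\eta_\gamma\|_{\ell_\infty}^2 \le 2\gamma\,(16 + 100\,c_1\gamma)\,\|f_0\|_\infty\,\frac{\log M}{n},
\quad\text{i.e. } C = 2\gamma(16 + 100\,c_1\gamma) \text{ works.}
```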
