Adaptive Dantzig density estimation


HAL Id: hal-00634423

https://hal.archives-ouvertes.fr/hal-00634423

Submitted on 21 Oct 2011



Karine Bertin, Erwan Le Pennec, Vincent Rivoirard

To cite this version:

Karine Bertin, Erwan Le Pennec, Vincent Rivoirard. Adaptive Dantzig density estimation. Annales de l'Institut Henri Poincaré (B) Probabilités et Statistiques, Institut Henri Poincaré (IHP), 2011, 47 (1), pp.43-74. ⟨10.1214/09-AIHP351⟩. ⟨hal-00634423⟩


K. Bertin∗, E. Le Pennec†, V. Rivoirard‡

Abstract

This paper deals with the problem of density estimation. We aim at building an estimate of an unknown density as a linear combination of functions of a dictionary. Inspired by Candès and Tao's approach, we propose an $\ell_1$-minimization under an adaptive Dantzig constraint coming from sharp concentration inequalities. This allows us to consider a wide class of dictionaries. Under local or global coherence assumptions, oracle inequalities are derived. These theoretical results are also proved to be valid for the natural Lasso estimate associated with our Dantzig procedure. Then, the issue of calibrating these procedures is studied from both theoretical and practical points of view. Finally, a numerical study shows the significant improvement obtained by our procedures when compared with other classical procedures.

Keywords: Calibration, Concentration inequalities, Dantzig estimate, Density estimation, Dictionary, Lasso estimate, Oracle inequalities, Sparsity.

AMS subject classification: 62G07, 62G05, 62G20

1 Introduction

Various estimation procedures based on $\ell_1$ penalization (exemplified by the Dantzig procedure in [13] and the LASSO procedure in [28]) have extensively been studied recently. These procedures are computationally efficient as shown in [17, 24, 25], and thus are adapted to high-dimensional data. They have been widely used in regression models, but only the Lasso estimator has been studied in the density model (see [7, 10, 29]). Although we will mostly consider the Dantzig estimator in the density model for which no result exists so far, we recall some of the classical results obtained in different settings by procedures based on $\ell_1$ penalization.

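The Dantzig selector has been introduced by Candès and Tao [13] in the linear regression model. More precisely, given

$$Y = A\lambda_0 + \varepsilon,$$

where $Y \in \mathbb{R}^n$, $A$ is an $n$ by $M$ matrix, $\varepsilon \in \mathbb{R}^n$ is the noise vector and $\lambda_0 \in \mathbb{R}^M$ is the unknown regression parameter to estimate, the Dantzig estimator is defined by

$$\hat\lambda^D = \arg\min_{\lambda \in \mathbb{R}^M} \|\lambda\|_{\ell_1} \quad \text{subject to} \quad \|A^T(A\lambda - Y)\|_{\ell_\infty} \le \eta,$$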

∗ Supported by Project PBCT 13 laboratorio ANESTOC and Project FONDECYT 1090285. Departamento de Estadística, CIMFAV, Universidad de Valparaíso, Avenida Gran Bretaña 1091, Valparaíso, Chile. Tel 0056-(0)32-2508324. Email: karine.bertin@uv.cl

† Supported by the ANR project PARCIMONIE. Laboratoire de Probabilité et Modèles Aléatoires, Université Paris Diderot, 175 rue de Chevaleret, F-75013 Paris, France. Email: lepennec@math.jussieu.fr

‡ Supported by the ANR project PARCIMONIE. Laboratoire de Mathématique, U.M.R. C.N.R.S. 8628, Université Paris Sud, 91405 Orsay Cedex, France and Département de Mathématiques et Applications, U.M.R. C.N.R.S. 8553, ENS-Paris, 45 Rue d'Ulm, 75230 Paris Cedex 05, France. Email: vincent.rivoirard@math.u-psud.fr


where $\|\cdot\|_{\ell_\infty}$ is the sup-norm in $\mathbb{R}^M$, $\|\cdot\|_{\ell_1}$ is the $\ell_1$ norm in $\mathbb{R}^M$, and $\eta$ is a regularization parameter. A natural companion of this estimator is the Lasso procedure or more precisely its relaxed form

$$\hat\lambda^L = \arg\min_{\lambda \in \mathbb{R}^M} \left\{ \frac{1}{2}\|A\lambda - Y\|_{\ell_2}^2 + \eta\|\lambda\|_{\ell_1} \right\},$$

where $\eta$ plays exactly the same role as for the Dantzig estimator. This $\ell_1$ penalized method is also called basis pursuit in signal processing (see [14, 15]).

Candès and Tao [13] have obtained a bound for the $\ell_2$ risk of the estimator $\hat\lambda^D$, with large probability, under a global condition on the matrix $A$ (the Restricted Isometry Property) and a sparsity assumption on $\lambda_0$, even for $M \ge n$. Bickel et al. [3] have obtained oracle inequalities and bounds of the $\ell_p$ loss for both estimators under weaker assumptions. Actually, Bickel et al. [3] deal with the nonparametric regression framework in which one observes

$$Y_i = f(x_i) + e_i, \quad i = 1, \dots, n,$$

where $f$ is an unknown function while $(x_i)_{i=1,\dots,n}$ are known design points and $(e_i)_{i=1,\dots,n}$ is a noise vector. There is no intrinsic matrix $A$ in this problem but for any dictionary of functions $\Upsilon = (\varphi_m)_{m=1,\dots,M}$ one can search $f$ as a weighted sum $f_\lambda$ of elements of $\Upsilon$

$$f_\lambda = \sum_{m=1}^{M} \lambda_m \varphi_m$$

and introduce the matrix $A = (\varphi_m(x_i))_{i,m}$, which summarizes the information on the dictionary and on the design. Notice that if there exists $\lambda_0$ such that $f = f_{\lambda_0}$ then the model can be rewritten exactly as the classical linear model. However, if it is not the case and if a model bias exists, the Dantzig and Lasso procedures can be after all applied under similar assumptions on $A$. Oracle inequalities are obtained for which approximation theory plays an important role in [3, 8, 9, 29].

Let us also mention that in various settings, under various assumptions on the matrix $A$ (or more precisely on the associated Gram matrix $G = A^TA$), properties of these estimators have been established for subset selection (see [11, 20, 22, 23, 30, 31]) and for prediction (see [3, 19, 20, 23, 32]).

1.1 Our goals and results

We consider in this paper the density estimation framework already studied for the Lasso estimate by Bunea et al. [7, 10] and van de Geer [29]. Namely, our goal is to estimate $f_0$, an unknown density function, by using the observations of an $n$-sample of variables $X_1, \dots, X_n$ of density $f_0$ with respect to a known measure $dx$ on $\mathbb{R}$. As in the nonparametric regression setting, we introduce a dictionary of functions $\Upsilon = (\varphi_m)_{m=1,\dots,M}$, and search again estimates of $f_0$ as linear combinations $f_\lambda$ of the dictionary functions. We rely on the Gram matrix $G$ associated with $\Upsilon$, defined by $G_{m,m'} = \int \varphi_m(x)\varphi_{m'}(x)\,dx$, and on the empirical scalar products of $f_0$ with $\varphi_m$

$$\hat\beta_m = \frac{1}{n}\sum_{i=1}^{n} \varphi_m(X_i).$$

The Dantzig estimate $\hat f^D$ is then obtained by minimizing $\|\lambda\|_{\ell_1}$ over the set of parameters $\lambda$ satisfying the adaptive Dantzig constraint:

$$\forall m \in \{1, \dots, M\}, \quad |(G\lambda)_m - \hat\beta_m| \le \eta_{\gamma,m}$$


where for $m \in \{1, \dots, M\}$, $(G\lambda)_m$ is the scalar product of $f_\lambda$ with $\varphi_m$,

$$\eta_{\gamma,m} = \sqrt{\frac{2\tilde\sigma_m^2\gamma\log M}{n}} + \frac{2\|\varphi_m\|_\infty\gamma\log M}{3n},$$

$\tilde\sigma_m^2$ is a sharp estimate of the variance of $\hat\beta_m$ and $\gamma$ is a constant to be chosen. Section 2 gives precise definitions and heuristics for using this constraint. We just mention here that $\eta_{\gamma,m}$ comes from sharp concentration inequalities to give tight constraints. Our idea is that if $f_0$ can be decomposed on $\Upsilon$ as

$$f_0 = \sum_{m=1}^{M} \lambda_{0,m}\varphi_m,$$

then we force the set of feasible parameters $\lambda$ to contain $\lambda_0$ with large probability and to be as small as possible. Significant improvements in practice are expected.

Our goals in this paper are mainly twofold. First, we aim at establishing sharp oracle inequalities under very mild assumptions on the dictionary. Our starting point is that most of the papers in the literature assume that the functions of the dictionary are bounded by a constant independent of $M$ and $n$, which constitutes a strong limitation, in particular for dictionaries based on histograms or wavelets (see for instance [6], [7], [8], [9], [11] or [29]). Such assumptions on the functions of $\Upsilon$ will not be considered in our paper. Likewise, our methodology does not rely on the knowledge of $\|f_0\|_\infty$ that can even be infinite (as noticed by Birgé [4] for the study of the integrated $L_2$-risk, most of the papers in the literature typically assume that the sup-norm of the unknown density is finite with a known or estimated bound for this quantity). Finally, let us mention that, in contrast with what Bunea et al. [10] did, we obtain oracle inequalities with leading constant 1, and furthermore these are established under much weaker assumptions on the dictionary than in [10].

The second goal of this paper deals with the problem of calibrating the so-called Dantzig constant $\gamma$: how should this constant be chosen to obtain good results in both theory and practice? Most of the time, for Lasso-type estimators, the regularization parameter is of the form $a\sqrt{\frac{\log M}{n}}$ with $a$ a positive constant (see [3], [7], [6], [9], [12], [20] or [23] for instance). These results are obtained with large probability that depends on the tuning coefficient $a$. In practice, it is not simple to calibrate the constant $a$. Unfortunately, most of the time, the theoretical choice of the regularization parameter is not suitable for practical issues. This fact is true for Lasso-type estimates but also for many algorithms for which the regularization parameter provided by the theory is often too conservative for practical purposes (see [18], whose authors clearly explain and illustrate this point for their thresholding procedure). So, one of the main goals of this paper is to fill the gap between the optimal parameter choice provided by theoretical results on the one hand and by a simulation study on the other hand. Only a few papers are devoted to this problem. In the model selection setting, the issue of calibration has been addressed by Birgé and Massart [5] who considered $\ell_0$-penalized estimators in a Gaussian homoscedastic regression framework and showed that there exists a minimal penalty in the sense that taking smaller penalties leads to inconsistent estimation procedures. Arlot and Massart [1] generalized these results for non-Gaussian or heteroscedastic data and Reynaud-Bouret and Rivoirard [26] addressed this question for thresholding rules in the Poisson intensity framework.

Now, let us describe our results. By using the previous data-driven Dantzig constraint, oracle inequalities are derived under local conditions on the dictionary that are valid under classical assumptions on the structure of the dictionary. We extensively discuss these assumptions and we show their own interest in the context of the paper. Each term of these oracle inequalities is [...]

$$\|\varphi_m\|_\infty^2 \le c_1\left(\frac{n}{\log M}\right)\|f_0\|_\infty,$$

where $c_1$ is a constant. This assumption is very mild and, unlike in classical works, allows to consider dictionaries based on wavelets. Then, relying on our Dantzig estimate, we build an adaptive Lasso procedure whose oracle performances are similar. This illustrates the closeness between Lasso and Dantzig-type estimates.

Our results are proved for $\gamma > 1$. For the theoretical calibration issue, we study the performance of our procedure when $\gamma < 1$. We show that in a simple framework, estimation of the straightforward signal $f_0 = 1_{[0,1]}$ cannot be performed at a convenient rate of convergence when $\gamma < 1$. This result proves that the assumption $\gamma > 1$ is thus not too conservative.

Finally, a simulation study illustrates how dictionary-based methods outperform classical ones. More precisely, we show that our Dantzig and Lasso procedures with $\gamma > 1$, but close to 1, outperform classical ones, such as simple histogram procedures, wavelet thresholding or Dantzig procedures based on the knowledge of $\|f_0\|_\infty$ and less tight Dantzig constraints.

1.2 Outlines

Section 2 introduces the density estimator of $f_0$ whose theoretical performances are studied in Section 3. Section 4 studies the Lasso estimate proposed in this paper. The calibration issue is studied in Section 5.1 and numerical experiments are performed in Section 5.2. Finally, Section 6 is devoted to the proofs of our results.

2 The Dantzig estimator of the density $f_0$

As said in the Introduction, our goal is to build an estimate of the density $f_0$ with respect to the measure $dx$ as a linear combination of functions of $\Upsilon = (\varphi_m)_{m=1,\dots,M}$, where we assume without any loss of generality that, for any $m$, $\|\varphi_m\|_2 = 1$:

$$f_\lambda = \sum_{m=1}^{M} \lambda_m\varphi_m.$$

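For this purpose, we rely on natural estimates of the $L_2$-scalar products between $f_0$ and the $\varphi_m$'s. So, for $m \in \{1, \dots, M\}$, we set

$$\beta_{0,m} = \int \varphi_m(x) f_0(x)\,dx, \tag{1}$$

and we consider its empirical counterpart

$$\hat\beta_m = \frac{1}{n}\sum_{i=1}^{n} \varphi_m(X_i) \tag{2}$$

that is an unbiased estimate of $\beta_{0,m}$. The variance of this estimate is $\mathrm{Var}(\hat\beta_m) = \frac{\sigma_{0,m}^2}{n}$ where

$$\sigma_{0,m}^2 = \int \varphi_m^2(x) f_0(x)\,dx - \beta_{0,m}^2. \tag{3}$$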
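As an illustration of (1)-(3), the following minimal Python sketch evaluates the empirical coefficients for a regular histogram dictionary on $[0,1]$, normalized so that each atom has unit $L_2$ norm. The dictionary, the sample size, the uniform density and the function names are assumptions made for this example only; they are not choices taken from the paper.

```python
import numpy as np

def histogram_dictionary(M):
    """Hypothetical helper: evaluates the M normalized bin indicators on [0, 1]."""
    def phi(x):
        x = np.asarray(x, dtype=float).ravel()
        # Bin index of every sample; clipping sends x == 1.0 to the last bin.
        idx = np.clip(np.floor(x * M).astype(int), 0, M - 1)
        values = np.zeros((x.size, M))
        values[np.arange(x.size), idx] = np.sqrt(M)  # so that ||phi_m||_2 = 1
        return values
    return phi

def empirical_coefficients(phi_values):
    """beta_hat_m = (1/n) * sum_i phi_m(X_i), as in equation (2)."""
    return phi_values.mean(axis=0)

rng = np.random.default_rng(0)
n, M = 500, 8
X = rng.uniform(0.0, 1.0, size=n)       # assumed sample from f0 = 1_[0,1]
Phi = histogram_dictionary(M)(X)        # array of shape (n, M)
beta_hat = empirical_coefficients(Phi)  # equation (2)

# For the uniform density the quantities (1) and (3) are available in closed form.
beta_0 = np.full(M, 1.0 / np.sqrt(M))   # integral of phi_m * f0
sigma2_0 = 1.0 - beta_0**2              # integral of phi_m^2 * f0 minus beta_0^2
```

In this setting each $\hat\beta_m$ is $\sqrt{M}$ times the proportion of observations falling in bin $m$, so the coefficients sum to $\sqrt{M}$ whatever the sample.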


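Note also that for any $\lambda$ and any $m$, the $L_2$-scalar product between $f_\lambda$ and $\varphi_m$ can be easily computed:

$$\int \varphi_m(x) f_\lambda(x)\,dx = \sum_{m'=1}^{M} \lambda_{m'} \int \varphi_{m'}(x)\varphi_m(x)\,dx = (G\lambda)_m$$

where $G$ is the Gram matrix associated to the dictionary $\Upsilon$ defined for any $1 \le m, m' \le M$ by

$$G_{m,m'} = \int \varphi_m(x)\varphi_{m'}(x)\,dx.$$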
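To make the role of $G$ concrete, here is a small Python sketch that builds the Gram matrix of a redundant dictionary and evaluates the scalar products $(G\lambda)_m$. The dictionary (normalized indicators of dyadic intervals at two resolutions) and the helper names are assumptions chosen for illustration; for indicator atoms the integrals reduce to overlap lengths, so no quadrature is needed.

```python
import numpy as np

def dyadic_histogram_atoms(levels):
    """Hypothetical helper: supports [a, b) of normalized dyadic indicators on [0, 1]."""
    return [(k / 2**j, (k + 1) / 2**j) for j in levels for k in range(2**j)]

def gram_matrix(atoms):
    """G[m, m'] = integral of phi_m * phi_m' with phi = 1_[a,b) / sqrt(b - a)."""
    M = len(atoms)
    G = np.zeros((M, M))
    for m, (a, b) in enumerate(atoms):
        for mp, (c, d) in enumerate(atoms):
            overlap = max(0.0, min(b, d) - max(a, c))
            G[m, mp] = overlap / np.sqrt((b - a) * (d - c))
    return G

atoms = dyadic_histogram_atoms(levels=[1, 2])    # 2 coarse bins followed by 4 fine bins
G = gram_matrix(atoms)

lam = np.array([1.0, 0.0, 0.0, 0.0, 0.5, 0.0])   # f_lam = phi_0 + 0.5 * phi_4
scalar_products = G @ lam                        # (G lam)_m = <f_lam, phi_m>
```

Because the two resolutions overlap, $G$ is not the identity: a coarse atom and a fine atom nested inside it have scalar product $1/\sqrt{2}$, which is the kind of redundancy the coherence assumptions of the paper are meant to control.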

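Any reasonable choice of $\lambda$ should ensure that the coefficients $(G\lambda)_m$ are close to $\hat\beta_m$ for all $m$. Therefore, using Candès and Tao's approach, we define the Dantzig constraint:

$$\forall m \in \{1, \dots, M\}, \quad |(G\lambda)_m - \hat\beta_m| \le \eta_{\gamma,m} \tag{4}$$

and the Dantzig estimate $\hat f^D$ by $\hat f^D = f_{\hat\lambda^{D,\gamma}}$ with

$$\hat\lambda^{D,\gamma} = \arg\min_{\lambda \in \mathbb{R}^M} \|\lambda\|_{\ell_1} \quad \text{such that } \lambda \text{ satisfies the Dantzig constraint (4),}$$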

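where for $\gamma > 0$ and $m \in \{1, \dots, M\}$,

$$\eta_{\gamma,m} = \sqrt{\frac{2\tilde\sigma_m^2\gamma\log M}{n}} + \frac{2\|\varphi_m\|_\infty\gamma\log M}{3n}, \tag{5}$$

with

$$\tilde\sigma_m^2 = \hat\sigma_m^2 + 2\|\varphi_m\|_\infty\sqrt{\frac{2\hat\sigma_m^2\gamma\log M}{n}} + \frac{8\|\varphi_m\|_\infty^2\gamma\log M}{n} \tag{6}$$

and

$$\hat\sigma_m^2 = \frac{1}{n(n-1)}\sum_{i=2}^{n}\sum_{j=1}^{i-1}\left(\varphi_m(X_i) - \varphi_m(X_j)\right)^2. \tag{7}$$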
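The thresholds can be computed directly from the matrix of values $\varphi_m(X_i)$. The sketch below is a minimal illustration in Python, taking (5) in the form $\sqrt{2\tilde\sigma_m^2\gamma\log M/n} + 2\|\varphi_m\|_\infty\gamma\log M/(3n)$ stated in the introduction; the function names and the toy data are assumptions. It also uses the fact that the pairwise sum in (7) equals the usual unbiased sample variance, which avoids the quadratic cost of the double sum.

```python
import numpy as np

def dantzig_thresholds(phi_values, sup_norms, gamma):
    """Data-driven thresholds eta_{gamma,m} following equations (5)-(7).

    phi_values: (n, M) array with entries phi_m(X_i)
    sup_norms:  (M,) array with the sup-norms ||phi_m||_inf
    """
    n, M = phi_values.shape
    t = gamma * np.log(M) / n
    # (7): the pairwise U-statistic coincides with the unbiased sample variance.
    sigma_hat2 = phi_values.var(axis=0, ddof=1)
    # (6): slightly inflated variance estimate.
    sigma_tilde2 = (sigma_hat2
                    + 2.0 * sup_norms * np.sqrt(2.0 * sigma_hat2 * t)
                    + 8.0 * sup_norms**2 * t)
    # (5): random term plus deterministic term.
    return np.sqrt(2.0 * sigma_tilde2 * t) + 2.0 * sup_norms * t / 3.0

def pairwise_variance(a):
    """Literal double sum of (7) for one atom, used only to check the shortcut."""
    n = a.size
    diffs = a[:, None] - a[None, :]
    return np.triu(diffs**2, k=1).sum() / (n * (n - 1))

rng = np.random.default_rng(1)
Phi = rng.uniform(-1.0, 1.0, size=(200, 5))   # toy values phi_m(X_i), bounded by 1
eta = dantzig_thresholds(Phi, np.ones(5), gamma=1.01)
```

The identity behind the shortcut is $\sum_{i<j}(a_i - a_j)^2 = n\sum_i (a_i - \bar a)^2$, so dividing by $n(n-1)$ gives the sample variance with denominator $n-1$.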

Note that $\eta_{\gamma,m}$ depends on the data, so the constraint (4) will be referred to as the adaptive Dantzig constraint in the sequel. We now justify the introduction of the density estimate $\hat f^D$.
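Minimizing an $\ell_1$ norm under the componentwise constraint (4) is a linear program, so the estimate can be computed with a generic LP solver. The following Python sketch uses `scipy.optimize.linprog` with the standard split $\lambda = u - v$, $u, v \ge 0$; this solver choice and the function name are assumptions for illustration, not the implementation used in the paper.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_estimate(G, beta_hat, eta):
    """Minimize ||lam||_1 subject to |G lam - beta_hat| <= eta componentwise."""
    M = G.shape[0]
    # Split lam = u - v with u, v >= 0; at the optimum sum(u) + sum(v) = ||lam||_1.
    cost = np.ones(2 * M)
    # G(u - v) <= beta_hat + eta   and   -G(u - v) <= eta - beta_hat
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([beta_hat + eta, eta - beta_hat])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:M] - res.x[M:]

# Toy check with an orthonormal dictionary (G = identity): the constraint decouples
# and the solution is the soft-thresholding of beta_hat at level eta.
lam = dantzig_estimate(np.eye(3),
                       np.array([1.0, 0.05, -0.3]),
                       np.full(3, 0.1))
```

With $G$ equal to the identity the problem separates coordinate by coordinate, which makes the link between the Dantzig estimate and thresholding rules explicit; for a redundant dictionary the coordinates interact through $G$ and the LP has to be solved jointly.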

The definition of $\eta_{\gamma,m}$ is based on the following heuristics. Given $m$, when there exists a constant $c_0 > 0$ such that $f_0(x) \ge c_0$ for $x$ in the support of $\varphi_m$ satisfying $\|\varphi_m\|_\infty^2 = o_n(n(\log M)^{-1})$, then, with large probability, the deterministic term of (5), $\frac{2\|\varphi_m\|_\infty\gamma\log M}{3n}$, is negligible with respect to the random one, $\sqrt{\frac{2\tilde\sigma_m^2\gamma\log M}{n}}$. In this case, the random term is the main one and we asymptotically derive

$$\eta_{\gamma,m} \approx \sqrt{\frac{2\gamma\log M\,\tilde\sigma_m^2}{n}}. \tag{8}$$

Having in mind that $\tilde\sigma_m^2/n$ is a convenient estimate for $\mathrm{Var}(\hat\beta_m)$ (see the proof of Theorem 1), the shape of the right hand term of the formula (8) looks like the bound proposed by Candès and Tao [13] to define the Dantzig constraint in the linear model. Actually, the deterministic term of (5) allows to get sharp concentration inequalities. As often done in the literature, instead of estimating $\mathrm{Var}(\hat\beta_m)$, we could use the inequality

$$\mathrm{Var}(\hat\beta_m) = \frac{\sigma_{0,m}^2}{n} \le \frac{\|f_0\|_\infty}{n}$$


and we could replace $\tilde\sigma_m^2$ with $\|f_0\|_\infty$ in the definition of the $\eta_{\gamma,m}$. But this requires a strong assumption: $f_0$ is bounded and $\|f_0\|_\infty$ is known. In our paper, $\mathrm{Var}(\hat\beta_m)$ is estimated, which allows not to impose these conditions. More precisely, we slightly overestimate $\sigma_{0,m}^2$ to control large deviation terms and this is the reason why we introduce $\tilde\sigma_m^2$ instead of using $\hat\sigma_m^2$, an unbiased estimate of $\sigma_{0,m}^2$. Finally, $\gamma$ is a constant that has to be suitably calibrated and plays a capital role in practice.
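For completeness, the inequality $\mathrm{Var}(\hat\beta_m) \le \|f_0\|_\infty/n$ used in this paragraph follows in one line from (3) and from the normalization $\|\varphi_m\|_2 = 1$ assumed at the beginning of the section. The short derivation below is a reading aid, not a statement taken from the paper:

```latex
\sigma_{0,m}^2
  = \int \varphi_m^2(x) f_0(x)\,dx - \beta_{0,m}^2
  \le \int \varphi_m^2(x) f_0(x)\,dx
  \le \|f_0\|_\infty \int \varphi_m^2(x)\,dx
  = \|f_0\|_\infty \|\varphi_m\|_2^2
  = \|f_0\|_\infty ,
\qquad\text{hence}\qquad
\mathrm{Var}(\hat\beta_m) = \frac{\sigma_{0,m}^2}{n} \le \frac{\|f_0\|_\infty}{n}.
```

The bound is only informative when $f_0$ is bounded, which is exactly the assumption the data-driven estimate $\tilde\sigma_m^2$ is designed to avoid.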

The following result justifies previous heuristics by showing that, if $\gamma > 1$, with high probability, the quantity $|\hat\beta_m - \beta_{0,m}|$ is smaller than $\eta_{\gamma,m}$ for all $m$. The parameter $\eta_{\gamma,m}$ with $\gamma$ close to 1 can be viewed as the smallest quantity that ensures this property.

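Theorem 1. Let us assume that $M$ satisfies

$$n \le M \le \exp(n^\delta) \tag{9}$$

for $\delta < 1$. Let $\gamma > 1$. Then, for any $\varepsilon > 0$, there exists a constant $C_1(\varepsilon, \delta, \gamma)$ depending on $\varepsilon$, $\delta$ and $\gamma$ such that

$$P\left(\exists m \in \{1, \dots, M\},\ |\beta_{0,m} - \hat\beta_m| \ge \eta_{\gamma,m}\right) \le C_1(\varepsilon, \delta, \gamma)\, M^{1 - \frac{\gamma}{1+\varepsilon}}.$$

In addition, there exists a constant $C_2(\delta, \gamma)$ depending on $\delta$ and $\gamma$ such that

$$P\left(\forall m \in \{1, \dots, M\},\ \eta_{\gamma,m}^{(-)} \le \eta_{\gamma,m} \le \eta_{\gamma,m}^{(+)}\right) \ge 1 - C_2(\delta, \gamma)\, M^{1-\gamma}$$

where, for $m \in \{1, \dots, M\}$,

$$\eta_{\gamma,m}^{(-)} = \sigma_{0,m}\sqrt{\frac{8\gamma\log M}{7n}} + \frac{2\|\varphi_m\|_\infty\gamma\log M}{3n}$$

and

$$\eta_{\gamma,m}^{(+)} = \sigma_{0,m}\sqrt{\frac{16\gamma\log M}{n}} + \frac{10\|\varphi_m\|_\infty\gamma\log M}{n}.$$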

This result is proved in Section 6.1. The first part is a sharp concentration inequality proved by using Bernstein type controls. The second part of the theorem proves that, up to constants depending on $\gamma$, $\eta_{\gamma,m}$ is of order $\sigma_{0,m}\sqrt{\frac{\log M}{n}} + \|\varphi_m\|_\infty\frac{\log M}{n}$ with high probability. Note that the assumption $\gamma > 1$ is essential to obtain probabilities going to 0.

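Finally, let $\lambda_0 = (\lambda_{0,m})_{m=1,\dots,M} \in \mathbb{R}^M$ such that

$$P_\Upsilon f_0 = \sum_{m=1}^{M} \lambda_{0,m}\varphi_m$$

where $P_\Upsilon$ is the projection on the space spanned by $\Upsilon$. We have

$$(G\lambda_0)_m = \int (P_\Upsilon f_0)\varphi_m = \int f_0\varphi_m = \beta_{0,m}.$$

So, Theorem 1 proves that $\lambda_0$ satisfies the adaptive Dantzig constraint (4) with probability larger than $1 - C_1(\varepsilon, \delta, \gamma) M^{1 - \frac{\gamma}{1+\varepsilon}}$ for any $\varepsilon > 0$. Actually, we force the set of parameters $\lambda$ satisfying the adaptive Dantzig constraint to contain $\lambda_0$ with large probability and to be as small as possible. Therefore, $\hat f^D = f_{\hat\lambda^{D,\gamma}}$ is a good candidate among sparse estimates linearly decomposed on $\Upsilon$ for estimating $f_0$.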

We mention that Assumption (9) can be relaxed and we can take $M < n$ provided the definition of $\eta_{\gamma,m}$ is modified.


In the sequel, we will denote $\hat\lambda^D = \hat\lambda^{D,\gamma}$ to simplify the notations, but the Dantzig estimator $\hat f^D$ still depends on $\gamma$. Moreover, we assume that (9) is true and we denote the vector $\eta_\gamma = (\eta_{\gamma,m})_{m=1,\dots,M}$ considered with the Dantzig constant $\gamma > 1$.

3.1 The main result under local assumptions

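Let us state the main result of this paper. For any $J \subset \{1, \dots, M\}$, we set $J^C = \{1, \dots, M\} \setminus J$ and define $\lambda_J$ the vector which has the same coordinates as $\lambda$ on $J$ and zero coordinates on $J^C$. We introduce a local assumption indexed by a subset $J_0$.

• Local Assumption. Given $J_0 \subset \{1, \dots, M\}$, for some constants $\kappa_{J_0} > 0$ and $\mu_{J_0} \ge 0$ depending on $J_0$, we have for any $\lambda$,

$$\|f_\lambda\|_2 \ge \kappa_{J_0}\|\lambda_{J_0}\|_{\ell_2} - \frac{\mu_{J_0}}{\sqrt{|J_0|}}\left(\|\lambda_{J_0^C}\|_{\ell_1} - \|\lambda_{J_0}\|_{\ell_1}\right)_+. \tag{LA$(J_0, \kappa_{J_0}, \mu_{J_0})$}$$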

We obtain the following oracle type inequality without any assumption on $f_0$.

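Theorem 2. With probability at least $1 - C_1(\varepsilon, \delta, \gamma) M^{1 - \frac{\gamma}{1+\varepsilon}}$, for all $J_0 \subset \{1, \dots, M\}$ such that there exist $\kappa_{J_0} > 0$ and $\mu_{J_0} \ge 0$ for which (LA$(J_0, \kappa_{J_0}, \mu_{J_0})$) holds, we have, for any $\alpha > 0$,

$$\|\hat f^D - f_0\|_2^2 \le \inf_{\lambda \in \mathbb{R}^M}\left\{ \|f_\lambda - f_0\|_2^2 + \alpha\left(1 + \frac{2\mu_{J_0}}{\kappa_{J_0}}\right)^2 \frac{\Lambda(\lambda, J_0^c)^2}{|J_0|} + 16|J_0|\left(\frac{1}{\alpha} + \frac{1}{\kappa_{J_0}^2}\right)\|\eta_\gamma\|_{\ell_\infty}^2 \right\}, \tag{10}$$

with

$$\Lambda(\lambda, J_0^c) = \|\lambda_{J_0^C}\|_{\ell_1} + \frac{\left(\|\hat\lambda^D\|_{\ell_1} - \|\lambda\|_{\ell_1}\right)_+}{2}.$$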

Let us comment on each term of the right hand side of (10). The first term is an approximation term which measures the closeness between $f_0$ and $f_\lambda$. This term can vanish if $f_0$ can be decomposed on the dictionary. The second term, a bias term, is a price to pay when either $\lambda$ is not supported by the subset $J_0$ considered or it does not satisfy the condition $\|\hat\lambda^D\|_{\ell_1} \le \|\lambda\|_{\ell_1}$ which holds as soon as $\lambda$ satisfies the adaptive Dantzig constraint. Finally, the last term, which does not depend on $\lambda$, can be viewed as a variance term corresponding to the estimation on the subset $J_0$. The parameter $\alpha$ calibrates the weights given for the bias and variance terms in the oracle inequality. Concerning the last term, remember that $\eta_{\gamma,m}$ relies on an estimate of the variance of $\hat\beta_m$. Furthermore, we have with high probability:

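$$\|\eta_\gamma\|_{\ell_\infty}^2 \le 2\sup_m\left( \frac{16\sigma_{0,m}^2\gamma\log M}{n} + \left(\frac{10\|\varphi_m\|_\infty\gamma\log M}{n}\right)^2 \right).$$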
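The step behind this bound can be spelled out as follows. The sketch assumes that the upper envelope of Theorem 1 has the two-term form $\eta_{\gamma,m}^{(+)} = a_m + b_m$ written below, which is the form the displayed inequality suggests, and it uses only the elementary inequality $(a+b)^2 \le 2a^2 + 2b^2$:

```latex
a_m = \sigma_{0,m}\sqrt{\frac{16\gamma\log M}{n}}, \qquad
b_m = \frac{10\|\varphi_m\|_\infty\gamma\log M}{n},
\\[4pt]
\eta_{\gamma,m}^2 \le (a_m + b_m)^2 \le 2a_m^2 + 2b_m^2
  = 2\left( \frac{16\sigma_{0,m}^2\gamma\log M}{n}
          + \left(\frac{10\|\varphi_m\|_\infty\gamma\log M}{n}\right)^2 \right),
\\[4pt]
\|\eta_\gamma\|_{\ell_\infty}^2 = \sup_m \eta_{\gamma,m}^2
  \le 2\sup_m\left( \frac{16\sigma_{0,m}^2\gamma\log M}{n}
          + \left(\frac{10\|\varphi_m\|_\infty\gamma\log M}{n}\right)^2 \right).
```

The inequality holds on the event of the second part of Theorem 1, that is, with probability at least $1 - C_2(\delta, \gamma) M^{1-\gamma}$.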

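So, if $f_0$ is bounded then $\sigma_{0,m}^2 \le \|f_0\|_\infty$, and if there exists a constant $c_1$ such that for any $m$,

$$\|\varphi_m\|_\infty^2 \le c_1\left(\frac{n}{\log M}\right)\|f_0\|_\infty, \tag{11}$$

(which is true for instance for a bounded dictionary), then

$$\|\eta_\gamma\|_{\ell_\infty}^2 \le C\|f_0\|_\infty\frac{\log M}{n},$$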