• Aucun résultat trouvé

Robust group lasso: Model and recoverability

N/A
N/A
Protected

Academic year: 2022

Partager "Robust group lasso: Model and recoverability"

Copied!
40
0
0

Texte intégral

(1)

Contents lists available atScienceDirect

Linear Algebra and its Applications

www.elsevier.com/locate/laa

Robust group lasso: Model and recoverability

Xiaohan Weia,∗, Qing Lingb, Zhu Hanc

a DepartmentofElectricalEngineering,UniversityofSouthernCalifornia,United Statesof America

b Schoolof DataandComputerScience,SunYat-SenUniversity,China

cDepartmentofElectricalandComputerEngineering,UniversityofHouston, UnitedStatesofAmerica

a r t i c l e i n f o a bs t r a c t

Article history:

Received19November2016 Accepted30June2018 Availableonlinexxxx SubmittedbyH.Rauhut MSC:

60D05 94A12 Keywords:

Compressedsensing Groupsparsity Sparsenoise

Restrictedisometryproperty Golfingscheme

Thispaperconsiderstheproblemofrecoveringagroupsparse signal matrix Y = [y1,· · ·,yL] from sparsely corrupted measurements M= [A(1)y1,· · ·,A(L)yL]+S,whereA(i)’s are known sensing matrices and S is an unknown sparse errormatrix.Arobustgrouplasso(RGL)modelisproposed to recover Y andSthrough simultaneouslyminimizing the 2,1-normofYandthe1-normofSunderthemeasurement constraints.WeprovethatYandScanbeexactlyrecovered fromtheRGLmodelwithhighprobabilityforaverygeneral classofA(i)’s.

©2018ElsevierInc.Allrightsreserved.

Qing Ling’s research is partially supported by National Science Foundation of China under Grant 61573331andbytheNationalScienceFoundationAnhuiunderGrant1608085QF130.ZhuHan’sresearch ispartiallysupportedbyUSMURI,NSFCNS-1717454,CNS-1731424,CNS-1702850,CNS-1646607.

* Correspondingauthor.

E-mailaddresses:[email protected](X. Wei),[email protected](Q. Ling),[email protected] (Z. Han).

https://doi.org/10.1016/j.laa.2018.06.034

0024-3795/©2018ElsevierInc. Allrightsreserved.

(2)

1. Introduction

Considertheproblem ofrecovering agroupsparsematrixY= [y1,· · ·,yL]Rn×L fromsparsely corruptedmeasurements

M= [A(1)y1,· · ·,A(L)yL] +S, (1) where M = [m1,· · ·,mL] Rm×L is a measurement matrix, A(i) Rm×n is thei-th sensingmatrix, and S= [s1,· · ·,sL] Rm×L is an unknown sparse errormatrix. The error matrix S is sparse as it has only a small number of nonzero entries. The signal matrixYisgroupsparse,meaningthatYissparseanditsnonzeroentriesappearina smallnumberofcommonrows.

Given M and A(i)’s, our goal is to recover Y and S from the linear measurement equation(1).Inthispaper,weproposetoaccomplishtherecoverytaskthroughsolving thefollowingrobustgrouplasso(RGL) model

minY,S Y2,1+λS1, (2)

s.t. M= [A(1)y1,· · ·,A(L)yL] +S. (3) Denoting yij and sij as the (i,j)-th entries of Y and S, respectively, Y2,1 n

i=1

L

j=1y2ij is defined as the 2,1-norm of Y and S1 m i=1

L

j=1|sij| is de- finedas the1-norm ofS.Minimizingthe2,1-normtermpromotesgroupsparsityofY whileminimizingthe1-normtermpromotessparsityofS;λisanonnegativeparameter to balance the two terms. We prove that solving the RGL model (2)–(3), which is a convex program, enables exact recovery of Y and S with high probability, given that A(i)’ssatisfycertainconditions.

1.1. From grouplasso torobustgroup lasso

Sparsesignal recovery has attractedmuch research interest inthe signal processing andoptimizationcommunities duringthepastfewyears.Various sparsity modelshave beenproposed tobetterexploitthesparsestructures ofhigh-dimensionaldata, suchas sparsity ofa vector[1], [2], groupsparsity of vectors[3], and low-ranknessof amatrix [4]. For moretopics relatedto sparse signalrecovery,readers arereferredto therecent surveypaper[5].

Inthis paperwe areinterestedinthe recoveryof groupsparse(also knownasblock sparse [6] or jointly sparse [7]) signals which finds a variety of applications such as direction-of-arrivalestimation[8],[9],collaborativespectrumsensing[10–12] andmotion detection [13]. A signal matrix Y = [y1,· · ·,yL] Rn×L is called k-group sparse if k rowsofYarenonzero.AmeasurementmatrixM= [m1,· · ·,mL]Rm×Listakenfrom linearprojectionsmi=A(i)yi,i= 1,· · ·,L,whereA(i)Rm×n isasensingmatrix.In ordertorecoverYfromA(i)’sandM,thestandard2,1-normminimizationformulation proposesto solveaconvexprogram

(3)

minY Y2,1,

s.t. M= [A(1)y1,· · ·,A(L)yL]. (4) Thisisastraightforwardextensionfromthecanonical1-normminimizationformulation thatrecoversasparsevector.Theoreticalguaranteeofexactrecoveryhasbeendeveloped based on the restricted isometric property (RIP) of A(i)’s [14]. The performance of exploiting more structures of Y by simultaneously minimizing the 2,1-norm and the nuclear normisanalyzedin[15].

Consider that in practice the measurements are often corrupted by random noise, resultinginM= [A(1)y1,· · ·,A(L)yL]+Nwhere N= [n1,· · ·,nL]Rm×L isanoise matrix.To addressthenoise-corruptedcase,thegrouplassomodelin[3] solves

minY,E Y2,1+γN2F,

s.t. M= [A(1)y1,· · ·,A(L)yL] +N, (5) whereγisanonnegativeparameterandNFistheFrobeniusnormofN.Analternative to (5) is

minY Y2,1,

s.t. M[A(1)y1,· · ·,A(L)yL]2F ≤ε2, (6) where ε controls the noise level. It has been shown in [14] that if the sensing ma- trices A(i)’s satisfy RIP, then the distance between the solution to (6) and the true signal matrix,whichis measuredbythe Frobeniusnorm, is withinaconstantmultiple of ε.

Theexactrecovery guaranteefor(6) is elegant,butworksonlyifthenoiselevelεis nottoo large.However,inmanypracticalapplications,someofthe measurementsmay be seriously contaminated or even missing due to uncertainties such as sensor failures and transmission errors. Meanwhile, this kind of measurement errors are often sparse (see [16] for detailed discussions). In this case, the exact recovery guarantee does not hold andthesolutionof(6) canbe farawayfrom thetruesignalmatrix.

Theneedofhandlinglargebutsparsemeasurement errorsinthegroupsparsesignal recoveryproblemmotivatestheRGLmodel(2)–(3),whichhasfoundsuccessfulapplica- tionsin,forexample,thecognitivenetworksensingproblem[16].Efficientdecentralized algorithms solving (2)–(3) are proposed in [17] and their performances are evaluated viaextensivesimulations. In(2)–(3),themeasurement matrixMis contaminatedbya sparseerrormatrixS= [s1,· · ·,sL]Rm×Lwhosenonzeroentriesmightbeunbounded.

Throughsimultaneouslyminimizingthe2,1-normofYandthe1normofS,weexpect to recoverthegroupsparsesignalmatrixYandthesparse errormatrixS.

The RGL model(2)–(3) isclosely related to robustlasso and robustprinciple com- ponent analysis (RPCA),both ofwhichhavebeen provedeffectivelyinrecovering true

(4)

signalfromsparse grosscorruptions.Therobustlassomodel,whichhasbeen discussed extensivelyin[18],[19],[20],minimizes1-normofasparsesignalvectorandthe1-norm ofasparseerrorvectorsimultaneouslyinordertogetridofsparsecorruptions.Whereas theRPCA model,whichisfirstproposedin[21] andthenextendedby[22] and[23],re- coversalowrankmatrixbyminimizingthenuclearnormofsignalmatrixplus1-norm ofsparseerrormatrix.

1.2. Contribution andpaperorganization

Thispaperprovesthatwithhighprobability,theproposedRGLmodel(2)–(3) exactly recoversthegroupsparsesignalmatrixandthesparseerrormatrixsimultaneouslyunder certainrestrictionsonmeasurements foraverygeneralclassofsamplematrices.

The rest of this paper is organized as follows. Section 2 provides the main result (see Theorem 1) on the recoverability of the RGL model (2)–(3) under assumptions onthe sensingmatrices and thetrue signaland errormatrices (see Assumptions1–4).

Section 2 also introduces several supporting lemmas and corollaries (see Lemmas 1–4 andCorollaries1–2).Section3givesthedualcertificate of(2)–(3),whichisasufficient condition guaranteeingthe exact recovery from the RGLmodel with high probability.

Theirproofs are basedon two supportinglemmas (see Lemmas 5–6). Section4proves thattheinexactdualcertificateof(2)–(3) canbesatisfiedthroughaconstructivemanner (seeTheorem 3and Lemma7).This way, weprovethe main resultgiveninSection2.

Section5concludesthepaper.

1.3. Notations

We introduce several notations that are used in the subsequent sections. Bold up- percase letters denote matrices, whereas bold lowercase letters with subscripts and superscripts stand for column vectors and row vectors, respectively. For a matrix U, wedenoteui asitsi-thcolumn,uiasitsj-throw,anduij asits(i,j)-thelement.Fora givenvectoru,wedenoteui asitsi-thelement.Thenotations{U(i)}and{u(i)}denote thefamilyofmatricesandvectorsindexedbyi,respectively.Thenotations{U(i,j)}and {u(i,j)}denote the familyof matricesand vectorsindexed by (i,j), respectively.vec(·) is thevectorizing operator thatstacks the columns of amatrix oneafter another. {·}

denotesthetransposeoperator.diag{·}representsadiagonalmatrixandBLKdiag{·}

representsablockdiagonal matrix.Thenotation·,·denotestheinner product,when applyingto two matrices Uand V. sgn(u) and sgn(U) aresign vectorand signmatrix foruand U,respectively.

Additionally,weuseseveralstandardmatrixandvectornorms. ForavectoruRn, define

2-norm: u2=n j=1u2j.

1-norm: u1=n j=1|uj|.

(5)

ForamatrixURm×n,define

2,1-norm: U2,1=m i=1

n j=1u2ij.

2,-norm: U2,= maxin j=1u2ij.

1-norm:U1=m i=1

n j=1|uij|.

• Frobeniusnorm:UF =m i=1

n j=1u2ij.

-norm:U= maxi,j|uij|.

Also,we usethenotationU(p,q)to denoteinducednorms,whichstandsfor

U(p,q)= max

x∈Rn

Uxp

xq

.

For the signalmatrix Y Rn×L and noisematrix SRm×L, we usethe following set notationsthroughoutthepaper.

T:Therowgroupsupport(namely,theset ofrowcoordinatescorresponding tothe nonzerorowsofthesignalmatrix)whosecardinalityisdenotedaskT =|T|.

Tc:ThecomplementofT (namely,{1,· · ·,n}\T).

• Ω:Thesupportoferrormatrix(namely,thesetofcoordinatescorrespondingtothe nonzeroelementsoftheerrormatrix)whosecardinalityisdenotedaskΩ=|Ω|.

• Ωc:ThecomplementofΩ (namely,{1,· · ·,n}× {1,· · ·,L}\Ω).

• Ωi:Thesupportofthei-thcolumntheerrormatrixwhosecardinalityisdenotedas kΩi =|Ωi|.

• Ωci:ThecomplementofΩi (namely,{1,· · ·,n}\Ωi).

• Ωi: An arbitrary fixed subset of Ωci with cardinality m −kmax, where kmax = maxikΩi. Intuitively, Ωi standsfor the maximal non-corrupted set across different i∈ {1,· · ·,L}.

ForanygivenmatricesURm×L,VRn×LandgivenvectorsuRm,vRn,define orthogonal projectionoperators asfollows.

PΩU:Theorthogonal projectionofmatrixUontoΩ (namely,set everyentry ofU whosecoordinatebelongstoΩc as 0whilekeepother entriesunchanged).

PΩiu,PΩciu,PΩiu:TheorthogonalprojectionsofuontoΩici,andΩi,respectively.

PTv:TheorthogonalprojectionofvontoT.

PΩiU, PΩciU, and PΩiU: The orthogonal projections of each column of U onto Ωi, Ωci, and Ωi, respectively (namely, PΩiU = [PΩiu1,· · ·,PΩiuL], PΩciU = [PΩciu1,· · ·,PΩciuL] andPΩiU= [PΩiu1,· · ·,PΩiuL]).

PTV:TheorthogonalprojectionofeachcolumnofVontoT.

(6)

Furthermore,we admitanotationalconventionthatforanyprojectionoperatorP and correspondingmatrixU(or vectoru),itholds

UP= (PU)(oruP= (Pu)).

Finally, by saying an event occurs with high probability, we mean that the occurring probabilityoftheeventisat least1−C(nL)1 where Cisaconstant.

2. Mainresultofexactrecovery

ThissectionprovidesthetheoreticalperformanceguaranteeoftheRGLmodel(2)–(3).

Section2.1makesseveralassumptionsunderwhich(2)–(3) recoversthetruegroupsparse signaland sparse error matrices with high probability.The main result is summarized inTheorem 1. Section2.2 discussesseveralrelatedresultsandapplications.Section2.3 givesseveralprobabilitytoolsthatareuseful intheproofofthemain result.

2.1. Assumptions andmainresult

Westartfrom severalassumptionsonthesensingmatrices,aswell asthetruegroup sparsesignalandsparseerrormatrices.

Consider L distributions {Fi}Li=1 in Rn and an independently sampled vector a(i) fromeachFi.Thecorrelationmatrixisdefinedas

Σ(i)=E

a(i)a(i) ,

andthecorrespondingconditionnumberisboundedas follows

λmax{Σ(i)}

λmin{Σ(i)} ≤κ, ∀i∈ {1,2,· · ·, L},

where κ is a positive constant and λmax{·}, λmin{·} denote the largest and smallest eigenvaluesofamatrix,respectively.Observethatthisconditionnumberisfiniteifand onlyifthecovariancematrixisinvertible,andislargerthanor equalto1inany case.

Assumption1.Fori= 1,· · ·,L,define thei-thsensingmatrixas

A(i) 1

√m

⎜⎜

a(i)1

... a(i)m

⎟⎟

Rm×n.

Therein, {a(i)1,· · ·,a(i)m}is assumed to be asequence of i.i.d. random vectorsdrawn fromthedistributionFi inRn.

(7)

By Assumption 1, we suppose that everysensing matrix A(i) is randomly sampled from acorresponding distributionFi. Weproceed to assumethe propertiesof thedis- tributions {Fi}Li=1.

Assumption 2.Fori= 1,· · ·,L,everydistribution Fi satisfythefollowing twoproper- ties.

• Completeness:ThecorrelationmatrixΣ(i)isinvertible.

• Incoherence:Each sensingvector a(i)sampled fromFi satisfies

j∈{max1,···,n}|a(i),ek| ≤√

μ, (7)

j∈{1,···max,n}|Σ−1(i)a(i),ek| ≤√

μ, (8)

forsomefixed constantμ≥1,where{ek}nk=1 isthestandardbasisinRn.

We call μ as the incoherence parameter. Note that this incoherence condition is stronger than the one originally presented in [26], which does not require (8). If one wants to get rid of (8), then some other restrictions must be imposed on the sensing matrices (see[27] forrelatedresults).

Observe thatthebounds (7) and (8) inAssumption 2are meaninglessunless wefix thescaleofa(i).Thus,we havethefollowingassumption.

Assumption 3.ThecorrelationmatrixΣ(i) satisfies

λmax{Σ(i)}=λmin{Σ(i)}1, (9) forany Fi, i= 1,· · ·,L.

GivenanycompleteFi,(9) canalwaysbeachievedbyscalinga(i)upordown.Thisis truebecauseifwescalea(i)up,thenλmax{Σ(i)}increasesandλmin{Σ(i)}−1 decreases.

Observethattheoptimizationproblem(2)–(3) isinvariantunderscaling.Thus,Assump- tion3doesnotposeany extraconstraint.

Denote YandSasthetruegroupsparsesignalandsparseerrormatricestorecover, respectively.TherowgroupsupportofYandthesupportofSarefixedanddenotedas T and Ω,respectively.

Theassumption onYandSisgiven asfollows,

Assumption 4.The true signal matrixY and error matrixS satisfy the following two properties.

• Random sign: Thesignsof theelements of Y andS arei.i.d. and equallylikelyto be+1 or1.

(8)

• Row incoherence: Letyi be the i-th rowof Y. Then, there exists afixed constant ν 1 sothat

l∈{1,2,max···,L}

yi yi2

,el

ν

L,

forany i∈T andl∈ {1,2,· · ·,L},where{el}Ll=1 isastandardbasisinRL. Under the assumptions stated above, we have the following main theorem on the recoverabilityoftheRGLmodel(2)–(3).

Theorem1.Letθ∈[1,L] beatradeoffparameter.Under Assumptions 1–4,thesolution pair(Y,ˆ S)ˆ to theoptimization problem (2)–(3)is exact and uniquewith probability at least1(16+ 2e14)(nL)1,providedthat λ=

θ Llog(nL),

kT ≤α m

max κ,νθ

μlog2(nL), kΩ≤βmL

μθ , kmax≤γm

κ, (10)

wherekmaxmaxikΩi,and α≤ 96001 ,β≤ 31361 ,γ≤ 14 areallpositiveconstants.1 Notethatfrom(10),theconstantθ isindeedatrade-offparameter.Choosingθlarge relaxestheconstraintonrowsupportsbutrestrictsthenumberofsparseerrors,andvice versa.

Theincoherenceconditionofsensingvectors(Assumption2)iscommoninlassoand robust lasso literatures, which implies that the columns of [A(1)y1,· · ·,A(L)yL] are not aligned with sparse vectors. On the other hand, the row incoherence (Assump- tion 4) is unique in the current group lasso context. It indicates that the rows of [A(1)y1,· · ·,A(L)yL] are not aligned with sparse vectors either. Note that this con- dition is trivial when ν = L, in which cases choosing θ = L in Theorem 1 gives the measurementboundmL≥ O(kTLlog2(nL)) andm≥ O(kΩ).IngeneralwhenLislarge andtherowsofthetruesignalmatrixY aredense,wecouldhaveν L,inwhichcase choosingθcloseto1resultsinanimprovedmeasurementboundmL≥ O(kTLlog2(nL)) andmL≥ O(kΩ).Formatrixrecoveryliteratures,similartwo-wayincoherenceassump- tionsareadopted.Forexample,inRPCA([4],[21]),thesignalmatricesareassumedto haveboth leftandright singularvectorsbeing incoherentwithsparsevectors.

Finally,notice thatY and Sare assumed to havefixed supportsbutrandom signs.

This assumption is commonly adopted inthe RIPless analysis of lasso ([19], [26]) and seemstobecrucialintheproof.Alternatively,onecouldalwaysavoidassumingrandom signs of Y by taking the sensing matrices as A(i)D(i), i = 1,2,· · ·,L, where A(i) is

1 Theboundsonα,β,γarechosensuchthatalltherequirementsontheseconstantsinthesubsequent lemmasandtheoremsaremet.

(9)

thesameasaboveandD(i)Rn×n isadiagonalmatrixwitheachdiagonalentryi.i.d.

and equallylikelyto be +1 or 1.It is still an open problem whetheror not onecan completely getridofthis randomsignassumption.

2.2. Relatedworks andapplications 2.2.1. Examples

In this section, we provide several examples demonstrating that our sensing model coversawiderangeofrandommeasurements.

• Random Fouriermeasurements: An importantapplicationfor oursensingmodelis therandomsubsampleddiscreteFouriertransform(DFT)matrix.Considerann×n matrixWwitheachentry

Wωt =e−i2πωt/n, w, t∈ {0,1,· · ·, n−1}.

Let A(i) Cm×n be a sensing matrix so that each row is sampled uniformly at random from the rows of W.2 It can be easily verified that E

a(i)a(i)

= I and theincoherenceparameterμ= 1.Suchsensing modelarises invarious applications including magnetic resonance imaging and collaborative spectrum sensing ([11]).

Moregenerally,anyrandomrowsub-matrixA(i)ofanarbitraryboundedorthogonal matrixW (withincoherent rows)fits into ourmodel.This includessampling from theclassofHadamardmatrices(orthogonalmatriceswithallentries+1 or 1).

• Random sampling from frames: A frame is a set of vectors {uk}Kk=1 Rn which satisfiesthefollowingrelation: ThereexistpositiveconstantsC1 andC2suchthat

C1x22 1 K

K k=1

uk,x2≤C2x22, xRd.

This can be viewed as ageneralization of orthogonal systemswithout linear inde- pendence.Furthermore,assumethateachuk satisfiestheincoherencecondition.Let A(i)Rm×n be asensingmatrixso thateachrowissampled uniformlyat random from theset {uk}Kk=1. This sampling modelalsofits intoour formulation.Random measurements of this kind arise in Fourier transform with acontinuous frequency spectrum([31])as wellas waveletframebasedreconstruction([32]).

2.2.2. Related results

WeseefromTheorem1thatwhenthesignalmatrixYissufficientlygroupsparseand theerrormatrixSsufficientlysparse, thenwithhigh probabilitywe areabletoexactly

2 OurmodelisdevelopedinRwhilethisexampleisoverC,allourdefinitionsandresultscanbegeneralized tocomplexnumberswithlittlechanges.

(10)

recoverbothofthembysolvingaconvexprogram.Similarresultsonstructuredrecovery havebeen establishedpreviouslybyposingmorerestrictionsonthemeasurementssuch as Gaussian ([29]) or orthogonality ([30]). Here, we prove the performance guarantee underamoregeneralsensingmodel.

NotethatthetotalnumberofmeasurementsismL.Thus,Theorem1givesthemea- surement bounds mL ≥ O(kTLlog2(nL)) and mL ≥ O(kΩ). Specifically, by setting L = 1, this result meets the measurement bounds in the robust lasso model (for ex- ample, [19]), where the number of measurements m ≥ O(klog2p) and m ≥ O(kΩ) guaranteesthehighprobabilityrecoveryofanp-dimensionalk-sparsesignalvectorand anm-dimensionalkΩ-sparseerrorvectorsimultaneously.

Theorem1isaresultofRIPlessanalysis,whichsharesthesamelimitationasallother RIPlessanalyses. Tobe specific,Theorem 1onlyholds forarbitrarybutfixed Y andS (exceptthattheelementsofYandShaveuniformrandomsignsbyAssumption4).Ifwe expecttohaveauniformrecovery guaranteehere(namely,consideringrandomsensing matricesaswellassignalanderrormatriceswithrandomsupports),thencertainstronger assumptionsmustbe madeonthesensingmatricessuchas theRIPcondition.

The proof of Theorem 1is based on the construction of an inexact dual certificate throughthegolfingscheme.Thegolfingschemewasfirstintroducedin[24] forlowrank matrix recovery. Subsequently, [26] and [27] refined and used the scheme to prove the lasso recovery guarantee. The work [19] generalized it to mix-norm recovery. In this paper, weconsider anew mix-normproblem,namely, summationofthe 2,1-norm and the1-norm.

2.3. Matrixconcentrationinequalities

Belowwegiveseveralprobabilitytoolsthatareusefulintheproofsofthepaper.We begin with two versions of Bernstein inequalities from [25]. The first one is a matrix Bernsteininequality.

Lemma1(MatrixBernstein inequality). Consider afinite sequenceof independentran- dommatrices{M(j)Rd×d}.Assumethat every random matrixsatisfies E

M(j)

= 0 andM(j)(2,2)≤B almostsurely. Define

σ2max

⎧⎪

⎪⎩

j

E

M(j)M(j)

(2,2)

,

j

E

M(j)M(j)

(2,2)

⎫⎪

⎪⎭.

Then, forallt≥0,wehave

P r

⎧⎪

⎪⎩

j

M(j) (2,2)

≥t

⎫⎪

⎪⎭2dexp

t2/2 σ2+Bt/3

.

(11)

Wealsoneedavectorformof theBernstein inequality.

Lemma 2(Vector Bernstein inequality).Consider a finite sequenceof independent ran- dom vectors {g(j) Rd}. Assume that every random vector satisfies E

g(j)

= 0 and g(j)2≤B almostsurely. Defineσ2

kE g(j)22

.Then,forall0≤t≤σ2/B,we have

P r

j

g(j) 2

≥t

≤exp

t22+1

4

.

Next, we use the matrix Bernstein inequality to prove its extension on a block anisotropicmatrix.

Lemma 3.Consider a matrix A(i) satisfying the model described in Section 2.1, and denoteA˜(i)=Σ(i)1A(i)PΩiA(i).Forany τ >0,itholds

P r PT

m m−kmax

A˜(i)Σ−1(i) Σ−1(i)

PT

(2,2)

≥τ

!

2kTexp

−m−kmax

κkTμ

τ2 4(κ+3 )

,

and

P r PT

m m−kmax

A˜(i)I

PT

(2,2)

≥τ

!

2kTexp

−m−kmax

κkTμ

τ2 4(1 +3)

.

We prove the first part in Appendix A, and the second part can be proved in a similar way. Two consequent corollaries of Lemma 3 show that the restriction of

m

mkmaxBLKdiagA˜(1),· · ·,A˜(L)

tothecorresponding supportT isnearisometric.

Corollary 1.Denote A˜(i)=Σ−1(i)A(i)PΩiA(i).Given kT ≤αμκlog(nL)m ,kmax≤γm, and

1γ

α 64,then withprobability atleast12(nL)−2,wehave BLKdiag

"

PT

m m−kmax

A˜(1)I

PT, · · ·, PT

m m−kmax

A˜(L)I

PT

#

(2,2)

<1 2. (11) Furthermore,givenkT ≤αμκlogm2(nL),kmax≤γm,and 1−γα 64,withatleastthesame probability,we have

(12)

BLKdiag

"

PT

m

m−kmaxA˜(1)I

PT, · · ·, PT

m

m−kmaxA˜(L)I

PT

#

(2,2)

< 1 2$

log(nL). (12) Proof. First, following directlyfrom thefirst partof Lemma3, for alli = 1,· · ·,L, it holds

P r PT

m

m−kmaxA˜(i)I

PT

(2,2)

≥τ

!

2kTexp

"

−m−kmax

kTμκ

τ2 4(1 +3)

# . (13) Takingaunionboundoveralli= 1,· · ·,Lyields

P r BLKdiag

"

PT

m m−kmax

A˜(1)I

PT, · · ·,

PT

m m−kmax

A˜(L)I

PT

#

(2,2)

≥τ

!

=P r max

i PT

m m−kmax

A˜(i)I

PT

(2,2)

!

≥τ

!

L i=1

P r PT

m m−kmax

A˜(i)I

PT

(2,2)

≥τ

!

2kTLexp

"

−m−kmax kTμκ

τ2 4(1 +3)

#

. (14)

Plugginginτ= 12 andusingthefactthatkT ≤αμκlog(nL)m andkmax≤γm,weget

The last line of (14) = 2kTLexp

"

3(1−γ)

64α log(nL)

#

= 2kTL(nL)3(1−γ)64α ,

2kTL(nL)−32(nL)−2

wherethefirstinequalityfollowsfrom 1−γα 64 andthesecondinequalityfollowsfrom kT ≤n.Similarly,plugging inτ = 1

2$

log(nL) and usingthefactthatkT ≤αμκlogm2(nL), weprove(12) aslongas 1αγ 64. 2

Corollary 2.Given that kT ≤αμκlog(nL)m , kmax≤γm,and 1−γα 64, then with proba- bilityatleast 12(nL)2,wehave

(13)

BLKdiag

"

PT

m

m−kmaxA˜(1)Σ−1(1)Σ−1(1)

PT,

· · ·, PT

m m−kmax

A˜(L)Σ(L)1 Σ(L)1

PT

#

(2,2)

< κ

2. (15)

Theproofisthesameasproving(11) usingLemma3.Weomitthedetailsforbrevity.

Finally, we havethe following lemma show thatif the supportof thecolumns inA(i) is restricted to Ωi,then no columnindexed insideT canbe well approximated by the columnindexedoutsideofT.Inotherwords,thosecolumnscorrespondtothetruesignal matrixshallbe welldistinguished.

Lemma 4 (Off-support incoherence). Denote A˜(i) = Σ−1(i)A(i)PΩiA(i). Given kT αμκlog(nL)m andα < 241,withprobability atleast1−e14(nL)2,wehave

i∈{1,···max,L},kTc

PTA˜(i)ek

21, (16)

where {ek}nk=1 isastandardbasisin Rn.

Theproof ofLemma4isgiveninAppendixB.

With particular note, inthe above lemmas and corollaries, all the requirements on theconstants α, β andγ satisfytheboundsinTheorem1.

3. Inexactdualcertificates

This sectiongivesthedualcertificatesoftheRGLmodel,namely,thesufficientcon- ditions under whichthe optimal solutionpair ofthe convex program (2)–(3) is unique andequaltothepairofthetruesignalanderrormatrices.Firstwehavetwopreliminary lemmas.

Lemma 5.Suppose that Y Rn×L and S Rm×L are the true group sparse signal and sparseerrormatrices,respectively.If (Y+H,SF)isanoptimalsolution pairto (2)–(3),where HRn×L andFRm×L,thenthefollowingresultshold:

i)

A(1)h1,· · ·,A(L)hL

=F;

ii) Y+H2,1+λSF1 Y2,1+λS1+PTcH2,1+λPΩcF1+V,H λsgn(S),F,

where V Rn×L satisfies (PTV)i = y¯y¯ii2 and (PTcV)i = 0, ∀i = 1,· · ·,n. Here (PTV)i denotes thei-throw ofPTVand ¯yi denotesthei-th rowof Y.

Theproof ofLemma5isgiveninAppendixC.

(14)

Lemma 6.For any two matrices H Rn×L and F Rm×L, with probability at least 12n−2,H=0andF=0ifthefollowingconditions aresatisfied:

i) kT ≤αμκlogm2(nL),kmax≤γm,and 1−γα 64;

ii)

A(1)h1,· · ·,A(L)hL

=F;

iii) PTcH=0and PΩcF=0.

TheproofofLemma6isgiveninAppendixD.

Belowwe showthattheoptimal solutionpair( ˆY, S)ˆ of theconvexprogram (2)–(3) isequaltothetruesignalandnoisepair(Y,S)whencertaincertificateconditionshold.

Theorem 2 (Inexact duality).Suppose that Y Rn×L and S Rm×L are the true groupsparse signalandsparse errormatrices satisfyingtheassumptions inTheorem1.

The pair (Y, S) is the unique solution to the RGL model (2)–(3) with probability at least 1(2+e14)(nL)1, if the parameter λ < 1 and there exists a dual certificate (W,V)Rm×L×Rn×L suchthat

PTVVF λ 4

κ, (17)

PTcV2, 1

4, (18)

PΩcW λ

4, (19)

and V=

A(1)PΩc1w1,· · ·,A(L)PΩcLwL +λ

A(1)sgn(¯s1),· · ·,A(L)sgn(¯sL)

, (20) whereVRn×L satisfies(PTV)i =¯yy¯ii2 and (PTcV)i=0.

Proof. Supposethat(Y+H,SF) isanoptimalsolutionpairto(2)–(3).Therefore,the two resultsinLemma5hold true. Proving thatthepair (Y,S)is the uniquesolution to (2)–(3) is equivalentto showing thatH=0andF=0. Hence,the proofresortsto verifyingthethree conditionsinLemma6.

SinceYandSsatisfytheassumptionsinTheorem 1,wehave kT ≤α m

μκlog2(nL), kmax≤γm

κ, α≤ 1

9600, γ≤1 4.

Consideringκ≥1,weknowthatconditioni)inLemma6holds.Byresulti)ofLemma5, conditionii)alsoholds.Therefore,itremainstoverifyconditioniii), namely,PTcH=0 andPΩcF=0.

(15)

ConsiderthetermV,H−λsgn(S),Finresultii)ofLemma5.Usingtheequation V=PTV+PTcV,we rewritethetermas

V,H −λsgn(S),F=V− PTV,H − PTcV,H+V,H −λsgn(S),F. (21) ConsiderthetermV,H−λsgn(S),Fontherighthandsideof(21).By(20),wehave

V,H −λsgn(S),F

=%

A(1)PΩc1w1,· · ·,A(L)PΩcLwL ,H&

+λ%

A(1)sgn(¯s1),· · ·,A(L)sgn(¯sL) ,H&

−λsgn(S),F. Byadjointrelationsofinnerproducts, wehave

%

A(1)PΩc1w1,· · ·,A(L)PΩcLwL

,H

&

='

PΩc1A(1)h1,· · ·,PΩcLA(L)hL

,W( , and

A(1)sgn(¯s1),· · ·,A(L)sgn(¯sL) ,H

=sgn(S),

A(1)h1,· · ·,A(L)hL

.

Thus, itholds

V,H −λsgn(S),F

='

PΩc1A(1)h1,· · ·,PΩcLA(L)hL ,W( +λ'

sgn(S),

A(1)h1,· · ·,A(L)hL

(−λsgn(S),F.

Accordingto conclusioni)inLemma5,whichis

A(1)h1,· · ·,A(L)hL

=F,itfollows V,H −λsgn(S,F)=PΩcF,W.

Combining (21) andaboveequalitygives

V,H −λsgn(S),F=V− PTV,H − PTcV,H+PΩcF,W. (22) Next, we manage to find outa lower bound for the right-hand side of the equality (22).First,by(17),

V− PTV,H ≥ −PTVVFPTHF ≥ − λ 4

κPTHF.

(16)

Then,by(18),

−PTcV,H ≥ −PTcV2,∞PTcH2,1≥ −1

4PTcH2,1. Finally,by(19),

PΩcF,W ≥ −PΩcF1PΩcW≥ −λ

4PΩcF1. Therefore,(22) gives

V,H −λsgn(S),F ≥ − λ 4

κPTHF1

4PTcH2,1−λ

4PΩcF1. Substitutetheaboveinequalityinto conclusionii) ofLemma5gives

Y+H2,1+λSF1

Y2,1+λS1+3

4PTcH2,1+3λ

4 PΩcF1 λ 4

κPTHF.

Since(Y+H,SF) isanoptimalsolutionpairto(2)–(3),itfollowsthatY+H2,1+ λSF1Y2,1+λS1.Hence,wehave

3

4PTcH2,1+3λ

4PΩcF1 λ 4

κPTHF 0. (23) Tocomplete theproof, we needto show thatinequality(23) impliesPTcH=0 and PΩcF=0.Todoso,wefirstderive anupperboundforPTHF,expressed asalinear combinationofPTcH2,1 andPΩcF1.

Using(11),itfollows BLKdiag

"

PT

m m−kmax

A˜(1)I

PT,· · ·,PT

m m−kmax

A˜(L)I

PT

#

vec(H) 2

1

2PTHF. Since BLKdiag{PT,· · ·,PT}vec(H)2 = PTHF, applying the triangle inequality yields

PTHF 2 m

m−kmax

BLKdiag

PTA˜(1)PT,· · ·,PTA˜(L)PT

vec(H) 2

.

Observingvec(PTH)= vec(H)vec(PTcH) andusingthetriangleinequalityagain,we have

Références

Documents relatifs

In this paper, we survey and compare different algorithms that, given an over- complete dictionary of elementary functions, solve the problem of simultaneous sparse

To estimate the support of the parameter vector β 0 in the sparse corruption problem, we study an extension of the Lasso-Zero methodology [Descloux and Sardy, 2018],

In the first scenario, we focus on the mean estimation, m, by comparing the ECME proposed algorithm with the minimum covariance determinant [16], the Marona’s or-

The paper is organized as follows: we show in Section 2 that if the size of the dictionary is not bounded, then dictionary learning may be naturally cast as a convex

In this section, we present experiments on natural images and genomic data to demonstrate the efficiency of our method for dictionary learning, non-negative matrix factorization,

Pr. BOUHAFS Mohamed El Amine Chirurgie - Pédiatrique Pr. BOULAHYA Abdellatif* Chirurgie Cardio – Vasculaire Pr. CHENGUETI ANSARI Anas Gynécologie Obstétrique. Pr. DOGHMI Nawal

We investigate bipartite matching algorithms for com- puting column permutations of a sparse matrix to achieve two objectives: (i) the main diagonal of the permuted matrix has

Thus, we obtain three variants of the problem of finding an optimal sequential partitioning of a matrix: Given A, L, U , and possibly K, find a (K-way if K is given) partitioning