Robust group lasso: Model and recoverability


Linear Algebra and its Applications

www.elsevier.com/locate/laa


Xiaohan Wei^a,∗, Qing Ling^b, Zhu Han^c

a Department of Electrical Engineering, University of Southern California, United States of America

b School of Data and Computer Science, Sun Yat-Sen University, China

c Department of Electrical and Computer Engineering, University of Houston, United States of America

Article info

Article history: Received 19 November 2016. Accepted 30 June 2018. Available online xxxx. Submitted by H. Rauhut.

MSC: 60D05, 94A12

Keywords: Compressed sensing, Group sparsity, Sparse noise, Restricted isometry property, Golfing scheme

Abstract

This paper considers the problem of recovering a group sparse signal matrix $\mathbf{Y} = [\mathbf{y}_1,\cdots,\mathbf{y}_L]$ from sparsely corrupted measurements $\mathbf{M} = [\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L] + \mathbf{S}$, where the $\mathbf{A}_{(i)}$'s are known sensing matrices and $\mathbf{S}$ is an unknown sparse error matrix. A robust group lasso (RGL) model is proposed to recover $\mathbf{Y}$ and $\mathbf{S}$ through simultaneously minimizing the $\ell_{2,1}$-norm of $\mathbf{Y}$ and the $\ell_1$-norm of $\mathbf{S}$ under the measurement constraints. We prove that $\mathbf{Y}$ and $\mathbf{S}$ can be exactly recovered from the RGL model with high probability for a very general class of $\mathbf{A}_{(i)}$'s.

✩ Qing Ling's research is partially supported by National Science Foundation of China under Grant 61573331 and by the National Science Foundation Anhui under Grant 1608085QF130. Zhu Han's research is partially supported by US MURI, NSF CNS-1717454, CNS-1731424, CNS-1702850, CNS-1646607.

* Corresponding author.

E-mail addresses: [email protected] (X. Wei), [email protected] (Q. Ling), [email protected] (Z. Han).

https://doi.org/10.1016/j.laa.2018.06.034


1. Introduction

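Consider the problem of recovering a group sparse matrix $\mathbf{Y} = [\mathbf{y}_1,\cdots,\mathbf{y}_L] \in \mathbb{R}^{n\times L}$ from sparsely corrupted measurements

$$\mathbf{M} = [\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L] + \mathbf{S}, \tag{1}$$

where $\mathbf{M} = [\mathbf{m}_1,\cdots,\mathbf{m}_L] \in \mathbb{R}^{m\times L}$ is a measurement matrix, $\mathbf{A}_{(i)} \in \mathbb{R}^{m\times n}$ is the $i$-th sensing matrix, and $\mathbf{S} = [\mathbf{s}_1,\cdots,\mathbf{s}_L] \in \mathbb{R}^{m\times L}$ is an unknown sparse error matrix. The error matrix $\mathbf{S}$ is sparse as it has only a small number of nonzero entries. The signal matrix $\mathbf{Y}$ is group sparse, meaning that $\mathbf{Y}$ is sparse and its nonzero entries appear in a small number of common rows.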

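Given $\mathbf{M}$ and the $\mathbf{A}_{(i)}$'s, our goal is to recover $\mathbf{Y}$ and $\mathbf{S}$ from the linear measurement equation (1). In this paper, we propose to accomplish the recovery task through solving the following robust group lasso (RGL) model

$$\min_{\mathbf{Y},\mathbf{S}} \ \|\mathbf{Y}\|_{2,1} + \lambda\|\mathbf{S}\|_1, \tag{2}$$

$$\text{s.t.} \ \ \mathbf{M} = [\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L] + \mathbf{S}. \tag{3}$$

Denoting $y_{ij}$ and $s_{ij}$ as the $(i,j)$-th entries of $\mathbf{Y}$ and $\mathbf{S}$, respectively, $\|\mathbf{Y}\|_{2,1} \triangleq \sum_{i=1}^{n}\sqrt{\sum_{j=1}^{L} y_{ij}^2}$ is defined as the $\ell_{2,1}$-norm of $\mathbf{Y}$ and $\|\mathbf{S}\|_1 \triangleq \sum_{i=1}^{m}\sum_{j=1}^{L}|s_{ij}|$ is defined as the $\ell_1$-norm of $\mathbf{S}$. Minimizing the $\ell_{2,1}$-norm term promotes group sparsity of $\mathbf{Y}$ while minimizing the $\ell_1$-norm term promotes sparsity of $\mathbf{S}$; $\lambda$ is a nonnegative parameter to balance the two terms. We prove that solving the RGL model (2)–(3), which is a convex program, enables exact recovery of $\mathbf{Y}$ and $\mathbf{S}$ with high probability, given that the $\mathbf{A}_{(i)}$'s satisfy certain conditions.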
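To make the measurement model (1) and the objective (2) concrete, the following is a minimal NumPy sketch. The function names, the Gaussian sensing matrices, and the toy dimensions are illustrative assumptions, not part of the paper.

```python
import numpy as np

def l21_norm(Y):
    """Sum of the Euclidean norms of the rows of Y."""
    return float(np.sum(np.sqrt(np.sum(Y ** 2, axis=1))))

def l1_norm(S):
    """Sum of the absolute values of all entries of S."""
    return float(np.sum(np.abs(S)))

def measure(A_list, Y, S):
    """Form M = [A_(1) y_1, ..., A_(L) y_L] + S as in (1)."""
    cols = [A @ Y[:, i] for i, A in enumerate(A_list)]
    return np.column_stack(cols) + S

def rgl_objective(Y, S, lam):
    """Objective of (2): ||Y||_{2,1} + lam * ||S||_1."""
    return l21_norm(Y) + lam * l1_norm(S)

# Hypothetical toy instance: n = 6 signal rows, L = 3 columns, m = 4 measurements.
rng = np.random.default_rng(0)
n, L, m = 6, 3, 4
Y = np.zeros((n, L))
Y[[1, 4], :] = rng.standard_normal((2, L))   # two nonzero rows: 2-group sparse
S = np.zeros((m, L))
S[2, 0] = 10.0                               # one large, isolated corruption
A_list = [rng.standard_normal((m, n)) / np.sqrt(m) for _ in range(L)]
M = measure(A_list, Y, S)
print(M.shape, rgl_objective(Y, S, lam=0.5))
```

The sketch only evaluates the model; it does not solve the convex program (2)–(3).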

1.1. From group lasso to robust group lasso

Sparse signal recovery has attracted much research interest in the signal processing and optimization communities during the past few years. Various sparsity models have been proposed to better exploit the sparse structures of high-dimensional data, such as sparsity of a vector [1], [2], group sparsity of vectors [3], and low-rankness of a matrix [4]. For more topics related to sparse signal recovery, readers are referred to the recent survey paper [5].

In this paper we are interested in the recovery of group sparse (also known as block sparse [6] or jointly sparse [7]) signals, which finds a variety of applications such as direction-of-arrival estimation [8], [9], collaborative spectrum sensing [10–12] and motion detection [13]. A signal matrix $\mathbf{Y} = [\mathbf{y}_1,\cdots,\mathbf{y}_L] \in \mathbb{R}^{n\times L}$ is called $k$-group sparse if $k$ rows of $\mathbf{Y}$ are nonzero. A measurement matrix $\mathbf{M} = [\mathbf{m}_1,\cdots,\mathbf{m}_L] \in \mathbb{R}^{m\times L}$ is taken from linear projections $\mathbf{m}_i = \mathbf{A}_{(i)}\mathbf{y}_i$, $i = 1,\cdots,L$, where $\mathbf{A}_{(i)} \in \mathbb{R}^{m\times n}$ is a sensing matrix. In order to recover $\mathbf{Y}$ from the $\mathbf{A}_{(i)}$'s and $\mathbf{M}$, the standard $\ell_{2,1}$-norm minimization formulation proposes to solve a convex program

$$\min_{\mathbf{Y}} \ \|\mathbf{Y}\|_{2,1}, \quad \text{s.t.} \ \ \mathbf{M} = [\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L]. \tag{4}$$

This is a straightforward extension from the canonical $\ell_1$-norm minimization formulation that recovers a sparse vector. A theoretical guarantee of exact recovery has been developed based on the restricted isometry property (RIP) of the $\mathbf{A}_{(i)}$'s [14]. The performance of exploiting more structures of $\mathbf{Y}$ by simultaneously minimizing the $\ell_{2,1}$-norm and the nuclear norm is analyzed in [15].

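Consider that in practice the measurements are often corrupted by random noise, resulting in $\mathbf{M} = [\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L] + \mathbf{N}$ where $\mathbf{N} = [\mathbf{n}_1,\cdots,\mathbf{n}_L] \in \mathbb{R}^{m\times L}$ is a noise matrix. To address the noise-corrupted case, the group lasso model in [3] solves

$$\min_{\mathbf{Y},\mathbf{N}} \ \|\mathbf{Y}\|_{2,1} + \gamma\|\mathbf{N}\|_F^2, \quad \text{s.t.} \ \ \mathbf{M} = [\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L] + \mathbf{N}, \tag{5}$$

where $\gamma$ is a nonnegative parameter and $\|\mathbf{N}\|_F$ is the Frobenius norm of $\mathbf{N}$. An alternative to (5) is

$$\min_{\mathbf{Y}} \ \|\mathbf{Y}\|_{2,1}, \quad \text{s.t.} \ \ \|\mathbf{M} - [\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L]\|_F^2 \le \varepsilon^2, \tag{6}$$

where $\varepsilon$ controls the noise level. It has been shown in [14] that if the sensing matrices $\mathbf{A}_{(i)}$'s satisfy RIP, then the distance between the solution to (6) and the true signal matrix, measured by the Frobenius norm, is within a constant multiple of $\varepsilon$.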

The exact recovery guarantee for (6) is elegant, but works only if the noise level $\varepsilon$ is not too large. However, in many practical applications, some of the measurements may be seriously contaminated or even missing due to uncertainties such as sensor failures and transmission errors. Meanwhile, measurement errors of this kind are often sparse (see [16] for detailed discussions). In this case, the exact recovery guarantee does not hold and the solution of (6) can be far away from the true signal matrix.

The need to handle large but sparse measurement errors in the group sparse signal recovery problem motivates the RGL model (2)–(3), which has found successful applications in, for example, the cognitive network sensing problem [16]. Efficient decentralized algorithms solving (2)–(3) are proposed in [17] and their performance is evaluated via extensive simulations. In (2)–(3), the measurement matrix $\mathbf{M}$ is contaminated by a sparse error matrix $\mathbf{S} = [\mathbf{s}_1,\cdots,\mathbf{s}_L] \in \mathbb{R}^{m\times L}$ whose nonzero entries might be unbounded. Through simultaneously minimizing the $\ell_{2,1}$-norm of $\mathbf{Y}$ and the $\ell_1$-norm of $\mathbf{S}$, we expect to recover the group sparse signal matrix $\mathbf{Y}$ and the sparse error matrix $\mathbf{S}$.
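One way to see why the two norms promote group sparsity and entrywise sparsity is through their proximal operators, which many first-order solvers rely on. The sketch below uses the standard closed forms (row-wise and entrywise soft-thresholding). It is an illustration only: the paper does not prescribe these operators, and the function names are hypothetical.

```python
import numpy as np

def prox_l21(Y, t):
    """Row-wise soft-thresholding: shrink each row's Euclidean norm by t.

    Rows whose norm is at most t are set to zero as a whole, which is
    the mechanism behind group (row) sparsity.
    """
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * Y

def prox_l1(S, t):
    """Entrywise soft-thresholding: shrink each magnitude by t."""
    return np.sign(S) * np.maximum(np.abs(S) - t, 0.0)

Y = np.array([[3.0, 4.0], [0.3, 0.4]])    # row norms 5 and 0.5
S = np.array([[2.0, -0.5], [-3.0, 0.0]])
print(prox_l21(Y, 1.0))   # first row scaled by 0.8, second row zeroed
print(prox_l1(S, 1.0))    # small entries zeroed, large ones shrunk by 1
```

Note that the row-wise operator removes an entire row at once, while the entrywise operator can zero individual corruptions without touching the rest of the column.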

The RGL model (2)–(3) is closely related to robust lasso and robust principal component analysis (RPCA), both of which have been proved effective in recovering the true signal from sparse gross corruptions. The robust lasso model, which has been discussed extensively in [18], [19], [20], minimizes the $\ell_1$-norm of a sparse signal vector and the $\ell_1$-norm of a sparse error vector simultaneously in order to get rid of sparse corruptions. The RPCA model, which was first proposed in [21] and then extended by [22] and [23], recovers a low rank matrix by minimizing the nuclear norm of the signal matrix plus the $\ell_1$-norm of the sparse error matrix.

1.2. Contribution and paper organization

This paper proves that with high probability, the proposed RGL model (2)–(3) exactly recovers the group sparse signal matrix and the sparse error matrix simultaneously under certain restrictions on measurements for a very general class of sample matrices.

The rest of this paper is organized as follows. Section 2 provides the main result (see Theorem 1) on the recoverability of the RGL model (2)–(3) under assumptions on the sensing matrices and the true signal and error matrices (see Assumptions 1–4). Section 2 also introduces several supporting lemmas and corollaries (see Lemmas 1–4 and Corollaries 1–2). Section 3 gives the dual certificate of (2)–(3), which is a sufficient condition guaranteeing the exact recovery from the RGL model with high probability. The proofs are based on two supporting lemmas (see Lemmas 5–6). Section 4 proves that the inexact dual certificate of (2)–(3) can be satisfied in a constructive manner (see Theorem 3 and Lemma 7). This way, we prove the main result given in Section 2. Section 5 concludes the paper.

1.3. Notations

We introduce several notations that are used in the subsequent sections. Bold uppercase letters denote matrices, whereas bold lowercase letters with subscripts and superscripts stand for column vectors and row vectors, respectively. For a matrix $\mathbf{U}$, we denote $\mathbf{u}_i$ as its $i$-th column, $\mathbf{u}^i$ as its $i$-th row, and $u_{ij}$ as its $(i,j)$-th element. For a given vector $\mathbf{u}$, we denote $u_i$ as its $i$-th element. The notations $\{\mathbf{U}_{(i)}\}$ and $\{\mathbf{u}_{(i)}\}$ denote the families of matrices and vectors indexed by $i$, respectively. The notations $\{\mathbf{U}_{(i,j)}\}$ and $\{\mathbf{u}_{(i,j)}\}$ denote the families of matrices and vectors indexed by $(i,j)$, respectively. $\mathrm{vec}(\cdot)$ is the vectorizing operator that stacks the columns of a matrix one after another. $\{\cdot\}^T$ denotes the transpose operator. $\mathrm{diag}\{\cdot\}$ represents a diagonal matrix and $\mathrm{BLKdiag}\{\cdot\}$ represents a block diagonal matrix. The notation $\langle\cdot,\cdot\rangle$ denotes the inner product, including when it is applied to two matrices $\mathbf{U}$ and $\mathbf{V}$. $\mathrm{sgn}(\mathbf{u})$ and $\mathrm{sgn}(\mathbf{U})$ are the sign vector and sign matrix for $\mathbf{u}$ and $\mathbf{U}$, respectively.

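Additionally, we use several standard matrix and vector norms. For a vector $\mathbf{u} \in \mathbb{R}^n$, define

• $\ell_2$-norm: $\|\mathbf{u}\|_2 \triangleq \sqrt{\sum_{j=1}^{n} u_j^2}$.

• $\ell_1$-norm: $\|\mathbf{u}\|_1 \triangleq \sum_{j=1}^{n} |u_j|$.

For a matrix $\mathbf{U} \in \mathbb{R}^{m\times n}$, define

• $\ell_{2,1}$-norm: $\|\mathbf{U}\|_{2,1} \triangleq \sum_{i=1}^{m}\sqrt{\sum_{j=1}^{n} u_{ij}^2}$.

• $\ell_{2,\infty}$-norm: $\|\mathbf{U}\|_{2,\infty} \triangleq \max_i \sqrt{\sum_{j=1}^{n} u_{ij}^2}$.

• $\ell_1$-norm: $\|\mathbf{U}\|_1 \triangleq \sum_{i=1}^{m}\sum_{j=1}^{n} |u_{ij}|$.

• Frobenius norm: $\|\mathbf{U}\|_F \triangleq \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} u_{ij}^2}$.

• $\ell_\infty$-norm: $\|\mathbf{U}\|_\infty \triangleq \max_{i,j} |u_{ij}|$.

Also, we use the notation $\|\mathbf{U}\|_{(p,q)}$ to denote induced norms, which stands for

$$\|\mathbf{U}\|_{(p,q)} = \max_{\mathbf{x}\in\mathbb{R}^n} \frac{\|\mathbf{U}\mathbf{x}\|_p}{\|\mathbf{x}\|_q}.$$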
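The matrix norms defined above translate directly into NumPy. The sketch below is an illustration with hypothetical names. It evaluates all of them on a small matrix so the definitions can be checked by hand.

```python
import numpy as np

def matrix_norms(U):
    """Return the matrix norms of this section for a 2-D array U."""
    row_norms = np.linalg.norm(U, axis=1)           # Euclidean norm of each row
    return {
        "l21": float(row_norms.sum()),              # sum of row norms
        "l2inf": float(row_norms.max()),            # largest row norm
        "l1": float(np.abs(U).sum()),               # sum of absolute entries
        "fro": float(np.linalg.norm(U, "fro")),     # Frobenius norm
        "linf": float(np.abs(U).max()),             # largest absolute entry
        "induced_22": float(np.linalg.norm(U, 2)),  # (2,2) norm: top singular value
    }

U = np.array([[3.0, 4.0], [0.0, -1.0]])   # row norms are 5 and 1
norms = matrix_norms(U)
for name, value in norms.items():
    print(f"{name}: {value:.4f}")
```

For this matrix the row norms are 5 and 1, so the $\ell_{2,1}$-norm is 6 and the $\ell_{2,\infty}$-norm is 5. In general the chains $\|\mathbf{U}\|_\infty \le \|\mathbf{U}\|_{2,\infty} \le \|\mathbf{U}\|_F \le \|\mathbf{U}\|_{2,1} \le \|\mathbf{U}\|_1$ and $\|\mathbf{U}\|_{(2,2)} \le \|\mathbf{U}\|_F$ hold.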

For the signal matrix $\mathbf{Y} \in \mathbb{R}^{n\times L}$ and noise matrix $\mathbf{S} \in \mathbb{R}^{m\times L}$, we use the following set notations throughout the paper.

• $T$: The row group support (namely, the set of row coordinates corresponding to the nonzero rows of the signal matrix), whose cardinality is denoted as $k_T = |T|$.

• $T^c$: The complement of $T$ (namely, $\{1,\cdots,n\}\setminus T$).

• $\Omega$: The support of the error matrix (namely, the set of coordinates corresponding to the nonzero elements of the error matrix), whose cardinality is denoted as $k_\Omega = |\Omega|$.

• $\Omega^c$: The complement of $\Omega$ (namely, $\{1,\cdots,m\}\times\{1,\cdots,L\}\setminus\Omega$).

• $\Omega_i$: The support of the $i$-th column of the error matrix, whose cardinality is denoted as $k_{\Omega_i} = |\Omega_i|$.

• $\Omega_i^c$: The complement of $\Omega_i$ (namely, $\{1,\cdots,m\}\setminus\Omega_i$).

• $\Omega_i^*$: An arbitrary fixed subset of $\Omega_i^c$ with cardinality $m - k_{\max}$, where $k_{\max} = \max_i k_{\Omega_i}$. Intuitively, $\Omega_i^*$ stands for the maximal non-corrupted set across different $i \in \{1,\cdots,L\}$.

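For any given matrices $\mathbf{U} \in \mathbb{R}^{m\times L}$, $\mathbf{V} \in \mathbb{R}^{n\times L}$ and given vectors $\mathbf{u} \in \mathbb{R}^m$, $\mathbf{v} \in \mathbb{R}^n$, define orthogonal projection operators as follows.

• $\mathcal{P}_\Omega\mathbf{U}$: The orthogonal projection of matrix $\mathbf{U}$ onto $\Omega$ (namely, set every entry of $\mathbf{U}$ whose coordinate belongs to $\Omega^c$ to 0 while keeping the other entries unchanged).

• $\mathcal{P}_{\Omega_i}\mathbf{u}$, $\mathcal{P}_{\Omega_i^c}\mathbf{u}$, $\mathcal{P}_{\Omega_i^*}\mathbf{u}$: The orthogonal projections of $\mathbf{u}$ onto $\Omega_i$, $\Omega_i^c$, and $\Omega_i^*$, respectively.

• $\mathcal{P}_T\mathbf{v}$: The orthogonal projection of $\mathbf{v}$ onto $T$.

• $\mathcal{P}_{\Omega_i}\mathbf{U}$, $\mathcal{P}_{\Omega_i^c}\mathbf{U}$, and $\mathcal{P}_{\Omega_i^*}\mathbf{U}$: The orthogonal projections of each column of $\mathbf{U}$ onto $\Omega_i$, $\Omega_i^c$, and $\Omega_i^*$, respectively (namely, $\mathcal{P}_{\Omega_i}\mathbf{U} = [\mathcal{P}_{\Omega_i}\mathbf{u}_1,\cdots,\mathcal{P}_{\Omega_i}\mathbf{u}_L]$, $\mathcal{P}_{\Omega_i^c}\mathbf{U} = [\mathcal{P}_{\Omega_i^c}\mathbf{u}_1,\cdots,\mathcal{P}_{\Omega_i^c}\mathbf{u}_L]$ and $\mathcal{P}_{\Omega_i^*}\mathbf{U} = [\mathcal{P}_{\Omega_i^*}\mathbf{u}_1,\cdots,\mathcal{P}_{\Omega_i^*}\mathbf{u}_L]$).

• $\mathcal{P}_T\mathbf{V}$: The orthogonal projection of each column of $\mathbf{V}$ onto $T$.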
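The supports and projections above amount to index selection and masking. The following sketch, with hypothetical helper names, extracts $T$, $\Omega$, $k_T$, $k_\Omega$ and $k_{\max}$ from small example matrices and applies $\mathcal{P}_T$ and $\mathcal{P}_\Omega$.

```python
import numpy as np

def row_support(Y):
    """T: indices of the nonzero rows of Y."""
    return np.flatnonzero(np.any(Y != 0, axis=1))

def entry_support(S):
    """Omega, represented as a boolean mask of the nonzero entries of S."""
    return S != 0

def project_rows(V, T):
    """P_T V: keep the rows indexed by T and zero out all other rows."""
    out = np.zeros_like(V)
    out[T, :] = V[T, :]
    return out

def project_entries(U, mask):
    """P_Omega U: keep entries where mask is True and zero out the rest."""
    return np.where(mask, U, 0.0)

Y = np.array([[0.0, 0.0], [1.0, 2.0], [0.0, 0.0]])
S = np.array([[0.0, 5.0], [0.0, 0.0], [-1.0, 7.0]])

T = row_support(Y)
Omega = entry_support(S)
k_T = T.size                      # number of nonzero rows of Y
k_Omega = int(Omega.sum())        # number of nonzero entries of S
k_max = int(Omega.sum(axis=0).max())   # largest per-column error count

U = np.arange(6, dtype=float).reshape(3, 2)
print(T, k_T, k_Omega, k_max)
print(project_entries(U, Omega) + project_entries(U, ~Omega))  # recovers U
```

Because the projections are complementary masks, $\mathcal{P}_\Omega\mathbf{U} + \mathcal{P}_{\Omega^c}\mathbf{U} = \mathbf{U}$, and applying the same projection twice changes nothing.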

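Furthermore, we admit a notational convention that for any projection operator $\mathcal{P}$ and corresponding matrix $\mathbf{U}$ (or vector $\mathbf{u}$), it holds

$$\mathbf{U}^T\mathcal{P} = (\mathcal{P}\mathbf{U})^T \quad (\text{or } \mathbf{u}^T\mathcal{P} = (\mathcal{P}\mathbf{u})^T).$$

Finally, by saying an event occurs with high probability, we mean that the occurring probability of the event is at least $1 - C(nL)^{-1}$ where $C$ is a constant.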

2. Main result of exact recovery

This section provides the theoretical performance guarantee of the RGL model (2)–(3). Section 2.1 makes several assumptions under which (2)–(3) recovers the true group sparse signal and sparse error matrices with high probability. The main result is summarized in Theorem 1. Section 2.2 discusses several related results and applications. Section 2.3 gives several probability tools that are useful in the proof of the main result.

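2.1. Assumptions and main result

We start from several assumptions on the sensing matrices, as well as the true group sparse signal and sparse error matrices.

Consider $L$ distributions $\{\mathcal{F}_i\}_{i=1}^{L}$ in $\mathbb{R}^n$ and an independently sampled vector $\mathbf{a}_{(i)}$ from each $\mathcal{F}_i$. The correlation matrix is defined as

$$\boldsymbol{\Sigma}_{(i)} = \mathbb{E}\left[\mathbf{a}_{(i)}\mathbf{a}_{(i)}^T\right],$$

and the corresponding condition number is bounded as follows

$$\frac{\lambda_{\max}\{\boldsymbol{\Sigma}_{(i)}\}}{\lambda_{\min}\{\boldsymbol{\Sigma}_{(i)}\}} \le \kappa, \quad \forall i \in \{1,2,\cdots,L\},$$

where $\kappa$ is a positive constant and $\lambda_{\max}\{\cdot\}$, $\lambda_{\min}\{\cdot\}$ denote the largest and smallest eigenvalues of a matrix, respectively. Observe that this condition number is finite if and only if the correlation matrix is invertible, and is larger than or equal to 1 in any case.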

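Assumption 1. For $i = 1,\cdots,L$, define the $i$-th sensing matrix as

$$\mathbf{A}_{(i)} \triangleq \frac{1}{\sqrt{m}}\begin{pmatrix}\mathbf{a}_{(i)1}^T\\ \vdots\\ \mathbf{a}_{(i)m}^T\end{pmatrix} \in \mathbb{R}^{m\times n}.$$

Therein, $\{\mathbf{a}_{(i)1},\cdots,\mathbf{a}_{(i)m}\}$ is assumed to be a sequence of i.i.d. random vectors drawn from the distribution $\mathcal{F}_i$ in $\mathbb{R}^n$.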
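The construction in Assumption 1 can be sketched as follows. The random-sign distribution used here is only an assumed example of $\mathcal{F}_i$ (for it, the correlation matrix is the identity and every entry has magnitude 1), and the helper names are hypothetical.

```python
import numpy as np

def sample_sensing_matrix(sampler, m, rng):
    """Stack m i.i.d. draws from `sampler` as rows and scale by 1/sqrt(m)."""
    rows = np.stack([sampler(rng) for _ in range(m)])
    return rows / np.sqrt(m)

def rademacher(n):
    """Assumed example distribution: i.i.d. +1/-1 entries, so Sigma = I."""
    return lambda rng: rng.choice([-1.0, 1.0], size=n)

rng = np.random.default_rng(0)
n, m = 5, 4000
A = sample_sensing_matrix(rademacher(n), m, rng)

# With the 1/sqrt(m) scaling, A^T A is the empirical average of a a^T,
# so it should be close to the correlation matrix Sigma (here the identity).
gram = A.T @ A
print(np.round(gram, 2))
```

The $1/\sqrt{m}$ factor is what makes $\mathbf{A}_{(i)}^T\mathbf{A}_{(i)}$ an average rather than a sum of the outer products.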

By Assumption 1, we suppose that every sensing matrix $\mathbf{A}_{(i)}$ is randomly sampled from a corresponding distribution $\mathcal{F}_i$. We proceed to assume the properties of the distributions $\{\mathcal{F}_i\}_{i=1}^{L}$.

Assumption 2. For $i = 1,\cdots,L$, every distribution $\mathcal{F}_i$ satisfies the following two properties.

• Completeness: The correlation matrix $\boldsymbol{\Sigma}_{(i)}$ is invertible.

• Incoherence: Each sensing vector $\mathbf{a}_{(i)}$ sampled from $\mathcal{F}_i$ satisfies

$$\max_{k\in\{1,\cdots,n\}} |\langle\mathbf{a}_{(i)},\mathbf{e}_k\rangle| \le \sqrt{\mu}, \tag{7}$$

$$\max_{k\in\{1,\cdots,n\}} |\langle\boldsymbol{\Sigma}_{(i)}^{-1}\mathbf{a}_{(i)},\mathbf{e}_k\rangle| \le \sqrt{\mu}, \tag{8}$$

for some fixed constant $\mu \ge 1$, where $\{\mathbf{e}_k\}_{k=1}^{n}$ is the standard basis in $\mathbb{R}^n$.

We call $\mu$ the incoherence parameter. Note that this incoherence condition is stronger than the one originally presented in [26], which does not require (8). If one wants to get rid of (8), then some other restrictions must be imposed on the sensing matrices (see [27] for related results).

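Observe that the bounds (7) and (8) in Assumption 2 are meaningless unless we fix the scale of $\mathbf{a}_{(i)}$. Thus, we have the following assumption.

Assumption 3. The correlation matrix $\boldsymbol{\Sigma}_{(i)}$ satisfies

$$\lambda_{\max}\{\boldsymbol{\Sigma}_{(i)}\} = \lambda_{\min}\{\boldsymbol{\Sigma}_{(i)}\}^{-1}, \tag{9}$$

for any $\mathcal{F}_i$, $i = 1,\cdots,L$.

Given any complete $\mathcal{F}_i$, (9) can always be achieved by scaling $\mathbf{a}_{(i)}$ up or down. This is true because if we scale $\mathbf{a}_{(i)}$ up, then $\lambda_{\max}\{\boldsymbol{\Sigma}_{(i)}\}$ increases and $\lambda_{\min}\{\boldsymbol{\Sigma}_{(i)}\}^{-1}$ decreases. Observe that the optimization problem (2)–(3) is invariant under scaling. Thus, Assumption 3 does not pose any extra constraint.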
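The rescaling argument can be made explicit. Scaling $\mathbf{a}_{(i)}$ by $c$ scales the correlation matrix by $c^2$, so (9) requires $c^2\lambda_{\max} = 1/(c^2\lambda_{\min})$, that is $c = (\lambda_{\max}\lambda_{\min})^{-1/4}$. The sketch below works through this on an assumed diagonal example. The function name is hypothetical.

```python
import numpy as np

def normalizing_scale(Sigma):
    """Return c so that c * a has a correlation matrix satisfying (9)."""
    eig = np.linalg.eigvalsh(Sigma)       # eigenvalues in ascending order
    lam_min, lam_max = eig[0], eig[-1]
    return (lam_max * lam_min) ** -0.25

Sigma = np.diag([4.0, 9.0])               # assumed example, condition number 9/4
c = normalizing_scale(Sigma)
Sigma_scaled = c ** 2 * Sigma             # correlation matrix of the scaled vector

eig = np.linalg.eigvalsh(Sigma_scaled)
print(eig[-1], 1.0 / eig[0])              # the two sides of (9)
print(eig[-1] / eig[0])                   # condition number is unchanged
```

The condition number is unaffected by the scaling, which is consistent with the remark that Assumption 3 poses no extra constraint.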

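Denote $\bar{\mathbf{Y}}$ and $\bar{\mathbf{S}}$ as the true group sparse signal and sparse error matrices to recover, respectively. The row group support of $\bar{\mathbf{Y}}$ and the support of $\bar{\mathbf{S}}$ are fixed and denoted as $T$ and $\Omega$, respectively.

The assumption on $\bar{\mathbf{Y}}$ and $\bar{\mathbf{S}}$ is given as follows.

Assumption 4. The true signal matrix $\bar{\mathbf{Y}}$ and error matrix $\bar{\mathbf{S}}$ satisfy the following two properties.

• Random sign: The signs of the elements of $\bar{\mathbf{Y}}$ and $\bar{\mathbf{S}}$ are i.i.d. and equally likely to be $+1$ or $-1$.

• Row incoherence: Let $\bar{\mathbf{y}}^i$ be the $i$-th row of $\bar{\mathbf{Y}}$. Then, there exists a fixed constant $\nu \ge 1$ so that

$$\max_{l\in\{1,2,\cdots,L\}} \left|\left\langle\frac{\bar{\mathbf{y}}^i}{\|\bar{\mathbf{y}}^i\|_2},\mathbf{e}_l\right\rangle\right| \le \sqrt{\frac{\nu}{L}},$$

for any $i \in T$, where $\{\mathbf{e}_l\}_{l=1}^{L}$ is the standard basis in $\mathbb{R}^L$.

Under the assumptions stated above, we have the following main theorem on the recoverability of the RGL model (2)–(3).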

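Theorem 1. Let $\theta \in [1,L]$ be a tradeoff parameter. Under Assumptions 1–4, the solution pair $(\hat{\mathbf{Y}},\hat{\mathbf{S}})$ to the optimization problem (2)–(3) is exact and unique with probability at least $1 - (16 + 2e^{1/4})(nL)^{-1}$, provided that

$$\lambda = \sqrt{\frac{\theta}{L\log(nL)}}, \quad k_T \le \alpha\frac{m}{\max\left\{\kappa,\frac{\nu}{\theta}\right\}\mu\log^2(nL)}, \quad k_\Omega \le \beta\frac{mL}{\mu\theta}, \quad k_{\max} \le \gamma\frac{m}{\kappa}, \tag{10}$$

where $k_{\max} \triangleq \max_i k_{\Omega_i}$, and $\alpha \le \frac{1}{9600}$, $\beta \le \frac{1}{3136}$, $\gamma \le \frac{1}{4}$ are all positive constants.¹

Note that from (10), the constant $\theta$ is indeed a trade-off parameter. Choosing $\theta$ large relaxes the constraint on row supports but restricts the number of sparse errors, and vice versa.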
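The trade-off governed by $\theta$ can be seen by evaluating the three sparsity budgets of (10) numerically. The sketch below is an illustration under assumptions: the bounds are as read from (10) with the largest admissible constants $\alpha = 1/9600$, $\beta = 1/3136$, $\gamma = 1/4$, and the function name and parameter values are hypothetical.

```python
import math

def rgl_sparsity_budgets(m, n, L, mu, kappa, nu, theta,
                         alpha=1 / 9600, beta=1 / 3136, gamma=1 / 4):
    """Upper bounds on k_T, k_Omega and k_max as read from (10)."""
    log_sq = math.log(n * L) ** 2
    k_T = alpha * m / (max(kappa, nu / theta) * mu * log_sq)
    k_Omega = beta * m * L / (mu * theta)
    k_max = gamma * m / kappa
    return k_T, k_Omega, k_max

# Assumed setting: identity-like sensing (mu = kappa = 1) and nu = L = 10.
params = dict(m=10**6, n=1000, L=10, mu=1.0, kappa=1.0, nu=10.0)
for theta in (1.0, 10.0):
    k_T, k_Omega, k_max = rgl_sparsity_budgets(theta=theta, **params)
    print(f"theta={theta}: k_T<={k_T:.3f}, k_Omega<={k_Omega:.1f}, k_max<={k_max:.0f}")
```

Increasing $\theta$ from 1 to $L$ enlarges the row-support budget whenever $\nu/\theta$ exceeds $\kappa$, and shrinks the error budget $k_\Omega$ by the same factor. The small constants make the budgets conservative, so they are meaningful mainly as scaling laws.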

The incoherence condition of sensing vectors (Assumption 2) is common in the lasso and robust lasso literature, which implies that the columns of $[\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L]$ are not aligned with sparse vectors. On the other hand, the row incoherence (Assumption 4) is unique in the current group lasso context. It indicates that the rows of $[\mathbf{A}_{(1)}\mathbf{y}_1,\cdots,\mathbf{A}_{(L)}\mathbf{y}_L]$ are not aligned with sparse vectors either. Note that this condition is trivial when $\nu = L$, in which case choosing $\theta = L$ in Theorem 1 gives the measurement bounds $mL \ge \mathcal{O}(k_T L\log^2(nL))$ and $m \ge \mathcal{O}(k_\Omega)$. In general when $L$ is large and the rows of the true signal matrix $\bar{\mathbf{Y}}$ are dense, we could have $\nu \ll L$, in which case choosing $\theta$ close to 1 results in an improved measurement bound $mL \ge \mathcal{O}(k_T L\log^2(nL))$ and $mL \ge \mathcal{O}(k_\Omega)$. In the matrix recovery literature, similar two-way incoherence assumptions are adopted. For example, in RPCA ([4], [21]), the signal matrices are assumed to have both left and right singular vectors being incoherent with sparse vectors.

Finally, notice that $\bar{\mathbf{Y}}$ and $\bar{\mathbf{S}}$ are assumed to have fixed supports but random signs. This assumption is commonly adopted in the RIPless analysis of lasso ([19], [26]) and seems to be crucial in the proof. Alternatively, one could always avoid assuming random signs of $\bar{\mathbf{Y}}$ by taking the sensing matrices as $\mathbf{A}_{(i)}\mathbf{D}_{(i)}$, $i = 1,2,\cdots,L$, where $\mathbf{A}_{(i)}$ is the same as above and $\mathbf{D}_{(i)} \in \mathbb{R}^{n\times n}$ is a diagonal matrix with each diagonal entry i.i.d. and equally likely to be $+1$ or $-1$. It is still an open problem whether or not one can completely get rid of this random sign assumption.

1 The bounds on $\alpha$, $\beta$, $\gamma$ are chosen such that all the requirements on these constants in the subsequent lemmas and theorems are met.

2.2. Related works and applications

2.2.1. Examples

In this section, we provide several examples demonstrating that our sensing model covers a wide range of random measurements.

• Random Fourier measurements: An important application for our sensing model is the randomly subsampled discrete Fourier transform (DFT) matrix. Consider an $n\times n$ matrix $\mathbf{W}$ with each entry

$$W_{\omega t} = e^{-i2\pi\omega t/n}, \quad \omega, t \in \{0,1,\cdots,n-1\}.$$

Let $\mathbf{A}_{(i)} \in \mathbb{C}^{m\times n}$ be a sensing matrix so that each row is sampled uniformly at random from the rows of $\mathbf{W}$.² It can be easily verified that $\mathbb{E}\left[\mathbf{a}_{(i)}\mathbf{a}_{(i)}^*\right] = \mathbf{I}$ and the incoherence parameter $\mu = 1$. Such a sensing model arises in various applications including magnetic resonance imaging and collaborative spectrum sensing ([11]).

More generally, any random row sub-matrix $\mathbf{A}_{(i)}$ of an arbitrary bounded orthogonal matrix $\mathbf{W}$ (with incoherent rows) fits into our model. This includes sampling from the class of Hadamard matrices (orthogonal matrices with all entries $+1$ or $-1$).

• Random sampling from frames: A frame is a set of vectors $\{\mathbf{u}_k\}_{k=1}^{K} \subseteq \mathbb{R}^n$ which satisfies the following relation: There exist positive constants $C_1$ and $C_2$ such that

$$C_1\|\mathbf{x}\|_2^2 \le \frac{1}{K}\sum_{k=1}^{K}|\langle\mathbf{u}_k,\mathbf{x}\rangle|^2 \le C_2\|\mathbf{x}\|_2^2, \quad \forall\mathbf{x}\in\mathbb{R}^n.$$

This can be viewed as a generalization of orthogonal systems without linear independence. Furthermore, assume that each $\mathbf{u}_k$ satisfies the incoherence condition. Let $\mathbf{A}_{(i)} \in \mathbb{R}^{m\times n}$ be a sensing matrix so that each row is sampled uniformly at random from the set $\{\mathbf{u}_k\}_{k=1}^{K}$. This sampling model also fits into our formulation. Random measurements of this kind arise in Fourier transform with a continuous frequency spectrum ([31]) as well as wavelet frame based reconstruction ([32]).
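The claims made for the random Fourier example can be checked numerically. The sketch below is illustrative: the size $n = 8$ and the helper name are assumptions, and the conjugate transpose is used for the outer product because the example is over the complex numbers. It builds the DFT matrix, averages the outer products over all rows, and reads off the incoherence parameter.

```python
import numpy as np

n = 8
idx = np.arange(n)
W = np.exp(-2j * np.pi * np.outer(idx, idx) / n)   # W[w, t] = exp(-i 2 pi w t / n)

# Under uniform row sampling, E[a a^*] is the average outer product over rows.
E_aa = sum(np.outer(W[k], W[k].conj()) for k in range(n)) / n

# Every entry of W has modulus 1, so the bound sqrt(mu) in (7) holds with mu = 1.
mu = float(np.max(np.abs(W)) ** 2)

def sample_rows(W, m, rng):
    """Draw m rows of W uniformly at random (i.i.d.) and scale by 1/sqrt(m)."""
    picks = rng.integers(0, W.shape[0], size=m)
    return W[picks] / np.sqrt(m)

A = sample_rows(W, m=4, rng=np.random.default_rng(0))
print(np.allclose(E_aa, np.eye(n)), mu, A.shape)
```

Since the correlation matrix is the identity here, condition (8) coincides with (7) and the condition number is 1.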

2.2.2. Related results

We see from Theorem 1 that when the signal matrix $\bar{\mathbf{Y}}$ is sufficiently group sparse and the error matrix $\bar{\mathbf{S}}$ sufficiently sparse, then with high probability we are able to exactly recover both of them by solving a convex program. Similar results on structured recovery have been established previously by posing more restrictions on the measurements such as Gaussian ([29]) or orthogonality ([30]). Here, we prove the performance guarantee under a more general sensing model.

2 Our model is developed in $\mathbb{R}$ while this example is over $\mathbb{C}$; all our definitions and results can be generalized to complex numbers with little change.

Note that the total number of measurements is $mL$. Thus, Theorem 1 gives the measurement bounds $mL \ge \mathcal{O}(k_T L\log^2(nL))$ and $mL \ge \mathcal{O}(k_\Omega)$. Specifically, by setting $L = 1$, this result meets the measurement bounds in the robust lasso model (for example, [19]), where the number of measurements $m \ge \mathcal{O}(k\log^2 p)$ and $m \ge \mathcal{O}(k_\Omega)$ guarantees the high probability recovery of a $p$-dimensional $k$-sparse signal vector and an $m$-dimensional $k_\Omega$-sparse error vector simultaneously.

Theorem 1 is a result of RIPless analysis, which shares the same limitation as all other RIPless analyses. To be specific, Theorem 1 only holds for arbitrary but fixed $\bar{\mathbf{Y}}$ and $\bar{\mathbf{S}}$ (except that the elements of $\bar{\mathbf{Y}}$ and $\bar{\mathbf{S}}$ have uniform random signs by Assumption 4). If we expect to have a uniform recovery guarantee here (namely, considering random sensing matrices as well as signal and error matrices with random supports), then certain stronger assumptions must be made on the sensing matrices such as the RIP condition.

The proof of Theorem 1 is based on the construction of an inexact dual certificate through the golfing scheme. The golfing scheme was first introduced in [24] for low rank matrix recovery. Subsequently, [26] and [27] refined and used the scheme to prove the lasso recovery guarantee. The work [19] generalized it to mixed-norm recovery. In this paper, we consider a new mixed-norm problem, namely, the summation of the $\ell_{2,1}$-norm and the $\ell_1$-norm.

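2.3. Matrix concentration inequalities

Below we give several probability tools that are useful in the proofs of the paper. We begin with two versions of Bernstein inequalities from [25]. The first one is a matrix Bernstein inequality.

Lemma 1 (Matrix Bernstein inequality). Consider a finite sequence of independent random matrices $\{\mathbf{M}_{(j)} \in \mathbb{R}^{d\times d}\}$. Assume that every random matrix satisfies $\mathbb{E}\left[\mathbf{M}_{(j)}\right] = \mathbf{0}$ and $\|\mathbf{M}_{(j)}\|_{(2,2)} \le B$ almost surely. Define

$$\sigma^2 \triangleq \max\left\{\left\|\sum_j \mathbb{E}\left[\mathbf{M}_{(j)}\mathbf{M}_{(j)}^T\right]\right\|_{(2,2)}, \ \left\|\sum_j \mathbb{E}\left[\mathbf{M}_{(j)}^T\mathbf{M}_{(j)}\right]\right\|_{(2,2)}\right\}.$$

Then, for all $t \ge 0$, we have

$$\Pr\left\{\left\|\sum_j \mathbf{M}_{(j)}\right\|_{(2,2)} \ge t\right\} \le 2d\exp\left(-\frac{t^2/2}{\sigma^2 + Bt/3}\right).$$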
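To get a feel for Lemma 1, the bound can be compared with a simulation in a case where everything is computable. The sketch below uses an assumed toy family, not one from the paper: each summand is a random sign placed on one diagonal position, so $B = 1$, $\sigma^2$ equals the number of summands per diagonal position, and the spectral norm of the sum is the largest absolute diagonal entry.

```python
import math
import numpy as np

def matrix_bernstein_bound(t, d, sigma2, B):
    """Right-hand side of Lemma 1: 2 d exp(-(t^2 / 2) / (sigma2 + B t / 3))."""
    return 2 * d * math.exp(-(t ** 2 / 2) / (sigma2 + B * t / 3))

rng = np.random.default_rng(0)
d, per_entry, trials, t = 10, 20, 2000, 15.0

# signs[trial, k, j] is the random sign of the j-th summand on diagonal position k.
signs = rng.choice([-1.0, 1.0], size=(trials, d, per_entry))
diag_sums = signs.sum(axis=2)                 # diagonal of the summed matrix
spec_norms = np.abs(diag_sums).max(axis=1)    # spectral norm of a diagonal matrix

empirical = float(np.mean(spec_norms >= t))
bound = matrix_bernstein_bound(t, d, sigma2=per_entry, B=1.0)
print(f"empirical tail: {empirical:.4f}, Bernstein bound: {bound:.4f}")
```

The bound is loose for this toy family but still non-trivial (below 1), and the leading factor $2d$ is the price paid for the matrix dimension.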

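We also need a vector form of the Bernstein inequality.

Lemma 2 (Vector Bernstein inequality). Consider a finite sequence of independent random vectors $\{\mathbf{g}_{(j)} \in \mathbb{R}^d\}$. Assume that every random vector satisfies $\mathbb{E}\left[\mathbf{g}_{(j)}\right] = \mathbf{0}$ and $\|\mathbf{g}_{(j)}\|_2 \le B$ almost surely. Define $\sigma^2 \triangleq \sum_j \mathbb{E}\left[\|\mathbf{g}_{(j)}\|_2^2\right]$. Then, for all $0 \le t \le \sigma^2/B$, we have

$$\Pr\left(\left\|\sum_j \mathbf{g}_{(j)}\right\|_2 \ge t\right) \le \exp\left(-\frac{t^2}{8\sigma^2} + \frac{1}{4}\right).$$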

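Next, we use the matrix Bernstein inequality to prove its extension on a block anisotropic matrix.

Lemma 3. Consider a matrix $\mathbf{A}_{(i)}$ satisfying the model described in Section 2.1, and denote $\tilde{\mathbf{A}}_{(i)} = \boldsymbol{\Sigma}_{(i)}^{-1}\mathbf{A}_{(i)}^T\mathcal{P}_{\Omega_i^*}\mathbf{A}_{(i)}$. For any $\tau > 0$, it holds

$$\Pr\left(\left\|\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(i)}\boldsymbol{\Sigma}_{(i)}^{-1} - \boldsymbol{\Sigma}_{(i)}^{-1}\right)\mathcal{P}_T\right\|_{(2,2)} \ge \tau\right) \le 2k_T\exp\left(-\frac{m-k_{\max}}{\kappa k_T\mu}\cdot\frac{\tau^2}{4\left(\kappa + \frac{2\tau}{3}\right)}\right),$$

and

$$\Pr\left(\left\|\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(i)} - \mathbf{I}\right)\mathcal{P}_T\right\|_{(2,2)} \ge \tau\right) \le 2k_T\exp\left(-\frac{m-k_{\max}}{\kappa k_T\mu}\cdot\frac{\tau^2}{4\left(1 + \frac{2\tau}{3}\right)}\right).$$

We prove the first part in Appendix A, and the second part can be proved in a similar way. Two consequent corollaries of Lemma 3 show that the restriction of $\frac{m}{m-k_{\max}}\mathrm{BLKdiag}\left\{\tilde{\mathbf{A}}_{(1)},\cdots,\tilde{\mathbf{A}}_{(L)}\right\}$ to the corresponding support $T$ is near isometric.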

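Corollary 1. Denote $\tilde{\mathbf{A}}_{(i)} = \boldsymbol{\Sigma}_{(i)}^{-1}\mathbf{A}_{(i)}^T\mathcal{P}_{\Omega_i^*}\mathbf{A}_{(i)}$. Given $k_T \le \alpha\frac{m}{\mu\kappa\log(nL)}$, $k_{\max} \le \gamma m$, and $\frac{1-\gamma}{\alpha} \ge 64$, then with probability at least $1 - 2(nL)^{-2}$, we have

$$\left\|\mathrm{BLKdiag}\left\{\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(1)} - \mathbf{I}\right)\mathcal{P}_T,\cdots,\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(L)} - \mathbf{I}\right)\mathcal{P}_T\right\}\right\|_{(2,2)} < \frac{1}{2}. \tag{11}$$

Furthermore, given $k_T \le \alpha\frac{m}{\mu\kappa\log^2(nL)}$, $k_{\max} \le \gamma m$, and $\frac{1-\gamma}{\alpha} \ge 64$, with at least the same probability, we have

$$\left\|\mathrm{BLKdiag}\left\{\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(1)} - \mathbf{I}\right)\mathcal{P}_T,\cdots,\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(L)} - \mathbf{I}\right)\mathcal{P}_T\right\}\right\|_{(2,2)} < \frac{1}{2\sqrt{\log(nL)}}. \tag{12}$$

Proof. First, following directly from the second part of Lemma 3, for all $i = 1,\cdots,L$, it holds

$$\Pr\left(\left\|\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(i)} - \mathbf{I}\right)\mathcal{P}_T\right\|_{(2,2)} \ge \tau\right) \le 2k_T\exp\left(-\frac{m-k_{\max}}{k_T\mu\kappa}\cdot\frac{\tau^2}{4\left(1 + \frac{2\tau}{3}\right)}\right). \tag{13}$$

Taking a union bound over all $i = 1,\cdots,L$ yields

$$\begin{aligned}
&\Pr\left(\left\|\mathrm{BLKdiag}\left\{\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(1)} - \mathbf{I}\right)\mathcal{P}_T,\cdots,\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(L)} - \mathbf{I}\right)\mathcal{P}_T\right\}\right\|_{(2,2)} \ge \tau\right)\\
&\quad = \Pr\left(\max_i\left\|\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(i)} - \mathbf{I}\right)\mathcal{P}_T\right\|_{(2,2)} \ge \tau\right)\\
&\quad \le \sum_{i=1}^{L}\Pr\left(\left\|\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(i)} - \mathbf{I}\right)\mathcal{P}_T\right\|_{(2,2)} \ge \tau\right)\\
&\quad \le 2k_T L\exp\left(-\frac{m-k_{\max}}{k_T\mu\kappa}\cdot\frac{\tau^2}{4\left(1 + \frac{2\tau}{3}\right)}\right).
\end{aligned} \tag{14}$$

Plugging in $\tau = \frac{1}{2}$ and using the fact that $k_T \le \alpha\frac{m}{\mu\kappa\log(nL)}$ and $k_{\max} \le \gamma m$, we get

$$\text{The last line of (14)} \le 2k_T L\exp\left(-\frac{3(1-\gamma)}{64\alpha}\log(nL)\right) = 2k_T L(nL)^{-\frac{3(1-\gamma)}{64\alpha}} \le 2k_T L(nL)^{-3} \le 2(nL)^{-2},$$

where the first inequality follows from the bounds on $k_T$ and $k_{\max}$, the second inequality follows from $\frac{1-\gamma}{\alpha} \ge 64$, and the last inequality follows from $k_T \le n$. Similarly, plugging in $\tau = \frac{1}{2\sqrt{\log(nL)}}$ and using the fact that $k_T \le \alpha\frac{m}{\mu\kappa\log^2(nL)}$, we prove (12) as long as $\frac{1-\gamma}{\alpha} \ge 64$. □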

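Corollary 2. Given that $k_T \le \alpha\frac{m}{\mu\kappa\log(nL)}$, $k_{\max} \le \gamma m$, and $\frac{1-\gamma}{\alpha} \ge 64$, then with probability at least $1 - 2(nL)^{-2}$, we have

$$\left\|\mathrm{BLKdiag}\left\{\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(1)}\boldsymbol{\Sigma}_{(1)}^{-1} - \boldsymbol{\Sigma}_{(1)}^{-1}\right)\mathcal{P}_T,\cdots,\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(L)}\boldsymbol{\Sigma}_{(L)}^{-1} - \boldsymbol{\Sigma}_{(L)}^{-1}\right)\mathcal{P}_T\right\}\right\|_{(2,2)} < \frac{\kappa}{2}. \tag{15}$$

The proof is the same as proving (11) using Lemma 3. We omit the details for brevity.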

Finally, the following lemma shows that if the support of the columns in $\mathbf{A}_{(i)}$ is restricted to $\Omega_i^*$, then no column indexed inside $T$ can be well approximated by a column indexed outside of $T$. In other words, the columns corresponding to the true signal matrix are well distinguished.

Lemma 4 (Off-support incoherence). Denote $\tilde{\mathbf{A}}_{(i)} = \boldsymbol{\Sigma}_{(i)}^{-1}\mathbf{A}_{(i)}^T\mathcal{P}_{\Omega_i^*}\mathbf{A}_{(i)}$. Given $k_T \le \alpha\frac{m}{\mu\kappa\log(nL)}$ and $\alpha < \frac{1}{24}$, with probability at least $1 - e^{1/4}(nL)^{-2}$, we have

$$\max_{i\in\{1,\cdots,L\},\,k\in T^c}\left\|\mathcal{P}_T\tilde{\mathbf{A}}_{(i)}\mathbf{e}_k\right\|_2 \le 1, \tag{16}$$

where $\{\mathbf{e}_k\}_{k=1}^{n}$ is the standard basis in $\mathbb{R}^n$.

The proof of Lemma 4 is given in Appendix B.

Note that, in the above lemmas and corollaries, all the requirements on the constants $\alpha$, $\beta$ and $\gamma$ are satisfied by the bounds in Theorem 1.

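3. Inexact dual certificates

This section gives the dual certificates of the RGL model, namely, the sufficient conditions under which the optimal solution pair of the convex program (2)–(3) is unique and equal to the pair of the true signal and error matrices. First we have two preliminary lemmas.

Lemma 5. Suppose that $\bar{\mathbf{Y}} \in \mathbb{R}^{n\times L}$ and $\bar{\mathbf{S}} \in \mathbb{R}^{m\times L}$ are the true group sparse signal and sparse error matrices, respectively. If $(\bar{\mathbf{Y}}+\mathbf{H},\bar{\mathbf{S}}-\mathbf{F})$ is an optimal solution pair to (2)–(3), where $\mathbf{H} \in \mathbb{R}^{n\times L}$ and $\mathbf{F} \in \mathbb{R}^{m\times L}$, then the following results hold:

i) $[\mathbf{A}_{(1)}\mathbf{h}_1,\cdots,\mathbf{A}_{(L)}\mathbf{h}_L] = \mathbf{F}$;

ii) $\|\bar{\mathbf{Y}}+\mathbf{H}\|_{2,1} + \lambda\|\bar{\mathbf{S}}-\mathbf{F}\|_1 \ge \|\bar{\mathbf{Y}}\|_{2,1} + \lambda\|\bar{\mathbf{S}}\|_1 + \|\mathcal{P}_{T^c}\mathbf{H}\|_{2,1} + \lambda\|\mathcal{P}_{\Omega^c}\mathbf{F}\|_1 + \langle\bar{\mathbf{V}},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle$,

where $\bar{\mathbf{V}} \in \mathbb{R}^{n\times L}$ satisfies $(\mathcal{P}_T\bar{\mathbf{V}})^i = \frac{\bar{\mathbf{y}}^i}{\|\bar{\mathbf{y}}^i\|_2}$ and $(\mathcal{P}_{T^c}\bar{\mathbf{V}})^i = \mathbf{0}$, $\forall i = 1,\cdots,n$. Here $(\mathcal{P}_T\bar{\mathbf{V}})^i$ denotes the $i$-th row of $\mathcal{P}_T\bar{\mathbf{V}}$ and $\bar{\mathbf{y}}^i$ denotes the $i$-th row of $\bar{\mathbf{Y}}$.

The proof of Lemma 5 is given in Appendix C.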

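Lemma 6. For any two matrices $\mathbf{H} \in \mathbb{R}^{n\times L}$ and $\mathbf{F} \in \mathbb{R}^{m\times L}$, with probability at least $1 - 2n^{-2}$, $\mathbf{H} = \mathbf{0}$ and $\mathbf{F} = \mathbf{0}$ if the following conditions are satisfied:

i) $k_T \le \alpha\frac{m}{\mu\kappa\log^2(nL)}$, $k_{\max} \le \gamma m$, and $\frac{1-\gamma}{\alpha} \ge 64$;

ii) $[\mathbf{A}_{(1)}\mathbf{h}_1,\cdots,\mathbf{A}_{(L)}\mathbf{h}_L] = \mathbf{F}$;

iii) $\mathcal{P}_{T^c}\mathbf{H} = \mathbf{0}$ and $\mathcal{P}_{\Omega^c}\mathbf{F} = \mathbf{0}$.

The proof of Lemma 6 is given in Appendix D.

Below we show that the optimal solution pair $(\hat{\mathbf{Y}},\hat{\mathbf{S}})$ of the convex program (2)–(3) is equal to the true signal and noise pair $(\bar{\mathbf{Y}},\bar{\mathbf{S}})$ when certain certificate conditions hold.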

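Theorem 2 (Inexact duality). Suppose that $\bar{\mathbf{Y}} \in \mathbb{R}^{n\times L}$ and $\bar{\mathbf{S}} \in \mathbb{R}^{m\times L}$ are the true group sparse signal and sparse error matrices satisfying the assumptions in Theorem 1. The pair $(\bar{\mathbf{Y}},\bar{\mathbf{S}})$ is the unique solution to the RGL model (2)–(3) with probability at least $1 - (2 + e^{1/4})(nL)^{-1}$, if the parameter $\lambda < 1$ and there exists a dual certificate $(\mathbf{W},\mathbf{V}) \in \mathbb{R}^{m\times L}\times\mathbb{R}^{n\times L}$ such that

$$\|\mathcal{P}_T\mathbf{V} - \bar{\mathbf{V}}\|_F \le \frac{\lambda}{4\sqrt{\kappa}}, \tag{17}$$

$$\|\mathcal{P}_{T^c}\mathbf{V}\|_{2,\infty} \le \frac{1}{4}, \tag{18}$$

$$\|\mathcal{P}_{\Omega^c}\mathbf{W}\|_\infty \le \frac{\lambda}{4}, \tag{19}$$

and

$$\mathbf{V} = \left[\mathbf{A}_{(1)}^T\mathcal{P}_{\Omega_1^c}\mathbf{w}_1,\cdots,\mathbf{A}_{(L)}^T\mathcal{P}_{\Omega_L^c}\mathbf{w}_L\right] + \lambda\left[\mathbf{A}_{(1)}^T\mathrm{sgn}(\bar{\mathbf{s}}_1),\cdots,\mathbf{A}_{(L)}^T\mathrm{sgn}(\bar{\mathbf{s}}_L)\right], \tag{20}$$

where $\bar{\mathbf{V}} \in \mathbb{R}^{n\times L}$ satisfies $(\mathcal{P}_T\bar{\mathbf{V}})^i = \frac{\bar{\mathbf{y}}^i}{\|\bar{\mathbf{y}}^i\|_2}$ and $(\mathcal{P}_{T^c}\bar{\mathbf{V}})^i = \mathbf{0}$.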

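Proof. Suppose that $(\bar{\mathbf{Y}}+\mathbf{H},\bar{\mathbf{S}}-\mathbf{F})$ is an optimal solution pair to (2)–(3). Therefore, the two results in Lemma 5 hold true. Proving that the pair $(\bar{\mathbf{Y}},\bar{\mathbf{S}})$ is the unique solution to (2)–(3) is equivalent to showing that $\mathbf{H} = \mathbf{0}$ and $\mathbf{F} = \mathbf{0}$. Hence, the proof resorts to verifying the three conditions in Lemma 6.

Since $\bar{\mathbf{Y}}$ and $\bar{\mathbf{S}}$ satisfy the assumptions in Theorem 1, we have

$$k_T \le \alpha\frac{m}{\mu\kappa\log^2(nL)}, \quad k_{\max} \le \gamma\frac{m}{\kappa}, \quad \alpha \le \frac{1}{9600}, \quad \gamma \le \frac{1}{4}.$$

Considering $\kappa \ge 1$, we know that condition i) in Lemma 6 holds. By result i) of Lemma 5, condition ii) also holds. Therefore, it remains to verify condition iii), namely, $\mathcal{P}_{T^c}\mathbf{H} = \mathbf{0}$ and $\mathcal{P}_{\Omega^c}\mathbf{F} = \mathbf{0}$.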

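Consider the term $\langle\bar{\mathbf{V}},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle$ in result ii) of Lemma 5. Using the equation $\mathbf{V} = \mathcal{P}_T\mathbf{V} + \mathcal{P}_{T^c}\mathbf{V}$, we rewrite the term as

$$\langle\bar{\mathbf{V}},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle = \langle\bar{\mathbf{V}} - \mathcal{P}_T\mathbf{V},\mathbf{H}\rangle - \langle\mathcal{P}_{T^c}\mathbf{V},\mathbf{H}\rangle + \langle\mathbf{V},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle. \tag{21}$$

Consider the term $\langle\mathbf{V},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle$ on the right hand side of (21). By (20), we have

$$\langle\mathbf{V},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle = \left\langle\left[\mathbf{A}_{(1)}^T\mathcal{P}_{\Omega_1^c}\mathbf{w}_1,\cdots,\mathbf{A}_{(L)}^T\mathcal{P}_{\Omega_L^c}\mathbf{w}_L\right],\mathbf{H}\right\rangle + \lambda\left\langle\left[\mathbf{A}_{(1)}^T\mathrm{sgn}(\bar{\mathbf{s}}_1),\cdots,\mathbf{A}_{(L)}^T\mathrm{sgn}(\bar{\mathbf{s}}_L)\right],\mathbf{H}\right\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle.$$

By adjoint relations of inner products, we have

$$\left\langle\left[\mathbf{A}_{(1)}^T\mathcal{P}_{\Omega_1^c}\mathbf{w}_1,\cdots,\mathbf{A}_{(L)}^T\mathcal{P}_{\Omega_L^c}\mathbf{w}_L\right],\mathbf{H}\right\rangle = \left\langle\left[\mathcal{P}_{\Omega_1^c}\mathbf{A}_{(1)}\mathbf{h}_1,\cdots,\mathcal{P}_{\Omega_L^c}\mathbf{A}_{(L)}\mathbf{h}_L\right],\mathbf{W}\right\rangle,$$

and

$$\left\langle\left[\mathbf{A}_{(1)}^T\mathrm{sgn}(\bar{\mathbf{s}}_1),\cdots,\mathbf{A}_{(L)}^T\mathrm{sgn}(\bar{\mathbf{s}}_L)\right],\mathbf{H}\right\rangle = \left\langle\mathrm{sgn}(\bar{\mathbf{S}}),\left[\mathbf{A}_{(1)}\mathbf{h}_1,\cdots,\mathbf{A}_{(L)}\mathbf{h}_L\right]\right\rangle.$$

Thus, it holds

$$\langle\mathbf{V},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle = \left\langle\left[\mathcal{P}_{\Omega_1^c}\mathbf{A}_{(1)}\mathbf{h}_1,\cdots,\mathcal{P}_{\Omega_L^c}\mathbf{A}_{(L)}\mathbf{h}_L\right],\mathbf{W}\right\rangle + \lambda\left\langle\mathrm{sgn}(\bar{\mathbf{S}}),\left[\mathbf{A}_{(1)}\mathbf{h}_1,\cdots,\mathbf{A}_{(L)}\mathbf{h}_L\right]\right\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle.$$

According to conclusion i) in Lemma 5, which is $[\mathbf{A}_{(1)}\mathbf{h}_1,\cdots,\mathbf{A}_{(L)}\mathbf{h}_L] = \mathbf{F}$, it follows

$$\langle\mathbf{V},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle = \langle\mathcal{P}_{\Omega^c}\mathbf{F},\mathbf{W}\rangle.$$

Combining (21) and the above equality gives

$$\langle\bar{\mathbf{V}},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle = \langle\bar{\mathbf{V}} - \mathcal{P}_T\mathbf{V},\mathbf{H}\rangle - \langle\mathcal{P}_{T^c}\mathbf{V},\mathbf{H}\rangle + \langle\mathcal{P}_{\Omega^c}\mathbf{F},\mathbf{W}\rangle. \tag{22}$$

Next, we find a lower bound for the right-hand side of the equality (22). First, by (17),

$$\langle\bar{\mathbf{V}} - \mathcal{P}_T\mathbf{V},\mathbf{H}\rangle \ge -\|\mathcal{P}_T\mathbf{V} - \bar{\mathbf{V}}\|_F\|\mathcal{P}_T\mathbf{H}\|_F \ge -\frac{\lambda}{4\sqrt{\kappa}}\|\mathcal{P}_T\mathbf{H}\|_F.$$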

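Then, by (18),

$$-\langle\mathcal{P}_{T^c}\mathbf{V},\mathbf{H}\rangle \ge -\|\mathcal{P}_{T^c}\mathbf{V}\|_{2,\infty}\|\mathcal{P}_{T^c}\mathbf{H}\|_{2,1} \ge -\frac{1}{4}\|\mathcal{P}_{T^c}\mathbf{H}\|_{2,1}.$$

Finally, by (19),

$$\langle\mathcal{P}_{\Omega^c}\mathbf{F},\mathbf{W}\rangle \ge -\|\mathcal{P}_{\Omega^c}\mathbf{F}\|_1\|\mathcal{P}_{\Omega^c}\mathbf{W}\|_\infty \ge -\frac{\lambda}{4}\|\mathcal{P}_{\Omega^c}\mathbf{F}\|_1.$$

Therefore, (22) gives

$$\langle\bar{\mathbf{V}},\mathbf{H}\rangle - \lambda\langle\mathrm{sgn}(\bar{\mathbf{S}}),\mathbf{F}\rangle \ge -\frac{\lambda}{4\sqrt{\kappa}}\|\mathcal{P}_T\mathbf{H}\|_F - \frac{1}{4}\|\mathcal{P}_{T^c}\mathbf{H}\|_{2,1} - \frac{\lambda}{4}\|\mathcal{P}_{\Omega^c}\mathbf{F}\|_1.$$

Substituting the above inequality into conclusion ii) of Lemma 5 gives

$$\|\bar{\mathbf{Y}}+\mathbf{H}\|_{2,1} + \lambda\|\bar{\mathbf{S}}-\mathbf{F}\|_1 \ge \|\bar{\mathbf{Y}}\|_{2,1} + \lambda\|\bar{\mathbf{S}}\|_1 + \frac{3}{4}\|\mathcal{P}_{T^c}\mathbf{H}\|_{2,1} + \frac{3\lambda}{4}\|\mathcal{P}_{\Omega^c}\mathbf{F}\|_1 - \frac{\lambda}{4\sqrt{\kappa}}\|\mathcal{P}_T\mathbf{H}\|_F.$$

Since $(\bar{\mathbf{Y}}+\mathbf{H},\bar{\mathbf{S}}-\mathbf{F})$ is an optimal solution pair to (2)–(3), it follows that $\|\bar{\mathbf{Y}}+\mathbf{H}\|_{2,1} + \lambda\|\bar{\mathbf{S}}-\mathbf{F}\|_1 \le \|\bar{\mathbf{Y}}\|_{2,1} + \lambda\|\bar{\mathbf{S}}\|_1$. Hence, we have

$$\frac{3}{4}\|\mathcal{P}_{T^c}\mathbf{H}\|_{2,1} + \frac{3\lambda}{4}\|\mathcal{P}_{\Omega^c}\mathbf{F}\|_1 - \frac{\lambda}{4\sqrt{\kappa}}\|\mathcal{P}_T\mathbf{H}\|_F \le 0. \tag{23}$$

To complete the proof, we need to show that inequality (23) implies $\mathcal{P}_{T^c}\mathbf{H} = \mathbf{0}$ and $\mathcal{P}_{\Omega^c}\mathbf{F} = \mathbf{0}$. To do so, we first derive an upper bound for $\|\mathcal{P}_T\mathbf{H}\|_F$, expressed as a linear combination of $\|\mathcal{P}_{T^c}\mathbf{H}\|_{2,1}$ and $\|\mathcal{P}_{\Omega^c}\mathbf{F}\|_1$.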
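The three lower bounds used to reach (23) rest on standard norm dualities: Frobenius against Frobenius, $\ell_{2,\infty}$ against $\ell_{2,1}$, and $\ell_\infty$ against $\ell_1$. The sketch below is a numerical sanity check on random matrices with hypothetical helper names. It is not a proof, only a way to see that each pairing bounds the inner product.

```python
import numpy as np

def inner(A, B):
    """Matrix inner product <A, B>: sum of entrywise products."""
    return float(np.sum(A * B))

def fro(A):
    return float(np.linalg.norm(A))                  # Frobenius norm

def l21(A):
    return float(np.linalg.norm(A, axis=1).sum())    # sum of row norms

def l2inf(A):
    return float(np.linalg.norm(A, axis=1).max())    # largest row norm

def l1(A):
    return float(np.abs(A).sum())

def linf(A):
    return float(np.abs(A).max())

def duality_gaps(A, B):
    """Slack in |<A,B>| <= ||A||_F ||B||_F, ||A||_{2,inf} ||B||_{2,1}, ||A||_inf ||B||_1."""
    ip = abs(inner(A, B))
    return (fro(A) * fro(B) - ip,
            l2inf(A) * l21(B) - ip,
            linf(A) * l1(B) - ip)

rng = np.random.default_rng(1)
gaps = [duality_gaps(rng.standard_normal((7, 4)), rng.standard_normal((7, 4)))
        for _ in range(1000)]
print(min(min(g) for g in gaps) >= 0.0)
```

Each bound pairs a norm that takes a maximum over groups with one that sums over groups, which is why (18) and (19) are stated in the $\ell_{2,\infty}$ and $\ell_\infty$ norms.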

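Using (11), it follows

$$\left\|\mathrm{BLKdiag}\left\{\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(1)} - \mathbf{I}\right)\mathcal{P}_T,\cdots,\mathcal{P}_T\left(\frac{m}{m-k_{\max}}\tilde{\mathbf{A}}_{(L)} - \mathbf{I}\right)\mathcal{P}_T\right\}\mathrm{vec}(\mathbf{H})\right\|_2 \le \frac{1}{2}\|\mathcal{P}_T\mathbf{H}\|_F.$$

Since $\|\mathrm{BLKdiag}\{\mathcal{P}_T,\cdots,\mathcal{P}_T\}\mathrm{vec}(\mathbf{H})\|_2 = \|\mathcal{P}_T\mathbf{H}\|_F$, applying the triangle inequality yields

$$\|\mathcal{P}_T\mathbf{H}\|_F \le 2\,\frac{m}{m-k_{\max}}\left\|\mathrm{BLKdiag}\left\{\mathcal{P}_T\tilde{\mathbf{A}}_{(1)}\mathcal{P}_T,\cdots,\mathcal{P}_T\tilde{\mathbf{A}}_{(L)}\mathcal{P}_T\right\}\mathrm{vec}(\mathbf{H})\right\|_2.$$

Observing $\mathrm{vec}(\mathcal{P}_T\mathbf{H}) = \mathrm{vec}(\mathbf{H}) - \mathrm{vec}(\mathcal{P}_{T^c}\mathbf{H})$ and using the triangle inequality again, we have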