Contents lists available atScienceDirect
Linear Algebra and its Applications
www.elsevier.com/locate/laa
Robust group lasso: Model and recoverability
✩Xiaohan Weia,∗, Qing Lingb, Zhu Hanc
a DepartmentofElectricalEngineering,UniversityofSouthernCalifornia,United Statesof America
b Schoolof DataandComputerScience,SunYat-SenUniversity,China
cDepartmentofElectricalandComputerEngineering,UniversityofHouston, UnitedStatesofAmerica
a r t i c l e i n f o a bs t r a c t
Article history:
Received19November2016 Accepted30June2018 Availableonlinexxxx SubmittedbyH.Rauhut MSC:
60D05 94A12 Keywords:
Compressedsensing Groupsparsity Sparsenoise
Restrictedisometryproperty Golfingscheme
Thispaperconsiderstheproblemofrecoveringagroupsparse signal matrix Y = [y1,· · ·,yL] from sparsely corrupted measurements M= [A(1)y1,· · ·,A(L)yL]+S,whereA(i)’s are known sensing matrices and S is an unknown sparse errormatrix.Arobustgrouplasso(RGL)modelisproposed to recover Y andSthrough simultaneouslyminimizing the 2,1-normofYandthe1-normofSunderthemeasurement constraints.WeprovethatYandScanbeexactlyrecovered fromtheRGLmodelwithhighprobabilityforaverygeneral classofA(i)’s.
©2018ElsevierInc.Allrightsreserved.
✩ Qing Ling’s research is partially supported by National Science Foundation of China under Grant 61573331andbytheNationalScienceFoundationAnhuiunderGrant1608085QF130.ZhuHan’sresearch ispartiallysupportedbyUSMURI,NSFCNS-1717454,CNS-1731424,CNS-1702850,CNS-1646607.
* Correspondingauthor.
E-mailaddresses:[email protected](X. Wei),[email protected](Q. Ling),[email protected] (Z. Han).
https://doi.org/10.1016/j.laa.2018.06.034
0024-3795/©2018ElsevierInc. Allrightsreserved.
1. Introduction
Considertheproblem ofrecovering agroupsparsematrixY= [y1,· · ·,yL]∈Rn×L fromsparsely corruptedmeasurements
M= [A(1)y1,· · ·,A(L)yL] +S, (1) where M = [m1,· · ·,mL]∈ Rm×L is a measurement matrix, A(i)∈ Rm×n is thei-th sensingmatrix, and S= [s1,· · ·,sL]∈ Rm×L is an unknown sparse errormatrix. The error matrix S is sparse as it has only a small number of nonzero entries. The signal matrixYisgroupsparse,meaningthatYissparseanditsnonzeroentriesappearina smallnumberofcommonrows.
Given M and A(i)’s, our goal is to recover Y and S from the linear measurement equation(1).Inthispaper,weproposetoaccomplishtherecoverytaskthroughsolving thefollowingrobustgrouplasso(RGL) model
minY,S Y2,1+λS1, (2)
s.t. M= [A(1)y1,· · ·,A(L)yL] +S. (3) Denoting yij and sij as the (i,j)-th entries of Y and S, respectively, Y2,1 n
i=1
L
j=1y2ij is defined as the 2,1-norm of Y and S1 m i=1
L
j=1|sij| is de- finedas the1-norm ofS.Minimizingthe2,1-normtermpromotesgroupsparsityofY whileminimizingthe1-normtermpromotessparsityofS;λisanonnegativeparameter to balance the two terms. We prove that solving the RGL model (2)–(3), which is a convex program, enables exact recovery of Y and S with high probability, given that A(i)’ssatisfycertainconditions.
1.1. From grouplasso torobustgroup lasso
Sparsesignal recovery has attractedmuch research interest inthe signal processing andoptimizationcommunities duringthepastfewyears.Various sparsity modelshave beenproposed tobetterexploitthesparsestructures ofhigh-dimensionaldata, suchas sparsity ofa vector[1], [2], groupsparsity of vectors[3], and low-ranknessof amatrix [4]. For moretopics relatedto sparse signalrecovery,readers arereferredto therecent surveypaper[5].
Inthis paperwe areinterestedinthe recoveryof groupsparse(also knownasblock sparse [6] or jointly sparse [7]) signals which finds a variety of applications such as direction-of-arrivalestimation[8],[9],collaborativespectrumsensing[10–12] andmotion detection [13]. A signal matrix Y = [y1,· · ·,yL] ∈ Rn×L is called k-group sparse if k rowsofYarenonzero.AmeasurementmatrixM= [m1,· · ·,mL]∈Rm×Listakenfrom linearprojectionsmi=A(i)yi,i= 1,· · ·,L,whereA(i)∈Rm×n isasensingmatrix.In ordertorecoverYfromA(i)’sandM,thestandard2,1-normminimizationformulation proposesto solveaconvexprogram
minY Y2,1,
s.t. M= [A(1)y1,· · ·,A(L)yL]. (4) Thisisastraightforwardextensionfromthecanonical1-normminimizationformulation thatrecoversasparsevector.Theoreticalguaranteeofexactrecoveryhasbeendeveloped based on the restricted isometric property (RIP) of A(i)’s [14]. The performance of exploiting more structures of Y by simultaneously minimizing the 2,1-norm and the nuclear normisanalyzedin[15].
Consider that in practice the measurements are often corrupted by random noise, resultinginM= [A(1)y1,· · ·,A(L)yL]+Nwhere N= [n1,· · ·,nL]∈Rm×L isanoise matrix.To addressthenoise-corruptedcase,thegrouplassomodelin[3] solves
minY,E Y2,1+γN2F,
s.t. M= [A(1)y1,· · ·,A(L)yL] +N, (5) whereγisanonnegativeparameterandNFistheFrobeniusnormofN.Analternative to (5) is
minY Y2,1,
s.t. M−[A(1)y1,· · ·,A(L)yL]2F ≤ε2, (6) where ε controls the noise level. It has been shown in [14] that if the sensing ma- trices A(i)’s satisfy RIP, then the distance between the solution to (6) and the true signal matrix,whichis measuredbythe Frobeniusnorm, is withinaconstantmultiple of ε.
Theexactrecovery guaranteefor(6) is elegant,butworksonlyifthenoiselevelεis nottoo large.However,inmanypracticalapplications,someofthe measurementsmay be seriously contaminated or even missing due to uncertainties such as sensor failures and transmission errors. Meanwhile, this kind of measurement errors are often sparse (see [16] for detailed discussions). In this case, the exact recovery guarantee does not hold andthesolutionof(6) canbe farawayfrom thetruesignalmatrix.
Theneedofhandlinglargebutsparsemeasurement errorsinthegroupsparsesignal recoveryproblemmotivatestheRGLmodel(2)–(3),whichhasfoundsuccessfulapplica- tionsin,forexample,thecognitivenetworksensingproblem[16].Efficientdecentralized algorithms solving (2)–(3) are proposed in [17] and their performances are evaluated viaextensivesimulations. In(2)–(3),themeasurement matrixMis contaminatedbya sparseerrormatrixS= [s1,· · ·,sL]∈Rm×Lwhosenonzeroentriesmightbeunbounded.
Throughsimultaneouslyminimizingthe2,1-normofYandthe1normofS,weexpect to recoverthegroupsparsesignalmatrixYandthesparse errormatrixS.
The RGL model(2)–(3) isclosely related to robustlasso and robustprinciple com- ponent analysis (RPCA),both ofwhichhavebeen provedeffectivelyinrecovering true
signalfromsparse grosscorruptions.Therobustlassomodel,whichhasbeen discussed extensivelyin[18],[19],[20],minimizes1-normofasparsesignalvectorandthe1-norm ofasparseerrorvectorsimultaneouslyinordertogetridofsparsecorruptions.Whereas theRPCA model,whichisfirstproposedin[21] andthenextendedby[22] and[23],re- coversalowrankmatrixbyminimizingthenuclearnormofsignalmatrixplus1-norm ofsparseerrormatrix.
1.2. Contribution andpaperorganization
Thispaperprovesthatwithhighprobability,theproposedRGLmodel(2)–(3) exactly recoversthegroupsparsesignalmatrixandthesparseerrormatrixsimultaneouslyunder certainrestrictionsonmeasurements foraverygeneralclassofsamplematrices.
The rest of this paper is organized as follows. Section 2 provides the main result (see Theorem 1) on the recoverability of the RGL model (2)–(3) under assumptions onthe sensingmatrices and thetrue signaland errormatrices (see Assumptions1–4).
Section 2 also introduces several supporting lemmas and corollaries (see Lemmas 1–4 andCorollaries1–2).Section3givesthedualcertificate of(2)–(3),whichisasufficient condition guaranteeingthe exact recovery from the RGLmodel with high probability.
Theirproofs are basedon two supportinglemmas (see Lemmas 5–6). Section4proves thattheinexactdualcertificateof(2)–(3) canbesatisfiedthroughaconstructivemanner (seeTheorem 3and Lemma7).This way, weprovethe main resultgiveninSection2.
Section5concludesthepaper.
1.3. Notations
We introduce several notations that are used in the subsequent sections. Bold up- percase letters denote matrices, whereas bold lowercase letters with subscripts and superscripts stand for column vectors and row vectors, respectively. For a matrix U, wedenoteui asitsi-thcolumn,uiasitsj-throw,anduij asits(i,j)-thelement.Fora givenvectoru,wedenoteui asitsi-thelement.Thenotations{U(i)}and{u(i)}denote thefamilyofmatricesandvectorsindexedbyi,respectively.Thenotations{U(i,j)}and {u(i,j)}denote the familyof matricesand vectorsindexed by (i,j), respectively.vec(·) is thevectorizing operator thatstacks the columns of amatrix oneafter another. {·}
denotesthetransposeoperator.diag{·}representsadiagonalmatrixandBLKdiag{·}
representsablockdiagonal matrix.Thenotation·,·denotestheinner product,when applyingto two matrices Uand V. sgn(u) and sgn(U) aresign vectorand signmatrix foruand U,respectively.
Additionally,weuseseveralstandardmatrixandvectornorms. Foravectoru∈Rn, define
• 2-norm: u2=n j=1u2j.
• 1-norm: u1=n j=1|uj|.
ForamatrixU∈Rm×n,define
• 2,1-norm: U2,1=m i=1
n j=1u2ij.
• 2,∞-norm: U2,∞= maxin j=1u2ij.
• 1-norm:U1=m i=1
n j=1|uij|.
• Frobeniusnorm:UF =m i=1
n j=1u2ij.
• ∞-norm:U∞= maxi,j|uij|.
Also,we usethenotationU(p,q)to denoteinducednorms,whichstandsfor
U(p,q)= max
x∈Rn
Uxp
xq
.
For the signalmatrix Y∈ Rn×L and noisematrix S∈Rm×L, we usethe following set notationsthroughoutthepaper.
• T:Therowgroupsupport(namely,theset ofrowcoordinatescorresponding tothe nonzerorowsofthesignalmatrix)whosecardinalityisdenotedaskT =|T|.
• Tc:ThecomplementofT (namely,{1,· · ·,n}\T).
• Ω:Thesupportoferrormatrix(namely,thesetofcoordinatescorrespondingtothe nonzeroelementsoftheerrormatrix)whosecardinalityisdenotedaskΩ=|Ω|.
• Ωc:ThecomplementofΩ (namely,{1,· · ·,n}× {1,· · ·,L}\Ω).
• Ωi:Thesupportofthei-thcolumntheerrormatrixwhosecardinalityisdenotedas kΩi =|Ωi|.
• Ωci:ThecomplementofΩi (namely,{1,· · ·,n}\Ωi).
• Ω∗i: An arbitrary fixed subset of Ωci with cardinality m −kmax, where kmax = maxikΩi. Intuitively, Ω∗i standsfor the maximal non-corrupted set across different i∈ {1,· · ·,L}.
ForanygivenmatricesU∈Rm×L,V∈Rn×Landgivenvectorsu∈Rm,v∈Rn,define orthogonal projectionoperators asfollows.
• PΩU:Theorthogonal projectionofmatrixUontoΩ (namely,set everyentry ofU whosecoordinatebelongstoΩc as 0whilekeepother entriesunchanged).
• PΩiu,PΩciu,PΩ∗iu:TheorthogonalprojectionsofuontoΩi,Ωci,andΩ∗i,respectively.
• PTv:TheorthogonalprojectionofvontoT.
• PΩiU, PΩciU, and PΩ∗iU: The orthogonal projections of each column of U onto Ωi, Ωci, and Ω∗i, respectively (namely, PΩiU = [PΩiu1,· · ·,PΩiuL], PΩciU = [PΩciu1,· · ·,PΩciuL] andPΩ∗iU= [PΩ∗iu1,· · ·,PΩ∗iuL]).
• PTV:TheorthogonalprojectionofeachcolumnofVontoT.
Furthermore,we admitanotationalconventionthatforanyprojectionoperatorP and correspondingmatrixU(or vectoru),itholds
UP= (PU)(oruP= (Pu)).
Finally, by saying an event occurs with high probability, we mean that the occurring probabilityoftheeventisat least1−C(nL)−1 where Cisaconstant.
2. Mainresultofexactrecovery
ThissectionprovidesthetheoreticalperformanceguaranteeoftheRGLmodel(2)–(3).
Section2.1makesseveralassumptionsunderwhich(2)–(3) recoversthetruegroupsparse signaland sparse error matrices with high probability.The main result is summarized inTheorem 1. Section2.2 discussesseveralrelatedresultsandapplications.Section2.3 givesseveralprobabilitytoolsthatareuseful intheproofofthemain result.
2.1. Assumptions andmainresult
Westartfrom severalassumptionsonthesensingmatrices,aswell asthetruegroup sparsesignalandsparseerrormatrices.
Consider L distributions {Fi}Li=1 in Rn and an independently sampled vector a(i) fromeachFi.Thecorrelationmatrixisdefinedas
Σ(i)=E
a(i)a(i) ,
andthecorrespondingconditionnumberisboundedas follows
λmax{Σ(i)}
λmin{Σ(i)} ≤κ, ∀i∈ {1,2,· · ·, L},
where κ is a positive constant and λmax{·}, λmin{·} denote the largest and smallest eigenvaluesofamatrix,respectively.Observethatthisconditionnumberisfiniteifand onlyifthecovariancematrixisinvertible,andislargerthanor equalto1inany case.
Assumption1.Fori= 1,· · ·,L,define thei-thsensingmatrixas
A(i) 1
√m
⎛
⎜⎜
⎝ a(i)1
... a(i)m
⎞
⎟⎟
⎠∈Rm×n.
Therein, {a(i)1,· · ·,a(i)m}is assumed to be asequence of i.i.d. random vectorsdrawn fromthedistributionFi inRn.
By Assumption 1, we suppose that everysensing matrix A(i) is randomly sampled from acorresponding distributionFi. Weproceed to assumethe propertiesof thedis- tributions {Fi}Li=1.
Assumption 2.Fori= 1,· · ·,L,everydistribution Fi satisfythefollowing twoproper- ties.
• Completeness:ThecorrelationmatrixΣ(i)isinvertible.
• Incoherence:Each sensingvector a(i)sampled fromFi satisfies
j∈{max1,···,n}|a(i),ek| ≤√
μ, (7)
j∈{1,···max,n}|Σ−1(i)a(i),ek| ≤√
μ, (8)
forsomefixed constantμ≥1,where{ek}nk=1 isthestandardbasisinRn.
We call μ as the incoherence parameter. Note that this incoherence condition is stronger than the one originally presented in [26], which does not require (8). If one wants to get rid of (8), then some other restrictions must be imposed on the sensing matrices (see[27] forrelatedresults).
Observe thatthebounds (7) and (8) inAssumption 2are meaninglessunless wefix thescaleofa(i).Thus,we havethefollowingassumption.
Assumption 3.ThecorrelationmatrixΣ(i) satisfies
λmax{Σ(i)}=λmin{Σ(i)}−1, (9) forany Fi, i= 1,· · ·,L.
GivenanycompleteFi,(9) canalwaysbeachievedbyscalinga(i)upordown.Thisis truebecauseifwescalea(i)up,thenλmax{Σ(i)}increasesandλmin{Σ(i)}−1 decreases.
Observethattheoptimizationproblem(2)–(3) isinvariantunderscaling.Thus,Assump- tion3doesnotposeany extraconstraint.
Denote YandSasthetruegroupsparsesignalandsparseerrormatricestorecover, respectively.TherowgroupsupportofYandthesupportofSarefixedanddenotedas T and Ω,respectively.
Theassumption onYandSisgiven asfollows,
Assumption 4.The true signal matrixY and error matrixS satisfy the following two properties.
• Random sign: Thesignsof theelements of Y andS arei.i.d. and equallylikelyto be+1 or−1.
• Row incoherence: Letyi be the i-th rowof Y. Then, there exists afixed constant ν ≥1 sothat
l∈{1,2,max···,L}
yi yi2
,el
≤ ν
L,
forany i∈T andl∈ {1,2,· · ·,L},where{el}Ll=1 isastandardbasisinRL. Under the assumptions stated above, we have the following main theorem on the recoverabilityoftheRGLmodel(2)–(3).
Theorem1.Letθ∈[1,L] beatradeoffparameter.Under Assumptions 1–4,thesolution pair(Y,ˆ S)ˆ to theoptimization problem (2)–(3)is exact and uniquewith probability at least1−(16+ 2e14)(nL)−1,providedthat λ=
θ Llog(nL),
kT ≤α m
max κ,νθ
μlog2(nL), kΩ≤βmL
μθ , kmax≤γm
κ, (10)
wherekmaxmaxikΩi,and α≤ 96001 ,β≤ 31361 ,γ≤ 14 areallpositiveconstants.1 Notethatfrom(10),theconstantθ isindeedatrade-offparameter.Choosingθlarge relaxestheconstraintonrowsupportsbutrestrictsthenumberofsparseerrors,andvice versa.
Theincoherenceconditionofsensingvectors(Assumption2)iscommoninlassoand robust lasso literatures, which implies that the columns of [A(1)y1,· · ·,A(L)yL] are not aligned with sparse vectors. On the other hand, the row incoherence (Assump- tion 4) is unique in the current group lasso context. It indicates that the rows of [A(1)y1,· · ·,A(L)yL] are not aligned with sparse vectors either. Note that this con- dition is trivial when ν = L, in which cases choosing θ = L in Theorem 1 gives the measurementboundmL≥ O(kTLlog2(nL)) andm≥ O(kΩ).IngeneralwhenLislarge andtherowsofthetruesignalmatrixY aredense,wecouldhaveν L,inwhichcase choosingθcloseto1resultsinanimprovedmeasurementboundmL≥ O(kTLlog2(nL)) andmL≥ O(kΩ).Formatrixrecoveryliteratures,similartwo-wayincoherenceassump- tionsareadopted.Forexample,inRPCA([4],[21]),thesignalmatricesareassumedto haveboth leftandright singularvectorsbeing incoherentwithsparsevectors.
Finally,notice thatY and Sare assumed to havefixed supportsbutrandom signs.
This assumption is commonly adopted inthe RIPless analysis of lasso ([19], [26]) and seemstobecrucialintheproof.Alternatively,onecouldalwaysavoidassumingrandom signs of Y by taking the sensing matrices as A(i)D(i), i = 1,2,· · ·,L, where A(i) is
1 Theboundsonα,β,γarechosensuchthatalltherequirementsontheseconstantsinthesubsequent lemmasandtheoremsaremet.
thesameasaboveandD(i)∈Rn×n isadiagonalmatrixwitheachdiagonalentryi.i.d.
and equallylikelyto be +1 or −1.It is still an open problem whetheror not onecan completely getridofthis randomsignassumption.
2.2. Relatedworks andapplications 2.2.1. Examples
In this section, we provide several examples demonstrating that our sensing model coversawiderangeofrandommeasurements.
• Random Fouriermeasurements: An importantapplicationfor oursensingmodelis therandomsubsampleddiscreteFouriertransform(DFT)matrix.Considerann×n matrixWwitheachentry
Wωt =e−i2πωt/n, w, t∈ {0,1,· · ·, n−1}.
Let A(i) ∈ Cm×n be a sensing matrix so that each row is sampled uniformly at random from the rows of W.2 It can be easily verified that E
a(i)a(i)
= I and theincoherenceparameterμ= 1.Suchsensing modelarises invarious applications including magnetic resonance imaging and collaborative spectrum sensing ([11]).
Moregenerally,anyrandomrowsub-matrixA(i)ofanarbitraryboundedorthogonal matrixW (withincoherent rows)fits into ourmodel.This includessampling from theclassofHadamardmatrices(orthogonalmatriceswithallentries+1 or −1).
• Random sampling from frames: A frame is a set of vectors {uk}Kk=1 ⊆ Rn which satisfiesthefollowingrelation: ThereexistpositiveconstantsC1 andC2suchthat
C1x22≤ 1 K
K k=1
uk,x2≤C2x22, ∀x∈Rd.
This can be viewed as ageneralization of orthogonal systemswithout linear inde- pendence.Furthermore,assumethateachuk satisfiestheincoherencecondition.Let A(i)∈Rm×n be asensingmatrixso thateachrowissampled uniformlyat random from theset {uk}Kk=1. This sampling modelalsofits intoour formulation.Random measurements of this kind arise in Fourier transform with acontinuous frequency spectrum([31])as wellas waveletframebasedreconstruction([32]).
2.2.2. Related results
WeseefromTheorem1thatwhenthesignalmatrixYissufficientlygroupsparseand theerrormatrixSsufficientlysparse, thenwithhigh probabilitywe areabletoexactly
2 OurmodelisdevelopedinRwhilethisexampleisoverC,allourdefinitionsandresultscanbegeneralized tocomplexnumberswithlittlechanges.
recoverbothofthembysolvingaconvexprogram.Similarresultsonstructuredrecovery havebeen establishedpreviouslybyposingmorerestrictionsonthemeasurementssuch as Gaussian ([29]) or orthogonality ([30]). Here, we prove the performance guarantee underamoregeneralsensingmodel.
NotethatthetotalnumberofmeasurementsismL.Thus,Theorem1givesthemea- surement bounds mL ≥ O(kTLlog2(nL)) and mL ≥ O(kΩ). Specifically, by setting L = 1, this result meets the measurement bounds in the robust lasso model (for ex- ample, [19]), where the number of measurements m ≥ O(klog2p) and m ≥ O(kΩ) guaranteesthehighprobabilityrecoveryofanp-dimensionalk-sparsesignalvectorand anm-dimensionalkΩ-sparseerrorvectorsimultaneously.
Theorem1isaresultofRIPlessanalysis,whichsharesthesamelimitationasallother RIPlessanalyses. Tobe specific,Theorem 1onlyholds forarbitrarybutfixed Y andS (exceptthattheelementsofYandShaveuniformrandomsignsbyAssumption4).Ifwe expecttohaveauniformrecovery guaranteehere(namely,consideringrandomsensing matricesaswellassignalanderrormatriceswithrandomsupports),thencertainstronger assumptionsmustbe madeonthesensingmatricessuchas theRIPcondition.
The proof of Theorem 1is based on the construction of an inexact dual certificate throughthegolfingscheme.Thegolfingschemewasfirstintroducedin[24] forlowrank matrix recovery. Subsequently, [26] and [27] refined and used the scheme to prove the lasso recovery guarantee. The work [19] generalized it to mix-norm recovery. In this paper, weconsider anew mix-normproblem,namely, summationofthe 2,1-norm and the1-norm.
2.3. Matrixconcentrationinequalities
Belowwegiveseveralprobabilitytoolsthatareusefulintheproofsofthepaper.We begin with two versions of Bernstein inequalities from [25]. The first one is a matrix Bernsteininequality.
Lemma1(MatrixBernstein inequality). Consider afinite sequenceof independentran- dommatrices{M(j)∈Rd×d}.Assumethat every random matrixsatisfies E
M(j)
= 0 andM(j)(2,2)≤B almostsurely. Define
σ2max
⎧⎪
⎨
⎪⎩
j
E
M(j)M(j)
(2,2)
,
j
E
M(j)M(j)
(2,2)
⎫⎪
⎬
⎪⎭.
Then, forallt≥0,wehave
P r
⎧⎪
⎨
⎪⎩
j
M(j) (2,2)
≥t
⎫⎪
⎬
⎪⎭≤2dexp
− t2/2 σ2+Bt/3
.
Wealsoneedavectorformof theBernstein inequality.
Lemma 2(Vector Bernstein inequality).Consider a finite sequenceof independent ran- dom vectors {g(j) ∈ Rd}. Assume that every random vector satisfies E
g(j)
= 0 and g(j)2≤B almostsurely. Defineσ2
kE g(j)22
.Then,forall0≤t≤σ2/B,we have
P r
⎛
⎝
j
g(j) 2
≥t
⎞
⎠≤exp
− t2 8σ2+1
4
.
Next, we use the matrix Bernstein inequality to prove its extension on a block anisotropicmatrix.
Lemma 3.Consider a matrix A(i) satisfying the model described in Section 2.1, and denoteA˜(i)=Σ−(i)1A(i)PΩ∗iA(i).Forany τ >0,itholds
P r PT
m m−kmax
A˜(i)Σ−1(i) −Σ−1(i)
PT
(2,2)
≥τ
!
≤2kTexp
−m−kmax
κkTμ
τ2 4(κ+2τ3 )
,
and
P r PT
m m−kmax
A˜(i)−I
PT
(2,2)
≥τ
!
≤2kTexp
−m−kmax
κkTμ
τ2 4(1 +2τ3)
.
We prove the first part in Appendix A, and the second part can be proved in a similar way. Two consequent corollaries of Lemma 3 show that the restriction of
m
m−kmaxBLKdiagA˜(1),· · ·,A˜(L)
tothecorresponding supportT isnearisometric.
Corollary 1.Denote A˜(i)=Σ−1(i)A(i)PΩ∗iA(i).Given kT ≤αμκlog(nL)m ,kmax≤γm, and
1−γ
α ≥64,then withprobability atleast1−2(nL)−2,wehave BLKdiag
"
PT
m m−kmax
A˜(1)−I
PT, · · ·, PT
m m−kmax
A˜(L)−I
PT
#
(2,2)
<1 2. (11) Furthermore,givenkT ≤αμκlogm2(nL),kmax≤γm,and 1−γα ≥64,withatleastthesame probability,we have
BLKdiag
"
PT
m
m−kmaxA˜(1)−I
PT, · · ·, PT
m
m−kmaxA˜(L)−I
PT
#
(2,2)
< 1 2$
log(nL). (12) Proof. First, following directlyfrom thefirst partof Lemma3, for alli = 1,· · ·,L, it holds
P r PT
m
m−kmaxA˜(i)−I
PT
(2,2)
≥τ
!
≤2kTexp
"
−m−kmax
kTμκ
τ2 4(1 +2τ3)
# . (13) Takingaunionboundoveralli= 1,· · ·,Lyields
P r BLKdiag
"
PT
m m−kmax
A˜(1)−I
PT, · · ·,
PT
m m−kmax
A˜(L)−I
PT
#
(2,2)
≥τ
!
=P r max
i PT
m m−kmax
A˜(i)−I
PT
(2,2)
!
≥τ
!
≤ L i=1
P r PT
m m−kmax
A˜(i)−I
PT
(2,2)
≥τ
!
≤2kTLexp
"
−m−kmax kTμκ
τ2 4(1 +2τ3)
#
. (14)
Plugginginτ= 12 andusingthefactthatkT ≤αμκlog(nL)m andkmax≤γm,weget
The last line of (14) = 2kTLexp
"
−3(1−γ)
64α log(nL)
#
= 2kTL(nL)−3(1−γ)64α ,
≤2kTL(nL)−3≤2(nL)−2
wherethefirstinequalityfollowsfrom 1−γα ≥64 andthesecondinequalityfollowsfrom kT ≤n.Similarly,plugging inτ = 1
2$
log(nL) and usingthefactthatkT ≤αμκlogm2(nL), weprove(12) aslongas 1−αγ ≥64. 2
Corollary 2.Given that kT ≤αμκlog(nL)m , kmax≤γm,and 1−γα ≥64, then with proba- bilityatleast 1−2(nL)−2,wehave
BLKdiag
"
PT
m
m−kmaxA˜(1)Σ−1(1)−Σ−1(1)
PT,
· · ·, PT
m m−kmax
A˜(L)Σ−(L)1 −Σ−(L)1
PT
#
(2,2)
< κ
2. (15)
Theproofisthesameasproving(11) usingLemma3.Weomitthedetailsforbrevity.
Finally, we havethe following lemma show thatif the supportof thecolumns inA(i) is restricted to Ω∗i,then no columnindexed insideT canbe well approximated by the columnindexedoutsideofT.Inotherwords,thosecolumnscorrespondtothetruesignal matrixshallbe welldistinguished.
Lemma 4 (Off-support incoherence). Denote A˜(i) = Σ−1(i)A(i)PΩ∗iA(i). Given kT ≤ αμκlog(nL)m andα < 241,withprobability atleast1−e14(nL)−2,wehave
i∈{1,···max,L},k∈Tc
PTA˜(i)ek
2≤1, (16)
where {ek}nk=1 isastandardbasisin Rn.
Theproof ofLemma4isgiveninAppendixB.
With particular note, inthe above lemmas and corollaries, all the requirements on theconstants α, β andγ satisfytheboundsinTheorem1.
3. Inexactdualcertificates
This sectiongivesthedualcertificatesoftheRGLmodel,namely,thesufficientcon- ditions under whichthe optimal solutionpair ofthe convex program (2)–(3) is unique andequaltothepairofthetruesignalanderrormatrices.Firstwehavetwopreliminary lemmas.
Lemma 5.Suppose that Y ∈ Rn×L and S ∈ Rm×L are the true group sparse signal and sparseerrormatrices,respectively.If (Y+H,S−F)isanoptimalsolution pairto (2)–(3),where H∈Rn×L andF∈Rm×L,thenthefollowingresultshold:
i)
A(1)h1,· · ·,A(L)hL
=F;
ii) Y+H2,1+λS−F1 ≥ Y2,1+λS1+PTcH2,1+λPΩcF1+V,H− λsgn(S),F,
where V ∈ Rn×L satisfies (PTV)i = y¯y¯ii2 and (PTcV)i = 0, ∀i = 1,· · ·,n. Here (PTV)i denotes thei-throw ofPTVand ¯yi denotesthei-th rowof Y.
Theproof ofLemma5isgiveninAppendixC.
Lemma 6.For any two matrices H ∈ Rn×L and F ∈ Rm×L, with probability at least 1−2n−2,H=0andF=0ifthefollowingconditions aresatisfied:
i) kT ≤αμκlogm2(nL),kmax≤γm,and 1−γα ≥64;
ii)
A(1)h1,· · ·,A(L)hL
=F;
iii) PTcH=0and PΩcF=0.
TheproofofLemma6isgiveninAppendixD.
Belowwe showthattheoptimal solutionpair( ˆY, S)ˆ of theconvexprogram (2)–(3) isequaltothetruesignalandnoisepair(Y,S)whencertaincertificateconditionshold.
Theorem 2 (Inexact duality).Suppose that Y ∈ Rn×L and S ∈ Rm×L are the true groupsparse signalandsparse errormatrices satisfyingtheassumptions inTheorem1.
The pair (Y, S) is the unique solution to the RGL model (2)–(3) with probability at least 1−(2+e14)(nL)−1, if the parameter λ < 1 and there exists a dual certificate (W,V)∈Rm×L×Rn×L suchthat
PTV−VF ≤ λ 4√
κ, (17)
PTcV2,∞≤ 1
4, (18)
PΩcW∞≤ λ
4, (19)
and V=
A(1)PΩc1w1,· · ·,A(L)PΩcLwL +λ
A(1)sgn(¯s1),· · ·,A(L)sgn(¯sL)
, (20) whereV∈Rn×L satisfies(PTV)i =¯yy¯ii2 and (PTcV)i=0.
Proof. Supposethat(Y+H,S−F) isanoptimalsolutionpairto(2)–(3).Therefore,the two resultsinLemma5hold true. Proving thatthepair (Y,S)is the uniquesolution to (2)–(3) is equivalentto showing thatH=0andF=0. Hence,the proofresortsto verifyingthethree conditionsinLemma6.
SinceYandSsatisfytheassumptionsinTheorem 1,wehave kT ≤α m
μκlog2(nL), kmax≤γm
κ, α≤ 1
9600, γ≤1 4.
Consideringκ≥1,weknowthatconditioni)inLemma6holds.Byresulti)ofLemma5, conditionii)alsoholds.Therefore,itremainstoverifyconditioniii), namely,PTcH=0 andPΩcF=0.
ConsiderthetermV,H−λsgn(S),Finresultii)ofLemma5.Usingtheequation V=PTV+PTcV,we rewritethetermas
V,H −λsgn(S),F=V− PTV,H − PTcV,H+V,H −λsgn(S),F. (21) ConsiderthetermV,H−λsgn(S),Fontherighthandsideof(21).By(20),wehave
V,H −λsgn(S),F
=%
A(1)PΩc1w1,· · ·,A(L)PΩcLwL ,H&
+λ%
A(1)sgn(¯s1),· · ·,A(L)sgn(¯sL) ,H&
−λsgn(S),F. Byadjointrelationsofinnerproducts, wehave
%
A(1)PΩc1w1,· · ·,A(L)PΩcLwL
,H
&
='
PΩc1A(1)h1,· · ·,PΩcLA(L)hL
,W( , and
A(1)sgn(¯s1),· · ·,A(L)sgn(¯sL) ,H
=sgn(S),
A(1)h1,· · ·,A(L)hL
.
Thus, itholds
V,H −λsgn(S),F
='
PΩc1A(1)h1,· · ·,PΩcLA(L)hL ,W( +λ'
sgn(S),
A(1)h1,· · ·,A(L)hL
(−λsgn(S),F.
Accordingto conclusioni)inLemma5,whichis
A(1)h1,· · ·,A(L)hL
=F,itfollows V,H −λsgn(S,F)=PΩcF,W.
Combining (21) andaboveequalitygives
V,H −λsgn(S),F=V− PTV,H − PTcV,H+PΩcF,W. (22) Next, we manage to find outa lower bound for the right-hand side of the equality (22).First,by(17),
V− PTV,H ≥ −PTV−VFPTHF ≥ − λ 4√
κPTHF.
Then,by(18),
−PTcV,H ≥ −PTcV2,∞PTcH2,1≥ −1
4PTcH2,1. Finally,by(19),
PΩcF,W ≥ −PΩcF1PΩcW∞≥ −λ
4PΩcF1. Therefore,(22) gives
V,H −λsgn(S),F ≥ − λ 4√
κPTHF−1
4PTcH2,1−λ
4PΩcF1. Substitutetheaboveinequalityinto conclusionii) ofLemma5gives
Y+H2,1+λS−F1
≥ Y2,1+λS1+3
4PTcH2,1+3λ
4 PΩcF1− λ 4√
κPTHF.
Since(Y+H,S−F) isanoptimalsolutionpairto(2)–(3),itfollowsthatY+H2,1+ λS−F1≤ Y2,1+λS1.Hence,wehave
3
4PTcH2,1+3λ
4PΩcF1− λ 4√
κPTHF ≤0. (23) Tocomplete theproof, we needto show thatinequality(23) impliesPTcH=0 and PΩcF=0.Todoso,wefirstderive anupperboundforPTHF,expressed asalinear combinationofPTcH2,1 andPΩcF1.
Using(11),itfollows BLKdiag
"
PT
m m−kmax
A˜(1)−I
PT,· · ·,PT
m m−kmax
A˜(L)−I
PT
#
vec(H) 2
≤ 1
2PTHF. Since BLKdiag{PT,· · ·,PT}vec(H)2 = PTHF, applying the triangle inequality yields
PTHF ≤2 m
m−kmax
BLKdiag
PTA˜(1)PT,· · ·,PTA˜(L)PT
vec(H) 2
.
Observingvec(PTH)= vec(H)−vec(PTcH) andusingthetriangleinequalityagain,we have