HAL Id: hal-00816377
https://hal.archives-ouvertes.fr/hal-00816377
Submitted on 22 Apr 2013
Robust Polyhedral Regularization
Samuel Vaiter, Gabriel Peyré, Jalal M. Fadili
To cite this version:
Samuel Vaiter, Gabriel Peyré, Jalal M. Fadili. Robust Polyhedral Regularization. International Conference on Sampling Theory and Applications (SampTA), 2013, Bremen, Germany. hal-00816377
Robust Polyhedral Regularization
Samuel Vaiter, Gabriel Peyré
CEREMADE, CNRS-Université Paris-Dauphine,
Place du Maréchal De Lattre De Tassigny, 75775 Paris Cedex 16, France.
Email: {vaiter,peyre}@ceremade.dauphine.fr
Jalal Fadili
GREYC, CNRS-ENSICAEN-Université de Caen,
6, Bd du Maréchal Juin, 14050 Caen Cedex, France.
Email: jalal.fadili@greyc.ensicaen.fr
Abstract—In this paper, we establish robustness to noise perturbations of polyhedral regularization of linear inverse problems. We provide a sufficient condition that ensures that the polyhedral face associated to the true vector is equal to that of the recovered one. This criterion also implies that the ℓ_2 recovery error is proportional to the noise level for a range of parameters. Our criterion is expressed in terms of the hyperplanes supporting the faces of the unit polyhedral ball of the regularization. This generalizes to an arbitrary polyhedral regularization results that are known to hold for sparse synthesis and analysis ℓ_1 regularizations, which are encompassed in this framework. As a byproduct, we obtain recovery guarantees for ℓ_∞ and ℓ_1−ℓ_∞ regularization.
I. INTRODUCTION
A. Polyhedral Regularization
We consider the following linear inverse problem

y = Φ x_0 + w,   (1)

where y ∈ R^Q are the observations, x_0 ∈ R^N is the unknown true vector to recover, w is the bounded noise, and Φ is a linear operator mapping the signal domain R^N into the observation domain R^Q. The goal is to recover x_0 either exactly or to a good approximation.
We call a polyhedron a subset P of R^N such that P = {x ∈ R^N | Ax ≤ b} for some A ∈ R^{N_H×N} and b ∈ R^{N_H}, where the inequality ≤ should be understood component-wise. This is a classical description of convex polyhedral sets in terms of the hyperplanes supporting their (N−1)-dimensional faces.
In the following, we consider polyhedral convex functions of the form

J_H(x) = max_{1≤i≤N_H} ⟨x, h_i⟩,

where H = (h_i)_{i=1}^{N_H} ∈ R^{N×N_H}. Thus, P_H = {x ∈ R^N | J_H(x) ≤ 1} is a polyhedron. We assume that P_H is a bounded polyhedron which contains 0 in its interior. This amounts to saying that J_H is a gauge, or equivalently that it is continuous, non-negative, sublinear (i.e. convex and positively homogeneous), coercive, and that J_H(x) > 0 for x ≠ 0. Note that it is in general not a norm because it need not be symmetric.
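Numerically, evaluating the gauge J_H amounts to a single matrix–vector product followed by a maximum. A minimal sketch assuming numpy; the function name J_H is ours, not from the paper:

```python
import numpy as np

def J_H(x, H):
    """Polyhedral gauge J_H(x) = max_{1<=i<=N_H} <x, h_i>, the h_i being the columns of H."""
    return float(np.max(H.T @ x))

# sanity check with the ell_inf gauge in R^2, i.e. H = [Id, -Id]
H = np.hstack([np.eye(2), -np.eye(2)])
print(J_H(np.array([3.0, -5.0]), H))  # -> 5.0
```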
In order to solve the linear inverse problem (1), we devise the following regularized problem

x^⋆ ∈ argmin_{x∈R^N} (1/2) ||y − Φx||^2 + λ J_H(x),   (P_λ(y))

where λ > 0 is the regularization parameter. Coercivity and convexity of J_H imply that the set of minimizers is non-empty, convex and compact.
In the noiseless case, w = 0, one usually considers the equality-constrained optimization problem

x^⋆ ∈ argmin_{Φx=y} J_H(x).   (P_0(y))
B. Relation to Sparsity and Anti-sparsity

Examples of polyhedral regularization include the ℓ_1-norm, analysis ℓ_1-norm and ℓ_∞-norm. The ℓ_1-norm reads

J_{H_1}(x) = ||x||_1 = Σ_{i=1}^N |x_i|.

It corresponds to choosing H_1 ∈ R^{N×2^N}, where the columns of H_1 enumerate all possible sign patterns of length N, i.e. {−1, 1}^N. The corresponding regularized problem (P_λ(y)) is the popular Lasso [1] or Basis Pursuit DeNoising [2]. It is used for recovering sparse vectors. Analysis-type sparsity-inducing penalties are obtained through the (semi-)norm J_H(x) = ||Lx||_1, where L ∈ R^{P×N} is an analysis operator. This corresponds to using H = L^* H_1, where ^* stands for the adjoint. A popular example is the anisotropic total variation, where L is a first-order finite difference operator.
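The sign-pattern construction of H_1 can be checked numerically: maximizing ⟨x, s⟩ over all sign vectors s recovers the ℓ_1-norm. A small sketch assuming numpy; the helper make_H1 is ours:

```python
import numpy as np
from itertools import product

def make_H1(N):
    """Columns of H_1 enumerate all sign patterns in {-1, +1}^N (2^N columns)."""
    return np.array(list(product([-1.0, 1.0], repeat=N))).T  # shape (N, 2^N)

H1 = make_H1(3)
x = np.array([1.5, -2.0, 0.5])
# J_{H_1}(x) = max_i <x, h_i> equals ||x||_1
assert np.isclose(np.max(H1.T @ x), np.abs(x).sum())
```

The construction is exponential in N, so it is only a conceptual device; the Lasso is of course never solved this way in practice.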
The ℓ_∞-norm

J_{H_∞}(x) = ||x||_∞ = max_{1≤i≤N} |x_i|

corresponds to choosing H_∞ = [Id_N, −Id_N] ∈ R^{N×2N}. This regularization, coined anti-sparse regularization, is used for instance for approximate nearest neighbor search [3].
Another possible instance of polyhedral regularization is the group ℓ_1−ℓ_∞ regularization. Let B be a partition of {1, . . . , N}. The ℓ_1−ℓ_∞ norm associated to this group structure is

J_{H_{B,∞}}(x) = Σ_{b∈B} ||x_b||_∞.

This amounts to choosing the block-diagonal matrix H_{B,∞} ∈ R^{N×∏_{b∈B} 2|b|} such that each column is obtained by taking, for each block, one position with sign ±1, the other entries being 0. If |b| = 1 for all b ∈ B, then we recover the ℓ_1-norm, whereas if the partition consists of a single block, we get the ℓ_∞-norm.
C. Prior Work

In the special case of ℓ_1 and analysis ℓ_1 penalties, our criterion is equivalent to those defined in [4] and [5]. To our knowledge, there is no generic guarantee for robustness to noise with ℓ_∞ regularization, but [6] studies robustness of a sub-class of polyhedral norms obtained by convex relaxation of combinatorial penalties. Its notion of support is however completely different from ours. The work [7] studies numerically some polyhedral regularizations. In [8], the authors provide a homotopy-like algorithm for polyhedral regularization through a continuous problem coined the adaptive inverse scale space method.

The work [9] analyzes some particular polyhedral regularizations in a noiseless compressed sensing setting when the matrix Φ is drawn from an appropriate random ensemble. Again in a compressed sensing scenario, the work [10] studies a subset of polyhedral regularizations to get sharp estimates of the number of measurements for exact and ℓ_2-stable recovery.
II. CONTRIBUTIONS

Definition 1. We define the H-support supp_H(x) of a vector x ∈ R^N to be the set

supp_H(x) = {i ∈ {1, . . . , N_H} | ⟨x, h_i⟩ = J_H(x)}.

This definition suggests that, to recover signals with H-support supp_H(x), it is reasonable to impose that Φ is invertible on the corresponding subspace Ker H_{supp_H(x)}^*. This is formalised in the following condition.
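In floating point, the H-support is best computed up to a tolerance. A minimal sketch assuming numpy; the function name supp_H and the tolerance are ours:

```python
import numpy as np

def supp_H(x, H, tol=1e-10):
    """Indices i with <x, h_i> = J_H(x), up to a numerical tolerance."""
    c = H.T @ x
    return np.flatnonzero(c >= c.max() - tol)

# with the ell_inf dictionary, both |x_1| and |x_2| attain the max below
H_inf = np.hstack([np.eye(3), -np.eye(3)])
print(supp_H(np.array([2.0, -2.0, 0.5]), H_inf))  # -> [0 4]
```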
Definition 2. An H-support I satisfies the restricted injectivity condition if

Ker Φ ∩ Ker H_I^* = {0},   (C_I)

where H_I is the matrix whose columns are those of H indexed by I. When it holds, we define the orthogonal projection Γ_I on Φ Ker H_I^*:

M_I = (U^* Φ^* Φ U)^{−1},   Γ_I = Φ U M_I U^* Φ^*,   Γ_I^⊥ = Id − Γ_I,

where U is (any) basis of Ker H_I^*. The symmetric bilinear form on R^Q induced by Γ_I^⊥ reads ⟨u, v⟩_{Γ_I^⊥} = ⟨u, Γ_I^⊥ v⟩, and we denote its associated quadratic form || · ||_{Γ_I^⊥}^2.
Definition 3. Let I be an H-support such that (C_I) holds. The Identifiability Criterion of I is

IC_H(I) = max_{z_I∈Ker H_I} min_{i∈I} (Φ̃_I^* Γ_I^⊥ Φ̃_I 𝟙_I + z_I)_i,

where 𝟙_I ∈ R^{|I|} is the vector with all coefficients equal to 1, and Φ̃_I = Φ H_I^{+,*} ∈ R^{Q×|I|}, where ^+ stands for the Moore–Penrose pseudo-inverse.

IC_H(I) can be computed by solving the linear program

IC_H(I) = max_{(r,z_I)∈R×R^{|I|}} r   subj. to   ∀i ∈ I, r ≤ (Φ̃_I^* Γ_I^⊥ Φ̃_I 𝟙_I + z_I)_i   and   H_I z_I = 0.
A. Noise Robustness
Our main contribution is the following result.
Theorem 1. Let x_0 ∈ R^N \ {0} and I its H-support such that (C_I) holds. Let y = Φ x_0 + w. Suppose that Φ̃_I 𝟙_I ≠ 0 and IC_H(I) > 0. Then there exist two constants c_I, c̃_I satisfying c_I ||w||_2 < c̃_I T, where

T = min_{j∈I^c} (J_H(x_0) − ⟨x_0, h_j⟩) > 0,

such that if λ is chosen according to

c_I ||w||_2 < λ < c̃_I T,

the vector x^⋆ ∈ R^N defined by

x^⋆ = µ H_I^{+,*} 𝟙_I + U M_I U^* Φ^* (y − µ Φ̃_I 𝟙_I),

where U is any basis of Ker H_I^* and

0 < µ = J_H(x_0) + (⟨Φ̃_I 𝟙_I, w⟩_{Γ_I^⊥} − λ) / ||Φ̃_I 𝟙_I||_{Γ_I^⊥}^2,   (2)

is the unique solution of (P_λ(y)), and x^⋆ lives on the same face as x_0, i.e. supp_H(x^⋆) = supp_H(x_0).
Observe that if λ is chosen proportional to the noise level, then ||x^⋆ − x_0||_2 = O(||w||_2). The following proposition proves that the condition IC_H(I) > 0 is almost a necessary condition to ensure the stability of the H-support. Its proof is omitted for obvious space limitation reasons.

Proposition 1. Let x_0 ∈ R^N \ {0} and I its H-support such that (C_I) holds. Let y = Φ x_0 + w. Suppose that Φ̃_I 𝟙_I ≠ 0 and IC_H(I) < 0. If ||w||/λ < 1/c_I, then for any solution x^⋆ of (P_λ(y)), we have supp_H(x_0) ≠ supp_H(x^⋆).
B. Noiseless Identifiability

When there is no noise, the following result, which is a straightforward consequence of Theorem 1, shows that the condition IC_H(I) > 0 implies signal identifiability.

Theorem 2. Let x_0 ∈ R^N \ {0} and I its H-support. Suppose that Φ̃_I 𝟙_I ≠ 0 and IC_H(I) > 0. Then the vector x_0 is the unique solution of (P_0(y)).
III. PROOFS

A. Preparatory Lemmata

We recall that the subdifferential of a convex function f at a point x is the set

∂f(x) = {g ∈ R^N | ∀y, f(y) ≥ f(x) + ⟨g, y − x⟩}.

The following lemma, which is a direct consequence of the properties of the max function, gives the subdifferential of the regularization function J_H.

Lemma 1. The subdifferential ∂J_H at x ∈ R^N reads

∂J_H(x) = H_I Σ_I,

where I = supp_H(x) and Σ_I is the canonical simplex on R^{|I|}:

Σ_I = {v_I ∈ R^{|I|} | v_I ≥ 0, ⟨v_I, 𝟙_I⟩ = 1}.
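Lemma 1 can be probed numerically: any convex combination of the active columns must satisfy the subgradient inequality at x. A sketch assuming numpy, using the ℓ_∞ gauge as an illustration (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
H = np.hstack([np.eye(3), -np.eye(3)])          # ell_inf gauge dictionary
J = lambda x: float(np.max(H.T @ x))
x = np.array([2.0, -2.0, 0.5])
I = np.flatnonzero(H.T @ x >= J(x) - 1e-10)     # H-support of x
v = rng.random(len(I)); v /= v.sum()            # a point of the simplex Sigma_I
g = H[:, I] @ v                                  # candidate subgradient in H_I Sigma_I
for _ in range(100):                             # check J(y) >= J(x) + <g, y - x>
    y = rng.standard_normal(3)
    assert J(y) >= J(x) + g @ (y - x) - 1e-12
```

The inequality holds exactly here because ⟨h_i, x⟩ = J(x) for every active i, so ⟨g, x⟩ = J(x) and J(y) ≥ ⟨g, y⟩ for the convex combination g.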
A point x^⋆ is a minimizer of min_x f(x) if, and only if, 0 ∈ ∂f(x^⋆). Thanks to Lemma 1, this gives the first-order condition for the problem (P_λ(y)).

Lemma 2. A vector x^⋆ is a solution of (P_λ(y)) if, and only if, there exists v_I ∈ Σ_I such that

Φ^* (Φ x^⋆ − y) + λ H_I v_I = 0,

where I = supp_H(x^⋆).
We now introduce the following so-called source condition.

(SC_x): For I = supp_H(x), there exist η and v_I ∈ Σ_I such that

Φ^* η = H_I v_I ∈ ∂J_H(x).

Under the source condition, a sufficient uniqueness condition can be derived when v_I lives in the relative interior of Σ_I, which is

ri Σ_I = {v_I ∈ R^{|I|} | v_I > 0, ⟨v_I, 𝟙_I⟩ = 1}.
Lemma 3. Let x^⋆ be a minimizer of (P_λ(y)) (resp. (P_0(y))) and I = supp_H(x^⋆). Assume that (SC_{x^⋆}) is verified with v_I ∈ ri Σ_I, and that (C_I) holds. Then x^⋆ is the unique solution of (P_λ(y)) (resp. (P_0(y))).

The proof of this lemma is omitted due to lack of space. Observe that in the noiseless case, if the assumptions of Lemma 3 hold at x_0, then the latter is exactly recovered by solving (P_0(y)).
Lemma 4. Let x^⋆ ∈ R^N and I = supp_H(x^⋆). Assume (C_I) holds. Let U be any basis of Ker H_I^*. There exists z_I ∈ Ker H_I such that

U^* Φ^* (Φ x^⋆ − y) = 0   and   v_I = z_I + (1/λ) H_I^+ Φ^* (y − Φ x^⋆) ∈ Σ_I,

if, and only if, x^⋆ is a solution of (P_λ(y)). Moreover, if v_I ∈ ri Σ_I, then x^⋆ is the unique solution of (P_λ(y)).
Proof: We compute, using H_I z_I = 0,

Φ^* (Φ x^⋆ − y) + λ H_I v_I = Φ^* (Φ x^⋆ − y) + λ H_I z_I + H_I H_I^+ Φ^* (y − Φ x^⋆)
= (Id − H_I H_I^+) Φ^* (Φ x^⋆ − y)
= proj_{Ker H_I^*} (Φ^* (Φ x^⋆ − y)) = 0,

where proj_{Ker H_I^*} is the projection on Ker H_I^*, and the last equality follows from U^* Φ^* (Φ x^⋆ − y) = 0. Hence, x^⋆ is a solution of (P_λ(y)). If v_I ∈ ri Σ_I, then according to Lemma 3, x^⋆ is the unique solution.
The following lemma is a simplified rewriting of the condition introduced in Lemma 4.

Lemma 5. Let x^⋆ ∈ R^N, I = supp_H(x^⋆) and µ = J_H(x^⋆). Assume (C_I) holds. Let U be any basis of Ker H_I^*. There exists z_I ∈ Ker H_I such that

v_I = z_I + (1/λ) Φ̃_I^* Γ_I^⊥ (y − µ Φ̃_I 𝟙_I) ∈ Σ_I,

if, and only if, x^⋆ is a solution of (P_λ(y)). Moreover, if v_I ∈ ri Σ_I, then x^⋆ is the unique solution of (P_λ(y)).
Proof: Note that any vector x ∈ R^N such that the condition (C_I) holds, where I is the H-support of x, can be written

x = µ H_I^{+,*} 𝟙_I + U α,   where µ = J_H(x),

for some coefficients α, with U any basis of Ker H_I^*. We obtain

U^* Φ^* (Φ x^⋆ − y) = µ U^* Φ^* Φ H_I^{+,*} 𝟙_I − U^* Φ^* y + U^* Φ^* Φ U α = 0.

Since (C_I) holds, we have

α = (U^* Φ^* Φ U)^{−1} U^* Φ^* (y − µ Φ̃_I 𝟙_I).

Hence,

Φ U α = Γ_I (y − µ Φ̃_I 𝟙_I).

Now, since x^⋆ = µ H_I^{+,*} 𝟙_I + U α, one has

Φ x^⋆ = µ Φ̃_I 𝟙_I + Γ_I (y − µ Φ̃_I 𝟙_I) = µ Γ_I^⊥ Φ̃_I 𝟙_I + Γ_I y.

Subtracting y from both sides and multiplying by Φ̃_I^*, and replacing in the expression of v_I in Lemma 4, we get the desired result.
B. Proof of Theorem 1

Let I be the H-support of x_0. We consider the restriction of (P_λ(y)) to the H-support I:

x^⋆ = argmin_{x∈R^N, supp_H(x)⊆I} (1/2) ||y − Φx||_2^2 + λ J_H(x).   (P_λ(y)_I)

Thanks to (C_I), the objective function is strongly convex on the set of signals of H-support I. Hence x^⋆ is uniquely defined. The proof is divided in four parts: we give (1.) an implicit form of x^⋆; we check (2.) that the H-support of x^⋆ is the same as the H-support of x_0; we provide (3.) the value of J_H(x^⋆); using Lemma 5, we prove (4.) that x^⋆ is the unique minimizer of (P_λ(y)).

1. Expression of x^⋆. One has x^⋆ = µ H_I^{+,*} 𝟙_I + U α, where µ = J_H(x^⋆). Hence,

U^* Φ^* (Φ x^⋆ − y) = µ U^* Φ^* Φ H_I^{+,*} 𝟙_I + (U^* Φ^* Φ U) α − U^* Φ^* y = 0.

Thus,

U α = U M_I U^* Φ^* (y − µ Φ H_I^{+,*} 𝟙_I).

Now, since y = Φ x_0 + w with supp_H(x_0) = I, and writing x_0 = µ_0 H_I^{+,*} 𝟙_I + U α_0 where µ_0 = J_H(x_0), we get

x^⋆ = µ H_I^{+,*} 𝟙_I + U M_I U^* Φ^* (y − µ Φ H_I^{+,*} 𝟙_I)
= µ H_I^{+,*} 𝟙_I + U α_0 + U M_I U^* Φ^* ((µ_0 − µ) Φ H_I^{+,*} 𝟙_I + w)
= x_0 − (µ_0 − µ) H_I^{+,*} 𝟙_I + U M_I U^* Φ^* ((µ_0 − µ) Φ H_I^{+,*} 𝟙_I + w).

Hence, x^⋆ satisfies

x^⋆ = x_0 + (µ_0 − µ) [U M_I U^* Φ^* Φ − Id] H_I^{+,*} 𝟙_I + U M_I U^* Φ^* w.   (3)

2. Checking that the H-support of x^⋆ is I. To ensure that the H-support of x^⋆ is I, we have to impose that

∀i ∈ I, ⟨h_i, x^⋆⟩ = J_H(x^⋆) = µ,
∀j ∈ I^c, ⟨h_j, x^⋆⟩ < J_H(x^⋆) = µ.

The components on I of x^⋆ satisfy H_I^* x^⋆ = µ 𝟙_I. Since J_H is subadditive, we bound the components on I^c by the triangle inequality applied to (3), to get

max_{j∈I^c} ⟨h_j, x^⋆⟩ ≤ max_{j∈I^c} ⟨h_j, x_0⟩ + (µ_0 − µ) ||H_{I^c}^* [U M_I U^* Φ^* Φ − Id] H_I^{+,*} 𝟙_I||_∞ + ||H_{I^c}^* U M_I U^* Φ^* w||_∞.

Denoting

C_1 = ||H_{I^c}^* [U M_I U^* Φ^* Φ − Id] H_I^{+,*} 𝟙_I||_∞,
C_2 = ||H_{I^c}^* U M_I U^* Φ^*||_{2,∞},
T = µ_0 − max_{j∈I^c} ⟨h_j, x_0⟩,

we bound the correlations outside the H-support by

max_{j∈I^c} ⟨h_j, x^⋆⟩ ≤ µ_0 − T + (µ_0 − µ) C_1 + C_2 ||w||.

There exist constants c_1, c_2 satisfying c_1 ||w|| < c_2 T + λ such that

0 ≤ µ_0 − T + (µ_0 − µ) C_1 + C_2 ||w|| < µ.   (4)

Under this condition, one has

max_{j∈I^c} ⟨h_j, x^⋆⟩ < µ,

which proves that supp_H(x^⋆) = I.
3. Value of µ = J_H(x^⋆). Using Lemma 5 with H = U^* H, since x^⋆ is a solution of (P_λ(y)_I), there exists z_I ∈ Ker H_I such that

v_I = z_I + (1/λ) Φ̃_I^* Γ_I^⊥ (y − µ Φ̃_I 𝟙_I) ∈ Σ_I.   (5)

We decompose x_0 as x_0 = µ_0 H_I^{+,*} 𝟙_I + U α_0. Since y = Φ x_0 + w, we have

Γ_I^⊥ y = Γ_I^⊥ (µ_0 Φ̃_I 𝟙_I + Φ U α_0 + w).

Now, since

Γ_I Φ U α_0 = Φ U (U^* Φ^* Φ U)^{−1} U^* Φ^* Φ U α_0 = Φ U α_0,

one obtains

Γ_I^⊥ y = µ_0 Γ_I^⊥ Φ̃_I 𝟙_I + Γ_I^⊥ w.

Thus, equation (5) equivalently reads

v_I = z_I + (1/λ) Φ̃_I^* Γ_I^⊥ ((µ_0 − µ) Φ̃_I 𝟙_I + w).

In particular, ⟨v_I, 𝟙_I⟩ = 1. Thus,

λ = ⟨λ v_I, 𝟙_I⟩ = ⟨λ z_I, 𝟙_I⟩ + ⟨Φ̃_I^* Γ_I^⊥ ((µ_0 − µ) Φ̃_I 𝟙_I + w), 𝟙_I⟩.

Since z_I ∈ Ker H_I, one has ⟨z_I, 𝟙_I⟩ = 0, hence

λ = ⟨Φ̃_I^* Γ_I^⊥ ((µ_0 − µ) Φ̃_I 𝟙_I + w), 𝟙_I⟩ = (µ_0 − µ) ||Φ̃_I 𝟙_I||_{Γ_I^⊥}^2 + ⟨Φ̃_I 𝟙_I, w⟩_{Γ_I^⊥}.

Thus the value of µ is given by

µ = µ_0 + (⟨Φ̃_I 𝟙_I, w⟩_{Γ_I^⊥} − λ) / ||Φ̃_I 𝟙_I||_{Γ_I^⊥}^2 > 0.   (6)
4. Checking the conditions of Lemma 5. Consider now the vector ṽ_I defined by

ṽ_I = z̃_I + (1/λ) Φ̃_I^* Γ_I^⊥ ((µ_0 − µ) Φ̃_I 𝟙_I + w),

where z̃_I = ((µ_0 − µ)/λ) z_I^⋆ and z_I^⋆ is a maximizer of the criterion defining IC_H(I), i.e.

z_I^⋆ ∈ argmax_{z_I∈Ker H_I} min_{i∈I} (Φ̃_I^* Γ_I^⊥ Φ̃_I 𝟙_I + z_I)_i.

Under condition (4), the H-support of x^⋆ is I, hence we only have to check that ṽ_I is an element of ri Σ_I. Since ⟨z̃_I, 𝟙_I⟩ = 0, one has

⟨ṽ_I, 𝟙_I⟩ = ⟨z_I + (1/λ) Φ̃_I^* Γ_I^⊥ ((µ_0 − µ) Φ̃_I 𝟙_I + w), 𝟙_I⟩ + ⟨z̃_I − z_I, 𝟙_I⟩ = ⟨v_I, 𝟙_I⟩ + 0 = 1.

Plugging the expression (6) of µ_0 − µ back into the definition of ṽ_I, one has

ṽ_I = z̃_I + (1/λ) Φ̃_I^* Γ_I^⊥ w + ((λ − ⟨Φ̃_I 𝟙_I, w⟩_{Γ_I^⊥}) / (λ ||Φ̃_I 𝟙_I||_{Γ_I^⊥}^2)) Φ̃_I^* Γ_I^⊥ Φ̃_I 𝟙_I.

For some constant c_3 such that λ IC_H(I) − c_3 ||w|| > 0, one has ṽ_i > 0 for all i ∈ I. Combining this with the fact that ⟨ṽ_I, 𝟙_I⟩ = 1 proves that ṽ_I ∈ ri Σ_I. According to Lemma 5, x^⋆ is the unique minimizer of (P_λ(y)).
C. Proof of Theorem 2

Taking w = 0 in Theorem 1, we immediately obtain:

Lemma 6. Let x_0 ∈ R^N \ {0} and I its H-support such that (C_I) holds. Let y = Φ x_0. Suppose that Φ̃_I 𝟙_I ≠ 0 and IC_H(I) > 0. Let T = min_{j∈I^c}