HAL Id: hal-01782346
https://hal.archives-ouvertes.fr/hal-01782346v4
Submitted on 3 Feb 2020
Semi-Linearized Proximal Alternating Minimization for a Discrete Mumford-Shah Model
Marion Foare, Nelly Pustelnik, Laurent Condat
To cite this version:
Marion Foare, Nelly Pustelnik, Laurent Condat. Semi-Linearized Proximal Alternating Minimization
for a Discrete Mumford-Shah Model. IEEE Transactions on Image Processing, Institute of Electrical
and Electronics Engineers, 2019, pp. 1-13. 10.1109/TIP.2019.2944561. hal-01782346v4
Semi-Linearized Proximal Alternating Minimization for a Discrete Mumford–Shah Model
Marion Foare, Nelly Pustelnik, and Laurent Condat
September 25, 2019
Abstract
The Mumford–Shah model is a standard model in image segmentation, and due to its difficulty, many approximations have been proposed. The major interest of this functional is to enable joint image restoration and contour detection. In this work, we propose a general formulation of the discrete counterpart of the Mumford–Shah functional, adapted to nonsmooth penalizations, fitting the assumptions required by the Proximal Alternating Linearized Minimization (PALM) algorithm, with convergence guarantees. A second contribution aims to relax some assumptions on the involved functionals and derive a novel Semi-Linearized Proximal Alternating Minimization (SL-PAM) algorithm, with proved convergence. We compare the performance of the algorithm with several nonsmooth penalizations, for Gaussian and Poisson denoising, image restoration and RGB-color denoising. We compare the results with state-of-the-art convex relaxations of the Mumford–Shah functional, and a discrete version of the Ambrosio–Tortorelli functional. We show that the SL-PAM algorithm is faster than the original PALM algorithm, and leads to competitive denoising, restoration and segmentation results.
Keywords – Segmentation, restoration, inverse problems, nonsmooth optimization, noncon- vex optimization, proximal algorithms, PALM, Mumford–Shah.
1 Introduction
The topic of inverse problems is of major interest for a large panel of applications, ranging from microscopy (see, e.g., [2, 3]) or tomography (see [4, 5, 6, 7] and the references therein) to atmospheric science and oceanography [8]. The pioneering regularization approaches to solve inverse problems
∗This work is supported by Defi Imag’In SIROCCO.
†Part of this work has been presented in [1]. In this extended paper, the added contributions are 1) a general discrete Mumford-Shah model, 2) the convergence proof of the algorithm proposed in [1] and more experimental results.
‡M. Foare (corresponding author) and N. Pustelnik are with Univ. Lyon, ENS de Lyon, Univ. Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France. L. Condat is with Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France.
can be traced back to the works by Tikhonov [9] and by Geman and Geman [10]. The major challenge of this topic consists in designing jointly a cost function and an algorithm (to estimate its minimum) in order to obtain a solution that is the closest to the original unknown one. The recent development of proximal algorithms [11, 12] led to significant advances, thanks to the possibility to efficiently deal with large-size data and nonsmooth objective functions (e.g., nonlocal total-variation constraints, analysis–synthesis formulations, Kullback–Leibler divergence) [13].
In this work, we focus on image restoration and we denote the multicomponent image to recover by u = (u_m)_{1≤m≤M} ∈ ℝ^{N×M}, where each column of u is the vectorized representation of the m-th component. The degradation model we consider takes the form:

(∀ m ∈ {1, …, M})   z_m = D_α(A_m u_m),   (1)

where A_m ∈ ℝ^{L×N} models a linear degradation (e.g., a blur, a compressed sensing matrix, a wrapping matrix) and D_α : ℝ^L → ℝ^L denotes a random degradation that can be white Gaussian noise, leading to an additive model, or Poisson noise. The objective of this work is to estimate jointly the restored image û and its contours, denoted by ê in the following, from the degraded data z.
One of the standard (variational) approaches to solving such an ill-posed inverse problem consists in dealing with a regularization, by minimizing a sum of functionals. The variational formulation of this problem, when white Gaussian noise is involved, reads:

û = argmin_u (1/(2σ²)) ‖Au − z‖₂² + ρ(u),   (2)

where ρ is a “well-chosen” regularizing functional, which allows us to denoise while preserving the discontinuities. Hence, it generally involves the gradient of the estimate. A classical choice is ρ(u) = ‖Du‖₀, where D models the finite difference operator and ‖·‖₀ is the ℓ₀ pseudo-norm, which is known as the L2-Potts model [14], or ρ(u) = TV(u), the Total Variation model [15], which is convex. However, these models are restricted to piecewise constant estimates, and do not integrate contour detection in the variational formulation, which is performed as a post-processing step. The main limitation of such a two-step procedure for contour detection is the difficulty of appropriately selecting the thresholding rule used for edge detection.
Mumford and Shah proposed to consider a more general regularizing term, depending on both the gradient of the estimate and the set of discontinuities [16]. The latter becomes an unknown variable in the problem. Since the Mumford–Shah (MS) formalism is generally formulated in a continuous setting, we denote by Ω ⊂ ℝ² the image domain. The MS model aims at estimating both û ∈ W^{1,2}(Ω)¹, a piecewise smooth approximation of an image z ∈ L^∞(Ω), and the set of discontinuities K ⊂ Ω, such that the pair (û, K) is an optimal solution of:

minimize_{u,K}  (1/2) ∫_Ω (u − z)² dxdy + β ∫_{Ω∖K} |∇u|² dxdy + λ|K|,   (3)

where the first term acts as a data-fidelity term and forces the approximation u to be close to z, the second term penalizes strong variations except at the locations K of the strong edges, and |K| denotes the total length of the arcs forming K; thus the minimization of this functional implies that |K| is small at a solution. Finally, β > 0 and λ > 0 denote regularization parameters controlling the smoothness and the length of K, respectively.

¹ W^{1,2}(Ω) = { u ∈ L²(Ω) : ∂u ∈ L²(Ω) }, where ∂ denotes the weak derivative operator.
Following discretization ideas proposed in the original paper of Mumford and Shah [16], we assume that u and z are functions on a lattice instead of functions on a two-dimensional region, and we denote them by u and z, respectively (referring to (1)). K models the path made up of lines between all pairs of adjacent lattice points where u has sharp transitions, as illustrated in Figure 1. In a discrete setting, K is thus replaced by the variable e ∈ ℝ^{|E|}, which encodes the edges between nodes (e.g., if the set of edges is limited to the horizontal and vertical edges between two pixels, then |E| = 2N − N₁ − N₂, where N = N₁ × N₂ is the size of the grid), and whose value is 1 when a contour change is detected, and 0 otherwise. A discrete counterpart of (3) can be written:

minimize_{u ∈ ℝ^{N×M}, e ∈ ℝ^{|E|}}  (1/2) ‖u − z‖₂² + β ‖(1 − e) ⊙ Du‖² + λR(e),   (4)

where D ∈ ℝ^{|E|×N} models a finite difference operator, ⊙ denotes the componentwise product, and R denotes a penalization term that favors sparse solutions, which is a discrete translation of “short |K|”. Note that there is no need to add additional constraints on e, since both (1 − e) and R(e) force it to stay between 0 and 1.
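To fix ideas, the discrete objective (4) can be evaluated directly; the following 1-D sketch (our own illustration, with ⊙ implemented as the NumPy elementwise product and a hypothetical forward-difference operator) shows that declaring an edge at a jump removes its contribution from the coupling term:

```python
import numpy as np

def dms_objective(u, e, z, D, beta, lam, R):
    """Discrete Mumford-Shah objective (4):
    0.5*||u - z||^2 + beta*||(1 - e) * (D u)||^2 + lam*R(e)."""
    data = 0.5 * np.sum((u - z) ** 2)
    coupling = beta * np.sum(((1.0 - e) * (D @ u)) ** 2)
    return data + coupling + lam * R(e)

# Toy example: N = 4 pixels, |E| = 3 forward differences.
N = 4
D = (np.eye(N, k=1) - np.eye(N))[:-1]          # 3 x 4 difference operator
z = np.array([0.0, 0.1, 1.0, 0.9])
u = z.copy()
e = np.array([0.0, 1.0, 0.0])                  # declare an edge at the jump
R = lambda e: np.sum(np.abs(e))                # l1 length term
print(dms_objective(u, e, z, D, beta=10.0, lam=0.1, R=R))
```

With e = 0 everywhere the large jump in Du would be heavily penalized; setting e = 1 at that position trades the coupling cost for the length cost λR(e).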
Related works. One of the most popular convex relaxations of the MS functional is the total variation functional (ROF, for Rudin–Osher–Fatemi) [15, 17], which favors piecewise constant results while preserving the discontinuities. Its ℓ₀ counterpart has been studied in [18, 19], leading to the L2-Potts formulation (ℓ₀-penalization on Du). The authors in [20] proposed two convex relaxations of the MS functional designed for discrete domains with continuous labels. As emphasized by the authors, proper convergence may be difficult to achieve for some parameterizations, and these two methods are not able to detect the contours. For these reasons, we do not consider them in further comparisons. The Chan–Vese model can also be considered as a relaxation of the MS model, whose main limitations are due to a prior label number and a piecewise constant estimate [21, 22]. In addition, Cai et al. [23, 24] have discussed the links between ROF minimizers [15] with a post-processing step of thresholding and the piecewise constant MS solutions. The proposed algorithm relies on updating iteratively the threshold from the ROF solution. Some approaches have been derived from the Blake–Zisserman model [25, 26]. The convergence to local minimizers is proved only in 1-D [25]. In addition to the weak convergence guarantees of the 2-D formulation [26], there is no flexibility in the choice of the data term, whose proximity operator should have a closed form expression. In the same spirit, the approach of Strekalovskiy et al. [27] relies on a truncated quadratic penalization of the gradient of the estimate. They derive a heuristic algorithm, based on a convex relaxation of the functional they propose, and extract the contours by thresholding. Recently, Li et al. [28] proposed a nonlocal ROF (NL-ROF) model, similar to the AT functional, where the gradient is computed in a weighted neighborhood. Convergence is proved, but the contours are obtained by post-processing the estimated image. The first author and her collaborators [29, 30] proposed a new formulation of the AT functional in the framework of Discrete Calculus. They obtain true 1-dimensional contours, but since they still have to deal with the ε parameter, their algorithm is particularly slow.
[Figure 1 here: lattice illustration with pixel values u₁, …, u₉, edge variables e_{1,2}, e_{1,4}, the discontinuity set K between the regions z = η₁ and z = η₂, and the detected edge set {ê = 1}.]

Figure 1: Continuous versus discrete formulations of the MS model (4). In the discrete setting, when D = [D_h^⊤, D_v^⊤]^⊤ models the concatenation of the horizontal and vertical difference operators, the values of D_h u (resp. D_v u) live on the horizontal (resp. vertical) midgrid, and so does e. K and {ê = 1} are delineated in red.
Contributions and outline. In order to jointly identify the edges and restore the image, our contributions are 1) to define a theoretical framework building the bridge between a discrete version of the Mumford–Shah model, called D-MS, and the objective function handled by the Proximal Alternating Linearized Minimization (PALM) algorithm [31]; 2) to provide a new algorithmic scheme, called Semi-Linearized Proximal Alternating Minimization (SL-PAM), which combines one step of PAM [32] with one step of PALM [31], relaxes the condition on a stepsize parameter, and comes with convergence guarantees. The convergence proof is derived. The efficiency of the proposed algorithmic scheme is illustrated on several restoration examples: Gaussian and Poisson denoising, color denoising and image restoration. Comparisons to state-of-the-art approaches are performed on the color denoising example.

Our general D-MS model is defined in Section 2. PALM formulated to solve D-MS is presented in Section 3.1, together with the additional assumptions on the D-MS objective function that ensure convergence. The proposed SL-PAM is derived in Section 3.2. Experiments and comparisons are provided in Section 4.
2 Generalized Discrete Mumford–Shah Model
The Discrete-MS (D-MS) model proposed in this work is expressed as follows.
Problem 1. Let z ∈ ℝ^{LM} and A ∈ ℝ^{LM×NM}. Let L(A·, z) : ℝ^{NM} → (−∞, +∞] be a fidelity term to the data z, and R : ℝ^{|E|} → (−∞, +∞] be a regularization term which enforces sparsity and acts as a length term, both being proper and lower semicontinuous functions. Let S : ℝ^{|E|} × ℝ^{NM} → ℝ be the coupling term, which penalizes strong variations except at edges; it is assumed to be a C¹ function such that ∇S is Lipschitz continuous on bounded subsets of ℝ^{|E|} × ℝ^{NM}. The general D-MS-like problem we aim to solve reads:

minimize_{u ∈ ℝ^{NM}, e ∈ ℝ^{|E|}}  Ψ(u, e) := L(Au, z) + β S(e, u) + λR(e).   (5)
The possibilities of this problem compared to the state-of-the-art formulations are:

• the generalization of the data term, allowing us to deal with a linear degradation, and not restricted to the Euclidean norm as in the original MS model. For instance, L(Au, z) = Σ_m ‖A_m u_m − z_m‖² is suited to data corrupted by both a linear degradation and white Gaussian noise [33, 34, 35]. A choice L(u, z) = Σ_m ‖u_m − z_m‖₁ fits data degraded with impulse noise [35], while the choice of the Kullback–Leibler divergence L(u, z) = Σ_m D_KL(u_m, z_m) is employed for data corrupted by Poisson noise [36, 37].
• the possibility to deal with a large panel of regularization terms R. One of the most popular choices of R encountered in the literature is R_AT(e) = ε‖De‖₂² + (1/(4ε))‖e‖₂², with ε > 0 and D being a difference operator, proposed by Ambrosio and Tortorelli [38, 39]. Such a contour penalization makes (4) Γ-converge to the MS functional as ε tends to 0. As a matter of fact, large values of ε lead to thick contours but help to detect the set of discontinuities. Then, as ε tends to 0, the penalization of ‖e‖₂² increases and enforces e to become sparser and sparser, and thus contours become thinner and thinner. Numerically, however, it is not possible for ε to be arbitrarily small, since it controls the thickness of the contours.
• the flexibility in the coupling term S. Since the MS functional is originally designed with an L²-penalization of the gradient of u, a common choice for the coupling term is S(e, u) = Σ_m ‖(1 − e) ⊙ Du_m‖₂² [39, 40], where D is defined as in (4). However, in [41], Shah proposed to replace the L²-norm with a coupling term involving the L¹-norm, such as S(e, u) = Σ_m ‖(1 − e) ⊙ (1 − e) ⊙ Du_m‖₁, combined with the AT regularizer. Alicandro et al. [42] proved the Γ-convergence of this particular functional to a variant of the MS functional, involving the Cantor part of Du. Experiments show that this ROF-like coupling term is more robust to image gradients, but eliminates high-frequency content. More recently, Li et al. [28] suggested to set e_p = {e_p^(q)}_{q∈B}, where B is a box centered at the pixel p, as weights of the dissimilarity D_p^(q) u_m = u_{m,p} − u_{m,p+q}. The regularization functional is thus a nonlocal ROF (NL-ROF) of the form S(e, u) = Σ_m Σ_{p∈E} Σ_{q∈B} e_p^(q) (D_p^(q) u_m)². In this approach, the contours are not obtained from e but by thresholding u, leading to a less accurate estimation.
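The data-fidelity choices listed in the first item above can be evaluated directly; here is a sketch (our own illustration; the KL convention below, Σ u − z + z log(z/u), is one common choice and an assumption on our part):

```python
import numpy as np

def gaussian_fid(Au, z):
    """L(Au, z) = ||Au - z||^2: linear degradation + white Gaussian noise."""
    return float(np.sum((Au - z) ** 2))

def l1_fid(u, z):
    """L(u, z) = ||u - z||_1: suited to impulse noise."""
    return float(np.sum(np.abs(u - z)))

def kl_fid(u, z, eps=1e-12):
    """Kullback-Leibler divergence D_KL(u, z) for Poisson noise;
    eps guards the logarithm at zero."""
    u = np.maximum(u, eps)
    return float(np.sum(u - z + z * np.log((z + eps) / u)))

z = np.array([2.0, 4.0, 1.0])
print(gaussian_fid(z, z), l1_fid(z, z))  # -> 0.0 0.0 -- all terms vanish at u = z
```

All three are nonnegative and vanish when the estimate matches the data, but they differ in their prox and in which noise statistics they model.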
We can remark that, when dealing with multivariate images, contours can be defined either as edges shared by all the components, or as distinct edges, leading to a path K that may differ between components. In order to ease the reading, we formulate Problem 1 in the context of shared edges. But Problem 1, as well as the following results, can be similarly derived for distinct edges by considering e ∈ ℝ^{|E|×M}. When M = 1, both formalisms are equivalent.
3 Algorithms
In order to solve Problem 1, we propose two algorithmic strategies. The first one relies on the PALM algorithm [31], and requires additional assumptions on the functions involved in order to ensure convergence guarantees. The second is an alternative to PALM, which we call SL-PAM, allowing us to relax some of the assumptions made on the coupling term.
3.1 PALM for D-MS
The following Algorithm 1, which is an instance of the generic algorithm PALM [31], is tailored to solving Problem 1:

Algorithm 1 (PALM) for solving D-MS (5)
  Set u^[0] ∈ ℝ^{NM} and e^[0] ∈ ℝ^{|E|}.
  For k ∈ ℕ:
    Set γ > 1 and c_k = γ ν(e^[k]).
    u^[k+1] ∈ prox_{(1/c_k) L(A·,z)} ( u^[k] − (1/c_k) ∇_u S(e^[k], u^[k]) )
    Set δ > 1 and d_k = δ ε(u^[k+1]).
    e^[k+1] ∈ prox_{(1/d_k) λR} ( e^[k] − (1/d_k) ∇_e S(e^[k], u^[k+1]) )

It consists in updating alternately the image u^[k] and the edges e^[k] by means of proximity operator steps, defined as

(∀ x ∈ ℝ^N)  prox_f(x) = argmin_{y ∈ ℝ^N} (1/2) ‖y − x‖₂² + f(y),   (6)

where f : ℝ^N → (−∞, +∞] denotes a proper and lower semicontinuous function. Algorithm 1 converges under some assumptions, listed in the following proposition:
Proposition 1. The sequence (u^[k], e^[k])_{k∈ℕ} generated by Algorithm 1 converges to a critical point of Problem 1 if
i) the updating steps of u^[k+1] and e^[k+1] have closed form expressions;
ii) the sequence (u^[k], e^[k])_{k∈ℕ} generated by Algorithm 1 is bounded;
iii) L(A·, z), R and Ψ(·, ·) are bounded below;
iv) Ψ is a Kurdyka–Łojasiewicz function [31, Definition 2.3];
v) ∇_u S and ∇_e S are globally Lipschitz continuous with moduli ν(e) and ε(u) respectively, and, for all k ∈ ℕ, ν(e^[k]) and ε(u^[k]) are bounded by positive constants.
Proof. The form of Problem 1 and the assumptions in Proposition 1 fit the requirements for convergence of the PALM algorithm described in [31, Assumptions A-B, Theorem 3.1].
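In practice, condition i) is met by many classical penalizations. For instance, the proximity operators (6) of the ℓ₁-norm (soft thresholding) and of the ℓ₀ pseudo-norm (hard thresholding), both used later in this paper, have well-known closed forms; a quick sketch of our own:

```python
import numpy as np

def prox_l1(x, tau):
    """prox of tau*||.||_1: componentwise soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_l0(x, tau):
    """prox of tau*||.||_0: hard thresholding (keep x_i with |x_i| > sqrt(2*tau))."""
    return np.where(np.abs(x) > np.sqrt(2.0 * tau), x, 0.0)

x = np.array([-2.0, 0.3, 1.5])
print(prox_l1(x, 0.5))  # -> [-1.5  0.   1. ]
print(prox_l0(x, 0.5))  # -> [-2.  0.   1.5]
```

Soft thresholding shrinks every component, while hard thresholding keeps large components untouched; both set small components exactly to zero, which is what "enforces sparsity" means for R.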
From the practical point of view, the major challenge regarding the assumptions in Proposition 1 is to ensure that L (resp. R) has a closed form expression for the associated proximity operator. A large number of functions having a closed form expression of their proximal maps is listed in [12, 43, 44], going from ℓ_p-norms to gamma divergences. The main difficulty is due to the linear operator A. Indeed, the proximity operator of a function composed with a linear operator has a closed form expression if

• L(·, z) = (1/2)‖· − z‖₂² and I + γA*A is easily invertible [43], leading to (∀ γ > 0)(∀ u ∈ ℝ^{NM}),

prox_{γL(A·,z)}(u) = (I + γA*A)^{−1}(u + γA*z);   (7)

• A models a frame (or a semi-orthogonal) linear operator [13], i.e. A*A = μI with μ > 0, taking the form (∀ γ > 0)(∀ u ∈ ℝ^{NM}),

prox_{γL(A·,z)}(u) = u + μ^{−1} A*( prox_{γμL(·,z)}(Au) − Au ).   (8)

Moreover, assumption ii) in Proposition 1 holds in several scenarios, such as when the functions L(A·, z) and R have bounded level sets. The reader may refer to [32, Remark 5] and [31, Remark 3.4] for more details about this boundedness assumption.
3.2 Proposed SL-PAM
We propose an alternative to PALM, where the update of u^[k+1] exploits the linearization and where the update of e^[k+1] relies on the proximity operator of the function βS(·, u^[k+1]) + λR. The resulting Semi-Linearized PAM (SL-PAM) is described in Algorithm 2; it does not require ε(u^[k]) to be bounded and allows us to choose larger d_k.

Algorithm 2 (SL-PAM) for solving D-MS (5)
  Set u^[0] ∈ ℝ^{NM} and e^[0] ∈ ℝ^{|E|}.
  For k ∈ ℕ:
    Set γ > 1 and c_k = γ ν(e^[k]).
    u^[k+1] ∈ prox_{(1/c_k) L(A·,z)} ( u^[k] − (1/c_k) ∇_u S(e^[k], u^[k]) )
    Set d_k > 0.
    e^[k+1] ∈ prox_{(1/d_k)(λR + βS(·, u^[k+1]))} ( e^[k] )

The convergence of Algorithm 2 is ensured under Assumption 1.
Assumption 1.
i) The updating steps of u^[k+1] and e^[k+1] have closed form expressions;
ii) Ψ is a Kurdyka–Łojasiewicz function;
iii) L(A·, z), R and Ψ are bounded below;
iv) ∇_u S is globally Lipschitz continuous with moduli ν(e^[k]), k ∈ ℕ, and there exist ν₋, ν₊ > 0 such that ν₋ ≤ ν(e^[k]) ≤ ν₊;
v) (d_k)_{k∈ℕ} is a positive sequence such that the stepsizes d_k belong to (d₋, d₊), for some positive d₋ ≤ d₊.
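As an illustration of Algorithm 2, the sketch below instantiates SL-PAM for 1-D Gaussian denoising with L(u, z) = (1/2)‖u − z‖₂², S(e, u) = ‖(1 − e) ⊙ Du‖₂² and R the ℓ₁-norm. It is our own simplified re-implementation, not the authors' reference code: the step-size heuristics and constants are illustrative, and the e-update uses the componentwise closed form discussed in Section 3.3.

```python
import numpy as np

def sl_pam_denoise(z, beta, lam, n_iter=200, gamma=1.01):
    """SL-PAM (Algorithm 2) sketch for 1-D Gaussian denoising."""
    n = len(z)
    D = (np.eye(n, k=1) - np.eye(n))[:-1]                 # forward differences, (n-1) x n
    c = gamma * 2.0 * beta * np.linalg.norm(D, 2) ** 2     # c_k >= gamma * Lipschitz modulus of grad_u S
    d = 1e-3 * c                                           # SL-PAM tolerates a small d_k
    u, e = z.copy(), np.zeros(n - 1)
    for _ in range(n_iter):
        # u-update: gradient step on beta*S(e, .) then prox of (1/c)*0.5||. - z||^2
        grad_u = 2.0 * beta * D.T @ (((1.0 - e) ** 2) * (D @ u))
        u = (c * (u - grad_u / c) + z) / (c + 1.0)
        # e-update: exact prox of (1/d)*(lam*||.||_1 + beta*S(., u)), componentwise
        w = 2.0 * beta * (D @ u) ** 2
        center = (w + d * e) / (w + d)
        e = np.sign(center) * np.maximum(np.abs(center) - lam / (w + d), 0.0)
    return u, e

z = np.concatenate([np.zeros(20), np.ones(20)])   # clean step signal
u, e = sl_pam_denoise(z, beta=5.0, lam=0.1)
print(int(np.argmax(e)))  # index of the strongest edge (the jump sits between samples 19 and 20)
```

Because the e-update is an exact prox rather than a linearized step, d_k can be taken much smaller than the PALM bound without breaking the descent property, which is precisely the point of Proposition 2 below.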
Proposition 2. Suppose that Assumption 1 holds and that the sequence {x^[k]}_{k∈ℕ} = {(u^[k], e^[k])}_{k∈ℕ} generated by Algorithm 2 is bounded. Then
i) Σ_{k=1}^∞ ‖x^[k+1] − x^[k]‖ < ∞;
ii) {x^[k]}_{k∈ℕ} converges to a critical point (u*, e*) of Ψ.

The proof relies on the general proof recipe given in [31], divided into three main steps: (i) sufficient decrease property, (ii) subgradient lower bound for the iterate gap, and (iii) Kurdyka–Łojasiewicz property. These three steps are detailed thereafter, where we set x^[k] = (u^[k], e^[k]). Moreover, Assumptions 1.i), 1.ii) and 1.iii) are discussed at the beginning of each experimental section (IV.B, IV.D). In addition, we provide comments in Section III.B.3) on the KL assumption, and in Section III.C on the closed form expressions of the proximity operators, while the validity of Assumption 1.iv) is ensured by the definition of S provided in Section IV.A.
3.2.1 Sufficient decrease property
The objective is to find ρ₁ > 0 such that

(∀ k ∈ ℕ)  (ρ₁/2) ‖x^[k+1] − x^[k]‖² ≤ Ψ(x^[k]) − Ψ(x^[k+1]).   (9)

This result relies on the following lemma.

Lemma 1. Let {x^[k]}_{k∈ℕ} be a sequence generated by Algorithm 2. Then
i) the sequence {Ψ(x^[k])}_{k∈ℕ} is nonincreasing; in particular,
(∀ k ∈ ℕ)  (ρ₁/2) ‖x^[k+1] − x^[k]‖² ≤ Ψ(x^[k]) − Ψ(x^[k+1]),  where ρ₁ = min{(γ − 1)ν₋, d₋};
ii) Σ_{k=0}^∞ ‖x^[k+1] − x^[k]‖² < ∞ and lim_{k→∞} ‖x^[k+1] − x^[k]‖ = 0.
The proof is given in Appendix 6.1.
3.2.2 A subgradient lower bound for the iterate gap
This step relies on Lemma 2.
Lemma 2. Assume that the sequence {x^[k]}_{k∈ℕ} generated by Algorithm 2 is bounded. Define

A_k^u := c_{k−1}(u^[k−1] − u^[k]) + ∇_u S(e^[k], u^[k]) − ∇_u S(e^[k−1], u^[k−1]),   (10)
A_k^e := d_{k−1}(e^[k−1] − e^[k]).   (11)

Then (A_k^u, A_k^e) ∈ ∂Ψ(u^[k], e^[k]) and there exists M > 0 such that

‖(A_k^u, A_k^e)‖ ≤ ‖A_k^u‖ + ‖A_k^e‖ ≤ 2(M + ρ₂) ‖x^[k−1] − x^[k]‖,   (12)

where ρ₂ = γν₊ + d₊.
The proof is given in Appendix 6.2.
3.2.3 Kurdyka- Lojasiewicz property
This step relies on the assumption that Ψ is a Kurdyka–Łojasiewicz (KL) function, and proves that the minimizing sequence {x^[k]}_{k∈ℕ} is a Cauchy sequence. According to [31, Theorem 5.1], if Ψ : ℝ^{NM} × ℝ^{|E|} → ℝ is a proper, lower semicontinuous (l.s.c.), and semi-algebraic function, then it satisfies the KL property at any point of dom Ψ. The proof of this step is the same as for [31, Lemma 3.6].
3.3 Additional Comments on Assumption 1-i)
The conditions to obtain a closed form expression for the update of u^[k+1] are similar to the ones detailed in Section 3.1. The tedious part concerns the update of e^[k+1], for which a closed form expression is provided in Proposition 3 for specific choices of S and R.

Proposition 3. Let D ∈ ℝ^{|E|×N}. For every (u, e) ∈ ℝ^{NM} × ℝ^{|E|}, we assume that

S(e, u) = ‖(1 − e) ⊙ Du‖₂²,   (13)

and that R is a separable function such that

(∀ e = (e_i)_{1≤i≤|E|})  R(e) = Σ_{i=1}^{|E|} σ_i(e_i),   (14)

where σ_i : ℝ → (−∞, +∞] has a proximity operator with a closed form expression. At iteration k ∈ ℕ, with d_k > 0, β > 0 and λ > 0, the updating step on e^[k+1] in Algorithm 2 is equivalent to, for all i ∈ {1, …, |E|},

e_i^[k+1] ∈ prox_{λσ_i / (2β(Du^[k+1])_i² + d_k)} ( (2β(Du^[k+1])_i² + d_k e_i^[k]) / (2β(Du^[k+1])_i² + d_k) ).   (15)
The proof is given in Appendix 6.4.
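The update (15) can also be checked numerically: the sketch below (our own illustration) implements it for σ_i = |·|, whose prox is soft thresholding, and the test compares it against a brute-force minimization of the per-component objective.

```python
import numpy as np

def e_update(e_prev, Du, beta, lam, d_k, prox_sigma):
    """Edge update (15): for each i, prox of lam*sigma_i with step
    1/(2*beta*(Du)_i^2 + d_k), applied at the weighted center point."""
    w = 2.0 * beta * Du ** 2
    center = (w + d_k * e_prev) / (w + d_k)
    return prox_sigma(center, lam / (w + d_k))

# sigma_i = |.|  ->  prox is soft thresholding.
soft = lambda x, tau: np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

e_prev = np.array([0.2, 0.0])
Du = np.array([1.0, 0.05])      # one strong and one weak local variation
e_new = e_update(e_prev, Du, beta=5.0, lam=0.1, d_k=0.5, prox_sigma=soft)
print(np.round(e_new, 3))       # edge kept where |Du| is large, suppressed elsewhere
```

The center point is a convex combination of 1 and e_i^[k], weighted by the local contrast 2β(Du)_i²: strong variations pull e_i toward 1, and the soft threshold then kills the weak ones.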
4 Experiments
Based on the results derived in the previous section, it is now possible to provide efficient algorithmic schemes in order to deal with D-MS, with the possibility of having R modeling a nonsmooth penalization. To the best of our knowledge, this has never been proposed before.
4.1 Specific Choice of D-MS

In our experiments we suggest to choose

S(e, u) = ‖(1 − e) ⊙ Du‖₂²,   (16)

which is C¹ and has Lipschitz continuous gradients. The regularization term is chosen as

R(e) = Σ_{i=1}^{|E|} max{ |e_i|^p, |e_i|^q/(4ε) },   (17)

where ε > 0, p ≥ 0 and q > 0, whose particular cases are:

• the ℓ₀ pseudo-norm when p = 0 and ε → ∞;
• the ℓ₁-norm when p = 1 and ε → ∞;
• the quadratic-ℓ₁ penalization, p = 1, q = 2 and 0 < ε < 1, derived in [1], which aims to model the quadratic behavior of (1/(4ε))‖·‖₂² for small ε and to enforce sparsity.

This function is bounded below, proper, l.s.c., separable, and semi-algebraic (see [31, Example 5.3]). The associated proximity operator of the quadratic-ℓ₁ penalization is:

Proposition 4. For every η ∈ ℝ,

prox_{τ max{|·|, (·)²/(4ε)}}(η) = sign(η) max{ 0, min[ |η| − τ, max( 4ε, |η|/(τ/(2ε) + 1) ) ] }.
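The closed form of Proposition 4 can be verified against a brute-force minimization; a sketch of our own:

```python
import numpy as np

def prox_quad_l1(eta, tau, eps):
    """Prox of tau*max{|.|, (.)^2/(4*eps)} (Proposition 4): soft thresholding
    for moderate inputs, scaled shrinkage beyond the switch point 4*eps."""
    return np.sign(eta) * np.maximum(
        0.0,
        np.minimum(np.abs(eta) - tau,
                   np.maximum(4.0 * eps,
                              np.abs(eta) / (tau / (2.0 * eps) + 1.0))))

# Three regimes: soft thresholding, boundary 4*eps, scaled shrinkage.
print(prox_quad_l1(np.array([0.5, 1.2, 5.0]), tau=0.3, eps=0.2))
```

Small inputs are soft thresholded (sparsity), large inputs are merely shrunk (quadratic behavior), and the penalty is continuous at the switch |η| = 4ε; this is exactly the trade-off the quadratic-ℓ₁ penalization is designed for.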
The choice of L ( · , z) will be dependent on the restoration problem considered and it will be given in each subsection.
4.2 Gray-Scale White Gaussian Noise Denoising
Experimental setting – For this first set of experiments, we assume that D_α models white Gaussian noise with standard deviation α > 0. In the context of gray-scale denoising, M = 1, L = N and A₁ ≡ I_N. As commonly done in image restoration when Gaussian noise is involved, the data term is a squared Euclidean norm, i.e. L(u, z) = (1/2)‖u − z‖₂², which is bounded below, proper, l.s.c. and semi-algebraic [31]. By definition of Ψ, and since a finite sum of semi-algebraic functions is semi-algebraic, we deduce that Ψ satisfies Assumptions 1-i), ii), iii). In addition, this particular choice of A implies that L(A·, z) satisfies the boundedness assumption in Proposition 2.
Let us consider the ground truth image in Figure 2 (left), where the contours are obtained
by binarization and computation of the gradients. In this section, we evaluate the denoising and
contour detection performances obtained with the proposed D-MS performed with Algorithm 2,
when the input corresponds to Figure 2 with an additive white Gaussian noise of standard deviation α ∈ { 0.04, 0.16 } .
Regarding the algorithm step sizes, we set c_k and d_k constant. We first compute ν(e^[k]), assuming that e is not equal to 1 everywhere. This assumption is not restrictive in general, since it means that we do not have contours everywhere. We have ν(e^[k]) = β‖1 − e^[k]‖_∞² ‖D‖² ≤ β‖D‖², where the upper bound is attained when e^[k] ≡ 0. Hence, we choose, for both PALM and SL-PAM, c_k ≡ 1.01 · β‖D‖². On the other hand, ε(u^[k+1]) = β(1 − e)‖Du^[k+1]‖² ≤ β(1 − e)‖D‖² ‖u^[k+1]‖². If we normalize z, then, for all k ∈ ℕ, ‖u^[k]‖² ≤ 1. Thus we set, for PALM, d_k ≡ 1.01 · β‖D‖². Finally, for SL-PAM, we set d_k ≡ 1.01 · β‖D‖² · 10⁻³. This choice will be discussed below (see Figure 5).
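These constant step sizes only require the squared spectral norm of D, computed once; a sketch (illustrative, for a hypothetical 1-D forward-difference operator, with the γ = 1.01 factor and the 10⁻³ scaling following the settings described above):

```python
import numpy as np

def difference_operator_1d(n):
    """Forward-difference matrix D of size (n-1) x n."""
    return (np.eye(n, k=1) - np.eye(n))[:-1]

def stepsizes(D, beta, gamma=1.01):
    """Constant step sizes: c_k = gamma*beta*||D||^2 for both algorithms,
    and d_k = 1e-3 * c_k for SL-PAM."""
    normD2 = np.linalg.norm(D, 2) ** 2     # squared spectral norm of D
    c_k = gamma * beta * normD2
    return c_k, 1e-3 * c_k

D = difference_operator_1d(64)
c_k, d_k = stepsizes(D, beta=10.0)
print(np.linalg.norm(D, 2) ** 2)  # approaches 4 as n grows, for 1-D forward differences
```

For the 2-D operator D = [D_h^⊤, D_v^⊤]^⊤ the spectral norm can likewise be bounded once offline, so neither algorithm needs any per-iteration norm computation.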
We compare the results obtained with various regularization terms: the ℓ₀ pseudo-norm, the ℓ₁-norm, and the quadratic-ℓ₁ penalization.
Figure 2: Ground truth image of size 256 × 256 with real contours.
Performance evaluation – The restoration performance is evaluated in terms of signal-to-noise ratio (SNR) and Structural Similarity Index (SSIM) [45], while the contour detection performance is evaluated using the Jaccard index [46] (also known as intersection over union). A grid search is performed for each score, for β varying in [1, 50] and λ varying in [0.0001, 0.9]. The resulting scores are summarized in a map such as the one displayed in Figure 3. We perform the experiments on a 3.2 GHz Intel Core i5 CPU, and stop when |Ψ(u^[k+1], e^[k+1]) − Ψ(u^[k], e^[k])| < 10⁻⁴. The best performance with the quadratic-ℓ₁ penalization, according to each measure (SNR, SSIM and Jaccard index), is summarized in Table 1. It is displayed in Figure 6 for α = 0.04 and in Figure 7 for α = 0.16. From Figures 6 and 7, we first observe that the SNR and the SSIM lead to similar denoising and segmentation results. For small α, they do not allow us to extract a sparse 1-dimensional contour, while the result obtained from the Jaccard index provides the best denoising and segmentation result. However, for strong noise, the SNR and the SSIM both outperform the Jaccard index for denoising purposes, with satisfying denoising and contour detection results.
Choice of R – From Tables 1 and 2, we notice that the best performances are obtained using either the ℓ₁-norm or the quadratic-ℓ₁ penalization. Since the latter provides the best segmentation results, we propose in the sequel to use the quadratic-ℓ₁ penalization together with the SSIM or the Jaccard index, depending on the noise level.
Sensitivity to the initialization – We propose to evaluate the robustness of the proposed algorithmic scheme with respect to the initialization. We compare different choices for u^[0]: u^[0] = z, u^[0] ∼ N(0, I_N) and u^[0] ≡ ζ_u ∈ [min(z), max(z)]. Similarly, we propose to deal with either e^[0] ≡ 0_{|E|} or 1_{|E|}, e^[0] ∼ B(0.5), or e^[0] ≡ ζ_e ∈ (0, 1). We show the mean convergence results over 10 realizations in Figure 4, and we observe that the best initialization pair for Gaussian denoising is (u^[0], e^[0]) = (z, 1_{|E|}). Notice that, whatever the initialization, all the runs converge to the same value, which leads to a robust estimation, despite the resolution of a nonconvex problem.
[Figure 3 here: score map with λ (contour penalization) on one axis and β (smoothing) on the other, with four example results a, b, c, d.]

Figure 3: Example of score map with corresponding results on the right. Contours are delineated in red. The red circle on the map represents the best score. We can observe that a larger β leads to a smoother estimate and that a larger value of λ implies fewer contours.
Table 1: SL-PAM performances according to the SNR, the SSIM, and the Jaccard index.

  α = 0.04         ℓ₀       ℓ₁       quadratic-ℓ₁
  SNR              39.67    41.79    41.57
  SSIM             0.991    0.995    0.994
  Jaccard index    0.994    0.995    0.995

  α = 0.16         ℓ₀       ℓ₁       quadratic-ℓ₁
  SNR              30.19    30.6     30.58
  SSIM             0.929    0.933    0.935
  Jaccard index    0.989    0.993    0.993

Table 2: SL-PAM computational times.

  α = 0.04         ℓ₀       ℓ₁       quadratic-ℓ₁
  SNR              0.46s    1.98s    3.24s
  SSIM             0.93s    1.74s    3.24s
  Jaccard index    0.06s    0.09s    0.49s

  α = 0.16         ℓ₀       ℓ₁       quadratic-ℓ₁
  SNR              0.91s    1.67s    4.50s
  SSIM             0.83s    1.54s    2.86s
  Jaccard index    0.15s    0.25s    0.59s
SL-PAM versus PALM – We now compare in Figure 5(a) the performance of the PALM Algorithm 1 to that of our SL-PAM Algorithm 2, with decreasing d_k ∈ 1.01 · β‖D‖² · {1, 10⁻¹, 10⁻², 10⁻³}, and a quadratic-ℓ₁ regularization. We first notice that PALM and SL-PAM converge to the same minimum. In particular, they converge the same way when the descent parameters c_k and d_k are chosen identically for both of them. Nonetheless, SL-PAM outperforms the PALM algorithm for d_k set such that δ < 1. Figure 5(b) shows a visual comparison of the evolution of the contours of PALM versus SL-PAM with the lowest d_k with respect to the number n of iterations. We observe that SL-PAM has converged for n = 100, while PALM requires a hundred times more iterations to reach the same result.
[Figure 4 here: convergence curves for the initializations e^[0] ≡ 0_{|E|}, e^[0] ≡ 1_{|E|}, e^[0] ≡ ζ_e ∈ (0, 1) and e^[0] ∼ B(0.5).]

Figure 4: Performance of SL-PAM with different initial values of u^[0] and e^[0], when the input is the image in Fig. 2, degraded by additive white Gaussian noise with standard deviation α = 0.04. SL-PAM is not sensitive to the initialization.
4.3 Color Denoising and Comparisons with State-of-the-Art Methods
In this section, we propose to perform RGB color image denoising involving white Gaussian noise. In this case, we consider u = (u_R, u_G, u_B), M = 3 and e ∈ ℝ^{|E|} common to the three components of u. We compare the proposed method with state-of-the-art approaches, including “ROF” minimization [23, 24], the “MS relaxation” proposed in [27], the “Discrete AT” formulation [29] and the “NL-ROF” [28], although the latter is not designed for RGB color images. Since the ROF minimization and the NL-ROF do not allow us to directly extract the contours, we compute them by thresholding the gradient of the estimate, following [23, 24]. We performed the comparisons using 70 images randomly chosen from the BSDS500 database, which provides both ground truth images and contours for up to five hundred natural images [47]. We present in Figure 8 the results for a selection of 8 images from this database. As discussed in Section 4.2, we display the best denoising results according to the SNR, and the best contour reconstruction according to the Jaccard index. In addition, we summarize in Figure 9 the resulting scores for the whole experiment, for T-ROF, the MS relaxation and the proposed D-MS. Since the T-NL-ROF and the Discrete AT algorithms are very slow, we only show their results for the images presented in Figure 8. For the sake of clarity, the x-axis for each score is sorted in ascending order according to the proposed D-MS method (solid green line).
From Figures 8 and 9, the conclusions are:
• Figure 9(a) shows that the proposed method leads to very good performance in terms of SNR. It outperforms the state-of-the-art methods in about 91% of cases for T-ROF, 98% for the MS relaxation, 71% for T-NL-ROF, and 72% for Discrete AT;
• in terms of SSIM, we can observe in Figure 9(b) that the proposed approach mainly improves the results for difficult configurations (SSIM values lower than 0.86);
Figure 5: Comparison of PALM (blue curve) and SL-PAM (black dotted curve); the input data is the image in Fig. 2, degraded by additive white Gaussian noise. (a) Convergence rates, with c_k chosen identically for both algorithms, and decreasing d_k for SL-PAM, with white Gaussian noise standard deviation α = 0.04 (left) and α = 0.16 (right). (b) Evolution of the contours with respect to the number of iterations (N = 10, 100, 1000, 10000) for the experiment with α = 0.04 ((a), left).
• regarding the visual denoising performances, we can observe different types of reconstructions and artifacts from one method to another. T-ROF leads to the well-known staircasing effects, T-NL-ROF slightly (over-)smooths these effects, while the three other methods (the MS relaxation, the Discrete AT and the proposed method) provide very close visual results, with smooth areas and sharp transitions;
• Figure 9(c) shows that the D-MS leads to a slightly better contour extraction than T-ROF (resp. MS relaxation) in about two thirds (resp. 75%) of all cases, while it clearly improves the results of Discrete AT and T-NL-ROF. Note that D-MS is designed to perform both contour extraction and restoration, while T-ROF requires a post-processing thresholding step [23, 24], and the proposed approach has convergence guarantees, while the MS relaxation does not;
• although T-ROF leads to a better Jaccard index in one third of cases, a visual comparison of the contour detection results in Figure 8 reveals the superiority of the proposed method, in particular for RGB color images. The D-MS allows us to extract true 1-dimensional contours, similar to those obtained with the Discrete AT (but at the price of a huge computational cost) and with the MS relaxation (which relies on a very efficient algorithm, but without convergence guarantees).
Sometimes, one method outperforms the other ones. In particular, the MS relaxation produces a better result for the “sea” image. For the “china” image, the Jaccard scores are very similar:
the MS relaxation is visually closer to the ground truth (but simplified) contour, while the D-MS and the Discrete AT recover a lot of additional thin details, consistent with the actual gradients in this image.
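As a side note, the two scalar scores used throughout this section are straightforward to compute. The sketch below, in Python with NumPy, is an illustration under our own conventions (not the authors' evaluation code): it computes the SNR in dB between a reference image and an estimate, and the Jaccard index between two contour maps, which are assumed here to be binary arrays of the same shape.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference image and an estimate."""
    err = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(np.sum(reference ** 2) / err)

def jaccard(contour_a, contour_b):
    """Jaccard index |A intersect B| / |A union B| of two binary contour maps."""
    a = contour_a.astype(bool)
    b = contour_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both maps empty: perfect agreement by convention
    return np.logical_and(a, b).sum() / union
```

The Jaccard index equals 1 for a perfect contour match and decreases toward 0 as the two sets of edge pixels diverge, which is why it is well suited to scoring the extracted 1-dimensional contours.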
Figure 6: Denoising with SL-PAM and the quadratic-ℓ1 penalization on the image in Fig. 2, degraded by additive white Gaussian noise with standard deviation α = 0.04. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 30.61 / 41.57 / 41.57 / 37.48 dB; SSIM = 0.681 / 0.994 / 0.994 / 0.944; Jaccard = 0.741 / 0.741 / 0.995; Time = 3.24 / 3.24 / 0.49 s.

We summarize in Table 3 the computational times corresponding to the results displayed in Figure 8. We observe that the proposed SL-PAM algorithm is almost ten times faster than T-ROF, and that it far outperforms the T-NL-ROF and the Discrete AT approaches. The MS relaxation is the fastest method, but only because of its GPU-based implementation.
4.4 Poisson Denoising and Image Restoration
Since Problem 1 allows us to deal with more complex data fidelity terms, we propose here to illustrate the results obtained when (i) data are corrupted by Poisson noise and (ii) data are degraded by both a blur and Gaussian noise. Since the experiments in Section 4 showed that the proposed approach outperforms the T-ROF minimization, we do not present T-ROF results in the following.
Poisson denoising – The choice of the Kullback–Leibler divergence L(u, z) = Σ_m D_KL(u_m, z_m) fits data corrupted by Poisson noise [48, 36, 37]. This data term is bounded from below and l.s.c.; thus Ψ satisfies Assumption 1-i), ii), iii).
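For illustration, this data term and its proximity operator can be sketched in Python with NumPy. This is a hedged sketch, not the authors' code: we use the generalized Kullback–Leibler divergence D_KL(u, z) = Σ (u − z + z log(z/u)) (one standard convention, with 0·log 0 = 0); the exact convention in the paper may differ. Its proximity operator admits the well-known closed form given by the positive root of a quadratic.

```python
import numpy as np

def kl_divergence(u, z):
    """Generalized KL divergence sum(u - z + z*log(z/u)),
    with the convention 0*log(0) = 0; u must be positive where z > 0."""
    u = np.asarray(u, dtype=float)
    z = np.asarray(z, dtype=float)
    out = u - z
    mask = z > 0
    out[mask] += z[mask] * np.log(z[mask] / u[mask])
    return out.sum()

def prox_kl(v, z, gamma):
    """Proximity operator of gamma * D_KL(., z), applied entrywise:
    the positive root of u**2 + (gamma - v)*u - gamma*z = 0."""
    t = v - gamma
    return 0.5 * (t + np.sqrt(t * t + 4.0 * gamma * z))
```

One can check the prox numerically through its optimality condition (u − v) + γ(1 − z/u) = 0, obtained by differentiating (1/2)(u − v)² + γ(u − z log u).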
We first consider the image in Figure 2 corrupted by Poisson noise with parameter α = 100. The best results according to the (SNR, SSIM, Jaccard index) scores using the quadratic-ℓ1 regularization are shown in Figure 10. In Figure 11, we present the Poisson denoising results on a real image with the quadratic-ℓ1 regularization. The performances are comparable with Gaussian denoising, with a higher computational time due to the use of the divergence.

Figure 7: Denoising with SL-PAM and the quadratic-ℓ1 penalization on the image in Fig. 2, degraded by additive white Gaussian noise with standard deviation α = 0.16. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 18.68 / 30.58 / 30.43 / 27.45 dB; SSIM = 0.173 / 0.928 / 0.935 / 0.568; Jaccard = 0.992 / 0.993 / 0.993; Time = 4.5 / 2.86 / 0.59 s.
Image restoration – We propose to discuss the potential of the SL-PAM algorithm for image restoration tasks. In the presence of blur, the data fidelity term depends on the blur matrix A, and reads L(Au, z) = (1/2)‖Au − z‖²_2. In our experiments, we consider a Gaussian blur of size Q × Q with standard deviation σ, and additive white Gaussian noise with standard deviation α. This type of degradation allows us to ensure the boundedness assumption in Proposition 2. Figure 12 displays the restoration results on the image in Figure 2, when α = 0.2 and Q = 7. Restoration results on a real image are presented in Figure 13, with α = 0.2 and Q = 7. Except for the best result according to the SNR, we observe that the method is able to detect sharp contours and to recover thin structures.
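To make this restoration setting concrete, the sketch below (Python with NumPy; an illustration under our own assumptions, not the authors' implementation) builds a Gaussian blur operator A as an FFT-based circular convolution and evaluates the data term (1/2)‖Au − z‖² together with its gradient Aᵀ(Au − z), the quantity needed by the gradient step on u. Circular boundary conditions are an assumption made here for simplicity.

```python
import numpy as np

def gaussian_kernel(q, sigma):
    """Q x Q Gaussian kernel, normalized to sum to 1."""
    r = np.arange(q) - (q - 1) / 2
    g = np.exp(-r**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def blur_operator(shape, q, sigma):
    """Return A and A^T as FFT-based circular convolutions with the kernel."""
    kernel = gaussian_kernel(q, sigma)
    pad = np.zeros(shape)
    pad[:q, :q] = kernel
    # center the kernel at the origin so the blur does not shift the image
    pad = np.roll(pad, shift=(-(q // 2), -(q // 2)), axis=(0, 1))
    K = np.fft.fft2(pad)
    A = lambda u: np.real(np.fft.ifft2(K * np.fft.fft2(u)))
    At = lambda u: np.real(np.fft.ifft2(np.conj(K) * np.fft.fft2(u)))
    return A, At

def data_term_and_grad(u, z, A, At):
    """L(Au, z) = 0.5 * ||Au - z||^2 and its gradient A^T (Au - z)."""
    residual = A(u) - z
    return 0.5 * np.sum(residual**2), At(residual)
```

A quick sanity check is to verify the gradient against finite differences, and the adjoint against the identity ⟨Au, v⟩ = ⟨u, Aᵀv⟩.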
Figure 8: Comparison, according to the SNR and the Jaccard index, of denoising and contour detection performances, involving white Gaussian noise with standard deviation α = 0.05, with state-of-the-art methods, from top to bottom: data, given contour, T-ROF [23, 24], T-NL-ROF [28] (not computed, "NC", for the last four images), the Discrete AT [29], the MS relaxation [27], and the proposed method (D-MS). Test images, from left to right: polar bear, birds, sea, starfish, china, prayer, bells, runners, soldier.
Table 3: Computational times (in seconds) of the experiments illustrated in Figure 8.
polar bear birds sea starfish china prayer bells runners soldier
(321×481) (321×481) (321×481) (321×481) (321×481) (481×321) (321×481) (481×321) (481×321)
T-ROF [23, 24] 4.07 3.33 3.86 3.48 4.14 16.55 17.30 14.95 17.85
T-NL-ROF [28] 4601 4547 4500 4489 4507 - - - -
Discrete AT [29] 539 548 544 536 535 592 617 597 595
MS relaxation [27] 0.35 0.11 0.13 0.11 0.10 0.36 0.24 0.25 0.31
D-MS (SL-PAM) 1.49 1.04 0.32 1.93 0.23 0.21 0.25 0.43 0.39
Figure 9: Comparison of the (a) SNR, (b) SSIM and (c) Jaccard index on 70 images from the BSDS500 database [47] for Gaussian denoising and contour detection tasks, with state-of-the-art methods: T-ROF [23, 24], T-NL-ROF [28], the Discrete AT [29], the MS relaxation [27], and the proposed method (D-MS). The x-axis for each score is sorted in ascending order according to the proposed D-MS method (solid green line).
5 Conclusion
In this work, we propose 1) a new discrete formulation of the MS functional, and 2) a new proximal algorithm, with proven convergence, to solve it. The major interest of the MS formalism is its ability to (i) restore a degraded image and (ii) extract its contours. In terms of restoration, on a large database, we showed that the proposed method outperforms the state-of-the-art ones, including T-ROF [23, 24], the MS relaxation [27], the Discrete AT [29] and the T-NL-ROF [28]. Regarding contour detection, the results are very close to those obtained with the MS relaxation (which is a fast and accurate method, but without convergence guarantees) and with the Discrete AT (which has a huge computational cost). We also studied the influence of the choice of the regularization parameters with respect to different performance measures.
6 Appendix
6.1 Proof of Lemma 1
(i) Let k ≥ 0. Applying [31, Lemma 3.2] with h = βS(e^{[k]}, ·), σ = L and t = c_k, we obtain:

βS(e^{[k]}, u^{[k+1]}) + L(u^{[k+1]})
  ≤ βS(e^{[k]}, u^{[k]}) + L(u^{[k]}) − (1/2)(c_k − ν(e^{[k]})) ‖u^{[k+1]} − u^{[k]}‖²   (18)
  ≤ βS(e^{[k]}, u^{[k]}) + L(u^{[k]}) − (1/2)(γ_k − 1) ν(e^{[k]}) ‖u^{[k+1]} − u^{[k]}‖²   (19)
Figure 10: Denoising with the quadratic-ℓ1 penalization with SL-PAM on the image in Fig. 2 degraded by Poisson noise with parameter α = 100. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 10.20 / 34.32 / 33.84 / 31.46 dB; SSIM = 0.222 / 0.942 / 0.972 / 0.741; Jaccard = 0.951 / 0.960 / 0.995; Time = 22.47 / 41.89 / 5.04 s.
with c_k = γ_k ν(e^{[k]}). On the other hand, the update of e^{[k+1]} can be written

e^{[k+1]} ∈ argmin_e  (d_k/2) ‖e − e^{[k]}‖²_2 + λ R(e) + β S(e, u^{[k+1]}),   (20)

leading to

λ R(e^{[k+1]}) + β S(e^{[k+1]}, u^{[k+1]}) + (d_k/2) ‖e^{[k+1]} − e^{[k]}‖² ≤ λ R(e^{[k]}) + β S(e^{[k]}, u^{[k+1]}).   (21)

Hence, combining (19) and (21), we get

Ψ(x^{[k]}) − Ψ(x^{[k+1]})   (22)
  = L(u^{[k]}) + β S(e^{[k]}, u^{[k]}) + λ R(e^{[k]}) − L(u^{[k+1]}) − β S(e^{[k+1]}, u^{[k+1]}) − λ R(e^{[k+1]})   (23)
  ≥ (1/2)(γ_k − 1) ν(e^{[k]}) ‖u^{[k+1]} − u^{[k]}‖² + (d_k/2) ‖e^{[k+1]} − e^{[k]}‖²
    + L(u^{[k+1]}) + β S(e^{[k]}, u^{[k+1]}) + λ R(e^{[k]}) − L(u^{[k+1]}) − β S(e^{[k]}, u^{[k+1]}) − λ R(e^{[k]})   (24)
  ≥ (ρ_1/2) ‖x^{[k+1]} − x^{[k]}‖².   (25)
Combined with Assumptions 1-iv), v), it proves the result.
(ii) Since Ψ is bounded from below, (Ψ(x^{[k]}))_{k∈N} converges to some Ψ̲ ∈ R. Let now N ∈ N*. It follows from (i) that
Figure 11: Denoising with the quadratic-ℓ1 penalization with SL-PAM on a muscle image degraded by Poisson noise with parameter α = 100. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 11.13 / 22.89 / 22.89 / 22.72 dB; SSIM = 0.538 / 0.748 / 0.748 / 0.720; Jaccard = 0.940 / 0.940 / 0.945; Time = 6.16 / 6.16 / 10.99 s.
Σ_{k=0}^{N−1} ‖x^{[k+1]} − x^{[k]}‖² ≤ (2/ρ_1) (Ψ(x^{[0]}) − Ψ(x^{[N]}))   (26)
  ≤ (2/ρ_1) (Ψ(x^{[0]}) − Ψ̲) < ∞.   (27)

We conclude by taking the limit as N → ∞.
6.2 Proof of Lemma 2
Writing down the optimality conditions for the iterative steps of Algorithm 2, we get:

β ∇_u S(e^{[k−1]}, u^{[k−1]}) + c_{k−1} (u^{[k]} − u^{[k−1]}) + υ^{[k]} = 0,   (28)

where υ^{[k]} ∈ ∂L(u^{[k]}), and

β ∇_e S(e^{[k]}, u^{[k]}) + d_{k−1} (e^{[k]} − e^{[k−1]}) + ξ^{[k]} = 0,   (29)

where ξ^{[k]} ∈ ∂(λ R(e^{[k]})).
The subdifferential property [31, Proposition 2.1] allows us to state that β ∇_u S(e^{[k]}, u^{[k]}) + υ^{[k]} ∈ ∂_u Ψ(u^{[k]}, e^{[k]}) and β ∇_e S(e^{[k]}, u^{[k]}) + ξ^{[k]} ∈ ∂_e Ψ(u^{[k]}, e^{[k]}), and hence (A_k^u, A_k^e) ∈ ∂Ψ(u^{[k]}, e^{[k]}). Combining Assumption 1-iii) with the assumption of Lipschitz continuity of ∇S, and following the arguments in [31, Lemma 3.4], we can prove that there exists M > 0 such that

‖A_k^u‖ ≤ (2M + γ_k ν_+) ‖x^{[k]} − x^{[k−1]}‖.   (30)
Figure 12: Image restoration with SL-PAM and the quadratic-ℓ1 penalization on the image in Fig. 2 degraded by additive white Gaussian noise with standard deviation α = 0.2, and a Gaussian blurring filter of size 7 × 7 with standard deviation σ = 2. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 23.90 / 24.72 / 24.04 / 24.12 dB; SSIM = 0.059 / 0.765 / 0.941 / 0.860; Jaccard = 0.985 / 0.984 / 0.987; Time = 3.5 / 17.54 / 8.67 s.
On the other hand,

‖A_k^e‖ = d_{k−1} ‖e^{[k−1]} − e^{[k]}‖ ≤ d_+ ‖x^{[k]} − x^{[k−1]}‖.   (31)

Summing up (30) and (31), we obtain the desired result with ρ_2 = 2M + γ_k ν_+ + d_+.
6.3 Proof of Proposition 4
Let η ∈ R. One has:

prox_{τ max{|·|, (·)²/4}}(η) = argmin_x (1/2)(x − η)² + τ max{|x|, x²/4}.   (32)

One must split into cases:

• If |x| ≤ 4, then max{|x|, x²/4} = |x| and we have:

prox_{τ max{|·|, (·)²/4}}(η) = argmin_x (1/2)(x − η)² + τ|x|   (33)
  = prox_{τ|·|}(η)   (34)
  = sign(η) max(0, |η| − τ)   (35)

when |η| ≤ 4 + τ.
Figure 13: Image restoration with SL-PAM and the quadratic-ℓ1 penalization on the image in Fig. 2 degraded by additive white Gaussian noise with standard deviation α = 0.2 and a Gaussian blurring filter of size 7 × 7 with standard deviation σ = 2. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 18.10 / 19.47 / 19.34 / 19.34 dB; SSIM = 0.394 / 0.578 / 0.589 / 0.573; Jaccard = 0.818 / 0.847 / 0.858; Time = 150.2 / 42.1 / 121.9 s.
• If |x| > 4, then max{|x|, x²/4} = x²/4, and:

prox_{τ max{|·|, (·)²/4}}(η) = argmin_x (1/2)(x − η)² + τ x²/4   (36)
  = prox_{(τ/4)(·)²}(η)   (37)
  = sign(η) |η| / (τ/2 + 1)   (38)

when |η| > 4 + 2τ. Finally, we obtain

prox_{τ max{|·|, (·)²/4}}(η) =
  sign(η) max(0, |η| − τ)    if |η| < 4 + τ,
  4 sign(η)                  if 4 + τ ≤ |η| ≤ 4 + 2τ,
  sign(η) |η| / (τ/2 + 1)    if |η| > 4 + 2τ.