HAL Id: hal-01782346
https://hal.archives-ouvertes.fr/hal-01782346v4
Submitted on 3 Feb 2020
Semi-Linearized Proximal Alternating Minimization for a Discrete Mumford-Shah Model
Marion Foare, Nelly Pustelnik, Laurent Condat
To cite this version:
Marion Foare, Nelly Pustelnik, Laurent Condat. Semi-Linearized Proximal Alternating Minimization
for a Discrete Mumford-Shah Model. IEEE Transactions on Image Processing, Institute of Electrical
and Electronics Engineers, 2019, pp. 1-13. 10.1109/TIP.2019.2944561. hal-01782346v4
Semi-Linearized Proximal Alternating Minimization for a Discrete Mumford–Shah Model
Marion Foare, Nelly Pustelnik, and Laurent Condat
September 25, 2019
Abstract
The Mumford–Shah model is a standard model in image segmentation, and due to its difficulty, many approximations have been proposed. The major interest of this functional is to enable joint image restoration and contour detection. In this work, we propose a general formulation of the discrete counterpart of the Mumford–Shah functional, adapted to nonsmooth penalizations, fitting the assumptions required by the Proximal Alternating Linearized Minimization (PALM) algorithm, with convergence guarantees. A second contribution aims to relax some assumptions on the involved functionals and derive a novel Semi-Linearized Proximal Alternating Minimization (SL-PAM) algorithm, with proved convergence. We compare the performance of the algorithm with several nonsmooth penalizations, for Gaussian and Poisson denoising, image restoration and RGB-color denoising. We compare the results with state-of-the-art convex relaxations of the Mumford–Shah functional, and a discrete version of the Ambrosio–Tortorelli functional. We show that the SL-PAM algorithm is faster than the original PALM algorithm, and leads to competitive denoising, restoration and segmentation results.
Keywords – Segmentation, restoration, inverse problems, nonsmooth optimization, noncon- vex optimization, proximal algorithms, PALM, Mumford–Shah.
1 Introduction
The topic of inverse problems is of major interest for a large panel of applications, ranging from microscopy (see, e.g., [2, 3]) or tomography (see [4, 5, 6, 7] and the references therein) to atmospheric science and oceanography [8]. The pioneering regularization approaches to solve inverse problems
∗This work is supported by Defi Imag’In SIROCCO.
†Part of this work has been presented in [1]. In this extended paper, the added contributions are 1) a general discrete Mumford-Shah model, 2) the convergence proof of the algorithm proposed in [1] and more experimental results.
‡M. Foare (corresponding author) and N. Pustelnik are with Univ. Lyon, ENS de Lyon, Univ. Lyon 1, CNRS, Laboratoire de Physique, F-69342 Lyon, France. L. Condat is with Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000 Grenoble, France.
can be traced back to the works by Tikhonov [9] and by Geman and Geman [10]. The major challenge of this topic consists in designing jointly a cost function and an algorithm (to estimate its minimum) in order to obtain a solution that is the closest to the original unknown one. The recent development of proximal algorithms [11, 12] led to significant advances, thanks to the possibility to efficiently deal with large-size data and nonsmooth objective functions (e.g., nonlocal total-variation constraints, analysis–synthesis formulations, Kullback–Leibler divergence) [13].
In this work, we focus on image restoration and we denote the multicomponent image to recover by u = (u_m)_{1≤m≤M} ∈ ℝ^{N×M}, where each column of u is the vectorized representation of the m-th component. The degradation model we consider takes the form:

(∀ m ∈ {1, …, M})   z_m = D_α(A_m u_m),   (1)

where A_m ∈ ℝ^{L×N} models a linear degradation (e.g., a blur, a compressed sensing matrix, a wrapping matrix) and D_α : ℝ^L → ℝ^L denotes a random degradation that can be white Gaussian noise, leading to an additive model, or Poisson noise. The objective of this work is to estimate jointly the restored image û and its contours, denoted by ê in the following, from the degraded data z.
One of the standard (variational) approaches to solving such an ill-posed inverse problem consists in dealing with a regularization, by minimizing a sum of functionals. The variational formulation of this problem, when white Gaussian noise is involved, reads:

û = argmin_u (1/(2σ²)) ‖Au − z‖₂² + ρ(u),   (2)

where ρ is a “well-chosen” regularizing functional, which allows us to denoise while preserving the discontinuities. Hence, it generally involves the gradient of the estimate. A classical choice is ρ(u) = ‖Du‖₀, where D models the finite difference operator and ‖·‖₀ is the ℓ₀ pseudo-norm, which is known as the L2-Potts model [14], or ρ(u) = TV(u), the Total Variation model [15], which is convex. However, these models are restricted to piecewise constant estimates, and do not integrate contour detection in the variational formulation, which is performed as a post-processing step. The main limitation of such a two-step procedure for contour detection is the difficulty of appropriately selecting the thresholding rule used for edge detection.
Mumford and Shah proposed to consider a more general regularizing term, depending on both the gradient of the estimate and the set of discontinuities [16]. The latter becomes an unknown variable in the problem. Since the Mumford–Shah (MS) formalism is generally formulated in a continuous setting, we denote by Ω ⊂ ℝ² the image domain. The MS model aims at estimating both û ∈ W^{1,2}(Ω)¹, a piecewise smooth approximation of an image z ∈ L^∞(Ω), and the set of discontinuities K ⊂ Ω, such that the pair (û, K) is an optimal solution of:

minimize_{u,K}  (1/2) ∫_Ω (u − z)² dxdy + β ∫_{Ω∖K} |∇u|² dxdy + λ|K|,   (3)

where the first term acts as a data-fidelity term and forces the approximation u to be close to z, the second term penalizes strong variations except at the locations K of the strong edges, and |K| denotes the total length of the arcs forming K; thus the minimization of this functional implies that |K| is small at a solution. Finally, β > 0 and λ > 0 denote regularization parameters controlling the smoothness and the length of K, respectively.

¹ W^{1,2}(Ω) = { u ∈ L²(Ω) : ∂u ∈ L²(Ω) }, where ∂ denotes the weak derivative operator.
Following discretization ideas proposed in the original paper of Mumford and Shah [16], we assume that u and z are functions on a lattice instead of functions on a two-dimensional region, and we denote them by u and z, respectively (referring to (1)). K models the path made up of lines between all pairs of adjacent lattice points where u has sharp transitions, as illustrated in Figure 1. In a discrete setting, K is thus replaced by the variable e ∈ ℝ^{|E|}, which encodes the edges between nodes (e.g., if the set of edges is limited to the horizontal and vertical edges between two pixels, then |E| = 2N − N₁ − N₂, where N = N₁ × N₂ is the size of the grid), and whose value is 1 when a contour change is detected, and 0 otherwise. A discrete counterpart of (3) can be written:

minimize_{u ∈ ℝ^{N×M}, e ∈ ℝ^{|E|}}  (1/2) ‖u − z‖₂² + β ‖(1 − e) ⊙ Du‖² + λR(e),   (4)

where D ∈ ℝ^{|E|×N} models a finite difference operator, ⊙ denotes the componentwise product, and R denotes a penalization term that favors sparse solutions, which is a discrete translation of “short |K|”. Note that there is no need to add additional constraints on e, since both (1 − e) and R(e) force it to stay between 0 and 1.
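To fix ideas, the discrete objective (4) can be evaluated directly; the following 1-D sketch (our own illustration, with ⊙ implemented as the NumPy elementwise product and a hypothetical forward-difference operator) shows that declaring an edge at a jump removes its contribution from the coupling term:

```python
import numpy as np

def dms_objective(u, e, z, D, beta, lam, R):
    """Discrete Mumford-Shah objective (4):
    0.5*||u - z||^2 + beta*||(1 - e) * (D u)||^2 + lam*R(e)."""
    data = 0.5 * np.sum((u - z) ** 2)
    coupling = beta * np.sum(((1.0 - e) * (D @ u)) ** 2)
    return data + coupling + lam * R(e)

# Toy example: N = 4 pixels, |E| = 3 forward differences.
N = 4
D = (np.eye(N, k=1) - np.eye(N))[:-1]          # 3 x 4 difference operator
z = np.array([0.0, 0.1, 1.0, 0.9])
u = z.copy()
e = np.array([0.0, 1.0, 0.0])                  # declare an edge at the jump
R = lambda e: np.sum(np.abs(e))                # l1 length term
print(dms_objective(u, e, z, D, beta=10.0, lam=0.1, R=R))
```

With e = 0 everywhere the large jump in Du would be heavily penalized; setting e = 1 at that position trades the coupling cost for the length cost λR(e).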
Related works. One of the most popular convex relaxations of the MS functional is the total variation functional (ROF, for Rudin–Osher–Fatemi) [15, 17], which favors piecewise constant results while preserving the discontinuities. Its ℓ₀ counterpart has been studied in [18, 19], leading to the L2-Potts formulation (ℓ₀-penalization on Du). The authors in [20] proposed two convex relaxations of the MS functional designed for discrete domains with continuous labels. As emphasized by the authors, proper convergence may be difficult to achieve for some parameterizations, and these two methods are not able to detect the contours. For these reasons, we do not consider them in further comparisons. The Chan–Vese model can also be considered as a relaxation of the MS model, whose main limitations are due to a prior label number and a piecewise constant estimate [21, 22]. In addition, Cai et al. [23, 24] have discussed the links between ROF minimizers [15] with a post-processing step of thresholding and the piecewise constant MS solutions. The proposed algorithm relies on updating iteratively the threshold from the ROF solution. Some approaches have been derived from the Blake–Zisserman model [25, 26]. The convergence to local minimizers is proved only in 1-D [25]. In addition to the weak convergence guarantees of the 2-D formulation [26], there is no flexibility in the choice of the data term, whose proximity operator should have a closed form expression. In the same spirit, the approach of Strekalovskiy et al. [27] relies on a truncated quadratic penalization of the gradient of the estimate. They derive a heuristic algorithm, based on a convex relaxation of the functional they propose, and extract the contours by thresholding. Recently, Li et al. [28] proposed a nonlocal ROF (NL-ROF) model, similar to the AT functional, where the gradient is computed in a weighted neighborhood. Convergence is proved, but the contours are obtained by post-processing the estimated image. The first author and her collaborators [29, 30] proposed a new formulation of the AT functional in the framework of Discrete Calculus. They obtain true 1-dimensional contours, but since they still have to deal with the ε parameter, their algorithm is particularly slow.
[Figure 1 here: lattice illustration with pixel values u₁, …, u₉, edge variables e_{1,2}, e_{1,4}, the discontinuity set K between the regions z = η₁ and z = η₂, and the detected edge set {ê = 1}.]

Figure 1: Continuous versus discrete formulations of the MS model (4). In the discrete setting, when D = [D_h^⊤, D_v^⊤]^⊤ models the concatenation of the horizontal and vertical difference operators, the values of D_h u (resp. D_v u) live on the horizontal (resp. vertical) midgrid, and so does e. K and {ê = 1} are delineated in red.
Contributions and outline. In order to jointly identify the edges and restore the image, our contributions are 1) to define a theoretical framework building the bridge between a discrete version of the Mumford–Shah model, called D-MS, and the objective function handled by the Proximal Alternating Linearized Minimization (PALM) algorithm [31]; 2) to provide a new algorithmic scheme, called Semi-Linearized Proximal Alternating Minimization (SL-PAM), which combines one step of PAM [32] with one step of PALM [31], relaxes the condition on a stepsize parameter, and comes with convergence guarantees. The convergence proof is derived. The efficiency of the proposed algorithmic scheme is illustrated on several restoration examples: Gaussian and Poisson denoising, color denoising and image restoration. Comparisons to state-of-the-art approaches are performed on the color denoising example.

Our general D-MS model is defined in Section 2. PALM formulated to solve D-MS is presented in Section 3.1, together with the additional assumptions on the D-MS objective function that ensure convergence. The proposed SL-PAM is derived in Section 3.2. Experiments and comparisons are provided in Section 4.
2 Generalized Discrete Mumford–Shah Model
The Discrete-MS (D-MS) model proposed in this work is expressed as follows.
Problem 1. Let z ∈ ℝ^{LM} and A ∈ ℝ^{LM×NM}. Let L(A·, z) : ℝ^{NM} → (−∞, +∞] be a fidelity term to the data z, and R : ℝ^{|E|} → (−∞, +∞] be a regularization term which enforces sparsity and acts as a length term, both being proper and lower semicontinuous functions. Let S : ℝ^{|E|} × ℝ^{NM} → ℝ be the coupling term, which penalizes strong variations except at edges; it is assumed to be a C¹ function such that ∇S is Lipschitz continuous on bounded subsets of ℝ^{|E|} × ℝ^{NM}. The general D-MS-like problem we aim to solve reads:

minimize_{u ∈ ℝ^{NM}, e ∈ ℝ^{|E|}}  Ψ(u, e) := L(Au, z) + β S(e, u) + λR(e).   (5)
The possibilities of this problem compared to the state-of-the-art formulations are:

• the generalization of the data term, allowing us to deal with a linear degradation, and not restricted to the Euclidean norm as in the original MS model. For instance, L(Au, z) = Σ_m ‖A_m u_m − z_m‖² is suited to data corrupted by both a linear degradation and white Gaussian noise [33, 34, 35]. A choice L(u, z) = Σ_m ‖u_m − z_m‖₁ fits data degraded with impulse noise [35], while the choice of the Kullback–Leibler divergence L(u, z) = Σ_m D_KL(u_m, z_m) is employed for data corrupted by Poisson noise [36, 37].
• the possibility to deal with a large panel of regularization terms R. One of the most popular choices of R encountered in the literature is R_AT(e) = ε‖De‖₂² + (1/(4ε))‖e‖₂², with ε > 0 and D being a difference operator, proposed by Ambrosio and Tortorelli [38, 39]. Such a contour penalization makes (4) Γ-converge to the MS functional as ε tends to 0. As a matter of fact, large values of ε lead to thick contours but help to detect the set of discontinuities. Then, as ε tends to 0, the penalization of ‖e‖₂² increases and enforces e to become sparser and sparser, and thus contours become thinner and thinner. Numerically, however, it is not possible for ε to be arbitrarily small, since it controls the thickness of the contours.
• the flexibility in the coupling term S. Since the MS functional is originally designed with an L²-penalization of the gradient of u, a common choice for the coupling term is S(e, u) = Σ_m ‖(1 − e) ⊙ Du_m‖₂² [39, 40], where D is defined as in (4). However, in [41], Shah proposed to replace the L²-norm with a coupling term involving the L¹-norm, such as S(e, u) = Σ_m ‖(1 − e) ⊙ (1 − e) ⊙ Du_m‖₁, combined with the AT regularizer. Alicandro et al. [42] proved the Γ-convergence of this particular functional to a variant of the MS functional, involving the Cantor part of Du. Experiments show that this ROF-like coupling term is more robust to image gradients, but eliminates high-frequency content. More recently, Li et al. [28] suggested to set e_p = {e_p^(q)}_{q∈B}, where B is a box centered at the pixel p, as weights of the dissimilarity D_p^(q) u_m = u_{m,p} − u_{m,p+q}. The regularization functional is thus a nonlocal ROF (NL-ROF) of the form S(e, u) = Σ_m Σ_{p∈E} Σ_{q∈B} e_p^(q) (D_p^(q) u_m)². In this approach, the contours are not obtained from e but by thresholding u, leading to a less accurate estimation.
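The data-fidelity choices listed in the first item above can be evaluated directly; here is a sketch (our own illustration; the KL convention below, Σ u − z + z log(z/u), is one common choice and an assumption on our part):

```python
import numpy as np

def gaussian_fid(Au, z):
    """L(Au, z) = ||Au - z||^2: linear degradation + white Gaussian noise."""
    return float(np.sum((Au - z) ** 2))

def l1_fid(u, z):
    """L(u, z) = ||u - z||_1: suited to impulse noise."""
    return float(np.sum(np.abs(u - z)))

def kl_fid(u, z, eps=1e-12):
    """Kullback-Leibler divergence D_KL(u, z) for Poisson noise;
    eps guards the logarithm at zero."""
    u = np.maximum(u, eps)
    return float(np.sum(u - z + z * np.log((z + eps) / u)))

z = np.array([2.0, 4.0, 1.0])
print(gaussian_fid(z, z), l1_fid(z, z))  # -> 0.0 0.0 -- all terms vanish at u = z
```

All three are nonnegative and vanish when the estimate matches the data, but they differ in their prox and in which noise statistics they model.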
We can remark that, when dealing with multivariate images, contours can be defined either as edges shared by all the components, or as distinct edges, leading to a path K that may differ between components. In order to ease the reading, we formulate Problem 1 in the context of shared edges. But Problem 1, as well as the following results, can be similarly derived for distinct edges by considering e ∈ ℝ^{|E|×M}. When M = 1, both formalisms are equivalent.
3 Algorithms
In order to solve Problem 1, we propose two algorithmic strategies. The first one relies on the PALM algorithm [31], and requires additional assumptions on the functions involved in order to ensure convergence guarantees. The second is an alternative to PALM, which we call SL-PAM, allowing us to relax some of the assumptions made on the coupling term.
3.1 PALM for D-MS
The following Algorithm 1, which is an instance of the generic algorithm PALM [31], is tailored to solving Problem 1:

Algorithm 1 (PALM) for solving D-MS (5)
  Set u^[0] ∈ ℝ^{NM} and e^[0] ∈ ℝ^{|E|}.
  For k ∈ ℕ:
    Set γ > 1 and c_k = γ ν(e^[k]).
    u^[k+1] ∈ prox_{(1/c_k) L(A·,z)} ( u^[k] − (1/c_k) ∇_u S(e^[k], u^[k]) )
    Set δ > 1 and d_k = δ ε(u^[k+1]).
    e^[k+1] ∈ prox_{(1/d_k) λR} ( e^[k] − (1/d_k) ∇_e S(e^[k], u^[k+1]) )

It consists in updating alternately the image u^[k] and the edges e^[k] by means of proximity operator steps, defined as

(∀ x ∈ ℝ^N)  prox_f(x) = argmin_{y ∈ ℝ^N} (1/2) ‖y − x‖₂² + f(y),   (6)

where f : ℝ^N → (−∞, +∞] denotes a proper and lower semicontinuous function. Algorithm 1 converges under some assumptions, listed in the following proposition:
Proposition 1. The sequence (u^[k], e^[k])_{k∈ℕ} generated by Algorithm 1 converges to a critical point of Problem 1 if
i) the updating steps of u^[k+1] and e^[k+1] have closed form expressions;
ii) the sequence (u^[k], e^[k])_{k∈ℕ} generated by Algorithm 1 is bounded;
iii) L(A·, z), R and Ψ(·, ·) are bounded below;
iv) Ψ is a Kurdyka–Łojasiewicz function [31, Definition 2.3];
v) ∇_u S and ∇_e S are globally Lipschitz continuous with moduli ν(e) and ε(u) respectively, and, for all k ∈ ℕ, ν(e^[k]) and ε(u^[k]) are bounded by positive constants.
Proof. The form of Problem 1 and the assumptions in Proposition 1 fit the requirements for convergence of the PALM algorithm described in [31, Assumptions A-B, Theorem 3.1].
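In practice, condition i) is met by many classical penalizations. For instance, the proximity operators (6) of the ℓ₁-norm (soft thresholding) and of the ℓ₀ pseudo-norm (hard thresholding), both used later in this paper, have well-known closed forms; a quick sketch of our own:

```python
import numpy as np

def prox_l1(x, tau):
    """prox of tau*||.||_1: componentwise soft thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_l0(x, tau):
    """prox of tau*||.||_0: hard thresholding (keep x_i with |x_i| > sqrt(2*tau))."""
    return np.where(np.abs(x) > np.sqrt(2.0 * tau), x, 0.0)

x = np.array([-2.0, 0.3, 1.5])
print(prox_l1(x, 0.5))  # -> [-1.5  0.   1. ]
print(prox_l0(x, 0.5))  # -> [-2.  0.   1.5]
```

Soft thresholding shrinks every component, while hard thresholding keeps large components untouched; both set small components exactly to zero, which is what "enforces sparsity" means for R.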
From the practical point of view, the major challenge regarding the assumptions in Proposition 1 is to ensure that L (resp. R) has a closed form expression for the associated proximity operator. A large number of functions having a closed form expression of their proximal maps is listed in [12, 43, 44], going from ℓ_p-norms to gamma divergences. The main difficulty is due to the linear operator A. Indeed, the proximity operator of a function composed with a linear operator has a closed form expression if

• L(·, z) = (1/2)‖· − z‖₂² and I + γA*A is easily invertible [43], leading to (∀ γ > 0)(∀ u ∈ ℝ^{NM}),

prox_{γL(A·,z)}(u) = (I + γA*A)^{−1}(u + γA*z);   (7)

• A models a frame (or a semi-orthogonal) linear operator [13], i.e. A*A = μI with μ > 0, taking the form (∀ γ > 0)(∀ u ∈ ℝ^{NM}),

prox_{γL(A·,z)}(u) = u + μ^{−1} A*( prox_{γμL(·,z)}(Au) − Au ).   (8)

Moreover, assumption ii) in Proposition 1 holds in several scenarios, such as when the functions L(A·, z) and R have bounded level sets. The reader may refer to [32, Remark 5] and [31, Remark 3.4] for more details about this boundedness assumption.
3.2 Proposed SL-PAM
We propose an alternative to PALM, where the update of u^[k+1] exploits the linearization and where the update of e^[k+1] relies on the proximity operator of the function βS(·, u^[k+1]) + λR. The resulting Semi-Linearized PAM (SL-PAM) is described in Algorithm 2; it does not require ε(u^[k]) to be bounded and allows us to choose larger d_k.

Algorithm 2 (SL-PAM) for solving D-MS (5)
  Set u^[0] ∈ ℝ^{NM} and e^[0] ∈ ℝ^{|E|}.
  For k ∈ ℕ:
    Set γ > 1 and c_k = γ ν(e^[k]).
    u^[k+1] ∈ prox_{(1/c_k) L(A·,z)} ( u^[k] − (1/c_k) ∇_u S(e^[k], u^[k]) )
    Set d_k > 0.
    e^[k+1] ∈ prox_{(1/d_k)(λR + βS(·, u^[k+1]))} ( e^[k] )

The convergence of Algorithm 2 is ensured under Assumption 1.
Assumption 1.
i) The updating steps of u^[k+1] and e^[k+1] have closed form expressions;
ii) Ψ is a Kurdyka–Łojasiewicz function;
iii) L(A·, z), R and Ψ are bounded below;
iv) ∇_u S is globally Lipschitz continuous with moduli ν(e^[k]), k ∈ ℕ, and there exist ν₋, ν₊ > 0 such that ν₋ ≤ ν(e^[k]) ≤ ν₊;
v) (d_k)_{k∈ℕ} is a positive sequence such that the stepsizes d_k belong to (d₋, d₊), for some positive d₋ ≤ d₊.
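As an illustration of Algorithm 2, the sketch below instantiates SL-PAM for 1-D Gaussian denoising with L(u, z) = (1/2)‖u − z‖₂², S(e, u) = ‖(1 − e) ⊙ Du‖₂² and R the ℓ₁-norm. It is our own simplified re-implementation, not the authors' reference code: the step-size heuristics and constants are illustrative, and the e-update uses the componentwise closed form discussed in Section 3.3.

```python
import numpy as np

def sl_pam_denoise(z, beta, lam, n_iter=200, gamma=1.01):
    """SL-PAM (Algorithm 2) sketch for 1-D Gaussian denoising."""
    n = len(z)
    D = (np.eye(n, k=1) - np.eye(n))[:-1]                 # forward differences, (n-1) x n
    c = gamma * 2.0 * beta * np.linalg.norm(D, 2) ** 2     # c_k >= gamma * Lipschitz modulus of grad_u S
    d = 1e-3 * c                                           # SL-PAM tolerates a small d_k
    u, e = z.copy(), np.zeros(n - 1)
    for _ in range(n_iter):
        # u-update: gradient step on beta*S(e, .) then prox of (1/c)*0.5||. - z||^2
        grad_u = 2.0 * beta * D.T @ (((1.0 - e) ** 2) * (D @ u))
        u = (c * (u - grad_u / c) + z) / (c + 1.0)
        # e-update: exact prox of (1/d)*(lam*||.||_1 + beta*S(., u)), componentwise
        w = 2.0 * beta * (D @ u) ** 2
        center = (w + d * e) / (w + d)
        e = np.sign(center) * np.maximum(np.abs(center) - lam / (w + d), 0.0)
    return u, e

z = np.concatenate([np.zeros(20), np.ones(20)])   # clean step signal
u, e = sl_pam_denoise(z, beta=5.0, lam=0.1)
print(int(np.argmax(e)))  # index of the strongest edge (the jump sits between samples 19 and 20)
```

Because the e-update is an exact prox rather than a linearized step, d_k can be taken much smaller than the PALM bound without breaking the descent property, which is precisely the point of Proposition 2 below.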
Proposition 2. Suppose that Assumption 1 holds and that the sequence {x^[k]}_{k∈ℕ} = {(u^[k], e^[k])}_{k∈ℕ} generated by Algorithm 2 is bounded. Then
i) Σ_{k=1}^∞ ‖x^[k+1] − x^[k]‖ < ∞;
ii) {x^[k]}_{k∈ℕ} converges to a critical point (u*, e*) of Ψ.

The proof relies on the general proof recipe given in [31], divided into three main steps: (i) sufficient decrease property, (ii) subgradient lower bound for the iterate gap, and (iii) Kurdyka–Łojasiewicz property. These three steps are detailed thereafter, where we set x^[k] = (u^[k], e^[k]). Moreover, Assumptions 1.i), 1.ii) and 1.iii) are discussed at the beginning of each experimental section (IV.B, IV.D). In addition, we provide comments in Section III.B.3) on the KL assumption, and in Section III.C on the closed form expressions of the proximity operators, while the validity of Assumption 1.iv) is ensured by the definition of S provided in Section IV.A.
3.2.1 Sufficient decrease property
The objective is to find ρ₁ > 0 such that

(∀ k ∈ ℕ)  (ρ₁/2) ‖x^[k+1] − x^[k]‖² ≤ Ψ(x^[k]) − Ψ(x^[k+1]).   (9)

This result relies on the following lemma.

Lemma 1. Let {x^[k]}_{k∈ℕ} be a sequence generated by Algorithm 2. Then
i) the sequence {Ψ(x^[k])}_{k∈ℕ} is nonincreasing; in particular,
(∀ k ∈ ℕ)  (ρ₁/2) ‖x^[k+1] − x^[k]‖² ≤ Ψ(x^[k]) − Ψ(x^[k+1]),  where ρ₁ = min{(γ − 1)ν₋, d₋};
ii) Σ_{k=0}^∞ ‖x^[k+1] − x^[k]‖² < ∞ and lim_{k→∞} ‖x^[k+1] − x^[k]‖ = 0.
The proof is given in Appendix 6.1.
3.2.2 A subgradient lower bound for the iterate gap
This step relies on Lemma 2.
Lemma 2. Assume that the sequence {x^[k]}_{k∈ℕ} generated by Algorithm 2 is bounded. Define

A_k^u := c_{k−1}(u^[k−1] − u^[k]) + ∇_u S(e^[k], u^[k]) − ∇_u S(e^[k−1], u^[k−1]),   (10)
A_k^e := d_{k−1}(e^[k−1] − e^[k]).   (11)

Then (A_k^u, A_k^e) ∈ ∂Ψ(u^[k], e^[k]) and there exists M > 0 such that

‖(A_k^u, A_k^e)‖ ≤ ‖A_k^u‖ + ‖A_k^e‖ ≤ 2(M + ρ₂) ‖x^[k−1] − x^[k]‖,   (12)

where ρ₂ = γν₊ + d₊.
The proof is given in Appendix 6.2.
3.2.3 Kurdyka- Lojasiewicz property
This step relies on the assumption that Ψ is a Kurdyka–Łojasiewicz (KL) function, and proves that the minimizing sequence {x^[k]}_{k∈ℕ} is a Cauchy sequence. According to [31, Theorem 5.1], if Ψ : ℝ^{NM} × ℝ^{|E|} → ℝ is a proper, lower semicontinuous (l.s.c.), and semi-algebraic function, then it satisfies the KL property at any point of dom Ψ. The proof of this step is the same as for [31, Lemma 3.6].
3.3 Additional Comments on Assumption 1-i)
The conditions to obtain a closed form expression for the update of u^[k+1] are similar to the ones detailed in Section 3.1. The tedious part concerns the update of e^[k+1], for which a closed form expression is provided in Proposition 3 for specific choices of S and R.

Proposition 3. Let D ∈ ℝ^{|E|×N}. For every (u, e) ∈ ℝ^{NM} × ℝ^{|E|}, we assume that

S(e, u) = ‖(1 − e) ⊙ Du‖₂²,   (13)

and that R is a separable function such that

(∀ e = (e_i)_{1≤i≤|E|})  R(e) = Σ_{i=1}^{|E|} σ_i(e_i),   (14)

where σ_i : ℝ → (−∞, +∞] has a proximity operator with a closed form expression. At iteration k ∈ ℕ, with d_k > 0, β > 0 and λ > 0, the updating step on e^[k+1] in Algorithm 2 is equivalent to, for all i ∈ {1, …, |E|},

e_i^[k+1] ∈ prox_{λσ_i / (2β(Du^[k+1])_i² + d_k)} ( (2β(Du^[k+1])_i² + d_k e_i^[k]) / (2β(Du^[k+1])_i² + d_k) ).   (15)
The proof is given in Appendix 6.4.
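The update (15) can also be checked numerically: the sketch below (our own illustration) implements it for σ_i = |·|, whose prox is soft thresholding, and the test compares it against a brute-force minimization of the per-component objective.

```python
import numpy as np

def e_update(e_prev, Du, beta, lam, d_k, prox_sigma):
    """Edge update (15): for each i, prox of lam*sigma_i with step
    1/(2*beta*(Du)_i^2 + d_k), applied at the weighted center point."""
    w = 2.0 * beta * Du ** 2
    center = (w + d_k * e_prev) / (w + d_k)
    return prox_sigma(center, lam / (w + d_k))

# sigma_i = |.|  ->  prox is soft thresholding.
soft = lambda x, tau: np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

e_prev = np.array([0.2, 0.0])
Du = np.array([1.0, 0.05])      # one strong and one weak local variation
e_new = e_update(e_prev, Du, beta=5.0, lam=0.1, d_k=0.5, prox_sigma=soft)
print(np.round(e_new, 3))       # edge kept where |Du| is large, suppressed elsewhere
```

The center point is a convex combination of 1 and e_i^[k], weighted by the local contrast 2β(Du)_i²: strong variations pull e_i toward 1, and the soft threshold then kills the weak ones.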
4 Experiments
Based on the results derived in the previous section, it is now possible to provide efficient algorithmic schemes in order to deal with D-MS, with the possibility of having R modeling a nonsmooth penalization. To the best of our knowledge, this has never been proposed before.
4.1 Specific Choice of D-MS

In our experiments we suggest to choose

S(e, u) = ‖(1 − e) ⊙ Du‖₂²,   (16)

which is C¹ and has Lipschitz continuous gradients. The regularization term is chosen as

R(e) = Σ_{i=1}^{|E|} max{ |e_i|^p, |e_i|^q/(4ε) },   (17)

where ε > 0, p ≥ 0 and q > 0, whose particular cases are:

• the ℓ₀ pseudo-norm when p = 0 and ε → ∞;
• the ℓ₁-norm when p = 1 and ε → ∞;
• the quadratic-ℓ₁ penalization, p = 1, q = 2 and 0 < ε < 1, derived in [1], which aims to model the quadratic behavior of (1/(4ε))‖·‖₂² for small ε and to enforce sparsity.

This function is bounded below, proper, l.s.c., separable, and semi-algebraic (see [31, Example 5.3]). The associated proximity operator of the quadratic-ℓ₁ penalization is:

Proposition 4. For every η ∈ ℝ,

prox_{τ max{|·|, (·)²/(4ε)}}(η) = sign(η) max{ 0, min[ |η| − τ, max( 4ε, |η|/(τ/(2ε) + 1) ) ] }.
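The closed form of Proposition 4 can be verified against a brute-force minimization; a sketch of our own:

```python
import numpy as np

def prox_quad_l1(eta, tau, eps):
    """Prox of tau*max{|.|, (.)^2/(4*eps)} (Proposition 4): soft thresholding
    for moderate inputs, scaled shrinkage beyond the switch point 4*eps."""
    return np.sign(eta) * np.maximum(
        0.0,
        np.minimum(np.abs(eta) - tau,
                   np.maximum(4.0 * eps,
                              np.abs(eta) / (tau / (2.0 * eps) + 1.0))))

# Three regimes: soft thresholding, boundary 4*eps, scaled shrinkage.
print(prox_quad_l1(np.array([0.5, 1.2, 5.0]), tau=0.3, eps=0.2))
```

Small inputs are soft thresholded (sparsity), large inputs are merely shrunk (quadratic behavior), and the penalty is continuous at the switch |η| = 4ε; this is exactly the trade-off the quadratic-ℓ₁ penalization is designed for.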
The choice of L ( · , z) will be dependent on the restoration problem considered and it will be given in each subsection.
4.2 Gray-Scale White Gaussian Noise Denoising
Experimental setting – For this first set of experiments, we assume that D_α models white Gaussian noise with standard deviation α > 0. In the context of gray-scale denoising, M = 1, L = N and A₁ ≡ I_N. As commonly done in image restoration when Gaussian noise is involved, the data term is a squared Euclidean norm, i.e. L(u, z) = (1/2)‖u − z‖₂², which is bounded below, proper, l.s.c. and semi-algebraic [31]. By definition of Ψ, and since a finite sum of semi-algebraic functions is semi-algebraic, we deduce that Ψ satisfies Assumptions 1-i), ii), iii). In addition, this particular choice of A implies that L(A·, z) satisfies the boundedness assumption in Proposition 2.
Let us consider the ground truth image in Figure 2 (left), where the contours are obtained
by binarization and computation of the gradients. In this section, we evaluate the denoising and
contour detection performances obtained with the proposed D-MS performed with Algorithm 2,
when the input corresponds to Figure 2 with an additive white Gaussian noise of standard deviation α ∈ { 0.04, 0.16 } .
Regarding the algorithm step sizes, we set c_k and d_k constant. We first compute ν(e^[k]), assuming that e is not equal to 1 everywhere. This assumption is not restrictive in general, since it means that we do not have contours everywhere. We have ν(e^[k]) = β‖1 − e^[k]‖_∞² ‖D‖² ≤ β‖D‖², where the upper bound is attained when e^[k] ≡ 0. Hence, we choose, for both PALM and SL-PAM, c_k ≡ 1.01 · β‖D‖². On the other hand, ε(u^[k+1]) = β(1 − e)‖Du^[k+1]‖² ≤ β(1 − e)‖D‖² ‖u^[k+1]‖². If we normalize z, then, for all k ∈ ℕ, ‖u^[k]‖² ≤ 1. Thus we set, for PALM, d_k ≡ 1.01 · β‖D‖². Finally, for SL-PAM, we set d_k ≡ 1.01 · β‖D‖² · 10⁻³. This choice will be discussed below (see Figure 5).
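These constant step sizes only require the squared spectral norm of D, computed once; a sketch (illustrative, for a hypothetical 1-D forward-difference operator, with the γ = 1.01 factor and the 10⁻³ scaling following the settings described above):

```python
import numpy as np

def difference_operator_1d(n):
    """Forward-difference matrix D of size (n-1) x n."""
    return (np.eye(n, k=1) - np.eye(n))[:-1]

def stepsizes(D, beta, gamma=1.01):
    """Constant step sizes: c_k = gamma*beta*||D||^2 for both algorithms,
    and d_k = 1e-3 * c_k for SL-PAM."""
    normD2 = np.linalg.norm(D, 2) ** 2     # squared spectral norm of D
    c_k = gamma * beta * normD2
    return c_k, 1e-3 * c_k

D = difference_operator_1d(64)
c_k, d_k = stepsizes(D, beta=10.0)
print(np.linalg.norm(D, 2) ** 2)  # approaches 4 as n grows, for 1-D forward differences
```

For the 2-D operator D = [D_h^⊤, D_v^⊤]^⊤ the spectral norm can likewise be bounded once offline, so neither algorithm needs any per-iteration norm computation.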
We compare the results obtained with various regularization terms: the ℓ₀ pseudo-norm, the ℓ₁-norm, and the quadratic-ℓ₁ penalization.
Figure 2: Ground truth image of size 256 × 256 with real contours.
Performance evaluation – The restoration performance is evaluated in terms of signal-to-noise ratio (SNR) and Structural Similarity Index (SSIM) [45], while the contour detection performance is evaluated using the Jaccard index [46] (also known as intersection over union). A grid search is performed for each score, for β varying in [1, 50] and λ varying in [0.0001, 0.9]. The resulting scores are summarized in a map such as the one displayed in Figure 3. We perform the experiments on a 3.2 GHz Intel Core i5 CPU, and stop when |Ψ(u^[k+1], e^[k+1]) − Ψ(u^[k], e^[k])| < 10⁻⁴. The best performance with the quadratic-ℓ₁ penalization, according to each measure (SNR, SSIM and Jaccard index), is summarized in Table 1. It is displayed in Figure 6 for α = 0.04 and in Figure 7 for α = 0.16. From Figures 6 and 7, we first observe that the SNR and the SSIM lead to similar denoising and segmentation results. For small α, they do not allow us to extract a sparse 1-dimensional contour, while the result obtained from the Jaccard index provides the best denoising and segmentation result. However, for strong noise, the SNR and the SSIM both outperform the Jaccard index for denoising purposes, with satisfying denoising and contour detection results.
Choice of R – From Tables 1 and 2, we notice that the best performances are obtained using either the ℓ₁-norm or the quadratic-ℓ₁ penalization. Since the latter provides the best segmentation results, we propose in the sequel to use the quadratic-ℓ₁ penalization together with the SSIM or the Jaccard index, depending on the noise level.
Sensitivity to the initialization – We propose to evaluate the robustness of the proposed algorithmic scheme with respect to the initialization. We compare different choices for u^[0]: u^[0] = z, u^[0] ∼ N(0, I_N) and u^[0] ≡ ζ_u ∈ [min(z), max(z)]. Similarly, we propose to deal with either e^[0] ≡ 0_{|E|} or 1_{|E|}, e^[0] ∼ B(0.5), or e^[0] ≡ ζ_e ∈ (0, 1). We show the mean convergence results over 10 realizations in Figure 4, and we observe that the best initialization pair for Gaussian denoising is (u^[0], e^[0]) = (z, 1_{|E|}). Notice that, whatever the initialization, all the runs converge to the same value, which leads to a robust estimation, despite the resolution of a nonconvex problem.
[Figure 3 here: score map with λ (contour penalization) on one axis and β (smoothing) on the other, with four example results a, b, c, d.]

Figure 3: Example of score map with corresponding results on the right. Contours are delineated in red. The red circle on the map represents the best score. We can observe that a larger β leads to a smoother estimate and that a larger value of λ implies fewer contours.
Table 1: SL-PAM performances according to the SNR, the SSIM, and the Jaccard index.

  α = 0.04         ℓ₀       ℓ₁       quadratic-ℓ₁
  SNR              39.67    41.79    41.57
  SSIM             0.991    0.995    0.994
  Jaccard index    0.994    0.995    0.995

  α = 0.16         ℓ₀       ℓ₁       quadratic-ℓ₁
  SNR              30.19    30.6     30.58
  SSIM             0.929    0.933    0.935
  Jaccard index    0.989    0.993    0.993

Table 2: SL-PAM computational times.

  α = 0.04         ℓ₀       ℓ₁       quadratic-ℓ₁
  SNR              0.46s    1.98s    3.24s
  SSIM             0.93s    1.74s    3.24s
  Jaccard index    0.06s    0.09s    0.49s

  α = 0.16         ℓ₀       ℓ₁       quadratic-ℓ₁
  SNR              0.91s    1.67s    4.50s
  SSIM             0.83s    1.54s    2.86s
  Jaccard index    0.15s    0.25s    0.59s
SL-PAM versus PALM – We now compare in Figure 5(a) the performance of the PALM Algorithm 1 to that of our SL-PAM Algorithm 2, with decreasing d_k ∈ 1.01 · β‖D‖² · {1, 10⁻¹, 10⁻², 10⁻³}, and a quadratic-ℓ₁ regularization. We first notice that PALM and SL-PAM converge to the same minimum. In particular, they converge the same way when the descent parameters c_k and d_k are chosen identically for both of them. Nonetheless, SL-PAM outperforms the PALM algorithm for d_k set such that δ < 1. Figure 5(b) shows a visual comparison of the evolution of the contours of PALM versus SL-PAM with the lowest d_k with respect to the number n of iterations. We observe that SL-PAM has converged for n = 100, while PALM requires a hundred times more iterations to reach the same result.
[Figure 4 here: convergence curves for the initializations e^[0] ≡ 0_{|E|}, e^[0] ≡ 1_{|E|}, e^[0] ≡ ζ_e ∈ (0, 1) and e^[0] ∼ B(0.5).]

Figure 4: Performance of SL-PAM with different initial values of u^[0] and e^[0], when the input is the image in Fig. 2, degraded by additive white Gaussian noise with standard deviation α = 0.04. SL-PAM is not sensitive to the initialization.
4.3 Color Denoising and Comparisons with State-of-the-Art Methods
In this section, we propose to perform RGB color image denoising involving white Gaussian noise. In this case, we consider u = (u_R, u_G, u_B), M = 3 and e ∈ ℝ^{|E|} common to the three components of u. We compare the proposed method with state-of-the-art approaches, including “ROF” minimization [23, 24], the “MS relaxation” proposed in [27], the “Discrete AT” formulation [29] and the “NL-ROF” [28], although the latter is not designed for RGB color images. Since the ROF minimization and the NL-ROF do not allow us to directly extract the contours, we compute them by thresholding the gradient of the estimate, following [23, 24]. We performed the comparisons using 70 images randomly chosen from the BSDS500 database, which provides both ground truth images and contours for up to five hundred natural images [47]. We present in Figure 8 the results for a selection of 8 images from this database. As discussed in Section 4.2, we display the best denoising results according to the SNR, and the best contour reconstruction according to the Jaccard index. In addition, we summarize in Figure 9 the resulting scores for the whole experiment, for T-ROF, the MS relaxation and the proposed D-MS. Since the T-NL-ROF and the Discrete AT algorithms are very slow, we only show their results for the images presented in Figure 8. For the sake of clarity, the x-axis for each score is sorted in ascending order according to the proposed D-MS method (solid green line).
From Figures 8 and 9, the conclusions are:
• Figure 9(a) shows that the proposed method leads to very good performance in terms of SNR. It outperforms the state-of-the-art methods in about 91% of cases for T-ROF, 98% for the MS relaxation, 71% for T-NL-ROF, and 72% for Discrete AT;
• in terms of SSIM, we can observe in Figure 9(b) that the proposed approach mainly improves the results for difficult configurations (SSIM values lower than 0.86);
Figure 5: Comparison of PALM (blue curve) and SL-PAM (black dotted curve); the input data is the image in Fig. 2, degraded by additive white Gaussian noise. (a) Convergence rates, with c_k chosen identically for both algorithms, and decreasing d_k for SL-PAM, with white Gaussian noise standard deviation α = 0.04 (left) and α = 0.16 (right). (b) Evolution of the contours with respect to the number of iterations (N = 10, 100, 1000, 10000) for the experiment with α = 0.04 ((a), left).
• regarding the visual denoising performances, we can observe different types of reconstructions and artifacts from one method to another. T-ROF leads to the well-known staircasing effects, T-NL-ROF slightly (over-)smooths these effects, while the three other methods (the MS relaxation, the Discrete AT and the proposed method) provide very close visual results, with smooth areas and sharp transitions;
• Figure 9(c) shows that the D-MS leads to a slightly better contour extraction than T-ROF (resp. MS relaxation) in about two thirds (resp. 75%) of all cases, while it clearly improves the results of Discrete AT and T-NL-ROF. Note that D-MS is designed to perform both contour extraction and restoration, while T-ROF requires a post-processing thresholding step [23, 24], and the proposed approach has convergence guarantees, while the MS relaxation does not;
• although T-ROF leads to a better Jaccard index in one third of cases, a visual comparison of the contour detection results in Figure 8 reveals the superiority of the proposed method, in particular for RGB color images. The D-MS allows us to extract true 1-dimensional contours, similar to those obtained with the Discrete AT (but at the price of a huge computational cost) and with the MS relaxation (which relies on a very efficient algorithm, but without convergence guarantees).
Sometimes, one method outperforms the other ones. In particular, the MS relaxation produces a better result for the “sea” image. For the “china” image, the Jaccard scores are very similar:
the MS relaxation is visually closer to the ground truth (but simplified) contour, while the D-MS and the Discrete AT recover a lot of additional thin details, consistent with the actual gradients in this image.
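As a side note, the two scalar scores used throughout this section are straightforward to compute. The sketch below, in Python with NumPy, is an illustration under our own conventions (not the authors' evaluation code): it computes the SNR in dB between a reference image and an estimate, and the Jaccard index between two contour maps, which are assumed here to be binary arrays of the same shape.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference image and an estimate."""
    err = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10(np.sum(reference ** 2) / err)

def jaccard(contour_a, contour_b):
    """Jaccard index |A intersect B| / |A union B| of two binary contour maps."""
    a = contour_a.astype(bool)
    b = contour_b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both maps empty: perfect agreement by convention
    return np.logical_and(a, b).sum() / union
```

The Jaccard index equals 1 for a perfect contour match and decreases toward 0 as the two sets of edge pixels diverge, which is why it is well suited to scoring the extracted 1-dimensional contours.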
Figure 6: Denoising with SL-PAM and the quadratic-ℓ1 penalization on the image in Fig. 2, degraded by additive white Gaussian noise with standard deviation α = 0.04. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 30.61 / 41.57 / 41.57 / 37.48 dB; SSIM = 0.681 / 0.994 / 0.994 / 0.944; Jaccard = 0.741 / 0.741 / 0.995; Time = 3.24 / 3.24 / 0.49 s.

We summarize in Table 3 the computational times corresponding to the results displayed in Figure 8. We observe that the proposed SL-PAM algorithm is almost ten times faster than T-ROF, and that it far outperforms the T-NL-ROF and the Discrete AT approaches. The MS relaxation is the fastest method, but only because of its GPU-based implementation.
4.4 Poisson Denoising and Image Restoration
Since Problem 1 allows us to deal with more complex data fidelity terms, we propose here to illustrate the results obtained when (i) data are corrupted by Poisson noise and (ii) data are degraded by both a blur and Gaussian noise. Since the experiments in Section 4 showed that the proposed approach outperforms the T-ROF minimization, we do not present T-ROF results in the following.
Poisson denoising – The choice of the Kullback–Leibler divergence L(u, z) = Σ_m D_KL(u_m, z_m) fits data corrupted by Poisson noise [48, 36, 37]. This data term is bounded from below and l.s.c.; thus Ψ satisfies Assumption 1-i), ii), iii).
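For illustration, this data term and its proximity operator can be sketched in Python with NumPy. This is a hedged sketch, not the authors' code: we use the generalized Kullback–Leibler divergence D_KL(u, z) = Σ (u − z + z log(z/u)) (one standard convention, with 0·log 0 = 0); the exact convention in the paper may differ. Its proximity operator admits the well-known closed form given by the positive root of a quadratic.

```python
import numpy as np

def kl_divergence(u, z):
    """Generalized KL divergence sum(u - z + z*log(z/u)),
    with the convention 0*log(0) = 0; u must be positive where z > 0."""
    u = np.asarray(u, dtype=float)
    z = np.asarray(z, dtype=float)
    out = u - z
    mask = z > 0
    out[mask] += z[mask] * np.log(z[mask] / u[mask])
    return out.sum()

def prox_kl(v, z, gamma):
    """Proximity operator of gamma * D_KL(., z), applied entrywise:
    the positive root of u**2 + (gamma - v)*u - gamma*z = 0."""
    t = v - gamma
    return 0.5 * (t + np.sqrt(t * t + 4.0 * gamma * z))
```

One can check the prox numerically through its optimality condition (u − v) + γ(1 − z/u) = 0, obtained by differentiating (1/2)(u − v)² + γ(u − z log u).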
We first consider the image in Figure 2 corrupted by Poisson noise with parameter α = 100. The best results according to the (SNR, SSIM, Jaccard index) scores using the quadratic-ℓ1 regularization are shown in Figure 10. In Figure 11, we present the Poisson denoising results on a real image with the quadratic-ℓ1 regularization. The performances are comparable with Gaussian denoising, with a higher computational time due to the use of the divergence.

Figure 7: Denoising with SL-PAM and the quadratic-ℓ1 penalization on the image in Fig. 2, degraded by additive white Gaussian noise with standard deviation α = 0.16. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 18.68 / 30.58 / 30.43 / 27.45 dB; SSIM = 0.173 / 0.928 / 0.935 / 0.568; Jaccard = 0.992 / 0.993 / 0.993; Time = 4.5 / 2.86 / 0.59 s.
Image restoration – We propose to discuss the potential of the SL-PAM algorithm for image restoration tasks. In the presence of blur, the data fidelity term depends on the blur matrix A, and reads L(Au, z) = (1/2)‖Au − z‖²_2. In our experiments, we consider a Gaussian blur of size Q × Q with standard deviation σ, and additive white Gaussian noise with standard deviation α. This type of degradation allows us to ensure the boundedness assumption in Proposition 2. Figure 12 displays the restoration results on the image in Figure 2, when α = 0.2 and Q = 7. Restoration results on a real image are presented in Figure 13, with α = 0.2 and Q = 7. Except for the best result according to the SNR, we observe that the method is able to detect sharp contours and to recover thin structures.
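To make this restoration setting concrete, the sketch below (Python with NumPy; an illustration under our own assumptions, not the authors' implementation) builds a Gaussian blur operator A as an FFT-based circular convolution and evaluates the data term (1/2)‖Au − z‖² together with its gradient Aᵀ(Au − z), the quantity needed by the gradient step on u. Circular boundary conditions are an assumption made here for simplicity.

```python
import numpy as np

def gaussian_kernel(q, sigma):
    """Q x Q Gaussian kernel, normalized to sum to 1."""
    r = np.arange(q) - (q - 1) / 2
    g = np.exp(-r**2 / (2 * sigma**2))
    k = np.outer(g, g)
    return k / k.sum()

def blur_operator(shape, q, sigma):
    """Return A and A^T as FFT-based circular convolutions with the kernel."""
    kernel = gaussian_kernel(q, sigma)
    pad = np.zeros(shape)
    pad[:q, :q] = kernel
    # center the kernel at the origin so the blur does not shift the image
    pad = np.roll(pad, shift=(-(q // 2), -(q // 2)), axis=(0, 1))
    K = np.fft.fft2(pad)
    A = lambda u: np.real(np.fft.ifft2(K * np.fft.fft2(u)))
    At = lambda u: np.real(np.fft.ifft2(np.conj(K) * np.fft.fft2(u)))
    return A, At

def data_term_and_grad(u, z, A, At):
    """L(Au, z) = 0.5 * ||Au - z||^2 and its gradient A^T (Au - z)."""
    residual = A(u) - z
    return 0.5 * np.sum(residual**2), At(residual)
```

A quick sanity check is to verify the gradient against finite differences, and the adjoint against the identity ⟨Au, v⟩ = ⟨u, Aᵀv⟩.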
Figure 8: Comparison, according to the SNR and the Jaccard index, of denoising and contour detection performances, involving white Gaussian noise with standard deviation α = 0.05, with state-of-the-art methods, from top to bottom: data, given contour, T-ROF [23, 24], T-NL-ROF [28] (not computed, "NC", for the last four images), the Discrete AT [29], the MS relaxation [27], and the proposed method (D-MS). Test images, from left to right: polar bear, birds, sea, starfish, china, prayer, bells, runners, soldier.
Table 3: Computational times (in seconds) of the experiments illustrated in Figure 8.
polar bear birds sea starfish china prayer bells runners soldier
(321×481) (321×481) (321×481) (321×481) (321×481) (481×321) (321×481) (481×321) (481×321)
T-ROF [23, 24] 4.07 3.33 3.86 3.48 4.14 16.55 17.30 14.95 17.85
T-NL-ROF [28] 4601 4547 4500 4489 4507 - - - -
Discrete AT [29] 539 548 544 536 535 592 617 597 595
MS relaxation [27] 0.35 0.11 0.13 0.11 0.10 0.36 0.24 0.25 0.31
D-MS (SL-PAM) 1.49 1.04 0.32 1.93 0.23 0.21 0.25 0.43 0.39
Figure 9: Comparison of the (a) SNR, (b) SSIM and (c) Jaccard index on 70 images from the BSDS500 database [47] for Gaussian denoising and contour detection tasks, with state-of-the-art methods: T-ROF [23, 24], T-NL-ROF [28], the Discrete AT [29], the MS relaxation [27], and the proposed method (D-MS). The x-axis for each score is sorted in ascending order according to the proposed D-MS method (solid green line).
5 Conclusion
In this work, we propose 1) a new discrete formulation of the MS functional, and 2) a new proximal algorithm, with proven convergence, to solve it. The major interest of the MS formalism is its ability to (i) restore a degraded image and (ii) extract its contours. In terms of restoration, on a large database, we showed that the proposed method outperforms the state-of-the-art ones, including T-ROF [23, 24], the MS relaxation [27], the Discrete AT [29] and the T-NL-ROF [28]. Regarding contour detection, the results are very close to those obtained with the MS relaxation (which is a fast and accurate method, but without convergence guarantees) and with the Discrete AT (which has a huge computational cost). We also studied the influence of the choice of the regularization parameters with respect to different performance measures.
6 Appendix
6.1 Proof of Lemma 1
(i) Let k ≥ 0. Applying [31, Lemma 3.2] with h = βS(e^{[k]}, ·), σ = L and t = c_k, we obtain:

βS(e^{[k]}, u^{[k+1]}) + L(u^{[k+1]})
  ≤ βS(e^{[k]}, u^{[k]}) + L(u^{[k]}) − (1/2)(c_k − ν(e^{[k]})) ‖u^{[k+1]} − u^{[k]}‖²   (18)
  ≤ βS(e^{[k]}, u^{[k]}) + L(u^{[k]}) − (1/2)(γ_k − 1) ν(e^{[k]}) ‖u^{[k+1]} − u^{[k]}‖²   (19)
Figure 10: Denoising with the quadratic-ℓ1 penalization with SL-PAM on the image in Fig. 2 degraded by Poisson noise with parameter α = 100. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 10.20 / 34.32 / 33.84 / 31.46 dB; SSIM = 0.222 / 0.942 / 0.972 / 0.741; Jaccard = 0.951 / 0.960 / 0.995; Time = 22.47 / 41.89 / 5.04 s.
with c_k = γ_k ν(e^{[k]}). On the other hand, the update of e^{[k+1]} can be written

e^{[k+1]} ∈ argmin_e  (d_k/2) ‖e − e^{[k]}‖²_2 + λ R(e) + β S(e, u^{[k+1]}),   (20)

leading to

λ R(e^{[k+1]}) + β S(e^{[k+1]}, u^{[k+1]}) + (d_k/2) ‖e^{[k+1]} − e^{[k]}‖² ≤ λ R(e^{[k]}) + β S(e^{[k]}, u^{[k+1]}).   (21)

Hence, combining (19) and (21), we get

Ψ(x^{[k]}) − Ψ(x^{[k+1]})   (22)
  = L(u^{[k]}) + β S(e^{[k]}, u^{[k]}) + λ R(e^{[k]}) − L(u^{[k+1]}) − β S(e^{[k+1]}, u^{[k+1]}) − λ R(e^{[k+1]})   (23)
  ≥ (1/2)(γ_k − 1) ν(e^{[k]}) ‖u^{[k+1]} − u^{[k]}‖² + (d_k/2) ‖e^{[k+1]} − e^{[k]}‖²
    + L(u^{[k+1]}) + β S(e^{[k]}, u^{[k+1]}) + λ R(e^{[k]}) − L(u^{[k+1]}) − β S(e^{[k]}, u^{[k+1]}) − λ R(e^{[k]})   (24)
  ≥ (ρ_1/2) ‖x^{[k+1]} − x^{[k]}‖².   (25)
Combined with Assumptions 1-iv), v), it proves the result.
(ii) Since Ψ is bounded from below, (Ψ(x^{[k]}))_{k∈N} converges to some Ψ̲ ∈ R. Let now N ∈ N*. It follows from (i) that
Figure 11: Denoising with the quadratic-ℓ1 penalization with SL-PAM on a muscle image degraded by Poisson noise with parameter α = 100. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 11.13 / 22.89 / 22.89 / 22.72 dB; SSIM = 0.538 / 0.748 / 0.748 / 0.720; Jaccard = 0.940 / 0.940 / 0.945; Time = 6.16 / 6.16 / 10.99 s.
Σ_{k=0}^{N−1} ‖x^{[k+1]} − x^{[k]}‖² ≤ (2/ρ_1) (Ψ(x^{[0]}) − Ψ(x^{[N]}))   (26)
  ≤ (2/ρ_1) (Ψ(x^{[0]}) − Ψ̲) < ∞.   (27)

We conclude by taking the limit as N → ∞.
6.2 Proof of Lemma 2
Writing down the optimality conditions for the iterative steps of Algorithm 2, we get:

β ∇_u S(e^{[k−1]}, u^{[k−1]}) + c_{k−1} (u^{[k]} − u^{[k−1]}) + υ^{[k]} = 0,   (28)

where υ^{[k]} ∈ ∂L(u^{[k]}), and

β ∇_e S(e^{[k]}, u^{[k]}) + d_{k−1} (e^{[k]} − e^{[k−1]}) + ξ^{[k]} = 0,   (29)

where ξ^{[k]} ∈ ∂(λ R(e^{[k]})).
The subdifferential property [31, Proposition 2.1] allows us to state that β ∇_u S(e^{[k]}, u^{[k]}) + υ^{[k]} ∈ ∂_u Ψ(u^{[k]}, e^{[k]}) and β ∇_e S(e^{[k]}, u^{[k]}) + ξ^{[k]} ∈ ∂_e Ψ(u^{[k]}, e^{[k]}), and hence (A_k^u, A_k^e) ∈ ∂Ψ(u^{[k]}, e^{[k]}). Combining Assumption 1-iii) with the assumption of Lipschitz continuity of ∇S, and following the arguments in [31, Lemma 3.4], we can prove that there exists M > 0 such that

‖A_k^u‖ ≤ (2M + γ_k ν_+) ‖x^{[k]} − x^{[k−1]}‖.   (30)
Figure 12: Image restoration with SL-PAM and the quadratic-ℓ1 penalization on the image in Fig. 2 degraded by additive white Gaussian noise with standard deviation α = 0.2, and a Gaussian blurring filter of size 7 × 7 with standard deviation σ = 2. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 23.90 / 24.72 / 24.04 / 24.12 dB; SSIM = 0.059 / 0.765 / 0.941 / 0.860; Jaccard = 0.985 / 0.984 / 0.987; Time = 3.5 / 17.54 / 8.67 s.
On the other hand,

‖A_k^e‖ = d_{k−1} ‖e^{[k−1]} − e^{[k]}‖ ≤ d_+ ‖x^{[k]} − x^{[k−1]}‖.   (31)

Summing up (30) and (31), we obtain the desired result with ρ_2 = 2M + γ_k ν_+ + d_+.
6.3 Proof of Proposition 4
Let η ∈ R. One has:

prox_{τ max{|·|, (·)²/4}}(η) = argmin_x (1/2)(x − η)² + τ max{|x|, x²/4}.   (32)

One must split into cases:

• If |x| ≤ 4, then max{|x|, x²/4} = |x| and we have:

prox_{τ max{|·|, (·)²/4}}(η) = argmin_x (1/2)(x − η)² + τ|x|   (33)
  = prox_{τ|·|}(η)   (34)
  = sign(η) max(0, |η| − τ)   (35)

when |η| ≤ 4 + τ.
Figure 13: Image restoration with SL-PAM and the quadratic-ℓ1 penalization on the image in Fig. 2 degraded by additive white Gaussian noise with standard deviation α = 0.2 and a Gaussian blurring filter of size 7 × 7 with standard deviation σ = 2. The best results for each score are presented. Panel scores, from left to right (data, best SNR, best SSIM, best Jaccard): SNR = 18.10 / 19.47 / 19.34 / 19.34 dB; SSIM = 0.394 / 0.578 / 0.589 / 0.573; Jaccard = 0.818 / 0.847 / 0.858; Time = 150.2 / 42.1 / 121.9 s.
• If |x| > 4, then max{|x|, x²/4} = x²/4, and:

prox_{τ max{|·|, (·)²/4}}(η) = argmin_x (1/2)(x − η)² + τ x²/4   (36)
  = prox_{(τ/4)(·)²}(η)   (37)
  = sign(η) |η| / (τ/2 + 1)   (38)

when |η| > 4 + 2τ. Finally, we obtain

prox_{τ max{|·|, (·)²/4}}(η) =
  sign(η) max(0, |η| − τ)    if |η| < 4 + τ,
  4 sign(η)                  if 4 + τ ≤ |η| ≤ 4 + 2τ,
  sign(η) |η| / (τ/2 + 1)    if |η| > 4 + 2τ.