Deep morphological networks

(1)

HAL Id: hal-02922299

https://hal.archives-ouvertes.fr/hal-02922299

Submitted on 26 Aug 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires

Deep morphological networks

Gianni Franchi, Amin Fehri, Angela Yao

To cite this version:

Gianni Franchi, Amin Fehri, Angela Yao. Deep morphological networks. Pattern Recognition, Else-

vier, 2020, 102, pp.107246. �10.1016/j.patcog.2020.107246�. �hal-02922299�

(2)

Deep morphological networks

Gianni Franchi

^a,∗

, Amin Fehri

^b

, Angela Yao

^c

a

gianni.franchi@u-psud.fr,

SATIE, Universit Paris-Sud, Universit Paris-Saclay

b

amin.fehri@gmail.com

c

ayao@comp.nus.edu.sg,

School of Computing National University of Singapore

Abstract

Mathematical morphology provides powerful nonlinear operators for a variety of image processing tasks such as filtering, segmentation, and edge detection.

In this paper, we propose a way to use these nonlinear operators in an end-to- end deep learning framework and illustrate them on different applications. We demonstrate on various examples that new layers making use of the morpholog- ical non-linearities are complementary to convolution layers. These new layers can be used to integrate the non-linear operations and pooling into a joint oper- ation. We finally enhance results obtained in boundary detection using this new family of layers with just 0.01% of the parameters of competing state-of-the-art methods.

Keywords: Mathematical Morphology; deep learning; edges detection;

denoising.

1. Introduction

Modern deep learning approaches learn parameters of a series of linear and non-linear operators for a given task. The concatenation of these operators over multiple layers increases the depth thereby generalization power of neural net- works. Similar to previous works [1, 2], our paper tries to incorporate new types

5

∗

Corresponding author

(3)

of layers for deep learning: in our case, these layers are based on Mathematical Morphology.

Mathematical Morphology (MM) [3, 4] applies topological operators to im- ages to recover or filter out specific structures. It has led to important suc- cesses in many computer vision tasks, such as filtering [5, 6, 7][8], segmentation

10

[9, 10, 11, 12][13, 14], semantic segmentation [15, 16, 17, 18, 19, 20] [21, 22], im- age fusion [23, 24][25], feature extraction [26, 15, 16, 17, 27] and edge detection[9]

. With this paper, we are the first, to the best of our knowledge, to integrate exact morphological operators within a deep learning end-to-end framework.

Commonly known MM operators have flat structuring elements [3, 4] and

15

share the same weights: their effect then depends only on the shape of the structuring elements. To go further, one can use nonflat structuring elements, which has led to performance enhancement [28]. In all cases, selecting the structuring elements’ shapes and weights often involve tedious trial-and-error, particularly for non-experts.

20

Given the similarities between morphological and convolutional operators, the idea of directly learning these shapes and weights within a neural network naturally arises. However, directly extending such operators into neural net- works is non-trivial, since the max and min operations are non-differentiable.

One possibility is to replace them with differentiable approximations, such as

25

the counter-harmonic mean [29, 30] so that they can be easily integrated into a neural networks such as Deep Conovlutional Neural Networks (DCNN). Note that one of the major drawbacks of counter-harmonic mean operators is their exploding gradients, due to the power function on which they are based. This problem is not encountered when directly using max or min operators, as we

30

propose in this paper.

In this work, we propose a way to integrate the original morphological opera- tors such as erosion and dilation within a deep learning framework. To motivate the appeal of such operators, one may note that the max pooling is simply a morphological dilation with a square structuring element followed by a pooling.

35

Although pooling is essential to build spatial invariance in a DCNN, it may lead

(4)

to an important loss of information, as illustrated on Figure 1. On the first row of this figure, one can see a time signal, and on the second row its Fourier trans- form. As one can see, the max pooling loses too much information, hence the interest of learning the pooling instead, which can be done using our proposal.

40

On the contrary to [29, 31, 30] our work does not just approximate the max pooling. Moreover, we propose a new layer that can be used as a pooling layer as well as to replace the convolutional layer (as proven in section 1).

More broadly, inspired by recent works where the authors try to train ap- proximations of MM operators in a deep learning framework [32, 33, 30, 34], we

45

propose and study several pipelines incorporating exact MM operators in such a framework. Our main contributions are:

• A novel morphological deep learning framework with learned MM opera- tors: to the best of our knowledge, we are the first to attempt to learn the weights of exact MM operators end-to-end in a deep learning framework;

50

• a replacement of the standard max pooling in convolutional neural net- works with a learned morphological pooling that proves to be experimen- tally beneficial;

• a mixed morphological and convolutional neural network that performs edge detection with state-of-the-art results. In addition, this network is

55

trained from scratch, on the contrary to state-of-the-art methods that make use of pretrained weights;

• a fully morphological neural network for image denoising that presents better performance than similar fully convolutional neural network for this task.

60

Morphological operators that are key ingredients of image analysis can now

be used as components of deep learning frameworks. Notably, we can use them

on the tasks where they proved to be efficient, but this time learn their optimal

parameters for given types of images and problems.

(5)

signal x(t) flat maxpool x(t) non-flat maxpool x(t)

Figure 1: We have illustrated on the first row one signal and its result with the flat max pooling layer of size 3 × 1, which is the classical max pool, and with the non flat max pooling layer of size 3 × 1, where the weights have been changed so that we find the original signal.

The second row contains the Fourier signal link to the first row. We can see that putting non uniform weight allow one to don’t loose to much frequency while contracting the signal.

2. Related work

65

The use of morphological operators in neural networks is not common but has been explored both in the context of single unit perceptrons as well as multi-unit deep networks.

2.1. Morphological Operators in Perceptrons

Works [30, 35, 34] on the first category investigates the use of the perceptron

70

in the (max, +) and (min,+) algebra. The morphological perceptron layer proposed in [35] is a fully connected layer, followed by a max-plus layer [36]:

the max-plus blocks proposed in this work consists in adding to the output of a fully connected layer several bias vectors, and then taking the maximum of these vectors to provide new neurons, and so on.

75

Contrary to this morphological perceptron, which layers are fully connected

and thus capture global information (since all neurons are connected), our pro-

posal allows to extract local information using exact morphological features.

(6)

Being local allows our layers to be lighter than morphological perceptron. This is why it can be used for a variety of tasks and not only classification, as we

80

shall see further in the paper.

2.2. Morphological Operators in Deep Networks

Recently, morphological operations have been explored in the context of deep learning frameworks for image denoising and classification [29, 31, 30]. All three works are based on the approximation of morphological operations, since the

85

minimum and maximum were approximated with the counter-harmonic mean (CHM) [37, 38, 39]. The CHM is a power mean studied in [40, 39], where the signal is approximated as a weighted sum of the signal raised to the power p, with p a fixed integer. Then this sum is normalized by the weighted sum of the signal raised to the power p − 1. Masci et al. [29] were the first to propose integrating

90

CHM filters directly into CNNs and showed that the parameters of the CHM filter could be learned via standard back-propagation. We note also [31], which uses these CHM operators to classify digits.Finally, [30] proposes the CHM as an alternative to the standard max-pooling and places it into a CNN. In using a CHM filter, the authors claim that image details are better preserved

95

across pooling operations, and as such, retain more relevant information for classifying images. They achieve state-of-the-art results for image classification on ImageNet [41] with a Resnet101 [42] architecture.

While the CHM faithfully approximates the min and max operation, it does not fully encapsulate MM operations. As such, the approach of [30] applies only

100

as a replacement for the pooling layer. Our approach differs in that we do not need to approximate the min or max operation: we directly integrate the entire MM operation directly into the neural network, encapsulating both the standard convolution and max pooling into a single morphological operation. Our work is thus more general in this respect. Furthermore, our proposal does not suffer

105

from exploding gradients that can stem out the use of a power function as they

propose. The training step in our proposal is thus made simpler.

(7)

3. Morphological features

3.1. Notation

We begin by introducing the notation used in this paper. Let E be a subset

110

of the discrete space Z

²

, which represents the support space of a 2D image, and F ⊆ R

^d

be a set of pixels values in dimension d. Higher order tensors are denoted as (I , S , . . .), with the order of tensor I ∈ R

ⁿ¹^×n²^×...×n^J

being J . Moreover, if I ∈ R

ⁿ¹^×n²^×n³

, for i ∈ [1, n

3

], I

:,:,i

represents a matrix of size n

1

× n

2

where the third component is equal to i.

115

3.2. Operator Definitions

Mathematical morphology operators are non-linear image operators based on the spatial structure of the image. Let f be a grey scale image which can be represented as a function, with the intensity at position x be denoted as f (x).

The two basic operations in morphology are the grey-level erosion and dilation

120

[43, 4], defined on their discreet version respectively as :

ε

b

(f )(x) = min

_i∈[−n,n]

(f (x + i) − b

i

), (1) δ

_b

(f )(x) = max

_i∈[−n,n]

(f(x − i) + b

_i

), (2) where b = [b

₋

n, . . . , b

n

] is a structuring element. The structuring element impacts a morphological operation by both the geometry of its support and its weights. In its simplest form, these weights have the same value over the support shape and are equal to zero elsewhere: this corresponds to the commonly known

125

flat structuring elements [43].

By concatenating these two basic morphological operators, it is possible to obtain more evolved operators such as opening and closing [43], defined respectively as:

γ

_b

(f ) = δ

_b

(ε

_b

(f )) , (3)

ϕ

b

(f ) = ε

b

(δ

b

(f )) . (4)

(8)

The opening (resp. closing) γ of an image f by a structuring element b is

130

a filter which removes the bright (resp. dark) structures of the images which do not fit the support shape of the structuring element b. As such, all features smaller than this support shape are removed.

One can use as algebraic opening (resp. closing) any operator that verifies the following properties:

135

• Openings are anti-extensive: γ

b

(f ) ≤ f

• Openings are idempotent: γ

b

(γ

b

(f )) = γ

b

(f )

• Openings are increasing operators: if f ≤ g then γ

b

(f ) ≤ γ

b

(g)

Since closing operators are the dual of openings, it directly follows that the only property differing between the two is the first, i.e.while openings are anti-

140

extensive, closings are extensive, such that ϕ

_b

(f ) ≥ f . 3.3. Alternate Sequential Filters (ASF)

Image filters can in turn be created from opening and closing operations.

Notably, morphological Alternate Sequential Filters (ASF), composed from a series of increasing (resp. decreasing, cf. above for definition of an increasing

145

operator) filtering operators such as dilations (resp. erosions) with structuring functions having a larger and larger support shape [44], have proven to be useful for multiscale analysis and filtering of images [3, chapter 10]. They progressively simplify images, removing noise and small details and textures, while preserving textures. As such, the ASF are useful for denoising and can also be seen as

150

multi-scale representations of images.

More formally, an ASF is obtained from two families of operators: one family of openings {γ

b_l

} 1 ≤ l ≤ n

1

, and one family of closings {ϕ

b_l

}, 1 ≤ l ≤ n

2

. The index l is associated with the scale of the structuring function, i.e.b

l

⊂ b

l+1

. The ASF can be defined as:

ASF

_λ

= γ

_b_λ

ϕ

_b_λ

. . . γ

_b₁

ϕ

_b₁

, (5)

(9)

when starting with a closing, and:

ASF

λ

= ϕ

b_λ

γ

b_λ

. . . ϕ

b₁

γ

b₁

, (6) when starting with an opening, with λ corresponding to the number of opening and closing.

4. Morphological neural network 4.1. Morphological back-propagation

155

Let us consider the following objective function for an image f which has undergone erosion:

L(b) = X

x∈E

kε

b

(f )(x) − gk, (7) with g a signal we want to reach, that can be an image. There are different techniques to find the optimal structuring function b = {b

_−n

, b

0

, . . . , b

n

} to minimize this objective function. Here we focus on gradient-based optimization techniques such as stochastic gradient descent and its variants. Hence we need to differentiate L(b) with respect to b.

160

However, the min operator is not directly differentiable. We tackle this problem in the same way that max pooling is handled in standard convolutional neural networks. Consider the case for erosion. One notes that for each position x in the image, we can find an index i

^x∗

belonging to {−n, −n + 1, . . . , n − 1, n}

with a minimum value, such that this index is the best one for the minimization

165

problem at hand. This results in:

˜

ε

_b

(f)(x) = f (x + i

^x∗

) − b

_i^x∗

. (8) In other words, the resulting eroded version of the image or feature map is the value at a position x + i

^x∗

minus the structuring function at position i

^x∗

.For any position x there exists a i

^x∗

verifying this property.

Thus, based on this, we can derive the loss function according to b

_i^x∗

:

170

∂ ε ˜

b

(f )(x)

∂b

i^x∗

= −n

i^x∗

(9)

(10)

with n

i^x∗

equal one or zeros, it is equal to one if b

i^x∗

is the best index for the erosion task at position x

To be clearer let us consider as an example a 1D signal f composed of 10 val- ues {f (x

_i

)}

⁹_i=0

. Let us write h the results of the erosion with the following struc- turing element b = (b

₋₁

, b

₀

, b

₁

). This structuring element is a simple line of size

175

3. Let us consider that the erosion is equal to h = (f(x

₀

) − b

₀

, f(x

₀

) − b

₋₁

, f(x

₃

) − b

₁

, f (x

₃

) − b

₀

, f(x

₄

) − b

₀

, f(x

₄

) − b

₋₁

,

f (x

5

) − b

₋₁

, f (x

6

) − b

₋₁

, f (x

9

) − b

1

, f (x

9

) − b

0

). Based on this formulation of h we have that

∂h

∂b

₋₁

= (0, −1, 0, 0, 0, −1, −1, −1, 0, 0) (10)

∂h

∂b

₀

= (−1, 0, 0, −1, −1, 0, 0, 0, 0, −1) (11)

∂h

∂b

₁

= (0, 0, −1, 0, 0, 0, 0, 0, −1, 0) (12) Let us consider that the loss we want to optimize is equal to L = kh − gk =

180

P

9

i=0

(h(x

i

) − g(x

i

))

²

. Then the derivative of the loss according to the different elements of the structuring elements is equal to :

∂L

∂b

j

=

9

X

i=0

(h(x

i

) − g(x

i

)) × ∂h

∂b

j

(x

i

) (13)

with j ∈ {−1, 0, 1}. In the case of the erosion the

_∂b^∂h

j

(x

i

) is equal to 0 or −1, in the case we deal with the dilation it is equal to 0 or 1, finally in the case of the convolution it is equal to f (x

_i

− j). From this equation we can see that the

185

erosion and dilation does not bring instability since their gradient are bounded, contrary to the one of the convolution. Hence these morphological layers are not hard to train.

One can see that the derivative of the eroded image according to a structuring element has a tendency to be equal to 0 or −1, and for a dilation to 0 or 1. We

190

can see the dual effect of the dilation and erosion also on the derivative. Hence

it might be interesting to combine them on a neural network to take advantage

of this dual effect.

(11)

(a) (b) (c) (d) (e) (f)

Figure 2: Different structuring elements of size 3 used for initialization of morphological layers in this work. For the boundary detection application we initiate the structuring element with the ones represented in (a),(b),(c),(e), and (f). For denoising, we use the structuring elements (e) and (f), except that for the structuring element (f) we increase it size to 5×5 instead of 3×3.

Based on the demonstration above, we are thus provided with a way to differentiate the morphological operators during the backward pass of a deep

195

learning framework. This allows for an end-to-end learning of these operators, which can be useful for various stages of a neural network, as we shall see in the next sections.

4.2. Morphological layers

In this section, we present the different types of morphological layers illus-

200

trated in this paper. Morphological layers used in a Deep learning framework are composed of at least one morphological operator. We start by explaining how to initialize and train such layers, and then describe the ones we experiment with in detail.

4.2.1. Initialization

205

Out of simplicity and because in most works, flat structuring elements are used, we also initialize our morphological layers with flat structures. Figure 2 shows the different initialization we chose, with values of either 0s or

_m¹

, where m is the number of non-zero elements in the structuring element. This provides a compromise between traditional MM and deep learning, since the network starts

210

with a standard shape that is then adjusted during the optimization process.

The structuring element weights of the morphological layers can also be ran-

domly initiated. Experimentally, we observe that using a uniform distribution

(12)

provide better results than using a normal distribution.

These two types of structuring element initialization are useful in different

215

setups. Two main scenarios emerge in our experiments: (i) in the first one we have images as inputs and outputs of the network, for example in denoising or boundary detection; (ii) in the second one the network takes images as inputs and outputs vectors, as in classification problems. In the first case, constraining the structuring elements to have values initialized following a shape leads to

220

better refined output images. In the second case, since the goal is not to be visually meaningful but to extract descriptors as discriminative as possible for further use, the structuring element can be initialized randomly instead.

4.2.2. Training

Morphological networks have more extreme behaviors than the convolutional

225

ones. Hence we propose to clip the gradient of the morphological layers to smooth the update of the structuring element. This technique leads to impor- tant result improvements. Note however that gradient clipping is not needed for the denoising task, and is only used for the CIFAR classification and the boundary detection ones. To train the morphological layers, we incorporate the

230

morphological operators provided in Tensorflow in our networks, and then we treat them just like regular layers.

4.2.3. Morphological Pooling layers

In some sense, one can say that DCNNs already use morphological operators, since a max-pooling is nothing but a dilation with a flat square structuring

235

element followed by a pooling. One of the first experiments that come to mind is thus to replace the flat dilation with a learned morphological operator. The motivation behind this idea is that while pooling is a sub-sampling procedure that usually aggregates the most important information, it might sometimes lead to a loss of useful information. This is why we propose to learn the dilation and

240

erosion used in the pooling, to optimize its effect in the network. As illustrated

in Figure 3, we thus apply in parallel both the erosion and dilation to the signal

(13)

before pooling. These results are then concatenated and projected back to the original size using a 1 × 1 convolution. Both the structuring element weights as well as the projection weights are learned. To improve the results, we add a

245

batch normalization layer in the morphological pooling: it is applied after the erosion and the dilation and before their concatenation.

To sum up, morphological pooling layers are used in the DCNN, with the underlying idea that they might increase the generalization power of the DCNN and thus increase its performance. So the neural networks in this subsection

250

can be seen as a mix of DMNN and DCNN. This is why we refer to them as Deep Morphological and Convolutional Neural Network (DMCNN).

Figure 3: The morphological pooling operator. The erosion and dilation share the same struc- turing element and are applied in parallel to the input data. The results of these morphological operators are concatenated and projected back to the original size with a 1 × 1 convolution.

4.2.4. Morphological learned descriptor layer

In this subsection, we deal with a more complex idea, that consists in training a DCNN where all the spatial term comes from MM operators. Simply stated,

255

we can replace the convolutions from a CNN with morphological operators, which parameters are then fine-tuned for the task at hand. In this case, there might be 1 × 1 convolutions to increase the number of neurons. We refer to the resulting network as to a Deep Morphological Neural Network (DMNN).

A Deep Morphological Neural Network is simply composed of projection

260

layers and morphological layers that were traditionally used only to perform the

(14)

pooling. Hence there is no spatial information extracted through convolutional layers. In this paper, we prove that morphological layers are useful not only to perform pooling but also to extract information similarly to convolutional ones.

5. Experiments

265

5.1. Image classification with morphological pooling

In image classification, the goal is to learn a mapping from images to class labels. It is a task where we have at least two sets: the training set {(f

i

, y

i

)}

ⁿ_i=1¹

and the testing set {(f

i

, y

i

))}

ⁿ_i=1²

, where f

i

are images and y

i

is a one-hot vector encoding of the class label of f

i

. State-of-the-art methods for image classification

270

use DCNN [45, 46, 47]. In this section, we show how a conventional DCNN can be adapted to incorporate morphological operators.

We demonstrate the interest of DMCNN (a DCNN where we replace the max pooling with a morphological pooling) on two standard classification bench- marks: MNIST [48] and CIFAR10 [49]. MNIST is a large dataset of handwritten

275

digits that is commonly used for training and testing in the machine learning field. It is composed of 60,000 images for the training set and 10,000 images for the testing set. Each image of MNIST is a grey scale image of size 28 × 28.

The number of class of MNIST data is equal to 10. CIFAR10 is another image dataset commonly used in machine learning research to evaluate algorithms. It

280

is composed of 50,000 training RGB images and 10,000 RGB test images. The images are of size 32 × 32. The dataset has 10 classes (e.g. airplane or bird).

As an evaluation criterion, we use the accuracy, that measures the proportion of correct predictions. It is equal to the number of test images which classes have been correctly predicted divided by the total number of test images.

285

We compare the DCNN and DMCNN on the classification task for similar architectures, i.e. with the same number of layers and the same depth. For the MNIST classification task, the two compared networks are represented in Figure 4. These networks both have three convolutional layers, followed by a ReLU activation function. For the DCNN, the first two convolution layers

290

(15)

are followed by a max pooling, while on the DMCNN they are followed by morphological pooling without batch normalization. Hence, these two networks are comparable.

For CIFAR10, we use the Resnet32 [42] as the DCNN. We transform it into a DMCNN by reducing the convolutions with stride 2 to stride 1 and adding a

295

morphological pooling with batch normalization.

(A) DMCNN

(B) DCNN

Figure 4: (A) A representation of the Morphological DCNN used to classify ,(B) A represen- tation of the reference CNN used to classify MNIST.

The results obtained for these two classification tasks with both a DCNN and a DCMNN are presented in Table 1. The results are slightly improved by using a morphological pooling, which proves that learning the pooling using our proposal can be interesting for classification tasks.

300

To better understand the effect of this morphological pooling layer, we visu- alize in Figure 5 the structuring function learned during the MNIST classifica- tion. We can note that the learnt structuring element weights are quite different from classical DCNN weights. In fact, the weights of convolutional layers are typically similar to SIFT or gradients [50], while these morphological weights

305

seem to learn shape information. This shows that the morphological layer tries to learn a representation that is a bit flat.

In the next section, we will show how replacing the convolutional layers

with morphological layers can lead to improving results for certain types of

(16)

image denoising.

310

(A) First layer

(B) Second layer

Figure 5: Visualizations of (A) 25 of the 32 structuring functions learned on the first layer and (B) 50 of the 64 structuring functions learned on the second layer of the DMCNN used to classify MNIST.

DCNN DMCNN

Accuracy MNIST

99.4 99.7

Accuracy CIFAR

96.5 96.6

Table 1: Table compiling the results of the classification task on the dataset MNIST and CIFAR10.

5.2. Image Denoising

Morphological operators have been proven in the past to be useful for denois- ing images [6, 7]. Noise present in an image may differ depending on its gener- ating source or transmission channel. Noise that alter images can be divided in three main categories: additive (e.g. Gaussian noise), multiplicative (e.g. speckle

315

(17)

noise) and impulse (e.g. salt-and-pepper noise). Morphological operators are es- pecially efficient when it comes to impulse noise such as salt-and-pepper noise.

Several state-of-the-art methods for salt-and-pepper denoising [51, 52] consist of first detecting the noisy pixels in the image, and then applying a denoising filter at the noise locations. In the filtering stage, fuzzy morphological operators

320

such as alternate sequential filters have proven to be very effective [52]. In this regard, the ability to learn from data the parameters of these filters leading to the best denoising would be very useful.

In this section, we compare the standard DCNN with a DMNN with a similar number of parameters and show the efficacy of the end-to-end morphological

325

filter learning for denoising.

In the context of image denoising, let us consider two images of the same dimension M × N : f the original noise-free image, and g the restored image for which some filter has been applied. Three commonly-used quantitative evalua- tion measures for denoising are the mean squared error (MSE), peak signal-to-

330

noise ratio (PSNR) and the Structural SIMiliarity (SSIM) index.

MSE is computed using the following expression:

M SE = P

m,n

(f (m, n) − g(m, n))

²

M × N (14)

PSNR is defined as:

P SN R = 10 log

₁₀

R

²

M SE

, (15)

where R is the maximum fluctuation in the input image.

Finally, the SSIM [53] index was introduced to evaluate the structural sim- ilarity between two images, with the motivation that structural changes are what most affects human visual perception. SSIM compares the original and the distorted image considering three main features: luminance, contrast and the structural comparison and is defined as:

SSIM (f, g) = (2µ

f

µ

g

+ C

f

)

(µ

²_f

+ µ

²_g

+ C

_f

) · (2σ

f g

+ C

g

)

(σ

²_f

+ σ

_g²

+ C

_g

) , (16)

where µ

f

and σ

f

(resp. µ

g

and σ

g

) are the mean and variance of the greyscale

values of the image f (resp. g), σ

f g

is the covariance of f and g, and C

f

=

(18)

(k

1

· L)

²

and C

g

= (k

2

· L)

²

two variables to stabilize the division by a weak

335

denominator, with L the dynamic range of the pixel-values taken by f and g.

By default we have k

1

= 0.01, k

2

= 0.03, and in our case L = 255.

We first implement an experiment upon the MNIST dataset [48]. We com- pare the denoising efficiency of two similar DCNN and DMNN, with voluntarily simple architectures with few parameters, presented in Figure 6, and which

340

have similar number of parameters for a fair comparison. Note that the goal of this comparison is to provide a proof-of-concept of the denoising capability of morphological layers. Further work might include such layers in more complex architectures to aim for state-of-the-art results by replacing or being combined with convolutional ones [54].

345

(a)

(b)

Figure 6: Architectures considered for the denoising experiment. (a) DCNN: 4 3 × 3 convolu- tions - 77 parameters (b) DMNN - alternate sequential filter structure with 3 × 3 structuring functions - 82 parameters.

For these different networks, we thus compare the quality of the denoising

(19)

obtained when training them for different levels of salt-and-pepper noise and MSE as a loss. Since we provided a way to actualize weights of morphological operators within a gradient-based optimization technique, we can indeed train these two networks on the same task and compare the results. The training is

350

made using Adam, a batch size of 100 and a learning rate of 0.02.

Results are presented in Table 2 for the two networks, different denoising metrics and different levels of added salt-and-pepper noise. One can note that results obtained by the DMNN are better than those obtained by the DCNN.

This confirms the potential of such operators for filtering problems, and the

355

interest of being able to train them end-to-end for a given filtering task.

MSE (&) PSNR (%) SSIM (%)

Noise level DCNN DMNN DCNN DMNN DCNN DMNN

10% 0.015 0.009 18.71 20.89 0.9994 0.9995

20% 0.022 0.016 16.64 18.34 0.9988 0.9990

30% 0.030 0.022 15.38 16.89 0.9980 0.9985

40% 0.038 0.029 14.35 15.66 0.9972 0.9978

50% 0.047 0.040 13.44 14.26 0.9961 0.9966

60% 0.058 0.051 12.59 13.19 0.9945 0.9953

70% 0.069 0.063 11.86 12.29 0.9930 0.9934

80% 0.080 0.079 11.25 11.36 0.9907 0.9908

90% 0.089 0.088 10.83 10.85 0.9888 0.9894

Table 2: Table compiling salt-and-pepper denoising results on MNIST dataset [48] for both the DCNN and DMNN architectures, for the MSE, PSNR and SSIM metrics, and different levels of salt-and-pepper noise.

To confirm the interest of DMNN for denoising, we conduct the same de- noising experiment on natural images from the Berkeley Segmentation Dataset BSDS500 [55], for an experiment similar to the one conducted in [54]. The noise is applied randomly on each channel of the considered RGB images. We keep

360

the same architecture for both the DCNN and the DMNN. Results are presented

(20)

(A) First layer

(B) Second layer

Figure 7: We have represented in (A) the four structuring functions learned on the first layer and on (B) the four structuring functions learned on the second layer used to denoise MNIST.

We can see the DCNN learn shapes, that have some symmetry. These shapes are based on the structuring elements.

in Table 3, and show again slightly better results with the morphological ap- proach, thus confirming the interest of our approach. Visual results for a high level of noise are also provided in Figure 8.

5.3. Edge detection

365

Both the classification and denoising tasks established the interest of using morphological layers in deep learning architectures. To further demonstrate the strength of morphological operators in such networks, we propose a morpholog- ical encoder-decoder framework to extract image edges that takes advantage of the DMCNN architecture previously described.

370

Traditional algorithms [56, 57, 58, 59, 60, 61] that perform well on this task are obtained by fine-tuning deep neural networks previously trained on Imagenet [41].

Interestingly, our DMCNN model is trained from scratch, and yet achieves results competitive with the state-of-the-art. We train and evaluate our method

375

on the BSDS500 dataset [55], using the provided training set, test set and their

evaluation metrics.

(21)

image image with added salt-and-pepper noise (50%) denoised image

Figure 8: Illustration of the denoising performed by the DMNN for several images of the

BSDS500 dataset, for an added salt-and-pepper noise of 50%.

(22)

MSE (&) PSNR (%) SSIM (%)

Noise level DCNN DMNN DCNN DMNN DCNN DMNN

10% 0.007 0.006 21.58 22.09 0.9993 0.9995

20% 0.010 0.008 20.27 21.15 0.9991 0.9993

30% 0.011 0.010 19.73 20.21 0.9992 0.9991

40% 0.017 0.013 17.89 18.97 0.9985 0.9990

50% 0.021 0.017 17.19 18.13 0.9980 0.9982

60% 0.027 0.022 15.78 17.02 0.9972 0.9978

70% 0.039 0.035 14.24 15.05 0.9956 0.9961

80% 0.057 0.049 12.60 13.62 0.9930 0.9944

90% 0.080 0.077 11.62 11.67 0.9903 0.9905

Table 3: Table compiling salt-and-pepper denoising results on the BSDS500 dataset [55]

for both the DCNN and DMNN architectures, for the MSE, PSNR and SSIM metrics, and different levels of salt-and-pepper noise.

Figure 9: DMCNN used for the edge detection experiment. The blue box represents the fixes (i.e. not learned) operators.

We hereby provide some details on the network architecture that is used, which is illustrated in Figure 9. First, we project the image from the RGB to the Lab colour space. We then generate a mosaic image: first by using the

380

watershed transform [62] resulting in a simplified version of the image with clus- tered pixels, then by retaining as unique value on each cluster the average pixel value on this cluster. To segment the image and obtain clusters, we need a gradient for the watershed transform. It is normally impossible to compute a gradient for a color image, so we convert the image into the LAB color space.

385

Then on each channel we apply 45 erosions and dilations, which have structuring

(23)

elements at different scales and orientations. More precisely, we use a structur- ing element line of size 3, 7, 9, 11, 13. On each of these lines we apply one of the following rotation : O

^◦

, 22.5

^◦

, 45

^◦

, 67.5

^◦

, 90

^◦

, 112.5

^◦

, 135

^◦

, 157.5

^◦

. Hence we have 45 structuring elements. After that, we calculate all the gradients with all

390

the structuring elements, and for a given scale we have 8 orientation gradients.

In order to produce a gradient by scale, we take for each pixel the maximum value of the orientations gradient. Hence we have 5 gradients (one for each scale) that we average to obtain the final gradient. This multi-scale morpho- logical segmentation is quite common and used in various publications [63, 64],

395

the major difference comes in the way to merge the scales. We conclude this preprocessing by normalizing the values between 0 and 1.

After the preprocessing of the image, we apply mainly three kinds of lay- ers. The first ones are the spatial layers, which are composed of morphological layers and one convolution layer. We apply a 5 morphological pooling where

400

the weights are fixed to +1 for the dilation and −1 for the erosion. It means that we calculate 5 morphological gradients dilation − erosion. The structuring elements used for these 5 morphological operators are drawn in Figure 2(a), 2(b),2(c),2(e), and 2(f). This morphological pooling extract most of the spatial detail information. In addition, the stride of this morphological pooling is equal

405

to 2. We realize that increasing the stride to 4 enhances the results. Then, simi- larly to semantic segmentation algorithms, we used atroue dilation and erosion.

Atroue layers allow the network to increase the receptive field without adding pooling. Hence morphological pooling layers have a pooling stride equal to 2 and an atroue dilation rate (or stride) equal to 2 and are equivalent to a pooling

410

stride equal to 4. After this morphological layer, we apply a convolution layer with a weight of size 3 × 3. This last layer allows us to be sure to bring out most of the spatial information.

After having learned spatial descriptors, we apply two projection convolu- tional layers that do not bring spatial information. These layers increase the

415

depth of the spatial descriptors from 32 to 64. However, the spatial descriptor

has still a size 4 times smaller than the original one. Hence, we need to perform

(24)

a deconvolution and a projection. As a deconvolution layer, we apply a bilinear interpolation to return a feature space at the original size. Finally, we apply the last convolution layer with a weight of size 3 × 3. This last convolutional

420

layer is not followed by any Relu activation function, contrary to the other con- volutional layers. It brings spatial information to correct the interpolation and perform the deconvolution.

We evaluate the edge detection with the F1-score. The F1-score is the har- monic average of the precision and recall and ranges between 0 and 1, with

425

higher values being better. We evaluate the F1 score for each image of the test set of BSDS500 [55] at different thresholds for the edge prediction. We apply different thresholds since our results are edge probabilities with values between zero and one, as one can see in Figure 10. The closest to 1 the edge value is, the more probable this edge is correct. Edge detection accuracy is evaluated using

430

the Optimal Dataset Scale (ODS), where one chooses the optimal threshold for the entire dataset, the Optimal Image Scale (OIS) being the case where one chooses the optimal threshold per-image. The OIS is always a bit better than the ODS, since it considers the best scale for each image. The OIS and ODS would be equal in the optimistic situation where the optimal threshold would

435

be the same for the whole dataset. For more information about these classical measures for edge detection we refer to [55].

On Table 4 we compare our results for edge detection compared to other algorithms [56, 57, 58, 59, 60, 61]. We can see that thanks to our network we reach human performances on OIS. Regarding the second best network HED

440

[61], they use multiple scales from a VGG network [65], hence using up to 138 millions weights, while we have just learned 11175 weights. So, the most interesting aspect is to note the extreme simplicity of our network in comparison to most state-of-the-art networks. In addition, the VGG network that they used is potentially difficult to train, so they choose to finetune it from weights

445

already trained to classify images on ImageNet [41]. Their procedure of training

is thus longer than ours since, and involve external data. In addition, our

training procedure is much simpler since it does not require any pre-trained

(25)

network. Our network relies mostly on the morphological operators, that learn the perfect gradient, then based on this low resolution gradient we correct it

450

with convolution 1,2,3 and 4.

image GT output

Figure 10: We have illustrated the results of our edge detection on two images. Our results is a grey scale value between 0 and 1. We threshold at different value to get the F measure.

We can see that morphological layers seem to be perfectly adapted to learn edges from images. Not only they allow us to reduce the number of weights, and so make us save time, but they also allow us to have really good results.

We show such results on natural images, it could be interesting to try it on

455

microscopic images where the data have a Poisson noise. The morphological network that we propose did not have better results than GAN network [66].

But one can see the discriminative part of the GAN as a perceptual loss trying to make that the output of the generator looks like gradient images. We did not try GAN since they need more images to train, secondly the GAN is based on a

460

generator and a discriminator network. We compare just the generator networks

(26)

ODS OIS

Human 0.80 0.80

Canny 0.60 0.64

Felz-Hutt[56] 0.61 0.64 gPb-owt-ucm[55] 0.73 0.76

SCG[58] 0.74 0.76

DeepNets[59] 0.74 0.76 DeepEdge[60] 0.75 0.77 DeepContour[57] 0.76 0.77

HED[61] 0.78 0.80

DMCNN (ours) 0.79 0.80

Table 4: Table compiling on the results on the test set of BSDS500 [55].

and considered that if wanted one could add to our generator a discriminator.

6. Conclusion

In this work, we have introduced new innovative layers based on mathemat- ical morphology that can be used within a deep neural network. Based on these

465

layers, we proposed first to learn the pooling on DCNN for image classification.

Then we saw how such layers might be used to denoise images on a DCNN containing only morphological operators. Finally, we exposed their usefulness to learn object boundaries. The first experience teaches us that we can use this layer to replace a pooling layer. The second experiment shows us two major

470

points: first that these morphological layers extract information and that they can thus replace in some cases convolutional layers, secondly that these layers might be more suitable for salt-and-pepper denoising than the convolutional layers. Finally, the last experiment proves on the edge detection task that com- bining these morphological layers with convolutional ones can be the key to

475

future research to have lighter and better networks. Hence, each of these three

(27)

Figure 11: We have represented in (A) the four structuring functions learned on the first layer and on (B) the four structuring functions learned on the second layer used to denoise MNIST.

We can see the DCNN learn shapes, that have some symmetry. These shapes are the basis of the structuring elements.

experiments presented good results that proved the interest of these new layers.

Based on these good results, we are going to investigate in future works how to use such layers within a generative adversarial network. They might indeed bring out interesting information to generate new images. We could also

480

use the boundary detection morphological network to calculate superpixels and compare them with the state of the art. We can also try to see if this network can be used to enhance results in image processing tasks already well performed by not learned MM operators.

References

485

[1] W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang, C.-C. Loy, et al., Deepid-net: Deformable deep convolutional neu- ral networks for object detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 2403–2412.

[2] Y. Duan, J. Lu, Z. Wang, J. Feng, J. Zhou, Learning deep binary descrip-

490

tor with multi-quantization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1183–1192.

[3] J. Serra, Image analysis and mathematical morphology, v. 2.

(28)

[4] P. Soille, Morphological image analysis: principles and applications, Springer Science & Business Media, 2013.

495

[5] S. Velasco-Forero, J. Angulo, P. Soille, Conditional toggle mappings: prin- ciples and applications, Journal of mathematical imaging and vision 48 (3) (2014) 544–565.

[6] Z. Yu-Qian, G. Wei-Hua, C. Zhen-Cheng, T. Jing-Tian, L. Ling-Yun, Med- ical images edge detection based on mathematical morphology, in: 2005

500

IEEE engineering in medicine and biology 27th annual conference, IEEE, 2006, pp. 6492–6495.

[7] E. Decenciere, Restauration automatique de films anciens, Ph.D. thesis, Ecole Nationale Sup´ ´ erieure des Mines de Paris (1997).

[8] M. Van Droogenbroeck, H. Talbot, Fast computation of morphological op-

505

erations with arbitrary structuring elements, Pattern recognition letters 17 (14) (1996) 1451–1460.

[9] F. Meyer, S. Beucher, Morphological segmentation, Journal of visual com- munication and image representation 1 (1) (1990) 21–46.

[10] P. Salembier, F. Marqu´ es, M. Pardas, J. R. Morros, I. Corset, S. Jeannin,

510

L. Bouchard, F. Meyer, B. Marcotegui, Segmentation-based video coding system allowing the manipulation of objects, IEEE Transactions on Circuits and Systems for Video Technology 7 (1) (1997) 60–74.

[11] A. Dufour, C. Ronse, J. Baruthio, O. Tankyevych, H. Talbot, N. Passat, Morphology-based cerebrovascular atlas, in: 2013 IEEE 10th International

515

Symposium on Biomedical Imaging, IEEE, 2013, pp. 1210–1214.

[12] X. Zhang, G. Thibault, E. Decenci` ere, G. Quellec, G. Cazuguel, A. Erginay, P. Massin, A. Chabouis, Spatial normalization of eye fundus images, in:

ISBI 2012: 9th IEEE International Symposium on Biomedical Imaging, IEEE, 2012.

520

(29)

[13] P. Dokl´ adal, I. Bloch, M. Couprie, D. Ruijters, R. Urtasun, L. Garnero, Topologically controlled segmentation of 3d magnetic resonance images of the head by using morphological operators, Pattern recognition 36 (10) (2003) 2463–2478.

[14] M. Baccar, L. Gee, R. C. Gonzalez, M. A. Abidi, Segmentation of range

525

images via data fusion and morphological watersheds, Pattern Recognition 29 (10) (1996) 1673–1687.

[15] G. Franchi, J. Angulo, Comparative study on morphological principal com- ponent analysis of hyperspectral images, in: 2014 6th Workshop on Hy- perspectral Image and Signal Processing: Evolution in Remote Sensing

530

(WHISPERS), IEEE, 2014, pp. 1–4.

[16] G. Cavallaro, Spectral-spatial classification of remote sensing optical data with morphological attribute profiles using parallel and scalable methods, Ph.D. thesis (2016).

[17] G. Cavallaro, N. Falco, M. Dalla Mura, J. A. Benediktsson, Automatic

535

attribute profiles, IEEE Transactions on Image Processing 26 (4) (2017) 1859–1872.

[18] M. Fauvel, J. Chanussot, J. A. Benediktsson, J. R. Sveinsson, Spectral and spatial classification of hyperspectral data using svms and morpholog- ical profiles, in: 2007 IEEE International Geoscience and Remote Sensing

540

Symposium, IEEE, 2007, pp. 4834–4837.

[19] J. A. Benediktsson, M. Pesaresi, K. Amason, Classification and feature ex- traction for remote sensing images from urban areas based on morphologi- cal transformations, IEEE Transactions on Geoscience and Remote Sensing 41 (9) (2003) 1940–1949.

545

[20] S. Velasco-Forero, J. Angulo, Classification of hyperspectral images by ten-

sor modeling and additive morphological decomposition, Pattern Recogni-

tion 46 (2) (2013) 566–577.

(30)

[21] Y. Tarabalka, J. Chanussot, J. A. Benediktsson, Segmentation and classi- fication of hyperspectral images using watershed transformation, Pattern

550

Recognition 43 (7) (2010) 2367–2379.

[22] A. Plaza, P. Martinez, R. Perez, J. Plaza, A new approach to mixed pixel classification of hyperspectral imagery based on extended morphological profiles, Pattern Recognition 37 (6) (2004) 1097–1116.

[23] S. Drouyer, S. Beucher, M. Bilodeau, M. Moreaud, L. Sorbier, Sparse stereo

555

disparity map densification using hierarchical image segmentation, in: In- ternational symposium on mathematical morphology and its applications to signal and image processing, Springer, 2017, pp. 172–184.

[24] G. Franchi, J. Angulo, M. Moreaud, L. Sorbier, Enhanced edx images by fusion of multimodal sem images using pansharpening techniques, Journal

560

of microscopy 269 (1) (2018) 94–112.

[25] I. Bloch, H. Maˆıtre, Fuzzy mathematical morphologies: a comparative study, Pattern recognition 28 (9) (1995) 1341–1387.

[26] G. Franchi, J. Angulo, Morphological principal component analysis for hyperspectral image analysis, ISPRS International Journal of Geo-

565

Information 5 (6) (2016) 83.

[27] A. Fehri, S. Velasco-Forero, F. Meyer, Characterizing images by the gromov-hausdorff distances between derived hierarchies, in: 2018 25th IEEE International Conference on Image Processing (ICIP), 2018, pp.

1213–1217. doi:10.1109/ICIP.2018.8451522.

570

[28] M. Wirth, M. Fraschini, J. Lyon, Contrast enhancement of microcal- cifications in mammograms using morphological enhancement and non- flat structuring elements, in: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems, IEEE, 2004, pp. 134–139.

[29] J. Masci, J. Angulo, J. Schmidhuber, A learning framework for morpholog-

575

ical operators using counter–harmonic mean, in: International Symposium

(31)

on Mathematical Morphology and Its Applications to Signal and Image Processing, Springer, 2013, pp. 329–340.

[30] F. Saeedan, N. Weber, M. Goesele, S. Roth, Detail-preserving pooling in deep networks, in: Proceedings of the IEEE Conference on Computer Vi-

580

sion and Pattern Recognition, 2018, pp. 9108–9116.

[31] D. Mellouli, T. M. Hamdani, M. B. Ayed, A. M. Alimi, Morph-cnn: A morphological convolutional neural network for image classification, in: In- ternational Conference on Neural Information Processing, Springer, 2017, pp. 110–117.

585

[32] S. Wolf, L. Schott, U. Kothe, F. Hamprecht, Learned watershed: End-to- end learning of seeded segmentation, in: Proceedings of the IEEE Interna- tional Conference on Computer Vision, 2017, pp. 2011–2019.

[33] S. Wolf, C. Pape, A. Bailoni, N. Rahaman, A. Kreshuk, U. Kothe, F. Ham- precht, The mutex watershed: Efficient, parameter-free image partitioning,

590

in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 546–562.

[34] R. Mondal, S. Santra, B. Chanda, Dense morphological network: An uni- versal function approximator, arXiv preprint arXiv:1901.00109.

[35] Y. Zhang, S. Blusseau, S. Velasco-Forero, I. Bloch, J. Angulo, Max-plus

595

operators applied to filter selection and model pruning in neural networks, arXiv preprint arXiv:1903.08072.

[36] V. Charisopoulos, P. Maragos, Morphological perceptrons: geometry and training algorithms, in: International Symposium on Mathematical Mor- phology and Its Applications to Signal and Image Processing, Springer,

600

2017, pp. 3–15.

[37] L. J. Van Vliet, Robust local max-min filters by normalized power-weighted

filtering, in: Proceedings of the 17th International Conference on Pattern

Recognition, 2004. ICPR 2004., Vol. 1, IEEE, 2004, pp. 696–699.

(32)

[38] J. Angulo, Pseudo-morphological image diffusion using the counter-

605

harmonic paradigm, in: International Conference on Advanced Concepts for Intelligent Vision Systems, Springer, 2010, pp. 426–437.

[39] J. Angulo, Morphological bilateral filtering and spatially-variant adaptive structuring functions, in: International Symposium on Mathematical Mor- phology and Its Applications to Signal and Image Processing, Springer,

610

2011, pp. 212–223.

[40] P. S. Bullen, Handbook of means and their inequalities, Vol. 560, Springer Science & Business Media, 2013.

[41] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large- scale hierarchical image database, in: 2009 IEEE conference on computer

615

vision and pattern recognition, Ieee, 2009, pp. 248–255.

[42] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recog- nition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.

[43] J. Serra, Image Analysis and Mathematical Morphology, Academic Press,

620

Inc., Orlando, FL, USA, 1983.

[44] S. R. Sternberg, Grayscale morphology, Computer vision, graphics, and image processing 35 (3) (1986) 333–355.

[45] P. Sermanet, Y. LeCun, Traffic sign recognition with multi-scale convolu- tional networks., in: IJCNN, 2011, pp. 2809–2813.

625

[46] M. D. McDonnell, T. Vladusich, Enhanced image classification with a fast- learning shallow convolutional neural network, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp. 1–7.

[47] B. Graham, Fractional max-pooling. in arxiv: cs, arXiv preprint arXiv:1412.6071.

630

(33)

[48] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.

[49] A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images, Tech. rep., Citeseer (2009).

635

[50] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, 2012, pp. 1097–1105.

[51] Y. Wang, J. Wang, X. Song, L. Han, An efficient adaptive fuzzy switch- ing weighted mean filter for salt-and-pepper noise removal, IEEE Signal

640

Processing Letters 23 (11) (2016) 1582–1586.

[52] M. Gonz´ alez-Hidalgo, S. Massanet, A. Mir, D. Ruiz-Aguilera, Improving salt and pepper noise removal using a fuzzy mathematical morphology- based filter, Applied Soft Computing 63 (2018) 167–180.

[53] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al., Image quality

645

assessment: from error visibility to structural similarity, IEEE transactions on image processing 13 (4) (2004) 600–612.

[54] B. Fu, X. Zhao, Y. Li, X. Wang, Y. Ren, A convolutional neural networks denoising approach for salt and pepper noise, Multimedia Tools and Ap- plications 1–15.

650

[55] P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and hier- archical image segmentation, IEEE transactions on pattern analysis and machine intelligence 33 (5) (2011) 898–916.

[56] P. F. Felzenszwalb, D. P. Huttenlocher, Efficient graph-based image seg- mentation, International journal of computer vision 59 (2) (2004) 167–181.

655

[57] W. Shen, X. Wang, Y. Wang, X. Bai, Z. Zhang, Deepcontour: A deep

convolutional feature learned by positive-sharing loss for contour detection,

(34)

in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3982–3991.

[58] R. Xiaofeng, L. Bo, Discriminatively trained sparse code gradients for con-

660

tour detection, in: Advances in neural information processing systems, 2012, pp. 584–592.

[59] J. Kivinen, C. Williams, N. Heess, Visual boundary prediction: A deep neural prediction network and quality dissection, in: Artificial Intelligence and Statistics, 2014, pp. 512–521.

665

[60] G. Bertasius, J. Shi, L. Torresani, Deepedge: A multi-scale bifurcated deep network for top-down contour detection, in: Proceedings of the IEEE Con- ference on Computer Vision and Pattern Recognition, 2015, pp. 4380–4389.

[61] S. Xie, Z. Tu, Holistically-nested edge detection, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1395–1403.

670

[62] S. Beucher, F. Meyer, The morphological approach to segmentation: the watershed transformation, Optical Engineering-New York-Marcel Dekker Incorporated- 34 (1992) 433–433.

[63] S. Mukhopadhyay, B. Chanda, Multiscale morphological segmentation of gray-scale images, IEEE transactions on Image Processing 12 (5) (2003)

675

533–549.

[64] G. Franchi, J. Angulo, Bagging stochastic watershed on natural color image segmentation, in: International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing, Springer, 2015, pp.

422–433.

680

[65] K. Simonyan, A. Zisserman, Very deep convolutional networks for large- scale image recognition, arXiv preprint arXiv:1409.1556.

[66] H. Yang, Y. Li, X. Yan, F. Cao, Contourgan: Image contour detection with generative adversarial network, Knowledge-Based Systems 164 (2019) 21–28.

685