HAL Id: hal-02922299
https://hal.archives-ouvertes.fr/hal-02922299
Submitted on 26 Aug 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires
Deep morphological networks
Gianni Franchi, Amin Fehri, Angela Yao
To cite this version:
Gianni Franchi, Amin Fehri, Angela Yao. Deep morphological networks. Pattern Recognition, Else-
vier, 2020, 102, pp.107246. �10.1016/j.patcog.2020.107246�. �hal-02922299�
Deep morphological networks
Gianni Franchi
a,∗, Amin Fehri
b, Angela Yao
ca
gianni.franchi@u-psud.fr,
SATIE, Universit Paris-Sud, Universit Paris-Saclay
b
amin.fehri@gmail.com
c
ayao@comp.nus.edu.sg,
School of Computing National University of Singapore
Abstract
Mathematical morphology provides powerful nonlinear operators for a variety of image processing tasks such as filtering, segmentation, and edge detection.
In this paper, we propose a way to use these nonlinear operators in an end-to- end deep learning framework and illustrate them on different applications. We demonstrate on various examples that new layers making use of the morpholog- ical non-linearities are complementary to convolution layers. These new layers can be used to integrate the non-linear operations and pooling into a joint oper- ation. We finally enhance results obtained in boundary detection using this new family of layers with just 0.01% of the parameters of competing state-of-the-art methods.
Keywords: Mathematical Morphology; deep learning; edges detection;
denoising.
1. Introduction
Modern deep learning approaches learn parameters of a series of linear and non-linear operators for a given task. The concatenation of these operators over multiple layers increases the depth thereby generalization power of neural net- works. Similar to previous works [1, 2], our paper tries to incorporate new types
5
∗
Corresponding author
of layers for deep learning: in our case, these layers are based on Mathematical Morphology.
Mathematical Morphology (MM) [3, 4] applies topological operators to im- ages to recover or filter out specific structures. It has led to important suc- cesses in many computer vision tasks, such as filtering [5, 6, 7][8], segmentation
10
[9, 10, 11, 12][13, 14], semantic segmentation [15, 16, 17, 18, 19, 20] [21, 22], im- age fusion [23, 24][25], feature extraction [26, 15, 16, 17, 27] and edge detection[9]
. With this paper, we are the first, to the best of our knowledge, to integrate exact morphological operators within a deep learning end-to-end framework.
Commonly known MM operators have flat structuring elements [3, 4] and
15
share the same weights: their effect then depends only on the shape of the structuring elements. To go further, one can use nonflat structuring elements, which has led to performance enhancement [28]. In all cases, selecting the structuring elements’ shapes and weights often involve tedious trial-and-error, particularly for non-experts.
20
Given the similarities between morphological and convolutional operators, the idea of directly learning these shapes and weights within a neural network naturally arises. However, directly extending such operators into neural net- works is non-trivial, since the max and min operations are non-differentiable.
One possibility is to replace them with differentiable approximations, such as
25
the counter-harmonic mean [29, 30] so that they can be easily integrated into a neural networks such as Deep Conovlutional Neural Networks (DCNN). Note that one of the major drawbacks of counter-harmonic mean operators is their exploding gradients, due to the power function on which they are based. This problem is not encountered when directly using max or min operators, as we
30
propose in this paper.
In this work, we propose a way to integrate the original morphological opera- tors such as erosion and dilation within a deep learning framework. To motivate the appeal of such operators, one may note that the max pooling is simply a morphological dilation with a square structuring element followed by a pooling.
35
Although pooling is essential to build spatial invariance in a DCNN, it may lead
to an important loss of information, as illustrated on Figure 1. On the first row of this figure, one can see a time signal, and on the second row its Fourier trans- form. As one can see, the max pooling loses too much information, hence the interest of learning the pooling instead, which can be done using our proposal.
40
On the contrary to [29, 31, 30] our work does not just approximate the max pooling. Moreover, we propose a new layer that can be used as a pooling layer as well as to replace the convolutional layer (as proven in section 1).
More broadly, inspired by recent works where the authors try to train ap- proximations of MM operators in a deep learning framework [32, 33, 30, 34], we
45
propose and study several pipelines incorporating exact MM operators in such a framework. Our main contributions are:
• A novel morphological deep learning framework with learned MM opera- tors: to the best of our knowledge, we are the first to attempt to learn the weights of exact MM operators end-to-end in a deep learning framework;
50
• a replacement of the standard max pooling in convolutional neural net- works with a learned morphological pooling that proves to be experimen- tally beneficial;
• a mixed morphological and convolutional neural network that performs edge detection with state-of-the-art results. In addition, this network is
55
trained from scratch, on the contrary to state-of-the-art methods that make use of pretrained weights;
• a fully morphological neural network for image denoising that presents better performance than similar fully convolutional neural network for this task.
60
Morphological operators that are key ingredients of image analysis can now
be used as components of deep learning frameworks. Notably, we can use them
on the tasks where they proved to be efficient, but this time learn their optimal
parameters for given types of images and problems.
signal x(t) flat maxpool x(t) non-flat maxpool x(t)
Figure 1: We have illustrated on the first row one signal and its result with the flat max pooling layer of size 3 × 1, which is the classical max pool, and with the non flat max pooling layer of size 3 × 1, where the weights have been changed so that we find the original signal.
The second row contains the Fourier signal link to the first row. We can see that putting non uniform weight allow one to don’t loose to much frequency while contracting the signal.
2. Related work
65
The use of morphological operators in neural networks is not common but has been explored both in the context of single unit perceptrons as well as multi-unit deep networks.
2.1. Morphological Operators in Perceptrons
Works [30, 35, 34] on the first category investigates the use of the perceptron
70
in the (max, +) and (min,+) algebra. The morphological perceptron layer proposed in [35] is a fully connected layer, followed by a max-plus layer [36]:
the max-plus blocks proposed in this work consists in adding to the output of a fully connected layer several bias vectors, and then taking the maximum of these vectors to provide new neurons, and so on.
75
Contrary to this morphological perceptron, which layers are fully connected
and thus capture global information (since all neurons are connected), our pro-
posal allows to extract local information using exact morphological features.
Being local allows our layers to be lighter than morphological perceptron. This is why it can be used for a variety of tasks and not only classification, as we
80
shall see further in the paper.
2.2. Morphological Operators in Deep Networks
Recently, morphological operations have been explored in the context of deep learning frameworks for image denoising and classification [29, 31, 30]. All three works are based on the approximation of morphological operations, since the
85
minimum and maximum were approximated with the counter-harmonic mean (CHM) [37, 38, 39]. The CHM is a power mean studied in [40, 39], where the signal is approximated as a weighted sum of the signal raised to the power p, with p a fixed integer. Then this sum is normalized by the weighted sum of the signal raised to the power p − 1. Masci et al. [29] were the first to propose integrating
90
CHM filters directly into CNNs and showed that the parameters of the CHM filter could be learned via standard back-propagation. We note also [31], which uses these CHM operators to classify digits.Finally, [30] proposes the CHM as an alternative to the standard max-pooling and places it into a CNN. In using a CHM filter, the authors claim that image details are better preserved
95
across pooling operations, and as such, retain more relevant information for classifying images. They achieve state-of-the-art results for image classification on ImageNet [41] with a Resnet101 [42] architecture.
While the CHM faithfully approximates the min and max operation, it does not fully encapsulate MM operations. As such, the approach of [30] applies only
100
as a replacement for the pooling layer. Our approach differs in that we do not need to approximate the min or max operation: we directly integrate the entire MM operation directly into the neural network, encapsulating both the standard convolution and max pooling into a single morphological operation. Our work is thus more general in this respect. Furthermore, our proposal does not suffer
105
from exploding gradients that can stem out the use of a power function as they
propose. The training step in our proposal is thus made simpler.
3. Morphological features
3.1. Notation
We begin by introducing the notation used in this paper. Let E be a subset
110
of the discrete space Z
2, which represents the support space of a 2D image, and F ⊆ R
dbe a set of pixels values in dimension d. Higher order tensors are denoted as (I , S , . . .), with the order of tensor I ∈ R
n1×n2×...×nJbeing J . Moreover, if I ∈ R
n1×n2×n3, for i ∈ [1, n
3], I
:,:,irepresents a matrix of size n
1× n
2where the third component is equal to i.
115
3.2. Operator Definitions
Mathematical morphology operators are non-linear image operators based on the spatial structure of the image. Let f be a grey scale image which can be represented as a function, with the intensity at position x be denoted as f (x).
The two basic operations in morphology are the grey-level erosion and dilation
120
[43, 4], defined on their discreet version respectively as :
ε
b(f )(x) = min
i∈[−n,n](f (x + i) − b
i), (1) δ
b(f )(x) = max
i∈[−n,n](f(x − i) + b
i), (2) where b = [b
−n, . . . , b
n] is a structuring element. The structuring element impacts a morphological operation by both the geometry of its support and its weights. In its simplest form, these weights have the same value over the support shape and are equal to zero elsewhere: this corresponds to the commonly known
125
flat structuring elements [43].
By concatenating these two basic morphological operators, it is possible to obtain more evolved operators such as opening and closing [43], defined respectively as:
γ
b(f ) = δ
b(ε
b(f )) , (3)
ϕ
b(f ) = ε
b(δ
b(f )) . (4)
The opening (resp. closing) γ of an image f by a structuring element b is
130
a filter which removes the bright (resp. dark) structures of the images which do not fit the support shape of the structuring element b. As such, all features smaller than this support shape are removed.
One can use as algebraic opening (resp. closing) any operator that verifies the following properties:
135
• Openings are anti-extensive: γ
b(f ) ≤ f
• Openings are idempotent: γ
b(γ
b(f )) = γ
b(f )
• Openings are increasing operators: if f ≤ g then γ
b(f ) ≤ γ
b(g)
Since closing operators are the dual of openings, it directly follows that the only property differing between the two is the first, i.e.while openings are anti-
140
extensive, closings are extensive, such that ϕ
b(f ) ≥ f . 3.3. Alternate Sequential Filters (ASF)
Image filters can in turn be created from opening and closing operations.
Notably, morphological Alternate Sequential Filters (ASF), composed from a series of increasing (resp. decreasing, cf. above for definition of an increasing
145
operator) filtering operators such as dilations (resp. erosions) with structuring functions having a larger and larger support shape [44], have proven to be useful for multiscale analysis and filtering of images [3, chapter 10]. They progressively simplify images, removing noise and small details and textures, while preserving textures. As such, the ASF are useful for denoising and can also be seen as
150
multi-scale representations of images.
More formally, an ASF is obtained from two families of operators: one family of openings {γ
bl} 1 ≤ l ≤ n
1, and one family of closings {ϕ
bl}, 1 ≤ l ≤ n
2. The index l is associated with the scale of the structuring function, i.e.b
l⊂ b
l+1. The ASF can be defined as:
ASF
λ= γ
bλϕ
bλ. . . γ
b1ϕ
b1, (5)
when starting with a closing, and:
ASF
λ= ϕ
bλγ
bλ. . . ϕ
b1γ
b1, (6) when starting with an opening, with λ corresponding to the number of opening and closing.
4. Morphological neural network 4.1. Morphological back-propagation
155
Let us consider the following objective function for an image f which has undergone erosion:
L(b) = X
x∈E
kε
b(f )(x) − gk, (7) with g a signal we want to reach, that can be an image. There are different techniques to find the optimal structuring function b = {b
−n, b
0, . . . , b
n} to minimize this objective function. Here we focus on gradient-based optimization techniques such as stochastic gradient descent and its variants. Hence we need to differentiate L(b) with respect to b.
160
However, the min operator is not directly differentiable. We tackle this problem in the same way that max pooling is handled in standard convolutional neural networks. Consider the case for erosion. One notes that for each position x in the image, we can find an index i
x∗belonging to {−n, −n + 1, . . . , n − 1, n}
with a minimum value, such that this index is the best one for the minimization
165
problem at hand. This results in:
˜
ε
b(f)(x) = f (x + i
x∗) − b
ix∗. (8) In other words, the resulting eroded version of the image or feature map is the value at a position x + i
x∗minus the structuring function at position i
x∗.For any position x there exists a i
x∗verifying this property.
Thus, based on this, we can derive the loss function according to b
ix∗:
170
∂ ε ˜
b(f )(x)
∂b
ix∗= −n
ix∗(9)
with n
ix∗equal one or zeros, it is equal to one if b
ix∗is the best index for the erosion task at position x
To be clearer let us consider as an example a 1D signal f composed of 10 val- ues {f (x
i)}
9i=0. Let us write h the results of the erosion with the following struc- turing element b = (b
−1, b
0, b
1). This structuring element is a simple line of size
175
3. Let us consider that the erosion is equal to h = (f(x
0) − b
0, f(x
0) − b
−1, f(x
3) − b
1, f (x
3) − b
0, f(x
4) − b
0, f(x
4) − b
−1,
f (x
5) − b
−1, f (x
6) − b
−1, f (x
9) − b
1, f (x
9) − b
0). Based on this formulation of h we have that
∂h
∂b
−1= (0, −1, 0, 0, 0, −1, −1, −1, 0, 0) (10)
∂h
∂b
0= (−1, 0, 0, −1, −1, 0, 0, 0, 0, −1) (11)
∂h
∂b
1= (0, 0, −1, 0, 0, 0, 0, 0, −1, 0) (12) Let us consider that the loss we want to optimize is equal to L = kh − gk =
180
P
9i=0
(h(x
i) − g(x
i))
2. Then the derivative of the loss according to the different elements of the structuring elements is equal to :
∂L
∂b
j=
9
X
i=0
(h(x
i) − g(x
i)) × ∂h
∂b
j(x
i) (13)
with j ∈ {−1, 0, 1}. In the case of the erosion the
∂b∂hj
(x
i) is equal to 0 or −1, in the case we deal with the dilation it is equal to 0 or 1, finally in the case of the convolution it is equal to f (x
i− j). From this equation we can see that the
185
erosion and dilation does not bring instability since their gradient are bounded, contrary to the one of the convolution. Hence these morphological layers are not hard to train.
One can see that the derivative of the eroded image according to a structuring element has a tendency to be equal to 0 or −1, and for a dilation to 0 or 1. We
190
can see the dual effect of the dilation and erosion also on the derivative. Hence
it might be interesting to combine them on a neural network to take advantage
of this dual effect.
(a) (b) (c) (d) (e) (f)
Figure 2: Different structuring elements of size 3 used for initialization of morphological layers in this work. For the boundary detection application we initiate the structuring element with the ones represented in (a),(b),(c),(e), and (f). For denoising, we use the structuring elements (e) and (f), except that for the structuring element (f) we increase it size to 5×5 instead of 3×3.
Based on the demonstration above, we are thus provided with a way to differentiate the morphological operators during the backward pass of a deep
195
learning framework. This allows for an end-to-end learning of these operators, which can be useful for various stages of a neural network, as we shall see in the next sections.
4.2. Morphological layers
In this section, we present the different types of morphological layers illus-
200
trated in this paper. Morphological layers used in a Deep learning framework are composed of at least one morphological operator. We start by explaining how to initialize and train such layers, and then describe the ones we experiment with in detail.
4.2.1. Initialization
205
Out of simplicity and because in most works, flat structuring elements are used, we also initialize our morphological layers with flat structures. Figure 2 shows the different initialization we chose, with values of either 0s or
m1, where m is the number of non-zero elements in the structuring element. This provides a compromise between traditional MM and deep learning, since the network starts
210
with a standard shape that is then adjusted during the optimization process.
The structuring element weights of the morphological layers can also be ran-
domly initiated. Experimentally, we observe that using a uniform distribution
provide better results than using a normal distribution.
These two types of structuring element initialization are useful in different
215
setups. Two main scenarios emerge in our experiments: (i) in the first one we have images as inputs and outputs of the network, for example in denoising or boundary detection; (ii) in the second one the network takes images as inputs and outputs vectors, as in classification problems. In the first case, constraining the structuring elements to have values initialized following a shape leads to
220
better refined output images. In the second case, since the goal is not to be visually meaningful but to extract descriptors as discriminative as possible for further use, the structuring element can be initialized randomly instead.
4.2.2. Training
Morphological networks have more extreme behaviors than the convolutional
225
ones. Hence we propose to clip the gradient of the morphological layers to smooth the update of the structuring element. This technique leads to impor- tant result improvements. Note however that gradient clipping is not needed for the denoising task, and is only used for the CIFAR classification and the boundary detection ones. To train the morphological layers, we incorporate the
230
morphological operators provided in Tensorflow in our networks, and then we treat them just like regular layers.
4.2.3. Morphological Pooling layers
In some sense, one can say that DCNNs already use morphological operators, since a max-pooling is nothing but a dilation with a flat square structuring
235
element followed by a pooling. One of the first experiments that come to mind is thus to replace the flat dilation with a learned morphological operator. The motivation behind this idea is that while pooling is a sub-sampling procedure that usually aggregates the most important information, it might sometimes lead to a loss of useful information. This is why we propose to learn the dilation and
240
erosion used in the pooling, to optimize its effect in the network. As illustrated
in Figure 3, we thus apply in parallel both the erosion and dilation to the signal
before pooling. These results are then concatenated and projected back to the original size using a 1 × 1 convolution. Both the structuring element weights as well as the projection weights are learned. To improve the results, we add a
245
batch normalization layer in the morphological pooling: it is applied after the erosion and the dilation and before their concatenation.
To sum up, morphological pooling layers are used in the DCNN, with the underlying idea that they might increase the generalization power of the DCNN and thus increase its performance. So the neural networks in this subsection
250
can be seen as a mix of DMNN and DCNN. This is why we refer to them as Deep Morphological and Convolutional Neural Network (DMCNN).
Figure 3: The morphological pooling operator. The erosion and dilation share the same struc- turing element and are applied in parallel to the input data. The results of these morphological operators are concatenated and projected back to the original size with a 1 × 1 convolution.
4.2.4. Morphological learned descriptor layer
In this subsection, we deal with a more complex idea, that consists in training a DCNN where all the spatial term comes from MM operators. Simply stated,
255
we can replace the convolutions from a CNN with morphological operators, which parameters are then fine-tuned for the task at hand. In this case, there might be 1 × 1 convolutions to increase the number of neurons. We refer to the resulting network as to a Deep Morphological Neural Network (DMNN).
A Deep Morphological Neural Network is simply composed of projection
260
layers and morphological layers that were traditionally used only to perform the
pooling. Hence there is no spatial information extracted through convolutional layers. In this paper, we prove that morphological layers are useful not only to perform pooling but also to extract information similarly to convolutional ones.
5. Experiments
265
5.1. Image classification with morphological pooling
In image classification, the goal is to learn a mapping from images to class labels. It is a task where we have at least two sets: the training set {(f
i, y
i)}
ni=11and the testing set {(f
i, y
i))}
ni=12, where f
iare images and y
iis a one-hot vector encoding of the class label of f
i. State-of-the-art methods for image classification
270
use DCNN [45, 46, 47]. In this section, we show how a conventional DCNN can be adapted to incorporate morphological operators.
We demonstrate the interest of DMCNN (a DCNN where we replace the max pooling with a morphological pooling) on two standard classification bench- marks: MNIST [48] and CIFAR10 [49]. MNIST is a large dataset of handwritten
275
digits that is commonly used for training and testing in the machine learning field. It is composed of 60,000 images for the training set and 10,000 images for the testing set. Each image of MNIST is a grey scale image of size 28 × 28.
The number of class of MNIST data is equal to 10. CIFAR10 is another image dataset commonly used in machine learning research to evaluate algorithms. It
280
is composed of 50,000 training RGB images and 10,000 RGB test images. The images are of size 32 × 32. The dataset has 10 classes (e.g. airplane or bird).
As an evaluation criterion, we use the accuracy, that measures the proportion of correct predictions. It is equal to the number of test images which classes have been correctly predicted divided by the total number of test images.
285
We compare the DCNN and DMCNN on the classification task for similar architectures, i.e. with the same number of layers and the same depth. For the MNIST classification task, the two compared networks are represented in Figure 4. These networks both have three convolutional layers, followed by a ReLU activation function. For the DCNN, the first two convolution layers
290
are followed by a max pooling, while on the DMCNN they are followed by morphological pooling without batch normalization. Hence, these two networks are comparable.
For CIFAR10, we use the Resnet32 [42] as the DCNN. We transform it into a DMCNN by reducing the convolutions with stride 2 to stride 1 and adding a
295
morphological pooling with batch normalization.
(A) DMCNN
(B) DCNN
Figure 4: (A) A representation of the Morphological DCNN used to classify ,(B) A represen- tation of the reference CNN used to classify MNIST.
The results obtained for these two classification tasks with both a DCNN and a DCMNN are presented in Table 1. The results are slightly improved by using a morphological pooling, which proves that learning the pooling using our proposal can be interesting for classification tasks.
300
To better understand the effect of this morphological pooling layer, we visu- alize in Figure 5 the structuring function learned during the MNIST classifica- tion. We can note that the learnt structuring element weights are quite different from classical DCNN weights. In fact, the weights of convolutional layers are typically similar to SIFT or gradients [50], while these morphological weights
305
seem to learn shape information. This shows that the morphological layer tries to learn a representation that is a bit flat.
In the next section, we will show how replacing the convolutional layers
with morphological layers can lead to improving results for certain types of
image denoising.
310
(A) First layer
(B) Second layer
Figure 5: Visualizations of (A) 25 of the 32 structuring functions learned on the first layer and (B) 50 of the 64 structuring functions learned on the second layer of the DMCNN used to classify MNIST.
DCNN DMCNN
Accuracy MNIST
99.4 99.7
Accuracy CIFAR
96.5 96.6
Table 1: Table compiling the results of the classification task on the dataset MNIST and CIFAR10.
5.2. Image Denoising
Morphological operators have been proven in the past to be useful for denois- ing images [6, 7]. Noise present in an image may differ depending on its gener- ating source or transmission channel. Noise that alter images can be divided in three main categories: additive (e.g. Gaussian noise), multiplicative (e.g. speckle
315
noise) and impulse (e.g. salt-and-pepper noise). Morphological operators are es- pecially efficient when it comes to impulse noise such as salt-and-pepper noise.
Several state-of-the-art methods for salt-and-pepper denoising [51, 52] consist of first detecting the noisy pixels in the image, and then applying a denoising filter at the noise locations. In the filtering stage, fuzzy morphological operators
320
such as alternate sequential filters have proven to be very effective [52]. In this regard, the ability to learn from data the parameters of these filters leading to the best denoising would be very useful.
In this section, we compare the standard DCNN with a DMNN with a similar number of parameters and show the efficacy of the end-to-end morphological
325
filter learning for denoising.
In the context of image denoising, let us consider two images of the same dimension M × N : f the original noise-free image, and g the restored image for which some filter has been applied. Three commonly-used quantitative evalua- tion measures for denoising are the mean squared error (MSE), peak signal-to-
330
noise ratio (PSNR) and the Structural SIMiliarity (SSIM) index.
MSE is computed using the following expression:
M SE = P
m,n
(f (m, n) − g(m, n))
2M × N (14)
PSNR is defined as:
P SN R = 10 log
10R
2M SE
, (15)
where R is the maximum fluctuation in the input image.
Finally, the SSIM [53] index was introduced to evaluate the structural sim- ilarity between two images, with the motivation that structural changes are what most affects human visual perception. SSIM compares the original and the distorted image considering three main features: luminance, contrast and the structural comparison and is defined as:
SSIM (f, g) = (2µ
fµ
g+ C
f)
(µ
2f+ µ
2g+ C
f) · (2σ
f g+ C
g)
(σ
2f+ σ
g2+ C
g) , (16)
where µ
fand σ
f(resp. µ
gand σ
g) are the mean and variance of the greyscale
values of the image f (resp. g), σ
f gis the covariance of f and g, and C
f=
(k
1· L)
2and C
g= (k
2· L)
2two variables to stabilize the division by a weak
335
denominator, with L the dynamic range of the pixel-values taken by f and g.
By default we have k
1= 0.01, k
2= 0.03, and in our case L = 255.
We first implement an experiment upon the MNIST dataset [48]. We com- pare the denoising efficiency of two similar DCNN and DMNN, with voluntarily simple architectures with few parameters, presented in Figure 6, and which
340
have similar number of parameters for a fair comparison. Note that the goal of this comparison is to provide a proof-of-concept of the denoising capability of morphological layers. Further work might include such layers in more complex architectures to aim for state-of-the-art results by replacing or being combined with convolutional ones [54].
345
(a)
(b)
Figure 6: Architectures considered for the denoising experiment. (a) DCNN: 4 3 × 3 convolu- tions - 77 parameters (b) DMNN - alternate sequential filter structure with 3 × 3 structuring functions - 82 parameters.
For these different networks, we thus compare the quality of the denoising
obtained when training them for different levels of salt-and-pepper noise and MSE as a loss. Since we provided a way to actualize weights of morphological operators within a gradient-based optimization technique, we can indeed train these two networks on the same task and compare the results. The training is
350
made using Adam, a batch size of 100 and a learning rate of 0.02.
Results are presented in Table 2 for the two networks, different denoising metrics and different levels of added salt-and-pepper noise. One can note that results obtained by the DMNN are better than those obtained by the DCNN.
This confirms the potential of such operators for filtering problems, and the
355
interest of being able to train them end-to-end for a given filtering task.
MSE (&) PSNR (%) SSIM (%)
Noise level DCNN DMNN DCNN DMNN DCNN DMNN
10% 0.015 0.009 18.71 20.89 0.9994 0.9995
20% 0.022 0.016 16.64 18.34 0.9988 0.9990
30% 0.030 0.022 15.38 16.89 0.9980 0.9985
40% 0.038 0.029 14.35 15.66 0.9972 0.9978
50% 0.047 0.040 13.44 14.26 0.9961 0.9966
60% 0.058 0.051 12.59 13.19 0.9945 0.9953
70% 0.069 0.063 11.86 12.29 0.9930 0.9934
80% 0.080 0.079 11.25 11.36 0.9907 0.9908
90% 0.089 0.088 10.83 10.85 0.9888 0.9894
Table 2: Table compiling salt-and-pepper denoising results on MNIST dataset [48] for both the DCNN and DMNN architectures, for the MSE, PSNR and SSIM metrics, and different levels of salt-and-pepper noise.
To confirm the interest of DMNN for denoising, we conduct the same de- noising experiment on natural images from the Berkeley Segmentation Dataset BSDS500 [55], for an experiment similar to the one conducted in [54]. The noise is applied randomly on each channel of the considered RGB images. We keep
360
the same architecture for both the DCNN and the DMNN. Results are presented
(A) First layer
(B) Second layer
Figure 7: We have represented in (A) the four structuring functions learned on the first layer and on (B) the four structuring functions learned on the second layer used to denoise MNIST.
We can see the DCNN learn shapes, that have some symmetry. These shapes are based on the structuring elements.
in Table 3, and show again slightly better results with the morphological ap- proach, thus confirming the interest of our approach. Visual results for a high level of noise are also provided in Figure 8.
5.3. Edge detection
365
Both the classification and denoising tasks established the interest of using morphological layers in deep learning architectures. To further demonstrate the strength of morphological operators in such networks, we propose a morpholog- ical encoder-decoder framework to extract image edges that takes advantage of the DMCNN architecture previously described.
370
Traditional algorithms [56, 57, 58, 59, 60, 61] that perform well on this task are obtained by fine-tuning deep neural networks previously trained on Imagenet [41].
Interestingly, our DMCNN model is trained from scratch, and yet achieves results competitive with the state-of-the-art. We train and evaluate our method
375
on the BSDS500 dataset [55], using the provided training set, test set and their
evaluation metrics.
image image with added salt-and-pepper noise (50%) denoised image
Figure 8: Illustration of the denoising performed by the DMNN for several images of the
BSDS500 dataset, for an added salt-and-pepper noise of 50%.
MSE (&) PSNR (%) SSIM (%)
Noise level DCNN DMNN DCNN DMNN DCNN DMNN
10% 0.007 0.006 21.58 22.09 0.9993 0.9995
20% 0.010 0.008 20.27 21.15 0.9991 0.9993
30% 0.011 0.010 19.73 20.21 0.9992 0.9991
40% 0.017 0.013 17.89 18.97 0.9985 0.9990
50% 0.021 0.017 17.19 18.13 0.9980 0.9982
60% 0.027 0.022 15.78 17.02 0.9972 0.9978
70% 0.039 0.035 14.24 15.05 0.9956 0.9961
80% 0.057 0.049 12.60 13.62 0.9930 0.9944
90% 0.080 0.077 11.62 11.67 0.9903 0.9905
Table 3: Table compiling salt-and-pepper denoising results on the BSDS500 dataset [55]
for both the DCNN and DMNN architectures, for the MSE, PSNR and SSIM metrics, and different levels of salt-and-pepper noise.
Figure 9: DMCNN used for the edge detection experiment. The blue box represents the fixes (i.e. not learned) operators.
We hereby provide some details on the network architecture that is used, which is illustrated in Figure 9. First, we project the image from the RGB to the Lab colour space. We then generate a mosaic image: first by using the
380
watershed transform [62] resulting in a simplified version of the image with clus- tered pixels, then by retaining as unique value on each cluster the average pixel value on this cluster. To segment the image and obtain clusters, we need a gradient for the watershed transform. It is normally impossible to compute a gradient for a color image, so we convert the image into the LAB color space.
385
Then on each channel we apply 45 erosions and dilations, which have structuring
elements at different scales and orientations. More precisely, we use a structur- ing element line of size 3, 7, 9, 11, 13. On each of these lines we apply one of the following rotation : O
◦, 22.5
◦, 45
◦, 67.5
◦, 90
◦, 112.5
◦, 135
◦, 157.5
◦. Hence we have 45 structuring elements. After that, we calculate all the gradients with all
390
the structuring elements, and for a given scale we have 8 orientation gradients.
In order to produce a gradient by scale, we take for each pixel the maximum value of the orientations gradient. Hence we have 5 gradients (one for each scale) that we average to obtain the final gradient. This multi-scale morpho- logical segmentation is quite common and used in various publications [63, 64],
395
the major difference comes in the way to merge the scales. We conclude this preprocessing by normalizing the values between 0 and 1.
After the preprocessing of the image, we apply mainly three kinds of lay- ers. The first ones are the spatial layers, which are composed of morphological layers and one convolution layer. We apply a 5 morphological pooling where
400
the weights are fixed to +1 for the dilation and −1 for the erosion. It means that we calculate 5 morphological gradients dilation − erosion. The structuring elements used for these 5 morphological operators are drawn in Figure 2(a), 2(b),2(c),2(e), and 2(f). This morphological pooling extract most of the spatial detail information. In addition, the stride of this morphological pooling is equal
405
to 2. We realize that increasing the stride to 4 enhances the results. Then, simi- larly to semantic segmentation algorithms, we used atroue dilation and erosion.
Atroue layers allow the network to increase the receptive field without adding pooling. Hence morphological pooling layers have a pooling stride equal to 2 and an atroue dilation rate (or stride) equal to 2 and are equivalent to a pooling
410
stride equal to 4. After this morphological layer, we apply a convolution layer with a weight of size 3 × 3. This last layer allows us to be sure to bring out most of the spatial information.
After having learned spatial descriptors, we apply two projection convolu- tional layers that do not bring spatial information. These layers increase the
415
depth of the spatial descriptors from 32 to 64. However, the spatial descriptor
has still a size 4 times smaller than the original one. Hence, we need to perform
a deconvolution and a projection. As a deconvolution layer, we apply a bilinear interpolation to return a feature space at the original size. Finally, we apply the last convolution layer with a weight of size 3 × 3. This last convolutional
420
layer is not followed by any Relu activation function, contrary to the other con- volutional layers. It brings spatial information to correct the interpolation and perform the deconvolution.
We evaluate the edge detection with the F1-score. The F1-score is the har- monic average of the precision and recall and ranges between 0 and 1, with
425
higher values being better. We evaluate the F1 score for each image of the test set of BSDS500 [55] at different thresholds for the edge prediction. We apply different thresholds since our results are edge probabilities with values between zero and one, as one can see in Figure 10. The closest to 1 the edge value is, the more probable this edge is correct. Edge detection accuracy is evaluated using
430
the Optimal Dataset Scale (ODS), where one chooses the optimal threshold for the entire dataset, the Optimal Image Scale (OIS) being the case where one chooses the optimal threshold per-image. The OIS is always a bit better than the ODS, since it considers the best scale for each image. The OIS and ODS would be equal in the optimistic situation where the optimal threshold would
435
be the same for the whole dataset. For more information about these classical measures for edge detection we refer to [55].
On Table 4 we compare our results for edge detection compared to other algorithms [56, 57, 58, 59, 60, 61]. We can see that thanks to our network we reach human performances on OIS. Regarding the second best network HED
440
[61], they use multiple scales from a VGG network [65], hence using up to 138 millions weights, while we have just learned 11175 weights. So, the most interesting aspect is to note the extreme simplicity of our network in comparison to most state-of-the-art networks. In addition, the VGG network that they used is potentially difficult to train, so they choose to finetune it from weights
445
already trained to classify images on ImageNet [41]. Their procedure of training
is thus longer than ours since, and involve external data. In addition, our
training procedure is much simpler since it does not require any pre-trained
network. Our network relies mostly on the morphological operators, that learn the perfect gradient, then based on this low resolution gradient we correct it
450
with convolution 1,2,3 and 4.
image GT output
Figure 10: We have illustrated the results of our edge detection on two images. Our results is a grey scale value between 0 and 1. We threshold at different value to get the F measure.
We can see that morphological layers seem to be perfectly adapted to learn edges from images. Not only they allow us to reduce the number of weights, and so make us save time, but they also allow us to have really good results.
We show such results on natural images, it could be interesting to try it on
455
microscopic images where the data have a Poisson noise. The morphological network that we propose did not have better results than GAN network [66].
But one can see the discriminative part of the GAN as a perceptual loss trying to make that the output of the generator looks like gradient images. We did not try GAN since they need more images to train, secondly the GAN is based on a
460
generator and a discriminator network. We compare just the generator networks
ODS OIS
Human 0.80 0.80
Canny 0.60 0.64
Felz-Hutt[56] 0.61 0.64 gPb-owt-ucm[55] 0.73 0.76
SCG[58] 0.74 0.76
DeepNets[59] 0.74 0.76 DeepEdge[60] 0.75 0.77 DeepContour[57] 0.76 0.77
HED[61] 0.78 0.80
DMCNN (ours) 0.79 0.80
Table 4: Table compiling on the results on the test set of BSDS500 [55].
and considered that if wanted one could add to our generator a discriminator.
6. Conclusion
In this work, we have introduced new innovative layers based on mathemat- ical morphology that can be used within a deep neural network. Based on these
465
layers, we proposed first to learn the pooling on DCNN for image classification.
Then we saw how such layers might be used to denoise images on a DCNN containing only morphological operators. Finally, we exposed their usefulness to learn object boundaries. The first experience teaches us that we can use this layer to replace a pooling layer. The second experiment shows us two major
470
points: first that these morphological layers extract information and that they can thus replace in some cases convolutional layers, secondly that these layers might be more suitable for salt-and-pepper denoising than the convolutional layers. Finally, the last experiment proves on the edge detection task that com- bining these morphological layers with convolutional ones can be the key to
475
future research to have lighter and better networks. Hence, each of these three
Figure 11: We have represented in (A) the four structuring functions learned on the first layer and on (B) the four structuring functions learned on the second layer used to denoise MNIST.
We can see the DCNN learn shapes, that have some symmetry. These shapes are the basis of the structuring elements.
experiments presented good results that proved the interest of these new layers.
Based on these good results, we are going to investigate in future works how to use such layers within a generative adversarial network. They might indeed bring out interesting information to generate new images. We could also
480
use the boundary detection morphological network to calculate superpixels and compare them with the state of the art. We can also try to see if this network can be used to enhance results in image processing tasks already well performed by not learned MM operators.
References
485
[1] W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang, C.-C. Loy, et al., Deepid-net: Deformable deep convolutional neu- ral networks for object detection, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 2403–2412.
[2] Y. Duan, J. Lu, Z. Wang, J. Feng, J. Zhou, Learning deep binary descrip-
490
tor with multi-quantization, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1183–1192.
[3] J. Serra, Image analysis and mathematical morphology, v. 2.
[4] P. Soille, Morphological image analysis: principles and applications, Springer Science & Business Media, 2013.
495
[5] S. Velasco-Forero, J. Angulo, P. Soille, Conditional toggle mappings: prin- ciples and applications, Journal of mathematical imaging and vision 48 (3) (2014) 544–565.
[6] Z. Yu-Qian, G. Wei-Hua, C. Zhen-Cheng, T. Jing-Tian, L. Ling-Yun, Med- ical images edge detection based on mathematical morphology, in: 2005
500
IEEE engineering in medicine and biology 27th annual conference, IEEE, 2006, pp. 6492–6495.
[7] E. Decenciere, Restauration automatique de films anciens, Ph.D. thesis, Ecole Nationale Sup´ ´ erieure des Mines de Paris (1997).
[8] M. Van Droogenbroeck, H. Talbot, Fast computation of morphological op-
505
erations with arbitrary structuring elements, Pattern recognition letters 17 (14) (1996) 1451–1460.
[9] F. Meyer, S. Beucher, Morphological segmentation, Journal of visual com- munication and image representation 1 (1) (1990) 21–46.
[10] P. Salembier, F. Marqu´ es, M. Pardas, J. R. Morros, I. Corset, S. Jeannin,
510
L. Bouchard, F. Meyer, B. Marcotegui, Segmentation-based video coding system allowing the manipulation of objects, IEEE Transactions on Circuits and Systems for Video Technology 7 (1) (1997) 60–74.
[11] A. Dufour, C. Ronse, J. Baruthio, O. Tankyevych, H. Talbot, N. Passat, Morphology-based cerebrovascular atlas, in: 2013 IEEE 10th International
515
Symposium on Biomedical Imaging, IEEE, 2013, pp. 1210–1214.
[12] X. Zhang, G. Thibault, E. Decenci` ere, G. Quellec, G. Cazuguel, A. Erginay, P. Massin, A. Chabouis, Spatial normalization of eye fundus images, in:
ISBI 2012: 9th IEEE International Symposium on Biomedical Imaging, IEEE, 2012.
520
[13] P. Dokl´ adal, I. Bloch, M. Couprie, D. Ruijters, R. Urtasun, L. Garnero, Topologically controlled segmentation of 3d magnetic resonance images of the head by using morphological operators, Pattern recognition 36 (10) (2003) 2463–2478.
[14] M. Baccar, L. Gee, R. C. Gonzalez, M. A. Abidi, Segmentation of range
525
images via data fusion and morphological watersheds, Pattern Recognition 29 (10) (1996) 1673–1687.
[15] G. Franchi, J. Angulo, Comparative study on morphological principal com- ponent analysis of hyperspectral images, in: 2014 6th Workshop on Hy- perspectral Image and Signal Processing: Evolution in Remote Sensing
530
(WHISPERS), IEEE, 2014, pp. 1–4.
[16] G. Cavallaro, Spectral-spatial classification of remote sensing optical data with morphological attribute profiles using parallel and scalable methods, Ph.D. thesis (2016).
[17] G. Cavallaro, N. Falco, M. Dalla Mura, J. A. Benediktsson, Automatic
535
attribute profiles, IEEE Transactions on Image Processing 26 (4) (2017) 1859–1872.
[18] M. Fauvel, J. Chanussot, J. A. Benediktsson, J. R. Sveinsson, Spectral and spatial classification of hyperspectral data using svms and morpholog- ical profiles, in: 2007 IEEE International Geoscience and Remote Sensing
540
Symposium, IEEE, 2007, pp. 4834–4837.
[19] J. A. Benediktsson, M. Pesaresi, K. Amason, Classification and feature ex- traction for remote sensing images from urban areas based on morphologi- cal transformations, IEEE Transactions on Geoscience and Remote Sensing 41 (9) (2003) 1940–1949.
545
[20] S. Velasco-Forero, J. Angulo, Classification of hyperspectral images by ten-
sor modeling and additive morphological decomposition, Pattern Recogni-
tion 46 (2) (2013) 566–577.
[21] Y. Tarabalka, J. Chanussot, J. A. Benediktsson, Segmentation and classi- fication of hyperspectral images using watershed transformation, Pattern
550
Recognition 43 (7) (2010) 2367–2379.
[22] A. Plaza, P. Martinez, R. Perez, J. Plaza, A new approach to mixed pixel classification of hyperspectral imagery based on extended morphological profiles, Pattern Recognition 37 (6) (2004) 1097–1116.
[23] S. Drouyer, S. Beucher, M. Bilodeau, M. Moreaud, L. Sorbier, Sparse stereo
555
disparity map densification using hierarchical image segmentation, in: In- ternational symposium on mathematical morphology and its applications to signal and image processing, Springer, 2017, pp. 172–184.
[24] G. Franchi, J. Angulo, M. Moreaud, L. Sorbier, Enhanced edx images by fusion of multimodal sem images using pansharpening techniques, Journal
560
of microscopy 269 (1) (2018) 94–112.
[25] I. Bloch, H. Maˆıtre, Fuzzy mathematical morphologies: a comparative study, Pattern recognition 28 (9) (1995) 1341–1387.
[26] G. Franchi, J. Angulo, Morphological principal component analysis for hyperspectral image analysis, ISPRS International Journal of Geo-
565
Information 5 (6) (2016) 83.
[27] A. Fehri, S. Velasco-Forero, F. Meyer, Characterizing images by the gromov-hausdorff distances between derived hierarchies, in: 2018 25th IEEE International Conference on Image Processing (ICIP), 2018, pp.
1213–1217. doi:10.1109/ICIP.2018.8451522.
570
[28] M. Wirth, M. Fraschini, J. Lyon, Contrast enhancement of microcal- cifications in mammograms using morphological enhancement and non- flat structuring elements, in: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems, IEEE, 2004, pp. 134–139.
[29] J. Masci, J. Angulo, J. Schmidhuber, A learning framework for morpholog-
575
ical operators using counter–harmonic mean, in: International Symposium
on Mathematical Morphology and Its Applications to Signal and Image Processing, Springer, 2013, pp. 329–340.
[30] F. Saeedan, N. Weber, M. Goesele, S. Roth, Detail-preserving pooling in deep networks, in: Proceedings of the IEEE Conference on Computer Vi-
580
sion and Pattern Recognition, 2018, pp. 9108–9116.
[31] D. Mellouli, T. M. Hamdani, M. B. Ayed, A. M. Alimi, Morph-cnn: A morphological convolutional neural network for image classification, in: In- ternational Conference on Neural Information Processing, Springer, 2017, pp. 110–117.
585
[32] S. Wolf, L. Schott, U. Kothe, F. Hamprecht, Learned watershed: End-to- end learning of seeded segmentation, in: Proceedings of the IEEE Interna- tional Conference on Computer Vision, 2017, pp. 2011–2019.
[33] S. Wolf, C. Pape, A. Bailoni, N. Rahaman, A. Kreshuk, U. Kothe, F. Ham- precht, The mutex watershed: Efficient, parameter-free image partitioning,
590
in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 546–562.
[34] R. Mondal, S. Santra, B. Chanda, Dense morphological network: An uni- versal function approximator, arXiv preprint arXiv:1901.00109.
[35] Y. Zhang, S. Blusseau, S. Velasco-Forero, I. Bloch, J. Angulo, Max-plus
595
operators applied to filter selection and model pruning in neural networks, arXiv preprint arXiv:1903.08072.
[36] V. Charisopoulos, P. Maragos, Morphological perceptrons: geometry and training algorithms, in: International Symposium on Mathematical Mor- phology and Its Applications to Signal and Image Processing, Springer,
600
2017, pp. 3–15.
[37] L. J. Van Vliet, Robust local max-min filters by normalized power-weighted
filtering, in: Proceedings of the 17th International Conference on Pattern
Recognition, 2004. ICPR 2004., Vol. 1, IEEE, 2004, pp. 696–699.
[38] J. Angulo, Pseudo-morphological image diffusion using the counter-
605
harmonic paradigm, in: International Conference on Advanced Concepts for Intelligent Vision Systems, Springer, 2010, pp. 426–437.
[39] J. Angulo, Morphological bilateral filtering and spatially-variant adaptive structuring functions, in: International Symposium on Mathematical Mor- phology and Its Applications to Signal and Image Processing, Springer,
610
2011, pp. 212–223.
[40] P. S. Bullen, Handbook of means and their inequalities, Vol. 560, Springer Science & Business Media, 2013.
[41] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, Imagenet: A large- scale hierarchical image database, in: 2009 IEEE conference on computer
615
vision and pattern recognition, Ieee, 2009, pp. 248–255.
[42] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recog- nition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[43] J. Serra, Image Analysis and Mathematical Morphology, Academic Press,
620
Inc., Orlando, FL, USA, 1983.
[44] S. R. Sternberg, Grayscale morphology, Computer vision, graphics, and image processing 35 (3) (1986) 333–355.
[45] P. Sermanet, Y. LeCun, Traffic sign recognition with multi-scale convolu- tional networks., in: IJCNN, 2011, pp. 2809–2813.
625
[46] M. D. McDonnell, T. Vladusich, Enhanced image classification with a fast- learning shallow convolutional neural network, in: 2015 International Joint Conference on Neural Networks (IJCNN), IEEE, 2015, pp. 1–7.
[47] B. Graham, Fractional max-pooling. in arxiv: cs, arXiv preprint arXiv:1412.6071.
630
[48] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al., Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[49] A. Krizhevsky, G. Hinton, Learning multiple layers of features from tiny images, Tech. rep., Citeseer (2009).
635
[50] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in neural information processing systems, 2012, pp. 1097–1105.
[51] Y. Wang, J. Wang, X. Song, L. Han, An efficient adaptive fuzzy switch- ing weighted mean filter for salt-and-pepper noise removal, IEEE Signal
640
Processing Letters 23 (11) (2016) 1582–1586.
[52] M. Gonz´ alez-Hidalgo, S. Massanet, A. Mir, D. Ruiz-Aguilera, Improving salt and pepper noise removal using a fuzzy mathematical morphology- based filter, Applied Soft Computing 63 (2018) 167–180.
[53] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al., Image quality
645
assessment: from error visibility to structural similarity, IEEE transactions on image processing 13 (4) (2004) 600–612.
[54] B. Fu, X. Zhao, Y. Li, X. Wang, Y. Ren, A convolutional neural networks denoising approach for salt and pepper noise, Multimedia Tools and Ap- plications 1–15.
650
[55] P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and hier- archical image segmentation, IEEE transactions on pattern analysis and machine intelligence 33 (5) (2011) 898–916.
[56] P. F. Felzenszwalb, D. P. Huttenlocher, Efficient graph-based image seg- mentation, International journal of computer vision 59 (2) (2004) 167–181.
655
[57] W. Shen, X. Wang, Y. Wang, X. Bai, Z. Zhang, Deepcontour: A deep
convolutional feature learned by positive-sharing loss for contour detection,
in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3982–3991.
[58] R. Xiaofeng, L. Bo, Discriminatively trained sparse code gradients for con-
660
tour detection, in: Advances in neural information processing systems, 2012, pp. 584–592.
[59] J. Kivinen, C. Williams, N. Heess, Visual boundary prediction: A deep neural prediction network and quality dissection, in: Artificial Intelligence and Statistics, 2014, pp. 512–521.
665
[60] G. Bertasius, J. Shi, L. Torresani, Deepedge: A multi-scale bifurcated deep network for top-down contour detection, in: Proceedings of the IEEE Con- ference on Computer Vision and Pattern Recognition, 2015, pp. 4380–4389.
[61] S. Xie, Z. Tu, Holistically-nested edge detection, in: Proceedings of the IEEE international conference on computer vision, 2015, pp. 1395–1403.
670
[62] S. Beucher, F. Meyer, The morphological approach to segmentation: the watershed transformation, Optical Engineering-New York-Marcel Dekker Incorporated- 34 (1992) 433–433.
[63] S. Mukhopadhyay, B. Chanda, Multiscale morphological segmentation of gray-scale images, IEEE transactions on Image Processing 12 (5) (2003)
675
533–549.
[64] G. Franchi, J. Angulo, Bagging stochastic watershed on natural color image segmentation, in: International Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing, Springer, 2015, pp.
422–433.
680
[65] K. Simonyan, A. Zisserman, Very deep convolutional networks for large- scale image recognition, arXiv preprint arXiv:1409.1556.
[66] H. Yang, Y. Li, X. Yan, F. Cao, Contourgan: Image contour detection with generative adversarial network, Knowledge-Based Systems 164 (2019) 21–28.
685