
HAL Id: hal-02088465

https://hal.archives-ouvertes.fr/hal-02088465

Preprint submitted on 2 Apr 2019


Stacked Sparse Non-Linear Blind Source Separation

Christophe Kervazo, Jérôme Bobin

To cite this version:

Christophe Kervazo, Jérôme Bobin. Stacked Sparse Non-Linear Blind Source Separation. 2019.

⟨hal-02088465⟩


Stacked Sparse Non-Linear Blind Source Separation

Christophe Kervazo
CEA Saclay
Gif-sur-Yvette, 91191 cedex, France
Email: christophe.kervazo@cea.fr

Jérôme Bobin
CEA Saclay
Gif-sur-Yvette, 91191 cedex, France
Email: jerome.bobin@cea.fr

Abstract—Linear Blind Source Separation (BSS) has achieved tremendous success in fields ranging from biomedical imaging to astrophysics. In this work, we however depart from the usual linear setting and tackle the case in which the sources are mixed by an unknown non-linear function. We propose a sequential decomposition of the data enabling its approximation by a linear-by-part function. Beyond separating the sources, the introduced StackedAMCA can further, in some settings, empirically learn an approximation of the inverse of the unknown non-linear mixing, enabling the reconstruction of the sources despite a severely ill-posed problem. The quality of the method is demonstrated experimentally, and a comparison is performed with state-of-the-art non-linear BSS algorithms.

I. INTRODUCTION

Blind Source Separation is a major tool to learn meaningful decompositions of multivalued data [1], [2]. Most of the work has been dedicated to linear BSS: m observations are linear combinations of n sources of t samples. In matrix form, X = AS + N, with X (size m × t) the observation matrix corrupted by some unknown noise N, S (n × t) the sources and A (m × n, here with n ≤ m) the mixing matrix. The goal of linear BSS is to recover A and S from X up to a permutation and scaling indeterminacy. While this problem is ill-posed, the sparsity prior [3] – assuming the sources to have many zero coefficients – has been shown to lead to high separation quality [4], [5]. Less work has however been done on non-linear BSS, where

X = f(S) + N,     (1)

with f an unknown non-linear function from R^(n×t) to R^(m×t). Here, we consider general functions f, mostly assuming that f is invertible and symmetrical around the origin, as well as regular enough (i.e. L-Lipschitz with L small, and not deviating from a linear mixing too fast as a function of the source amplitude). Despite more indeterminacies than in the linear case, [6] claimed the possibility to recover sparse sources up to a non-linear function h under some conditions. Our approach is fully different from manifold clustering ones [7], [6], and also differs from neural network ones [8], as it brings a geometrical interpretation (and uses the mixing regularity) and an automatic hyperparameter choice (potentially enabling an increased robustness and building on the linear BSS literature [4]).
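To make the two data models above concrete, the short Python sketch below builds a synthetic sparse source matrix, a linear mixture X = AS + N and a non-linear mixture X = f(S) + N. The tanh-based f, the sparsity level and all variable names are illustrative assumptions only, not the setting used in the experiments of this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, t = 2, 2, 2000                     # sources, observations, samples

# Sparse sources: most coefficients are exactly zero (sparsity prior [3]).
S = rng.standard_normal((n, t)) * (rng.random((n, t)) < 0.1)

# Linear BSS model: X = A S + N.
A = rng.standard_normal((m, n))
N = 1e-3 * rng.standard_normal((m, t))
X_linear = A @ S + N

# Non-linear model of Eq. (1): X = f(S) + N, with f invertible, symmetrical
# around the origin and close to linear at low amplitudes. This tanh-based f
# is only a stand-in chosen for illustration, not the mixing of [6].
def f(sources, mixing=A, alpha=0.5):
    return np.tanh(alpha * (mixing @ sources)) / alpha

X_nonlinear = f(S) + N
```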

II. PROPOSED METHOD: STACKEDAMCA

StackedAMCA is geometrically described in the case n = 2 (two sources S1 and S2); the generalization is straightforward. Due to the morphological diversity assumption [9], when plotting S1 as a function of S2, most of the source coefficients lie on the axes. Once mixed by f, these are transformed into n non-linear one-dimensional (1D) manifolds [6], [7], each corresponding to one source. To separate the sources, the goal is then to back-project each manifold onto one of the axes. We propose to do this through a linear-by-part approximation of the 1D manifolds, which is inverted.

More specifically, the lowest-amplitude data coefficients can be well approximated by a classical linear model because of the regularity assumption on f (cf. Fig. 1(b)). Finding such a low-amplitude approximation can be done using a sparse linear BSS algorithm, provided it is robust to the higher-amplitude non-linearities. This linear approximation is then improved by discriminating, through a non-linear selection step, the (highly non-linear) samples that are not well linearly separated, creating a new dataset R containing only them. A new linear model, corresponding to a new section of the whole linear-by-part approximation, is then fitted to the lowest-amplitude samples of R, and the whole process is iterated. More details concerning the two main steps are given below, and the algorithm is summarized afterwards.

Linear BSS: We use the AMCA algorithm [4], which discards the highly non-linear samples by treating them as partial correlations (samples with several sources simultaneously active). It thus finds at iteration l a good linear model Â^(l) of the lowest-amplitude samples of R^(l).

Non-linear selection step: The goal is to discriminate within X the samples that are badly explained by the current linear-by-part model, creating a new dataset R^(l+1). To limit error propagation in the algorithm, at each iteration we start back from X and unroll the manifolds using all the linear models computed in iterations 1...l (a naive approach working directly on the previous residual would only use each individual linear model one by one, and would not fully exploit the gain yielded by all the iterations up to the current one). Once the manifolds are unrolled, the badly separated samples are roughly the ones far from the axes, which makes selecting them easy.

StackedAMCA(X)

R^(1) = X

for l in 1...L:

• Linear step: estimate Â^(l) with AMCA (cf. [4] for the weights W and other definitions, and details concerning the estimation):

$\min_{\hat{A}^{(l)},\, S} \; \frac{1}{2}\,\mathrm{Tr}\!\left[\big(R^{(l)} - \hat{A}^{(l)} S\big)\, W\, \big(R^{(l)} - \hat{A}^{(l)} S\big)^{T}\right] \; + \; \sum_{j=1}^{n} \big\|\lambda_j\, S_j\big\|_1 \; + \; \iota_{\{\|\hat{A}^{j}\|_2 = 1\}}\big(\hat{A}^{(l)}\big)$

• Non-linear selection step: compute R^(l+1)

– Unroll the manifolds in X using Â^(1)...Â^(l); the result is denoted Xu

– From Xu, select the highly non-linear samples and compute R^(l+1) through soft-thresholding: R^(l+1) = S(Xu), where S here denotes the soft-thresholding operator

return Â^(1)...Â^(L)
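To make the loop structure concrete, here is a minimal Python sketch of the iteration above. It assumes an external `amca` callable implementing the sparse linear BSS step of [4] (not reproduced here), and it replaces the paper's manifold unrolling by a simple chain of pseudo-inversions; both are placeholders meant only to show how the linear step, the unrolling from X and the soft-thresholding selection are chained, not a faithful implementation.

```python
import numpy as np

def soft_threshold(X, thr):
    """Entry-wise soft-thresholding: shrinks low-amplitude (well separated)
    coefficients to zero and keeps the highly non-linear ones."""
    return np.sign(X) * np.maximum(np.abs(X) - thr, 0.0)

def stacked_amca(X, n_layers, amca, thr):
    """Structural sketch of StackedAMCA (assumptions flagged in the lead-in).

    amca : callable R -> A_hat, the sparse linear BSS step of [4].
    thr  : size of the low-amplitude region (red square of Fig. 1).
    """
    A_hats = []
    R = X.copy()                              # R^(1) = X
    for _ in range(n_layers):
        # Linear step: fit a linear model to the low-amplitude samples of R^(l).
        A_hat = amca(R)
        A_hats.append(A_hat)

        # Non-linear selection step: restart from X and unroll the manifolds
        # with all models estimated so far (simplified here as successive
        # pseudo-inversions; the paper's actual unrolling is amplitude-
        # dependent and more involved).
        Xu = X.copy()
        for A_k in A_hats:
            Xu = np.linalg.pinv(A_k) @ Xu

        # Keep only the samples that remain far from the axes: R^(l+1) = S(Xu).
        R = soft_threshold(Xu, thr)
    return A_hats
```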

III. EXPERIMENTAL RESULTS AND CONCLUSION

Table I and Fig. 3 show that the separation obtained with StackedAMCA is better than with other state-of-the-art algorithms on the data of Fig. 2. Interestingly, the algorithm structure enables the reconstruction of the sources, bypassing the indeterminacy h (cf. Table II). We are able to give sufficient conditions for the source reconstruction, which will be discussed at the conference along with more details, the required conditions on the mixing, an interpretation in terms of a stacked network architecture and the results of further experiments (with n > 2).



Fig. 1. Left (panel (a), axes X1 vs. X2): a non-linear mixing of sparse sources. The dashed arrows correspond to the mixing directions found by a linear model. The colors, corresponding to each source, are displayed to explain the distortion introduced by the mixing f of the source samples, but are unknown in a blind setting. Right (panel (b), axes Xu1 vs. Xu2): in blue, the output of the manifold unfolding at the first iteration (at the first iteration, it coincides with inverting the linear model of Fig. 1(a); the process for the following iterations is more complicated, but the principle is the same). In addition, the red square delimits the low-amplitude sample areas where the linear model is a good approximation, i.e. the areas where the samples of the unfolded manifolds almost lie on the axes. In brief, computing the residual is done by removing in Xu the contribution within the red square by soft-thresholding.


Fig. 2. Dataset X corresponding to the non-linearity f found in [6].


Fig. 3. Comparison of StackedAMCA results with other state-of-the-art methods on the mixing of Fig. 2. The scatter plot of one estimated source as a function of the corresponding true source is displayed for 4 methods: StackedAMCA, MISEP [10], NFA [11] and ANICA [8]. In brief, the scatter plot of well-separated sources should resemble a 1D manifold (corresponding to h), which is the case for StackedAMCA and MISEP. For well-reconstructed sources, it should resemble a scaled version of the identity, which is only the case for StackedAMCA. Upper left: StackedAMCA, upper right: MISEP, lower left: NFA, lower right: ANICA.

Method         Cmed    Csq     Cang
StackedAMCA    49.6    49.3    38.9
MISEP          26.7    44.8    18.3
NFA            16.7    30.9     4.09
ANICA          19.9    34.4     1.56

TABLE I
Separation quality of StackedAMCA, MISEP, NFA and ANICA. Roughly speaking, Cmed corresponds to the median distance of the estimated sources to the h indeterminacy function, and Csq to a Euclidean distance. Alternatively, Cang measures the average angle of the samples with the axes. For the sake of clarity, the displayed results are -10 log(.) of the corresponding metric, and thus the higher the value, the better the separation.

Method         MSE     ME
StackedAMCA    46.0    32.6
MISEP          34.3    20.7
NFA            30.1    14.1
ANICA          19.4    −0.206

TABLE II

Reconstruction quality of the 4 methods. A mean-squared error (MSE) and a median error (ME) are used (and -10 log(.) of these values is displayed). The other algorithms do not reconstruct the sources as well as StackedAMCA does. On the contrary, the good results of StackedAMCA indicate that the algorithm structure was sufficient to regularize the reconstruction problem well.
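For reference, the -10 log(.) convention used in both tables can be reproduced with a one-line helper; the example error value below is only a back-of-the-envelope illustration of the scale, not a number taken from the experiments.

```python
import numpy as np

def to_display_score(metric_value):
    """Displayed score = -10 log10(raw metric): the lower the raw error,
    the higher the reported value, as in Tables I and II."""
    return -10.0 * np.log10(metric_value)

# A raw mean-squared error of 2.5e-5 would be displayed as roughly 46.0.
print(round(to_display_score(2.5e-5), 1))
```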

REFERENCES

[1] C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis," Neural Computation, vol. 21, no. 3, pp. 793–830, 2009.

[2] J. Bobin, F. Sureau, J.-L. Starck, A. Rassat, and P. Paykari, "Joint PLANCK and WMAP CMB map reconstruction," Astronomy & Astrophysics, vol. 563, p. A105, 2014.

[3] M. Zibulevsky and B. A. Pearlmutter, "Blind source separation by sparse decomposition in a signal dictionary," Neural Computation, vol. 13, no. 4, pp. 863–882, 2001.

[4] J. Bobin, J. Rapin, A. Larue, and J.-L. Starck, "Sparsity and adaptivity for the blind separation of partially correlated sources," IEEE Transactions on Signal Processing, vol. 63, no. 5, pp. 1199–1213, 2015.

[5] C. Kervazo, J. Bobin, and C. Chenot, "Blind separation of a large number of sparse sources," Signal Processing, vol. 150, pp. 157–165, 2018.

[6] B. Ehsandoust, B. Rivet, C. Jutten, and M. Babaie-Zadeh, "Nonlinear blind source separation for sparse sources," in 24th European Signal Processing Conference (EUSIPCO). IEEE, 2016, pp. 1583–1587.

[7] M. Puigt, A. Griffin, and A. Mouchtaris, "Nonlinear blind mixture identification using local source sparsity and functional data clustering," in 7th IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM). IEEE, 2012, pp. 481–484.

[8] P. Brakel and Y. Bengio, "Learning independent features with adversarial nets for non-linear ICA," arXiv preprint arXiv:1710.05050, 2017.

[9] J. Bobin, J.-L. Starck, J. M. Fadili, and Y. Moudden, "Sparsity and morphological diversity in blind source separation," IEEE Transactions on Image Processing, vol. 16, no. 11, pp. 2662–2674, 2007.

[10] L. B. Almeida, "MISEP – linear and nonlinear ICA based on mutual information," Journal of Machine Learning Research, vol. 4, pp. 1297–1318, 2003.

[11] A. Honkela, H. Valpola, A. Ilin, and J. Karhunen, "Blind separation of nonlinear mixtures by variational Bayesian learning," Digital Signal Processing, vol. 17, no. 5, pp. 914–934, 2007.
