I. INTRODUCTION
Blind Source Separation (BSS) is a major tool to learn meaningful decompositions of multivalued data. Most of the work has been dedicated to linear BSS: m observations are linear combinations of n sources of t samples. In matrix form, X = AS + N, with X (size m × t) the observation matrix corrupted with some unknown noise N, S (n × t) the sources and A (m × n; here, n ≤ m) the mixing matrix. The goal of linear BSS is to recover A and S from X up to a permutation and scaling indeterminacy. While this is ill-posed, the sparsity prior – assuming the sources to have many zero coefficients – has been shown to lead to high separation quality. Less work has however been done on non-linear BSS, where:
X = f(S) + N    (1)
with f an unknown non-linear function from $\mathbb{R}^{n \times t}$ to $\mathbb{R}^{m \times t}$. Here, we will consider general functions f, mostly assuming that f is invertible and symmetrical around the origin, as well as regular enough (i.e. L-Lipschitz with L small, and not deviating from a linear mixing too fast as a function of the source amplitude). Despite more indeterminacies than in the linear case, it has been claimed that sparse sources can be recovered up to a non-linear function h under some conditions. Our approach is fully different from manifold clustering ones, and also differs from neural network ones as it brings a geometrical interpretation (and uses the mixing regularity) and an automatic hyperparameter choice (potentially enabling an increased robustness and building on the linear BSS literature).
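The permutation and scaling indeterminacy of the linear model X = AS + N can be checked numerically. The sketch below is only an illustration under assumed settings (noiseless mixing, Gaussian mixing coefficients, a 0.3 activation probability for the sources); the helper `matmul` and all variable names are hypothetical and not taken from the paper.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

random.seed(0)
n, m, t = 2, 3, 8  # sources, observations, samples (n <= m)

# Sparse sources: each coefficient is non-zero with probability 0.3 (assumed).
S = [[random.gauss(0.0, 1.0) if random.random() < 0.3 else 0.0
      for _ in range(t)] for _ in range(n)]
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
X = matmul(A, S)  # noiseless observations, i.e. N = 0

# Swap the two sources and rescale them; compensate in the mixing matrix.
perm = [1, 0]
scales = [2.0, -0.5]
A_alt = [[A[i][perm[j]] * scales[j] for j in range(n)] for i in range(m)]
S_alt = [[S[perm[j]][k] / scales[j] for k in range(t)] for j in range(n)]
X_alt = matmul(A_alt, S_alt)

# Both factorizations explain the same observations.
max_gap = max(abs(X[i][k] - X_alt[i][k]) for i in range(m) for k in range(t))
print(max_gap)
```

Since (A, S) and (A_alt, S_alt) produce the same X, no prior on the sources can do better than recovering them up to such a reordering and rescaling.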
Sparse Blind Source Separation (sBSS) plays a key role in scientific domains as different as biomedical imaging, remote sensing or astrophysics, which require the development of increasingly fast and scalable BSS methods without sacrificing the separation performance. To that end, a new distributed sparse BSS algorithm is introduced, based on a mini-batch extension of the Generalized Morphological Component Analysis algorithm (GMCA). Precisely, it combines a robust projected alternating least-squares method with mini-batch optimization. The originality further lies in the use of a manifold-based aggregation of asynchronously estimated mixing matrices. Numerical experiments are carried out on realistic spectroscopic spectra, and highlight the ability of the proposed distributed GMCA (dGMCA) to provide very good separation results even when very small mini-batches are used. Quite unexpectedly, it can further outperform the (non-distributed) state-of-the-art methods for highly sparse sources.
I. INTRODUCTION
A. Blind Source Separation problem
In the BSS framework, the data are composed of m observations, each of which has t samples. These observations are supposed to be linear combinations of n sources. The objective of BSS is to retrieve the sources as well as the mixing coefficients. In matrix form, the goal is therefore to find two matrices S (of size n × t) and A (of size m × n), called respectively the source and the mixing matrices, such that X = AS + N, where X (of size m × t) is the observation matrix, corrupted with some unknown noise N. Since this requires tackling an ill-posed unsupervised matrix factorization problem, further assumptions are needed, such as the statistical independence of the sources (ICA) or the non-negativity of A and S. In this work, we will focus on the sparsity of the sources. In this framework, sparse BSS will aim at finding a (local) minimum of:
Abstract: We consider the problem of blind source separation for underdetermined convolutive mixtures. Based on the multiplicative narrowband approximation in the time-frequency domain, obtained with the short-time Fourier transform (STFT), and on the sparse representation of the source signals, we formulate the separation problem as an optimization problem. This framework is then generalized based on the recently investigated convolutive narrowband approximation and the statistics of the room impulse response. Algorithms with convergence proofs are then employed to solve the proposed optimization problems. The proposed frameworks and algorithms are evaluated on synthesized and live-recorded mixtures, and are also tested on mixtures with input noise. Numerical evaluations show the advantages of the proposed methods.
A natural question is then the identifiability issue: using the previous priors on the sources, can we really hope to recover them? Starting with the ICA methods, such questions have been well studied in the past, and it has for instance been shown that the independence of the sources, in the absence of noise and of Gaussian sources, makes it possible to recover them up to a mere scaling and permutation indeterminacy (Darmois theorem [Darmois 1953]). Concerning the sparsity prior, conditions for recovery up to the same indeterminacies are studied in [Gribonval & Schnass 2010, Gribonval et al. 2015]. In the latter work, the authors furthermore studied the non-asymptotic noisy setting in the presence of outliers, with potentially over-complete dictionaries (i.e. mixing matrices in our setting). In brief, they show that A* can be recovered using the cost function II.8, which is shown to have a local minimum around A* with high probability. Among others, some of the required hypotheses are that S* must be sparse enough (and follow other assumptions), A* must be sufficiently incoherent (depending on the sparsity level of S*), the noise level as well as the outlier energy must be limited, and the number of non-outlier samples must be large enough (there are also assumptions on the regularization parameters). Note that these conditions might however be slightly restrictive in realistic experiments, and thus might not always be satisfied in this work.
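To make the incoherence requirement on the mixing matrix more concrete, the sketch below computes the mutual coherence, a standard measure in sparse recovery: the largest absolute normalized inner product between two distinct columns. Whether the cited works rely on exactly this quantity is an assumption; the function name and the two example matrices are hypothetical.

```python
import math

def mutual_coherence(A):
    """Largest |<a_j, a_k>| / (||a_j|| ||a_k||) over distinct columns j != k."""
    n_rows, n_cols = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n_rows)] for j in range(n_cols)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    best = 0.0
    for j in range(n_cols):
        for k in range(j + 1, n_cols):
            inner = sum(a * b for a, b in zip(cols[j], cols[k]))
            best = max(best, abs(inner) / (norms[j] * norms[k]))
    return best

# Orthogonal columns: perfectly incoherent mixing matrix.
A_orthogonal = [[1.0, 0.0],
                [0.0, 1.0]]

# Over-complete case (3 columns in dimension 2): the columns must overlap.
A_overcomplete = [[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]]

print(mutual_coherence(A_orthogonal))
print(mutual_coherence(A_overcomplete))
```

A value close to 0 means that the columns of the mixing matrix are nearly orthogonal, while a value close to 1 means that two sources are mixed almost identically and are hard to tell apart.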
2007), it is very rare that two sparse sources both have non-zero values at the same time. Therefore, in the scatter plot of $S_1$ as a function of $S_2$ (cf. Fig. 1(a)), most of the source coefficients lie on the axes (in this work we even assume that all coefficients lie on the axes; this hypothesis is discussed in Sec. 4). Once mixed with the non-linear f, the source coefficients lying on the axes are transformed into n non-linear one-dimensional (1D) manifolds (Ehsandoust et al., 2016; Puigt et al., 2012), each manifold corresponding to one source (see Fig. 1(b)). To separate the sources, the idea is then to back-project each manifold onto one of the axes. We propose to perform this back-projection by approximating the 1D manifolds by a piecewise-linear function, which we then invert. As mentioned above, we then get separated sources which are only distorted through non-linear functions that do not remix them, called h in the following.
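The back-projection idea can be illustrated on a single manifold. The sketch below is not the authors' algorithm: it assumes one source, a hypothetical mixing function `mix`, and knots of the piecewise-linear approximation that are already known and ordered along the manifold. Each observation is projected onto the nearest segment and mapped to its arc-length coordinate, which plays the role of the distorted source h(s).

```python
import math

def mix(s):
    """Hypothetical invertible, origin-symmetric mixing of ONE source into 2 observations."""
    return (math.tanh(s), 0.5 * s + 0.1 * s ** 3)

def polyline_arclength(knots):
    """Cumulative arc length of the piecewise-linear curve at each knot."""
    lengths = [0.0]
    for p, q in zip(knots, knots[1:]):
        lengths.append(lengths[-1] + math.dist(p, q))
    return lengths

def back_project(x, knots, lengths):
    """Project x onto the nearest segment and return its arc-length coordinate."""
    best = None
    for i, (p, q) in enumerate(zip(knots, knots[1:])):
        seg = (q[0] - p[0], q[1] - p[1])
        seg_len2 = seg[0] ** 2 + seg[1] ** 2
        # Position of the orthogonal projection along the segment, clamped to [0, 1].
        u = ((x[0] - p[0]) * seg[0] + (x[1] - p[1]) * seg[1]) / seg_len2
        u = min(1.0, max(0.0, u))
        proj = (p[0] + u * seg[0], p[1] + u * seg[1])
        d = math.dist(x, proj)
        if best is None or d < best[0]:
            best = (d, lengths[i] + u * (lengths[i + 1] - lengths[i]))
    return best[1]

# Knots sampled along the manifold for s in [-2, 2] (assumed known and ordered).
knots = [mix(-2.0 + 0.25 * k) for k in range(17)]
lengths = polyline_arclength(knots)

test_sources = [-1.7, -0.6, 0.0, 0.4, 1.3]
estimates = [back_project(mix(s), knots, lengths) for s in test_sources]
print(estimates)
```

The recovered coordinates are not equal to the original source values, but they preserve their ordering: the source is recovered up to a monotonic non-linear distortion, which is the kind of indeterminacy denoted h in the text.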
V. CONCLUSION
This paper introduced a multichannel audio source separation method based on exact convolutive mixture modeling and Student's t source modeling. Within this framework we compared a sparse and an NMF-based source model. The semi-blind experimental evaluation demonstrated the importance of modeling the spectro-temporal characteristics of the sources instead of only assuming sparsity. Future work will focus on developing a fully blind source separation method that exploits priors on the impulse response of the mixing filters. We could for example consider priors similar to those used in earlier work for promoting sparsity and an exponentially decaying envelope.
Index Terms: Blind Source Separation; Sparsity; Independent Component Analysis; Optimization
The instantaneous linear mixture model of BSS assumes that:
x = As + e,    (1)
where $x \in \mathbb{R}^{M \times T}$ and $s \in \mathbb{R}^{N \times T}$ are the matrices of mixture channels and source signals respectively, $A \in \mathbb{R}^{M \times N}$ is the mixing matrix and $e \in \mathbb{R}^{M \times T}$ models the background noise. ICA methods are often applied when M ≥ N (over-determined case). These methods try to achieve separation by minimizing an independence criterion between the components of the estimated sources. In the under-determined case (M < N), two-step methods based on sparsity are widely used: the mixing system is first estimated using clustering methods, then the sources are estimated using optimization approaches.
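The first step of such two-step methods, estimating the mixing system by clustering, can be sketched in a strongly idealized setting. The example below assumes M = 2 channels, N = 3 sources, no noise, and exactly one active source per sample, so that every mixture sample is aligned with one column of A. The clustering rule (splitting sorted angles at gaps larger than 5 degrees) is a hypothetical stand-in for the clustering methods mentioned in the text.

```python
import math
import random

random.seed(1)

# Ground-truth mixing directions (unit-norm columns of a 2 x 3 matrix A).
true_angles = [math.radians(a) for a in (20.0, 70.0, 130.0)]
A = [[math.cos(a) for a in true_angles],
     [math.sin(a) for a in true_angles]]

# Disjoint-support sources: exactly one source is active at each sample (assumed).
T = 200
samples = []
for _ in range(T):
    j = random.randrange(3)
    amp = random.uniform(0.5, 2.0) * random.choice((-1.0, 1.0))
    samples.append((A[0][j] * amp, A[1][j] * amp))

# Direction of each sample, folded modulo pi to remove the sign of the amplitude.
angles = sorted(math.atan2(x2, x1) % math.pi for x1, x2 in samples)

# Naive 1-D clustering: start a new cluster when the gap exceeds 5 degrees.
clusters = [[angles[0]]]
for a in angles[1:]:
    if a - clusters[-1][-1] > math.radians(5.0):
        clusters.append([a])
    else:
        clusters[-1].append(a)

estimated_deg = [math.degrees(sum(c) / len(c)) for c in clusters]
print(estimated_deg)
```

With noise or overlapping sources the angles spread out, and a more robust clustering step would be needed; the second step (estimating the sources given the mixing matrix) is not covered by this sketch.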
Fig. 1. Source separation performance over stereo instantaneous musical mixtures. STFT window length is 2048 (93 ms) and K = 8.
In this paper, we proposed a new framework for the blind audio source separation problem in the multichannel instantaneous mixture case. In this framework, Spectral-GMM models of sources were blindly learned, i.e. without using any information other than the mixture and the mixing matrix, with an EM algorithm having a linear $O(NK)$ complexity, in contrast to some related state-of-the-art methods having an exponential $O(K^N)$ complexity. As opposed to the
2.1. Rough estimation of the mixing matrix and source parameters
2.1.1. Tensor construction and CP model
As in earlier work, we exploit the event-related nature of the signals of interest to construct, for each of Q event-related sources, a data tensor with dimensions space, event-synchronized window, and time from the two-dimensional measurements. To this end, for the q-th source, we identify $L_q$ event-synchronized windows of length $T_q$ in the corresponding time signal. This can, for instance, be achieved based on a characteristic pattern within each event-synchronized window that can be recognized in the measurements. This pattern, such as the maximum amplitude in the case of impulsive signals, also serves as a reference point to synchronize the different windows of the event-related source. For each of the $L_q$ event-synchronized windows, one can thus extract an $M \times T_q$ data matrix from the measurements. These matrices are then stacked along the second dimension of the tensor $Y^{(q)} \in \mathbb{R}^{M \times L_q \times T_q}$ (see Figure 1).
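The stacking step can be sketched as follows for a single source q. This is a minimal illustration, assuming that the window onsets have already been detected from the reference pattern and that every window fits inside the recording; the function name `build_event_tensor` and the toy data are hypothetical.

```python
def build_event_tensor(measurements, onsets, window_len):
    """Stack event-synchronized windows into a tensor indexed [channel][window][time].

    measurements: list of M channels, each a list of T samples.
    onsets: start index of each of the L_q windows (assumed already detected).
    window_len: window length T_q in samples.
    """
    n_samples = len(measurements[0])
    for onset in onsets:
        if onset < 0 or onset + window_len > n_samples:
            raise ValueError("window starting at %d does not fit in the data" % onset)
    return [[channel[onset:onset + window_len] for onset in onsets]
            for channel in measurements]

# Toy measurements: M = 3 channels, T = 20 samples, value = 100 * channel + time index.
M, T = 3, 20
measurements = [[100 * ch + k for k in range(T)] for ch in range(M)]

onsets = [2, 9, 14]  # L_q = 3 windows
window_len = 5       # T_q = 5 samples
Y_q = build_event_tensor(measurements, onsets, window_len)

print(len(Y_q), len(Y_q[0]), len(Y_q[0][0]))
```

Fixing the window index in `Y_q` gives back one $M \times T_q$ data matrix, so the windows are indeed stacked along the second dimension of the $M \times L_q \times T_q$ tensor.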
III. METHODS
A. Independent component analysis
The concept of ICA was introduced by Herault and Jutten, especially in order to solve the BSS problem. In the mid 90s, Comon presented a mathematical formulation of ICA. During the past 25 years a wealth of algorithms have been proposed, and ICA-based methods have now been extensively and successfully applied to solve many practical real-life problems. More precisely, assuming the linear observation model of equation (1), ICA consists in looking for an overdetermined (N × P) mixing matrix A (i.e. P is smaller than or equal to N) and a P-dimensional source vector s[k] whose components are as statistically independent as possible. In other words, the goal of ICA is to find a (P × N), full-rank separator matrix W such that the output signal y[k] = W x[k] is an estimate of the source vector s[k]. Our previous works showed that, compared
di . (4)
If the slope values were exact (and the mixture were perfectly modeled by (2), which is not the case), it can be shown that x = As, a linear mixture of the two sources that can then be separated by classical linear BSS algorithms. This results in the classical PNL approach, in which the difficult part is to estimate the parameters that correctly cancel the nonlinearity, in this case the slopes d. However, in our previous work we simply used the theoretical slope value; together with the fact that (2) only approximately models the mixture, this means we can only claim that the modified set x is an approximately linear mixture of the sources. The idea, then, was to represent x as a second-degree polynomial, in which the quadratic terms can correct small nonlinearities in the modified mixture. The mixture was then represented as
listeners would entertain the active listening experience if they could access it from the usual stereo format. This is the goal of the DReaM project. This project aims at developing source separation methods that enable the efficient separation of many different sound sources from only 2-channel mixture signals. Hence we enter the so-called underdetermined configuration of the source separation problem, which is difficult because it is non-invertible. Moreover, the mixture signals should be realistic music pieces, approaching professional music production, and the separation should be processed in real time with a reasonable computation cost, so that real-time source manipulation and remixing can follow. Such an ambitious goal is out of reach for current Blind (or semi-blind) Source Separation (BSS) techniques and related Independent Component Analysis (ICA), i.e. separation techniques that only proceed from (determined and overdetermined) mixture signals with very little information about the sources and the characteristics of the mixture process. It is even still quite a difficult task for informed source separation (ISS) methods, where additional prior "external" information about the sources is provided, for example with MIDI/score files or melody humming.
Keywords Source separation · Bayesian approach · Nonlinear mixtures · Truncated priors · Scanned images
The problem of Blind Source Separation (BSS) concerns the recovery of a set of signals (sources) based on mixed versions of these original signals. The term "blind" is employed because separation is conducted by using a minimum amount of information about the sources and about the underlying mixing process. This problem has been attracting a great deal of attention from the signal processing community, among others, for about two decades; one can find applications in a great diversity of domains, from astronomical imaging to biosignal processing [12,13].
The algorithm factorizes a sparse nonnegative tensor comprising the audio spectrogram and local frequency-slope-to-frequency ratios, which are estimated at each time-frequency bin using the Distributed Derivative Method. The use of local frequency modulations as separation cues is motivated by the principle of common fate partial grouping from Auditory Scene Analysis, which hypothesizes that each latent source in a mixture is characterized perceptually by coherent frequency and amplitude modulations shared by its component partials. We derive multiplicative factor updates by Minorization-Maximization, which guarantees convergence to a local optimum by iteration. We then compare our method to the baseline on two separation tasks: one considers synthetic vibrato notes, while the other considers vibrato string instrument recordings.
Pascal Chevalier received the Master of Science degree from Ecole Nationale Supérieure des Techniques Avancées (ENSTA) and the Ph.D. degree from South-Paris University, France, in 1985 and 1991 respectively. Since 1991 he has shared industrial activities (studies, experimentations, expertises, management), teaching activities both in French engineering schools (ESE, ENST, ENSTA) and French universities (Cergy-Pontoise), and research activities. Since 2000, he has also been acting as Technical Manager and Architect of the array processing subsystem as part of a national program of military satellite telecommunications. His present research interests are in array processing techniques, either blind or informed, second order or higher order, spatial or spatio-temporal, time-invariant or time-varying (especially for cyclostationary signals), linear or nonlinear (particularly widely linear for non-circular signals), for applications such as TDMA and CDMA radiocommunications networks, satellite telecommunications, spectrum monitoring and HF/VUHF passive listening. Dr. Chevalier has been a member of the THOMSON-CSF Technical and Scientifical Council. He is author or co-author of about 100 papers (journals, conferences, patents and book chapters). Dr. Chevalier is presently a EURASIP member and a senior member of the Société des Electriciens et des Electroniciens (SEE).
In this chapter, we have shown that ICA-based methods for adaptive subtraction are a kind of matching filter, with different windowing strategies. In particular, Infomax, negentropy maximization and hybrid ℓ1/ℓ2-norm based matching filters share strong similarities. All these techniques aim at minimizing the cross-correlation between the predicted multiples and the estimated primaries enhanced by a chosen operator. It is this operator that links all the analyzed filtering techniques. As correlation is a particular case of linear statistical dependence, the primary and multiple of a crossing event are statistically dependent. Then, forcing their statistical independence does not lead to a better solution. However, the windowing strategy, which increases the statistical diversity around the crossing event by the use of adjacent traces, is decisive as it actually makes it possible to model primaries and multiples as independent
Although it enables basic but efficient left/right spatialization of the sources, the LIS mixture is generally an over-simplistic process when professional/commercial music production is at stake. Moreover, the corresponding sparseness-based separation process has its own limitations: for example, it cannot process two sources located at the same position, since the corresponding submatrix is not invertible. Future works will consider those limitations and move towards more realistic/professional mixtures, involving convolutive filtering (e.g. reverberation) and "true stereo" source signals (e.g. 2-channel synthesizers). A future extension of this work will be the combination of the present 2-channel sparse approach with the source coding ISS approach. For example, within a mixture of, say, 6 sources, 2 of them could be extracted by the coding approach, and the 4 remaining sources could be estimated by the present sparse method after subtraction of the first 2 decoded sources from the mixture. A reduction of the remaining artifacts is expected. The separation of convolutive and true stereo sources will be considered in such an extended framework.
(d) O 1 Φ T O , second data set.
Figure 4: Scatter plots of two expansion coefficients of the sources (left), and illustrations of the expansion coefficients of the outliers (right). The top row corresponds to the first data-set, with exactly sparse coefficients, and the second row, to the second data-set, with compressible signals.