I. INTRODUCTION
Blind Source Separation (BSS) is a major tool to learn meaningful decompositions of multivalued data [1], [2]. Most of the work has been dedicated to linear BSS, in which m observations are linear combinations of n sources of t samples. In matrix form, X = AS + N, where X (size m × t) is the observation matrix corrupted with some unknown noise N, S (n × t) contains the sources and A (m × n, here with n ≤ m) is the mixing matrix. The goal of linear BSS is to recover A and S from X up to a permutation and scaling indeterminacy. Although this problem is ill-posed, the sparsity prior [3] (assuming the sources to have many zero coefficients) has been shown to lead to high separation quality [4], [5]. Less work has, however, been done on non-linear BSS, where $X = f(S) + N$ (1), with f an unknown non-linear function from $\mathbb{R}^{n \times t}$ to $\mathbb{R}^{m \times t}$. Here, we consider general functions f, mostly assuming that f is invertible, symmetric around the origin and regular enough (i.e. L-Lipschitz with L small, and not deviating from a linear mixing too fast as a function of the source amplitude). Although there are more indeterminacies than in the linear case, [6] claimed that sparse sources can be recovered up to a non-linear function h under some conditions. Our approach is fully different from manifold clustering approaches [7], [6]. It also differs from neural network approaches [8], because it offers a geometrical interpretation (and exploits the regularity of the mixing) and an automatic hyperparameter choice (potentially increasing robustness and building on the linear BSS literature [4]).
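The sketch below makes the two observation models concrete. It makes several assumptions of its own: Bernoulli-Gaussian sparse sources, and a hypothetical mixing f built as a linear mixing followed by the elementwise map u ↦ u + α·tanh(u). This map is odd, strictly increasing and (1 + α)-Lipschitz, so it matches the stated hypotheses (invertible, symmetric around the origin, regular). It is only an illustrative choice and not the mixing studied in the text.

```python
import numpy as np

def bernoulli_gaussian_sources(n, t, p, rng):
    """Sparse sources: each coefficient is non-zero with probability p."""
    support = rng.random((n, t)) < p
    return support * rng.standard_normal((n, t))

def f_example(S, A, alpha=0.5):
    """Hypothetical non-linear mixing: A @ S followed by u -> u + alpha*tanh(u).
    For alpha >= 0 the elementwise map is odd, strictly increasing and
    (1 + alpha)-Lipschitz, so f(-S) = -f(S) and f is injective when A has
    full column rank."""
    U = A @ S
    return U + alpha * np.tanh(U)

rng = np.random.default_rng(0)
n, m, t = 2, 3, 1000                       # n <= m, as assumed in the text
A = rng.standard_normal((m, n))
S = bernoulli_gaussian_sources(n, t, p=0.1, rng=rng)
N = 0.01 * rng.standard_normal((m, t))

X_linear = A @ S + N                       # linear model X = AS + N
X_nonlinear = f_example(S, A) + N          # non-linear model X = f(S) + N, eq. (1)
```

With these choices, the scatter plot of `X_nonlinear` bends the straight lines of the linear case into curves that remain symmetric around the origin.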

Abstract
Sparse Blind Source Separation (sBSS) plays a key role in scientific domains as different as biomedical imaging, remote sensing and astrophysics. These domains require increasingly fast and scalable BSS methods that do not sacrifice separation performance. To that end, a new distributed sparse BSS algorithm is introduced, based on a mini-batch extension of the Generalized Morphological Component Analysis algorithm (GMCA). Precisely, it combines a robust projected alternating least-squares method with mini-batch optimization. The originality further lies in the use of a manifold-based aggregation of asynchronously estimated mixing matrices. Numerical experiments are carried out on realistic spectroscopic spectra. They highlight the ability of the proposed distributed GMCA (dGMCA) to provide very good separation results even when very small mini-batches are used. Quite unexpectedly, it can further outperform the (non-distributed) state-of-the-art methods for highly sparse sources.
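The alternating least-squares principle behind GMCA can be sketched on a single machine as follows. This is a simplified illustration and not dGMCA itself: there are no mini-batches and no manifold-based aggregation. The threshold schedule (half of the largest coefficient early on, decreasing to `k_final` times a MAD noise estimate) and all function names are assumptions made for this example.

```python
import numpy as np

def soft_threshold(x, thr):
    """Proximal operator of thr * |x| (promotes sparse coefficients)."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def mad_std(S):
    """Robust per-row noise level from the median absolute deviation."""
    med = np.median(S, axis=1, keepdims=True)
    return np.median(np.abs(S - med), axis=1) / 0.6745

def gmca_sketch(X, n, n_iter=100, k_final=3.0, seed=0):
    """Simplified batch GMCA-style iteration (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, _ = X.shape
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)
    for it in range(n_iter):
        # S-step: least-squares estimate, then sparsity-enforcing thresholding
        S = np.linalg.pinv(A) @ X
        frac = 1.0 - it / max(n_iter - 1, 1)          # decreases from 1 to 0
        thr = np.maximum(k_final * mad_std(S), 0.5 * frac * np.abs(S).max(axis=1))
        S = soft_threshold(S, thr[:, None])
        # A-step: least-squares update, then projection onto unit-norm columns
        A = X @ np.linalg.pinv(S)
        norms = np.linalg.norm(A, axis=0)
        norms = np.where(norms > 0, norms, 1.0)
        A /= norms
        S *= norms[:, None]                            # keeps A @ S unchanged
    return A, S

rng = np.random.default_rng(1)
A_true = rng.standard_normal((4, 2))
S_true = (rng.random((2, 2000)) < 0.1) * rng.standard_normal((2, 2000))
X = A_true @ S_true + 0.01 * rng.standard_normal((4, 2000))
A_est, S_est = gmca_sketch(X, n=2)
```

The decreasing threshold is the usual motivation for this design. Early iterations rely only on the largest coefficients, which are the most likely to belong to a single source, and later iterations refine the estimate with smaller ones.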

I. INTRODUCTION
A. Blind Source Separation problem
In the BSS framework [1], the data are composed of m observations, each of which has t samples. These observations are assumed to be linear combinations of n sources. The objective of BSS is to retrieve the sources as well as the mixing coefficients. In matrix form, the goal is therefore to find two matrices S (of size n × t) and A (of size m × n), called respectively the source matrix and the mixing matrix, such that X = AS + N, where X (of size m × t) is the observation matrix, corrupted with some unknown noise N. This is an ill-posed unsupervised matrix factorization problem, so further assumptions are needed, such as the statistical independence of the sources (ICA [1]) or the non-negativity of A and S [2]. In this work, we focus on the sparsity of the sources [3]–[6]. In this framework, sparse BSS aims at finding a (local) minimum of:

Abstract: We consider the problem of blind source separation for underdetermined convolutive mixtures. The approach relies on the multiplicative narrowband approximation in the time-frequency domain, obtained with the Short-Time Fourier Transform (STFT), and on the sparse representation of the source signals. On this basis, we formulate the separation problem as an optimization problem. This framework is then generalized using the recently investigated convolutive narrowband approximation and the statistics of the room impulse response. Algorithms with convergence proofs are then employed to solve the proposed optimization problems. The proposed frameworks and algorithms are evaluated on synthesized and live-recorded mixtures, and are also tested on mixtures with input noise. Numerical evaluations show the advantages of the proposed methods.

A natural question is then identifiability: using the previous priors on the sources, can we really hope to recover them? Such questions have been well studied for ICA methods. For instance, it has been shown that, in the absence of noise and of Gaussian sources, the independence of the sources makes it possible to recover them up to a mere scaling and permutation indeterminacy (Darmois theorem [Darmois 1953]). For the sparsity prior, conditions for recovery up to the same indeterminacies are studied in [Gribonval & Schnass 2010, Gribonval et al. 2015]. In the latter work, the authors furthermore studied the non-asymptotic noisy setting in the presence of outliers, with potentially over-complete dictionaries (i.e. mixing matrices in our setting). In brief, they show that $A^*$ can be recovered using the cost function (II.8), which has a local minimum around $A^*$ with high probability. The required hypotheses include the following:
- $S^*$ must be sparse enough (and satisfy other assumptions);
- $A^*$ must be sufficiently incoherent, depending on the sparsity level of $S^*$;
- the noise level and the outlier energy must be limited;
- the number of non-outlier samples must be large enough.

There are also assumptions on the regularization parameters. These conditions might, however, be somewhat restrictive in realistic experiments, and thus might not always be satisfied in this work.
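The scaling and permutation indeterminacy can be checked numerically. For any permutation matrix P and invertible diagonal matrix D, the pair (A P D, D⁻¹ Pᵀ S) reproduces the same product AS. The sources also keep exactly the same sparsity, so neither the data fit nor a sparsity prior can tell the two factorizations apart. The sizes and the particular P and D below are arbitrary choices made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, t = 4, 3, 50
A = rng.standard_normal((m, n))
S = (rng.random((n, t)) < 0.2) * rng.standard_normal((n, t))   # sparse sources

P = np.eye(n)[:, [2, 0, 1]]              # permutation matrix (P @ P.T = I)
d = np.array([2.0, -0.5, 3.0])           # non-zero scalings (signs included)
D, D_inv = np.diag(d), np.diag(1.0 / d)

A_alt = A @ P @ D                        # permuted and rescaled mixing matrix
S_alt = D_inv @ P.T @ S                  # correspondingly permuted, rescaled sources

same_data = np.allclose(A @ S, A_alt @ S_alt)
same_sparsity = np.count_nonzero(S) == np.count_nonzero(S_alt)
print(same_data, same_sparsity)
```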


2007), it is very rare for sparse sources to have non-zero values at the same time. Therefore, in the scatter plot of $S_1$ as a function of $S_2$ (cf. Fig. 1(a)), most of the source coefficients lie on the axes. In this work we even assume that all coefficients lie on the axes; this hypothesis is discussed in Sec. 4. Once mixed by the non-linear f, the source coefficients lying on the axes are transformed into n non-linear one-dimensional (1D) manifolds (Ehsandoust et al., 2016; Puigt et al., 2012), each manifold corresponding to one source (see Fig. 1(b)). To separate the sources, the idea is then to back-project each manifold onto one of the axes. We propose to perform this back-projection by approximating the 1D manifolds with piecewise linear functions, which we then invert. As mentioned above, the resulting separated sources are only distorted by non-linear functions that do not remix them, denoted h in the following.
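In the linear special case, each "manifold" is a straight line spanned by one column of A. Back-projection then reduces to two steps: find the line a sample lies on, and project the sample onto that column. The sketch below shows only this linear analogue. It assumes disjoint source supports (one active source per sample) and a known mixing matrix with unit-norm columns. It does not implement the piecewise linear approximation of curved manifolds proposed in the text, and `back_project` is a hypothetical helper.

```python
import numpy as np

def back_project(X, A):
    """Assign each sample to the column of A it is most aligned with, then
    project it onto that column (A must have unit-norm columns)."""
    norms = np.linalg.norm(X, axis=0)
    cos = np.abs(A.T @ X) / np.where(norms > 0, norms, 1.0)   # (n, t)
    j = np.argmax(cos, axis=0)                                # line index per sample
    cols = np.arange(X.shape[1])
    S_hat = np.zeros((A.shape[1], X.shape[1]))
    S_hat[j, cols] = np.einsum("it,it->t", A[:, j], X)        # a_j^T x
    return S_hat

theta = np.array([0.3, 1.2])
A = np.vstack([np.cos(theta), np.sin(theta)])   # two unit-norm directions
rng = np.random.default_rng(0)
n, t = 2, 500
S = np.zeros((n, t))
S[rng.integers(0, n, t), np.arange(t)] = rng.standard_normal(t)   # one active source per sample
X = A @ S
S_hat = back_project(X, A)
```

With a non-linear f, the lines become curves, and the projection step has to invert the curve rather than a single column.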

V. CONCLUSION
This paper introduced a multichannel audio source separation method based on exact convolutive mixture modeling and Student's t source modeling. Within this framework, we compared a sparse source model and an NMF-based source model. The semi-blind experimental evaluation demonstrated the importance of modeling the spectro-temporal characteristics of the sources instead of only assuming sparsity. Future work will focus on developing a fully blind source separation method that exploits priors on the impulse response of the mixing filters. For example, we could consider priors similar to those in [29], which promote sparsity and an exponentially decaying envelope.

Index Terms: Blind Source Separation; Sparsity; Independent Component Analysis; Optimization
1. INTRODUCTION
The instantaneous linear mixture model of BSS assumes that $x = As + e$ (1). Here, $x \in \mathbb{R}^{M \times T}$ and $s \in \mathbb{R}^{N \times T}$ are the matrices of mixture channels and source signals, respectively, $A \in \mathbb{R}^{M \times N}$ is the mixing matrix, and $e \in \mathbb{R}^{M \times T}$ models the background noise. ICA methods [1] are often applied when M ≥ N (determined or over-determined case). These methods try to achieve separation by minimizing an independence criterion between the components of the estimated sources. In the under-determined case (M < N), two-step methods based on sparsity are widely used [2]. The mixing system is first estimated using clustering methods [3], and the sources are then estimated with optimization approaches [4].
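The clustering step of such two-step methods can be sketched for two mixtures (M = 2). If at most one source is active per sample, every mixture sample is aligned with one column of A. Clustering the sample directions then recovers the columns up to sign and order. The sketch makes several choices of its own: the direction angles are doubled so that a and −a coincide, and the clustering method is plain k-means with a deterministic farthest-point initialisation. The function name is hypothetical, and this is not the specific clustering method of [3].

```python
import numpy as np

def estimate_mixing_2d(X, n, n_iter=50, eps=1e-8):
    """Estimate the n unit-norm columns of a 2 x n mixing matrix (up to sign
    and permutation) by clustering the directions of the mixture samples."""
    keep = np.linalg.norm(X, axis=0) > eps                 # drop silent samples
    theta = np.arctan2(X[1, keep], X[0, keep])
    # Doubling the angle maps the directions a and -a to the same point.
    P = np.stack([np.cos(2 * theta), np.sin(2 * theta)], axis=1)
    # Deterministic farthest-point initialisation of the n centroids.
    C = [P[0]]
    for _ in range(1, n):
        dist = np.min([np.sum((P - c) ** 2, axis=1) for c in C], axis=0)
        C.append(P[np.argmax(dist)])
    C = np.array(C)
    for _ in range(n_iter):                                # Lloyd iterations
        labels = np.argmin(((P[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        for k in range(n):
            if np.any(labels == k):
                C[k] = P[labels == k].mean(axis=0)
    half = 0.5 * np.arctan2(C[:, 1], C[:, 0])              # back to direction angles
    return np.vstack([np.cos(half), np.sin(half)])

rng = np.random.default_rng(0)
angles = np.array([0.2, 1.0, 2.3])
A = np.vstack([np.cos(angles), np.sin(angles)])            # M = 2 < N = 3
t = 600
S = np.zeros((3, t))
S[rng.integers(0, 3, t), np.arange(t)] = rng.standard_normal(t)
A_est = estimate_mixing_2d(X=A @ S, n=3)
```

Once A is estimated, the second step recovers the sources, for example by a sparse optimization per sample.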


Fig. 1. Source separation performance over stereo instantaneous musical mixtures. The STFT window length is 2048 samples (93 ms) and K = 8.
6 Conclusion
In this paper, we proposed a new framework for the blind audio source separation problem in the multichannel instantaneous mixture case. In this framework, Spectral-GMM models of the sources were blindly learned, i.e. without using any information other than the mixture and the mixing matrix. They were learned with an EM algorithm of linear complexity $O(NK)$, in contrast to some related state-of-the-art methods of exponential complexity $O(K^N)$. As opposed to the

2.1. Rough estimation of the mixing matrix and source parameters
2.1.1. Tensor construction and CP model
In the style of [15], we exploit the event-related nature of the signals of interest. For each of the Q event-related sources, we construct from the 2-dimensional measurements a data tensor whose dimensions are space, event-synchronized window and time. To this end, for the q-th source, we identify $L_q$ event-synchronized windows of length $T_q$ in the corresponding time signal. This can, for instance, be achieved based on a characteristic pattern within each event-synchronized window that can be recognized in the measurements. This pattern also serves as a reference point to synchronize the signals of different patterns of the event-related source, for example the maximum amplitude in the case of impulsive signals. For each of the $L_q$ event-synchronized windows, one can thus extract an $M \times T_q$ data matrix from the measurements. These matrices are then stacked along the second dimension of the tensor $\mathcal{Y}^{(q)} \in \mathbb{R}^{M \times L_q \times T_q}$ (see Figure 1).
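The tensor construction amounts to slicing and stacking. The sketch below assumes that the reference points of the $L_q$ events (for example, the samples of maximum amplitude) have already been detected. Each window starts a hypothetical `offset` samples before its reference point, and windows that would run past the edges of the recording are skipped. `build_event_tensor` is an illustrative name, not a function from [15].

```python
import numpy as np

def build_event_tensor(X, anchors, T_q, offset=0):
    """Stack event-synchronized windows of X (M x T) into a tensor of shape
    M x L_q x T_q. Window l starts `offset` samples before anchors[l];
    windows that do not fit entirely inside X are skipped."""
    M, T = X.shape
    windows = []
    for a in anchors:
        start = a - offset
        if start >= 0 and start + T_q <= T:
            windows.append(X[:, start:start + T_q])    # one M x T_q matrix
    return np.stack(windows, axis=1)                   # stack along the 2nd dimension

X = np.arange(20).reshape(2, 10)            # toy measurements: M = 2, T = 10
Y = build_event_tensor(X, anchors=[0, 3, 8], T_q=4)
```

In this toy call, the window anchored at sample 8 would end past the last sample, so only two windows are kept and `Y` has shape (2, 2, 4).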

III. METHODS
A. Independent component analysis
The concept of ICA was introduced by Herault and Jutten [12], in particular to solve the BSS problem. In the mid-1990s, Comon presented a mathematical formulation of ICA [8]. Over the past 25 years, a wealth of algorithms have been proposed [3], and ICA-based methods have been extensively and successfully applied to many practical, real-life problems (see [3] for details). More precisely, assume the linear observation model of equation (1). ICA then consists in looking for an overdetermined (N × P) mixing matrix A (i.e. P is smaller than or equal to N) and a P-dimensional source vector s[k] whose components are as statistically independent as possible. In other words, the goal of ICA is to find a full-rank (P × N) separating matrix W such that the output signal y[k] = W x[k] is an estimate of the source vector s[k]. Our previous works [7] showed that, compared
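One standard way to estimate the separating matrix W is symmetric FastICA: PCA whitening onto P components, followed by fixed-point updates with a tanh contrast and symmetric decorrelation. The sketch below is a minimal NumPy version under those choices. It is not the method of [7] or [12], and `fastica_sketch` is a hypothetical name. It follows the dimensions of the text: x[k] has N components, and W is P × N.

```python
import numpy as np

def fastica_sketch(X, P, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA with g = tanh. X: N x K observations. Returns W (P x N)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=1, keepdims=True)
    # PCA whitening onto the P leading principal components
    d, E = np.linalg.eigh(np.cov(Xc))
    idx = np.argsort(d)[::-1][:P]
    V = (E[:, idx] / np.sqrt(d[idx])).T                  # P x N whitening matrix
    Z = V @ Xc
    B = np.linalg.qr(rng.standard_normal((P, P)))[0]     # random orthogonal start
    for _ in range(n_iter):
        G = np.tanh(B @ Z)
        # Fixed-point update: E{z g(w^T z)} - E{g'(w^T z)} w, for all rows at once
        B_new = G @ Z.T / Z.shape[1] - np.diag((1 - G ** 2).mean(axis=1)) @ B
        # Symmetric decorrelation: B <- (B B^T)^{-1/2} B
        s, U = np.linalg.eigh(B_new @ B_new.T)
        B_new = U @ np.diag(1.0 / np.sqrt(s)) @ U.T @ B_new
        converged = np.max(np.abs(np.abs(np.diag(B_new @ B.T)) - 1)) < tol
        B = B_new
        if converged:
            break
    return B @ V

rng = np.random.default_rng(0)
K = 5000
S = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), K),     # unit-variance sources
               rng.laplace(0.0, 1 / np.sqrt(2), K)])
A = np.array([[1.0, 0.5], [0.3, 1.0], [0.2, -0.4]])          # N = 3 sensors, P = 2 sources
W = fastica_sketch(A @ S, P=2)
gain = np.abs(W @ A)    # close to a scaled permutation matrix when separation succeeds
```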

$\dfrac{y_i - e_i}{d_i}$. (4)
Suppose the slope values were exact and the mixture were perfectly modeled by (2), which is not the case. It can then be shown that x = As, a linear mixture of the two sources that classical linear BSS algorithms can solve. This corresponds to the classical post-nonlinear (PNL) approach, in which the difficult part is to estimate the parameters that correctly cancel the nonlinearity, here the slopes d. However, our previous work simply used the theoretical slope value, and (2) only approximately models the mixture. We can therefore only claim that the modified set x is an approximately linear mixture of the sources. The idea, then, was to represent x as a second-degree polynomial, in which the quadratic terms can correct small nonlinearities in the modified mixture. The mixture was then represented as

listeners would enjoy the active listening experience if they could access it from the usual stereo format. This is the goal of the DReaM project. This project aims at developing source separation methods that enable the efficient separation of many different sound sources from only 2-channel mixture signals. We thus enter the so-called underdetermined configuration of the source separation problem, which is difficult because the mixing is non-invertible. Moreover, the mixture signals should be realistic music pieces, close to professional music production. The separation should also run in real time at reasonable computational cost, so that real-time source manipulation and remixing can follow. Such an ambitious goal is out of reach for current Blind (or semi-blind) Source Separation (BSS) techniques and related Independent Component Analysis (ICA) techniques [1][2][3]. These separation techniques only proceed from (determined and overdetermined) mixture signals, with very little information about the sources and the characteristics of the mixing process. The goal even remains quite difficult for informed source separation (ISS) methods, where additional prior "external" information about the sources is provided, for example with MIDI/score files [4][5] or melody humming [6].

Keywords: Source separation · Bayesian approach · Nonlinear mixtures · Truncated priors · Scanned images
1 Introduction
The problem of Blind Source Separation (BSS) concerns the recovery of a set of signals (sources) from mixed versions of these original signals. The term "blind" is used because separation relies on a minimum amount of information about the sources and about the underlying mixing process. This problem has attracted a great deal of attention from the signal processing community, among others, for about two decades. Applications can be found in a great diversity of domains, from astronomical imaging to biosignal processing [12,13].

The algorithm factorizes a sparse nonnegative tensor comprising the audio spectrogram and local frequency-slope-to-frequency ratios, which are estimated at each time-frequency bin using the Distributed Derivative Method. Local frequency modulations are used as separation cues following the common-fate principle of partial grouping from Auditory Scene Analysis. This principle hypothesizes that each latent source in a mixture is characterized perceptually by coherent frequency and amplitude modulations shared by its component partials. We derive multiplicative factor updates by Minorization-Maximization, which guarantees iterative convergence to a local optimum. We then compare our method to the baseline on two separation tasks: one considers synthetic vibrato notes, while the other considers vibrato string instrument recordings.
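Multiplicative updates derived by Minorization-Maximization are easiest to see on plain matrix NMF. The classical Kullback-Leibler updates below majorize the cost at each step, so the divergence never increases from one iteration to the next. This sketch is only an illustration of that property. It factorizes an ordinary nonnegative matrix, not the spectrogram/frequency-modulation tensor of the text. `nmf_kl` and `kl_div` are hypothetical names, and the rank and iteration count are arbitrary.

```python
import numpy as np

def kl_div(V, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    return np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH)

def nmf_kl(V, r, n_iter=100, seed=0):
    """KL-NMF with multiplicative (MM) updates; V must be nonnegative."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, r)) + 0.1              # strictly positive initialisation
    H = rng.random((r, T)) + 0.1
    costs = []
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H))) / W.sum(axis=0)[:, None]
        W *= ((V / (W @ H)) @ H.T) / H.sum(axis=1)[None, :]
        costs.append(kl_div(V, W @ H))
    return W, H, np.array(costs)

rng = np.random.default_rng(0)
V = rng.random((20, 30)) + 0.1                # toy nonnegative "spectrogram"
W, H, costs = nmf_kl(V, r=4)
```

Because each update is multiplicative with nonnegative ratios, W and H stay nonnegative without any explicit projection.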

Pascal Chevalier received the Master of Science degree from Ecole Nationale Supérieure des Techniques Avancées (ENSTA) and the Ph.D. degree from South-Paris University, France, in 1985 and 1991, respectively. Since 1991, he has shared his time between industrial activities (studies, experimentations, expertise, management), teaching in French engineering schools (ESE, ENST, ENSTA) and French universities (Cergy-Pontoise), and research. Since 2000, he has also been acting as Technical Manager and Architect of the array processing subsystem as part of a national program of military satellite telecommunications. His present research interests are in array processing techniques, in particular:
- blind or informed;
- second order or higher order;
- spatial or spatio-temporal;
- time-invariant or time-varying, especially for cyclostationary signals;
- linear or non-linear, and particularly widely linear for non-circular signals.

Target applications include TDMA and CDMA radiocommunication networks, satellite telecommunications, spectrum monitoring and HF/VUHF passive listening. Dr. Chevalier has been a member of the THOMSON-CSF Technical and Scientific Council. He is the author or co-author of about 100 papers (journals, conferences, patents and book chapters). Dr. Chevalier is presently a EURASIP member and a senior member of the Société des Electriciens et des Electroniciens (SEE).

5.8 Conclusion
In this chapter, we have shown that ICA-based methods for adaptive subtraction are a kind of matching filter, with different windowing strategies. In particular, Infomax, negentropy maximization and hybrid ℓ1/ℓ2-norm based matching filters share strong similarities. All these techniques aim at minimizing the cross-correlation between the predicted multiples and the estimated primaries, enhanced by a chosen operator. It is this operator that links all the analyzed filtering techniques. As correlation is a particular case of linear statistical dependence, the primary and the multiple of a crossing event are statistically dependent. Forcing their statistical independence therefore does not lead to a better solution. However, the windowing strategy is decisive: using adjacent traces increases the statistical diversity around the crossing event, which actually allows one to model primaries and multiples as independent
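The link between matching filters and cross-correlation is clearest in the plain least-squares (ℓ2) case. Choosing a filter h that minimizes ‖d − M h‖², where the columns of M are delayed copies of the predicted multiple, makes the residual orthogonal to those columns. In other words, the estimated primaries have zero cross-correlation with the predicted multiple at every lag covered by the filter. The sketch below shows this single-trace ℓ2 case only, not the Infomax, negentropy or hybrid ℓ1/ℓ2 filters of the chapter. The function names and the synthetic data are assumptions.

```python
import numpy as np

def convolution_matrix(m, L):
    """T x L matrix whose k-th column is the trace m delayed by k samples, so
    that convolution_matrix(m, L) @ h applies a causal length-L filter h."""
    T = len(m)
    M = np.zeros((T, L))
    for k in range(L):
        M[k:, k] = m[:T - k]
    return M

def ls_matching_filter(d, m, L):
    """Least-squares adaptive subtraction: minimise ||d - M h||^2 and return
    the filter h and the estimated primaries d - M h."""
    M = convolution_matrix(m, L)
    h, *_ = np.linalg.lstsq(M, d, rcond=None)
    return h, d - M @ h

rng = np.random.default_rng(0)
T, L = 400, 8
m = rng.standard_normal(T)                        # predicted multiples (one trace)
p = rng.standard_normal(T)                        # synthetic primaries
d = p + convolution_matrix(m, L) @ (0.5 * rng.standard_normal(L))   # recorded data
h, primaries = ls_matching_filter(d, m, L)
xcorr = convolution_matrix(m, L).T @ primaries    # cross-correlation at lags 0..L-1
```

When the primaries are themselves correlated with the multiples, as at a crossing event, this orthogonality constraint also removes part of the primary energy. That is the limitation discussed above.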


Although the LIS mixture enables basic but efficient left/right spatialization of the sources, it is generally an oversimplified process for professional/commercial music production. Moreover, the corresponding sparseness-based separation process has its own limitations. For example, it cannot process two sources located at the same position, since the corresponding submatrix is not invertible. Future work will address these limitations and move towards more realistic/professional mixtures, involving convolutive filtering (e.g. reverberation) and "true stereo" source signals (e.g. 2-channel synthesizers). A future extension of this work will combine the present 2-channel sparse approach with the source coding ISS approach of [17]. For example, in a mixture of, say, 6 sources, 2 of them could be extracted by the coding approach. The first 2 decoded sources would then be subtracted from the mixture, and the 4 remaining sources estimated by the present sparse method. This is expected to reduce the remaining artifacts. The separation of convolutive and true stereo sources will be considered in such an extended framework.
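The invertibility issue can be illustrated with a 2 × N stereo mixing matrix. In a time-frequency bin where only two sources are active, the two active sources are recovered by inverting the 2 × 2 submatrix formed by their columns. When both sources share the same position, the two columns are identical and the submatrix has rank 1. The constant-power panning law (column j = (cos θ_j, sin θ_j)) and the helper names are assumptions made for this sketch.

```python
import numpy as np

def lis_mixing_matrix(pan_angles):
    """Hypothetical 2 x N LIS mixing matrix with constant-power panning."""
    theta = np.asarray(pan_angles, dtype=float)
    return np.vstack([np.cos(theta), np.sin(theta)])

def separate_bin(x, A, active):
    """Recover the two active sources in one bin by inverting the 2 x 2
    submatrix of A formed by their columns."""
    return np.linalg.solve(A[:, list(active)], x)

A = lis_mixing_matrix([0.3, 0.3, 1.0])      # sources 0 and 1 share the same position
s_true = np.array([0.7, -1.2])
x = A[:, [0, 2]] @ s_true                   # bin where only sources 0 and 2 are active
s_hat = separate_bin(x, A, (0, 2))          # distinct positions: invertible submatrix
rank_01 = np.linalg.matrix_rank(A[:, [0, 1]])   # same position: singular submatrix
```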

[Figure 4, panel (d): second data set]
Figure 4: Scatter plots of two expansion coefficients of the sources (left), and illustrations of the expansion coefficients of the outliers (right). The top row corresponds to the first data set, with exactly sparse coefficients, and the second row corresponds to the second data set, with compressible signals.