SOURCE SEPARATION ALGORITHM 67 3 Joint Model

Frame Based Source Separation

4.8.3 Joint Model

As mentioned previously, we substitute the sources by their windowed versions. To write the model compactly, we also introduce the following notation:

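$$\boldsymbol{W} = \bigoplus_{k=1}^{K} W = I_K \otimes W \tag{4.58}$$

$$\boldsymbol{I} = [I_N \cdots I_N] = 1_K^T \otimes I_N \tag{4.59}$$

$$\boldsymbol{A} = \bigoplus_{k=1}^{K} A_k = \mathrm{blockdiag}\{A_1, \ldots, A_K\} \tag{4.60}$$

$$\boldsymbol{B} = \bigoplus_{k=1}^{K} B_k = \mathrm{blockdiag}\{B_1, \ldots, B_K\} \tag{4.61}$$

$$\boldsymbol{\Lambda} = \bigoplus_{k=1}^{K} \lambda_k I_N = \Lambda \otimes I_N \tag{4.62}$$

$$\Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_K\} \tag{4.63}$$

$$x' = \boldsymbol{W} x \tag{4.64}$$

$$\boldsymbol{\Lambda}' = \boldsymbol{W}^{-1} \boldsymbol{\Lambda} \boldsymbol{W}^{-1} = \Lambda \otimes W^{-2} \tag{4.65}$$

$$e = [e_1^T \cdots e_K^T]^T \tag{4.66}$$

where $\oplus$ and $\otimes$ are the Kronecker (direct) sum, i.e. block-diagonal concatenation, and the Kronecker product respectively, and bold symbols denote the stacked counterparts of the $N \times N$ quantities. With this notation, the signal model can be written as

$$W y = \boldsymbol{I} x' + W v \tag{4.67}$$

$$\boldsymbol{A} \boldsymbol{B} x' = \boldsymbol{W} e \tag{4.68}$$

with circulant $A_k$, $B_k$.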
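The Kronecker identities above can be checked numerically on a small example. The following is a minimal sketch, not the thesis implementation: the sizes `K` and `N`, the window weights, the precisions and the random signals are all hypothetical values chosen for illustration, and the helper `blockdiag` is introduced here only to build the direct sum explicitly.

```python
import numpy as np

def blockdiag(blocks):
    """Direct sum of square blocks (block-diagonal concatenation)."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        m = b.shape[0]
        out[i:i + m, i:i + m] = b
        i += m
    return out

rng = np.random.default_rng(0)
K, N = 3, 8                          # hypothetical number of sources and frame length
w = np.hanning(N + 2)[1:-1]          # hypothetical strictly positive window, so W is invertible
W = np.diag(w)
lam = np.array([2.0, 0.5, 1.5])      # hypothetical precisions lambda_k
x = rng.standard_normal((K, N))      # one row per source
v = rng.standard_normal(N)           # additive noise
y = x.sum(axis=0) + v                # observation: sum of the sources plus noise

W_blk = np.kron(np.eye(K), W)                     # (4.58)
I_row = np.kron(np.ones((1, K)), np.eye(N))       # (4.59)
Lam_blk = np.kron(np.diag(lam), np.eye(N))        # (4.62)
x_win = W_blk @ x.reshape(-1)                     # (4.64): stacked windowed sources
W_blk_inv = np.linalg.inv(W_blk)
Lam_prime = W_blk_inv @ Lam_blk @ W_blk_inv       # (4.65), left-hand form

same_direct_sum = np.allclose(W_blk, blockdiag([W] * K))
same_lam_prime = np.allclose(Lam_prime, np.kron(np.diag(lam), np.diag(w ** -2.0)))
model_holds = np.allclose(W @ y, I_row @ x_win + W @ v)   # (4.67)
print(same_direct_sum, same_lam_prime, model_holds)
```

Because every block of the stacked window matrix is the same diagonal $W$, the Kronecker form `np.kron(np.eye(K), W)` and the explicit block-diagonal construction coincide, which is what (4.58) states.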

4.8.4 Estimating the Sources

Following the VB approach defined in section 4.2.2, the prior probability distributions for the various parameters are chosen as follows. Let $\psi$ be any of the groups $\{x_k, k = 1, \ldots, K\}$, $a_k$, $\phi_k \setminus \lambda_k$. Then, for any such subset of parameters $\psi$ and for the $\lambda_k$, $\lambda_v$, the priors are chosen as

$$p(\psi) = \mathcal{N}(m_\psi, C_\psi) \tag{4.69}$$

$$p(\lambda_v) = \mathrm{Exponential}(m_{\lambda_v}) \tag{4.70}$$

$$p(\lambda_k) = \mathrm{Exponential}(m_{\lambda_k}) \tag{4.71}$$

where the $\lambda$ are the precisions (inverse variances, in VB terminology).

With this choice of prior distributions, the posterior distributions obtained by VB will be of the same nature (Gaussian or Exponential). In this chapter, however, we consider a further simplification: the parameters are treated as deterministic. This yields the Gaussian form

$$-2 \ln p(y, x \mid \theta) = \lambda_v (W y - \boldsymbol{I} x')^T W^{-2} (W y - \boldsymbol{I} x') + x'^T \boldsymbol{B}^T \boldsymbol{A}^T \boldsymbol{\Lambda}' \boldsymbol{A} \boldsymbol{B} x' = c + (x' - m_{x'})^T C_{x'}^{-1} (x' - m_{x'}). \tag{4.72}$$

Averaging this over the parameters $\theta$ now simply means evaluating it at the latest estimates of these parameters, since they are considered deterministic in this simplification. From (4.72), after introducing the auxiliary quantities, we get

$$C_1 = \boldsymbol{B}^T \boldsymbol{A}^T \boldsymbol{\Lambda}' \boldsymbol{A} \boldsymbol{B}$$
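As a sketch of how (4.72) leads to the estimate used in the frequency-domain steps that follow, one can complete the square in $x'$ and apply the matrix inversion lemma. The symbol $C_{1,k}$ (the $k$-th diagonal block of $C_1$) and the definition of $C_2$ as the sum below are assumptions introduced for this illustration, chosen to be consistent with those steps. $W$ is taken to be diagonal (hence symmetric), and $\boldsymbol{I} = 1_K^T \otimes I_N$ is the block row of (4.59).

```latex
\begin{align*}
% Completing the square in (4.72):
C_{x'}^{-1} &= C_1 + \lambda_v \boldsymbol{I}^T W^{-2} \boldsymbol{I},
&
C_{x'}^{-1} m_{x'} &= \lambda_v \boldsymbol{I}^T W^{-1} y \\
% Matrix inversion lemma (push-through identity) with R = \lambda_v^{-1} W^2:
m_{x'} &= C_1^{-1} \boldsymbol{I}^T
  \bigl( \boldsymbol{I} C_1^{-1} \boldsymbol{I}^T + \lambda_v^{-1} W^2 \bigr)^{-1} W y \\
% Block structure of C_1 (one block per source):
C_{1,k}^{-1} &= \frac{1}{\lambda_k} B_k^{-1} A_k^{-1} W^2 A_k^{-T} B_k^{-T},
&
C_2 &:= \sum_{k=1}^{K} C_{1,k}^{-1} + \lambda_v^{-1} W^2 \\
% Per-source estimate:
m_{x'_k} &= C_{1,k}^{-1} C_2^{-1} W y
\end{align*}
```

In this form only one $N \times N$ system (with $C_2$) has to be solved, regardless of the number of sources; each source estimate is then obtained by a further matrix-vector product.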

To implement this in the frequency domain, consider the diagonal matrices $\breve{A}_k = F A_k F^{-1}$, etc.
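The diagonalization of a circulant matrix by the DFT can be illustrated with a short numerical check. This is a minimal sketch: the filter coefficients in `a` are hypothetical, and `circulant` is a small helper defined here rather than a function from the thesis.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first column c, so circulant(c) @ x is a circular convolution."""
    c = np.asarray(c, dtype=float)
    return np.stack([np.roll(c, j) for j in range(len(c))], axis=1)

N = 8
a = np.r_[1.0, -0.9, np.zeros(N - 2)]       # hypothetical first-order prediction-error filter
A = circulant(a)

F = np.fft.fft(np.eye(N)) / np.sqrt(N)      # unitary DFT matrix, so F^{-1} = F^H
A_breve = F @ A @ F.conj().T                # F A F^{-1}

# The result is diagonal, with the DFT of the first column on the diagonal.
is_diagonal = np.allclose(A_breve, np.diag(np.fft.fft(a)))
# For a real circulant A, the transpose maps to the Hermitian transpose.
transpose_ok = np.allclose(F @ A.T @ F.conj().T, A_breve.conj().T)
print(is_diagonal, transpose_ok)
```

The second check is why factors such as $A_k^T$ in the time domain appear as $\breve{A}_k^{H}$ in the frequency domain.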

The only non-diagonal matrix is $\breve{W}_2 = F W^2 F^{-1}$ which, owing to careful window design, can be approximated by a banded matrix, as discussed earlier. As a result, $\breve{C}_1^{-1}$ and $\breve{C}_2$ are banded matrices as well. Now consider the LDU (Lower triangular, Diagonal, Upper triangular) factorization (as shown in Figure 4.5):

$$\breve{C}_2 = L D L^H$$

where the unit-diagonal lower triangular $L$ is banded.

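The steps to compute $\hat{m}_{x'}$ in the frequency domain are now:

• $\breve{y} = F W y$

• solve $\breve{u}$ from $L \breve{u} = \breve{y}$ by forward substitution

• solve $\breve{z}$ from $L^H \breve{z} = D^{-1} \breve{u}$ by back substitution

• $\hat{m}_{x'_k} = \frac{1}{\lambda_k} F^{-1} \breve{B}_k^{-1} \breve{A}_k^{-1} \breve{W}_2 \breve{A}_k^{-H} \breve{B}_k^{-H} \breve{z}$, each time multiplying a vector by a matrix and ending with an IDFT and a scaling.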
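The steps above can be sketched numerically and compared with a direct time-domain minimisation of (4.72). This is an illustrative sketch under several assumptions, not the thesis code: the window, the precisions and the filter coefficients are hypothetical, $\breve{C}_2$ is assumed to be the sum of the per-source terms plus $\lambda_v^{-1} \breve{W}_2$, and dense matrices with simple substitution loops are used instead of the banded approximation and banded solvers.

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first column c (circular convolution by c)."""
    c = np.asarray(c, dtype=float)
    return np.stack([np.roll(c, j) for j in range(len(c))], axis=1)

def forward_sub(L, b):
    """Solve L u = b for a lower-triangular L."""
    u = np.zeros(len(b), dtype=complex)
    for i in range(len(b)):
        u[i] = (b[i] - L[i, :i] @ u[:i]) / L[i, i]
    return u

def back_sub(U, b):
    """Solve U z = b for an upper-triangular U."""
    z = np.zeros(len(b), dtype=complex)
    for i in range(len(b) - 1, -1, -1):
        z[i] = (b[i] - U[i, i + 1:] @ z[i + 1:]) / U[i, i]
    return z

rng = np.random.default_rng(1)
K, N = 2, 16
w = np.hanning(N + 2)[1:-1]                  # hypothetical strictly positive window
lam, lam_v = np.array([4.0, 1.0]), 10.0      # hypothetical precisions
# Hypothetical filter coefficients (first columns of the circulant A_k, B_k).
a = [np.r_[1.0, -0.5, np.zeros(N - 2)], np.r_[1.0, 0.3, np.zeros(N - 2)]]
b = [np.r_[1.0, np.zeros(4), -0.6, np.zeros(N - 6)],
     np.r_[1.0, np.zeros(6), -0.4, np.zeros(N - 8)]]
y = rng.standard_normal(N)

F = np.fft.fft(np.eye(N)) / np.sqrt(N)       # unitary DFT matrix, F^{-1} = F^H
W2_f = F @ np.diag(w ** 2) @ F.conj().T      # dense here; banded after the window approximation

# g[k] holds the diagonal of (B_k A_k)^{-1} in the frequency domain.
g = [1.0 / (np.fft.fft(b[k]) * np.fft.fft(a[k])) for k in range(K)]
C1inv_f = [(g[k][:, None] * W2_f * g[k].conj()[None, :]) / lam[k] for k in range(K)]
C2_f = sum(C1inv_f) + W2_f / lam_v           # assumed definition of the sum to factorize

# L D L^H obtained from the Cholesky factor, C2 = Lc Lc^H.
Lc = np.linalg.cholesky(C2_f)
d = np.diag(Lc).real
L, D = Lc / d, d ** 2                        # unit-diagonal L and diagonal of D

y_f = F @ (w * y)                            # step 1: DFT of the windowed observation
u_f = forward_sub(L, y_f)                    # step 2
z_f = back_sub(L.conj().T, u_f / D)          # step 3
m_freq = np.array([(F.conj().T @ (C1inv_f[k] @ z_f)).real for k in range(K)])  # step 4

# Time-domain reference: minimiser of (4.72) obtained by a direct solve.
W_inv2 = np.diag(w ** -2.0)
I_row = np.kron(np.ones((1, K)), np.eye(N))
C1 = np.zeros((K * N, K * N))
for k in range(K):
    AB = circulant(a[k]) @ circulant(b[k])
    C1[k * N:(k + 1) * N, k * N:(k + 1) * N] = lam[k] * AB.T @ W_inv2 @ AB
m_ref = np.linalg.solve(C1 + lam_v * I_row.T @ W_inv2 @ I_row,
                        lam_v * I_row.T @ (y / w)).reshape(K, N)
print(np.max(np.abs(m_freq - m_ref)))
```

The frequency-domain route factorizes a single $N \times N$ matrix, whereas the reference solves a $KN \times KN$ system; with the banded approximation of $\breve{W}_2$, the factorization and the two triangular solves would be cheaper still.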

In practice all operations with the Discrete Fourier Transform (DFT) matrix F are done by using the Fast Fourier Transform algorithm (FFT). As ˘Bk = diag{b˘k} and ˘Ak =

4.8.4.1 Summary of the algorithm

The algorithm can be summarized as follows. First, the window $w$ is designed and its weights are used to construct $W$ and the Discrete Fourier Transform (DFT) of its square, $\breve{W}_2$. A segment of the observation $y$ is extracted and windowed, and its DFT is computed.

The diagonal elements of the DFTs of the circulant matrices are computed and used in (4.76) for all the sources. An LDU decomposition is performed on the sum defined in (4.75), and the procedure described above is then applied. The LDU decomposition is obtained from the Cholesky decomposition. The algorithm extracts windowed sources, so they do not need to be windowed again for the overlap-add method.
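The last remark can be illustrated with a small overlap-add sketch. It assumes a periodic Hann window with 50% overlap, which is a hypothetical choice made here because its shifted copies sum to one; the window actually designed in this chapter may differ. Each windowed frame of a test signal stands in for a separated, already-windowed source frame.

```python
import numpy as np

N, hop = 64, 32                                    # hypothetical frame length and hop size
n = np.arange(N)
w = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / N)        # periodic Hann: copies shifted by N/2 sum to 1

rng = np.random.default_rng(0)
s = rng.standard_normal(10 * hop)                  # stand-in for one source signal
out = np.zeros_like(s)

for start in range(0, len(s) - N + 1, hop):
    frame = w * s[start:start + N]                 # stands in for an extracted windowed source frame
    out[start:start + N] += frame                  # overlap-add, with no extra synthesis window

# Away from the two half-covered edges, the frames add up to the original signal.
interior = slice(hop, len(s) - hop)
reconstructed = np.allclose(out[interior], s[interior])
print(reconstructed)
```

Applying a second (synthesis) window before the summation would instead weight each sample by the sum of squared window values, so the analysis window alone has to satisfy the constant-overlap-add condition.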

4.9 Summary and discussion

This chapter deals with three algorithms. Two of them focus on parameter estimation, while the third is purely a source separation algorithm. The parameter estimation uses the Itakura-Saito (IS) distance, which is known to give good perceptual results. All the algorithms of this chapter are frame based, and most calculations are done in the spectral domain. This requires a well-designed analysis window and also leads to the analysis of overlapping segments. Section 4.8.2 deals with an approximation of the window in the frequency domain that simplifies the source separation algorithm: only the main lobe of the window spectrum is kept. The original model presented in Chapter 2 is also slightly modified: the filtering operation (convolution) is approximated by a circular convolution using circulant matrices. These two approximations simplify the use of the Wiener filter.

The two parameter estimation algorithms are both based on the IS distance. The first one, presented in section 4.5, is a naive interpretation of this distance and leads to an iterative algorithm able to estimate all the parameters. The second one, in section 4.6, is based on the minimization of the IS distance and gives better results for synthetic spectra, but its long-term part is not yet finished. The overall algorithm consists of using one of these two algorithms to estimate the parameters on a frame of the observation, and then using the estimated parameters in the source separation algorithm. The algorithm that uses the naive interpretation of the IS distance for the parameter estimation is called Naive-IS, while the one that minimizes the IS distance is called TMin-IS.
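For reference, the IS distance between a measured power spectrum and a model spectrum can be computed as below. This sketch uses the standard discrete form averaged over frequency bins; the averaging convention and the function name `itakura_saito` are assumptions and may differ from the definition used earlier in this chapter.

```python
import numpy as np

def itakura_saito(p, p_hat):
    """Itakura-Saito distance between strictly positive power spectra, averaged over bins."""
    r = np.asarray(p, dtype=float) / np.asarray(p_hat, dtype=float)
    return float(np.mean(r - np.log(r) - 1.0))

p = np.array([1.0, 2.0, 4.0, 0.5])       # hypothetical power spectrum samples
d_same = itakura_saito(p, p)             # identical spectra
d_over = itakura_saito(p, 2.0 * p)       # model spectrum twice too large
d_under = itakura_saito(2.0 * p, p)      # model spectrum twice too small
print(d_same, d_over, d_under)
```

The distance is zero only when the two spectra coincide, and it is not symmetric: underestimating the spectrum is penalised more heavily than overestimating it by the same factor.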

The next chapter is dedicated to real speech separation. The adaptive algorithm of section 3 will be compared with the two algorithms presented in the present section. An initialization procedure will also be presented, as well as a background sound extraction. The results are compared with some existing solutions, even though the simulations are not performed in the same context.


Chapter 5

Simulations

The purpose of this chapter is to show some simulation results on real signals. The algorithms considered here were presented in the previous chapters. Two versions of the adaptive algorithm, introduced in section 3, are compared with the frame-based algorithms presented in section 4. The EM-Kalman based algorithms are called Joint-EMK and Alt-EMK, for joint and alternate parameter estimation respectively. The frame-based algorithms are called Naive-IS and TMin-IS; they use, respectively, the naive interpretation of the Itakura-Saito (IS) distance and its minimization.

This chapter is organized as follows. Section 5.1 gives some details about the real signals used in this chapter. Section 5.2 focuses on short speech segments in which the periods are constant. Section 5.3 then considers long-duration speech. Section 5.4 discusses the performance of the presented algorithms compared with some existing algorithms. In section 5.5, the adaptive algorithm is applied to a particular application related to the 2010 Football World Cup. Finally, section 5.6 concludes the chapter.
