
W = \mathrm{diag}\left\{ \frac{1}{\sqrt{\lambda_1 - \hat{\sigma}_\nu^2}}, \ldots, \frac{1}{\sqrt{\lambda_n - \hat{\sigma}_\nu^2}} \right\} V_S^T    (4.66)

and the prewhitened sensor vector $y = W x$.

Remark 4.3 It should be noted that for noisy data ($x(k) = H s(k) + \nu(k)$) the whitening transform described above ($y(k) = W x(k) = W H s(k) + W \nu(k)$) can amplify the noise rather than suppress it, especially when $m = n$ and/or the mixing matrix $H$ is ill-conditioned. For an ill-conditioned $H$ some of the eigenvalues $\lambda_n, \lambda_{n-1}, \ldots$ are very small.

The enhancement of the noise will differ from channel to channel, depending on the distribution of the eigenvalues. In such cases, to alleviate the problem, we can apply the regularization approach discussed in Chapter 2, by using, instead of (4.66), the following formula for $m \geq n$:

W = \mathrm{diag}\left\{ \sqrt{\frac{\lambda_1}{\lambda_1^2 + \hat{\sigma}_\nu^2}}, \ldots, \sqrt{\frac{\lambda_n}{\lambda_n^2 + \hat{\sigma}_\nu^2}} \right\} V_S^T,    (4.67)

where $\hat{\sigma}_\nu^2$ is the estimated variance of the noise.
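To make the two transforms concrete, here is a minimal numpy sketch that builds $W$ from the EVD of an estimated sensor covariance matrix; the function name, its arguments, and the assumption that the noise variance has already been estimated elsewhere (e.g., from the minor eigenvalues) are illustrative choices, not the book's.

```python
# Minimal sketch of the whitening transforms (4.66) and (4.67).
# Names (whitening_matrix, Rx, sigma2) are illustrative assumptions.
import numpy as np

def whitening_matrix(Rx, n, sigma2, regularized=False):
    """W from the EVD of the m x m sensor covariance Rx; n = number of sources."""
    lam, V = np.linalg.eigh(Rx)                # eigenvalues in ascending order
    lam, V = lam[::-1][:n], V[:, ::-1][:, :n]  # n principal eigenpairs (V_S)
    if regularized:
        d = np.sqrt(lam / (lam**2 + sigma2))   # Eq. (4.67): regularized form
    else:
        d = 1.0 / np.sqrt(lam - sigma2)        # Eq. (4.66): bias-corrected form
    return np.diag(d) @ V.T                    # W (n x m); y = W @ x
```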

For Gaussian noise, instead of the standard covariance matrix $R_{xx}$, we can employ fourth-order matrix cumulants, which are insensitive to arbitrary Gaussian noise [889].

4.2 SECOND ORDER STATISTICS BLIND IDENTIFICATION BASED ON EVD AND GEVD

4.2.1 Mixing Model

In this section, we discuss the basic methods that jointly exploit second-order statistics (correlation matrices for different time delays) and the temporal structure of the sources.

We show how the problem of blind identification of the mixing matrix can be converted to standard eigenvalue decomposition (EVD), generalized eigenvalue decomposition (GEVD) and simultaneous diagonalization (SD) problems.

We consider the case where sources may have arbitrary distributions but non-vanishing temporal correlations. More precisely, let us consider the simple mixing model where the

3 Such an operation is sometimes called "quasi-whitening", because it performs whitening not on the basis of the noisy sensor signals but rather on the estimated noise-free data.

$m$-dimensional observation (sensor) vector $x(k) \in \mathbb{R}^m$ is assumed to be generated by

x(k) = H s(k) + \nu(k),    (4.68)

where $H \in \mathbb{R}^{m \times n}$ is an unknown full column rank mixing matrix, $s(k)$ is the $n$-dimensional source vector (which is also unknown, with $m \geq n$), and $\nu(k)$ is the additive noise vector, assumed to be statistically independent of $s(k)$.

The task of blind identification or, equivalently, blind source separation (BSS) is to estimate the mixing matrix $H$ or its pseudo-inverse separating (unmixing) matrix $W = H^+$ in order to estimate the original source signals $s(k)$, given only a finite number of observations $\{x(k)\}$, $k = 1, \ldots, N$. Recall that two indeterminacies cannot be resolved in BSS without some a priori knowledge: the scaling and permutation ambiguities. Thus, if the estimate of the mixing matrix $\hat{H}$ satisfies $G = W H = \hat{H}^+ H = P D$, where $G$ is the global transformation which combines the mixing and separating systems, $P$ is some permutation matrix and $D$ is some nonsingular scaling diagonal matrix, then $(\hat{H}, \hat{s})$ and $(H, s)$ are said to be related by a waveform-preserving relation [1160]. A key factor in BSS is the assumption about the statistical properties of the sources, such as statistical independence among the sources. That is the reason why BSS is often confused with independent component analysis (ICA). In this chapter, we exploit weaker conditions for the separation of sources, assuming that they have temporal structures with different autocorrelation functions, or equivalently different power spectra, and/or that they are nonstationary with time-varying variances. Methods that exploit either the temporal structure of sources (mainly the second-order correlations) or the nonstationarity of sources lead to second-order statistics (SOS) based BSS methods. In contrast to the higher-order statistics (HOS) based BSS methods, the SOS-based methods do not need to infer the probability distributions of the sources or nonlinear activation functions [234, 236].

In this and the next section, we describe several batch methods that exploit spatio-temporal decorrelation to estimate (or identify) the mixing matrix in the presence of spatially correlated but temporally white noise (which is not necessarily Gaussian). Moreover, we show that for a suitable set of time-delayed correlations of the observation data, we can find a robust (with respect to additive noise) estimate of the mixing matrix $H$. Throughout this and the next sections the following assumptions are made unless otherwise stated:

(AS1) The mixing matrix $H$ is of full column rank.

(AS2) Sources are spatially uncorrelated, with different autocorrelation functions, but are temporally correlated (colored) zero-mean stochastic signals.

(AS3) Sources are stationary signals and/or second-order nonstationary signals in the sense that their variances are time varying.

(AS4) The additive noises $\{\nu_i(k)\}$ are independent of the source signals, and they can be spatially correlated but temporally white, i.e.,

E\{\nu(k) \nu^T(k-p)\} = \delta_{p0} R_\nu,    (4.69)

where $\delta_{p0}$ is the Kronecker delta and $R_\nu$ is an arbitrary $m \times m$ matrix.


4.2.2 Basic Principles: Simultaneous Diagonalization and Eigenvalue Decomposition

Taking into account the above assumptions, it is straightforward to check that the correlation matrices of the vector $x(k)$ of sensor signals satisfy

R_x(0) = E\{x(k) x^T(k)\} = H R_s(0) H^T + R_\nu(0),    (4.70)

R_x(p) = E\{x(k) x^T(k-p)\} = H R_s(p) H^T,    (4.71)

for some non-zero time lag $p$. It follows from assumption (AS2) that both $R_s(0) = E\{s(k) s^T(k)\}$ and $R_s(p) = E\{s(k) s^T(k-p)\}$ are non-zero distinct diagonal matrices.

In the case of overdetermined mixtures (more sensors than sources), when the covariance matrix of the noise has the special form $R_{\nu\nu} = R_\nu(0) = E\{\nu(k) \nu^T(k)\} = \sigma_\nu^2 I_m$, the noise variance $\sigma_\nu^2$ can be estimated for relatively high SNR (signal-to-noise ratio) from the least singular value of $R_x(0)$ (or the average of the $(m-n)$ minor singular values of $R_x(0)$), and the unbiased covariance matrix can be estimated as

\hat{\bar{R}}_x(0) = \hat{R}_x(0) - \sigma_\nu^2 I_m = H R_s(0) H^T.    (4.72)

In order to estimate the mixing matrix $H$ up to a re-scaled and permuted version, we can perform the simultaneous diagonalization of two covariance matrices, $\hat{R}_x(0)$ and $\hat{R}_x(p)$, according to (4.71) and (4.72).

For the sake of simplicity, the simultaneous diagonalization will be explained first for the case when the number of sensors is equal to the number of sources ($m = n$). It4 can be performed in two steps: orthogonalization followed by a unitary transformation, as shown below.

(1) First, the covariance matrix $\hat{\bar{R}}_x(0) = (1/N) \sum_{k=1}^{N} x(k) x^T(k) - \hat{\sigma}_\nu^2 I_n$ is estimated and its EVD is performed as $\hat{\bar{R}}_x(0) = V_x \Lambda_x V_x^T$. Then the standard whitening is realized by the linear transformation

\bar{x}(k) = Q x(k) = \Lambda_x^{-1/2} V_x^T x(k),    (4.73)

where $Q = \Lambda_x^{-1/2} V_x^T$. Hence, we have

\hat{R}_{\bar{x}}(0) = \frac{1}{N} \sum_{k=1}^{N} \bar{x}(k) \bar{x}^T(k) = Q \hat{R}_x(0) Q^T = I_n,    (4.74)

\hat{R}_{\bar{x}}(p) = \frac{1}{N} \sum_{k=1}^{N} \bar{x}(k) \bar{x}^T(k-p) = Q \hat{R}_x(p) Q^T.    (4.75)

(2) Second, an orthogonal transformation is applied to diagonalize the matrix $\hat{R}_{\bar{x}}(p)$. The eigenvalue decomposition of $\hat{R}_{\bar{x}}(p)$ has the form

\hat{R}_{\bar{x}}(p) = V_{\bar{x}} \Lambda_{\bar{x}} V_{\bar{x}}^T.    (4.76)

4 In simultaneous diagonalization, the task is to diagonalize only two matrices simultaneously. In contrast, in the joint diagonalization problem we attempt to diagonalize (approximately) an arbitrary number of matrices.

Simultaneously, on the basis of (4.71) and (4.75), we obtain

\hat{R}_{\bar{x}}(p) = Q \hat{R}_x(p) Q^T = Q H \hat{R}_s(p) H^T Q^T.    (4.77)

Hence, if the diagonal matrix $\Lambda_{\bar{x}}$ has distinct eigenvalues, then the mixing matrix can be estimated uniquely (up to sign and permutation matrices) as (see Theorem 4.1 below)

\hat{H} = Q^{-1} V_{\bar{x}} = V_x \Lambda_x^{1/2} V_{\bar{x}}.    (4.78)

The simultaneous diagonalization of two symmetric matrices can be carried out without going through the two-step procedure, by converting the problem to a generalized eigenvalue decomposition (GEVD) [1159, 223]. In fact, the problem can easily be converted to the standard eigenvalue problem, which can be formulated for a nonsingular mixing matrix $H$ as (see Eqs. (4.71)-(4.72))

\hat{R}_x^{-1}(0) \hat{R}_x(p) = (H^T)^{-1} R_s^{-1}(0) R_s(p) H^T = V \Lambda V^{-1},    (4.79)

or equivalently the generalized eigenvalue problem

\hat{R}_x(p) V = \hat{R}_x(0) V \Lambda,    (4.80)

on the condition that $\Lambda = R_s^{-1}(0) R_s(p)$ has distinct eigenvalues. Then the mixing matrix $H$ can be estimated on the basis of the eigenvectors of the GEVD (4.80) as

\hat{H} = (V^T)^{-1} = V^{-T},    (4.81)

up to arbitrary scaling and permutation of columns.
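For concreteness, here is a short numpy/scipy sketch of the GEVD route (4.80)-(4.81); the sample-covariance estimators, the symmetrization of the lagged covariance, and all names are illustrative choices rather than the book's prescription.

```python
# Sketch of blind identification via the GEVD (4.80)-(4.81).
import numpy as np
from scipy.linalg import eigh

def gevd_identify(X, p=1):
    """X: n x N sensor data; returns H_hat = V^{-T} up to scaling/permutation."""
    n, N = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    R0 = Xc @ Xc.T / N                       # R_x(0), symmetric positive definite
    Rp = Xc[:, p:] @ Xc[:, :-p].T / (N - p)  # R_x(p), lag-p covariance
    Rp = (Rp + Rp.T) / 2                     # symmetrize the finite-sample estimate
    lam, V = eigh(Rp, R0)                    # solves R_x(p) V = R_x(0) V Lambda
    return np.linalg.inv(V.T)                # Eq. (4.81): H_hat = (V^T)^{-1}
```

In line with the condition below (4.80), the eigenvalues lam should be checked for distinctness before the estimate is trusted.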

These basic results can be explained and summarized by the following Theorem [1159, 223].

Theorem 4.1 Let $\Lambda_1, \Lambda_2, D_1, D_2 \in \mathbb{R}^{n \times n}$ be diagonal matrices with non-zero diagonal entries, where additionally the matrices $\Lambda_1$ and $D_1$ are positive definite. Suppose that $G \in \mathbb{R}^{n \times n}$ satisfies the following decompositions:

D_1 = G \Lambda_1 G^T,    (4.82)

D_2 = G \Lambda_2 G^T.    (4.83)

Then the matrix $G$ is a generalized permutation matrix5 if $D_1^{-1} D_2$ and $\Lambda_1^{-1} \Lambda_2$ have distinct diagonal entries.

Proof. From (4.82), there exists an orthogonal matrix $U$ such that

G \Lambda_1^{1/2} = D_1^{1/2} U.    (4.84)

5 A generalized permutation matrix is defined as $G = P D$, where $P$ is a standard permutation matrix and $D$ is any nonsingular diagonal matrix.


Hence,

G = D_1^{1/2} U \Lambda_1^{-1/2}.    (4.85)

Substitute (4.85) into (4.83) to obtain

D_1^{-1} D_2 = U \Lambda_1^{-1} \Lambda_2 U^T.    (4.86)

Since the right-hand side of (4.86) is an eigen-decomposition of the matrix $D_1^{-1} D_2$, the diagonal elements of $D_1^{-1} D_2$ and $\Lambda_1^{-1} \Lambda_2$ are the same. From the assumption that the diagonal elements of $D_1^{-1} D_2$ are distinct, the orthogonal matrix $U$ must have the form $U = P S_g$, where $P$ is a permutation matrix and $S_g$ is a diagonal matrix whose diagonal elements are either $+1$ or $-1$. Hence, we have

G = D_1^{1/2} P S_g \Lambda_1^{-1/2} = P P^T D_1^{1/2} P S_g \Lambda_1^{-1/2} = P D_0,    (4.87)

where $D_0$ is a diagonal matrix expressed as

D_0 = P^T D_1^{1/2} P S_g \Lambda_1^{-1/2}.    (4.88)

Remark 4.4 For successful source separation, we may choose any time delay $p$ for which $\hat{R}_x^{-1}(0) \hat{R}_x(p)$ has non-zero distinct eigenvalues. We have found, by extensive experiments, that for typical real-world signals a good choice is usually $p = 1$. It is also possible to choose a linear combination $\sum_i \alpha_i \hat{R}_x(i)$ instead of $\hat{R}_x(p)$.
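The eigenvalue-distinctness check in Remark 4.4 is easy to automate; the gap criterion below and all names are our own illustrative choices, not a procedure from the book.

```python
# Illustrative lag check for Remark 4.4: the eigenvalues of Rx(0)^{-1} Rx(p)
# should be non-zero and well separated before the lag p is used.
import numpy as np

def min_eigenvalue_gap(X, p):
    """Smallest gap between the sorted eigenvalues of Rx(0)^{-1} Rx(p)."""
    n, N = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    R0 = Xc @ Xc.T / N
    Rp = Xc[:, p:] @ Xc[:, :-p].T / (N - p)
    w = np.linalg.eigvals(np.linalg.solve(R0, (Rp + Rp.T) / 2))
    return np.diff(np.sort(np.real(w))).min()

# e.g. pick the delay with the best-separated eigenvalues among candidates:
# p_best = max(range(1, 10), key=lambda p: min_eigenvalue_gap(X, p))
```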

It is also important to note that, instead of using the generalized eigenvalue decomposition, we can use the standard eigenvalue decomposition (EVD) or, equivalently, the singular value decomposition (SVD) in a two-stage procedure, described in detail below [1159].

Algorithm Outline: Two-stage EVD/SVD for more sensors than sources

1. Estimate the correlation matrix of the sensor signals as

\hat{R}_x(0) = \frac{1}{N} \sum_{k=1}^{N} x(k) x^T(k).    (4.89)

2. Compute the EVD (or, equivalently, the SVD) of $\hat{R}_x(0)$ as (see Section 4.1.8)

\hat{R}_x(0) = U_x \Sigma_x V_x^T = V_x \Lambda_x V_x^T = V_S \Lambda_S V_S^T + V_N \Lambda_N V_N^T,    (4.90)

where $V_S = [v_1, v_2, \ldots, v_n] \in \mathbb{R}^{m \times n}$ contains the eigenvectors associated with the $n$ principal eigenvalues of $\Lambda_S = \mathrm{diag}\{\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n\}$ in descending order. Similarly, the matrix $V_N \in \mathbb{R}^{m \times (m-n)}$ contains the $(m-n)$ noise eigenvectors that correspond to the noise eigenvalues $\Lambda_N = \mathrm{diag}\{\lambda_{n+1} \geq \cdots \geq \lambda_m\}$, with $\lambda_n > \lambda_{n+1}$. It should be noted that the eigenvalues typically satisfy $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n > \lambda_{n+1} \approx \cdots \approx \lambda_m$, $m > n$. This means that the last $(m-n)$ non-significant (minor) eigenvalues correspond to the noise subspace and the first significant (principal) eigenvalues correspond to the signal-plus-noise subspace. Estimate the number of sources $n$ from the number of most significant singular values.

3. Estimate also the variance $\sigma_\nu^2$ of the white noise as the mean value of the $(m-n)$ least significant eigenvalues (or singular values).

4. Perform a prewhitening transformation that is robust with respect to the white noise:

\bar{x}(k) = \hat{\Lambda}_S^{-1/2} V_S^T x(k) = Q x(k),    (4.91)

where $\hat{\Lambda}_S = \mathrm{diag}\{(\lambda_1 - \hat{\sigma}_\nu^2), (\lambda_2 - \hat{\sigma}_\nu^2), \ldots, (\lambda_n - \hat{\sigma}_\nu^2)\}$.

5. Estimate the covariance matrix of the vector $\bar{x}(k)$ for a specific time delay $p \neq 0$ (typically, $p = 1$ gives the best results) and perform the SVD of this covariance matrix:

\hat{R}_{\bar{x}}(p) = \frac{1}{N} \sum_{k=1}^{N} \bar{x}(k) \bar{x}^T(k-p) = U_{\bar{x}} \Sigma_{\bar{x}} V_{\bar{x}}^T.    (4.92)

6. Check whether, for the specific time delay $p$, all singular values of the diagonal matrix $\Sigma_{\bar{x}}$ are distinct. If not, repeat step 5 for a different time delay $p$.

If the singular values are distinct and sufficiently far from each other, then we can successfully estimate the mixing matrix as

\hat{H} = Q^+ U_{\bar{x}} = V_S \hat{\Lambda}_S^{1/2} U_{\bar{x}}    (4.93)

and, if necessary, the noisy source signals6 as

y(k) = \hat{s}(k) = U_{\bar{x}}^T \bar{x}(k) = U_{\bar{x}}^T \hat{\Lambda}_S^{-1/2} V_S^T x(k).    (4.94)
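A compact numpy sketch of the six steps above follows; the function signature, the symmetrization of the lagged covariance before the SVD, and the omission of the step-6 distinctness check are our simplifying assumptions.

```python
# Sketch of the two-stage EVD/SVD (AMUSE-type) procedure, steps 1-6 above.
import numpy as np

def amuse(X, n, p=1):
    """X: m x N sensor data, n sources, lag p. Returns (H_hat, s_hat)."""
    m, N = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    # Steps 1-2: EVD of Rx(0), eigenpairs sorted in descending order
    lam, V = np.linalg.eigh(Xc @ Xc.T / N)
    lam, V = lam[::-1], V[:, ::-1]
    # Step 3: noise variance from the (m - n) minor eigenvalues
    sigma2 = lam[n:].mean() if m > n else 0.0
    # Step 4: robust prewhitening, Eq. (4.91)
    Q = np.diag(1.0 / np.sqrt(lam[:n] - sigma2)) @ V[:, :n].T
    Z = Q @ Xc
    # Step 5: SVD of the lag-p covariance of the whitened data, Eq. (4.92)
    Rp = Z[:, p:] @ Z[:, :-p].T / (N - p)
    U, s, _ = np.linalg.svd((Rp + Rp.T) / 2)
    # Step 6 would verify that the singular values s are distinct (omitted)
    H_hat = V[:, :n] @ np.diag(np.sqrt(lam[:n] - sigma2)) @ U  # Eq. (4.93)
    s_hat = U.T @ Z                                            # Eq. (4.94)
    return H_hat, s_hat
```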

It should be noted that if both covariance matrices $\hat{R}_x(0)$ and $\hat{R}_{\bar{x}}(p)$ are symmetric positive definite, then $U_x = V_x$ and $U_{\bar{x}} = V_{\bar{x}}$, respectively, and the SVD and PCA/EVD techniques are equivalent.

The above procedure is a modified and optimized version of the algorithm called AMUSE (Algorithm for Multiple Unknown Signals Extraction) [1159, 854]. Usually, for the single-sample time delay $p = 1$, the above algorithm successfully separates colored sources

6 The estimated sources will be recovered without cross-talk due to the unbiased estimation of the unmixing matrix. However, they will be corrupted by additive noise, since the noise is projected from the sensor signals by the linear transformation (4.94). In order to remove the noise, we need to apply the methods described in Chapters 1 and 8.

with different power spectra shapes. This means that in such a case the eigenvalues of the time-delayed covariance matrix are distinct. The main disadvantage of this algorithm is that its accuracy deteriorates strongly in the presence of additive noise.

The AMUSE algorithm for BSS of colored sources can be naturally extended to the ICA of independent non-Gaussian source signals if, instead of the standard time-delayed covariance matrices $R_x(p)$, we use the contracted quadricovariance matrices defined as

C_x(E) = \mathrm{Cum}\{x^T(k) E x(k)\, x(k) x^T(k)\}
       = E\{x^T(k) E x(k)\, x(k) x^T(k)\} - R_x(0) E R_x(0) - \mathrm{tr}(E R_x(0))\, R_x(0) - R_x(0) E^T R_x(0),    (4.95)

where $R_x(0) = E\{x(k) x^T(k)\}$7 and $E \in \mathbb{R}^{n \times n}$ is some freely chosen matrix called the eigen-matrix (typically, $E = I_n$ or $E = e_q e_q^T$, where the $e_q$ are vectors of some unitary matrix) [136, 137].

It can easily be shown that such a matrix has the following eigenvalue decomposition (EVD):

C_x(E) = U \Lambda_E U^T,    (4.96)

with $\Lambda_E = \mathrm{diag}\{\lambda_1 u_1^T E u_1, \ldots, \lambda_n u_n^T E u_n\}$, where $\lambda_i = \kappa_4(s_i) = E\{s_i^4\} - 3 E^2\{s_i^2\}$ is the kurtosis of the zero-mean $i$-th source and $u_i$ is the $i$-th column of the orthogonal eigenvector matrix $U$. Hence, if the EVD of $C_x(E) = U \Lambda_E U^T = \hat{A} C_s(E) \hat{A}^T$ is unique in the sense that all eigenvalues of $\Lambda_E$ are distinct, we can estimate the mixing matrix as $\hat{A} = Q H = U$. In the special case $E = I_n$, these conditions are satisfied if the source signals have different values of kurtosis. The above procedure is called FOBI (Fourth-Order Blind Identification) [136, 137, 883, 595].
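As an illustration, here is a hedged numpy sketch of FOBI with $E = I_n$ applied to prewhitened data; since $R_{\bar{x}}(0) = I_n$ for prewhitened data (see footnote 7), the last three terms of (4.95) collapse to $(n+2) I_n$. The names and interface are our assumptions.

```python
# Sketch of FOBI with E = I_n on prewhitened data Z (n x N), Rz(0) ~= I_n.
import numpy as np

def fobi_rotation(Z):
    """Orthogonal U from the EVD (4.96) of the quadricovariance C_z(I_n)."""
    n, N = Z.shape
    # E{ z^T z  z z^T }: weight every sample by its squared norm
    C = (Z * (Z * Z).sum(axis=0)) @ Z.T / N
    # for Rz(0) = I_n the Gaussian terms of (4.95) reduce to (n + 2) I_n
    C -= (n + 2) * np.eye(n)
    lam, U = np.linalg.eigh(C)  # eigenvalues relate to the source kurtoses
    return U, lam
```

The mixing matrix can then be recovered as in the two-stage procedure, e.g. as $Q^+ U$, provided the source kurtoses (and hence the eigenvalues) are distinct.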

Remark 4.5 The main advantage of using fourth-order quadricovariance matrices is their theoretical insensitivity to arbitrary Gaussian noise. Furthermore, the HOS-based techniques enable us to identify the mixing system when the sources are i.i.d. and mutually independent. However, it should be emphasized that the standard time-delayed covariance matrices can be estimated accurately with far fewer data samples than their higher-order counterparts. In such cases, when the number of available samples is relatively low, working with SOS-based instead of HOS-based techniques is advantageous, especially in a time-varying environment.

The above algorithms, based on time-delayed covariance matrices and the symmetric EVD/SVD and GEVD, are probably the simplest batch algorithms for blind identification and blind separation of sources with temporal structure. However, their robustness with respect to noise and their performance can be poor, especially when the additive noise is large or we are not able to estimate the covariance matrix of the noise precisely. In order to alleviate the problem, we can use two covariance matrices, $\hat{R}_x(p_1)$ and $\hat{R}_x(p_2)$, for non-zero time delays ($p_1 \neq p_2$, both non-zero). Since the noise vector was assumed to be temporally white, the

7 For the prewhitened data we have $R_x(0) = I_n$.

covariance matrices $\hat{R}_x(p_1)$ and $\hat{R}_x(p_2)$ are not affected by the noise vector, i.e.,

\hat{R}_x(p_1) = H \hat{R}_s(p_1) H^T,

\hat{R}_x(p_2) = H \hat{R}_s(p_2) H^T,

for any time delay different from zero. Thus, it is possible to obtain a robust estimate of the unmixing matrix, regardless of the probability distributions and spatial structure of the noise vector [223, 224]. However, to perform the GEVD or the EVD with robust prewhitening, one of the matrices $R_x(p_1)$ or $R_x(p_2)$ must be positive definite, which is not guaranteed for an arbitrary time delay. So a new problem arises: how to select optimal time delays such that at least one of the covariance matrices is symmetric positive definite. Furthermore, the described algorithms exploit only two different correlation matrices of the observation vector, so their performance is degraded if some eigenvalues of $R_s(p)$ are close to each other. In order to avoid these drawbacks, we should rather use a larger set of time-delayed correlation matrices for various time lags, as explained in the next sections of this chapter.
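Under the stated assumptions, a sketch of this two-delay variant might look as follows; since neither delayed covariance is guaranteed to be positive definite, the general (non-symmetric) generalized eigensolver is used, echoing the caveat above. The delays and names are illustrative.

```python
# Sketch of the noise-robust variant: GEVD on the pencil (Rx(p1), Rx(p2)),
# both matrices being unaffected by spatially correlated, temporally white noise.
import numpy as np
from scipy.linalg import eig

def lagged_cov(Xc, p, N):
    """Symmetrized lag-p sample covariance of zero-mean data Xc (n x N)."""
    Rp = Xc[:, p:] @ Xc[:, :-p].T / (N - p)
    return (Rp + Rp.T) / 2

def gevd_two_delays(X, p1=1, p2=2):
    """X: n x N sensor data; returns H_hat = V^{-T} up to scaling/permutation."""
    n, N = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)
    w, V = eig(lagged_cov(Xc, p1, N), lagged_cov(Xc, p2, N))
    return np.linalg.inv(np.real(V).T)  # eigenvectors may be complex in practice
```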

4.3 IMPROVED SOS BLIND IDENTIFICATION ALGORITHMS BASED ON
