
Dual microphone based echo postfilter


6.3.2 Linear echo processing

Experiments involve the same signals as in Chapter 5. The impulse responses recorded with our mock-up phone are used to generate a test database of speech signals with SERs ranging from −5 to 10 dB on the primary microphone. Linear and non-linear echo suppression are assessed in Sections 6.3.2 and 6.3.3 respectively. The DM residual echo PSD estimate considered in this section is that of Section 6.1.
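As an illustration of how a test signal at a prescribed SER can be built, the following minimal sketch scales an echo signal before it is added to the near-end speech. It assumes that the SER is the ratio of near-end speech power to echo power measured over the whole signal on the primary microphone; the function names are hypothetical and the procedure is not necessarily the one used to build our database.

```python
import math

def power(signal):
    """Mean power of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)

def ser_db(near_end, echo):
    """Signal to echo ratio in dB (assumed definition: power ratio)."""
    return 10.0 * math.log10(power(near_end) / power(echo))

def scale_echo_for_ser(near_end, echo, target_ser_db):
    """Return a scaled copy of `echo` so that the SER equals target_ser_db."""
    p_near = power(near_end)
    p_echo = power(echo)
    if p_near == 0.0 or p_echo == 0.0:
        raise ValueError("both signals must have non-zero power")
    # Required echo power is p_near / 10^(SER/10); the gain is the square
    # root of the ratio between required and current echo power.
    gain = math.sqrt(p_near / (p_echo * 10.0 ** (target_ser_db / 10.0)))
    return [gain * e for e in echo]

def mix(near_end, echo):
    """Microphone signal: sample-wise sum of near-end speech and echo."""
    return [s + e for s, e in zip(near_end, echo)]
```

Calling `scale_echo_for_ser` once per target value (for instance −5, 0, 5 and 10 dB) and mixing the result with the near-end speech gives one microphone signal per SER.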

6.3.2.1 Influence of the near-end RTF estimate

The proposed DM echo estimate depends on the near-end and echo RTFs. In Chapter 5 (see Section 5.4.2) two near-end RTF estimates are presented: $\hat{\Theta}_{MMSE}$ and $\hat{\Theta}_{LS}$. Figure 6.3 shows the influence of the different near-end RTF estimators on the ERLE and SA. The ERLE and SA for the SM case are shown for reference. We can see from Figure 6.3 (a) that between t = 8.3 s and t = 9.5 s, the DM systems achieve less echo suppression than the SM system. This portion corresponds to the beginning of the echo periods, where the echo RTF has not yet reached its steady state. Indeed, the echo RTF is computed through the cross-PSDs $\Phi_{xe_1}$ and $\Phi_{xy_2}$, which are in turn estimated through autoregressive smoothing.

At the beginning of the echo period, these PSDs are still in their transient state. As explained in Section 6.1.1, the echo RTF accuracy has a strong impact on the amount of echo suppression.
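The transient behavior can be illustrated with a minimal sketch of first-order recursive smoothing for a single frequency bin. The recursion form, the zero initial value and the smoothing constant `alpha` are assumptions made for illustration only; they are not the exact settings of our system.

```python
def smooth_cross_psd(x_frames, y_frames, alpha=0.9):
    """Recursively smoothed cross-PSD for one frequency bin.

    Assumed recursion: phi(m) = alpha * phi(m-1) + (1 - alpha) * X(m) * conj(Y(m)),
    starting from phi = 0. Returns the estimate after every frame.
    """
    phi = 0j
    track = []
    for x, y in zip(x_frames, y_frames):
        phi = alpha * phi + (1.0 - alpha) * x * y.conjugate()
        track.append(phi)
    return track

# Stationary toy input: the true cross-PSD is X * conj(Y) = 2 in every frame.
frames = 50
track = smooth_cross_psd([1 + 0j] * frames, [2 + 0j] * frames, alpha=0.9)
# With a zero initial value the estimate after m frames is 2 * (1 - alpha**m),
# so the first frames strongly underestimate the steady-state value.
```

With this recursion, a larger `alpha` gives a smoother estimate but a longer transient, which is consistent with the reduced echo suppression observed at the onset of echo periods.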

Figure 6.3 also shows that, whichever near-end RTF estimate is used, the ERLE and SA curves for the DM postfilter are very similar. In Section 6.1.1, we showed that SA during DT is partly due to errors in the near-end RTF. Although the LS estimate $\hat{\Theta}_{LS}$ is shown in [Shalvi and Weinstein, 1996] to be unbiased, and is therefore mathematically more accurate than the MMSE estimate, it does not significantly improve the performance of our DM PSD estimate. Given the significant computational complexity and memory requirements of $\hat{\Theta}_{LS}$, only $\hat{\Theta}_{MMSE}$ is used in further experiments.

6.3.2.2 Residual echo PSD accuracy

Figure 6.4 illustrates the error in the residual echo PSD estimator during echo-only and double-talk periods. For both echo-only and DT periods, the DM estimator outperforms the SM estimator. The curves also show that the error of the DM estimate increases with the SER. For the echo-only periods, this increase is very small (less than 1 dB over the whole range of SERs considered). In our opinion, this increase is due to the threshold on the denominator of the DM estimate (Equation 6.33). For the DT periods, the increase is more pronounced and can be explained by the presence of near-end speech, which disturbs the PSD estimate. Moreover, a high SER implies a high level of near-end speech compared to echo (and therefore to residual echo) and thus a greater disturbance of the residual echo estimators. The loss of performance of the DM estimate can be explained by the fact that, during DT, the near-end speech disturbs the estimate of the echo RTF, because the cross-PSDs used for its computation contain a component that depends on the near-end speech signal. As the SER increases, this near-end component of the cross-PSDs grows, which leads to a less accurate estimate of the echo RTF.

It is also of interest to note that there is an increase in the DM PSD error between echo-only and DT periods. This corroborates the analysis in Section 6.1.1 which shows that there are some near-end speech components in the residual echo estimate during DT.
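To make the notion of PSD estimation error on a logarithmic scale concrete, the sketch below computes a mean absolute difference in dB between a reference PSD and its estimate. This is one common way of defining a log error and is given as an assumption only; it is not necessarily the exact measure plotted in Figure 6.4. The function name and the regularization constant `eps` are hypothetical.

```python
import math

def log_error_db(true_psd, est_psd, eps=1e-12):
    """Mean absolute difference between two PSDs on a dB scale.

    `true_psd` and `est_psd` hold one non-negative value per
    time-frequency point; `eps` avoids taking the log of zero.
    """
    if len(true_psd) != len(est_psd) or not true_psd:
        raise ValueError("PSD sequences must be non-empty and of equal length")
    total = 0.0
    for p_true, p_est in zip(true_psd, est_psd):
        total += abs(10.0 * math.log10((p_est + eps) / (p_true + eps)))
    return total / len(true_psd)
```

With such a measure, overestimation and underestimation by the same factor contribute equally, so a lower value indicates a more accurate estimator regardless of the sign of the error.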

6.3.2.3 Echo suppression

Figure 6.5 shows the ERLE and SA curves for echo-only and DT periods respectively.

The ERLE curves in Figure 6.5 (a) show that the DM echo postfilter achieves more echo suppression than the SM postfilter. This is a direct consequence of the accuracy of the PSD estimate during echo-only periods. The SA curves show that for the DM case, attenuation of the near-end speech increases with the SER, while for the SM case the attenuation decreases. Such increases in the SA are undesirable because they push the system towards half-duplex behavior.

The DM postfilter nevertheless introduces less attenuation (up to 5 dB less) than the SM postfilter.
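For reference, both measures can be viewed as level ratios in dB, as in the minimal sketch below. It assumes the usual definitions: ERLE as the ratio of echo power before and after processing, and SA as the ratio of near-end speech power before and after processing. It also assumes that the echo and near-end components are available separately, which is possible with simulated data; the function names are hypothetical.

```python
import math

def level_ratio_db(reference, processed):
    """10*log10 of the mean power of `reference` over that of `processed`."""
    p_ref = sum(s * s for s in reference) / len(reference)
    p_out = sum(s * s for s in processed) / len(processed)
    if p_ref == 0.0 or p_out == 0.0:
        raise ValueError("signals must have non-zero power")
    return 10.0 * math.log10(p_ref / p_out)

def erle_db(echo_in, echo_out):
    """Echo attenuation: higher values mean more echo suppression."""
    return level_ratio_db(echo_in, echo_out)

def sa_db(speech_in, speech_out):
    """Near-end speech attenuation: lower values mean less speech distortion."""
    return level_ratio_db(speech_in, speech_out)
```

A good postfilter therefore combines a high ERLE with a low SA; the two curves have to be read together.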

Informal listening tests show that the DM postfilter yields slightly better near-end speech intelligibility than the SM postfilter during double-talk periods. The high SA introduced by the SM postfilter is perceptible and sometimes leads to complete suppression of the speech.

Figure 6.4: Echo PSD estimation error for the SM and Linear DM estimators, plotted as log error in dB against signal to echo ratio (SER) in dB. (a) PSD accuracy in echo-only periods. (b) PSD accuracy in double-talk periods.

6.3.2.4 Assessment with data from real device

Figure 6.6 shows the ERLE and SA curves measured with data from a real device. The SA is measured in a similar way as in Section 5.5.5. Only the ranking of the compared methods and the SA difference between them are meaningful. We see that the SM system achieves the most echo suppression; this observation is confirmed by the spectrograms in Figure 6.7. Nevertheless, it is of interest to note that for both methods, the echo component is completely suppressed during echo-only periods. Bearing in mind that in a real-time system the postfiltering gains are always floored so that the output level matches that of the ambient noise or comfort noise, we can state that the Linear DM method achieves enough echo suppression.
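The effect of gain flooring can be sketched as follows. This is a simplified illustration with a fixed floor expressed in dB; the floor value and the function name are hypothetical, and a real system would rather derive the floor from the ambient or comfort noise level.

```python
def floor_gains(gains, floor_db=-30.0):
    """Clamp per-bin postfilter gains (linear amplitude) to a minimum value.

    `floor_db` is the lowest attenuation allowed, in dB; any suppression
    beyond this value is not applied.
    """
    g_min = 10.0 ** (floor_db / 20.0)
    return [max(g, g_min) for g in gains]

# Gains of two postfilters for three bins: once floored at -20 dB, the
# additional suppression of the first one in bin 0 is no longer visible.
strong = floor_gains([0.001, 0.5, 1.0], floor_db=-20.0)
weaker = floor_gains([0.05, 0.5, 1.0], floor_db=-20.0)
```

In this toy case both postfilters end up with the same floored gains, which illustrates why ERLE differences beyond the floor do not translate into an audible difference.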

Figure 6.5: Echo suppression performance of the SM and Linear DM postfilters against signal to echo ratio (SER) in dB. (a) Average ERLE in dB. (b) Average SA in dB during double-talk periods.

The spectrograms in Figure 6.7 and the SA curve in Figure 6.6 (b) show the clear superiority of the Linear DM method during double-talk periods. We see from the spectrograms that many of the near-end speech components are lost with the SM system.