• Aucun résultat trouvé

Echo cancellation for dual channel terminals

5.5 Performances of the proposed double-talk detector and gain rulegain rule

5.5.4 Performance with data from mock-up phone

In this section, we present a more global analysis of our systems using a large database of speech signals. Impulse responses measured with the handsfree mock-up phone are used to generate a database of test signals with SER ranging from -5dB to 10dB at the primary microphone. Our DM approaches are assessed in this comparison which includes 3 differents systems:

• Since, it is difficult to find a consistent DM echo processing in the litterature for comparison with our system, we design a baseline DM approach based on the

com-98 5. Echo cancellation for dual channel terminals

(a) Echo suppression during echo-only period

28.5 29 29.5 30 30.5 31 31.5

(b) Speech attenuation during double-talk period

Figure 5.12: Influence of near-end RTF estimate. Impulse responses used are measured in office environment

parison of the output signal of SM echo processing scheme to each microphone path.

Therefore, this baseline approach outputs two signals ˆs1(n) and ˆs2(n) which we de-noteSMsˆ1(n) andSMsˆ2(n) respectively. In a real-time implementation, one would need to choose between both outputs: such a choice can be done in real-time using an additional decision block or offline (i.e. when designing our echo processing sys-tem). In our case, we use both outputs as reference system for comparison with our system. Advantages and inconveniences of each output will be discussed.

• We use our DTD as in Section 5.5.2 to control a SM echo postfilter. In this case, we apply the echo processing on the primary microphone and denote this output SMˆs1(n)−DT D: this output can be seen as an improved version of the baseline SMs1(m) system.

• Lastly, our new gain rule as tuned in Section 5.5.3 is considered in our experiments.

We denote this method DM.

5.5.4.1 Performance of proposed DTD and gain rule

Figures 5.13 (a) and 5.13 (b) show the ERLE and SA as functions of the SER. The baseline SMs2(n) achieves less echo suppression than the baseline SM s2(n) but achieves less SA during DT periods. The difference in performance between both systems is due

−5 0 5 10

Signal to echo ratio (SER) in dB

ERLE in dB

Signal to echo ratio (SER) in dB

SA in dB

Figure 5.13: Objective performance against SER

to transducers arrangement on the mock-up phone. The secondary microphone is closer to the loudspeaker than the primary. As a result, the secondary microphone will capture more echo than the primary microphone. This will lead to high echo suppression but also strong distortions of the near-end speech signal for the secondary microphone. At some point, the baseline SM s1(m) offers a better trade-off between the amount of echo suppression and distortions during DT periods than baseline SMs2(m)

In terms of ERLE, we observe that the use of the DTD permits to significantly improve the performance of the baseline SM s1(n) solution: an improvement of more than 15dB is observed. The ERLE curves in Figure 5.13 (a) also shows that the DM achieve slightly less echo suppression than the baseline SM s1(n) solution: there is a difference of about 4dB. The SA curves in Figure 5.13 (b) shows that the PLD gain rule achieves the best performance. Although the DM does not achieve as much echo suppression that the other methods, it remains very interesting for its double-talk behavior. The use of the

100 5. Echo cancellation for dual channel terminals DTD in the baseline SMs1(m) system permits to reduce its SA of about 1-3dB. This SA improvement can be explained by the two-step overestimation factor which is used and controlled by the DTD.

5.5.4.2 Impact of the position of the secondary microphone

We mentioned in Section 5.3 that placing the secondary microphone closer to the loud-speaker could improve the performance of our DTD. The recorded impulse response are used to generate a dataset of signals accounting for different distances between the loud-speaker and the secondary microphone. It is generally admitted that in non-reverberant environment as it is case in our recording, the amplitude of the min delay of an acoustic path is inversely proportional to the distance between the source and the microphone [Ha-bets, 2007]. The distance d0xy2 between the loudspeaker and the secondary microphone of our mock-up is about 5cm. We simulate two other positions of the secondary microphone dxy2 (2cm and 1cm) by simply amplifying the impulse responsehmod2 (n) proportionally to the inverse of the target distancedxy2:

hmod2 (n) = d0xy2 dxy2

·h2(n). (5.111)

The distance dsy2 between the mouth of the mannequin and secondary microphone is equal to 30 or 50cm. Moving the secondary microphone for 5cm as it is the case here will have a significant impact on g2(n). For this reason, we assume that moving the secondary microphone towards the loudspeaker will only modify the echo path: h2(n) then becomes hmod2 (n). The amplified impulse response hmod2 (n) is used in conjunction with the measured impulse responsesh1(n),g1(n) andg2(n) to generate a dataset of test signals. The resulting dataset contains microphone signals with SER which we set ranging from -5dB to 10dB for the primary microphone.

Figures 5.14 (a) and 5.14 (b) show the ERLE and SA performance as functions of the distance between the loudspeaker and the secondary microphone. Once more the baseline SM s1(n) system gives a better tradeoff between echo suppression and distortion during DT periods than the baseline SM s2(n). Performance of the baseline SMs2(n) decreases (reduction of the amount of echo suppression and increase of amount of distortion in DT) as dxy2 decreases. The poor echo attenuation for small values of dxy2 is due to that the more the secondary microphone gets closer to the loudspeaker, the more clipping might occur on this microphone. As a consequence, the baseline SM s2(n) lead to poor echo suppression due to the presence of clipping (which is seem as non-linearity in our echo processing scheme).

As the secondary microphone is moved closer to the loudspeaker the amount of echo suppression achieved by the DM solutions (SM combined with DTD and PLD gain rule) increases. The gap between the DM and the SM s1(n) + DTD systems reduces as the secondary microphone gets closer to the loudspeaker. For all DM solutions considered, the SA increases as the distancedxy2 decreases. As we can see in Figure 5.14 (b), the DM system clearly outperforms the other methods in terms of SA: such differences will clearly be noticeable during a conversation. The PLD gain rule will potentially result in better speech quality, better intelligibility.

As the secondary microphone gets closer to the loudspeaker, the more saturation might occur on the secondary microphone. Experiments show that saturations (i.e. clipping) of the secondary microphone do not altered the performance of the DM solutions (SM

1 1.5 2 2.5 3 3.5 4 4.5 5 40

50 60 70 80 90 100 110 120

Distance in cm

ERLE in dB

SM ˆs1(n) SM ˆs2(n) SM ˆs1(n) + DTD DM

(a) Echo suppression

1 1.5 2 2.5 3 3.5 4 4.5 5

5 10 15 20 25 30 35 40 45 50

Distance in cm

SA in dB

SM ˆs1(n) SM ˆs2(n) SM ˆs1(n) + DTD DM

(b) Speech attenuation

Figure 5.14: Objective performance against distance between loudspeaker and secondary microphone

combined with DTD and PLD gain rule). The experimentation about the position of the secondary microphone can be interpreted as a recommendation about the positioning of the transducers on the mobile devices. Optimum behavior of the DM solutions are obtained by considering some hardware constraints:

• placing the microphones in bottom-top configuration

• Placing the secondary microphone as closed as possible of the loudspeaker.

Referring the ERLE curve in Figure 5.14 (a), we see that ifdxy2 is lower than 2.75cm, the PLD gain outperforms the Baseline SM solution. This value of dxy2 might be interpreted as the maximal distance between to loudspeaker and the microphone which guarantees that the PLD gain rule achieves more echo suppression than the Baseline solution.

102 5. Echo cancellation for dual channel terminals

13 13.5 14 14.5 15 15.5

−20 0 20 40 60 80 100 120

Time in seconds

ERLE in dB

SM ˆs1(n) SM ˆs2(n) SM ˆs1(n) + DTD DM

(a) Echo suppression

34 34.5 35 35.5

0 20 40 60 80 100 120

Time in seconds

SA in dB

SM ˆs1(n) SM ˆs2(n) SM ˆs1(n) + DTD DM

(b) Modified SA

Figure 5.15: Echo suppression performance with real data

In real time systems, the echo postfilter uses a noise floor so as to avoid excessive attenuation during echo-only periods and noise modulation. The noise floor is used to force the output of the postfilter to be higher or equal to the noise level. The use of the noise floor improves the overall perceived quality of processed signals but results in lower echo suppression. Therefore, although the PLD gain rule does achieve the most ERLE, it still be considered an alternative echo postfiltering approach. In addition it presents the advantage of reduced level of near-end speech attenuation during DT periods especially when the secondary microphone is placed close to the loudspeaker.