• Aucun résultat trouvé

4.3 Validation of the GPHM

4.3.2 Results

The response of all three devices to the exponential sine-sweep input signal is shown in the form of spectrograms in the top row of Fig. 4.10. The 1st and particularly the 3rd device (left and right columns in Fig. 4.10) exhibit significant nonlinear distortion; spectrograms show additional higher order harmonics in addition to the input exponential sine-sweep input signal. The nonlinearity is furthermore asymmetric; odd-order harmonics are more significant than even-order nonlinearities. We note that some independent studies [9,63]

have reported similar observations. In contrast, the second device exhibits comparatively less nonlinear distortion.

Results for each of the three devices are also illustrated in Fig. 4.10. The middle row shows results for the exponential sine-sweep input signal whereas the lower row shows results for the real-speech input signal. In all cases, results are shown for different orders of nonlinearity P (vertical axes) and different Volterra kernel lengthsL (horizontal axes).

Blue colours illustrate lower MCDs whereas red colours indicate higher MCDs.

For satisfactory performance, the order of nonlinearity P should be high enough to capture the principal sources of nonlinear distortion, i.e. the most dominant harmonics.

The simplified Volterra kernel filter lengthLshould be sufficiently high so as to capture accurately both linear and nonlinear loudspeaker behavior. Both parameters are however a compromise between performance and computational efficiency.

The response of each device to the exponential sine-sweep input signal is illustrated in middle row of Fig. 4.10. For the 1st and 3rd devices, the MCD is higher for lower values of P, irrespective of the number of filter tapsL. The MCD nonetheless decreases with increasing P. This behavior is not observed for the 2nd device where, in any case, the level of nonlinear distortion is comparatively low. It is nonetheless reassuring that there is negligible change in model accuracy for increasing (overestimated) P. For the 1st and 2nd devices, the MCD decreases as the kernel length Lincreases. However, for the 3rd device, with a value of P >2 performance is relatively stable for varyingL. One possible explanation for such behavior is that the highest order of significant nonlinearity exceeds that of the model (P = 10). Since the 3rd device exhibits nonlinearity greater than 10th order, P is not sufficient in this case to reduce the MCD. Accordingly, values of P >10 would be needed where processing capacity allows.

Real-speech input

Results for real-speech inputs are illustrated in the last row of Fig. 4.10. Due to aliasing caused by the static nonlinearity modeling, MCD values are generally lower for speech than sine-sweep inputs. For the 1st device, the best performance is obtained for lower values ofP and higher values ofL. For the 2nd device, performance is best for higher values ofL but is independent of P. For the 3rd device performance is best in the case of P = 1 and values ofL around 64.

These results show that, for the two cases where nonlinearity is significant, the linear model (P = 1) outperforms the nonlinear model (P > 1) in the case of real-speech inputs. Despite the estimation of the nonlinearity is based on procedure that permits advanced analysis of nonlinear system [7,8], our results show that this model does not match with approximation of nonlinearity observed with speech excitation signal for mobile devices. This leads to questions about the reasons of the observed mismatch. One explanation for this behavior lies in the wider variation in amplitude for speech signals compared to sine-sweep signals; lower amplitude speech signals may provoke significantly less nonlinear distortion. It is also possible that the model obtained from the system response to sine-sweep signals is overly simplistic. Whereas the sine-sweep signal consists in a single sinusoidal frequency at any instant, speech has a far more complex spectral density whereas the model neglects inter-spectral influences.

4.4 Summary

This chapter presents an overview of nonlinear loudspeaker modeling and an experimental way of identifying the nonlinearities of the loudspeakers used in mobile phones. This chapter also reports our work to assess the suitability of Volterra series derivatives in modeling the nonlinear distortion introduced by the mobile phone loudspeakers. We compared the synthesized outputs of two loudspeaker models to empirically measured, real loudspeaker outputs. The work suggests that the generalized polynomial Hammerstein model (GPHM) approximates more reliable practical nonlinear loudspeaker behavior.

This chapter also presents the key results of the GPHM model validation for the char-acterization and modeling of nonlinear loudspeakers. The simplified Volterra kernels, which characterize the nonlinear system, are empirically measured and then used to predict the response of three different mobile phones. Whereas validation with the same sine-sweep input signals used for characterization shows the potential, the model yields worse performance than a conventional linear model in the case of real-speech inputs.

Benefits and limitations of the GPHM identification technique are discussed along with requirements for updating this technique to improve the ability to simulate behavior of complex loudspeaker systems.

The work highlights the challenge to model accurately the distortion introduced by nonlinear loudspeakers. Further refinements are thus necessary to achieve consistent practical performance, in particular with respect to inter-spectral influences. Future work should develop new modeling strategies based on real-speech input signals rather than specially-crafted, yet artificial inputs such as those used in this work. This will allow for the full consideration of intrinsic speech characteristics and the response of nonlinear systems to amplitude variations and the distribution of nonlinearities across the full spectrum.

Part II

Nonlinear Acoustic Echo

Cancellation

2 is majorly devoted to NAEC. Established state-of-the-art approaches to NAEC are described in Chapter 5 before a comprehensive performance evaluation and analysis of selected NAEC algorithms. This work highlights their strengths and weaknesses.

Chapter 6 discusses a nonlinear and nonstationary signal analysis technique known as Empirical Mode Decomposition (EMD) and also reports a novel solution to NAEC based on EMD. The Hilbert-Huang Transform (HHT) allows spectral analysis of nonlinear and nonstationary signals by using EMD followed by the Hilbert transform. Chapter 7 reports our first attempt to apply the Hilbert-Huang Transform (HHT) to the analysis of nonlinear distortion produced by miniature loudspeakers. The work furthermore questions the suitability of traditional signal analysis approaches while giving weight to the use of HHT analysis in future work.

Chapter 5

State-of-the-Art NAEC Solutions

In this chapter, first we briefly review the existing techniques for the nonlinear acoustic echo cancellation and/or suppression, which have been developed within the last decade for mobile telephoney applications. Next, we have reported a comprehensive performance and stability analysis of widely used NAEC algorithms under various practical acoustic environments. A part of this work is presented as a technical report in [64].

Literature Survey

There is a considerable amount of literature in the last decade dedicated to the solutions that can handle nonlinear distortion and to maintain stable echo cancellation performance.

First of all, we can divide these solutions into two main categories:

• Hardware-based solutions

• Software-based solutions

This is shown in Fig. 5.1. We should note that these two types of solutions are mutually exclusive in the literature till date and there are no solutions that can combine these two approaches within a single AEC system. In practice, however, AEC systems tend to end up in one or the other of the categories, for reasons of development and/or computational cost. We proceed then to the description of these solutions.

5.1 Hardware-based Solutions

Hardware-based solutions to handle nonlinear distortion are two fold:

NRES

NAEC

Software Hardware

Pre Filtering

Post Filtering

Nonlinear Adaptive

Filtering Cascaded

Parallel Others

Figure 5.1 – Approaches to handle nonlinearities.

• Aim at furbishing up the loudspeaker’s structural faults and resume normal linear performance levels.

• Choose to use additional hardware changes in the downlink path to get reference signals related to nonlinear distortion.

while the first option seems simple solution, hardware structural changes to miniature devices is not at all a cost effective solution. Using larger and high quality loudspeakers and its associated amplifiers can reduce the maximum amount of nonlinear distortion in the LEMS. But such a strategy will certainly increase the manufacturing cost and thereby price of the mobile devices significantly. The second option works on the idea of using nonlinearly distorted loudspeaker signal as reference signal to the linear AEC system. The advantage of this kind of method is that the echo canceler would have to model only the linear acoustic path from loudspeaker output to microphone output because the reference signal is nonlinearly distorted signal. Echo canceler doesn’t have to model any nonlinearities. In order to do so, some additional hardware changes in the downlink path are mandatory to obtain the pure loudspeaker signal without room or any other acoustic artifacts. In [65], the authors used a sensor (an accelerometer) in the downlink path attached to the loudspeaker, expected to capture the nonlinear signal.

This sensor signal is used as a reference signal to the NLMS based AEC to model the linear acoustic path. This approach provides 15dB ERLE improvement compared to conventional NLMS based AEC with no additional complexity. In [66,67], the authors proposed to use either the voltage or the current signal that drives the loudspeaker as the reference signal to linear AEC. These signals capture some or all of the nonlinearities

in the downlink path. In order to obtain these signals to use as the reference signals, some hardware modifications based on RC filters are suggested in the loudspeaker cavity.

This approach provides an average of 6dB ERLE improvement compared to conventional NLMS based AEC with no additional complexity.

Though the advantage of using nonlinearly distorted alternate reference signals is clearly seen, the usage of additional hardware changes in the downlink path is superfluous.

Such additional hardware changes certainly increase the cost, and may cause additional distortions. For this reason, researchers believe that digital compensation of nonlinear distortion by means of Digital Signal Processing (DSP) algorithms may be a realistic solution.