• Aucun résultat trouvé

6.2 Introduction to EMD

6.2.4 EMD Applications

EMD does the unsupervised signal decomposition based on local characteristic time scale of the data. Besides, EMD is adaptive, highly efficient and does not leave time domain.

These properties of EMD were claimed to be well suited for nonlinear and nonstationary data and have prompted many researchers to investigate EMD method to various research fields. Accordingly, there have been hundreds of papers in the literature during the last decade dedicated to apply EMD technique to various engineering and non-engineering applications, for example, biomedical applications [100,105,106,107], image processing and computer vision [101,108,109], meteorology and climate studies [110,111], financial studies [112], ocean and seismic wave studies [113,114], mechanical engineering [115,116, 117] and many other diverse research areas.

Speech Processing using EMD

In the last few years, the application of EMD has also been extended to speech and audio signal processing. Before presenting our work on EMD-based NAEC, this section discusses other applications of EMD in speech and audio enhancement that are widely reported.

Like many real-world signals, speech signals are also highly nonstationary, making traditional signal analysis dissatisfying due to dynamic variation of spectral content across the time. EMD is a better alternative suitable to analyse highly nonstationary signals like speech. An example of speech signal analysis using EMD is illustrated in Fig. 6.5. The input signal is a clean speech signal sampled at 8kHz. EMD decomposes the clean speech signal into 18 IMFs; the first 6 IMFs are shown in Fig. 6.5. The first IMF has a high-pass characteristic but also contains lower energy, low frequency content.

The higher order IMFs have overlapping band-pass characteristics [118]. It is necessary to emphasize that the cut-offfrequency between the consecutive IMFs is time varying and input signal dependent. Several efforts were recently made to use the IMFs for speech enhancement [119,120,121,122]. In [121], linear predictive coding (LPC) analysis was computed on the IMFs in order to extract the resonant frequencies of the vocal tract (formants). The results are compared to the LPC analysis operated on original

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 IMF 6

IMF 5 IMF 4 IMF 3 IMF 2 IMF 1 Speech

Samples

Figure 6.5 – An illustration of EMD. Illustrated is a clean speech signal (top) and the first 6 IMFs.

(full-band) speech signal and it was shown that the latter is more reprehensive than the same analysis operated on IMFs.

Further observing the IMFs in Fig. 6.5, it is clear that most of the energy content of the clean speech signal is concentrated in the first few IMFs. Fig. 6.6a illustrates the variance of IMFs with respect to the order of IMFs for a clean speech signal. The variance (or energy) of the IMFs decreases as the order of IMF increases. Similarly, if a speech signal is contaminated with a low-frequency noise (example: wind noise or car interior noise) then EMD has the capability to characterize the IMFs as either speech dominant or noise dominant. To illustrate such an example, a noisy speech signal was generated artificially by adding wind noise to the same clean speech signal shown in Fig. 6.5 at 0dB SNR level. Applying EMD to the noisy speech signal resulted in 12 IMFs. Fig. 6.6b illustrates the IMF variance plot of noisy speech signal. It can be seen that there is a sudden increase in the energy (variance) at higher order IMFs due to the low-frequency wind noise components. In [118], the authors showed that these observations hold true for most of the speech signals and low-frequency noisy signals.

Thus, one way to reconstruct the speech signal from the noisy signal is by ignoring the higher-order noise-dominant IMFs and summing the first few IMFs composed of the

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0

0.5 1 1.5 2 2.5 3 3.5 4

IMF Order

IMF Variance

(a) IMF variance(energy) plot of a clean speech signal

0 1 2 3 4 5 6 7 8 9 10 11 12 13

0 0.5 1 1.5 2 2.5 3 3.5

4x 10−3

IMF Order

IMF Variance

(b) IMF variance(energy) plot of a speech signal contaminated with wind noise at 0 dB SNR Figure 6.6 – IMF variance plots: indicates energy content in each IMF

desired speech signal. By taking advantage of this principle that separates the speech from the noise, approaches to EMD-based speech enhancement/noise cancellation are proposed in the literature [118,119,120].

Recently, an EMD-based sub-band approach to linear AEC is reported in [122]. In typical sub-band approaches to linear AEC, the far-end signal and the microphone output signal are divided into mutually exclusive multiple sub-band signals using identical analysis filter-banks2. The resulting sub-band signals are down-sampled by a known factor and

2A filter-bank is a bank of band-pass filters that divides the input signal into a set of sub-band signals

each down-sampled sub-band far-end signal serves as an input to an independent adaptive filter. The outputs of the sub-band adaptive filters are subtracted from the sub-band microphone signals forming the sub-band errors. These errors are then up-sampled and combined using synthesis filter-banks leading to the full-band echo-free output. This sub-band adaptive filtering technique allows for fast convergence and reduced complexity through the use of a robust NLMS algorithm, especially in longer reverberant acoustic environments [2]. However, in practice, the performance of sub-band adaptive filtering techniques is often degraded due to artifacts introduced by the filter-banks [123]. Authors in [122] presented an EMD-based sub-band adaptive filtering scheme to reduce the filter-banks artifacts. This structure uses EMD in the place of filter-banks, but there is no guarantee that the same number of IMFs with the same bandwidths will be produced at both downlink (far-end) and uplink (microphone) signals using EMD. Hence, the authors proposed to use an IMF separation process after EMD, where IMFs are grouped into separate bands according to the power spectral densities (PSD) of the IMFs. A detailed description about IMFs grouping is provided in [122]. The different IMF groups are treated with multiple adaptive filters similar to the traditional sub-band adaptive filtering techniques. The results presented show effective ERLE values with a faster convergence rate using the proposed EMD-based structure.

Other applications of EMD in speech and audio processing include speech analysis [124], source separation [125], voice activity detection [126] and pitch estimation [127].