Analytic form of LeSF - LeSF applied to Speech

5.3 LeSF applied to Speech

5.3.3 Analytic form of LeSF

From now onward we make no distinction between the exponential decay factors $\alpha^{voiced}$ and $\alpha^{unvoiced}$, since the functional form of the equations remains the same. The following discussion is therefore valid for both voiced and unvoiced speech.

To use the general form of the LeSF solution in (5.12), we need a pole-zero model of the input autocorrelation of the form described in (5.11). Under the approximation that the decaying exponentials are widely spaced along the unit circle, the power spectrum $S_{xx}(z)$ in (5.15), which is a sum of terms, can be approximated by a ratio of products of terms of the form $(z - e^{\rho + j\theta})$, leading to a rational $z$-transform. Specifically, as explained in (Anderson et al., 1983; Satorius et al., 1978) and making the following assumptions,

• The pole pairs in (5.15) lie sufficiently close to the unit circle (easily satisfied as $\alpha \approx 0$).

• All the frequency pairs $(\omega_i, \omega_j)$ in (5.15) are sufficiently separated from each other that their contributions to the total power spectrum do not overlap significantly.

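the $z$-transform of the total input can be expressed as

$$S_{xx}(z) = \sigma^2 \prod_{m=1}^{M} \frac{(z - e^{-\beta_m + j\omega_m})(z - e^{+\beta_m + j\omega_m})}{(z - e^{-\alpha_m + j\omega_m})(z - e^{+\alpha_m + j\omega_m})} \times \frac{(z - e^{+\beta_m - j\omega_m})(z - e^{-\beta_m - j\omega_m})}{(z - e^{+\alpha_m - j\omega_m})(z - e^{-\alpha_m - j\omega_m})} \qquad (5.19)$$

where $\alpha_m = 1/N$.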
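As a numerical illustration of (5.19), the following Python sketch evaluates the pole-zero model on the unit circle for a single sinusoid. The function name `sxx_model` and the values chosen for $\alpha$, $\beta$, $\omega$ and $\sigma^2$ are assumptions made only for illustration; they are not taken from the text. Under these assumptions the model stays close to $\sigma^2$ away from $\omega_m$ and rises to roughly $\sigma^2(\cosh\beta - 1)/(\cosh\alpha - 1)$ at $\omega_m$.

```python
import cmath
import math

def sxx_model(theta, omegas, alphas, betas, sigma2=1.0):
    """Evaluate the pole-zero model (5.19) at z = exp(j*theta)."""
    z = cmath.exp(1j * theta)
    value = complex(sigma2)
    for omega, alpha, beta in zip(omegas, alphas, betas):
        # One group of factors at +omega and its conjugate group at -omega.
        for sign in (+1.0, -1.0):
            zeros = ((z - cmath.exp(-beta + 1j * sign * omega))
                     * (z - cmath.exp(+beta + 1j * sign * omega)))
            poles = ((z - cmath.exp(-alpha + 1j * sign * omega))
                     * (z - cmath.exp(+alpha + 1j * sign * omega)))
            value *= zeros / poles
    # On the unit circle the product is real; the imaginary part is rounding error.
    return value.real

# Hypothetical values: one sinusoid at 0.2*pi rad/sample, alpha = 0.01, beta = 0.05.
omega = 0.2 * math.pi
peak = sxx_model(omega, [omega], [0.01], [0.05])
floor = sxx_model(omega + 1.0, [omega], [0.01], [0.05])
print(f"S_xx at the sinusoid: {peak:.2f}, away from it: {floor:.3f}")
```

With these illustrative numbers the spectrum at the sinusoid frequency is about $(\beta/\alpha)^2 = 25$ times the noise floor, which is the behaviour the pole-zero placement is meant to capture.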

Corresponding to each sinusoidal component in the input signal there are four poles at locations $z = e^{\pm\alpha_m \pm j\omega_m}$, and there are four zeros on the same radial lines as the signal poles but at different distances from the unit circle. Using the general solution described in (5.12), which is derived at length in (Anderson et al., 1983), the solution for the LeSF weight vector in the present problem is

$$w(n) = \sum_{m=1}^{M} \left( B_m e^{-\beta_m n} + C_m e^{+\beta_m n} \right) \cos \omega_m (n + P) \qquad (5.20)$$
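To make the structure of (5.20) concrete, the sketch below evaluates the weight vector for given parameters. The function name `lesf_weights`, the index range $n = 0, \ldots, L-1$ and the numerical values are assumptions for illustration only; the text determines $\beta_m$, $B_m$ and $C_m$ from the normal equations rather than choosing them by hand.

```python
import math

def lesf_weights(L, P, omegas, betas, B, C):
    """Evaluate (5.20) for n = 0..L-1 (index range assumed)."""
    w = []
    for n in range(L):
        total = 0.0
        for omega, beta, b, c in zip(omegas, betas, B, C):
            # Decaying and growing exponential envelopes for this sinusoid.
            envelope = b * math.exp(-beta * n) + c * math.exp(beta * n)
            total += envelope * math.cos(omega * (n + P))
        w.append(total)
    return w

# Hypothetical parameters: one sinusoid, purely decaying envelope (C = 0).
w = lesf_weights(L=100, P=1, omegas=[0.2 * math.pi], betas=[0.05], B=[0.1], C=[0.0])
print([round(v, 4) for v in w[:5]])
```

Each sinusoid contributes a cosine at its own frequency whose envelope is a mix of a decaying and a growing exponential, so $\beta_m$ controls how quickly the weights die out and hence how wide the corresponding pass-band is.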

The values of $\beta_m$, $B_m$ and $C_m$ can be determined by substituting (5.20) and (5.14) in (5.9). The detailed solution of this equation is provided in Appendix A. The $l$-th equation in the linear system described in (5.9) has terms with coefficients $\exp(-\beta_m l)$, $\exp(+\beta_m l)$, $\exp(-\alpha l)\cos(\omega_m(l+P))$ and $\exp(\alpha l)\cos(\omega_m(l+P))$. Besides these, there are two other kinds of terms that can be neglected.

• “Non-stationary” terms that are modulated by a sinusoid at frequency $2\omega_m$, where $m \in [1, M]$. For $\omega_m \neq 0$, $\omega_m \neq \pi$, their total contribution is approximately zero.³

• Interference terms that are modulated by a sinusoid at frequency $\Delta\omega = (\omega_i - \omega_j)$, where $i, j \in [1, \ldots, M]$. If the filter length $L \gg 2\pi/\Delta\omega$, these interference terms approximately sum to zero and hence can be neglected.
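The cancellation argument behind both bullets can be checked numerically. The sketch below is a simplified stand-in that ignores the exponential envelopes and only averages a cosine at the modulating frequency over the filter taps; the function name and the chosen separation are hypothetical.

```python
import math

def mean_modulated_term(delta_omega, L):
    """Average of cos(delta_omega * l) over the taps l = 0..L-1."""
    return sum(math.cos(delta_omega * l) for l in range(L)) / L

# Hypothetical frequency separation, so that 2*pi/delta_omega = 10 samples.
delta_omega = 2.0 * math.pi * 0.1
short = abs(mean_modulated_term(delta_omega, 7))    # L comparable to 2*pi/delta_omega
long_ = abs(mean_modulated_term(delta_omega, 203))  # L much larger than 2*pi/delta_omega
print(f"L=7: {short:.4f}   L=203: {long_:.4f}")
```

The partial sum of the cosine is bounded by $1/|\sin(\Delta\omega/2)|$ regardless of $L$, so its average over the taps shrinks as $1/L$. The same reasoning applies to the terms at $2\omega_m$, provided $\omega_m$ is not near $0$ or $\pi$.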

The coefficients of the terms $\exp(-\beta_m l)$ and $\exp(+\beta_m l)$ are the same for each of the $L$ equations, and setting them to zero leads to just one equation, which relates $\beta_m$ to $\alpha$ and the SNR. Let $\rho_i$ denote the “partial” SNR of the sinusoid at frequency $\omega_i$, i.e. $\rho_i = A_i^2/\sigma^2$, and let the complementary signal SNR be denoted as $\gamma_i = \left(\sum_{m=1, m \neq i}^{M} A_m^2\right)/\sigma^2$. Then we have the following relation:

$$\cosh\beta_i = \cosh\alpha + \frac{\rho_i}{2\gamma_i + \rho_i + 2}\,\sinh\alpha \qquad (5.21)$$

³ Due to the self-cancelling positive and negative half periods of a sinusoid.

There are two interesting cases. The first is when the sinusoid at frequency $\omega_i$ is significantly stronger than the other sinusoids, so that $\gamma_i$ is quite low. This is illustrated in Figure 5.3, where we plot the bandwidth $\beta_i$ of the LeSF's pass-band centered around $\omega_i$ as a function of the partial SNR $\rho_i$ of the $i$-th sinusoid. The complementary signal's SNR is quite low at $\gamma_i = -6.99$ dB.

We plot curves for different values of the “effective” input sinusoid bandwidth $\alpha$. From (5.15), we note that $\alpha$ is the reciprocal of the frame length $N$. The vertical line in Figure 5.3 corresponds to the case $\rho_i = \gamma_i$. We note that, for a given partial SNR $\rho_i$, the LeSF bandwidth becomes narrower as the frame length $N$ increases, indicating better selectivity of the LeSF filter.
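The trend described above can be reproduced by solving (5.21) for $\beta_i$. The sketch below assumes that (5.21) reads $\cosh\beta_i = \cosh\alpha + \frac{\rho_i}{2\gamma_i + \rho_i + 2}\sinh\alpha$ with $\rho_i$ and $\gamma_i$ as linear power ratios, and that the "bandwidth in dB" on the figure axis means $10\log_{10}\beta_i$; both readings, as well as the function names, are assumptions rather than statements from the text.

```python
import math

def db_to_linear(x_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (x_db / 10.0)

def lesf_bandwidth(alpha, rho, gamma):
    """Solve the assumed form of (5.21) for beta_i (rho, gamma are linear ratios)."""
    c = math.cosh(alpha) + rho / (2.0 * gamma + rho + 2.0) * math.sinh(alpha)
    return math.acosh(c)

gamma = db_to_linear(-6.99)  # complementary signal SNR used in the figure
for alpha in (0.01, 0.005, 0.001):  # alpha = 1/N for frame lengths N = 100, 200, 1000
    row = []
    for rho_db in (-40, -20, 0, 20):
        beta = lesf_bandwidth(alpha, db_to_linear(rho_db), gamma)
        row.append(round(10.0 * math.log10(beta), 2))
    print(f"alpha={alpha}: bandwidth (dB) at rho = -40, -20, 0, 20 dB -> {row}")
```

Under these assumptions $\beta_i$ tends to $\alpha$ as $\rho_i \to 0$ and grows with $\rho_i$, while a smaller $\alpha$ (longer frame) gives a smaller $\beta_i$ at every partial SNR.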

Figure 5.3. Plot of the filter bandwidth $\beta_i$ centered around frequency $\omega_i$ as a function of the partial sinusoid SNR $\rho_i$, for a given complementary signal SNR $\gamma_i = -6.99$ dB and “effective” input bandwidth $\alpha$ (alpha) = 0.01, 0.005 and 0.001 respectively. The vertical line meets the three curves where $\rho_i = \gamma_i$. Axes: partial SNR of a sinusoid in dB (horizontal, −50 to 30); bandwidth of the filter in dB (vertical, −30 to 0).

In Figure 5.4, we plot the bandwidth $\beta_i$ as a function of $\rho_i$ for the cases where the complementary signal SNR is high, at $\gamma_i = 10$ dB, and low, at $\gamma_i = -6.99$ dB. The two dots correspond to the case $\rho_i = \gamma_i$. We note that $\gamma_i = 10$ dB corresponds to a signal with high overall SNR⁴. Therefore the cross-over point ($\gamma_i = \rho_i$) for low $\gamma_i$ occurs at a narrower bandwidth than in the high-$\gamma_i$ case. This is because in the former case the overall signal SNR is low, and thus the LeSF filter must have narrower pass-bands to reject as much of the noise as possible.
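The overall SNR at the two cross-over points can be worked out directly. Since the overall SNR is the sum of the partial and complementary SNRs as linear power ratios, two values given in dB are combined by converting to linear scale, adding, and converting back; the function name below is a hypothetical helper.

```python
import math

def overall_snr_db(gamma_db, rho_db):
    """Combine complementary and partial SNRs (both in dB) into the overall SNR in dB."""
    total = 10.0 ** (gamma_db / 10.0) + 10.0 ** (rho_db / 10.0)
    return 10.0 * math.log10(total)

high = overall_snr_db(10.0, 10.0)    # cross-over point for gamma_i = 10 dB
low = overall_snr_db(-6.99, -6.99)   # cross-over point for gamma_i = -6.99 dB
print(f"overall SNR: {high:.2f} dB (high case), {low:.2f} dB (low case)")
```

At each cross-over point the two contributions are equal, so the overall SNR is about 3 dB above $\gamma_i$: roughly 13 dB in the high case and roughly −4 dB in the low case.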

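$B_i$ and $C_i$ in (5.20) are determined by equating their respective coefficients. The “non-stationary” interference terms between all pairs of frequencies $(\omega_i, \omega_j)$ can be neglected if $(\omega_i - \omega_j) \gg 2\pi/L$. This requires that the LeSF's frequency resolution ($2\pi/L$) be able to resolve the constituent sinusoids.

$$B_i = \frac{2 e^{-\beta_i} e^{-\alpha P} (\alpha + \beta_i)^2 (\beta_i - \alpha)}{(\alpha + \beta_i)^2 - e^{-2\beta_i L} (\beta_i - \alpha)^2}, \qquad C_i = \frac{2 e^{-\beta_i (2L+1) + 1} e^{-\alpha P} (\alpha + \beta_i)(\beta_i - \alpha)^2}{(\alpha + \beta_i)^2 - e^{-2\beta_i L} (\beta_i - \alpha)^2} \qquad (5.22)$$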

⁴ Because the overall SNR of the signal is $10 \log_{10}\left(10^{\gamma_i/10} + 10^{\rho_i/10}\right)$ dB, with $\gamma_i$ and $\rho_i$ expressed in dB.

We note from (5.21) that the various sinusoids are coupled with each other through the dependence of their bandwidth $\beta_i$ on the complementary signal SNR $\gamma_i$. As a consequence, $B_i$ and $C_i$ also depend indirectly on the powers of the other sinusoids through $\beta_i$.

Figure 5.4. Plot of the filter bandwidth $\beta_i$ centered around frequency $\omega_i$ as a function of the partial sinusoid SNR $\rho_i$, for complementary signal SNRs $\gamma_i = -6.99$ dB and $10$ dB respectively. The “effective” input bandwidth is $\alpha$ (alpha) = 0.01 for both curves. The two dots correspond to the cases where the partial SNR $\rho_i$ equals the complementary signal SNR $\gamma_i$. Axes: partial SNR of a sinusoid in dB (horizontal, −50 to 30); bandwidth of the filter in dB (vertical, −30 to 0).

Figure 5.5. Plot of the magnitude response of the LeSF filter as a function of the input SNR. The input consists of three sinusoids at normalized frequencies (0.1, 0.2, 0.4) with relative strengths (1 : 0.6 : 0.4) respectively. Axes: normalized frequency (horizontal, 0 to 0.5); magnitude response in dB (vertical, −30 to 0). Curves: SNR = 3, −7 and −17.

In Fig. 5.5, the magnitude response of the LeSF filter is plotted for various SNRs. The input in this case consists of three sinusoids at normalized frequencies (0.1, 0.2, 0.4). The frame length is $N = 500$ and the filter length is $L = 100$. As the signal SNR decreases, the bandwidth of the LeSF filter decreases in order to reject as much of the noise as possible. The LeSF filter's gain also decreases with decreasing SNR. Similar results were reported in (Anderson et al., 1983; Zeidler et al., 1978) for the case of stationary inputs.
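The dependence of the gain on SNR can be illustrated with the stationary counterpart of this experiment. The sketch below is not the block-estimated LeSF analysed in this section: it solves the Wiener-Hopf normal equations for stationary sinusoids in white noise, using the theoretical autocorrelation. The prediction delay $P = 1$, the reading of "relative strength" as amplitude ratios, the SNR values in dB, and all function names are assumptions.

```python
import numpy as np

def stationary_lesf(freqs, amps, snr_db, L=100, P=1):
    """Wiener-Hopf weights for predicting x(n) from x(n-P-k), k = 0..L-1."""
    freqs = np.asarray(freqs, dtype=float)
    powers = np.asarray(amps, dtype=float) ** 2 / 2.0   # power of each real sinusoid
    noise_var = powers.sum() / 10.0 ** (snr_db / 10.0)  # white-noise variance for this SNR
    lags = np.arange(L + P)
    # Signal autocorrelation: sum of cosines weighted by the sinusoid powers.
    r_sig = (powers[:, None] * np.cos(2.0 * np.pi * freqs[:, None] * lags)).sum(axis=0)
    idx = np.abs(np.arange(L)[:, None] - np.arange(L)[None, :])
    R = r_sig[idx] + noise_var * np.eye(L)              # Toeplitz input autocorrelation matrix
    p = r_sig[P:P + L]                                  # cross-correlation with the delayed input
    return np.linalg.solve(R, p)

def gain(w, f, P=1):
    """Magnitude response of the delay-plus-filter chain at normalized frequency f."""
    k = np.arange(len(w)) + P
    return float(np.abs(np.sum(w * np.exp(-2j * np.pi * f * k))))

for snr_db in (3, -7, -17):
    w = stationary_lesf([0.1, 0.2, 0.4], [1.0, 0.6, 0.4], snr_db)
    gains = [round(gain(w, f), 3) for f in (0.1, 0.2, 0.3, 0.4)]
    print(f"SNR={snr_db} dB: gain at f = 0.1, 0.2, 0.3, 0.4 -> {gains}")
```

In this idealized setting the gain at each sinusoid frequency falls as the SNR drops, while the gain between the sinusoids stays near zero. The sketch does not reproduce the bandwidth narrowing governed by $\beta_i$ in the non-stationary analysis, because for pure stationary sinusoids the pass-band width is set by the filter length $L$.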

In Fig. 5.6, we plot the spectrogram of a clean speech utterance. Fig. 5.7 and Fig. 5.8 display the same utterance embedded in F16-cockpit noise at SNR 6 dB and its LeSF-enhanced version

From the document: Novel speech processing techniques for robust automatic speech recognition (pages 78-81)