
Part II Uncertainty handling

5.2 Extension of uncertainty propagation

5.2.3 To the full feature vector

In the third step, we propagate the uncertainty about the static features to the full feature vector. The static features in the preceding 4 frames, in the current frame, and in the following 4 frames are concatenated into a column vector $\bar{z}_n = [z_{n-4}^T\; z_{n-3}^T \ldots z_{n+4}^T]^T$. The full feature vector $c_n = [z_n^T\; \Delta z_n^T\; \Delta^2 z_n^T]^T$ can be expressed in matrix form as

$c_n = (A \otimes I_C)\,\bar{z}_n$ (5.18)

where $\otimes$ is the Kronecker product, $I_C$ the identity matrix of size $C = 13$, and the matrix $A$ is given by (4.30). The mean and the covariance matrix of $c_n$ are derived as

$\widehat{\mu}_{c_n} = (A \otimes I_C)\,\widehat{\mu}_{\bar{z}_n}$ (5.19)

$\widehat{\Sigma}_{c_n} = (A \otimes I_C)\,\widehat{\Sigma}_{\bar{z}_n}\,(A \otimes I_C)^T$ (5.20)

where $\widehat{\mu}_{\bar{z}_n}$ and $\widehat{\Sigma}_{\bar{z}_n}$ are obtained by concatenating $\widehat{\mu}_{z_{n-4}}, \ldots, \widehat{\mu}_{z_{n+4}}$ into a column vector and $\widehat{\Sigma}_{z_{n-4}}, \ldots, \widehat{\Sigma}_{z_{n+4}}$ into a block-diagonal matrix, respectively. The full uncertainty covariance matrix $\widehat{\Sigma}_{c_n}$ is then exploited to dynamically adapt the recognizer using the uncertainty decoding rule (4.35).
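Equations (5.18)–(5.20) amount to a single linear transform of the stacked per-frame statistics. The following NumPy sketch illustrates this; since the matrix $A$ of (4.30) is not reproduced here, a simple central-difference delta scheme is used as a stand-in, and the per-frame means and covariances are synthetic.

```python
import numpy as np

C = 13          # number of static features per frame
K = 4           # context: 4 past frames + current frame + 4 future frames
rng = np.random.default_rng(0)

# Hypothetical per-frame means (C,) and covariances (C, C) for frames n-4..n+4
frame_means = [rng.standard_normal(C) for _ in range(2 * K + 1)]
frame_covs = [0.1 * np.eye(C) for _ in range(2 * K + 1)]

# Stack into the super-vector statistics: mean is a concatenated column
# vector, covariance is block-diagonal (as stated after eq. (5.20))
mu_zbar = np.concatenate(frame_means)                 # shape (9C,)
Sigma_zbar = np.zeros((9 * C, 9 * C))
for i, S in enumerate(frame_covs):
    Sigma_zbar[i * C:(i + 1) * C, i * C:(i + 1) * C] = S

# A maps the 9 stacked frames to [static; delta; delta-delta].
# (4.30) is not given here; as a stand-in we use simple central
# differences: delta_n = (z_{n+1} - z_{n-1}) / 2.
A = np.zeros((3, 2 * K + 1))
A[0, K] = 1.0                              # static: current frame
A[1, K - 1], A[1, K + 1] = -0.5, 0.5       # delta
A[2, K - 2], A[2, K], A[2, K + 2] = 0.25, -0.5, 0.25   # delta of deltas

T = np.kron(A, np.eye(C))            # (A ⊗ I_C), shape (3C, 9C)
mu_c = T @ mu_zbar                   # eq. (5.19)
Sigma_c = T @ Sigma_zbar @ T.T       # eq. (5.20)

print(mu_c.shape, Sigma_c.shape)     # (39,) (39, 39)
```

Note that $\widehat{\Sigma}_{c_n}$ is full even though $\widehat{\Sigma}_{\bar{z}_n}$ is block-diagonal, since the static, delta, and delta-delta streams share the same underlying frames.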

5.3 Experimental results

The resulting ASR accuracy on the Track 1 dataset is reported in Table 5.1. Similar trends are observed on the development and test data. We recall that, on average over all SNRs in the test set, the baseline accuracy with conventional decoding (no uncertainty) is 85.01%. State-of-the-art uncertainty decoding with a diagonal uncertainty covariance on the static features (and no uncertainty on the dynamic features) increases the accuracy to 85.74%, that is, a 4% relative error rate reduction with respect to the baseline. Using the full uncertainty covariance and modeling the uncertainty over the dynamic features systematically improves the average performance. The best system, using the full uncertainty covariance on all features, achieves 87.00% accuracy. This corresponds to a 13% relative error rate reduction with respect to the baseline, that is, three times more than the reduction achieved with a diagonal uncertainty covariance on the static features.

Figure 5.2: Example of estimated uncertainty covariance using Wiener + VTS and full oracle uncertainty covariance matrix over time. This example corresponds to the utterance 'bin blue with s nine soon' on the Track 1 dataset. Top: uncertainty estimated using Wiener + VTS. Bottom: oracle uncertainty obtained using equation (4.44).

In an additional experiment (not shown in the table), we evaluated the performance obtained with full uncertainty covariance matrices in which the covariance coefficients between the features derived from the magnitude spectrum (MFCCs and their derivatives) and the features derived from the power spectrum (log-energy and its derivatives) have been zeroed out. The resulting ASR performance was equal to that obtained with the full uncertainty covariance matrix, which justifies the approximation made by Astudillo [Astudillo and Kolossa, 2011]. Indeed, covariance coefficients far from the diagonal tend to be smaller than those close to the diagonal. The covariance coefficients between the MFCCs and the log-energy also tend to be quite small, despite the fact that the magnitude and power spectra from which they are derived are strongly correlated. This indicates that the estimation of full uncertainty covariance matrices may be simplified by separately estimating and propagating the uncertainty for the MFCCs and their derivatives on the one hand, and for the log-energy and its derivatives on the other hand.
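The block-wise simplification described above can be emulated by zeroing the cross-covariances between the two feature groups while keeping each group's own covariance intact. A minimal sketch on a synthetic matrix, assuming each $C = 13$ stream places the log-energy term last (the actual feature ordering depends on the front-end):

```python
import numpy as np

C = 13                       # 12 MFCCs + 1 log-energy per stream
D = 3 * C                    # static + delta + delta-delta
rng = np.random.default_rng(1)

# Hypothetical full uncertainty covariance (symmetric positive semi-definite)
M = rng.standard_normal((D, D))
Sigma = M @ M.T

# Indices of log-energy and its derivatives vs. MFCCs and theirs
# (assumes log-energy is the last coefficient of each C-sized block)
energy_idx = np.array([C - 1, 2 * C - 1, 3 * C - 1])
mfcc_idx = np.setdiff1d(np.arange(D), energy_idx)

# Zero the cross-covariances between the two groups only
Sigma_approx = Sigma.copy()
Sigma_approx[np.ix_(energy_idx, mfcc_idx)] = 0.0
Sigma_approx[np.ix_(mfcc_idx, energy_idx)] = 0.0
```

In practice this means the two groups can be handled as independent propagation problems of sizes $3 \times 12$ and $3 \times 1$, which is cheaper than propagating one $39 \times 39$ covariance.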

Performing uncertainty decoding with a diagonal uncertainty covariance matrix achieved 86.30% accuracy, which corresponds to an 8% relative error rate reduction compared to the baseline (enhancement only). Figure 5.2 shows an example of estimated diagonal uncertainty covariance matrix over time. The estimated diagonal uncertainties are underestimated in all frames.

Figure 5.3: Example of full uncertainty covariance matrix on one time frame. This example corresponds to frame 50 of the utterance 'bin blue with s nine soon' on the Track 1 dataset. Left: uncertainty estimated using Wiener + VTS. Right: oracle uncertainty obtained using equation (4.45).

An example of estimated full uncertainty covariance matrix is shown in Figure 5.3. The off-diagonal elements of the estimated full uncertainty covariance matrix are nonzero, except for the elements modeling the uncertainty between the MFCCs and their derivatives. In addition, the diagonal elements of the estimated uncertainty covariance matrix are again much smaller than those of the oracle uncertainty covariance matrix. This underestimation problem is addressed in the next chapter.

5.4 Summary

In this chapter, we presented a technique to propagate uncertainty through the MFCC computation that results in a full uncertainty covariance matrix. Experimental results on Track 1 of the 2nd CHiME challenge show that full uncertainty covariance matrices achieve a 5% relative WER reduction compared to diagonal uncertainty covariance matrices. However, modeling the correlation of uncertainty between the MFCCs and the log-energy does not seem to improve ASR performance.

6

Generative learning based uncertainty estimator

This chapter presents the proposed generative learning based uncertainty estimator. The motivation of this work is presented in Section 6.1. The theoretical idea is presented in Section 6.2. Section 6.3 explains how this generative method can learn parameters from the data. The experimental evaluation is presented in Sections 6.4 and 6.5. This work was published in [Tran et al., 2014b; Tran et al., 2015c].

6.1 Motivation

Although the proposed extension to a full uncertainty covariance matrix improves ASR performance, the estimated uncertainty is still inaccurate. As can be seen by comparing Tables 4.1 and 5.1, estimating the full uncertainty covariance matrix improved the accuracy from 85.01% to 87.00%, but this is still far from the 96.31% accuracy achieved with the oracle full uncertainty covariance matrix. There remains a large gap between the estimated uncertainty and the oracle uncertainty in general. In particular, as shown in Figures 5.2 and 5.3, the estimated uncertainty is underestimated compared to the oracle one.

Existing methods typically rely on a single uncertainty estimator and propagator fixed by mathematical approximation. As shown in Figure 4.2, the three spectral-domain uncertainty estimators introduced in Section 4.3 have different behaviors. Kolossa's estimator decreases when the estimated speech power spectrum increases. The two other estimators reach a maximum when the estimated power spectra of speech and noise are equal. Nesta's estimator increases more quickly than the Wiener estimator. Motivated by this observation, we now propose a new paradigm in which we seek to learn more powerful mappings to predict uncertainty from data. We investigate two such mappings: linear fusion of multiple uncertainty estimators/propagators and nonparametric uncertainty estimation/propagation.
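The maximum-at-equal-powers behavior of the Wiener estimator can be checked numerically. The exact estimator forms are those of Section 4.3 (not reproduced here); as a stand-in, the sketch below uses the common Wiener posterior-variance form $W(1-W)\,|x|^2$, with the noisy power $|x|^2$ held fixed:

```python
import numpy as np

def wiener_uncertainty(speech_power, noise_power):
    """Stand-in Wiener-type uncertainty: W(1 - W)|x|^2, with the
    noisy power |x|^2 approximated by speech_power + noise_power."""
    noisy_power = speech_power + noise_power
    w = speech_power / noisy_power          # Wiener gain
    return w * (1.0 - w) * noisy_power

# Sweep the speech power estimate for a fixed noisy power of 1
speech = np.linspace(0.01, 0.99, 99)
noise = 1.0 - speech
u = wiener_uncertainty(speech, noise)

# The uncertainty peaks where the speech and noise powers are equal
peak = speech[np.argmax(u)]
print(round(float(peak), 3))   # prints 0.5
```

A learned mapping, by contrast, is free to depart from any single such closed form, which is the premise of the fusion and nonparametric approaches investigated next.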