Speech Codecs - LOSS CONCEALMENT FOR LAPPED TRANSFORM CODECS

FIGURE 3.2: Block diagram of a basic CELP codec

3.4 LOSS CONCEALMENT FOR LAPPED TRANSFORM CODECS

3.4.1 Speech Codecs

The ITU standard recommends that lost blocks be replaced with the previous block. While this technique is reasonable for low loss rates, artifacts are still present and become significant at loss rates that are common on the Internet.

In particular, replication of coefficients does not take into account the alignment of pitch periods between past and lost frames. (See examples of speech codecs, G.722.1.)

In Section 3.2.1 we presented one of the main principles behind loss concealment for speech: pitch replication. As we will see, the algorithm presented in [16] can be seen as an elaborate pitch replication system. It uses the partial information available to synthesize a signal that has similar spectral characteristics and aligns well with the surrounding blocks.

The MLT transform can be decomposed into a windowing operation, followed by a folding and a DCT. Each block of coefficients can thus be written in matrix form as

m = dct(F J x), (3.1)

where m is the N×1 vector of the resulting transform coefficients, F is the N×2N fold-over matrix, and J is a scaling matrix, that is, a 2N×2N diagonal matrix with the windowing coefficients on its diagonal.

Furthermore, we will often need to refer to the signal before the DCT is applied. Let's call that z. So, we write

z = F J x. (3.4)

Note that in (3.2) the nonzero elements of the folding matrix form two nonoverlapping subblocks. In other words, we can decompose F into four submatrices, two of which are zero matrices:

\[
F = \begin{bmatrix} F_1 & 0 \\ 0 & F_2 \end{bmatrix}.
\]

Looking at the block diagonal structure of F and J, we can easily see that only the first half of the samples of x is used in computing the first half of the folded vector F J x (and similarly for the second half). That is, we can write

z₁ = F₁ J₁ x₁. (3.7)

Therefore, if the next block of coefficients (which would also be using the second half of the samples of x) is lost, we can use this partial knowledge about the samples to try to estimate x₂.
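The half-by-half dependence in (3.7) can be checked numerically with a minimal sketch. The sine window and the particular folding rule below (quarters a, b, c, d, with signs and ordering chosen for illustration) are assumptions, not the exact F and J of the text; the function names are hypothetical.

```python
import math


def sine_window(two_n):
    """Sine window of length 2N, a common MLT window (assumed here)."""
    return [math.sin(math.pi * (i + 0.5) / two_n) for i in range(two_n)]


def window_and_fold(x, window):
    """Compute z = F J x for one 2N-sample block, returning N folded samples.

    Assumed folding convention (illustrative only): the windowed block is
    split into quarters a, b, c, d; the first half of z is a - reverse(b)
    and the second half is -reverse(c) - d.
    """
    two_n = len(x)
    if two_n % 4 != 0 or len(window) != two_n:
        raise ValueError("need len(x) == len(window) == 2N with N even")
    q = two_n // 4
    xw = [xi * wi for xi, wi in zip(x, window)]  # J x: apply the window
    a, b, c, d = xw[:q], xw[q:2 * q], xw[2 * q:3 * q], xw[3 * q:]
    z1 = [ai - bi for ai, bi in zip(a, reversed(b))]   # depends on x1 only
    z2 = [-ci - di for ci, di in zip(reversed(c), d)]  # depends on x2 only
    return z1 + z2
```

Calling `window_and_fold` on two blocks that share their first N samples but differ in the last N yields identical first halves of z, which is the partial information the concealment method relies on.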

More specifically, suppose an isolated block is lost (i.e., both the preceding and the subsequent blocks to the missing block of coefficients are correctly received).

The missing (incomplete) set of samples is 2N long. By computing the inverse DCT of the received data (but before applying the unfolding matrix), we have access to z. We can therefore write the following equation, applying to the first incomplete N samples:

z₂ = F₂ J₂ x₂. (3.8)

Note that the x₁ and x₂ in (3.7) and (3.8) refer to different blocks. To avoid confusion, we will now add a time index to our notation. Namely, to represent blocks at different time instants we will add a superscript index indicating the block ordering. For example, xⁿ will mean the vector x at time instant n.

Assume the block at time n is missing, but blocks n−1 and n+1 are correctly received. We can then relate the received partial information to xⁿ through block-rotated versions of the original F and J matrices. For simplicity, let's refer to these modified (block-rotated) matrices as G and H. We therefore write

\[
\begin{bmatrix} z_2^{n-1} \\ z_2^{n+1} \end{bmatrix} = G H x^{n}. \tag{3.10}
\]

Note that this is an underdetermined system of equations. We know z₂ⁿ⁻¹ and z₂ⁿ⁺¹, and we are trying to estimate the 2N samples of xⁿ. This underdetermined system could be solved for the minimum energy vector xⁿ using the Moore–Penrose generalized inverse of GH. This would provide the minimum energy signal segment x that satisfies the received (partial) information. Nevertheless, simulations show that this is not a good choice for x, as the nature of the matrix J tends to concentrate the energy in the higher gain samples. A better choice is to find the solution minimizing the energy of the windowed signal H x. This solution distributes the energy more evenly across the samples of x. Nevertheless, it still does not use the information about the neighboring frames. Before proceeding to describe the best mode, let us introduce a small change in interpretation. Let us introduce an identity matrix I in (3.10), which becomes

\[
\begin{bmatrix} z_2^{n-1} \\ z_2^{n+1} \end{bmatrix} = G H I x^{n}. \tag{3.11}
\]

We now interpret I not as a simple identity matrix, but as a matrix whose columns form a basis for the space of x. In this context, the basis I consists simply of impulses at each sample location. Using the generalized inverse of GH would be minimizing the energy of the basis representation over these impulses. That takes into account the partial information about the missing samples, but it does not take into account all the prior information we have about the missing segment: the properly received signal segments just before (and possibly after) the missing segment. To fully exploit that information, we will reshape the aforementioned equation by introducing two small modifications. The first modification improves the signal continuity across frames by removing the no-excitation response. The second biases the reconstructed signal toward having the same spectrum and pitch as the neighboring segments.
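The two minimum-energy solutions considered so far (minimum energy of x via the generalized inverse of GH, and minimum energy of the windowed signal H x) can be compared on a toy system. This is only a sketch: the matrices in the usage note are made-up numbers rather than the MLT's, G is assumed to have full row rank, H is assumed diagonal with nonzero entries, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the square system a s = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]  # partial pivoting
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def apply(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def energy(v):
    """Sum of squared samples."""
    return sum(t * t for t in v)


def windowed(h_diag, x):
    """Apply the diagonal window matrix H to x."""
    return [hi * xi for hi, xi in zip(h_diag, x)]


def min_norm(a, y):
    """Minimum-energy x with a x = y: x = a^T (a a^T)^{-1} y."""
    gram = [[sum(p * q for p, q in zip(ri, rj)) for rj in a] for ri in a]
    lam = solve(gram, y)
    return [sum(a[i][k] * lam[i] for i in range(len(a)))
            for k in range(len(a[0]))]


def min_windowed_norm(g, h_diag, y):
    """Minimise the energy of H x subject to G H x = y."""
    u = min_norm(g, y)  # u = H x has minimum energy subject to G u = y
    return [ui / hi for ui, hi in zip(u, h_diag)]
```

For example, with `g = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]`, `h = [0.4, 0.9, 0.9, 0.4]` and `y = [1.0, 2.0]`, both `min_norm` (applied to the product G H) and `min_windowed_norm(g, h, y)` satisfy the constraint, but they spread the energy over the samples differently.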

To account for the signal continuity, we estimate the LPC filter corresponding to the previous block and compute the no-excitation response (that is, the zero-input response) of the LPC filter into the missing segment, x̌. We then modify (3.11) to account for x̌ and write

\[
\begin{bmatrix} z_2^{n-1} \\ z_2^{n+1} \end{bmatrix} - G H \check{x} = G H I \hat{x}^{n}, \tag{3.12}
\]

where x̂ = x − x̌.
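The no-excitation response x̌ can be sketched as the LPC synthesis filter ringing into the gap with zero input, starting from the last decoded samples. The predictor sign convention and the function name below are assumptions made for illustration.

```python
def zero_input_response(lpc, history, length):
    """Extrapolate `length` samples by running the LPC filter with no excitation.

    lpc     : predictor coefficients a_1..a_p, under the assumed convention
              x[n] ~ a_1 * x[n-1] + ... + a_p * x[n-p]
    history : decoded samples just before the gap (at least p of them)
    """
    p = len(lpc)
    if p == 0 or len(history) < p:
        raise ValueError("need at least one coefficient and p past samples")
    state = list(history[-p:])  # filter memory, oldest sample first
    out = []
    for _ in range(length):
        # Prediction from the p most recent samples, with zero excitation.
        nxt = sum(a * s for a, s in zip(lpc, reversed(state)))
        out.append(nxt)
        state = state[1:] + [nxt]
    return out
```

Subtracting the contribution of this extrapolation from the known data, as in (3.12), leaves only the part of the signal that is not already explained by the filter memory.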

To account for the spectral continuity, we invoke our interpretation of I as a basis for the vector x (now x̂) to claim we should not be minimizing the energy of x. Instead, we should be minimizing the energy of the representation of x under a basis whose functions have a spectrum corresponding to the desired LPC spectrum. To that end, we apply the LPC filter to the identity matrix, to obtain a new basis L, where each column of L corresponds to a time-shifted version of the impulse response of the LPC filter.
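One way to see why this basis targets the LPC residual is to write L out explicitly. The notation below is an assumption introduced for illustration: the synthesis filter is 1/A(z) with A(z) = 1 − Σ aₖ z⁻ᵏ of order p, its impulse response is h₀ = 1, h₁, h₂, ..., and r denotes the coefficients of x̂ on the basis L.

```latex
% Assumed convention: synthesis filter 1/A(z), A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k},
% with impulse response h_0 = 1, h_1, h_2, \dots
\[
L =
\begin{bmatrix}
h_0      & 0        & \cdots & 0      \\
h_1      & h_0      & \cdots & 0      \\
\vdots   & \vdots   & \ddots & \vdots \\
h_{2N-1} & h_{2N-2} & \cdots & h_0
\end{bmatrix},
\qquad
\hat{x} = L r
\;\Longleftrightarrow\;
r_m = \hat{x}_m - \sum_{k=1}^{\min(p,\,m)} a_k\, \hat{x}_{m-k},
\quad m = 0, \dots, 2N-1 .
\]
```

Under these assumptions L is lower triangular with a unit diagonal, hence invertible, and the coefficient vector r is x̂ passed through the inverse filter A(z) with zero initial state. Minimizing the energy of r is therefore the same as minimizing the energy of the prediction residual.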

Finally, we compute an estimate of the periodicity and pitch period for the segment and apply that to the basis functions as well. Each column of L is now a series of "colored" pulses, spaced one pitch period apart, each with the impulse response of the LPC filter, and each with decreasing amplitude, based on the estimated periodicity index. For simplicity, we still call this final basis matrix L. The representation on this new basis is not x any more, so let's call it r. We now have

\[
\begin{bmatrix} z_2^{n-1} \\ z_2^{n+1} \end{bmatrix} - G H \check{x} = G H L r^{n}, \tag{3.13}
\]

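which is then solved by the pseudo inverse of GH L, that is,

\[
r^{n} = (G H L)^{\dagger} \left( \begin{bmatrix} z_2^{n-1} \\ z_2^{n+1} \end{bmatrix} - G H \check{x} \right), \tag{3.14}
\]

where † denotes the pseudo inverse. Note that this is the solution that minimizes the energy of the LPC residual of x, as we wanted. The final solution for x is obtained by simply computing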

xⁿ = L rⁿ + x̌. (3.15)
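The final basis L used in (3.13) through (3.15) can be sketched as follows. The geometric decay of the pulse amplitudes (`periodicity ** m` for the m-th repetition) is an assumption, since the text only says the amplitude decreases according to the estimated periodicity index; the predictor convention and function names are likewise hypothetical.

```python
def lpc_impulse_response(lpc, length):
    """Impulse response of the LPC synthesis filter 1/A(z).

    Assumed convention: x[n] = e[n] + a_1 * x[n-1] + ... + a_p * x[n-p].
    """
    h = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0  # unit impulse as excitation
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * h[n - k]
        h.append(acc)
    return h


def pitch_basis(lpc, size, pitch, periodicity):
    """Build L as a list of rows (size x size).

    Column j holds LPC-shaped pulses at j, j + pitch, j + 2 * pitch, ...,
    the m-th repetition being scaled by periodicity ** m (assumed decay).
    """
    if pitch < 1:
        raise ValueError("pitch period must be at least one sample")
    h = lpc_impulse_response(lpc, size)
    basis = [[0.0] * size for _ in range(size)]
    for j in range(size):
        start, gain = j, 1.0
        while start < size:
            for n in range(start, size):
                basis[n][j] += gain * h[n - start]
            start += pitch
            gain *= periodicity
    return basis
```

Given a coefficient vector r, the concealed segment of (3.15) is then the product of this matrix with r plus the no-excitation response x̌. With `periodicity = 0` the construction reduces to the plain LPC basis of time-shifted impulse responses.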

Figure 3.5 shows a sample of the results obtained by the concealment algorithm. The first signal is the original, the second is the signal reconstructed using the proposed technique, and the third is the result of concealment by a pitch replication method. In both cases every third packet is lost.

FIGURE 3.5: Sample results. (a) Original signal. (b) Concealed using the partial information method, after losing every third frame. (c) Concealed using the pitch replication method.

In this section, we presented an error concealment technique that exploits the partial information available for the missing segment of a signal encoded by an overlapped transform. The discussion was centered around a speech codec, simply because speech is of foremost importance for real-time communication. Nevertheless, the same principle can be applied to other overlapped transform codecs.

In particular, the same ideas apply to error concealment in music, as long as we remove the conditions relating to pitch and introduce a higher order model to account for the harmonic nature of music.

From the document MULTIMEDIA OVER IP AND WIRELESS NETWORKS (pages 89-93)