
Interactive Error Control

From the document MULTIMEDIA OVER IP AND WIRELESS NETWORKS (pages 66-72)

FIGURE 2.20: H.264/AVC video encoder with selectable encoding parameters highlighted

2.5.4 Interactive Error Control

The availability of a feedback channel, especially for conversational applications, has led to various standardization and research activities in recent years that include this feedback in the video encoding process. In the previous scenario, only the statistics of the channel process $\hat{C}$ were known to the encoder. With timely feedback, we can additionally assume that a $\delta$-frame delayed version $C_{t-\delta}$ of the loss process experienced at the receiver is known at the encoder. This information can be conveyed from the decoder to the encoder by sending acknowledgment messages for correctly received data units, negative acknowledgment messages for missing slices, or both types of messages.

In less time-critical applications, such as streaming or downloading, the encoder could obviously decide to retransmit lost data units, provided it has stored a backup of the data unit at the transmitter. However, in low-delay applications the retransmitted data units, especially in end-to-end connections, would in general arrive too late to be useful at the decoder. In the case of online encoding, the observed and possibly delayed receiver channel realization, $C_{t-\delta}$, can still be useful to the encoder, although the erroneous frame has already been decoded and concealed at the decoder. The basic goal of these approaches is to reduce, limit, or even completely avoid error propagation by integrating the decoder state information into the encoding process.

The exploitation of the observed channel at the encoder was introduced in [41] and [12] under the name Error Tracking for standards such as MPEG-2, H.261, or H.263 version 1, but was limited by the reduced syntax capabilities of these video standards. When the encoder learns that a certain data unit (typically including the coded representation of several or all MBs of a certain frame $s_{t-\delta}$) has not been received correctly at the decoder, it attempts to track the error to obtain an estimate of the decoded frame $\hat{s}_{t-1}$, which serves as reference for the frame to be encoded, $s_t$. Appropriate actions after having tracked the error are discussed in [12,41,53,56,61]. However, all these concepts have in common that error propagation in frame $\hat{s}_t$ is only removed if frames $\hat{s}_{t-\delta+1}, \ldots, \hat{s}_{t-1}$ have been received at the decoder without any error.

Nevertheless, the promising performance obtained when exploiting decoder state information at the encoder has been recognized by standardization bodies, and the problem of continuing error propagation has been addressed by extending the syntax of existing standards. In MPEG-4 [17, version 2] a tool to stop temporal error propagation has been introduced under the acronym New Prediction (NEWPRED) [10,27,50]. Similarly, H.263+ Annex N [18, Annex N] specifies Reference Picture Selection (RPS) for each Group-of-Blocks (GOB). If combined with the slice structured mode specified in H.263+ Annex K [18, Annex K], as well as Independent Segment Decoding (ISD) as specified in H.263+ Annex R [18, Annex R], the same NEWPRED techniques can be applied within the H.263 codec family.

NEWPRED relies on the availability of timely feedback, online encoding, and the possibility for the encoder to choose reference frames other than the temporally preceding ones. In addition, it allows one to completely eliminate error propagation in frame $\hat{s}_t$ even if additional errors have occurred during the transmission of frames $\hat{s}_{t-\delta+1}, \ldots, \hat{s}_{t-1}$. Different encoder operation modes have been discussed in the literature [10]. They can basically be divided into a mode in which only acknowledged areas are used for reference and another mode in which the operation is only altered when information is received that the decoder is missing some data units.

In H.263++ Annex U [18, Annex U], NEWPRED was introduced exclusively for the purpose of improving error resilience. In H.264/AVC, the extended syntax allowing selection of reference frames on an MB or even sub-MB basis has a dual impact: enhanced compression efficiency and, at the same time, ease of incorporating methods for limiting error propagation [61]. In the following we introduce conceptual operation modes that combine decoder state information with the encoding process.

To this end, we assume that at the encoder each generated data unit $P_i$ is assigned a decoder state $C_{\mathrm{enc},i} \in \{\mathrm{ACK}, \mathrm{NAK}, \mathrm{OAK}\}$, whereby $C_{\mathrm{enc},i} = \mathrm{ACK}$ reflects that data unit $P_i$ is known to be correctly received at the decoder, $C_{\mathrm{enc},i} = \mathrm{NAK}$ reflects that data unit $P_i$ is known to be missing at the decoder, and $C_{\mathrm{enc},i} = \mathrm{OAK}$ reflects that for data unit $P_i$ the acknowledgment message is still outstanding and it is not known whether this data unit will be received correctly.

With feedback messages conveying the observed channel state at the receiver, that is, $B(C_t) = C_t$, and a back channel that delays the back channel messages by $\delta$ frames, we assume in the remainder that for the encoding of $s_t$, the encoder is aware of the following information:
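One way to picture this encoder-side knowledge is the following minimal Python sketch. It is an illustration under stated assumptions, not the expression from the text: each frame is treated as a single data unit, the function name `encoder_view` is hypothetical, and the convention that feedback is available for frames with index below t - δ while frame t is being encoded is an assumption, chosen so that δ = 0 means the status of the previous frame is already known.

```python
from enum import Enum


class State(Enum):
    ACK = "ACK"  # known to be correctly received at the decoder
    NAK = "NAK"  # known to be missing at the decoder
    OAK = "OAK"  # acknowledgment still outstanding


def encoder_view(received, t, delta):
    """Decoder state per frame as seen by the encoder while coding frame t.

    received[i] is True if frame i reached the decoder (assumption: one data
    unit per frame). Feedback is assumed to be available only for frames
    with index i < t - delta; later frames are still outstanding.
    """
    view = {}
    for i in range(t):
        if i < t - delta:
            view[i] = State.ACK if received[i] else State.NAK
        else:
            view[i] = State.OAK
    return view


if __name__ == "__main__":
    # Frame 1 was lost; with delta = 2 the encoder knows frames 0..2 at t = 5.
    for frame, state in encoder_view([True, False, True, True, True], 5, 2).items():
        print(frame, state.value)
```

With a larger `delta`, more of the most recent frames stay in the OAK state, which is what makes the choice of reference area in the feedback modes below non-trivial.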

This information about the decoder state $C_{\mathrm{enc},i}$ can be integrated in a modified rate-distortion optimized operational encoder control similar to what has been discussed in Subsection 2.5.2. In this case the MB mode $m_b$ is selected from a modified set of options, $\hat{O}$, with a modified distortion $\hat{d}_{b,m}$ for each selected option $m$ as

$$\forall b: \quad m_b = \arg\min_{m \in \hat{O}} \left( \hat{d}_{b,m} + \lambda_{\hat{O}}\, r_{b,m} \right). \tag{2.9}$$

In the following we distinguish four different operation modes, which differ only by the set of coding options available to the encoder in the encoding process, $\hat{O}$, as well as the applied distortion metric, $\hat{d}_{b,m}$. The encoder's reaction to delayed positive acknowledgment (ACK) and negative acknowledgment (NAK) messages is shown in Figure 2.21 for three different feedback modes, assuming that frame $d$ is lost and the feedback delay is $\delta = 2$ frames.
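The Lagrangian selection rule in (2.9) can be sketched in a few lines of Python. This is a toy illustration only: the function name `select_mode`, the mode labels, and the distortion and rate numbers are hypothetical, and a real encoder would obtain them by actually coding each macroblock.

```python
def select_mode(options, lam):
    """Return the option m that minimises d_hat + lam * r.

    options maps a mode label to a (distortion, rate) pair; the labels and
    values used below are made up for illustration.
    """
    return min(options, key=lambda m: options[m][0] + lam * options[m][1])


if __name__ == "__main__":
    candidates = {
        "intra": (10.0, 100.0),
        "inter_ref1": (12.0, 40.0),
        "skip": (50.0, 1.0),
    }
    for lam in (0.0, 0.1, 1.0):
        print(lam, select_mode(candidates, lam))
```

Restricting the feedback modes to a smaller option set then amounts to passing only the permitted entries of `candidates` to the same function.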

Feedback Mode 1: Acknowledged Reference Area Only

Figure 2.21a shows this operation mode: only the decoded representation of data units $P_i$ that have been positively acknowledged at the encoder, that is, $C_{\mathrm{enc},i} = \mathrm{ACK}$, is allowed to be referenced in the encoding process. In the context of operational encoder control, this is formalized by applying the encoding distortion in (2.9), that is, $\hat{d}_{b,m} = d_{b,m}$, together with a set of encoding options that is restricted to acknowledged areas only, that is, $\hat{O} = O_{\mathrm{ACK},t}$. Note that the restricted option set $O_{\mathrm{ACK},t}$ depends on the frame to be encoded and is applied to the motion estimation and reference frame selection process. Obviously, if no reference area is available, the option set is restricted to intra modes only, and if no satisfying match is found in the accessible reference area, intra coding is applied.

With this mode in use, an error might still be visible in the presentation of a single frame; however, error propagation and reference frame mismatch are completely avoided.
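The reference restriction of feedback mode 1 can be illustrated with a short Python sketch. It works at frame granularity (an assumption; the text allows restriction per data unit or area), and the function names are hypothetical.

```python
def mode1_references(states, t, n_ref):
    """Frames in the window t-n_ref .. t-1 that are positively acknowledged."""
    window = range(max(0, t - n_ref), t)
    return [i for i in window if states.get(i) == "ACK"]


def mode1_coding_choice(states, t, n_ref):
    """Fall back to intra coding when no acknowledged reference exists."""
    refs = mode1_references(states, t, n_ref)
    return ("inter", refs) if refs else ("intra", [])


if __name__ == "__main__":
    states = {0: "ACK", 1: "NAK", 2: "ACK", 3: "OAK", 4: "OAK"}
    print(mode1_coding_choice(states, t=5, n_ref=5))
    print(mode1_coding_choice(states, t=5, n_ref=2))
```

In the second call the reference window only contains frames with outstanding acknowledgments, so the sketch falls back to intra coding, which mirrors the behaviour described above when no reference area is available.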

FIGURE 2.21: Operation of different feedback modes. (a) Feedback Mode 1. (b) Feedback Mode 2. (c) Feedback Mode 3.

Figure 2.22a shows the performance in terms of average Peak Signal-to-Noise Ratio (PSNR), denoted by $\overline{\mathrm{PSNR}}$, for feedback mode 1 with different feedback delays $\delta$ compared to the channel-adaptive mode selection scheme for foreman, error pattern 10 (as given in the test conditions specified in [58]), AEC, and $N_{\mathrm{MB/DU}} = 33$. The number of reference frames is $N_{\mathrm{ref}} = 5$, except for $\delta = 8$ with $N_{\mathrm{ref}} = 10$. The results show that for any delay this system with feedback outperforms the best system without any feedback. For small delays, the gains are significant, and for the same average PSNR the bit rate is less than 50% of that of the forward-only mode. With increasing delay the gains are reduced, but compared with the highly complex mode decision without feedback, this method is still very attractive. Obviously, these high-delay results are strongly sequence dependent, but similar results have been verified for other sequences.

FIGURE 2.22: Average PSNR ($\overline{\mathrm{PSNR}}$) versus bit rate for different feedback modes for sequence foreman. (a) Average PSNR versus bit rate for Feedback Mode 1. (b) Average PSNR versus bit rate for Feedback Mode 2 (solid lines), Feedback Mode 1 replotted for comparison (dashed lines). (c) Average PSNR versus bit rate for Feedback Mode 3 (solid lines), Feedback Mode 2 replotted for comparison (dashed lines).

Feedback Mode 2: Synchronized Reference Frames

Feedback mode 2, as shown in Figure 2.21b, differs from mode 1 in that not only positively acknowledged data units but also a concealed version of data units with decoder state $C_{\mathrm{enc},i} = \mathrm{NAK}$ are allowed to be referenced. This is formalized by applying the encoding distortion in (2.9), that is, $\hat{d}_{b,m} = d_{b,m}$, but the restricted reference area and the option set in this case also include concealed image parts, $\hat{O} = O_{\mathrm{NAK},t} \cup O_{\mathrm{ACK},t}$. The critical aspect of operating in this mode is that, for the reference frames to be synchronized, the encoder must apply exactly the same error concealment as the decoder.
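The synchronization requirement can be made concrete with a small Python sketch. It is a toy model under several assumptions: frames are short lists of sample values, the concealment simply repeats the previous decoded frame (a hypothetical choice, not the concealment used in the reported experiments), and every correctly received frame decodes to the encoder's reconstruction, which holds when only synchronized areas are referenced.

```python
def conceal(previous_frame):
    """Hypothetical concealment: repeat the previous decoded frame."""
    return list(previous_frame)


def decoder_buffer(sent_frames, received):
    """Frames the decoder ends up with, concealing every lost frame."""
    buf = []
    for frame, ok in zip(sent_frames, received):
        if ok:
            buf.append(list(frame))
        else:
            prev = buf[-1] if buf else [0] * len(frame)
            buf.append(conceal(prev))
    return buf


def encoder_synced_buffer(sent_frames, states):
    """Encoder-side copy of the decoder buffer, rebuilt from feedback.

    Frames with outstanding acknowledgment (OAK) are not referenced and are
    stored as None.
    """
    buf = []
    for frame, state in zip(sent_frames, states):
        if state == "ACK":
            buf.append(list(frame))
        elif state == "NAK":
            prev = buf[-1] if buf and buf[-1] is not None else [0] * len(frame)
            buf.append(conceal(prev))  # same concealment as the decoder
        else:
            buf.append(None)
    return buf


if __name__ == "__main__":
    sent = [[1, 1], [2, 2], [3, 3], [4, 4]]
    dec = decoder_buffer(sent, [True, False, True, True])
    enc = encoder_synced_buffer(sent, ["ACK", "NAK", "ACK", "OAK"])
    print(dec[:3] == enc[:3])
```

If `conceal` differed between the two sides, the concealed frame in the encoder buffer would no longer match the decoder, and any prediction from it would reintroduce mismatch.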

Figure 2.22b shows the performance in terms of average PSNR, denoted as $\overline{\mathrm{PSNR}}$, for feedback mode 2 with different feedback delays $\delta$ compared to the curves in Figure 2.22a for the same parameters. The results for feedback mode 2 are similar to those for feedback mode 1. However, the advantage of feedback mode 2 can be seen in two cases: for low bit rates and for delays $\delta < N_{\mathrm{ref}} - 1$. This is because referencing concealed areas is preferred over intra coding by the rate-distortion optimization. For higher bit rates this advantage vanishes, as the intra mode is preferred anyway over the selection of "bad" reference areas. For delay $\delta = 4$ with $N_{\mathrm{ref}} = 5$, that is, when only a single reference frame is available at the encoder, the gains of feedback mode 2 are more obvious, since in feedback mode 1 a lost slice basically forces the encoder to use intra coding.

Feedback Mode 3: Regular Prediction with Limited Error Propagation

Feedback modes 1 and 2 are mainly suitable in cases of higher loss rates. If the loss rates are low or negligible, the performance is significantly degraded by the longer prediction chains due to the feedback delay. Therefore, in feedback mode 3, as shown in Figure 2.21c, it is proposed to alter the prediction in the encoder only when a NAK is received. Again, the encoding distortion in (2.9) is applied, that is, $\hat{d}_{b,m} = d_{b,m}$, but the reference area and the option set are altered only when a NAK is received. They are then restricted to already acknowledged image parts, that is, $\hat{O} = O_{\mathrm{ACK},t}$, or, as applied in our case, to acknowledged and concealed image parts, $\hat{O} = O_{\mathrm{NAK},t}$. Areas that are possibly corrupted by error propagation are also excluded as references. This mode obviously performs well in cases of lower error rates. However, for higher error rates error propagation still occurs quite frequently.
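The reference choice in feedback mode 3 can be sketched as follows in Python. This is a simplified frame-level illustration of the first variant (references restricted to non-lost, untainted frames); the function name and the bookkeeping through `ref_of`, which records the frame each frame was predicted from, are assumptions made for the example.

```python
def mode3_reference(states, ref_of, t, n_ref):
    """Pick the reference frame for frame t, or None for intra coding.

    states[i] is "ACK", "NAK", or "OAK"; ref_of[i] is the frame that frame i
    was predicted from (None for an intra frame). A frame is treated as
    possibly corrupted if it was predicted from a lost or corrupted frame.
    """
    tainted = set()
    for i in range(t):  # frames in coding order
        parent = ref_of[i]
        if parent is not None and (states[parent] == "NAK" or parent in tainted):
            tainted.add(i)
    window = range(max(0, t - n_ref), t)
    clean = [i for i in window if states[i] != "NAK" and i not in tainted]
    return max(clean) if clean else None


if __name__ == "__main__":
    # No NAK known: regular prediction from the previous frame.
    print(mode3_reference(["ACK", "ACK", "OAK", "OAK"], [None, 0, 1, 2], 4, 5))
    # Frame 2 reported lost: frames 3 and 4 were predicted along the broken chain.
    print(mode3_reference(["ACK", "ACK", "NAK", "OAK", "OAK"], [None, 0, 1, 2, 3], 5, 5))
```

As long as no NAK is known, the sketch returns the immediately preceding frame, so compression efficiency is unaffected; once a loss is reported, it jumps back to the most recent frame that is neither lost nor predicted from a lost frame.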

Figure 2.22c shows the performance in terms of average PSNR for feedback mode 3 compared to channel-adaptive mode selection and feedback mode 2, again for the same parameters as in Figure 2.22a. Note that feedback mode 2 and feedback mode 3 are identical for zero feedback delay. Surprisingly, however, for increasing delay, feedback mode 3 performs significantly worse than feedback mode 2. The error propagation, though only present for at most $\delta - 1$ frames, degrades the overall quality much more significantly; the gain in compression efficiency cannot compensate for the distortion due to packet losses. Obviously, the performance depends on the sequence characteristics and especially on the loss rate. For lower loss rates it is expected (and shown later) that the differences between feedback modes 2 and 3 are less significant, but in general feedback mode 2 is also preferable to feedback mode 3 in terms of subjective performance.

Feedback Mode 4: Unrestricted Reference Areas with Expected Distortion Update

For completeness we present an even more powerful feedback mode, which extends feedback mode 3 to address error propagation with more intra updates. We also discuss its drawbacks and justify why it is hardly used. In [61] and [67] techniques have been proposed that combine the error-resilient mode selection with available decoder state information in the encoder. In this case the set of encoding options is not altered, that is, $\hat{O} = O$, but only the computation of the distortion is altered. The randomness of the observed channel state is considered only for data units with outstanding acknowledgment at the encoder, that is, $C_{\mathrm{enc},i} = \mathrm{OAK}$; for all other data units the observed channel state is no longer random. The expected distortion in this case is computed as

dˆb,m=

Compared to feedback modes 1 and 2, this method is especially beneficial if the feedback is significantly delayed. Compared to feedback mode 3, it reduces the unsatisfying performance in case of error propagation. Note that for $\delta \to \infty$ this mode turns into the mode selection without any feedback at all, and for $\delta = 0$ this mode is identical to feedback mode 2 and feedback mode 3. However, whenever the encoder gets information on the state of a certain data unit at the decoder, the statistics in the encoder have to be recomputed. Thus, the computational, storage, and implementation complexities are significantly increased [67].
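The idea that only outstanding data units are treated as random can be illustrated with a deliberately simplified Python sketch. It is not the expected-distortion formula of the text: it assumes a single referenced data unit, an independent loss probability `p_loss`, and two hypothetical distortion values, one for the case that the reference arrived and one for the case that it was lost and concealed.

```python
def expected_distortion(state, d_received, d_lost, p_loss):
    """Distortion estimate for one option that references one data unit.

    ACK and NAK are deterministic; only OAK is averaged over the assumed
    loss probability p_loss. All inputs are illustrative assumptions.
    """
    if state == "ACK":
        return d_received
    if state == "NAK":
        return d_lost
    return (1.0 - p_loss) * d_received + p_loss * d_lost


if __name__ == "__main__":
    for state in ("ACK", "NAK", "OAK"):
        print(state, expected_distortion(state, 10.0, 100.0, 0.1))
```

In this toy model, every incoming ACK or NAK replaces an averaged value with a deterministic one, which hints at why the encoder statistics have to be recomputed whenever feedback arrives.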
