FIGURE 4.4: Comparison of service scalability and overall network utilization when serving multiple clients

4.4 CODING TECHNIQUES FOR BANDWIDTH ADAPTATION

Previous sections have discussed general bandwidth adaptation architectures under the assumption that a mechanism would be available to adjust the number of bits transmitted to represent multimedia sources. In this section we provide an overview of coding techniques that can be used in practice to adjust the coding rate of transmitted multimedia sources.

Many criteria can be used to compare different coding techniques. Since their primary goal is to enable representation of the sources at different rate levels, a key concern is what reproduction quality is achievable at each of those rate levels. Thus, as for all coding techniques, it will be important to know the rate–distortion (RD) characteristic of each possible operating point.

In addition, there are other criteria that are specific to bandwidth adaptation scenarios.

First, it will be useful to provide as many rate operating points as possible, so that fine-grain adaptation is possible. Generally speaking, finer adaptation granularity will come at the cost of increased distortion for a given rate.

Second, some coding techniques will only allow adaptation to take place at the encoder, while others will enable adaptation anywhere in the network. The latter model will typically also lead to some RD inefficiency.

Finally, adaptation granularity can be evaluated not only in terms of achievable rate points, but also in terms of temporal constraints. In some applications it may be desirable to adjust the rate of individual temporal components (e.g., frames in a video sequence), which again may come at the cost of reduced RD performance.

4.4.1 Rate Control

Rate control techniques are used during the encoding process. They rely on adjusting multiple coding parameters to meet a target encoding rate. We focus here on rate control techniques for video, since variable bit rate encoding techniques (which tend to make rate control more challenging) are not as popular in audio and speech coding.

In the case of video, when the same coding parameters (e.g., quantization step size, prediction mode) are used throughout a video session, the number of bits per frame will change depending on the video content so that the output bit rate will vary from frame to frame. Thus, when video content is “easy” to encode (e.g., low motion and low complexity scenes) and a given quantization selection is chosen, the rate will tend to be lower than if the same combination of quantizers was used for a more complex scene. Even though the encoder and decoder buffers can help smooth the (short term) variations in the rate per frame, a rate-control algorithm is usually needed in order to allocate bits among all coding units (e.g., frame, macroblock, or others) to maximize the end quality subject to the rate constraint.
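As a rough illustration of this kind of bit allocation, the sketch below adjusts the QP from frame to frame based on the fullness of a virtual encoder buffer; the update rule, the thresholds, and the `encode_frame` callback are illustrative assumptions rather than any standard's rate control algorithm.

```python
# Minimal sketch of frame-level rate control driven by encoder buffer fullness.
# All constants (thresholds, QP step, H.264-style QP range) are illustrative
# assumptions, not taken from any particular standard or reference encoder.

def next_qp(qp, buffer_bits, buffer_size, qp_min=1, qp_max=51):
    """Raise QP when the buffer is filling up, lower it when it drains."""
    fullness = buffer_bits / buffer_size
    if fullness > 0.8:        # buffer close to overflow -> coarser quantization
        qp += 2
    elif fullness < 0.2:      # buffer close to underflow -> finer quantization
        qp -= 2
    return max(qp_min, min(qp_max, qp))

def encode_sequence(frames, encode_frame, channel_rate, frame_rate,
                    buffer_size, qp_init=30):
    """Encode frames one by one while tracking a virtual encoder buffer.

    `encode_frame(frame, qp)` is a hypothetical encoder call assumed to return
    the number of bits produced for that frame at the given QP.
    """
    qp, buffer_bits = qp_init, buffer_size / 2.0
    drain_per_frame = channel_rate / frame_rate      # bits removed per frame interval
    bits_used = []
    for frame in frames:
        bits = encode_frame(frame, qp)
        bits_used.append(bits)
        buffer_bits = max(0.0, buffer_bits + bits - drain_per_frame)
        qp = next_qp(qp, buffer_bits, buffer_size)
    return bits_used
```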

All major video coding standards provide mechanisms for flexible coding parameter selection, with the chosen parameters being communicated to the decoder as overhead. To illustrate the key concepts, here we concentrate on a hybrid video coding structure, which is an essential component of all major standards, and in particular on one based on block-based motion-compensated prediction and Discrete Cosine Transform (DCT) coding. In such a framework, a frame is divided into a number of macroblocks (MB), each containing a luminance block (of size 16×16) and two chrominance blocks (e.g., 8×8 Cb and 8×8 Cr).

A series of coding decisions have to be made in compressing each frame (a sketch of this decision space as plain data structures is given after the list):

1. Type of frame (e.g., I-, P-, or B-frame) to be chosen or whether the frame is to be skipped, that is, not encoded at all.

2. Mode to be used for each MB, for example, Intra, Inter, Skip, etc.

3. If an MB is coded in INTRA mode,

(a) What quantization step size (QP) should be used to code the DCT coefficients of each block?

(b) If intra prediction is allowed, for example, in H.264, how to perform intra prediction; that is, how to generate the reference block from the neighboring blocks in the same frame.

4. If an MB is coded in INTER mode,

(a) What motion compensation should be used, for example, with or without overlapping, reference frame selection, search range, and block size?

(b) How to code the residual frame, for example, which QP should be chosen?
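The sketch below expresses the decision space listed above as plain data structures; the field names and enumerations are illustrative assumptions and cover only the options mentioned in the list.

```python
# Sketch of the per-frame / per-macroblock decision space listed above, expressed
# as plain data structures. Field names and enumerations are illustrative
# assumptions; real codecs expose many more options.

from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple

class FrameType(Enum):
    I = "I"
    P = "P"
    B = "B"
    SKIPPED = "skipped"          # frame not encoded at all

class MBMode(Enum):
    INTRA = "intra"
    INTER = "inter"
    SKIP = "skip"

@dataclass
class MacroblockDecision:
    mode: MBMode
    qp: int                                  # quantization step size for the DCT coefficients
    intra_pred_mode: Optional[int] = None    # e.g., H.264-style intra prediction direction
    reference_frame: Optional[int] = None    # inter mode: which reference frame to predict from
    motion_vector: Optional[Tuple[int, int]] = None
    block_size: Optional[Tuple[int, int]] = None   # motion partition, e.g., (16, 16)

@dataclass
class FrameDecision:
    frame_type: FrameType
    macroblocks: Tuple[MacroblockDecision, ...] = ()
```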

The options just listed are by no means exhaustive; they are intended to serve as an illustration of the range of coding mode choices available in modern video coders. Note that as the number of possible modes increases, so does the complexity of the encoding process and the importance of selecting efficient rate control algorithms. In fact, one can attribute much of the substantial coding gains achieved by recent standards, such as H.264/MPEG-4 part 10 AVC [2], to the addition of several new coding modes combined with efficient mode decision tools based on RD criteria.

A very common approach to rate control is to modify the QP [29,65]. A large QP reduces the number of encoded bits at the expense of increased quantization error, and vice versa. However, changing only the QP while keeping the other coding modes constant may not achieve optimal performance. For example, coding in INTER mode is effective in most cases when changes in video content are due to the motion of objects in the scene. Instead, INTRA mode may be more appropriate in situations where there is a significant difference between coded and reference images, such as uncovered regions (part of the scene is uncovered by a moving object) or lighting changes. However, the optimal selection of INTER/INTRA coding for a given block may in fact be different at different QPs. More general rate-control algorithms should therefore optimize other coding parameters as well, such as frame rate, coding modes for each frame and MB, and motion estimation methods [13,24,76].

Each combination of these coding parameters results in a different trade-off between rate and distortion. Efficient parameter settings are therefore those chosen based on rate–distortion optimized techniques. The typical problem formulation seeks to select the coding parameters that minimize the distortion under constraints on the rate (usually the average bit rate over a short interval).
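Written out with generic symbols (the notation here is ours, not the chapter's), the formulation over coding units $i$ with candidate parameter sets $p_i$ is

$$\min_{\{p_i\}} \sum_i D_i(p_i) \quad \text{subject to} \quad \sum_i R_i(p_i) \le R_{\text{budget}},$$

which the Lagrangian approach mentioned below relaxes to minimizing $J_i(p_i) = D_i(p_i) + \lambda R_i(p_i)$ independently for each unit, with the multiplier $\lambda \ge 0$ adjusted until the rate budget is met.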

Many solutions have been proposed, with some based on heuristic approaches and others following well-known techniques such as Lagrangian optimization or dynamic programming. More details on this topic can be found in [53,65] and references therein.
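As a concrete (if simplified) instance of Lagrangian optimization, the following sketch picks, for each coding unit, the candidate parameter set minimizing D + λR; the `evaluate` callback and the candidate list are hypothetical placeholders.

```python
# Sketch of Lagrangian RD-optimized parameter selection for one coding unit.
# `evaluate(unit, params)` is a hypothetical function returning (rate_bits, distortion)
# for coding `unit` with `params`; candidates might be (mode, QP) pairs, for example.

def rd_optimal_choice(unit, candidates, evaluate, lam):
    """Return the candidate minimizing the Lagrangian cost J = D + lambda * R."""
    best_params, best_cost = None, float("inf")
    for params in candidates:
        rate, distortion = evaluate(unit, params)
        cost = distortion + lam * rate
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params
```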

The computation involved in the optimization approach mainly includes two parts: (1) collection of rate–distortion data, which may require actually coding the source with all the different parameter settings, and (2) the optimization algorithm itself. Both parts can be computationally intensive, but often the data collection itself represents the bulk of the complexity, which has led to the development of numerous approaches to model the R–D characteristics of multimedia data [20,27,28,43]. Two main types of modeling approaches have been reviewed in [28].

One class of techniques [27] involves defining models for both the coding system and the source so that R–D functions can be estimated before actually compressing the source. The modeling accuracy depends on how robustly the R–D model handles different source characteristics. The second class of techniques requires actually coding the source several times and then processing the observed R–D data to obtain a complete R–D curve. Examples include the estimation algorithms proposed in [20,43]. These approaches are usually more computationally intensive, as well as more accurate, since they estimate the parameters from the actual coding results of the corresponding source.
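As an illustration of the second class of techniques, the sketch below conceptually codes the source at a few quantizer settings and fits a simple parametric rate model to the observed points; the quadratic-in-1/Q form and the sample numbers are assumptions chosen for illustration, not taken from the cited references.

```python
# Sketch: code the source at a few parameter settings, then fit a parametric
# model to the observed (Q, rate) points so that the rate at other quantizer
# values can be predicted. The quadratic-in-1/Q form is one common modeling
# choice, used here purely as an example.

import numpy as np

def fit_rate_model(qs, rates):
    """Least-squares fit of R(Q) ~ a/Q + b/Q^2 to measured (Q, rate) pairs."""
    qs, rates = np.asarray(qs, dtype=float), np.asarray(rates, dtype=float)
    A = np.column_stack([1.0 / qs, 1.0 / qs**2])
    (a, b), *_ = np.linalg.lstsq(A, rates, rcond=None)
    return a, b

def predict_rate(q, a, b):
    return a / q + b / q**2

# Example: three trial encodings at step sizes 10, 20, 40 (rates in kbit, made up).
a, b = fit_rate_model([10, 20, 40], [900.0, 430.0, 210.0])
print(predict_rate(30, a, b))   # predicted rate at an unseen quantizer setting
```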

In summary, the choice of an appropriate rate-control algorithm depends on the multimedia application, especially on whether it is delay constrained. For instance, a computationally demanding approach can be used for off-line coding, whereas heuristic approaches may be more practical for live, online multimedia communications.

4.4.2 Transcoding

The term “media transcoding” is normally used to describe techniques where a compressed media bit stream in one format is converted into another format. It is often used at either the server or the proxy when the source is only available as a pre-encoded stream, so as to match limitations in transmission, storage, processing, or display capabilities of specific networks, terminals, or display devices. Transcoding is one of the key technologies for end-to-end compatibility of two or more networks or systems operating with different characteristics and constraints.

Because the transcoder takes as input a compressed media stream, the decoded quality of the transcoder output is limited by that of the input stream, which has already incurred some information loss relative to the original source. However, the transcoder has access to all the coding parameters and statistics, which can be easily extracted from the input stream. This information can be used not only to reduce the transcoding complexity, but also to improve the quality of the transcoded stream through a rate–distortion optimization algorithm.

A typical application of transcoding is to adapt the bit rate of a precompressed video stream to a reduced channel bandwidth. Clearly, we can first reconstruct video back to the pixel domain by decoding the input compressed bit stream and then re-encode the decoded video to meet the target bit rate. The rate control techniques described earlier can then be used at the encoding stage. However, the whole process (decoding and encoding) is very computationally expensive, and more efficient techniques have been developed that reuse information contained in the original input bit stream.
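A straightforward cascaded pixel-domain transcoder, written with hypothetical `decode_stream` and `encode_stream` helpers, would look as follows; it is shown only to make the baseline explicit.

```python
# Cascaded pixel-domain transcoding: fully decode, then re-encode at the new target
# rate, reusing the rate control machinery from Section 4.4.1. `decode_stream` and
# `encode_stream` are hypothetical helpers standing in for a full decoder/encoder.

def cascaded_transcode(input_bitstream, target_bitrate, decode_stream, encode_stream):
    frames = decode_stream(input_bitstream)                       # back to the pixel domain
    return encode_stream(frames, target_bitrate=target_bitrate)   # rate-controlled re-encode
```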

The main drawback of these more efficient transcoding techniques is the drift problem (which will also arise in some of the other coding techniques introduced in this chapter). Drift is created if the reference frame used for motion compensation at the encoder is different from that used at the decoder. This happens, for example, when the transcoder simply requantizes the residual DCT coefficients with a larger QP to reduce the output bit rate. When a decoder receives the transcoded bit stream, it reconstructs the frame at a reduced quality and stores it into the frame buffer. If this frame is used as prediction for future frames, the mismatch error is added to the residual of the predicted frame, leading to a degraded quality for all the following frames until the next I frame.
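The toy simulation below (scalar "frames" and a made-up requantization error, purely for illustration) shows how this mismatch accumulates over predicted frames and resets at each I frame.

```python
# Toy illustration of drift: the transcoder requantizes residuals, so the decoder's
# reference frames differ from those the original encoder used for prediction, and
# the mismatch accumulates over predicted frames until an I frame resets it.
# All numbers are made up; "frames" are scalars rather than images.

def simulate_drift(n_frames, requant_error=1.0, gop_size=8):
    encoder_ref, decoder_ref = 0.0, 0.0
    drift = []
    for i in range(n_frames):
        if i % gop_size == 0:                  # I frame: no prediction, drift resets
            encoder_ref = float(i)
            decoder_ref = encoder_ref + requant_error
        else:                                  # P frame: predicted from the previous reference
            residual = 1.0                     # made-up residual w.r.t. the encoder reference
            encoder_ref = encoder_ref + residual
            # decoder adds the requantized residual to its own, already degraded, reference
            decoder_ref = decoder_ref + residual + requant_error
        drift.append(abs(decoder_ref - encoder_ref))
    return drift

print(simulate_drift(16))   # drift grows within each GOP and resets at I frames
```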

Based on the trade-off between complexity and coding quality, we briefly describe two basic transcoding architectures, namely, open-loop and closed-loop transcoders.

Figure 4.5a shows an open-loop architecture based on a requantization approach [51]. The bit stream is dequantized and then requantized to match the target bit rate.

FIGURE 4.5: (a) Open-loop transcoder: VLD, dequantization (Q1)^-1, requantization Q2, VLC. (b) Closed-loop transcoder: the same requantization path extended with a drift-compensation loop built from dequantization (Q2)^-1, IDCT, frame memory, motion compensation, and DCT stages.
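A minimal sketch of the open-loop path of Figure 4.5a is given below; uniform scalar quantization is assumed for simplicity, and the entropy coding (VLD/VLC) stages are omitted.

```python
# Minimal sketch of open-loop requantization (Figure 4.5a): dequantize coefficient
# levels with the original step size Q1, then requantize with a coarser step size Q2.
# Uniform scalar quantization is assumed for simplicity; entropy decoding/coding
# (VLD/VLC) around this step is left out.

def requantize_coefficients(levels, q1, q2):
    """Map quantization levels produced with step q1 to levels for a coarser step q2."""
    out = []
    for level in levels:
        coeff = level * q1               # dequantize: (Q1)^-1
        out.append(round(coeff / q2))    # requantize with the coarser step Q2
    return out

# Example: coefficient levels of one block, originally quantized with Q1 = 4,
# requantized with Q2 = 10 (numbers are made up).
print(requantize_coefficients([12, -5, 3, 0, 1], q1=4, q2=10))
```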
