Spatial Downsizing impact in the Transrating tradeoff for content/context awareness in media network


Fig. 1: Generic Network Adaptation Framework


Willy Aubry¹²³, Daniel Négru¹, Bertrand Le Gal² and Dominique Dallet²


¹ University of Bordeaux, LaBRI Laboratory, CNRS UMR 5800, IPB, Talence, France

² University of Bordeaux, IMS Laboratory, CNRS UMR 5218, IPB, Talence, France

³ Viotech Communications, Montigny-le-Bretonneux, France

Abstract— Bitrate control in video content is one of the key features of the emerging content/context aware network.

Within the content/context-aware network framework, network devices are able to adapt video content to fit network constraints (e.g. bandwidth) and terminal constraints (e.g. screen size). However, reducing the spatial resolution of a video, which is a video-to-terminal adaptation, also lowers its bitrate, which is a video-to-network adaptation. In this paper, we propose a method to evaluate the impact of spatial downsizing as a technique to reduce the video bitrate. We study the tradeoff between traditional bitrate adaptation and spatial downsizing through various use cases. Each use case is emulated experimentally, and quality is evaluated using the SSIM and PSNR metrics.

Keywords: Video quality, Spatial Downsizing, Bitrate

I. INTRODUCTION

Nowadays, the Internet is widespread and network traffic is growing steadily. Network fundamentals are well understood, and research activities now focus on content- and context-awareness for next generation networks. This line of research aims to adapt data streams depending on their content (text, video, music, etc.) and their context (user, terminal, bandwidth, etc.). The main point of interest is video, which is among the most consumed and the most bandwidth-consuming types of data on the Internet.

Hence, video deeply impacts the global network. In this area, two major trends have appeared:

1. Protocols have been created to fetch and deliver the stream best suited to the bandwidth and terminal characteristics [1]-[3];

2. Algorithms have been proposed to adapt a video stream by changing its codec, spatial/temporal resolution and/or bitrate [4], [5].

These two trends are merging to deliver the best Quality of Experience (QoE) to the end user. We propose to address this issue by using a network device as shown in Fig. 1. The main objective of such a system is to embed a video adaptation engine in an external device that has network monitoring capabilities. This device is able to detect the video content and adapt it to the user's context (network load, terminal characteristics, etc.), making it a content- and context-aware network device. This system offers video distribution that is seamless for both the consumer and the provider. Indeed, the consumer can access a video stream based only on its content, without worrying about the terminal's ability to play it, and the content provider does not have to take context parameters into account when asked for a video stream. This feature is achieved by embedding video adaptation capabilities in network devices.

For scalability purposes, this platform should be implemented in last-hop devices, which have better and quicker knowledge of the end user's context. However, those devices are mainly gateways with network switching responsibilities (such as 3G antennas or home gateways) and have low computation performance. Adding video adaptation capabilities to a network device leads to new considerations. Indeed, reducing the spatial resolution of a video also reduces its bitrate: because there are fewer pixels per frame, there is less information to encode per second. Once spatial downsizing is enabled along with bitrate adaptation in network devices, content- and context-aware next generation networks face the challenge of finding the optimal balance between the two reductions (bitrate and spatial resolution). The aim of content- and context-awareness is to achieve the best perceived video quality while fitting the context constraints (network bandwidth, terminal resolution, etc.). We propose a method to evaluate the effect of each reduction in order to find the best tradeoff between the two adaptations. This method is illustrated using algorithms with low computation complexity that will be implemented in a real-time video adaptation system¹. In Section II, existing approaches from the literature for content- and context-aware networks are presented. Those approaches raise one main issue: “which technique shall be used (and in which order) to optimize bitrate reduction?” A method to answer this question is proposed in Section III and illustrated with selected techniques. Results of a first evaluation are shown in Section IV. Conclusions and future work are presented in Section V.

2012 International Conference on Telecommunications and Multimedia (TEMU)

Table 1: Use Cases

Use Case   Source                Adaptation
1          Resolution, bitrate   none
2          none                  Resolution, bitrate
3          Bitrate               Resolution
4          Resolution            Bitrate

Fig. 2: Adaptation Device in streaming over http framework

Fig 3: Bitrate Adaptation

II. RELATED WORKS

The content- and context-aware network domain relies on offering or generating different versions of the desired video. Each version has a different set of characteristics reduced from the original video stream, which can be classified as:

• Reduced bitrate, in order to decrease network bandwidth usage [6]-[10];

• Reduced resolution, matched to the features of the video device in order to minimize its power consumption [11]-[13];

• Modified stream format (i.e. codec), converted to one supported by the embedded device [14]-[16].

A. Server/client solution

In a classic server/client framework, solutions have been proposed that target the protocol layer. Those solutions let the end user choose among the video versions stored on the server. The emerging field of streaming over http [1]-[3] relies on the following process. The video is cut into temporal samples. Each temporal sample is encoded into multiple files with different features (frame resolution, bitrate, etc.). The client device chooses video files depending on constraints such as terminal resolution and network bandwidth. If network conditions change, the client can request video files with different characteristics to change the video quality during playback.
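As an illustration of the client-side choice described above, the following minimal sketch selects one stored version for the next temporal sample. The `Representation` class, the `pick_representation` function and the selection policy (highest bitrate that fits both the measured bandwidth and the screen size) are assumptions made for this example; they are not defined by the protocols in [1]-[3].

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Representation:
    """One encoded version of a temporal sample (hypothetical model)."""
    width: int
    height: int
    bitrate_kbps: int


def pick_representation(
    available: Sequence[Representation],
    bandwidth_kbps: float,
    screen_width: int,
    screen_height: int,
) -> Optional[Representation]:
    """Return the highest-bitrate version that fits both constraints.

    Assumed policy: if no version fits, fall back to the lowest-bitrate
    one; return None only when the list is empty.
    """
    if not available:
        return None
    fitting = [
        r for r in available
        if r.bitrate_kbps <= bandwidth_kbps
        and r.width <= screen_width
        and r.height <= screen_height
    ]
    if fitting:
        # Prefer bitrate first, then pixel count as a tie-breaker.
        return max(fitting, key=lambda r: (r.bitrate_kbps, r.width * r.height))
    return min(available, key=lambda r: r.bitrate_kbps)
```

With this policy the tradeoff studied in this paper is hidden inside the ordering of the candidates: a reduced-resolution version and a reduced-bitrate version of equal bitrate are treated as equivalent, which is precisely what the evaluation method is meant to question.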

1 Work supported by the French project ARDMAHN within French National ARPEGE ANR Program http://www.ardmahn.org and by the European project ALICANTE within EU FP7 ICT, under grant agreement n° 248652. http://www.ict-alicante.eu.

Scalable Video Coding (SVC), defined in Annex G of the H.264/MPEG-4 AVC standard [17], encodes the video in layers: a base layer that stores the minimal version of the video, and several enhancement layers that add information to the base layer in order to enhance its spatial resolution, frame rate and/or bitrate. The client can then fetch the base layer and the enhancement layers that fit its context.

These solutions face two main issues. First, these standards are still emerging and are not widely supported by terminals. Second, the decision is left to the client, but how does the client choose among samples or layers?

B. Video adaptation empowered network devices

Other research has been conducted in a framework that uses an adaptation device located between the client and the server.

In this framework, the server holds only one version of a video, and a device is needed to adapt that video to various constraints. Adaptation devices are able to change video characteristics such as bitrate, frame rate, spatial resolution and codec. Research focuses on lowering the adaptation complexity while keeping the best resulting quality. Proposed algorithms are based on reusing information contained in the video stream [4], [5]:

• Frame rate adaptation is rarely used because the quality of experience is greatly degraded as soon as a few frames are dropped.

• Bitrate reduction is one of the most active research topics in video adaptation, and many algorithms have been proposed, such as raising the quantizer scale [6]-[8] or discarding high-frequency coefficients [9], in order to improve bandwidth control [10].


Fig 4: Bitrate Adaptation then Resize

Fig 5: Resize then Bitrate Adaptation

• Spatial downscaling, used to adapt the video stream to the terminal (mainly to its screen size), is one of the most complex processes in video adaptation. This adaptation has to deal with pixels [11], [12], metadata and motion vectors [13].

• Codec conversion [14]-[16] is the last possible adaptation process and is used to change the video codec to one supported by the terminal. Changing the codec also affects the bitrate, but its impact is closely tied to the codec employed. However, the terminal supports only a restricted list of codecs, so adapting the video to the terminal limits the choice to a few codecs that are not always the most efficient ones.
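The two bitrate reduction families cited in the list above, raising the quantizer scale and discarding high-frequency coefficients, can be sketched on a single block of quantized DCT levels. This is a simplified illustration assuming one uniform quantizer step per block (no quantization matrix, no dead zone); the function names are hypothetical and the code does not reproduce the algorithms of [6]-[9].

```python
import numpy as np


def requantize(levels: np.ndarray, q_in: int, q_out: int) -> np.ndarray:
    """Map quantized levels from step q_in to another step q_out.

    Assumption: uniform quantizer, so dequantization is a plain
    multiplication by the step.
    """
    if q_in <= 0 or q_out <= 0:
        raise ValueError("quantizer steps must be positive")
    reconstructed = levels.astype(np.int64) * q_in           # dequantize
    return np.rint(reconstructed / q_out).astype(np.int64)   # requantize


def discard_high_frequencies(levels: np.ndarray, keep: int) -> np.ndarray:
    """Zero every coefficient whose row + column index is >= keep."""
    rows, cols = np.indices(levels.shape)
    return np.where(rows + cols < keep, levels, 0)
```

Both functions lower the number of non-zero levels to be entropy coded, which is where the bitrate saving comes from; neither changes the number of blocks, unlike spatial downscaling.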

Using an adaptation device allows the video to be optimally adapted to its context, but this approach has two main drawbacks:

1. Such networks are yet to be widely deployed;

2. Open questions remain: when does the context require a bitrate reduction, and which adaptation should the device perform?

C. Merging both solutions

Both solutions complement each other and can be merged (Fig. 2). The adaptation device then plays two roles:

1. Converting from one protocol to another (if needed);

2. Adapting the video chunks to deliver optimal video chunks that may not be available on the server.

The question remains: ‘what adaptation is needed to fulfill the bandwidth requirement while keeping the highest quality of experience for the end user?’

III. ADAPTATION TRADEOFF EVALUATION METHOD

In order to properly evaluate the tradeoff between pure bitrate adaptation and spatial resolution downscaling, the different use cases that can occur must first be enumerated.

Content- and context-awareness needs to be activated when operating on congested networks. Use cases arise from the three network environments presented above.

A. Definition of the use cases

1) Server/client based network

The server uses a streaming over http technology or offers video in the H.264-SVC standard. To fit the bandwidth, two types of video (or layer combination) are to be considered: (1) one in full resolution and (2) one with reduced resolution. Which one should be chosen? This use case will be referred to as Use Case 1.

2) Adaptation device in the middle

The video arrives with its original resolution and bitrate and needs to be adapted to a lower bandwidth. Should the adaptation focus only on bitrate, or should it also involve spatial downscaling? This use case can be mapped to the previous one and will be referred to as Use Case 2.

3) Combined solution

In this situation, several use cases can arise depending on what is available on the server. In the first one, the server holds only versions with different bitrates but the same resolution (Use Case 3). The bitrate is selected on the server, whereas the adaptation device is in charge of the resolution scaling.

In the second one, the roles are reversed: the server holds only versions with different resolutions (Use Case 4). The resolution is selected on the server, whereas the adaptation device is in charge of fitting the bitrate to the bandwidth.

The four use cases are shown in Table 1. These use cases will be evaluated in order to find the best tradeoff among them.

B. Evaluation Process

1) Testbench

The first two use cases (1 and 2) tackle the issue of the adaptation order. They also raise the question of resizing efficiency in a bitrate reduction scenario. Thus, as a reference, bitrate adaptation without spatial resizing is evaluated (Fig 3) and compared to combined bitrate and spatial resolution adaptation (Fig 4 and Fig 5). The third use case is tested with two adaptation processes: the first performs bitrate adaptation and the second performs spatial resolution adaptation (Fig 4). The fourth use case also uses two adaptation processes: first a spatial downscaling and then a bitrate reduction (Fig 5).

Table 2: Video Selection

Video Name      Characteristics
Candidat        Portrait; slow motion
CITY            Slow traveling
CREW            Lots of slow movement
Harbour         Movement in background
Presentatrice   Still background, slow motion
SIMPSON         Various fast movement
SOCCER          Various moving objects
Tennis          Fast moving object on still background

Fig 6: PSNR results

Fig 7: SSIM results

In order to make the errors introduced by spatial resizing visible, a downscaled video is up-scaled before its quality is evaluated. This up-scaling emulates the step performed by the terminal when decoding and displaying the video.
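A minimal sketch of this round trip is given below, assuming a single-channel (luma) frame, box-filter downscaling and nearest-neighbour up-scaling by an integer factor. These filters and the function names are illustrative assumptions and are not the resizing process used in the experiments.

```python
import numpy as np


def downscale_box(frame: np.ndarray, factor: int) -> np.ndarray:
    """Average each factor x factor block of a 2-D frame."""
    h, w = frame.shape
    if h % factor or w % factor:
        raise ValueError("frame size must be a multiple of factor")
    # Axes 1 and 3 index the pixels inside each block.
    blocks = frame.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def upscale_nearest(frame: np.ndarray, factor: int) -> np.ndarray:
    """Repeat every pixel factor times along both axes."""
    return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)


def resize_round_trip(frame: np.ndarray, factor: int) -> np.ndarray:
    """Emulate device-side downscaling followed by terminal up-scaling."""
    small = downscale_box(frame.astype(np.float64), factor)
    return upscale_nearest(small, factor)
```

The output has the same dimensions as the input, so it can be compared pixel by pixel with the original frame by a full-reference metric.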

2) Adaptation Process

The adaptation processes have been designed in SystemC, as part of the design process of a video adaptation hardware accelerator on an FPGA chip. We implemented a simple bitrate adaptation process that takes a target bitrate and increments or decrements the quantizer scale when the measured bitrate is too high or too low, respectively. The spatial resizing process has been designed for the best tradeoff between high quality and low resource consumption. This design was presented in [18].
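The bitrate adaptation rule described above can be sketched as follows. The function name, the unit step and the 1-31 clamping range are assumptions made for this example; the actual process was written in SystemC and its parameters are not given here.

```python
def update_quantizer_scale(
    q_scale: int,
    measured_kbps: float,
    target_kbps: float,
    q_min: int = 1,    # assumed lower bound of the quantizer scale
    q_max: int = 31,   # assumed upper bound of the quantizer scale
) -> int:
    """Move the quantizer scale one step toward the target bitrate."""
    if measured_kbps > target_kbps:
        q_scale += 1   # coarser quantization, fewer bits
    elif measured_kbps < target_kbps:
        q_scale -= 1   # finer quantization, more bits
    return max(q_min, min(q_max, q_scale))
```

Such a controller needs only one comparison and one addition per update, which is consistent with the low computation budget of last-hop devices.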

3) Video Selection

To avoid bias in the results, the incoming videos must have different characteristics. Hence, videos have been selected based on their activity, such as the background (moving or not) and the number of moving objects. Real-life conditions are tested with 720x504 videos taken directly from a digital TV channel. Table 2 lists the tested videos and their characteristics.

C. Quality Metrics

In order to identify the best process, a quality metric is required. The Peak Signal to Noise Ratio (PSNR) is used as it is the legacy metric in this field. However, as PSNR results are not always correlated with perceived quality, we also use the Structural SIMilarity (SSIM) index [19], which was defined for image processing evaluation and is much better correlated with perceived quality.

The evaluation has been developed using the OpenCV [20] and FFmpeg [21] APIs to open the resulting video and fetch its frames one by one. The quality evaluation itself has been done using libIQA [22] (Image Quality Assessment), which has been developed with reference to the MATLAB program designed by the creators of SSIM [19].
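For reference, the two metrics can be sketched on a pair of 8-bit frames as follows. This is not the libIQA implementation: `global_ssim` is a hypothetical simplification that computes the SSIM statistics once over the whole frame, whereas SSIM as defined in [19] averages the index over local windows, so its values will differ from the ones reported in this paper. The constants K1 = 0.01 and K2 = 0.03 are the ones proposed in [19].

```python
import numpy as np


def psnr(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal to Noise Ratio in dB; infinite for identical frames."""
    ref = reference.astype(np.float64)
    dis = distorted.astype(np.float64)
    mse = np.mean((ref - dis) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


def global_ssim(reference: np.ndarray, distorted: np.ndarray, peak: float = 255.0) -> float:
    """SSIM formula applied once to whole-frame statistics (simplified)."""
    x = reference.astype(np.float64)
    y = distorted.astype(np.float64)
    c1 = (0.01 * peak) ** 2   # stabilizes the luminance term
    c2 = (0.03 * peak) ** 2   # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```

Both functions are full-reference metrics: they require the original frame and the adapted frame at the same resolution.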

IV. EVALUATION RESULTS

PSNR and SSIM results from the three test benches are shown in Fig 6 and Fig 7, respectively. All videos followed the same behavior, so only the average results are shown. Standard deviations for both the compression ratio (horizontal axis) and the quality measurement (vertical axis) are drawn using brackets and confirm the general trend across videos.
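The averaging across videos can be sketched as follows, assuming that the compression ratio is expressed as a percentage of bitrate reduction and that the sample standard deviation is used. Neither choice is stated in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bitrate_reduction(original_kbps: float, adapted_kbps: float) -> float:
    """Bitrate reduction as a percentage of the original bitrate."""
    if original_kbps <= 0:
        raise ValueError("original bitrate must be positive")
    return 100.0 * (1.0 - adapted_kbps / original_kbps)


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one measurement across videos."""
    if len(values) < 2:
        raise ValueError("at least two videos are needed for a deviation")
    return mean(values), stdev(values)
```

Applying `summarize` once to the per-video bitrate reductions and once to the per-video quality scores gives the centre and the extent of one point with its horizontal and vertical brackets.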

PSNR and SSIM results show that reducing the spatial resolution achieves around 50-55% bitrate reduction while keeping better quality than reducing the bitrate of the original video by 50-55% through quantizer scale manipulation alone. However, manipulating the bitrate of an already spatially resized video has a bigger impact on quality than reducing the bitrate of a full-resolution video. Still, both solutions give close results: SSIM lies between 95% and 100% for both at 50 to 70% bitrate reduction. In the same range, PSNR rates the downscaling solution higher, which confirms this observation.

Reducing the video resolution after reducing its bitrate is not a good solution.

V. CONCLUSION

In this paper, we have presented content- and context-awareness in future networks. We have tackled the issue of the impact of video resizing in the bitrate adaptation context and proposed a method to evaluate this impact.


This method first lists all possible use cases in the content- and context-awareness paradigm and proposes testbenches that emulate these use cases. We then applied it with our own algorithms and presented the results obtained.

Results show that spatial reduction achieves results comparable to bitrate reduction in the 50%-70% bitrate reduction range, but beyond that range quality drops too quickly for spatial resizing to be considered a good technique for large bitrate reductions.

REFERENCES

[1] “ISO/IEC 23009-1:2012, Dynamic adaptive streaming over HTTP (DASH)”

[2] “Smooth Streaming” [online]. Available at: http://www.iis.net/download/SmoothStreaming

[3] “RFC HLS”, 2012 [online]. Available at: tools.ietf.org/html/draft-pantos-http-live-streaming

[4] I. Ahmad, X. Wei, Y. Sun and Y.-Q. Zhang, “Video Transcoding: An Overview of Various Techniques and Research Issues”, IEEE Transactions on Multimedia, October 2005.

[5] J. Xin, C.-W. Lin and M.-T. Sun, “Digital Video Transcoding”, Proceedings of the IEEE, vol. 93, no. 1, January 2005.

[6] Z. Lei and N.D. Georganas, "A rate adaptation transcoding scheme for real-time video transmission over wireless channels", Signal processing. Image communication, vol. 18, pp. 641-658, 2003.

[7] P.A.A. Assunçao et al., “A Frequency-Domain Video Transcoder for Dynamic Bit-Rate Reduction of MPEG-2 Bit Streams”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, no. 8, 1998.

[8] M. Lavrentiev and D. Malah, “Transrating of MPEG-2 coded video via requantization with optimal trellis-based DCT coefficients modification”, EUSIPCO 2004, September 6-10, 2004, Vienna, Austria.

[9] A. Eleftheriadis and D. Anastassiou, “Meeting Arbitrary QoS Constraints Using Dynamic Rate Shaping of Coded Digital Video”, in the 5th International Workshop on Network and Operating System Support for Digital Audio and Video, April 1995.

[10] A. Leventer and M. Porat, “Towards optimal bit-rate control in video transcoding”, ICIP 2003, 14-17 Sept. 2003, pp. III-265-8, vol. 2.

[11] A. Vetro et al., “Complexity-Quality Analysis of Transcoding Architectures for Reduced Spatial Resolution”, IEEE Transactions on Consumer Electronics, vol. 48, no. 3, pp. 515-521, August 2002.

[12] P. Yin et al., “Drift Compensation for Reduced Spatial Resolution Transcoding”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 11, pp. 1009-1020, November 2002.

[13] B. Shen, I. K. Sethi and B. Vasudev, “Adaptive Motion-Vector Resampling for Compressed Video Downscaling”, IEEE Transactions on Circuits and Systems for Video Technology, September 1999.

[14] N. Feamster and S. Wee, “An MPEG-2 to H.263 transcoder”, in SPIE Voice, Video and Data Communications Conference, September 1999.

[15] J. Xin, A. Vetro and H. Sun, “Converting DCT Coefficients to H.264/AVC”, IEEE Pacific-Rim Conference on Multimedia (PCM), Lecture Notes in Computer Science, vol. 3332/2004, pp. 939, 2004.

[16] H. Kalva, B. Petljanski and B. Furht, “Complexity Reduction Tools for MPEG-2 to H.264 Video Transcoding”, WSEAS Transactions on Information Science & Applications, vol. 2, March 2005, pp. 295-300.

[17] Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, “ITU-T Recommendation and International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC)”, Annex G.

[18] W. Aubry, B. Le Gal, D. Dallet, S. Desfarges and D. Negru, “A system approach for reducing power consumption of multimedia devices with a low QoE impact”, in IEEE International Conference on Electronics, Circuits and Systems (ICECS), Dec. 2011, pp. 5-8.

[19] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, “Image Quality Assessment: From Error Visibility to Structural Similarity”, IEEE Trans. on Image Processing, vol. 13, no. 4, April 2004.

[20] Open Computer Vision [online]. Available at : http://opencv.willowgarage.com/wiki/

[21] FFmpeg [online] Available at: http://ffmpeg.org

[22] Image Quality Assessment library [online]. Available at : http://tdistler.com/iqa/