Methods for Improving the Tone Mapping for Backward Compatible High Dynamic Range Image and Video Coding


HAL Id: hal-01433782

https://hal.archives-ouvertes.fr/hal-01433782

Submitted on 10 Jan 2020



Alper Koz, Frederic Dufaux

To cite this version:

Alper Koz, Frederic Dufaux. Methods for Improving the Tone Mapping for Backward Compatible High Dynamic Range Image and Video Coding. Signal Processing: Image Communication, Elsevier, 2014, 29 (2), pp.274-292. ⟨hal-01433782⟩


Multimedia Group, Signal and Image Processing Department (TSI), Télécom ParisTech, 75014 Paris, France.

e-mail: {alper.koz, frederic.dufaux}@telecom-paristech.fr

Abstract

Backward compatibility for high dynamic range image and video compression is one of the essential requirements in the transition phase from low dynamic range (LDR) displays to high dynamic range (HDR) displays. In a recent work [1], the problems of tone mapping and HDR video coding were, for the first time, fused into the same mathematical framework, and a tone mapping that is optimal in terms of the mean square error (MSE) of the logarithm of luminance values was derived. In this paper, we improve this pioneering study in three respects by addressing three of its shortcomings. First, the method in [1] works on the logarithms of luminance values, which are not uniform with respect to the sensitivity of the Human Visual System (HVS). We propose to use perceptually uniform luminance values instead for the optimization of the tone mapping curve. Second, the method in [1] does not take the quality of the resulting tone mapped images into account in its formulation, contrary to the main goal of tone mapping research. We include the LDR image quality as a constraint in the optimization problem and develop a generic methodology to balance the trade-off between HDR and LDR image qualities for coding. Third, the method in [1] simply applies a low-pass filter to the tone curves generated for video frames to avoid flickering when the method is adapted to video. We instead include an HVS-based flickering constraint in the optimization and derive a methodology to balance the trade-off between rate-distortion performance and flickering distortion. The superiority of the proposed methodologies is verified with experiments on HDR images and video sequences.

Index Terms— High Dynamic Range Video Compression, Tone Mapping, Perceptually Uniform Luminance Values, Backward Compatibility, H.264/AVC, Constrained Optimization, Flickering


1. INTRODUCTION

Video content represented with eight bits per pixel has been widely adopted in many applications such as internet video streaming, DVD, HDTV, camcorders, personal computers, televisions, and other consumer electronics products. The reasons for such wide adoption of the eight-bit representation are the compact storage of one pixel value as a byte in memory chips and the sufficiency of 256 levels to cover the luminance range of commonly used low dynamic range (LDR) displays (0.1-80 cd/m²) [2]-[5]. However, image capture and display technologies have improved in recent years to span a wider dynamic range and to provide true color representation. LCD and plasma displays with a peak luminance of 500-1000 cd/m² have come to dominate the market. HDR prototype displays are now available with a dynamic range of 1,000,000:1 and a peak luminance of 4000 cd/m² [6]. The cinema industry has also begun to capture movies as digital images rather than on film, using modern digital cameras with an extended dynamic range [7]. Moreover, smartphones, which had reached 55 % usage penetration in the EU5 countries (UK, DE, FR, ES, and IT) by October 2012 [8], now offer HDR photo utilities that generate more realistic views. Accordingly, the standard eight-bit representation has become insufficient for these new technologies and for related applications in digital cinema, medical imaging, and post-production.

HDR image and video formats have been proposed to overcome the limitations of the standard eight-bit representation. The main goal of these formats is to encode, transmit, and display the accurate physical luminance values (cd/m²) of a real-world scene rather than eight-bit intensity values, and hence to produce a scene-referred image that is independent of the display technology [1]-[5]. Such formats require higher bit depths than the standard eight bits when the entire luminance range of natural scenes is considered, from extreme darkness (10⁻⁶ cd/m²) to bright sunshine (10⁸ cd/m²). Current formats such as Radiance RGBE (.hdr), OpenEXR (.exr), and LogLuv TIFF (.tiff) use 16-bit floating point, 32-bit floating point, or 32-bit integer pixels [1].


A major challenge for such high bit-depth representations is efficient compression, in particular for HDR video, which requires significantly more storage and transmission bandwidth than standard 8-bit LDR video. The HDR video captured by the first HDR camera, demonstrated at the annual ACM SIGGRAPH conference in 2009, was reported to require 42 GB for one minute of footage, compared with just 9 GB for its LDR counterpart [2]. Previous work has mainly addressed this challenge in two ways (Figure 1). The first approach [9]-[11] takes advantage of the high profiles of the state-of-the-art H.264/AVC (Advanced Video Coding) codec, which support bit depths above 8 bits, up to 14 bits. These methods first convert the real luminance values (cd/m²) of the HDR video pixels to a bit depth between 9 and 14 bits and then apply an H.264/AVC high profile encoder. After decoding and reconversion to luminance values (cd/m²), the reconstructed HDR video is shown on an HDR display, or on an LDR display after a real-time tone mapping operation, as illustrated in Figure 1-a.

The second approach [1], [5], [12]-[13] aims to provide backward compatibility with existing standard 8-bit displays in order to enable a successful transition to HDR technology (Figure 1-b). The methods in this approach first map the HDR video to 8 bits to produce an LDR video. The LDR video is then encoded and decoded with a standard 8-bit encoder (e.g., H.264/AVC). Finally, the HDR video is reconstructed from the LDR video by a real-time inverse tone mapping operation. The parameters of the tone mapping and inverse tone mapping can be sent as a look-up table [1] or encoded as supplemental enhancement information [15].

The residue of the reconstructed HDR video can also be encoded as an enhancement layer in the bit-stream. As this approach includes the LDR stream separately in the resulting bit-stream, it provides direct compatibility with LDR displays. However, it requires a real-time inverse tone mapping operation to show the video content on HDR displays.


Figure 1: A general scheme of the two approaches to HDR video compression: (a) using the high profile H.264/AVC codec, and (b) using an 8-bit H.264/AVC codec for backward compatibility with LDR displays.

In a recent work [1], the optimum tone mapping for backward compatible HDR video compression, which minimizes the mean square error between the logarithms of the luminance values of the original and reconstructed HDR frames (HDR-MSE) (Figure 1-b), is formulated in a detailed and complete manner. To find a computationally efficient solution, Mai et al. [1] estimated the distortion in the chain of tone mapping, encoding, decoding, and inverse tone mapping with a statistical distortion model, and found a closed-form solution for the tone mapping based on the luminance histogram of the HDR video frames. The pioneering aspects of this research were to consider the tone mapping and HDR video coding problems jointly in the same mathematical framework and to derive a simple closed-form solution. However, it also has the following shortcomings.

First, the method of Mai et al. [1] works on the logarithm of the physical luminance values (cd/m²), in accordance with the classical Weber law, to obtain values that are uniform with respect to HVS sensitivity. However, Aydin et al. [16] have shown that the luminance sensitivity of the HVS cannot be modeled by the Weber law alone over the whole luminance range, and have instead developed a perceptual mapping, as an alternative to the logarithm, that converts luminance values into perceptually uniform encoded values.

Second, the approach in [1] does not take the quality of the resulting LDR images into account when formulating the optimization. This contradicts one of the main goals of tone mapping research, which is to produce pleasant LDR views of a given HDR image. Consequently, the method in [1] can produce unnatural over-illumination in the bright regions of the LDR images, even though it minimizes the HDR-MSE of the reconstructed HDR images. Although the authors extended this study in [17] by considering the LDR image quality and the bitrates of the LDR layer and of the residue coding, they do not provide a practical methodology or a closed-form solution for setting the trade-off between HDR and LDR image qualities.

Third, the adaptation of the tone mapping in [1] to video sequences is performed by applying a low-pass filter to the tone curves generated for the video frames. This approach does not consider any HVS measure of flickering distortions, which appear as sudden changes in the average brightness of consecutive frames when a high-luminance object or region enters or leaves the camera view [18]. It is therefore not guaranteed to give the best rate-distortion performance while also avoiding flickering distortions.

In this paper, we improve the optimized tone mapping for backward compatible HDR image and video compression in three respects, addressing the three shortcomings above. Our contribution is threefold:

• First, we propose to use perceptually uniform encoded values rather than the logarithm of the absolute luminance values for the optimization of the tone mapping curve. We verify that the proposed approach gives better results in terms of the perceptual metric perceptually uniform peak signal-to-noise ratio (PU-PSNR) [16] and produces more realistic LDR images.

• Second, we include the LDR image quality as a constraint in the optimized tone mapping problem. We reformulate the problem more generally: finding the tone mapping that gives the best rate-distortion performance for HDR images while keeping the quality of the produced LDR images at a desired level. The quality of a produced LDR image is measured by its MSE distance to a reference LDR image produced by a reference global tone mapping operator (TMO). With this constrained formulation, we develop a generic methodology to balance the trade-off between HDR image quality and LDR image quality in HDR video coding.

• Third, we include a flickering constraint in the optimized tone mapping problem to adapt the method to video. We use a perceptual measure developed in [18], based on the classical Weber law [19], to control the level of flickering between the average intensities of consecutive frames. Using this measure as a constraint, we derive a methodology to balance the trade-off between rate-distortion performance and flickering distortion.

Section 2 gives an overview of HDR video coding methods and presents the optimized tone mapping for backward compatible HDR video coding developed in [1] in more detail, as it forms the basis of our proposed methodologies. Section 3 first describes the perceptually uniform (PU) encoding of luminance values [16] and then presents the proposed optimized tone mapping based on PU values.

Section 4 presents the tone mapping optimization problem with the LDR image quality constraint. Section 5 follows with the tone mapping optimization problem with the flickering constraint. The experimental results and comparisons for the proposed methods are presented in Section 6. Finally, we draw our conclusions in Section 7.


2. HDR VIDEO COMPRESSION CLASSIFICATION

Methods for HDR video compression can be classified into two categories: methods for native encoding of HDR video with a high profile H.264/AVC encoder, and methods for backward compatible encoding of HDR video with standard 8-bit encoders.

A. Methods for native encoding of HDR video using high profile H.264/AVC encoder and decoder

The core of the methods in this approach [9]-[11] is to convert the real luminance values of the HDR video pixels into the bit depths that the high profiles support. The pioneering work by Thoma et al. [9] performs this conversion by applying a modified version of the LogLuv transform [20], which maps the luminance range of each HDR frame to the range [0, 2ⁿ-1] according to the maximum and minimum luminance values of the frame, where n is the bit depth of the resulting video.

The LogLuv transform [20] was originally proposed by Larson to provide a compact encoding suitable for the transfer, manipulation, and storage of high dynamic range color images. This format replaces conventional RGB images and encodes each color pixel as a log luminance value and chromaticity components (u, v) [20] as follows:

$$
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} =
\begin{bmatrix} 0.497 & 0.339 & 0.164 \\ 0.256 & 0.678 & 0.066 \\ 0.023 & 0.113 & 0.864 \end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{1}
$$

$$
x = \frac{X}{X + Y + Z} \tag{2}
$$

$$
y = \frac{Y}{X + Y + Z} \tag{3}
$$

$$
u_e = \left\lfloor \frac{410 \cdot 4x}{-2x + 12y + 3} \right\rfloor \tag{4}
$$

$$
v_e = \left\lfloor \frac{410 \cdot 9y}{-2x + 12y + 3} \right\rfloor \tag{5}
$$

$$
L = \left\lfloor 256\,(\log_2 Y + 64) \right\rfloor \tag{6}
$$
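As an illustration of Equations (1)-(6), the following NumPy sketch encodes linear RGB pixels into LogLuv luma and chromaticity indices. It assumes strictly positive pixel values (no black pixels, so X + Y + Z > 0 and Y > 0) and truncation to integers with floor; packing the three fields into a single pixel word is omitted. The function name `logluv_encode` is hypothetical.

```python
import numpy as np

# RGB -> XYZ matrix of Equation (1).
RGB_TO_XYZ = np.array([[0.497, 0.339, 0.164],
                       [0.256, 0.678, 0.066],
                       [0.023, 0.113, 0.864]])


def logluv_encode(rgb):
    """Map linear RGB pixels of shape (..., 3) to integer (L, u_e, v_e).

    Assumes strictly positive pixel values so that X + Y + Z > 0 and Y > 0.
    """
    xyz = rgb @ RGB_TO_XYZ.T                      # Equation (1)
    X, Y, Z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    x = X / (X + Y + Z)                           # Equation (2)
    y = Y / (X + Y + Z)                           # Equation (3)
    denom = -2.0 * x + 12.0 * y + 3.0
    u_e = np.floor(410.0 * 4.0 * x / denom)       # Equation (4)
    v_e = np.floor(410.0 * 9.0 * y / denom)       # Equation (5)
    L = np.floor(256.0 * (np.log2(Y) + 64.0))     # Equation (6)
    return L.astype(np.int64), u_e.astype(np.int64), v_e.astype(np.int64)


# Relative luminance change between consecutive luma codes: 2**(1/256) - 1.
STEP = 2.0 ** (1.0 / 256.0) - 1.0
```

For example, a neutral gray pixel with R = G = B = 2 gives Y ≈ 2 and hence L ≈ 256 · 65 = 16640, while `STEP` evaluates to about 0.0027, i.e. the 0.27 % relative step size of the transform.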


This transform maps real luminance values (Y) in the interval [5.44×10⁻²⁰, 1.84×10¹⁹] to 15-bit integer luma values (L) in the range [0, 2¹⁵-1], covering over 38 orders of magnitude with a relative step size of 0.27 %. As this step size is well below the visible quantization threshold of 1 % [19], Thoma et al. [9] modify the LogLuv mapping to exploit the full range of luma code values for a given bit depth n as:

$$
L_n = \left\lfloor (2^n - 1)\,\frac{\log_2 Y - \log_2 Y_{\min}}{\log_2 Y_{\max} - \log_2 Y_{\min}} \right\rfloor \tag{7}
$$

$$
Y = 2^{\frac{(L_n + 0.5)\,(\log_2 Y_{\max} - \log_2 Y_{\min})}{2^n - 1} + \log_2 Y_{\min}} \tag{8}
$$

where Y_max and Y_min are the maximum and minimum luminance values of an HDR video frame, and L_n is the resulting n-bit luma value. The transformation is applied to each frame of the HDR video, and the Y_max and Y_min of each frame are transmitted in the bit-stream to perform the reconversion (Figure 1-a).
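A minimal NumPy sketch of the per-frame mapping (7) and its inverse (8) follows, assuming a frame with strictly positive luminance and Y_max > Y_min. The function names are hypothetical, and the H.264/AVC encoding between the two calls is omitted.

```python
import numpy as np


def adaptive_logluv_forward(Y, n):
    """Equation (7): map the luminance of one frame to n-bit luma.

    Also returns Y_min and Y_max, which must accompany the bit-stream
    so that the decoder can apply Equation (8)."""
    y_min, y_max = Y.min(), Y.max()
    lo, hi = np.log2(y_min), np.log2(y_max)
    L = np.floor((2**n - 1) * (np.log2(Y) - lo) / (hi - lo))
    return L.astype(np.int64), y_min, y_max


def adaptive_logluv_inverse(L, y_min, y_max, n):
    """Equation (8): reconstruct luminance from n-bit luma values."""
    lo, hi = np.log2(y_min), np.log2(y_max)
    return 2.0 ** ((L + 0.5) * (hi - lo) / (2**n - 1) + lo)
```

Because the decoder reconstructs at the center of each quantization interval (the +0.5 term), the round-trip error in log₂ luminance is at most half a step. For n = 12 and a frame spanning 10⁻³ to 10⁴ cd/m², this is about 0.2 % in relative terms.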

In the extended version of this work [10], the temporal motion compensation problems caused by the individual mapping of frames are solved with the weighted prediction (WP) tool of H.264/AVC. In a similar work [11], the mapping is enhanced by smoothing the HDR frames according to the sensitivities of the Human Visual System (HVS) at different wavelet resolutions. In [14], Mantiuk et al. perceptually optimized the luminance quantization levels for the contrast threshold perception of the HVS. The optimized quantization scheme needs only 10-11 bits to encode 12 orders of magnitude of the visible luminance range without causing any visible distortion. The advantage of these methods is mostly their conceptual simplicity. On the downside, they do not support backward compatibility with 8-bit displays, which is required for a smooth transition to HDR technology.

B. Methods for backward compatible encoding of HDR video

The methods in this approach [1], [5], [12]-[13] aim to provide backward compatibility with existing standard 8-bit displays. The basic strategy is to include an inter-layer prediction stage that estimates the HDR video from the LDR video. The residue of the estimation and the prediction coefficients are then encoded and written to the output bit-stream together with the independently encoded LDR video. In [12], a block-wise approach is proposed for inter-layer prediction: a linear scaling and an offset are applied to each block of the LDR video frame to estimate the corresponding block of the HDR video frame. This method is improved in [5] by differentially encoding the offsets with respect to the previously encoded left and upper blocks.
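The block-wise prediction of [12] can be sketched as follows. The text does not state how the scale and offset are chosen, so this sketch fits them per block by least squares; that choice, the 8×8 block size, and the function name are assumptions. Entropy coding of the parameters and of the residue, and the differential offset coding of [5], are omitted.

```python
import numpy as np


def blockwise_linear_prediction(ldr, hdr, block=8):
    """Predict an HDR frame from its LDR frame, block by block, as
    hdr_block ~= a * ldr_block + b, with (a, b) fitted by least squares.

    Returns the prediction, the residue, and the list of (a, b) pairs."""
    pred = np.empty(hdr.shape, dtype=float)
    params = []
    rows, cols = ldr.shape
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            x = ldr[i:i + block, j:j + block].astype(float)
            y = hdr[i:i + block, j:j + block].astype(float)
            # Design matrix [ldr, 1] for the scale a and the offset b.
            A = np.column_stack([x.ravel(), np.ones(x.size)])
            (a, b), *_ = np.linalg.lstsq(A, y.ravel(), rcond=None)
            pred[i:i + block, j:j + block] = a * x + b
            params.append((a, b))
    return pred, hdr - pred, params
```

Only the (a, b) pairs and the residue would need to be transmitted alongside the LDR stream; a decoder rebuilds `pred` from the decoded LDR frame and adds the residue.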

Mantiuk et al. [13] used a frame-wise reconstruction function instead of a block-wise prediction. Their approach first applies a color transformation [14] to both the LDR and the corresponding HDR frames to obtain a common domain in which the two contents are comparable. Then, a reconstruction function, based on statistics (i.e., the arithmetic mean, median, or midrange) of the HDR pixel values corresponding to each LDR pixel value, is applied to the LDR frame. Invisible noise in the residual frames is also filtered out after the prediction. Other similar approaches [21] have been proposed for backward compatible compression of HDR textures to enable real-time applications on existing GPU hardware.
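A frame-wise reconstruction function in the spirit of [13] can be sketched as a look-up table that stores, for each LDR code value, the mean of the co-located HDR values. This sketch assumes the two frames are already in a common domain (the color transformation of [14] is skipped), uses the arithmetic mean rather than the median or midrange, and fills LDR codes that never occur by linear interpolation; these choices and the function names are assumptions.

```python
import numpy as np


def reconstruction_lut(ldr, hdr, levels=256):
    """For each LDR code value, the mean of the HDR values at the pixels
    carrying that code. Unused codes are filled by linear interpolation."""
    codes = ldr.ravel().astype(np.int64)
    values = hdr.ravel().astype(float)
    counts = np.bincount(codes, minlength=levels)
    sums = np.bincount(codes, weights=values, minlength=levels)
    used = counts > 0
    lut = np.empty(levels)
    lut[used] = sums[used] / counts[used]
    grid = np.arange(levels)
    lut[~used] = np.interp(grid[~used], grid[used], lut[used])
    return lut


def predict_hdr(ldr, lut):
    """Apply the reconstruction function to an LDR frame."""
    return lut[ldr]
```

The residual frame is then `hdr - predict_hdr(ldr, lut)`, which is what the enhancement layer would carry after the noise filtering mentioned above.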

In a recent work [1], tone mapping and inverse tone mapping are considered jointly as parts of the coding and decoding processes. By statistically modeling the error in the chain of tone mapping, encoding, decoding, and inverse tone mapping (Figure 1-b), Mai et al. [1] found the tone mapping and inverse tone mapping that minimize the mean square error between the logarithms of the luminance values of the original and reconstructed HDR frames (HDR-MSE). As this method forms the basis of our proposed improvements, we describe it in detail below.

Optimum Tone Mapping for backward compatible HDR video compression

Let l denote the logarithm of the luminance of the HDR frame and v the pixel values of its tone mapped LDR version. The tone mapping curve in [1] is first parameterized as a piece-wise linear function with nodes (l_k, v_k), as shown in Figure 2. Each segment k between two nodes (l_k, v_k) and (l_{k+1}, v_{k+1}) has a constant width δ in HDR values (set to 0.1).

Figure 2: Parameterization of the TM curve in Mai et al. [1]. The bar plot is the luminance histogram of the HDR frame.

The tone mapping operation is then characterized by a set of slopes

$$
s_k = \frac{v_{k+1} - v_k}{\delta}, \tag{9}
$$

which forms a vector of tone-mapping parameters θ. Using such a parameterization and the statistical modeling of the H.264/AVC coding error, the optimized tone mapping problem is given as

$$
\arg\min_{s_1,\dots,s_N} \sum_{k=1}^{N} \frac{p_k}{s_k^{2}}
\quad \text{subject to} \quad \sum_{k=1}^{N} s_k\,\delta = v_{\max}, \tag{10}
$$

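where p_k is the probability of the k-th bin, i.e., the sum of the normalized histogram of luminance values over that bin, N is the total number of bins in the histogram, and v_max is the maximum LDR value [1]. The closed-form solution follows from the first-order Karush-Kuhn-Tucker (KKT) optimality conditions of the corresponding Lagrangian: setting its derivative with respect to each s_k to zero gives −2p_k/s_k³ + λδ = 0, so s_k is proportional to p_k^{1/3}, and the equality constraint fixes the proportionality constant, which yields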

$$
s_k = \frac{v_{\max}\, p_k^{1/3}}{\delta \sum_{n=1}^{N} p_n^{1/3}}. \tag{11}
$$

A tone mapping characterized by the slopes in (11) minimizes, under the statistical error model of [1], the MSE between the logarithms of the luminance values of the original and reconstructed HDR frames (Figure 1-b).
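The closed-form solution (11) is simple to evaluate. The NumPy sketch below builds the histogram with bins of width δ, computes the slopes, integrates them into the node values v_k, and applies the piece-wise linear curve and its inverse by interpolation. The small floor `eps` on p_k, which prevents empty bins from receiving a zero slope and so keeps the curve invertible, is an assumption of this sketch and not part of (11). The codec and the transmission of the curve are omitted, and the function names are hypothetical.

```python
import numpy as np


def optimal_tone_curve(l, delta=0.1, v_max=255.0, eps=1e-8):
    """Piece-wise linear tone curve with the slopes of Equation (11).

    l: 1-D array of log-luminance values of one HDR frame.
    Returns the node positions l_k, the node values v_k and the slopes s_k.
    """
    l_min = l.min()
    n_bins = int(np.floor((l.max() - l_min) / delta)) + 1
    nodes_l = l_min + delta * np.arange(n_bins + 1)          # bin edges l_k
    counts, _ = np.histogram(l, bins=nodes_l)
    p = np.maximum(counts / counts.sum(), eps)                # p_k, floored
    cbrt_p = np.cbrt(p)
    s = v_max * cbrt_p / (delta * cbrt_p.sum())               # Equation (11)
    nodes_v = np.concatenate([[0.0], np.cumsum(s * delta)])   # v_k
    return nodes_l, nodes_v, s


def tone_map(l, nodes_l, nodes_v):
    """Forward tone mapping: HDR values -> LDR values in [0, v_max]."""
    return np.interp(l, nodes_l, nodes_v)


def inverse_tone_map(v, nodes_l, nodes_v):
    """Inverse tone mapping; valid because all slopes are positive."""
    return np.interp(v, nodes_v, nodes_l)
```

The most populated bin receives the steepest slope, and the last node value equals v_max, which is the equality constraint of (10). To model 8-bit quantization, round and clip the output of `tone_map` before calling `inverse_tone_map`.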


3. OPTIMUM TONE MAPPING WITH PERCEPTUALLY UNIFORM ENCODING

The tone mapping method of Mai et al. [1] operates on the logarithm of luminance values to comply with the sensitivity of the HVS to different luminance levels, in accordance with the Weber law [19]. However, the Weber law is not an exact model of HVS contrast sensitivity over the whole luminance range [16]. Therefore, this optimization is not optimal in the perceptual sense, although it minimizes the mean square error between the logarithms of the luminance values of the original and reconstructed HDR frames. We therefore propose to use the perceptually uniform (PU) encoded luminance values developed in [16] instead of the logarithm, and to minimize the mean square error between PU values so as to maximize the perceptual metric PU-PSNR. We first briefly summarize PU encoding in the next subsection and then present the proposed tone mapping method.

A. Perceptually Uniform Encoding of Luminance Values

Commonly used quality metrics such as PSNR and SSIM assume that pixel intensities are perceptually uniform regardless of their luminance. Although this assumption holds well for CRT displays with a luminance range of 0.1-80 cd/m², it does not fit much brighter LCD (500 cd/m²), plasma (1000 cd/m²), and HDR (4000 cd/m²) displays, because distortions in image pixels become more visible at higher luminance. Aydin et al. [16] therefore proposed a mapping that converts luminance values into approximately perceptually uniform values, enabling an extension of the well-known PSNR metric to all luminance levels visible to the HVS.

Their motivation is mainly based on two observations. First, the classical Weber law [19], which states that the ratio of the perceivable luminance difference to the background luminance (i.e., the contrast threshold) is constant, is only valid for background luminance above 500 cd/m², as indicated by the contrast versus intensity (cvi) curve in Figure 3. It is therefore not an accurate model over the luminance range of typical displays (0.1 cd/m² to 100-1000 cd/m²). Second, the compressive transfer function of the sRGB color space, used in typical displays to convert luma values (pixel intensities) to luminance values, is not appropriate for brighter displays. Figure 3-b illustrates this: for a bright display, the slope of the quantization error between the luminance values of consecutive luma values deviates strongly from that of the cvi curve.

To derive a perceptually uniform (PU) encoding that is applicable to the whole luminance range, contrast thresholds must be properly estimated as a function of pixel luminance. However, the thresholds depend on many aspects of complex images, such as spatial frequency, orientation, image size, and adapting luminance. To obtain a function that depends only on pixel luminance, the authors use the contrast sensitivity function (CSF) of [22] and, for each CSF factor except the adapting luminance, choose the value that corresponds to maximum sensitivity. They then remove the dependence of the contrast thresholds on adapting luminance by assuming that the adapting luminance equals the pixel luminance. These choices result in a conservative estimate of the contrast thresholds. The estimated thresholds correspond to the differentials of the PU encoding curve being sought. At the same time, the encoding should be backward compatible with the sRGB transform, i.e., it should produce similar values so that it can be used for LDR displays. Based on these constraints, the authors of [16] first find the forward mapping from luminance to luma values by a recursive process.

Then, they apply an approximation that brings the PU encoding curve as close as possible to the sRGB curve in the LDR range of 0.1-100 cd/m². The mathematical details of this recursive process and of the approximation are given in [14]. The resulting PU encoding is shown in Figure 4. While it closely matches the sRGB transform in the LDR range of 1-100 cd/m², the difference at higher luminance is significant.
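The idea behind the recursive construction can be illustrated with a toy threshold model: starting from a minimum luminance, each new level is placed one detection threshold above the previous one, so that one code step corresponds to one just-detectable luminance change. The threshold function below is a made-up stand-in (about 1 % at high luminance, rising in the dark), not the CSF-derived thresholds of [16], and the sRGB-fitting step is omitted; the function names and the luminance range are hypothetical.

```python
import numpy as np


def contrast_threshold(L):
    """Toy relative threshold dL/L: about 1 % when bright, larger in the dark.
    A hypothetical stand-in, not the CSF-based model of [16]."""
    return 0.01 * (1.0 + np.sqrt(1.0 / L))


def build_levels(L_min=1e-2, L_max=1e4):
    """Recursively place luminance levels one threshold apart:
    L_{i+1} = L_i * (1 + c(L_i)). The index i is the encoded value."""
    levels = [L_min]
    while levels[-1] < L_max:
        L = levels[-1]
        levels.append(L * (1.0 + contrast_threshold(L)))
    return np.array(levels)


def encode(L, levels):
    """Map luminance to a fractional code value by interpolating the level
    index against log luminance."""
    idx = np.arange(len(levels), dtype=float)
    return np.interp(np.log10(L), np.log10(levels), idx)
```

Because the relative threshold is smaller at high luminance, a bright decade receives more code values than a dark one, which is the qualitative behavior that distinguishes such an encoding from a pure logarithm.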

The PU encoding curve is used to define the PU-PSNR metric between HDR images: the luminance values of both images are converted to PU values, and the PSNR is computed on these PU values. This metric was tested in psychovisual experiments on regular and bright displays and was validated as performing better than standard PSNR. In this paper, we use perceptually uniform encoded values to derive the optimum tone mapping that maximizes the PU-PSNR between the original and reconstructed images for backward compatible HDR image and video compression.
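Computing PU-PSNR amounts to applying the PU curve to both images and then the usual PSNR formula. In the sketch below, the PU curve itself is passed in as a function because it is not reproduced here, and the peak value used in the PSNR formula is a parameter whose appropriate setting depends on the range of the encoding; both are assumptions of this illustration, and the function name is hypothetical.

```python
import numpy as np


def pu_psnr(Y_ref, Y_test, encode, peak):
    """PSNR computed on encoded luminance values.

    encode: function mapping luminance (cd/m^2) to PU values (supplied
    by the caller). peak: peak encoded value used in the PSNR formula."""
    e_ref = encode(np.asarray(Y_ref, dtype=float))
    e_test = encode(np.asarray(Y_test, dtype=float))
    mse = np.mean((e_ref - e_test) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(peak ** 2 / mse)
```

With the identity as `encode`, the function reduces to ordinary PSNR, which is a convenient sanity check before plugging in a real PU curve.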


Figure 3: Contrast versus intensity (cvi) function and quantization errors due to sRGB encoding (a) for a regular display (80 cd/m²) and (b) for a bright display (1000 cd/m²) [16].

Figure 4: Perceptually uniform (PU) encoding and sRGB encoding as a function of luminance (luma versus log10 luminance in cd/m²).

B. Proposed Method with Perceptually Uniform Luminance Values

Figure 5 illustrates the main stages of the proposed method, in parallel with Figure 1-b. First, the luminance values of the HDR frames (floating-point real numbers in cd/m²) are converted into perceptually uniform values by PU encoding (Figure 4). The resulting values are denoted l_u and kept in floating point. Then, the optimized tone mapping is applied to the l_u values to obtain the LDR video in 8-bit representation. After the LDR video is encoded and decoded, inverse tone mapping is applied to the decoded sequence to recover the perceptually uniform values. Finally, the actual luminance values are obtained by inverse PU encoding.

Figure 5: Proposed scheme for backward compatible HDR video compression with PU encoding.

Since the solution derived in [1] depends only on the histogram of the input image and on the statistical model of the H.264/AVC coding error, the same formulation can be applied to perceptually uniform values without loss of generality. For a given frame, the histogram of the l_u values is computed, and the slopes of the tone mapping operator are found as in Equation (11). The resulting tone mapping is then applied to the frame.
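The order of the stages in Figure 5 can be sketched as a single function. The PU curve and its inverse are passed in by the caller, the tone curve is given by its nodes (for instance built from the slopes of Equation (11) computed on the l_u histogram), and the 8-bit codec is replaced by plain rounding, so only quantization error is modeled, not H.264/AVC coding error. The function and parameter names are hypothetical.

```python
import numpy as np


def pu_tmo_chain(Y, pu_encode, pu_decode, nodes_l, nodes_v, codec=np.round):
    """Run the stages of the PU-domain scheme on one frame.

    nodes_l / nodes_v: nodes of a strictly increasing piece-wise linear
    tone curve over PU values. codec: stand-in for 8-bit encoding and
    decoding (rounding models quantization only)."""
    l_u = pu_encode(Y)                                    # PU encoding
    v = np.interp(l_u, nodes_l, nodes_v)                  # tone mapping
    ldr = np.clip(codec(v), 0, 255)                       # 8-bit "codec"
    l_u_rec = np.interp(ldr, nodes_v, nodes_l)            # inverse TM
    return ldr.astype(np.uint8), pu_decode(l_u_rec)       # inverse PU
```

For instance, with `np.log10` as a placeholder PU curve and a linear tone curve mapping l_u in [-1, 3] to [0, 255], each LDR code covers 4/255 of a decade, so the reconstruction error in log10 luminance stays within half of that.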

4. OPTIMUM TONE MAPPING WITH LDR IMAGE QUALITY CONSTRAINT

One important shortcoming of the optimized tone mapping developed by Mai et al. [1] is that it does not take the quality of the resulting LDR images into account when formulating the optimization. The method in [1] can therefore produce unnatural over-illumination in the bright regions of the LDR images, although it minimizes the HDR-MSE of the reconstructed HDR images.

Since global tone mapping operators have been widely studied in HDR research to obtain pleasant LDR views, in this section we seek the optimized tone mapping that keeps the LDR quality at a desired level. The quality of the LDR images is measured by their MSE distance to a reference LDR image produced by a reference global tone mapping operator. MSE, or equivalently PSNR, is one of the most common metrics used in image and video processing to quantify distortion. In our case, it also yields a tractable optimization problem in which the derivatives can be computed analytically.


Let h denote a reference global tone mapping operator and s the optimized TMO we are searching for (see Figure 6-a). Using the same piece-wise linear parameterization for h, the MSE between the produced LDR image and the reference LDR image can be approximated as

$$
E\big((s(l) - h(l))^{2}\big) \cong \sum_{k=1}^{N} p_k \Big(\sum_{n=1}^{k} \delta s_n - \sum_{n=1}^{k} \delta h_n\Big)^{2}
= \delta^{2} \sum_{k=1}^{N} p_k \Big(\sum_{n=1}^{k} (s_n - h_n)\Big)^{2} \tag{12}
$$

where s_k and h_k are the slopes of s and h, respectively. By also constraining the MSE distance to the reference LDR image, the minimization problem in (10) becomes a doubly constrained problem:

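$$
\begin{aligned}
&\arg\min_{s_1,\dots,s_N} \sum_{k=1}^{N} \frac{p_k}{s_k^{2}} \\
&\text{subject to} \quad \sum_{k=1}^{N} s_k\,\delta = v_{\max}
\quad \text{and} \quad \delta^{2} \sum_{k=1}^{N} p_k \Big(\sum_{n=1}^{k} (s_n - h_n)\Big)^{2} = \Delta
\end{aligned} \tag{13}
$$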

After defining the Lagrangian and performing some algebra on the derivative of the MSE in (12), the KKT conditions for the above optimization problem can be reduced to the following set of equations:

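$$
\begin{aligned}
&-\frac{2p_a}{s_a^{3}} + \lambda_1 \delta + 2\lambda_2 \delta^{2} \sum_{k=a}^{N} p_k \sum_{n=1}^{k} (s_n - h_n) = 0, \quad a = 1,\dots,N,\\
&\sum_{k=1}^{N} s_k\,\delta - v_{\max} = 0,\\
&\delta^{2} \sum_{k=1}^{N} p_k \Big(\sum_{n=1}^{k} (s_n - h_n)\Big)^{2} - \Delta = 0
\end{aligned} \tag{14}
$$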

where λ₁ and λ₂ are the Lagrangian coefficients for the two constraints in (13). A closed-form solution of this system does not appear feasible, as it ultimately leads to fourth-order polynomials in the s_k. Although adaptive search algorithms could be considered as an alternative, they would be computationally heavy for coding applications. We therefore propose a two-step strategy that arrives at a sub-optimal solution balancing LDR quality and HDR quality.

Figure 6: (a) The sought optimum TM with the LDR constraint (s) and the reference TM operator (h) in (13); (b) the optimized TM minimizing the HDR-MSE (s), the reference TM operator (h), and the sought optimum TM with the LDR constraint (g) for the modified problem in (15).

Our proposed solution first takes the optimum solution s_k in (11) of the single-constraint problem, and then redefines the cost function as the MSE distance to this optimal solution, while keeping the MSE distance to the reference LDR image at a desired level. Let s_k, h_k, and g_k denote the slopes of the optimal solution in (11), of the reference global tone mapping operator, and of the tone mapping we are looking for, respectively (see Figure 6-b). The new optimization problem in terms of the MSEs can then be expressed as:

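$$
\begin{aligned}
&\arg\min_{g_1,\dots,g_N} \delta^{2} \sum_{k=1}^{N} p_k \Big(\sum_{n=1}^{k} (g_n - s_n)\Big)^{2} \\
&\text{subject to} \quad \delta^{2} \sum_{k=1}^{N} p_k \Big(\sum_{n=1}^{k} (g_n - h_n)\Big)^{2} = \Delta
\end{aligned} \tag{15}
$$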

Since this constrained optimization problem involves only second-order expressions, its solution can be found analytically from the KKT conditions of its Lagrangian:

$$
\begin{aligned}
&2\delta^{2} \sum_{k=a}^{N} p_k \sum_{n=1}^{k} \big((g_n - s_n) + \lambda (g_n - h_n)\big) = 0, \quad a = 1,\dots,N,\\
&\delta^{2} \sum_{k=1}^{N} p_k \Big(\sum_{n=1}^{k} (g_n - h_n)\Big)^{2} - \Delta = 0
\end{aligned} \tag{16}
$$

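where λ is the Lagrangian coefficient for the constraint. The first N equations in (16) are satisfied when (g_n − s_n) + λ(g_n − h_n) = 0 for every n. Substituting the resulting g into the constraint and using the MSE formulation in (12), the solution of (16) is: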

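$$
\begin{aligned}
&g_k = \frac{s_k + \lambda h_k}{1 + \lambda}, \quad k = 1,\dots,N,\\
&\lambda = \sqrt{\frac{E\big((s(l) - h(l))^{2}\big)}{\Delta}} - 1,
\quad E\big((s(l) - h(l))^{2}\big) \cong \delta^{2} \sum_{k=1}^{N} p_k \Big(\sum_{n=1}^{k} (s_n - h_n)\Big)^{2}
\end{aligned} \tag{17}
$$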

The λ parameter in this closed-form solution sets the trade-off between HDR image quality and LDR image quality by adjusting the distance of the resulting tone mapping curve to the two curves s and h; a non-negative λ requires Δ ≤ E((s(l) − h(l))²). When λ = 0, g equals s: the MSE between the original and reconstructed HDR frames after coding is minimized, but the MSE distance of the resulting LDR image to the one produced with the reference TMO (h) is at its maximum. As λ increases (or Δ decreases), g moves away from s, which increases the HDR MSE, but gets closer to the reference TMO h. Our experiments focus on verifying this trade-off.
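The closed-form solution (17) can be sketched in a few lines: compute the LDR MSE of (12) between s and h, derive λ from the allowed distortion Δ, and blend the two slope vectors. The slope vectors s and h are assumed to come from (11) and from a reference TMO parameterized on the same bins; the function names are hypothetical.

```python
import numpy as np


def ldr_mse(a, b, p, delta=0.1):
    """Equation (12): delta^2 * sum_k p_k * (sum_{n<=k} (a_n - b_n))^2."""
    return delta**2 * np.sum(p * np.cumsum(a - b) ** 2)


def ldr_constrained_slopes(s, h, p, target, delta=0.1):
    """Equation (17): g = (s + lam * h) / (1 + lam), where
    lam = sqrt(E(s, h) / target) - 1 and target is the allowed LDR MSE
    (Delta). Requires 0 < target <= E(s, h) so that lam >= 0."""
    e_sh = ldr_mse(s, h, p, delta)
    if not 0 < target <= e_sh:
        raise ValueError("target must lie in (0, E(s, h)]")
    lam = np.sqrt(e_sh / target) - 1.0
    return (s + lam * h) / (1.0 + lam), lam
```

Since g − h = (s − h)/(1 + λ), the LDR MSE of g is E(s, h)/(1 + λ)², which equals Δ by construction. If s and h both satisfy Σ s_k δ = Σ h_k δ = v_max, so does g, because it is a convex combination of the two.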

5. OPTIMUM TONE MAPPING WITH FLICKERING CONSTRAINT

Another shortcoming of the optimized method developed by Mai et al. [1] is that it does not address in detail the adaptation of the derived tone mapping to video. As a simple remedy for the flickering distortions that can occur in the generated LDR video, they propose applying a low-pass filter to the tone curves generated for the video frames, without explicitly taking any HVS aspect into account. In this section, we reformulate the optimized tone mapping problem with an HVS based flickering constraint so that it can be applied to video.


In vision research, the minimum perceptible change in the luminance of a stimulus is defined as the just noticeable difference (JND). The relation of the JND to the stimulus luminance is given by Weber's law:

$$
\frac{\Delta L}{L} = k_w \tag{18}
$$

This relation states that the size of the JND (∆L) is a constant proportion (k_w) of the initial stimulus luminance (L) [19].

In a recent work, Guthier et al. [18] propose using the Weber constant (k_w) as a perceptual measure to avoid flicker distortion in the tone mapped video. Their solution requires the average brightness of each frame to lie within the JND range (∆L) of the average brightness of the previous frame while performing the tone mapping. Assessing such a perceptual metric for detecting flicker in LDR frames is in fact equivalent to a binary hypothesis testing problem in decision theory: a metric can give false positives when there is no flickering effect in the real HDR sequence, or can miss the detection when there really is a flickering effect in the LDR video. Such a performance analysis over the false positive and missed detection rates is performed in [18] for the Weber measure, using ground truth obtained from psychovisual experiments over a large set of video sequences. With these tests, the authors of [18] statistically validate the Weber measure as a good indicator of flickering distortion. In this paper, we use this measure to represent the flickering distortion in the LDR video and derive the flickering constrained optimized tone mapping.

Without loss of generality, let us first define the following variables and parameters:

s: Optimized tone mapping of Mai et al. [1] in (11) for the n-th frame of the video (Figure 7).

g: Optimized tone mapping we are searching for the n-th frame with the flickering constraint (Figure 7).

L_s: The average intensity of the LDR frame produced with s.

L_g: The average intensity of the LDR frame produced with g.

L_{n-1}: The average intensity of the produced (n-1)-th frame.

∆L_{n-1}: The JND level for the average brightness of the (n-1)-th frame, which is k_w L_{n-1}.

Figure 7: Illustration of the optimum tone mapping of Mai et al. (s) and the flickering constrained optimized tone mapping (g).

To avoid flicker between the (n-1)-th and n-th frames, the average intensity of the resulting LDR frame (L_g) should lie within the JND interval (∆L_{n-1}) of L_{n-1}:

$$
L_g = \begin{cases}
L_{n-1} + k_w L_{n-1}, & \text{if } L_s > L_{n-1} + k_w L_{n-1},\\
L_{n-1} - k_w L_{n-1}, & \text{if } L_s < L_{n-1} - k_w L_{n-1},\\
L_s, & \text{otherwise}
\end{cases} \tag{19}
$$
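A direct transcription of (19), with the JND interval taken from Weber's law (18), might look like the following Python sketch. The function name is hypothetical, and the intensities are assumed to be on the same scale as the LDR frames.

```python
def target_mean_brightness(L_s, L_prev, k_w):
    """Required average LDR intensity L_g of the current frame, following (19).

    L_s    : average intensity obtained with the unconstrained optimum curve s
    L_prev : average intensity of the previously produced frame, L_{n-1}
    k_w    : Weber constant; the JND is Delta L_{n-1} = k_w * L_{n-1}, from (18)
    """
    jnd = k_w * L_prev
    if L_s > L_prev + jnd:
        return L_prev + jnd      # brightness jump too large: clip to the upper JND bound
    if L_s < L_prev - jnd:
        return L_prev - jnd      # brightness drop too large: clip to the lower JND bound
    return L_s                   # change is within the JND: keep the mean of s


print(target_mean_brightness(100.0, 80.0, 0.125))  # 90.0
```

For non-negative k_w and L_{n-1}, (19) is simply a clamp of L_s to the interval [L_{n-1} - ∆L_{n-1}, L_{n-1} + ∆L_{n-1}].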

In terms of the histogram values, the average brightness L_g can be expressed as

$$
L_g \cong \sum_{k=1}^{N} p_k \, g(l_k + \delta/2) = \sum_{k=1}^{N} p_k \, G_k \tag{20}
$$

where G_k = g(l_k + δ/2) is the tone mapped value for the average luminance of the k-th histogram bin (Figure 7).

With this constraint on L_g, we find the optimum tone mapping curve g whose LDR image has the minimum mean square error (MSE) distance to the one produced with the tone mapping s:

$$
\operatorname*{arg\,min}_{G_1,\dots,G_N} \sum_{k=1}^{N} p_k \,(G_k - S_k)^2 \quad \text{subject to} \quad \sum_{k=1}^{N} p_k \, G_k = L_g \tag{21}
$$

where S_k = s(l_k + δ/2) (Figure 7).

After defining the Lagrangian and solving the KKT conditions, the solution to the above problem is found as:

$$
G_k = S_k - \frac{\lambda}{2}, \quad k = 1,\dots,N, \qquad \lambda = 2\,(L_s - L_g) \tag{22}
$$

where λ is the Lagrange multiplier of the constraint, and L_s = Σ_k p_k S_k follows from (20) with s in place of g (using Σ_k p_k = 1). The above equation states that the optimized tone mapping s should be shifted vertically, upwards or downwards, by an amount equal to the difference between the average values L_s and L_g to obtain the optimized tone mapping with the flickering constraint.
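To check that the vertical shift (22) really is the optimum of (21), the sketch below compares it with a direct numerical solution of the KKT system (stationarity for each bin plus the mean-brightness constraint). It assumes a normalized histogram with strictly positive bins, so that L_s = Σ p_k S_k and the KKT matrix is non-singular; the function names are hypothetical.

```python
import numpy as np


def shift_solution(S, p, L_g):
    """Closed form (22): G_k = S_k - lambda / 2 with lambda = 2 (L_s - L_g)."""
    S = np.asarray(S, dtype=float)
    p = np.asarray(p, dtype=float)
    L_s = float(np.dot(p, S))          # mean intensity with s (p sums to 1)
    lam = 2.0 * (L_s - L_g)
    return S - lam / 2.0, lam


def kkt_solution(S, p, L_g):
    """Solve the KKT system of (21) as one linear system in (G_1..G_N, lambda)."""
    S = np.asarray(S, dtype=float)
    p = np.asarray(p, dtype=float)
    N = len(S)
    A = np.zeros((N + 1, N + 1))
    b = np.zeros(N + 1)
    # Stationarity: 2 p_k (G_k - S_k) + lambda p_k = 0 for each bin k
    A[:N, :N] = np.diag(2.0 * p)
    A[:N, N] = p
    b[:N] = 2.0 * p * S
    # Primal feasibility: sum_k p_k G_k = L_g
    A[N, :N] = p
    b[N] = L_g
    x = np.linalg.solve(A, b)
    return x[:N], float(x[N])


S = np.array([10.0, 60.0, 120.0, 200.0])
p = np.array([0.1, 0.2, 0.3, 0.4])
G, lam = shift_solution(S, p, L_g=120.0)
G_kkt, lam_kkt = kkt_solution(S, p, L_g=120.0)
print(np.allclose(G, G_kkt), np.isclose(lam, lam_kkt))  # True True
```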

The resulting flicker-avoidance algorithm is applied iteratively to each frame of the HDR video sequence as follows:

• Find and apply the optimum tone mapping s in (11) to the first frame of the sequence.

• For each frame, starting from the second frame:

  ◦ find the optimum tone mapping s and compute its average intensity, L_s,

  ◦ compute the required average intensity, L_g, from (19) for a given k_w value using the average intensity of the previous frame, L_{n-1},

  ◦ find the constrained optimized solution using (22) and apply it to the frame.
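The per-frame procedure above can be sketched as a loop over frames. This sketch assumes that each frame is already summarized by the bin-center values S_k of its optimum curve s from (11) and by its normalized histogram p (computing s itself is not shown), and that the average intensity of each produced frame is taken as its L_g, via the approximation (20). The function name and the input layout are hypothetical.

```python
import numpy as np


def flicker_constrained_curves(frames, k_w):
    """frames: list of (S, p) pairs, S_k = s(l_k + delta/2) and p the frame histogram.

    Returns the constrained bin-center values G of every frame and the average
    intensity of every produced LDR frame.
    """
    curves, means = [], []
    L_prev = None
    for S, p in frames:
        S = np.asarray(S, dtype=float)
        p = np.asarray(p, dtype=float)
        L_s = float(np.dot(p, S))                 # mean intensity with s, via (20)
        if L_prev is None:
            G, L_g = S, L_s                       # first frame: apply s unchanged
        else:
            jnd = k_w * L_prev                    # Delta L_{n-1} = k_w L_{n-1}
            L_g = min(max(L_s, L_prev - jnd), L_prev + jnd)   # clamp, as in (19)
            G = S + (L_g - L_s)                   # vertical shift, as in (22)
        curves.append(G)
        means.append(L_g)
        L_prev = L_g                              # mean of the frame just produced
    return curves, means


# Usage: a sudden brightening is spread over several frames (illustrative values)
p = [0.2, 0.5, 0.3]
frames = [([50, 100, 150], p), ([70, 130, 190], p), ([70, 130, 190], p)]
curves, means = flicker_constrained_curves(frames, k_w=0.125)
print(means)
```

A smaller k_w makes the clamp tighter, so G deviates more from S; this is the rate-distortion versus flicker trade-off discussed next.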

The Weber measure k_w in this algorithm controls the level of flickering in the resulting LDR video. As k_w is decreased, the flicker between the average values of consecutive frames decreases. However, this causes a deviation from the optimized tone mapping s, and the rate-distortion performance of HDR video coding is therefore expected to decline. If k_w is increased, L_s mostly falls within the JND range of L_{n-1} (Eq. (19)), and the constrained tone mapping g gets closer to, or becomes equal to, the optimized tone mapping s. In this case, the rate-distortion performance is expected to be better, but the resulting video contains more flickering distortion. Our experiments focus on verifying this trade-off.

6. EXPERIMENTS AND COMPARISONS

For the experiments, we use HDR images and video sequences from the databases in [23]-[25]. The H.264/AVC reference software JM 17.2 [26] is used to encode the LDR images and video sequences. The experiments on HDR images use H.264/AVC intra coding, whereas the video sequences are coded with an IPPP GOP structure and a GOP size of 15. The side information characterizing the tone mapping curves is not included in the reported bitrates, as it is negligible (on the order of bytes) compared to the coding bitrate. In addition, the complexity of the PU encoding and of the weighting or shifting operations on the tone mapping curves for the LDR quality constrained and flickering constrained optimizations is also negligible compared to the complexity of LDR video coding, which is common to all compared methods. The results and comparisons for each of the proposed methods are presented in the following subsections.

A. Experiments on PU Encoding based Tone Mapping

In the experiments, we use a look-up table for PU encoding and inverse PU encoding, which stores each absolute luminance value together with its corresponding PU encoded value, rather than fitting a model. A look-up table offers better accuracy and is usually faster to compute than the power or logarithmic functions that approximate such encodings [16]. The table covering the whole luminance range from 10^-2 to 10^7 cd/m² is about 96 kB, which is small enough to be stored at both the encoder and the decoder.
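A look-up-table based PU encoder and decoder could be organized as below. The table here is a hypothetical placeholder built from a scaled log10 mapping purely so that the example runs; it is not the PU curve of [16], whose (luminance, PU value) pairs would have to be loaded instead. Interpolating linearly in log luminance, and inverting by reading the same table in the opposite direction, are design assumptions of this sketch.

```python
import numpy as np

# HYPOTHETICAL placeholder table of (absolute luminance, PU value) pairs. The real
# table would hold the PU encoding of [16]; a scaled log10 is used here only so the
# example runs. Both columns must be strictly increasing for np.interp.
LUM_TABLE = np.logspace(-2, 7, 1001)                 # cd/m^2, 10^-2 ... 10^7
PU_TABLE = 100.0 * (np.log10(LUM_TABLE) + 2.0)       # placeholder, NOT the PU curve
LOG_LUM_TABLE = np.log10(LUM_TABLE)


def pu_encode(lum):
    """Forward PU encoding: linear interpolation of the table in log luminance."""
    lum = np.clip(np.asarray(lum, dtype=float), LUM_TABLE[0], LUM_TABLE[-1])
    return np.interp(np.log10(lum), LOG_LUM_TABLE, PU_TABLE)


def pu_decode(pu):
    """Inverse PU encoding: the same table read in the opposite direction."""
    log_lum = np.interp(np.asarray(pu, dtype=float), PU_TABLE, LOG_LUM_TABLE)
    return 10.0 ** log_lum


lum = np.array([0.05, 1.0, 300.0, 2.0e6])
print(pu_encode(lum), pu_decode(pu_encode(lum)))
```

Since both directions share one table, the encoder and the decoder stay consistent as long as they load the same pairs.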

The performance of the proposed tone mapping based on perceptually uniform encoded values is compared with that of the tone mapping based on logarithmic luminance values [1]. The comparisons are performed with respect to the quality of the produced LDR images and the quality of the reconstructed HDR images after coding the LDR images at different bitrates.


Figure 8 illustrates the resulting tone mapping curves and the histograms of the logarithm of the luminance values (l) and of the PU values (l_u), respectively, for the HDR test images Atrium Night, Coby, Memorial, and Forest Path. For a fair comparison, the number of histogram bins in the proposed method is set equal to that of the compared method. For the first three images, the resulting tone mapping curves differ between the two approaches. The tone mapping curves for Forest Path are similar, because the luminance of Forest Path is mostly higher than 100 cd/m², where PU encoding is approximately linear with respect to the logarithm of the luminance (Fig. 4).

Figures 9, 10 and 11 show the LDR images produced by both approaches. Some over-illumination spread over the whole image is observable for the compared method [1], in particular for Atrium Night and Coby. This effect can also be seen in the high luminance regions (e.g. the windows and the dome) of Memorial. The proposed method eliminates these effects by using perceptually uniform luminance values and produces more natural LDR images. A drawback is, however, seen in the darker region of Memorial, which is zoomed in Figure 12. Although the high luminance regions are more realistic with the proposed approach, the details in the darker regions (e.g. the ceiling in the middle-left of Memorial) are less visible than with the compared method [1]. This is because PU encoding compresses the dark luminance range more than the logarithmic transformation does, as can be deduced from Figure 4. The LDR images produced for Forest Path are very similar, as the tone mapping curves are very close.

Figure 13 gives the rate-distortion curves in terms of the coding bit rate of the LDR image versus the quality of the reconstructed HDR image. We use three quality metrics for the comparisons, namely HDR MSE (i.e. the mean square error between the logarithms of the luminance values of the original and reconstructed HDR images) [1], PU-PSNR [16], and the HDR visible difference predictor (HDR VDP) [24]. HDR-VDP [27] was proposed as an extension of the VDP metric [22] to HDR images and is, together with PU-PSNR, one of the few perceptual metrics specifically tailored for HDR images. The metric takes two HDR images as input and produces a probability map giving, at each pixel location, the probability that the difference between the two images is visible. The output of the metric is the percentage of pixels whose probability exceeds 0.95 or 0.75. As expected, the compared method [1] gives lower MSE results than the proposed method, since it is optimally designed to minimize the MSE of the logarithm of luminance values. However, the proposed method gives better PU-PSNR results, as it minimizes the MSE between perceptually uniform values. While the two methods give very similar HDR VDP results for Atrium Night and Memorial, the compared method [1] is better for Coby. It is therefore not easy to draw a conclusion about the superiority of one method over the other from the HDR VDP results.

Fig. 8: Tone curves with log. luminance [1] and with the proposed PU encoding. (a) Atrium Night: l_max = 3.95, l_min = -1.77, l_{u,max} = 506.62, l_{u,min} = 0.32; (b) Coby: l_max = 2.93, l_min = -0.59, l_{u,max} = 355.47, l_{u,min} = 22.16; (c) Memorial: l_max = 5.05, l_min = -0.48, l_{u,max} = 673.89, l_{u,min} = 25.88; (d) Forest Path: l_max = 4.38, l_min = 1.36, l_{u,max} = 572.81, l_{u,min} = 149.04. (Horizontal axes of the graphs are normalized with respect to the max. and min. values of log. luminance and PU values.)

[Fig. 8 plots: for Atrium Night, Coby, Memorial and Forest Path, LDR pixel value (0 to 255) versus normalized HDR log luminance / normalized PU values, showing the log luminance histogram, the TM curve with log luminance [1] and the TM curve with PU values.]
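The HDR MSE metric of [1], described above as the MSE between the logarithms of the original and reconstructed luminance values, is simple to compute. The sketch below assumes base-10 logarithms, clamps luminance to a small positive floor `eps` (a hypothetical parameter) before taking logarithms, and also returns log10 of the MSE, on the assumption that this is what the "HDR MSE (log10)" axis of Fig. 13 reports.

```python
import numpy as np


def hdr_mse_log10(lum_ref, lum_rec, eps=1e-6):
    """HDR MSE: mean square error between log10 luminance of two HDR images.

    Returns (mse, log10(mse)); eps keeps the logarithm finite for
    zero-luminance pixels (hypothetical choice, not from the paper).
    """
    l_ref = np.log10(np.maximum(np.asarray(lum_ref, dtype=float), eps))
    l_rec = np.log10(np.maximum(np.asarray(lum_rec, dtype=float), eps))
    mse = float(np.mean((l_ref - l_rec) ** 2))
    return mse, float(np.log10(mse))   # log10(0) gives -inf for identical images


print(hdr_mse_log10([1.0, 10.0, 100.0, 1000.0], [1.0, 10.0, 100.0, 100.0]))
```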


Fig. 9: Produced LDR images for Atrium Night: (a) with log. luminance [1], (b) with the proposed PU encoding.

Fig. 10: Produced LDR images for Coby: (a) with log. luminance [1], (b) with the proposed PU encoding.


Fig. 11: Produced LDR images for Memorial: (a) with log. luminance [1], (b) with the proposed PU encoding.

Fig. 12: Zoom-in of the darker regions of Memorial: (a) with log. luminance [1], (b) with the proposed PU encoding.

B. Experiments on LDR Image Quality Constraint Optimization

We select the global version of the photographic tone reproduction (PTR) operator [28] as the reference TMO in our experiments. PTR [28] is a popular TMO that performs better than others in terms of contrast loss and in psychovisual experiments [29]. However, our main focus in the experiments is to verify the methodology for balancing the trade-off between HDR quality and LDR quality, rather than the choice of the reference TMO.
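For reference, the global photographic operator is commonly written as L = (a / L̄_w) L_w and L_d = L / (1 + L), where L̄_w is the log-average luminance of the image and a is the key value. The sketch below follows that simplified form (no white-point term), with a = 0.18, an 8-bit output scale and log10-luminance histogram bins; these parameter choices and the helper names are assumptions, since the PTR settings used in the experiments are not listed here. `ptr_bin_slopes` shows one way to obtain per-bin slopes h_k of the reference curve, as used in (15).

```python
import numpy as np


def log_average(lum, eps=1e-6):
    """Log-average luminance exp(mean(log(eps + L_w))) of an HDR image."""
    return float(np.exp(np.mean(np.log(eps + np.asarray(lum, dtype=float)))))


def ptr_curve(lum, log_avg, key=0.18, peak=255.0):
    """Simplified global PTR: L = key / log_avg * L_w, L_d = L / (1 + L), scaled to peak."""
    scaled = key / log_avg * np.asarray(lum, dtype=float)
    return peak * scaled / (1.0 + scaled)


def ptr_bin_slopes(l_edges, log_avg, key=0.18):
    """Per-bin slopes h_k of the PTR curve over log10-luminance bin edges."""
    l_edges = np.asarray(l_edges, dtype=float)
    v = ptr_curve(10.0 ** l_edges, log_avg, key)   # curve values at the bin edges
    return np.diff(v) / np.diff(l_edges)


# Usage on a synthetic luminance image (illustrative values only)
rng = np.random.default_rng(0)
hdr = 10.0 ** rng.uniform(-2.0, 4.0, size=(64, 64))
la = log_average(hdr)
h_k = ptr_bin_slopes(np.linspace(-2.0, 4.0, 25), la)
ldr = np.round(ptr_curve(hdr, la)).astype(np.uint8)
```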


Fig. 13: HDR MSE (log10), PU-PSNR (dB) and HDR VDP (75 %) versus bit rate (bits/pixel) for the proposed method and the compared method [1]: (a)-(c) Atrium Night, (d)-(f) Coby, (g)-(i) Memorial, each row showing HDR MSE, PU-PSNR and HDR VDP. Some encoding parameters of the utilized H.264/AVC reference software JM 17.2 are as follows: ProfileIDC = 244; LevelIDC = 40; YUVformat=1 (i.e. 4:2:0).

Figure 14 shows the resulting tone mapping curves for the HDR test images Atrium Night and Coby.

The λ parameter is adjusted so that the PSNR between the resulting and the reference tone mapped images is 35 dB in one case and 30 dB in the other, in order to show the trade-off between the rate-distortion performance and the LDR image quality. Without loss of generality, other decreasing choices of PSNR values could also be used to validate this trade-off. The smaller λ is, the closer the resulting tone mapping curve for the LDR constrained case is to the optimum tone mapping of Mai et al. [1]; the larger λ is, the closer the curve is to the reference TMO [28]. It should be noted that the λ value to
