Perceptual content and tone adaptation for HDR display technologies


For the degree of

DOCTOR OF THE UNIVERSITÉ DE POITIERS, Faculty of Fundamental and Applied Sciences (UFR des sciences fondamentales et appliquées)

XLIM-SIC

(National Diploma - Decree of 25 May 2016)

Doctoral school: Sciences et ingénierie pour l'information, mathématiques - S2IM (Poitiers)
Research field: Signal and image processing

Presented by:

Mekides Assefa Abebe

Perceptual content and tone adaptation for

HDR display technologies

Thesis supervisor(s):

Christine Fernandez-Maloigne, Mohamed-Chaker Larabi

Defended on 7 October 2016 before the jury

Jury:

President: Frédéric Dufaux, CNRS Research Director, Télécom ParisTech
Reviewer: Alain Trémeau, University Professor, Université de Saint-Etienne
Reviewer: Philip Green, Professor, Gjovik University College, Norway
Member: Christine Fernandez-Maloigne, University Professor, Université de Poitiers
Member: Mohamed-Chaker Larabi, Associate Professor (Maître de conférences), Université de Poitiers
Member: Tania Pouli, Research Engineer, Technicolor, Cesson-Sévigné

To cite this thesis:

Mekides Assefa Abebe. Perceptual content and tone adaptation for HDR display technologies [Online]. Thesis, Signal and image processing. Poitiers: Université de Poitiers, 2016. Available online


Thesis

For the degree of

Doctor of the Université de Poitiers

Faculty of Fundamental and Applied Sciences

(National diploma - decree of 7 August 2006)

Doctoral School: Sciences et Ingénierie pour l'Information

Research Field: Signal and Image Processing

Presented by

Mekides Assefa ABEBE

*******************************

Perceptual Content and Tone

Adaptation for HDR Display

Technologies

*******************************

Thesis defended on 7 October 2016 before a jury composed of:

Phil GREEN, Professor, NTNU Gjovik, Norway: Reviewer
Alain TREMEAU, Professor, Hubert Curien, Univ. Jean-Monnet, Saint-Etienne: Reviewer
Frédéric DUFAUX, Research Director, LTCI, Telecom ParisTech: Examiner
Tania POULI, Research Engineer, Technicolor: Examiner
Mohamed-Chaker LARABI, Associate Professor (Maître de Conférences), Université de Poitiers: Thesis Co-Supervisor
Christine FERNANDEZ, Professor, Université de Poitiers: Thesis Supervisor


A SAMVIQ Experimental Steps

B iTMO and Color Appearance Models Evaluation Data

C Experimental Data for HDR Lightness Modeling

D Experimental Data for Analysis of Perceptual Attributes Under Higher Luminance Ranges

E Over-exposure and Color Clipping Correction Additional Results

E.1 Results for The Proposed Thresholding Method

E.2 Comparisons

F Image Quality Metrics

F.1 Visible Difference Metric for High Dynamic Range Images


1.1 Example image for the demonstration of over-exposure, under-exposure, and saturation issues of dynamic range expansion methods.

2.1 Exposure triangle.

2.2 Demonstration of aperture openings and their effects [1].

2.3 Example digital camera ISO values and their effects.

2.4 Demonstration of example shutter speed values and their effects.

2.5 Several images of the same scene captured using the same camera but with different exposure settings.

2.6 Dual modulation or local-dimming display structure.

2.7 Basic heuristic image-processing algorithm of dual-modulation HDR displays.

3.1 Color management pipeline in color image reproduction.

3.2 Explanation of basic human vision process.

3.3 The dynamic range of the human visual system [2].

3.4 Sensitivities of LMS photoreceptor cells.

3.5 CIE 1931 and CIE 1964, 2° and 10° standard observer, color matching functions.

3.6 Components of a viewing field [3].

3.7 A hue circle illustrating the color opponent theory of human vision.

3.8 a) Simultaneous contrast. b) Crispening. c) Spreading effect [3].

3.9 Stevens effect: the increase in lightness contrast as the adapting luminance increases [3].

3.10 Abney effect: a) Lines of constant perceived hue for mixtures of monochromatic and white stimuli in psychophysical experiments. b) Lines of constant perceived hue from Munsell color samples [3].

3.11 Cartesian and cylindrical representations of CIELAB color space.

3.12 Flowchart of the refined iCAM color appearance model, taken from [4, 5].

3.13 The basic steps of the Reinhard et al. color appearance model, taken from [6].

3.14 Example image with color clipped and over-exposed region. The figure on the right shows the RGB profile plot of the pixels marked in black in the left image. Pixels with RGB intensity values greater than a particular threshold th are considered to be clipped.

3.15 Flow chart of Xu et al. clipping correction method (adapted from [7]).

3.16 Illustration of Xu et al. clipped region and surrounding region detection (taken from [7]).

3.17 Example clipped and corrected images together with their RGB histograms (adapted from [8]).


3.18 Wang et al. texture and lightness hallucination process (taken from [9]). a) Users identify the over-exposed region with a blue stroke and the well-exposed region, which will be used as the source of lightness and texture information, in green. b) Zoomed over-exposed region. c) Corrected image. d) Close-up view of the corrected region. e) High frequency texture component. f) Low frequency illumination component. g) Hallucinated texture. h) Hallucinated illumination.

3.19 The basic steps of Guo et al. over-exposure correction method (taken from [10]).

3.20 An overview of Rouf et al. gradient domain over-exposure correction (taken from [11]).

3.21 The general workflow of the bright video features enhancement method proposed by [12] (adapted from [12]).

3.22 Dynamic range expansion curves used in Akyuz et al. experiment [13].

3.23 Dynamic range expansion function used in Meylan et al. rTMO [14].

3.24 Basic components of the Rempel et al. rTMO [15].

4.1 Experimental LDR images generated from the absolute radiometric HDR images. Images 8 and 9 are taken from the dataset of [16]; the remaining images are from the HDR Photographic Survey [17].

4.2 Channel independence and chromaticity constancy of the SIM2 HDR display. (a) The average differences between the additive combination of RGB primary measurements and the white patch measurements. (b) The chromaticity points of the measured RGB primaries of the SIM2 HDR display within the CIE 1931 chromaticity diagram and the average disparity from the mean RGB chromaticity point.

4.3 Testing the power limitation of the display. The different lines in this figure correspond to the display response using patches of different resolutions (detailed in the legend). For each patch size, we have generated several patches with logarithmically increasing intensity values from 0 to 5011. Finally, the measured luminance values of the patches, for each respective patch size, are plotted.

4.4 Experimental setup.

4.5 Mean scores and corresponding errors (95% confidence interval) averaged over all methods (a) and scenes (b). The corresponding results from the post-hoc analysis are shown in (c) and (d).

4.6 Mean scores and errors for each image for the best scoring ITMO and CAM: the best ITMO method based on the experiment scores was the global method of Akyuz et al. [13], and the best CAM was found to be the model by Kim et al. [18].

4.7 Example results from each of the tested methods and color maps visualizing the luminance range encoded in each image. Note that results in this figure are tone mapped, as their dynamic range is much higher than what could be reproduced on paper.


5.1 An example stimulus. The top patch represents the reference adapting white point, while the bottom patches depict the maximum luminance on the left, the minimum luminance on the right and the observer adjustable patch centrally.

5.2 ANOVA multiple comparison diagram of the normalized luminance values of the three block experiments. The red lines indicate blocks with mean values which are significantly different from block 1 (blue line).

5.3 Summary of the experimental data for each block. Each bar indicates statistics of adjusted luminance over all observers for each reference patch.

5.4 Luminance values for the three blocks of our experiment.

5.5 Lightness predictions of the optimized models and the corresponding RMSE between their prediction for measured reference patch luminance values and twice their prediction for measured observer visual adjustments.

5.6 Perceptual lightness prediction of achromatic Munsell patches according to the proposed and former perceptual lightness models.

5.7 Perceptual lightness predictions of the proposed and Kim et al.'s lightness models for Kim et al. HDR dataset.

5.8 Predictions of Kim et al.'s and the proposed models for the three blocks of our dataset. Kim et al.'s predictions were normalized by 100 to achieve the same [0, 1] range as our model.

5.9 The steps for HDR image reproduction application using our proposed models.

5.10 Experimental LDR images generated from the absolute radiometric HDR images, taken from the HDR Photographic Survey [19].

5.11 Multiple comparison results of our experiment comparing the different methods (left) and individual scenes (right). Circles indicate the mean score for each case, while horizontal lines indicate 95% confidence intervals.

5.12 Example results of different methods for dynamic range expansion as well as their corresponding HDRVDP2 and SSIM comparison maps against the HDR ground truth images. (Note that the expanded results have been tone mapped back to LDR using the iCAM06 model for visualization.)

5.13 Tone mapped HDR images using several HDR color appearance models, linear scaling within different lightness models as well as traditional tone mapping operators.

6.1 The tristimulus values of the experimental stimuli and the gamut of the SIM2 HDR display plotted in the CIE 1931 chromaticity diagram.

6.2 Colorfulness experiment example stimulus. The top patch represents the reference adapting white point of the display, while the bottom patches depict the maximum chroma on the left, the minimum chroma on the right and the observer adjustable patch at the center.

6.3 Mean chroma values of the adjusted patches for all five lightness and hue levels.

6.4 Mean tristimulus values of the measured reference (red dots) and adjusted (blue dots) patches for all chroma and lightness levels.

6.5 3D gamut visualization of sRGB in the Cartesian and cylindrical spaces of CIELab. In the left figure, the lightness Cartesian coordinate is made to point towards the reader.


6.6 The Cartesian and cylindrical representations of the proposed HDR color space.

6.7 Computed appearance attributes of the reference patches versus those of the adjusted patches in the proposed and CIELab color spaces. For the chroma and saturation comparisons, the chroma values of the adjusted patches are multiplied by two. Note that the saturation values are computed using the standard C/L formula.

6.8 The demonstration of our surface optimization process of the new saturation model for red patches.

6.9 Saturation predictions of the former and newly corrected saturation formulas for reference and perceptually adjusted patches of our experimental dataset.

6.10 Saturation predictions of HDR color appearance models for reference and perceptually adjusted patches of our experimental dataset. In each plot, the average saturation values of patches of 4 chroma levels are used at each lightness level.

6.11 Sample rTMO results of 8 color appearance models. All the images for the rTMO results, except the LDR input image, are tone mapped using the iCAM06 [5] tone mapping operator for visualization purposes.

6.12 Sample TMO results of HDR color appearance models and TMOs.

6.13 Comparisons of color appearance, structural and quality differences of the proposed color space and our previous power law lightness model for rTMO application. All the images for the rTMO results, except the LDR input image, are tone mapped using the iCAM06 [5] tone mapping operator for visualization purposes.

7.1 Overview of our method. For the clipping mask of step 1, mask pixels are set to 255 if the corresponding image location is clipped. The color in the mask indicates the combination of clipped channels in the region. Detailed descriptions of steps 1, 2 and 3 can be found in Sec. 7.2.1, Sec. 7.2.2, and Sec. 7.2.3 respectively.

7.2 An illustration of the reconstruction process for a single scanline.

7.3 ∆E94 and HDRVDP2 differences between the reference HDR image and a sample saturated image extracted from it (top), corrected with our method (middle) and Masood's (bottom). Our result is closer to the HDR ground truth.

7.4 Sample correction results of our, Masood et al. and Guo et al. methods.

7.5 An overview of the steps of our correction method.

7.6 The effect of different threshold values (top) and the differences between the input and corrected images (b) are shown. Lower thresholds lead to a more visible correction but cause potentially useful information in the image to be ignored.

7.7 Sky regions showing between the branches (a) should be consistently treated. If these regions are processed separately, their appearance becomes inconsistent (b). Our hue and distance refinement step treats these regions similarly, producing a more consistent result (c).

7.8 The effect of different values of the hue and distance thresholds in Eq. (7.7) on the region grouping step. The image with the dashed border represents our default settings.


7.9 Visualization of our correction process for a sample one channel clipping case. In this particular case, the red channel is clipped in all pixels of the line profile of an image and the green channel is used as a reference channel. (Taken from [?].)

7.10 A simple clipping example (a) and its corresponding correction (b). A single scan line (shown by the dashed line in a and b) of the two images is shown in (c) and (d). This image demonstrates our correction for one and two channel clipping as well as over-exposure. The input is scaled according to the output maximum for comparison.

7.11 An illustration of the different masks used in the adaptive smoothing step of our over-exposure correction. The red dotted line shows how the borders of the different masks relate to each other.

7.12 The effect of changing the values of parameters for the over-exposure correction step of our algorithm. Specifically, we vary the following parameters: the size of the erosion, the percentage of the number of pixels within a region c, and the exponent γ used to finally compute the parameter σs for the cross bilateral filtering step.

7.13 Example HDRVDP2 probability of detection and sCIELab color difference maps generated by the proposed method and tested white point estimation methods.

7.14 Example results which demonstrate the better performance cases of the proposed thresholding methods. The linearly scaled results of the two best performing proposed methods (Huo et al. and Grey-Edge) and the fixed thresholding methods are provided.

7.15 Example results on still images using our method. Detailed views of the clipped input (a) and the corrected result (b) are shown in (c); normalized difference maps between the input and output are shown in (d) to illustrate the effect of the correction.

7.16 Visual and quantitative comparisons of color clipping correction methods. For visualization purposes the images are tone mapped using the Fattal et al. TMO [20].

7.17 Our SAMVIQ experimental setup. The keys A-E each represent one of the five images (hidden reference, input, ours, Masood's, and Guo's results).

7.18 Experiment results over all images and participants. (a) Mean scores for each method and corresponding standard deviations. A score of 1 means 'not similar' while a score of 5 means 'very similar'. (b) Post-hoc analysis using the Tukey-Kramer test. (c,d) Average scores for each participant and image respectively.

7.19 An example frame (a) from a video sequence is shown here corrected with our approach (b). To avoid global flickering, we smooth the per-frame maximum temporally. The maximum pixel value per frame is plotted in (c) for our full method with temporal smoothing, with no temporal smoothing and for the method by Masood et al. [21].

7.20 Sample video correction result. Eight consecutive frames of the clipped video are provided in the first column. The remaining three columns are the corresponding clipping masks, the corrections of our algorithm, and the differences between the clipped and corrected regions of each video frame, respectively.


7.21 The clipped input and our result compressed with different TMOs [22–24] to achieve different visualization styles.

7.22 Our method is implemented as a color grading plugin, allowing for interactive modification of parameters and fast visualization.

7.23 Diffuse surfaces require more aggressive smoothing when correcting over-exposure. The settings used for specularities and highlights lead to still visible discontinuities. We propose a set of diffuse settings for such images.

A.1 Example test organization using SAMVIQ methodology [25].

E.1 The contouring artifacts that occurred in the sky (left image) and around the light sources (center image), as well as the over-saturated petals of the flower (right image), were avoided by using thresholds computed based on the proposed Huo et al. [26] white point detection method. Even though the Grey-Edge method [27] gives decent results in the left and right images, the Huo et al. [26] method gives the most robust results and is our main recommendation.

E.2 Image 1: Color clipped pixels = 20.37%, Over-exposed pixels = 0.326%.

E.10 Image 9: Color clipped pixels = 8.53%, Over-exposed pixels = 0.0%.

E.13 Image 12: Color clipped pixels = 11.18%, Over-exposed pixels = 0.154%.

F.1 Data flow diagram of the High Dynamic Range Visible Difference Predictor (HDR VDP), taken from [28].

F.2 Example two-dimensional map of HDR-VDP2 probability of detection, taken from [29].


3.1 CIECAM97s input parameters [3].

3.2 CIECAM02 surrounding parameters for input viewing condition [3, 30, 31].

3.3 Three main HDR image formats (taken from Reinhard et al. [2]).

3.4 Main HDR encodings (taken from Reinhard et al. [2]).

4.1 Parameters used for the algorithms evaluated in the experiment.

5.1 Average correlation of individual observers' adjustments with mean adjusted luminance.

5.2 Fitting parameters for Stevens' power law and Michaelis-Menten based models according to experimental results for Block 1 (LDR) and combined Blocks 2 and 3 (HDR).

5.3 Quality and structural evaluations of several reverse tone mapping operators. Results with no degradation will have Q_MOS values of 100 and SSIM values of 1 [32, 33].

6.1 Average correlation of individual observers' adjustments with mean adjusted chroma.

6.2 RMSE errors for the state of the art HDR color appearance model saturation predictions. The error between the adjusted and reference patches is averaged over all chroma and hue levels for each luminance level.

6.3 RMSE errors for the state of the art HDR color appearance model saturation predictions. The error between the adjusted and reference patches is averaged over all chroma and lightness levels for each hue value.

6.4 Image difference and image quality results of the state of the art HDR color appearance models and rTMOs relative to the ground truth image.

7.1 Color fidelity and quality comparison between clipped and corrected images with the reference HDR images for 45 images from the RIT database [19]. Best values are: Hue difference → 0, ∆E94 → 0, S-CIELAB → 0 and HDRVDP2 → 100.

7.2 The different clipping cases and the reference channel(s) used in our correction.

7.3 HDRVDP2 quality and sCIELab color difference evaluation results of the proposed method for the different white point estimation algorithms.

7.4 One-way ANOVA results comparing the different methods.

B.1 Observers' average scores of all experimental methods and stimuli.


C.1 Measured XYZ values of the reference gray patches of the three block experiments.

C.2 Average XYZ measurements of the observer-adjusted achromatic patches for the 10 luminance levels of the three block experiments.

D.1 Appearance attributes and XYZ values of red reference patches.

D.2 Appearance attributes and XYZ values of yellow reference patches.

D.3 Appearance attributes and XYZ values of green reference patches.

D.4 Appearance attributes and XYZ values of blue reference patches.

D.5 Appearance attributes and XYZ values of purple reference patches.

D.6 Appearance attributes and average of measured XYZ values of observers' adjusted red patches.

D.7 Appearance attributes and average of measured XYZ values of observers' adjusted yellow patches.

D.8 Appearance attributes and average of measured XYZ values of observers' adjusted green patches.

D.9 Appearance attributes and average of measured XYZ values of observers' adjusted blue patches.

D.10 Appearance attributes and average of measured XYZ values of observers' adjusted purple patches.


Camera and display technologies have advanced greatly in recent years in their capacity to reproduce the color, dynamic range and details of real scenes. New and powerful high dynamic range (HDR) camera and display technologies are now available on the market, and they offer greater flexibility to content creators, allowing them to produce a more precise representation of real-world scenes. At the same time, the limitations of conventional camera and display technologies mean that there is a significant gap in content and color appearance reproduction between new and existing technologies. These mismatches lead to an increased demand for appearance studies and algorithms that take such mismatches into consideration and help content creators to perform accurate cross-media reproduction.

In this thesis we mainly considered the adaptation and reproduction of standard dynamic range content for HDR displays. First, existing color appearance models and reverse tone mapping operators were subjectively studied for their color fidelity during dynamic range expansion of legacy content. Then, perceptual lightness, chroma and saturation attributes were analyzed for emissive stimuli with a higher range of luminance levels, using adapted psycho-visual experimental setups based on the partition scaling method. The experimental results led to a new, more efficient and accurate HDR color space, specifically optimized for dynamic range adaptation applications. Finally, to recover lost details and enhance the visual quality of legacy content before visualizing it on an HDR display, two methods for color clipping and over-exposure correction were introduced. The models and algorithms presented were evaluated relative to HDR ground truth content, showing that our results are closer to the real scene than can be achieved with previous methods.


Introduction

Digital technologies are advancing rapidly in terms of their capabilities to capture and present real world scenes in a realistic and accurate manner. Novel and powerful camera and display technologies are becoming increasingly available, allowing us to produce more precise representations of reality along several dimensions. From increased contrast and luminance range to wider color gamut or even 3D technologies, such advancements offer a more immersive and rich experience to viewers. At the same time, the increased freedom they offer calls for more precise control and for tools enabling content creators to accurately reproduce color appearance across potentially very different devices. The increased difference in color coordinates and basic technologies between the new and the conventional digital devices broadens the gap in their capacities for reproducing the color, dynamic range and details of scenes. Therefore, the effects of the additional features of the new devices on the quality of their color appearance reproduction should be investigated. At the same time, new models and tools will be invaluable for creating content for these new technologies.

In this thesis, the focus is on one particular feature which is currently reaching the market, namely that of extended luminance range. Devices and solutions permitting the capture, processing and display of content with a wider range of luminance and contrast are termed high dynamic range (HDR). Although such technologies can offer a more immersive experience to viewers, better matching the light in real scenes, they also require different considerations in terms of color appearance management and reproduction. Over the past years, several color spaces and color appearance models have been proposed for more accurate and perceptually motivated color appearance reproduction among conventional devices. Many of these models were based on conventional standards and technologies which cover a limited dynamic range of illumination, and as such are not directly applicable to HDR [34,35].

Due to the recent advancements of camera and display technologies, an increased number of high dynamic range (HDR) imaging algorithms intended for cross-media reproduction of content have been introduced [2,35–37]. However, since the display technologies and digital content standards commonly used by consumers are still limited in dynamic range, the majority of appearance reproduction studies are focused on the accurate transformation of HDR or low dynamic range (LDR) contents towards conventional display and printing technologies.

In contrast, very little work exists exploring the inverse problem, namely the preparation of legacy LDR contents for viewing on HDR displays. Given the recent adoption of HDR technologies by consumer manufacturers, the 'up conversion' of existing content libraries is a very timely problem [5, 6, 18]. Furthermore, problems arise due to limitations of conventional camera technologies, like color desaturation, color clipping and over-exposure (highlighted in Figure 1.1). Such issues critically affect the overall color appearance and the visual quality of images and videos, and although some correction methods exist in the literature, they have not been explored in the context of dynamic range expansion, where such image defects might be amplified [38–41].

Therefore, in this work, we focus mostly on the aspect of increasing the dynamic range and enhancing the contents of LDR images and videos for the new HDR consumer devices and standards.

We started our work by conducting an experiment studying the existing solutions for image and color appearance reproduction in HDR imaging applications. Based on the findings from the experiment, we formulate new models that improve upon previous work and focus specifically on HDR display applications. Finally, we explore tools for creating or improving content for viewing on HDR displays. Specifically, we propose new lightness models with a corresponding saturation correction that are specifically adapted to HDR display applications. Further, we develop solutions for color clipping and over-exposure problems of HDR imaging, primarily in reverse tone mapping applications. Through thorough psychophysical evaluation of the proposed models and methods, we show that our results approach the appearance of HDR ground truth scenes more closely than previous methods, therefore allowing for a more immersive and realistic viewing experience.

1.1 Goals and Scopes

There are many traditional and few HDR color appearance models introduced in the literature (Section 3.2.3). The HDR models are mainly designed for dynamic range compression applications. In contrast, a number of simple local and global tone expansion algorithms have been introduced in recent years, but they typically only process luminance information in images [5,6,18]. Although the need for accurate color reproduction on HDR displays is increasing, the color appearance reproduction qualities of both the HDR color appearance models and simple tone expansion methods have been little studied.

In particular, the change in color appearance attributes such as lightness, chroma, hue and colorfulness during dynamic range expansion of LDR content has not been investigated enough [18]. Loss of details and appearance changes due to color clipping and over-exposure problems were also not given the necessary attention in the existing reverse tone mapping operators (rTMOs) and HDR color appearance models.

For example, the color appearance of the results of the two dynamic range expansion methods [18,42] shown in Figure 1.1 is far from that of the given HDR image. Even though they successfully increase the luminance range and enhance the overall contrast of the LDR image, the methods were unable to recover the lost details in the highlighted clouds, and their color reproduction was not very accurate.

Therefore, in this thesis we have conducted a series of psychophysical experiments investigating color appearance attributes for HDR display technology (Chapters 5 and 6) and assessing the color fidelity of several HDR color appearance models and dynamic range expansion methods (Chapter 4). Based on the gathered experimental


In addition to the color appearance studies, the color clipping and over-exposure problems of LDR image and video contents were investigated, as they were found to contribute significantly to the quality of HDR content created from LDR sources. We have studied methods designed specifically for color clipping and over-exposure correction as well as dynamic range expansion methods with special treatments for specular highlights and light sources (Chapter 3.3.4). Finally, scan-line-based and region-based color clipping and over-exposure correction methods for still images and video contents were introduced, providing solutions suitable for hardware implementations such as televisions in the first case, and for color grading and post-production applications in the latter (Chapter 7).

[Figure 1.1 panels: HDR; LDR; dynamic range expansion results of Banterle et al. and Kim et al.; over-exposed region highlighted.]

Figure 1.1: Example image for the demonstration of over-exposure, under-exposure, and saturation issues of dynamic range expansion methods.

1.2 Contributions

In this thesis, the following contributions are presented:

• Psycho-visual evaluation of dynamic range expansion methods in terms of color appearance reproduction accuracy [43].

• A specifically designed experimental methodology based on the method of adjustment, which is used to measure the perceptual lightness and chroma for emissive stimuli at extended luminance ranges.

• A novel lightness model covering an extended dynamic range, allowing for both compression and expansion of luminance.

• An application and psycho-visual evaluation of the proposed model in the context of reverse tone mapping.

• A psycho-visual investigation of the perception of chroma under extended luminance ranges and an HDR saturation correction method based on the resulting perceptual findings.

• A new scan-line based color clipping correction method that relies on computationally and algorithmically simple processing steps [44].

• An improved region based over-exposure and color clipping correction method with an efficient video extension [45].


1.3 Thesis Outline

The rest of the thesis is structured into seven chapters.

• Chapter 2: presents the technological advancements of digital cameras and display devices, with additional information on their dynamic range and color reproduction capabilities.

• Chapter 3: provides background on human color vision, colorimetry, color appearance and high dynamic range imaging.

• Chapter 4: presents a psycho-visual comparison of existing rTMOs and HDR color appearance models, assessing their ability to accurately reproduce the color appearance of images when expanding their dynamic range.

• Chapter 5: introduces a new experimental methodology based on the method of adjustment, designed specifically for the study of perceptual appearance attributes of emissive stimuli. A novel lightness model covering extended luminance ranges is also presented, based on the proposed experimental methodology.

• Chapter 6: presents a detailed investigation of chroma under extended luminance ranges and introduces a saturation correction method for rTMOs.

• Chapter 7: presents two color clipping and over-exposure correction methods, exploring both an efficient scan-line-based solution and a more accurate, segmentation-based approach.

• Chapter 8: summarizes the contributions and the limitations of the major algorithms and experiments of the thesis, and gives recommendations for future work.


Evolution of Digital Acquisition and Display Devices

Digital technologies involved in the capturing, processing, exchanging, and displaying of color images have been advancing at a rapid pace in recent years. In particular, the change in acquisition and display technologies is outstanding. There have been developments in quality, ease of use and affordability. Capturing and presenting real world scenes to the intended audience has become more precise, allowing viewers to experience a more accurate representation of the world around them. The devices themselves have also become more interactive and easier to use. This, combined with the unprecedented availability of both cameras and displays, makes accurate reproduction of color and light an increasingly important consideration, guiding both hardware and algorithm design.

This chapter focuses on the evolution of digital camera and display device quality. The overall quality of digital cameras and display devices is highly influenced by factors such as resolution, dynamic range and color gamut, all of which affect the ability of a device to record and re-create the scene.

2.1 Development of Digital Cameras

The world we live in has a virtually unlimited range of brightness levels and colors. But we can only perceive a limited range of these aspects, due to the limitations of different parts of the human visual system (further discussed in Chapter 3). Image capture devices are even more limited than our visual system.

Conventional digital camera technologies cannot capture the full spectral content and dynamic range of a real scene. The construction of their image sensor and the design of their optical elements limit the amount of information that can be recorded.

The quality of digital cameras is mainly determined by their image sensor resolution, pixel size, and bit-depth as well as their lens resolving power [46]. The resolution determines the amount of detail the camera is able to capture: the more pixels a camera has, the more detail will be captured.

In addition to resolution, pixel size and lens resolving power have a significant effect on the amount of detail that can be recorded. Pixel size depends on the actual sensor size and is defined as the ratio between the width of the sensor and the horizontal image resolution in pixels. Generally, if a camera sensor has a larger pixel size, the amount of noise is reduced and more detail with a higher dynamic range can be captured. On the other hand, if the lens is too poor to resolve enough detail to provide data for each pixel on the sensor, then a higher resolution sensor does not necessarily increase the amount of detail recorded.

It is also important to note that the precision of the digitization process of the analog-to-digital (A/D) converter, where light measurements are translated into digital values, can also limit the usable dynamic range. The accuracy of an A/D converter is usually described in terms of bits of precision, which directly relates to the number of distinct steps that can be encoded. An A/D converter with a bit precision of $n$ can only encode a brightness range of $0$ to $2^n$, which corresponds to a contrast ratio of $2^n : 1$, or $n$ f-stops of dynamic range [1]. The f-stop (focal stop) is the measure of the aperture setting of a camera lens, defined as the focal length of the lens divided by the diameter of its opening; when used as a unit of dynamic range, one stop corresponds to a doubling of the amount of light.
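The relation between bit precision, contrast ratio and stops can be checked numerically. The following is a minimal sketch, assuming an ideal linear A/D converter; the function names are illustrative and do not come from the cited references.

```python
import math

def adc_contrast_ratio(bits: int) -> int:
    """Contrast ratio 2^n : 1 that an ideal n-bit linear A/D converter can encode."""
    return 2 ** bits

def contrast_to_stops(contrast_ratio: float) -> float:
    """Dynamic range in stops; each stop is a doubling of the amount of light."""
    return math.log2(contrast_ratio)

for bits in (8, 12, 14):
    ratio = adc_contrast_ratio(bits)
    print(f"{bits}-bit A/D: {ratio}:1, {contrast_to_stops(ratio):.0f} stops")

# A contrast ratio quoted in log10 units converts to stops in the same way.
print(f"10000:1 is about {contrast_to_stops(10000):.1f} stops")
```

The same conversion is useful for comparing figures quoted in different units, such as contrast ratios for displays and stops for cameras.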

Most digital cameras offer a limited dynamic range due to their limited sensor sizes, varying from about 8 stops for mobile or point-and-shoot cameras up to 10 to 11 stops for DSLR cameras. Recently, however, the increased demand for higher resolution, higher dynamic range, wider color gamut, and better low-light performance has led to rapid advances in camera sensor and optical element design.

The first advance in camera technology was the rapid increase in resolution. Early digital cameras were produced with resolutions of 1 MP or less. Within a few years, digital camera resolution rose immensely. Currently, DSLR cameras such as the Canon 5DS are commercially available with a resolution of up to 51 MP [47]. Even consumer cameras integrated in mobile phones are now available with high-resolution sensors [48,49].

In addition to resolution, there has also been an increase in the pixel size of digital camera sensors, which pushes camera technology towards higher dynamic range. For example, DSLR cameras like the Canon 5DS, Nikon D610 and Sony A7R II are now capable of capturing a dynamic range of about 12 stops at low ISO settings [47,48,50]. Cameras with a limited sensor size, such as those in mobile phones, instead rely on image processing and computational imaging solutions to achieve high dynamic range image capture (Section 2.1.2).

In cinematography, video camera technologies have advanced even further. Film cameras, for example, can capture about 4 log units, that is, a contrast ratio of 10000:1. Other high-end cameras, like the RED Epic Dragon [51], are claimed to capture over 16 stops of dynamic range.

Nevertheless, these types of powerful cameras are still not affordable for the majority of consumers. Most image and video content is still represented with 8 bits of information per channel. This means that there are only 256 intensity values for the red, green and blue components of each pixel, which is insufficient to represent many types of scenes. Therefore, in a single shot all radiance values which are outside this range will be clipped, resulting in under- or over-exposure. To better understand these phenomena, a brief explanation of camera exposure is given in the following section.
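The effect of the 8-bit limit can be illustrated with a small simulation. This is a sketch under simplifying assumptions: the sensor is taken to be perfectly linear and noise-free, and the radiance values and exposure factors are hypothetical.

```python
import numpy as np

def capture_8bit(radiance, exposure):
    """Simulate a linear 8-bit capture: scale, quantize, then clip to [0, 255]."""
    scaled = np.asarray(radiance, dtype=float) * exposure
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)

# Hypothetical scene radiances spanning several orders of magnitude.
scene = np.array([0.01, 0.6, 20.0, 300.0, 5000.0])

short_exposure = capture_8bit(scene, exposure=0.05)  # keeps the highlights
long_exposure = capture_8bit(scene, exposure=5.0)    # keeps the shadows

print("short exposure:", short_exposure)
print("long exposure: ", long_exposure)
```

With the short exposure the darkest values collapse to zero (under-exposure), while with the long exposure the brightest values saturate at 255 (over-exposure). Neither single shot preserves the whole range.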

2.1.1 Camera Exposure

Camera exposure can be defined as the measurement of the amount of light allowed to reach the image sensor during the capturing process. Conventional cameras, due to sensor limitations, need a thoughtful adjustment and control of their various settings for achieving a correct exposure. Otherwise, all light information which is beyond the capacity of the sensor will be lost.


If a part of the camera sensor is exposed to light that exceeds the sensor capacity, the corresponding pixels will be saturated and all the information above the limit will be clipped. This type of clipping usually results in detail and color loss, which is referred to as over-exposure. Conversely, exposing a sensor of low sensitivity to very weak light will lead to dark pixel values, where details become indistinguishable from pure black, a phenomenon known as under-exposure. In digital photography, the three major camera settings which have a large impact on the overall exposure of the final image are aperture, ISO and shutter speed. They are collectively known as the exposure triangle, as depicted in Figure 2.1 [1].

Figure 2.1: Exposure triangle. Diagram labels: ISO, aperture, shutter speed; noise, depth of field, motion blur.

Each setting influences several image properties and affects the camera exposure differently. Since several combinations of the settings can give the correct amount of exposure, each setting should be well understood and a careful trade-off should be made among them.
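That several combinations of settings lead to the same exposure can be made concrete with the standard photographic exposure value, $EV = \log_2(N^2 / t)$ for f-number $N$ and shutter time $t$, referenced to ISO 100. The sketch below is an illustration only: the formula is the conventional one rather than something derived in this thesis, and the settings are hypothetical.

```python
import math

def exposure_value(f_number: float, shutter_s: float, iso: float = 100.0) -> float:
    """ISO-100-referenced exposure value: log2(N^2 / t) - log2(ISO / 100)."""
    return math.log2(f_number ** 2 / shutter_s) - math.log2(iso / 100.0)

# Three hypothetical settings that yield the same overall exposure.
settings = [
    (2.8, 1 / 500, 100),  # wide aperture: shallow depth of field
    (5.6, 1 / 125, 100),  # longer shutter time: more motion blur
    (5.6, 1 / 500, 400),  # higher ISO: more noise
]
for n, t, iso in settings:
    print(f"f/{n}, {t:.4f} s, ISO {iso}: EV100 = {exposure_value(n, t, iso):.2f}")
```

All three settings give the same value, yet each one pays for it with a different image property, which is exactly the trade-off the exposure triangle summarizes.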

Aperture

Aperture is a circular opening in the camera lens that is adjustable from a very small circle to almost as large as the lens itself. It is used to control the amount of light that reaches the digital sensor [1]. The wider the aperture is, the more light will hit the camera sensor. The aperture opening is expressed in f-stops, and Figure 2.2 demonstrates some common values.

Figure 2.2: Demonstration of aperture openings and their effects [1]. Diagram labels: f/1.4, f/2.0, f/2.8, f/4.0, f/8.0, f/16.0, from more light and shallow depth of field to less light and deep depth of field.

ISO

ISO is the sensitivity of the camera sensor to light. It is adjustable in a range of about 50 to 6400, depending on the capability of the camera technology, and well beyond that for high-end cameras [1] (Figure 2.3). For higher sensitivity settings, the shutter speed or the aperture should be set to admit less light, so that the sensor still receives the correct exposure.

Figure 2.3: Example digital camera ISO values and their effects. Diagram labels: ISO 100, 200, 400, 800, 1600, 3200, from low to high noise and sensitivity.

Shutter Speed

Shutter speed is the length of time for which the shutter stays open and admits light to the digital sensor. It is expressed in fractions of a second, e.g. 1/8, 1/125, 1/1000, etc. (Figure 2.4). The longer the time, the more light will expose the sensor and the more motion will be captured [1].

Figure 2.4: Demonstration of example shutter speed values and their effects. Diagram labels: longer to shorter shutter speed (1, 1/2, 1/4, 1/8, 1/15, 1/30, 1/60, 1/125, 1/250, 1/500 s); visible motion; frozen motion; telephoto; hand-held capture possible; preferable to use a tripod.

2.1.2 HDR Image Capture

Even if capturing well-exposed images is possible by controlling the above-mentioned camera settings, the overall dynamic range that can be captured in a single shot by conventional digital cameras is still limited. The reason is that conventional digital cameras are constrained by physical limitations such as photosite size and thermal and shot noise. However, the full dynamic range of a scene can be recorded by capturing several exposures of the scene and combining them with appropriate software. The result of such multiple-exposure capture will have more detail in both dark and bright parts of the scene. This type of image is referred to as an HDR image or a radiance map of the scene.

The HDR image capturing process begins by capturing a number of exposures of the scene at increasing exposure times, as shown in Figure 2.5. Large exposure differences between these exposures result in inaccurate registration, distortions and ghosting artifacts. To reduce such problems, methods like Fibonacci bracketing, where each exposure time is taken to be the sum of the previous exposure times [52], and exposure metering with adjustable settings for capturing exposures with minimal visual artifacts [53] can be applied. In the images resulting from such exposure bracketing methods, some pixels are properly exposed while others are under- or over-exposed. The same spatial position can also be properly exposed in several images of the exposure sequence.

Once the exposure sequence is captured, the HDR image will be created by fusing the images so that details are visible everywhere in the image. For this type of fusion to be possible, the capturing device is expected to be linear. But in practice, digital cameras are not linear. Therefore, the camera response function (CRF) should be recovered and used to linearize the images.


Figure 2.5: Several images of the same scene captured using the same camera but with different exposure settings.

Since camera manufacturers typically do not provide the CRF of their cameras, a number of methods have been proposed to infer the CRF from a given exposure sequence. The first method, proposed by Mann and Picard [54], calculates the camera response curve by fitting the values of pixels at different exposures. Debevec and Malik then proposed a table-based method using linear optimization to find a smooth curve that minimizes the squared error over the derived response function [55]. This method was later improved by Mitsunaga and Nayar with a more robust method based on a polynomial representation [56].

After the CRF is known and the images are linearized to be in the same units of measurement, the corresponding pixels can be averaged across the exposure sequence, excluding under- and over-exposed pixels by applying weighting factors. For example, let us assume that the linearized pixel values, the weighting factors and the exposure times are represented by $P_{ij}$, $w(P_{ij})$, and $\Delta t_k$, respectively, where $P_{ij}$ is taken from the $k$-th of the $N$ exposures. Then, the final HDR image pixel values $H_{ij}$ are computed as in Equation 2.1 [57].

$$ H_{ij} = \frac{\displaystyle\sum_{k=1}^{N} \frac{P_{ij}\, w(P_{ij})}{\Delta t_k}}{\displaystyle\sum_{k=1}^{N} w(P_{ij})} \tag{2.1} $$
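Equation 2.1 can be sketched in a few lines of code. The example below is illustrative only: the triangular ("hat") weighting function and its thresholds are assumptions, since the weighting function is not specified here, and the exposure sequence is synthetic and already linear.

```python
import numpy as np

def hat_weight(p, low=0.05, high=0.95):
    """Triangular weight: 0 for under-/over-exposed values, 1 at mid-range.
    The shape and thresholds are assumptions made for this sketch."""
    p = np.asarray(p, dtype=float)
    mid = 0.5 * (low + high)
    return np.clip(1.0 - np.abs(p - mid) / (mid - low), 0.0, 1.0)

def merge_exposures(images, exposure_times, eps=1e-8):
    """Weighted average of linearized exposures, following Equation 2.1.

    images:         N arrays of linear pixel values in [0, 1]
    exposure_times: N exposure times in seconds
    """
    num = np.zeros_like(np.asarray(images[0], dtype=float))
    den = np.zeros_like(num)
    for p, dt in zip(images, exposure_times):
        p = np.asarray(p, dtype=float)
        w = hat_weight(p)
        num += w * p / dt      # numerator term: P * w(P) / delta_t
        den += w               # denominator term: w(P)
    return num / np.maximum(den, eps)

# Synthetic example: three radiances captured at three exposure times.
radiance = np.array([0.2, 2.0, 40.0])
times = [0.01, 0.1, 1.0]
shots = [np.clip(radiance * t, 0.0, 1.0) for t in times]
hdr = merge_exposures(shots, times)
print(hdr)
```

In this synthetic case each radiance is well exposed in exactly one shot, so the merged result recovers the original radiances. Real sequences additionally require CRF linearization and alignment before merging.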

The multiple exposure method, explained above, assumes that the exposure sequence images are perfectly aligned and there are no moving objects in the scene. In most practical cases, objects do not remain still between individual exposures and there is usually camera movement. Therefore, image alignment, lens flare and ghosting removal should also be considered. Although significant research has been dedicated to these areas, it is beyond the scope of this work. For a thorough discussion of existing techniques


2.2 Development of Display Devices

Display devices have shown substantial developments in terms of color image reproduction quality. Their ability to reproduce a higher dynamic range, more colors and more pixels has increased significantly. Advances in the technologies used for display production have played a considerable role in these improvements. Hence, to begin with, we give a brief overview of the common display technologies.

2.2.1 Overview of Display Technologies

The first electronic display, invented by the German physicist Karl Ferdinand Braun, was a Cathode Ray Tube (CRT) display [2, 36, 58]. CRT displays use vacuum tubes (one for black and white displays, or three for color displays), electron guns and a phosphor-coated screen for generating intensities and for converting the electron energy into photons, respectively. These devices were mainly criticized for their nonlinearity, bulky size and shape, higher power consumption, higher amount of glare, need for continuous refresh, heat production and burn-in problems. On the other hand, their ability to produce images in any resolution, with a higher number of bits (up to 32 bits per channel) and a wider viewing angle, as well as their ability to display true black, made them preferable for many applications, including perceptual experiments and color grading.

To address some of the limitations of CRT displays, plasma displays were introduced, which also use phosphors for light generation. Instead of using space-consuming electron beams for phosphor excitation, this technology ionizes a noble gas and turns it into plasma by applying a high voltage. Collisions among the plasma ions lead to the emission of photons in the ultraviolet range, which are then converted into visible light by the phosphors. The introduction of plasma technology led to some advantages over previous types of displays [2,36,58]. The construction of much thinner displays with the same optical properties as CRTs became possible. Additionally, plasma displays are capable of producing deep blacks and higher brightness levels than CRTs, which creates a higher contrast ratio. However, problems like phosphor burn-in, increased glare, higher power consumption, and heat generation remain unresolved in this technology. Shimmering or flickering noise, screen-door effects, and radio frequency interference are further drawbacks, which have in recent years led to decreasing adoption.

Most of these problems are resolved in liquid crystal displays (LCDs). LCDs are flat panel displays that operate based on light filtering, rather than light emission, using the light-modulating properties of liquid crystals (LCs). Due to the tendency of LCs to align with the electric field lines, LCDs manipulate the polarization direction of transmitted light in combination with electronically controllable static polarizing sheets. The source of light in such displays is a backlight module, typically made of a collection of cold cathode fluorescent lamps (CCFLs) and light guides [59]. The imperfection of the polarizer and LC materials makes it difficult to achieve a true black. Regardless of their capability of producing higher brightness, this limits the local contrast that can be achieved by LCDs, relative to CRTs and plasma displays.

Color in LCDs is produced by using three LC cells with red, green and blue filters. Since CCFL backlights have an almost flat spectrum, narrower color filters are necessary to achieve a wider color gamut. Narrowing the color filters reduces the overall brightness of the display. In addition, absorption in the color filters and the polarizers, together with the control circuitry and wires, causes LCD panels to filter out up to 95% of the light emitted by the backlight. Hence, creating a brighter LCD display requires a significantly larger amount of power. This power problem of LCD panels has been resolved by using white or colored (red, green and blue) light-emitting diodes (LEDs), which can achieve higher brightness levels with relatively low power consumption. This offers several additional advantages: LEDs can be individually controlled, allowing for increased local contrast, and color LEDs can be used to expand the display color gamut.

Two emerging technologies which enable a wider color gamut and a higher dynamic range are based on organic light-emitting diodes (OLEDs) and lasers [2,36]. They allow a very high brightness level and a true black with lower power consumption. Laser displays have the disadvantage of creating interference patterns in the human eye, which is very hard to prevent [2]. OLED displays, on the other hand, remain expensive and have a limited lifetime, although this is expected to be resolved in the future. Most of the emissive and transmissive technologies discussed above consume a large amount of energy, making them unsuitable for use in mobile devices. Therefore, most mobile devices and electronic papers use reflective display technologies like reflective LCD panels [36] and materials made of titanium dioxide and absorbing diodes [2]. These technologies save energy and make the displays bright enough to be operated under direct sunlight. At the same time, achieving a very deep absorption (true black) is very difficult with reflective technologies, which strongly limits their dynamic range and contrast ratio.

In addition to such self-contained displays, there are projection systems of different formats that project an image onto a wider projection screen through an optical system. Projection technologies mainly operate by collimating light from a bright light source and filtering it using reflective or transmissive displays. Many such technologies achieve color projection by using field sequential color (a way of projecting red, green and blue channel images, one after the other, by passing the projected light through different color filters). Higher-end projectors, on the other hand, use three separate modulating display panels (like LCDs), and most modern projectors use liquid-crystal-on-silicon (LCoS) LCD technologies, which allow a higher resolution than other technologies at the same physical dimensions. Finally, digital light-processing (DLP) projectors use micro-mirrors, which are operated by a microelectromechanical system, for representing different pixel intensities [2,36,60].

2.2.2 HDR Displays

So far, several international display standards have been proposed whose maximum luminance limits are based on the maximum possible CRT phosphor excitation [61]. Following these standards, standard cinema projectors and television displays have been produced for maximum luminances of 48 and 100 nits (cd/m²), respectively. Even though television displays with a higher maximum luminance (300 to 400 cd/m²) have recently been introduced, so far they have been used only with standard or low dynamic range content.

However, new LCD television sets (like the Sony BVM-X300 [62]) are now becoming available, which are capable of producing a peak luminance of up to 1000 cd/m². There are also displays (like the Dolby DR 37p and SIM2 HDR47 [63,64]) used in different research laboratories for several types of perceptual studies, which produce a peak luminance of up to 4000 cd/m² and a contrast ratio of 150000:1.

Due to these recent advances, as well as concurrent advances in resolution, a new set of standards has just been adopted by the International Telecommunication Union (ITU), referenced as BT.2020 and commercially referred to as ultra high definition (UHD). In contrast to previous standards (e.g. BT.709), the new standards anticipate extended luminance ranges as well as the possibility of a wider color gamut, moving closer to a visualization of the real world.

The core principle behind HDR display technologies is local dimming or dual modulation, as shown in Figure 2.6. In this approach, two light modulators or displays are optically combined in order to multiply their contrast. Mainly, the common uniform CCFL backlight of the LCD panel is replaced by a low-resolution array of white LEDs, which can be controlled individually or in clusters. This way, the LEDs behind dark image regions can be turned off completely in order to achieve true black, while for bright image regions the corresponding LEDs can be driven at maximum power [2,64].

Figure 2.6: Dual modulation or local-dimming display structure. Diagram labels: low-resolution LED backlight, high-resolution LCD panel.

Creating real-time, high-quality HDR content on such local-dimming and dual-modulation displays requires a fast heuristic image-processing method for choosing the LED intensities and the corresponding LCD pixel values. One such method is illustrated in Figure 2.7 [2,65].

Figure 2.7: Basic heuristic image-processing algorithm of dual-modulation HDR displays. Diagram labels: input HDR image $I$; $I^{1/2}$; down-sampling, $I_L$; deconvolution by the LED PSF; LED values; convolution by the LED PSF, $L$; LCD transparency $I/L$; LCD response function; LCD pixel values.

First, the square root function divides the input HDR image, $I$, into two target contrast images for the LED backlight and the LCD panel. The final LED values are computed by down-sampling the LED target image to the LED resolution, $I_L$, and by deconvolving it with the LED point spread function. Then, the actual LED light distribution, $L$, is computed through a convolution of the chosen LED values with the LED point spread function. Finally, the original HDR image is divided by the resulting LED light distribution ($I/L$) to determine the best LCD transparency values, which are later converted into pixel values using the LCD response function.
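The processing chain described above can be sketched as follows. This is a deliberately simplified variant and not the algorithm of [2,65]: the deconvolution step is omitted, a box filter stands in for the LED point spread function, a pure gamma curve stands in for the LCD response function, and the image size is assumed to be a multiple of the LED block size. All names and parameter values are hypothetical.

```python
import numpy as np

def box_blur(img, radius):
    """Separable box filter used here as a stand-in for the LED PSF."""
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    pad = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def dual_modulation(hdr, block=8, psf_radius=4, lcd_gamma=2.2, eps=1e-6):
    """Split a normalized HDR luminance image into LED and LCD drive values."""
    h, w = hdr.shape
    target = np.sqrt(hdr)                                   # I^(1/2)
    # Down-sample the backlight target to the LED grid (block mean).
    leds = target.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    # Simulate the light field L: spread each LED over its block, then blur.
    light = box_blur(np.kron(leds, np.ones((block, block))), psf_radius)
    # LCD transparency I / L, limited to the physically possible range [0, 1].
    transparency = np.clip(hdr / np.maximum(light, eps), 0.0, 1.0)
    # Inverse of the assumed gamma-type LCD response function.
    lcd = transparency ** (1.0 / lcd_gamma)
    return leds, light, lcd

# Hypothetical 16x16 image: a bright patch on a dark background.
image = np.full((16, 16), 0.01)
image[4:8, 4:8] = 1.0
leds, light, lcd = dual_modulation(image)
print(leds.shape, light.shape, lcd.shape)
```

Wherever the transparency is clipped to 1, the simulated backlight is too dim to reproduce the requested luminance. Limiting such losses is one reason the full algorithm treats the choice of LED values more carefully than this sketch does.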

As explained by Seetzen et al. [66], this concept of dual modulation can also be applied to large-screen projection technologies. The most direct way of implementing dual modulation in projectors is to introduce a monochromatic low-resolution second modulator directly into the projection unit. Two other implementations have also been proposed: building the projection screen from materials like electronic paper, which turns the screen into a low-resolution second modulator, and using a large number of small projectors, each with its own small LCD panel, a single controllable LED light and simple optics [2,66].

2.2.3 Display Characterization

To produce colorimetrically accurate images or stimuli, the characteristics of a given display should be well known. The process of measuring display characteristics is referred to as display characterization, and the adjustment of the display device so that it exhibits specified characteristics is called display calibration.

Generally, to characterize a display device one should measure its CIE tristimulus values for every pixel position. This, however, would require millions of measurements, and is infeasible in reality. Therefore, to practically characterize a display, the number of measurements should be reduced. To that end, there are two common assumptions which should be satisfied by the display.

The first requirement is the channel independence of the display. This means that changing the digital input values of one channel will not affect the signals of the other channels. This assumption can be verified by testing whether the additive behavior of Equation 2.2 holds for the white pixels of the display:

$$ \begin{bmatrix} X_{RGB} \\ Y_{RGB} \\ Z_{RGB} \end{bmatrix} = \begin{bmatrix} X_R \\ Y_R \\ Z_R \end{bmatrix} + \begin{bmatrix} X_G \\ Y_G \\ Z_G \end{bmatrix} + \begin{bmatrix} X_B \\ Y_B \\ Z_B \end{bmatrix} + \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix} \tag{2.2} $$

where the subscript $RGB$ denotes the tristimulus values of a white patch and $R$, $G$ and $B$ denote the tristimulus values of the individual channel patches. In addition, the subscript $0$ denotes the tristimulus values of the black correction, which is computed as the sum of the unwanted surrounding light reflected from the display screen while all the pixels of the display are set to zero.

For example, if we take the measured XYZ values of the white pixel of a display at intensity values of (255,255,255) and the measured XYZ values of the individual primaries (red (255,0,0), green (0,255,0), and blue (0,0,255)), the difference between the sum of the XYZ values of the primaries and the XYZ values of the white should be negligible for the channel independence of the display to be considered to hold.
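The additivity test of Equation 2.2 can be sketched as below. All measurement values are hypothetical (loosely modeled on an sRGB-like display with a 100 cd/m² white), the channel measurements are assumed to be black-corrected already, and the 1% tolerance is an arbitrary choice made for this illustration.

```python
import numpy as np

def additivity_error(xyz_white, xyz_r, xyz_g, xyz_b, xyz_black):
    """Relative deviation from Equation 2.2 for each tristimulus component.

    The channel measurements are assumed to be black-corrected, so the
    black level is added back once, as written in Equation 2.2.
    """
    predicted = (np.asarray(xyz_r, dtype=float) + np.asarray(xyz_g, dtype=float)
                 + np.asarray(xyz_b, dtype=float) + np.asarray(xyz_black, dtype=float))
    white = np.asarray(xyz_white, dtype=float)
    return np.abs(predicted - white) / white

# Hypothetical measurements in cd/m^2.
red = [41.24, 21.26, 1.93]
green = [35.76, 71.52, 11.92]
blue = [18.05, 7.22, 95.05]
black = [0.10, 0.10, 0.12]
white = [95.20, 100.15, 109.00]

error = additivity_error(white, red, green, blue, black)
print("relative error per component:", error)
print("channel independent within 1%:", bool(np.all(error < 0.01)))
```

In practice the acceptable tolerance depends on the accuracy of the measurement instrument and on the colorimetric precision the experiment requires.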

The second important assumption for the simplification of display characterization is the display's channel chromaticity constancy. If the basic color of the light emitted by the display primaries remains unchanged regardless of changes in their intensity values, then the display is considered to have chromaticity constancy, which can be expressed as in Equations 2.3a, 2.3b and 2.3c.

$$ \begin{bmatrix} X_R \\ Y_R \\ Z_R \end{bmatrix} = f_r(R) \begin{bmatrix} X_{R_{\max}} \\ Y_{R_{\max}} \\ Z_{R_{\max}} \end{bmatrix} \tag{2.3a} $$

$$ \begin{bmatrix} X_G \\ Y_G \\ Z_G \end{bmatrix} = f_g(G) \begin{bmatrix} X_{G_{\max}} \\ Y_{G_{\max}} \\ Z_{G_{\max}} \end{bmatrix} \tag{2.3b} $$

$$ \begin{bmatrix} X_B \\ Y_B \\ Z_B \end{bmatrix} = f_b(B) \begin{bmatrix} X_{B_{\max}} \\ Y_{B_{\max}} \\ Z_{B_{\max}} \end{bmatrix} \tag{2.3c} $$

where the subscripts $R_{\max}$, $G_{\max}$ and $B_{\max}$ denote the tristimulus values measured when each respective channel is set to its maximum intensity. The functions $f_r(R)$, $f_g(G)$ and $f_b(B)$ are the tone response curves or electro-optical transfer functions (EOTF) of the red, green and blue channels, respectively. These EOTF curves are generated by preparing, displaying, measuring and fitting the Y tristimulus values of several ramps for each channel from zero to its maximum intensity value. The fitting process can be done using linear or non-linear interpolation techniques.
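As an illustration of the linear interpolation option, the sketch below builds a normalized EOTF from a measured ramp. The ramp is synthetic (generated from an assumed gamma of 2.2 with a hypothetical peak luminance), and the function name is illustrative.

```python
import numpy as np

def build_eotf(digital_counts, measured_y, black_y=0.0):
    """Return f(d): the normalized tone response, linearly interpolated
    between the measured ramp steps. Counts must be in increasing order
    and the last step is assumed to be the channel maximum."""
    d = np.asarray(digital_counts, dtype=float)
    y = np.asarray(measured_y, dtype=float) - black_y
    y = y / y[-1]                      # normalize so that f(d_max) = 1
    return lambda x: np.interp(x, d, y)

# Synthetic ramp for one channel: 18 steps from 0 to 255.
counts = np.linspace(0, 255, 18)
ramp_y = 21.26 * (counts / 255.0) ** 2.2
f_r = build_eotf(counts, ramp_y)

print("f_r(0)   =", f_r(0))
print("f_r(128) =", f_r(128))
print("f_r(255) =", f_r(255))
```

A denser ramp, or a non-linear fit such as a parametric gamma model, reduces the interpolation error between measured steps at the cost of more measurements or a stronger model assumption.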

Generally, if a given display satisfies the two mentioned assumptions, then its general operation can be simulated by a forward model. The forward model is used to compute the XYZ tristimulus responses of the display for any input (R,G,B) triplet, as shown in Equation 2.4.

$$ \begin{bmatrix} X_{RGB} \\ Y_{RGB} \\ Z_{RGB} \end{bmatrix} = M \begin{bmatrix} f_r(R) \\ f_g(G) \\ f_b(B) \end{bmatrix} + \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix} \tag{2.4} $$

where the matrix $M$ is the 3x3 tristimulus matrix measured from the maximum RGB primaries, shown in Equation 2.5, or computed by following SMPTE recommendations [67].

$$ M = \begin{bmatrix} X_{R_{\max}} & X_{G_{\max}} & X_{B_{\max}} \\ Y_{R_{\max}} & Y_{G_{\max}} & Y_{B_{\max}} \\ Z_{R_{\max}} & Z_{G_{\max}} & Z_{B_{\max}} \end{bmatrix} \tag{2.5} $$

Another key aspect of display characterization is knowing the input $(R', G', B')$ values necessary to reproduce the desired XYZ tristimulus values on the display. To do that, the forward model should be inverted as shown in Equations 2.6 and 2.7.

$$ \begin{bmatrix} R_{XYZ} \\ G_{XYZ} \\ B_{XYZ} \end{bmatrix} = M^{-1} \begin{bmatrix} X - X_0 \\ Y - Y_0 \\ Z - Z_0 \end{bmatrix} \tag{2.6} $$

$$ \begin{bmatrix} R' \\ G' \\ B' \end{bmatrix} = \begin{bmatrix} f_r^{-1}(R_{XYZ}) \\ f_g^{-1}(G_{XYZ}) \\ f_b^{-1}(B_{XYZ}) \end{bmatrix} \tag{2.7} $$
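The forward model (Equation 2.4) and its inverse (Equations 2.6 and 2.7) can be sketched together. The example assumes a pure gamma EOTF shared by all channels and uses hypothetical values for $M$ and the black offset; a real characterization would substitute the measured matrix and the fitted per-channel EOTFs.

```python
import numpy as np

GAMMA = 2.2  # assumed pure-gamma EOTF, identical for all three channels

# Columns: XYZ of the red, green and blue primaries at maximum drive
# (hypothetical values resembling an sRGB display with a 100 cd/m^2 white).
M = np.array([[41.24, 35.76, 18.05],
              [21.26, 71.52,  7.22],
              [ 1.93, 11.92, 95.05]])
XYZ0 = np.array([0.10, 0.10, 0.12])   # hypothetical black-level offset

def forward(rgb):
    """Equation 2.4: digital counts in [0, 255] -> displayed XYZ."""
    linear = (np.asarray(rgb, dtype=float) / 255.0) ** GAMMA
    return M @ linear + XYZ0

def inverse(xyz):
    """Equations 2.6 and 2.7: target XYZ -> digital counts (R', G', B')."""
    linear = np.linalg.solve(M, np.asarray(xyz, dtype=float) - XYZ0)
    linear = np.clip(linear, 0.0, 1.0)   # out-of-gamut targets are clipped
    return 255.0 * linear ** (1.0 / GAMMA)

xyz = forward([200, 120, 40])
print("XYZ:", xyz)
print("recovered RGB:", inverse(xyz))
```

Solving the linear system with `np.linalg.solve` avoids forming $M^{-1}$ explicitly, which is numerically preferable but otherwise equivalent to Equation 2.6.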

The accuracy of these forward and inverse models depends on how well the display device in question meets the chromaticity constancy and channel independence assumptions. For more accurate modeling, the matrix $M$ and the EOTF can be further optimized by minimizing the difference between the measured and predicted tristimulus values of several training and test patches.
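One simple way to carry out such an optimization for the matrix is an ordinary least-squares fit over the training patches. The sketch below is an assumption about how the step could be done, not a description of the procedure used in this thesis: it minimizes squared XYZ differences (rather than a perceptual color difference) and keeps the EOTF fixed.

```python
import numpy as np

def fit_matrix(linear_rgb, measured_xyz, black_xyz):
    """Least-squares estimate of M from training patches.

    linear_rgb:   (n, 3) EOTF outputs f_r(R), f_g(G), f_b(B) per patch
    measured_xyz: (n, 3) measured tristimulus values per patch
    black_xyz:    (3,)   black-level offset (X0, Y0, Z0)
    """
    a = np.asarray(linear_rgb, dtype=float)
    b = np.asarray(measured_xyz, dtype=float) - np.asarray(black_xyz, dtype=float)
    # Solve a @ M.T ~= b in the least-squares sense.
    m_t, *_ = np.linalg.lstsq(a, b, rcond=None)
    return m_t.T

# Synthetic check: recover a known (hypothetical) matrix from noiseless patches.
rng = np.random.default_rng(0)
true_m = np.array([[41.24, 35.76, 18.05],
                   [21.26, 71.52,  7.22],
                   [ 1.93, 11.92, 95.05]])
black = np.array([0.10, 0.10, 0.12])
patches = rng.random((20, 3))
measured = patches @ true_m.T + black
fitted_m = fit_matrix(patches, measured, black)
print(np.round(fitted_m, 2))
```

The fitted matrix should then be validated on test patches that were not used for training, so that the reported error reflects the model rather than the fit.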

2.2.3.1 HDR Display Characterization

Existing characterization studies show that several CRT and LCD displays satisfy the channel independence and chromaticity constancy assumptions [35]. Some displays, however, may not satisfy these assumptions and should therefore be characterized by more complex methods than the standard technique explained in the previous section. For example, currently proposed HDR displays operate based on a dual modulation procedure, as described in Section 2.2.2. Therefore, some studies reason that the one-to-one mapping between the input digital counts and the measured tristimulus values of conventional LCD displays no longer holds for this type of HDR display [65]. Hence, the matrix and the EOTF parameters will no longer be constant.

One suggestion, for example, is that the normalized LED backlight intensity $L$, explained in Section 2.2.2, should also be considered and incorporated in the conversion formula, as in Equation 2.8 [65]. In the same study, it is also shown that the corresponding LC driving signals with their EOTF functions need more optimization for a better colorimetric reproduction on HDR LCD displays.

$$ \begin{bmatrix} X_{RGB} \\ Y_{RGB} \\ Z_{RGB} \end{bmatrix} = M \begin{bmatrix} f_r(R) \\ f_g(G) \\ f_b(B) \end{bmatrix} L + \begin{bmatrix} X_0 \\ Y_0 \\ Z_0 \end{bmatrix} L \tag{2.8} $$
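Equation 2.8 differs from the standard forward model only in the backlight factor, as the following sketch shows. The matrix, black offset and input values are hypothetical, and the function operates on EOTF outputs rather than on digital counts.

```python
import numpy as np

def hdr_forward(linear_rgb, backlight, m, xyz_black):
    """Equation 2.8: both the color term and the black term scale with L."""
    linear_rgb = np.asarray(linear_rgb, dtype=float)
    return backlight * (m @ linear_rgb + np.asarray(xyz_black, dtype=float))

# Hypothetical display parameters.
m = np.array([[41.24, 35.76, 18.05],
              [21.26, 71.52,  7.22],
              [ 1.93, 11.92, 95.05]])
xyz_black = np.array([0.10, 0.10, 0.12])
patch = np.array([0.6, 0.3, 0.1])     # f_r(R), f_g(G), f_b(B)

full = hdr_forward(patch, 1.0, m, xyz_black)
dimmed = hdr_forward(patch, 0.25, m, xyz_black)
print(full, dimmed)
```

With $L = 1$ the expression reduces to Equation 2.4, and dimming the backlight scales the whole tristimulus vector, including the black level, by the same factor.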

However, there might be some HDR displays which satisfy the channel independence and chromaticity constancy assumptions. For such displays, the standard characterization technique, without special consideration of the LED backlight and LC modulation characteristics of the display, might be enough. Nevertheless, applying an additional optimization step will help ensure the quality of the color reproduction of such displays. For example, we have characterized one of the commonly used experimental HDR displays, namely the SIM2 HDR47 [64], for several of our experiments in Chapters 4, ?? and 7. The channel independence, chromaticity constancy, and accuracy of the display with just the standard characterization technique were found to be adequate. Further details of the precise procedure followed for characterizing this particular display are given in Section 4.


Digital Color Image Reproduction

Color image reproduction is the study of the accurate reproduction of a real world scene across various digital color imaging devices. To achieve that within a digital imaging pipeline, as shown in Figure 3.1, several considerations and appropriate models are necessary. This chapter gives an overview of the basic components necessary for accurately reproducing color images in digital media.

Every color imaging device has its own device-dependent color coordinates. Hence, an accurate transform of image color appearance from one device to another needs a thorough understanding and characterization of the color behavior of each device involved. To this end, a key component is a set of device-independent color coordinates such as CIEXYZ, which serve as the intermediary for all transforms. In more complex scenarios, this is not sufficient. For instance, in many cases the viewing environments of the two devices can be very different. In this situation, tristimulus values are not enough to accurately model the perceptual color appearance of the scene. Therefore, the preferred practice is to use color appearance models as device-independent color spaces, since they encompass many more aspects of the scene and its environment.

For proper communication between different devices, each individual device used in the color reproduction process (e.g. digital cameras and displays) needs to be characterized. Device characterization in color image reproduction relates the color information of the device to standard device-independent color coordinates, which can describe the content’s appearance in perceptually meaningful terms.
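The role of CIEXYZ as an intermediary can be sketched as follows. This is a minimal illustration, assuming that both devices obey a simple matrix-plus-power-law characterization model; the matrices `M_A` and `M_B`, the exponents and the function names are hypothetical placeholders, not values or methods taken from this thesis.

```python
import numpy as np

# Hypothetical characterization data for two devices (assumptions).
M_A = np.array([
    [0.4124, 0.3576, 0.1805],
    [0.2126, 0.7152, 0.0722],
    [0.0193, 0.1192, 0.9505],
])
GAMMA_A = 2.2

# Device B: same primaries chromatically, but different primary intensities.
M_B = M_A @ np.diag([0.9, 1.0, 0.8])
GAMMA_B = 2.4


def device_to_xyz(rgb, M, gamma):
    """Device RGB in [0, 1] -> CIEXYZ via a matrix-plus-power-law model."""
    return M @ (np.asarray(rgb, dtype=float) ** gamma)


def xyz_to_device(xyz, M, gamma):
    """CIEXYZ -> device RGB, inverting the same model."""
    linear = np.linalg.solve(M, np.asarray(xyz, dtype=float))
    # Colors outside the device gamut are simply clipped in this sketch;
    # a real pipeline would apply a gamut mapping step instead.
    linear = np.clip(linear, 0.0, 1.0)
    return linear ** (1.0 / gamma)


rgb_a = np.array([0.5, 0.5, 0.5])
xyz = device_to_xyz(rgb_a, M_A, GAMMA_A)    # device A -> CIEXYZ
rgb_b = xyz_to_device(xyz, M_B, GAMMA_B)    # CIEXYZ -> device B
```

The two devices need different digital counts (`rgb_a` versus `rgb_b`) to produce the same tristimulus values, which is the reason each device has to be characterized individually.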

In addition to their color coordinates, the basic technologies and capabilities of different

devices also vary (see Chapter 2). Some devices might be capable of producing or

capturing a wider color gamut and higher dynamic range than others. During the transformation of color from a high capacity device to a lower one, the color gamut or the tone has to be mapped in a controlled manner to avoid possible loss of color and detail information. Also, inversely, the gamut and tone can be expanded when a device with higher capabilities becomes available to obtain a better viewing experience [34,35].
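To make the idea of tone compression and expansion concrete, the sketch below uses the simple global curve L / (1 + L) and its analytic inverse. This curve is chosen purely for illustration and is not the operator proposed or evaluated in this thesis; the function names and the clipping threshold `eps` are assumptions of the sketch.

```python
import numpy as np


def tone_map(luminance):
    """Compress scene luminance (>= 0) into the display range [0, 1)."""
    luminance = np.asarray(luminance, dtype=float)
    return luminance / (1.0 + luminance)


def inverse_tone_map(display, eps=1e-6):
    """Expand display values in [0, 1) back to an unbounded luminance range."""
    display = np.asarray(display, dtype=float)
    # Clip just below 1 to avoid a division by zero at the top of the range.
    display = np.clip(display, 0.0, 1.0 - eps)
    return display / (1.0 - display)


scene = np.array([0.01, 1.0, 100.0, 10000.0])  # relative scene luminances
ldr = tone_map(scene)                           # TMO: HDR -> LDR
recovered = inverse_tone_map(ldr)               # rTMO: LDR -> HDR
```

Because the curve is monotonic, the mapping is controlled in the sense that the ordering of luminance levels is preserved, although highlight contrast is strongly compressed.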

3.1 Human Color Vision and Basic Colorimetry

Color image reproduction aims to accurately replicate the human perception of a real-world scene on different digital color imaging devices. To do that, the color perception mechanisms of the human visual system (HVS) need to be studied and modeled. Over the years, several studies have examined the physiological structure of the eye, the processing of light, and the range of visual capabilities of the HVS.


Figure 3.1: Color management pipeline in color image reproduction. (Diagram labels: device characterisation; color appearance or device-independent color spaces; scene-referred/HDR; output-referred/LDR; TMO; rTMO; gamut mapping; HDR or wide color gamut devices; conventional devices.)

The basic steps of color perception are visually illustrated in Figure 3.2. The sensory

process of vision is initiated by visible light reaching the eye after interacting with the physical and chemical properties of objects in its path. The modulated light is then processed by the HVS to produce the perception of color.
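In basic colorimetry, this chain (illuminant, object, observer) is commonly summarized as a wavelength-wise product of the illuminant spectral power distribution, the object reflectance and the color matching functions, summed over the visible range. The sketch below shows that computation. The function name is hypothetical, and the curves in `TOY_CMF` are Gaussian placeholders used only so that the example runs; they are not the CIE color matching functions, which would have to be loaded from tabulated data.

```python
import numpy as np


def tristimulus(illuminant, reflectance, cmf, step_nm):
    """Discrete tristimulus integration.

    illuminant  : illuminant SPD sampled at N wavelengths
    reflectance : object reflectance at the same N wavelengths
    cmf         : array of shape (N, 3), columns x-bar, y-bar, z-bar
    step_nm     : wavelength sampling interval in nm
    """
    illuminant = np.asarray(illuminant, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    cmf = np.asarray(cmf, dtype=float)
    # Normalize so that a perfect reflecting diffuser has Y = 100.
    k = 100.0 / np.sum(illuminant * cmf[:, 1] * step_nm)
    stimulus = illuminant * reflectance  # light modulated by the object
    return k * step_nm * (stimulus @ cmf)


wavelengths = np.arange(400.0, 701.0, 10.0)


def _bump(center, width):
    """Gaussian placeholder curve (NOT a real color matching function)."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


TOY_CMF = np.stack([_bump(600, 40), _bump(550, 40), _bump(450, 30)], axis=1)

flat_illuminant = np.ones_like(wavelengths)  # equal-energy toy illuminant
xyz_white = tristimulus(flat_illuminant, np.ones_like(wavelengths), TOY_CMF, 10.0)
xyz_gray = tristimulus(flat_illuminant, 0.5 * np.ones_like(wavelengths), TOY_CMF, 10.0)
```

With any choice of curves, a spectrally flat 50% reflector yields exactly half the tristimulus values of the perfect diffuser, since the reflectance enters the sum linearly.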

Physiologically, the HVS consists of the eye, the neural system, and the brain. Color perception is the result of the interaction between the physical stimulus and each of these parts of the HVS. The receptor cells of the eye convert the physical light stimulus into an electrical signal, which is sent to the visual cortex of the brain through the optic nerve. The visual cortex then interprets the signals in cooperation with other parts of the brain.

The retina of the human eye contains two types of receptor cells, rods and cones, which are sensitive to visible light in the wavelength range of 360 nm to 830 nm. There are approximately 6 to 7 million cone receptors in a human retina. Cones are mainly concentrated at the center of the retina (the fovea) and are responsible for discerning fine details. Cone vision, also known as photopic or bright-light vision, occurs for luminance levels greater than 0.3 cd/m² [2].

(Figure labels: illuminant; object reflectance; illuminant SPD; reflectance SPD; color matching functions.)

