
Learning geometric and lighting priors from natural images


Learning geometric and lighting priors from natural images

Thesis

Yannick Hold-Geoffroy

Doctorate in electrical engineering
Philosophiæ doctor (Ph. D.)


Learning Geometric and Lighting Priors from Natural Images

Thesis

Yannick Hold-Geoffroy

Under the supervision of:

Jean-François Lalonde, research supervisor
Paulo F.U. Gotardo, research co-supervisor


Résumé

Understanding images is of crucial importance for a plethora of tasks, from digital compositing to image relighting to the 3D reconstruction of objects. These tasks allow visual artists to create masterpieces, or help operators make safe decisions based on visual stimuli. For many of these tasks, the physical and geometric models developed by the scientific community give rise to ill-posed problems with several solutions, of which generally only one is reasonable. To resolve these indeterminations, reasoning about the visual and semantic context of a scene is usually left to an artist or an expert, who draws on experience to carry out the work. This is because it is generally necessary to reason about the scene globally in order to obtain plausible and appreciable results. Would it be possible to model this experience from visual data and partly or fully automate these tasks? This is the subject of this thesis: modeling priors by deep machine learning to enable the resolution of typically ill-posed problems. More specifically, we cover three research axes: 1) surface reconstruction by photometry, 2) outdoor illumination estimation from a single image and 3) camera calibration estimation from a single image with generic content. These three topics are addressed from a data-driven perspective. Each of these axes includes in-depth performance analyses and, despite the reputation of opacity of deep machine learning algorithms, we propose studies on the visual cues captured by our methods.


Abstract

Understanding images is needed for a plethora of tasks, from compositing to image relighting to 3D object reconstruction. These tasks allow artists to create masterpieces or help operators make safe decisions based on visual stimuli. For many of these tasks, the physical and geometric models that the scientific community has developed give rise to ill-posed problems with several solutions, only one of which is generally reasonable. To resolve these indeterminations, reasoning about the visual and semantic context of a scene is usually left to an artist or an expert, who uses their experience to carry out the work. This is because humans are able to reason globally about the scene in order to obtain plausible and appreciable results. Would it be possible to model this experience from visual data and partly or fully automate these tasks? This is the topic of this thesis: modeling priors using deep machine learning to solve typically ill-posed problems. More specifically, we will cover three research axes: 1) surface reconstruction using photometric cues, 2) outdoor illumination estimation from a single image and 3) camera calibration estimation from a single image with generic content. These three topics will be addressed from a data-driven perspective. Each of these axes includes in-depth performance analyses and, despite the reputation of opacity of deep machine learning algorithms, we offer studies on the visual cues captured by our methods.


Contents

Résumé iii

Abstract iv

Contents v

List of Tables vii

List of Figures viii

Introduction 1

1 Related Work 6

1.1 Photometric Stereo . . . 6

1.2 Models for approximating outdoor lighting . . . 9

1.3 Parameter estimation for outdoor illumination models . . . 13

2 Analysis of Short-Term Outdoor Photometric Stereo 16

2.1 HDR database . . . 17

2.2 Image formation model. . . 18

2.3 Modeling uncertainty. . . 21

2.4 Theoretical analysis of mean light vector variability . . . 22

2.5 Mean light vector shifts in real sky probes . . . 25

2.6 Photometric stereo reconstruction results. . . 29

2.7 Discussion and future work . . . 32

3 Deep Photometric Stereo on a Sunny Day 34

3.1 Deep outdoor Photometric Stereo network . . . 35

3.2 Evaluation. . . 40

3.3 Analysis . . . 43

3.4 Discussion . . . 45

4 Deep Outdoor Illumination Estimation 47

4.1 Overview . . . 49

4.2 Dataset preparation . . . 49

4.3 Learning to predict outdoor lighting . . . 53

4.4 Evaluation. . . 55


5 A Perceptual Measure for Deep Single Image Camera Calibration 63

5.1 Related Work . . . 65

5.2 Geometric camera model. . . 65

5.3 Image calibration network . . . 67

5.4 Evaluation. . . 68

5.5 Human perception of calibration . . . 69

5.6 Applications. . . 76

5.7 Discussion . . . 77

Conclusion 79

A Supplementary Results on Photometric Stereo Analysis 82

B Photometric Stereo proof of convexity 90

C Supplementary Results on Sunny Photometric Stereo 92

C.1 Additional qualitative results . . . 92

C.2 Example Renderings from the Training Dataset . . . 106

D Supplementary Results on Outdoor Lighting Estimation 110

D.1 Sun position estimation . . . 110

D.2 Virtual object insertion . . . 114

D.3 Camera parameters evaluation . . . 114

D.4 Validation High Dynamic Range . . . 117

D.5 Virtual object insertion validation with HDR captures . . . 118

E Supplementary Results on Camera Calibration Estimation 120

E.1 Horizon Line Estimations . . . 121

E.2 Smoothed Guided Backpropagation . . . 123

E.3 User Study Examples. . . 125

E.4 User Study Results . . . 129

E.5 Virtual object insertions . . . 130

E.6 Ablative study . . . 131


List of Tables

5.1 Sampling of camera parameters used to generate the dataset for the human sensitivity study . . . 67

5.2 Human method preference study results . . . 76

A.1 Mean sun visibility available in our database . . . 83


List of Figures

1.1 State-of-the-art calibrated photometric stereo . . . 7

1.2 State-of-the-art semi-calibrated photometric stereo . . . 9

1.3 Visualisations of the Rayleigh and Mie scattering . . . 11

1.4 Examples of skies produced by the Hošek-Wilkie sky model . . . 12

1.5 Examples of suns produced by the Hošek-Wilkie solar model. . . 12

1.6 Example renders lit by the Hošek-Wilkie solar and sky models . . . 14

2.1 Sky Database . . . 18

2.2 Database capture apparatus . . . 19

2.3 HDRDB dataset excerpt . . . 19

2.4 Hemispheres defined by surface normals . . . 20

2.5 Impact of cloud coverage on PS conditioning . . . 23

2.6 Simulated noise gain as a function of solar arc and mean light vector shift . . .

2.7 Examples of mean light vectors as a function of the studied normal . . . 26

2.8 Noise gain over the sphere . . . 27

2.9 Fine-grained analysis of PS uncertainty . . . 28

2.10 Distribution of noise gain ratio as a function of interval duration . . . 29

2.11 Normal recovery error as a function of time interval and start time . . . 30

2.12 Real data capture setup . . . 30

2.13 Validation on real data . . . 31

3.1 Solar analemma: position of the sun in the sky . . . 37

3.2 Method overview and model architecture. . . 38

3.3 Lighting environment maps and renders throughout a day . . . 41

3.4 Reconstruction error on real lighting . . . 43

3.5 Surface reconstruction performance as a function of camera calibration error . . . 44

3.6 Ablation study: surface reconstruction performance as a function of the number of input images . . . 45

3.7 CNN focus analysis. . . 45

3.8 Results on non-uniform BRDF . . . 46

4.1 Presentation of the proposed method . . . 48

4.2 Impact of sky turbidity t on rendered objects . . . 50

4.3 Neural network architecture . . . 52

4.4 Quantitative evaluation of sun position estimation . . . 54


4.8 Quantitative relighting comparison with ground truth lighting on SUN360 . . . 57

4.9 Virtual object insertion with automated lighting estimation . . . 58

4.10 Virtual object insertion with automated lighting and camera elevation estimation 59

4.11 Object relighting comparison with ground truth illumination conditions on HDR panoramas . . . 60

4.12 Typical failure cases of sun position estimation . . . 61

5.1 Example results of horizon line estimation . . . 64

5.2 Camera parameters. . . 66

5.3 Pitch and roll estimation performance . . . 69

5.4 Field of view estimation performance . . . 70

5.5 Analysis of the neural network focus . . . 70

5.6 Human sensitivity to calibration errors . . . 73

5.7 Human sensitivity to errors in calibration per parameter . . . 74

5.8 Examples of image retrieval . . . 75

5.9 Performance on human sensitivity measure . . . 75

5.10 2D compositing example . . . 77

5.11 Examples of virtual object insertion . . . 78

A.1 Fine-grained analysis of the expected uncertainty of outdoor PS . . . 84

A.2 Fine-grained analysis of the expected uncertainty of outdoor PS (cont.) . . . . 85

A.3 Distribution of noise gain ratio rt . . . 86

A.4 Distribution of noise gain ratio rt (cont.) . . . 87

A.5 Example of input images used for reconstruction . . . 88

A.6 Surface reconstruction errors as a function of capture interval duration . . . 89

D.1 Outdoor dynamic range . . . 118

E.1 Influence of parameter errors on human sensitivity . . . 129

E.2 Influence of parameter errors wrt. other parameter values on human sensitivity 129

E.3 Influence of parameter values wrt. other parameter errors on human sensitivity 129

E.4 Influence of parameter values between themselves on human sensitivity . . . 130


Introduction

Natural images are but a glimpse of captured light. Whether they depict a morning in the park, an afternoon at the beach or an evening in the living room, they all resonate strongly with the human visual system. We are able to appreciate such images because evolution endowed us with high dynamic range, wide-angle, stereoscopic sensors, namely the eyes. But even equipped with those high-performance sensors, our complex nervous system evolved to rely heavily on past memories to perform its task. In human brains, the main central connection to the optic nerve is the lateral geniculate nucleus (LGN), the relay center for the visual pathway. It has been shown that only 5–10% of the input to the LGN derives from the retina, the remainder coming from other regions of the brain [120]. This is why we are able to navigate through complex environments like university offices or correctly estimate distances, even with a single eye open. In short, the meaning of a perceived light intensity is not only defined by its surroundings and context, but also by what happened in the past, as shaped by decades of learning experienced by the observer.

As we begin to better understand our own visual system, it becomes clear that, just like humans, high-level computer vision tasks must also rely on prior knowledge to be performed successfully. In this setting, this knowledge must be gathered from a large amount of image data and human annotations. Fortunately, the advent of social media brings a phenomenal influx of images of all sorts to public databases each day, enabling the development of data-hungry machine learning algorithms such as deep neural networks. Such data-driven approaches are possible not only thanks to the newly available datasets, but also thanks to the recent increase in computing speed and storage capacity of contemporary computers, which makes it possible to handle the staggering amount of data publicly available nowadays. These machine learning methods bring a new paradigm to tackle vision problems: modeling prior knowledge on natural images. These priors (initial beliefs on a probability distribution) are effectively additional constraints that can complement classical physics- or geometry-based approaches, improving solutions to ambiguous problems.

This is the main topic of this dissertation: modeling priors through machine learning to understand and solve problems that are ill-posed when considered exclusively from a physical or geometric standpoint. More specifically, we focus on three scenarios:

1. understanding and performing 3D surface reconstruction using photometric cues under outdoor lighting conditions;

2. modeling outdoor lighting by observing it indirectly, through the scene;

3. estimating geometric camera calibration using a single image of generic scenes.

For all those scenarios, we are interested in cases where the environment is mostly uncontrolled, resulting in ambiguous situations that are impossible to solve robustly with classical approaches. First, we will focus on recovering the surface geometry of a 3D object, which can be done by photometric stereo (PS), a popular dense shape reconstruction technique that has matured extensively over nearly 40 years [127]. Simply put, this technique recovers the surface normals of a 3D object observed under varying illumination, without requiring multiple viewpoints. PS is reputed to give accurate surface normal estimates in fully calibrated environments where lighting is controlled (provided the material reflectance matches the assumed model).

Recent investigations have turned to the more challenging problem of applying PS in outdoor environments, under uncontrolled, natural illumination. To do so, previous work modeled outdoor lighting with the sun as a point light source. However, the sun follows a mostly coplanar path throughout a single day, leading to redundant photometric cues which do not define a unique solution to the PS problem. Recent approaches therefore proposed to capture images over the course of many months [2, 1]. This time interval provides enough shift of the solar plane to correctly constrain the PS problem under the point light source assumption. Unfortunately, waiting for several months is tedious and impractical.

In this thesis, we propose to leverage the richness of natural illumination to solve the outdoor photometric stereo problem over shorter time intervals. This brings us to our first main contribution:

Short-Term Photometric Stereo. We present a systematic analysis of the expected performance of PS algorithms in outdoor settings over a single day or less, and propose to solve the short-term PS problem under various weather conditions by compensating for the missing information in photometric cues with learned priors.

By using a richer lighting model than the point light source to solve the outdoor PS problem, we are able to understand why and when outdoor photometric cues alone can result in a stable surface reconstruction. We further show that, in some cases, photometric cues alone cannot solve this problem robustly. In such cases, we propose to augment the photometric cues with knowledge of local surface geometry, materials and their interaction with natural lighting in order to solve the PS problem in those hard cases.

The second research axis aims to estimate outdoor illumination from a single outdoor image. Outdoor lighting conditions throughout the day are mainly governed by the position of the sun and the amount of clouds, or sky turbidity (the quantity of aerosols, like water vapor, present in the atmosphere). The goal is to estimate those parameters from an image of a generic scene taken with a standard camera. What makes this problem particularly challenging is the uncertainty present in generic images. Indeed, the sky and the sun may not be directly visible in the image, so we must estimate their properties by observing their impact on the scene. A classical way to do so would be to rely on known properties of the scene, like its geometry and the reflectance of its surfaces, and obtain the illumination conditions by solving the rendering equation for light. However, those scene properties are typically unknown, and estimating them robustly is still an open research question. Relying on explicit scene property estimation to estimate lighting conditions is thus a process prone to errors. In this dissertation, we demonstrate that machine learning can be used to model priors on generic scenes and natural illumination and overcome the problem of explicit scene parameter estimation. Using this additional information, it is possible to improve current single-image lighting estimation techniques, leading to our second contribution:

Single Image Outdoor Lighting Estimation. We present a single-image learning-based approach to perform outdoor lighting estimation on generic scenes under natural daylight.

Estimating the sun position from shading cues or from the sky (when present in the image) is a hard task that requires estimating the scene geometry and segmenting the sky. While we have no formal proof that our method performs those tasks (they are implicit in the network), our experiments and overall performance indicate that our method understands basic photometric cues, even though we never explicitly provided it with information about light physics. This implicit understanding of light allows our method to yield state-of-the-art illumination estimation performance, enabling photorealistic virtual object insertions and automatic relighting.

The last research axis focuses on geometric camera calibration from a single image of a generic scene. More specifically, we recover some intrinsic and extrinsic parameters of the camera. This task usually requires a specific object to be inserted in the scene, like a checkerboard pattern on a planar surface. Then, around a dozen images are taken, with the checkerboard positioned at various locations and orientations in the image. Having to insert an object in the scene and capture multiple images makes this process tedious. Furthermore, the large corpus of images that have already been taken typically does not contain such a specific object, making classical geometric camera calibration inapplicable to them. To simplify geometric calibration, one could perform it using the content present in a single image. However, this setup turns the calibration process into a severely ill-posed problem.

Once again, we rely on priors learned from large collections of images to constrain the problem, leading to our third contribution:

Single Image Camera Calibration. We propose a single-image learning-based approach to perform geometric camera calibration. The proposed method works on generic scenes and does not require a specific object to be present in the image.

Inferring geometric camera calibration directly from image pixels using a deep neural network improves performance and robustness over classical approaches. In particular, our method obtains state-of-the-art focal length and horizon estimation performance when compared against other single image methods. However, we argue that we are often more interested in the human perception of accuracy than in the distance to the ground truth. To this end, we performed a large-scale study of human sensitivity to calibration errors and, based on this study, developed a novel perceptual measure to demonstrate that our deep calibration network also outperforms other methods in terms of human perception.

Recent advances in machine learning allow learned models to encode the structure of data in large datasets, surpassing what would be possible with handcrafted feature extractors. However, this learned structure and these priors are implicitly encoded inside the model, meaning that what has been learned is not directly available once the model is trained. Throughout the contributions proposed by this thesis, we strive to understand the behavior of the learned models through indirect analysis. Concretely, we perform experiments like ablation studies to understand how the model performs when some information is missing, and show low-dimensional embedding representations to map the inner space learned by our method.

These three contributions can have a direct impact on various applications. For instance, surface reconstruction can allow the preservation of historical landmarks and cultural heritage, like statues and architectural features, in high definition. It is also useful for the entertainment industry, where scanning real-life models is critical for realism in video games and movies. Single-image lighting estimation and geometric camera calibration enable automated photorealistic virtual object insertion, image alteration and relighting. As such, some of the proposed methods have already been transferred to Dimension, the new 3D editing software from Adobe Systems. They could also be applied to forensics, allowing the analysis of potentially manipulated images.

Organization

As described above, this thesis focuses on three main axes: 1) surface reconstruction through photometric cues, 2) learning-based lighting estimation and 3) single image camera calibration. All research axes are explored through a data-driven paradigm and exploit both existing datasets and data that we acquired.

First, chapter 2 provides an in-depth analysis of the information contained in photometric cues throughout a single day and gives performance bounds on photometric stereo when performed on intervals down to a single hour.

Upon investigation, we found that partially cloudy days bring enough constraints to solve the PS problem using an adapted PS algorithm. However, we show that sunny days generally lack the constraints needed to provide a stable surface reconstruction. In chapter 3, we go one step further than what is possible using exclusively photometric cues. Since there is not enough information in a single sunny day for a stable surface reconstruction solely from photometric cues, we employ a deep learning model to learn priors on local surface geometry and sun trajectory patterns. This additional information brings enough supplemental constraints to the PS problem to allow stable surface reconstructions. Chapters 4 and 5 propose to extract priors from large datasets and utilize them to improve single-image outdoor lighting estimation and camera calibration, respectively. Most current outdoor lighting estimation techniques rely exclusively on handcrafted features, limiting their application to a specific environment, for example urban scenes [75]. We propose a lighting estimation approach that is robust to generic scenes. We are able to do so by learning, on a large number of scenes, features that capture the essence of natural illumination while being devoid of specific content. Finally, the last chapter covers the problem of camera calibration from a single image. An emphasis is put on focal length estimation and extrinsic calibration with respect to the earth. To do so, we focus on finding the horizon within the image, even when it is hidden, as in most indoor scenes. We further analyze human tolerance to errors on those calibration parameters.


Chapter 1

Related Work

In this chapter, we will cover the literature common to the research axes presented in this thesis. First, an overview of the recent literature on photometric stereo is discussed. An emphasis is put on dealing with various illumination conditions, from controlled laboratory conditions to the harder case of outdoor lighting. Then, a review of prior and current art in lighting modeling and estimation is presented. Since single image camera calibration is somewhat orthogonal to the other two research axes, we will review the literature relevant to this topic in chapter 5.

1.1 Photometric Stereo

As shown in Woodham's seminal work [127], for Lambertian surfaces, calibrated PS computes a (scaled) normal vector in closed form as a simple convex linear function of the input image pixels (see annex B); this linear mapping is only well-defined for images obtained under three or more (known) non-coplanar lighting directions. Since its inception in the early 80s, it has been explored from many angles. Whether to improve its ability to deal with complex materials [3] or lighting conditions [3, 7, 62, 90], or to enhance other techniques like multiview stereo [112], the myriad of papers published on the topic is a testament to the interest this technique has garnered in the community. A more detailed overview of general PS can be found in the recent, excellent review in [109]. While most of the papers on this topic have focused on images captured in the lab, recent progress has allowed the application of PS to images captured outdoors, lit by the more challenging case of uncontrollable, natural illumination. A central question for any PS practitioner is that of the quality and amount of data required to achieve good performance. What should the lighting conditions be during data capture? How many images (illumination conditions) are needed? What is the shortest time interval required to collect these samples?
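To make the closed form concrete, here is a minimal sketch of calibrated Lambertian PS, assuming known, non-coplanar light directions and linear intensities; the variable names and toy setup are illustrative, not taken from any published implementation.

```python
import numpy as np

def photometric_stereo(I, L):
    """Closed-form Lambertian PS: solve L @ x = I per pixel, with x = albedo * normal.

    I: (T, P) intensities for T lighting conditions and P pixels.
    L: (T, 3) known light directions; they must be non-coplanar (rank 3).
    """
    x, *_ = np.linalg.lstsq(L, I, rcond=None)    # (3, P) albedo-scaled normals
    albedo = np.linalg.norm(x, axis=0)           # per-pixel albedo
    normals = (x / np.maximum(albedo, 1e-8)).T   # (P, 3) unit normals
    return normals, albedo

# Toy usage with three orthogonal lights (the optimal configuration [26]).
L = np.eye(3)
n_true = np.array([0.0, 0.6, 0.8])               # ground-truth unit normal
I = np.maximum(L @ n_true, 0.0).reshape(3, 1)    # Lambertian shading, albedo 1
n_est, rho = photometric_stereo(I, L)
print(n_est, rho)                                # recovers n_true and albedo 1
```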

In the lab, theoretical analyses for Lambertian surfaces lit by point light sources reveal that the minimum number of images is three [127] and that the optimal light configuration yields an orthogonal triplet of light directions [26].


Figure 1.1 – When performing photometric stereo on images (top row), using a chrome sphere as a light probe to capture the illumination (middle row) leads to a richer lighting model than the point light source approach typically used. Results are shown on the bottom row, showing the normal map n displayed as n · l with lighting l coming from (1/√3) · [−1, 1, 1]^T (first image) and (1/√3) · [1, 1, 1]^T (second image). The third image shows the color-coded normal map. The fourth and fifth images show novel views of the reconstructed 3D surface. Figure from [134].

While such theoretical guarantees are reassuring, they are much harder to obtain for the case of more complex, non-Lambertian reflectance, or with more general lighting models. Thus, practitioners are left without guidance in the task of determining when to stop capturing data, an inherently tedious trial-and-error process. As a result, it is not rare for PS datasets to include hundreds of images [3] in an uncertain attempt to obtain an accurate reconstruction.

Subsequent work on outdoor PS has struggled to meet the light non-coplanarity requirement since, over the course of a day, the sun shines from directions that nearly lie on a plane. These coplanar sun directions yield an ill-posed problem known as two-source PS; despite extensive research using integrability and smoothness constraints [89, 46], results still present strong regularization artifacts on surfaces that are not smooth everywhere. To avoid this problem in outdoor PS, authors initially proposed gathering months of data, watching the sun elevation change over the seasons [1, 2]. More recently, Shen et al. [107] noted that the coplanarity of sun directions is at its worst near the equinoxes.


A creative solution to this problem was proposed in [56], but it is limited to objects that can be placed on a small moving platform. Therefore, until recently, capturing more data for fixed, large objects meant waiting days, or potentially even months [2, 1]. To compensate for limited sun motion, other approaches use richer illumination models that account for additional atmospheric factors in the sky. One way to achieve this is by employing (hemi-)spherical environment maps [22] of real sky captures [134, 108, 56]. By inserting light probes into the scene, one is able to capture both the object appearance and its illumination at the same time. An example of such a method, proposed by Yu et al. [134] to perform single-day photometric stereo, is shown in fig. 1.1. In their work, they propose an iterative method that alternates between two steps. First, they initialize the surface normal that minimizes the pixel relighting error by testing normals uniformly sampled across the sphere. This initial normal defines a visibility hemisphere in the environment map, which leads to a finer lighting estimation. This new lighting estimate can in turn be used to obtain a refined normal. These two steps are repeated until convergence, which usually happens in 3-5 iterations.
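The two-step structure of this alternation can be sketched as follows; this is a loose illustration under a simple Lambertian shading model, not the authors' implementation, and the lighting-refinement step is only indicated in comments.

```python
import numpy as np

def shade(n, env_dirs, env_rad):
    """Lambertian shading of a unit-albedo patch with normal n.
    env_dirs: (M, 3) unit directions; env_rad: (M,) radiance * solid angle."""
    cos = np.clip(env_dirs @ n, 0.0, None)   # attached-shadow (visibility) hemisphere
    return (env_rad @ cos) / np.pi

def best_normal(pixel_vals, envs, candidates):
    """Step 1: test candidate normals uniformly sampled on the sphere and keep
    the one minimizing the relighting error against the observed pixel values.
    Step 2 (not shown) would refine the lighting estimate on the visibility
    hemisphere this normal defines; both steps repeat until convergence."""
    errors = [np.linalg.norm(np.array([shade(n, d, r) for d, r in envs]) - pixel_vals)
              for n in candidates]
    return candidates[int(np.argmin(errors))]
```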

A second way to use richer illumination for photometric stereo is to synthesize the sky using a parametric sky model [57, 63]. This leads to a more constrained formulation of photometric stereo, enabling the relaxation of the calibration requirements. As such, semi-calibrated methods like the one proposed by Jung et al. [63] began to appear. Their method does not need a light probe, removing the full calibration requirement. They propose to estimate the albedo of the surface using the first eigenvector of the observed pixels in RGB space. However, their technique is not fully uncalibrated, as they rely on the precise geolocation of the camera to constrain the sun path in the sky to a single known trajectory. An example of the results obtained by this algorithm is shown in fig. 1.2.

Despite these developments, calibrated [134] and semi-calibrated [63] (based on precise geolocation) outdoor PS are still prone to potentially long waits for ideal conditions to arise in the sky, and verifying the occurrence of such events is still a trial-and-error process.

Machine learning was subsequently explored as a way to improve photometric stereo, constraining the surface reconstruction problem with learned priors on images. So far, deep learning has only been applied to photometric stereo in indoor scenarios with rich and controlled illumination [135, 102, 118, 109], focusing on learning inverse functions for non-Lambertian reflectances.

Finally, under more extreme ambiguity, techniques for shape-from-shading (SfS) [52, 136, 79, 90, 62, 6] attempt to recover 3D normals from a single input image, in which case the shading cue alone is obviously insufficient to uniquely define a solution. Thus, SfS relies strongly on priors of different complexities, and deep learning is quickly bringing advances to the field [28, 110, 131].


Figure 1.2 – Example of single-day outdoor photometric stereo without an inserted light probe. Figure from [63].

1.2 Models for approximating outdoor lighting

Lighting is the fundamental element that makes up all images. As Paul Cézanne said, “there is no model; there is only color.” Knowing the lighting of a scene allows for a deeper understanding of the camera, the scene and its materials, and enables a myriad of applications like automatic virtual object insertion and relighting.

In this section, we present the most notable simulation-based and analytical outdoor illumination models as well as methods used to estimate their parameters from images.

1.2.1 Simulation-based outdoor illumination models

Simulation-based models seek to generate accurate representations of the sky. They rely on physics-based simulations to provide the luminance distribution of the sky as a function of the sun position and some atmospheric characteristics. They are usually slow to evaluate, but typically closer to real sky luminance measurements than analytical models.

The first model to explain the main physical phenomena that come into play to form daylight was published by Nishita et al. in their seminal work of 1993 [88]. They explain that the most important physical phenomenon affecting the appearance of the sky is Rayleigh scattering. It models interactions of the electromagnetic field (visible light, in this case) with small aerosols. Specifically, it encompasses interactions with particles having a radius r significantly smaller than the light wavelength λ. This domain of interactions, characterized by 2πr/λ ≪ 1, predicts the behavior of light when it interacts with nitrogen and oxygen, the major constituents of the atmosphere. The amount of Rayleigh scattering is inversely proportional to the fourth power of the light wavelength, λ⁴. In the visible spectrum, this translates into small wavelengths (blues) being much more scattered than large wavelengths (reds), explaining why the sky is perceived as blue by the human eye. During sunrises and sunsets, this phenomenon is amplified as light propagates through a thicker atmosphere. The scattering of large wavelengths (reds) then becomes non-negligible, giving rise to the colorful sunrises and sunsets we know.
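A quick back-of-the-envelope check of this λ⁴ dependence, with illustrative wavelengths:

```python
# Rayleigh scattering strength is proportional to 1 / lambda^4.
blue, red = 450e-9, 650e-9   # illustrative wavelengths, in meters
print((red / blue) ** 4)     # ~4.35: blue light is scattered over four times
                             # more than red, hence the blue appearance of the sky.
```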

The second most important type of interaction solves Maxwell's equations for the case of particles with a radius r roughly the same size as the light wavelength λ. This regime, 2πr/λ ≈ 1, represents interactions of light with heavier molecules like water particles in suspension in the atmosphere due to humidity, in clouds or during fog. This domain of interactions is called Mie scattering and is responsible for the white appearance of clouds.

Summing both Rayleigh and Mie scattering together gives luminance estimates of the sky dome very close to measurements performed during daylight, confirming the dominance of these two phenomena in the appearance of the sky. Examples of simulations of these phenomena are shown in fig. 1.3.

In subsequent work, Nishita et al. [87] enhanced their model by adding two elements: 1) the influence of ground albedo on the sky and 2) multiple scattering. Whereas the first formulation of their work considered a single scattering event per observed light ray, this more recent work considers multiple scattering interactions before observation.

Most published simulation-based models are variants of the Rayleigh and Mie scattering model, either with single or multiple scattering, as proposed in [88, 87], usually focusing on speeding up the simulation [91], slightly increasing agreement with measurements of the real sky [38, 10], or extending it to arbitrary atmospheres like oceans [29].

Daylight is not the only phenomenon that received attention in the literature. A night sky model has also been developed [60], taking into consideration the sun’s reflection on the moon as well as the light emitted from the stars.

1.2.2 Physically-based analytic outdoor illumination models

Contrary to sky models based on physics simulations, analytical sky models are composed of a single empirically-derived closed-form equation giving the luminance of the sky. The parameters of this equation are then fit to one of the simulations described in the previous section. While analytical models are typically used for rendering and real-time applications because of their high evaluation speed, their accuracy is somewhat worse than that of simulation-based models, giving sky reconstructions that agree less with real sky measurements.

The first weather condition that was modeled in the literature is the overcast sky. Kimball and Hand [69] were the first to report, in 1921, that cloudy days have a higher luminosity near their zenith, with luminosity decreasing toward the horizon.



Figure 1.3 – Rayleigh (left, 2πr/λ ≪ 1) and Mie (right, 2πr/λ ≈ 1) scattering when the sun is at the zenith. The images of the sky use the skyangular representation, where the zenith is at the center of the image and the horizon lies on the largest circle of the image. This representation is equivalent to a picture taken with a 180° fisheye lens pointed up at the sky.

Two decades later, Moon and Spencer [86] formulated the first luminance distribution model for the overcast sky. This work has been revisited and was adopted as the CIE overcast sky model in 1996 [15]. In this model, the luminance of the sky Y_z as a function of the zenith angle θ of a sky element (pixel) is defined as

Y_z = \frac{1 + 2\cos\theta}{3} . (1.1)

To summarize, the sky luminance on overcast days is proportional to the cosine of the zenith angle. While this model predicts overcast days quite accurately, it does not tell the whole story about daylight in general. In particular, other weather conditions like clear and partially cloudy days are more challenging to model accurately, but also offer more complex lighting conditions.
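The model is trivial to evaluate; a minimal sketch of eq. (1.1):

```python
import numpy as np

def cie_overcast(theta):
    """Relative luminance of the CIE overcast sky model, eq. (1.1).
    theta: zenith angle of the sky element, in radians."""
    return (1.0 + 2.0 * np.cos(theta)) / 3.0

# Luminance is 1 at the zenith and falls to 1/3 at the horizon.
print(cie_overcast(np.array([0.0, np.pi / 4, np.pi / 2])))  # [1.0, 0.805, 0.333]
```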


Figure 1.4 – Examples of skies produced by the Hošek-Wilkie sky model for turbidity values T = 2, 4, 6, and 8. Figure from [53].

Figure 1.5 – Examples of suns produced by the Hošek-Wilkie solar model for various sun elevations and turbidity T values. Figure from [53].

The clear sky was later standardized as the CIE clear sky model [20]. Based on this model, Perez et al. [93] proposed a generalization called the all-weather sky luminance distribution model. This model defines the luminance of the sky dome as a function of the zenith angle θ of the considered sky element and the angular distance γ between this sky element and the sun:

Y_z = \underbrace{\left(1 + A e^{B/\cos\theta}\right)}_{\text{geometric factor}} \cdot \underbrace{\left(1 + C e^{D\gamma} + E \cos^2\gamma\right)}_{\text{indicatrix function}} . (1.2)

This equation is parameterized by five coefficients (A-E) that can be varied to generate a wide range of skies. The first term in parentheses represents the luminance decrease as the zenith angle increases, while the second term is called an indicatrix function and approximates the weather.
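A direct transcription of eq. (1.2) might look as follows; the example coefficients are illustrative placeholders, not values fit to any sky.

```python
import numpy as np

def perez_luminance(theta, gamma, A, B, C, D, E):
    """Relative luminance of the Perez all-weather model, eq. (1.2).
    theta: zenith angle of the sky element; gamma: angle between the sky
    element and the sun (both in radians, with theta < pi/2)."""
    geometric = 1.0 + A * np.exp(B / np.cos(theta))                  # zenith-angle term
    indicatrix = 1.0 + C * np.exp(D * gamma) + E * np.cos(gamma)**2  # weather term
    return geometric * indicatrix

# Illustrative clear-sky-like coefficients, for demonstration only.
print(perez_luminance(np.pi / 4, np.pi / 6, A=-1.0, B=-0.32, C=10.0, D=-3.0, E=0.45))
```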


The Preetham model later tied these five coefficients to a single physically meaningful value: Linke's turbidity [84]. This parameter is defined as

T = \frac{t_m + t_h}{t_m} , (1.3)

where t_m is the optical thickness of the molecular atmosphere (devoid of haze), and t_h is the optical thickness of the haze atmosphere. In order to apply this turbidity parameter to the Perez model, they perform the physics-based simulation proposed in [87], varying the turbidity T between 2 and 6. They then fit the five parameters (A-E) of Perez's model to the simulation results using a Levenberg-Marquardt non-linear least squares optimization. From this, they obtain a linear function mapping the turbidity T to the optimized parameters. They also extend their model with chrominance, leading to the first analytical model with color. Over the years, some limitations of this Preetham model were detected. Eight years after its publication, a critique was published [141], noting the relatively small valid turbidity range, the anthelic region (opposite the sun at the same elevation, from an observer's perspective) being systematically too bright, and the overly smooth intensity peak toward the sun.
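The fitting step can be sketched with an off-the-shelf Levenberg-Marquardt solver; here simulated_Y is a placeholder for the output of a physics-based simulation such as [87], and the initial guess is arbitrary.

```python
import numpy as np
from scipy.optimize import least_squares

def fit_perez(theta, gamma, simulated_Y):
    """Fit the five Perez coefficients (A-E) to simulated sky luminances,
    in the spirit of the Preetham fitting procedure (a sketch, not the
    published fitting code)."""
    def residuals(p):
        A, B, C, D, E = p
        pred = (1 + A * np.exp(B / np.cos(theta))) * \
               (1 + C * np.exp(D * gamma) + E * np.cos(gamma) ** 2)
        return pred - simulated_Y

    p0 = np.array([-1.0, -0.3, 10.0, -3.0, 0.45])       # arbitrary initial guess
    return least_squares(residuals, p0, method="lm").x  # Levenberg-Marquardt
```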

Lalonde and Matthews [77] contributed to solving the lack-of-peakiness issue of the Preetham model by combining it with a novel empirical sun model. Hošek and Wilkie also proposed a sky luminance model [53] that fixed most of the antisolar region mismatch while keeping the same turbidity parameter T. In order to do so, they performed more simulations, still based on [87], and added 4 parameters to the indicatrix term of eq. (1.2). During their simulation, they sample the wavelength over eleven spectral channels from 320 nm to 720 nm, leading to a hyperspectral model that can be interpolated over the whole visible spectrum and slightly into the ultraviolet spectrum. Furthermore, they noted that the linear interpolation used by the Preetham model cannot account for abrupt variations in sky appearance at low solar elevations. To remedy this, they use a quintic-order Bézier polynomial to perform the fit between turbidity, wavelength, ground albedo and the 9 parameters (A-I) of their proposed analytical equation. Examples of the sky dome produced by this sky model are shown in fig. 1.4. In subsequent work, they extended their model to include a solar radiance function [54], as shown in fig. 1.5. Some of the rendering capabilities of this model are shown in fig. 1.6. Finally, they extended their model to render earth-like extrasolar skies [125].

1.3 Parameter estimation for outdoor illumination models

Aside from the fully overcast sky model, all sky models require some parameters to be estimated in order to reproduce a given sky. A fundamental parameter common to all non-overcast daylight sky models is the position of the sun in the sky. As seen in the previous section, most models then add some atmospheric parameters (for example, turbidity) to increase agreement between predictions and real sky measurements. Estimating all those parameters allows us to reproduce the full sky appearance.


Figure 1.6 – Examples of renders lit by the Hošek-Wilkie solar and sky models. A sun elevation of 8° was used to produce these renders, with turbidities of (a) 1, (b) 2, (c) 3, (d) 4, (e) 5, (f) 6, (g) 7, (h) 8, (i) 9, and (j) 10. Images are tone-mapped to appear roughly equally bright. Figure from [53].

In most photographs, however, the sky is only partially visible or absent. If we want to model lighting in those images, we must estimate its parameters by looking at lighting indirectly, through the scene or partial observations. This section gives an overview of techniques proposed to estimate lighting from images taken with a regular camera.

Lalonde et al. [75] predict the direction and visibility of the sun from a single outdoor image. In order to do so, they obtain an estimate of the sky illumination (represented by the Perez model [93]) from sky pixels [78] and combine it with four other cues: the direction of shadows attached to vertical surfaces, the shading of vertical surfaces, the appearance of pedestrians, and a general fixed prior on the sun position extracted from a dataset of 6 million images. To evaluate their method, they propose a novel dataset of urban images annotated with sun positions. However, most of the hand-crafted features they use are tailored specifically for urban scenes, where buildings provide vertical surfaces to observe and humans can be detected using a pedestrian detector. This makes their method perform well in urban scenes, but poorly on images of other locations that are devoid of such specific features.

Other techniques for single image illumination estimation rely on known geometry and/or strong priors on scene reflectance, geometry and illumination [6, 5, 82]. These priors are usually crafted for specific scenes, and typically do not generalize to large-scale outdoor scenes. In the outdoor case, the typical approach to model the sky is to use a mirror sphere as a light probe [22]. Newer techniques propose to relax the mirror sphere requirement to more generic objects. For example, Calian et al. [11] proposed to use the human face as a light probe by modeling the appearance of skin under outdoor illumination, as modeled by the Lalonde-Matthews sky model [77]. Karsch et al. [64] retrieve panoramas (from the SUN360 panorama dataset [132]) with features similar to the input image, and refine the retrieved panoramas to compute the illumination.


However, the matching metric is based on image content, which may not be directly linked with illumination.

Another class of techniques simplifies the problem by estimating illumination from image collections. Multi-view image collections have been used to reconstruct geometry, which is used to recover outdoor illumination [39, 77, 106, 27], sun direction [124], or the place and time of capture [42]. Appearance changes have also been used to recover colorimetric variations of outdoor sun-sky illumination [117].

While estimating outdoor illumination is useful to reason about the scene, it can also tell a lot about the camera. In this vein, Lalonde et al. [78] present a method to estimate the camera focal length as well as its zenith and azimuth angles with respect to the earth's horizon. Their method can be used to perform geolocalization and cloud segmentation. Kawakami et al. [65] push these ideas further and propose a technique to estimate the camera spectral sensitivities and white balance from sky regions within images.


Chapter 2

Analysis of Short-Term Outdoor Photometric Stereo

Photometric stereo has been studied extensively in the past decades, especially regarding how to deal with various lighting conditions [3, 7, 62, 90, 2, 1]. Most of the work on this topic has been done in the laboratory, where illumination is controlled. When trying to apply the technique outdoors, where the illumination is uncontrolled (e.g., [116, 61, 72, 107]), we realized that the sun, when modeled as a point light source, does not offer enough variability throughout a single day to robustly solve the PS problem. To circumvent this issue, researchers proposed to capture images over extended amounts of time in order to have enough lighting diversity to sufficiently constrain the problem [2, 1]. Although previous work has presented sensitivity analyses for standard PS with point light sources, as far as we know, the case of outdoor PS with more complex illumination models has received little attention.

A promising approach to this problem is to use more elaborate models of illumination—high dynamic range (HDR) environment maps [98]—as input to outdoor PS. Promising results have been reported in [134] for outdoor images taken within an interval of just eight hours within a single day. However, the quality of outdoor results is reported to be inferior to that obtained in indoor environments, the decline being attributed to modest variation in sunlight. This observation leads to many interesting, unanswered questions: had the sun path and atmospheric conditions been different on that day, could the quality of their results have been better? What is the minimum time interval required to obtain good results in outdoor PS? Is a full day of observations enough and/or necessary to obtain good results in outdoor PS? Or could similar results be obtained over a shorter time interval (say, 1 hour)? If so, what should be happening in the sky over that time interval? Besides easing the requirements on data capture, these questions are also important in that, in future work, they could extend the applicability of outdoor PS to new scenarios (e.g., to more quickly capture non-static outdoor objects that show gradual changes in shape or appearance over time).


In this chapter, we present the first answers to the questions above. Here, we seek to determine the relationship between expected surface reconstruction performance and the photometric cues available outdoors over the course of the day. Our main goal is to assess the reconstructibility of surface patches as a function of their orientation and the illumination conditions, given a set of HDR environment maps captured throughout a single day or less. To achieve this goal, we use a large database of natural, outdoor illumination (sky probes), which we will discuss first. Then, a detailed look at the conditions under which normals can be reconstructed reliably will be presented, followed by an analysis of surface reconstruction stability. Finally, we explore the occurrence of these conditions over the course of a single day and whether PS can be applied over intervals that span less than one day.

2.1 HDR database

In most of the literature on photometric stereo so far, the light visible to a surface patch is modeled with simple directional illumination, for which optimal lighting configurations can be theoretically derived [26, 72, 107]. Until recently, no attempt had been made to model natural lighting conditions with more realistic illumination models in an outdoor setup, where lighting cannot be controlled and atmospheric effects are difficult to predict. In such an uncontrolled environment, exploiting the subtlety and richness of natural lighting is key to increasing the efficiency of PS and successfully applying it to short intervals of time.

In order to understand this complex natural phenomenon, we built a rich dataset of High Dynamic Range (HDR) images of the sky, captured under a wide variety of conditions. This database is based on and used by [77, 50, 51], as well as by chapter 3 of this thesis. A public version of this database is available at http://hdrdb.com/.

To build this database, we captured HDR images of the sky hemisphere using the approach described in [114]. Pictures of the capture system are shown in fig. 2.2. For each image, we captured seven exposures of the sky ranging from 1/8000 to 1 second, using a Canon EOS 5D Mark III camera installed on a tripod and fitted with a SIGMA EXDG 8mm fisheye lens. A 3.0 ND filter was installed behind the lens, necessary to accurately measure the sun intensity. The exposures were stored as 14-bit RAW images at the full resolution of the camera. The camera was controlled using a Raspberry Pi via a USB connection, and the setup was mounted on the roof of a tall building to capture the entire sky hemisphere. Every two minutes, we captured the seven exposures required to span the full dynamic range of the outdoor sky. The fisheye lens was radiometrically calibrated to account for chromaticity shifts caused by the ND filter and geometrically calibrated using [103], and the resulting light probes were mapped to the angular environment map representation [98] for storage in floating-point EXR format. We merged the seven exposures using [24] to create one HDR sky probe per exposure set.
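As an illustration of the exposure-merging step, here is a simplified weighted-average merge assuming linearized pixel values; it is a stand-in for the actual procedure of [24], not a reimplementation of it.

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Naive HDR merge: average linearized exposures, weighted by a hat
    function that discounts under- and over-exposed pixels.

    images: list of (H, W) linear images scaled to [0, 1].
    exposure_times: matching shutter times in seconds (e.g., 1/8000 to 1).
    """
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(num)
    for img, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * img - 1.0)   # peak weight at mid-gray
        num += w * img / t                  # radiance estimate from this exposure
        den += w
    return num / np.maximum(den, 1e-8)      # HDR radiance map
```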


Figure 2.1 – Dynamic range in our sky database. Four different exposures of the same sky probe are shown, each expressed as factors (indicated as insets) of a reference image (1). The left-most image appears completely black, but zooming in (inset) reveals that the sun intensity is captured without saturation.

Because the camera may have shifted from one capture day to another, we automatically align all sky probes to the world reference frame. This was done by detecting the sun in at least 3 images for a given day, and by computing the rotation matrix which best aligned the detected positions with the real sun coordinates (obtained with [96]). For days when the sun was never visible, the probes were manually aligned using other aligned light probes as examples, and by matching visible buildings close to the horizon.
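This alignment step amounts to an orthogonal Procrustes (Kabsch) problem; a minimal sketch, assuming at least three detected sun directions expressed as unit vectors:

```python
import numpy as np

def alignment_rotation(detected, target):
    """Rotation R minimizing sum ||R @ detected_i - target_i||^2
    (orthogonal Procrustes / Kabsch). Inputs: (N, 3) unit vectors, N >= 3."""
    H = detected.T @ target                     # (3, 3) correlation matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # apply as: aligned = (R @ p.T).T
```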

In all, the dataset totals more than 5,000 illumination conditions, captured over 50 different days. Fig. 2.3 shows examples of these environment maps. Note that while the examples have been tone mapped for display, the actual sky images have extremely high dynamic range, and span the full 22 stops required to properly capture outdoor lighting, as shown in fig. 2.1.

2.2 Image formation model

Consider a small, Lambertian surface patch with normal vector n and albedo ρ (w.l.o.g., assume the albedo is monochromatic). At time t, this surface patch is observed under natural, outdoor illumination represented by the environment map L_t(ω) (e.g., fig. 2.3), with ω denoting a direction on the unit sphere. With an orthographic camera model, this patch is depicted as an image pixel with intensity

b_t = \frac{\rho}{\pi} \int_{\Omega_n} L_t(\omega) \langle \omega, n \rangle \, d\omega , (2.1)

where ⟨·, ·⟩ denotes the dot product. Integration is carried out over the hemisphere of incoming light, Ω_n, defined by the local orientation n of the surface (fig. 2.4). This hemisphere corresponds to an occlusion (or attached shadow) mask; only half of the pixels in the environment map contribute to the illumination of the surface patch. To make the analysis tractable and independent of object geometry, this chapter focuses on the simpler case without cast shadows.


Figure 2.2 – Left: the sky capture apparatus we developed. Right: the inside of the box, holding the Canon 5D Mark III and the Raspberry Pi control system.

[Figure rows: 08/24/2013 (light clouds), 11/06/2013 (mixed), 11/08/2014 (heavy clouds); columns: captures from 11:00 to 16:30 in 30-minute increments.]

Figure 2.3 – Examples from our dataset of HDR outdoor illumination conditions. In all, our dataset contains 3,800 different illumination conditions, captured from 10:30 until 16:30, during 23 days, spread over ten months and at two geographical locations. Each image is stored in the 32-bit floating point EXR format, and shown tone mapped here for display (with γ = 1.6).


Figure 2.4 – A normal n defines an integration domain Ω_n equivalent to a hemisphere on the entire spherical environment map. Only light emanating from this hemisphere contributes to the shading of that patch. Therefore, patches with different normals are lit differently even if the environment map is the same.

This image formation model is then discretized as

b_t = \frac{\rho}{\pi} \sum_{\omega_j \in \Omega_n} \hat{L}_t(\omega_j) \langle \omega_j, n \rangle , (2.2)

with \hat{L}_t(\omega_j) = L_t(\omega_j) \Delta\omega_j representing the environment map weighted by the solid angle Δω_j spanned by pixel j (the Δω_j, ∀j, are normalized so as to sum to 2π). Eq. (2.2) can be further summarized into the equivalent form

b_t = \bar{l}_t^T x , (2.3)

where x = ρn is the albedo-scaled normal vector and

\bar{l}_t = \frac{1}{\pi} \sum_{\omega_j \in \Omega_n} \hat{L}_t(\omega_j) \, \omega_j \in \mathbb{R}^3 (2.4)

is interpreted as the mean light vector (MLV) for the environment map at a time t. It is important to note that, as opposed to the traditional PS scenario where point light sources are fixed and thus independent of n, here the per-pixel MLV is a function of n. Thus, patches with different orientations define different sets of MLVs (as discussed later and shown in fig. 2.7). Given multiple images taken at times t ∈ {1, 2, . . . , T}, we collect all photometric constraints for patch x to obtain:

b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_T \end{bmatrix} = \begin{bmatrix} \bar{l}_1^T \\ \bar{l}_2^T \\ \vdots \\ \bar{l}_T^T \end{bmatrix} x = L x . (2.5)
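A minimal numerical sketch of eqs. (2.2) to (2.5): computing the MLV of a patch from a discretized environment map and solving the stacked system. Note the dependence the text points out: L is a function of the hemisphere Ω_n, so the sketch takes the normal defining that hemisphere as an input.

```python
import numpy as np

def mean_light_vector(n, env_dirs, env_rad_sa):
    """Eq. (2.4): MLV of the environment map for a patch with normal n.
    env_dirs: (M, 3) unit directions of environment map pixels.
    env_rad_sa: (M,) radiance pre-weighted by solid angle (sums to 2*pi)."""
    visible = env_dirs @ n > 0                       # hemisphere Omega_n
    return env_rad_sa[visible] @ env_dirs[visible] / np.pi

def solve_patch(b, n_hemisphere, envs):
    """Eq. (2.5): stack one MLV per capture time into L and solve L x = b.
    envs: list of (env_dirs, env_rad_sa) pairs, one per time t."""
    L = np.stack([mean_light_vector(n_hemisphere, d, r) for d, r in envs])
    x, *_ = np.linalg.lstsq(L, b, rcond=None)        # x = albedo * normal
    return x / np.linalg.norm(x), np.linalg.norm(x)  # unit normal, albedo
```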


With (2.5), this model of natural environmental illumination becomes quite similar to a model with a distant point light source, the well-known case in PS. However, note that each \bar{l}_t in L is a function of Ω_n and, thus, of n.

Most importantly, in outdoor PS, a well-defined solution x may exist even if the relative sun motion is nearly planar during a certain time interval. Instead of relying solely on the sun direction, the solution now requires non-coplanar mean light vectors \bar{l}_t, which are determined by a comprehensive set of natural illumination factors.

2.3 Modeling uncertainty

From (2.5), the least-squares solution x = (L^T L)^{-1} L^T b of outdoor PS is clearly affected by the condition number of L. Thus, we next characterize how well the solution x is constrained by natural, outdoor illumination within a given time interval (e.g., one day)—which is encoded by the set of mean light vectors \bar{l}_t in L or, equivalently, the set of environment maps L_t(·).

To assess the reliability of a solution x, we follow standard practice in PS [72, 116] and consider image measurements corrupted by zero-mean Gaussian noise with equal variance σ² (as least squares estimation is only optimal for this practical, most common noise model). Thus, b in (2.5) follows a normal distribution:

b \sim \mathcal{N}(\mu_b, \sigma^2 I) , (2.6)

where μ_b holds the (unknown) uncorrupted pixel values.

Since the desired least-squares solution for the albedo-scaled normal, x = (L^T L)^{-1} L^T b, is a linear transformation of a Gaussian random vector, it is easy to show that

x \sim \mathcal{N}(\mu_x, \sigma^2 (L^T L)^{-1}) , (2.7)

where μ_x = (L^T L)^{-1} L^T μ_b is the expected value of x. Once the albedo of a surface patch is known, we analyze its contribution to the uncertainty in the estimated normal vector, n = ρ^{-1} x, using a similar distribution,

n \sim \mathcal{N}\!\left(\frac{\mu_x}{\rho}, \frac{\sigma^2}{\rho^2} (L^T L)^{-1}\right) . (2.8)

The marginal distributions in (2.8) allow us to derive confidence intervals that indicate the uncertainty in each component of the least squares estimate \hat{n} = [\hat{n}_x\ \hat{n}_y\ \hat{n}_z]^T of n = [n_x\ n_y\ n_z]^T. The corresponding 95% confidence interval [41] is given by

\hat{n} \pm \delta , \quad \text{with } \delta_k = 1.96\,\frac{\sigma \lambda_k}{\rho} , \quad \lambda_k = \sqrt{\left[(L^T L)^{-1}\right]_{kk}} . (2.9)

The noise gain factor λ_k in (2.9) reveals how outdoor illumination (the conditioning of L) can amplify the effect of noise on the solution \hat{n}. The albedo ρ also impacts the solution stability: a lower albedo translates into a larger variance in the obtained estimate \hat{n} (as less light is reflected towards the camera). Our goal is then to answer the remaining question: how do natural changes in outdoor illumination affect this noise gain factor (λ_k) and, therefore, uncertainty?

Note that the condition number, determinant, and trace of the matrix (L^T L)^{-1} can also be used as measures of total variance in the estimated solutions—as done in [116]—to find the optimal location of point light sources in PS. These measures are closely related to the rank of the matrix L, which must be three for a solution to exist; that is, L^T L must be nonsingular. In practice, this matrix is always full-rank, although it is often poorly conditioned [107]. In the following sections, we consider the noise gain factor λ_k as a measure of uncertainty independent of albedo and sensor noise. In this chapter, we focus on analyzing our ability to recover geometry and will assume that the albedo is constant.
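The noise gain factors are cheap to compute from L; a small sketch, taking λ_k as the square root of the k-th diagonal entry of (L^T L)^{-1}, as in eqs. (2.8) and (2.9):

```python
import numpy as np

def noise_gain(L):
    """Noise gain factors lambda_k and their maximum, from the PS constraint
    matrix L of stacked mean light vectors (one row per capture time)."""
    lam = np.sqrt(np.diag(np.linalg.inv(L.T @ L)))
    return lam, lam.max()

# Nearly coplanar MLVs (e.g., a clear day near an equinox) amplify noise strongly:
L_coplanar = np.array([[1.0, 0.0, 0.01],
                       [0.7, 0.7, 0.01],
                       [0.0, 1.0, 0.01]])
print(noise_gain(L_coplanar)[1])   # large lambda_max: unstable reconstruction
```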

2.4 Theoretical analysis of mean light vector variability

In this section, we seek the minimum capture interval required for PS to work. We do so by tying what happens in the sky to reconstruction performance. The analysis is done on images that were captured during a continuous 6-hour time interval on each of the days in our database (see sec. 2.1), from 10:30 until 16:30.²

From the 95% confidence interval defined in eq. (2.9), we can deduce that the only light-dependent stability factor in the confidence interval δ is λk; the other two factors are related to the camera (σ) and surface reflectance (ρ). In the remainder of this chapter, we analyze the maximum uncertainty λmax = maxk(λk) as a conservative performance measure that is independent of albedo and sensor noise; λmax is a maximum noise gain factor, i.e., the intensity of noise amplification in the solution. Here, we are interested in 1) investigating how the noise gain λmax is influenced by the duration of outdoor data capture, and 2) identifying specific changes, or events, in outdoor lighting that are associated with more stable PS solutions (smaller λmax).

To make our analysis tractable, we do not model cast shadows and inter-reflections. In addition, we assume that the sky hemisphere (around zenith) provides the dominant part of incoming light. Unless stated otherwise, our simulations consider a day near an equinox, which corresponds to the worst case scenario with coplanar sun directions [107].

This section provides the first answers to the questions raised above by looking at collections of mean light vectors (MLVs), as defined in sec. 2.2, from both simulated and real sky data. The main goal is to analyze the behavior of the illumination factors λk (and associated confidence intervals) of normal estimation. More specifically, we investigate numerical stability (MLV coplanarity) as a function of the apparent sun motion and cloud coverage within capture intervals of different durations, containing different atmospheric events. We also compare the resulting performance measures of x-hour outdoor PS to those of full-day outdoor PS.

2. Companion video showing data samples available at http://vision.gel.ulaval.ca/~jflalonde/


Figure 2.5 – Impact of cloud coverage on the numerical conditioning of outdoor PS: clear (a) and overcast (b) days present MLVs with stronger coplanarity; in partly cloudy days (c), the sun is often obscured by clouds, which may lead to out-of-plane shifts of MLVs.

Figure 2.6 – Simulated noise gain λmax(θa, θe) as a function of solar arc θa and MLV shift (elevation) angle θe: (a) noise gain λ(θa, θe); (b) constant-shift cross sections (15°, 20°, 30°, 40°, and 90° shifts); (c) constant-arc cross sections (1.0- to 6.0-hour arcs). See discussion in the text.



2.4.1 Cloud coverage and MLV shifting

Under clear skies, the MLVs of the model above point nearly towards the sun, from which most of the incoming light arrives. Thus, near an equinox (worldwide), the resulting set of MLVs is nearly coplanar [107], resulting in poor performance, fig. 2.5(a). For a day with an overcast sky, performance is also poor because the set of MLVs is nearly collinear and shifted towards the patch normal n, fig. 2.5(b). Finally, on partly cloudy days (mixed skies), the sun is often obscured by clouds and such occlusion shifts some MLVs away from the solar plane, improving numerical stability, fig. 2.5(c).

2.4.2 Solar arcs and MLV elevation

Here, we seek to provide a sense of the minimal solar arc length and the amount of out-of-plane MLV shift required for single-day outdoor PS.

Assuming a day near an equinox, the apparent sun trajectory worldwide describes an arc θa within the solar plane of about 15° per hour. We now use this observation to evaluate the numerical stability of outdoor PS for data capture intervals (solar arcs) of different lengths. Considering a partly cloudy sky, we also investigate the interaction of solar arc and cloud coverage; we quantify performance as a function of both acquisition time (solar arc θa) and the amount of out-of-plane MLV shift (elevation angle θe) introduced by clouds.

A simple and effective way to investigate conditioning with different capture scenarios is to consider a simulation with the minimum number of three MLVs, as required for outdoor PS using (2.5). We simulate solar arcs θa of different lengths by defining two MLVs on a reference solar plane, with the third MLV presenting a varying elevation θe away from this plane, as illustrated in fig. 2.5(c). The actual orientation of the solar plane varies with the latitude of the observer; thus, we represent the MLV shift relative to this plane.

The numerical conditioning of outdoor PS, as observed with different configurations for these three MLVs, is then scored using the noise gain λmax (sec. 2.3). This measure is independent of albedo and sensor noise; it is also related to the condition number of the illumination matrix L in (2.5).

We compute λmax(θa, θe) for solar arcs θa of up to 6 hours (90°) and MLV elevations θe of up to 90°. For simplicity, we consider triplets of unit-length (normalized) MLVs; thus, conditioning depends on the magnitude sin(θe) of the out-of-plane component of the third MLV. Clearly, the optimal noise gain λmax = 1 is obtained when the MLVs are mutually orthogonal (θa = θe = 90°).
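This simulation is simple to reproduce. The sketch below builds the three unit-length MLVs and evaluates the noise gain of sec. 2.3; placing the elevated MLV at the mid-arc azimuth is our own illustrative choice, not a constraint of the analysis.

```python
import numpy as np

def noise_gain(theta_a_deg, theta_e_deg):
    """Maximum noise gain lambda_max for three unit-length MLVs: two on
    the solar plane separated by the solar arc theta_a, and a third
    lifted theta_e out of the plane (mid-arc azimuth assumed)."""
    ta, te = np.radians(theta_a_deg), np.radians(theta_e_deg)
    l1 = np.array([1.0, 0.0, 0.0])                # start of solar arc
    l2 = np.array([np.cos(ta), np.sin(ta), 0.0])  # end of solar arc
    l3 = np.array([np.cos(ta / 2) * np.cos(te),   # cloud-shifted MLV
                   np.sin(ta / 2) * np.cos(te),
                   np.sin(te)])
    L = np.stack([l1, l2, l3])                    # illumination matrix
    return np.sqrt(np.diag(np.linalg.inv(L.T @ L))).max()

print(noise_gain(90, 90))  # mutually orthogonal MLVs: optimal gain 1.0
print(noise_gain(15, 15))  # 1-hour arc, small shift: much larger gain
```

Sweeping both angles over a grid reproduces the qualitative shape of the gain surface shown in fig. 2.6(a).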

Fig. 2.6(a) shows that the noise gain λmax drops quickly to under 6 for capture intervals just above 1 hour and for MLV shifts θe > 15°. This result suggests that even the performance of 1-hour outdoor PS may be acceptable in practice, as the final uncertainty also depends on the sensor noise σ and the albedo ρ. To ease visualization, figs. 2.6(b,c) show cross sections of the λmax(θa, θe) gain surface for a constant shift or solar arc.

A second important prediction from fig. 2.6, considering (more realistic) small to moderate amounts of MLV shift θe ≤ 40°, is that conditioning will improve very little for data capture intervals above 3 hours (45° solar arcs). Reducing data capture from 3 to 2 hours would lead to an additional increase in uncertainty (λmax) of less than 30% (from about 2.8 to nearly 3.6). Still, 2-hour outdoor PS with noise gains under 4× may be possible if an MLV shift of θe > 20° is introduced by atmospheric events during capture. Uncertainty in the results of 1-hour outdoor PS would be about 5 to 7 times that of full-day (6-hour) outdoor PS.

2.5 Mean light vector shifts in real sky probes

While the analysis above suggests that outdoor PS may be possible with a capture interval of only about 1 to 3 hours, it does not answer whether an adequate amount of MLV shift (elevation away from the solar plane) can actually be observed within a single partly cloudy day. In the following, we analyze the shifting (coplanarity) of real MLVs obtained from the database of real environment maps (sky probes) presented in sec. 2.1.

First, it is important to note that surface patches of different orientations (normals) are exposed to different hemispheres of illumination, with light arriving from above (sky) and below (ground). This fact is illustrated in fig. 2.7 for three different normal vectors (rows) and two different days (columns). Each globe represents the coordinate system for the environment maps captured in a day. For each normal-day combination, the time trajectory of computed MLV directions (dots, one per timestamp) and intensities (colors) is shown on the globe. Brighter MLVs lie close to the solar arc, while darker MLVs may shift away from it. Note that we present normals that are mainly southward, as they receive the most direct sunlight throughout the day in the Northern hemisphere. Surfaces with normals pointing north, for example, would be in shadow throughout the day at latitudes higher than the Tropic of Cancer around the winter solstice. As such, for the remainder of this chapter, we consider a camera that points toward the north.
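For reference, the MLVs visualized in fig. 2.7 can be computed with a few lines of code. The sketch below assumes the MLV of sec. 2.2 is the radiance-weighted integral of light directions over the hemisphere visible to n, and that the sky probe is stored as an equirectangular image; the function name and layout conventions are illustrative.

```python
import numpy as np

def mean_light_vector(envmap, n):
    """Sketch: MLV of unit normal n from an equirectangular environment
    map (rows span zenith angles 0..pi, columns span azimuths 0..2pi);
    directions below the patch's horizon contribute nothing."""
    h, w = envmap.shape[:2]
    theta = (np.arange(h) + 0.5) / h * np.pi    # zenith angle per row
    phi = (np.arange(w) + 0.5) / w * 2 * np.pi  # azimuth per column
    t, p = np.meshgrid(theta, phi, indexing="ij")
    dirs = np.stack([np.sin(t) * np.cos(p),     # unit directions omega
                     np.sin(t) * np.sin(p),
                     np.cos(t)], axis=-1)
    solid = np.sin(t) * (np.pi / h) * (2 * np.pi / w)  # solid angles
    radiance = envmap.mean(-1) if envmap.ndim == 3 else envmap
    weights = radiance * solid * (dirs @ n > 0)        # clip hemisphere
    return (dirs * weights[..., None]).sum(axis=(0, 1))
```

Plotting the directions and norms of these vectors over a day's worth of probes yields trajectories like the dots in fig. 2.7.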

To more closely match the scenario considered above, we scale these real MLVs so that the brightest one over all days (i.e., for the clearest sky) has unit length. From fig. 2.7, also note that some MLVs are shifted very far from the solar arc but, as indicated by the darker colors, their intensity is dimmed considerably by cloud coverage; little improvement in conditioning is obtained from these MLVs.

Most importantly, fig. 2.7 shows that the amount of out-of-plane MLV shift (elevation) relative to the solar arc also depends on the orientation n of the surface patch.


Figure 2.7 – Globes representing the coordinate system of sky probes for two days of mixed clouds: (a) 06-NOV-13 and (b) 11-OCT-14. Each normal (blue arrow) defines a shaded hemisphere in the environment map that does not contribute light to the computed MLVs (dots). All MLVs in two particular partly cloudy days (columns) were computed from real environment maps [50] for 3 example normal vectors (rows). Relative MLV intensities are shown in the color bar on the left.ᵃ

a. See also video in http://jflalonde.ca/projects/xHourPS/

This suggests that outdoor PS performance also varies with the normal of each patch. Indeed, the noise gain (λmax) values in fig. 2.8 show that patches with nearly horizontal normals (orthogonal to the zenith direction) are associated with sets of MLVs that are closer to being coplanar throughout the day. As expected, patches oriented towards the bottom also present worse conditioning since they receive less light.

Although these MLVs were computed from environment maps captured in the Northern hemisphere (Pittsburgh, USA, and Quebec City, Canada [50]), similar conclusions can be drawn for the Southern hemisphere. Finally, note that this section has considered MLV shifts in whole-day datasets. Next, we look at subsets of MLVs from time intervals of varying lengths and analyze some of the atmospheric events associated with improved conditioning.


Figure 2.8 – Noise gain for each normal direction n visible to the camera, for (a) 06-NOV-13 and (b) 11-OCT-14; the colors indicate the shifting (coplanarity) of the associated MLVs. The camera is assumed to lie to the south of this hypothetical target object. For both days, normals that are nearly horizontal are associated with more coplanar MLVs (smaller shifts, higher gains). These normals define a zero-crossing region between positive and negative out-of-plane shifts (mid row in fig. 2.7), where occlusion of the sun results in shifts that are predominantly along the solar arc.ᵃ

a. See also video in http://jflalonde.ca/projects/xHourPS/

2.5.1 Analyzing time intervals

In this section, we show how the conditioning of outdoor PS evolves over time. Analyzing the patterns in its evolution will allow us to isolate important “events”—points at which uncertainty suddenly drops—and investigate whether such events occur in close succession.

The main results are given in fig. 2.9, which plots the gain factor λmax for all possible time intervals in four different days. Since λmax varies with n, we plot the median gain over 321 normal vectors visible to the camera (obtained by subdividing an icosahedron three times) for each time interval.
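Concretely, this score can be computed with a helper like the hypothetical one below, assuming an array mlvs[t, k, :] holding the MLV of icosphere normal k at timestamp t (computed as in the previous sketch).

```python
import numpy as np

def median_max_gain(mlvs, start, end):
    """Median over normals of lambda_max for the interval [start, end);
    mlvs has shape (timestamps, normals, 3), and the interval needs at
    least three timestamps for L^T L to be invertible."""
    gains = []
    for k in range(mlvs.shape[1]):
        L = mlvs[start:end, k]  # this interval's illumination matrix
        lam = np.sqrt(np.diag(np.linalg.inv(L.T @ L)))
        gains.append(lam.max())
    return np.median(gains)
```

Evaluating this helper for every (start, duration) pair produces the triangular maps of fig. 2.9.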

The first row of fig. 2.9(a,b) illustrates the case of two days identified in sec. 2.4.1 as yielding poor outdoor PS reconstructions. As seen in the plots, low noise gains are never reached, irrespective of the start time and duration of the capture interval. We note that the (nearly) overcast sky of fig. 2.9(b) exhibits better behavior than the completely clear sky of fig. 2.9(a). This is because that day is not completely overcast, and the sun sometimes becomes visible (see the sun log-intensity plot); MLVs are thus shifted away from their main axes, improving conditioning, if only slightly.

More interesting scenarios arise on days exhibiting a better mix of sun and moving clouds, such as the two examples in fig. 2.9(c,d). The two black vertical lines in fig. 2.9(c) identify capture intervals starting at two different times. Following the line labeled “start time 1” (beginning at 11:00), we notice that uncertainty remains high for approximately two hours, then suddenly drops at around 13:00. This time instant is followed by sudden changes in sun intensity (due to moving clouds), after which


uncertainty continues to decrease, albeit at a much slower pace, over the rest of the day. The second time interval (identified as “start time 2”) starts at 14:00, so it does not benefit from that period of sun intensity changes. The maximum gain at the end of the interval is therefore higher.

Of course, this could be due to a simple fact: the first interval is longer than the second one. However, fig. 2.9(d) shows that longer intervals do not always result in lower uncertainty. This time, two 2-hour intervals are considered. The time interval labeled as “start time 1” stops right before the 14:00 mark, and only sees clear skies; as expected, the uncertainty is very high. “Start time 2”, beginning at 13:30, can fully exploit the MLV shifts caused by moving clouds to dramatically decrease PS uncertainty, even while the interval length is kept constant.

Figure 2.9 – Fine-grained analysis of the expected uncertainty of outdoor PS as a function of time over four selected days in the dataset: (a) clear day (03-OCT-14); (b) nearly overcast day (19-NOV-13); (c) clear sky and mixed clouds (24-AUG-13); (d) mixed clouds (06-NOV-13). Colored plots show the maximum gain λmax as a function of start time (diagonal along the plot) and duration of the interval. The black lines identify particular time intervals discussed in the text. The blue curve to the left of each colored plot represents the log sun intensity over the course of that day. Photographs of the sky for each day are also shown on the left.

2.5.2 Overall performance of x-hour PS

We noted in fig. 2.9(d) that sufficient conditions for low uncertainty could be met in as little as 2 hours. In this section, we evaluate how often one can achieve low uncertainty in short time intervals.
