
4.5 Sampling

4.5.1 Three-step Gibbs sampler

Each iteration of the Gibbs sampler comprises three steps: (i) drawing a sample of $f$ from $p(f \mid z, t, B, F, \theta)$; (ii) drawing a pair $(z_i, t_i)$ for each galaxy $i$ from $p(z_i, t_i \mid f, B, F_i, \theta_i)$ using the newly drawn $f$; and (iii) drawing a sample of the biasing functions $B$ for each redshift bin given the $z_i$ assignments from step (ii). The conditional distributions can be derived from the joint distribution in Equation (4.7); here we detail the expressions used in each of the three steps. The first two steps of the sampler are as in SB18, so we skip their full derivation for brevity (see SB18 for more details). The third step is new and is considered in more detail.

(i) The conditional posterior on $f$ depends on the counts of sources at each $z$ and $t$ in the last iteration, $N = \{N_{zt}\}$, where $N_{zt}$ is the number of sources assigned to redshift $z$ and phenotype $t$. It also depends on the prior information on $f$, $p(f)$:

$$p(f \mid z, t, B, F, \theta) \propto p(f) \prod_{z,t} f_{zt}^{N_{zt}}. \tag{4.17}$$

The prior conditions that $\sum_{z,t} f_{zt} = 1$ and $0 \le f_{zt} \le 1$ allow us to write the conditional posterior on $f$ as a Dirichlet distribution. Following the derivation in SB18, if $M = \{M_{zt}\}$ are the counts of the prior sample found at each $(z, t)$ pair, then the prior distribution of $f$ follows a Dirichlet distribution with parameters $M$, and hence the conditional posterior follows a Dirichlet distribution on the data counts from the last iteration plus the prior counts, $N + M$.

(ii) For each galaxy, the posterior for the $(z_i, t_i)$ pair conditioned on $f$ and $B$ is

$$p(z_i, t_i \mid f, B, F_i, \theta_i) \propto L_{i t_i}\, f_{z_i t_i}\, B_{z_i}(\hat{\delta}_{i z_i}), \tag{4.20}$$

where, apart from using the $f$ obtained in step (i) of the sampler, we make use of the measurement likelihood and the clustering terms discussed above. The sampling in step (ii) produces a $(z, t)$ pair for each galaxy; together these constitute the next realization of $N = \{N_{zt}\}$, to be used in step (i) of the next iteration of the Gibbs sampler.

(iii) After we have $z$ assignments for all galaxies in the sample from step (ii), we can separate galaxies into redshift bins according to those assignments. Then, for each redshift bin, the posterior on the biasing function of that bin conditioned on all other variables is

$$p(B_z \mid f, z, t, F, \theta) = p(B_z \mid z, \theta) \propto \prod_{g \,:\, z_g = z} B_z(\hat{\delta}_{g z_g}). \tag{4.21}$$

According to this expression, different biasing function parameters, as described in Equation (4.13), yield different posterior probabilities at each Gibbs step. To explore the parameter space of the biasing functions and draw samples from this posterior, we run a Markov chain Monte Carlo (MCMC) sampler for each redshift bin at each step of the Gibbs sampler. In each case we run 1000 MCMC iterations, and the proposal distribution is given by the steps of an MCMC chain run on the prior sample described above in this section. Drawing proposals from the prior chain steps with equal probability effectively incorporates the prior information and enables informed sampling without the need to blindly tune the proposal distribution or the parameter limits of the MCMC chains.
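The three steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used for the results: the linear form `biasing(delta, b) = 1 + b * delta` is a hypothetical stand-in for Equation (4.13), each bin has a single scalar parameter `b`, and all function and array names are ours. Step (iii) is written as an independence Metropolis-Hastings sampler whose proposals are drawn uniformly from the prior chain, in which case the acceptance ratio reduces to the ratio of the products in Equation (4.21).

```python
import numpy as np

def biasing(delta, b, floor=1e-12):
    # Hypothetical biasing function (placeholder for Eq. 4.13), kept positive.
    return np.maximum(1.0 + b * delta, floor)

def draw_f(N, M, rng):
    # Step (i): f ~ Dirichlet(N + M) over all (z, t) cells; N + M must be > 0.
    return rng.dirichlet((N + M).ravel()).reshape(N.shape)

def draw_zt(f, like, b_val, rng):
    # Step (ii): categorical draw per galaxy with weights L_it * f_zt * B_z(delta_iz).
    # like: (n_gal, n_t), f: (n_z, n_t), b_val: (n_gal, n_z).
    n_gal, n_t = like.shape
    w = (b_val[:, :, None] * f[None, :, :] * like[:, None, :]).reshape(n_gal, -1)
    cdf = np.cumsum(w, axis=1)
    u = rng.random(n_gal) * cdf[:, -1]          # unnormalized inverse-CDF sampling
    idx = np.minimum((cdf <= u[:, None]).sum(axis=1), w.shape[1] - 1)
    return np.divmod(idx, n_t)                  # flat index -> (z, t)

def draw_b(b_curr, delta_bin, prior_chain, rng, n_steps=1000):
    # Step (iii): independence Metropolis-Hastings with prior-chain proposals.
    def log_like(b):
        return np.log(biasing(delta_bin, b)).sum()
    ll_curr = log_like(b_curr)
    proposals = prior_chain[rng.integers(len(prior_chain), size=n_steps)]
    log_u = np.log(1.0 - rng.random(n_steps))   # log of uniforms in (0, 1]
    for b_prop, lu in zip(proposals, log_u):
        ll_prop = log_like(b_prop)
        if lu < ll_prop - ll_curr:              # accept with prob min(1, L'/L)
            b_curr, ll_curr = b_prop, ll_prop
    return b_curr

def gibbs(like, delta, M, prior_chains, n_iter, rng, n_mcmc=1000):
    # like: (n_gal, n_t) likelihoods; delta: (n_gal, n_z) density estimates;
    # M: (n_z, n_t) prior counts; prior_chains: (n_z, n_chain) prior samples of b.
    n_gal, n_t = like.shape
    n_z = delta.shape[1]
    z = rng.integers(n_z, size=n_gal)           # arbitrary initial assignments
    t = rng.integers(n_t, size=n_gal)
    b = prior_chains[:, 0].copy()
    f_samples = np.empty((n_iter, n_z, n_t))
    for it in range(n_iter):
        N = np.zeros((n_z, n_t))
        np.add.at(N, (z, t), 1.0)               # counts N_zt from last assignments
        f = draw_f(N, M, rng)
        z, t = draw_zt(f, like, biasing(delta, b[None, :]), rng)
        for k in range(n_z):
            b[k] = draw_b(b[k], delta[z == k, k], prior_chains[k], rng, n_mcmc)
        f_samples[it] = f
    return f_samples
```

Because the proposals do not depend on the current state, no step size or parameter limits need tuning, which is the practical advantage noted in the text.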

4.6 Results

We now present the results for the third tomographic bin, which contains $\sim 3.3 \times 10^{6}$ objects. We define a calibration sample, for which redshift and type are known, from one healpy pixel with nside $= 2^5$, which has an area of $\sim 3.5\,\mathrm{deg}^2$. We apply the same tomographic bin selection to the calibration sample, which leaves a total of 10758 objects. These objects are used to estimate the prior probability $p(z, c)$ and to obtain the sampled prior on the mapping function parameters $B(\{b_i\})$ (see Section 4.5 for details about the sampling).
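As a quick check of the quoted patch size, the sketch below evaluates the standard HEALPix equal-area relation, in which the sphere is divided into $12\,\mathrm{nside}^2$ pixels. It assumes the resolution parameter is nside $= 2^5 = 32$; the function name is ours and is not part of healpy.

```python
import math

def healpix_pixel_area_deg2(nside):
    # Full sky is 4*pi steradians; convert to square degrees and
    # divide by the 12 * nside**2 equal-area HEALPix pixels.
    full_sky_deg2 = 4.0 * math.pi * (180.0 / math.pi) ** 2
    return full_sky_deg2 / (12 * nside ** 2)

area = healpix_pixel_area_deg2(2 ** 5)
print(f"{area:.2f} deg^2")  # about 3.36 deg^2
```

Under this assumption the pixel area is about 3.4 deg$^2$, close to the quoted $\sim 3.5$ deg$^2$.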

The HBM method yields samples of the individual redshift and type posterior for each galaxy, of the redshift and type posterior of the population, and of the posterior of the mapping function parameters. In this work we focus on the redshift population posterior, marginalizing over all other parameters, since this is what is usually used in photometric cosmological analyses. Current and future weak lensing analyses are very sensitive to small biases in the mean redshift of the distribution, which can become the leading systematic uncertainty. Therefore, we define as a metric the difference between the mean of each sample $j$ of our redshift posterior and the true mean from all the target galaxies,

$$\Delta z_j = \langle z_{\mathrm{est},j} \rangle - \langle z_{\mathrm{true}} \rangle. \tag{4.22}$$

Figure 4.6: Posterior redshift probability distribution, marginalized over type and, when clustering is included, over the mapping function parameters. The prior is obtained from a small calibration patch with 10758 objects over an area of 3.5 deg$^2$. The distributions are obtained from an HBM with photometry only ($F$), an HBM with photometry and clustering ($F+\delta$), and samples drawn from the prior. The panels show violin plots for each distribution compared to the true redshift distribution, the posterior distribution of the redshift bias $\Delta z$ values, and the distribution of the Kullback–Leibler divergence ($D_{\mathrm{KL}}$) between each sample and the true redshift distribution. The HBM ($F$) removes most of the redshift bias, since in this case the bias is tightly related to a bias in the type density $p(t)$ due to the sample variance of the calibration patch. The addition of clustering sharpens the distribution and improves the overall shape, reducing the $D_{\mathrm{KL}}$ divergence.

Since we draw samples of the full redshift posterior $f_z$, another useful metric that is sensitive to the distribution shape is the Kullback–Leibler divergence ($D_{\mathrm{KL}}$) between each sample and the true redshift distribution,

$$D_{\mathrm{KL}}\left(f_{z,j}^{\mathrm{est}} \,\|\, f_{z}^{\mathrm{true}}\right) = \sum_z f_{z,j}^{\mathrm{est}} \log\left(\frac{f_{z,j}^{\mathrm{est}}}{f_{z}^{\mathrm{true}}}\right). \tag{4.23}$$

This is a measure of the relative entropy between the true distribution and the recovered distribution, and can be used to see how much information the photometry and density estimates add with respect to the prior knowledge. We will show results for samples drawn from the prior, from an HBM that only includes photometry information, and from an HBM that includes both photometry and clustering information, marginalizing over the mapping relation parameters. We denote the HBM with photometry as $F$ (feature) and the HBM with photometry and clustering as $F+\delta$.
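Both metrics are simple to evaluate once posterior samples of the redshift distribution are available on a common grid. The following is a minimal sketch, assuming each sample is stored as a row of an array `fz_samples` over the grid `z_grid`, that the logarithm is natural (the base is not specified in the text), and that the true distribution is nonzero wherever a sample is nonzero; the function and array names are ours.

```python
import numpy as np

def mean_redshift_bias(fz_samples, fz_true, z_grid):
    # Delta z_j = <z_est,j> - <z_true> for each posterior sample j (Eq. 4.22).
    mean_est = fz_samples @ z_grid / fz_samples.sum(axis=1)
    mean_true = fz_true @ z_grid / fz_true.sum()
    return mean_est - mean_true

def kl_divergence(fz_samples, fz_true):
    # D_KL(f_est,j || f_true) for each posterior sample j (Eq. 4.23), in nats.
    p = fz_samples / fz_samples.sum(axis=1, keepdims=True)
    q = fz_true / fz_true.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        # Bins with p = 0 contribute zero to the sum by convention.
        terms = np.where(p > 0, p * np.log(p / q), 0.0)
    return terms.sum(axis=1)

# Toy example on a two-bin grid: one exact sample and one shifted sample.
z_grid = np.array([0.0, 1.0])
fz_true = np.array([0.5, 0.5])
fz_samples = np.array([[0.5, 0.5], [0.25, 0.75]])
dz = mean_redshift_bias(fz_samples, fz_true, z_grid)
dkl = kl_divergence(fz_samples, fz_true)
```

In the toy example the first sample matches the truth, so both metrics vanish, while the second has a mean shift of 0.25 and a positive divergence.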

We show one case where the $p(z, t)$ probability comes from the calibration sample, and three cases where the prior is modified and biased. For each case, we show a violin plot of the posterior redshift distribution compared to the true distribution, the distribution of $\Delta z_j$ differences, the distribution of $D_{\mathrm{KL}}$ divergences, and a plot showing $\Delta z_j$ as a function of iteration steps.