
7.4 Fully-Bayesian unfolding

7.4.2 Application of Bayesian inference

The unfolding problem illustrated in the previous section can be written down in terms of Bayes' theorem. We have measured the data distribution D of the observable of interest. Let us assume that we have an estimate of the total background contribution B, composed of N_b background processes, and that the knowledge of the detector and reconstruction response is encoded in a response matrix M. We would like to know the probability of a true distribution T given the observed data D. In terms of Bayes' theorem:

P(T|D,M) ∝ L(D|S(T,M),B) π(T),  (7.3)

where P(T|D,M) is the posterior probability of the true distribution T, L(D|S(T,M),B) is the likelihood of the data D given S(T,M) and B, and π(T) is the prior probability density for the true

(5) A bold symbol denotes a one-dimensional vector, such as a one-dimensional binned distribution.

7. Charge asymmetry measurement in tt̄ single-lepton channel

spectrum T. The spectra are represented by binned histograms with N_r reconstructed bins and N_t true bins, respectively, i.e. D, S ∈ R^{N_r}, T ∈ R^{N_t} and M ∈ R^{N_t} × R^{N_r}. By sampling the prior probability distribution π(T) and propagating the values of the sampled T through the likelihood, it is possible to obtain the full posterior distribution of the true spectrum T. Note that no matrix inversion is actually performed in this approach. In the following sections, we explain how π(T) is chosen, what the exact definition of the likelihood L(D|S(T,M),B) is, and how the sampling of the priors is performed.
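As a toy illustration of this picture, the posterior of Eq. 7.3 can be built numerically for a single bin by scanning candidate true yields; a grid scan stands in here for the actual sampling, and all numbers (data, background, efficiency, bounds) are invented for illustration:

```python
import numpy as np
from math import lgamma

# Minimal 1-bin sketch of Eq. (7.3): posterior ~ Poisson likelihood x bounded
# uniform prior. All numbers are invented toy values, not analysis inputs.
d = 25                     # observed data yield D
b = 5.0                    # expected background yield B
eff = 0.8                  # 1x1 "response matrix" M: reconstruction efficiency

t_grid = np.linspace(0.0, 60.0, 601)   # candidate true yields T within bounds
prior = np.ones_like(t_grid)           # flat prior inside the bounds

mu = eff * t_grid + b                  # expected reco-level yield S(T,M) + B
log_like = d * np.log(mu) - mu - lgamma(d + 1)   # log Poisson(d | mu)
post = np.exp(log_like) * prior
post /= post.sum()                     # normalise the posterior over the grid

t_mean = float((t_grid * post).sum())  # posterior mean of the true yield T
```

No inversion of the response appears anywhere: the posterior mean lands near (d − b)/eff purely because the likelihood prefers true yields whose folded prediction matches the data.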

Likelihood

The likelihood in Eq. 7.3 is constructed under the assumption that the data are distributed according to a Poisson probability:

L(D|S(T,M),B) = ∏_{i=1}^{N_r} Poisson(d_i | s_i + b_i),  (7.4)

b_i = ∑_{k=1}^{N_b} b_{ki},  (7.5)

where s_i is the expected signal yield, b_{ki} the expected yield of the k-th background, and d_i the observed data yield in the i-th bin. The expected detector-level signal distribution S is given by the (unknown) true distribution T and the response matrix M:

s_i(T,M) = ∑_{j=1}^{N_t} M_{ij} t_j f_j^{-1},  (7.6)

where t_j is the j-th bin yield of the true distribution, and f_j ≤ 1 is an out-of-fiducial correction, accounting for events which are reconstructed but do not pass the selection imposed at the true level (fiducial cuts). In this measurement, the unfolding is performed to the full phase space; in other words, there are no fiducial cuts and therefore f_j = 1.
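A minimal numerical sketch of Eqs. 7.4–7.6 folds a candidate true spectrum through the response matrix and evaluates the Poisson log-likelihood against the data; the 2×2 response matrix and all yields below are invented toy numbers:

```python
import numpy as np
from math import lgamma

# Sketch of Eqs. (7.4)-(7.6): fold T through M, compare to data D.
M = np.array([[0.8, 0.1],
              [0.1, 0.7]])          # M[i, j]: true bin j -> reco bin i (toy)
b = np.array([10.0, 12.0])          # total background B per reco bin (toy)
d = np.array([95.0, 70.0])          # observed data D per reco bin (toy)
f = np.array([1.0, 1.0])            # out-of-fiducial corrections f_j (= 1 here)

def log_likelihood(t):
    """log L(D | S(T,M), B) for a candidate true spectrum t."""
    s = M @ (t / f)                 # s_i = sum_j M_ij t_j / f_j   (Eq. 7.6)
    mu = s + b                      # expected yield per reco bin  (Eq. 7.4)
    return float(np.sum(d * np.log(mu) - mu
                        - np.array([lgamma(di + 1) for di in d])))
```

A true spectrum whose folded prediction matches the data receives a higher log-likelihood than one that does not, which is all the sampling needs in order to explore the T parameter space.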

Prior probability

The prior probability density π(T) should be based on what we know about T before the measurement is performed. The simplest possible choice is a flat, so-called "uninformative" prior, which is a bounded uniform distribution given by minimum and maximum values t_i^min and t_i^max in each bin of the true distribution:

π(T) ∝ 1 if t_i ∈ [t_i^min, t_i^max] for all i ∈ [1, N_t], and 0 otherwise.  (7.7)

The "uninformative" prior thus expresses no preference for any choice of T as long as it is within the specified bounds. It is possible to extend this choice of prior with additional information, effectively


introducing further regularisation(6), defined by a function S(T):

π(T) ∝ e^{α S(T)} if t_i ∈ [t_i^min, t_i^max] for all i ∈ [1, N_t], and 0 otherwise,  (7.8)

where α is a free parameter controlling the strength of the regularisation. This enables the use of additional a priori assumptions about T to constrain the parameter space, potentially decreasing the variance of the unfolding, at the cost of an additional bias.

In this analysis, the uninformative prior from Eq. 7.7 is used, with t_i^min = 0 and t_i^max = 2 t_i^MC, where t_i^MC is the predicted true yield in the i-th bin of the Δ|y| distribution, based on the SM tt̄ MC prediction used in this analysis. In other words, the bounds of the uniform prior are given by a ±100 % interval around the MC-based prediction. It is found that no additional regularisation function is necessary to eliminate spurious fluctuations in the unfolded true distribution. The choice of the uninformative prior is also motivated by the fact that there is no previous unfolded measurement of the Δ|y| observable at 13 TeV; the prior knowledge of the observable is therefore limited to MC predictions, which have been compared to unfolded data distributions of other observables, but not of Δ|y|. Other choices of the prior interval, both smaller and larger, were also tested and found to have no impact on the unfolded result.
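The bounded uniform prior with the bounds used here, t_i ∈ [0, 2 t_i^MC], can be sketched as a log-prior; the MC prediction below is an invented placeholder, not the actual SM tt̄ prediction:

```python
import numpy as np

# Sketch of the prior of Eq. (7.7) with bounds t_i in [0, 2 * t_i^MC].
t_mc = np.array([120.0, 90.0, 90.0, 120.0])   # toy stand-in for the SM ttbar
t_min = np.zeros_like(t_mc)                   # prediction per Delta|y| bin
t_max = 2.0 * t_mc

def log_prior(t):
    """log pi(T): 0 inside the box, -inf outside (up to a constant)."""
    inside = np.all((t >= t_min) & (t <= t_max))
    return 0.0 if inside else -np.inf
```

In a sampler, adding this log-prior to the Poisson log-likelihood simply vetoes any proposed true spectrum outside the ±100 % box while leaving all spectra inside it equally preferred.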

Inclusion of nuisance parameters in FBU

The likelihood formalism employed in FBU allows for a natural inclusion of systematic uncertainties as nuisance parameters (NPs), which encode the imperfect knowledge of various parameters in the model of the analysis. The systematic uncertainties affect the detector-level distributions of both the signal and background contributions; in other words, their impact is quantified by an alternative distribution of the detector-level observable with a generally different shape and/or yield. In FBU, the likelihood L(D|S(T,M),B) is extended by the NPs into a marginal likelihood defined as:

L(D|S(T,M),B) = ∫ L(D|S(T,M;θ),B(θ)) π(θ) dθ,  (7.9)

where θ denotes all the NPs and π(θ) their priors. The priors are probability distribution functions of the model parameters that encompass the information from auxiliary measurements, in which the unknown parameters of the model are determined with a limited precision that defines the systematic uncertainty. Many different NPs can be implemented in the marginal likelihood. The individual systematic uncertainties affecting the measurement are discussed in detail in Sec. 7.6. Here we discuss the types of NPs that are considered.

Background normalisations θ^b, which only affect the respective background prediction. The prior for these uncertainties is a Gaussian with µ = 0 and σ = 1, truncated to prevent negative yields. These uncertainties introduce only a shift in the total yield of the background contribution.

Uncertainties due to object reconstruction, identification and calibration, θ^r, impacting both the signal S(T,M;θ^r) and the background B(θ^r, θ^b). These uncertainties are usually assumed to be Gaussian, providing ±1σ variations of the detector-level observable of interest. The prior is a Gaussian with

(6) It should be noted that the bounded uniform prior contains in itself a regularisation, since by definition it puts constraints on the possible values of the true spectrum.


µ = 0 and σ = 1. Theoretical uncertainties on the MC predictions are also commonly introduced in this manner.

The bins of the detector-level signal distribution S(T,M;θ^r) with included NPs are defined as follows:

s_i(T,M;θ^r) = s_i(T,M;0) (1 + ∑_k θ^r_k Δs_{ki}),  (7.10)

where s_i(T,M;0) is defined in Eq. 7.6 and Δs_{ki} is the relative systematic variation of the signal yield in the i-th bin corresponding to the k-th nuisance parameter θ^r_k. The θ^r_k Δs_{ki} terms assume a linear interpolation of the uncertainty variation between the ±1σ uncertainty bands and a linear extrapolation beyond this interval.

For each background, the respective background prediction is altered in a similar manner:

b_{ki}(θ^r, θ^b) = b_{ki}(0) (1 + θ^b_k Δb_k) (1 + ∑_j θ^r_j Δb_{k,ij}),  (7.11)

where b_{ki}(0) is the predicted yield of the k-th background in the i-th bin, Δb_k is the relative uncertainty on the background normalisation, and Δb_{k,ij} is the relative systematic variation of the k-th background yield in the i-th bin corresponding to the j-th nuisance parameter θ^r_j.
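Eqs. 7.10 and 7.11 amount to simple per-bin multiplicative corrections to the nominal templates; a sketch with invented nominal yields and invented relative variations:

```python
import numpy as np

# Sketch of Eqs. (7.10)-(7.11). All yields and Delta arrays are toy numbers;
# the Delta arrays hold the relative +-1 sigma variations per reco bin.
s0 = np.array([88.0, 66.0])            # nominal detector-level signal s_i(T,M;0)
ds = np.array([[0.05, -0.02],          # Delta s_{ki}: row = NP k, col = bin i
               [0.01,  0.03]])

def signal_with_nps(theta_r):
    # s_i(theta_r) = s_i(0) * (1 + sum_k theta_r_k * Delta s_{ki})   (Eq. 7.10)
    return s0 * (1.0 + theta_r @ ds)

b0 = np.array([10.0, 12.0])            # nominal yield of one background b_k(0)
db_norm = 0.10                         # Delta b_k: relative normalisation unc.
db_shape = np.array([[0.02, -0.01],    # Delta b_{k,ij}: row = NP j, col = bin i
                     [0.00,  0.04]])

def background_with_nps(theta_r, theta_b_k):
    # Eq. (7.11): normalisation variation times shape variations
    return b0 * (1.0 + theta_b_k * db_norm) * (1.0 + theta_r @ db_shape)
```

At θ = 0 both functions return the nominal templates; moving a single NP to +1 scales each bin by exactly its +1σ relative variation, which is the linear interpolation/extrapolation described above.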

Statistical uncertainties on the background prediction. The prime example is the uncertainty of the MC prediction due to the limited number of generated events, where the true prediction in general differs from the generated one due to statistical fluctuations. To account for this effect, an approach inspired by the proposal of Barlow and Beeston [215] is used, where each bin i of the total background prediction B receives an additional nuisance parameter γ_i, which has a flat prior and a Poisson constraint given by the statistical uncertainty of B. The full likelihood term is defined as:

L(D|S(T,M;θ^r), B(θ^r, θ^b), γ) = ∏_{i=1}^{N_r} Poisson(d_i | s_i(T,M;θ^r) + γ_i b_i(θ^r, θ^b)) Poisson(τ_i | γ_i τ_i),  (7.12)

γ_i b_i(θ^r, θ^b) = γ_i ∑_{k=1}^{N_b} b_{ki}(θ^r, θ^b_k),

with s_i(T,M;θ^r) defined in Eq. 7.10 and b_{ki}(θ^r, θ^b_k) defined in Eq. 7.11. In the constraint term, τ_i = (b_i/δb_i)², where b_i is defined in Eq. 7.5 and δb_i is the statistical uncertainty of b_i. In this configuration, the γ NPs are distributed according to a Gamma probability distribution function, and the total background yield γ_i b_i in the i-th bin is allowed to fluctuate around the nominal value b_i, at a penalty introduced by the Poisson constraint.

Finally, the nuisance parameters are not the parameters of interest; only the true distribution T is. Hence, in the Bayesian treatment, the nuisance parameters are integrated out, or marginalised. The marginal likelihood is defined as:


L(D|S(T,M),B) = ∫ L(D|S(T,M;θ^r), B(θ^r, θ^b); γ) π(θ^r) π(θ^b) π_flat(γ) dθ^r dθ^b dγ.  (7.13)

Combining Eqs. 7.3 and 7.13, the marginalised posterior probability density for the true distribution T is obtained numerically, using random sampling in the (N_t + N_NP)-dimensional parameter space and projecting the samples onto the T parameter space, where N_NP is the total number of nuisance parameters.

Similarly, marginalised posterior distributions of the individual NPs can be obtained by projecting all other parameters into the one-dimensional space of the NP of interest. An example marginalised posterior of a NP from the unfolding of the Asimov dataset(7) is shown in Fig. 7.5. Typically, the mean and the variance of the marginalised posterior of the NP are interpreted as the marginalised NP pull and constraint, and commonly, though not necessarily, the marginalised posterior distribution is Gaussian. The marginalised pull and constraint should not, however, be assumed to be equivalent to the pull and constraint in the profile-likelihood formalism: the marginalisation includes the effect of correlations with the other marginalised parameters, which is not the case in the profile-likelihood formalism, where a pull or constraint of a parameter is simply the projected value of the parameter corresponding to the maximum of the likelihood. A Bayesian estimate equivalent to the profile likelihood is the maximum a posteriori (MAP) estimate if the prior of the true distribution is uniform, which is the case in this analysis. The MAP estimate is then essentially the mode of the likelihood, obtained via a minimisation similar to the profile-likelihood formalism. Thus, for posterior distributions or results where a MAP estimate is quoted, the estimate is obtained by minimising the negative logarithm of the likelihood in Eq. 7.12.

If the observable in the likelihood is sensitive to a particular uncertainty, the corresponding nuisance parameter can be further constrained by the data. The constraint manifests itself as a narrower width of the posterior distribution compared to the prior distribution. Constraints reduce the uncertainty on the observable of interest, which is one of the main motivations for using Bayesian marginalisation. Additionally, the likelihood formalism allows the NP correlations to be taken into account in the marginalisation, leading to a further reduction of the total uncertainty. An estimate of the expected constraints can be obtained by unfolding the Asimov dataset.

Relation between A_C and Δ|y| observables in FBU

In the charge asymmetry measurement, the mean of the posterior in each bin of T corresponds to the central value in the corresponding bin of the unfolded Δ|y| distribution. However, the quantity of interest is the charge asymmetry A_C. Formally, the relation between the A_C posterior and the T posterior is defined by:

p(A_C|D) = ∫ dT δ(A_C − A_C(T)) P(T|D),  (7.14)

where A_C(T) is the charge asymmetry defined in Eq. 2.9, δ denotes the Dirac δ-distribution, and P(T|D) is the N_t-dimensional posterior distribution of Δ|y| estimated by the FBU method from the data D. In practice, the posterior defined in Eq. 7.14 is obtained by calculating the A_C value for every

(7) The Asimov dataset is the total predicted distribution, i.e. the sum of the predicted signal and background distributions of the observable of interest.


Fig. 7.5: Example of a marginalised posterior distribution of the NP corresponding to the JER uncertainty (JET_CategoryReduction_JET_JER_EffectiveNP_1). The blue histogram shows the distribution of the samples from the FBU sampling (number of samples versus NP value). The Gaussian prior (red line) and a Gaussian fit to the posterior (orange line, mean = −0.009, RMS = 0.934) are also shown, highlighting a small (≈7 %) constraint of the NP obtained in the inclusive Asimov A_C unfolding.

sample from the N_t-dimensional Δ|y| posterior distribution. The result of this per-sample calculation is a one-dimensional posterior distribution of A_C, whose mean and variance are interpreted as the central value and the uncertainty on A_C. This approach correctly propagates the correlations between the bins of the unfolded Δ|y| distribution into the uncertainty on A_C.
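This per-sample propagation can be sketched with toy posterior samples; the samples below are generated from independent Gaussians purely for illustration (a real Δ|y| posterior would carry bin-to-bin correlations), and the bin ordering and yields are invented:

```python
import numpy as np

# Sketch of Eq. (7.14): propagate N_t-dimensional Delta|y| posterior samples
# into a 1D A_C posterior. Toy convention: bins 0-1 have Delta|y| < 0,
# bins 2-3 have Delta|y| > 0. All numbers are invented.
rng = np.random.default_rng(0)
t_samples = rng.normal(loc=[100.0, 120.0, 125.0, 100.0],
                       scale=5.0, size=(10000, 4))   # toy posterior samples

def a_c(t):
    """A_C = (N(Delta|y| > 0) - N(Delta|y| < 0)) / total, per sample."""
    n_pos = t[..., 2:].sum(axis=-1)
    n_neg = t[..., :2].sum(axis=-1)
    return (n_pos - n_neg) / (n_pos + n_neg)

ac_post = a_c(t_samples)                 # one A_C value per posterior sample
ac_central = float(ac_post.mean())       # central value of A_C
ac_unc = float(ac_post.std())            # uncertainty on A_C
```

Because each A_C value is computed from a full joint sample of the bins, any bin-to-bin correlations present in the posterior samples propagate automatically into the spread of `ac_post`.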

Combination of multiple regions in the unfolding

The likelihood formalism allows for a natural combination of multiple regions with different signal and background composition by taking a product of per-region likelihoods, with the exception of the nuisance-parameter priors, which are shared among the regions. Combining multiple regions allows certain systematic uncertainties to be constrained in regions sensitive to the source of the uncertainty. It is also useful if the regions differ in their sensitivity to the observable of interest, potentially reducing sources of dilution. Multiple regions may also enhance the information on the correlations of the systematic uncertainties, which in general impact multiple regions simultaneously. The marginalised likelihood with multiple channels is defined as:

L(D_1 … D_ν | S_1(M_1,T) … S_ν(M_ν,T), B_1 … B_ν) = ∫ dθ ∏_{i=1}^{ν} L(D_i | S_i(M_i,T;θ), B_i(θ)) π(θ).  (7.15)

In the unfolding, a single true distribution T is unfolded; therefore, for each region i, a separate response matrix M_i describes the migration and acceptance of events from that region into the common T distribution.

In this analysis, a combination of four regions is performed, split into 1b-exclusive and 2b-inclusive b-jet multiplicity and into resolved and boosted topologies.
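At fixed nuisance parameters, the combination of Eq. 7.15 reduces to a sum of per-region Poisson log-likelihoods sharing the same true spectrum T, each with its own response matrix; a sketch with invented response matrices and yields (NPs omitted for brevity):

```python
import numpy as np
from math import lgamma

# Sketch of the product in Eq. (7.15): each region folds the SAME true
# spectrum T through its own response matrix M_i. Toy numbers throughout.
regions = [
    {"M": np.array([[0.8, 0.1], [0.1, 0.7]]),     # e.g. a "resolved" region
     "b": np.array([10.0, 12.0]), "d": np.array([95.0, 70.0])},
    {"M": np.array([[0.6, 0.2], [0.2, 0.5]]),     # e.g. a "boosted" region
     "b": np.array([8.0, 9.0]), "d": np.array([75.0, 65.0])},
]

def log_pois(d, mu):
    """Sum over bins of log Poisson(d_i | mu_i)."""
    return float(np.sum(d * np.log(mu) - mu
                        - np.array([lgamma(x + 1) for x in d])))

def combined_log_likelihood(t):
    # Sum of per-region log-likelihoods = log of the product in Eq. (7.15)
    return sum(log_pois(r["d"], r["M"] @ t + r["b"]) for r in regions)
```

Because every region constrains the same T (and, in the full model, the same shared NPs), a spectrum must simultaneously describe all regions to score well, which is what drives the additional constraining power of the combination.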