
(1)

Approximating likelihood ratios with calibrated classifiers

Gilles Louppe, ATLAS ML workshop

(2)

Joint work with

Kyle Cranmer

New York University

Juan Pavez

Federico Santa María University

(3)

Likelihood-free setup

• Complex simulator p parameterized by θ;
• Samples x ∼ p can be generated on demand;
• ... but the likelihood p(x|θ) cannot be evaluated!
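To make the setup concrete, here is a minimal Python sketch of a hypothetical black-box simulator: sampling is cheap, but the density of x is intractable because of the latent process. The function simulate and its internals are illustrative assumptions, not from the talk.

import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n_samples, rng=rng):
    """Hypothetical black-box simulator: drawing x ~ p(x|theta) is easy,
    but the latent process makes p(x|theta) intractable to evaluate."""
    z = rng.normal(loc=theta, scale=1.0, size=n_samples)          # latent variable
    x = np.exp(z) + rng.exponential(scale=0.5, size=n_samples)    # non-invertible transform + noise
    return x

x_obs = simulate(theta=1.0, n_samples=1000)   # samples on demand; no density available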

(4)

Simple hypothesis testing

• Assume some observed data D = {x1, . . . , xn};

• Test a null θ = θ0 against an alternative θ = θ1;

• The Neyman-Pearson lemma states that the most powerful test statistic is

  λ(D; θ0, θ1) = ∏_{x∈D} pX(x|θ0) / pX(x|θ1).
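As a sanity check of the statistic itself (not of the likelihood-free method), here is a toy computation of λ in a 1D case where both densities are known in closed form; the Gaussian choices are illustrative assumptions.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Toy case where both densities are tractable: null N(0, 1) vs. alternative N(1, 1).
D = rng.normal(loc=0.0, scale=1.0, size=100)   # observed data, drawn here under the null

# Neyman-Pearson statistic, computed in log-space for numerical stability.
log_lambda = np.sum(norm.logpdf(D, loc=0.0) - norm.logpdf(D, loc=1.0))
lam = np.exp(log_lambda)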

(5)

Straight approximation

1. Approximate pX(x|θ0) and pX(x|θ1) individually, using density estimation algorithms;

2. Evaluate their ratio

   r(x; θ0, θ1) = pX(x|θ0) / pX(x|θ1).

This works fine for low-dimensional data, but because of the curse of dimensionality it is in general a difficult problem. Moreover, it is not even necessary!

When solving a problem of interest, do not solve a more general problem as an intermediate step. – Vladimir Vapnik

(6)

Approximating likelihood ratios with classifiers

• Theorem: the likelihood ratio is invariant under the change of variable U = s(X), provided s(x) is monotonic with r(x):

  r(x) = pX(x|θ0) / pX(x|θ1) = pU(s(x)|θ0) / pU(s(x)|θ1)

• Well, a classifier trained to distinguish x ∼ p0 from x ∼ p1 approximates

  s∗(x) = pX(x|θ1) / (pX(x|θ0) + pX(x|θ1)),

  which is monotonic with r(x).

• Estimating p(s(x)|θ) is now easy, since the change of variable s(x) projects x into a 1D space where only the informative content of the ratio is preserved.
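A minimal sketch of this idea, using stand-in Gaussian samplers for p(x|θ0) and p(x|θ1) (an illustrative scikit-learn version, not the authors' carl implementation): train a classifier, then calibrate the 1D score distributions with histograms and take their ratio.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 50_000

# Stand-in samplers for p(x|theta0) and p(x|theta1); any black-box simulator would do.
x0 = rng.normal(0.0, 1.0, size=(n, 1))   # x ~ p(x|theta0)
x1 = rng.normal(1.0, 1.0, size=(n, 1))   # x ~ p(x|theta1)

X = np.vstack([x0, x1])
y = np.concatenate([np.zeros(n), np.ones(n)])

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=200).fit(X, y)

def s(x):
    return clf.predict_proba(x)[:, 1]    # approximates p1 / (p0 + p1)

# Calibrate p(s(x)|theta0) and p(s(x)|theta1) with 1D histograms of the score.
bins = np.linspace(0.0, 1.0, 51)
h0, _ = np.histogram(s(x0), bins=bins, density=True)
h1, _ = np.histogram(s(x1), bins=bins, density=True)

def r_hat(x, eps=1e-6):
    """Approximate likelihood ratio p(x|theta0) / p(x|theta1)."""
    idx = np.clip(np.digitize(s(x), bins) - 1, 0, len(h0) - 1)
    return (h0[idx] + eps) / (h1[idx] + eps)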

(7)

Inference and composite hypothesis testing

Approximated likelihood ratios can be used for inference, since

θ̂ = arg max_θ p(D|θ)
  = arg max_θ ∏_{x∈D} p(x|θ) / p(x|θ1)
  = arg max_θ ∏_{x∈D} p(s(x; θ, θ1)|θ) / p(s(x; θ, θ1)|θ1),   (1)

where θ1 is fixed and s(x; θ, θ1) is a family of classifiers parameterized by (θ, θ1).

Accordingly, generalized (or profile) likelihood ratio tests can be evaluated in the same way.

(8)

Parameterized learning

For inference, we need to build a family s(x; θ, θ1) of classifiers.

• One could build a classifier s independently for all θ, θ1. But

this is computationally expensive and would not guarantee a smooth evolution of s(x; θ, θ1) as θ varies.

• Solution: build a single parameterized classifier instead, where the parameters are additional input features (Cranmer et al., 2015; Baldi et al., 2016):

  T := {}
  while size(T) < N do
      Draw θ0 ∼ πΘ0; draw x ∼ p(x|θ0); T := T ∪ {((x, θ0, θ1), y = 0)}
      Draw θ1 ∼ πΘ1; draw x ∼ p(x|θ1); T := T ∪ {((x, θ0, θ1), y = 1)}
  end while
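A minimal Python sketch of this construction, under assumed helpers simulate(theta), draw_theta0() and draw_theta1() (names are illustrative): the classifier receives (x, θ0, θ1) as input, so a single model covers the whole parameter family.

import numpy as np
from sklearn.neural_network import MLPClassifier

def build_training_set(simulate, draw_theta0, draw_theta1, n_pairs):
    """Build T = {((x, theta0, theta1), y)} with y = 0 for x ~ p(x|theta0), y = 1 for x ~ p(x|theta1)."""
    X, y = [], []
    for _ in range(n_pairs):
        theta0, theta1 = draw_theta0(), draw_theta1()
        X.append(np.concatenate([simulate(theta0), theta0, theta1])); y.append(0)
        X.append(np.concatenate([simulate(theta1), theta0, theta1])); y.append(1)
    return np.array(X), np.array(y)

# X, y = build_training_set(simulate, draw_theta0, draw_theta1, n_pairs=100_000)
# clf = MLPClassifier(hidden_layer_sizes=(64, 64)).fit(X, y)   # gives s(x; theta0, theta1)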

(9)

Example: Inference from multidimensional data

Assume 5D data x generated from the following process p0:

1. z := (z0, z1, z2, z3, z4), such that
   z0 ∼ N(µ = α, σ = 1),
   z1 ∼ N(µ = β, σ = 3),
   z2 ∼ Mixture(½ N(µ = −2, σ = 1), ½ N(µ = 2, σ = 0.5)),
   z3 ∼ Exponential(λ = 3), and
   z4 ∼ Exponential(λ = 0.5);

2. x := Rz, where R is a fixed positive semi-definite 5 × 5 matrix defining a fixed projection of z into the observed space.

Our goal is to infer the values of α and β from D.

[Figure: pairwise scatter plots of the observed data D over the components X0, ..., X4]
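A sketch of this generative process in Python, assuming an arbitrary fixed matrix R and illustrative parameter values (the exact R and true parameters used in the talk are not reproduced here):

import numpy as np

rng = np.random.default_rng(0)
R = rng.normal(size=(5, 5)); R = R @ R.T / 5.0      # some fixed positive semi-definite matrix

def simulate_p0(alpha, beta, n, rng=rng):
    z0 = rng.normal(alpha, 1.0, n)
    z1 = rng.normal(beta, 3.0, n)
    comp = rng.integers(0, 2, n)                    # equal-weight two-component mixture
    z2 = np.where(comp == 0, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 0.5, n))
    z3 = rng.exponential(1.0 / 3.0, n)              # numpy uses scale = 1/lambda
    z4 = rng.exponential(1.0 / 0.5, n)
    z = np.stack([z0, z1, z2, z3, z4], axis=1)
    return z @ R.T                                  # x := R z

D = simulate_p0(alpha=1.0, beta=-1.0, n=1000)       # illustrative "true" parameter values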

(10)

Example: Inference from multidimensional data

Recipe:

1. Build a single parameterized classifier s(x; θ0, θ1), in this case a 2-layer NN trained on 5+2 features, with the alternative fixed to θ1 = (α = 0, β = 0).

2. Find the approximate MLE α̂, β̂ by solving Eqn. 1, using likelihood scans or numerical optimization. Since the generator is inexpensive, p(s(x; θ0, θ1)|θ) can be calibrated on the fly for every candidate (α, β), e.g. using histograms.

3. Construct the log-likelihood ratio (LLR) statistic

   −2 log Λ(α, β) = −2 log [ p(D|α, β) / p(D|α̂, β̂) ].
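A minimal likelihood-scan sketch of step 2, assuming a calibrated parameterized ratio estimate r_hat(x, θ0, θ1) in the spirit of the earlier sketches; the grid ranges and helper names are illustrative assumptions.

import numpy as np

def nll(D, theta0, theta1, r_hat):
    """Negative log-likelihood ratio of D under theta0 against the fixed alternative theta1."""
    return -np.sum(np.log(r_hat(D, theta0, theta1)))

alphas = np.linspace(0.90, 1.15, 26)
betas = np.linspace(-1.4, -0.6, 33)
theta1 = np.array([0.0, 0.0])                       # fixed alternative (alpha = 0, beta = 0)

scan = np.array([[nll(D, np.array([a, b]), theta1, r_hat) for b in betas] for a in alphas])
i, j = np.unravel_index(np.argmin(scan), scan.shape)
alpha_hat, beta_hat = alphas[i], betas[j]           # approximate MLE

llr = 2.0 * (scan - scan[i, j])                     # -2 log Lambda(alpha, beta)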

(11)

[Figure: exact −2 log Λ(α, β) compared to the approximate LLR (smoothed by a Gaussian process), as a function of α and β]

(12)

Diagnostics

In practice r̂(ŝ(x; θ0, θ1)) will not be exact. Diagnostic procedures are needed to assess the quality of this approximation.

1. For inference, the value of the MLE θ̂ should be independent of the value of θ1 used in the denominator of the ratio.

2. Train a classifier to distinguish between unweighted samples from p(x|θ0) and samples from p(x|θ1) weighted by r̂(ŝ(x; θ0, θ1)); if the approximation is good, the two samples are indistinguishable and the ROC curve stays on the diagonal.

[Figure: left, exact and approximate −2 log Λ(θ) scans for several choices of θ1, with a ±1σ band; right, ROC curves (true positive rate vs. false positive rate) for distinguishing p(x|θ0) from p(x|θ1) reweighted by the exact ratio and from unweighted p(x|θ1)]
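A sketch of this second diagnostic, reusing the assumed simulate_p0 and parameterized r_hat helpers from the earlier sketches (illustrative names, not the talk's code): reweight the θ1 sample by the estimated ratio and check that a classifier cannot separate it from the θ0 sample.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

x0 = simulate_p0(alpha=1.0, beta=-1.0, n=20_000)             # unweighted sample from p(x|theta0)
x1 = simulate_p0(alpha=0.0, beta=0.0, n=20_000)              # sample from p(x|theta1)
w1 = r_hat(x1, np.array([1.0, -1.0]), np.array([0.0, 0.0]))  # estimated p(x|theta0)/p(x|theta1)

X = np.vstack([x0, x1])
y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
w = np.concatenate([np.ones(len(x0)), w1])

clf = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)
auc = roc_auc_score(y, clf.predict_proba(X)[:, 1], sample_weight=w)
# AUC close to 0.5 (ROC on the diagonal) indicates the reweighted theta1 sample is
# indistinguishable from the theta0 sample, i.e. the ratio approximation is good.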

(13)

Likelihood ratio of mixtures

For models defined as known mixtures of several components, the target LR can be formulated in terms of pairwise approximations of sub-LRs. Specifically, we can write

p(x|θ0) / p(x|θ1)
  = [ Σ_c w_c(θ0) p_c(x|θ0) ] / [ Σ_{c′} w_{c′}(θ1) p_{c′}(x|θ1) ]
  = Σ_c [ Σ_{c′} (w_{c′}(θ1) / w_c(θ0)) · p_{c′}(x|θ1) / p_c(x|θ0) ]⁻¹
  = Σ_c [ Σ_{c′} (w_{c′}(θ1) / w_c(θ0)) · p_{c′}(s_{c,c′}(x; θ0, θ1)|θ1) / p_c(s_{c,c′}(x; θ0, θ1)|θ0) ]⁻¹.

This allows the capacity of the classifiers to be focused on simpler pairwise problems, resulting in better approximations.

In the common case where the free parameters only affect the component weights, the sub-LRs can all be precomputed before fitting!
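A small sketch of how the pairwise sub-ratios would be recombined, assuming sub_ratio[c][c2] holds calibrated estimates of p_{c2}(x|θ1)/p_c(x|θ0); the data structure is hypothetical, for illustration only.

import numpy as np

def mixture_ratio(x, w0, w1, sub_ratio):
    """Approximate p(x|theta0)/p(x|theta1) for mixtures, from pairwise sub-ratios.

    w0[c], w1[c2]: component weights under theta0 and theta1.
    sub_ratio[c][c2](x): calibrated estimate of p_c2(x|theta1) / p_c(x|theta0).
    """
    total = np.zeros(len(x))
    for c, wc in enumerate(w0):
        inner = sum((w1[c2] / wc) * sub_ratio[c][c2](x) for c2 in range(len(w1)))
        total += 1.0 / inner
    return total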

(14)

Application to Higgs EFT

• There is an ongoing effort to use the outlined method for estimating coupling parameters in Higgs effective field theory (EFT).

• In the case of mixtures (as for Higgs EFT), the LR estimation problem can be rewritten as a set of pairwise LR estimation problems.

(15)

Summary

• We proposed an approach for approximating likelihood ratios in the likelihood-free setup.

• Evaluating likelihood ratios reduces to supervised learning; the two problems are deeply connected.

• The approach is an alternative to Approximate Bayesian Computation, without the need to define a prior over the parameters.

(16)

References

Baldi, P., Cranmer, K., Faucett, T., Sadowski, P., and Whiteson, D. (2016). Parameterized Machine Learning for High-Energy Physics. arXiv preprint arXiv:1601.07913.

Cranmer, K., Pavez, J., and Louppe, G. (2015). Approximating likelihood ratios with calibrated discriminative classifiers. arXiv preprint arXiv:1506.02169.

Louppe, G., Cranmer, K., and Pavez, J. (2016). carl: a likelihood-free inference toolbox. http://dx.doi.org/10.5281/zenodo.47798.
