
(1)

Adversarial Games for Particle Physics

Gilles Louppe

Deep Learning for Physical Sciences workshop
December 8, 2017

(2)
(3)
(4)
(5)
(6)

Generative adversarial networks

L_d = E_{x∼p(x|θ)}[d(x; φ)] − E_{x∼p_r(x)}[d(x; φ)] + λΩ(φ)

L_g = −E_{x∼p(x|θ)}[d(x; φ)]
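As a minimal numpy sketch of these two losses (not the talk's actual networks): the linear critic, the Gaussian "real" and "generated" samples, and the regularizer Ω are all illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(x, phi):
    # Toy linear critic d(x; phi); a real setup would use a neural network.
    return x @ phi

def gan_losses(x_gen, x_real, phi, lam=0.0, omega=lambda p: float(np.sum(p ** 2))):
    # L_d = E_{x~p(x|theta)}[d(x; phi)] - E_{x~p_r(x)}[d(x; phi)] + lam * Omega(phi)
    # L_g = -E_{x~p(x|theta)}[d(x; phi)]
    d_gen = critic(x_gen, phi).mean()
    d_real = critic(x_real, phi).mean()
    L_d = d_gen - d_real + lam * omega(phi)
    L_g = -d_gen
    return L_d, L_g

x_real = rng.normal(1.0, 1.0, size=(1000, 2))  # stand-in for real data p_r(x)
x_gen = rng.normal(0.0, 1.0, size=(1000, 2))   # stand-in for generated data p(x|theta)
phi = np.array([0.5, 0.5])
L_d, L_g = gan_losses(x_gen, x_real, phi, lam=0.1)
```

The critic is rewarded for assigning higher values to real samples than generated ones; the generator's loss is just the negated first term of the critic's.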

(7)

Fast simulation

Challenges:
• How to ensure physical properties?
• Non-uniform geometry
• Mostly sparse
• How to scale to full resolution?

(8)

Evaluation

Physics: evaluate well-known physical variates.
ML: look at generated images.

How to be sure the generator is physically correct?

(9)

II. Learning to Pivot

(10)

Independence from physics variates

• Analyses often rely on the model being independent from some physics variates (e.g., mass).

• Correlation leads to systematic uncertainties that cannot easily be characterized and controlled.

(11)

Independence from known unknowns

• The generation process is often not uniquely specified or known exactly, hence systematic uncertainties.

• Parametrization through nuisance parameters.

• Ideally, we would like a classifier that is robust to nuisance parameters.

(12)

Problem statement

• Assume a family of data generation processes p(X, Y, Z) where
X are the data (taking values x ∈ X),
Y are the target labels (taking values y ∈ Y),
Z is an auxiliary random variable (taking values z ∈ Z).

• Z corresponds to physics variates or nuisance parameters.

• Supervised learning: learn a function f(·; θ_f) : X → Y.

• We want inference based on f(X; θ_f) to be robust to the value z ∈ Z.

E.g., we want a classifier that does not change with systematic variations, even though the data might.

(13)

Pivot

• We define robustness as requiring the distribution of f(X; θ_f) conditional on Z to be invariant with Z. That is, such that

p(f(X; θ_f) = s | z) = p(f(X; θ_f) = s | z′)

for all z, z′ ∈ Z and all values s ∈ S of f(X; θ_f).

• If f satisfies this criterion, then f is known as a pivotal quantity.

• This is the same as requiring f(X; θ_f) and Z to be independent random variables.
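The pivotal criterion can be probed empirically by comparing histograms of f(X) conditional on each value of Z. In this hypothetical two-feature toy, only the first feature shifts with the nuisance Z; the sampler, the `is_pivotal` helper, and its tolerance are all invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_x_given_z(z, n):
    # Toy model: X = (X1, X2); only X1 shifts with the nuisance Z.
    x1 = rng.normal(z, 1.0, size=n)
    x2 = rng.normal(0.0, 1.0, size=n)
    return np.column_stack([x1, x2])

def is_pivotal(f, sampler, z_values, n=20000, bins=20, tol=0.05):
    # Empirical check of p(f(X)=s|z) = p(f(X)=s|z') for all z, z':
    # compare per-bin probability masses of f(X) conditional on each z.
    hists = []
    for z in z_values:
        s = f(sampler(z, n))
        h, _ = np.histogram(s, bins=bins, range=(0.0, 1.0))
        hists.append(h / n)  # probability mass per bin
    hists = np.array(hists)
    return float(np.max(np.abs(hists - hists[0]))) < tol

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
f_naive = lambda x: sigmoid(x[:, 0] + x[:, 1])  # uses the Z-dependent feature
f_pivot = lambda x: sigmoid(x[:, 1])            # ignores it: a pivotal quantity

z_values = [-1.0, 0.0, 1.0]
naive_is_pivotal = is_pivotal(f_naive, sample_x_given_z, z_values)
pivot_is_pivotal = is_pivotal(f_pivot, sample_x_given_z, z_values)
```

Here f_pivot passes the check by construction; adversarial training (next slides) aims to find such an f without discarding discriminative power wholesale.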

(14)

Adversarial game

[Diagram: the classifier f maps data X to f(X; θ_f) = p(signal|data) with loss L_f(θ_f); the adversary r takes f's output and produces parameters γ_1(f(X; θ_f); θ_r), γ_2(f(X; θ_f); θ_r), ... of a posterior p_{θ_r}(Z | f(X; θ_f)), with loss L_r(θ_f, θ_r).]

Consider a classifier f built as usual, minimizing the cross-entropy

L_f(θ_f) = E_{x∼X} E_{y∼Y|x}[−log p_{θ_f}(y|x)].

Pit f against an adversary network r producing as output the posterior p_{θ_r}(z | f(X; θ_f) = s).

Set r to minimize the cross-entropy

L_r(θ_f, θ_r) = E_{s∼f(X;θ_f)} E_{z∼Z|s}[−log p_{θ_r}(z|s)].

The goal is to solve:

θ̂_f, θ̂_r = arg min_{θ_f} max_{θ_r} L_f(θ_f) − L_r(θ_f, θ_r)
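The two cross-entropies and the minimax value can be written out in a few lines. This is only a sketch of the losses, not of the networks or the alternating optimization; the batch values and the Gaussian adversary posterior (its mean and log-scale standing in for the γ outputs) are hypothetical.

```python
import numpy as np

def classifier_loss(p_signal, y):
    # L_f(theta_f) = E[-log p_{theta_f}(y|x)]: binary cross-entropy on f's output.
    eps = 1e-12
    return -np.mean(y * np.log(p_signal + eps) + (1 - y) * np.log(1 - p_signal + eps))

def adversary_loss(z, mu, log_sigma):
    # L_r(theta_f, theta_r) = E[-log p_{theta_r}(z|s)], here with a Gaussian
    # posterior whose mean and log-scale stand in for the gamma_1, gamma_2 outputs.
    sigma = np.exp(log_sigma)
    return np.mean(0.5 * np.log(2 * np.pi) + log_sigma + 0.5 * ((z - mu) / sigma) ** 2)

def value(p_signal, y, z, mu, log_sigma, lam=1.0):
    # Minimax value L_f - lam * L_r: f minimizes it, r maximizes it
    # (equivalently, r minimizes L_r alone).
    return classifier_loss(p_signal, y) - lam * adversary_loss(z, mu, log_sigma)

# Hypothetical numbers standing in for network outputs on a small batch.
p_signal = np.array([0.8, 0.3])            # f's output p(signal|data)
y = np.array([1.0, 0.0])                   # true labels
z = np.array([0.2, -0.1])                  # nuisance values
mu, log_sigma = np.zeros(2), np.zeros(2)   # adversary's posterior parameters
E_lam = value(p_signal, y, z, mu, log_sigma, lam=2.0)
```

If the adversary cannot predict z from f's output better than chance, L_r saturates and f is (approximately) pivotal.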

(15)
(16)

(17)

In practice

• The assumption that a classifier exists that is both optimal and pivotal may not hold.

• However, the minimax objective can be rewritten as

E_λ(θ_f, θ_r) = L_f(θ_f) − λ L_r(θ_f, θ_r)

where λ controls the trade-off between the performance of f and its independence w.r.t. Z.

Setting λ to a large value enforces f to be pivotal. Setting λ close to 0 constrains f to be optimal.

• Tuning λ is guided by a higher-level objective (e.g., statistical significance).

(18)

Toy example (without adversarial training)

[Figure: (left) p(f(X)|Z = −σ), p(f(X)|Z = 0), p(f(X)|Z = +σ); (right) decision surface of f with contours µ0 and µ1|Z=z for Z = −σ, 0, +σ.]

(Left) The conditional probability distributions of f(X; θ_f) | Z = z change with z.

(19)

Toy example (with adversarial training)

[Figure: (left) p(f(X)|Z = −σ), p(f(X)|Z = 0), p(f(X)|Z = +σ); (right) decision surface of f with contours µ0 and µ1|Z=z for Z = −σ, 0, +σ.]

(Left) The conditional probability distributions of f(X; θ_f) | Z = z are now (almost) invariant with z!

(20)

Applications

Decorrelated Jet Substructure Tagging using Adversarial Neural Networks

Fader networks

(21)

III. Adversarial Variational

Optimization

(22)

Microscopic picture

Pencil and paper calculable from first principles.

Controlled approximation of first principles.

(23)

Macroscopic picture

(24)

Likelihood-free assumptions

Operationally,

x ∼ p(x|θ) ⇔ z ∼ p(z|θ), x = g(z; θ)

where

• z provides a source of randomness;

• g is a non-differentiable deterministic function (e.g. a computer program).

Accordingly, the density p(x|θ) can be written as

p(x|θ) = ∫_{z : g(z;θ) = x} p(z|θ) dz
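A one-line invented simulator makes the operational definition concrete: here g is a hard branch (not differentiable in θ), and the set {z : g(z;θ) = x} happens to be tractable, with p(x = 1 | θ) = ∫_{z < θ} p(z) dz = θ. For a real simulator this integral is intractable, which is the whole point of the likelihood-free setting.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulator(theta, n):
    # x ~ p(x|theta)  <=>  z ~ p(z), x = g(z; theta).
    z = rng.uniform(0.0, 1.0, size=n)   # source of randomness
    x = (z < theta).astype(float)       # g: a hard branch, not differentiable in theta
    return x

samples = simulator(0.3, 100000)
```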

(25)

Inference

Given observations x ∼ p_r(x), we seek:

θ∗ = arg min_θ d(p_r(x), p(x|θ))

for some divergence d between the observed and model distributions.

(26)

Adversarial game

(27)

Variational Optimization

min_θ f(θ) ≤ E_{θ∼q(θ|ψ)}[f(θ)] = U(ψ)

∇_ψ U(ψ) = E_{θ∼q(θ|ψ)}[f(θ) ∇_ψ log q(θ|ψ)]

Example: a piecewise-constant −sin(x)/x, with q(θ|ψ = (µ, β)) = N(µ, e^β). (Similar to REINFORCE gradient estimates.)
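A sketch of the score-function estimator on the slide's example. Assumptions made for the sketch: e^β is taken as the variance of q, the piecewise-constant objective is obtained by rounding θ before evaluating −sin(x)/x, and a batch-mean baseline is subtracted to reduce the variance of the estimate.

```python
import numpy as np

rng = np.random.default_rng(4)

def f(theta):
    # Piecewise-constant version of -sin(x)/x: flat steps, zero gradient a.e.
    t = np.round(theta)
    return -np.sinc(t / np.pi)   # np.sinc(u) = sin(pi*u)/(pi*u), so this is -sin(t)/t

def vo_gradient(psi, n=20000):
    # Score-function estimate of grad_psi U(psi), with
    # U(psi) = E_{theta~q(theta|psi)}[f(theta)] and q = N(mu, exp(beta)).
    mu, beta = psi
    var = np.exp(beta)
    theta = rng.normal(mu, np.sqrt(var), size=n)
    fx = f(theta)
    score_mu = (theta - mu) / var                       # d/dmu log q
    score_beta = 0.5 * ((theta - mu) ** 2 / var - 1.0)  # d/dbeta log q
    b = fx.mean()   # subtracting a baseline reduces the estimator's variance
    return np.array([np.mean((fx - b) * score_mu), np.mean((fx - b) * score_beta)])

grad = vo_gradient((3.0, np.log(0.25)))
```

Even though f itself has zero gradient almost everywhere, the surrogate U(ψ) is smooth in ψ, so gradient descent on ψ can make progress.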

(28)

Adversarial Variational Optimization

• Replace the generative network with a non-differentiable forward simulator g(z; θ).

• With VO, optimize upper bounds of the adversarial objectives:

U_d = E_{θ∼q(θ|ψ)}[L_d]    (1)

U_g = E_{θ∼q(θ|ψ)}[L_g]    (2)

(29)

Operationally,

x ∼ q(x|ψ) ⇔ θ ∼ q(θ|ψ), z ∼ p(z|θ), x = g(z; θ)

Therefore, q(x|ψ) is the marginal ∫ p(x|θ) q(θ|ψ) dθ.

• If p(x|θ) is misspecified, q(x|ψ) will attempt to smear the simulator to approach p_r(x).

• If not, q(x|ψ) will concentrate its mass around the true data-generating parameters.
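The marginal sampling and the "smearing" effect can be seen in a toy sketch. The simulator g here is an invented stand-in (x = θ + Gaussian noise, so marginally x ∼ N(µ, e^β + 1)); as q(θ|ψ) concentrates (β → −∞), q(x|ψ) approaches p(x|µ), and a broad q adds exactly the extra spread described above.

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_marginal(psi, n):
    # x ~ q(x|psi)  <=>  theta ~ q(theta|psi) = N(mu, exp(beta)),
    #                    z ~ p(z|theta), x = g(z; theta).
    mu, beta = psi
    theta = rng.normal(mu, np.sqrt(np.exp(beta)), size=n)  # one theta per sample
    z = rng.normal(0.0, 1.0, size=n)                       # simulator randomness
    return theta + z   # toy g(z; theta): marginally x ~ N(mu, exp(beta) + 1)

x_wide = sample_marginal((0.0, np.log(4.0)), 50000)   # broad proposal q
x_narrow = sample_marginal((0.0, -10.0), 50000)       # q concentrated at mu
```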

(30)

Preliminary results

Simplified simulator for electron–positron collisions resulting in muon–antimuon pairs.
• Parameters: E_beam, G_f.
• Observations: x = cos(A) ∈ [−1, 1], where A is the polar angle of the outgoing muon w.r.t. the incoming electron.

[Figure: evolution of q(θ|ψ) from initialization toward θ∗ = (42, 0.9) in the (E_beam, G_f) plane; generated vs. observed distributions of x; U_d over training iterations.]

(31)

Ongoing work

• Benchmark against alternative methods (e.g., ABC).

• Scale to a full scientific simulator.

(32)

Summary

• Adversarial training = indirectly specifying complicated loss functions.

For generation

For enforcing constraints

(33)

Questions?
