

Adversarial Games for Particle Physics

Gilles Louppe

Accelerating the Search for Dark Matter with Machine Learning, January 18, 2018


Generative adversarial networks

$$L_d(\phi) = \mathbb{E}_{x \sim p(x|\theta)}[d(x; \phi)] - \mathbb{E}_{x \sim p_r(x)}[d(x; \phi)] + \lambda \Omega(\phi)$$

$$L_g(\theta) = -\mathbb{E}_{x \sim p(x|\theta)}[d(x; \phi)]$$

(Wasserstein GAN + Gradient Penalty)

Goodfellow et al., 2014, arXiv:1406.2661
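To make these two losses concrete, here is a minimal PyTorch sketch of the WGAN-GP objectives above. The `critic` network and the batches `x_real`, `x_gen` are placeholders, and each sample is assumed to be a flat feature vector.

```python
import torch

def critic_loss(critic, x_gen, x_real, lam=10.0):
    """L_d(phi): Wasserstein critic loss with gradient penalty Omega(phi)."""
    loss = critic(x_gen).mean() - critic(x_real).mean()
    # Gradient penalty: push ||grad_x d(x; phi)|| towards 1 on random
    # interpolates between real and generated samples (Gulrajani et al., 2017).
    eps = torch.rand(x_real.size(0), 1)
    x_hat = (eps * x_real + (1 - eps) * x_gen).requires_grad_(True)
    grad, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    penalty = ((grad.norm(2, dim=1) - 1.0) ** 2).mean()
    return loss + lam * penalty

def generator_loss(critic, x_gen):
    """L_g(theta): raise the critic's score on generated samples."""
    return -critic(x_gen).mean()
```

Training alternates as usual for Wasserstein GANs: several minimization steps of `critic_loss` per minimization step of `generator_loss`.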


State-of-the-art


Style transfer

From simulated data to realistic data?


Super-resolution


Text-to-image synthesis


GANs for galaxies


Fast simulation

Challenges:

• How to ensure physical properties?
• Non-uniform geometry
• Mostly sparse
• How to scale to full resolution?


Evaluation

Physics: Evaluate well-known physical variates. ML: Look at generated images.

How to be sure the generator is physically correct?


II. Learning to Pivot


Independence from physics variates

• Analyses often rely on the model being independent from some physics variates (e.g., mass).

• Correlation leads to systematic uncertainties that cannot easily be characterized and controlled.


Independence from known unknowns

• The generation process is often not uniquely specified or known exactly, hence systematic uncertainties.

• Parametrization through nuisance parameters.

• Ideally, we would like a classifier that is robust to nuisance parameters.


Problem statement

• Assume a family of data generation processes p(X, Y, Z) where
X are the data (taking values x ∈ X),
Y are the target labels (taking values y ∈ Y),
Z is an auxiliary random variable (taking values z ∈ Z).

• Z corresponds to physics variates or nuisance parameters.

• Supervised learning: learn a function f(·; θ_f): X → Y.

• We want inference based on f(X; θ_f) to be robust to the value z ∈ Z.

E.g., we want a classifier that does not change with systematic variations, even though the data might.


Pivot

• We define robustness as requiring the distribution of f(X; θ_f) conditional on Z to be invariant with Z. That is, such that

$$p(f(X; \theta_f) = s \mid z) = p(f(X; \theta_f) = s \mid z')$$

for all z, z′ ∈ Z and all values s ∈ S of f(X; θ_f).

• If f satisfies this criterion, then f is known as a pivotal quantity.

• This is the same as requiring f(X; θ_f) and Z to be independent random variables.
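As a hypothetical diagnostic (not from the slides), one can probe the pivotal property empirically by comparing the distribution of f(X) across slices of Z, e.g. with two-sample Kolmogorov–Smirnov tests:

```python
from scipy.stats import ks_2samp

def pivot_check(f, x, z, z_values):
    """Compare the distribution of f(X) across slices of Z."""
    s = f(x)
    groups = {v: s[z == v] for v in z_values}
    # Pairwise two-sample KS tests: uniformly large p-values are
    # consistent with f(X) being independent of Z, i.e. f pivotal.
    return {(a, b): ks_2samp(groups[a], groups[b]).pvalue
            for i, a in enumerate(z_values)
            for b in z_values[i + 1:]}
```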


Adversarial game

[Diagram: the classifier f(·; θ_f) maps data X to p(signal|data), with loss L_f(θ_f); the adversary r(·; θ_r) performs a regression of Z from f's output, producing the parameters γ_1(f(X; θ_f); θ_r), γ_2(f(X; θ_f); θ_r), ... of the posterior p_{θ_r}(Z | f(X; θ_f)), with loss L_r(θ_f, θ_r).]

Consider a classifier f built as usual, minimizing the cross-entropy

$$L_f(\theta_f) = \mathbb{E}_{x \sim X}\, \mathbb{E}_{y \sim Y|x}[-\log p_{\theta_f}(y|x)].$$

Pit f against an adversary network r producing as output the posterior p_{θ_r}(z | f(X; θ_f) = s). Set r to minimize the cross-entropy

$$L_r(\theta_f, \theta_r) = \mathbb{E}_{s \sim f(X;\theta_f)}\, \mathbb{E}_{z \sim Z|s}[-\log p_{\theta_r}(z|s)].$$

The goal is to solve

$$\hat\theta_f, \hat\theta_r = \arg\min_{\theta_f} \max_{\theta_r}\; L_f(\theta_f) - L_r(\theta_f, \theta_r).$$


In practice

• The assumption of the existence of a classifier that is both optimal and pivotal may not hold.

• However, the minimax objective can be rewritten as

$$E_\lambda(\theta_f, \theta_r) = L_f(\theta_f) - \lambda L_r(\theta_f, \theta_r),$$

where λ controls the trade-off between the performance of f and its independence w.r.t. Z. Setting λ to a large value enforces f to be pivotal; setting λ close to 0 constrains f to be optimal.

• Tuning λ is guided by a higher-level objective (e.g., statistical significance).
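To illustrate, here is a schematic PyTorch sketch of the alternating updates for E_λ. The networks `clf` (f) and `adv` (r), their optimizers, and the categorical treatment of Z are simplifying assumptions; for continuous nuisance parameters the adversary would output a density (e.g., a mixture model) instead.

```python
import torch.nn.functional as F

def pivot_update(clf, adv, opt_f, opt_r, x, y, z, lam=10.0):
    """One round of min_{theta_f} max_{theta_r} L_f - lambda * L_r."""
    # 1) Adversary step: minimize L_r, i.e. regress Z from f's output.
    s = clf(x).detach()                       # freeze f while updating r
    opt_r.zero_grad()
    F.cross_entropy(adv(s), z).backward()     # L_r = -E[log p_{theta_r}(z|s)]
    opt_r.step()
    # 2) Classifier step: minimize E_lambda = L_f - lambda * L_r.
    opt_f.zero_grad()
    s = clf(x)
    loss_f = F.cross_entropy(s, y)            # L_f(theta_f)
    loss_r = F.cross_entropy(adv(s), z)       # L_r(theta_f, theta_r)
    (loss_f - lam * loss_r).backward()        # large lambda -> pivotal f
    opt_f.step()
```

Because f descends −λ L_r, it actively degrades the adversary's ability to recover Z from its output; with λ large enough, f(X) is pushed towards independence of Z.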


Toy example (without adversarial training)

[Figure: the distributions p(f(X) | Z = −σ), p(f(X) | Z = 0), p(f(X) | Z = +σ) (left) and the decision surface of f with µ0 and µ1|Z=z (right).]

(Left) The conditional probability distributions of f(X; θ_f) | Z = z change with z.


Toy example (with adversarial training)

[Figure: the distributions p(f(X) | Z = −σ), p(f(X) | Z = 0), p(f(X) | Z = +σ), now nearly identical (left), and the decision surface of f with µ0 and µ1|Z=z (right).]

(Left) The conditional probability distributions of f (X ; θf)|Z = z are now (almost) invariant with z!


Applications

Decorrelated Jet Substructure Tagging using Adversarial Neural Networks

Fader networks


III. Adversarial Variational Optimization


Microscopic picture

Pencil and paper calculable from first principles.

Controlled approximation of first principles.


Macroscopic picture


Likelihood-free assumptions

Operationally,

$$x \sim p(x|\theta) \quad\Leftrightarrow\quad z \sim p(z|\theta),\; x = g(z; \theta)$$

where

• z provides a source of randomness;

• g is a non-differentiable deterministic function (e.g., a computer program).

Accordingly, the density p(x|θ) can be written as

$$p(x|\theta) = \int_{\{z\,:\,g(z;\theta)=x\}} p(z|\theta)\, dz.$$
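A hypothetical toy example of such a g in Python: sampling x ∼ p(x|θ) is trivial, while the branching and rounding inside g make the density intractable and the program non-differentiable in θ.

```python
import numpy as np

def simulator(theta, n, seed=0):
    """Draw x ~ p(x|theta) purely operationally: z ~ p(z|theta), x = g(z; theta)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(loc=theta, scale=1.0, size=n)     # source of randomness
    # g(z; theta): a "computer program" with control flow and rounding,
    # so p(x|theta) has no tractable closed form and no gradient in theta.
    return np.where(z > 0.0, np.floor(z), np.exp(z))

x = simulator(theta=1.5, n=10_000)   # sampling is easy; evaluating p(x|1.5) is not
```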


Inference

Given observations x ∼ p_r(x), we seek

$$\theta^* = \arg\min_\theta \rho(p_r(x), p(x|\theta))$$

for some distance ρ between distributions.


Adversarial game


Variational Optimization

$$\min_\theta f(\theta) \;\leq\; \mathbb{E}_{\theta \sim q(\theta|\psi)}[f(\theta)] = U(\psi)$$

$$\nabla_\psi U(\psi) = \mathbb{E}_{\theta \sim q(\theta|\psi)}[f(\theta)\, \nabla_\psi \log q(\theta|\psi)]$$

[Figure: VO on a piecewise-constant version of −sin(x)/x, with proposal q(θ|ψ = (µ, β)) = N(µ, e^β).]

(Similar to REINFORCE gradient estimates.)
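A minimal NumPy sketch of this estimator, assuming a one-dimensional proposal q(θ|ψ) = N(µ, e^β) with e^β read as the standard deviation:

```python
import numpy as np

def vo_gradient(f, psi, n_samples=1000, seed=0):
    """Monte Carlo estimate of grad_psi U(psi) = E_q[f(theta) grad_psi log q]."""
    mu, beta = psi
    sigma = np.exp(beta)                              # e^beta as std (assumption)
    rng = np.random.default_rng(seed)
    theta = rng.normal(mu, sigma, size=n_samples)     # theta ~ q(theta|psi)
    f_val = f(theta)
    # grad log q for N(mu, sigma^2) with sigma = e^beta:
    g_mu = (theta - mu) / sigma**2
    g_beta = (theta - mu) ** 2 / sigma**2 - 1.0
    return np.array([np.mean(f_val * g_mu), np.mean(f_val * g_beta)])

# Works even for a non-differentiable, piecewise-constant objective:
grad = vo_gradient(lambda t: np.round(-np.sin(t) / t, 1), psi=(2.0, 0.0))
```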


Adversarial Variational Optimization

• Replace the generative network with a non-differentiable forward simulator g(z; θ).

• With VO, optimize upper bounds of the adversarial objectives:

$$U_d = \mathbb{E}_{\theta \sim q(\theta|\psi)}[L_d] \qquad (1)$$

$$U_g = \mathbb{E}_{\theta \sim q(\theta|\psi)}[L_g] \qquad (2)$$


Operationally,

$$x \sim q(x|\psi) \quad\Leftrightarrow\quad \theta \sim q(\theta|\psi),\; z \sim p(z|\theta),\; x = g(z; \theta).$$

Therefore, q(x|ψ) is the marginal ∫ p(x|θ) q(θ|ψ) dθ.

• If p(x|θ) is misspecified, q(x|ψ) will attempt to smear the simulator to approach p_r(x).

• If not, q(x|ψ) will concentrate its mass around the true data-generating parameters.
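Continuing the sketch under the same assumptions (one-dimensional Gaussian q, the toy `simulator` above, and a `critic` trained on U_d), the update of ψ reuses the score-function trick, since g itself provides no gradients:

```python
import numpy as np

def avo_psi_gradient(critic, simulator, psi, n_theta=32, n_x=256, seed=0):
    """Estimate grad_psi U_g = E_{theta ~ q(.|psi)}[L_g(theta)] via REINFORCE."""
    mu, beta = psi
    sigma = np.exp(beta)
    rng = np.random.default_rng(seed)
    g_mu = g_beta = 0.0
    for _ in range(n_theta):
        theta = rng.normal(mu, sigma)                     # theta ~ q(theta|psi)
        x = simulator(theta, n_x, seed=int(rng.integers(1 << 31)))
        l_g = -np.mean(critic(x))                         # L_g = -E_x[d(x; phi)]
        g_mu += l_g * (theta - mu) / sigma**2             # L_g * grad_psi log q
        g_beta += l_g * ((theta - mu) ** 2 / sigma**2 - 1.0)
    return np.array([g_mu, g_beta]) / n_theta
```

Alternating this ψ-update with critic updates on U_d gives the AVO loop; in practice a variance-reduction baseline on L_g is helpful, as with any REINFORCE-style estimator.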


Preliminary results

Simplified simulator for electron–positron collisions resulting in muon–antimuon pairs.

• Parameters: E_beam, G_f.
• Observations: x = cos(A) ∈ [−1, 1], where A is the polar angle of the outgoing muon w.r.t. the incoming electron.

[Figure: the proposal q(θ|ψ) over (E_beam, G_f) concentrating around θ* = (42, 0.9) as training proceeds; samples x ∼ p(x|θ) approaching p_r(x); the upper bound U_d decreasing over iterations.]


Ongoing work

• Benchmark against alternative methods (e.g., ABC).

• Scale to a full scientific simulator.


Summary

• Adversarial training = indirectly specifying complicated loss functions:
  • For generation.
  • For enforcing constraints.
