
Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification


Gianni FRANCHI

with Andrei Bursuc, Emanuel Aldea, Severine Dubuisson, Isabelle Bloch

U2IS, ENSTA Paris

Presentation GDR ISIS 28/07/2021


Plan

1 Context

2 Uncertainty and Deep learning

3 Bayesian Deep Neural Network

4 MC dropout

5 Ensembling approaches

6 LP-BNN approach

7 Conclusion

8 Bibliography


What is uncertainty in machine/deep learning?

We make observations of the world using sensors (e.g. a camera).

Based on these observations, we intend to learn a model that makes decisions.

Given the same observations, the decision should be the same.

However, the world changes, observations change, and our sensors change, yet the output should not change!

We would like to know how confident we can be about the decisions


Why is uncertainty important?

Figure: Confidence histograms (top) and reliability diagrams (bottom) for a 5-layer LeNet (left) and a 110-layer ResNet (right) on CIFAR-100. [1]


Imagine an autonomous car with a perception system based on deep learning without uncertainty estimates.


Imagine a medical diagnosis system based on deep learning without uncertainty estimates.


We build models to make predictions. Can we trust them? Are they certain?


Types of Uncertainty

Aleatoric: Uncertainty inherent in the observation noise (problems caused by sensor quality, natural randomness, that cannot be explained by our data).

Epistemic: Our ignorance about the correct model that generated the data (lack of knowledge about the process that generated the data).


Aleatoric uncertainty

Aleatoric uncertainty captures noise inherent in the observations:

For example, sensor noise or motion noise result in uncertainty.

This uncertainty cannot be reduced with more data.

However, aleatoric uncertainty could be reduced with better measurements.


Aleatoric uncertainty can further be categorized into homoscedastic and heteroscedastic uncertainties:

Homoscedastic uncertainty relates to the uncertainty that a particular task might cause. It stays constant for different inputs.

Heteroscedastic uncertainty depends on the inputs to the model, with some inputs potentially having more noisy outputs than others.
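For a regression task, the difference between the two cases can be sketched with a Gaussian negative log-likelihood. The Gaussian noise model, the function names and the toy numbers below are assumptions made for illustration; the slides do not prescribe a particular likelihood.

```python
import math

def gaussian_nll(y, mean, var):
    """Negative log-likelihood of y under N(mean, var)."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def homoscedastic_nll(ys, means, var):
    """Average NLL with one noise variance shared by all inputs."""
    return sum(gaussian_nll(y, m, var) for y, m in zip(ys, means)) / len(ys)

def heteroscedastic_nll(ys, means, variances):
    """Average NLL with one (predicted) noise variance per input."""
    terms = [gaussian_nll(y, m, v) for y, m, v in zip(ys, means, variances)]
    return sum(terms) / len(terms)

# Toy data: the last target is much noisier than the first two.
ys = [0.0, 0.1, 3.0]
means = [0.0, 0.0, 0.0]
print(homoscedastic_nll(ys, means, 1.0))
print(heteroscedastic_nll(ys, means, [0.1, 0.1, 9.0]))
```

In this toy setting, letting the variance follow the input gives a lower average loss, because the model can be confident on clean inputs and cautious on the noisy one.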


Types of Uncertainty: Case 1

Let us consider a neural network model trained with several pictures of dogs. We ask the model to decide on a dog using a photo of a cat. What would you want the model to do?

Image credit: Gilles Louppe


Types of Uncertainty: Case 2

We have three different types of images to classify (cat, dog, and cow), some of which may be noisy due to the limitations of the acquisition instrument.

Image credit: Gilles Louppe


Bayesian approach and DNN

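The goal of a DNN is to find $P(Y \mid X, \omega)$. Most classical approaches find the $\omega$ that maximizes the likelihood:

\[
\begin{aligned}
\hat{\omega} &= \arg\max_{\omega} \log P(D_l \mid \omega) \\
&= \arg\max_{\omega} \sum_{i=1}^{n_l} \log P(Y_i \mid X_i, \omega) \\
&= \arg\max_{\omega} \frac{1}{n_l} \sum_{i=1}^{n_l} \log P(Y_i \mid X_i, \omega) \\
&= \arg\max_{\omega} \mathbb{E}_{(X,Y) \sim P(D_l)} \log P(Y \mid X, \omega) \\
&= \arg\min_{\omega} H[P(D_l), P(Y \mid X, \omega)]
\end{aligned}
\]

with $H$ the cross entropy.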
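The last step of this chain can be checked numerically: with one-hot targets, the average negative log-likelihood and the cross entropy are the same number. The following is a minimal sketch with hypothetical function names and toy probabilities, not code from the presentation.

```python
import math

def mean_nll(probs, labels):
    """Average negative log-likelihood of the true labels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def cross_entropy(probs, labels):
    """Average cross entropy between one-hot targets and predictions."""
    total = 0.0
    for p, y in zip(probs, labels):
        one_hot = [1.0 if k == y else 0.0 for k in range(len(p))]
        total -= sum(t * math.log(q) for t, q in zip(one_hot, p))
    return total / len(labels)

# Toy predicted class probabilities for two samples and their true labels.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
print(mean_nll(probs, labels), cross_entropy(probs, labels))
```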


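The goal of a DNN is to find $P(Y \mid X, \omega)$. In the classical Bayesian approach, we find the $\omega$ that gives the maximum a posteriori (MAP) estimate:

\[
\begin{aligned}
\hat{\omega} &= \arg\max_{\omega} \log P(\omega \mid D_l) \\
&= \arg\max_{\omega} \log P(D_l \mid \omega) + \log P(\omega)
\end{aligned}
\]

With a Gaussian prior on $\omega$, this leads to L2 regularization.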
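The link to L2 regularization can be made explicit under one assumption that the slide does not state: an isotropic Gaussian prior $P(\omega) = \mathcal{N}(0, \sigma^2 I)$. Writing $\hat{\omega}$ for the MAP estimate, a short derivation is:

```latex
\begin{align}
\log P(\omega) &= -\frac{1}{2\sigma^2}\lVert \omega \rVert_2^2 + \text{const}, \\
\hat{\omega} &= \arg\max_{\omega}\; \log P(D_l \mid \omega) - \frac{1}{2\sigma^2}\lVert \omega \rVert_2^2 \\
&= \arg\min_{\omega}\; -\log P(D_l \mid \omega) + \frac{1}{2\sigma^2}\lVert \omega \rVert_2^2 .
\end{align}
```

The penalty weight $1/(2\sigma^2)$ plays the role of the weight-decay coefficient: a narrower prior means stronger regularization.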


Bayesian DNN[4]

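A Bayesian DNN is based on marginalization instead of MAP optimization:

\[
P(Y \mid X) = \mathbb{E}_{\omega \sim P(\omega \mid D_l)}\big[P(Y \mid X, \omega)\big] = \int P(Y \mid X, \omega)\, P(\omega \mid D_l)\, d\omega
\]

In practice, the integral is approximated by a Monte Carlo average over $N$ samples:

\[
P(Y \mid X) \simeq \frac{1}{N} \sum_{i=1}^{N} P(Y \mid X, \omega_i) \quad \text{with } \omega_i \sim P(\omega \mid D_l)
\]

Different techniques exist to estimate $P(\omega \mid D_l)$.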
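The Monte Carlo approximation can be sketched on a toy one-weight logistic model. Everything specific here is an assumption made for illustration: the model $P(Y=1 \mid x, \omega) = \mathrm{sigmoid}(\omega x)$, the Gaussian stand-in for the posterior, and the function names.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def predictive(x, sample_weight, n_samples=1000):
    """Monte Carlo estimate of P(Y=1 | x) as an average over sampled weights."""
    total = 0.0
    for _ in range(n_samples):
        w = sample_weight()        # w_i drawn from the (approximate) posterior
        total += sigmoid(w * x)    # P(Y=1 | x, w_i)
    return total / n_samples

rng = random.Random(0)
# Hypothetical stand-in for P(w | D_l): a Gaussian with mean 2.0 and std 1.5.
posterior_sample = lambda: rng.gauss(2.0, 1.5)

p_bayes = predictive(1.0, posterior_sample)
p_point = sigmoid(2.0 * 1.0)       # prediction from the posterior mean alone
print(p_bayes, p_point)
```

In this toy case the marginalized prediction is less extreme than the point-estimate prediction, which is the behaviour one hopes for when the weights are uncertain.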


Variational inference[4]

Variational inference approximates the posterior $P(\omega \mid D_l)$ with a family of distributions $q_\lambda(\omega \mid D_l)$. The variational parameter $\lambda$ indexes the family of distributions. For example, if $q$ were Gaussian, it would be the mean and variance of the latent variables for each datapoint, $\lambda_{x_i} = (\mu_{x_i}, \sigma^2_{x_i})$.

Question: How can we know how well our variational posterior $q_\lambda(\omega \mid D_l)$ approximates the true posterior $P(\omega \mid D_l)$?


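We can use the Kullback-Leibler divergence, which measures the information lost when using $q$ to approximate $P$:

\[
\begin{aligned}
KL\big(q_\lambda(\omega \mid D_l) \,\|\, P(\omega \mid D_l)\big)
&= \int_{\omega} q_\lambda(\omega \mid D_l) \log \frac{q_\lambda(\omega \mid D_l)}{P(\omega \mid D_l)}\, d\omega \\
&= \int_{\omega} q_\lambda(\omega \mid D_l) \log \frac{q_\lambda(\omega \mid D_l)\, P(D_l)}{P(D_l, \omega)}\, d\omega \\
&= \mathbb{E}_q[\log q_\lambda(\omega \mid D_l)] - \mathbb{E}_q[\log P(\omega, D_l)] + \log P(D_l)
\end{aligned}
\]

Our goal is to find the variational parameters $\lambda$ that minimize this divergence.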


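The optimal approximate posterior is thus:

\[
q_\lambda^{*}(\omega \mid D_l) = \arg\min_{\lambda} KL\big(q_\lambda(\omega \mid D_l) \,\|\, P(\omega \mid D_l)\big).
\]

This is impossible to compute directly because of the term $P(D_l)$ that appears in the divergence. So, we consider the following function:

\[
\begin{aligned}
ELBO(\lambda) &= \mathbb{E}_q[\log P(\omega, D_l)] - \mathbb{E}_q[\log q_\lambda(\omega \mid D_l)] \\
&= -\int_{\omega} q_\lambda(\omega \mid D_l) \log \frac{q_\lambda(\omega \mid D_l)}{P(\omega)\, P(D_l \mid \omega)}\, d\omega \\
&= \mathbb{E}_q[\log P(D_l \mid \omega)] - KL\big(q_\lambda(\omega \mid D_l) \,\|\, P(\omega)\big)
\end{aligned}
\]

Note that $KL\big(q_\lambda(\omega \mid D_l) \,\|\, P(\omega \mid D_l)\big) = \log P(D_l) - ELBO(\lambda)$.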
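The relation between the ELBO, the evidence and the KL divergence can be checked on a model where everything has a closed form. The sketch below assumes a toy conjugate model that is not in the slides: a scalar weight with prior $\omega \sim \mathcal{N}(0, 1)$, likelihood $y_i \mid \omega \sim \mathcal{N}(\omega, 1)$, and a Gaussian $q = \mathcal{N}(m, s^2)$. All function names are hypothetical.

```python
import math

def elbo(ys, m, s2):
    """ELBO = E_q[log P(D | w)] - KL(q || prior) for q = N(m, s2)."""
    expected_loglik = sum(
        -0.5 * math.log(2 * math.pi) - 0.5 * ((y - m) ** 2 + s2) for y in ys
    )
    kl_to_prior = 0.5 * (s2 + m ** 2 - 1.0 - math.log(s2))
    return expected_loglik - kl_to_prior

def log_evidence(ys):
    """Closed-form log P(D) for the same conjugate Gaussian model."""
    n, sy, syy = len(ys), sum(ys), sum(y * y for y in ys)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n)
            - 0.5 * (syy - sy ** 2 / (1 + n)))

ys = [0.3, -0.1, 1.2, 0.8]
n = len(ys)
post_mean, post_var = sum(ys) / (1 + n), 1.0 / (1 + n)   # exact posterior

gap_exact = log_evidence(ys) - elbo(ys, post_mean, post_var)
gap_poor = log_evidence(ys) - elbo(ys, 0.0, 1.0)          # q set to the prior
print(gap_exact, gap_poor)
```

The gap is the KL divergence to the true posterior, so it vanishes (up to rounding) when $q$ is the exact posterior and is strictly positive otherwise. Maximizing the ELBO is therefore equivalent to minimizing that divergence.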


Variational inference: Reparametrization trick[4]

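Theorem: Let $\epsilon$ be a random variable having a probability density given by $q(\epsilon)$ and let $\omega = t(\lambda, \epsilon)$. Suppose that $q_\lambda(\omega \mid D_l)$ is such that $q(\epsilon)\, d\epsilon = q_\lambda(\omega \mid D_l)\, d\omega$. Then for a function $f$ differentiable with respect to $\omega$:

\[
\frac{\partial}{\partial \lambda} \mathbb{E}_{q_\lambda(\omega \mid D_l)}\big[f(\omega, \lambda)\big]
= \mathbb{E}_{q(\epsilon)}\left[ \frac{\partial f(\omega, \lambda)}{\partial \omega} \frac{\partial \omega}{\partial \lambda} + \frac{\partial f(\omega, \lambda)}{\partial \lambda} \right].
\]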
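A minimal numerical sketch of the theorem, under assumptions chosen for illustration: $\epsilon \sim \mathcal{N}(0, 1)$, $\omega = t(\lambda, \epsilon) = \mu + \sigma \epsilon$ with $\lambda = (\mu, \sigma)$, and $f(\omega) = \omega^2$. Here $f$ has no direct dependence on $\lambda$, so the second term of the theorem is zero. The exact gradients are $2\mu$ and $2\sigma$, since $\mathbb{E}[\omega^2] = \mu^2 + \sigma^2$.

```python
import random

def reparam_gradients(mu, sigma, n_samples=20000, seed=0):
    """Monte Carlo gradients of E[w**2] w.r.t. mu and sigma, w = mu + sigma*eps."""
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)   # eps ~ q(eps) = N(0, 1)
        w = mu + sigma * eps        # w = t(lambda, eps)
        df_dw = 2.0 * w             # derivative of f(w) = w**2
        g_mu += df_dw * 1.0         # dw/dmu = 1
        g_sigma += df_dw * eps      # dw/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples

g_mu, g_sigma = reparam_gradients(mu=1.0, sigma=0.5)
print(g_mu, g_sigma)   # compare with the exact values 2*mu and 2*sigma
```

Because the randomness is moved into $\epsilon$, the gradient passes through the sampling step, which is what allows variational parameters to be trained with backpropagation.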


Weight Uncertainty in Neural Networks [4]

Image credit: Eric Ma


Dropout

Dropout is an empirical technique that was proposed to avoid overfitting in CNNs.

At each training step (i.e., for each sample within a mini-batch):

Remove each node in the network with probability $p$.

Update the weights of the remaining nodes with backpropagation.

Image credit: G. Louppe
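The node-removal step can be sketched as a random mask applied to a layer's activations. The rescaling of the surviving units by $1/(1-p)$ ("inverted dropout") is a common convention assumed here, not something stated on the slide, and the function name is hypothetical.

```python
import random

def dropout(activations, p, rng, training=True):
    """Zero each unit with probability p; rescale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(activations)      # no units are removed at test time
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
out = dropout([1.0] * 1000, p=0.5, rng=rng)
print(sum(1 for v in out if v == 0.0), "units dropped out of", len(out))
```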


MC dropout [2]

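They [2] propose to average the predictions of several DNNs to which dropout is applied:

\[
P(y \mid x) = \frac{1}{N_{model}} \sum_{j=1}^{N_{model}} P\big(y \mid \omega(t) \odot b_j, x\big) \qquad (1)
\]

with $b_j$ a vector of the same size as $\omega(t)$, which is a realization of a binomial distribution, and $\odot$ the element-wise product.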
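Equation (1) can be sketched on a toy logistic model where each weight is kept or zeroed by a random binary mask. The single-layer model, the function names and the numbers are assumptions made for illustration; they are not the setup used in [2].

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mc_dropout_predict(weights, x, p, n_models, rng):
    """Average P(y=1 | w * b_j, x) over n_models random binary masks b_j."""
    total = 0.0
    for _ in range(n_models):
        # b_j: each entry is 0 with probability p and 1 otherwise
        mask = [1.0 if rng.random() >= p else 0.0 for _ in weights]
        logit = sum(w * b * xi for w, b, xi in zip(weights, mask, x))
        total += sigmoid(logit)
    return total / n_models

weights = [0.5, 1.0, 1.5]
x = [1.0, 1.0, 1.0]
rng = random.Random(0)
full = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
print(mc_dropout_predict(weights, x, p=0.5, n_models=100, rng=rng), full)
```

Unlike standard dropout, the masks stay active at test time, and the spread of the masked predictions is what carries the uncertainty information.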


Deep Ensembles[3]

They [3] propose to average the predictions of several DNNs trained with different initial seeds:

\[
P(y \mid x) = \frac{1}{N_{model}} \sum_{j=1}^{N_{model}} P\big(y \mid \omega_j(t), x\big) \qquad (2)
\]


Deep Ensembles[5]

Figure: t-SNE plot of predictions from checkpoints corresponding to 3 different randomly initialized trajectories.


Figure: Diversity versus accuracy plots for 3 models trained on CIFAR-10.


BatchEnsemble[6]

They [6] propose to approximate the average of the predictions of several DNNs with different initial seeds by using a DNN with two kinds of weights. More specifically, let the weight set $\omega$ be composed of two subsets $\omega_{slow}$ and $\omega_{fast}$.

For simplicity, let us consider a DNN with just one fully connected layer and let us write $\omega = \{\omega_j\}_{j=1}^{N_{model}} = \{W_j\}_{j=1}^{N_{model}}$, $\omega_{slow} = W$ and $\omega_{fast} = \{F_j\}_{j=1}^{N_{model}}$. We have $W_j = W \odot F_j = W \odot (r_j s_j^{\top})$, with $\odot$ the element-wise product.

Figure: An illustration of how to generate the ensemble weights for two ensemble members.


We have a set of weights $W_j = W \odot F_j = W \odot (r_j s_j^{\top})$, where $\odot$ is the element-wise product, $W$ sees all images, and the $(r_j s_j^{\top})$ do not all see the same images. If we denote by $\varphi$ an activation function, then applying BatchEnsemble to an image amounts to:

\[
\begin{aligned}
y &= \varphi\big(W_j^{\top} x\big) \\
&= \varphi\big((W \odot (r_j s_j^{\top}))^{\top} x\big) \\
&= \varphi\big((W^{\top} (x \odot r_j)) \odot s_j\big)
\end{aligned}
\]

Similarly to Deep Ensembles, inference is performed by ensembling:

\[
P(y \mid x) = \frac{1}{N_{model}} \sum_{j=1}^{N_{model}} P\big(y \mid \omega_j(t), x\big) \qquad (3)
\]
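The point of the rank-one factorization is that the member weight matrix never has to be built: scaling the input by $r_j$ and the output by $s_j$ gives the same pre-activation. The sketch below checks this identity on random numbers, with the activation omitted since it is applied identically on both sides. Function names and sizes are hypothetical.

```python
import random

def explicit_forward(W, r, s, x):
    """Build W_j = W * (r s^T) element-wise, then compute W_j^T x."""
    m, n = len(W), len(W[0])
    Wj = [[W[i][k] * r[i] * s[k] for k in range(n)] for i in range(m)]
    return [sum(Wj[i][k] * x[i] for i in range(m)) for k in range(n)]

def fast_forward(W, r, s, x):
    """Compute (W^T (x * r)) * s without materializing W_j."""
    m, n = len(W), len(W[0])
    xr = [x[i] * r[i] for i in range(m)]
    return [s[k] * sum(W[i][k] * xr[i] for i in range(m)) for k in range(n)]

rng = random.Random(0)
m, n = 4, 3                                   # input and output sizes
W = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
r = [rng.gauss(0, 1) for _ in range(m)]       # fast weights on the input side
s = [rng.gauss(0, 1) for _ in range(n)]       # fast weights on the output side
x = [rng.gauss(0, 1) for _ in range(m)]

slow = explicit_forward(W, r, s, x)
fast = fast_forward(W, r, s, x)
print(max(abs(a - b) for a, b in zip(slow, fast)))
```

Each ensemble member therefore costs only two extra vectors on top of the shared matrix $W$, instead of a full copy of the layer.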


From DNN to LP-BNN [7]

In classical DNNs, each weight is a single realization (a point estimate). In classical BNNs, each weight is a random variable, yet all the weights are independent.

In LP-BNNs, all weights of the same layer are correlated.


LP-BNN [7]

Similarly to BatchEnsemble, we consider two kinds of weights:

\[
h = a\big( (W_{share}^{\top} (x \odot s_j)) \odot \hat{r}_j \big) \qquad (4)
\]

Yet we use $\hat{r}$ and not $r$.


We introduce a VAE composed of a one-layer encoder $g^{enc}_{\phi}(\cdot)$ with variational parameters $\phi$ and a one-layer decoder $g^{dec}_{\psi}(\cdot)$ with parameters $\psi$.

The latent space of the VAE is denoted $z \sim \mathcal{N}(\mu, \sigma^2 I)$.

We sample a latent variable $z_j \sim \mathcal{N}(\mu_j, \sigma_j^2 I)$ and feed it to the decoder to obtain $\hat{r}_j = g^{dec}_{\psi}(z_j)$.
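The sampling and forward steps can be put together in a rough sketch of one layer. Several choices below are assumptions, not taken from the slides or from the released code: the encoder is fed the fast weight $r_j$, the encoder and decoder are bias-free linear maps, the encoder outputs a log-variance, and the activation $a$ is a ReLU. All names are hypothetical.

```python
import math
import random

def linear(M, v):
    """Matrix-vector product M v for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def lp_bnn_layer(x, W_share, s_j, r_j, enc_mu, enc_logvar, dec, rng):
    """One stochastic pass of h = a((W_share^T (x * s_j)) * r_hat_j)."""
    mu = linear(enc_mu, r_j)              # encoder mean (assumed input: r_j)
    logvar = linear(enc_logvar, r_j)      # encoder log-variance
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]    # z_j ~ N(mu_j, sigma_j^2 I)
    r_hat = linear(dec, z)                # r_hat_j = g_dec(z_j)
    xs = [xi * si for xi, si in zip(x, s_j)]
    W_T = list(zip(*W_share))             # rows of W_share^T
    pre = [p * rh for p, rh in zip(linear(W_T, xs), r_hat)]
    return [max(0.0, p) for p in pre]     # a = ReLU (arbitrary choice)

# Toy sizes: 3 inputs, 2 outputs, 2 latent dimensions.
x, s_j, r_j = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0], [1.0, 2.0]
W_share = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
enc_logvar = [[0.0, 0.0], [0.0, 0.0]]     # zero log-variance, i.e. sigma = 1
h = lp_bnn_layer(x, W_share, s_j, r_j, identity, enc_logvar, identity,
                 random.Random(0))
print(h)
```

Each forward pass draws a fresh $\hat{r}_j$, so repeated passes with the same input give different outputs, and that variation is the source of the uncertainty estimate.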


Interestingly, the LP-BNN is more stable than a classical BNN. It brings more diversity and allows a better quantification of uncertainty.


Results on CIFAR

CIFAR-10

Method                    Acc    AUC     AUPR    FPR-95-TPR  ECE     cA     cE
MCP + cutout              96.33  0.9600  0.9767  0.115       0.0207  32.98  0.6167
MC dropout                95.95  0.9126  0.9511  0.282       0.0172  32.32  0.6673
MC dropout + cutout       96.50  0.9273  0.9603  0.242       0.0117  32.35  0.6403
DUQ                       87.48  0.7083  0.8114  0.698       0.3983  64.89  0.2542
DUQ Resnet18              93.36  0.8994  0.9213  0.1964      0.0131  69.01  0.5059
EDL                       85.73  0.9002  0.9198  0.247       0.0904  59.54  0.3412
MIMO                      94.96  0.9387  0.9648  0.175       0.0300  69.99  0.1846
Deep Ensembles + cutout   96.74  0.9803  0.9896  0.071       0.0093  68.75  0.1414
BatchEnsembles + cutout   96.48  0.9540  0.9731  0.132       0.0167  71.67  0.1928
LP-BNN (ours) + cutout    95.02  0.9691  0.9836  0.103       0.0094  69.51  0.1197

CIFAR-100

Method                    Acc     ECE     cA      cE
MCP + cutout              80.19   0.1228  19.33   0.7844
MC dropout                75.40   0.0694  19.33   0.5830
MC dropout + cutout       77.92   0.0672  27.66   0.5909
DUQ                       -       -       -       -
DUQ Resnet18              -       -       -       -
EDL                       -       -       -       -
MIMO                      0.7869  0.1018  0.4735  0.2832
Deep Ensembles + cutout   83.01   0.0673  47.35   0.2023
BatchEnsembles + cutout   81.27   0.0912  47.44   0.2909
LP-BNN (ours) + cutout    79.3    0.0702  48.40   0.2224


Figure: Calibration at different levels of corruption. We report ECE scores (y-axis) as a function of skew intensity (x-axis) for LP-BNN, BatchEnsemble and Deep Ensembles on CIFAR-10-C.
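For reference, the ECE reported here is commonly computed with a binned estimator in the spirit of [1]: predictions are grouped by confidence, and the gap between accuracy and mean confidence is averaged over the bins, weighted by bin size. The sketch below is one such estimator. The number of bins, the bin boundaries and the function name are assumptions, and it is not the evaluation code behind the reported numbers.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)   # conf = 1.0 goes to the last bin
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(1.0 for _, ok in members if ok) / len(members)
            ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# A model that is always fully confident but right only half of the time.
overconfident = expected_calibration_error([1.0] * 4, [True, True, False, False])
# A model that says 0.9 and is right nine times out of ten.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
print(overconfident, calibrated)
```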


LP-BNN [7]

Figure: Visual assessment on two BDD-Anomaly test images containing a motorcycle (OOD class). For each image: on the first row, input image and confidence maps from MCP, BE, and LP-BNN; on the second row, ground-truth segmentation and segmentation maps from MCP, BE, and LP-BNN. LP-BNN is less confident on the OOD objects.


Conclusion

We propose a new BNN framework able to quantify uncertainty in the context of deep learning.

LP-BNNs are stable, efficient, and therefore easy to train compared to existing BNN models.

The extensive empirical comparisons on multiple tasks show that LP-BNNs reach state-of-the-art levels with substantially lower computational cost.

The code can be downloaded from https://github.com/giannifranchi/LP_BNN

Thank you for your attention.


Bibliography:

1 Guo, Chuan, et al. "On calibration of modern neural networks." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.

2 Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning. 2016.

3 Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in Neural Information Processing Systems. 2017.


4 Blundell, Charles, et al. "Weight uncertainty in neural networks." arXiv preprint arXiv:1505.05424 (2015).

5 Fort, Stanislav, Huiyi Hu, and Balaji Lakshminarayanan. "Deep Ensembles: A Loss Landscape Perspective." arXiv preprint arXiv:1912.02757 (2019).


6 Wen, Yeming, Dustin Tran, and Jimmy Ba. "BatchEnsemble: an alternative approach to efficient ensemble and lifelong learning." arXiv preprint arXiv:2002.06715 (2020).

7 Franchi, Gianni, et al. "Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification." arXiv preprint arXiv:2012.02818 (2020).
