Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification
Gianni FRANCHI
with Andrei Bursuc, Emanuel Aldea, Severine Dubuisson, Isabelle Bloch U2IS, ENSTA Paris
Presentation GDR ISIS 28/07/2021
Plan
1 Context
2 Uncertainty and Deep learning
3 Bayesian Deep Neural Network
4 MC dropout
5 Ensembling approaches
6 LP-BNN approach
7 Conclusion
8 Bibliography
Context
What is uncertainty in machine/deep learning
We make observations of the world using sensors (e.g., a camera).
Based on these observations, we intend to learn a model that makes decisions.
Given the same observations, the decision should be the same.
However, the world changes, observations change, and our sensors change, yet the output should not change!
We would like to know how confident we can be about the decisions
Why is uncertainty important?
Figure: Confidence histograms (top) and reliability diagrams (bottom) for a 5-layer LeNet (left) and a 110-layer ResNet (right) on CIFAR-100 [1].
Imagine an autonomous car whose perception system is based on deep learning without uncertainty estimates:
Imagine a medical diagnosis system based on deep learning without uncertainty estimates:
We build models to make predictions. Can we trust them? Are they certain?
Uncertainty and Deep learning
Types of Uncertainty
Aleatoric: Uncertainty inherent in the observation noise (problems caused by sensor quality, natural randomness, that cannot be explained by our data).
Epistemic: Our ignorance about the correct model that generated the data (lack of knowledge about the process that generated the data).
Aleatoric uncertainty
Aleatoric uncertainty captures noise inherent in the observations:
For example, sensor noise or motion noise result in uncertainty.
This uncertainty cannot be reduced with more data.
However, aleatoric uncertainty could be reduced with better measurements.
Aleatoric uncertainty can further be categorized into homoscedastic and heteroscedastic uncertainties:
Homoscedastic uncertainty relates to the uncertainty that a particular task might cause. It stays constant for different inputs.
Heteroscedastic uncertainty depends on the inputs to the model, with some inputs potentially having more noisy outputs than others.
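The difference between the two kinds of aleatoric uncertainty can be illustrated with a Gaussian likelihood. The sketch below is only an illustration under stated assumptions: the synthetic data, the noise profile `noise_std`, and the helper `gaussian_nll` are hypothetical and do not come from the slides. It compares one variance shared by all inputs with one variance per input.

```python
import numpy as np

def gaussian_nll(y, mean, log_var):
    """Per-sample negative log-likelihood of y under N(mean, exp(log_var))."""
    return 0.5 * (np.log(2.0 * np.pi) + log_var + (y - mean) ** 2 / np.exp(log_var))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)

# Hypothetical data: the noise standard deviation grows with the input x.
noise_std = 0.05 + 0.5 * x
mean = np.sin(2.0 * np.pi * x)  # assume the mean is predicted perfectly
y = mean + noise_std * rng.standard_normal(x.shape)

# Homoscedastic: one variance shared by all inputs (its maximum-likelihood value).
homo_log_var = np.full_like(x, np.log(np.mean((y - mean) ** 2)))
# Heteroscedastic: one variance per input (here the true one, for illustration).
hetero_log_var = 2.0 * np.log(noise_std)

nll_homo = gaussian_nll(y, mean, homo_log_var).mean()
nll_hetero = gaussian_nll(y, mean, hetero_log_var).mean()
print(f"homoscedastic NLL:   {nll_homo:.3f}")
print(f"heteroscedastic NLL: {nll_hetero:.3f}")
```

On data whose noise really depends on the input, the input-dependent variance should give the lower negative log-likelihood, which is the motivation for modelling heteroscedastic noise.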
Types of Uncertainty: Case 1
Let us consider a neural network model trained with several pictures of dogs. We ask the model to decide on a dog using a photo of a cat. What would you want the model to do?
Credits: Gilles Louppe
Types of Uncertainty: Case 2
We have three different types of images to classify (cat, dog, and cow), some of which may be noisy due to the limitations of the acquisition instrument.
Credits: Gilles Louppe
Bayesian Deep Neural Network
Bayesian approach and DNN
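The goal of a DNN is to find $P(Y \mid X, \omega)$. Most classical approaches find the $\omega$ that maximizes the likelihood:
\[
\begin{aligned}
\omega &= \arg\max_{\omega} \log P(D_l \mid \omega) \\
&= \arg\max_{\omega} \sum_{i=1}^{n_l} \log P(Y_i \mid X_i, \omega) \\
&= \arg\max_{\omega} \frac{1}{n_l} \sum_{i=1}^{n_l} \log P(Y_i \mid X_i, \omega) \\
&= \arg\max_{\omega} \mathbb{E}_{(X,Y) \sim P(D_l)} \log P(Y \mid X, \omega) \\
&= \arg\min_{\omega} H[P(D_l), P(Y \mid X, \omega)]
\end{aligned}
\]
with $H$ the cross-entropy.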
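The goal of a DNN is to find $P(Y \mid X, \omega)$. In the classical Bayesian approach, we find the $\omega$ that is the maximum a posteriori (MAP) estimate:
\[
\begin{aligned}
\omega &= \arg\max_{\omega} \log P(\omega \mid D_l) \\
&= \arg\max_{\omega} \left[ \log P(D_l \mid \omega) + \log P(\omega) \right]
\end{aligned}
\]
With a Gaussian prior $P(\omega)$, this leads to $\ell_2$ regularization.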
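The link between the MAP objective and $\ell_2$ regularization can be made explicit with a short worked derivation. It assumes a zero-mean isotropic Gaussian prior with variance $\sigma_p^2$; the symbol $\sigma_p$ and the dimension $d$ of $\omega$ are introduced here for illustration and do not appear in the slides.

```latex
% Assumption: P(\omega) = \mathcal{N}(0, \sigma_p^2 I), with \omega \in \mathbb{R}^d
\log P(\omega) = -\frac{1}{2\sigma_p^2} \|\omega\|_2^2 - \frac{d}{2} \log(2\pi\sigma_p^2)

% The second term does not depend on \omega, so the MAP estimate becomes
\omega = \arg\max_{\omega} \left[ \log P(D_l \mid \omega) - \frac{1}{2\sigma_p^2} \|\omega\|_2^2 \right]
       = \arg\min_{\omega} \left[ -\sum_{i=1}^{n_l} \log P(Y_i \mid X_i, \omega) + \frac{1}{2\sigma_p^2} \|\omega\|_2^2 \right]
```

Under this assumption the weight-decay coefficient is $1/(2\sigma_p^2)$: a narrower prior gives a stronger penalty.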
Bayesian DNN[4]
Bayesian DNN is based on marginalization instead of MAP optimization.
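\[
P(Y \mid X) = \mathbb{E}_{\omega \sim P(\omega \mid D_l)} \left[ P(Y \mid X, \omega) \right] = \int P(Y \mid X, \omega) \, P(\omega \mid D_l) \, d\omega
\]
In practice, the integral is approximated by a Monte Carlo average over $N$ sampled weights:
\[
P(Y \mid X) \simeq \frac{1}{N} \sum_{i=1}^{N} P(Y \mid X, \omega_i) \quad \text{with } \omega_i \sim P(\omega \mid D_l)
\]
Different techniques exist to estimate $P(\omega \mid D_l)$.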
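The Monte Carlo approximation of the marginal can be sketched in a few lines. This is a toy illustration, not the method of any cited paper: the linear softmax classifier, the helper names `softmax` and `predictive`, and the Gaussian stand-in for samples from $P(\omega \mid D_l)$ are all assumptions made for the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def predictive(x, weight_samples):
    """Average P(Y | X, w_i) over the sampled weight matrices w_i."""
    probs = np.stack([softmax(x @ w) for w in weight_samples])
    return probs.mean(axis=0)

rng = np.random.default_rng(0)
n_features, n_classes, n_samples = 4, 3, 50

# Hypothetical stand-in for samples from P(w | D_l): Gaussian noise around a mean.
posterior_mean = rng.standard_normal((n_features, n_classes))
weight_samples = posterior_mean + 0.3 * rng.standard_normal(
    (n_samples, n_features, n_classes)
)

x = rng.standard_normal((5, n_features))  # five hypothetical inputs
p = predictive(x, weight_samples)
print(p.round(3))
```

Each row of `p` is an average of valid probability vectors, so it is itself a probability vector.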
Variational inference[4]
Variational inference approximates the posterior $P(\omega \mid D_l)$ with a family of distributions $q_\lambda(\omega \mid D_l)$. The variational parameter $\lambda$ indexes the family of distributions. For example, if $q$ were Gaussian, $\lambda$ would be the mean and variance of the latent variables for each datapoint: $\lambda_{x_i} = (\mu_{x_i}, \sigma^2_{x_i})$.
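Question: How can we know how well our variational posterior $q_\lambda(\omega \mid D_l)$ approximates the true posterior $P(\omega \mid D_l)$?
We can use the Kullback-Leibler divergence, which measures the information lost when using $q$ to approximate $P$:
\[
\begin{aligned}
\mathrm{KL}(q_\lambda(\omega \mid D_l) \,\|\, P(\omega \mid D_l)) &= \int_{\omega} q_\lambda(\omega \mid D_l) \log \frac{q_\lambda(\omega \mid D_l)}{P(\omega \mid D_l)} \, d\omega \\
&= \int_{\omega} q_\lambda(\omega \mid D_l) \log \frac{q_\lambda(\omega \mid D_l) \, P(D_l)}{P(D_l, \omega)} \, d\omega \\
&= \mathbb{E}_q[\log q_\lambda(\omega \mid D_l)] - \mathbb{E}_q[\log P(\omega, D_l)] + \log P(D_l)
\end{aligned}
\]
Our goal is to find the variational parameters $\lambda$ that minimize this divergence.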
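The optimal approximate posterior is thus:
\[
q_{\lambda^*}(\omega \mid D_l) = \arg\min_{\lambda} \mathrm{KL}(q_\lambda(\omega \mid D_l) \,\|\, P(\omega \mid D_l)).
\]
This is impossible to compute directly because of the evidence $P(D_l)$ that appears in the divergence. So we consider the following function:
\[
\begin{aligned}
\mathrm{ELBO}(\lambda) &= \mathbb{E}_q[\log P(\omega, D_l)] - \mathbb{E}_q[\log q_\lambda(\omega \mid D_l)] \\
&= -\int_{\omega} q_\lambda(\omega \mid D_l) \log \frac{q_\lambda(\omega \mid D_l)}{P(\omega) \, P(D_l \mid \omega)} \, d\omega \\
&= \mathbb{E}_q[\log P(D_l \mid \omega)] - \mathrm{KL}(q_\lambda(\omega \mid D_l) \,\|\, P(\omega))
\end{aligned}
\]
Note that $\mathrm{KL}(q_\lambda(\omega \mid D_l) \,\|\, P(\omega \mid D_l)) = \log P(D_l) - \mathrm{ELBO}(\lambda)$. Since $\log P(D_l)$ does not depend on $\lambda$, maximizing the ELBO is equivalent to minimizing the divergence.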
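The relation between the ELBO, the evidence, and the KL divergence can be checked numerically on a toy problem where every quantity is computable. The sketch below assumes, purely for illustration, that the weight takes one of `k` discrete values; the random prior, likelihood, and variational distribution are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5  # hypothetical: the weight w takes one of k discrete values

prior = rng.dirichlet(np.ones(k))            # P(w)
likelihood = rng.uniform(0.1, 1.0, size=k)   # P(D_l | w) for the observed data
joint = prior * likelihood                   # P(w, D_l)
evidence = joint.sum()                       # P(D_l)
posterior = joint / evidence                 # P(w | D_l)

q = rng.dirichlet(np.ones(k))                # an arbitrary variational distribution

def kl(a, b):
    """KL(a || b) for two discrete distributions with full support."""
    return float(np.sum(a * (np.log(a) - np.log(b))))

# First form: E_q[log P(w, D_l)] - E_q[log q(w)]
elbo = float(np.sum(q * (np.log(joint) - np.log(q))))
# Last form: E_q[log P(D_l | w)] - KL(q || P(w))
elbo_alt = float(np.sum(q * np.log(likelihood))) - kl(q, prior)
# Gap between the log-evidence and the ELBO
gap = kl(q, posterior)

print(elbo, elbo_alt, np.log(evidence) - gap)
```

The three printed values should agree up to floating-point error, and the ELBO never exceeds $\log P(D_l)$ because the KL term is non-negative.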
Variational inference: Reparametrization trick[4]
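Theorem: Let $\epsilon$ be a random variable having a probability density given by $q(\epsilon)$ and let $\omega = t(\lambda, \epsilon)$. Suppose that $q_\lambda(\omega \mid D_l)$ is such that $q(\epsilon) \, d\epsilon = q_\lambda(\omega \mid D_l) \, d\omega$. Then for a function $f$ differentiable with respect to $\omega$:
\[
\frac{\partial}{\partial \lambda} \mathbb{E}_{q_\lambda(\omega \mid D_l)} \left[ f(\omega, \lambda) \right] = \mathbb{E}_{q(\epsilon)} \left[ \frac{\partial f(\omega, \lambda)}{\partial \omega} \frac{\partial \omega}{\partial \lambda} + \frac{\partial f(\omega, \lambda)}{\partial \lambda} \right].
\]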
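The theorem can be illustrated on a case with a known answer. The sketch below assumes a Gaussian $q_\lambda$ with $\lambda = (\mu, \sigma)$, the transformation $\omega = \mu + \sigma \epsilon$ with $\epsilon \sim N(0, 1)$, and the test function $f(\omega) = \omega^2$; these choices and the helper name `reparam_gradients` are illustrative only. Since $\mathbb{E}[\omega^2] = \mu^2 + \sigma^2$, the exact gradients are $2\mu$ and $2\sigma$.

```python
import numpy as np

def reparam_gradients(mu, sigma, n_samples=200_000, seed=0):
    """Monte Carlo gradients of E[w^2] with w = mu + sigma * eps, eps ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)
    w = mu + sigma * eps               # w = t(lambda, eps)
    df_dw = 2.0 * w                    # derivative of f(w) = w^2
    # f has no direct dependence on (mu, sigma), so the second term of the
    # theorem vanishes and only df/dw * dw/dlambda remains.
    grad_mu = np.mean(df_dw * 1.0)     # dw/dmu = 1
    grad_sigma = np.mean(df_dw * eps)  # dw/dsigma = eps
    return grad_mu, grad_sigma

grad_mu, grad_sigma = reparam_gradients(mu=1.5, sigma=0.5)
print(grad_mu, grad_sigma)  # close to 2 * mu = 3.0 and 2 * sigma = 1.0
```

Moving the randomness into $\epsilon$ is what allows the gradient to pass through the sampling step, so the same samples can be reused by ordinary backpropagation.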
Weight Uncertainty in Neural Networks [4]
Image credit: Eric Ma
MC dropout
Dropout
Dropout is an empirical technique that was proposed to avoid overfitting in CNNs.
At each training step (i.e., for each sample within a mini-batch):
Remove each node in the network with probability $p$.
Update the weights of the remaining nodes with backpropagation.
Image credit: G. Louppe
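The node-removal step can be sketched as a random binary mask applied to a layer's activations. This is a minimal illustration, not a reference implementation: the function name `dropout` is hypothetical, and the rescaling by `1 / (1 - p)` (often called inverted dropout) is an assumption that is not stated in the slides.

```python
import numpy as np

def dropout(activations, p, rng, training=True):
    """Drop each unit with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= p  # True with probability 1 - p
    # Assumption: rescale so the expected activation is unchanged.
    return activations * keep_mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones((1000, 100))             # hypothetical layer activations
h_dropped = dropout(h, p=0.3, rng=rng)

print((h_dropped == 0).mean())       # fraction of dropped units, close to p
print(h_dropped.mean())              # close to the original mean of 1.0
```

With `training=False` the function returns its input unchanged, which is the usual deterministic behaviour at test time.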
MC dropout [2]
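Gal and Ghahramani [2] propose to average the predictions of several DNNs obtained by keeping dropout active at test time:
\[
P(y^* \mid x^*) = \frac{1}{N_{\text{model}}} \sum_{j=1}^{N_{\text{model}}} P(y^* \mid \omega(t^*) \circ b_j, x^*) \tag{1}
\]
with $b_j$ a vector of the same size as $\omega(t^*)$, which is a realization of a binomial distribution, and $\circ$ the element-wise product.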
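The averaging in Eq. (1) can be sketched with a tiny network. Everything below is a hypothetical setup for illustration: a two-layer network with random, untrained weights, and the helper names `softmax` and `mc_dropout_predict`. The mask is applied to hidden units, which amounts to zeroing the matching rows of the next weight matrix.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def mc_dropout_predict(x, w1, w2, p, n_models, rng):
    """Average softmax outputs over n_models random dropout masks."""
    probs = []
    for _ in range(n_models):
        hidden = np.maximum(x @ w1, 0.0)        # ReLU hidden layer
        mask = rng.random(hidden.shape) >= p    # mask b_j, drawn at test time
        hidden = hidden * mask / (1.0 - p)      # assumed inverted-dropout scaling
        probs.append(softmax(hidden @ w2))
    return np.mean(probs, axis=0)

rng = np.random.default_rng(0)
w1 = rng.standard_normal((4, 16))   # hypothetical untrained weights
w2 = rng.standard_normal((16, 3))
x = rng.standard_normal((5, 4))     # five hypothetical inputs

p_mc = mc_dropout_predict(x, w1, w2, p=0.5, n_models=20, rng=rng)
print(p_mc.round(3))
```

The spread of the individual predictions across masks, not shown here, is what is typically used as the uncertainty signal.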
Ensembling approaches
Deep Ensembles[3]
Lakshminarayanan et al. [3] propose to average the predictions of several DNNs trained with different initial seeds:
\[
P(y^* \mid x^*) = \frac{1}{N_{\text{model}}} \sum_{j=1}^{N_{\text{model}}} P(y^* \mid \omega_j(t^*), x^*) \tag{2}
\]
Deep Ensembles[5]
Figure: t-SNE plot of predictions from checkpoints corresponding to 3 different randomly initialized trajectories.
Figure: Diversity versus accuracy plots for 3 models trained on CIFAR-10.
BatchEnsemble[6]
Wen et al. [6] propose to approximate the average of the predictions of several DNNs with different initial seeds by using a single DNN with two kinds of weights. More specifically, let the weight set $\omega$ be composed of two subsets, $\omega_{\text{slow}}$ and $\omega_{\text{fast}}$.
For simplicity, let us consider a DNN with just one fully connected layer and write $\omega = \{\omega_j\}_{j=1}^{N_{\text{model}}} = \{W_j\}_{j=1}^{N_{\text{model}}}$, $\omega_{\text{slow}} = W$ and $\omega_{\text{fast}} = \{F_j\}_{j=1}^{N_{\text{model}}}$. We have
\[
W_j = W \circ F_j = W \circ (r_j s_j^\top),
\]
where $\circ$ denotes the element-wise product.
Figure: An illustration of how to generate the ensemble weights for two ensemble members.
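We have a set of weights $W_j = W \circ F_j = W \circ (r_j s_j^\top)$, where $W$ sees all images while the $(r_j s_j^\top)$ do not all see the same images. If we denote by $\varphi$ an activation function, then applying BatchEnsemble to an image $x$ amounts to:
\[
\begin{aligned}
y &= \varphi\left( W_j^\top x \right) \\
&= \varphi\left( (W \circ (r_j s_j^\top))^\top x \right) \\
&= \varphi\left( (W^\top (x \circ r_j)) \circ s_j \right)
\end{aligned}
\]
Similarly to Deep Ensembles, inference is performed by ensembling:
\[
P(y^* \mid x^*) = \frac{1}{N_{\text{model}}} \sum_{j=1}^{N_{\text{model}}} P(y^* \mid \omega_j(t^*), x^*) \tag{3}
\]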
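The rank-one factorization used by BatchEnsemble means the member weights $W_j$ never have to be built explicitly. The sketch below checks this numerically on random data; the sizes, the arrays `R` and `S` holding the fast weights, and the two helper functions are hypothetical, and the activation is omitted because it is applied identically to both sides.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, n_models = 6, 4, 3                 # input size, output size, ensemble size
W = rng.standard_normal((m, n))          # shared "slow" weights
R = rng.standard_normal((n_models, m))   # fast weights r_j, one row per member
S = rng.standard_normal((n_models, n))   # fast weights s_j, one row per member
x = rng.standard_normal(m)

def member_explicit(j):
    """Build W_j = W o (r_j s_j^T) explicitly, then apply it."""
    W_j = W * np.outer(R[j], S[j])
    return W_j.T @ x

def member_fast(j):
    """Same output using only the shared W and two element-wise products."""
    return (W.T @ (x * R[j])) * S[j]

explicit = np.stack([member_explicit(j) for j in range(n_models)])
fast = np.stack([member_fast(j) for j in range(n_models)])
print(np.abs(explicit - fast).max())
```

The second form stores only `m + n` extra parameters per member instead of a full `m x n` matrix, which is where the memory saving of this kind of ensemble comes from.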
LP-BNN approach
From DNN to LP-BNN [7]
In classical DNNs, each weight is a single realization (a point estimate). In classical BNNs, each weight is a random variable, yet all the weights are independent.
In LP-BNNs, all weights of the same layer are correlated.
LP-BNN [7]
Similarly to BatchEnsemble, we consider two kinds of weights:
\[
h = a\left( (W_{\text{share}}^\top (x \circ s_j)) \circ \hat{r}_j \right) \tag{4}
\]
Yet we use $\hat{r}_j$ and not $r_j$.
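We introduce a VAE composed of a one-layer encoder $g^{\text{enc}}_{\phi}(\cdot)$ with variational parameters $\phi$ and a one-layer decoder $g^{\text{dec}}_{\psi}(\cdot)$ with parameters $\psi$.
The latent space of the VAE is denoted $z \sim \mathcal{N}(\mu, \sigma^2 I)$.
We sample a latent variable $z_j \sim \mathcal{N}(\mu_j, \sigma_j^2 I)$ and feed it to the decoder to obtain $\hat{r}_j = g^{\text{dec}}_{\psi}(z_j)$.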
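A forward pass through one such layer can be sketched as follows. This is a rough illustration under several assumptions that are not spelled out in the slides: the encoder is taken to receive $r_j$ as input, the encoder and decoder are plain linear maps, the activation $a$ is a ReLU, and all weights are random and untrained. It is not the code from the authors' repository.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, d_latent, n_models = 6, 4, 2, 3

W_share = rng.standard_normal((d_in, d_out))   # shared weights
S = rng.standard_normal((n_models, d_in))      # s_j, applied to the input
R = rng.standard_normal((n_models, d_out))     # r_j, the vectors to be encoded

# Hypothetical one-layer linear encoder (mean and log-variance) and decoder.
enc_mu = rng.standard_normal((d_out, d_latent))
enc_log_var = 0.1 * rng.standard_normal((d_out, d_latent))
dec = rng.standard_normal((d_latent, d_out))

def lp_bnn_layer(x, j, rng):
    """Return (h, r_hat) for ensemble member j, following Eq. (4)."""
    mu = R[j] @ enc_mu                 # assumed: the encoder input is r_j
    log_var = R[j] @ enc_log_var
    # Sample z_j ~ N(mu_j, sigma_j^2 I) with the reparametrization trick.
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(d_latent)
    r_hat = z @ dec                    # r_hat_j = g_dec(z_j)
    h = np.maximum((W_share.T @ (x * S[j])) * r_hat, 0.0)  # a = ReLU (assumed)
    return h, r_hat

x = rng.standard_normal(d_in)
h_a, r_hat_a = lp_bnn_layer(x, 0, rng)
h_b, r_hat_b = lp_bnn_layer(x, 0, rng)
print(r_hat_a, r_hat_b)
```

Two calls for the same member give different $\hat{r}_j$ because a new latent $z_j$ is drawn each time, which is the source of stochasticity in the layer.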
Interestingly, LP-BNN is more stable to train than a classical BNN. It brings more diversity and allows a better quantification of uncertainty.
Results on CIFAR
                         | CIFAR-10                                                  | CIFAR-100
Method                   | Acc↑   AUC↑    AUPR↑   FPR-95-TPR↓  ECE↓    cA↑    cE↓    | Acc↑    ECE↓    cA↑     cE↓
MCP + cutout             | 96.33  0.9600  0.9767  0.115        0.0207  32.98  0.6167 | 80.19   0.1228  19.33   0.7844
MC dropout               | 95.95  0.9126  0.9511  0.282        0.0172  32.32  0.6673 | 75.40   0.0694  19.33   0.5830
MC dropout + cutout      | 96.50  0.9273  0.9603  0.242        0.0117  32.35  0.6403 | 77.92   0.0672  27.66   0.5909
DUQ                      | 87.48  0.7083  0.8114  0.698        0.3983  64.89  0.2542 | -       -       -       -
DUQ Resnet18             | 93.36  0.8994  0.9213  0.1964       0.0131  69.01  0.5059 | -       -       -       -
EDL                      | 85.73  0.9002  0.9198  0.247        0.0904  59.54  0.3412 | -       -       -       -
MIMO                     | 94.96  0.9387  0.9648  0.175        0.0300  69.99  0.1846 | 0.7869  0.1018  0.4735  0.2832
Deep Ensembles + cutout  | 96.74  0.9803  0.9896  0.071        0.0093  68.75  0.1414 | 83.01   0.0673  47.35   0.2023
BatchEnsembles + cutout  | 96.48  0.9540  0.9731  0.132        0.0167  71.67  0.1928 | 81.27   0.0912  47.44   0.2909
LP-BNN (ours) + cutout   | 95.02  0.9691  0.9836  0.103        0.0094  69.51  0.1197 | 79.3    0.0702  48.40   0.2224
Figure: Calibration at different levels of corruption (x-axis: skew intensity, 0 to 4; y-axis: ECE). We report ECE scores for LP-BNN, BatchEnsemble and Deep Ensembles on CIFAR-10-C.
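For readers unfamiliar with the metric, ECE can be computed as sketched below. This assumes the common equal-width binning definition used in the calibration literature such as [1]; the function name, the number of bins, and the four example predictions are illustrative and are not taken from the experiments above.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins: sum over bins of (bin weight) * |accuracy - confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > low) & (confidences <= high)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # in_bin.mean() is the fraction of samples in the bin
    return float(ece)

# Hypothetical predictions: top-class confidence and whether the prediction was right.
conf = [0.95, 0.95, 0.55, 0.55]
hit = [1, 1, 1, 0]
ece = expected_calibration_error(conf, hit)
print(ece)
```

In this example each occupied bin has a gap of 0.05 between accuracy and mean confidence and holds half of the samples, so the weighted sum is 0.05.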
LP-BNN [7]
Figure: Visual assessment on two BDD-Anomaly test images containing a motorcycle (OOD class). Columns: input image, MCP, BE, LP-BNN. For each image: on the first row, input image and confidence maps from MCP, BE, and LP-BNN; on the second row, ground-truth segmentation and segmentation maps from MCP, BE, and LP-BNN. LP-BNN is less confident on the OOD objects.
Conclusion
We propose a new BNN framework able to quantify uncertainty in the context of deep learning.
LP-BNNs are stable, efficient, and therefore easy to train compared to existing BNN models.
The extensive empirical comparisons on multiple tasks show that LP-BNNs reach state-of-the-art levels with substantially lower computational cost.
The code can be downloaded from https://github.com/giannifranchi/LP_BNN
Thank you for your attention.
Bibliography:
[1] Guo, Chuan, et al. "On calibration of modern neural networks." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017.
[2] Gal, Yarin, and Zoubin Ghahramani. "Dropout as a Bayesian approximation: Representing model uncertainty in deep learning." International Conference on Machine Learning. 2016.
[3] Lakshminarayanan, Balaji, Alexander Pritzel, and Charles Blundell. "Simple and scalable predictive uncertainty estimation using deep ensembles." Advances in Neural Information Processing Systems. 2017.
[4] Blundell, Charles, et al. "Weight uncertainty in neural networks." arXiv preprint arXiv:1505.05424 (2015).
[5] Fort, Stanislav, Huiyi Hu, and Balaji Lakshminarayanan. "Deep Ensembles: A Loss Landscape Perspective." arXiv preprint arXiv:1912.02757 (2019).
[6] Wen, Yeming, Dustin Tran, and Jimmy Ba. "BatchEnsemble: an alternative approach to efficient ensemble and lifelong learning." arXiv preprint arXiv:2002.06715 (2020).
[7] Franchi, Gianni, et al. "Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification." arXiv preprint arXiv:2012.02818 (2020).