
Academic year: 2022


(1)

Master Mathématiques, Vision, et Apprentissage Foundations of Deep Learning

Robust Deep Learning

Nicolas Thome

Conservatoire National des Arts et Métiers (Cnam) Laboratoire CEDRIC - équipe Vertigo

(2)

Outline

● Bayesian Neural Networks for Classification

● Applications of Uncertainty in Deep Learning

● Other Robustness Issues

(3)

Recap: Bayesian Neural Networks (BNNs)

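● Approximate posterior, e.g. variational: $p(w \mid \mathcal{D}) \approx q_\theta(w)$

▸ $q_\theta(w)$ Gaussian or Bernoulli, e.g. MC dropout [Gal and Ghahramani, 2016a]

● Predictive distribution: $p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \approx \int p(y \mid x, w)\, q_\theta(w)\, dw$

▸ $p(y \mid x, \mathcal{D})$ approximated by MC sampling:

$$\int p(y \mid x, w)\, q_\theta(w)\, dw \approx \frac{1}{S} \sum_{s=1}^{S} p(y \mid x, w_s), \qquad w_s \sim q_\theta(w)$$

▸ Sampling: discrete approximation of the distribution $p(y \mid x, \mathcal{D})$

Regression: $y$ continuous: mean prediction, dispersion ⇒ uncertainty

Classification: $y$ discrete, e.g. binary classification $p(y=1 \mid x, \mathcal{D})$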
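The MC approximation of the predictive distribution can be sketched on a toy binary classifier. This is only an illustration under stated assumptions: the logistic model, the Bernoulli (dropout-like) mask on the weights, and the names `sample_weights` and `mc_predictive` are hypothetical and not taken from the course material.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sample_weights(w_mean, keep_prob, rng):
    """Draw w_s ~ q_theta(w): Bernoulli (dropout-like) mask applied to each weight."""
    return [w / keep_prob if rng.random() < keep_prob else 0.0 for w in w_mean]

def mc_predictive(x, w_mean, keep_prob=0.5, n_samples=100, seed=0):
    """Return the S sampled values p(y=1 | x, w_s) and their MC average."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_samples):
        w_s = sample_weights(w_mean, keep_prob, rng)
        probs.append(sigmoid(sum(wi * xi for wi, xi in zip(w_s, x))))
    return probs, sum(probs) / len(probs)

# Toy input and weights (assumed values, for illustration only).
probs, p_mean = mc_predictive(x=[1.0, -2.0, 0.5], w_mean=[0.8, 0.3, -1.2])
# Dispersion of the samples around the MC mean.
spread = math.sqrt(sum((p - p_mean) ** 2 for p in probs) / len(probs))
print(f"p(y=1|x,D) ~ {p_mean:.3f}, dispersion over samples: {spread:.3f}")
```

The list `probs` is the discrete approximation of the predictive distribution; averaging it gives the usual point prediction, while its spread is what the averaged output alone would discard.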

[email protected] Master MVA / Robustness in Deep Learning 1/ 47

(4)

BNNs: Uncertainty in Classification

● Only using the discrete output: averaging $p(y=1 \mid x, w_s)$, $w_s \sim q_\theta(w)$

▸ Losing the variability of $p(y=1 \mid x, w_s)$ among the $w_s$ samples

▸ Why not use the continuous output of the model (after softmax) directly?

▸ In binary classification: lies in the interval $[0,1]$

▸ For a general $K$-class classification: $(K-1)$-dimensional simplex

(5)

Uncertainty in classification: example with MCDropout

● Use of uncertainty in classification

▸ Qualitative experiment to illustrate the importance of sampling the predictive distribution

▸ MNIST, LeNet CNN with dropout only on the last fc layer, dropout probability $p = 0.5$, SGD with LR = 0.01 and momentum 0.9, weight decay 1e-6, $T = 100$ forward passes

Pred: [1 1 1 1 1 5 5 7 7 7 7 7]


(6)

MCDropout: side note

● Performance: MC dropout vs. dropout [Gal and Ghahramani, 2015]

● Test error with LeNet, $T = 50$ forward passes

● Take-home message: dropout only propagates the mean; MC dropout propagates the whole distribution

(7)

Uncertainty in classification: metrics

1. Variation ratios

▸ For each stochastic forward pass $t \in \{1, \dots, T\}$, compute the label from the softmax probabilities

▸ $c^*$: most frequent label over the $T$ passes, occurring $f_x^{c^*}$ times

▸ Compute the variation ratio: $\text{var-ratio}[x] = 1 - \frac{f_x^{c^*}}{T}$

⇒Epistemic uncertainty

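2. Predictive entropy: captures the average amount of information contained in the predictive distribution.

$$H[\hat{y} \mid x, \mathcal{D}_{train}] = -\sum_{c} \Big( \frac{1}{T} \sum_{t} p(y = c \mid x, \hat{w}_t) \Big) \log \Big( \frac{1}{T} \sum_{t} p(y = c \mid x, \hat{w}_t) \Big)$$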

⇒ Aleatoric uncertainty

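3. Mutual information: points that maximise the mutual information are points on which the model is uncertain on average

$$I[\hat{y}, w \mid x, \mathcal{D}_{train}] = H[\hat{y} \mid x, \mathcal{D}_{train}] + \frac{1}{T} \sum_{c,t} p(y = c \mid x, \hat{w}_t) \log p(y = c \mid x, \hat{w}_t)$$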

⇒ Epistemic uncertainty
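The three metrics can be computed directly from the $T$ softmax vectors of the stochastic passes. The sketch below is a minimal pure-Python version; the function names and the toy `samples` are assumptions made for illustration, and entropies are in nats.

```python
import math
from collections import Counter

def variation_ratio(probs):
    """probs: T softmax vectors, one per stochastic pass. Returns 1 - f_c*/T."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
    mode_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - mode_count / len(probs)

def entropy(p):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0.0)

def predictive_entropy(probs):
    """Entropy of the MC-averaged class probabilities."""
    T, K = len(probs), len(probs[0])
    mean = [sum(p[c] for p in probs) / T for c in range(K)]
    return entropy(mean)

def mutual_information(probs):
    """Predictive entropy minus the average entropy of the individual passes."""
    expected_entropy = sum(entropy(p) for p in probs) / len(probs)
    return predictive_entropy(probs) - expected_entropy

# Four hypothetical stochastic passes on one binary input.
samples = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3), (0.6, 0.4)]
print(variation_ratio(samples), predictive_entropy(samples), mutual_information(samples))
```

The mutual information is written here as "entropy of the mean minus mean of the entropies", which is the same quantity as the formula with the $+\frac{1}{T}\sum_{c,t} p \log p$ term.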


(8)

Uncertainty in classification

Let’s see three concrete examples:

● all equal to 1 (i.e. probability vectors $\{(1,0), \dots, (1,0)\}$)

→ high predictive confidence, low model uncertainty

var-ratio $= H[\hat{y} \mid x, \mathcal{D}_{train}] = I[\hat{y}, w \mid x, \mathcal{D}_{train}] = 0$

● half-half, i.e. probability vectors $\{(1,0), (0,1), (0,1), \dots, (1,0)\}$

→ low predictive confidence, high model uncertainty

var-ratio $= 0.5$, and $H[\hat{y} \mid x, \mathcal{D}_{train}] = I[\hat{y}, w \mid x, \mathcal{D}_{train}] = \log 2$ (1 bit, about 0.69 nats)

● all close to 0.5, e.g. $\{(0.51,0.49), (0.51,0.49), \dots, (0.51,0.49)\}$

→ low predictive confidence, low model uncertainty
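As a worked check of the third example (every sampled vector equal to $(0.51, 0.49)$), the definitions of predictive entropy and mutual information give the values below. Natural logarithms are assumed; the numbers are rounded.

```latex
\begin{align*}
\bar{p} &= \tfrac{1}{T}\textstyle\sum_t p(y \mid x, \hat{w}_t) = (0.51,\ 0.49) \\
H[\hat{y} \mid x, \mathcal{D}_{train}] &= -0.51 \ln 0.51 - 0.49 \ln 0.49 \approx 0.6929 \quad (\text{close to } \ln 2 \approx 0.6931) \\
\tfrac{1}{T}\textstyle\sum_{c,t} p(y = c \mid x, \hat{w}_t) \ln p(y = c \mid x, \hat{w}_t) &= 0.51 \ln 0.51 + 0.49 \ln 0.49 \approx -0.6929 \\
I[\hat{y}, w \mid x, \mathcal{D}_{train}] &= 0.6929 - 0.6929 = 0
\end{align*}
```

The predictive entropy is near its maximum while the mutual information vanishes: the samples agree with each other, so the uncertainty does not come from disagreement between weight samples. Since every pass predicts the same label, the variation ratio is also 0 in this case.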

(9)

Practical Session: MC dropout on MNIST for classification

● MC dropout regularization on MNIST

▸ Training a convnet with dropout ⇒ improved training performance

▸ Improved performance when dropout is kept active at test time

● Use variation ratio to measure uncertainty

[Figure: most uncertain test set examples; least uncertain test set examples]


(10)

Practical Session: MC dropout on MNIST for classification

● Analyse MC sampling

Uncertain example

Confident example

(11)

Bayesian Neural Networks & Ensembling

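● Approximate posterior, e.g. variational: $p(w \mid \mathcal{D}) \approx q_\theta(w)$

▸ $q_\theta(w)$ Gaussian or Bernoulli (dropout)

● Predictive distribution: $p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \approx \int p(y \mid x, w)\, q_\theta(w)\, dw$

▸ $p(y \mid x, \mathcal{D})$ approximated by MC sampling, e.g. MC dropout:

$$\int p(y \mid x, w)\, q_\theta(w)\, dw \approx \frac{1}{S} \sum_{s=1}^{S} p(y \mid x, w_s), \qquad w_s \sim q_\theta(w)$$

▸ Sampling: mean prediction, dispersion w.r.t. the mean

▸ Interpretation: sampling ⇔ ensemble methods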


(12)

Bayesian Neural Networks & Ensembling

● Ensembling [Lakshminarayanan et al., 2017]: a simple method for approximating the predictive distribution and estimating uncertainty

▸ Train several models on the same dataset

▸ Each model prediction: one sample estimate of predictive distribution

● Empirically, works well (better than MC dropout)

(13)

Bayesian Models & Ensembling

● Ensembling [Lakshminarayanan et al., 2017]: predictive distribution approximation and uncertainty estimation

● Empirically, works well, BUT:

▸ Several predictions at test time (as with MC dropout)

▸ Several models to train (unlike MC dropout)


(14)

Dirichlet Prior Networks

● Ensembling [Lakshminarayanan et al., 2017]: implicit model of the predictive distribution

● Dirichlet Prior Networks (DPN) [Sensoy et al., 2018, Malinin and Gales, 2018]: explicitly modeling the predictive distribution with a Dirichlet distribution

● $x$ input variable, $y \in \{1, \dots, K\}$ output class variable

● Random variable $z = \{z_k\}_{k \in \{1, \dots, K\}}$: output probability values for classification

▸ $z_k \geq 0$, $\sum_{k=1}^{K} z_k = 1$ ⇒ $z$ lies on the $(K-1)$-dimensional simplex

▸ Models a distribution over probability distributions

(15)

Dirichlet distribution

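$$\mathrm{Dir}(z \mid \alpha) = \frac{\Gamma(\alpha_0)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} z_k^{\alpha_k - 1} \qquad (1)$$

● $\alpha_k > 0$ Dirichlet concentration params, $\alpha_0 = \sum_{k=1}^{K} \alpha_k$, $\Gamma$ Gamma function

▸ $\Gamma(z) = \int_0^{+\infty} x^{z-1} e^{-x}\, dx$

● $\alpha_k$: evidence for class $k$; $\alpha$: position on the simplex ⇒ aleatoric uncertainty

● $\alpha_0$: total evidence ∼ distribution spread ⇒ epistemic uncertainty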


(16)

Dirichlet distribution in Bayesian inference

Dice roll example

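● $N$ dice rolls, $\mathcal{D} = \{y_i\}_{i \in \{1, \dots, N\}}$, $y_i \in \{1, \dots, K\}$, $z_k := P(y = k)$

● Likelihood: $P(\mathcal{D} \mid z) = \mathrm{Mul}(\mathcal{D} \mid z) = \frac{N!}{\prod_k N_k!} \prod_{k=1}^{K} z_k^{N_k}$

▸ $N_k$: number of times class $k$ is observed

● Prior on probabilities: $P(z) \sim \mathrm{Dir}(\alpha) \propto \prod_{k=1}^{K} z_k^{\alpha_k - 1}$, e.g. $\alpha = (1\ 1\ \dots\ 1)^T$

● Dirichlet conjugate with multinomial ⇒ the posterior $P(z \mid \mathcal{D})$ is Dirichlet:

$$P(z \mid \mathcal{D}) \propto P(\mathcal{D} \mid z)\, P(z) \propto \prod_{k=1}^{K} z_k^{N_k + \alpha_k - 1} \sim \mathrm{Dir}(\alpha_k + N_k)$$

▸ $\alpha_k$: pseudo count

▸ Increasing $N$ ⇒ increasing $N_k$ (evidence for class $k$), decreasing epistemic uncertainty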
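The conjugate update in the dice example amounts to adding the observed counts to the prior pseudo counts. The sketch below illustrates it with a hypothetical sequence of ten rolls; the function names are assumptions, and the mean and variance formulas are the standard ones for a Dirichlet distribution.

```python
from collections import Counter

def dirichlet_posterior(alpha, rolls):
    """Conjugate update: Dir(alpha) prior + observed counts N_k -> Dir(alpha_k + N_k)."""
    counts = Counter(rolls)
    return [a + counts.get(k, 0) for k, a in enumerate(alpha, start=1)]

def dirichlet_mean(alpha):
    a0 = sum(alpha)
    return [a / a0 for a in alpha]

def dirichlet_variance(alpha):
    """Var[z_k] = alpha_k (alpha_0 - alpha_k) / (alpha_0^2 (alpha_0 + 1))."""
    a0 = sum(alpha)
    return [a * (a0 - a) / (a0 ** 2 * (a0 + 1)) for a in alpha]

prior = [1.0] * 6                        # alpha = (1, ..., 1): one pseudo count per face
rolls = [1, 3, 3, 6, 2, 3, 5, 3, 6, 3]   # hypothetical observations, N = 10
post = dirichlet_posterior(prior, rolls)

print("posterior concentrations:", post)
print("posterior mean:", dirichlet_mean(post))
print("prior vs posterior variance of z_3:",
      dirichlet_variance(prior)[2], dirichlet_variance(post)[2])
```

The total evidence $\alpha_0$ grows from 6 to 16 here, and the variance of every $z_k$ shrinks accordingly, which is the "decreasing epistemic uncertainty" effect described above.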

(17)

Dirichlet Prior Networks (DPN)

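● Training dataset $\mathcal{D} = \{(x_i, y_i)\}_{i \in \{1, \dots, N\}}$, $y_i \in \{1, \dots, K\}$, $z_k := P(y = k)$

● Likelihood ($i$-th sample): $P(y_i \mid x_i, z_i) \sim \mathrm{Cat}(y_i \mid x_i, z_i) \propto \prod_{k=1}^{K} z_{k,i}^{\tilde{N}_{k,i}}$

▸ $\tilde{N}_{k,i}$ unknown; intuition: large (infinite) for the true class, low (0) for the others

▸ Ex: $\tilde{N}_{k,i} := \lambda^{-1} \delta_{k,k_i^*}$, with $k_i^*$ the GT class, and e.g. $\lambda = 10^{-2}$

● Prior: Dirichlet as before: $P(z_i) \sim \mathrm{Dir}(\alpha)$, e.g. $\alpha = (1\ 1\ \dots\ 1)^T$

● Posterior: $P(z_i \mid x_i, y_i) \propto P(y_i \mid x_i, z_i)\, P(z_i) \propto \mathrm{Dir}(\beta_i)$, $\beta_{k,i} = \alpha_k + \tilde{N}_{k,i}$, e.g. $\beta_k = 1 + 100 \cdot \delta_{k,k_i^*}$

● Training goal: approximate, for all $i$, $P(z_i \mid x_i, y_i) \approx q_\theta(z_i \mid x_i)$ (without $y_i$)

▸ $q_\theta(x_i) := \mathrm{Dir}(\gamma_i)$: outputs $\gamma_i$, the Dirichlet concentration params for $x_i$

Learn $\theta$ to make $\gamma_i$ as close as possible to the target $\beta_i$

Ex: $\gamma(x_i, \theta) = e^{f(x_i)}$, with $f(x_i, \theta)$ the $K$-dim output of a neural network (NN)

● Inference (marginalization):

$$P(y = c \mid x, \mathcal{D}) = \int p(y = c \mid x, z)\, p(z \mid x, \mathcal{D})\, dz \approx \int p(y = c \mid x, z)\, q_\theta(z \mid x)\, dz$$

$$P(y = c \mid x, \mathcal{D}) \approx \mathbb{E}_{q_\theta(z \mid x)}[z_c] = \frac{\gamma_c}{\gamma_0} = \frac{e^{f_c(x)}}{\sum_{k=1}^{K} e^{f_k(x)}}$$

Soft-max prediction!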
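A small numerical check of the inference step: with $\gamma = e^{f(x)}$, the Dirichlet mean $\gamma_c / \gamma_0$ coincides with the softmax of the network outputs. The two logit vectors below are made-up values, and the helper names are hypothetical.

```python
import math

def concentrations(logits):
    """gamma_k = exp(f_k(x)): Dirichlet concentration parameters from network outputs."""
    return [math.exp(f) for f in logits]

def expected_class_probs(gamma):
    """Mean of Dir(gamma): E[z_c] = gamma_c / gamma_0."""
    g0 = sum(gamma)
    return [g / g0 for g in gamma]

def softmax(logits):
    m = max(logits)
    e = [math.exp(f - m) for f in logits]
    s = sum(e)
    return [v / s for v in e]

# Two hypothetical outputs that differ only by a constant shift of the logits.
logits_a = [4.0, 1.0, 0.0]
logits_b = [1.0, -2.0, -3.0]

for logits in (logits_a, logits_b):
    gamma = concentrations(logits)
    print("gamma_0 =", round(sum(gamma), 3),
          "mean =", [round(p, 4) for p in expected_class_probs(gamma)])
```

Both inputs give the same class probabilities, but their total evidence $\gamma_0$ differs: the softmax prediction alone cannot distinguish them, while the Dirichlet parameterization keeps that extra degree of freedom.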


(18)

Dirichlet Prior Networks (DPN)

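● Training goal: approximate $P(z_i \mid x_i, y_i) \approx q_\theta(z_i \mid x_i)$

▸ $q_\theta(x_i) := \mathrm{Dir}(\gamma_i)$: outputs $\gamma_i$, the Dirichlet concentration params for $x_i$

Ex: $\gamma(x_i, \theta) = e^{f(x_i)}$, with $f(x_i, \theta)$ the $K$-dim output of a neural network (NN)

● Minimize $\mathrm{KL}[q_\theta(z_i \mid x_i) \,\|\, P(z_i \mid x_i, y_i)] = -\int q_\theta(z_i \mid x_i) \log \frac{P(z_i \mid x_i, y_i)}{q_\theta(z_i \mid x_i)}\, dz$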

(19)

Dirichlet Prior Networks (DPN)


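▸ Variational approximation (VI), deriving the Evidence Lower Bound (ELBO) loss:

$$\mathcal{L}_{VI}(\theta) = -\int q_\theta(z_i \mid x_i) \log \frac{P(y_i \mid x_i, z_i)\, P(z_i \mid x_i)}{q_\theta(z_i \mid x_i)\, P(y_i \mid x_i)}\, dz$$

$$\mathcal{L}_{VI}(\theta) = -\int q_\theta(z_i \mid x_i) \log P(y_i \mid x_i, z_i)\, dz + \mathrm{KL}[q_\theta(z_i \mid x_i) \,\|\, P(z_i \mid x_i)] + \log P(y_i \mid x_i)$$

$$\mathcal{L}_{VI}(\theta) \propto \mathbb{E}_{q_\theta(z_i \mid x_i)}[-\log p(y_i \mid x_i, z_i)] + \mathrm{KL}[q_\theta(z_i \mid x_i) \,\|\, P(z_i \mid x_i)] \qquad (2)$$

(the term $\log P(y_i \mid x_i)$ does not depend on $\theta$ and is dropped)

▸ $\mathbb{E}_{q_\theta(z_i \mid x_i)}[-\log p(y_i \mid x_i, z_i)] = \lambda^{-1}[\psi(\gamma_0) - \psi(\gamma_y)]$ (see note a), $\psi$ digamma function

$\psi(x) = \frac{\Gamma'(x)}{\Gamma(x)}$, asymptotic expansion: $\psi(x) \approx \ln(x) - \frac{1}{2x} + O\big(\frac{1}{x^2}\big)$

▸ $P(z_i \mid x_i) = \mathrm{Dir}(1, \dots, 1)$ ⇒ $\mathrm{KL}[q_\theta(z_i \mid x_i) \,\|\, P(z_i \mid x_i)]$: differential entropy term $\propto \frac{1}{\gamma_0}$, supports distribution spread

Note a: $\mathbb{E}[\log(X_i)] = \psi(\gamma_i) - \psi(\gamma_0)$ for $X \sim \mathrm{Dir}(\gamma)$; $\log p(y_i \mid x_i, z_i) = \lambda^{-1} \log(z_y)$, with $\lambda^{-1} = \tilde{N}_{y,i}$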
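The two terms of the loss (2) have closed forms for a Dirichlet $q_\theta$, which the following sketch evaluates in pure Python. It is an illustration under assumptions: the digamma function is approximated with the recurrence $\psi(x) = \psi(x+1) - 1/x$ followed by the asymptotic series, the KL term uses the standard closed form of the KL divergence between two Dirichlet distributions specialised to a $\mathrm{Dir}(1, \dots, 1)$ prior, and all function names are hypothetical.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift x upward with the recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result

def expected_nll(gamma, y, lam=1e-2):
    """E_q[-log p(y | x, z)] = (psi(gamma_0) - psi(gamma_y)) / lambda."""
    return (digamma(sum(gamma)) - digamma(gamma[y])) / lam

def kl_to_uniform_dirichlet(gamma):
    """KL[Dir(gamma) || Dir(1, ..., 1)] in closed form."""
    g0, K = sum(gamma), len(gamma)
    kl = math.lgamma(g0) - sum(math.lgamma(g) for g in gamma) - math.lgamma(K)
    kl += sum((g - 1.0) * (digamma(g) - digamma(g0)) for g in gamma)
    return kl

def dpn_vi_loss(gamma, y, lam=1e-2):
    """Loss (2) for one sample, up to the additive constant dropped in the derivation."""
    return expected_nll(gamma, y, lam) + kl_to_uniform_dirichlet(gamma)

gamma = [101.0, 1.0, 1.0]   # hypothetical network output, concentrated on class 0
print("loss if the true class is 0:", dpn_vi_loss(gamma, y=0))
print("loss if the true class is 1:", dpn_vi_loss(gamma, y=1))
```

In a real training loop these quantities would be computed with an automatic-differentiation library rather than by hand; the point here is only to make the two terms concrete.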


(20)

Outline

● Bayesian Neural Networks for Classification

● Applications of Uncertainty in Deep Learning

● Other Robustness Issues

(21)

Failure Prediction

● Detecting failure of a system crucial in practice

● Use uncertainty estimate to accept / reject predictions


(22)

Failure Prediction: Model Calibration in Deep Learning

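● Classification model trained on $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$

● Model prediction: $\hat{y} = \arg\max_{k \in \mathcal{Y}} p(Y = k \mid w, x)$

● Model confidence $\hat{C}(x)$:

Simple baseline for deep neural networks: $\mathrm{MCP}(x) = \max_{k \in \mathcal{Y}} p(Y = k \mid w, x)$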

(23)

Model Calibration

● Test data: $\mathcal{D} = (X, Y) = \{(x_1, y_1), \dots, (x_N, y_N)\}$

● $(\hat{y}_i, \hat{C}(x_i))$: class prediction and confidence level

● Perfect calibration:

$$p(\hat{Y} = Y \mid \hat{C} = p) = p, \quad \forall p \in [0, 1]$$

● Predicted confidence matches actual accuracy

▸ e.g. given 100 predictions, each with confidence of 0.8, we expect that 80 should be correctly classified.

▸ Link to thresholding probabilities for failure prediction:

non-calibrated probabilities ⇒ arbitrary threshold!


(24)

Model Calibration in deep learning

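Reliability diagram: $M$ interval bins.

$B_m$: set of samples whose prediction confidence falls in $I_m = \big(\frac{m-1}{M}, \frac{m}{M}\big]$

$$\mathrm{acc}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \mathbb{1}(\hat{y}_i = y_i)$$

$$\mathrm{conf}(B_m) = \frac{1}{|B_m|} \sum_{i \in B_m} \hat{p}_i$$

perfect calibration ⟺ $\mathrm{acc}(B_m) = \mathrm{conf}(B_m) \ \ \forall m$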
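The per-bin accuracy and confidence of a reliability diagram can be computed as below. The sketch also reports the expected calibration error (ECE), the bin-size-weighted average of $|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)|$; ECE is not defined in the text above and is added here as a commonly used summary of the diagram. Function names and the toy data are assumptions.

```python
import math

def reliability_bins(confidences, correct, n_bins=10):
    """Assign each sample to I_m = ((m-1)/M, m/M]; return (acc, conf, size) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        m = min(n_bins - 1, max(0, math.ceil(c * n_bins) - 1))
        bins[m].append((c, ok))
    stats = []
    for b in bins:
        if b:
            acc = sum(ok for _, ok in b) / len(b)
            conf = sum(c for c, _ in b) / len(b)
            stats.append((acc, conf, len(b)))
        else:
            stats.append(None)  # empty bin: nothing to plot
    return stats

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |acc(B_m) - conf(B_m)| over non-empty bins."""
    n = len(confidences)
    stats = [s for s in reliability_bins(confidences, correct, n_bins) if s is not None]
    return sum(size / n * abs(acc - conf) for acc, conf, size in stats)

# 100 predictions, 80 of them correct (toy data).
correct = [1] * 80 + [0] * 20
calibrated = [0.8] * 100       # confidence matches accuracy
overconfident = [0.95] * 100   # confidence exceeds accuracy

print("ECE, calibrated:", expected_calibration_error(calibrated, correct))
print("ECE, overconfident:", expected_calibration_error(overconfident, correct))
```

The first case is the "100 predictions with confidence 0.8, 80 correct" situation and yields a gap of essentially zero; the second has the same accuracy but a gap of about 0.15.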

(25)

Model Calibration in deep learning

● Simple solution to over-confident predictions: temperature scaling [Guo et al., 2017]

$$P(\hat{y}_k) = \frac{e^{s_k / T}}{\sum_{k'=1}^{K} e^{s_{k'} / T}}$$

● Temperature $T$ optimized on the val set s.t. $\mathrm{acc}(B_m) = \mathrm{conf}(B_m)$
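Temperature scaling can be sketched as a one-parameter search on validation logits. One assumption should be flagged: the criterion used below is the validation negative log-likelihood, which is the objective used in [Guo et al., 2017], rather than a direct match of $\mathrm{acc}(B_m)$ and $\mathrm{conf}(B_m)$. The grid search, function names and toy logits are also illustrative choices.

```python
import math

def softmax_t(logits, T=1.0):
    """Softmax of the logits divided by the temperature T."""
    z = [s / T for s in logits]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    tot = sum(e)
    return [v / tot for v in e]

def nll(val_logits, val_labels, T):
    """Average negative log-likelihood of the validation labels at temperature T."""
    return -sum(math.log(softmax_t(s, T)[y])
                for s, y in zip(val_logits, val_labels)) / len(val_labels)

def fit_temperature(val_logits, val_labels, grid=None):
    """Pick the temperature on a grid (0.25 to 5.0) that minimizes the validation NLL."""
    grid = grid or [0.25 + 0.05 * i for i in range(96)]
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))

# Toy validation set: the model always outputs logits (4, 0) but is right only 70% of the time.
val_logits = [[4.0, 0.0]] * 10
val_labels = [0] * 7 + [1] * 3

T_star = fit_temperature(val_logits, val_labels)
print("confidence before scaling:", softmax_t([4.0, 0.0])[0])
print("fitted T:", T_star, "confidence after scaling:", softmax_t([4.0, 0.0], T_star)[0])
```

Dividing all logits by the same $T > 0$ never changes the arg max, so accuracy is untouched; only the confidence values move.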


(26)

Failure Prediction in deep learning

● Goal of the confidence estimate $\hat{C}(x_i)$: distinguish correct from erroneous predictions

● Sort examples w.r.t. $\hat{C}(x)$

(27)

Failure Prediction

● MCP: $\mathrm{MCP}(x) = \max_{k \in \mathcal{Y}} p(Y = k \mid w, x)$, an unreliable confidence criterion

▸ For failure prediction: by design, MCP is the largest class probability, for errors as well as for successes

● Overlapping distributions between successes and errors

⇒ hard to distinguish


(28)

MCP, a sub-optimal ranking confidence measure

$$\mathrm{MCP}(x) = \max_{k \in \mathcal{Y}} p(Y = k \mid w, x)$$

● Overconfident prediction values

⇒ calibration [Guo et al., 2017]

● BUT: calibration does not solve the overlap problem

(29)

Failure Prediction with TCP [Corbière et al., 2019]

● True Class Probability (TCP): $\mathrm{TCP}(x, y) = p(Y = y \mid w, x)$, a reliable confidence criterion

▸ For failure prediction: assigns a lower probability to errors


(30)

ConfidNet [Corbière et al., 2019]

● However, $\mathrm{TCP}(x, y)$ is unknown at test time.

● ConfidNet [Corbière et al., 2019]: Learning TCP Model Confidence

Given $\mathcal{D}_{train}$, learn a confidence model with parameters $\theta$ such that, $\forall x \in \mathcal{D}_{train}$, its scalar output $\hat{c}(x, \theta)$ is close to $\mathrm{TCP}(x, y)$

(31)

Practical Session: Failure Prediction on MNIST

● Compute different confidence estimates $\hat{C}$ on MNIST test data

▸ MCP: $\mathrm{MCP}(x) = \max_{k \in \mathcal{Y}} p(Y = k \mid w, x)$, MC dropout, ConfidNet

● Compare confidence criterion quality

▸ Rank test examples w.r.t. the uncertainty criterion

▸ Compute the Precision/Recall (PR) curve and AP$_{\text{errors}}$
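The ranking-based evaluation can be sketched as an average precision computed with errors as the positive class, after sorting test examples from least to most confident. The four-sample toy data and the function name are assumptions; TCP is included only as an oracle reference, since it requires the true label.

```python
def average_precision_errors(confidences, correct):
    """AP with errors as the positive class; samples ranked from least to most confident."""
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    n_errors = sum(1 for ok in correct if not ok)
    if n_errors == 0:
        return float("nan")  # AP is undefined without any error
    hits, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if not correct[i]:
            hits += 1
            ap += hits / rank  # precision at each rank where an error is retrieved
    return ap / n_errors

# Toy softmax outputs and ground-truth labels for four test examples.
probs = [[0.9, 0.1], [0.6, 0.4], [0.8, 0.2], [0.55, 0.45]]
labels = [0, 0, 1, 1]

preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
correct = [pred == y for pred, y in zip(preds, labels)]
mcp = [max(p) for p in probs]                 # maximum class probability
tcp = [p[y] for p, y in zip(probs, labels)]   # true class probability (oracle)

print("AP_errors with MCP:", average_precision_errors(mcp, correct))
print("AP_errors with TCP:", average_precision_errors(tcp, correct))
```

On this toy set the confident error (probability 0.8 on the wrong class) is ranked late by MCP but first by TCP, which is the overlap issue in miniature.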


(32)

Applications: Out of Distribution Detection

● Standard classification:

same classes during train and test

● Out of Distribution Detection:

detecting unknown classes, far from training distribution

(33)

Out of Distribution Detection

● Detecting examples far from training distribution

⇒ natural use of Bayesian confidence estimates

● In classification: estimate dispersion (e.g. mutual information) for Bayesian models (e.g. MC dropout)

● Baseline solution: assume that logits (before soft-max) smaller for out- than for in-distribution samples


(34)

Out of Distribution Detection

Other methods learn dataset-specific OOD detectors

● Ex: ODIN [Liang et al., 2018]

● Modify inputs so as to maximize the difference between in- and out-of-distribution samples

▸ Temperature scaling, as in [Guo et al., 2017]: $P(\hat{y}_k) = \frac{e^{s_k / T}}{\sum_{k'=1}^{K} e^{s_{k'} / T}}$

▸ Apply an "inverse adversarial attack": compute the gradient of the predicted-class log-probability w.r.t. the input, then change the input to increase the prediction: $\tilde{x} = x - \epsilon\, \mathrm{sign}[-\nabla_x \log P(\hat{y}_k)]$

▸ Hyper-parameters $(T, \epsilon)$ optimized on the val set to maximize the AUC between in-distribution and OOD samples

(35)

Out of Distribution Example

● Model trained on MNIST - Out: Kuzushiji-MNIST

● Comparing MCP, MC dropout, ODIN for OOD detection


(36)

Out of Distribution Detection

Other methods learn dataset-specific OOD detectors

● Ex: Outlier Exposure (OE) [Hendrycks et al., 2019]

▸ Training with in-distribution AND OOD samples:

$$\mathbb{E}_{(x,y) \sim \mathcal{D}_{in}}[L(f(x), y)] + \lambda\, \mathbb{E}_{x \sim \mathcal{D}_{out}^{OE}}[L_{OE}(f(x))]$$

▸ e.g. $L(f(x), y)$ cross-entropy, $L_{OE}(f(x))$ KL w.r.t. the uniform class distribution

● Out of Distribution with Dirichlet Prior Networks

▸ Recap for in-distribution samples: minimize $\mathrm{KL}[q_\theta(z_i \mid x_i) \,\|\, P(z_i \mid x_i, y_i)]$, with $P(z_i \mid x_i, y_i) \sim \mathrm{Dir}(\beta)$, e.g. $\beta = (1, 1, \dots, 101, \dots, 1)$, i.e. $\beta_k = 1 + 100 \cdot \delta_{k,k_i^*}$

▸ For out-of-distribution samples: minimize $\mathrm{KL}[q_\theta(z_i \mid x_i) \,\|\, P(z_i \mid x_i, y_i)]$, with $P(z_i \mid x_i, y_i) \sim \mathrm{Dir}(U)$, $U = (1, 1, \dots, 1)$
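The Outlier Exposure objective can be written out on raw logits as follows. The direction of the KL term is an assumption: the sketch uses $\mathrm{KL}(U \,\|\, \mathrm{softmax}(f(x)))$, which differs from the cross-entropy to the uniform distribution only by a constant. The weight `lam`, the function names and the toy batches are illustrative.

```python
import math

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(s - m) for s in logits))
    return [s - lse for s in logits]

def cross_entropy(logits, y):
    """L(f(x), y): negative log-probability of the true class."""
    return -log_softmax(logits)[y]

def kl_uniform(logits):
    """L_OE(f(x)) = KL(U || softmax(f(x))); zero iff the prediction is uniform."""
    K = len(logits)
    return -math.log(K) - sum(log_softmax(logits)) / K

def oe_objective(in_batch, out_batch, lam=0.5):
    """Mean cross-entropy on in-distribution pairs + lam * mean KL-to-uniform on outliers."""
    ce = sum(cross_entropy(f, y) for f, y in in_batch) / len(in_batch)
    oe = sum(kl_uniform(f) for f in out_batch) / len(out_batch)
    return ce + lam * oe

# Toy batches of logits: (logits, label) pairs in-distribution, logits only for outliers.
in_batch = [([2.0, 0.0, 0.0], 0), ([0.0, 3.0, 0.0], 1)]
flat_out = [[0.1, 0.0, -0.1]]    # nearly uniform prediction on an outlier
peaked_out = [[6.0, 0.0, 0.0]]   # confident prediction on an outlier

print("objective, flat OOD prediction:", oe_objective(in_batch, flat_out))
print("objective, peaked OOD prediction:", oe_objective(in_batch, peaked_out))
```

The second term penalizes confident predictions on outliers, so the objective is larger when the model is peaked on an OOD input.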

(37)

Out of Distribution Detection

Relevance of using OOD training datasets?

● By essence, OOD are impossible to model!

● Performance depends strongly on the training vs. testing OOD datasets


(38)

Applications: Active Learning

● Active / interactive learning: learning a model with few data annotated by users

● Challenge: determining most informative data to annotate

▸ Most uncertain data: optimal convergence [Tong and Koller, 2002]

(39)

Applications: Active Learning

● Use of uncertainty in classification: active learning [Gal et al., 2017]


(40)

Applications

● Use of uncertainty in classification: beyond MC nets

▸ Application to ConvNets

▸ and RNNs : BPTT: Bayesian Dropout [Gal and Ghahramani, 2016b]

(41)

Application in Semantic Segmentation

From [Kendall and Gal, 2017]


(42)

Application in Reinforcement Learning

● Q-learning: the agent selects the best action following the current Q-function estimate with some probability, and explores otherwise ($\epsilon$-greedy)

● Option: use uncertainty estimates (e.g. dropout Q-network) to drive exploration, e.g. Thompson sampling (Thompson, 1933), and converge faster
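The contrast between the two exploration strategies can be illustrated on a much simpler problem than Q-learning. The sketch below swaps in a Bernoulli multi-armed bandit with Beta posteriors in place of a dropout Q-network, so it only conveys the principle (act greedily with respect to one sample from the posterior); the arm means, step count and function name are assumptions.

```python
import random

def run_bandit(true_means, policy, n_steps=2000, eps=0.1, seed=0):
    """Play a Bernoulli bandit and return the average reward obtained."""
    rng = random.Random(seed)
    K = len(true_means)
    wins, losses = [0] * K, [0] * K
    total = 0
    for _ in range(n_steps):
        if policy == "eps_greedy":
            if rng.random() < eps:
                a = rng.randrange(K)  # explore uniformly at random
            else:
                # Exploit the arm with the best empirical mean so far.
                a = max(range(K), key=lambda k: wins[k] / max(1, wins[k] + losses[k]))
        else:
            # Thompson sampling: one draw per arm from its Beta posterior, then act greedily.
            a = max(range(K), key=lambda k: rng.betavariate(wins[k] + 1, losses[k] + 1))
        r = 1 if rng.random() < true_means[a] else 0
        wins[a] += r
        losses[a] += 1 - r
        total += r
    return total / n_steps

arms = [0.2, 0.5, 0.8]  # hypothetical success probabilities
print("eps-greedy average reward:", run_bandit(arms, "eps_greedy"))
print("Thompson average reward:", run_bandit(arms, "thompson"))
```

With Thompson sampling, exploration is driven by posterior uncertainty and fades as evidence accumulates, whereas $\epsilon$-greedy keeps exploring at a fixed rate regardless of what has been learned.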

(43)

Outline

● Bayesian Neural Networks for Classification

● Applications of Uncertainty in Deep Learning

● Other Robustness Issues

(44)

Deep Learning Theory

● Deep Learning: huge impact in terms of experimental results

● BUT: formal understanding still limited,

▸ Optimization: non-convex problem

▸ Generalization & over-fitting

▸ Robustness

(45)

Deep Learning and generalization

● Rademacher complexity: capacity of a model to fit random labels: $R_n(\mathcal{H}) = \mathbb{E}_\sigma \Big[ \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i h(x_i) \Big]$

● Rethinking generalization: Zhang et al., ICLR 2017 [Zhang et al., 2017]

▸ Deep models easily fit random labels!

▸ $R_n(\mathcal{H}) \approx 1$ ⇒ no theoretical guarantee on generalization performance
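The definition can be made concrete with a Monte Carlo estimate of the empirical Rademacher complexity for a finite hypothesis class described by its outputs on $n$ fixed points. The two classes below are toy stand-ins chosen for illustration: a "memorizer" that realizes every labeling (the regime reported for deep networks on random labels) and a single constant classifier.

```python
import itertools
import random

def empirical_rademacher(hypothesis_outputs, n_draws=500, seed=0):
    """hypothesis_outputs: list of vectors (h(x_1), ..., h(x_n)) with values in {-1, +1}."""
    rng = random.Random(seed)
    n = len(hypothesis_outputs[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw random signs sigma_i and take the best correlation over the class.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        total += max(sum(s * h for s, h in zip(sigma, hv)) / n
                     for hv in hypothesis_outputs)
    return total / n_draws

n = 8
memorizer = list(itertools.product((-1, 1), repeat=n))  # realizes every labeling of the n points
constant = [tuple([1] * n)]                             # a single fixed hypothesis

print("memorizing class:", empirical_rademacher(memorizer))
print("constant class:", empirical_rademacher(constant))
```

A class that can match any sign pattern reaches the maximal value, at which point bounds based on this quantity say nothing useful about generalization.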

● Classical learning theory insufficient to explain the good generalization behavior of deep models


(46)

Generalization and over-parametrized models

● Double U-curve phenomenon observed with deep models! [Belkin et al., 2019]

(47)

Deep Learning (DL) & Stability

● Stability: decision function with "controlled" variations

▸ Small input variations ⇔ reasonably small output variations on the decision, e.g. Lipschitz property

▸ Decision function of deep models not always stable

Ex: Adversarial Examples


(48)

Deep Learning (DL) & Stability

● Adversarial attacks in real-world

(49)

Deep Learning (DL) & Stability

Formal stability analysis of deep models

● Harmonic analysis in scattering operators [Mallat, 2012, Bruna and Mallat, 2013], i.e."deep wavelets"

▸ Show stability / invariance to diffeomorphisms

▸ Stability bounds

● Generalized to deep kernel machines, closer to SoTA deep ConvNet architectures [Bietti and Mairal, 2017]


(50)

Deep Learning (DL) & Stability

Formal stability analysis of deep models

● Influence Functions [Cook and Weisberg, 1980]

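▸ Characterize the influence of training examples on the decision function

Removing a training point: $\mathcal{I}_{up,loss}(z, z_{test}) = -\nabla_\theta L(z_{test}, \hat{\theta})^T H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta})$

Perturbing it: $\mathcal{I}_{pert,loss}(z, z_{test})^T = -\nabla_\theta L(z_{test}, \hat{\theta})^T H_{\hat{\theta}}^{-1} \nabla_x \nabla_\theta L(z, \hat{\theta})$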

▸ Adapted / applied to deep networks [Koh and Liang, 2017]

● Data poisoning

(51)

Deep Learning (DL) & Stability

Ad hoc stability training

● Regularization criterion supporting learning stable decision function

▸ The underlying model might not be stable, but this helps to focus on a subset of stable functions of the family

● Robustness of the decision to transformations [Sajjadi et al., 2016], stability across iterations [Laine and Aila, 2017, Tarvainen and Valpola, 2017]


(52)

References I

[Belkin et al., 2019] Belkin, M., Hsu, D., Ma, S., and Mandal, S. (2019).

Reconciling modern machine-learning practice and the classical bias–variance trade-off.

Proceedings of the National Academy of Sciences, 116(32):15849–15854.

[Bietti and Mairal, 2017] Bietti, A. and Mairal, J. (2017).

Invariance and stability of deep convolutional representations.

In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 6210–6220. Curran Associates, Inc.

[Bruna and Mallat, 2013] Bruna, J. and Mallat, S. (2013).

Invariant scattering convolution networks.

IEEE Trans. Pattern Anal. Mach. Intell., 35(8):1872–1886.

[Cook and Weisberg, 1980] Cook, R. and Weisberg, S. (1980).

Characterizations of an empirical influence function for detecting influential cases in regression.

Technometrics, 22(4):495–508.

[Corbière et al., 2019] Corbière, C., Thome, N., Bar-Hen, A., Cord, M., and Pérez, P. (2019).

Addressing Failure Detection by Learning Model Confidence.

In Advances in Neural Information Processing Systems (NeurIPS), Vancouver, Canada.

[Evtimov et al., 2017] Evtimov, I., Eykholt, K., Fernandes, E., Kohno, T., Li, B., Prakash, A., Rahmati, A., and Song, D. (2017).

Robust physical-world attacks on machine learning models.

CoRR, abs/1707.08945.

[Gal and Ghahramani, 2015] Gal, Y. and Ghahramani, Z. (2015).

(53)

References II

[Gal and Ghahramani, 2016b] Gal, Y. and Ghahramani, Z. (2016b).

A theoretically grounded application of dropout in recurrent neural networks.

In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, pages 1027–1035, USA. Curran Associates Inc.

[Gal et al., 2017] Gal, Y., Islam, R., and Ghahramani, Z. (2017).

Deep Bayesian Active Learning with Image Data.

In Proceedings of the 34th International Conference on Machine Learning (ICML-17).

[Guo et al., 2017] Guo, C., Pleiss, G., Sun, Y., and Weinberger, K. Q. (2017).

On calibration of modern neural networks.

CoRR, abs/1706.04599.

[Hendrycks et al., 2019] Hendrycks, D., Mazeika, M., and Dietterich, T. G. (2019).

Deep anomaly detection with outlier exposure.

In ICLR.

[Kendall and Gal, 2017] Kendall, A. and Gal, Y. (2017).

What uncertainties do we need in bayesian deep learning for computer vision?

In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems 30, pages 5574–5584. Curran Associates, Inc.

[Koh and Liang, 2017] Koh, P. W. and Liang, P. (2017).

Understanding black-box predictions via influence functions.

In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1885–1894, International Convention Centre, Sydney, Australia. PMLR.

[Laine and Aila, 2017] Laine, S. and Aila, T. (2017).

Temporal ensembling for semi-supervised learning.

In International Conference on Learning Representations (ICLR).

(54)

References III

[Lakshminarayanan et al., 2017] Lakshminarayanan, B., Pritzel, A., and Blundell, C. (2017).

Simple and scalable predictive uncertainty estimation using deep ensembles.

In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.

[Liang et al., 2018] Liang, S., Li, Y., and Srikant, R. (2018).

Enhancing the reliability of out-of-distribution image detection in neural networks.

In International Conference on Learning Representations.

[Malinin and Gales, 2018] Malinin, A. and Gales, M. (2018).

Predictive uncertainty estimation via prior networks.

In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.

[Mallat, 2012] Mallat, S. (2012).

Group invariant scattering.

Communications on Pure and Applied Mathematics, 65(10):1331–1398.

[Sajjadi et al., 2016] Sajjadi, M., Javanmardi, M., and Tasdizen, T. (2016).

Regularization with stochastic transformations and perturbations for deep semi-supervised learning.

In Advances in Neural Information Processing Systems (NIPS).

[Sensoy et al., 2018] Sensoy, M., Kaplan, L., and Kandemir, M. (2018).

Evidential deep learning to quantify classification uncertainty.

In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc.

(55)

References IV

[Zhang et al., 2017] Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2017).

Understanding deep learning requires rethinking generalization.
