• Aucun résultat trouvé

Supplementary materials

N/A
N/A
Protected

Academic year: 2022

Partager "Supplementary materials"

Copied!
6
0
0

Texte intégral

(1)

autoencoders

No Author Given No Institute Given

Supplementary materials

Proofs

Proposition 1. A perfect lossy autoencoder of a random variableX with density pis a diffeomorphismrof Rn minimizing the following expected loss:

Lλ(r) =EX∼p

||x−r(x)||22+λlog|Jr|

(1) for someλ >0, whereJr being the Jacobian matrix(Jr)ij =∂x∂ri

j.

Proof. As a perfect lossy autoencoder minimizes the loss L2 over the set of lossy models, by introducing a Langrangian multiplierλ >0 it is equivalent to minimize over the whole universe of Rn-diffeomorphisms with a penalization λ.M I(X, r(X)) of the loss. Sincer is a function, then the joint distributiuon densityp(x, r(x)) is the same as the density p(x) and subsequently the mutual information can be rewritten asEX∼p[−logp(r(x))].

Sincer is a diffeomorphism, by change of variable:p(r(x)) = p(x)|J−1r|.

Therefore, the loss to be minimized writes:

Lλ(r) =EX∼p

||x−r(x)||22−λlogp(x)|J−1r|

(2) Sincep(x) is constant with respect to r, the minimizers of 3 are the same as the minimizers of:

Lλ(r) =EX∼p

||x−r(x)||22+λlog|Jr|

(3) We now aim at describing the analytical solution of equation 1. We formulate the problem in an Euler-Langrange setting by defining the following multivariate function:

H(x, r, r0) =p(x).

||x−r||22+λlog|r0|

(4) Wherex, r∈Rn,r0∈Rn×Rn and|r0|denote the determinant ofr0.

Using this formulation, finding a minimizer of 1 is equivalent to finding a minimizerrofR

RnH(x, r(x),Jr)dx.

Following [?], Volume 1, Chapter IV, eq. 18 and 25, it should satisfy in particular:

∂H

∂ri

=X

j

∂xj

∂H

∂rij0 ∀i= 1, ...n (5)

(2)

We have the following derivatives:

∂H

∂ri

= 2.p(x).(ri(x)−xi)

∂H

∂rij0 =λ.p(x).(r0−1)ji

∂xj

∂H

∂rij0 =λ. ∂p

∂xj

.(r0−1)ji

−λ.p(x).h

r0−1.( ∂

∂xj

r0)r0−1)i

ji

(6)

Assuming that ∀x∈ Rn and p(x) 6= 0, replacing the derivatives in 5 and dividing it by 2.p(x) we get the following identity:

ri(x) =xi+λ 2

X

j

∂logp

∂xj .(J−1r)ji

−X

j

hJ−1r.( ∂

∂xjJr)J−1r)i

ji

(7)

The local minima of the lossLλ(r) are described by their first order expension following:

Proposition 2. The first order term in the expansion with respect to λ of a minimizer of the loss defined in equation 1 is 12∂xlogp

i , and a perfect lossy autoencoder satisfies:

r(x) =x+λ2∂xlogp

i +o(λ) asλ→0.

Proof. Let us denote by g(x) the first-order term of the expansion of r with respect to λ:

r(x) =x+λ.g(x) +o(λ) (8)

Inducing the following expansions:

Jr=I+λ.Jg+o(λ) J−1r=I−λ.Jg+o(λ)

(9)

Substituting in equation 7 the expansions 8 and 9, we get:

gi(x) +o(1) =1 2

X

j

∂logp

∂xj

hI−λ.Jg+o(λ)i

ji

.

−h

(I−λ.Jg+o(λ))(λ ∂

∂xjJg) (I−λ.Jg+o(λ))i

ji

(10)

(3)

The only 0-orderλterm comes from the first member of the summation, and we finally get:

gi(x) =1 2

X

j

∂logp

∂xj Iji=1 2

∂logp

∂xi (11)

Heuristic

Algorithm 1Langevin with few restarts: heuristics to avoid oversampling high- density regions

Input:

– AEs (trained auto-encoders) – C (Trained classifier) – R= 10 (Random walk steps)

– Ns= 10,000 ( Number of random walks) – λ= 0.1 (Regularization coefficient) Output:

– X = [ ] (Synthetic samples) – Y = [ ](Synthetic labels) forn= 0 toNs do

xnAE ∼ N(0, I) fori= 0 toR do

if i >1then Drawn∼ N(0, I)

xi+1AE =AE(xiAE) + (λ∗n) X=X∪[xiAE]

Y =Y ∪[C.predict(xiAE)]

end if end for end for

(4)

Principal Component Analysis

Fig. 1.Visualization of the training data of the MNIST dataset embedded in the vector subspace spanned by the first two principal components. After projecting 1000 random points (xn∼ N(0, I)) into the subspace, we observe a relatively clear separation between them (black dots) and the training set (colored dots).

(5)

Fig. 2.Visualization of the training data of the CIFAR-10 dataset embedded in the vector subspace spanned by the first two principal components. After projecting 1000 random points (xn∼ N(0, I)) into the subspace, there is an overlapping between them (black dots) and the training set (colored dots).

(6)

Fig. 3.Visualization of the training data of the CIFAR-100 dataset embedded in the vector subspace spanned by the first two principal components. After projecting 1000 random points (xn∼ N(0, I)) into the subspace, there is an overlapping between them (black dots) and the training set (colored dots).

Références

Documents relatifs

Challenges for Reachability Graph Generation of Reference Nets The main challenges for a reachability graph generation tool are the definitions of a marking / state of the

cognitive ability based on the gaze data and selected areas of interest, thus showing, that there are distinct differences in patterns and interaction styles worth of adapting to

Avec GOCAD, les objets sont considérés comme des ensembles de points liés entre eux afin de constituer un maillage dont la dimension est la caractéristique de l'objet. Ainsi,

To sum up, modern systems should provide scalable techniques that on-the-fly effectively (i.e., in a way that can be easily explored) handle a large number of data objects over

As most past research in visual analytics has focused on understanding the needs and challenges of data analysts, less is known about the tasks and challenges of organizational

This release includes the first three years of MaNGA data plus a new suite of derived data products based on the MaNGA data cubes, a new data access tool for MaNGA known as Marvin,

To help answer that question, let us turn to what is probably the most accepted definition of InfoVis, one that comes from Card, Mackinlay, and Shneiderman and that actually is

Our study is much ampler than related work; first, in terms of system compo- nents/parameters analyzed since the data set is composed of a total of more than 20,000 system