HAL Id: hal-02268478

https://hal.archives-ouvertes.fr/hal-02268478

Submitted on 21 Aug 2019


Semi-supervised Learning using Deep Generative Models and Metric Embedding Auxiliary Task

Jhosimar George Arias Figueroa

To cite this version:

Jhosimar George Arias Figueroa. Semi-supervised Learning using Deep Generative Models and Metric Embedding Auxiliary Task. LatinX in AI Research at ICML 2019, Jun 2019, Long Beach, United States. ⟨hal-02268478⟩


Semi-supervised Learning using Deep Generative Models and Metric Embedding Auxiliary Task

Jhosimar George Arias Figueroa

jariasf03@gmail.com

Motivation

In many real-world applications it is relatively easy to acquire large amounts of unlabeled data; however, obtaining the corresponding labels often requires the effort of a human expert and special devices. This labeling bottleneck results in small amounts of labeled data and large amounts of unlabeled data. Therefore, being able to utilize the surplus unlabeled data is desirable.

Background

Variational Autoencoder Evidence Lower Bound (ELBO)

We approximate the intractable posterior pθ(z|x) with a variational posterior qϕ(z|x), which yields a lower bound on the marginal log-likelihood:

\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big] - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big)
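For concreteness, a minimal PyTorch sketch of this bound for a standard VAE is given below; it assumes a Bernoulli decoder and a N(0, I) prior, which are illustrative choices rather than details taken from the poster.

```python
import torch
import torch.nn.functional as F

def vae_negative_elbo(x, x_logits, mu, logvar):
    """Negative ELBO for a standard VAE with a Bernoulli decoder and N(0, I) prior.

    x          : input batch with values in [0, 1]
    x_logits   : decoder logits for the reconstruction p(x|z)
    mu, logvar : parameters of the Gaussian posterior q(z|x)
    """
    # Reconstruction term: one-sample Monte Carlo estimate of -E_q[log p(x|z)]
    recon = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")
    # KL(q(z|x) || N(0, I)) in closed form
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Minimizing this quantity maximizes the lower bound on log p(x)
    return recon + kl
```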

Intuition

Our encoder transforms the data into feature representations. From these representations we infer a Gaussian latent variable, which provides a more robust set of features, and a discrete latent variable, which provides category assignments.

We take advantage of the small amount of labeled data to perform the category assignments for the unlabeled data.

The decoder uses the Gaussian representations and the category information for reconstruction.

Probabilistic Model

Graphical models of the inference model and the generative model.

Evidence Lower Bound (ELBO)

The first term is the reconstruction loss (LR). The second and third terms work as regularizers of the categorical (LC) and Gaussian (LG) distributions, respectively.
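The sketch below illustrates one way these three terms could be computed; it assumes a mean-field posterior qϕ(y|x)qϕ(z|x) with a uniform categorical prior p(y) and a standard normal prior p(z), which is an assumption about the model rather than the poster's exact factorization.

```python
import math
import torch
import torch.nn.functional as F

def elbo_terms(x, x_logits, y_logits, mu, logvar, num_classes):
    """Illustrative decomposition of the ELBO into L_R, L_C and L_G.

    Assumes q(y|x) is categorical with logits y_logits, q(z|x) = N(mu, diag(exp(logvar))),
    p(y) is uniform over num_classes categories and p(z) = N(0, I).
    """
    # L_R: reconstruction loss, one-sample estimate of -E_q[log p(x|y, z)]
    L_R = F.binary_cross_entropy_with_logits(x_logits, x, reduction="sum")

    # L_C: KL(q(y|x) || Uniform) regularizes the categorical distribution
    q_y = F.softmax(y_logits, dim=-1)
    L_C = torch.sum(q_y * (F.log_softmax(y_logits, dim=-1) + math.log(num_classes)))

    # L_G: KL(q(z|x) || N(0, I)) regularizes the Gaussian distribution
    L_G = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return L_R, L_C, L_G
```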

Auxiliary Task Loss

Assignment Loss (LA)

In order to assign labels to the unlabeled data, we use distance-weighted k-nearest neighbors (k-NN) on the feature space.

Once we have obtained the assignments for the unlabeled data, we use the cross-entropy loss.
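A minimal sketch of this assignment step is shown below, assuming Euclidean distances on the feature space and inverse-distance weighting; the value of k and the weighting scheme are illustrative choices, not taken from the poster.

```python
import torch
import torch.nn.functional as F

def assign_pseudo_labels(unlabeled_feats, labeled_feats, labels, num_classes, k=5, eps=1e-8):
    """Distance-weighted k-NN: labeled neighbors vote with weight 1 / distance."""
    dists = torch.cdist(unlabeled_feats, labeled_feats)        # (U, L) Euclidean distances
    knn_dists, knn_idx = dists.topk(k, dim=1, largest=False)   # k nearest labeled points
    weights = 1.0 / (knn_dists + eps)                          # inverse-distance weights
    votes = torch.zeros(unlabeled_feats.size(0), num_classes)
    votes.scatter_add_(1, labels[knn_idx], weights)            # accumulate weighted votes
    return votes.argmax(dim=1)                                 # pseudo-labels

def assignment_loss(class_logits, pseudo_labels):
    """L_A: cross-entropy between predicted categories and the k-NN assignments."""
    return F.cross_entropy(class_logits, pseudo_labels)
```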

Metric Embedding Loss (LM)

The goal of this loss is to regularize the feature space so that the distance between features belonging to the same category is smaller than the distance between features of different categories.

We perform metric learning via the lifted structured loss.

L_M = \frac{1}{2|\hat{P}|} \sum_{(i,j) \in \hat{P}} \max\big(0, J(f_i, f_j)\big)^2

J(f_i, f_j) = D(f_i, f_j) + \max\Big\{ \max_{(i,k) \in \hat{N}} \alpha - D(f_i, f_k),\; \max_{(j,l) \in \hat{N}} \alpha - D(f_j, f_l) \Big\}
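Here P̂ denotes the set of same-category (positive) pairs, N̂ the set of different-category (negative) pairs, D a distance on the feature space and α a margin, following the standard lifted structured loss. A possible PyTorch rendering over a mini-batch is sketched below; the implementation details are assumptions, not the paper's code.

```python
import torch

def lifted_structured_loss(features, labels, margin=1.0):
    """L_M: lifted structured loss over a mini-batch of embeddings.

    features : (B, d) embedding vectors f_i
    labels   : (B,) category assignments (true or pseudo-labels)
    """
    D = torch.cdist(features, features)                      # pairwise distances D(f_i, f_j)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)         # (B, B) same-category mask

    # For every anchor, the hardest-negative term: max over negatives of (margin - D)
    neg_term = torch.where(~same, margin - D, torch.full_like(D, float("-inf")))
    hardest_neg = neg_term.max(dim=1).values                  # (B,)

    # Positive pairs (i, j) with i < j, excluding the diagonal
    pos_i, pos_j = torch.triu_indices(len(labels), len(labels), offset=1)
    mask = same[pos_i, pos_j]
    pos_i, pos_j = pos_i[mask], pos_j[mask]
    if pos_i.numel() == 0:
        return features.new_zeros(())

    J = D[pos_i, pos_j] + torch.maximum(hardest_neg[pos_i], hardest_neg[pos_j])
    return torch.clamp(J, min=0).pow(2).sum() / (2 * pos_i.numel())
```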

Proposed Loss Function

The full objective combines the ELBO terms with the auxiliary task: the reconstruction (LR), categorical (LC) and Gaussian (LG) losses together with the assignment (LA) and metric embedding (LM) losses.
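A minimal sketch of how the five terms could be combined; the weights w_A and w_M are hypothetical hyperparameters, since the poster does not state how the terms are balanced.

```python
def total_loss(L_R, L_C, L_G, L_A, L_M, w_A=1.0, w_M=1.0):
    # ELBO terms plus the auxiliary task losses; the weighting is illustrative only.
    return L_R + L_C + L_G + w_A * L_A + w_M * L_M
```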


Network Architecture

Architecture of our proposed model. The encoder and decoder are given by Convolutional Neural Networks. The red diamonds represent the reconstruction loss (LR), categorical loss (LC), Gaussian loss (LG), assignment loss (LA) and metric embedding loss (LM).
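As an illustration of such an architecture, the sketch below defines a small convolutional encoder and decoder for 28×28 images; all layer sizes, the latent dimensionality and the feature dimensionality are hypothetical and do not come from the poster.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Convolutional encoder producing features, a Gaussian latent and category logits."""
    def __init__(self, num_classes=10, z_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),   # 28x28 -> 14x14
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 14x14 -> 7x7
            nn.Flatten(),
        )
        self.features = nn.Linear(64 * 7 * 7, 256)  # feature space used by L_A and L_M
        self.mu = nn.Linear(256, z_dim)             # Gaussian latent parameters
        self.logvar = nn.Linear(256, z_dim)
        self.logits = nn.Linear(256, num_classes)   # categorical latent

    def forward(self, x):
        f = torch.relu(self.features(self.conv(x)))
        return f, self.mu(f), self.logvar(f), self.logits(f)

class Decoder(nn.Module):
    """Reconstructs the image from the Gaussian sample z and the category y (one-hot)."""
    def __init__(self, num_classes=10, z_dim=32):
        super().__init__()
        self.fc = nn.Linear(z_dim + num_classes, 64 * 7 * 7)
        self.deconv = nn.Sequential(
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 7x7 -> 14x14
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),              # 14x14 -> 28x28
        )

    def forward(self, z, y_onehot):
        h = self.fc(torch.cat([z, y_onehot], dim=1)).view(-1, 64, 7, 7)
        return self.deconv(h)  # logits of the reconstructed image
```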

Quantitative Evaluation


Performance of our probabilistic model after adding the assignment (LA) and metric embedding (LM) losses at different epochs on the MNIST and SVHN datasets.

Semi-supervised test error (%) on MNIST, SVHN and CIFAR-10 datasets for randomly and evenly distributed labeled data. Colored rows denote methods that use Bayesian approaches.

Model           MNIST (100 labels)   SVHN (1000 labels)   CIFAR-10 (4000 labels)
M1+TSVM         11.82 (± 0.25)       54.33 (± 0.11)       -
SWWAE            8.71 (± 0.34)       23.56                -
M1+M2            3.33 (± 0.14)       36.02 (± 0.10)       -
SS-Clustering    1.66 (± 0.20)       20.66 (± 1.09)       -
SDGM             1.32 (± 0.07)       16.61 (± 0.24)       -
ADGM             0.96 (± 0.02)       22.86                -
Improved GAN     0.93 (± 0.65)        8.11 (± 1.3)        18.63 (± 2.32)
Conv-Ladder      0.89 (± 0.50)       -                    20.40 (± 0.47)
Triplet GAN      0.91 (± 0.58)        5.77 (± 0.17)       16.99 (± 0.36)
Bad GAN          0.79 (± 0.09)        4.25 (± 0.03)       14.41 (± 0.30)
Proposed         1.15 (± 0.21)       11.74 (± 0.05)       28.75 (± 0.04)

Qualitative Evaluation

Generated images of our proposed model trained on MNIST: random generation and style generation.
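A sketch of how both kinds of generation could be produced with the decoder defined earlier, assuming that random generation samples z from the prior and that style generation keeps the Gaussian code of a real image while varying the category; this reading of "style generation" is an assumption.

```python
import torch
import torch.nn.functional as F

def random_generation(decoder, num_classes=10, z_dim=32, samples_per_class=8):
    """Sample z ~ N(0, I) and decode it together with each class label y."""
    z = torch.randn(samples_per_class * num_classes, z_dim)
    y = torch.arange(num_classes).repeat_interleave(samples_per_class)
    y_onehot = F.one_hot(y, num_classes).float()
    return torch.sigmoid(decoder(z, y_onehot))  # generated images in [0, 1]

def style_generation(decoder, encoder, x, num_classes=10):
    """Keep the Gaussian code (style) of one real image x and vary the category y."""
    _, mu, _, _ = encoder(x)                    # style code from one image
    z = mu.repeat(num_classes, 1)
    y_onehot = F.one_hot(torch.arange(num_classes), num_classes).float()
    return torch.sigmoid(decoder(z, y_onehot))
```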

Visualization of the Feature Space

Feature-space visualizations for the MNIST, SVHN and CIFAR-10 datasets.

LatinX in AI Workshop @ ICML 2019, Long Beach, CA, USA. https://github.com/jariasf/semisupervised-vae-metric-embedding
