
GAN Applications with Discrete Data

Reproducing and Bringing State-of-the-Art to Industry

Ramiro CAMINO

SEDAN Lab, SnT, University of Luxembourg, Luxembourg

ramiro.camino@uni.lu — (+352) 46 66 44 - 6855

Abstract

Gaining insight into financial transactions is a legal requirement for banks and financial service providers. Unfortunately, for most data available to banks and service providers, the true purpose of each transaction is unknown, and illicit activities have a low base rate. Additionally, after reporting suspicious activity, banks rarely hear back from investigations. Consequently, many supervised approaches, which inherently rely on ground truth, cannot be used. In the following we study the use of Generative Adversarial Networks (GAN) for data augmentation, to increase the scarce amount of known fraudulent cases, and for data imputation, to improve the quality of samples.

Introduction

• Issues analyzing financial transactions and accounts:

Class imbalance: e.g. fraud cases are rare while non-fraud cases are abundant

Unavailability and unreliability of labels

Missing values across multiple features

Impossibility of sharing private data

• Possible applications of GANs:

Data augmentation for the minority class

Data imputation for missing values

Synthetic data generation to replace private data

Unsupervised feature extraction

• State-of-the-Art for GANs:

Plenty of work and promising results for continuous domains like images, video and sound

Studies involving discrete samples are mainly focused on sequences of words, characters or symbols

Related Work

Figure 1: Generative Adversarial Network (GAN) [4]. The generator weights can be trained thanks to the error backpropagated from the discriminator loss and through the discriminator weights.
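A minimal sketch of this path, assuming PyTorch; module shapes and names below are illustrative toys, not the poster's implementation:

```python
# Illustrative sketch of the backpropagation path in Figure 1: the
# generator G only receives gradients through the discriminator D.
import torch
import torch.nn as nn

latent_dim, data_dim = 32, 64
G = nn.Sequential(nn.Linear(latent_dim, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 1), nn.Sigmoid())
bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)

z = torch.randn(16, latent_dim)           # noise batch
fake = G(z)                               # continuous output: fully differentiable
loss_g = bce(D(fake), torch.ones(16, 1))  # generator tries to fool D
opt_g.zero_grad()
loss_g.backward()                         # error flows from D's loss into G's weights
opt_g.step()
```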

Figure 2: When sampling from a discrete distribution, backpropagation fails because the sampling operation is non-differentiable.
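A toy PyTorch demonstration of the cut graph:

```python
# Illustrative sketch of Figure 2: drawing a discrete sample severs the
# autograd graph, so no gradient can reach the logits.
import torch
import torch.nn.functional as F

logits = torch.randn(5, requires_grad=True)
probs = torch.softmax(logits, dim=0)
index = torch.multinomial(probs, num_samples=1)  # non-differentiable sample
one_hot = F.one_hot(index, num_classes=5).float()
print(one_hot.requires_grad)                     # False: backpropagation fails here
```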

Figure 3: MedGAN [2] first pre-trains an autoencoder. During the second training phase, the decoder transforms the generator outputs into samples where each feature is considered as a binary variable.
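A rough sketch of the two-phase idea, with illustrative modules and sizes rather than the authors' code:

```python
# Sketch of the MedGAN pipeline [2]: a pre-trained decoder maps continuous
# generator outputs to per-feature Bernoulli probabilities.
import torch
import torch.nn as nn

n_features, code_dim, latent_dim = 1071, 128, 128   # assumed sizes
encoder = nn.Linear(n_features, code_dim)
decoder = nn.Sequential(nn.Linear(code_dim, n_features), nn.Sigmoid())
generator = nn.Linear(latent_dim, code_dim)

# Phase 1: pre-train (encoder, decoder) as an autoencoder on real binary records.
# Phase 2: the generator emits codes; the pre-trained decoder turns them into
# one probability per binary feature.
z = torch.randn(16, latent_dim)
fake_records = decoder(generator(z))  # shape (16, n_features), values in (0, 1)
```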

Figure 4: ARAE [7] trains an autoencoder and a WGAN [1] in alternating steps. The discriminator is replaced by a critic (along with the corresponding loss) that takes codes as input instead of samples.
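A possible rendering of this setup with stand-in linear modules; all names and dimensions are assumptions:

```python
# Sketch of the ARAE idea [7]: the WGAN critic [1] scores autoencoder
# codes rather than raw discrete samples, so gradients never have to
# cross a sampling step.
import torch
import torch.nn as nn

data_dim, code_dim, latent_dim = 64, 100, 32
encoder = nn.Linear(data_dim, code_dim)      # stand-in for the trained encoder
generator = nn.Linear(latent_dim, code_dim)  # generator imitates codes
critic = nn.Linear(code_dim, 1)              # WGAN critic: unbounded scalar score

x_real = torch.rand(16, data_dim).round()    # toy binary batch
real_code = encoder(x_real)
fake_code = generator(torch.randn(16, latent_dim))
# WGAN critic objective: widen the score gap between real and generated codes
loss_critic = critic(fake_code).mean() - critic(real_code).mean()
```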

Figure 5: The Gumbel-Softmax [6] and the Concrete Distribution [10] were simultaneously proposed to sample from discrete distributions in the domain of VAEs [8]. Later, [9] adapted the technique to GANs for sequences of discrete elements, as shown in this diagram.
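A compact sketch of the relaxation itself, assuming PyTorch; the temperature value is an arbitrary choice:

```python
# Sketch of the Gumbel-Softmax / Concrete relaxation [6, 10]: perturb the
# logits with Gumbel noise and apply a temperature-controlled softmax,
# yielding a differentiable surrogate for a categorical sample.
import torch

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # Gumbel(0, 1) noise via inverse transform sampling
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return torch.softmax((logits + gumbel) / tau, dim=-1)

logits = torch.randn(16, 10, requires_grad=True)  # batch of 10-way categoricals
y = gumbel_softmax_sample(logits, tau=0.5)        # soft one-hot vectors
y.sum().backward()                                # gradients now reach the logits
```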

Figure 6: In [5] the authors propose adding a gradient penalty to the WGAN loss (hence the name WGAN-GP) to improve training. Additionally, the authors claim that WGAN-GP can generate discrete sequences by training directly on softmax outputs.
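A sketch of the penalty term under its usual formulation, with an assumed critic callable and the commonly used coefficient of 10:

```python
# Sketch of the WGAN-GP penalty [5]: push the critic's gradient norm
# towards 1 at random interpolates between real and generated samples.
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    eps = torch.rand(real.size(0), 1)                 # per-sample mixing weights
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    return lambda_gp * ((grads.norm(2, dim=1) - 1) ** 2).mean()
```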

Multi-Categorical GAN

Figure 7: Proposed architecture for either the generator or the decoder model. After the model output, we place in parallel one dense layer per categorical variable, followed by an activation and a concatenation to obtain the final output. The choice between Gumbel-Softmax and Softmax depends on the model. The reconstruction loss becomes the sum of one cross-entropy loss per variable.
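One way this head could look in PyTorch; the hidden size, temperature, and variable sizes are illustrative assumptions, and the loss is applied to the pre-activation logits:

```python
# Sketch of the multi-categorical output layer of Figure 7 (illustrative,
# not the reference implementation): one dense layer per categorical
# variable, a (Gumbel-)Softmax activation, then concatenation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiCategoricalHead(nn.Module):
    def __init__(self, hidden_dim, variable_sizes, tau=0.66):
        super().__init__()
        self.tau = tau
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, k) for k in variable_sizes)

    def forward(self, h):
        # Gumbel-Softmax per variable (plain Softmax for WGAN-GP-style training)
        outs = [F.gumbel_softmax(head(h), tau=self.tau) for head in self.heads]
        return torch.cat(outs, dim=-1)

def multi_cross_entropy(logits, target, variable_sizes):
    # Reconstruction loss: the sum of one cross-entropy term per variable,
    # computed on pre-activation logits against one-hot targets.
    loss, start = 0.0, 0
    for k in variable_sizes:
        labels = target[:, start:start + k].argmax(dim=-1)
        loss = loss + F.cross_entropy(logits[:, start:start + k], labels)
        start += k
    return loss
```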

Experiments

• Split the dataset into Dtrain and Dtest

• Train generative model with Dtrain

• Generate Dsample with the trained model

• Experiments proposed in [2]:

Probabilities by dimension: compare the proportion of ones per dimension between Dtest and Dsample (a sketch follows this list)

Predictions by dimension: train one predictive model per dimension with Dtrain and evaluate with Dtest; repeat using Dsample for training and compare the results.

• Predictions by categorical: analogous experiment but training one predictive model per categorical variable.

• Example results shown in Figure 8
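A sketch of the first metric, assuming binary matrices and NumPy; the arrays below are toy stand-ins for Dtest and Dsample:

```python
# Sketch of the "probabilities by dimension" check from [2]: compare the
# frequency of ones per dimension in the real and synthetic sets.
import numpy as np

def probabilities_by_dimension(d_test, d_sample):
    # proportion of ones per dimension in each set
    return d_test.mean(axis=0), d_sample.mean(axis=0)

rng = np.random.default_rng(0)
d_test = (rng.random((1000, 20)) > 0.5).astype(float)    # toy stand-in for Dtest
d_sample = (rng.random((1000, 20)) > 0.5).astype(float)  # toy stand-in for Dsample
p_test, p_sample = probabilities_by_dimension(d_test, d_sample)
# a perfect generator would place every (p_test[i], p_sample[i]) on the diagonal
```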

Conclusions

• Compared to unmodified MedGAN and ARAE, all proposed approaches improve performance across all datasets

• However, we cannot identify a single best model

• The performance improvement comes at the cost of requiring extra information (the variable sizes). In some cases, this information may not be available.

Future Work

• Analyze datasets with mixed variable types

• Explore more hyperparameters and architecture options

• Experiment with oversampling and undersampling

• Improve validation with domain specific metrics

[Figure 8: six rows of scatter plots, one per model (ARAE, MedGAN, MC-ARAE, MC-MedGAN, MC-Gumbel, MC-WGAN-GP), with one column per metric (probabilities by dimensions, predictions by dimensions, predictions by categorical); each panel plots data score (x-axis) against model score (y-axis), both ranging from 0.0 to 1.0.]

Figure 8: Experiments on the US Census 1990 dataset [3]. Each column corresponds to a metric and each row to samples from a different model.

References

[1] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.

[2] Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, and Jimeng Sun. Generating multi-label discrete patient records using generative adversarial networks. In PMLR, pages 286–305, 2017.

[3] D. Dua and E. Karra Taniskidou. UCI Machine Learning Repository, 2017.

[4] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc., 2014.

[5] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of Wasserstein GANs, 2017.

[6] Eric Jang, Shixiang Gu, and Ben Poole. Categorical reparameterization with Gumbel-Softmax, 2016.

[7] Junbo Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, and Yann LeCun. Adversarially regularized autoencoders, 2017.

[8] Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes, 2013.

[9] Matt J. Kusner and José Miguel Hernández-Lobato. GANs for sequences of discrete elements with the Gumbel-Softmax distribution, 2016.

[10] Chris J. Maddison, Andriy Mnih, and Yee Whye Teh. The concrete distribution: A continuous relaxation of discrete random variables, 2016.
