Gaussian Mixtures with Missing Data: an Efficient EM Training Algorithm

Olivier Delalleau, Aaron Courville and Yoshua Bengio

Dept. IRO, Université de Montréal, C.P. 6128, Montreal, Qc, H3C 3J7, Canada {delallea,courvila,bengioy}@iro.umontreal.ca

http://www.iro.umontreal.ca/~lisa

In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms. A solution that we explore here is the use of a generative model (a mixture of Gaussians with full covariances) to learn the underlying data distribution and replace missing values by their conditional expectation given the observed variables. Since training a Gaussian mixture with many different patterns of missing values can be computationally very expensive, we introduce a spanning-tree based algorithm that significantly speeds up training in these conditions. Such mixtures of Gaussians can be applied directly to supervised problems (Ghahramani and Jordan, 1994), but we observe that using them for missing value imputation before applying a separate discriminant learning algorithm yields better results.
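For reference, the conditional expectation used for imputation has a standard closed form for a mixture of Gaussians (this is the textbook identity, stated here in our own notation, not copied from the full paper): with mixture weights $\pi_k$, means $\mu_k$ and full covariances $\Sigma_k$, and with each sample split into observed coordinates $x_o$ and missing coordinates $x_m$,

\[
\mathbb{E}[x_m \mid x_o] \;=\; \sum_k p(k \mid x_o)\left(\mu_{k,m} + \Sigma_{k,mo}\,\Sigma_{k,oo}^{-1}\,(x_o - \mu_{k,o})\right),
\qquad
p(k \mid x_o) \;\propto\; \pi_k\,\mathcal{N}\!\left(x_o;\, \mu_{k,o},\, \Sigma_{k,oo}\right).
\]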

Our contributions are two-fold:

1. We explain why the basic EM training algorithm is not practical in large-dimensional applications in the presence of missing values, and we propose a novel training algorithm that significantly speeds up training by EM. The proposed algorithm re-uses the computations performed on one training sample as a basis for the next sample, in order to obtain the quantities required by the EM update equations. We show how these computations can be minimized by ordering samples in such a way that two consecutive samples have similar “missing patterns”, i.e. share missing values for similar variables (a sketch of one possible ordering scheme is given after this list). On 28x28 images with random 5x5 squares of pixels forced to be missing, we obtain a speed-up on the order of 8 compared to standard EM training.

2. We show, both by visual inspection on image data (figure 1) and by feeding the imputed values to another classification algorithm (figure 2), that a mixture of Gaussians can model the data distribution well enough to provide a valuable tool for missing-value imputation.
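The spanning-tree construction itself is detailed in the full paper; purely as an illustration, the sketch below builds one plausible ordering by computing a minimum spanning tree over Hamming distances between binary missing-pattern masks and traversing it depth-first. Function and variable names are ours, not taken from the paper.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, depth_first_order

def missing_pattern_order(X):
    """Order samples so that consecutive ones have similar missing patterns.

    X: (n, d) array with np.nan marking missing entries.
    Returns a visiting order obtained from a spanning tree built over the
    Hamming distances between binary missing masks (illustrative sketch;
    quadratic in n, so only suitable for moderate sample sizes).
    """
    masks = np.isnan(X)                                   # (n, d) missing patterns
    # Pairwise Hamming distances between missing patterns (n x n matrix).
    dist = (masks[:, None, :] != masks[None, :, :]).sum(axis=-1).astype(float)
    dist += 1.0                                           # keep zero-distance edges
    np.fill_diagonal(dist, 0.0)                           # no self-loops
    mst = minimum_spanning_tree(dist)                     # sparse, undirected MST
    # A depth-first traversal of the tree yields an ordering in which two
    # consecutive samples tend to differ in few missing coordinates.
    order, _ = depth_first_order(mst + mst.T, i_start=0, directed=False)
    return order

# Hypothetical usage: visit samples in this order during the E-step so that
# the quantities computed for one sample can be updated cheaply for the next.
# order = missing_pattern_order(X)
```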

References

Ghahramani, Z. and Jordan, M. I. (1994). Supervised learning from incomplete data via an EM approach. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems 6, San Mateo, CA. Morgan Kaufmann.

Topic: learning algorithms
Preference: poster



Figure 1: Imputation of missing values in images (on the left, two digits with missing values in grey, and on the right the corresponding imputation predicted by the mixture of Gaussians).

[Plot: three curves (Gaussian mixture, Neural network, Kernel ridge regression) showing test mean-squared error as the proportion of missing values increases.]

Figure 2: Test mean-squared error (Abalone) when the proportion of missing values increases.

“Gaussian mixture” is the mixture alone, used as a classifier. Both “Neural network” and “Kernel ridge regression” have been trained using the conditional mean imputation of missing values provided by the same Gaussian mixture. We see that combining a discriminant algorithm with the generative Gaussian mixture model works better than the generative model alone.
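To make the pipeline behind figure 2 concrete, the sketch below imputes missing entries with the Gaussian-mixture conditional mean (the closed form given earlier) and then fits an off-the-shelf kernel ridge regressor on the completed data. This is only an illustration in our own notation: it assumes the mixture parameters (weights, means, covs) have already been fitted, e.g. with the fast EM algorithm of contribution 1, which is not reproduced here.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.kernel_ridge import KernelRidge

def gmm_conditional_mean_impute(x, weights, means, covs):
    """Impute the missing (NaN) entries of one sample x by their conditional
    expectation under a fitted Gaussian mixture (weights, means, covs).
    Minimal sketch of the standard conditional-mean formula."""
    m = np.isnan(x)                       # missing coordinates
    o = ~m                                # observed coordinates
    x_imp = x.copy()
    post, cond_means = [], []
    for pi_k, mu, S in zip(weights, means, covs):
        S_oo, S_mo = S[np.ix_(o, o)], S[np.ix_(m, o)]
        # Responsibility of component k given the observed coordinates only.
        post.append(pi_k * multivariate_normal.pdf(x[o], mu[o], S_oo))
        # Conditional mean of the missing block given the observed one.
        cond_means.append(mu[m] + S_mo @ np.linalg.solve(S_oo, x[o] - mu[o]))
    post = np.array(post) / np.sum(post)
    x_imp[m] = np.sum(post[:, None] * np.array(cond_means), axis=0)
    return x_imp

# Hypothetical usage: complete a training matrix X (NaNs = missing) and fit
# a discriminant model on the imputed data, as in the figure 2 comparison.
# X_imp = np.array([gmm_conditional_mean_impute(x, w, mu, S) for x in X])
# model = KernelRidge(kernel="rbf").fit(X_imp, y)
```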
