Experiment: Missing Values Imputation

Academic year: 2022

Combining Generative and Discriminative Models

Abalone dataset (regression task), with varying proportions of missing values added

• Train a mixture of Gaussians to model the data (including the target value)

• Test error when varying the proportion of missing values: the mixture used directly as a regressor does not work as well as a neural network or kernel ridge regressor trained with missing values imputed by the mixture

• Test error with neural network: imputing the Gaussian mixture’s conditional mean works better than nearest neighbor or global mean imputation

[Figure: two plots of test error against the proportion of missing values. First plot compares Gaussian mixture, neural network, and kernel ridge regression. Second plot compares global mean, nearest neighbor, and conditional mean imputation.]
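The conditional-mean imputation compared above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `gaussian_logpdf` and `impute_conditional_mean` are hypothetical, missing entries are assumed to be encoded as NaN, each sample is assumed to have at least one observed entry, and the mixture parameters (`weights`, `means`, `covs`) are assumed to be already trained.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def impute_conditional_mean(x, weights, means, covs):
    """Fill NaN entries of x with E[X_m | X_o = x_o] under a Gaussian mixture."""
    m = np.isnan(x)          # missing mask
    o = ~m                   # observed mask
    if not m.any():
        return x.copy()
    log_post = np.empty(len(weights))
    cond_means = []
    for j, (w, mu, cov) in enumerate(zip(weights, means, covs)):
        cov_oo = cov[np.ix_(o, o)]
        # Unnormalized log-posterior of component j given the observed part.
        log_post[j] = np.log(w) + gaussian_logpdf(x[o], mu[o], cov_oo)
        # Per-component conditional mean: mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o).
        cond_means.append(
            mu[m] + cov[np.ix_(m, o)] @ np.linalg.solve(cov_oo, x[o] - mu[o])
        )
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    filled = x.copy()
    # Mixture conditional mean is the posterior-weighted average.
    filled[m] = sum(p * cm for p, cm in zip(post, cond_means))
    return filled
```

The imputed vector can then be passed to any downstream regressor, which is the setting the second plot evaluates.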

Experiment: Missing Values Imputation

• Randomly mark 5x5 patches as missing in the MNIST digits database (28x28 images)

• For each class (digit), train a mixture of Gaussians

• Use the model with best NLL to impute missing values

• Compare speed with naive EM

⇒ speedup by a factor of 8 to 20 (architecture-dependent)
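The masking step of this experiment could look like the sketch below. It assumes one patch per image, uniformly random placement, and NaN as the missing-value marker; the poster does not state these details, and the name `mask_random_patch` is hypothetical.

```python
import numpy as np

def mask_random_patch(image, patch=5, rng=None):
    """Return a float copy of `image` with one random patch x patch block set to NaN."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Top-left corner chosen so the patch lies fully inside the image.
    top = rng.integers(0, h - patch + 1)
    left = rng.integers(0, w - patch + 1)
    out = image.astype(float)  # astype returns a new array, the input is untouched
    out[top:top + patch, left:left + patch] = np.nan
    return out
```

For a 28x28 digit this leaves 25 of the 784 pixels missing, and different images generally receive different missing patterns.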

Proposed Algorithm

Desired quantities for sample $x_j$ can be obtained by updating quantities previously computed for another sample $x_i$. This update is cheap as long as both samples have similar missing patterns ⇒ speed-up on the order of $O(n_m/n_d)$, with $n_d$ the number of differences between missing patterns. The optimal path is a minimum spanning tree over missing patterns:

1. Compute minimum spanning tree on the graph of unique missing patterns (edge weight = cost of update)

2. Use K-means to initialize mixture means

3. Use empirical covariance in each cluster to initialize covariances (e.g. imputing missing values with cluster mean)

4. Perform EM, where at each iteration we go through the training set traversing the tree obtained in step 1, in order to perform computationally efficient updates
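Step 1 can be illustrated with Prim's algorithm on the complete graph of unique missing patterns. The sketch below is an assumption-laden illustration: it uses the Hamming distance between patterns (the number of differing positions) as a stand-in for the update cost, and the names `hamming` and `pattern_spanning_tree` are hypothetical. The poster only says that the edge weight is the cost of the update.

```python
def hamming(p, q):
    """Number of positions where two missing patterns differ."""
    return sum(a != b for a, b in zip(p, q))

def pattern_spanning_tree(patterns):
    """Minimum spanning tree over unique missing patterns (Prim's algorithm).

    `patterns` is an iterable of equal-length tuples of 0/1 flags (1 = missing).
    Returns (unique, edges) where edges is a list of (parent, child, cost)
    index triples into `unique`, listed in an order in which every parent
    has already been visited before its child.
    """
    unique = sorted(set(patterns))
    n = len(unique)
    if n == 0:
        return unique, []
    in_tree = [False] * n
    best_cost = [float("inf")] * n
    best_parent = [None] * n
    in_tree[0] = True  # arbitrary root
    for v in range(1, n):
        best_cost[v] = hamming(unique[0], unique[v])
        best_parent[v] = 0
    edges = []
    for _ in range(n - 1):
        # Cheapest pattern not yet in the tree.
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_cost[i])
        in_tree[v] = True
        edges.append((best_parent[v], v, best_cost[v]))
        # Relax the remaining patterns against the newly added one.
        for u in range(n):
            if not in_tree[u]:
                c = hamming(unique[v], unique[u])
                if c < best_cost[u]:
                    best_cost[u] = c
                    best_parent[u] = v
    return unique, edges
```

Walking `edges` in order gives one valid traversal for step 4: each pattern is reached from a neighbor that was already processed, so the quantities for the child can be derived from those of the parent. This dense version costs quadratic time in the number of unique patterns, which may need a sparser candidate graph on large training sets.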

EM Computational Bottleneck

At each iteration of EM, for sample $x_i$ with $n_o$ observed and $n_m$ missing variables, the main computational cost consists of:

1. the computation of $p_{ij}$: $O(n_o^3)$

2. the computation of its contribution to $C_j$: $O(n_m^3)$ (roughly)

⇒ cubic in the dimension!

Basic observation: two samples having the same missing pattern share the same expensive computations

This is not enough: there may be (almost) as many different missing patterns as training samples!
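The basic observation can be sketched by grouping sample indices under their missing pattern, so that any pattern-specific work is done once per group. The helper name `group_by_missing_pattern` and the use of `None` as the missing marker are assumptions made for illustration.

```python
from collections import defaultdict

def group_by_missing_pattern(samples):
    """Map each missing pattern to the indices of the samples that share it.

    `samples` is a list of equal-length sequences where None marks a missing value.
    """
    groups = defaultdict(list)
    for i, x in enumerate(samples):
        pattern = tuple(v is None for v in x)  # True where the value is missing
        groups[pattern].append(i)
    return dict(groups)
```

Comparing `len(groups)` with `len(samples)` shows how much this alone can save: when nearly every sample has its own pattern, the two counts are close and the grouping brings little benefit, which is the limitation noted above.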

EM Algorithm with Missing Values

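1. E-Step: Compute posteriors

\[
p_{ij} = P(G = j \mid X_o = x_i^o) = \frac{P(X_o = x_i^o \mid G = j)}{\sum_{k=1}^{M} P(X_o = x_i^o \mid G = k)}
\]

2. M-Step: For each Gaussian j:

• Fill in missing values by setting

\[
x_i^m \leftarrow E[X_m \mid X_o = x_i^o, G = j]
\]

• Update mean and covariance

\[
\mu_j = \frac{\sum_{i=1}^{n} p_{ij}\, x_i}{\sum_{i=1}^{n} p_{ij}}
\qquad
\Sigma_j = \frac{\sum_{i=1}^{n} p_{ij}\,(x_i - \mu_j)(x_i - \mu_j)^T}{\sum_{i=1}^{n} p_{ij}} + C_j
\]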
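A naive version of the M-step for one Gaussian is sketched below. The correction term is not defined on this panel; the sketch assumes the standard choice for EM with missing data, namely the posterior-weighted average of each sample's conditional covariance of the missing variables given the observed ones, placed in the missing block. The function name `m_step_component`, the NaN encoding, and the assumption that every sample has at least one observed value are illustrative choices, not taken from the poster.

```python
import numpy as np

def m_step_component(X, p, mu, cov):
    """One naive M-step update for a single Gaussian j.

    X   : (n, d) data with NaN for missing values
    p   : (n,) posteriors p_ij from the E-step
    mu  : (d,) current mean; cov : (d, d) current covariance
    Returns the updated (mean, covariance).
    """
    n, d = X.shape
    filled = X.copy()
    C = np.zeros((d, d))  # accumulates the weighted correction terms
    for i in range(n):
        m = np.isnan(X[i])
        if not m.any():
            continue
        o = ~m
        cov_om = cov[np.ix_(o, m)]
        # gain = Sigma_mo Sigma_oo^{-1}; this solve is the O(n_o^3) step.
        gain = np.linalg.solve(cov[np.ix_(o, o)], cov_om).T
        # Fill in missing values with their conditional expectation.
        filled[i, m] = mu[m] + gain @ (X[i, o] - mu[o])
        # Assumed correction: conditional covariance of the missing block.
        C[np.ix_(m, m)] += p[i] * (cov[np.ix_(m, m)] - gain @ cov_om)
    total = p.sum()
    new_mu = (p[:, None] * filled).sum(axis=0) / total
    diff = filled - new_mu
    new_cov = (p[:, None] * diff).T @ diff / total + C / total
    return new_mu, new_cov
```

This loop redoes a linear solve for every sample and every Gaussian, which is the per-sample cost that the proposed algorithm reduces by reusing computations across similar missing patterns.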

Outlook

• A mixture of Gaussians is a standard probabilistic model for high dimensional data

• It is straightforward to handle missing values in the classic Expectation-Maximization (EM) training algorithm...

• But to do it efficiently with many missing values, you must read this poster!

Olivier Delalleau Aaron Courville Yoshua Bengio

Gaussian Mixtures with Missing Data: an Efficient EM Training Algorithm
